SLaM

Small Language Model Evaluation Tool

Chandra Irugalbandara, Ashish Mahendra, Roland Daynauth, Tharuka Kasthuri Arachchige, Krisztian Flautner, Lingjia Tang, Yiping Kang, Jason Mars

Jaseci Labs, University of Michigan

SLaM is a helper tool for evaluating the performance of Large Language Models (LLMs) on your own use cases through human evaluation and automatic evaluation. You can deploy the application on your local machine or with Docker, generate responses for a given prompt with different LLMs (proprietary or open-source), and then evaluate those responses with human evaluators or automated methods.

Features


Admin Panel

Set up the Human Evaluation UI and manage the human evaluators.

Change how human evaluators see the information
Change the attributes of the Eval Config (used in the human evaluation and, afterwards, by the automatic evaluator)

Realtime Insights and Analytics

Get insights and analytics on the performance of the LLMs.

Cost Analysis
Performance Analysis across different metrics (Elo, Markov)
Consensus Analysis
Realtime Human Evaluation Progress
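
To illustrate how an Elo-style ranking can be derived from pairwise human preferences, here is a minimal sketch. It uses the textbook Elo update rule; the function names, the K-factor of 32, and the starting rating of 1000 are assumptions chosen for illustration, not SLaM's confirmed implementation.

```python
from collections import defaultdict


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_ratings(comparisons, k: float = 32.0, initial: float = 1000.0) -> dict:
    """Replay pairwise results in order and return the final rating per model.

    comparisons: iterable of (model_a, model_b, outcome), where outcome is
    1.0 if A was preferred, 0.0 if B was preferred, and 0.5 for a tie.
    k and initial are illustrative defaults (assumptions).
    """
    ratings = defaultdict(lambda: initial)
    for model_a, model_b, outcome in comparisons:
        ra, rb = ratings[model_a], ratings[model_b]
        ea = expected_score(ra, rb)
        # Each side moves by k times (actual score minus expected score).
        ratings[model_a] = ra + k * (outcome - ea)
        ratings[model_b] = rb + k * ((1.0 - outcome) - (1.0 - ea))
    return dict(ratings)
```

Because the two updates are equal and opposite, the sum of all ratings stays constant. The result depends on the order in which comparisons are replayed, so a real pipeline might shuffle and average over several passes.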

Human Evaluation

Evaluate the LLMs' responses with human evaluators' help.

Simple, user-friendly UI for easy but precise human evaluation
Time Tracking
Feedback Tracking

Automatic Evaluation

Evaluate the responses of the LLMs with the help of LLMs and using embedding similarity.

An LLM acts as the evaluator to automate the human evaluation process
The semantic similarity module measures how close each model's responses are to those of the anchor model in embedding space.
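
As a rough sketch of the semantic similarity idea, the snippet below ranks models by the cosine similarity between their response embedding and the anchor model's embedding. The choice of cosine similarity, the function names, and the toy vectors are assumptions for illustration; the source does not state which similarity measure or embedding model SLaM uses.

```python
import math


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors (assumption)
    return dot / (norm_u * norm_v)


def rank_by_anchor_similarity(anchor_embedding, model_embeddings: dict) -> list:
    """Return (model_name, similarity) pairs, closest to the anchor first."""
    scores = {
        name: cosine_similarity(anchor_embedding, embedding)
        for name, embedding in model_embeddings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

In practice the embeddings would come from an embedding model applied to each response; here they are passed in as plain lists so the sketch stays independent of any particular library.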

Multiple Model Support

Generate responses for a given prompt with different LLMs, whether proprietary or open-source (via Ollama).

OpenAI/Claude/Groq etc.
Ollama Models (Llama, CodeLlama, etc.)

Multiple Evaluation Methods

A/B Testing
A/B Testing with Criteria
More coming soon
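
The two A/B methods listed above can be pictured as generating pairwise comparison tasks from a set of model responses. The sketch below is one possible way to do this, not SLaM's actual task format: the function name, the dictionary keys, and the random left/right assignment (a common way to reduce position bias) are all assumptions.

```python
import itertools
import random


def build_ab_tasks(responses: dict, criteria=None, seed: int = 0) -> list:
    """Create one comparison task per model pair (and per criterion, if given).

    responses: mapping of model name to that model's response to one prompt.
    criteria: optional list of criterion names for "A/B Testing with Criteria";
              when omitted, each pair yields a single overall-preference task.
    """
    rng = random.Random(seed)  # seeded so task layout is reproducible
    tasks = []
    for first, second in itertools.combinations(sorted(responses), 2):
        # Randomize which model is shown on the left for this pair.
        left, right = (first, second) if rng.random() < 0.5 else (second, first)
        for criterion in (criteria or [None]):
            tasks.append({
                "left_model": left,
                "left_response": responses[left],
                "right_model": right,
                "right_response": responses[right],
                "criterion": criterion,
            })
    return tasks
```

With n models this produces n(n-1)/2 pairs, so three models and two criteria yield six tasks. The evaluator's choice on each task can then feed a pairwise rating scheme.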


COPYRIGHT © 2022 - JASECI LABS, LLC.