

Small Language Model Evaluation Tool

Chandra Irugalbandara, Ashish Mahendra, Roland Daynauth, Tharuka Kasthuri Arachchige, Krisztian Flautner, Lingjia Tang, Yiping Kang, Jason Mars

Jaseci Labs, University of Michigan

SLaM is a helper tool for evaluating the performance of Large Language Models (LLMs) on your own use cases, using both human evaluation and automatic evaluation. You can deploy the application on your local machine or with Docker, generate responses for a given prompt with different LLMs (proprietary or open source), and then evaluate the responses with the help of human evaluators or automated methods.


SLaM: Small Language Model Evaluation Tool

Admin Panel

Set up the Human Evaluation UI and manage the human evaluators.
  • Control how the human evaluators see the information presented to them
  • Edit the attributes of the Eval Config (used during human evaluation and later by the auto evaluator)
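As an illustration of what an Eval Config might hold, here is a minimal sketch as a Python dictionary. All field names below are hypothetical, chosen for illustration, and are not SLaM's actual schema:

```python
# Hypothetical eval config for an A/B-with-criteria run.
# Field names are illustrative, not SLaM's actual schema.
eval_config = {
    "method": "ab_testing_with_criteria",
    "criteria": ["helpfulness", "accuracy", "tone"],   # judged per comparison
    "models": ["gpt-4", "llama2", "codellama"],        # models under evaluation
    "responses_per_prompt": 1,    # generations each model contributes per prompt
    "show_model_names": False,    # blind the human evaluators to model identity
}
print(sorted(eval_config))
```

Because the same config drives both the human evaluation and the auto evaluator, settings such as the criteria list only need to be defined once.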

Realtime Insights and Analytics

Get insights and analytics on the performance of the LLMs.
  • Cost Analysis
  • Performance Analysis across different Metrics (ELO, Markov)
  • Consensus Analysis
  • Realtime Human Evaluation Progress
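To give a feel for the ELO metric listed above: pairwise human votes can be folded into per-model ratings roughly like this. This is an illustrative sketch with hypothetical model names and votes, not SLaM's actual implementation:

```python
# Minimal ELO sketch: turn pairwise A/B votes into model ratings.
# Illustrative only; SLaM's metric code may differ.

def expected_score(r_a, r_b):
    """Expected win probability of A against B under the ELO model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_elo(ratings, winner, loser, k=32):
    """Update both ratings in place after one A/B comparison."""
    ea = expected_score(ratings[winner], ratings[loser])
    ratings[winner] += k * (1 - ea)
    ratings[loser] -= k * (1 - ea)

# Hypothetical votes: (winner, loser) pairs from human evaluators.
votes = [("gpt-4", "llama2"), ("gpt-4", "codellama"), ("llama2", "codellama")]
ratings = {m: 1000.0 for m in ("gpt-4", "llama2", "codellama")}
for w, l in votes:
    update_elo(ratings, w, l)

print(max(ratings, key=ratings.get))  # the highest-rated model
```

Because ratings update after every vote, a leaderboard built this way can be refreshed in real time as human evaluations stream in.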

Human Evaluation

Evaluate the LLMs' responses with the help of human evaluators.
  • Simple, user-friendly UI for easy yet precise human evaluation
  • Time Tracking
  • Feedback Tracking

Automatic Evaluation

Evaluate the LLMs' responses automatically, using LLMs as judges and embedding similarity.
  • An LLM acts as an evaluator to automate the human evaluation process
  • The semantic similarity module measures how close each model's responses are to the anchor model's responses in semantic terms
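The semantic similarity idea can be sketched as follows: embed each model's response and the anchor model's response, then score their closeness with cosine similarity. The vectors and model names below are toy stand-ins (a real run would use an embedding model), not SLaM's actual code:

```python
# Sketch of embedding-based semantic similarity against an anchor model.
# Toy vectors stand in for real embeddings of each model's response.
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings of each model's response to the same prompt.
anchor = [0.9, 0.1, 0.3]  # anchor model's response embedding
candidates = {
    "llama2":    [0.8, 0.2, 0.4],
    "codellama": [0.1, 0.9, 0.2],
}
scores = {m: cosine_similarity(anchor, v) for m, v in candidates.items()}
# Higher score = semantically closer to the anchor model's response.
print(scores)
```

Ranking models by this score shows which ones stay closest to the anchor model without requiring a human in the loop.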

Multiple Model Support

Generate responses for a given prompt with different LLMs, proprietary or open source (via Ollama).
  • OpenAI, Claude, Groq, etc.
  • Ollama models (Llama, Codellama, etc.)

Multiple Evaluation Methods

  • A/B Testing
  • A/B Testing with Criteria
  • More coming soon
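One simple way to summarize A/B-test results is a per-model win rate over all comparisons. The sketch below uses hypothetical vote data and is only one possible aggregation, not necessarily the one SLaM uses:

```python
# Sketch: aggregate A/B-test votes into per-model win rates.
# Votes are (winner, loser) pairs; data here is hypothetical.
from collections import Counter

votes = [("model_a", "model_b"), ("model_a", "model_b"), ("model_b", "model_a")]

wins = Counter(w for w, _ in votes)       # comparisons each model won
total = Counter()                         # comparisons each model appeared in
for w, l in votes:
    total[w] += 1
    total[l] += 1

win_rate = {m: wins[m] / total[m] for m in total}
print(win_rate)
```

With criteria-based A/B testing, the same aggregation can be run once per criterion to see where each model wins or loses.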
