SLaM: Small Language Model Evaluation Tool
Chandra Irugalbandara, Ashish Mahendra, Roland Daynauth, Tharuka Kasthuri Arachchige, Krisztian Flautner, Lingjia Tang, Yiping Kang, Jason Mars
Jaseci Labs, University of Michigan
SLaM is a helper tool for evaluating the performance of Large Language Models (LLMs) on your own use cases through human and automatic evaluation. You can deploy the application on your local machine or with Docker, generate responses for a given prompt with different LLMs (proprietary or open-source), and then evaluate those responses with human evaluators or automated methods.
Features
Admin Panel
Set up the Human Evaluation UI and manage the human evaluators.
Ability to change how human evaluators see the information
Ability to change attributes of the evaluation config (used in human evaluation and later in the automatic evaluator)
Realtime Insights and Analytics
Get insights and analytics on the performance of the LLMs.
Cost Analysis
Performance Analysis across multiple metrics (Elo, Markov; see the Elo sketch after this list)
Consensus Analysis
Realtime Human Evaluation Progress
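For illustration, here is a minimal sketch of how an Elo rating could be derived from pairwise human votes. The (winner, loser) vote format, the K-factor, and the model names are assumptions for this example, not SLaM's actual data schema.

```python
# Minimal Elo sketch: update per-model ratings from pairwise votes.
# Vote format and constants are illustrative assumptions.

def elo_ratings(votes, k=32, base=1500.0):
    """votes: iterable of (winner, loser) model-name pairs."""
    ratings = {}
    for winner, loser in votes:
        rw = ratings.setdefault(winner, base)
        rl = ratings.setdefault(loser, base)
        # Expected score of the winner under the Elo model.
        expected = 1.0 / (1.0 + 10 ** ((rl - rw) / 400.0))
        ratings[winner] = rw + k * (1.0 - expected)
        ratings[loser] = rl - k * (1.0 - expected)
    return ratings

print(elo_ratings([("gpt-4", "llama2"), ("llama2", "mistral")]))
```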
Human Evaluation
Evaluate the LLMs' responses with human evaluators' help.
Simple, user-friendly UI for easy but precise human evaluation
Time Tracking
Feedback Tracking
Automatic Evaluation
Evaluate the LLMs' responses with the help of LLMs and embedding similarity.
We use an LLM as an evaluator to automate the human evaluation process (see the judge sketch below)
The semantic similarity module measures how semantically close each model's responses are to the anchor model's responses (see the similarity sketch below)
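As an illustration of the LLM-as-evaluator idea, the sketch below asks a judge model to pick the better of two responses. The judge prompt wording and the model name are assumptions for this example, not SLaM's actual implementation.

```python
# Sketch of an LLM judge; prompt wording and model name are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge(prompt, response_a, response_b):
    """Ask a judge model which of two responses better answers the prompt."""
    judge_prompt = (
        f"Prompt: {prompt}\n\n"
        f"Response A: {response_a}\n\n"
        f"Response B: {response_b}\n\n"
        "Which response answers the prompt better? Reply with 'A' or 'B'."
    )
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": judge_prompt}],
    )
    return completion.choices[0].message.content.strip()
```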
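The semantic similarity check could look like the following sketch, which embeds responses with sentence-transformers (an assumed embedding backend, not necessarily SLaM's) and scores each candidate by cosine similarity to the anchor model's response.

```python
# Sketch of anchor-based semantic similarity; the embedding model
# is an assumption for illustration.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def similarity_to_anchor(anchor_response, candidate_responses):
    """Cosine similarity of each candidate response to the anchor's."""
    anchor_emb = model.encode(anchor_response, convert_to_tensor=True)
    cand_embs = model.encode(candidate_responses, convert_to_tensor=True)
    return util.cos_sim(anchor_emb, cand_embs)[0].tolist()
```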
Multiple Model Support
Generate responses for a given prompt with different LLMs (proprietary, or open-source via Ollama); see the sketch below.
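As a minimal sketch, the snippet below queries one proprietary and one open-source model with the same prompt, assuming the OpenAI client and the Ollama Python library; the model names are illustrative, not SLaM defaults.

```python
# Sketch: generate responses from a proprietary and an open-source model.
# Model names are examples only.
from openai import OpenAI
import ollama

prompt = "Summarize the benefits of small language models."

# Proprietary model via the OpenAI API.
openai_client = OpenAI()
gpt_reply = openai_client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
).choices[0].message.content

# Open-source model served locally by Ollama.
ollama_reply = ollama.chat(
    model="llama2",
    messages=[{"role": "user", "content": prompt}],
)["message"]["content"]
```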