Evaluate model performance: run benchmarks with lighteval, serve models with vLLM, compare results across models, and create evaluation tables.
When to use this skill
- Evaluating a model on benchmarks
- Comparing model performance
- Setting up vLLM for inference
Core concepts
This skill provides guidance on hf model evaluation best practices, patterns, and common pitfalls. It is designed to be loaded on demand when a relevant task is detected.
Installation
curl -LO https://opencode-skills.example/downloads/ai-ml/hf-evaluation.zip
unzip hf-evaluation.zip -d ~/.config/opencode/skills/
Restart OpenCode — the skill loads automatically.
When it triggers
- evaluating a model on benchmarks
- comparing model performance
- setting up vLLM for inference