HF Model Evaluation

Evaluate model performance: run benchmarks with lighteval, serve models with vLLM, compare results across models, and create evaluation tables.
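As an illustration of the last step, here is a minimal sketch of turning per-model scores into a comparison table. It assumes the scores have already been collected into a plain dict of the form {model: {task: score}}. The layout of lighteval's own result files is not assumed here, and `evaluation_table` is a hypothetical helper, not part of lighteval or vLLM.

```python
from typing import Dict, List


def evaluation_table(results: Dict[str, Dict[str, float]], digits: int = 2) -> str:
    """Render per-model benchmark scores as a Markdown table (hypothetical helper).

    `results` maps a model name to {task name: score}. A task that a model was
    not evaluated on is shown as "n/a" instead of being silently dropped.
    """
    # Union of all task names, sorted so the column order is deterministic.
    tasks: List[str] = sorted({task for scores in results.values() for task in scores})
    header = "| Model | " + " | ".join(tasks) + " |"
    separator = "|" + "---|" * (len(tasks) + 1)
    rows = [header, separator]
    for model, scores in results.items():
        # Format each score with a fixed number of decimals; mark gaps explicitly.
        cells = [f"{scores[t]:.{digits}f}" if t in scores else "n/a" for t in tasks]
        rows.append(f"| {model} | " + " | ".join(cells) + " |")
    return "\n".join(rows)
```

Marking missing cells as "n/a" keeps gaps visible, so a model that skipped a benchmark is not mistaken for one that scored poorly on it.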

When to use this skill

Evaluating a model on benchmarks
Comparing model performance
Setting up vLLM for inference

Core concepts

This skill provides guidance on Hugging Face (HF) model evaluation: best practices, common patterns, and pitfalls to avoid. It is designed to be loaded on demand when a relevant task is detected.
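One pitfall worth guarding against when comparing models is averaging over different task subsets: if one model was run on an extra, easier benchmark, its mean score is inflated. The sketch below is an illustration of that guard, not the skill's own method. It assumes the same hypothetical {model: {task: score}} dict and averages only over tasks that every model was evaluated on. `shared_tasks` and `mean_on_shared_tasks` are hypothetical helpers.

```python
from statistics import mean
from typing import Dict, Set


def shared_tasks(results: Dict[str, Dict[str, float]]) -> Set[str]:
    """Return the tasks that every model in `results` was evaluated on."""
    task_sets = [set(scores) for scores in results.values()]
    return set.intersection(*task_sets) if task_sets else set()


def mean_on_shared_tasks(results: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Average each model's scores over the common task set only."""
    common = shared_tasks(results)
    if not common:
        # Averages over disjoint task sets are not comparable, so refuse.
        raise ValueError("models share no evaluated tasks; averages are not comparable")
    return {model: mean(scores[t] for t in sorted(common)) for model, scores in results.items()}
```

Raising an error on an empty intersection is a deliberate choice: returning averages over disjoint task sets would produce numbers that look comparable but are not.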

Installation

curl -LO https://opencode-skills.example/downloads/ai-ml/hf-evaluation.zip
unzip hf-evaluation.zip -d ~/.config/opencode/skills/
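On machines without the `unzip` tool (for example, some Windows setups), the extraction step can be done with Python's standard `zipfile` module instead. This is a minimal sketch that assumes the archive has already been downloaded with `curl` as shown. `install_skill` is a hypothetical helper, and the default target mirrors the `~/.config/opencode/skills/` path used in the command.

```python
import zipfile
from pathlib import Path
from typing import List, Optional


def install_skill(zip_path: str, skills_dir: Optional[Path] = None) -> List[str]:
    """Extract a downloaded skill archive (hypothetical helper).

    Mirrors `unzip hf-evaluation.zip -d ~/.config/opencode/skills/` and
    returns the archive member names that were extracted.
    """
    # Default to the same skills directory used by the unzip command.
    target = skills_dir if skills_dir is not None else Path.home() / ".config" / "opencode" / "skills"
    # Create the target directory (and parents) if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return archive.namelist()
```

Usage: `install_skill("hf-evaluation.zip")`, followed by restarting OpenCode as usual.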

Restart OpenCode; the skill then loads automatically.

When it triggers

evaluating a model on benchmarks
comparing model performance
setting up vLLM for inference
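Once a model is being served, it can be queried over HTTP. The sketch below assumes a server started with `vllm serve <model-id>`, which exposes an OpenAI-compatible API on port 8000 by default. The base URL, model name, and generation settings are assumptions to adjust, and `completion_request` and `complete` are hypothetical helpers built only on the standard library.

```python
import json
import urllib.request


def completion_request(prompt: str, model: str,
                       base_url: str = "http://localhost:8000/v1",
                       max_tokens: int = 64) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /completions endpoint."""
    payload = {
        "model": model,  # must match the model name the server is serving
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,  # temperature 0 requests greedy decoding, common for evaluation
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, model: str, **kwargs) -> str:
    """Send the request to a running server and return the first completion text."""
    with urllib.request.urlopen(completion_request(prompt, model, **kwargs), timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

Separating request construction from the network call makes the payload easy to inspect or test without a running server.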
