No. 071 · ai-ml

HF Model Evaluation

Evaluate models with vLLM and lighteval

Version: 1.0.0 · License: MIT · Format: SKILL.md

Evaluate model performance: run benchmarks with lighteval, serve models with vLLM, compare results across models, and create evaluation tables.

When to use this skill

  • Evaluating a model on benchmarks
  • Comparing model performance
  • Setting up vLLM for inference
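For the vLLM case in the list above, here is a minimal sketch of a client for a vLLM server. It assumes the server was started separately (for example with `vllm serve <model>`) and exposes its OpenAI-compatible API at `http://localhost:8000`. The helper names `build_completion_request` and `complete`, the model name, and the prompt are hypothetical and are not part of the skill itself.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=64):
    """Build a POST request for the OpenAI-compatible /v1/completions route.

    Assumption: the vLLM server is reachable at base_url and `model` matches
    the name the server was launched with.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,  # greedy decoding keeps spot checks repeatable
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(request, timeout=60):
    """Send the request and return the text of the first completion."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["choices"][0]["text"]
```

Building the request separately from sending it makes the payload easy to inspect before any network call. With a server running, `complete(build_completion_request("http://localhost:8000", "my-model", "2 + 2 ="))` would return the generated text.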

Core concepts

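This skill provides guidance on Hugging Face model evaluation: best practices, common patterns, and pitfalls when running benchmarks with lighteval, serving models with vLLM, and comparing results across models. It is designed to be loaded on demand when a relevant task is detected.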
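To illustrate the "compare results across models" and "create evaluation tables" steps, here is a minimal sketch that assumes benchmark scores have already been collected into a plain dictionary. The function name `build_results_table`, the model names, and the scores are hypothetical placeholders, and the layout is not the output format of lighteval or of the skill.

```python
def build_results_table(results, benchmarks):
    """Render {model: {benchmark: score}} as a Markdown comparison table.

    Rows are sorted by average score, highest first. The average covers only
    the benchmarks a model actually has scores for, so treat rows containing
    "n/a" cells with care when comparing models.
    """
    rows = []
    for model, scores in results.items():
        values = [scores.get(name) for name in benchmarks]
        present = [v for v in values if v is not None]
        average = sum(present) / len(present) if present else None
        rows.append((model, values, average))

    # Models with no scores at all sort to the bottom.
    rows.sort(
        key=lambda row: row[2] if row[2] is not None else float("-inf"),
        reverse=True,
    )

    def fmt(value):
        return "n/a" if value is None else f"{value:.3f}"

    header = ["Model"] + list(benchmarks) + ["Average"]
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join(["---"] * len(header)) + "|",
    ]
    for model, values, average in rows:
        cells = [model] + [fmt(v) for v in values] + [fmt(average)]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Hypothetical scores, for illustration only.
example_results = {
    "model-b": {"gsm8k": 0.25},
    "model-a": {"gsm8k": 0.5, "mmlu": 0.75},
}
table = build_results_table(example_results, ["gsm8k", "mmlu"])
```

Keeping the scores in a plain dictionary means the same function works whether the numbers come from a results file or were typed in by hand.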

Installation

curl -LO https://opencode-skills.example/downloads/ai-ml/hf-evaluation.zip
unzip hf-evaluation.zip -d ~/.config/opencode/skills/

Restart OpenCode; the skill then loads automatically.

When it triggers

  • evaluating a model on benchmarks
  • comparing model performance
  • setting up vLLM for inference