📏

Hugging Face Evaluation

available

Add and manage evaluation results in model cards — extract eval tables, import benchmark scores, and run custom evaluations with vLLM.

ai-ml

What This Skill Does

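This skill adds and manages evaluation results in Hugging Face model cards. It can extract evaluation tables from model READMEs, import benchmark scores from the Artificial Analysis API, and run custom evaluations with vLLM and lighteval, then structure the results as model-index metadata.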

Capabilities

  • Eval Table Extraction — Extract evaluation tables from model READMEs
  • Benchmark Import — Import scores from Artificial Analysis API
  • Custom Evaluations — Run evaluations with vLLM and lighteval
  • Model-Index Format — Structure results in standard metadata format
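
To make the eval table extraction capability concrete, here is a minimal sketch of pulling rows out of a pipe-delimited table in README text. It is an illustration only, not the skill's actual implementation: the function name parse_markdown_table, the sample README, and the scores in it are all hypothetical, and the sketch assumes a simple table with one header row and no escaped pipes.

```python
def parse_markdown_table(text: str) -> list[dict[str, str]]:
    """Parse the first pipe-delimited table in `text` into a list of row dicts."""
    header = None
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            if header is not None:
                break  # first non-table line after the table ends the scan
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first table line is the header row
        elif all(c and set(c) <= set("-: ") for c in cells):
            continue  # separator row such as |---|:---:|
        else:
            rows.append(dict(zip(header, cells)))
    return rows


# Hypothetical README content; the scores are made up for illustration.
SAMPLE_README = """
# my-model

| Benchmark | Score |
|-----------|-------|
| MMLU      | 65.2  |
| GSM8K     | 48.9  |

Trailing text that is not part of the table.
"""

extracted = parse_markdown_table(SAMPLE_README)
```

Each row comes back as a dict keyed by the header cells, with values left as strings so that the caller decides how to convert scores.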

How It Works

  1. Select a model card to add evaluations to
  2. Import existing benchmarks or run new evaluations
  3. Results are structured in model-index metadata format
  4. Evaluation tables are added to the model card
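
Step 3 above can be sketched as follows. This is a hedged illustration of what model-index metadata generally looks like (a model name plus a list of results, each with a task, a dataset, and metrics), not the skill's own code. The helper build_model_index, the input field names, the model name, and the score are assumptions made up for this example, and the dataset identifier is only a plausible placeholder.

```python
import json


def build_model_index(model_name: str, results: list[dict]) -> dict:
    """Arrange flat result records into a model-index style metadata dict."""
    entries = []
    for r in results:
        entries.append(
            {
                "task": {"type": r["task_type"]},
                "dataset": {"name": r["dataset_name"], "type": r["dataset_type"]},
                "metrics": [{"type": r["metric_type"], "value": r["value"]}],
            }
        )
    return {"model-index": [{"name": model_name, "results": entries}]}


# Hypothetical input record; the value is illustrative, not a real score.
metadata = build_model_index(
    "my-fine-tuned-model",
    [
        {
            "task_type": "text-generation",
            "dataset_name": "MMLU",
            "dataset_type": "cais/mmlu",
            "metric_type": "accuracy",
            "value": 65.2,
        }
    ],
)

# Serialized here as JSON to stay within the standard library;
# a model card would carry the same structure in its YAML front matter.
serialized = json.dumps(metadata, indent=2)
```

Keeping the structure as a plain dict makes it easy to merge with a card's existing metadata before writing it back.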

Example Usage

"Add MMLU scores to this model card"
"Import benchmarks from Artificial Analysis"
"Run a custom evaluation on my fine-tuned model"