Hugging Face Evaluation

Add and manage evaluation results in model cards — extract eval tables, import benchmark scores, and run custom evaluations with vLLM.

What This Skill Does

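This skill keeps the evaluation section of a Hugging Face model card up to date. It can pull existing evaluation tables out of a model README, import published benchmark scores, or run new evaluations with vLLM, and it records the results as structured metadata on the card.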

Capabilities

Eval Table Extraction — Extract evaluation tables from model READMEs
Benchmark Import — Import scores from Artificial Analysis API
Custom Evaluations — Run evaluations with vLLM and lighteval
Model-Index Format — Structure results in standard metadata format
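
To make the Model-Index Format capability concrete, here is a minimal sketch of what one structured result can look like. It assumes the `model-index` layout used in Hugging Face model card metadata (a model name plus a list of results, each with a task, a dataset, and metrics). The helper name `build_model_index`, the model name, and the score are hypothetical and are not taken from the skill itself.

```python
import json


def build_model_index(model_name, task_type, dataset_name, dataset_type, metrics):
    """Return a dict shaped like the `model-index` block of a model card.

    `metrics` maps a metric type (for example "accuracy") to its numeric value.
    """
    return {
        "model-index": [
            {
                "name": model_name,
                "results": [
                    {
                        "task": {"type": task_type},
                        "dataset": {"name": dataset_name, "type": dataset_type},
                        "metrics": [
                            {"type": metric_type, "value": value}
                            for metric_type, value in metrics.items()
                        ],
                    }
                ],
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical model name and placeholder score, for illustration only.
    metadata = build_model_index(
        model_name="my-org/my-model",
        task_type="text-generation",
        dataset_name="MMLU",
        dataset_type="cais/mmlu",
        metrics={"accuracy": 0.5},
    )
    print(json.dumps(metadata, indent=2))
```

In a real model card this structure would live in the YAML front matter at the top of the README; the sketch prints JSON only to stay within the standard library.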

How It Works

1. Select the model card you want to add evaluations to.
2. Import existing benchmark scores or run new evaluations.
3. The skill structures the results in the model-index metadata format.
4. The skill adds the evaluation tables to the model card.
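
As a rough illustration of the extraction side of this workflow, the sketch below reads the first pipe-delimited table out of a README and turns each row into a dictionary that could then be mapped onto structured results. It is an assumption about how such extraction might work, not the skill's actual implementation; the function name `parse_markdown_table` and the sample README with its scores are made up for the example.

```python
def parse_markdown_table(markdown):
    """Parse the first pipe-delimited table in `markdown` into a list of row dicts."""
    rows = []
    header = None
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            if header is not None:
                break  # a non-table line after the header ends the table
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first table line names the columns
        elif all(cell and set(cell) <= set("-: ") for cell in cells):
            continue  # skip the separator row, e.g. |---|:---:|
        else:
            rows.append(dict(zip(header, cells)))
    return rows


if __name__ == "__main__":
    # Made-up README fragment with placeholder scores.
    readme = """
# My model

| Benchmark | Score |
|-----------|-------|
| MMLU      | 61.2  |
| GSM8K     | 48.5  |

Further notes follow the table.
"""
    for row in parse_markdown_table(readme):
        print(row["Benchmark"], float(row["Score"]))
```

Cell values are kept as strings on purpose, since real README tables often mix numbers with footnote marks or units; converting to `float` is left to the caller.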

Example Usage

"Add MMLU scores to this model card"
"Import benchmarks from Artificial Analysis"
"Run a custom evaluation on my fine-tuned model"