📏
Hugging Face Evaluation
available
Add and manage evaluation results in model cards: extract eval tables, import benchmark scores, and run custom evaluations with vLLM.
ai-ml
What This Skill Does
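This skill adds and manages evaluation results in Hugging Face model cards. It can extract evaluation tables from a model README, import benchmark scores from the Artificial Analysis API, and run custom evaluations with vLLM and lighteval, then store the results as model-index metadata.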
Capabilities
- Eval Table Extraction — Extract evaluation tables from model READMEs
- Benchmark Import — Import scores from Artificial Analysis API
- Custom Evaluations — Run evaluations with vLLM and lighteval
- Model-Index Format — Structure results in standard metadata format
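To make the model-index format concrete, here is a minimal sketch of the nested structure that model card metadata uses for evaluation results (a named model, a list of results, and for each result a task, a dataset, and metrics). The helper name `build_model_index`, the model name `my-org/my-model`, and the score are all hypothetical placeholders for illustration, not output of this skill.

```python
import json


def build_model_index(model_name, task_type, dataset_name, dataset_type, metrics):
    """Return a dict laid out like the model-index block of model card metadata.

    `metrics` maps a metric type (e.g. "accuracy") to its numeric value.
    """
    return {
        "model-index": [
            {
                "name": model_name,
                "results": [
                    {
                        "task": {"type": task_type},
                        "dataset": {"name": dataset_name, "type": dataset_type},
                        "metrics": [
                            {"type": metric_type, "value": metric_value}
                            for metric_type, metric_value in metrics.items()
                        ],
                    }
                ],
            }
        ]
    }


# Hypothetical model name and placeholder score; not a real benchmark result.
entry = build_model_index(
    model_name="my-org/my-model",
    task_type="text-generation",
    dataset_name="MMLU",
    dataset_type="cais/mmlu",
    metrics={"accuracy": 0.5},
)
print(json.dumps(entry, indent=2))
```

In a real model card this structure lives in the YAML front matter at the top of the README; JSON is printed here only to keep the sketch free of third-party dependencies.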
How It Works
- Select a model card to add evaluations to
- Import existing benchmarks or run new evaluations
- Results are structured in model-index metadata format
- Evaluation tables are added to the model card
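The extraction step in this workflow can be sketched as a small parser that finds the first pipe-style table in a README and returns its rows. This is an illustrative stand-in under simple assumptions (one header row, a dashed divider row, no escaped pipes), not the skill's actual implementation; the function names and the sample README with its scores are hypothetical.

```python
def split_row(line):
    """Split one pipe-table row into trimmed cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def extract_eval_table(readme_text):
    """Return the rows of the first pipe table in the text as a list of dicts."""
    lines = [ln.strip() for ln in readme_text.splitlines()]
    for i in range(len(lines) - 1):
        header, divider = lines[i], lines[i + 1]
        # A table starts with a header row followed by a divider such as |---|---|.
        if not (header.startswith("|") and divider.startswith("|")):
            continue
        if "-" not in divider or set(divider) - set("|-: "):
            continue
        columns = split_row(header)
        rows = []
        for body in lines[i + 2:]:
            if not body.startswith("|"):
                break  # the table ends at the first non-table line
            rows.append(dict(zip(columns, split_row(body))))
        return rows
    return []  # no table found


# Hypothetical README content with placeholder scores.
SAMPLE_README = """\
# my-model

## Evaluation

| Benchmark | Score |
|-----------|-------|
| MMLU      | 0.5   |
| GSM8K     | 0.25  |

More text.
"""

rows = extract_eval_table(SAMPLE_README)
print(rows)
```

Cell values are returned as strings, so a caller would still need to convert scores to numbers and map each benchmark name to a dataset identifier before writing model-index metadata.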
Example Usage
"Add MMLU scores to this model card"
"Import benchmarks from Artificial Analysis"
"Run a custom evaluation on my fine-tuned model"