harness/
EVALS · 12 defined

agent-harness
Eval | Category | Difficulty | Runs | Pass Rate | Avg Score | Trend
add-dashboard-route | dashboard | medium | 1 | 100% | 1.00 |
add-db-column | database | medium | 1 | 100% | 1.00 |
add-mcp-tool | mcp-server | medium | 1 | 100% | 1.00 |
homelab
Eval | Category | Difficulty | Runs | Pass Rate | Avg Score | Trend
add-ci-workflow | | | 0 | | |
add-deployment-repo-app | | easy | 1 | 100% | 1.00 |
add-helm-service | | medium | 1 | 0% | 0.94 |
add-infisical-secret | | medium | 1 | 100% | 1.00 |
add-internal-service | | easy | 2 | 100% | 1.00 |
add-observability-service | | hard | 1 | 100% | 1.00 |
fix-application-manifest | | medium | 1 | 100% | 1.00 |
fix-appproject | | medium | 1 | 100% | 1.00 |
new-dockerfile | | easy | 1 | 100% | 1.00 |