EVALS · 12 defined
Evals
agent-harness
| Eval | Category | Difficulty | Runs | Pass Rate | Avg Score | Trend |
|---|---|---|---|---|---|---|
| add-dashboard-route | dashboard | medium | 1 | 100% | 1.00 | |
| add-db-column | database | medium | 1 | 100% | 1.00 | |
| add-mcp-tool | mcp-server | medium | 1 | 100% | 1.00 | |
homelab
| Eval | Category | Difficulty | Runs | Pass Rate | Avg Score | Trend |
|---|---|---|---|---|---|---|
| add-ci-workflow | — | — | 0 | — | — | — |
| add-deployment-repo-app | — | easy | 1 | 100% | 1.00 | |
| add-helm-service | — | medium | 1 | 0% | 0.94 | |
| add-infisical-secret | — | medium | 1 | 100% | 1.00 | |
| add-internal-service | — | easy | 2 | 100% | 1.00 | |
| add-observability-service | — | hard | 1 | 100% | 1.00 | |
| fix-application-manifest | — | medium | 1 | 100% | 1.00 | |
| fix-appproject | — | medium | 1 | 100% | 1.00 | |
| new-dockerfile | — | easy | 1 | 100% | 1.00 | |