Is Claude Dumb Today?

Daily HumanEval-CC40 benchmark for Claude Code's default model

Score
Model
Cost
Runtime

Score History (last 30 days)

Per-Task Results

Task | Function | Result | Attempts | Turns | Cost | Error