Your Questions. Better Benchmarks

Jun 26, 2025

1· Introduction

The Crypto AI Benchmark Alliance (CAIBA) is excited to receive a first batch of high-quality, community-generated crypto questions from our founding member Surf. Collected through Surf’s public waitlist and selected from 30,000+ entries, these real-world inquiries now form part of CAIBA’s open benchmark suite for evaluating autonomous agents in crypto.

2· Why it matters

Authentic test material – Every question comes from an active crypto user, reflecting real-world demand rather than hypothetical examples. This matters because benchmark scores now offer a clearer view of how well an autonomous agent can answer the kinds of questions traders, researchers, and builders actually face in production.

Stronger evaluation signals – User-submitted questions help surface both subtle reasoning gaps and real strengths in how models retrieve data, plan tasks, and execute actions. Since each question has a clear answer or a verifiable onchain source, it's easier to measure how a model compares to a human. This lets teams pinpoint exactly where agents struggle, instead of relying on vague accuracy scores.

Accelerated agent R&D – Consistent, domain-specific scores allow researchers to compare different model architectures under the same conditions. This speeds up feedback loops, making it faster to improve and deploy agents in wallets, exchanges, or governance platforms. In short, better benchmarks lead to clearer insights and a faster path from prototype to reliable production use.

Impact – What it means for builders
Authentic signal – Questions come from the people crypto agents serve.
Evolving coverage – Topics shift with the market, and datasets must be updated continually to reflect this.
Better agents, faster – Models evaluated on real user pain points learn to plan, reason, and execute onchain with higher precision.

3· Rewards for the question authors

Surf is awarding an extra 750 points on its leaderboard, and CAIBA wanted to do more than hand out points:

  • Public and lasting recognition – Top contributors will be credited on the public CAIA dataset page and the CAIBA site.
  • Academic authorship – The first community-contributed research paper describing the expanded benchmark will list qualifying handles as contributors or in the acknowledgements (subject to opt-in).

4· How you can still contribute

Surf’s waitlist is open at ask.surf ahead of its July early-access launch. After you sign up:

  1. Go to the “Drop your crypto question” section on the waitlist page.
  2. Write something specific, verifiable, and actionable based on your daily experience in crypto. Be creative!

Need inspiration? Here is one community example that has already been accepted:

“Calculate the potential token rewards for staking 100 ETH on Lido Finance for 12 months, assuming:

  • Current stETH/ETH ratio is 0.975
  • Average annual APR is 4.5%
  • Compound interest applied quarterly

Include projected rewards in both ETH and stETH.”

    from X handle @SureHarvesst
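
To show what "specific, verifiable, and actionable" looks like in practice, here is a minimal Python sketch of one way the example question could be worked. It is not an official reference answer. The function name and parameters are hypothetical, and it takes the question's assumptions at face value: the 4.5% APR is compounded quarterly over one year, and the 0.975 ratio is read as 1 stETH = 0.975 ETH. It does not model how Lido actually distributes rewards.

```python
def projected_staking_rewards(principal_eth, apr, compounds_per_year, years, steth_eth_ratio):
    """Return (rewards_in_eth, rewards_in_steth) under simple periodic compounding.

    All inputs come from the question's stated assumptions; this is an
    illustrative calculation, not a model of Lido's real reward mechanics.
    """
    periods = compounds_per_year * years
    # Standard compound-growth factor: (1 + r/n)^(n*t)
    growth = (1 + apr / compounds_per_year) ** periods
    rewards_eth = principal_eth * (growth - 1)
    # Assumption: the ratio is the ETH value of 1 stETH, so ETH -> stETH divides by it.
    rewards_steth = rewards_eth / steth_eth_ratio
    return rewards_eth, rewards_steth


rewards_eth, rewards_steth = projected_staking_rewards(
    principal_eth=100.0,
    apr=0.045,
    compounds_per_year=4,
    years=1,
    steth_eth_ratio=0.975,
)
print(f"Projected rewards: {rewards_eth:.4f} ETH, {rewards_steth:.4f} stETH")
```

Under these assumptions the rewards come to roughly 4.58 ETH, or about 4.69 stETH. A different reading of the ratio (for example, stETH per ETH) would change the second figure, which is why a good benchmark question spells out its assumptions.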

Submit your own. If it makes the cut, you’ll climb the Surf leaderboard and shape benchmarks that will advance AI’s performance on crypto-specific tasks.

5· Looking ahead

A community-first approach is foundational in crypto. It enables open data, shared upside, and permissionless innovation. We’re thrilled to welcome new participants to CAIBA and can’t wait to see your questions.