The Crypto AI Benchmark Alliance (CAIBA) is excited to receive a first batch of high-quality, community-generated crypto questions from our founding-member Surf. Collected through Surf’s public waitlist and selected from 30,000+ entries, these real-world inquiries now form part of CAIBA’s open benchmark suite for evaluating autonomous agents in crypto.
Authentic test material – Every question comes from an active crypto user, reflecting real-world demand rather than hypothetical examples. This matters because benchmark scores now offer a clearer view of how well an autonomous agent can answer the kinds of questions traders, researchers, and builders actually face in production.
Stronger evaluation signals – User-submitted questions help surface both subtle reasoning gaps and real strengths in how models retrieve data, plan tasks, and execute actions. Since each question has a clear answer or a verifiable onchain source, it's easier to measure how a model compares to a human. This lets teams pinpoint exactly where agents struggle, instead of relying on vague accuracy scores.
Accelerated agent R&D – Consistent, domain-specific scores allow researchers to compare different model architectures under the same conditions. This speeds up feedback loops, making it faster to improve and deploy agents in wallets, exchanges, or governance platforms. In short, better benchmarks lead to clearer insights and a faster path from prototype to reliable production use.
Impact | What it means for builders |
---|---|
Authentic signal | Questions come from the people crypto agents serve. |
Evolving coverage | Topics shift with the market and datasets must consistently update to reflect this. |
Better agents, faster | Models evaluated on real user pain-points learn to plan, reason, and execute onchain with higher precision. |
Surf is providing an extra 750pts on leaderboard and CAIBA wanted to do more than hand out points:
Surf’s waitlist is open at ask.surf ahead of its July early access launch. After you sign up:
Need inspiration? Here is one community example that have already accepted:
“Calculate the potential token rewards for staking 100 ETH on Lido Finance for 12 months, assuming:
Current stETH/ETH ratio is 0.975
Average annual APR is 4.5%
Compound interest applied quarterly Include projected rewards in both ETH and stETH”.
from X handle @SureHarvesst
Submit your own. If it makes the cut, you’ll climb the Surf leaderboard and shape benchmarks that will advance AI’s performance on crypto-specific task.
A community-first approach is foundational in crypto crypto. It enables open data, shared upside, and permissionless innovation. We’re thrilled to welcome news participants to CAIBA and can’t wait to see your question.