How the cofounder of Zapier recruited me to run a $1M AI competition
"We gotta blow this up."
That's what Mike Knoop (co-founder of Zapier) says to me in early 2024.
"ARC-AGI, we gotta make it huge. It's too important."
"Wait, ARC? What are you talking about?" I quickly reply.
"It's the most important benchmark and unsolved problem in AI, its solution will change the world."
Before I knew it, I was on a squad of 4 people traveling to the top AI programs in the world, running a $1M competition, and finding out why current benchmarks are inadequate to lead us towards AGI.
From left to right: Greg Kamradt, Mike Knoop, Francois Chollet, Bryan Landers at a16z NY
Let's take a step back.
Meeting Mike
I saw Mike speak about Zapier's strategy for internal AI adoption.
If you have 1,000 employees, and you want them to be "AI enabled", how would you do it? Mike had the answer.
I reached out to him to see if he was up for a recorded conversation.
We had a great chat. It was one of my favorite interviews. He shared how Zapier earns an extra $100K ARR per month (!) via AI. Wild.
We kept in touch ever since.
Starting ARC Prize
"Hey, I have a project I'm starting and need someone to help run it, interested?" - Mike DMs me one day
Naturally when you hear something like that from someone like Mike you want to hear more.
"I'm starting a $1M competition to solve the ARC-AGI benchmark. Francois (creator of ARC-AGI) is on board. I need someone to help run it."
"...Yes! I'm in."
I couldn't pass up the opportunity to work with Mike & Francois on a small team of highly motivated, skilled, and autonomous people. He teamed me up with the amazing Bryan Landers to help run the competition.
The competition was modeled after the successful Scroll Prize by Nat Friedman and Daniel Gross. Prize money is a great incentive to spark action from people and teams who wouldn't have otherwise participated.
On launch day we had Francois/Mike featured on the Dwarkesh podcast and threw a mini launch party.
ARC Prize Launch Party
Wait, what is so important about an AI benchmark?
Before we go further, let's talk about what makes this benchmark special and why it deserves a competition.
Benchmarks are like tests for AI. They tell us not only how good an AI system is, but what kind of "good" it is.
Benchmarks also serve as a compass for AI research. They tell us what directions to explore. Coming up with a good benchmark is hard.
ARC-AGI is an AI benchmark, but it's sort of a meta benchmark.
You see, it doesn't test for skill (how well an AI system does on any one test), but rather for skill acquisition. How effectively can your system learn new things outside of its training data?
An example. The famous chess program, Deep Blue, beat the world #1 chess player, Garry Kasparov, in 1997. So that means Deep Blue is intelligent right?
Well, not exactly. It depends how you define intelligence.
Francois defines intelligence as "skill-acquisition efficiency" or, simply, how quickly your system can learn new skills. (If you want to go deep on this definition of intelligence, I can't recommend the first 25 pages of Francois's paper On The Measure Of Intelligence enough.)
While Deep Blue is very good at chess, it can't play checkers or Go, learn any other game, or make me a coffee in a random kitchen. By Francois's definition of intelligence, we'd argue that Deep Blue has high skill but low intelligence.
How do you formally measure a system's ability to learn new skills? That is where ARC-AGI comes in.
Because each task is unique and requires different skills from other tasks, the system that solves ARC-AGI will have had no choice but to learn a new combination of skills along the way. The AI that beats this will have generalized to a new set of skills outside of its training data.
Importantly, ARC-AGI is easy for humans (because we have general intelligence), but hard for AI.
If you haven't seen an ARC task before, check out arcprize.org/play.
Can you spot the "rule" that maps the input to the output? (Hint: Where does the yellow fill in?)
The goal is to look at the three input/output "training" pairs, find the "rule" or "program" that turns each input into its output, and then apply that rule to the fourth "test" input to correctly predict the test output.
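To make the format concrete, here's a minimal Python sketch of how an ARC-style task can be represented and solved. The grids and the hidden "rule" (mirror each row) are invented for illustration, not an actual ARC-AGI task; a real solver would search over many candidate programs rather than hard-coding one.

```python
# Toy ARC-style task: grids are 2D lists of color codes (0 = black).
# This example's hidden "rule" is: flip each row left-to-right.
# (Illustrative only -- not an actual ARC-AGI task.)

def mirror(grid):
    """Candidate rule: reflect the grid horizontally."""
    return [list(reversed(row)) for row in grid]

task = {
    "train": [
        {"input": [[1, 0], [2, 3]], "output": [[0, 1], [3, 2]]},
        {"input": [[4, 5, 0]],      "output": [[0, 5, 4]]},
        {"input": [[7], [0], [8]],  "output": [[7], [0], [8]]},
    ],
    "test": {"input": [[1, 2, 3], [0, 0, 6]]},
}

# A solver looks for a program consistent with every training pair...
assert all(mirror(p["input"]) == p["output"] for p in task["train"])

# ...then applies it to the test input to predict the test output.
prediction = mirror(task["test"]["input"])
print(prediction)  # [[3, 2, 1], [6, 0, 0]]
```

The hard part of ARC-AGI is exactly what this sketch skips: discovering the rule itself, which is different for every task.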
If you want to find out more about ARC-AGI, technical approaches, or how to compete for the prize, check out the ARC Prize website.
Behind the scenes of the Dwarkesh podcast
How do you run an AI competition?
So what goes into running an AI competition? Here's what the team and I have been up to:
- Website - Bryan Landers did the amazing design and frontend work of the ARC Prize website
- Technical guide - We put together a quick start technical guide for approaches, submitting to the competition, and data FAQs
- Data Management - The ARC-AGI dataset requires ongoing updates and cleaning
- Testing New Models - People want to know how new models perform on ARC-AGI
- Largest competition on Kaggle - We partner with Kaggle to run the competition. It's Kaggle's largest, with over $1M in prizes and 10K+ submissions (and counting!)
- Public Leaderboard - For submissions that aren't eligible for Kaggle, we work to validate, verify, and publish results. See Ryan Greenblatt's submission for an example
- Podcasts, videos, speaking - We partner with podcasts like Dwarkesh, Machine Learning Street Talk, and No Priors to get the word out
- Community Management - There is a large community of ARC-AGI competitors. We manage the Discord, Twitter, Kaggle, YouTube, Newsletter and Blog.
- Mindshare - Blog posts about hot topics like o1 testing
- University Tour - We've toured the top AI programs in the world with Mike & Francois to talk about the competition & ARC-AGI
- Analytics - Treating our competition like a start up and analyzing the funnel
One of the ARC Prize Analytics Dashboards. Each dot is a submission on Kaggle.
What This Means for Me
- Mission-driven is fun - I traditionally haven't done mission-driven work in my career. It has been refreshing to join a non-profit and embrace the "open source work for humanity" angle. Having an alternative north star (besides revenue) is nice.
- AI Theory Change Of Pace - My AI work has typically spanned the API layer through customer value and applications. ARC Prize has been a nice change of pace to engage with AI's more fundamental concepts.
- Small teams are great - After doing so much solo work, I re-realized how great it is being a part of a small team.
But in the end, what drives me is the potential for my contribution to ARC Prize to have a meaningful impact on the world. The solution to ARC-AGI won't be full AGI, but aspects of the solution may turn out to be the "1-2 missing pieces" of AGI.
If you want to hear more about what it's like to run a competition, collaborate with top AI programs, or why ARC-AGI is a worthy benchmark, feel free to reach out.
Written by
Greg Kamradt
At
Mon Oct 21 2024