ARC Prize

How the cofounder of Zapier recruited me to run a $1M AI competition

"We gotta blow this up."

That's what Mike Knoop (co-founder of Zapier) says to me in early 2024.

"ARC-AGI, we gotta make it huge. It's too important."

"Wait, ARC? What are you talking about?" I quickly reply.

"It's the most important benchmark and unsolved problem in AI, its solution will change the world."

Before I knew it, I was on a squad of 4 people traveling to the top AI programs in the world, running a $1M competition, and finding out why current benchmarks are inadequate to lead us towards AGI.

ARC Prize Team From left to right: Greg Kamradt, Mike Knoop, Francois Chollet, Bryan Landers at a16z NY

Let's take a step back.

Meeting Mike

I saw Mike speak about Zapier's strategy for internal AI adoption.

If you have 1,000 employees, and you want them to be "AI enabled", how would you do it? Mike had the answer.

I reached out to him to see if he was up for a recorded conversation.

Mike Email

We had a great chat. It was one of my favorite interviews. He shared how Zapier earns an extra $100K ARR per month (!) via AI. Wild.

We've kept in touch ever since.

Starting ARC Prize

"Hey, I have a project I'm starting and need someone to help run it, interested?" - Mike DM's me one day

Naturally, when you hear something like that from someone like Mike, you want to hear more.

"I'm starting a $1M competition to solve the ARC-AGI benchmark. Francois (creator of ARC-AGI) is on board. I need someone to help run it."

"...Yes! I'm in."

I couldn't pass up the opportunity to work with Mike & Francois on a small team of highly motivated, skilled, and autonomous people. He teamed me up with the amazing Bryan Landers to help run the competition.

The competition was modeled after the successful Scroll Prize by Nat Friedman and Daniel Gross. Prize money is a great incentive to spark action from people and teams who wouldn't have otherwise participated.

On launch day we had Francois/Mike featured on the Dwarkesh podcast and threw a mini launch party.

ARC Prize Launch ARC Prize Launch Party

Wait, what is so important about an AI benchmark?

Before we go further, let's talk about what makes this benchmark special and why it deserves a competition.

Benchmarks are like tests for AI. They tell us not only how good an AI system is, but what kind of "good" it is.

Benchmarks also serve as a compass for AI research. They tell us what directions to explore. Coming up with a good benchmark is hard.

ARC-AGI is an AI benchmark, but it's sort of a meta benchmark.

You see, it doesn't test for skill (how well an AI system does on any one test), but rather for skill acquisition. How effectively can your system learn new things outside of its training data?

An example: the famous chess program Deep Blue beat the world #1 chess player, Garry Kasparov, in 1997. So that means Deep Blue is intelligent, right?

Well, not exactly. It depends how you define intelligence.

Francois defines intelligence as "skill-acquisition efficiency," or simply, how quickly your system can learn new skills. (If you want to go deep on this definition of intelligence, I can't recommend the first 25 pages of Francois's paper On the Measure of Intelligence enough.)

While Deep Blue is very good at chess, it can't play checkers or Go, pick up any other game, or make me a coffee in a random kitchen. By Francois's definition of intelligence, we'd argue Deep Blue has high skill but low intelligence.

How do you formally measure a system's ability to learn new skills? That is where ARC-AGI comes in.

Because each task is unique and requires different skills from other tasks, the system that solves ARC-AGI will have had no choice but to learn a new combination of skills along the way. The AI that beats this will have generalized to a new set of skills outside of its training data.

Importantly, ARC-AGI is easy for humans (because we have general intelligence), but hard for AI.

If you haven't seen an ARC task before, check out arcprize.org/play.

ARC Task Can you spot the "rule" that maps the input to the output? (Hint: Where does the yellow fill in?)

The whole goal is to look at the 3 input/output "training" pairs, find the "rule" or "program" that turns each input into its output, and then apply that rule to the 4th "test" input. Goal: correctly predict the test output.
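To make that concrete, here's a minimal sketch of what solving a task looks like. This is a made-up toy task, not one from the real ARC-AGI dataset: grids are lists of lists of ints (one int per color), and the hidden "rule" I've invented here is simply "mirror the grid left-to-right."

```python
def candidate_rule(grid):
    """A guessed program: mirror each row left-to-right."""
    return [list(reversed(row)) for row in grid]

# Three "training" input/output pairs demonstrating the hidden rule
# (hypothetical task, not from the actual benchmark)
train_pairs = [
    ([[1, 0], [0, 2]], [[0, 1], [2, 0]]),
    ([[3, 3, 0]], [[0, 3, 3]]),
    ([[0], [5]], [[0], [5]]),
]

# A solver checks its guessed rule against every training pair...
assert all(candidate_rule(inp) == out for inp, out in train_pairs)

# ...then applies it to the held-out "test" input to predict the output
test_input = [[7, 0, 0], [0, 7, 0]]
prediction = candidate_rule(test_input)
print(prediction)  # [[0, 0, 7], [0, 7, 0]]
```

The hard part, of course, is that every real ARC task hides a different rule, so the rule-finding itself (not the rule) is what a solver has to automate.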

If you want to find out more about ARC-AGI, technical approaches, or how to compete for the prize, check out the ARC Prize website.

ARC Prize Website Behind the scenes of the Dwarkesh podcast

How do you run an AI competition?

So what goes into running an AI competition? Here's what the team and I have been up to:

ARC Prize Analytics One of the ARC Prize Analytics Dashboards. Each dot is a submission on Kaggle.

What This Means for Me

But in the end, the thing that drives me is the potential for my contribution to ARC Prize to have a meaningful impact on the world. The solution to ARC-AGI won't be full AGI, but an aspect of that solution might be the "1-2 missing pieces" of AGI.

If you want to hear more about what it's like to run a competition, collaborate with AI programs, or why ARC-AGI is a worthy AI benchmark, feel free to reach out.

Written by

Greg Kamradt

Mon Oct 21 2024