Google DeepMind’s Game-Playing AI Tackles a Chatbot Blind Spot

Several years before ChatGPT began jibber-jabbering away, Google developed a very different kind of artificial intelligence program called AlphaGo that learned to play the board game Go with superhuman skill through tireless practice.

Researchers at the company have now published work that combines the abilities of a large language model (the AI behind today’s chatbots) with those of AlphaZero, a successor to AlphaGo also capable of playing chess, to produce very tricky mathematical proofs.

Their new Frankensteinian creation, dubbed AlphaProof, has demonstrated its prowess by tackling several problems from the 2024 International Math Olympiad (IMO), a prestigious competition for high school students.

AlphaProof uses the Gemini large language model to convert naturally phrased math questions into a programming language called Lean. This provides the training fodder for a second algorithm to learn, through trial and error, how to find proofs that can be confirmed as correct.
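For readers who have never seen Lean, the snippet below is a small illustration of what a formalized statement and its machine-checkable proof look like. It is not taken from AlphaProof. It simply states, in Lean 4, that the sum of two even natural numbers is even, and closes the proof with Lean’s built-in `omega` arithmetic tactic. The theorem name is hypothetical and was chosen for this example. If the proof step were wrong, Lean would refuse to accept it, which is the kind of automatic correctness check that lets a system like AlphaProof learn by trial and error.

```lean
-- Hypothetical example (not from AlphaProof): "the sum of two even
-- natural numbers is even," written as a Lean 4 theorem.
-- `ha` and `hb` are the hypotheses that a and b are even.
theorem even_add_even_example (a b : Nat) (ha : a % 2 = 0) (hb : b % 2 = 0) :
    (a + b) % 2 = 0 := by
  -- `omega` decides linear arithmetic over Nat/Int, including `%` by literals;
  -- Lean's kernel then checks the resulting proof term.
  omega
```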

Earlier this year, Google DeepMind revealed another math algorithm called AlphaGeometry that also combines a language model with a different AI approach. AlphaGeometry uses Gemini to convert geometry problems into a form that can be manipulated and tested by a symbolic engine that reasons about geometric elements. Google today also announced a new and improved version of AlphaGeometry.

The researchers found that their two math programs could provide proofs for IMO puzzles as well as a silver medalist could. Out of six problems total, AlphaProof solved two algebra problems and a number theory one, while AlphaGeometry solved a geometry problem. The programs solved one problem within minutes but took up to several days to crack others. Google DeepMind has not disclosed how much computing power it threw at the problems.

Google DeepMind calls the approach used for both AlphaProof and AlphaGeometry “neuro-symbolic” because it combines the pure machine learning of an artificial neural network, the technology that underpins most progress in AI of late, with the language of conventional programming.

“What we’ve seen here is that you can combine the approach that was so successful, and things like AlphaGo, with large language models and produce something that is extremely capable,” says David Silver, the Google DeepMind researcher who led work on AlphaZero. Silver says the techniques demonstrated with AlphaProof should, in theory, extend to other areas of mathematics.

Indeed, the research raises the prospect of addressing the worst tendencies of large language models by applying logic and reasoning in a more grounded fashion. As miraculous as large language models can be, they often struggle to grasp even basic math or to reason through problems logically.

In the future, the neuro-symbolic method could provide a means for AI systems to turn questions or tasks into a form that can be reasoned over in a way that produces reliable results. OpenAI is also rumored to be working on such a system, codenamed “Strawberry.”

There is, however, a key limitation with the systems revealed today, as Silver acknowledges. Math solutions are either correct or incorrect, allowing AlphaProof and AlphaGeometry to work their way toward the right answer. Many real-world problems—coming up with the ideal itinerary for a trip, for instance—have many possible solutions, and which one is ideal may be unclear. Silver says the solution for more ambiguous questions may be for a language model to try to determine what constitutes a “right” answer during training. “There’s a spectrum of different things that can be tried,” he says.

Silver is also careful to note that Google DeepMind won’t be putting human mathematicians out of jobs. “We are aiming to provide a system that can prove anything, but that’s not the end of what mathematicians do,” he says. “A big part of mathematics is to pose problems and find what are the interesting questions to ask. You might think of this as another tool along the lines of a slide rule or calculator or computational tools.”

Updated 7/25/24 1:25 pm ET: This story has been updated to clarify how many problems AlphaProof and AlphaGeometry solved, and of what type.
