AI in mathematics is forcing big questions

(spectrum.ieee.org)

108 points by rbanffy 10 hours ago|78 comments

•

skybrian 5 hours ago

Here’s one way to think about the difference between coming up with a formal proof and having something other mathematicians can use:

> A clear explanation can be found in Alex Kontorovich’s account of his own learning curve with formalized mathematics. In a nutshell: Mathlib, the dominant Lean library, is a human-curated formalization of an ever-growing fraction of existing human mathematics. It exposes clean APIs and abstractions, without which no autoformalization could take place. By contrast, Math Inc’s autoformalized proof of Viazovska’s results exposes no intelligible interface. Who in their right mind would merge a 200,000-line unaudited vibe-coded blob into the master branch of global human science?

https://davidbessis.substack.com/p/the-fall-of-the-theorem-e...

•

teiferer 49 minutes ago

> Here’s one way to think about the difference between coming up with a formal proof and having something other mathematicians can use:

Here is a other one: hello_world.c versus hello_world.exe (apologies for windows extensions, just for illustration).

One is made by a human for human consumption and extension (though legible by a machine). The source code.

One is made by a machine for a machine. Unreadable by a human. The "binary", though that's a terrible misnomer. (Sure you can disassemble but any nontrivial program is too much to cope with as a whole).

Source vs binary. Both are useful but only one is useful for human consumption.

•

dr_dshiv 15 minutes ago

Binaries are executed by machines but are not yet understandable by machines. (Now that we live in an era where machines can understand, imperfectly, like us)

•

hiAndrewQuinn 31 minutes ago

Imo, the proved theorem is the API. And that's really all it has to be.

If there are other lemmas, etc buried inside that 200k blob that can be factored out and proved and used themselves, so much the better. But denying a machine-valid proof just because it's incomprehensible with what a human being considers a reasonable effort made to unpack it just seems odd to me. I see no reason not to accept the vibe coded blob if Lean says it's kosher except anthropocentrism.

•

messe 14 minutes ago

But part of the point of mathematics is human understanding. I think most would be willing to accept the proof. They just wouldn't think it's nearly as useful as one that could be understood.

•

galaxyLogic 3 hours ago

> Mathlib, the dominant Lean library, is a human-curated formalization of an ever-growing fraction of existing human mathematics.

So why don;t they use AI to write Lean programs? That should make the AI-proofs more readily human undersrndable.

•

crystal_revenge 4 hours ago

> Who in their right mind would merge a 200,000-line unaudited vibe-coded blob

Anyone who understands type theory and how theorem provers work? It's sort of akin to saying "how do you know that a massive C++ program that compiles to machine code compiled to the correct machine code that will actually run and it's just not a random string of bits!?!?!", you know because the compilation would have failed otherwise(this is different than saying the program behaves correctly, but that's precisely the difference between formal proofs and compiled programs).

The entire argument both you and Bessis are implying is that mathematics must be human intelligible. But there's absolutely no reason to assume that every mathematical statement must have a human intelligible representation. There is also not reason to assume that if we restrict ourselves to the subset of mathematical statements that are human intelligible that this is of any use.

Just because people who don't want this to be true, and I can understand the motivation, doesn't mean that it isn't still the case.

•

teiferer 43 minutes ago

Imagine tomorrow all source code for all software disappears.

Would we still be able to use computers? Of course! They don't need the source code to run.

Would we nevertheless be in big trouble? Oh definitely. We'd need to write all software again, from scratch. Some critical parts we could reverse engineer. Maybe even derive some structures that translate back into source code, but only because a human wrote that source code in the first place.

Hopefully the point is clear: A proof, even if it is correct, that is totally obscure and unintelligable by humans is not very useful for mathematics. It's a black box that doesn't further understanding of the structures and approaches to think about them, and that's what math is all about.Just having a big binary blob of a program doesn't help much if you want to add a feature.

That's also why biology is so hard. There is no source code. It's just millions and millions of years of evolution and things have evolved in weird ways that don't really make it easy to understand them, even though they clearly work.

•

zmgsabst 4 hours ago

You didn’t answer why merge it into a library focused on humans developing mathematics though.

It remains all of those things, sitting alone in its own repository of 200kLOC; what benefit comes from merging it into mathlib?

> There is also not reason to assume that if we restrict ourselves to the subset of mathematical statements that are human intelligible that this is of any use.

This is obviously silly:

Things that aren’t human intelligible aren’t human usable, so the restriction is necessary to have a collection of things humans can utilize.

•

crystal_revenge 4 hours ago

> Things that aren’t human intelligible aren’t human usable

This is objectively false, people use things every single day they don't understand. We still have plenty of things about the world we don't understand but still find useful.

You are saying anything we know to be the case, but cannot understand why cannot be used? Can we just stop sleeping because we haven't reasoned why sleep is necessary even though we know it is necessary? I mean we still don't really understand gravity (we know how but not why)

•

FridgeSeal 2 hours ago

If you’re fine with a future like Warhammer 40k where we all have to be Tech Priests making prayers and performing opaque rituals to get things out of the machine god because we no longer understand things, that’s fine, but that’s not a future the rest of us want.

•

zmgsabst 3 hours ago

No — people don’t successfully use things they don’t understand every day.

They approximately use them with varying degrees of success, but also mistakes, broken inferences, etc.

My exact point is that your view reduces our ability to do mathematics to that broken, flawed usage and thereby undermines its utility for logical precision: mathematics is only useful because we cleanly understand it.

When you try to use mathematics without understanding, you cause disasters: stock market crashes from mispricing options, Amazon’s 2018 hiring freeze from misallocating $1B, etc.

Note: neither of your examples (sleep, gravity) are things that people intentionally use. They just happen to people.

I think it’s very telling you couldn’t think of an example.

•

sdenton4 49 minutes ago

How many people drive cars without knowing how an engine works? Or make a phone call without knowing how voice compression for a cellular network does it's thing? Or eats food without knowing how it came together from the supply chain?

•

VBprogrammer 23 minutes ago

This feels like a stretch. It would be impossible for someone who didn't know how an engine worked to repair or improve the design of it.

•

whattheheckheck 3 hours ago

And theyre ignorant. You want to be ignorant? They had a term for that in ancient Greece.

Reasoning by analogy...

•

greenavocado 3 hours ago

My brother in Christ, you didn't need x86 opcodes to be intelligible to use this web site.

•

dripdry45 3 hours ago

No, but SOMEONE did.

•

jdw64 3 hours ago

Someday, human mathematicians might end up doing proofs for proofs.

When a codebase gets too large, you eventually can't understand all of it. Even code I wrote myself, I can't fully grasp it.

In those cases, we usually write tests.

But when tests get too big, we end up writing tests for the tests.

Eventually, it feels like we're heading into an era of proofs for proofs.

For me, this problem usually unfolds like this:

1.I can't trust SDKs or Stack Overflow code.

2.So I write tests.

3.But I can't trust the tests either.

4.So I use test coverage, mutation testing, property testing, and fuzzing.

5.If that's still not enough, I add formal verification.

6.And then the problem becomes: can I trust the verifier?

That's how it ends up. Wouldn't human work shift toward verifying the verification systems themselves?

•

bodzioney 2 hours ago

Interactive theorem provers are what verifies (or proves) the proof here. This means you don’t even have to look at the actual proof to check its correctness. You just have to make sure the theorem definition is what you wanted (not to say this is trivial) and that no nonsense axioms were defined etc. Now for verifying the theorem prover itself, this is kind of a chicken and the egg problem. We know the mathematical foundations are solid. But to check the implementation of said foundations would require… another theorem prover. In practice, most theorem provers try to make their kernel as small as possible so it can be reasoned about by humans. Coming back to the original topic, mathematics isn’t all proofs though. Someone has to come up with new theories and models. I guess this is what mathematicians would be doing in the future if AI becomes better at proving things than humans. But I could see it narrowing the field, just like it’s doing in others.

•

mockerell 3 hours ago

That’s an interesting way to think about it. While tests don’t satisfy mathematicians‘ standards for rigor one could instead look at interactive proofs from complexity theory. These are of interest if a problem doesn’t allow for short proofs, i.e. when the problem is not in NP [1]. In your scenario an adapted AI-assisted theorem prover would be the prover, and a mathematician the verifier.

[1] https://en.wikipedia.org/wiki/Interactive_proof_system

•

jdw64 2 hours ago

Thank you for explaining my point more logically and coherently. I'll read it over.

•

anon-3988 2 hours ago

What _should_ happen is that we create abstraction. We create a proof for a concept or algorithms. Then we can move on. We need to develop artifacts that can carry proof. We cannot keep writing piles and piles of texts and Python code to build infrastructure. We would be rewriting the same thing 1000x.

•

teiferer 36 minutes ago

> proofs for proofs.

What does that even mean? Sorry, this is just a nonsensical term.

The issue is not that the proofs could be wrong. It's that humans don't understand them even if they are verifibly correct.

In contrast, with software you don't know if it's correct. That's what you have a test for. Even if you understand it, there could be a bug. And tests could have bugs too, so you can have tests for tests. But proofs that are verified are correct. That's it.

Imagine you have a program that is formally proved to be correct. You don't need a test for it. However you might not understand it. Having a test or not does not change that.

TL;DR: Correctness and understandability are (mostly) independent properties.

•

fiforpg 8 hours ago

The use of computers in mathematics has been somewhat controversial from the very start.

There are of course all the computer-assisted proofs (see 4 color theorem), as well as the partially-assisted ones (see Viazovska et al on packing problems in dimensions 8, 24). But even finding a solution numerically, then rigorously verifying its properties can leave a lingering sense of incompleteness, of a gap in understanding. I like this one quote by (allegedly) Wigner that illustrates it well:

"It is nice to know that the computer understands the problem, but I would like to understand the problem, too."

•

rdedev 7 hours ago

Reminded me of this quote: the problem with machine learning is that it's the machine that does the learning

•

smitty1e 4 hours ago

A montage is a fantastic device in a movie.

But a montage about weight lifting does not a body builder make.

•

crystal_revenge 4 hours ago

> but I would like to understand the problem, too

But why should it be the case that this is always possible?

It's entirely reasonable that the set of useful mathematical proofs is a proper superset of human intelligible useful proofs.

In fact, to argue the contrary would imply there is something incredibly remarkable about human cognition.

•

zmgsabst 4 hours ago

No, it doesn’t imply that.

Just that the set of proofs a human can interpret and the set of statements a human can understand overlap; conversely, you require that the statements/theorems humans can understand be a larger class than the proofs they can understand.

To me, it’s not obvious which of those should be true:

- can we only understand theorems for which we comprehend their proof?

- or can we understand theorems despite not comprehending the proof structure?

Within the mathematics community, opinions differ. But you’re elevating your perspective on that question into a law, without any evidence.

•

crystal_revenge 3 hours ago

> understand theorems for which we comprehend

I don't know what your distinction between "understand" and "comprehend" but my point was not about these words, but about being "useful" and being "understandable".

I'm saying there's no relationship between a mathematical statement being useful and it being understandable.

If it is true that "understanding is a prerequisite for usefulness" (where "understanding" means that a statement can be proven in a way that is intelligible to humans) was a property of mathematical expressions, then this fact would certainly be useful (we could exclude any statements that no human understand from the world of useful mathematical expression). But, by that definition, we would need to understand that statement, so you would need to be able to prove that "understanding is a prerequisite for usefulness" in a human intelligible way.

Now what I just wrote is in itself not a proof that we can't know, but proving the above statement would involve expressing the claim in a mathematically verifiable way that was also understandable by humans, which does imply something remarkable about human cognition (something that would be intelligible no less!)

•

schmuhblaster 2 hours ago

Mathematics has always been an experimental science to some extent. While Newton, Euler and Gauss would spend a lot time calculating numerical approximations by hand, modern mathematicians have been doing the same using computers and software. And once an a clear picture emerge about what’s going, you can start to formalize that and attempt to prove and communicate your results in the standard definition, proposition, lemma, theorem scheme. (Btw there is even a journal called Experimental Mathematics devoted to this approach).

I don’t see that LLMs will fundamentally change this, but rather accelerate the speed of mathematical research.

Some computer generated proofs might of course be hard to understand, but at least their existence gives another data point work with.

Doing Mathematics is more than proving something, that’s just the end of a long road spent pondering at one’s desk about how things could work out.

•

jackyinger 7 hours ago

To bluntly put it in a nutshell, and state the obvious:

If you don’t understand the problem you can’t be sure that the computer does.

•

avaer 7 hours ago

As a programmer I definitely get annoyed when I see code and I don't understand what it does.

But I also definitely don't understand the problem if I can't get the computer to understand it, with tests.

In some sense I always considered programming to be more trustworthy than maths arguments without the certainty of a solver proof.

With all of these questions in the air, epistemology might be making a comeback.

•

godelski 4 hours ago

  > In some sense I always considered programming to be more trustworthy than maths arguments without the certainty of a solver proof.

But programming is a subset of mathematics. They are both formal languages. I suspect the trustworthiness is more in your comfort level than the ability to verify

•

zmgsabst 4 hours ago

That depends on who you ask.

Type theory can also be an independent synthetic foundation atop which you build mathematics.

•

therobots927 6 hours ago

Tests only work for a limited set of programming verification. In many cases you don’t actually know what the output for any given input should be, so there’s no way of verifying the AI-generated code. You just kind of have to trust it. The only exception I can think of is robotics and quantitative trading. Which have already been extensively utilizing AI.

•

akoboldfrying 6 hours ago

Well, if you can formalise the problem statement (this is the hard part) sufficiently well that the computer can produce a proof, you can be very sure the proof is sound.

A fundamental property of any formal proof is that it can be checked by a fairly stupid machine, automatically, because every step is a simple mechanical operation that names one of a handful of axioms and refers to a handful of earlier steps, the truth of which has already been established. So while coming up with a proof may require genius-level thinking, checking an existing fully fleshed out proof is simple -- just potentially very tedious because of the sheer number of steps.

That said, a typical human-written proof omits many steps considered "obvious" to a trained mathematician. Converting this to a formal proof involves interpreting what the original author "must have meant", which requires a lot of expertise and can go wrong -- or it may reveal that there is some inconsistency in the original claim itself.

•

whattheheckheck 3 hours ago

Complexity theorists are in a good spot

•

bsder 5 hours ago

> checking an existing fully fleshed out proof is simple

The controversy around Mochizuki and the "abc Conjecture" proof is a contrary example.

•

akoboldfrying 2 hours ago

How does this involve computer checking of a formal proof?

Last time I checked, it was a disagreement over whether an informal proof is sound, which is exactly the problem with informal proofs.

ETA: There might be a misunderstanding about what "formal proof" means. Even a very detailed, precise English-language description of a proof is generally not a formal proof. The bar is essentially: "It could be checked by a machine that follows simple rules." If different interpretations of a "proof" are possible, the "proof" is by definition informal. Informal proofs are valuable because they are strong evidence that there's a corresponding formal proof "underneath" that would establish the theorem's truth, and because they are (usually) much easier to understand.

•

seanmcc 7 hours ago

Almost another layer in the peer review process in the best case right? Just a different kind of peer you have to review.

•

wbl 5 hours ago

Look up the story of Flyspeck for this taking an entire career.

•

therobots927 6 hours ago

So… more peer review backlog. That sounds fun. Oh, you want someone to review your paper, Mr phd in mathematics with 20 years of experience? Get in line behind chatGPT.

•

kimjune01 5 hours ago

lean compiles or it doesnt

•

whattheheckheck 3 hours ago

You can also pass pytest with assert 1 = 1...

•

slopinthebag 4 hours ago

Reminds me of a quote from Tsoding

> “Programming is understanding. If you don't understand what you are doing, you are not programming. You are generating text.”

Perhaps a proof without understanding is just generating numbers.

•

nok22kon 4 hours ago

programming is also solving problems

in medicine they use all kinds of drugs which they don't really understand how they work. anesthetics is a great example

•

glouwbug 9 hours ago

Turns out you have to be Terence Tao to know when an LLM is right or wrong

•

gerdesj 9 hours ago

"I imagine my work could be completed with AI assistance in a matter of days—maybe hours."

Would some one with tokens to burn mind checking that statement out and post back. Be sure to use long dashes too.

•

bijowo1676 7 hours ago

is the similar statement true for coding as well?

i.e. You have to be a good engineer to understand the well generated LLM code and a program

•

glouwbug 6 hours ago

Yes, that's the point I'm making

•

esafak 4 hours ago

You don't have to be as good as the model you are overseeing, but it sure helps, otherwise you will only be able to evaluate partial claims, missing mistakes, and potentially the big picture.

•

paulpauper 8 hours ago

Yeah, so much for AI making mathematicians obsolete.

•

ggm 2 hours ago

Argument started when 4 colour map proof was machine assisted. So 1976. (Which the fine article says: I remember my dad being in this argument at the time from the comp. sci side, albiet as a mathematicianby training)

•

cpard 6 hours ago

Human mathematicians could become “priests to oracles.”

Priests were interpreting the oracles (at least at a place like Delphi) according to the context of the people asking the questions aka participating in politics of that ancient times.

Subjectivity was a feature and I’m not sure that fits to mathematics though.

I wonder if mathematics as a science field moves more into engineering or if a different branch will emerge that is closer to that because to the point of the article, science is about understanding not just results.

•

therobots927 6 hours ago

Human mathematicians could become “priests to oracles.”

This is a decidedly anti-enlightenment statement.

•

rirze 5 hours ago

I think we’re going to find out the hard way that the proofs left to solve are very much not elegant.

•

stringfood 4 hours ago

couldn't God have created a more orderly universe for us? this is ridiculous

•

lubujackson 9 hours ago

The article poses if AI will be a tool, a collaborator or an oracle. Why not all 3?

If mathematics is human understanding of logical consequences, understanding is the priority. But if AI proves something we can't understand but can utilize, that is a different sort of useful.

We are getting awfully close to "the answer of the universe is 42" and having it not be a joke...

•

fn-mote 8 hours ago

I don’t know about “close”, but there are certainly results in math that are considered deep because they require the use of a “Hard Theorem” at some point. That kind of building on top of something Very Difficult is still possible without understanding the “Very Difficult” part. I’d say a lot of not-amazing math is built by believing the platform works but not being able to built it yourself.

I couldn’t build an internal combustion engine or even a plastic box, so maybe there’s nothing wrong with this approach.

•

andai 5 hours ago

>more recently, a new general-purpose AI system from OpenAI disproved an important conjecture in combinatorial geometry. This result would have been worthy of publication in a major mathematics journal if humans had been the authors

The quality of the mathematics is a function of who has authored it?

•

epihelix 5 hours ago

I suspect it's more that LLMs are not allowed under current journal rules to be authors.

•

recursive 4 hours ago

Worthiness of publication in a journal.

•

morpheos137 5 hours ago

Much can be resolved when it is understood math is discovered not created. AI is a tool. if it makes discovery or proof easier that is still mathematics. A proof stands on its own logic regardless how it is derived. The root concern is how ai may provide uplift for mathematical discovery outside of socially expected channels.

•

frabcus 36 minutes ago

I only did undergraduate level in Maths, and to me there is a key aesthetic element which makes it created. The choice of axioms to use, the choice of with theorems are interesting.

Yes the "truth" (doesn't exist, see Gödels theorem) is discovered in a vast, wild landscape that Mathematicians explore.

But which areas are worth exploring is a critical question. Partly driven by application, partly aesthetic. It's a quest for simple things that are a bit surprising, or that were hard to make the statements so simple.

•

esafak 4 hours ago

You're not concerned about mathematics disappearing as a profession?

•

therobots927 6 hours ago

It’s a well known problem in higher mathematics that even if you’ve solved a problem, often the proofs are incredibly long and complex and require an extensive amount of time spent by peers to review it.

It would be great if someone could explain to me how AI improves this situation. Even if AI thinks it’s solved a problem, unless the proof is incredibly efficient and well explained, it will be difficult to verify the correctness. One hallucination in 300 steps of logic is enough to destroy the entire proof.

•

hilbertseries 5 hours ago

In 2012 Mochizuki claimed to have proved the abc conjecture by developing a new branch of mathematics. He was a respected mathematician, but the theories he had developed were so complex no one could determine if he was correct. It took six years until two number theorists dissected the proof and found a fatal flaw in it.

•

jonahx 6 hours ago

> It would be great if someone could explain to me how AI improves this situation.

It's main utility is in the search step, not the verification step. The search is the bulk of the work and creativity. Separately, as the sibling commenter pointed out, it will likely get better at the verification step as well, with integrations of tools like Lean.

> One hallucination in 300 steps of logic is enough to destroy the entire proof.

The situation with human mathematicians is not much different. Eg, Wiles original proof of Fermat's Last Theorem contained errors found by reviewers, which he later repaired.

•

tacomonstrous 4 hours ago

>The situation with human mathematicians is not much different. Eg, Wiles original proof of Fermat's Last Theorem contained errors found by reviewers, which he later repaired.

In fact, it was Wiles himself who realized there was an error.

•

jstanley 3 hours ago

This is what Lean is for: https://lean-lang.org/

If you have the LLM generate Lean code, and it compiles, then the proof is correct and you don't need to bother checking its working. (You still need to check that it is proving the theorem you asked it to prove).

•

skipkey 6 hours ago

I would imagine that in the future AI will be doing proofs in Lean or whatever the successor to it, which gives you a pretty good confidence it’s correct.

•

paulpauper 8 hours ago

It's amazing how much attention this issue has gotten. What is lost in the hype is no AI can tell you if a proof is correct. An AI can produce a convincing looking proof, but it can have a subtle but critical error or make an assumption that is unfounded. Thus, it ultimately comes down to humans. A mathematician has to craft the prompt, and mathematician to interpret/check the results. Also, these programs are very expensive and propitiatory. They are not like the commercial AI that regular people use. It takes considerable prompting and trial an error to solve even Olympiad/Putnam problems, and tons of work by humans pouring over the results to see if it's correct. For every Erdos problem that captures the headlines, there are many where it failed or untold hours of prompting and token burn to get that result, and manhours verify it.

•

treyd 7 hours ago

I don't think you understand the workflow. Terrence Tao has done a lot of work using them in conjunction with LEAN.

You aren't having the AI check the proof, you interactively work on the same LEAN proof, handing off between the AI assistant and having LEAN check it and provide feedback for both of you when there's a mistake.

•

golly_ned 6 hours ago

Please read the article. You've ignored proof checkers.

•

ares623 8 hours ago

But just imagine...

(edit: lol didn't realize the sibling comment below is essentially my comment)

•

hackermailman 7 hours ago

AI can't yet come up with any new ideas to make the inductive leap to solve a math problem. New ideas are what get the accolades and using an old idea just means the original author missed something. We are still at the author missed something stage that AI is doing today.

It can definitely be a good research assistant though

•

mmooss 5 hours ago

There's yet another major issue of the centralization of power and knowledge:

> Some worry about the accessibility of AI tools. Traditionally, mathematicians have required little more than intuition, training, and a pen and paper to advance their field. If this slow, deliberative process is no longer valued by society, and particularly by research funders, then mathematics could become an elitist activity, only practiced by select organizations that can afford to work with proprietary AI models.

This can be true of anything LLMs do better than existing options. Because the best LLMs require enormous resources to develop, access to them can be very limited. Right now they are priced for market share. What happens when your small law firm attorney, or self-representation, goes up against a large firm with LLM expertise? Can the kid from the poor high school compete with the kid from the rich school with premium LLM access, in mathematics for example?

•

nok22kon 4 hours ago

always has been

the poor kid always had disadvantages, had to help the family, while the rich kid could focus on the math, and maybe get into a good math place with family help