Claude Code Shows Promise in Formal Theorem Proving, But Community Warns of the "90% Problem"

BigGo Community Team
Claude Code Shows Promise in Formal Theorem Proving, But Community Warns of the "90% Problem"

Interactive theorem proving has long been considered one of the most challenging areas in computer science. Tools like Lean require users to write mathematical proofs in a formal language that computers can verify with absolute certainty. Unlike regular programming where bugs might slip through, theorem provers either accept your proof as mathematically sound or reject it entirely. This all-or-nothing nature has made the field accessible only to experts with deep mathematical training.

Recent experiments with Claude Code, Anthropic's AI coding agent, suggest this barrier might be lowering. The AI has demonstrated surprising capability in writing Lean proofs, completing complex mathematical formalization tasks that would typically require specialized knowledge. The key advantage appears to be Lean's strict feedback system - when the AI makes an error, it receives detailed, actionable information about what went wrong, allowing it to iterate and improve its approach.

Interactive Theorem Proving Tools: Lean is highlighted as the primary tool, with nearly half a million lines of formalized mathematical code

The Familiar Pattern of AI Progress Limitations

However, the community has raised significant concerns about a pattern they've seen repeatedly with AI tools. Many developers report that AI can handle the initial 80% of a project brilliantly, but struggles enormously with the final 20%. This phenomenon, sometimes called the 90% problem, becomes even more pronounced as projects grow in complexity and require increasingly specific solutions.

Anyone who has used AI tools has seen cases where the first 80% of a project came together like a bolt of lightning, but the last 20% is next to impossible for the AI to accomplish, even if it doesn't seem more complex than the rest of the code.

The concern is particularly relevant for theorem proving, where incomplete work has no value. Unlike software development where partial solutions can still be useful, mathematical proofs must be complete to be meaningful. This creates a high-stakes environment where the AI's tendency to struggle with final details could prove especially problematic.

AI Development Pattern: Community reports consistent "80/20 rule" where AI completes 80% of projects easily but struggles with final 20%

The Specification Challenge Remains

Beyond the completion problem lies a deeper issue that AI hasn't solved: the challenge of creating proper specifications. As one community member noted, writing programs isn't the hard part - figuring out exactly what a program should do is where the real difficulty lies. This problem becomes even more critical in formal verification, where you must precisely capture not just what your code does, but what it should do mathematically.

The risk is that users might end up with formally verified proofs that don't actually represent the problems they intended to solve. Even if Claude Code can write perfect Lean syntax, someone still needs the expertise to ensure the mathematical statements being proven are the right ones. This suggests that while AI might lower some barriers to entry, deep domain knowledge remains essential.

A Promising Direction Despite Limitations

Despite these concerns, the combination of AI and formal verification represents an intriguing development. Theorem proving provides exactly what AI needs to improve: immediate, detailed feedback about errors. When Claude Code makes a mistake in Lean, it gets precise information about what went wrong and can iterate toward a solution. This creates a natural learning loop that doesn't exist in many other domains.

The community seems cautiously optimistic about this direction, particularly for users who already understand the mathematical concepts they want to formalize. For these experts, AI assistance could significantly reduce the tedious aspects of proof writing while still requiring human oversight for the conceptual work. The technology might not eliminate the need for mathematical expertise, but it could make that expertise more productive.

The key will be managing expectations and understanding the tool's limitations. As with other AI applications, the most successful users will likely be those who use it to augment their existing skills rather than replace their need to understand the underlying domain.

Reference: Claude Can (Sometimes) Prove It