Community Skeptical of Claims Made by Formal Verification of ML Models in Lean Project

BigGo Editorial Team

A recently announced project claiming to provide formal verification for machine learning models has drawn significant skepticism from the technical community. The Formal Verification of Machine Learning Models in Lean project purports to offer a framework for proving properties such as robustness, fairness, and interpretability of ML models using the Lean 4 theorem prover.

The project presents itself as a comprehensive solution with features including a Lean library with formal definitions for various neural network architectures, a model translator for converting trained models to Lean code, and a web interface for visualization and proof management. However, experts in the field have raised substantial concerns about both the feasibility and authenticity of the claims.

Technical Limitations and Overpromising

Community members with expertise in formal verification have pointed out fundamental limitations in the approach. Formal verification of complex ML models faces inherent challenges that the project appears to gloss over. Proving meaningful properties about neural networks is notoriously difficult, especially for real-world applications.

This seems pointless... the actual hard part is of course the proofs themselves, which they don't seem to solve. Theorems of the kind "model X always does this desirable thing" are almost always false (because it's an imprecise model), and theorems of the kind "model X always does this desirable thing Y% of the time" seem incredibly hard to prove.
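To make the distinction concrete, the two claim shapes could be stated in Lean 4 roughly as follows. This is a hypothetical sketch over an abstract model, not code from the project, and all names are illustrative.

    -- "Model X always does this desirable thing": a universal claim, which is
    -- typically false for an imprecise real-world model.
    def alwaysDesirable {Input : Type} (model : Input → Bool)
        (desirable : Input → Bool → Prop) : Prop :=
      ∀ x : Input, desirable x (model x)

    -- "Model X does this desirable thing Y% of the time": even stating this
    -- requires a probability measure over the input distribution (e.g. via
    -- Mathlib's measure theory), and proving it for a real network is harder still.

The second, probabilistic form is the one that would matter in practice, and it is the form the commenter considers incredibly hard to prove.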

The repository's examples have been criticized for being trivial and not representative of real-world verification challenges. One commenter noted that the fairness example merely proves that a classifier that always returns "yes" will have the same classification percentage across all demographics, a mathematically trivial result with no practical significance.
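To see why critics regard this as vacuous, a statement of that shape can be written and proved in a few lines of Lean 4. The following is a minimal illustrative sketch, not code taken from the repository.

    -- A classifier that ignores its input and always answers "yes".
    def constantYes (_ : Nat) : Bool := true

    -- The "fairness" claim: members of any two groups receive the same answer.
    -- Because the output never depends on the input, both sides reduce to
    -- `true` and the proof is definitional equality.
    theorem constantYes_is_fair (a b : Nat) : constantYes a = constantYes b := rfl

Nothing about the population, the data distribution, or a realistic model enters the statement, which is why the result carries no practical significance.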

Questions About Authenticity

Several commenters have raised concerns about the authenticity of the project itself. The Lean code in the repository has been described as an "AI-generated mix of Lean 3 and 4" that likely doesn't compile properly. The absence of any cited scientific work or associated research papers further undermines the project's credibility.

One community member expressed surprise that the project reached the front page of Hacker News despite appearing to be filled with AI-generated stubs and promises rather than substantive content. The lack of scientific foundation is particularly concerning given the complexity of the problem domain.

Key Issues Raised by Community:

  • Code Quality Concerns: Described as "AI-generated mix of Lean 3 and 4" that likely doesn't compile
  • Limited Proofs: Examples prove trivial properties (e.g., a classifier that always returns "yes" has the same 100% acceptance rate for every group)
  • Missing Scientific Foundation: No cited research papers or theoretical framework
  • Unaddressed Challenges:
    • Handling stochastic elements in generative AI
    • Defining objective metrics for concepts like "fairness"
    • Proving meaningful properties about complex neural networks

Alternative Approaches Mentioned:

  • DARPA's ANSR (Assured Neuro-Symbolic Reasoning)
  • DARPA's TIAMAT (Transfer from Imprecise Models for Assured Tactical Autonomy)

Fundamental Challenges in ML Verification

The discussion highlights several fundamental challenges in formally verifying ML models that the project doesn't adequately address. For stochastic models, which include most generative AI systems, verification becomes even more complicated. As one commenter asked, "How do they deal with models with a stochastic element? Not sure how you intend to prove sampling."
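One minimal way to even state properties of a stochastic model in Lean is to make the randomness explicit, for instance as a seed argument. The sketch below is hypothetical and is not drawn from the project, which does not appear to address the issue at all.

    -- A stochastic model, with its source of randomness made explicit.
    abbrev StochasticModel (Input Output Seed : Type) := Input → Seed → Output

    -- A "sure" guarantee must hold for every possible seed, which is usually
    -- far too strong for a generative model.
    def holdsSurely {Input Output Seed : Type}
        (m : StochasticModel Input Output Seed)
        (P : Input → Output → Prop) : Prop :=
      ∀ (x : Input) (s : Seed), P x (m x s)

    -- The guarantee people actually want, "holds with probability at least p",
    -- needs a probability measure over Seed, pulling the statement into
    -- measure theory before any proof effort begins.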

Even defining what properties should be verified presents significant challenges. Concepts like fairness are inherently ambiguous and difficult to define objectively in mathematical terms. Meanwhile, proving that a language model does not output false statements appears to be an intractable problem.
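For illustration, one of the many competing formalizations of fairness, demographic parity over a finite sample, could be written in Lean as follows. This is a hypothetical sketch, not a definition from the project, and alternative formalizations such as equalized odds or calibration are in general incompatible with it.

    -- Fraction of a finite sample that the classifier accepts
    -- (undefined for an empty sample).
    def acceptanceRate {Input : Type} (model : Input → Bool)
        (xs : List Input) : Float :=
      (xs.filter model).length.toFloat / xs.length.toFloat

    -- Demographic parity on a finite sample: equal acceptance rates for the
    -- two groups. Choosing this definition over others is itself a value
    -- judgment, which is part of why "fairness" resists objective definition.
    def demographicParity {Input : Type} (model : Input → Bool)
        (groupA groupB : List Input) : Prop :=
      acceptanceRate model groupA = acceptanceRate model groupB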

Despite the skepticism surrounding this particular project, the community recognizes the importance of work in this direction. Alternative approaches mentioned include DARPA's ANSR (Assured Neuro-Symbolic Reasoning) and TIAMAT (Transfer from Imprecise Models for Assured Tactical Autonomy) programs, which are tackling related challenges in AI verification with more rigorous scientific backing.

The discussion serves as a reminder of the gap between ambitious claims in AI safety and the current technical reality. While formal verification of simple properties for specific ML models is possible, comprehensive verification frameworks that can handle modern deep learning systems remain an open research challenge rather than a solved problem.

Reference: Formal Verification of Machine Learning Models in Lean