The database development community is discussing DuckDB's planned move from a traditional YACC-based SQL parser to a Parsing Expression Grammar (PEG) approach. Commenters are also pointing to alternative SQL parsers that are already available.
Community Reception and Alternative Solutions
The announcement of DuckDB's parser modernization has drawn a mixed response from developers. Some praise the project's continued innovation, while others point to mature solutions that already exist. One frequently mentioned alternative is sqlparser-rs from the DataFusion project, which is recognized for its broad support of SQL dialects, including Microsoft SQL Server's unusual syntax.
As one commenter put it: "From a practical standpoint, for anyone who needs to parse SQL today, I can recommend datafusion's sqlparser-rs... I don't know anything else that matches its level of support for all the crazy little-known syntax particularities of the various SQL dialects."
Technical Debate on Modernization
Commenters have raised counterpoints to the modernization argument. Some argue that a technology's age should not be the primary reason to replace it, noting that many computing concepts from the 1960s remain valuable and effective today. Others point out that LALR(1) parsers can also be made runtime-extensible, so the case for PEG should rest on its technical merits and not on the age of YACC.
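To make "runtime-extensible" concrete, here is a minimal sketch of a PEG-style recognizer whose rules live in an ordinary dictionary and can be replaced while the program is running. It is not DuckDB's parser and not cpp-peglib. The helper names (`lit`, `ident`, `seq`, `choice`, `ref`, `accepts`), the tiny grammar, and the FROM-first extension are all hypothetical and chosen only for illustration.

```python
import re

# Every parsing expression is a function (grammar, text, pos) -> new_pos or None.

def lit(word):
    """Match a keyword or symbol, case-insensitively, after optional whitespace."""
    boundary = r"\b" if word.isalnum() else ""
    pattern = re.compile(r"\s*" + re.escape(word) + boundary, re.IGNORECASE)
    def parse(grammar, text, pos):
        m = pattern.match(text, pos)
        return m.end() if m else None
    return parse

def ident():
    """Match a simple identifier after optional whitespace."""
    pattern = re.compile(r"\s*[A-Za-z_]\w*")
    def parse(grammar, text, pos):
        m = pattern.match(text, pos)
        return m.end() if m else None
    return parse

def seq(*parts):
    """Match every part in order; fail as soon as one part fails."""
    def parse(grammar, text, pos):
        for part in parts:
            pos = part(grammar, text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*alternatives):
    """PEG ordered choice: the first alternative that matches wins."""
    def parse(grammar, text, pos):
        for alt in alternatives:
            end = alt(grammar, text, pos)
            if end is not None:
                return end
        return None
    return parse

def ref(name):
    """Look the rule up at parse time, so later grammar changes take effect."""
    def parse(grammar, text, pos):
        return grammar[name](grammar, text, pos)
    return parse

def accepts(grammar, text):
    """True if the 'statement' rule consumes the whole input."""
    end = grammar["statement"](grammar, text, 0)
    return end is not None and text[end:].strip() == ""

columns = choice(lit("*"), ident())
grammar = {
    "statement": ref("select"),
    "select": seq(lit("SELECT"), columns, lit("FROM"), ident()),
}

before = accepts(grammar, "FROM t SELECT *")

# Extend the grammar at runtime with a hypothetical FROM-first statement form.
grammar["from_first"] = seq(lit("FROM"), ident(), lit("SELECT"), columns)
grammar["statement"] = choice(ref("select"), ref("from_first"))

after = accepts(grammar, "FROM t SELECT *")
print(before, after)
```

Because rules are plain data resolved by name at parse time, adding a statement form requires no regeneration of parser tables. That is the property under discussion, although, as the commenters note, it is not unique to PEG.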
Performance Comparison:
- TPC-H Query 1:
  - YACC: ~0.03 ms
  - cpp-peglib: ~0.3 ms
- Large file (36,840 lines):
  - Postgres (YACC): 24 ms
  - cpp-peglib: 266 ms (without actions), 339 ms (with AST generation)
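The slowdown factors implied by these figures can be worked out directly. The short sketch below uses only the timings listed above. The dictionary layout and the `slowdown` helper are illustrative, and the TPC-H figures are approximate in the source, so that ratio is approximate too.

```python
# Timings in milliseconds, taken from the comparison above.
timings_ms = {
    "TPC-H Query 1": {"yacc": 0.03, "peg": 0.3},
    "36,840-line file, no actions": {"yacc": 24, "peg": 266},
    "36,840-line file, with AST": {"yacc": 24, "peg": 339},
}

def slowdown(yacc_ms, peg_ms):
    """How many times slower the PEG parse is than the YACC parse."""
    return peg_ms / yacc_ms

ratios = {
    name: round(slowdown(t["yacc"], t["peg"]), 1)
    for name, t in timings_ms.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}x")
```

The factors come out to about 10x for the single query, about 11x for the large file without actions, and about 14x once AST generation is included.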
Educational Resources and Learning Opportunities
Community members have also pointed to learning resources for those interested in PEG implementations. The free book Janet for Mortals by Ian Henry was recommended as an excellent introduction to PEG concepts, with some developers noting that it changed how they think about programming and parsing.
Performance Considerations
Although the DuckDB article reports a performance gap between the YACC and PEG implementations, the community seems largely unconcerned about the roughly 10x slowdown. This is particularly true for analytical queries, where parsing accounts for a very small fraction of overall query processing time. The focus is more on functionality and extensibility than on raw parsing speed.
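A rough calculation shows why the gap matters little for long-running queries. In the sketch below, the parse times are the TPC-H Query 1 figures reported above, but the execution times are hypothetical placeholders and not measurements. The `parse_share` helper is likewise only illustrative.

```python
def parse_share(parse_ms, exec_ms):
    """Fraction of total query time (parse + execution) spent parsing."""
    return parse_ms / (parse_ms + exec_ms)

YACC_PARSE_MS = 0.03  # reported above for TPC-H Query 1
PEG_PARSE_MS = 0.3    # reported above for TPC-H Query 1

# HYPOTHETICAL execution times, chosen only to show the trend.
assumed_exec_ms = [1.0, 100.0, 1000.0]

for exec_ms in assumed_exec_ms:
    yacc_pct = 100 * parse_share(YACC_PARSE_MS, exec_ms)
    peg_pct = 100 * parse_share(PEG_PARSE_MS, exec_ms)
    print(f"exec {exec_ms:>6.0f} ms: YACC {yacc_pct:.3f}%, PEG {peg_pct:.3f}%")
```

Under these assumptions, the share of time spent parsing shrinks quickly as execution time grows, which is consistent with commenters treating the slowdown as a minor cost for analytical workloads. For very short queries the picture could look different.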
The discussion reflects a broader trend in database technology where developers are weighing the trade-offs between traditional tried-and-tested approaches and modern, more flexible solutions that can better accommodate future innovations in query language development.
Source Citations: Runtime-Extensible SQL Parsers Using PEG