The original C compiler, written by Dennis Ritchie (known as dmr) in the early 1970s, has become a fascinating historical artifact for programmers and computer scientists alike. This primeval code, now preserved in the legacy-cc repository, offers a glimpse into the humble beginnings of what would eventually grow into a trillion-dollar industry.
A Compiler That Can't Be Compiled
The repository contains the earliest versions of the C compiler, dating back to around 1972. These source files cannot be compiled with modern C compilers such as GCC, because they use syntax that predates any standardized form of the language. The code captures a transitional phase in the language's development: several commenters noted that the first C compiler was actually written in B, and that the codebase evolved gradually through iterative changes until it became what we now recognize as C.
One commenter summed up the appeal: "Probably one of my favorite pieces of software of all time. Learned so much from this!"
Syntax From Another Era
The code reveals striking differences between early C and its modern counterpart. One commenter highlighted the unusual use of keywords such as extern and auto, which still exist in modern C but function differently today. In these early files, extern brought global symbols into function scope, and anything declared without an explicit type defaulted to int. Some array declarations specified a size and others did not; arrays declared without a size functioned essentially as pointers.
This syntax is a precursor of what many call K&R C (Kernighan and Ritchie C, named after their 1978 book), the pre-standard dialect that predates the ANSI C/C89 most programmers would recognize today. Although the style was deprecated decades ago, GCC supported it through the -traditional flag until that option was removed from the compiler in the early 2000s.
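The block-scope extern declarations, the auto keyword, and sizeless arrays described above can all still be written in modern C, as long as every declaration carries an explicit type. The following is a minimal sketch of that, not an excerpt from the legacy-cc sources; the names counter, table, sum and bump_and_sum are hypothetical.

```c
/* File-scope ("global") objects. Modern C requires the explicit `int`
   that the early sources could leave out. Names are hypothetical. */
int counter = 0;
int table[4] = {10, 20, 30, 40};

/* Sum the first n elements. The parameter is declared without a size:
   in a parameter list, `int values[]` means exactly `int *values`. */
int sum(int values[], int n)
{
    auto int total = 0;   /* `auto` = ordinary local storage; legal but redundant */
    int i;

    for (i = 0; i < n; i++)
        total += values[i];
    return total;
}

/* Bring the globals into function scope with block-scope `extern`
   declarations, in the spirit of the early compiler sources. */
int bump_and_sum(void)
{
    extern int counter;   /* refers to the file-scope `counter` above */
    extern int table[];   /* size omitted; the definition supplies it */

    counter++;
    return sum(table, 4);
}
```

Calling bump_and_sum() once increments counter to 1 and returns 100. In modern code the extern lines inside the function are unnecessary, since file-scope names are already visible there; they are written out here only to mirror the older habit.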
Key Information About legacy-cc
- Original Author: Dennis Ritchie (dmr)
- Time Period: Around 1972
- Hardware: Developed for PDP-11
- Notable Features:
  - K&R style function declarations
  - Default "int" types
  - Unusual memory management techniques
  - Two-phase compiler design
Useful Resources
- PDP-11 Emulator: http://pdp11.aiju.de/
- Research Unix Repository: https://www.tuhs.org/Archive/Distributions/Research/
- Original Source: https://www.bell-labs.com/usr/dmr/www/primevalC.html
Creative Memory Management
Perhaps the most intriguing aspect of the codebase is a function called waste() that has sparked considerable discussion. The function appears to deliberately consume space through recursive self-calls:
waste() /* waste space */
{
    waste(waste(waste),waste(waste),waste(waste));
    waste(waste(waste),waste(waste),waste(waste));
    ...
}
While the sparse comment says only "waste space", community analysis suggests this was actually a clever technique for reserving memory. One commenter explained that both compiler phases used this approach to ensure that the reserved memory region had the same address in each phase, so that expression trees containing pointers could be passed from one phase to the next without being rewritten. This demonstrates how the hardware limitations of the era forced programmers to develop creative solutions that might seem bizarre by today's standards.
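The trick described above works only because both phases see the reserved region at an identical address, which keeps raw pointers valid. For contrast, and explicitly not as a reconstruction of Ritchie's code, here is a sketch of a portable alternative: tree nodes refer to their children by index into a fixed arena, so the bytes remain meaningful wherever they are copied. Every name here (struct node, struct arena, arena_new, arena_eval, build_example, eval_after_copy) is hypothetical.

```c
#include <string.h>

#define ARENA_NODES 16

/* One expression-tree node. Children are arena indices, not pointers,
   so the data does not depend on where the arena lives. -1 = no child. */
struct node {
    int op;      /* '+' or '*' for operators, 0 for a constant leaf */
    int value;   /* used only when op == 0 */
    int left;
    int right;
};

struct arena {
    struct node nodes[ARENA_NODES];
    int used;
};

/* Fill the next free slot; returns its index, or -1 if the arena is full. */
int arena_new(struct arena *a, int op, int value, int left, int right)
{
    if (a->used >= ARENA_NODES)
        return -1;
    struct node *n = &a->nodes[a->used];
    n->op = op;
    n->value = value;
    n->left = left;
    n->right = right;
    return a->used++;
}

/* Evaluate the subtree rooted at index `root`; invalid indices yield 0. */
int arena_eval(const struct arena *a, int root)
{
    if (root < 0 || root >= a->used)
        return 0;
    const struct node *n = &a->nodes[root];
    if (n->op == 0)
        return n->value;
    int l = arena_eval(a, n->left);
    int r = arena_eval(a, n->right);
    return n->op == '+' ? l + r : l * r;
}

/* "Phase 1": build the tree for (2 + 3) * 4 and return its root index. */
int build_example(struct arena *a)
{
    memset(a, 0, sizeof *a);
    int two = arena_new(a, 0, 2, -1, -1);
    int three = arena_new(a, 0, 3, -1, -1);
    int four = arena_new(a, 0, 4, -1, -1);
    int sum = arena_new(a, '+', 0, two, three);
    return arena_new(a, '*', 0, sum, four);
}

/* "Phase 2": receive a byte-for-byte copy at a different address
   and evaluate it there. */
int eval_after_copy(void)
{
    struct arena phase1, phase2;
    int root = build_example(&phase1);
    memcpy(&phase2, &phase1, sizeof phase1);
    return arena_eval(&phase2, root);
}
```

eval_after_copy() returns 20 even though phase2 sits at a different address from phase1. The cost is an extra index lookup on every child access, which the same-address approach avoided on a machine where every word and cycle counted.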
Historical Impact
The significance of this code extends far beyond its technical curiosities. As one commenter noted, Oracle Database's first publicly available version (v2, released in 1979) was written in assembly for the PDP-11. When Oracle later rewrote version 3 in C (1983) for cross-platform portability, they found that mainframes lacked C compilers. Rather than rewriting their database in COBOL or another mainframe language, Oracle created their own C compiler for mainframes.
This pattern repeated across the industry, with C becoming the lingua franca of systems programming and enabling software portability across diverse hardware platforms. The UNIX operating system itself, originally developed alongside C, was ported to IBM's System/370 mainframes by 1980.
Reflections on Simplicity and Complexity
The repository has prompted thoughtful discussions about C's nature. While many programmers appreciate C for its apparent simplicity, several commenters noted that this simplicity is somewhat illusory. The language may be small and close to the hardware, but it contains significant semantic complexity through implicit type conversions, aliasing rules, and memory management requirements.
As one commenter eloquently put it, C is "small, but not simple." Its perceived simplicity comes from forcing programs to remain simple by limiting powerful abstractions, not from the language itself being straightforward to implement or fully understand.
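Implicit type conversions are one place where the "small, but not simple" point is easy to see. The sketch below uses hypothetical function names and assumes 8-bit bytes; it shows three one-line expressions whose results depend on conversion rules that appear nowhere in the code.

```c
/* Comparing a signed and an unsigned int: the signed operand is
   converted to unsigned first, so -1 becomes a very large value. */
int minus_one_less_than_one(void)
{
    int s = -1;
    unsigned int u = 1u;
    return s < u;   /* 0: the comparison is false */
}

/* Arithmetic on small integer types: both operands are promoted
   to int before the addition, so nothing wraps here. */
int sum_of_bytes(void)
{
    unsigned char a = 200, b = 100;
    return a + b;   /* 300 */
}

/* Storing the same sum back into an unsigned char reduces it
   modulo 256 (assuming an 8-bit char). */
unsigned char sum_of_bytes_truncated(void)
{
    unsigned char a = 200, b = 100;
    unsigned char c = a + b;
    return c;       /* 44 */
}
```

None of these functions contains a cast, yet each depends on a different conversion rule. Compilers can flag the first case with warnings such as GCC's -Wsign-compare, but the behaviour itself is mandated by the language.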
Looking at this historical code provides valuable perspective on how far computing has evolved while reminding us that even the most sophisticated modern systems trace their lineage back to these humble beginnings. The legacy-cc repository stands as a testament to how a relatively small piece of software, created by one brilliant mind, could set the foundation for decades of technological advancement.
Reference: legacy-cc