The simplest optimizing compilers I've seen are in PeterNorvig
. They share several traits:
- They parasitize off of an existing language runtime.
- They compile to byte codes.
- They use an AbstractSyntaxTree to represent programs.
All of these compilers were written for instructional purposes, and all were built up from even simpler interpreters. Norvig does an especially spiffy job: he fits a compiler, an assembler, a peephole optimizer, a branch simplifier and a VM all into two chapters.
Furthermore, I suspect that any of these compilers could be built using XP and a bit of domain knowledge. Even better, you could improve them further without radical re-design--techniques are known for doing native code generation straight from an AbstractSyntaxTree
But what about a modern production compiler?
Modern production compilers generally don't rely on ASTs. Typically, they use something like (say) N-address code, possibly in StaticSingleAssignmentForm
. They use lots of funky algorithms. For an idea of how hairy it can get, compare the algorithms in StevenMuchnick?
, or even TheDragonBook
(older), to the books above.
I don't think you could build one of these beasts without some prior design work. I don't think you could start out with one of the simple compilers above, and transform it into one of these without breaking a lot of test suites for several programmer-weeks. I could be wrong.
Or does XP classify this kind of broad-sweeping, hard-won, architectural knowledge as a SystemMetaphor
, as was suggested by RalphJohnson
? -- EricKidd?
fits here in 2 different ways:
To get really fast applications, the compiler will, of course, have to compile all the way to MachineCode
. (To make it easier to test if it's working right, many compilers have a compiler option to save a ".cod" (MachineCode
) file, which shows the raw hex MachineCode
bytes, the AssemblyLanguage
, and the high-level source.).
If I get to pick the source language, the simplest possible compiler for me to write is an assembler for AssemblyLanguage
. If you get to pick the source language
and don't have to optimize, the simplest possible compiler is the null compiler that performs an identity transformation. Let the source language be the target runtime format. Done.
The simplest optimizing compiler I've written was 200 lines of C code, compiling ForthLanguage
to powerpc MachineCode
. Optimizations were limited, but included TailCallOptimization
, inlining, and a generic PeepholeOptimization?
mechanism that used something akin to sed's s/foo/bar/ to rewrite short segments of code. Oh, and it was hard RealTime
in the absense of ForthMacro
s (although, in Forth, macros are rarely absent); the amount of work for a character in the source code was bounded. -- AdamBerger
has the complete source code for a (non-optimizing) compiler.
also has a very small compiler, with limited optimizations. ForthLanguage
lends itself to simple compilers. It is more of a macro assembler for a virtual machine than a high-level language.
I remember that the compiler (to 68k) for the FalseLanguage
had only 1K size (written in AssemblyLanguage
). -- GunnarZarncke
The C compiler I wrote for the BbcMicro
while at Acornsoft ran in 20K - code and data. It barely deserved the title "optimizing", though. Paul Fellows also wrote a Pascal compiler for the same machine that also ran in 20K. -- PaulHudson
See also StepsTowardTheReinventionOfProgramming