Learning to write a compiler
Preferred languages: C/C++, Java, and Ruby.
I am looking for some helpful books/tutorials on how to write your own compiler simply for educational purposes. I am most familiar with C/C++, Java, and Ruby, so I prefer resources that involve one of those three, but any good resource is acceptable.
Big List of Resources:
- A Nanopass Framework for Compiler Education ¶
- Advanced Compiler Design and Implementation $
- An Incremental Approach to Compiler Construction ¶
- ANTLR 3.x Video Tutorial
- Basics of Compiler Design
- Building a Parrot Compiler
- Compiler Basics
- Compiler Construction $
- Compiler Design and Construction $
- Crafting a Compiler with C $
- Crafting Interpreters
- Compiler Design in C ¶
- Compilers: Principles, Techniques, and Tools $ — aka "The Dragon Book"; widely considered "the book" for compiler writing.
- Engineering a Compiler $
- Essentials of Programming Languages
- Flipcode Article Archive (look for "Implementing A Scripting Engine by Jan Niestadt")
- Game Scripting Mastery $
- How to build a virtual machine from scratch in C# ¶
- Implementing Functional Languages
- Implementing Programming Languages (with BNFC)
- Implementing Programming Languages using C# 4.0
- Interpreter pattern (described in Design Patterns $) specifies a way to evaluate sentences in a language
- Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages $
- Let's Build a Compiler by Jack Crenshaw — The PDF ¶ version (examples are in Pascal, but the information is generally applicable)
- Linkers and Loaders $ (Google Books)
- Lisp in Small Pieces (LiSP) $
- LLVM Tutorial
- Modern Compiler Implementation in ML $ — There is a Java $ and C $ version as well - widely considered a very good book
- Object-Oriented Compiler Construction $
- Parsing Techniques - A Practical Guide
- Project Oberon ¶ - Look at chapter 13
- Programming a Personal Computer $
- Programing Languages: Application and Interpretation
- Rabbit: A Compiler for Scheme¶
- Reflections on Trusting Trust — A quick guide
- Roll Your Own Compiler for the .NET framework — A quick tutorial from MSDN
- Structure and Interpretation of Computer Programs
- Types and Programming Languages
- Want to Write a Compiler? - a quick guide
- Writing a Compiler in Ruby Bottom Up
- ¶ Link to a PDF file
- $ Link to a printed book
Read more... Read less...
This is a pretty vague question, I think; just because of the depth of the topic involved. A compiler can be decomposed into two separate parts, however; a top-half and a bottom-one. The top-half generally takes the source language and converts it into an intermediate representation, and the bottom half takes care of the platform specific code generation.
Nonetheless, one idea for an easy way to approach this topic (the one we used in my compilers class, at least) is to build the compiler in the two pieces described above. Specifically, you'll get a good idea of the entire process by just building the top-half.
Just doing the top half lets you get the experience of writing the lexical analyzer and the parser and go to generating some "code" (that intermediate representation I mentioned). So it will take your source program and convert it to another representation and do some optimization (if you want), which is the heart of a compiler. The bottom half will then take that intermediate representation and generate the bytes needed to run the program on a specific architecture. For example, the the bottom half will take your intermediate representation and generate a PE executable.
Some books on this topic that I found particularly helpful was Compilers Principles and Techniques (or the Dragon Book, due to the cute dragon on the cover). It's got some great theory and definitely covers Context-Free Grammars in a really accessible manner. Also, for building the lexical analyzer and parser, you'll probably use the *nix tools lex and yacc. And uninterestingly enough, the book called "lex and yacc" picked up where the Dragon Book left off for this part.
I think Modern Compiler Implementation in ML is the best introductory compiler writing text. There's a Java version and a C version too, either of which might be more accessible given your languages background. The book packs a lot of useful basic material (scanning and parsing, semantic analysis, activation records, instruction selection, RISC and x86 native code generation) and various "advanced" topics (compiling OO and functional languages, polymorphism, garbage collection, optimization and single static assignment form) into relatively little space (~500 pages).
I prefer Modern Compiler Implementation to the Dragon book because Modern Compiler implementation surveys less of the field--instead it has really solid coverage of all the topics you would need to write a serious, decent compiler. After you work through this book you'll be ready to tackle research papers directly for more depth if you need it.
I must confess I have a serious soft spot for Niklaus Wirth's Compiler Construction. It is available online as a PDF. I find Wirth's programming aesthetic simply beautiful, however some people find his style too minimal (for example Wirth favors recursive descent parsers, but most CS courses focus on parser generator tools; Wirth's language designs are fairly conservative.) Compiler Construction is a very succinct distillation of Wirth's basic ideas, so whether you like his style or not or not, I highly recommend reading this book.
I concur with the Dragon Book reference; IMO, it is the definitive guide to compiler construction. Get ready for some hardcore theory, though.
If you want a book that is lighter on theory, Game Scripting Mastery might be a better book for you. If you are a total newbie at compiler theory, it provides a gentler introduction. It doesn't cover more practical parsing methods (opting for non-predictive recursive descent without discussing LL or LR parsing), and as I recall, it doesn't even discuss any sort of optimization theory. Plus, instead of compiling to machine code, it compiles to a bytecode that is supposed to run on a VM that you also write.
It's still a decent read, particularly if you can pick it up for cheap on Amazon. If you only want an easy introduction into compilers, Game Scripting Mastery is not a bad way to go. If you want to go hardcore up front, then you should settle for nothing less than the Dragon Book.
"Let's Build a Compiler" is awesome, but it's a bit outdated. (I'm not saying it makes it even a little bit less valid.)
Or check out SLANG. This is similar to "Let's Build a Compiler" but is a much better resource especially for beginners. This comes with a pdf tutorial which takes a 7 step approach at teaching you a compiler. Adding the quora link as it have the links to all the various ports of SLANG, in C++, Java and JS, also interpreters in python and java, originally written using C# and the .NET platform.
If you're looking to use powerful, higher level tools rather than building everything yourself, going through the projects and readings for this course is a pretty good option. It's a languages course by the author of the Java parser engine ANTLR. You can get the book for the course as a PDF from the Pragmatic Programmers.
The course goes over the standard compiler compiler stuff that you'd see elsewhere: parsing, types and type checking, polymorphism, symbol tables, and code generation. Pretty much the only thing that isn't covered is optimizations. The final project is a program that compiles a subset of C. Because you use tools like ANTLR and LLVM, it's feasible to write the entire compiler in a single day (I have an existence proof of this, though I do mean ~24 hours). It's heavy on practical engineering using modern tools, a bit lighter on theory.
LLVM, by the way, is simply fantastic. Many situations where you might normally compile down to assembly, you'd be much better off compiling to LLVM's Intermediate Representation instead. It's higher level, cross platform, and LLVM is quite good at generating optimized assembly from it.