Here are some of my notes from reading compiler textbooks, specifically on intermediate representations (IR).
References: slides from Comp 412, Rice University, and Engineering a Compiler, 2nd edition.
Question: Where does the IR come from? Who generates the IR in the first place?
Answer: The front end. The scanner and parser work from the syntax of the source code to emit IR.
Question: Who uses the IR?
Answer: The rest of the compiler. Both the optimizer and the back end (code generation) work with the IR.
To summarize (slide 3 from Comp 412, Fall 2016):
IR is the vehicle that carries information between phases.
Front end: produces an intermediate representation (IR)
Optimizer: transforms the IR into an equivalent IR that runs faster
Back end: transforms the IR into native code
IR determines the compiler’s ambition & its chances for success
The compiler’s knowledge of the code is encoded in the IR.
The compiler can only manipulate what is represented by the IR.
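To make the three phases concrete, here is a minimal sketch in Python. It is my own toy illustration, not something from the slides. The tuple-based tree IR, the function names front_end, optimizer and back_end, and the imaginary stack machine are all assumptions made for this example. The front end borrows Python's standard ast module as a parser and only handles integer constants, variable names, + and *.

```python
import ast

# Hypothetical tree IR for this sketch:
#   ("const", n), ("var", name), or ("add" | "mul", left, right)
_OPS = {ast.Add: "add", ast.Mult: "mul"}


def front_end(source):
    """Parse an arithmetic expression and emit the tree IR."""
    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return ("const", node.value)
        if isinstance(node, ast.Name):
            return ("var", node.id)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return (_OPS[type(node.op)], emit(node.left), emit(node.right))
        raise ValueError("construct not supported by this toy front end")

    return emit(ast.parse(source, mode="eval").body)


def optimizer(ir):
    """Constant folding: takes IR and returns an equivalent IR."""
    if ir[0] in ("const", "var"):
        return ir
    op, left, right = ir[0], optimizer(ir[1]), optimizer(ir[2])
    if left[0] == "const" and right[0] == "const":
        value = left[1] + right[1] if op == "add" else left[1] * right[1]
        return ("const", value)
    return (op, left, right)


def back_end(ir):
    """Emit instructions for an imaginary stack machine."""
    if ir[0] == "const":
        return [f"PUSH {ir[1]}"]
    if ir[0] == "var":
        return [f"LOAD {ir[1]}"]
    return back_end(ir[1]) + back_end(ir[2]) + [ir[0].upper()]


ir = front_end("2 * 3 + x")
optimized = optimizer(ir)
code = back_end(optimized)
print(ir)         # ('add', ('mul', ('const', 2), ('const', 3)), ('var', 'x'))
print(optimized)  # ('add', ('const', 6), ('var', 'x'))
print(code)       # ['PUSH 6', 'LOAD x', 'ADD']
```

Note that the optimizer can fold 2 * 3 only because the IR records that both operands are constants. That is the sense in which the compiler can only manipulate what the IR represents.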
Three major categories of IR:
- Structural IRs
  - abstract syntax tree
  - directed acyclic graph (good for expressing redundancy)
  - control flow graph (nodes in the graph are basic blocks, edges are control flow)
- Linear IRs
  - 3 address code
- Hybrid IRs
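As a rough illustration of how these forms relate, the sketch below lowers a small expression tree for a * b + a * b into 3 address code in Python. The tuple encoding of the tree, the function name to_three_address, and the share flag are my own assumptions for this example. With share=True, identical subtrees are computed once, which is the redundancy that a directed acyclic graph makes explicit. That reuse is only safe here because the expression contains no assignments between the repeated subtrees.

```python
from itertools import count


def to_three_address(tree, share=False):
    """Lower a tuple-encoded expression tree to three-address code.

    Leaves are variable names (str); interior nodes are (op, left, right).
    With share=True, identical subtrees reuse one temporary (DAG-like).
    """
    code = []
    temps = count(1)
    seen = {}  # subtree -> name of the temporary holding its value

    def lower(node):
        if isinstance(node, str):  # leaf: a variable name
            return node
        if share and node in seen:  # already computed: reuse the result
            return seen[node]
        op, left, right = node
        lhs, rhs = lower(left), lower(right)
        dest = f"t{next(temps)}"
        code.append(f"{dest} = {lhs} {op} {rhs}")
        seen[node] = dest
        return dest

    lower(tree)
    return code


tree = ("+", ("*", "a", "b"), ("*", "a", "b"))
plain = to_three_address(tree)
shared = to_three_address(tree, share=True)
print(plain)   # ['t1 = a * b', 't2 = a * b', 't3 = t1 + t2']
print(shared)  # ['t1 = a * b', 't2 = t1 + t1']
```

Each instruction has at most one operator and three names (one result, two operands), which is where the name "3 address code" comes from.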
Often, you can’t do all the levels of optimization with the same IR. As a result, you need to lower the IR in stages, so that each optimizer works on a representation that exposes the details it needs:
Front End -> IR1 -> Optimizer1 -> IR2 -> Optimizer2 -> IR3 -> BackEnd -> Target Code
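One way to see why lowering helps: a high-level IR can represent x = a[i] as a single operation, which hides the address arithmetic. Only after lowering does the multiplication by the element size appear, and only then can a low-level optimization touch it. The Python sketch below is my own illustration under assumed names: the tuple instruction format, ELEM_SIZE, lower_index and strength_reduce are all hypothetical.

```python
ELEM_SIZE = 4  # assumed element size in bytes, for this sketch only


def lower_index(array, index, dest):
    """Lower the high-level operation dest = array[index] to explicit address arithmetic."""
    return [
        ("mul", "t1", index, ELEM_SIZE),  # t1 = index * ELEM_SIZE
        ("add", "t2", array, "t1"),       # t2 = base address + offset
        ("load", dest, "t2"),             # dest = memory[t2]
    ]


def strength_reduce(code):
    """Replace multiplication by a power of two with a left shift."""
    out = []
    for instr in code:
        if instr[0] == "mul" and isinstance(instr[3], int):
            factor = instr[3]
            if factor > 0 and factor & (factor - 1) == 0:  # power of two
                shift = factor.bit_length() - 1
                out.append(("shl", instr[1], instr[2], shift))
                continue
        out.append(instr)
    return out


high_level = ("index", "x", "a", "i")      # x = a[i], one opaque operation
low_level = lower_index("a", "i", "x")     # address arithmetic now visible
optimized = strength_reduce(low_level)
print(optimized)
# [('shl', 't1', 'i', 2), ('add', 't2', 'a', 't1'), ('load', 'x', 't2')]
```

The shift cannot be introduced at the high level, because the multiplication does not exist there yet. That is the kind of optimization that has to wait for a lower IR.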
For example, LLVM defines its own IR (LLVM IR). To quote the documentation page, it provides:
“a powerful intermediate representation for efficient compiler transformations and analysis, while providing a natural means to debug and visualize the transformations.
The LLVM representation aims to be light-weight and low-level while being expressive, typed, and extensible at the same time. It aims to be a “universal IR” of sorts, by being at a low enough level that high-level ideas may be cleanly mapped to it.”