This is the second post in preparation for the final exam for COMP 425 at Rice University. This post will be focusing on Very Long Instruction Word (VLIW).
What is VLIW?
A VLIW instruction is a single instruction containing several operations, each encoded in a RISC-style instruction format.
- Multiple operations packed into one instruction
- Classically, each operation slot is for a fixed function
- Constant operation latencies are specified
- Architecture requires guarantee of:
  - Parallelism within an instruction => no dependency checks
  - No data use before data ready => no data interlocks
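
To make the first guarantee concrete, here is a minimal sketch of the check the compiler has to pass before it is allowed to pack operations into one instruction. The `Op` class and `bundle_is_parallel` function are hypothetical names made up for this post, and the check is deliberately conservative: it treats any read/write overlap on a register as a dependency, which is an assumption and may be stricter than what a real VLIW requires.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Op:
    """One RISC-style operation inside a VLIW instruction (hypothetical model)."""
    name: str
    dest: str               # register written by the operation
    srcs: Tuple[str, ...]   # registers read by the operation


def bundle_is_parallel(bundle: List[Op]) -> bool:
    """Return True if no two operations in the bundle touch the same register
    in a conflicting way (conservative assumption, see the text above)."""
    for i, a in enumerate(bundle):
        for b in bundle[i + 1:]:
            read_write_overlap = a.dest in b.srcs or b.dest in a.srcs
            same_destination = a.dest == b.dest
            if read_write_overlap or same_destination:
                return False
    return True


# Independent operations: safe to pack into one instruction.
ok_bundle = [Op("add", "r1", ("r2", "r3")), Op("mul", "r4", ("r5", "r6"))]
# mul reads r1, which add writes: these must go in different instructions.
bad_bundle = [Op("add", "r1", ("r2", "r3")), Op("mul", "r4", ("r1", "r6"))]

print(bundle_is_parallel(ok_bundle))   # True
print(bundle_is_parallel(bad_bundle))  # False
```

Because the compiler runs a check like this ahead of time, the hardware does not need to repeat it at run time.
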
Superscalar versus VLIW
- Out-of-order superscalar: scheduling is done in hardware.
- VLIW: scheduling is done by the compiler.
  - Schedules to maximize parallel execution
  - Guarantees intra-instruction parallelism
  - Schedules to avoid data hazards
Compiler Support for Generating Instruction Level Parallelism
- Trace Scheduling (compiler optimization -> hardware)
  - Find as much ILP (Instruction Level Parallelism) as possible
  - Key steps
    - Pick a string of basic blocks, a trace, that represents the most frequent branch path, using profiling feedback or compiler heuristics. (Note: a basic block is a portion of the code within a program with only one entry point and only one exit point.)
    - Schedule the whole “trace” at once
    - Add fixup code (compensation code) to cope with branches jumping out of the trace
- Loop Unrolling (compiler optimization -> hardware)
  - Good: increase the window of optimization
  - Bad: the code size can grow considerably with loop unrolling
- VLIW Support for Compiler (architecture -> compiler)
  - A VLIW can allow for a high degree of separation of concerns between (1) instruction selection, (2) instruction scheduling, and (3) register allocation (there is a large number of general-purpose registers).
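
The loop unrolling trade-off from the list above is easier to see in code. This is only a sketch of the transformation a compiler would apply, written by hand; the function names `sum_rolled` and `sum_unrolled_by_4` are made up for illustration, and Python itself gains nothing from it. The point is the shape of the loop body.

```python
from typing import List


def sum_rolled(xs: List[int]) -> int:
    """Original loop: one add per iteration, each depending on the previous one."""
    total = 0
    for x in xs:
        total += x
    return total


def sum_unrolled_by_4(xs: List[int]) -> int:
    """Same sum with the body unrolled 4 times.

    The four adds in the body use separate accumulators, so they do not
    depend on each other and a scheduler could place them in one instruction.
    """
    s0 = s1 = s2 = s3 = 0
    n = len(xs)
    limit = n - n % 4               # largest multiple of 4 that fits
    for i in range(0, limit, 4):
        s0 += xs[i]
        s1 += xs[i + 1]
        s2 += xs[i + 2]
        s3 += xs[i + 3]
    total = s0 + s1 + s2 + s3
    for i in range(limit, n):       # cleanup loop for the leftover elements
        total += xs[i]
    return total


data = list(range(10))
print(sum_rolled(data), sum_unrolled_by_4(data))  # 45 45
```

The unrolled version gives the scheduler four independent operations to work with per iteration (the "good"), but it is roughly three times as long and needs a cleanup loop (the "bad").
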
VLIW: Dependency analysis
The compiler’s job is to work with the processor’s instructions, dependencies, and latencies (blocks) to implement a program (original shape). VLIW gives the compiler flexibility in how it packs instructions together.
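
As a rough illustration of that job, here is a minimal greedy scheduler. It is a sketch under simple assumptions and not how any particular compiler does it: every slot can run any operation, the dependency graph is acyclic, every dependency name appears in the input, and the function name `schedule` and the operation names are made up for this post.

```python
from typing import Dict, List, Tuple


def schedule(ops: Dict[str, Tuple[int, List[str]]], width: int) -> List[List[str]]:
    """Greedy scheduling of operations into VLIW instructions.

    ops maps an operation name to (latency in cycles, names it depends on).
    Returns one list of operation names per cycle; an empty list is an
    instruction made up entirely of no-ops.
    """
    issue_cycle: Dict[str, int] = {}
    bundles: List[List[str]] = []
    cycle = 0
    while len(issue_cycle) < len(ops):
        bundle: List[str] = []
        for name, (_latency, deps) in ops.items():
            if name in issue_cycle or len(bundle) == width:
                continue
            # Ready only if every dependency was issued in an earlier cycle
            # and its result is available by the current cycle.
            ready = all(
                d in issue_cycle and issue_cycle[d] + ops[d][0] <= cycle
                for d in deps
            )
            if ready:
                bundle.append(name)
        for name in bundle:
            issue_cycle[name] = cycle
        bundles.append(bundle)
        cycle += 1
    return bundles


program = {
    "load_a": (2, []),
    "load_b": (2, []),
    "mul": (3, ["load_a", "load_b"]),
    "add": (1, ["mul"]),
    "inc": (1, []),
}
for cycle, bundle in enumerate(schedule(program, width=2)):
    print(cycle, bundle)
# 0 ['load_a', 'load_b']
# 1 ['inc']
# 2 ['mul']
# 3 []
# 4 []
# 5 ['add']
```

Cycles 3 and 4 are empty because `add` has to wait for the 3-cycle `mul`. Since there are no interlocks, the compiler has to fill those cycles with no-ops itself.
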
C6x Instruction Format
To reduce the number of no-ops in a VLIW, we can pack instructions for multiple cycles into one instruction with the help of a p bit. The p bit determines which groups of instructions are executed in parallel. The instruction is scanned from left to right: if the p bit of instruction i is 1, then instruction i+1 is executed in parallel with instruction i; if it is 0, instruction i+1 is executed in the cycle after instruction i.
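
The p bit rule above can be sketched as a small grouping function. The name `execute_groups` and the `(name, p bit)` tuple layout are assumptions made for this post, not the real C6x encoding.

```python
from typing import List, Tuple


def execute_groups(packed: List[Tuple[str, int]]) -> List[List[str]]:
    """Split a packed instruction into groups that execute in parallel.

    Each entry is (instruction name, p bit). Scanning left to right, a p bit
    of 1 on instruction i puts instruction i+1 in the same group as i.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    for name, p in packed:
        current.append(name)
        if p == 0:          # the next instruction starts a new cycle
            groups.append(current)
            current = []
    if current:             # last p bit was 1: keep the trailing group
        groups.append(current)
    return groups


packed = [("A", 1), ("B", 1), ("C", 0), ("D", 0),
          ("E", 1), ("F", 0), ("G", 0), ("H", 0)]
print(execute_groups(packed))
# [['A', 'B', 'C'], ['D'], ['E', 'F'], ['G'], ['H']]
```

Here eight instructions cover five cycles without a single no-op. The trailing case is handled only so the sketch always returns every instruction; whether a group may continue into the next packed instruction depends on the specific device and is not modeled here.
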