Every executable on a Linux system begins its life as human-readable source code. But the CPU does not understand C, Rust, or Go. It speaks only one language: machine code. The journey from source to a running program is a multi-stage transformation, each step converting your intent into something closer to what the hardware can execute.
The Four Stages
When you type gcc hello.c -o hello, the compiler driver orchestrates four distinct stages. Each stage reads a specific input format and produces a specific output format. Understanding this pipeline is essential to understanding how ELF binaries are structured.
Stage 1: Preprocessing
The preprocessor (cpp) is a text-processing engine that runs before any actual compilation. It handles #include directives by literally pasting the contents of header files into your source. It expands #define macros, resolves #ifdef conditionals, and strips comments. A 20-line source file can easily expand to thousands of lines after preprocessing, because standard headers like stdio.h bring along an enormous number of type definitions and declarations.
You can see the preprocessor output yourself with gcc -E hello.c -o hello.i. The resulting .i file is still valid C, but with every macro resolved and every header inlined.
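To make the preprocessor's work concrete, here is a minimal sketch of a source file that exercises the three things described above: a header include, macro expansion, and a conditional. The names SQUARE, LOG, VERBOSE, and square_of are made up for illustration and do not come from any real project.

```c
#include <stdio.h>   /* pasted in wholesale by the preprocessor */

/* Function-like macro: every use is replaced textually before compilation. */
#define SQUARE(x) ((x) * (x))

/* Conditional: only one of these two definitions survives preprocessing.
   Building with -DVERBOSE selects the first one. */
#ifdef VERBOSE
#define LOG(msg) puts(msg)
#else
#define LOG(msg) ((void)0)
#endif

/* After preprocessing, this body no longer mentions SQUARE or LOG;
   it contains their expanded text instead. */
int square_of(int n)
{
    LOG("squaring");
    return SQUARE(n);
}
```

Running gcc -E on a file like this and searching the output for square_of shows the expanded body. Adding -DVERBOSE to the command changes which LOG definition ends up in the .i file.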
Stage 2: Compilation
The compiler proper (cc1) takes the preprocessed C and transforms it into assembly language for your target architecture. This is where the heavy lifting happens: parsing, semantic analysis, optimization passes, and code generation. The compiler converts high-level constructs like loops, function calls, and pointer arithmetic into sequences of machine instructions expressed in human-readable assembly syntax.
At this point, external function calls (like printf) remain as symbolic references. The compiler does not know where printf lives in memory; it just emits a call printf instruction and trusts that someone else will fill in the address later.
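As a small illustration of such a symbolic reference, consider the sketch below. The function name report_value is hypothetical. The file only sees a declaration of printf (from stdio.h), never its definition, so the compiler has nothing but a name to emit.

```c
#include <stdio.h>

/* printf is declared by stdio.h but defined elsewhere (in the C library).
   The compiler emits a call to the symbol "printf" and leaves the
   address to be filled in by a later stage. */
int report_value(int value)
{
    /* printf returns the number of characters it wrote. */
    return printf("value = %d\n", value);
}
```

Compiling a file like this with gcc -S stops after the compilation stage and writes the assembly to a .s file. On x86-64 the call typically shows up there by name, for example as call printf or call printf@PLT, depending on the compiler version and build options.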
Stage 3: Assembly
The assembler (as) translates assembly mnemonics into actual machine code bytes. The instruction mov $0x1, %eax becomes the byte sequence b8 01 00 00 00. This is the stage that produces real machine code, the binary instructions that the CPU will ultimately execute.
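The encoding of that one instruction is simple enough to reproduce by hand. The sketch below is a toy, not how a real assembler is organized, and the function name encode_mov_eax_imm32 is made up. It relies only on the x86 rule that moving a 32-bit immediate into eax is opcode 0xB8 followed by the immediate in little-endian byte order.

```c
#include <stdint.h>

/* Encode "mov $imm, %eax" for x86: opcode 0xB8, then the 32-bit
   immediate stored least-significant byte first. */
void encode_mov_eax_imm32(uint32_t imm, uint8_t out[5])
{
    out[0] = 0xB8;
    out[1] = (uint8_t)(imm & 0xFF);
    out[2] = (uint8_t)((imm >> 8) & 0xFF);
    out[3] = (uint8_t)((imm >> 16) & 0xFF);
    out[4] = (uint8_t)((imm >> 24) & 0xFF);
}
```

Calling it with imm set to 1 fills the buffer with b8 01 00 00 00, the same bytes quoted above.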
The output is an ELF object file (.o). It already has ELF structure with sections like .text and .data, but contains relocation entries where addresses need to be patched. The object file is not yet executable because external references are still unresolved.
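One way to see that an object file is already ELF, yet not an executable, is to look at the type field in its header. The sketch below is an illustration under stated assumptions: it takes the file contents as a byte buffer, defines the type constants locally instead of using a system header, and the names elf_file_type and ELF_TYPE_* are hypothetical. The field offsets and values follow the ELF specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Values of the e_type header field, as defined by the ELF specification. */
enum {
    ELF_TYPE_REL  = 1,  /* relocatable object file (.o) */
    ELF_TYPE_EXEC = 2,  /* executable with fixed addresses */
    ELF_TYPE_DYN  = 3   /* shared object or position-independent executable */
};

/* Return the e_type of an ELF image, or -1 if the buffer is not ELF. */
int elf_file_type(const uint8_t *image, size_t len)
{
    if (len < 18)
        return -1;
    if (image[0] != 0x7F || image[1] != 'E' ||
        image[2] != 'L' || image[3] != 'F')
        return -1;

    /* Byte 5 of the identification block gives the byte order:
       1 = little-endian, 2 = big-endian. e_type is the 2-byte field
       at offset 16 in both 32-bit and 64-bit ELF. */
    if (image[5] == 1)
        return image[16] | (image[17] << 8);
    if (image[5] == 2)
        return (image[16] << 8) | image[17];
    return -1;
}
```

Fed the contents of hello.o, a function like this would report the relocatable type. Fed the linked hello, it would report one of the two executable types, depending on whether the toolchain builds position-independent executables by default.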
Stage 4: Linking
The linker (ld) is the final stage and, from an ELF perspective, the most intricate. It takes one or more object files, resolves all symbolic cross-references, and combines everything into a single executable. The linker merges .text sections from multiple objects into one, resolves function calls between translation units, and assigns final virtual addresses.
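Resolving a function call comes down to patching bytes once final addresses are known. The sketch below is a toy model of that step for an x86-64 call with a 32-bit relative displacement. It is not how ld is implemented, and the function name patch_call_rel32 and its parameters are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Patch the 4-byte displacement of an x86-64 "call rel32" instruction.
   field_off   : offset of the displacement field within text
   text_addr   : virtual address assigned to text[0]
   target_addr : virtual address assigned to the called function */
void patch_call_rel32(uint8_t *text, size_t field_off,
                      uint64_t text_addr, uint64_t target_addr)
{
    uint64_t field_addr = text_addr + field_off;

    /* The CPU adds the displacement to the address of the next
       instruction, which begins 4 bytes after the start of the field. */
    uint32_t disp = (uint32_t)(target_addr - (field_addr + 4));

    /* Store the displacement in little-endian order. */
    text[field_off + 0] = (uint8_t)(disp & 0xFF);
    text[field_off + 1] = (uint8_t)((disp >> 8) & 0xFF);
    text[field_off + 2] = (uint8_t)((disp >> 16) & 0xFF);
    text[field_off + 3] = (uint8_t)((disp >> 24) & 0xFF);
}
```

For example, if a call instruction (opcode e8 followed by four placeholder bytes) is placed at address 0x1000 and its target at 0x2000, the displacement becomes 0x2000 - 0x1005 = 0xffb, so the instruction bytes end up as e8 fb 0f 00 00.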
For dynamically linked executables, the linker also generates the PLT (Procedure Linkage Table) stubs and the GOT (Global Offset Table) entries they use, creates the .interp section that names the dynamic linker, and builds the .dynamic section listing shared library dependencies. It also creates program headers that tell the OS kernel how to map the file into memory.
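The program headers are easy to inspect from code. The sketch below counts program headers of a given type in a byte buffer holding a little-endian 64-bit ELF file. It is a simplified illustration: the names count_segments, read_le, and SEG_* are hypothetical, other ELF variants are rejected rather than handled, and the offsets used are those of the 64-bit ELF header layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Program header types, as defined by the ELF specification. */
enum { SEG_LOAD = 1, SEG_DYNAMIC = 2, SEG_INTERP = 3 };

/* Read an nbytes-wide little-endian unsigned integer. */
static uint64_t read_le(const uint8_t *p, int nbytes)
{
    uint64_t v = 0;
    for (int i = nbytes - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Count program headers of wanted_type in a little-endian ELF64 image.
   Returns -1 if the buffer cannot be parsed. */
int count_segments(const uint8_t *image, size_t len, uint32_t wanted_type)
{
    if (len < 64 || image[0] != 0x7F || image[1] != 'E' ||
        image[2] != 'L' || image[3] != 'F')
        return -1;
    if (image[4] != 2 || image[5] != 1)  /* 64-bit, little-endian only */
        return -1;

    uint64_t phoff     = read_le(image + 32, 8);  /* table offset */
    uint64_t phentsize = read_le(image + 54, 2);  /* size of one entry */
    uint64_t phnum     = read_le(image + 56, 2);  /* number of entries */
    if (phoff > len)
        return -1;

    int count = 0;
    for (uint64_t i = 0; i < phnum; i++) {
        uint64_t off = phoff + i * phentsize;
        if (off > len || len - off < 4)
            return -1;
        /* The type is the first 4-byte field of each program header. */
        if (read_le(image + off, 4) == wanted_type)
            count++;
    }
    return count;
}
```

A dynamically linked executable would be expected to show one SEG_INTERP entry (covering .interp), one SEG_DYNAMIC entry (covering .dynamic), and at least one SEG_LOAD entry describing what the kernel maps into memory.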
Although the assembler already emits ELF (the .o object file), it is the linker that constructs the full ELF executable with all its segments, dynamic linking infrastructure, and program headers. Every structural detail we explore in this course, from the ELF header to program headers to the PLT/GOT machinery, is the linker's handiwork.
Compilation Pipeline
The summary below lists what exists at each step of the compilation process and which tool produces it. Pay attention to how the file changes format at each stage.
- Source (written in an editor): Human-readable C/C++ source code with preprocessor directives
- Preprocessed (produced by cpp, the preprocessor): Macros expanded, headers included, conditionals resolved
- Assembly (produced by cc1, the compiler): Architecture-specific assembly instructions, human readable
- Object (produced by as, the assembler): Machine code with relocations, not yet linked
- Executable (produced by ld, the linker): Fully linked ELF binary with all segments and runtime support
Source Code
- Written by the programmer in C, C++, or other languages
- Contains #include, #define, and other preprocessor directives
- May reference external library functions like printf
Which stage of the compilation pipeline produces machine code?