Phases of translation

From cppreference.com
 
 

The C++ source file is processed by the compiler as if the following phases take place, in this exact order:

Phase 1

1) The individual bytes of the source code file are mapped (in an implementation-defined manner) to the characters of the basic source character set. In particular, OS-dependent end-of-line indicators are replaced by newline characters.
The basic source character set consists of 96 characters:
a) 5 whitespace characters (space, horizontal tab, vertical tab, form feed, new-line)
b) 10 digit characters from '0' to '9'
c) 52 letters from 'a' to 'z' and from 'A' to 'Z'
d) 29 punctuation characters: _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
2) Trigraph sequences are replaced by corresponding single-character representations. (until C++17)
3) Any source file character that cannot be mapped to a character in the basic source character set is replaced by its universal character name (\uXXXX) or by some internal form that is handled equivalently.

Phase 2

1) Whenever a backslash appears at the end of a line (immediately followed by the newline character), both the backslash and the newline are deleted, combining two physical source lines into one logical source line. This is a single-pass operation: a line ending in two backslashes followed by an empty line does not combine three lines into one. If a universal character name (\uXXXX) is formed in this phase, the behavior is undefined.
2) If a non-empty source file does not end with a newline character after this step (whether it had no newline originally, or it ended with a backslash):
  • the behavior is undefined (until C++11)
  • a terminating newline character is added (since C++11)
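
As an illustration of line splicing, the following sketch shows a macro definition and a string literal that each continue across two physical lines. The names SUM_OF_THREE and spliced_text are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstring>

// The backslash-newline pair is deleted in phase 2, so the preprocessor
// later sees this macro definition as a single logical line.
#define SUM_OF_THREE(a, b, c) \
    ((a) + (b) + (c))

// Splicing happens before tokenization, so it can also join the two halves
// of a string literal. The continuation starts in the first column because
// any leading whitespace on the second line would become part of the string.
inline const char* spliced_text() {
    return "one logical \
line";
}
```

Calling SUM_OF_THREE(1, 2, 3) should yield 6, and spliced_text() should return the text "one logical line" with no line break in it.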

Phase 3

1) The source file is decomposed into comments, sequences of whitespace characters (space, horizontal tab, new-line, vertical tab, and form-feed), and preprocessing tokens, which are the following:
a) header names: <iostream> or "myfile.h"
b) identifiers
c) numbers
d) character and string literals, including user-defined
e) operators and punctuators (including alternative tokens), such as +, <<=, new, <%, ##, or the alternative token "and"
f) individual non-whitespace characters that do not fit in any other category
2) Each comment is replaced by one space character.
3) Newlines are kept, and it's implementation-defined whether non-newline whitespace sequences may be collapsed into single space characters.
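
A small sketch of two phase 3 effects, comment replacement and alternative tokens, is given below. It assumes a compiler running in a conforming mode that accepts alternative tokens; the names answer and both are hypothetical.

```cpp
#include <cassert>

// The comment is replaced by one space character in phase 3, so "int" and
// "answer" remain two separate preprocessing tokens.
constexpr int/* replaced by a single space */answer = 42;

// "<%" and "%>" are alternative tokens for "{" and "}", and "and" is the
// alternative token for "&&".
inline bool both(bool a, bool b) <%
    return a and b;
%>
```

Here both(true, false) behaves exactly like true && false, and the declaration of answer is read as if a single space had been written in place of the comment.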

Phase 4

1) The preprocessor is executed.
2) Each file introduced with the #include directive goes through phases 1 through 4, recursively.
3) At the end of this phase, all preprocessor directives are removed from the source.
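
To make the preprocessing step more concrete, here is a minimal sketch of conditional inclusion and macro expansion. HYPOTHETICAL_FEATURE, STRINGIFY, feature_level and stringified are names invented for this example.

```cpp
#include <cassert>
#include <cstring>

// Conditional inclusion: only one of the two declarations survives phase 4.
#define HYPOTHETICAL_FEATURE 1
#if HYPOTHETICAL_FEATURE
constexpr int feature_level = 2;
#else
constexpr int feature_level = 0;
#endif

// The # operator turns the macro argument into a string literal during
// macro expansion.
#define STRINGIFY(x) #x

inline const char* stringified() {
    return STRINGIFY(a + b);   // expands to the string literal "a + b"
}
```

After this phase no directive remains: later phases see only the declaration of feature_level with the value 2 and a function that returns the literal "a + b".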

Phase 5

1) All characters in character literals and string literals are converted from the source character set to the execution character set.
2) Escape sequences and universal character names in character literals and non-raw string literals are expanded and converted to the execution character set. If the character specified by a universal character name is not a member of the execution character set, the result is implementation-defined, but is guaranteed not to be a null (wide) character.
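
The difference between ordinary and raw string literals in this phase can be sketched as follows. The array names are hypothetical, and the size checks assume ordinary narrow string literals of type char.

```cpp
#include <cassert>

// Each escape sequence is expanded to a single element of the array:
// 'a', tab, 'b', newline, and the terminating null character.
constexpr char escaped[] = "a\tb\n";
static_assert(sizeof(escaped) == 5, "four characters plus terminator");

// A hexadecimal escape names the element value directly.
constexpr char hex_escaped[] = "\x41";
static_assert(hex_escaped[0] == 0x41, "value taken from the escape");

// Escape sequences are not expanded in raw string literals, so this array
// holds 'a', backslash, 't', 'b', and the terminating null character.
constexpr char raw[] = R"(a\tb)";
static_assert(sizeof(raw) == 5, "backslash and t stay separate");
```

The two arrays escaped and raw have the same size only by coincidence: the first contains a tab character, while the second contains a literal backslash followed by the letter t.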

Phase 6

Adjacent string literals are concatenated.
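
A brief sketch of concatenation follows, including one consequence of escape sequences being handled in phase 5, before the literals are joined. The array names are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Two adjacent literals become the single literal "Hello, world".
constexpr char joined[] = "Hello, " "world";
static_assert(sizeof(joined) == 13, "twelve characters plus terminator");

// The escape \x12 is converted before concatenation, so the result holds
// the two characters '\x12' and '3', not the single escape '\x123'.
constexpr char two_chars[] = "\x12" "3";
static_assert(sizeof(two_chars) == 3, "two characters plus terminator");
```

Splitting a literal this way is a common technique for ending a hexadecimal escape sequence that would otherwise absorb the digits after it.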

Phase 7

Compilation takes place: the tokens are syntactically and semantically analyzed and translated as a translation unit.

Phase 8

Each translation unit is examined to produce a list of required template instantiations (including those requested by explicit instantiations). The definitions of the templates are located, and the required instantiations are performed to produce instantiation units.
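
The two ways an instantiation can end up on that list, explicitly requested or implied by use, can be sketched as below. The names twice and use_twice are hypothetical.

```cpp
#include <cassert>

// A function template: no code is generated for it until an instantiation
// is required.
template <typename T>
T twice(T value) {
    return value + value;
}

// Explicit instantiation definition: requests twice<int> whether or not
// the translation unit ever calls it.
template int twice<int>(int);

// Implicit instantiation: this call requires twice<double>, which is added
// to the list of instantiations to perform.
inline double use_twice() {
    return twice(1.5);
}
```

In this sketch the translation unit needs exactly two instantiations, twice<int> and twice<double>, and both can be produced from the template definition found in the same file.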

Phase 9

Translation units, instantiation units, and library components needed to satisfy external references are collected into a program image which contains information needed for execution in its execution environment.