5 Lexical conventions [lex]

In an integer-literal, the sequence of binary-digits, octal-digits, digits, or hexadecimal-digits is interpreted as a base N integer as shown in Table 7; the lexically first digit of the sequence of digits is the most significant.

[Note 1:

The prefix and any optional separating single quotes are ignored when determining the value.

— end note]

Table 7 — Base of integer-literals [tab:lex.icon.base]

Kind of integer-literal	base N
binary-literal	2
octal-literal	8
decimal-literal	10
hexadecimal-literal	16

The hexadecimal-digits a through f and A through F have decimal values ten through fifteen.

[Example 1:

The number twelve can be written 12, 014, 0XC, or 0b1100.

The integer-literals 1048576, 1'048'576, 0X100000, 0x10'0000, and 0'004'000'000 all have the same value.

— end example]
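The equalities in Example 1 can be checked at compile time. The following is a minimal sketch, assuming a C++14 or later compiler (binary literals and digit separators); the helper name `all_spellings_agree` is hypothetical and not part of this document.

```cpp
// Compile-time checks of the equalities stated in Example 1.

// The number twelve in decimal, octal, hexadecimal, and binary.
static_assert(12 == 014, "octal 014 is twelve");
static_assert(12 == 0XC, "hexadecimal 0XC is twelve");
static_assert(12 == 0b1100, "binary 0b1100 is twelve");

// The prefix and separating single quotes do not affect the value.
static_assert(1048576 == 1'048'576, "separators are ignored");
static_assert(1048576 == 0X100000, "16^5");
static_assert(1048576 == 0x10'0000, "separators in a hexadecimal-literal");
static_assert(1048576 == 0'004'000'000, "octal 4000000 is 4 * 8^6");

// Hypothetical helper so that the same facts can be queried from other code.
constexpr bool all_spellings_agree() {
    return 12 == 014 && 12 == 0XC && 12 == 0b1100 &&
           1048576 == 0x10'0000 && 1048576 == 0'004'000'000;
}
```

Note that the last literal begins with 0 and is therefore an octal-literal, even though its digits are grouped like a decimal number.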

The type of an integer-literal is the first type in the list in Table 8 corresponding to its optional integer-suffix in which its value can be represented.

Table 8 — Types of integer-literals [tab:lex.icon.type]

integer-suffix	decimal-literal	integer-literal other than decimal-literal
none	int	int
	long int	unsigned int
	long long int	long int
		unsigned long int
		long long int
		unsigned long long int
u or U	unsigned int	unsigned int
	unsigned long int	unsigned long int
	unsigned long long int	unsigned long long int
l or L	long int	long int
	long long int	unsigned long int
		long long int
		unsigned long long int
Both u or U and l or L	unsigned long int	unsigned long int
	unsigned long long int	unsigned long long int
ll or LL	long long int	long long int
		unsigned long long int
Both u or U and ll or LL	unsigned long long int	unsigned long long int
z or Z	the signed integer type corresponding to std::size_t ([support.types.layout])	the signed integer type corresponding to std::size_t
		std::size_t
Both u or U and z or Z	std::size_t	std::size_t

Except for integer-literals containing a size-suffix, if the value of an integer-literal cannot be represented by any type in its list and an extended integer type ([basic.fundamental]) can represent its value, it may have that extended integer type.

If all of the types in the list for the integer-literal are signed, the extended integer type is signed.

If all of the types in the list for the integer-literal are unsigned, the extended integer type is unsigned.

If the list contains both signed and unsigned types, the extended integer type may be signed or unsigned.

If an integer-literal cannot be represented by any of the allowed types, the program is ill-formed.

[Note 2:

An integer-literal with a z or Z suffix is ill-formed if it cannot be represented by std::size_t.

— end note]
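The way a literal walks down its list in Table 8 can be observed with `decltype`. This is a sketch rather than a normative illustration: `has_type` is a hypothetical helper, and the check on `0x80000000` only applies under the stated assumption that `int` has 31 value bits. The z suffix is left out because it needs a C++23 compiler.

```cpp
#include <limits>
#include <type_traits>

// Hypothetical helper (not from the standard): true when the argument's
// deduced type is exactly Expected.
template <class Expected, class Actual>
constexpr bool has_type(Actual) {
    return std::is_same<Expected, Actual>::value;
}

// For small values the suffix alone selects the first type in the list.
static_assert(has_type<int>(1), "no suffix");
static_assert(has_type<unsigned int>(1u), "u or U");
static_assert(has_type<long>(1L), "l or L");
static_assert(has_type<unsigned long>(1uL), "both u and l");
static_assert(has_type<long long>(1LL), "ll or LL");
static_assert(has_type<unsigned long long>(1uLL), "both u and ll");

// A decimal-literal without u or U only has signed types in its list.
static_assert(std::is_signed<decltype(2147483648)>::value,
              "an unsuffixed decimal-literal never becomes unsigned");

// Assumption: int has 31 value bits. The hexadecimal-literal does not fit
// in int, and the next type in its list is unsigned int.
static_assert(std::numeric_limits<int>::digits != 31 ||
                  has_type<unsigned int>(0x80000000),
              "hexadecimal-literal may select an unsigned type");
```

The last two checks show why 2147483648 and 0x80000000, which denote the same number, can have different types.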

5.13.3 Character literals [lex.ccon]

character-literal:
encoding-prefix_opt ' c-char-sequence '

encoding-prefix: one of
u8 u U L

c-char-sequence:
c-char c-char-sequence_opt

c-char:
basic-c-char
escape-sequence
universal-character-name

basic-c-char:
any member of the translation character set except the U+0027 apostrophe,
U+005c reverse solidus, or new-line character

escape-sequence:
simple-escape-sequence
numeric-escape-sequence
conditional-escape-sequence

simple-escape-sequence:
\ simple-escape-sequence-char

simple-escape-sequence-char: one of
' " ? \ a b f n r t v

numeric-escape-sequence:
octal-escape-sequence
hexadecimal-escape-sequence

simple-octal-digit-sequence:
octal-digit simple-octal-digit-sequence_opt

octal-escape-sequence:
\ octal-digit
\ octal-digit octal-digit
\ octal-digit octal-digit octal-digit
\o{ simple-octal-digit-sequence }

hexadecimal-escape-sequence:
\x simple-hexadecimal-digit-sequence
\x{ simple-hexadecimal-digit-sequence }

conditional-escape-sequence:
\ conditional-escape-sequence-char

conditional-escape-sequence-char:
any member of the basic character set that is not an octal-digit, a simple-escape-sequence-char, or the characters N, o, u, U, or x

A multicharacter literal is a character-literal whose c-char-sequence consists of more than one c-char.

A multicharacter literal shall not have an encoding-prefix.

If a multicharacter literal contains a c-char that is not encodable as a single code unit in the ordinary literal encoding, the program is ill-formed.

Multicharacter literals are conditionally-supported.

The kind of a character-literal, its type, and its associated character encoding ([lex.charset]) are determined by its encoding-prefix and its c-char-sequence as defined by Table 9.

Table 9 — Character literals [tab:lex.ccon.literal]

Encoding prefix	Kind	Type	Associated character encoding	Example
none	ordinary character literal	char	ordinary literal encoding	'v'
	multicharacter literal	int	ordinary literal encoding	'abcd'
L	wide character literal	wchar_t	wide literal encoding	L'w'
u8	UTF-8 character literal	char8_t	UTF-8	u8'x'
u	UTF-16 character literal	char16_t	UTF-16	u'y'
U	UTF-32 character literal	char32_t	UTF-32	U'z'
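The type column of Table 9 can be confirmed with `decltype`. The sketch below uses a hypothetical helper `has_type`; the `u8` check is guarded because `char8_t` requires C++20, and the multicharacter literal is omitted because it is conditionally-supported.

```cpp
#include <type_traits>

// Hypothetical helper (not from the standard): true when the argument's
// deduced type is exactly Expected.
template <class Expected, class Actual>
constexpr bool has_type(Actual) {
    return std::is_same<Expected, Actual>::value;
}

static_assert(has_type<char>('v'), "ordinary character literal");
static_assert(has_type<wchar_t>(L'w'), "wide character literal");
static_assert(has_type<char16_t>(u'y'), "UTF-16 character literal");
static_assert(has_type<char32_t>(U'z'), "UTF-32 character literal");
#if defined(__cpp_char8_t)
static_assert(has_type<char8_t>(u8'x'), "UTF-8 character literal");
#endif

// For the Unicode encodings the value is the code unit of the character.
static_assert(u'y' == 0x0079, "U+0079 in UTF-16");
static_assert(U'z' == 0x007A, "U+007A in UTF-32");
```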

In translation phase 4, the value of a character-literal is determined using the range of representable values of the character-literal's type in translation phase 7.

A multicharacter literal has an implementation-defined value.

The value of any other kind of character-literal is determined as follows:

(3.1)
A character-literal with a c-char-sequence consisting of a single basic-c-char, simple-escape-sequence, or universal-character-name is the code unit value of the specified character as encoded in the literal's associated character encoding.

If the specified character lacks representation in the literal's associated character encoding or if it cannot be encoded as a single code unit, then the program is ill-formed.
(3.2)
A character-literal with a c-char-sequence consisting of a single numeric-escape-sequence has a value as follows:
- (3.2.1)
  Let v be the integer value represented by the octal number comprising the sequence of octal-digits in an octal-escape-sequence or by the hexadecimal number comprising the sequence of hexadecimal-digits in a hexadecimal-escape-sequence.
- (3.2.2)
  If v does not exceed the range of representable values of the character-literal's type, then the value is v.
- (3.2.3)
  Otherwise, if the character-literal's encoding-prefix is absent or L, and v does not exceed the range of representable values of the corresponding unsigned type for the underlying type of the character-literal's type, then the value is the unique value of the character-literal's type T that is congruent to v modulo $2^{N}$, where N is the width of T.
- (3.2.4)
  Otherwise, the program is ill-formed.
(3.3)
A character-literal with a c-char-sequence consisting of a single conditional-escape-sequence is conditionally-supported and has an implementation-defined value.
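The rules in (3.2) for a numeric-escape-sequence can be sketched as follows. The helper `value_of_xff` is hypothetical, and the wrap-around check assumes an 8-bit `char`; whether `char` is signed is queried rather than assumed.

```cpp
#include <limits>

// Octal and hexadecimal escapes that spell the same number v give the same value.
static_assert('\101' == '\x41', "octal 101 and hexadecimal 41 are both 65");

// (3.2.2): v fits in the literal's type, so the value is v.
static_assert('\x41' == 0x41, "v is representable in char");
static_assert(u'\xFFFF' == 0xFFFF, "v is representable in char16_t");

// Hypothetical helper returning the value of '\xFF' as an int.
constexpr int value_of_xff() { return '\xFF'; }

// (3.2.3), assuming an 8-bit char: if char is signed, 255 is out of range,
// and the value is the one congruent to 255 modulo 2^8, which is -1.
static_assert(std::numeric_limits<unsigned char>::digits != 8 ||
                  value_of_xff() ==
                      (std::numeric_limits<char>::is_signed ? -1 : 255),
              "wrap-around for an ordinary character literal");
```

The wrap-around is only available when the encoding-prefix is absent or L; for u8, u, and U an out-of-range v makes the program ill-formed under (3.2.4).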

The character specified by a simple-escape-sequence is specified in Table 10.

[Note 1:

Using an escape sequence for a question mark is supported for compatibility with C++ 2014 and C.

— end note]

Table 10 — Simple escape sequences [tab:lex.ccon.esc]

character		simple-escape-sequence
U+000a	line feed	\n
U+0009	character tabulation	\t
U+000b	line tabulation	\v
U+0008	backspace	\b
U+000d	carriage return	\r
U+000c	form feed	\f
U+0007	alert	\a
U+005c	reverse solidus	\\
U+003f	question mark	\?
U+0027	apostrophe	\'
U+0022	quotation mark	\"
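Table 10 can be checked mechanically by using UTF-32 character literals, whose values are the code points themselves, so the check does not depend on the ordinary literal encoding. The function name `escapes_match_table_10` is hypothetical.

```cpp
// Each simple-escape-sequence compared against the code point in Table 10.
static_assert(U'\n' == 0x000A, "line feed");
static_assert(U'\t' == 0x0009, "character tabulation");
static_assert(U'\v' == 0x000B, "line tabulation");
static_assert(U'\b' == 0x0008, "backspace");
static_assert(U'\r' == 0x000D, "carriage return");
static_assert(U'\f' == 0x000C, "form feed");
static_assert(U'\a' == 0x0007, "alert");
static_assert(U'\\' == 0x005C, "reverse solidus");
static_assert(U'\?' == 0x003F, "question mark");
static_assert(U'\'' == 0x0027, "apostrophe");
static_assert(U'\"' == 0x0022, "quotation mark");

// Hypothetical helper: the escaped and unescaped spellings of the question
// mark and the quotation mark denote the same character.
constexpr bool escapes_match_table_10() {
    return U'\?' == U'?' && U'\"' == U'"' && U'\n' == 0x000A;
}
```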

5.13.4 Floating-point literals [lex.fcon]

floating-point-literal:
decimal-floating-point-literal
hexadecimal-floating-point-literal

decimal-floating-point-literal:
fractional-constant exponent-part_opt floating-point-suffix_opt
digit-sequence exponent-part floating-point-suffix_opt

hexadecimal-floating-point-literal:
hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part floating-point-suffix_opt
hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part floating-point-suffix_opt

fractional-constant:
digit-sequence_opt . digit-sequence
digit-sequence .

hexadecimal-fractional-constant:
hexadecimal-digit-sequence_opt . hexadecimal-digit-sequence
hexadecimal-digit-sequence .

exponent-part:
e sign_opt digit-sequence
E sign_opt digit-sequence

binary-exponent-part:
p sign_opt digit-sequence
P sign_opt digit-sequence

sign: one of
+ -

digit-sequence:
digit
digit-sequence '_opt digit

floating-point-suffix: one of
f l f16 f32 f64 f128 bf16 F L F16 F32 F64 F128 BF16

The type of a floating-point-literal ([basic.fundamental], [basic.extended.fp]) is determined by its floating-point-suffix as specified in Table 11.

[Note 1:

The floating-point suffixes f16, f32, f64, f128, bf16, F16, F32, F64, F128, and BF16 are conditionally-supported.

See [basic.extended.fp].

— end note]

Table 11 — Types of floating-point-literals [tab:lex.fcon.type]

floating-point-suffix	type
none	double
f or F	float
l or L	long double
f16 or F16	std::float16_t
f32 or F32	std::float32_t
f64 or F64	std::float64_t
f128 or F128	std::float128_t
bf16 or BF16	std::bfloat16_t

The significand of a floating-point-literal is the fractional-constant or digit-sequence of a decimal-floating-point-literal or the hexadecimal-fractional-constant or hexadecimal-digit-sequence of a hexadecimal-floating-point-literal.

In the significand, the sequence of digits or hexadecimal-digits and optional period are interpreted as a base N real number s, where N is 10 for a decimal-floating-point-literal and 16 for a hexadecimal-floating-point-literal.

[Note 2:

Any optional separating single quotes are ignored when determining the value.

— end note]

If an exponent-part or binary-exponent-part is present, the exponent e of the floating-point-literal is the result of interpreting the sequence of an optional sign and the digits as a base 10 integer.

Otherwise, the exponent e is 0.

The scaled value of the literal is $s \times 10^{e}$ for a decimal-floating-point-literal and $s \times 2^{e}$ for a hexadecimal-floating-point-literal.

[Example 1:

The floating-point-literals 49.625 and 0xC.68p+2 have the same value.

The floating-point-literals 1.602'176'565e-19 and 1.602176565e-19 have the same value.

— end example]

If the scaled value is not in the range of representable values for its type, the program is ill-formed.

Otherwise, the value of a floating-point-literal is the scaled value if representable, else the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined manner.

[Example 2:

The following example assumes that std::float32_t is supported ([basic.extended.fp]).

std::float32_t x = 0.0f32;            // value 0 is exactly representable
std::float32_t y = 0.1f32;            // rounded to one of two values nearest to 0.1
std::float32_t z = 1e1000000000f32;   // either greatest finite value or positive infinity

— end example]
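The standard suffixes of Table 11 and the scaled-value computation can be checked at compile time. This is a sketch assuming a C++17 compiler (hexadecimal floating-point literals); `has_type` is a hypothetical helper, and the conditionally-supported extended suffixes are not exercised. For 0xC.68p+2 the significand is s = 12 + 6/16 + 8/256 = 12.40625 and e = 2, so the scaled value is 12.40625 × 2² = 49.625.

```cpp
#include <type_traits>

// Hypothetical helper (not from the standard): true when the argument's
// deduced type is exactly Expected.
template <class Expected, class Actual>
constexpr bool has_type(Actual) {
    return std::is_same<Expected, Actual>::value;
}

// The type follows from the floating-point-suffix (Table 11).
static_assert(has_type<double>(1.0), "no suffix");
static_assert(has_type<float>(1.0f), "f or F");
static_assert(has_type<long double>(1.0L), "l or L");
static_assert(has_type<double>(1e3), "digit-sequence with exponent-part");
static_assert(has_type<double>(.5), "fractional-constant without leading digits");

// Scaled values: s * 10^e for decimal, s * 2^e for hexadecimal literals.
static_assert(49.625 == 0xC.68p+2, "12.40625 * 2^2");
static_assert(0x1p-1 == 0.5, "1 * 2^-1");
static_assert(1e2 == 100.0, "1 * 10^2");

// Separating single quotes are ignored when determining the value.
static_assert(1.602'176'565e-19 == 1.602176565e-19, "same value");
```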

5.13.5 String literals [lex.string]

string-literal:
encoding-prefix_opt plain-string-literal
encoding-prefix_opt R raw-string

plain-string-literal:
" s-char-sequence_opt "

s-char-sequence:
s-char s-char-sequence_opt

s-char:
basic-s-char
escape-sequence
universal-character-name

basic-s-char:
any member of the translation character set except the U+0022 quotation mark,
U+005c reverse solidus, or new-line character

raw-string:
" d-char-sequence_opt ( r-char-sequence_opt ) d-char-sequence_opt "

r-char-sequence:
r-char r-char-sequence_opt

r-char:
any member of the translation character set, except a U+0029 right parenthesis followed by
the initial d-char-sequence (which may be empty) followed by a U+0022 quotation mark

d-char-sequence:
d-char d-char-sequence_opt

d-char:
any member of the basic character set except:
U+0020 space, U+0028 left parenthesis, U+0029 right parenthesis, U+005c reverse solidus,
U+0009 character tabulation, U+000b line tabulation, U+000c form feed, and new-line

The kind of a string-literal, its type, and its associated character encoding ([lex.charset]) are determined by its encoding prefix and sequence of s-chars or r-chars as defined by Table 12 where n is the number of encoded code units that would result from an evaluation of the string-literal (see below).

Table 12 — String literals [tab:lex.string.literal]

Encoding prefix	Kind	Type	Associated character encoding	Examples
none	ordinary string literal	array of n const char	ordinary literal encoding	"ordinary string" R"(ordinary raw string)"
L	wide string literal	array of n const wchar_t	wide literal encoding	L"wide string" LR"w(wide raw string)w"
u8	UTF-8 string literal	array of n const char8_t	UTF-8	u8"UTF-8 string" u8R"x(UTF-8 raw string)x"
u	UTF-16 string literal	array of n const char16_t	UTF-16	u"UTF-16 string" uR"y(UTF-16 raw string)y"
U	UTF-32 string literal	array of n const char32_t	UTF-32	U"UTF-32 string" UR"z(UTF-32 raw string)z"

A string-literal that has an R in the prefix is a raw string literal.

The d-char-sequence serves as a delimiter.

The terminating d-char-sequence of a raw-string is the same sequence of characters as the initial d-char-sequence.

A d-char-sequence shall consist of at most 16 characters.

[Note 1:

The characters '(' and ')' can appear in a raw-string.

Thus, R"delimiter((a|b))delimiter" is equivalent to "(a|b)".

— end note]

[Note 2:

A source-file new-line in a raw string literal results in a new-line in the resulting execution string literal.

Assuming no whitespace at the beginning of lines in the following example, the assert will succeed:

const char* p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);

— end note]

[Example 1:

The raw string

R"a(
)\
a"
)a"

is equivalent to "\n)\\\na\"\n".

The raw string R"(x = "\"y\"")" is equivalent to "x = \"\\\"y\\\"\"".

— end example]
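The equivalences claimed in Note 1 and Example 1 can be verified by comparing each raw string with its escaped spelling. The sketch assumes C++14 (for the loop in a constexpr function); `same_text` is a hypothetical helper, and the multi-line literal must start its continuation lines in the first column.

```cpp
// Hypothetical helper: compares two null-terminated strings at compile time.
constexpr bool same_text(const char* a, const char* b) {
    while (*a != '\0' && *a == *b) {
        ++a;
        ++b;
    }
    return *a == *b;  // equal only if both reached their terminator together
}

// Note 1: parentheses may appear inside a raw-string.
static_assert(same_text(R"delimiter((a|b))delimiter", "(a|b)"),
              "delimited raw string");

// Example 1: backslashes and quotation marks are ordinary r-chars.
static_assert(same_text(R"(x = "\"y\"")", "x = \"\\\"y\\\"\""),
              "no escape processing inside a raw string");

// Example 1: source-file new-lines become new-lines in the string, and the
// backslash at the end of a line does not splice lines.
static_assert(same_text(R"a(
)\
a"
)a", "\n)\\\na\"\n"),
              "multi-line raw string");
```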

Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals.

The string-literals in any sequence of adjacent string-literals shall have at most one unique encoding-prefix among them.

The common encoding-prefix of the sequence is that encoding-prefix, if any.

[Note 3:

A string-literal's rawness has no effect on the determination of the common encoding-prefix.

— end note]

In translation phase 5 ([lex.phases]), adjacent string-literals are concatenated.

The lexical structure and grouping of the contents of the individual string-literals is retained.

[Example 2:

"\xA" "B" represents the code unit '\xA' and the character 'B' after concatenation (and not the single code unit '\xAB').

Similarly, R"(\u00)" "41" represents six characters, starting with a backslash and ending with the digit 1 (and not the single character 'A' specified by a universal-character-name).

Table 13 has some examples of valid concatenations.

— end example]

Table 13 — String literal concatenations [tab:lex.string.concat]

Source		Means	Source		Means	Source		Means
u"a"	u"b"	u"ab"	U"a"	U"b"	U"ab"	L"a"	L"b"	L"ab"
u"a"	"b"	u"ab"	U"a"	"b"	U"ab"	L"a"	"b"	L"ab"
"a"	u"b"	u"ab"	"a"	U"b"	U"ab"	"a"	L"b"	L"ab"
Evaluating a string-literal results in a string literal object with static storage duration ([basic.stc]).

[Note 4:

String literal objects are potentially non-unique ([intro.object]).

Whether successive evaluations of a string-literal yield the same or a different object is unspecified.

— end note]

[Note 5:

The effect of attempting to modify a string literal object is undefined.

— end note]

String literal objects are initialized with the sequence of code unit values corresponding to the string-literal's sequence of s-chars (originally from non-raw string literals) and r-chars (originally from raw string literals), plus a terminating U+0000 null character, in order as follows:

(10.1)
The sequence of characters denoted by each contiguous sequence of basic-s-chars, r-chars, simple-escape-sequences ([lex.ccon]), and universal-character-names ([lex.charset]) is encoded to a code unit sequence using the string-literal's associated character encoding.

If a character lacks representation in the associated character encoding, then the program is ill-formed.

[Note 6:
No character lacks representation in any Unicode encoding form.
— end note]

When encoding a stateful character encoding, implementations should encode the first such sequence beginning with the initial encoding state and encode subsequent sequences beginning with the final encoding state of the prior sequence.

[Note 7:
The encoded code unit sequence can differ from the sequence of code units that would be obtained by encoding each character independently.
— end note]
(10.2)
Each numeric-escape-sequence ([lex.ccon]) contributes a single code unit with a value as follows:
- (10.2.1)
  Let v be the integer value represented by the octal number comprising the sequence of octal-digits in an octal-escape-sequence or by the hexadecimal number comprising the sequence of hexadecimal-digits in a hexadecimal-escape-sequence.
- (10.2.2)
  If v does not exceed the range of representable values of the string-literal's array element type, then the value is v.
- (10.2.3)
  Otherwise, if the string-literal's encoding-prefix is absent or L, and v does not exceed the range of representable values of the corresponding unsigned type for the underlying type of the string-literal's array element type, then the value is the unique value of the string-literal's array element type T that is congruent to v modulo $2^{N}$, where N is the width of T.
- (10.2.4)
  Otherwise, the program is ill-formed.
When encoding a stateful character encoding, these sequences should have no effect on encoding state.
(10.3)
Each conditional-escape-sequence ([lex.ccon]) contributes an implementation-defined code unit sequence.

When encoding a stateful character encoding, it is implementation-defined what effect these sequences have on encoding state.
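The number n of code units in a string literal object depends on the associated character encoding, and always includes the terminating null. The sketch below uses universal-character-names so that it does not depend on the source file encoding; `code_units` is a hypothetical helper.

```cpp
#include <cstddef>

// Hypothetical helper: number of code units, excluding the terminating null.
template <class T, std::size_t N>
constexpr std::size_t code_units(const T (&)[N]) {
    return N - 1;
}

// The array holds the encoded code units plus a terminating null.
static_assert(sizeof("abc") == 4, "three code units and U+0000");

// U+00E9 needs two code units in UTF-8 but one in UTF-16 and UTF-32.
static_assert(code_units(u8"\u00E9") == 2, "UTF-8");
static_assert(code_units(u"\u00E9") == 1, "UTF-16");
static_assert(code_units(U"\u00E9") == 1, "UTF-32");

// U+1F600 lies outside the BMP: a surrogate pair in UTF-16.
static_assert(code_units(u"\U0001F600") == 2, "two UTF-16 code units");
static_assert(u"\U0001F600"[0] == 0xD83D, "high surrogate");
static_assert(u"\U0001F600"[1] == 0xDE00, "low surrogate");
static_assert(code_units(U"\U0001F600") == 1, "one UTF-32 code unit");

// (10.2): a numeric-escape-sequence contributes exactly one code unit.
static_assert(code_units("\x41\102") == 2, "one code unit per escape");
```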

5.13.6 Unevaluated strings [lex.string.uneval]

unevaluated-string:
string-literal

An unevaluated-string shall have no encoding-prefix.

Each universal-character-name and each simple-escape-sequence in an unevaluated-string is replaced by the member of the translation character set it denotes.

An unevaluated-string that contains a numeric-escape-sequence or a conditional-escape-sequence is ill-formed.

An unevaluated-string is never evaluated and its interpretation depends on the context in which it appears.

5.13.7 Boolean literals [lex.bool]

boolean-literal:
false
true

The Boolean literals are the keywords false and true.

Such literals have type bool.

5.13.8 Pointer literals [lex.nullptr]

pointer-literal:
nullptr

The pointer literal is the keyword nullptr.

It has type std::nullptr_t.

[Note 1:

std::nullptr_t is a distinct type that is neither a pointer type nor a pointer-to-member type; rather, a prvalue of this type is a null pointer constant and can be converted to a null pointer value or null member pointer value.

See [conv.ptr] and [conv.mem].

— end note]
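The distinction drawn in Note 1 can be made visible with type traits and overload resolution. This is a minimal sketch; the overload set `which` and the struct `S` are hypothetical names introduced only for illustration.

```cpp
#include <cstddef>
#include <type_traits>

// nullptr has type std::nullptr_t, which is neither a pointer type nor a
// pointer-to-member type.
static_assert(std::is_same<decltype(nullptr), std::nullptr_t>::value,
              "type of the pointer literal");
static_assert(!std::is_pointer<std::nullptr_t>::value, "not a pointer type");
static_assert(!std::is_member_pointer<std::nullptr_t>::value,
              "not a pointer-to-member type");

// Hypothetical overload set showing that nullptr is distinct from 0.
constexpr int which(int) { return 0; }
constexpr int which(int*) { return 1; }
constexpr int which(std::nullptr_t) { return 2; }

static_assert(which(nullptr) == 2, "exact match on std::nullptr_t");
static_assert(which(0) == 0, "0 is an int");

// A prvalue of type std::nullptr_t converts to a null pointer value or a
// null member pointer value.
struct S { int m; };
constexpr int* p = nullptr;
constexpr int S::* pm = nullptr;
static_assert(p == nullptr && pm == nullptr, "both compare equal to nullptr");
```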

5.13.9 User-defined literals [lex.ext]

user-defined-literal:
user-defined-integer-literal
user-defined-floating-point-literal
user-defined-string-literal
user-defined-character-literal

user-defined-integer-literal:
decimal-literal ud-suffix
octal-literal ud-suffix
hexadecimal-literal ud-suffix
binary-literal ud-suffix

user-defined-floating-point-literal:
fractional-constant exponent-part_opt ud-suffix
digit-sequence exponent-part ud-suffix
hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part ud-suffix
hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part ud-suffix

user-defined-string-literal:
string-literal ud-suffix

user-defined-character-literal:
character-literal ud-suffix

ud-suffix:
identifier

If a token matches both user-defined-literal and another literal kind, it is treated as the latter.

[Example 1:

123_km is a user-defined-literal, but 12LL is an integer-literal.

— end example]

The syntactic non-terminal preceding the ud-suffix in a user-defined-literal is taken to be the longest sequence of characters that could match that non-terminal.

A user-defined-literal is treated as a call to a literal operator or literal operator template ([over.literal]).

To determine the form of this call for a given user-defined-literal L with ud-suffix X, first let S be the set of declarations found by unqualified lookup for the literal-operator-id whose literal suffix identifier is X ([basic.lookup.unqual]).

S shall not be empty.

If L is a user-defined-integer-literal, let n be the literal without its ud-suffix.

If S contains a literal operator with parameter type unsigned long long, the literal L is treated as a call of the form operator ""X(nULL)

Otherwise, S shall contain a raw literal operator or a numeric literal operator template ([over.literal]) but not both.

If S contains a raw literal operator, the literal L is treated as a call of the form operator ""X("n")

Otherwise (S contains a numeric literal operator template), L is treated as a call of the form operator ""X<'$c_{1}$', '$c_{2}$', ... '$c_{k}$'>() where n is the source character sequence $c_{1}c_{2}...c_{k}$.

[Note 1:

The sequence $c_{1}c_{2}...c_{k}$ can only contain characters from the basic character set.

— end note]
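The two uncooked forms can be sketched as follows, assuming C++14. The suffixes `_count` and `_len` are hypothetical; two different suffixes are used because a single suffix may not have both a raw literal operator and a numeric literal operator template in S.

```cpp
#include <cstddef>

// Numeric literal operator template (hypothetical suffix _count): receives
// the spelling of n as a pack of characters and returns how many there are.
template <char... Cs>
constexpr std::size_t operator""_count() {
    return sizeof...(Cs);
}

static_assert(123_count == 3, "'1', '2', '3'");
static_assert(0x1F_count == 4, "the prefix is part of n");
static_assert(1'000_count == 5, "separating single quotes are part of n");

// Raw literal operator (hypothetical suffix _len): receives the spelling of
// n as a null-terminated string and returns its length.
constexpr std::size_t operator""_len(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}

static_assert(0b101_len == 5, "called as operator\"\"_len(\"0b101\")");
static_assert(1.5e3_len == 5, "also used for floating-point spellings");
```

Both forms see the literal as written, so a suffix that wants a numeric value has to parse the prefix, digits, and separators itself.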

If L is a user-defined-floating-point-literal, let f be the literal without its ud-suffix.

If S contains a literal operator with parameter type long double, the literal L is treated as a call of the form operator ""X(fL)

Otherwise, S shall contain a raw literal operator or a numeric literal operator template ([over.literal]) but not both.

If S contains a raw literal operator, the literal L is treated as a call of the form operator ""X("f")

Otherwise (S contains a numeric literal operator template), L is treated as a call of the form operator ""X<'$c_{1}$', '$c_{2}$', ... '$c_{k}$'>() where f is the source character sequence $c_{1}c_{2}...c_{k}$.

[Note 2:

The sequence $c_{1}c_{2}...c_{k}$ can only contain characters from the basic character set.

— end note]

If L is a user-defined-string-literal, let str be the literal without its ud-suffix and let len be the number of code units in str (i.e., its length excluding the terminating null character).

If S contains a literal operator template with a constant template parameter for which str is a well-formed template-argument, the literal L is treated as a call of the form operator ""X<str>()

Otherwise, the literal L is treated as a call of the form operator ""X(str, len)

If L is a user-defined-character-literal, let ch be the literal without its ud-suffix.

S shall contain a literal operator whose only parameter has the type of ch and the literal L is treated as a call of the form operator ""X(ch)

[Example 2:

long double operator ""_w(long double);
std::string operator ""_w(const char16_t*, std::size_t);
unsigned operator ""_w(const char*);
int main() {
  1.2_w;        // calls operator ""_w(1.2L)
  u"one"_w;     // calls operator ""_w(u"one", 3)
  12_w;         // calls operator ""_w("12")
  "two"_w;      // error: no applicable literal operator
}

— end example]
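As a compilable companion to Example 2, the cooked call forms for each kind of user-defined-literal might be exercised like this. The suffixes `_km`, `_n`, and `_code` are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>

// Integer form: 12_km is treated as operator""_km(12ULL).
constexpr unsigned long long operator""_km(unsigned long long v) {
    return v * 1000;
}

// Floating-point form: 1.5_km is treated as operator""_km(1.5L).
constexpr long double operator""_km(long double v) {
    return v * 1000;
}

// String form: "two"_n is treated as operator""_n("two", 3).
constexpr std::size_t operator""_n(const char*, std::size_t len) {
    return len;
}

// Character form: 'A'_code is treated as operator""_code('A').
constexpr int operator""_code(char c) {
    return c;
}

static_assert(12_km == 12000, "integer literal operator");
static_assert(1.5_km == 1500.0L, "floating-point literal operator");
static_assert("two"_n == 3, "len excludes the terminating null");
static_assert('A'_code == 'A', "character literal operator");
```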

In translation phase 5 ([lex.phases]), adjacent string-literals are concatenated and user-defined-string-literals are considered string-literals for that purpose.

During concatenation, ud-suffixes are removed and ignored and the concatenation process occurs as described in [lex.string].

At the end of phase 5, if a string-literal is the result of a concatenation involving at least one user-defined-string-literal, all the participating user-defined-string-literals shall have the same ud-suffix and that suffix is applied to the result of the concatenation.

[Example 3:

int main() {
  L"A" "B" "C"_x;   // OK, same as L"ABC"_x
  "P"_x "Q" "R"_y;  // error: two different ud-suffixes
}

— end example]
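The well-formed half of Example 3 can be checked with a literal operator that returns the length it receives. This is a sketch with a hypothetical suffix `_x`, overloaded for ordinary and wide strings; the ill-formed case with two different ud-suffixes cannot be shown in compilable code.

```cpp
#include <cstddef>

// Hypothetical suffix _x: returns the number of code units it was given.
constexpr std::size_t operator""_x(const char*, std::size_t len) {
    return len;
}
constexpr std::size_t operator""_x(const wchar_t*, std::size_t len) {
    return len;
}

// The suffix is applied to the result of the concatenation.
static_assert("A" "B" "C"_x == 3, "same as \"ABC\"_x");

// Repeating the same ud-suffix on several participants is allowed.
static_assert("P"_x "Q"_x == 2, "same as \"PQ\"_x");

// The common encoding-prefix is determined as in [lex.string].
static_assert(L"A" "B" "C"_x == 3, "same as L\"ABC\"_x");
```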