C++ Basics
C Linkage and Name Mangling
In C++, the compiler applies name mangling to function symbols to support
function overloading. Name mangling encodes the function signature (parameter types,
namespaces, and template arguments) into the symbol name, allowing multiple
functions with the same name but different signatures to coexist. When interfacing
with C libraries or exposing functions to C code, extern "C" gives a function
C language linkage, which suppresses mangling so the symbol can be resolved from C
and from other languages that follow the C ABI. The trade-off is that functions
with C linkage cannot be overloaded.
With extern "C" (C linkage):
Inspecting the symbol table with nm shows that fib keeps its original name.
The leading underscore in _fib is added by the macOS (Mach-O) toolchain to
every C symbol; on Linux the symbol is plain fib. Either way, the unmangled
name is what C code, or any language that follows the C calling convention,
expects to link against:
#include <iostream>

#ifdef __cplusplus
extern "C" {
#endif

int fib(int n) {
    int a = 0, b = 1;
    for (int i = 0; i < n; ++i) {
        int tmp = b;
        b = a + b;
        a = tmp;
    }
    return a;
}

#ifdef __cplusplus
}
#endif

int main(int argc, char *argv[]) {
    std::cout << fib(10) << "\n";  // Output: 55
}
$ g++ -std=c++17 -Wall -Werror -O3 a.cc
$ nm -g a.out | grep fib
0000000100003a58 T _fib # C-style symbol (no mangling)
Without extern "C" (C++ linkage):
Without the extern "C" wrapper, the compiler mangles the function name
to encode its parameter types, which is what lets the linker tell overloads
apart. The mangled symbol follows the Itanium C++ ABI naming scheme: _Z marks
a mangled name, 3fib is the function name prefixed with its length, and i
denotes an int parameter. As before, macOS prepends one more underscore, so
nm prints __Z3fibi for the Itanium name _Z3fibi:
#include <iostream>

int fib(int n) {
    int a = 0, b = 1;
    for (int i = 0; i < n; ++i) {
        int tmp = b;
        b = a + b;
        a = tmp;
    }
    return a;
}

int main(int argc, char *argv[]) {
    std::cout << fib(10) << "\n";
}
$ nm -g a.out | grep fib
0000000100003a58 T __Z3fibi # Mangled symbol: _Z3fibi
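To make the encoding concrete, the sketch below declares a small overload set and notes the Itanium symbol each overload would receive. The names scale and scale_c are hypothetical, and the symbols in the comments are derived from the scheme described above (on macOS, nm would show each with one extra leading underscore):

```cpp
#include <type_traits>

// Hypothetical overload set: same name, different parameter types.
int    scale(int v)    { return v * 2; }    // Itanium symbol: _Z5scalei
long   scale(long v)   { return v * 2; }    // Itanium symbol: _Z5scalel
double scale(double v) { return v * 2.0; }  // Itanium symbol: _Z5scaled

// With C linkage the symbol is just the name, so only one function per name
// can exist; a second extern "C" overload of scale_c would be rejected.
extern "C" int scale_c(int v) { return v * 2; }  // symbol: scale_c

// The overload is chosen from the argument type at compile time.
static_assert(std::is_same_v<decltype(scale(1)), int>);
static_assert(std::is_same_v<decltype(scale(1L)), long>);
static_assert(std::is_same_v<decltype(scale(1.0)), double>);
```

Compiling this into an object file and running nm on it should list three distinct scale symbols next to a single unmangled scale_c.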
Platform-specific macros:
On BSD, macOS, and glibc-based systems, <sys/cdefs.h> provides __BEGIN_DECLS
and __END_DECLS as shorthand for the extern "C" guards. These macros are a
platform convention rather than part of standard C or C++. They expand to the
linkage specification when compiled as C++ and to nothing when compiled as C,
which simplifies header files shared between both languages:
#include <iostream>
#include <sys/cdefs.h>
__BEGIN_DECLS
int fib(int n) { /* ... */ }
__END_DECLS
Uniform Initialization (Brace Initialization)
Introduced in C++11, uniform initialization (also known as brace initialization)
provides a consistent syntax for initializing objects of any type. This syntax
works for primitives, aggregates, containers, and user-defined types. However,
the compiler prioritizes std::initializer_list constructors when applicable,
which can lead to surprising behavior if not understood properly.
Initializer list takes precedence:
When both a direct constructor and an std::initializer_list constructor
are viable, the compiler strongly prefers the initializer list version. In this
example, Widget{10, 5.0} invokes the initializer list constructor even though
Widget(int, double) appears to be a better match. The values 10 and 5.0
are implicitly converted to long double and passed as a two-element initializer
list. This behavior is mandated by the C++ standard to ensure consistent semantics
for brace initialization:
#include <iostream>
#include <initializer_list>

class Widget {
public:
    Widget(int a, double b) { std::cout << "Direct constructor\n"; }
    Widget(std::initializer_list<long double> il) { std::cout << "Initializer list constructor\n"; }
};

int main(int argc, char *argv[]) {
    Widget w{10, 5.0};  // Output: Initializer list constructor
}
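The difference between braces and parentheses can also be checked at compile time. The following is a minimal sketch using a hypothetical Probe type (not part of the original example) that records which constructor was selected:

```cpp
#include <initializer_list>

// Hypothetical probe type that records which constructor was selected.
struct Probe {
    bool used_list;
    constexpr Probe(int, int) : used_list(false) {}
    constexpr Probe(std::initializer_list<int>) : used_list(true) {}
};

static_assert(Probe{1, 2}.used_list);   // braces: initializer list constructor
static_assert(!Probe(1, 2).used_list);  // parentheses: Probe(int, int)
```

The same rule explains why std::vector<int> v(10, 5) creates ten elements with value 5, while std::vector<int> v{10, 5} creates two elements, 10 and 5.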
Narrowing conversions are prohibited:
Brace initialization rejects implicit narrowing conversions, which adds a
layer of type safety compared to parentheses initialization. In the following
code, overload resolution still selects the std::initializer_list<bool>
constructor, because int and double can both convert to bool. Only afterwards
does the compiler check for narrowing, and since both conversions lose
information the program is ill-formed; it does not fall back to
Widget(int, double). This protection catches bugs at compile time that might
otherwise cause subtle runtime errors:
#include <initializer_list>

class Widget {
public:
    Widget(int a, double b) {}
    Widget(std::initializer_list<bool> il) {}
};

int main(int argc, char *argv[]) {
    Widget w{10, 5.0};  // Compilation error: narrowing conversion
}
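The narrowing rule applies to scalar variables as well. The sketch below lists a few conversions that braces accept and, in comments, a few they reject; the variable names are illustrative only, and the rejected lines assume a platform with 8-bit char:

```cpp
// Conversions that cannot lose information are accepted by braces.
constexpr long   widened{42};     // int -> long: every int value fits
constexpr double promoted{1.5f};  // float -> double: exact
constexpr char   letter{65};      // the constant 65 fits in char, so this is allowed

// Each of the following is ill-formed because information could be lost:
// int  truncated{3.7};   // double -> int
// char overflow{300};    // constant does not fit in an 8-bit char
// bool flag{10};         // int -> bool, as in the Widget example above

// Parentheses still perform the lossy conversion without complaint.
constexpr double ratio = 3.7;
constexpr int lossy(ratio);       // fractional part is discarded

static_assert(widened == 42 && letter == 65 && lossy == 3);
```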
Fallback to direct constructor:
When no valid conversion exists for the initializer list element type,
the compiler falls back to the matching direct constructor. Here, int
and double cannot convert to std::string (there is no implicit
conversion path), so the compiler bypasses the initializer list constructor
entirely and selects the direct constructor instead:
#include <iostream>
#include <initializer_list>
#include <string>

class Widget {
public:
    Widget(int a, double b) { std::cout << "Direct constructor\n"; }
    Widget(std::initializer_list<std::string> il) { std::cout << "Initializer list constructor\n"; }
};

int main(int argc, char *argv[]) {
    Widget w{10, 5.0};  // Output: Direct constructor
}
Pointer Arithmetic and Negative Indices
In C++, the built-in subscript operator [] is defined in terms of pointer
arithmetic: arr[i] is equivalent to *(arr + i). This equivalence is fundamental
to how arrays work in C and C++, and it makes negative indices valid whenever
the pointer refers to an element past the first one, so that the result still
lands inside the array. The flexibility is powerful, but any index that reaches
outside the array is undefined behavior.
In this example, ptr points to arr[1], the second element of the array.
Using ptr[-1] computes *(ptr - 1), which accesses arr[0]. This
technique is commonly used in algorithms that need to look backward from a
current position, such as insertion sort or string parsing:
#include <iostream>

int main(int argc, char *argv[]) {
    int arr[] = {1, 2, 3};
    int *ptr = &arr[1];            // Points to second element
    std::cout << ptr[-1] << "\n";  // Output: 1 (arr[0])
    std::cout << ptr[0] << "\n";   // Output: 2 (arr[1])
    std::cout << ptr[1] << "\n";   // Output: 3 (arr[2])
}
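As a small illustration of the look-backward use case, the sketch below counts how many elements equal their predecessor. The function name count_adjacent_repeats and the sample data are hypothetical; the loop starts at index 1 so that cur[-1] never leaves the array:

```cpp
#include <cstddef>

// Counts the positions whose value equals the previous element. The
// predecessor is read through a negative index on a pointer that is
// always at least one element past the start of the array.
constexpr std::size_t count_adjacent_repeats(const int* data, std::size_t n) {
    std::size_t repeats = 0;
    for (std::size_t i = 1; i < n; ++i) {
        const int* cur = data + i;
        if (cur[-1] == cur[0]) {
            ++repeats;
        }
    }
    return repeats;
}

constexpr int sample[] = {1, 1, 2, 3, 3, 3};
static_assert(count_adjacent_repeats(sample, 6) == 3);
```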
Template Type Deduction
Understanding how template parameters deduce types is essential for writing
generic C++ code. The deduction rules differ based on the parameter declaration,
and mastering these rules is crucial for effective use of templates, auto,
and perfect forwarding. The following three examples demonstrate how the same
arguments produce different deduced types depending on whether the parameter
is an lvalue reference, universal reference, or value.
Lvalue reference parameters (T&):
When the parameter is an lvalue reference, the deduced type T preserves
const-qualifiers from the argument, but the reference itself is not part of T.
This means passing a const int deduces T as const int, and the
parameter type becomes const int&. This behavior ensures that const-correctness
is maintained through template instantiation:
template <typename T>
void f(T& param) noexcept {}

int main(int argc, char *argv[]) {
    int x = 123;
    const int cx = x;
    const int& rx = x;
    f(x);   // T = int,       param: int&
    f(cx);  // T = const int, param: const int&
    f(rx);  // T = const int, param: const int&
}
Universal reference parameters (T&&):
Universal references (also called forwarding references) behave differently
for lvalues and rvalues, making them the foundation of perfect forwarding.
When passed an lvalue, T deduces to an lvalue reference type (e.g., int&),
and reference collapsing produces an lvalue reference parameter. When passed
an rvalue, T deduces to a non-reference type, and the parameter becomes
an rvalue reference. This dual behavior allows a single function template to
accept both lvalues and rvalues:
template <typename T>
void f(T&& param) noexcept {}

int main(int argc, char *argv[]) {
    int x = 123;
    const int cx = x;
    const int& rx = x;
    f(x);   // x is lvalue:  T = int&,       param: int&
    f(cx);  // cx is lvalue: T = const int&, param: const int&
    f(rx);  // rx is lvalue: T = const int&, param: const int&
    f(12);  // 12 is rvalue: T = int,        param: int&&
}
Value parameters (T):
When the parameter is passed by value, the argument is copied, and both
references and top-level const-qualifiers are stripped from the deduced type.
This occurs because the function receives an independent copy that can be
modified without affecting the original. The same stripping happens when you
initialize a new variable declared with plain auto:
template <typename T>
void f(T param) noexcept {}

int main(int argc, char *argv[]) {
    int x = 123;
    const int cx = x;
    const int& rx = x;
    f(x);   // T = int, param: int (copy)
    f(cx);  // T = int, param: int (const dropped)
    f(rx);  // T = int, param: int (reference and const dropped)
    f(12);  // T = int, param: int
}
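The deduced types in the comments above can be verified by the compiler rather than taken on trust. The following sketch is one way to do it, assuming a hypothetical helper template deduced<T> and three declaration-only probe functions (by_lref, by_fwd, by_value) that return the deduced T wrapped in that helper:

```cpp
#include <type_traits>

// Hypothetical tag type that carries the deduced T unchanged.
template <typename T> struct deduced { using type = T; };

// Declaration-only probes: they are used inside decltype and never called.
template <typename T> deduced<T> by_lref(T&);
template <typename T> deduced<T> by_fwd(T&&);
template <typename T> deduced<T> by_value(T);

int x = 123;
const int cx = 456;

static_assert(std::is_same_v<decltype(by_lref(x))::type, int>);
static_assert(std::is_same_v<decltype(by_lref(cx))::type, const int>);
static_assert(std::is_same_v<decltype(by_fwd(x))::type, int&>);
static_assert(std::is_same_v<decltype(by_fwd(cx))::type, const int&>);
static_assert(std::is_same_v<decltype(by_fwd(12))::type, int>);
static_assert(std::is_same_v<decltype(by_value(cx))::type, int>);
```

Wrapping T in a class template matters here: a function returning const int directly would have its const dropped by decltype, because prvalues of non-class type carry no cv-qualifiers.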
Auto Type Deduction
The auto keyword follows the same deduction rules as templates, with auto
playing the role of T: plain auto behaves like a by-value parameter, auto&
like T&, and auto&& like a universal reference. The one notable exception is
braced initializers, where auto deduces std::initializer_list but a template
parameter T deduces nothing. This example shows the deduced type for each
form:
int main() {
    auto x = 123;        // int
    const auto cx = x;   // const int
    const auto& rx = x;  // const int&
    auto&& urx = x;      // int& (x is lvalue)
    auto&& urcx = cx;    // const int& (cx is lvalue)
    auto&& urrx = rx;    // const int& (rx is lvalue)
    auto&& urrv = 12;    // int&& (12 is rvalue)
}
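The braced-initializer exception is not shown in the listing above, so here is a minimal sketch of it. The variable names are illustrative, and the comments describe the C++17 rules:

```cpp
#include <initializer_list>
#include <type_traits>

auto from_copy_list = {1, 2, 3};  // std::initializer_list<int>
auto from_direct{42};             // int (direct-list-initialization, C++17 rule)
auto from_value = 42;             // int

static_assert(std::is_same_v<decltype(from_copy_list), std::initializer_list<int>>);
static_assert(std::is_same_v<decltype(from_direct), int>);
static_assert(std::is_same_v<decltype(from_value), int>);

// A template parameter has no such special case. Given
//   template <typename T> void f(T);
// the call f({1, 2, 3}) does not compile because T cannot be deduced.
```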
decltype(auto) Type Deduction
Unlike auto, decltype(auto) preserves the exact type including
references and cv-qualifiers. This distinction is critical when the original
type information must be retained, such as when forwarding return values
from wrapped functions. While auto applies template argument deduction
rules (which strip references), decltype(auto) applies decltype
semantics to the initializer expression.
The first example contrasts decltype(auto) with auto. Note how
auto strips the reference and const from crx, producing a plain int,
while decltype(auto) preserves the full const int& type. Similarly,
decltype(auto) preserves rvalue references, which auto would decay
to a value type:
#include <type_traits>
#include <utility>  // std::move

int main(int argc, char *argv[]) {
    int x = 0;
    const int cx = x;
    const int& crx = x;
    int&& z = 0;

    // decltype(auto) preserves cv-qualifiers and references
    decltype(auto) y1 = crx;
    static_assert(std::is_same_v<const int&, decltype(y1)>);

    // auto strips cv-qualifiers and references
    auto y2 = crx;
    static_assert(std::is_same_v<int, decltype(y2)>);

    // decltype(auto) preserves rvalue references
    decltype(auto) z1 = std::move(z);
    static_assert(std::is_same_v<int&&, decltype(z1)>);
}
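Because decltype(auto) applies decltype to the initializer exactly as written, an extra pair of parentheses changes the result. The sketch below, with illustrative variable names, shows the difference:

```cpp
#include <type_traits>

int counter = 0;

decltype(auto) by_name = counter;    // decltype(counter)   is int
decltype(auto) by_expr = (counter);  // decltype((counter)) is int&

static_assert(std::is_same_v<decltype(by_name), int>);
static_assert(std::is_same_v<decltype(by_expr), int&>);
```

The same applies to return statements: in a function declared with decltype(auto), writing return (local); returns a reference to a local variable, which dangles as soon as the function exits.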
Application in return type deduction:
This behavior is particularly useful for generic functions that must preserve
the exact return type of an expression. In this example, foo uses auto
return type deduction, which strips the reference and returns by value. In
contrast, bar uses decltype(auto), preserving the const int& return
type. This distinction matters for performance (avoiding copies) and semantics
(maintaining reference semantics):
#include <type_traits>

auto foo(const int& x) {
    return x;  // Returns int (reference stripped)
}

decltype(auto) bar(const int& x) {
    return x;  // Returns const int& (reference preserved)
}

int main(int argc, char *argv[]) {
    static_assert(std::is_same_v<int, decltype(foo(1))>);
    static_assert(std::is_same_v<const int&, decltype(bar(1))>);
}
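The wrapped-function case mentioned earlier can be sketched the same way. The helpers call_auto and call_exact below are hypothetical wrappers around a callable that takes no arguments; they differ only in their declared return type:

```cpp
#include <type_traits>

int storage = 7;
int& get_ref()   { return storage; }  // returns a reference to storage
int  get_value() { return storage; }  // returns a copy

// Hypothetical wrappers around a nullary callable.
template <typename F>
auto call_auto(F f) { return f(); }             // always returns by value

template <typename F>
decltype(auto) call_exact(F f) { return f(); }  // returns exactly what f returns

static_assert(std::is_same_v<decltype(call_auto(get_ref)), int>);
static_assert(std::is_same_v<decltype(call_exact(get_ref)), int&>);
static_assert(std::is_same_v<decltype(call_exact(get_value)), int>);
```

With call_exact, a caller can still write through the returned reference, whereas call_auto silently hands back a copy.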
Reference Collapsing Rules
When references to references occur during template instantiation or type
aliasing, the compiler applies reference collapsing rules. These rules
are fundamental to understanding how universal references and std::forward
work. In C++, you cannot directly declare a reference to a reference, but
such types can arise indirectly through template instantiation or type aliases:
T& & -> T&
T& && -> T&
T&& & -> T&
T&& && -> T&&
The rule can be summarized as: lvalue reference always wins. Only when
both references are rvalue references does the result remain an rvalue reference.
This mechanism enables perfect forwarding by allowing a single function template
to handle both lvalues and rvalues correctly. When an lvalue is passed to a
universal reference parameter, T deduces to an lvalue reference, and
reference collapsing ensures the parameter type is also an lvalue reference.
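The four rules can be checked directly with type aliases, since adding a reference to an alias that is already a reference triggers collapsing. The alias names below are illustrative:

```cpp
#include <type_traits>

using LRef = int&;
using RRef = int&&;

static_assert(std::is_same_v<LRef&,  int&>);   // T&  &  -> T&
static_assert(std::is_same_v<LRef&&, int&>);   // T&  && -> T&
static_assert(std::is_same_v<RRef&,  int&>);   // T&& &  -> T&
static_assert(std::is_same_v<RRef&&, int&&>);  // T&& && -> T&&
```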
Perfect Forwarding with std::forward
Perfect forwarding preserves the value category (lvalue/rvalue) of function
arguments when passing them to other functions. This technique is essential
for writing wrapper functions, factory functions, and generic code that must
not alter the semantics of the forwarded arguments. Perfect forwarding combines
universal references with std::forward and relies on the reference collapsing
rules described above.
Implementation of std::forward:
The standard library’s std::forward uses static_cast combined with
reference collapsing to conditionally cast to an rvalue reference. The first
overload handles lvalue arguments: when T is an lvalue reference type,
T&& collapses to an lvalue reference. The second overload handles rvalue
arguments and includes a static assertion to prevent forwarding an rvalue
as an lvalue:
#include <type_traits>

template <typename T>
T&& forward(std::remove_reference_t<T>& t) noexcept {
    return static_cast<T&&>(t);
}

template <typename T>
T&& forward(std::remove_reference_t<T>&& t) noexcept {
    static_assert(!std::is_lvalue_reference_v<T>);
    return static_cast<T&&>(t);
}
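To see what the conditional cast buys, the sketch below passes an argument on to an overloaded function with and without std::forward. The names sink, relay, and relay_plain are hypothetical; sink returns 1 for lvalues and 2 for rvalues so the selected overload is observable at compile time:

```cpp
#include <utility>

constexpr int sink(int&)  { return 1; }  // chosen for lvalues
constexpr int sink(int&&) { return 2; }  // chosen for rvalues

// Forwards the argument with its original value category.
template <typename T>
constexpr int relay(T&& v) { return sink(std::forward<T>(v)); }

// Without std::forward, the named parameter v is always an lvalue.
template <typename T>
constexpr int relay_plain(T&& v) { return sink(v); }

int slot = 0;

static_assert(relay(slot) == 1);     // lvalue stays an lvalue
static_assert(relay(5) == 2);        // rvalue stays an rvalue
static_assert(relay_plain(5) == 1);  // rvalue silently became an lvalue
```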
Wrapper function:
This example demonstrates a wrapper function that forwards arguments to
a callable while preserving their value category. When d (an lvalue)
is passed, T deduces to Data&, and std::forward returns an lvalue
reference. When std::move(t) is passed, T deduces to Data, and
std::forward returns an rvalue reference:
#include <iostream>
#include <utility>

template <typename T, typename Func>
void wrapper(T&& arg, Func fn) {
    fn(std::forward<T>(arg));
}

struct Data { int x, y, result; };

int main() {
    Data d{1, 2, 0};
    wrapper(d, [](Data& d) { d.result = d.x + d.y; });
    std::cout << d.result << "\n";  // 3

    Data t{5, 6, 0};
    // The lambda binds t as Data&& and writes to it in place; nothing is moved out of t.
    wrapper(std::move(t), [](Data&& d) { d.result = d.x * d.y; });
    std::cout << t.result << "\n";  // 30
}
Decorator pattern:
A timing decorator that wraps any callable and measures execution time while perfectly forwarding all arguments to the wrapped function:
#include <chrono>
#include <iostream>
#include <utility>

template <typename Func, typename... Args>
auto timed(Func&& f, Args&&... args) {
    // steady_clock is monotonic, which makes it the safer choice for intervals
    auto start = std::chrono::steady_clock::now();
    auto ret = std::forward<Func>(f)(std::forward<Args>(args)...);
    std::chrono::duration<double> d = std::chrono::steady_clock::now() - start;
    std::cout << "Time: " << d.count() << "s\n";
    return ret;
}

long fib(long n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }

int main() { timed(fib, 35); }
Factory pattern:
A generic factory function that constructs objects by forwarding arguments to constructors, preserving value categories for efficient object creation:
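#include <memory>
#include <utility>

template <typename T, typename... Args>
std::unique_ptr<T> make(Args&&... args) {
    return std::make_unique<T>(std::forward<Args>(args)...);
}

// std::make_unique initializes with parentheses, which cannot build an
// aggregate before C++20; the constructor keeps this valid in C++17.
struct Widget { int x; double y; Widget(int x, double y) : x(x), y(y) {} };

int main() { auto w = make<Widget>(42, 3.14); }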
Bit Manipulation with std::bitset
The std::bitset class template provides a fixed-size sequence of bits
with convenient member functions for bit manipulation. Compared with raw
integer bit operations, std::bitset offers type safety and clearer semantics,
and its test(), set(), reset(), and flip() members throw std::out_of_range for
an invalid position (operator[] is not checked). The template parameter
specifies the number of bits, which is fixed at compile time.
This example creates a 4-bit bitset initialized to 8 (binary 1000).
The count() member function returns the number of set bits (population
count), which is useful for algorithms that need to count flags or compute
Hamming weights. The comparison bits == 8 compiles because the integer is
implicitly converted to a std::bitset<4> through the converting constructor:
#include <bitset>
#include <iostream>

int main(int argc, char *argv[]) {
    std::bitset<4> bits{0b1000};        // Binary: 1000
    std::cout << bits.count() << "\n";  // Output: 1 (popcount)
    std::cout << (bits == 8) << "\n";   // Output: 1 (true)
}
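A few more members are worth seeing side by side. The sketch below uses hypothetical helper names (kFlags, out_of_range_is_reported, modify_demo) to show the unchecked operator[], the checked test(), and the common modifiers:

```cpp
#include <bitset>
#include <cstddef>
#include <stdexcept>

// Hypothetical 4-bit flag set; bit 3 is the only one set.
constexpr std::bitset<4> kFlags{0b1000};
static_assert(kFlags.size() == 4);
static_assert(kFlags[3] && !kFlags[0]);  // const operator[]: constexpr, unchecked

// Returns true if test() rejected `pos` as out of range.
inline bool out_of_range_is_reported(std::size_t pos) {
    std::bitset<4> bits{0b1000};
    try {
        (void)bits.test(pos);  // test() validates pos before reading
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// Applies the common modifiers in sequence: 1000 -> 1001 -> 0110 -> 0010.
inline unsigned long modify_demo() {
    std::bitset<4> bits{0b1000};
    bits.set(0);    // 1001
    bits.flip();    // 0110
    bits.reset(2);  // 0010
    return bits.to_ulong();
}
```

Calling out_of_range_is_reported(4) exercises the checked path, since valid positions for a 4-bit set are 0 through 3.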
Safe Address-of with std::addressof
C++ permits overloading operator&, which can interfere with obtaining
an object’s actual memory address. While overloading this operator is rare
and generally discouraged, it does occur in some legacy codebases and smart
pointer implementations. The std::addressof function template bypasses
any overloaded operator& to return the true address, making it essential
for generic code that must work with arbitrary types.
In this example, the overloaded operator& could potentially return
an arbitrary pointer (perhaps for proxy object patterns or debugging).
Using std::addressof guarantees we obtain the object’s actual memory
location regardless of any operator overloading. This is particularly
important in allocator implementations, container internals, and other
low-level generic code:
#include <iostream>
#include <memory>

struct Widget {
    int value;
};

const Widget* operator&(const Widget& w) {
    // Custom behavior - could return anything
    return std::addressof(w);  // Safe: returns actual address
}

int main(int argc, char *argv[]) {
    Widget w;
    std::cout << &w << "\n";                 // Uses overloaded operator&
    std::cout << std::addressof(w) << "\n";  // Always returns true address
}
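In the listing above both lines print the same address, because the overload itself delegates to std::addressof. The difference only becomes visible when operator& returns something else. The sketch below assumes a hypothetical type Sneaky whose operator& always returns a null pointer:

```cpp
#include <memory>

// Hypothetical type whose unary & deliberately hides the real address.
struct Sneaky {
    int value = 0;
    constexpr Sneaky* operator&() { return nullptr; }
};

Sneaky sneaky_obj;

static_assert(&sneaky_obj == nullptr);                 // the overload is used
static_assert(std::addressof(sneaky_obj) != nullptr);  // the overload is bypassed
```

The second assertion relies on std::addressof being constexpr, which holds from C++17 onward.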
Note
Always use std::addressof in generic code where the type may have
an overloaded operator&. The standard library containers and algorithms
use std::addressof internally for this reason.