Daily bit(e) of C++ | Learn Modern C++ 4/N

Daily bit(e) of C++ #118, A Modern-only C++ course (including C++23), part 4 of N: Indirection

Apr 29, 2023

Welcome to the fourth lesson of the Learn Modern-only C++ course, which I’m running as a sub-series of my Daily bit(e) of C++.

In today’s lesson, we will take a deep dive into indirection. First, we will cover the language-level facilities: pointers and references. Then, we will review the most frequently used standard library types that utilize indirection: iterators, std::string_view and std::span.

If you missed the previous lesson, you can find it here: Daily bit(e) of C++ | Learn Modern C++ 3/N (Šimon Tóth, April 15, 2023).


Until now, we have been sticking with value semantics. However, if you were paying attention in the previous lesson, you might have noticed something peculiar: calling data.push_back(...) modified the vector data itself, not a copy of it.

This doesn’t quite fit in with value semantics. So how exactly is push_back able to modify the object “data”? The answer is indirection, and today, we will take a deep dive into the various ways you can achieve indirection in C++.

Pointers and member functions

C++ pointers serve as typed memory addresses. The key feature we get from pointers is indirection. By storing the memory address of a variable in a pointer, we can easily modify the value located at that specific address without direct access to the original variable.

Since a pointer itself is merely a value, it adheres to value semantics.

We typically place the * next to the variable name, not the type. This might seem counterintuitive; however, it follows the C++ parsing rules. You might come across a piece of code like this: int *a, b; In this code, a is a pointer to an int, but b is simply an int. This is why it is preferable to declare each variable separately: that way, we avoid any confusion.

When working with compound types, the dereferencing syntax can become unwieldy: accessing a member through a pointer requires (*ptr).member. To alleviate this, C++ offers a more concise shorthand, the arrow operator: ptr->member.

Reference semantics

The indirection unlocked by pointers is also referred to as reference semantics. With reference semantics, creating a copy (of a pointer) creates another reference to the same original state; the original variable itself is not copied, and modifications through either copy affect it.

Member functions

We can finally cycle back to our original question. How can the member function push_back modify the object it is invoked on?

In C++, non-static member functions have access to a pointer to the object they were invoked on. This pointer is named “this”, and you can think of it as a hidden first argument of the member function.

If there isn’t a name collision, name lookup will find the members even without the explicit “this->” prefix.

Avoid shadowing existing names in your code because it significantly increases code complexity.

Immutability

Indirection is a powerful tool; however, we pay for that power with clarity. Once a function can access remote data, it becomes much harder to reason about its behaviour. One way to bring some semblance of order back into the mix is through immutability.

We have already talked about the tool to achieve immutability, the const keyword. In the context of pointers, we care about the referenced type being const. This effectively creates an immutable view of the original variable.

The usual const rules apply. If we have a pointer to a const type, we can only read from the referenced variable and cannot upgrade to a pointer to a mutable type. On the other hand, a pointer to a mutable type can be implicitly “downgraded” to a pointer to a const type.

Note that, as with any variable, we can also mark the pointer itself as const (const int64_t * const x). However, as with regular variables, marking the variable itself as immutable has little use.

In the previous section, we talked about member functions; const can also appear as a qualifier on a member function. Remember that member functions come with a hidden argument called “this”, which points to the object the function was invoked on. Because this argument is hidden, we cannot spell out its type directly; instead, we annotate the member function with const, which turns “this” into a pointer to const.

Nullptr

While we typically do not care about the actual value a pointer is storing, one unique value is very significant: the “nullptr”. This represents the situation where the pointer isn’t pointing to anything.

One of the typical use cases is to represent optional arguments or values, with nullptr meaning “no value”.

Safety, lifetime, ownership

An important aspect of reference types in C++ is that they are typically weak references, meaning they do not imply ownership over the original data.

This avoids any overhead; however, as a consequence, it is entirely up to the developer to ensure that the reference type does not outlive its source variable.

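Typically, in Modern C++, you won’t be using raw pointers; however, the same problem (a reference that outlives the variable it refers to, often called a dangling reference) can occur with any weak reference type. Keep your lifetimes hierarchical: the referenced variable should always outlive the reference.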

The one case where we still rely on a raw pointer is passing a non-owning pointer to an immutable type to a function.

If you see (or write) this type of function, the intention is:

1. it is the responsibility of the caller to ensure that the pointer is either nullptr or points to an object that will stay alive for the duration of the function call
2. the function is only expected to read the state of the object
3. the function should not store this pointer for later use (since that would break the contract in 1.)

References

We will revisit pointers later in the course. For now, we will stick with another built-in tool for reference semantics: references.

You might have noticed that using pointers is relatively heavy in syntax. For example, we must use the address-of operator to obtain an address. To access the original value, we must dereference and check for nullptr while at it.

A reference is typically implemented as a pointer internally. Unlike a pointer, a reference must be initialized, cannot be re-pointed, and syntactically behaves as the referenced variable. This means we do not have to use either the address-of or the dereference operator.

Because a reference behaves as the original variable, creating a reference to a reference creates a reference to the original variable (unlike with pointers).

One of the use cases for references is operator overloading. We will go over this topic in detail later in the course. For now, let’s look at stream insertion and extraction. This is how you can enable stream input and output for custom types.

Safety

Because references are pointers (with better syntax), most of the same safety rules apply.

On top of that, we have one more case to consider. In Modern C++, a function would rarely return a raw pointer. However, the same cannot be said about references (for example, stream insertion and extraction operators return a reference to the stream).

When returning a reference, it is essential to consider the lifetime of the object we are returning a reference to. You should never return a reference to a local variable. The two valid cases are pass-through, where we return one of the arguments we took by reference, and getters on compound objects.

Revisiting the range-for-loop

References unlock the full potential of the range-for-loop. Not only can we avoid copying each element (which is wasteful), but if we use a mutable reference, we can modify the original elements of the container as we iterate over them.

Iterators

Iterators are one of the most important abstractions in the standard library. So far, we have only talked about std::vector, std::array and std::string. These three containers are similar: they all store their elements in a contiguous block of memory.

However, the standard library also offers other containers, for example, std::list (a doubly-linked list). This creates a problem. How do you write code that works for both a std::vector and a std::list?

At a minimum, we want iteration over all elements to work the same, no matter the underlying storage structure. This is where iterators come in. Iterators are grouped into the following categories based on their capabilities, that is, which operations they can perform in constant time:

input and output iterators: only forward iteration, each element can be read (input) or written (output) only once; e.g. data streams, such as reading from a network socket
forward iterators: only forward iteration, each element can be read multiple times; e.g. singly-linked lists, such as std::forward_list
bidirectional iterators: as above + backward iteration; e.g. doubly-linked lists and tree-based containers: std::list, std::map, std::set
random access iterators: as above + move by integer offset and calculate the distance between two iterators; e.g. multi-array structures: std::deque
contiguous iterators: as above + the storage is contiguous; e.g. arrays, such as std::vector

For now, we will remain on the user side of the code, and the above serves mainly as a reference of what operations you can expect to be available and fast when working with various data structures.

Importantly, each data structure provides access to begin and end iterators, representing the iterator to the first element and an iterator to one past the last element. This creates a half-open interval [begin, end).

As noted in the previous lesson, std::vector might need to reallocate its internal storage to grow its capacity. When it does, all iterators (as well as pointers and references) to its elements are invalidated.

auto

This is an excellent point to sneak in the basics of auto. You might have noticed that using iterators without auto is a bit cumbersome. To spell out the iterator type, we must go through the parent container, e.g. std::vector<int>::iterator.

auto circumvents that by providing type deduction. In simplest terms, we don’t spell out the actual type; instead, the type is deduced from whatever the variable is initialized with.

std::string_view and std::span

Besides single-element reference types, the standard library also offers two types that refer to a sequence of elements.

The std::string_view is a reference type for strings and can be constructed from std::string and string literals.

Referring to string literals is OK because they have static storage duration: they live for the entire run of the program (string literals are the only kind of literal with this property). However, all previous rules about lifetimes still apply; holding a reference to an object outside of its lifetime is undefined behaviour.

The std::span is a similar reference type; however, unlike std::string_view, which is specialized for strings and offers appropriate manipulation options, std::span can reference any contiguous sequence of elements.

On top of that, std::span can be a mutable reference type: through a std::span<T> with a non-const T, we can modify the underlying data (unlike std::string_view). For a read-only view, use std::span<const T>.