Daily bit(e) of C++ | Learn Modern C++ 6/N
Daily bit(e) of C++ #153, A Modern-only C++ course (including C++23), part 6 of N: Standard algorithms
Today we are following up on the previous lesson, finishing our exploration of the standard library. This time, we will focus on algorithms which cover a large portion of the standard library. If you missed the previous lesson, you can find it here:
Before discussing algorithms, we need to revisit iterators since they form the abstraction layer upon which all algorithms are implemented.
Iterators
We have already mentioned iterators in lesson four. Iterators are weak reference types that abstract the internal structure of a container and provide an interface that only exposes constant time operations.
input and output iterators: only forward iteration, each element can be read once
data streams: e.g. reading from a network socketforward iterators: only forward iteration, each element can be read multiple times
singly-linked lists: e.g. std::forward_listbidirectional iterators: as above + backward iteration
doubly-linked lists: std::list, std::map, std::setrandom access iterators: as above + move by integer offset and calculate the distance between two iterators
multi-array structures: std::dequecontiguous iterators: as above + the storage is contiguous
arrays: e.g. std::vector
The distinction between different types of iterators comes into play when implementing code that should work with different containers.
Let’s switch to a more concrete example to demonstrate. Consider a simple palindrome check. We are given a container and need to determine whether it is a palindrome.
There isn’t anything wrong with the above approach; however, we are implementing this functionality for one specific type - std::vector. Therefore, it would make sense to lift this into a function, and at that point, we probably do not want to have a variant of this function for every container.
To abstract from details of the container, we can rely on iterators. In this case, we require forward and backward iteration, i.e. bidirectional iterators.
Note that we are less efficient than the previous implementation using indices, as we always traverse the entire container twice and do not meet in the middle. One way to overcome this problem is to utilize the std::counted_iterator adapter.
Another iterator adapter that we will use frequently is std::back_inserter. This adapter can adapt a container, and for each write to the iterator, it will call the corresponding push_back() member function.
For a complete overview of the iterator adapters provided by the standard library, check the relevant cppreference.com section.
Ranges
So far, we talked about iterators; however, throughout, you might have noticed the implicit concept of a range formed by the half-open interval [begin, end).
The C++20 standard introduced library-level tools that formalize this notion. A range is a pair of an iterator and a sentinel (we have seen a sentinel in action in the previous example with counted iterator). You can still conceptually think about the sentinel as an end. However, it no longer has to be the same type as the iterator; it only needs to be comparable.
Ranges follow the same categories as iterators, naturally based on the category of the iterator used to express the range (e.g. random-access range).
This is the first time we have used a template function which works similarly to type templates, except the template arguments can be deduced (same rules as auto) from the arguments the function was called with.
On top of that, we have also constrained our template argument with a concept. Where template <typename Rng> would accept any type, template <std::ranges::bidirectional_range Rng> will only accept a bidirectional range.
This version of the palindrome check is much better; however, there is still one fundamental problem. Why do we write custom logic at all? This is where algorithms and views come in.
Algorithms
The standard library provides roughly 150 algorithms and views; therefore, we won’t be going over all of them. I have written a book on standard C++ algorithms that you can grab entirely for free: https://github.com/HappyCerberus/book-cpp-algorithms.
Whenever you are writing a loop, consider whether there is an algorithm that already encapsulates this functionality. You will save time in writing and testing the custom logic. On top of that, when you see in your code std::sort(), you will immediately know what is happening and won’t have to study the custom logic you wrote a couple of months ago.
The standard algorithms have undergone several iterations throughout the C++ standards. This, unfortunately, means that when looking for a specific algorithm, you will get multiple results, and the descriptions will contain multiple versions for different algorithm variants.
Fortunately, with C++20, we got a new version of the algorithms library, and with C++23, this library is mostly complete. So for this course, we are sticking with the algorithms from std::ranges.
Sorting
The main three sorting algorithms that will use regularly are std::ranges::sort - a basic O(n*logn) sort; std::ranges::stable_sort, which is typically less performant but maintains the relative order of elements that compare equal and finally, std::ranges::partial_sort that only sorts up to the provided iterator, leaving the rest of the elements in arbitrary order. The partial sort comes particularly handy where we are only interested in the top k elements, as it runs in O(n*logk).
Default arguments and projections
The previous example introduced a couple of concepts we should talk about. Both functions and templates can specify default values for their arguments.
With this intro, let’s look at (slightly modified) std::ranges::stable_sort, notably the range overload:
The first argument is constrained to a std::ranges::random_access_range.
The second argument is optional: the comparator, a binary predicate. If comp(x,y) == true, then x needs to be ordered before y. The default type for this comparator is std::ranges::less, which will provide the same order as x<y.
The third argument is the projection (also optional). Before we apply our comparator to the two elements, we can first project them. This can be as simple as the default std::identity, which returns the original passed-in element. Alternatively, we can project to a member or a value produced by a member function or even completely transform the element.
In the above example, we use the default comparator but specify a custom projection.
Lambdas
In the previous example, we only modified the projection and only to a member. What if we want to provide a custom comparator or write a custom projection?
We have three options: a free function, a function object or a lambda.
The main benefit of using lambdas is that we keep our customization code close to where it is used, and since the compiler automatically generates the type for each lambda, we do not have to come up with unique names, such as CompareItemGreaterByLabel.
The above use of lambda is equivalent to a free function. Additionally, we can introduce an internal state to a lambda by capturing some of the local variables. Capturing lambdas are equivalent to function objects (like GreaterObj in the above example).
Operations on sorted ranges
Once we have a sorted range, we unlock a lot of functionality. For example, elements can be looked up in O(logn) time: std::ranges::lower_bound, std::ranges::upper_bound, std::ranges::equal_range.
Using the std::ranges::subrange utility, we can form arbitrary subranges for contexts where a range is expected (such as the range-for loop).
The std::ranges::equal_range gets us both bounds in one call.
Sorted ranges can be merged, maintaining sorted order in O(n): std::ranges::merge, std::ranges::inplace_merge.
In-place merge is a fundamental building block of a merge sort.
Two sorted ranges can be checked whether one is a subset of another using std::ranges::includes, also in O(n).
Copy and transform
To copy between containers of different types, we can use the std::ranges::copy algorithm. If we also desire to change the elements on the way the std::ranges::transform.
The standard also offers many algorithms that change the order of elements, notably std::ranges::rotate_copy.
Folds
Fold algorithms are strictly ordered reductions, folding each element into an accumulator, i.e. repeatedly evaluating accumulator = op(accumulator, element);. The standard offers both left-to-right std::ranges::fold_left and right-to-left std::ranges::fold_right folds.
Lookups
The standard offers algorithms to lookup elements by value, by a predicate in either direction, lookup ranges in other ranges, or even find a mismatch between two ranges.
The simplest std::ranges::find algorithm will return the first instance in a left-to-right direction, matching a value or a predicate for std::ranges::find_if.
To check whether two ranges have the same content, we can use the std::ranges::equal algorithm, which returns a boolean. Alternatively, we can use the std::ranges::mismatch algorithm that returns a pair of iterators to the mismatched elements (or simply a pair of end iterators if there is no mismatch).
Min-Max
Finally, the std::ranges::min and std::ranges::max algorithms can pick out the minimum or maximum values from either a pair of values or a range.
The std::ranges::clamp will clamp the provided value in between the two specified bounds.
Views
We have already seen views in action in the initial palindrome example. Views address the core downside of algorithms. Algorithms are eagerly evaluated, meaning that when we want to chain a sequence of operations, we have to contend with intermediate copies of data.
The syntax follows the UNIX pipe syntax, each view being lazily applied to the elements of the previous view in the pipeline. The standard offers a comprehensive set of views.
On top of that, views are composited at compile time, meaning that we do not have to wrap the views in a function to introduce a custom compound view. Instead, a simple constexpr variable is enough.
However, once applied to a range and due to lazy evaluation, views are stateful. Notably, they will hold iterators to the underlying range. This means that the usual iterator invalidation rules apply, and even when iterators are not invalidated, mutating the underlying range can lead to unexpected results.
Homework
As usual, to practice the content of this lesson, you have a couple of homework assignments.
The template repository with homework for this lesson is here: https://github.com/HappyCerberus/daily-bite-course-06.
As with all homework, you will need VSCode and Docker installed on your machine and follow the instructions from the first lesson.
The goal is to make all tests pass as described in the readme file.
Great thanks, I'm not used to VS Code either, so there are so many pain points, I tried eclipse for this course before, with which I'm more familiar. But didn't know how to configure it for bazel.
Still facing issues, when I debug the program having put this in the launch.json:
```
"program": "${workspaceFolder}/bazel-bin/format/format_test",
```
VS Code then complains that `/workspaces/daily-bite-course-06/format/format_error` does not exist. I'm running it in the container, so I should have the same as you, but can't figure out why it is removing the `bazel-bin` part of the path.
Also, I don't think it's related but, VS Code also complains that it cannot open "catch2/catch_test_macros.hpp: No such file or directory gcc" in the include. When I run it with bazel from the command line it builds and runs OK. So when I try to launch it says gcc could noot compile it.
I know it doesn't need to because bazel generated the runnable. But I also don't know how to make VS Code debug the program because it doesn't find it.
Yes, I remember something about that, but I couldn't understand it at the moment, and as it said you would talk more about debugging later I just kept going.
Moreover I still can't figure out which is the bazel binary I have to set in the `launch.json`. Never used bazel before. Do I have to modify the `launch.json` for each test?
I guess I should also run the bazel build with debug before, because I don't see the `bazel-bin/debugging` directory in the dist.
I guess with this I have enough to try and make it work, but If you could add a more detailed guide for that it would be easier for other starters.
Thanks.