Daily bit(e) of C++ | Coroutines: step by step

Daily bit(e) of C++ #441, Coroutines: step by step; a journey through seven coroutine types.

Mar 17, 2024

The C++20 standard introduced support for coroutines. For most C++ users, using coroutines is a matter of following a library’s documentation that provides coroutine types. However, if you need to implement a custom coroutine type, this article is for you.

In this article, we will go over seven types of coroutines, each introducing more concepts.

Using coroutines

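Almost any function can be a coroutine (the exceptions include main, constructors, destructors, constexpr functions, and functions with a deduced return type); there are two requirements: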

the function body must contain at least one of the coroutine keywords: co_return, co_yield, co_await
the return type must conform to coroutine requirements (either directly or through std::coroutine_traits)

The compiler will then generate the required wrapping code to handle the details.

The standard library already provides one coroutine type, the std::generator (C++23), that we can use to implement lazily computed sequences of elements.

std::generator supports yielding of values and ranges of values using co_yield value and co_yield std::ranges::elements_of(range).

The std::generator provides a range interface, meaning that from the caller’s perspective, this is just a call to a function that produces a range.

std::generator is particularly handy when the non-coroutine implementation would have to rely on callbacks or eagerly compute the output. For example, implementing different tree traversals using std::generator results in a simple, ergonomic interface.

Coroutine types provided by other libraries may offer a different interface and behaviour, so always check the corresponding documentation.

Custom coroutines

The first thing to keep in mind is that C++ coroutines do not come with a pre-defined asynchronous model. Instead, the coroutine type can fit into almost any workflow.

Crafting a custom coroutine involves three main pieces: the return type (e.g., std::generator), the promise type (the high-level description of the coroutine behaviour), and the awaitable types (which control the low-level mechanics of how coroutines get suspended and resumed).

In the following sections, we will cover different types of coroutines, each example adding more complexity.

Crafting a routine

Considering a coroutine is a generalization of a routine, it is only fair to start by implementing a coroutine type that models a routine (a function that returns void).

A simple routine is also an excellent way to introduce an overview of all the available customization points exposed through the promise_type.

The promise_type can also be defined inline as a nested type without the indirection demonstrated in the example.

The two customization points we need to consider now are initial_suspend() and final_suspend(). Both functions return an awaitable type (which we will discuss later). initial_suspend() is called just before the coroutine body starts, and final_suspend() is called just after the coroutine body finishes.

The three main behaviours that can be modelled using an awaitable type are:

the coroutine continues running
the control is returned to the caller (or the last resumer)
the control is handed over to another coroutine

The standard offers two awaitable types, std::suspend_never (the coroutine continues running) and std::suspend_always (the control returns to the caller).

In the case of a routine, we want the coroutine to continue running until it finishes, which is why we use std::suspend_never for both initial and final suspend.

One problem with the above approach is that it is intrusive to the result type. Since we are implementing Routine ourselves, this is not an issue, but you might want to be able to write a coroutine that returns a type from a third-party library that you cannot modify. For that, the standard library offers the std::coroutine_traits customization point.

The specializations are matched against the result and argument types. In this case, we match a coroutine with any function arguments that returns Routine.

Crafting a function

For a routine, we could mostly ignore the return type itself. However, when we implement a function, we need a mechanism for the coroutine to communicate the result to the caller (which only has access to an instance of the return type).

The standard offers a std::coroutine_handle type to control the corresponding coroutine and access its promise type. To keep the following examples concise, we will wrap std::coroutine_handle (which has reference semantics) into an RAII object.

Note that we are explicitly destroying the coroutine in the destructor.

A simple solution for communicating the result to the caller is to store the result in the promise. If we give the caller access to the coroutine handle, the caller can read the stored value through the handle.

However, when a coroutine runs to completion without suspending at its final suspend point, the coroutine is destroyed together with its promise, so we need to keep the coroutine alive.

In the get_return_object, we give the caller access to the coroutine handle by passing it to the return object. In final_suspend, we return std::suspend_always, which will return the control to the caller (leaving the coroutine alive).

The return_value method adds support for the coroutine to invoke co_return with an argument. In this case, we allow any argument convertible to T; however, this is entirely up to the implementer. You can also rely on overload resolution by providing multiple implementations of return_value.

In our result type, we need to implement the corresponding constructor and helper to access the result of the coroutine.

Remember that since the coroutine is now kept alive, it is the return type’s responsibility to clean up; however, we have already dealt with that through our owning_handle RAII helper.

Handling exceptions

Having implemented a function coroutine type, it’s time to take a detour and discuss handling exceptions. When an exception is not caught within the coroutine, the compiler-generated code will call the unhandled_exception method.

Our two main options are to call std::terminate or to store the currently in-flight exception and re-emit it once the caller attempts to access the result.

To capture and re-emit an exception, we can use std::current_exception, std::exception_ptr and std::rethrow_exception.

For simplicity, the remaining examples will use the std::terminate variant.

Crafting a lazy function

The previous two examples (Routine and Function) were excellent tools for introducing the basics of coroutines. However, we can already write routines and functions even in C.

A lazy function is the first coroutine type that isn’t trivially replicated without coroutines.

The obvious change we need to make is to prevent the coroutine from running until the result is requested. This means changing the result type of initial_suspend to std::suspend_always, which will suspend the coroutine and return the control to the caller.

As a consequence of not starting the coroutine automatically, we move the responsibility for running the coroutine to the result type. We must ensure the coroutine has finished running when we attempt to access the result.

Note that the std::coroutine_handle provides the methods done and resume, which we are routing through our RAII wrapper.

Crafting a generator

While the standard library provides the std::generator, writing our own is still a good exercise and a solid introduction to co_yield.

In previous examples, the coroutine only generated one result. However, we can use co_yield to yield values repeatedly, with co_return serving only for early termination.

The support for co_yield mirrors the support for co_return with an argument. As in previous examples, we allow any argument that is convertible to T.

The important difference between co_return and co_yield is the awaitable result. This gives us yet another customization point, precisely to control what happens after the coroutine yields a value, with the usual three main options:

the coroutine continues running
the control is returned to the caller/resumer
the control is transferred to another coroutine

In the case of a generator, we want to return the control to the caller (so that the caller can read the generated value), meaning we can return the standard std::suspend_always.

The other side of the coin is the support inside the result type, which now needs to resume the coroutine to generate each value.

While this interface works, it is pretty cumbersome. We have to start the coroutine separately, check if it’s exhausted (which will resume the coroutine), and only then read the current value. The std::generator overcomes this complexity by supporting a range interface, so let’s do the same.

First, we need an iterator type.

To avoid issues with empty coroutines, we have to resume the coroutine when returning the begin iterator. If the coroutine immediately finishes, handle_.done() will return true, meaning that the begin iterator compares equal to std::default_sentinel.

With the range interface in place, we can now use the range for loop to iterate over all generated values.

Cooperative multitasking

So far, we have dealt with a single in-flight coroutine controlled by the caller. The more interesting use cases for coroutines typically involve multiple in-flight coroutines controlled by a scheduler/executor.

We can start with a simple case of cooperative multitasking in which multiple coroutines share the same execution thread and voluntarily give up control of this thread to let other coroutines run.

To achieve that, we will need to discuss awaitable types. We have already seen std::suspend_always (which returns control to the caller) and std::suspend_never (which lets the coroutine continue). We have also discussed three methods that return an awaitable: initial_suspend, final_suspend, and yield_value.

The piece of the puzzle that has been hidden away is the co_await keyword, mainly because, so far, this keyword has appeared only in the code generated by the compiler. The generated calls to the above methods have the form:

co_await promise().initial_suspend();
co_await promise().final_suspend();
co_await promise().yield_value(arg);

This means that we are invoking co_await on the instance of an awaitable type returned by these methods, which then, in turn, controls what happens next:

the coroutine continues running
the control is returned to the caller/resumer
the control is handed over to another coroutine

Importantly, nothing prevents us from directly invoking co_await in the coroutine body. For example, if we invoke co_await std::suspend_always{}, we will suspend the coroutine and return the control to the caller, who can destroy or resume the coroutine as desired. However, I mentioned a scheduler, so let’s start with that.

This is also the first time we use the type-erased std::coroutine_handle<>. The scheduler only manages the lifetime of coroutines and doesn’t need access to any of the specifics.

For the coroutine type, we return to a simple task with an important addition: a detach method that hands off the coroutine’s ownership to the scheduler.

The final piece is the awaitable type that will suspend a coroutine and let other coroutines run.

When the coroutine evaluates co_await WakeupAwaitable{}, it will be suspended, and the control will be returned to the caller, the scheduler.

Working with asynchrony

When interacting with asynchrony, you will typically need to craft the coroutines to fit your specific needs. However, one type of asynchrony that is reasonably straightforward to integrate with is time. It also introduces an excellent opportunity to talk about await_transform.

The primary change is in our scheduler, which now needs an ordered container for storing the coroutine handles, each of which also has a wakeup time attached.

The natural extension of the interface from the previous example would be co_await Scheduler{}.wake_up(time_point); however, that is a bit cumbersome. It would be a lot cleaner if we could write co_await time_point instead. We can achieve this using the await_transform method on the promise_type.

However, if the promise_type defines at least one await_transform, only types that match one of the await_transform overloads will be usable with co_await.

Awaitable coroutines

The final type of coroutine we will introduce is an awaitable coroutine. In the previous example, we had the coroutine wait for an event.

While event-based triggering is pretty typical in asynchronous systems, sometimes we want to wait for the completion of an operation. It would also be very convenient to model these operations as coroutines.

You might consider simply calling another coroutine directly; however, then you run into two issues:

How do you resume the parent coroutine when the operation finishes (considering that the operation might need to suspend to wait for events or other operations)?
In a system heavily relying on coroutines, we can end up with millions of in-flight coroutines, which can potentially deeply nest (a lot more aggressively than functions). How do we avoid running out of stack space?

These issues can be addressed by adding a layer of indirection through awaitable types that rely on symmetric transfer.

We start with the basic Task interface, which we extend with the awaitable type interface.

When we co_await coroutine(), we want the coroutine to suspend, producing an instance of the result type. The co_await operator will then find the awaitable interface, which, in this case, uses the coroutine handle version of await_suspend.

We return false from await_ready to suspend the caller and then return the callee’s handle from await_suspend to start its execution. However, we also remember the caller so that we can resume it once the callee finishes execution.

While this all works, you might wonder whether you can apply the same logic to functions, and in fact, you can. The result of the co_await expression is the result of await_resume.

This allows us to return values from our awaitable coroutines and mix and match tasks with functions.

Conclusion

In this article, we went over seven types of coroutines:

routine
function
lazy function
generator
cooperative tasks
cooperative tasks with asynchrony
awaitable coroutines

While this is already a lot of information, we have still completely ignored many topics. Leave a comment if you would like a follow-up article covering more advanced topics, such as multi-threading with coroutines or custom allocators.
