Daily bit(e) of C++ | Well-behaved paths in a tree
Daily bit(e) of C++ #184, Common interview problem: Well-behaved paths in a tree.
Today we will look at a common C++ interview problem: Well-behaved paths in a tree.
Given a tree, represented using two arrays of length n:
an array of node values, with values represented by positive integers
an array of edges represented as pairs of indexes
Return the number of well-behaved paths. A well-behaved path begins and ends in a node with the same value, with all intermediate nodes being either lower or equal to the values at the ends of the path.
For example, in the above tree, we have six total well-behaved paths: five trivial single-node paths and only one non-trivial path spaning between the two nodes with values 3.
Before you continue reading the solution, I encourage you to try to solve it yourself. Here is a Compiler Explorer link with a couple of test cases: https://compiler-explorer.com/z/o5Gee9v79.
Solution
This is a reasonably tricky problem. If we had only a binary tree, we could use a DFS and calculate the number of paths for each parent node based on the available values and the number of paths in the subtrees. Unfortunately, this approach would blow up in complexity with a general tree. So we need something more sophisticated.
Instead of considering the parent nodes as connection points, let’s consider the edges (each edge divides the tree into two sub-trees).
As a reminder, a valid path requires that both ends have the same value, and all intermediate nodes are, at most, equal to the ends of the path. This hints at an ordered approach.
And, if we process the edges in the order of their maximum value, i.e. std::max(value[node_left], value[node_right]), we get some interesting guarantees.
The maximum value in either of the connected subtrees is at most std::max(value[node_left], value[node_right]), and in fact, that has to be the maximum value in at least one of the subtrees.
If the maximum value in one of the subtrees is lower than std::max(value[node_left], value[node_right]), no valid paths are crossing this edge (since the maximum node creates a natural barrier).
If the maximum value in both of the subtrees is std::max(value[node_left], value[node_right]), then this edge adds freq_of_max[left]*freq_of_max[right] valid paths. From each node with maximum value in the left subtree to each node with maximum value in the right subtree.
While this sounds like we have solved the problem, we have created a big issue for ourselves. We need to track the frequencies for the maximum (fortunately, only the maximum) value in each connected subtree, and we need to do it in a way that allows us to look up this value based on any node that is part of this tree (since the newly added edge can connect to any of them).
Fortunately, the UnionFind algorithm allows us to keep track of a connected subtree by tracking a representative node for each subtree (and each node of that subtree). In our case, we want the representative node to have the maximum value. The part that we care about here is the complexity of the Find operation, which is O(α), where α is the inverse Ackerman function (which is <5 for all practical inputs), i.e. O(1) for practical purposes.
With that, we have an O(n*logn + n*α) solution. O(n*logn) for the initial sort and O(n*α) for the main loop. Because O(n*logn) will dominate, we can simplify it to O(n*logn).