Daily bit(e) of C++ | Minimum substring with all letters
Daily bit(e) of C++ #63, The common interview problem: minimum substring that contains all letters from a second string.
Today we will look at a common C++ interview problem: A minimum substring that contains all letters.
Given two strings, find the minimum substring of the first string that contains all letters (with duplicates) from the second string.
For example, for the strings “abbcba”
and “abc”
the output is “cba”
.
Before continuing to read the solution, I encourage you to try to solve it yourself. Here is a link to Compiler Explorer with a couple of test cases: https://compiler-explorer.com/z/T8ojbYh1r.
Solution
When we look for a minimum window, we can use the “expand right, shrink left” approach.
We start with an empty window and expand to the right until we reach a point when the window satisfies our condition. At this point, we have the first window that satisfies the condition, but this window isn’t necessarily minimal since we can potentially shave off some of the left elements while maintaining the condition.
This is where the shrink-left part of the process comes in, removing the leftmost element until we no longer satisfy the condition. We can repeat the process and find the next window that satisfies the condition we just violated.
This approach provides a linear scan over the input, meaning that we end up with O(N*Q) complexity, where N is the length of the input and Q is the complexity of the condition check.
In our case, we need to check whether the window contains all the letters (with the correct frequencies). We can keep the count of frequencies as we iterate over the input in a std::unordered_map
. However, if we would validate these frequencies trivially, we could end up with O(M*N) complexity, where M is the length of the input string, and N is the length of the target string.
We can shift this to O(M+N), but we need one more trick. Instead of checking whether a given window has the appropriate number of characters, we can track how many letters from our target string are in the current window. Once the number of letters equals the length of the target string, we know that we have all of them. To avoid overcounting some letters, we must keep track of the frequencies and initialize those frequencies based on our target string.