Daily bit(e) of C++ | Edit distance
Daily bit(e) of C++ #23, Common interview problems: Edit distance
Today we will look at a common C++ interview problem: calculating the edit distance between two strings.
The edit distance is the minimum number of edits required to change the first string into the second one.
The permitted edits are: add a character, remove a character and replace a character.
In this example, the edit distance is 3 with two replace operations and one add.
Before you continue reading the solution, try to solve it yourself. Here is a compiler explorer link with a couple of test cases: https://compiler-explorer.com/z/v6cqM7T6M.
Solution
The trivial approach would be to sweep across the two strings, trying all three operations whenever we encounter a difference.
Unfortunately, this quickly explodes. If the two strings had no overlap, we would multiply the number of states by 3 for each character, leading to O(3^N)
time complexity.
Therefore, we need something better.
As is typical with this type of problem, it's worth considering whether we can calculate the number of edits from the number of edits for shorter strings, which could allow us to calculate the number of edits iteratively.
Let's consider the three operations:
for addition, if we know the edit distance between
"abc"
and"xyz"
(let's call itdist
), then the edit distance between"abc"
and"xyzR"
is simplydist+1
for removal, if we know the edit distance between
"abc"
and"xyz"
, then the edit distance between"abcP"
and"xyz"
is simplydist+1
for replacement, if we know the edit distance between
"abc"
and"xyz"
, then the edit distance between"abcP"
and"xyzR"
isdist+1
ifP!=R
ordist
ifP==R
The only piece we still need is determining the minimum of edits. To do that, we need to consider all three operations and take the minimum of these numbers.
Since we do this iteratively, the final result is guaranteed to be the minimum.