Daily bit(e) of C++ | Find the duplicate value
Daily bit(e) of C++ #310, Common interview problem: Find the duplicate value.
Today, we will look at a common C++ interview problem: Find the duplicate value.
Given an array of length n+1 that contains the integers 1..n with one duplicate, return the duplicate.
Your solution should have O(n) time complexity and O(1) space complexity.
For example, for the input {7,1,6,9,3,4,9,5,2,8}, the duplicate value is 9.
Before you continue reading the solution, I encourage you to try to solve it yourself. Here is a Compiler Explorer link with a couple of test cases: https://compiler-explorer.com/z/GnvGcYTdz.
Solution
There are plenty of straightforward solutions that either modify the input or use additional memory.
The O(1) space solution is a bit tricky. This problem is an obfuscated version of the “Find cycle in a linked list” problem, specifically the variant where we need to find the start of the cycle.
Since our array contains n+1 elements whose values are 1..n, we can treat each element as a node that points to another index. The index that is pointed at twice is our duplicate.
Additionally, we do not need to handle cycles of size one since we have a guarantee that nothing points to zero, and the only way we can reach a cycle of size one is if it is our duplicate.
To find our cycle, we can use the tortoise and hare technique, i.e. two pointers, one advancing once, the other advancing twice in each step. Once they meet, we know that we are on our cycle, and we can calculate the position of the start of the cycle.
Consider that the slow pointer moved x steps before entering the loop and then y steps after entering the loop for a slow = x + y total.
The fast pointer moved similarly. It also moved x steps before entering the loop and then y steps after entering the loop when it met up with the slow pointer; however, before that, it did an unknown number of loops: fast = x + q*loop + y. Importantly, we also know that the fast pointer also did 2*slow steps.
If we put this together, we end up with the following:
2*(x + y) = x + q*loop + y
x = q*loop - y
This means that the number of steps to reach the loop is the same as the number of steps remaining from where the pointers met up to the start of the loop.
So, to find the start of the loop, we can iterate from the start and the meeting point. Once these two new pointers meet, we have our loop start.