I had a look at question already which talk about algorithm to find loop in a linked list. I have read Floyd\'s cycle-finding algorithm solution, mentioned at lot of places
Here is a intuitive non-mathematical way to understand this:
If the fast pointer runs off the end of the list obviously there is no cycle.
Ignore the initial part where the pointers are in the initial non-cycle part of the list, we just need to get them into the cycle. It doesn't matter where in the cycle the fast pointer is when the slow pointer finally reaches the cycle.
Once they are both in the cycle, they are circling the cycle but at different points. Imagine if they were both moving by one each time. Then they would be circling the cycle but staying the same distance apart. In other words, making the same loop but out of phase. Now by moving the fast pointer by two each step they are changing their phase with each other; Decreasing their distance apart by one each step. The fast pointer will catch up to the slow pointer and we can detect the loop.
To prove this is true, that they will meet each other and the fast pointer will not somehow overtake and skip over the slow pointer just hand simulate what happens when the fast pointer is three steps behind the slow, then simulate what happens when the fast pointer is two steps behind the slow, then when the fast pointer is just one step behind the slow pointer. In every case they meet at the same node. Any larger distance will eventually become a distance of three, two or one.
If there is a loop (of n nodes), then once a pointer has entered the loop it will remain there forever; so we can move forward in time until both pointers are in the loop. From here on the pointers can be represented by integers modulo n with initial values a and b. The condition for them to meet after t steps is then
a+t≡b+2t mod n which has solution t=a−b mod n.
This will work so long as the difference between the speeds shares no prime factors with n.
Reference https://math.stackexchange.com/questions/412876/proof-of-the-2-pointer-method-for-finding-a-linked-list-loop
The single restriction on speeds is that their difference should be co-prime with the loop's length.
If the fast pointer moves 3
steps and slow pointer at 1
step, it is not guaranteed for both pointers to meet in cycles containing even number of nodes. If the slow pointer moved at 2
steps, however, the meeting would be guaranteed.
In general, if the hare moves at H
steps, and tortoise moves at T
steps, you are guaranteed to meet in a cycle iff H = T + 1
.
Consider the hare moving relative to the tortoise.
H - T
nodes per iteration.Given a cycle of length N =(H - T) * k
, where k
is any positive
integer, the hare would skip every H - T - 1
nodes (again, relative
to the tortoise), and it would be impossible to for them to meet if
the tortoise was in any of those nodes.
The only possibility where a meeting is guaranteed is when H - T - 1 = 0
.
Hence, increasing the fast pointer by x
is allowed, as long as the slow pointer is increased by x - 1
.
Consider a cycle of size L, meaning at the kth element is where the loop is: xk -> xk+1 -> ... -> xk+L-1 -> xk. Suppose one pointer is run at rate r1=1 and the other at r2. When the first pointer reaches xk the second pointer will already be in the loop at some element xk+s where 0 <= s < L. After m further pointer increments the first pointer is at xk+(m mod L) and the second pointer is at xk+((m*r2+s) mod L). Therefore the condition that the two pointers collide can be phrased as the existence of an m satisfying the congruence
m = m*r2 + s (mod L)
This can be simplified with the following steps
m(1-r2) = s (mod L)
m(L+1-r2) = s (mod L)
This is of the form of a linear congruence. It has a solution m if s is divisible by gcd(L+1-r2,L). This will certainly be the case if gcd(L+1-r2,L)=1. If r2=2 then gcd(L+1-r2,L)=gcd(L-1,L)=1 and a solution m always exists.
Thus r2=2 has the good property that for any cycle size L, it satisfies gcd(L+1-r2,L)=1 and thus guarantees that the pointers will eventually collide even if the two pointers start at different locations. Other values of r2 do not have this property.
There is no reason that you need to use the number two. Any choice of step size will work (except for one, of course).
To see why this works, let's take a look at why Floyd's algorithm works in the first place. The idea is to think about the sequence x0, x1, x2, ..., xn, ... of the elements of the linked list that you'll visit if you start at the beginning of the list and then keep on walking down it until you reach the end. If the list does not contain a cycle, then all these values are distinct. If it does contain a cycle, though, then this sequence will repeat endlessly.
Here's the theorem that makes Floyd's algorithm work:
The linked list contains a cycle if and only if there is a positive integer j such that for any positive integer k, xj = xjk.
Let's go prove this; it's not that hard. For the "if" case, if such a j exists, pick k = 2. Then we have that for some positive j, xj = x2j and j ≠ 2j, and so the list contains a cycle.
For the other direction, assume that the list contains a cycle of length l starting at position s. Let j be the smallest multiple of l greater than s. Then for any k, if we consider xj and xjk, since j is a multiple of the loop length, we can think of xjk as the element formed by starting at position j in the list, then taking j steps k-1 times. But each of these times you take j steps, you end up right back where you started in the list because j is a multiple of the loop length. Consequently, xj = xjk.
This proof guarantees you that if you take any constant number of steps on each iteration, you will indeed hit the slow pointer. More precisely, if you're taking k steps on each iteration, then you will eventually find the points xj and xkj and will detect the cycle. Intuitively, people tend to pick k = 2 to minimize the runtime, since you take the fewest number of steps on each iteration.
We can analyze the runtime more formally as follows. If the list does not contain a cycle, then the fast pointer will hit the end of the list after n steps for O(n) time, where n is the number of elements in the list. Otherwise, the two pointers will meet after the slow pointer has taken j steps. Remember that j is the smallest multiple of l greater than s. If s ≤ l, then j = l; otherwise if s > l, then j will be at most 2s, and so the value of j is O(s + l). Since l and s can be no greater than the number of elements in the list, this means than j = O(n). However, after the slow pointer has taken j steps, the fast pointer will have taken k steps for each of the j steps taken by the slower pointer so it will have taken O(kj) steps. Since j = O(n), the net runtime is at most O(nk). Notice that this says that the more steps we take with the fast pointer, the longer the algorithm takes to finish (though only proportionally so). Picking k = 2 thus minimizes the overall runtime of the algorithm.
Hope this helps!
Let us suppose the length of the list which does not contain the loop be s
, length of the loop be t
and the ratio of fast_pointer_speed
to slow_pointer_speed
be k
.
Let the two pointers meet at a distance j
from the start of the loop.
So, the distance slow pointer travels = s + j
. Distance the fast pointer travels = s + j + m * t
(where m
is the number of times the fast pointer has completed the loop). But, the fast pointer would also have traveled a distance k * (s + j)
(k
times the distance of the slow pointer).
Therefore, we get k * (s + j) = s + j + m * t
.
s + j = (m / k-1)t
.
Hence, from the above equation, length the slow pointer travels is an integer multiple of the loop length.
For greatest efficiency , (m / k-1) = 1
(the slow pointer shouldn't have traveled the loop more than once.)
therefore , m = k - 1 => k = m + 1
Since m
is the no.of times the fast pointer has completed the loop , m >= 1
.
For greatest efficiency , m = 1
.
therefore k = 2
.
if we take a value of k > 2
, more the distance the two pointers would have to travel.
Hope the above explanation helps.