Sunday, November 1, 2015

Problem 14 : Longest Collatz Sequence

Longest Collatz sequence

Problem 14

The following iterative sequence is defined for the set of positive integers:
n → n/2 (n is even)
n → 3n + 1 (n is odd)
Using the rule above and starting with 13, we generate the following sequence:
13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1
It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.
Which starting number, under one million, produces the longest chain?
NOTE: Once the chain starts the terms are allowed to go above one million.
                                                                                                       (https://projecteuler.net/problem=14)
Problem 14 is about Collatz sequence.

 f(n) = \begin{cases} n/2 &\text{if } n \equiv 0 \pmod{2}\\ 3n+1 & \text{if } n\equiv 1 \pmod{2} .\end{cases}

Assuming the Collatz conjecture that this process will eventually reach the number 1, regardless of which positive integer is chosen initially, we will find the longest chain under one million.

This problem can be solved using something similar to the Sieve of Eratosthenes.

Basically, the computer is calculating and memorizing the information (storing the information into an array) and reusing it later to reduce the time, as there are going to be lots of repetitions and redundancies in calculation.

The first step is simple.

If you take any number under half of one million, or five hundred thousand, that number will not be the longest chain.

Why does that happen? Let's say you choose 40 and 80.

40 has 9 terms in its sequence. 80, which is double of 40, will go through extra n / 2 step in the beginning, and therefore will have 10 terms in its sequence.

Aha! So if any number within the range of 1 to 500,000 will have shorter chain length than its doubled value, as long as they are in the range also.

So here is our limit:

private static final int LIMIT = 1000000;

...

for (int i = LIMIT/2 + 1; i <= LIMIT; i++)
{...}

And keep in mind that we don't need to find the length of the longest chain, we just need the starting number that has the longest chain.

That's the reason why I'm not going to keep track of the actual length; I will just record the relative length of each chain.

We are going to make an array that will store the information.

private static int[] collatz = new int[LIMIT+1];

Now, let's look at the method that will store the information.

// fill the collatz array : memorization
public static void fillCollatz(int n)
{...}

We initialize the variables that will keep track of the chain length and the current value of calculation.

int count = 0;
int current = n;

If the number is even number, then we will divide it by 2.

if (current%2 == 0) current /= 2;

Once it is divided by 2, it will be smaller than our starting value.

If the current value is smaller than the starting value, then that means somewhere in the previous steps (in our for-loop) we already calculated the chain length for that current value as a starting value.

And here is the advantage of memorization; we just skip the entire step and save tons of time.

collatz[n] = collatz[current] + count;

Here is an example. Say we have 40 and 80, like we did before.

Let's say the computer already calculated the chain length for 40 already (9 terms) and stored in the array.

  step: value

     0: 40
     1: 20
     2: 10
     3: 5
     4: 16
     5: 8
     6: 4
     7: 2
     8: 1

And the computer has to calculate the chain length for 80 now.

The computer will divide 80 by 2, and get 40, and current chain length will be 1.

And it will recognize that it already calculated the chain length for 40 in the past.

So it will recall the number 9 from its array and add it to the current chain length.

  step: value

     0: 80
     1: 40 (from here, the calculation is identical to the former one.)
     2: 20 (we don't want to waste our time on calculating exactly same stuffs.)
     3: 10
     4: 5
     5: 16
     6: 8
     7: 4
     8: 2
     9: 1

Now, let's deal with the numbers that are not even numbers.

Because those odd numbers will be multiplied by 3, it will get really large number that int will not be able to hold.

So we need to set the limit as one-third of integer max_value, and use long instead:

private static final int MAX = Integer.MAX_VALUE/3;

// fill the collatz array : memorization
public static void fillCollatz(int n)
{

   ...

   else if (current >= MAX)
   {
       long cur = current;
       ...
   }

   ...

}

And notice that odd * 3 + 1 always become an even number, and we can reduce the step by immediately calculating (odd*3 + 1) / 2 and increasing the number of chain length by 2.

Math: new number = (odd*3 + 1) / 2 = odd + (odd + 1) / 2

if (cur%2 == 0) cur /= 2;
else
{
 cur += (cur + 1)/2;
 count++;
}

At the end, we will store the information by adding new calculation and previous memory.

collatz[n] = collatz[current] + count;

Integrated solution:

// fill the collatz array : memorization
public static void fillCollatz(int n)
{
    int count = 0;
    int current = n;
    for ( ; current >= n; count++)
    {
        if (current%2 == 0) current /= 2;
        else if (current >= MAX)
        {
            long cur = current;
            for ( ; cur >= MAX; count++)
            {
                if (cur%2 == 0) cur /= 2;
                else
                {
                    cur += (cur + 1)/2;
                    count++;
                }
            }
            current = (int)cur;
            count--;
        }
        else
        {
            current += (current + 1)/2;
            count++;
        }
    }
    collatz[n] = collatz[current] + count;
}

And our main body will iterate through the numbers until it reaches one million:

public static void run()
{
    int num = 0, maxLength = 0;
  
    collatz[1] = 1;
    for (int i = LIMIT/2 + 1; i <= LIMIT; i++)
    {
        fillCollatz(i);
        if (collatz[i] >= maxLength)
        {
            maxLength = collatz[i];
            num = i;
        }
    }
    System.out.println(num);
}


Answer is 837799
Execution time is 38.619629 ms
Source Code: https://github.com/Ainodyne/Project-Euler/blob/master/Problem014.java

Picture by: https://www.jasondavies.com/collatz-graph/

No comments:

Post a Comment