Collatz Sequence

August 2016 · 4 minute read

The problem asks you to find the number with the longest Collatz chain. The Collatz sequence takes a number and applies the following based on it being even/odd:

if n.even? 
	n / 2
else
	3 * n + 1
end

We’ll be testing the range 1 to 1 million.

The brute force method


The first thing that may come to mind is looping through all the numbers in the range 1:100 million. Then you do another loop on each number that determines its Collatz chain. Here’s what that look’s like in Ruby code:

def collatz_num_with_array
  counter = 1
  collatz_chain = [1]
  until counter > 1_000_000
   collatz_sequence = collatz_chains(counter)
   if collatz_sequence.length > collatz_chain.length
     collatz_chain = collatz_sequence
   end
   counter += 1
  end
  collatz_chain
end
def collatz_chains(num)
  sequence = [num]
  until num <= 1
      num.even? ? num = num / 2 :
                  num = 3 * num + 1
      sequence << num
  end
  sequence
end

This approach is very computationally intense. Not only are you redoing all the math operations for every number from 1 to 1 million, but you are also creating new arrays for each loop. Though this is still an open problem, the upper limit on computations approach O(n^2). Looping through each array twice on average, sometimes much more!

A touch of elegance


One mistake worth noting in the solution above is that we are actually creating an array to save each sequence entry. If we look closer at the problem, this is not what it is asking for. We can quickly eliminate that array and just use a counter instead. Upon closer inspection there is a mathematical pattern that we can use to our benefit as well. Each time a number is odd, we apply (3*num + 1) to it. This immediately tells us the result will be an even number. We can use that and cut out a step every time we get an odd result. Changes are in bold below.

def collatz_num_without_array
  counter = 1
  collatz_number = 1  
  collatz_chain_max = 0
  
  until counter > 1_000_000
   num_terms_in_chain = collatz_chains(counter)
   if num_terms_in_chain >= collatz_chain_max
     collatz_number = counter
     collatz_chain_max = num_terms_in_chain
   end
   counter += 1
  end
  collatz_number
end

def collatz_chains(num)
  sequence_steps = 1
  until num <= 1
      if num.even? 
        num = num / 2 :
      else
        num = (3 * num + 1) / 2
        sequence_steps += 1
      end
      sequence_steps += 1
  end
  sequence_steps
end

As a side note, if you desire the sequence, just build a helper function that will determine the sequence for a specific number you give it. This is nice and all and will shave off a nice chunk of operations, but there has to be more because this is still very computationally complex.

Time to cache in


Since we are scaling numbers from small -> large, there is a very good chance that a number may repeat itself inside the collatz_chains method. Let’s draw up an example in our minds. Numbers 4 & 8 should give us a good visual hook to wrap our minds around.

collatz_chains(4) = collatz_chains(2) + 1 sequence_steps
collatz_chains(2) = 1 sequence_steps
collatz_chains(4) = 1 sequence_steps + 1 sequence_steps
                  = 2 sequence_steps
collatz_chains(8) = collatz_chains(4) + 1 sequence_steps
                  = 2 sequence_steps  + 1 sequence_steps
                  = 3 sequence_steps

Starting at 1 and increasing our collatz_chains calculations by 1 each step will give us a key advantage to cut down on the number of operations. We have already calculated the number of steps for 2 in the example above. Let’s reuse that info like we did in the short example. We will cache the number of sequence_steps for each number so that when we get that number again, we can just add in the steps we’ve already calculated. Let’s see what this looks like in code. Changes are in bold below.

@num_terms_cache = {2 => 2}
...
def collatz_num_without_array
  counter = 1
  largest_collatz_num = 2
  collatz_chain_max = 0
  until counter > 1_000_000
    num_terms_in_chain = 1
    collatz_num = counter
    until collatz_num < 2
        num_terms_in_chain = collatz_chains(collatz_num)
    end
    @num_terms_cache[counter] = num_terms_in_chain
    if num_terms_in_chain >= longest_collatz_iterations
      collatz_chain_max = num_terms_in_chain
      longest_collatz_num = counter
    end
    counter += 1
  end
  longest_collatz_num
end

def collatz_chains(num)
  sequence_steps = 1
  until num <= 1
   if @num_terms_cache.include? num
     sequence_steps += @num_terms_cache[num]
     break
   end
   if num.even? 
      num = num / 2 :
   else
      num = (3 * num + 1) / 2
      sequence_steps += 1
   end
   sequence_steps += 1
  end
  sequence_steps
end

Instead of tossing out the calculations you did in each iteration, we’ll go and save it into a hash. Each time that numbers comes up again we can break our hash and add in the number of sequence_steps the previous calculation had done. At a number like 800,000, this will become a god send. The algorithm approaches O(n) bounds now.

Key takeaway: Cacheing is our friend when we don’t want to redo all the operations from the ground up!