BitVector Genealogy



The BitVectors are an ancient and immortal race of 10,000, each with a 10,000 bit genome. The race evolved from a single individual by the following process: 9,999 times a BitVector chosen at random from amongst the population was cloned using an error-prone process that considers each bit independently, and flips it with 20% probability.

Write a program to guess the reproductive history of BitVectors from their genetic material. …. Balance performance against probability of mistakes as you see fit[2]

This is my take on ITA’s BitVector Genealogy puzzle. I split the puzzle into two parts: calculating the clone relationships between BitVectors and then deducing which BitVector was most likely to be the progenitor. To calculate the clone relationships I calculated the Hamming distance pairwise. I then compared it to a threshold that I had pre-computed using Bayes theorem to determine if the BitVectors were 1st generation clones. To find the progenitor, I chose the BitVector that gave the most compact tree. To determine how compact the tree was I summed the distance between each node and the root, where distance was defined to be the number of edges betweeen the node and the root. Full details here.
Accuracy in deducing clone relationships is 100.0%. Accuracy in finding the progenitor is 30.9%, which is I believe is the maximum accuracy obtainable. Run time is approximately 30 seconds on my computer.
Source code and binary available here.

[1]The picture is “DNA and twisty green things” by Ethan Hein and is licensed under the creative commons Attribution-NonCommercial-ShareAlike 2.0 Generic licence

[2] See