The human genome project, which published its results 20 years ago last month, was a landmark in biology.
It was also somewhat misleadingly named.
After all, there is no such thing as “the” human genome.
Instead, there are 8bn individual humans, each of whom shares the vast majority of their DNA with everyone else, but not all of it.
The genome published by the Human Genome Project in 2003 was assembled from the DNA of a dozen anonymous blood donors in and around Buffalo, in New York state.
But there is more to life than Buffalo.
That, in essence, is the motive behind the publication this week, in Nature, of a set of 47 new “reference” genomes taken from individuals on four continents (Africa, both of the Americas, and Asia).
The idea behind the Human Pangenome Reference Consortium, the organisation behind the publications, is that rather than relying on a single "reference" genome, it would be better to have several, chosen so that between them they capture as much of the genetic diversity of Homo sapiens as possible.
Compared with the total size of the genome, the amount of diversity in question is small.
Two people picked at random will share around 99.6% of their DNA.
That similarity is why the original genome produced by the Human Genome Project has proved so useful.
Its annotated strings of genetic code serve as a baseline.
Other genomes can be compared with it to look for variations, whether harmful or beneficial.
Yet although humans are mostly alike, their differences do matter.
A relatively recent mutation, for instance, means adults with ancestors from northern Europe, or some parts of India and the Middle East, are more likely to be able to digest lactose (a sugar found in milk) than those from elsewhere.
Which variation deserves to be treated as the standard?
Sometimes, the limits of using a single reference have direct medical consequences.
A set of genes called HLA, for instance, is involved in running the immune system.
These genes are highly variable, and mutations in them have been associated with autoimmune diseases such as type-1 diabetes.
One study, published in 2015, found that, because many gene-sequencing technologies are not perfectly accurate, comparing readouts from this region with the single reference genome produced errors around 20% of the time.
Another paper, published in 2022, found that, because of the reliance on a single reference genome, some gene variants found in people of African ancestry, which seem to be associated with cancer, are poorly understood.