Can You Identify the Races of America?
In the roughly 300,000 genomes in the NIH's All of Us biobank, six or seven distinctive racial groups emerge from the DNA data.
There’s a new American biobank called the All of Us project that is collecting DNA for medical research from a diverse set of Americans. The goal is to get around the traditional problem that white people tend to be more willing to volunteer for scientific tracking than other groups, so most of the existing DNA data comes from whites.
(For example, when I was in the marketing research business in the 1980s, it was quite easy to get thousands of white moms to let us track all their supermarket purchases via checkout scanner. It was more expensive to persuade other types of people to give us permission to poke around in their shopping habits. But, fortunately for us, in 1982 big manufacturers of branded foodstuffs and household goods mostly cared about what white moms bought at the grocery store.)
For most medical research purposes, it so far appears that white DNA mostly does a decent job of standing in for nonwhite DNA (less so for calculating polygenic scores). But it would be nice to find out where it doesn’t, so scientists could warn doctors in the future, say, “Don’t prescribe medication X to race Y.”
A new paper in Nature Communications takes a look at the racial ancestry in the DNA of the 297,549 All of Us volunteers.
Note that the project didn’t attempt to recruit a precisely nationally representative sample. It appears to have oversampled blacks and undersampled Hispanics. But that’s not unreasonable because blacks tend to be more genetically distant from whites, so they should be a priority in this project.
So, don’t use it to try to calculate the racial makeup of the U.S. population.
But the data does have its uses for looking at racial structure within the U.S., so long as you recognize that the sizes of the various racial groups that emerge from the data are not exactly representative of the actual population.
They find the U.S. population is split up into reasonably distinct racial groups, along with, of course, a lot of people who don’t fit well in just one race. The authors write:
The extent to which human genetic diversity is characterized by clusters of closely related individuals, i.e., population structure, versus clines of continuous genetic variation has long been a subject of interest. The All of Us cohort allows for an assessment of the extent of population structure in the US, given the large size of the cohort, the extensive sampling of participants across the country, and the demographic diversity of the participants. The application of several different cluster analysis methods to participants’ genomic PCA data revealed evidence for substantial population structure in the cohort, with dense clusters of relatively closely related participants interspersed among less dense regions in PC space.
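For readers who want a feel for what "dense clusters interspersed among less dense regions in PC space" means in practice, here is a toy sketch in Python. Everything in it is an assumption for illustration: the passage above doesn't name which clustering methods the authors used, so I've swapped in a bare-bones DBSCAN-style density clustering, and the coordinates are made up (two tight grids of points plus three stragglers strung between them), not All of Us data.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for sparse points.

    A point is 'core' if at least min_pts points (itself included) lie
    within distance eps of it. Clusters grow outward from core points.
    """
    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # too isolated to start a cluster
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # previously isolated, now a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nearby = neighbors(j)
            if len(nearby) >= min_pts:
                queue.extend(nearby)  # j is core, so keep expanding from it
    return labels


# Hypothetical 2-D "PC space": two dense 5x5 grids and three points in between.
blob_a = [(i * 0.1, j * 0.1) for i in range(5) for j in range(5)]
stragglers = [(1.5, 0.2), (2.5, 0.2), (3.5, 0.2)]
blob_b = [(5 + i * 0.1, j * 0.1) for i in range(5) for j in range(5)]

points = blob_a + stragglers + blob_b
labels = dbscan(points, eps=0.15, min_pts=4)

print("clusters found:", len(set(labels) - {-1}))
print("points outside any cluster:", labels.count(-1))
```

Run as written, the two tight grids come out as clusters 0 and 1, and the three in-between points are labeled -1, which is the toy version of people who don't fit well in any one cluster. The `eps` and `min_pts` values are arbitrary choices tuned to this toy data.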
Here are seven racial groups that emerge from the genetic data (not from self-identification), plus a miscellaneous cluster. Can you identify the seven main racial ancestry clusters in the US? (#5 is kind of a stumper; it’s probably something of an artifact of shortcomings in the methodology’s reference works, so don’t worry about it too much.)
Here are my guesses:
Paywall here.