Can You Identify the Races of America?
In the 300,000 genomes in the NIH's All Of Us biobank, 6 or 7 distinctive racial groups emerge from the DNA data.
There’s a new American biobank called the All Of Us project that is collecting DNA for medical research from a diverse set of Americans. The goal is to get around the traditional problem that white people tend to be more willing to volunteer for scientific tracking than other groups, so most of the existing DNA data is for whites.
(For example, when I was in the marketing research business in the 1980s, it was quite easy to get thousands of white moms to let us track all their supermarket purchases via checkout scanner. It was more expensive to persuade other types of people to give us permission to poke around in their shopping habits. But, fortunately for us, in 1982 big manufacturers of branded foodstuffs and household goods mostly cared about what white moms bought at the grocery store.)
For most medical research purposes, it so far appears that white DNA mostly does a decent job of standing in for nonwhite DNA (less so for calculating polygenic scores). But it would be nice to find out where it doesn’t, so scientists could warn doctors in the future, say, “Don’t prescribe medication X to race Y.”
A new paper in Nature Communications takes a look at the racial ancestry DNA of the 297,549 All Of Us volunteers.
Note that the project didn’t attempt to recruit a precisely nationally representative sample. It appears to have oversampled blacks and undersampled Hispanics. But that’s not unreasonable because blacks tend to be more genetically distant from whites, so they should be a priority in this project.
So, don’t use it to try to calculate the genetic racial population of the U.S.
But the data does have its uses for looking at racial structure within the U.S., so long as you recognize that the size of the various racial groups that emerge from the data is not exactly representative of the actual population.
They find the US population is split up into reasonably distinct racial groups, with, of course, also a lot of people who don’t fit well in just one race. The authors write:
The extent to which human genetic diversity is characterized by clusters of closely related individuals, i.e., population structure, versus clines of continuous genetic variation has long been a subject of interest. The All of Us cohort allows for an assessment of the extent of population structure in the US, given the large size of the cohort, the extensive sampling of participants across the country, and the demographic diversity of the participants. The application of several different cluster analysis methods to participants’ genomic PCA data revealed evidence for substantial population structure in the cohort, with dense clusters of relatively closely related participants interspersed among less dense regions in PC space.
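To make "dense clusters interspersed among less dense regions in PC space" concrete, here is a toy sketch in Python. It is not the paper's pipeline: the points are made up, the two "PC" coordinates are hypothetical, and the bare-bones DBSCAN-style density clustering below is only a stand-in for whichever cluster analysis methods the authors actually ran. The names `dbscan`, `blob`, `eps`, and `min_pts` are illustrative choices, not anything from the paper.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.

    A point is 'core' if at least min_pts points (itself included)
    lie within distance eps of it. Clusters grow outward from core points.
    """
    UNSEEN, NOISE = None, -1
    labels = [UNSEEN] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # sparse region (may be relabeled later)
            continue
        cluster += 1                   # start a new dense cluster
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point adopted by the cluster
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:   # only core points keep expanding
                queue.extend(nbrs)
    return labels


def blob(cx, cy, n=5, step=0.02):
    """A tight n-by-n grid of made-up points centered on (cx, cy)."""
    half = (n - 1) / 2
    return [(cx + (a - half) * step, cy + (b - half) * step)
            for a in range(n) for b in range(n)]


# Three dense groups plus four scattered in-between individuals (all synthetic).
points = (blob(0, 0) + blob(1, 0) + blob(0, 1)
          + [(0.5, 0.5), (0.5, 0.0), (0.0, 0.5), (0.3, 0.3)])

labels = dbscan(points, eps=0.05, min_pts=5)
n_clusters = len(set(labels) - {-1})
n_noise = labels.count(-1)
print(f"{n_clusters} dense clusters, {n_noise} points left unclustered")
```

In this toy setup the three tight grids come out as three clusters and the four stray points are left unassigned, which is the same general picture the authors describe: dense groups, with some people in the sparse space between them.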
Here are seven racial groups that emerge from the genetic data (not from self-identification), plus a miscellaneous cluster. Can you identify the seven main racial ancestry clusters in the US? (#5 is kind of a stumper; it’s probably something of an artifact of shortcomings in the methodology’s reference works, so don’t worry about it too much.)
Here are my guesses:
Keep in mind that they are comparing their results to reference works like the 1000 Genomes database, which was assembled from 2008 to 2015. But is 1000 Genomes, even though it was a huge undertaking 17 years ago, enough to fully understand global genetic diversity? There will be some poorly covered places until we have, say, a 10,000 Genomes reference work.
Group 1, a medium-size race with large amounts of (blue) African DNA and moderate amounts of (orange) European DNA, seems to be non-Hispanic African Americans. Virtually all have some white ancestry, but very few have more than about 3/8ths white ancestry.
This was something I noticed back at the beginning of this century: African-Americans who don’t have a recent white parent or grandparent will tend to have some white ancestry, but less than 50%. It’s an inevitable by-product of the strong cultural norm against white-black intermarriage that was dominant in the U.S. until recent generations.
Group 2 appears to be Hispanic mulattos (e.g., Puerto Ricans): a majority of their ancestry is European, followed by African, followed by American (Amerindian), plus a little bit of Oceanian and East Asian (the Manila-Acapulco galleon route brought Filipinos and others to Mexico).
Until the genome age began early in this century, it had been assumed that there were no survivors of Caribbean Amerindians like the Taino and Carib. But then DNA revealed something that should have been noticeable just from looking at the mestizo-looking Jennifer Lopez: Amerindians make up a minority, but not trivial, part of the ancestry of living Puerto Ricans.
Group 3 is obviously East Asians. Some white admixture is observed. A little ancestry shows up as American. I don’t know whether that’s an error due to their Ice Age ties or whether it indicates intermarriage in historic times. And Oceanians (e.g., Malays and Polynesians) appear as well. The latter might be from Hawaii, where interracial marriages were vastly more common 100 years ago than in the rest of America.
Group 4 is South Asians. Some have small amounts of European, West Asian, and Oceanian ancestry.
Group 5 is a bit of a puzzle for the authors. They might be Middle Easterners from, say, east and/or south of the Levant. Apparently, reference works like 1000 Genomes didn’t sample in Iraq, Iran, or the Arabian peninsula, so it’s not clear who this group is supposed to be, or whether they’d just be West Asians and/or North Africans if 1000 Genomes had better coverage of West Asia.
Group 6 looks like Hispanic mestizos (e.g., Mexicans): about equal amounts of American and European ancestry with a small but noticeable amount of African descent. A few percent South Asian ancestry shows up in most of the individuals, but what that means, I don’t know.
Group 7 is whites. They are overwhelmingly European with a sliver of West Asian background, and virtually no African or American ancestry.
Whites in America have traditionally been really white.
The unnumbered final group is miscellaneous. It tends to be younger people. For example, on the Los Angeles Dodgers baseball team, Tommy Edman is white and Korean, Austin Barnes is white and Mexican, and manager Dave Roberts is black and Okinawan.
Another finding: most of the South Asian DNA in the US is South Indian rather than North Indian, with a smidgen of Central Asian.
I didn’t know that.
There are a lot of genomes in the wide world. A start is a start, to be sure, and it would be interesting to line all this up with languages. Interesting subject.
Interesting. I would assume Group 6 would get quite a bit more Euro over the next couple of generations (provided we largely halt illegal immigration) due to intermarriage. The percentage of the population that claims Latino as an ethnicity will therefore eventually decline, and the share that simply identifies as white will increase, although the percentage that is solely of Euro stock will be under 50 percent.
There are elements of the left and right that would be horrified at this outcome, although it’s the former that is most proactive in trying to head this off and no doubt will try to reopen the southern border as soon as they regain control of the presidency.