The idea that correlation does not imply causation is a fundamental caveat in epidemiological research. A classic example involves a hypothetical link between ice cream sales and drownings – instead of increased ice cream consumption causing more people to drown, it’s plausible that a third variable, summer weather, drives up both the appetite for ice cream and the amount of swimming, and hence the opportunities to drown.
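To make the ice cream example concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the variable names, the sample size and the equal-strength effects of heat on both outcomes are invented, not drawn from real data. It shows two quantities that never influence each other becoming correlated through a shared driver, and that correlation shrinking toward zero once the driver is accounted for.

```python
import numpy as np

def confounding_demo(n=5000, seed=0):
    """Simulate ice cream sales and drownings that share one cause (heat)."""
    rng = np.random.default_rng(seed)

    # Hypothetical standardized "summer heat" score for n days.
    summer_heat = rng.normal(size=n)

    # Both outcomes depend on heat plus independent noise.
    # Neither outcome appears in the other's formula.
    ice_cream_sales = summer_heat + rng.normal(size=n)
    drownings = summer_heat + rng.normal(size=n)

    raw = np.corrcoef(ice_cream_sales, drownings)[0, 1]

    def remove_heat(values):
        # Subtract the straight-line prediction of `values` from heat.
        slope, intercept = np.polyfit(summer_heat, values, 1)
        return values - (slope * summer_heat + intercept)

    # Correlation of what is left once heat has been regressed out.
    adjusted = np.corrcoef(remove_heat(ice_cream_sales),
                           remove_heat(drownings))[0, 1]
    return raw, adjusted

raw, adjusted = confounding_demo()
print(f"raw correlation: {raw:.2f}, after adjusting for heat: {adjusted:.2f}")
```

In this toy setup the raw correlation is clearly positive, while the adjusted one should sit close to zero, because heat is the only thing the two outcomes have in common.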
But what about correlations involving genes? How can researchers be sure that a particular trait or disease is truly genetically linked, and not caused by something else?
We are statistical geneticists who study the genetic and nongenetic factors that influence human variation. In our recently published research, we found that many of the links between traits reported in genetic studies might not arise from shared genes at all. Instead, many are a result of how humans mate.
Genome-wide association studies try to link genes to traits
Because the genes you inherit from your parents remain unchanged throughout your life, with rare exception, it makes sense to assume that there is a causal relationship between certain traits you have and your genetics.
This logic is the basis for genome-wide association studies, or GWAS. These studies collect DNA from many people to identify positions in the genome that might be correlated with a trait of interest. For example, if you have certain forms of the BRCA1 and BRCA2 genes, you may have an increased risk for certain types of cancer.
Similarly, there may be gene variants that play a role in whether or not someone has schizophrenia. The hope is to learn something about the complex mechanisms that link variation at the molecular level to individual differences. With a clearer understanding of the genetic basis of different traits, scientists would be better able to determine risk factors for related diseases.
Researchers have run thousands of GWAS to date, identifying genetic variants associated with myriad diseases and disease-related traits. In many instances, researchers have identified genetic variants that affect more than one trait. This form of biological overlap, in which the same genes are thought to influence several apparently unrelated traits, is known as pleiotropy. For example, certain variants of the PAH gene can have several distinct effects, including altering skin pigmentation and causing seizures.
One way scientists assess pleiotropy is through genetic correlation analysis. Here, geneticists investigate whether the genes associated with a given trait are associated with other traits or diseases by statistically analyzing large samples of genetic data. Over the past decade, genetic correlation analysis has become the primary method for assessing potential pleiotropy across fields as diverse as internal medicine, social science and psychiatry.
Scientists use the findings from genetic correlation analyses to figure out the potential shared causes of these traits. For instance, if genes associated with bipolar disorders
Assortative mating and genetic correlation
However, just because a gene is correlated with two or more traits doesn’t necessarily mean it causes them.
Virtually all the statistical methods researchers commonly use to assess genetic correlations assume that mating is random. That is, they assume that potential mating partners decide who they will have children with based on a roll of the dice. In reality, many factors likely influence who mates with whom. The simplest example of this is geography – people living in different parts of the world are less likely to end up together than people living nearby.
We wanted to find out how much the assumption of random mating affects the accuracy of genetic correlation analyses. In particular, we focused on the potential confounding effects of assortative mating, the tendency of people to mate with those who share similar characteristics. Assortative mating is a widely documented phenomenon seen across a broad array of traits, interests, measures and social factors, including height, education and psychiatric conditions.
In our study we examined cross-trait assortative mating, whereby people with one trait (for example, being tall) tend to mate with people with a completely different trait (for example, being wealthy). From our database of 413,980 mate pairs in the U.K. and Denmark, we found evidence of cross-trait assortative mating for many traits – for instance, an individual’s time spent in formal schooling was correlated not only with their mate’s educational attainment, but also with many other characteristics, including height, smoking behaviors and risk for different diseases.
We found that the similarities across mates strongly predicted which traits would be considered genetically linked. In other words, just based on how many characteristics a pair of mates shared, we could identify around 75% of the presumed genetic links between these traits – all without sampling any DNA.
Genetic correlation does not imply causation
Cross-trait assortative mating shapes the genome. If people with one heritable trait tend to mate with people with another heritable trait, then these two distinct characteristics will become genetically correlated to each other in subsequent generations. This will happen regardless of whether or not these traits are truly genetically linked to each other.
Cross-trait assortative mating means that the genes you inherit from one parent will be correlated with those you inherit from the other. How people mate is not random, violating the key assumption behind genetic correlation analyses. This inflates the genetic association between traits that aren’t truly linked together by genes.
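The mechanism described above can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not the method used in our study: the function name, the number of loci, the allele frequency of 0.5, the noise level and the rank-based pairing rule are all hypothetical choices made for illustration. Trait Y and trait Z are controlled by entirely separate sets of genes, so there is no pleiotropy by construction. The only thing that varies is whether mothers are paired with fathers at random or sorted so that mothers high on Y tend to pair with fathers high on Z.

```python
import numpy as np

def genetic_correlation_after_mating(assortative, n=20000, n_loci=100,
                                     generations=3, seed=0):
    """Correlation between the Y and Z genetic scores in the last generation.

    Columns 0..n_loci-1 affect only trait Y; the remaining columns affect
    only trait Z. No gene influences both traits.
    """
    rng = np.random.default_rng(seed)
    # Allele counts (0, 1 or 2) at unlinked loci, allele frequency 0.5.
    geno = rng.binomial(2, 0.5, size=(n, 2 * n_loci))

    for _ in range(generations):
        rng.shuffle(geno)  # shuffle individuals (rows)
        mothers, fathers = geno[: n // 2], geno[n // 2:]

        if assortative:
            # Cross-trait sorting: rank mothers by a noisy version of Y and
            # fathers by a noisy version of Z, then pair them by rank.
            y_mothers = mothers[:, :n_loci].sum(axis=1) + rng.normal(0, 5, n // 2)
            z_fathers = fathers[:, n_loci:].sum(axis=1) + rng.normal(0, 5, n // 2)
            mothers = mothers[np.argsort(y_mothers)]
            fathers = fathers[np.argsort(z_fathers)]

        # Each couple has two children. At every locus a parent passes on
        # one allele, chosen with probability (allele count / 2).
        children = []
        for _ in range(2):
            from_mother = rng.binomial(1, mothers / 2)
            from_father = rng.binomial(1, fathers / 2)
            children.append(from_mother + from_father)
        geno = np.concatenate(children)

    score_y = geno[:, :n_loci].sum(axis=1)
    score_z = geno[:, n_loci:].sum(axis=1)
    return np.corrcoef(score_y, score_z)[0, 1]

random_r = genetic_correlation_after_mating(assortative=False)
sorted_r = genetic_correlation_after_mating(assortative=True)
print(f"random mating: {random_r:.2f}, cross-trait assortative: {sorted_r:.2f}")
```

Under random mating the two genetic scores should stay essentially uncorrelated, while under the sorted pairing a clearly positive correlation appears within a few generations, even though no simulated gene affects both traits. The sorting here is deliberately strong to make the effect easy to see; real mate similarity is weaker.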
Recent studies corroborate our findings. Earlier this year, researchers computed genetic correlations using a method that examines the association between the traits and genes of siblings. The genetic links between traits influenced by cross-trait assortative mating were substantially weakened.
But without accounting for cross-trait assortative mating, using genetic correlation estimates to study the biological pathways causing disease can be misleading. Genes that affect only one trait will appear to influence multiple different conditions. For example, a genetic test designed to assess the risk for one disease may incorrectly detect vulnerability for a broad range of unrelated conditions.
The ability to measure variation across individuals at the genetic and molecular level is truly a feat of modern science. However, genetic epidemiology is still an observational enterprise, subject to the same caveats and challenges facing other forms of nonexperimental research. Though our findings don’t discount all genetic epidemiology research, understanding what genetic studies are truly measuring will be essential to translate research findings into new ways to treat and assess disease.
Richard Border, Postdoctoral Researcher in Statistical Genetics, University of California, Los Angeles and Noah Zaitlen, Professor of Neurology and Human Genetics, University of California, Los Angeles