You, me and everyone else have been recruited into one massive experiment. We were enrolled at the point we were conceived, yet none of us knew about it until late in the 20th century. The fact that we have has helped to solve key problems in numerous areas of research. This giant experiment involves our genes, which are passed on from generation to generation, influencing a multitude of behaviours, such as how much we eat, drink, sleep, interact and get stressed.
The Limits of Trials
Nutrition scientists have a tough job. Continually faced with obstacles as they attempt to reveal links between diet and disease, they have to translate their findings into dietary advice, and disentangling cause and effect can be a thankless task. Historically, the gold standard in epidemiology was a combination of observational studies and randomised controlled trials. However, both come with their own limitations. One major hurdle is confounding, as chronic diseases result from numerous forces playing into each other, including factors that are not even considered. The problems can be further exacerbated by people changing their behaviour when they become ill. For example, someone with mild angina symptoms might start exercising more; when they are later diagnosed with coronary heart disease (CHD), a naive analysis might link their new active lifestyle to the disease, when in fact the early symptoms of the disease prompted the exercise.
Observational studies measure associations by drawing inferences from a sample population. They are most reflective of free-living populations and are useful for establishing an association between a variable and a disease. However, they cannot determine whether an observed relationship is a genuine link to disease risk or merely a coincidence. To minimise bias, known confounders of disease risk, such as age, race, body weight and smoking, need to be adjusted for so that the exposure-disease relationship is accurately measured. Observational studies are useful for making general observations, but we should be very cautious about drawing firm conclusions from them. For example, an observational study might look at the consumption of vegetable oils in a population and see how it relates to the incidence of cardiovascular disease (CVD). However, we must not draw a hard causal link between vegetable oil use and CVD: the same populations may also consume diets rich in sugar and salt and low in fibre and antioxidants, be more sedentary, or be subject to any one, or a combination, of a multitude of unknown influences.
In randomised controlled trials (RCTs), subjects are randomly assigned to a treatment group – of which there may be more than one – or a control group. Often, RCTs are performed in a double-blind format, in which neither the subjects nor those performing the experiment know which group each subject is assigned to, helping to minimise the risk of bias that would lessen the validity of the findings. However, RCTs are hard to run, especially for diet: we cannot randomise people to particular diets and expect them to stick to them for very long, and most non-communicable diseases (NCDs) take decades to manifest and depend on multiple factors. In this respect, long-term dietary RCTs are neither ethical nor practical. RCTs can explore the effect of a variable on specific biomarkers linked to disease risk, but this naturally limits their effectiveness at establishing specific causal agents in disease aetiology. Thus, human RCTs, although powerful tools, have key drawbacks.
Hampered by the methodological limitations of observational studies and RCTs, epidemiologists sought alternative tools. With the major advancements available by the late 20th century, one angle they considered was how genetic variation contributes to disease and health outcomes. By invoking ideas first conceived in the 1860s, they developed an invaluable technique that would help overcome some of the drawbacks of observational studies and RCTs [1].
An Alternative Style of Research
The Austrian-Czech biologist, mathematician and friar Gregor Mendel has been described as the father of genetics. Through his work on pea plants in the 1860s, Mendel discovered that traits are inherited in a predictable manner through distinct units [2]. We now know these units as genes. He established that alleles (variants) of a gene segregate during the formation of gametes, that alleles of different genes assort independently, and that genetic variation is therefore randomly allocated at conception.
Mendelian randomisation (MR) is an epidemiological method that uses genetic variation to study the causal relationship between an exposure and an outcome. MR is based on the idea that genetic variants are not usually associated with confounders in a population and are therefore less susceptible to reverse causation bias. The technique builds on the principles of Mendelian inheritance: because alleles are randomly assorted during gamete formation, they are independent of the confounding factors that typically plague observational studies, allowing us to group people within data sets according to the variants they carry. If a variant is associated with the exposure of interest, we can identify outcomes that covary with its presence or absence [3]. Essentially, MR leverages genetic variants as instruments to evaluate the causal effect of a modifiable exposure (such as blood pressure or cholesterol levels) on a disease outcome (such as CVD or diabetes).
The foundation of MR lies in using genetic variants – the most common type being single nucleotide polymorphisms (SNPs; pronounced “snips”) – as proxies for the exposure of interest. For a genetic variant to be a valid instrument, it must be relevant. For example, a SNP associated with higher LDL cholesterol levels would be a relevant instrument for studying the causal effect of LDLs on heart disease. The genetic variant should also be independent of confounders: because the variant is randomly allocated at conception, it should not be associated with external confounding factors, such as lifestyle or socioeconomic status. Moreover, the genetic variant should affect the outcome only through the exposure of interest and not through alternative pathways, a principle known as the exclusion restriction criterion. If these assumptions – i.e. relevance, independence and exclusivity – are all satisfied, MR mimics an RCT, allowing researchers to draw conclusions about causality rather than mere association [4].
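The logic can be sketched in a small simulation (a minimal sketch in Python; the variant, effect sizes and sample are all hypothetical). A hidden confounder biases the naive exposure-outcome regression upwards, while the Wald ratio, which uses the genetic variant as an instrument, recovers the true causal effect because the variant satisfies all three assumptions by construction:

```python
import random

random.seed(0)

# Hypothetical set-up: a confounder U drives both exposure X and outcome Y,
# while a genetic variant G is randomly allocated and affects Y only via X.
TRUE_EFFECT = 0.5
n = 100_000

G = [random.randint(0, 2) for _ in range(n)]  # SNP: 0, 1 or 2 effect alleles
U = [random.gauss(0, 1) for _ in range(n)]    # unmeasured confounder
X = [0.4 * g + u + random.gauss(0, 1) for g, u in zip(G, U)]          # exposure
Y = [TRUE_EFFECT * x + u + random.gauss(0, 1) for x, u in zip(X, U)]  # outcome

def slope(x, y):
    """OLS slope of y on x: cov(x, y) / var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

naive = slope(X, Y)               # biased upwards by the confounder U
wald = slope(G, Y) / slope(G, X)  # MR (Wald ratio) estimate

print(f"true effect: {TRUE_EFFECT}, naive: {naive:.2f}, Wald ratio: {wald:.2f}")
```

The naive regression overstates the effect (roughly 1.0 here against a true effect of 0.5), whereas the Wald ratio lands close to 0.5 because the variant is independent of the confounder.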
MR has become a central tool in epidemiology and has been applied to answer a wide variety of questions about the causal roles of exposures, such as cholesterol levels, BMI, smoking and alcohol consumption, in NCDs such as CVD, type 2 diabetes and cancer. For example, MR has been used to demonstrate that high LDL levels are a causal risk factor for CHD [5], strengthening the case for the use of statins. It has also been instrumental in evaluating whether biomarkers, like C-reactive protein, are causal risk factors for disease. MR is thus a transformative tool in causal inference, providing an approach that is less prone to the biases typical of traditional observational studies, and it continues to evolve through advances in genetics and data availability. Using this information, we can see whether particular genes are linked to certain NCDs but not others. MR builds on other evidence and is already showing how particular factors influence the risk of disease, including examining several risk factors together to explore their combined influence on disease progression, which may help us develop new treatments.
History of Mendelian Randomisation
The idea behind MR has antecedents in the late 20th century, but the method was formalised and popularised by the epidemiologists George Davey Smith and Shah Ebrahim in a landmark 2003 paper [6]. They proposed MR as an approach that uses genetic variants to test for causality in epidemiology. The method borrows from an econometric technique known as instrumental variable analysis, in which a third variable (the instrument) is used to identify causal effects in the presence of confounding. Genetic variants that influence a modifiable exposure (such as blood cholesterol levels) can serve as proxies for the exposure, under the assumption that these variants are randomly distributed and not influenced by confounders.
The completion of the Human Genome Project in 2003 paved the way for large-scale genome-wide association studies (GWAS) [7], which helped to identify genetic variants associated with diseases. More recently, the emergence of human biobanks, such as the UK Biobank – one of the largest reservoirs of human genetic data – has expanded the use of MR in genetic association studies [8]. Using biobanks, advanced computational tools and GWAS data, researchers have been able to find more reliable genetic instruments and apply MR in a broad array of settings.
Types of Mendelian Randomisation
There are several distinct methodological approaches to MR, each with its own strengths and limitations. Single-SNP MR uses a single genetic variant as the instrument. Despite being straightforward, it relies heavily on the strength and validity of that single SNP. Two-Sample MR uses summary statistics from two different datasets: one for the exposure and another for the outcome, which increases its power and allows for more precise estimates. Multivariable MR can account for pleiotropy (where a genetic variant affects multiple traits) by including multiple exposures in the model, allowing for a more nuanced understanding of causal pathways. MR-Egger Regression is a technique used to detect and correct for pleiotropy, where the genetic variant may affect the outcome through pathways other than the exposure of interest.
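To illustrate how two-sample MR combines summary statistics (the per-SNP effect sizes and standard errors below are invented for illustration, not real GWAS results), the common inverse-variance weighted (IVW) estimate is a precision-weighted average of the per-SNP Wald ratios:

```python
# Illustrative two-sample MR: IVW estimate from per-SNP summary statistics.
# Each tuple is (beta_exposure, beta_outcome, se_outcome) for one instrument;
# the exposure betas come from one GWAS, the outcome betas from another.
snps = [
    (0.10, 0.049, 0.010),
    (0.08, 0.042, 0.012),
    (0.15, 0.077, 0.011),
    (0.12, 0.059, 0.009),
]

# Each SNP implies a Wald ratio beta_outcome / beta_exposure; IVW combines
# them, weighting by the precision of the SNP-outcome estimate (1 / se^2).
num = sum(bx * by / se**2 for bx, by, se in snps)
den = sum(bx**2 / se**2 for bx, by, se in snps)
ivw = num / den

print(f"IVW causal estimate: {ivw:.3f}")
```

Because the two samples need not overlap, summary statistics from published GWAS can be reused, which is what gives two-sample MR its power and low cost.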
Benefits of Mendelian Randomisation
MR is a useful tool for causal inference in epidemiology. As genetic variants are randomly assorted and fixed at conception, MR can mitigate confounding factors that plague observational studies. The biological random allocation mimics the randomisation process in clinical trials, providing a more reliable estimate of causality. MR also eliminates the problem of reverse causation: as genetic makeup is determined at birth and remains largely unchanged, it removes the possibility that the disease might influence the exposure, rather than the other way around [5].
Another benefit relates to temporal directionality. Genetic variants precede the development of disease, ensuring that the temporal order – i.e. exposure precedes outcome – is maintained, and this is particularly useful in distinguishing between cause and effect, unlike cross-sectional studies [6].
There are also ethical and practical benefits. MR can be used to explore potential causal relationships that may be unethical or impractical to investigate through RCTs. For example, it would be unethical to directly manipulate a risk factor like alcohol consumption to study its effects on liver disease, but MR can provide insights without such interventions. And, compared to RCTs, MR studies are generally less expensive and time-consuming. They often utilise existing genetic data from GWAS, making them a cost-effective tool for causal inference [7].
Limitations of Mendelian Randomisation
However, MR is not without its challenges and limitations. One of the most significant is pleiotropy, where a genetic variant influences multiple traits. Vertical pleiotropy, in which the variant’s additional effects are mediated through the exposure of interest, is relatively benign. Horizontal pleiotropy, in which the variant affects the outcome through multiple independent pathways, violates the exclusion restriction assumption, can lead to biased estimates, and is harder to detect and correct [8]. The MR-Egger regression technique helps to address this issue but comes with its own limitations.
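The intuition behind MR-Egger can be shown with a minimal, unweighted sketch (real implementations weight SNPs by the precision of their outcome estimates; the numbers here are invented, with a constant pleiotropic offset baked in). Regressing the per-SNP outcome effects on the exposure effects while allowing a non-zero intercept separates the causal effect (the slope) from average directional pleiotropy (the intercept):

```python
# Hypothetical summary statistics: each SNP's outcome effect is a causal
# effect of 0.5 times its exposure effect, plus a pleiotropic offset of 0.02.
snps = [
    (0.10, 0.10 * 0.5 + 0.02),
    (0.08, 0.08 * 0.5 + 0.02),
    (0.15, 0.15 * 0.5 + 0.02),
    (0.12, 0.12 * 0.5 + 0.02),
]
bx = [b for b, _ in snps]
by = [b for _, b in snps]

# Simple least-squares line of beta_outcome on beta_exposure with intercept.
mx, my = sum(bx) / len(bx), sum(by) / len(by)
slope = sum((x - mx) * (y - my) for x, y in zip(bx, by)) / sum(
    (x - mx) ** 2 for x in bx
)
intercept = my - slope * mx

print(f"slope (causal estimate): {slope:.2f}")      # ~0.50
print(f"intercept (pleiotropy):  {intercept:.2f}")  # ~0.02, flags directional pleiotropy
```

A standard IVW analysis, which forces the line through the origin, would absorb that 0.02 offset into its causal estimate; the non-zero MR-Egger intercept is the warning sign.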
The validity of MR findings is highly dependent on the underlying assumptions being met. If these assumptions are violated, the causal estimates may be biased or incorrect. Another issue is that if the genetic variant used as an instrument is only weakly associated with the exposure, the analysis may suffer from low statistical power, leading to imprecise estimates; this is known as the “weak instrument” problem. Moreover, differences in allele frequencies across populations can confound MR results if not appropriately controlled for. For example, genetic variants may have different effects in different ethnic groups due to varying environmental factors or linkage disequilibrium patterns. Finally, MR can only be applied to exposures with known and measurable associated genetic variants, which limits the scope of its application [9].
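Instrument strength is commonly screened with the F-statistic of the variant-exposure association; for a single SNP this is approximately (beta / se)^2, and F > 10 is the conventional rule of thumb. A minimal sketch, with invented effect sizes:

```python
def f_statistic(beta_exposure: float, se_exposure: float) -> float:
    """Approximate single-instrument F-statistic: (beta / se)^2."""
    return (beta_exposure / se_exposure) ** 2

# Hypothetical variant-exposure associations.
strong = f_statistic(0.10, 0.010)  # comfortably above the F > 10 threshold
weak = f_statistic(0.02, 0.015)    # far below it: a weak instrument

print(f"strong instrument F: {strong:.1f}")
print(f"weak instrument F:   {weak:.1f}")
```

A weak instrument does more than widen confidence intervals: in one-sample settings it biases estimates towards the confounded observational association, which is why the threshold is checked before any causal estimate is reported.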
Despite these challenges, MR continues to be a valuable approach to understanding the causal relationships between genetic predispositions, risk factors and health outcomes, informing both treatment and preventative measures for chronic NCDs.
References:
1. Smith, G. D. and Ebrahim, S. (2003) ‘“Mendelian Randomization”: Can Genetic Epidemiology Contribute to Understanding Environmental Determinants of Disease?’, International Journal of Epidemiology, 32(1), 1-22.
2. Hartl, D. L. (2022) ‘Gregor Johann Mendel: From Peasant to Priest, Pedagogue, and Prelate’, Proceedings of the National Academy of Sciences, 119(30), e2121953119.
3. (a) Richmond, R. C. and Smith, G. D. (2022) ‘Mendelian Randomization: Concepts and Scope’, Cold Spring Harbor Perspectives in Medicine, 12(1), a040501; (b) Sanderson, E. et al. (2022) ‘Mendelian Randomization’, Nature Reviews Methods Primers, 2, 6.
4. (a) Davies, N. M. et al. (2018) ‘Reading Mendelian Randomisation Studies: A Guide, Glossary, and Checklist for Clinicians’, BMJ, 362, k601; (b) Burgess, S. et al. (2023) ‘Guidelines for Performing Mendelian Randomization Investigations: Update for Summer 2023’, Wellcome Open Research, 4, 186.
5. (a) Ference, B. A. et al. (2012) ‘Effect of Long-Term Exposure to Lower Low-Density Lipoprotein Cholesterol Beginning Early in Life on the Risk of Coronary Heart Disease: A Mendelian Randomization Analysis’, Journal of the American College of Cardiology, 60(25), 2631-9. (b) Kawashiri, M. et al. (2018) ‘Mendelian Randomization: Its Impact on Cardiovascular Disease’, Journal of Cardiology, 72(4), 307-13; (c) Yang, G. et al. (2024) ‘Dose–Response Associations of Lipid Traits With Coronary Artery Disease and Mortality’, JAMA Network Open, 7(1), e2352572.
6. As reference 1.
7. Ikegawa, S. (2012) ‘A Short History of the Genome-Wide Association Study: Where We Were and Where We Are Going’, Genomics & Informatics, 10(4), 220-5.
8. Amin, H. A. et al. (2022) ‘Mendelian Randomisation Analyses of UK Biobank and Published Data Suggest That Increased Adiposity Lowers Risk of Breast and Prostate Cancer’, Scientific Reports, 12(1), 909.
9. As reference 4.