GWAS IV: Mendelian Randomization

BIOS 770

Josh Weinstock, PhD

Assistant Professor, Department of Human Genetics, Emory University

Learning objectives

  1. Discuss the assumptions of Mendelian randomization (MR)
  2. Discuss a few notable MR methods
  3. Close with a broad discussion of the MR literature

Inferring causal effects of \(\color{darkgreen}{X}\) on \(\color{darkorange}{Y}\)

Consider two phenotypes, measured in humans:

  1. \(\color{darkgreen}{X}\), some “exposure”, e.g., low-density lipoprotein, C-reactive protein, etc.
  2. \(\color{darkorange}{Y}\), some “outcome”, e.g., ischemic heart disease, systolic blood pressure, etc.

Inferring the causal effect of \(\color{darkgreen}{X}\) on \(\color{darkorange}{Y}\)

We are interested in inferring the causal effect of \(\color{darkgreen}{X}\) on \(\color{darkorange}{Y}\). Ideally, we could randomly assign values to \(\color{darkgreen}{X}\), i.e., \(do(\color{darkgreen}{X} = x)\), and then measure how \(\color{darkorange}{Y}\) changes as we vary \(x\).

This works (e.g., RCTs). But for many pairs of \((\color{darkgreen}{X}, \color{darkorange}{Y})\), an RCT isn’t tractable.

Transmission of alleles as a pseudo-RCT

Perhaps we can instead use natural variation, with the right kind of randomness, as a proxy for an RCT, a kind of “natural experiment.” Consider genetic inheritance:

  1. In the autosomes, offspring randomly inherit one of the mother’s two alleles and one of the father’s two alleles.
  2. These alleles are transmitted independently of one another (putting aside linkage disequilibrium for now).

With these principles in mind, perhaps we can view each trait-relevant variant as a kind of RCT.

Genetic variation as a parallel to an RCT

MR study design, Sanderson et al., 2022, Nature Reviews Methods Primers

Mendelian Randomization

To do this, we can use Mendelian Randomization (MR), which leverages genetic variants as instrumental variables (IVs) to estimate the causal effect of an exposure (\(\color{darkgreen}{X}\)) on an outcome (\(\color{darkorange}{Y}\)).

In the case of no confounding between \(\color{darkgreen}{X}\) and \(\color{darkorange}{Y}\):

\(\color{darkgreen}{X_i} = \mu_x + G_i\beta + \epsilon_{x_i}\)
\(\color{darkorange}{Y_i} = \mu_y + \color{darkgreen}{X_i}\color{darkblue}{\alpha} + \epsilon_{y_i}\)

Assume \(E(\epsilon_{x}) = E(\epsilon_{y}) = 0\) and \(Cov(\epsilon_{x}, \epsilon_{y}) = 0\), where \(G_i\) denotes the genotypes of the \(i\)th individual.

Implications of this two stage model: Expectation

\[ \begin{align} E(\color{darkorange}{Y_i} | G_i) &= E\bigl(E(\color{darkorange}{Y_i} | \color{darkgreen}{X_i}, G_i)\bigr) \\ &= \mu_y + \color{darkblue}{\alpha}E(\color{darkgreen}{X_i | G_i}) \\ &= \mu_y + \color{darkblue}{\alpha}(G_i\beta + \mu_x) \end{align} \]

Implications of this two stage model: Covariance

\[ \begin{align} Cov(\color{darkorange}{Y}, \color{darkgreen}{X} \mid G) &= Cov(\mu_y + \color{darkblue}{\alpha}*\color{darkgreen}{X} + \epsilon_y,\ \color{darkgreen}{X} \mid G) \\ &= \color{darkblue}{\alpha} * Var(\color{darkgreen}{X} \mid G) \end{align} \]

\[ \begin{align} Cov(\color{darkorange}{Y}, \color{darkgreen}{X}) &= E\bigl(Cov(\color{darkorange}{Y}, \color{darkgreen}{X} \mid G)\bigr) + Cov\bigl(E(\color{darkorange}{Y} \mid G), E(\color{darkgreen}{X} \mid G)\bigr) \\ &= \color{darkblue}{\alpha} * E\bigl(Var(\color{darkgreen}{X} \mid G)\bigr) + \color{darkblue}{\alpha} * Cov\bigl(E(\color{darkgreen}{X} \mid G), E(\color{darkgreen}{X} \mid G)\bigr) \\ &= \color{darkblue}{\alpha} * Var(\epsilon_x) + \color{darkblue}{\alpha} * \beta^2 Var(G) \end{align} \]

Note: Only true if \(Cov(\epsilon_y, \color{darkgreen}{X}) = 0\)!
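A quick numerical check of this covariance identity (a minimal R simulation; the parameter values are arbitrary illustrative choices):

# Check Cov(Y, X) = alpha * (Var(eps_x) + beta^2 * Var(G)) by simulation
set.seed(1)
n = 1e6
beta = 0.5
alpha = 0.3
G = rnorm(n)
eps_x = rnorm(n, sd = 0.8)
X = G * beta + eps_x
Y = X * alpha + rnorm(n)                  # eps_y independent of eps_x
cov(Y, X)                                 # empirical, ~0.267
alpha * (var(eps_x) + beta^2 * var(G))    # theoretical, ~0.267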

Now, a bit more realistic, with a confounder

But if there were no confounding between \(\color{darkgreen}{X}\) and \(\color{darkorange}{Y}\), why would we need MR at all?

Now, a bit more realistic, with a confounder

Consider the following scenario where there is a confounder \(U\) that affects both \(\color{darkgreen}{X}\) and \(\color{darkorange}{Y}\):

\[ \begin{align} \color{darkgreen}{X_i} &= \mu_x + G_i\beta + U_i\gamma + \epsilon_{x_i} \\ \color{darkorange}{Y_i} &= \mu_y + \color{darkgreen}{X_i}\color{darkblue}{\alpha} + U_i\delta + \epsilon_{y_i} \end{align} \]

Here, \(U_i\) represents the confounder, \(\gamma\) is the effect of the confounder on \(\color{darkgreen}{X}\), and \(\delta\) is the effect of the confounder on \(\color{darkorange}{Y}\).

Implications of this model with a confounder

The presence of the confounder \(U\) introduces bias in the estimation of the causal effect \(\color{darkblue}{\alpha}\).

Covariance

\[ \begin{align} Cov(\color{darkorange}{Y}, \color{darkgreen}{X} \mid G) &= Cov(\mu_y + \color{darkblue}{\alpha}\color{darkgreen}{X} + U\delta + \epsilon_y,\ \color{darkgreen}{X} \mid G) \\ &= \color{darkblue}{\alpha} * Var(\color{darkgreen}{X} \mid G) + \delta * Cov(U, \color{darkgreen}{X} \mid G) \end{align} \]

Note: The term \(\delta * Cov(U, \color{darkgreen}{X} \mid G)\) introduces bias due to the confounder \(U\).

Addressing confounding in MR

To address confounding, we can use genetic variants as instrumental variables (IVs) that are not associated with the confounder \(U\). This allows us to estimate the causal effect \(\color{darkblue}{\alpha}\) without bias from \(U\).

Example

Suppose we have identified a set of genetic variants that are associated with body mass index (BMI) and we want to estimate the causal effect of BMI on blood pressure. We can use these genetic variants as instruments in an MR analysis to infer the causal relationship between BMI and blood pressure.

Example

Sanderson et al., 2022, Nature Reviews Methods Primers

Example confounders

  • Age
  • Smoking
  • Diet
  • Alcohol consumption
  • Comorbid diseases
  • Many others…

Assumptions of MR

  1. Relevance: \(G\) is associated with \(\color{darkgreen}{X}\)
  2. Independence: \(G\) is not associated with \(U\)
  3. Exclusion restriction: \(G\) affects \(\color{darkorange}{Y}\) only through \(\color{darkgreen}{X}\) (no direct effect)

How to estimate \(\alpha\)

Consider the least squares estimate of the effect of \(\color{darkgreen}{X}\) on \(\color{darkorange}{Y}\):

\[ \color{darkblue}{\hat{\alpha}} = (\color{darkgreen}{X}^t\color{darkgreen}{X})^{-1} \color{darkgreen}{X}^t \color{darkorange}{Y} \]

What if we didn’t use the observed \(\color{darkgreen}{X}\), but a proxy for it constructed from \(G\)?

Estimation

\[ \hat{\beta} = (G^tG)^{-1}G^{t}\color{darkgreen}{X} \\ \color{darkgreen}{\hat{X}} = G(G^tG)^{-1}G^t\color{darkgreen}{X} \]

We can estimate \(\color{darkblue}{\alpha}\) by plugging in our estimate of \(\color{darkgreen}{X}\)

\[ \begin{align} \color{darkblue}{\hat{\alpha}} &= \bigl((G(G^tG)^{-1}G^t\color{darkgreen}{X})^t(G(G^tG)^{-1}G^t\color{darkgreen}{X})\bigr)^{-1}(G(G^tG)^{-1}G^t\color{darkgreen}{X})^t\color{darkorange}{Y} \\ P_G &= G(G^tG)^{-1}G^t \\ \color{darkblue}{\hat{\alpha}} &= \bigl( (P_G \color{darkgreen}{X})^t(P_G \color{darkgreen}{X}) \bigr)^{-1} (P_G \color{darkgreen}{X})^t \color{darkorange}{Y} \\ \color{darkblue}{\hat{\alpha}} &= \bigl( \color{darkgreen}{X}^t P_G \color{darkgreen}{X} \bigr)^{-1} \color{darkgreen}{X}^t P_G \color{darkorange}{Y} \end{align} \]

The last step uses the fact that the projection matrix \(P_G\) is symmetric and idempotent, so \(P_G^t P_G = P_G\).
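A minimal sketch of this two-stage estimator in R (simulated data; the number of instruments, effect sizes, and sample size are illustrative choices, not values from the lecture):

# Two-stage least squares: project X onto G, then regress Y on the projection
set.seed(42)
n = 5000
G = matrix(rnorm(n * 3), n, 3)             # three instruments
U = rnorm(n)                               # unobserved confounder
X = drop(G %*% c(0.4, 0.3, 0.2)) + U + rnorm(n)
Y = 0.5 * X + U + rnorm(n)                 # true alpha = 0.5

X_hat = fitted(lm(X ~ G))                  # stage 1: X_hat = P_G X
coef(lm(Y ~ X_hat))[2]                     # stage 2: ~0.5
coef(lm(Y ~ X))[2]                         # naive OLS, biased upward by U

Regressing \(\color{darkorange}{Y}\) on the stage-one fitted values reproduces the projection formula above (up to the intercept).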

Estimation

\[ E\bigl(( \color{darkgreen}{X}^t P_G \color{darkgreen}{X} )^{-1} \color{darkgreen}{X}^t P_G \color{darkorange}{Y} \bigr) \]

\[ \begin{align*} &= E\bigl(E\bigl(( \color{darkgreen}{X}^t P_G \color{darkgreen}{X} )^{-1} \color{darkgreen}{X}^t P_G \color{darkorange}{Y} \mid \color{darkgreen}{X} \bigr) \bigr) \\ &= E\bigl(( \color{darkgreen}{X}^t P_G \color{darkgreen}{X} )^{-1} ( \color{darkgreen}{X}^t P_G \color{darkgreen}{X}) \color{darkblue}{\alpha} + ( \color{darkgreen}{X}^t P_G \color{darkgreen}{X})^{-1} \color{darkgreen}{X}^t P_G E(\epsilon_y \mid \color{darkgreen}{X}) \bigr) \\ &= \color{darkblue}{\alpha} + E\bigl(( \color{darkgreen}{X}^t P_G \color{darkgreen}{X})^{-1} \color{darkgreen}{X}^t P_G E(\epsilon_y \mid \color{darkgreen}{X}) \bigr)\\ &= \color{darkblue}{\alpha} \quad\quad \text{if A1} \end{align*} \]

\(A1\): \(E(\epsilon_y \mid \color{darkgreen}{X}) = 0\), i.e., \(E(\color{darkorange}{Y}|\color{darkgreen}{X}, G) = \mu_y + \color{darkblue}{\alpha} \color{darkgreen}{X}\)

In general A1 does not hold exactly, so the estimator is biased in finite samples.

The asymptotic bias

To understand the asymptotic behavior of the IV estimator, consider the bias term:

\[ E\bigl(( \color{darkgreen}{X}^t P_G \color{darkgreen}{X})^{-1} \color{darkgreen}{X^t}P_G\epsilon_y \bigr) \]

  • This expectation is difficult to evaluate directly, because we can’t move the expectation operator inside the inverse (a non-linear function)
  • However, we can approximate it with asymptotic arguments (convergence as \(n \to \infty\)).

\[ \frac{1}{n}\color{darkgreen}{X^t}P_G\color{darkgreen}{X} \rightarrow Q_{XG} \]

\[ \frac{1}{n}\color{darkgreen}{X^t}P_G\epsilon_y \rightarrow 0 \quad \text{by assumption} \]

Then: \[ ( \color{darkgreen}{X}^t P_G \color{darkgreen}{X})^{-1} \color{darkgreen}{X^t}P_G\epsilon_y = \Bigl(\tfrac{1}{n}\color{darkgreen}{X^t}P_G\color{darkgreen}{X}\Bigr)^{-1} \tfrac{1}{n}\color{darkgreen}{X^t}P_G\epsilon_y \rightarrow Q^{-1}_{XG} * 0 = 0 \]
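A quick simulation consistent with this argument (a sketch with arbitrary parameter values): the IV estimate concentrates around the true \(\color{darkblue}{\alpha} = 0.5\) as \(n\) grows, despite the confounder.

set.seed(1)
iv_est = function(n) {
  G = rnorm(n)
  U = rnorm(n)                             # confounder
  X = 0.5 * G + U + rnorm(n)
  Y = 0.5 * X + U + rnorm(n)               # true alpha = 0.5
  coef(lm(Y ~ G))[2] / coef(lm(X ~ G))[2]  # IV (Wald ratio) estimate
}
sapply(c(100, 1000, 10000, 100000), iv_est)  # estimates approach 0.5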

This estimator of \(\color{darkblue}{\alpha}\) requires access to \(G\), \(\color{darkgreen}{X}\), and \(\color{darkorange}{Y}\) measured on the same individuals. Often, however, this is not the case.

Two-sample MR

In MR, it’s quite natural to ask whether some molecular phenotype mediates the association between genotype and phenotype:

  • Does gene expression mediate the association between \(G\) and \(\color{darkorange}{Y}\)? (e.g., TWAS)
  • Does protein abundance mediate the association? (e.g., PWAS)
  • Do metabolites mediate the association?

Often, interesting molecular phenotypes are measured in cohorts (e.g., GTEx, Geuvadis) distinct from the GWAS cohort.

Inverse-variance weighted estimator (IVW)

Assume that summary statistics for the following have been generated:

  • Association between \(G\) and \(\color{darkgreen}{X}\): \(\hat{\beta}\), \(\hat{\sigma}^2_{x}\)
  • Association between \(G\) and \(\color{darkorange}{Y}\): \(\hat{\psi}\), \(\hat{\sigma}^2_{y}\)

\[ \begin{aligned} \hat{\alpha}_{IVW} &= \frac{\sum_{i=1}^{P} W_i \frac{\hat{\psi}_i}{\hat{\beta}_i}}{\sum_{i=1}^{P} W_i} \qquad\text{with}\qquad W_i = \left(\frac{\hat{\beta}_i}{\hat{\sigma}_{y_i}}\right)^2 \end{aligned} \]

How would you estimate this with the lm command in R?
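One possible answer, sketched below with simulated summary statistics (true \(\color{darkblue}{\alpha} = 0.5\); all inputs are illustrative): \(\hat{\alpha}_{IVW}\) is algebraically a weighted least squares regression of \(\hat{\psi}\) on \(\hat{\beta}\) with no intercept and weights \(1/\hat{\sigma}^2_{y_i}\).

set.seed(7)
P = 50
beta_hat = rnorm(P, 0, 0.1)                      # G-X effect estimates
sigma_y = rep(0.02, P)                           # SEs of the G-Y effects
psi_hat = 0.5 * beta_hat + rnorm(P, 0, sigma_y)  # G-Y effect estimates
# IVW as weighted least squares through the origin
coef(lm(psi_hat ~ beta_hat - 1, weights = 1 / sigma_y^2))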

What if some of our instruments are invalid?

  1. Perhaps some of the SNPs don’t actually predict the exposure
  2. Perhaps some of the SNPs have direct effects on the outcome phenotype (horizontal pleiotropy)

Examining this behavior through simulation

library(purrr) # provides map_dbl()

run_sim = function() {
  N = 1000
  P = 1
  beta = rep(0.5, P) # effect of G on X
  alpha = 0.5        # true causal effect of X on Y
  U_sd = 0.3         # scale of the unobserved confounder
  G = rnorm(N * P, mean = 0, sd = 1)
  U = rnorm(N, mean = 0, sd = U_sd)
  # U affects both X and Y, confounding their relationship
  X = G * beta + rnorm(N, mean = 0, sd = sqrt(1.0 - beta ^ 2 - U_sd ^ 2)) + U
  Y = X * alpha + rnorm(N, mean = 0, sd = sqrt(1.0 - alpha ^ 2 * var(X) - U_sd ^ 2)) + U
  # naive OLS estimate of alpha, biased by U
  return(coef(lm(Y ~ X))[2])
}

sims = map_dbl(1:1000, ~run_sim())

Now, with instrumental variables

run_sim_iv = function() {
  N = 1000
  P = 1
  beta = 0.5
  alpha = 0.5
  U_sd = 0.3
  G = rnorm(N * P, mean = 0, sd = 1)
  U = rnorm(N, mean = 0, sd = U_sd)
  X = G * beta + rnorm(N, mean = 0, sd = sqrt(1.0 - beta ^ 2 - U_sd ^ 2)) + U
  Y = X * alpha + rnorm(N, mean = 0, sd = sqrt(1.0 - alpha ^ 2 * var(X) - U_sd ^ 2)) + U

  # Wald ratio: (G-Y association) / (G-X association)
  YG = coef(lm(Y ~ G))[2]
  XG = coef(lm(X ~ G))[2]
  return(YG / XG)
}

sims_iv = map_dbl(1:1000, ~run_sim_iv())

Weak instrument simulation

run_sim_iv_weak = function() {
  N = 1000
  P = 1
  beta = 0.03 # weak instrument: G explains ~0.1% of Var(X)
  alpha = 0.5
  U_sd = 0.3
  G = rnorm(N * P, mean = 0, sd = 1)
  U = rnorm(N, mean = 0, sd = U_sd)
  X = G * beta + rnorm(N, mean = 0, sd = sqrt(1.0 - beta ^ 2 - U_sd ^ 2)) + U
  Y = X * alpha + rnorm(N, mean = 0, sd = sqrt(1.0 - alpha ^ 2 * var(X) - U_sd ^ 2)) + U

  # Wald ratio, now with a nearly irrelevant instrument
  YG = coef(lm(Y ~ G))[2]
  XG = coef(lm(X ~ G))[2]
  return(YG / XG)
}

set.seed(3)
sims_iv_weak = map_dbl(1:1000, ~run_sim_iv_weak())

Bias when instruments are weak

sprintf("mean: %f", mean(sims_iv_weak))
[1] "mean: 0.296619"
sprintf("se: %f", sd(sims_iv_weak) / sqrt(1000))
[1] "se: 0.350748"

Median estimator

  • Let \(\color{darkblue}{\hat{\alpha}_j}\) denote the \(j\)th ordered ratio estimate.
  • Simply take the median of these values.
  • Can be improved by weighting each estimate by its inverse variance (see the sketch below).
  • Consistent as long as at least 50% of the weight comes from valid instruments.
  • (See Bowden et al., 2016, Genetic Epidemiology.)
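A minimal sketch of the weighted median (simulated inputs; this version takes the first ordered estimate whose cumulative weight crosses 50%, omitting the interpolation step used in Bowden et al.):

weighted_median = function(ratio, w) {
  ord = order(ratio)
  ratio = ratio[ord]
  cw = cumsum(w[ord]) / sum(w)         # cumulative standardized weights
  ratio[which(cw >= 0.5)[1]]           # first estimate crossing 50%
}

set.seed(7)
ratio = rnorm(20, 0.5, 0.1)            # per-SNP ratio estimates psi_hat / beta_hat
w = runif(20, 0.5, 1.5)                # inverse-variance weights
weighted_median(ratio, w)              # the simple median is the equal-weights case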

MR-RAPS

Consider the likelihood

\[ \ell(\color{darkblue}{\alpha}, \beta_1, \ldots, \beta_P) = -0.5 \left[ \sum_{i=1}^{P} \frac{(\hat{\beta}_i - \beta_i)^2}{\sigma^2_{X_i}} + \sum_{i=1}^{P} \frac{(\hat{\psi}_i - \beta_i\, \color{darkblue}{\alpha})^2}{\sigma^2_{Y_i}} \right] \]

After profiling out the \(\beta\) terms:

\[ \ell(\color{darkblue}{\alpha}) = -0.5 \sum_{i=1}^{P}\frac{(\hat{\psi}_i - \color{darkblue}{\alpha}\,\hat{\beta}_i)^2}{\sigma^2_{X_i}\color{darkblue}{\alpha}^2 + \sigma^2_{Y_i}} \]

Zhao et al., 2020, Annals of Statistics.
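A sketch of maximizing this profiled likelihood with optimize() (simulated summary statistics with true \(\color{darkblue}{\alpha} = 0.5\); all other values are illustrative):

set.seed(7)
P = 100
sigma_x = rep(0.05, P)                  # SEs of the G-X effect estimates
sigma_y = rep(0.05, P)                  # SEs of the G-Y effect estimates
beta = rnorm(P, 0, 0.15)                # true per-SNP effects on X
beta_hat = beta + rnorm(P, 0, sigma_x)
psi_hat = 0.5 * beta + rnorm(P, 0, sigma_y)

profile_ll = function(alpha) {
  -0.5 * sum((psi_hat - alpha * beta_hat)^2 /
               (sigma_x^2 * alpha^2 + sigma_y^2))
}
optimize(profile_ll, interval = c(-2, 2), maximum = TRUE)$maximum  # ~0.5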

Robust likelihood

\[ \ell(\color{darkblue}{\alpha}) = -\sum_{i=1}^{P}\rho\left(\frac{\hat{\psi}_i - \color{darkblue}{\alpha}\,\hat{\beta}_i}{\sqrt{\sigma^2_{X_i}\,\color{darkblue}{\alpha}^2 + \sigma^2_{Y_i}}}\right) \]

with \(\rho\) defined as some “robust” loss function applied to the standardized residuals, e.g., the Huber loss:

\[ \rho(r; k) = \begin{cases} \frac{1}{2}r^2, & \text{if } |r| \le k,\\[1ex] k\,|r| - \frac{1}{2}k^2, & \text{if } |r| > k. \end{cases} \]
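The same maximization with the Huber loss on the standardized residuals, reusing the simulated summary statistics from the previous sketch (k = 1.345 is a conventional tuning constant, an assumption rather than a value from the lecture):

huber = function(r, k = 1.345) {
  ifelse(abs(r) <= k, 0.5 * r^2, k * abs(r) - 0.5 * k^2)
}
robust_ll = function(alpha) {
  t = (psi_hat - alpha * beta_hat) /
    sqrt(sigma_x^2 * alpha^2 + sigma_y^2)  # standardized residuals
  -sum(huber(t))
}
optimize(robust_ll, interval = c(-2, 2), maximum = TRUE)$maximum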

Returning to pleiotropy

Zhao et al., 2020, AOS

Complex traits are highly pleiotropic

Table 1, Watanabe et al., Nature Genetics, 2019

Complex traits are highly polygenic

Figure 1, Zhang et al., Nature Communications, 2022

Correlated and Uncorrelated pleiotropy

Figure 1, Morrison et al., Nature Genetics, 2020

MR-CAUSE

Methods, Morrison et al., Nature Genetics, 2020

Pleiotropy in MR-RAPS

Assume: \(\psi_i - \alpha\beta_i \sim N(0, \tau^2)\)

Then:

\[ \ell(\alpha) = -0.5 \sum_{i=1}^{P}\frac{(\hat{\psi}_i - \alpha \hat{\beta}_i)^2}{\sigma^2_{X_i}\alpha^2 + \sigma^2_{Y_i} + \color{red}{\hat{\tau}^2}} \]
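A sketch of jointly estimating \(\alpha\) and \(\tau^2\) by maximum likelihood, reusing the simulated statistics above but adding uncorrelated pleiotropy. Note that the Gaussian log-normalizer \(\log(v)\) must be included in the objective for \(\tau^2\) to be identifiable; the slide displays only the quadratic term:

set.seed(8)
psi_hat_pleio = psi_hat + rnorm(P, 0, 0.05)  # direct effects of G on Y
nll = function(par) {                        # par = c(alpha, log(tau^2))
  alpha = par[1]
  tau2 = exp(par[2])
  v = sigma_x^2 * alpha^2 + sigma_y^2 + tau2
  0.5 * sum((psi_hat_pleio - alpha * beta_hat)^2 / v + log(v))
}
optim(c(0, log(0.01)), nll)$par[1]           # estimate of alpha, ~0.5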

Sparsity assumptions on uncorrelated pleiotropy terms

Berzuini et al., 2020, Biostatistics

Bayesian inference of pleiotropy terms

Consider the following structural equation model: \[ \begin{align} \color{darkgreen}{X_i} &= \mu_x + G_i\beta + U_i\gamma + \epsilon_{x_i} \\ \color{darkorange}{Y_i} &= \mu_y + \color{darkgreen}{X_i}\color{darkblue}{\alpha} + U_i\delta + \color{red}{G_{i} \theta} + \epsilon_{y_i} \end{align} \]

What probabilistic assumptions can we place on \(\theta\) to make these terms identifiable?

Sparse prior on \(\theta\)

\[ \theta_j \sim \mathcal{N}(0, \lambda_j^2 \tau^2) \\ \lambda_j \sim \mathcal{C}^{+}(0, 1) \\ \tau \sim \mathcal{C}^{+}(0, 1) \]

What is this doing?

Carvalho et al., 2009, JMLR
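To see what this prior is doing, one can simply draw from the hierarchy above (a quick sketch; half-Cauchy draws are obtained by taking absolute values of Cauchy draws):

set.seed(7)
J = 1e5
tau = abs(rcauchy(1))              # global scale, shared across all effects
lambda = abs(rcauchy(J))           # local scales, one per effect
theta = rnorm(J, 0, lambda * tau)
# Most draws are shrunk hard toward zero, while the heavy Cauchy tails
# let a few theta_j remain large: a sparse-pleiotropy assumption
quantile(abs(theta), c(0.5, 0.9, 0.99))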

Posterior computation is performed with Hamiltonian Monte Carlo, implemented in the Stan probabilistic programming language.

MR-PRESSO

Verbanck et al., 2018, Nature Genetics

MR-PRESSO

Figure 1a, Verbanck et al.

MR-PRESSO