GWAS VI: Polygenic risk scores and LD score regression

BIOS 770

Josh Weinstock, PhD

Assistant Professor, Department of Human Genetics, Emory University

Learning objectives

  1. Define polygenic risk scores (PRS)
  2. Discuss common PRS methods
  3. Discuss challenges in PRS portability and interpretation
  4. Discuss linkage disequilibrium score regression (LDSR)

Genotypes → Phenotypes: Recap of GWAS

Continuous Trait
\[ \color{darkblue}{Y}_i \sim \mathcal{N}(\color{darkgreen}{X}_i \beta, \sigma^2) \]

Binary Trait
\[ \color{darkblue}{Y}_i \sim \text{Bernoulli}(\sigma(\color{darkgreen}{X}_i \beta)) \]

  • \(\color{darkgreen}{X}_i\): Genotype vector (e.g., SNP counts 0/1/2)
  • \(\beta\): Effect sizes from GWAS
  • \(\sigma\): Sigmoid function for binary outcomes

Interpretation

  • \(\beta\) represents marginal effects of genotypes on phenotypes
  • PRS = \(\color{darkgreen}{X}_i \hat{\beta}\) (weighted sum of risk alleles)
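Computing a PRS is a single matrix-vector product. A minimal sketch with made-up toy data (geno and beta_hat are hypothetical stand-ins for real allele counts and GWAS weights):

set.seed(1)
n_ind = 5                                                    # individuals
n_snp = 4                                                    # SNPs
geno = matrix(rbinom(n_ind * n_snp, 2, 0.3), n_ind, n_snp)   # 0/1/2 allele counts
beta_hat = rnorm(n_snp, 0, 0.1)                              # stand-in GWAS effect sizes
prs = as.vector(geno %*% beta_hat)                           # one score per individual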

Why create a PRS?

Torkamani et al., Nature Reviews Genetics, 2018

Statistical methods for PRS construction

  1. Clumping and Thresholding
    • Clumping: Remove SNPs in linkage disequilibrium (LD).
    • Thresholding: Select SNPs with GWAS p < threshold.
  2. Penalized Regression
    • LASSO/Ridge: penalize the L1 norm to induce sparsity or the L2 norm to shrink coefficients.
  3. Bayesian Methods (PRS-CS, LDpred, BayesR)
    • Bayesian regression with some helpful prior.

Creating a PRS

Wray et al., Nature Reviews Genetics, 2013

Modeling linkage disequilibrium

For genotype matrix \(\color{darkgreen}{X} \in \mathbb{R}^{n \times p}\) (samples \(\times\) SNPs):

Covariance: \[ R = \text{Cov}(\color{darkgreen}{X}) = \mathbb{E}[\color{darkgreen}{X}^T \color{darkgreen}{X}] - \mathbb{E}[\color{darkgreen}{X}]^T \mathbb{E}[\color{darkgreen}{X}] \]

Under common assumptions:

  1. Centered genotypes: \(\mathbb{E}[X] = 0\)
  2. Unit variance: \(\text{Var}(X_{i}) = 1\ \forall\, i\)

Simplifies to Correlation Matrix: \[ R = \text{Cor}(\color{darkgreen}{X}) = \mathbb{E}[\color{darkgreen}{X}^T \color{darkgreen}{X}] \]
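A small sketch of this estimator (a random Gaussian matrix stands in for standardized genotypes):

set.seed(1)
N = 1000; P = 10
X = scale(matrix(rnorm(N * P), N, P))  # stand-in for standardized genotypes
R_hat = crossprod(X) / N               # estimate of E[X^t X]
max(abs(R_hat - cor(X)))               # agrees with cor(X) up to the 1/N vs 1/(N-1) factor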

Properties of the estimator

Good: \[ \mathbb{E}[\hat{R}] = R \quad \text{(Unbiased)} \]

Bad:

  • When \(P \ge N\), \(\hat{R}\) becomes rank-deficient: \[ \text{rank}(\hat{R}) \leq \min(N,P) = N \]

\(\hat{R}\) may also fail to be positive definite.
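A quick demonstration of the rank deficiency when \(P > N\) (sketch):

set.seed(1)
N = 50; P = 200
X = scale(matrix(rnorm(N * P), N, P))
R_hat = crossprod(X) / N
qr(R_hat)$rank                               # about N, far below P
min(eigen(R_hat, symmetric = TRUE)$values)   # ~0 (or slightly negative)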

GWAS Dimensionality Challenge

UK Biobank Example (10Mb region): \[ \begin{aligned} P &= 1{,}000 \ \text{SNPs} \\ \text{Params in } \hat{R} &= \frac{P(P-1)}{2} = 499{,}500 \\ N &\approx 500{,}000 \ \text{(typical subsample)} \end{aligned} \]

Implications:

  • LD matrix estimation becomes ill-conditioned
  • Matrix inversion \(\hat{R}^{-1}\) unstable
  • Impacts PRS methods and fine-mapping

Recovering joint estimates from summary statistics

  • GWAS estimates the marginal effects.
  • These are likely to be “worse” (for prediction and locus discovery) than joint effects when the SNPs are correlated, which is very common

Simulation: joint vs marginal coefficients

N = 10000   # samples
P = 100     # SNPs
M = 20      # causal SNPs
h2 = 0.20   # SNP heritability

# Random covariance matrix with eigenvalues D; draw correlated genotypes via its Cholesky factor
U <- qr.Q(qr(matrix(rnorm(P ^ 2), P)))
D = seq(0.01, 10, length = P)
Sigma = crossprod(U, U * D)
L = chol(Sigma)
X = matrix(rnorm(N * P), N, P) %*% L
X = scale(X) # ensure variances are 1

# Spike-and-slab truth: M causal SNPs, each contributing h2 / M of variance
beta = rep(0, P)
beta_causal = sample(1:P, size = M)
beta[beta_causal] = rnorm(M, mean = 0, sd = sqrt(h2 / M))
Y = X %*% beta + rnorm(N, 0, sqrt(1.0 - h2))

Joint estimates are better than marginal estimates

beta_joint = coef(lm(Y ~ X + 0))   # joint OLS fit (all SNPs at once)
beta_gwas  = (t(X) %*% Y) / N      # marginal one-SNP-at-a-time estimates

summary(lm(beta ~ beta_joint))[["r.squared"]] 
[1] 0.8535646
summary(lm(beta ~ beta_gwas))[["r.squared"]] 
[1] 0.7479689

Using sufficient statistics, convert marginal to joint

Since \(\color{darkgreen}{X}^t \color{darkgreen}{X} = N\hat{R}\) for standardized genotypes:

\[ \begin{align} \hat{\beta}^{\text{marg}} &= \frac{\color{darkgreen}{X}^t \color{darkblue}{Y}}{N} \\ \color{darkgreen}{X}^t \color{darkblue}{Y} &= N \hat{\beta}^{\text{marg}} \\ \hat{\beta}^{\text{joint}} = (\color{darkgreen}{X}^t \color{darkgreen}{X})^{-1} \color{darkgreen}{X}^t \color{darkblue}{Y} &= (N\hat{R})^{-1} N \hat{\beta}^{\text{marg}} \\ \hat{\beta}^{\text{joint}} &= \hat{R}^{-1} \hat{\beta}^{\text{marg}} \end{align} \]

Asymptotically: \[ \hat{\beta}^{\text{marg}} \sim \mathcal{N}\left(\hat{R}\beta, \frac{\sigma^2}{N}\hat{R} \right) \]

What we get from GWAS (\(\hat{\beta}^{\text{marg}}\)) are the true coefficients (\(\beta\)) “rotated” by LD (\(R\)) + noise
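Continuing with X, Y, and the estimates from the simulation above, the conversion is a single linear solve (a sketch):

R_hat = crossprod(X) / N                 # LD matrix (could come from a reference panel)
beta_joint_ss = solve(R_hat, beta_gwas)  # R^{-1} beta_marg
max(abs(beta_joint_ss - beta_joint))     # matches the joint OLS fit up to rounding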

GWAS summary statistics

  1. Often, we don’t have access to individual level data
  2. But we can pair sufficient statistics from GWAS with LD estimates from a population reference panel
  3. This information is sufficient to approximate an individual level likelihood without requiring individual genotypes from the GWAS

In summary so far

  1. GWAS estimates marginal effects
  2. For prediction, we generally want to convert these to joint estimates
  3. We can actually do this without needing to re-analyze the original individual level GWAS data

Methods for computing PRS

Clumping and thresholding

  1. Perform GWAS
  2. “Clump” correlated variants together / LD-prune
  3. Simply take all betas where the p-value is below some threshold
  4. Note, no conversion from marginal to joint here
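A bare-bones sketch on the simulated data (the \(r^2 = 0.1\) and \(p < 5 \times 10^{-4}\) cutoffs are arbitrary choices; z-scores are approximated from the marginal estimates):

z = sqrt(N) * as.vector(beta_gwas)         # approximate z-scores (X, Y standardized)
pvals = 2 * pnorm(-abs(z))
r2 = cor(X)^2
pruned = c()
for (j in order(pvals)) {                  # visit SNPs from most to least significant
  if (all(r2[j, pruned] <= 0.1)) pruned = c(pruned, j)
}
prs_snps = pruned[pvals[pruned] < 5e-4]    # survivors below the p-value threshold
weights = replace(rep(0, P), prs_snps, beta_gwas[prs_snps])
prs = X %*% weights                        # marginal betas used as-is (no joint conversion)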

LDpred: Bayesian linear regression

Prior on the SNP effects: spike and slab

\[ \beta_i \stackrel{\text{iid}}{\sim} \begin{cases} \mathcal{N}\left(0, \dfrac{h_g^2}{M_p}\right) & \text{with probability } p \\ 0 & \text{with probability } (1-p) \end{cases} \]

Posterior expectation without LD

\[ \mathbb{E}\left(\beta_i \mid \hat{\beta}_i^{marg} \right) = \left(\dfrac{h_g^2}{h_g^2 + \dfrac{M_p}{N}}\right) \bar{p}_i \hat{\beta}^{marg}_i \]
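In the infinitesimal special case (\(p = 1\), so \(\bar{p}_i = 1\) and \(M_p = M\)), this reduces to one scalar shrinkage factor. A minimal sketch using the simulation's constants for illustration:

shrink = h2 / (h2 + M / N)       # close to 1 here since N >> M
beta_no_ld = shrink * beta_gwas  # every marginal estimate shrunk by the same factor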

Estimates in the presence of LD

\[ \mathbb{E}\left(\beta \mid \hat{\beta}^{marg}, \mathbf{\hat{R}} \right) = \left(\mathbf{\hat{R}} + I \dfrac{M}{Nh_g^2}\right)^{-1} \hat{\beta}^{marg} \]

Simulation

beta_joint = coef(lm(Y ~ X + 0))
beta_gwas  = (t(X) %*% Y) / N
# ridge term is M / (N * h2), matching the formula above
beta_ld_pred_inf = solve(cor(X) + diag(P) * (M / (N * h2)), beta_gwas)

summary(lm(beta ~ beta_joint))[["r.squared"]]
[1] 0.8535646
summary(lm(beta ~ beta_gwas))[["r.squared"]] 
[1] 0.7479689
summary(lm(beta ~ beta_ld_pred_inf))[["r.squared"]] 
[1] 0.8629468

LDpred induces shrinkage towards the origin

Connections to ridge regression

\[ \hat{\beta}^{ridge} = (\color{darkgreen}{X}^t \color{darkgreen}{X} + \lambda I)^{-1}\color{darkgreen}{X}^t \color{darkblue}{Y} \]

\[ \mathbb{E}\left(\beta \mid \hat{\beta}^{marg}, \mathbf{\hat{R}} \right) = \left(\mathbf{\hat{R}} + I \dfrac{M}{Nh_g^2}\right)^{-1} \hat{\beta}^{marg} \]

\[ \lambda \approx \dfrac{M}{Nh_g^2} \]
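The correspondence can be checked numerically (a sketch: glmnet with alpha = 0 fits ridge; the intercept and internal standardization are turned off so that glmnet's objective, \(\frac{1}{2N}\text{RSS} + \frac{\lambda}{2}||\beta||_2^2\), matches the closed form above):

library(glmnet)
lam = M / (N * h2)                             # the LDpred-inf penalty
ridge = glmnet(X, Y, alpha = 0, lambda = lam,
               intercept = FALSE, standardize = FALSE)
beta_ridge = as.vector(coef(ridge))[-1]        # drop the (zero) intercept
beta_closed = solve(crossprod(X) / N + lam * diag(P), beta_gwas)
cor(beta_ridge, beta_closed)                   # approximately 1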

Lasso-derived PRS

Minimize: \[ (\color{darkblue}{Y} - \color{darkgreen}{X}\beta)^t(\color{darkblue}{Y} - \color{darkgreen}{X}\beta) + \lambda||\beta||_1 \]

This can be approximated with summary statistics: \[ \color{darkblue}{Y}^t\color{darkblue}{Y} + N\beta^{t}\hat{R}\beta - 2N\beta^{t}\hat{\beta}^{marg} + \lambda||\beta||_1 \]

Mak et al., Genetic Epidemiology, 2017
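A quick sanity check on the simulated data: for any coefficient vector, the summary-statistic objective matches the individual-level residual sum of squares exactly (the b below is arbitrary):

R_hat = crossprod(X) / N
b = rnorm(P, 0, 0.05)                        # an arbitrary coefficient vector
obj_individual = sum((Y - X %*% b)^2)
obj_summary = sum(Y^2) + N * as.numeric(t(b) %*% R_hat %*% b) - 2 * N * sum(b * beta_gwas)
obj_individual - obj_summary                 # ~0: the objectives coincide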

Simulation with glmnet as a proxy

library(glmnet)

lasso = glmnet(X, Y, alpha = 1.0)                              # alpha = 1: pure L1 penalty
lasso_estimates = as.vector(coef(lasso, s = 0.01))[2:(P + 1)]  # drop intercept; s = 0.01 chosen by hand

summary(lm(beta ~ beta_joint))[["r.squared"]] 
[1] 0.8535646
summary(lm(beta ~ beta_gwas))[["r.squared"]] 
[1] 0.7479689
summary(lm(beta ~ beta_ld_pred_inf))[["r.squared"]] 
[1] 0.8629468
summary(lm(beta ~ lasso_estimates))[["r.squared"]] 
[1] 0.9905764

PRS-CS: Bayesian linear regression with continuous shrinkage prior

\[ \beta_i \sim \mathcal{N}\left(0, \frac{\sigma^2}{N} \lambda_i \tau \right) \\ \lambda_i \sim \text{Gamma}(a, \delta_i) \\ \delta_i \sim \text{Gamma}(b, 1) \\ \sqrt{\tau} \sim \mathcal{C}^{+}(0, 1) \]

Ge et al., 2019, Nature Communications

PRS-CS intuition

\[ \mathbb{E}\left(\beta \mid \hat{\beta}^{marg}, \mathbf{\hat{R}} \right) = \left(\mathbf{\hat{R}} + \mathbf{T}^{-1} \right)^{-1} \hat{\beta}^{marg} \]

No LD case (simplification)

\[ \mathbb{E}\left(\beta_i \mid \hat{\beta}_i^{marg} \right) = \left(1 - \omega_i \right) \hat{\beta}^{marg}_i \\ \omega_i = \frac{1}{(1 + \lambda_i \tau)} \]
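What does this prior imply for the weights \(\omega_i\)? A sketch that samples from the hierarchy above; a = 1 and b = 1/2 are assumed values (the defaults described in Ge et al.):

set.seed(1)
a = 1; b = 0.5                             # assumed values
n_snp = 1e4
delta = rgamma(n_snp, shape = b, rate = 1)
lambda = rgamma(n_snp, shape = a, rate = delta)
tau = abs(rcauchy(1))^2                    # sqrt(tau) ~ half-Cauchy(0, 1)
omega = 1 / (1 + lambda * tau)
hist(omega, breaks = 50)                   # weights spread from 0 (no shrinkage) to 1 (full shrinkage)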

\(\omega_i\) prior

SBayesR: Bayesian linear regression with a mixture distribution prior

\[ \beta_j \mid \pi, \sigma_{\beta}^2 \sim \begin{cases} 0 & \text{with probability } \pi_1, \\ N(0, \gamma_2 \sigma_{\beta}^2) & \text{with probability } \pi_2, \\ \vdots & \vdots \\ N(0, \gamma_C \sigma_{\beta}^2) & \text{with probability } 1 - \sum_{c=1}^{C-1} \pi_c \end{cases} \]

Lloyd-Jones*, Zeng*, et al., Nature Communications, 2019

SBayesRC: SBayesR, but prior is informed by functional annotations

Figure 1, Zheng et al., 2024, Nature Genetics

Contribution of annotations to mixture probabilities

Non-parametric Bayesian PRS with Dirichlet process regression

\[ \sigma^2_\beta \sim \mathbf{G} \\ \mathbf{G} \sim DP(\mathbf{H}, \lambda) \]

  • \(\mathbf{H}\) is a “base distribution”
  • \(\lambda\) is a concentration parameter

Zeng et al., Nature Communications, 2017

“Stick-breaking” interpretation

\[ \beta_{i} \sim \sum_{k=1}^{\infty} \pi_{k} N\left(0, \sigma_{k}^{2}\right) \\ \pi_{k} = \nu_{k} \prod_{l=1}^{k-1} (1 - \nu_{l}), \quad \nu_{k} \sim \mathrm{Beta}\left(1, \lambda\right) \]
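The weights are easy to simulate; a sketch truncating the infinite sum at K components:

set.seed(1)
lambda = 1                               # concentration parameter
K = 20                                   # truncation for illustration
nu = rbeta(K, 1, lambda)
pi_k = nu * cumprod(c(1, 1 - nu[-K]))    # pi_k = nu_k * prod_{l<k} (1 - nu_l)
round(pi_k, 3)                           # weights decay with k
sum(pi_k)                                # close to 1; the remainder is the unbroken stick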

Review

  1. PRS methods generally select and/or shrink coefficients
  2. It helps to use LD to convert from marginal to joint estimates
  3. Several different Bayesian priors have been shown to be useful in different ways
  4. Bayesian methods offer a framework to include external functional annotations

PRS limitations: do PRS transfer across ancestries?

Figure 1., Martin et al., Nature Genetics, 2019

PRS performance decays in test sets that differ in ancestry from the GWAS cohort

Figure 3., Martin et al., Nature Genetics, 2019

Portability is inversely proportional to PCA distance

Figure 1, Ding et al., Nature, 2023

Effect apparent even within discrete ancestry clusters

Figure 3, Ding et al., Nature, 2023

Why does this happen?

  1. Causal variants vary in allele frequency across ancestries
  2. Causal variants vary across ancestries
  3. LD differences across ancestries

Allele frequency and LD variation

Martin et al., Figure 3, Nature Genetics, 2019

Multi-ancestry PRS extensions

Ruan et al., Figure 1, Nature Genetics, 2022

Linkage disequilibrium score regression

What mechanisms shape the distribution of GWAS test statistics? Why are test statistics frequently inflated?

  1. The trait may have many causal loci (a high degree of polygenicity)
  2. Inflation or deflation may occur as a consequence of population stratification, violations of homoscedasticity, or poor asymptotic approximations to the tail probabilities used to compute p-values

LD Scores and \(\chi^2\)-Statistics

  • Effect size estimate: \(\hat{\beta}_j^{marg} = \color{darkgreen}{X}_j^\top y/N\)
  • \(\chi_j^2 = N\left(\hat{\beta}^{marg}_j\right)^2\)
  • Define the LD Score of variant \(j\) as: \(\ell_j := \sum_{k=1}^M r_{jk}^2\)

Proposition: Under the infinitesimal model \(\beta_k \stackrel{\text{iid}}{\sim} \mathcal{N}(0, h_g^2/M)\), the expected \(\chi^2\)-statistic is: \[ \mathbb{E}[\chi_j^2] \approx \frac{N h_g^2}{M}\ell_j + 1 \]

Bulik-Sullivan et al., Nature Genetics, 2015

Proof step 1: law of total variance

Decompose \(\mathrm{Var}[\hat{\beta}_j]\): \[ \mathrm{Var}[\hat{\beta}_j] = \underbrace{\mathbb{E}[\mathrm{Var}[\hat{\beta}_j \mid \color{darkgreen}{X}]]}_{\text{Main term}} + \underbrace{\mathrm{Var}[\mathbb{E}[\hat{\beta}_j \mid \color{darkgreen}{X}]]}_{\text{Vanishes}} \] The second term vanishes because \(\mathbb{E}[\hat{\beta}_j \mid \color{darkgreen}{X}] = 0\).

Step 2: conditional variance

Given \(\color{darkgreen}{X}\), the variance becomes: \[ \mathrm{Var}[\hat{\beta}_j \mid \color{darkgreen}{X}] = \frac{1}{N^2} \color{darkgreen}{X}_j^\top \mathrm{Var}[\color{darkblue}{y} \mid \color{darkgreen}{X}] \color{darkgreen}{X}_j \] Substitute \(\color{darkblue}{y} = \color{darkgreen}{X}\beta + \epsilon\), and note \(\mathrm{Var}(\beta) = I\frac{h_g^2}{M}\): \[ = \frac{1}{N^2} \left(\frac{h_g^2}{M} \color{darkgreen}{X}_j^\top \color{darkgreen}{X} \color{darkgreen}{X}^\top \color{darkgreen}{X}_j + N(1-h_g^2)\right) \]

Step 3: Sample LD scores

Rewrite the key quadratic form: \[ \frac{1}{N^2} \color{darkgreen}{X}_j^\top \color{darkgreen}{X} \color{darkgreen}{X}^\top \color{darkgreen}{X}_j = \sum_{k=1}^M \tilde{r}_{jk}^2 \] where \(\tilde{r}_{jk} := \frac{1}{N}\sum_{i=1}^N \color{darkgreen}{X}_{ij} \color{darkgreen}{X}_{ik}\) (sample LD).

Step 4: Expectation

Using the \(\delta\)-method approximation: \[ \mathbb{E}[\tilde{r}_{jk}^2] \approx r_{jk}^2 + \frac{1 - r_{jk}^2}{N} \] Summing over \(k\) gives: \[ \mathbb{E}\left[\sum_{k=1}^M \tilde{r}_{jk}^2\right] \approx \ell_j + \frac{M - \ell_j}{N} \]

Step 5: Final steps

Combine all terms: \[ \mathbb{E}[\chi_j^2] \approx \frac{N h_g^2}{M} \left(\ell_j + \frac{M - \ell_j}{N}\right) + (1 - h_g^2) \] Simplify for large \(N\): \[ \mathbb{E}[\chi_j^2] \approx \frac{N h_g^2}{M}\ell_j + 1 \]
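As a check (sketch), simulate a trait matching the proof's assumptions (every SNP causal, \(\beta_j \sim \mathcal{N}(0, h_g^2/P)\)), reusing X from the earlier simulation; with only P = 100 SNPs the regression is noisy:

set.seed(2)
beta_inf = rnorm(P, 0, sqrt(h2 / P))          # infinitesimal model: all SNPs causal
Y_inf = X %*% beta_inf + rnorm(N, 0, sqrt(1 - h2))
beta_marg = crossprod(X, Y_inf) / N
chi2 = N * as.vector(beta_marg)^2
R_hat = crossprod(X) / N
ld_scores = colSums(R_hat^2)                  # ell_j = sum_k r_jk^2
coef(lm(chi2 ~ ld_scores))                    # slope near N * h2 / P, intercept near 1
N * h2 / P                                    # theoretical slope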

Extension to SNP annotations

\[ \mathbb{E}[\chi_{j}^{2}] = N \sum_{C} \tau_{C} \ell(j, C) + N a + 1 \]

Finucane et al., Nature Genetics, 2015

Contribution of annotations to SNP heritability across traits