This can be approximated with summary statistics: \[
\color{darkblue}{y}^t\color{darkblue}{y} + \beta^{t}\hat{R}\beta - 2\beta^{t}N\hat{\beta}^{marg} + \lambda||\beta||_1
\]
Mak et al., Genetic Epidemiology, 2017
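One way to see how an objective of this form can be minimized without individual-level data is coordinate descent on the LD matrix and the marginal estimates. The sketch below is a minimal illustration under stated assumptions, not the lassosum implementation: it assumes standardized genotypes and phenotype, rescales the objective to \(\beta^{t}R\beta - 2\beta^{t}r + 2\lambda||\beta||_1\) (with \(r\) the marginal estimates), and uses hypothetical function names (`soft_threshold`, `lasso_sumstats`) and toy values.

```r
# Soft-thresholding operator: shrink z toward zero by lam; values within lam of 0 become 0.
soft_threshold <- function(z, lam) sign(z) * pmax(abs(z) - lam, 0)

# Coordinate descent for: beta' R beta - 2 beta' r + 2 * lam * ||beta||_1
# R: LD (correlation) matrix, r: marginal estimates (both assumed standardized).
lasso_sumstats <- function(R, r, lam, n_iter = 500, tol = 1e-10) {
  r <- as.vector(r)
  beta <- numeric(length(r))
  for (iter in seq_len(n_iter)) {
    beta_old <- beta
    for (j in seq_along(r)) {
      # Marginal signal at SNP j after removing what the other SNPs explain via LD
      partial <- r[j] - sum(R[j, -j] * beta[-j])
      beta[j] <- soft_threshold(partial, lam) / R[j, j]
    }
    if (max(abs(beta - beta_old)) < tol) break
  }
  beta
}

# Toy example (assumed values): 5 SNPs with AR(1)-style LD and 2 causal effects
P <- 5
R_ld <- 0.5^abs(outer(1:P, 1:P, "-"))
beta_true <- c(0.3, 0, 0, 0, -0.2)
beta_marg <- as.vector(R_ld %*% beta_true)  # noise-free marginal effects

beta_unpenalized <- lasso_sumstats(R_ld, beta_marg, lam = 0)    # recovers the joint effects
beta_lasso <- lasso_sumstats(R_ld, beta_marg, lam = 0.05)       # shrunken, sparser estimates
```

With `lam = 0` the update reduces to solving \(R\beta = r\), so the joint effects are recovered; a positive `lam` shrinks the estimates toward zero.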
Simulation with glmnet as a proxy
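library(glmnet)
# Fit the lasso (alpha = 1) on individual-level genotypes X and phenotype Y
lasso = glmnet(X, Y, alpha = 1.0)
# Extract coefficients at lambda = 0.01, dropping the intercept
lasso_estimates = as.vector(coef(lasso, s = 0.01))[2:(P + 1)]
# Agreement between the true effects and the lasso estimates
summary(lm(beta ~ lasso_estimates))[["r.squared"]]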
PRS methods generally select and/or shrink coefficients
It helps to use LD to convert from marginal to joint estimates
Several different Bayesian priors have been shown to be useful in different ways
Bayesian methods offer a framework to include external functional annotations
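The marginal-to-joint conversion mentioned above can be sketched in a few lines. This is a minimal simulation under assumptions that are not from the slides: genotypes are drawn as standardized normal variates with AR(1)-style LD, and all sizes and effect values are hypothetical. With in-sample LD, solving \(\hat{R}\beta = \hat{\beta}^{marg}\) reproduces the joint least-squares estimates.

```r
set.seed(1)
N <- 2000  # samples (assumed)
P <- 10    # SNPs (assumed)

# Correlated, standardized "genotypes" with AR(1)-style LD (illustrative only)
Sigma <- 0.7^abs(outer(1:P, 1:P, "-"))
X <- scale(matrix(rnorm(N * P), N, P) %*% chol(Sigma))

# Two causal SNPs, the rest null
beta <- c(0.2, rep(0, 4), -0.15, rep(0, 4))
Y <- as.vector(X %*% beta + rnorm(N))

# Marginal (one SNP at a time) estimates and the in-sample LD matrix
beta_marg <- as.vector(crossprod(X, Y)) / (N - 1)
R_hat <- crossprod(X) / (N - 1)  # equals cor(X) because X is standardized

# Joint estimates from summary statistics alone
beta_joint <- solve(R_hat, beta_marg)

# Reference: joint estimates from individual-level data
beta_ols <- unname(coef(lm(Y ~ X))[-1])
max(abs(beta_joint - beta_ols))
```

In practice \(\hat{R}\) usually comes from an external reference panel rather than the GWAS sample, so it is noisy and often regularized before inversion; that step is omitted here.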
PRS limitations: do PRS transfer across ancestries?
Figure 1., Martin et al., Nature Genetics, 2019
PRS performance decays in test sets that differ in ancestry from the GWAS cohort
Figure 3., Martin et al., Nature Genetics, 2019
Portability decreases with genetic (PCA) distance from the GWAS cohort
Figure 1, Ding et al., Nature, 2023
Effect apparent even within discrete ancestry clusters
Figure 3, Ding et al., Nature, 2023
Why does this happen?
Causal variants vary in allele frequency across ancestries
Causal variants vary across ancestries
LD differences across ancestries
Allele frequency and LD variation
Martin et al., Figure 3, Nature Genetics, 2019
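A small numerical illustration of the two mechanisms named above. It assumes an additive model under Hardy-Weinberg equilibrium and standardized effects at a tag SNP; the function names and every number are hypothetical and chosen only to show the direction of the effect.

```r
# Variance explained by an additive variant with allele frequency p and
# per-allele effect beta (assumes Hardy-Weinberg equilibrium).
var_explained <- function(p, beta) 2 * p * (1 - p) * beta^2

# Marginal standardized effect seen at a tag SNP that has correlation r
# with the causal SNP (assumes a single causal variant in the region).
tag_effect <- function(r, beta_causal) r * beta_causal

# Same causal effect, different allele frequency in two populations
ve_pop_a <- var_explained(p = 0.30, beta = 0.1)
ve_pop_b <- var_explained(p = 0.05, beta = 0.1)

# Same causal effect, different LD between the tag and causal SNP
tag_pop_a <- tag_effect(r = 0.9, beta_causal = 0.1)
tag_pop_b <- tag_effect(r = 0.4, beta_causal = 0.1)

c(ve_pop_a = ve_pop_a, ve_pop_b = ve_pop_b, tag_pop_a = tag_pop_a, tag_pop_b = tag_pop_b)
```

Even with an identical causal effect, the variant explains less variance where it is rarer, and a tag SNP weight learned in one population overstates the signal where LD with the causal variant is weaker.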
Multi-ancestry PRS extensions
Ruan et al., Figure 1, Nature Genetics, 2022
Linkage disequilibrium score regression
How can we characterize the mechanisms that manifest in a particular distribution of GWAS test statistics? Why are test statistics frequently inflated?
The trait may have many causal loci (a high degree of polygenicity)
Inflation or deflation may also arise from population stratification, violation of homoscedasticity, or poor asymptotic approximations to the tail probabilities used to compute p-values
Using the \(\delta\)-method approximation: \[
\mathbb{E}[\tilde{r}_{jk}^2] \approx r_{jk}^2 + \frac{1 - r_{jk}^2}{N}
\] Summing over \(k\) gives: \[
\mathbb{E}\left[\sum_{k=1}^M \tilde{r}_{jk}^2\right] \approx \ell_j + \frac{M - \ell_j}{N}
\]
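The upward bias in the summed sample \(\tilde{r}_{jk}^2\) can be checked by simulation. The sketch below assumes \(M\) unlinked SNPs, so the true LD score is 1 for every SNP; the sample sizes and allele frequency are hypothetical. It also applies the adjustment \(\tilde{r}^2 - (1 - \tilde{r}^2)/(N - 2)\), a commonly used correction for this bias.

```r
set.seed(1)
N <- 500  # samples (assumed)
M <- 200  # unlinked SNPs (assumed), so the true LD score is 1 for every SNP

# Independent genotypes coded 0/1/2 with allele frequency 0.3
G <- matrix(rbinom(N * M, size = 2, prob = 0.3), N, M)

# Naive LD scores: sum of squared sample correlations (includes r_jj^2 = 1)
r2 <- cor(G)^2
ld_naive <- rowSums(r2)

# Bias-adjusted squared correlations and the resulting LD scores
r2_adj <- r2 - (1 - r2) / (N - 2)
ld_adj <- rowSums(r2_adj)

# Approximation from the text with l_j = 1: l_j + (M - l_j) / N
expected_naive <- 1 + (M - 1) / N

c(naive = mean(ld_naive), expected = expected_naive, adjusted = mean(ld_adj))
```

The naive average should sit near `expected_naive` rather than near 1, while the adjusted average should be close to the true value of 1.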
Step 5: Final steps
Combine all terms: \[
\mathbb{E}[\chi_j^2] \approx \frac{N h_g^2}{M} \left(\ell_j + \frac{M - \ell_j}{N}\right) + (1 - h_g^2)
\] Expanding, the \(h_g^2\) terms cancel, leaving \(\frac{(N - 1) h_g^2}{M}\ell_j + 1\); for large \(N\) this simplifies to: \[
\mathbb{E}[\chi_j^2] \approx \frac{N h_g^2}{M}\ell_j + 1
\]
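The final expression suggests a simple estimator: regress \(\chi_j^2\) on \(\ell_j\) and rescale the slope by \(M/N\). The sketch below is a minimal, unweighted version with a hypothetical function name (`ldsc_fit`). The toy data assume each \(\chi_j^2\) is an independent scaled \(\chi^2_1\) draw with the expected value above, and the LD scores are drawn uniformly; both are simplifications for illustration.

```r
# Unweighted LD score regression: chi2_j ~ intercept + slope * l_j,
# with h2 = slope * M / N (from E[chi2_j] ~ N * h2 / M * l_j + 1).
ldsc_fit <- function(chi2, ld, N, M) {
  est <- unname(coef(lm(chi2 ~ ld)))
  list(intercept = est[1], h2 = est[2] * M / N)
}

set.seed(1)
N <- 50000      # GWAS sample size (assumed)
M <- 10000      # number of SNPs (assumed)
h2_true <- 0.4  # SNP heritability (assumed)

# Toy LD scores and test statistics (see assumptions in the lead-in)
ld <- runif(M, min = 1, max = 200)
expected_chi2 <- N * h2_true / M * ld + 1
chi2 <- expected_chi2 * rchisq(M, df = 1)

fit <- ldsc_fit(chi2, ld, N, M)
fit$h2         # estimate of h2_true
fit$intercept  # estimate of the intercept (1 when there is no confounding)
```

The published method additionally weights the regression and uses a block jackknife for standard errors; both are omitted here.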
Extension to SNP annotations
\[
\mathbb{E}[\chi_{j}^{2}] = N \sum_{C} \tau_{C} \ell(j, C) + N a + 1
\]
Finucane et al., Nature Genetics, 2015
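With annotation-specific LD scores, the same idea becomes a multiple regression. The sketch below assumes the slope on \(\ell(j, C)\) equals \(N\tau_C\) and the intercept equals \(Na + 1\); the function name `sldsc_fit`, the three annotation names, and all values are hypothetical. The toy input is noise-free, so the fit recovers \(\tau_C\) exactly.

```r
# Unweighted stratified LD score regression:
# chi2_j ~ intercept + sum_C slope_C * l(j, C), with tau_C = slope_C / N.
sldsc_fit <- function(chi2, ld_mat, N) {
  est <- unname(coef(lm(chi2 ~ ld_mat)))
  list(intercept = est[1], tau = est[-1] / N)
}

set.seed(1)
N <- 50000  # GWAS sample size (assumed)
M <- 10000  # number of SNPs (assumed)

# Toy annotation-specific LD scores (one column per annotation)
ld_mat <- cbind(
  coding   = runif(M, 0, 5),
  enhancer = runif(M, 0, 20),
  other    = runif(M, 1, 150)
)

# Per-SNP heritability contribution of each annotation, and confounding term a
tau_true <- c(4e-5, 2e-5, 5e-6)
a <- 0

# Noise-free expected test statistics under the model
expected_chi2 <- as.vector(N * ld_mat %*% tau_true + N * a + 1)

fit <- sldsc_fit(expected_chi2, ld_mat, N)
fit$tau        # matches tau_true
fit$intercept  # matches N * a + 1
```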
Contribution of annotations to SNP heritability across traits
Figure 3, Finucane et al.
Contribution of cell types to trait heritability
Figure 6, Finucane et al.
Cross-trait LD score regression
Can we use summary statistics to infer the genetic correlation between traits?\[
\mathbb{E}[z_{1j} z_{2j}] = \frac{\sqrt{N_1 N_2} \, \rho_g}{M} \, \ell_j + \frac{\rho N_s}{\sqrt{N_1 N_2}}
\]
Bulik-Sullivan & Finucane et al., Nature Genetics, 2015
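The cross-trait equation can be fit the same way: regress the product of z-scores on the LD score and rescale the slope by \(M / \sqrt{N_1 N_2}\) to obtain the genetic covariance \(\rho_g\). The sketch below is unweighted, the function names (`cross_trait_ldsc`, `genetic_correlation`) are hypothetical, and the toy input is the noise-free expected product under assumed parameter values.

```r
# Unweighted cross-trait LD score regression on zprod = z1 * z2.
# The slope estimates sqrt(N1 * N2) * rho_g / M; the intercept reflects
# sample overlap (rho * N_s / sqrt(N1 * N2)).
cross_trait_ldsc <- function(zprod, ld, N1, N2, M) {
  est <- unname(coef(lm(zprod ~ ld)))
  list(intercept = est[1], rho_g = est[2] * M / sqrt(N1 * N2))
}

# Genetic correlation from the genetic covariance and the two heritabilities
genetic_correlation <- function(rho_g, h2_1, h2_2) rho_g / sqrt(h2_1 * h2_2)

set.seed(1)
N1 <- 40000; N2 <- 60000  # GWAS sample sizes (assumed)
M <- 10000                # number of SNPs (assumed)
rho_g_true <- 0.2         # genetic covariance (assumed)
intercept_true <- 0.05    # overlap term (assumed)

# Toy LD scores and the noise-free expected product of z-scores
ld <- runif(M, min = 1, max = 200)
zprod <- sqrt(N1 * N2) * rho_g_true / M * ld + intercept_true

fit <- cross_trait_ldsc(zprod, ld, N1, N2, M)
r_g <- genetic_correlation(fit$rho_g, h2_1 = 0.4, h2_2 = 0.4)
```

With real data `zprod` would be `z1 * z2` computed per SNP from the two sets of summary statistics, and the heritabilities would come from single-trait fits.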
Cross-trait LD score
Figure 2, Bulik-Sullivan et al.
Effects of assortative mating on cross-trait LD score regression
Figure 1., Border et al., Science, 2022
Forward simulations are consistent with empirical observations
Figure 2., Border et al., Science, 2022
Summary
PRS are estimators of genetic risk/liability/propensity for a particular phenotype
Numerous PRS methods have emerged, all generally bespoke versions of Bayesian or penalized regression
LD score regression is a powerful tool for partitioning heritability within and across phenotypes