The PGS Catalog Calculator
The Polygenic Score (PGS) Catalog Calculator reproducibly calculates PGS, conducts genetic similarity analysis, and adjusts calculated scores in the context of genetic ancestry.
​
To install the calculator, go to: https://github.com/PGScatalog/pgsc_calc
​
To read the documentation, go to: https://pgsc-calc.readthedocs.io/
INTERVENE supports the development of the PGS Catalog Calculator
An important issue concerning the use of PGS is that many have ancestry biases and transferability problems. PGS are typically developed using cohorts of predominantly European ancestry, which reduces predictive performance[1] when they are applied to individuals of non-European ancestry. Recent updates to the PGS Catalog[2] have helped to raise awareness of ancestry biases and transferability problems, and the diversity of cohorts used to develop and evaluate new PGS is steadily improving.
​
The mean and variance of a PGS can differ across genetic ancestry groups; this difference doesn’t necessarily reflect true differences in risk (i.e., differences in prevalence or biomarker values). Instead, it is driven by differing linkage disequilibrium patterns and allele frequencies across groups. Therefore, incorporating genetic ancestry information when calculating PGS is necessary to mitigate an important statistical artefact and to represent different PGS on a shared interpretable scale.
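To make the idea of a shared scale concrete, here is a minimal sketch that standardises raw scores against the reference distribution of the matching ancestry group. It is a simplified illustration only and is not presented as the adjustment method the calculator implements; the function name, group labels and all numbers are hypothetical.

```python
from statistics import mean, pstdev


def standardise_by_group(reference, targets):
    """Z-score each target score against its group's reference distribution.

    reference: dict mapping a group label to a list of raw reference scores
    targets:   list of (sample_id, group_label, raw_score) tuples
    All names and the data layout are assumptions made for this sketch.
    """
    # Per-group mean and (population) standard deviation of the reference scores
    stats = {group: (mean(scores), pstdev(scores))
             for group, scores in reference.items()}

    adjusted = {}
    for sample_id, group, raw in targets:
        mu, sigma = stats[group]
        adjusted[sample_id] = (raw - mu) / sigma
    return adjusted


# Toy, made-up numbers: group_B's raw scores are shifted upwards by 1.0
reference = {
    "group_A": [0.8, 1.0, 1.2],
    "group_B": [1.8, 2.0, 2.2],
}
targets = [("s1", "group_A", 1.2), ("s2", "group_B", 2.2)]

adjusted = standardise_by_group(reference, targets)
for sample_id, z in adjusted.items():
    print(sample_id, round(z, 3))
```

In this toy data the two samples have raw scores that differ by 1.0, yet each sits at the same position within its own group's reference distribution, so their standardised scores coincide. The raw difference here reflects only the shift between groups, which is the kind of artefact the paragraph above describes.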
​
INTERVENE supports the development of the PGS Catalog Calculator, and INTERVENE biobank analysts have deployed the workflow in more than 10 diverse biobank studies and trusted research environments across the globe, demonstrating its portability, global reach and value for population-scale genetic research.
​
Visual summary of the PGS Catalog Calculator (pgsc_calc) pipeline, displaying the inputs, outputs, and data flows between modules
The calculator has several features that facilitate its use across biobank studies:
- Users can bring “code to the data” without having to move sensitive data.
- It supports offline environments and scales to population-scale biobank studies.
- The calculated scores are provided in a simple format so that domain experts without significant genomics expertise (e.g. artificial intelligence data scientists) can integrate the results into other steps of genetic prediction studies.
- Extensive documentation is available, and users are supported by an active issue tracker and discussion forum for interactive communication.
The pipeline is developed and maintained by the PGS Catalog team under a permissive software license (Apache v2.0). The code is open source and available on GitHub, which also hosts the documentation.
​
INTERVENE analysts have analysed data from, for example, the All of Us research programme (one million or more people living in the United States); the HUNT Study (a longitudinal population health study in Norway); Genes & Health (a community-based genetics study focusing on people of Pakistani and Bangladeshi heritage in East London); and the China Kadoorie Biobank (a large prospective cohort study conducted by the UK and China).
​
To read about the tool and the latest developments of the PGS Catalog, see our recent article:
1. Martin A. et al., 2019. PMID: 30926966