Katerina Santiago
Graduate Trainee, Yale University
4 active projects
Updated Colorectal Cancer PRS Analysis
Scientific Questions Being Studied
Colorectal cancer (CRC) is the third most common cancer in the United States and the fourth most common cause of cancer death globally. In the United States, CRC incidence is higher among non-Hispanic Black individuals than in any other racial or ethnic group. Genetic information gained from genome-wide association studies (GWAS) has been used to develop polygenic risk scores (PRS). However, current PRS have far greater predictive value in individuals of European descent than in individuals of other ancestries. There is a critical need to determine the genetic factors that confer CRC risk, yet it is unclear whether these factors differ across ancestries. The overall objective of this project is to determine whether genetic risk for CRC in different genetic ancestral groups can be predicted by the same PRS. We hypothesize that a PRS generated from a white European cohort will show decreased predictive power for CRC risk in other ancestral groups.
Project Purpose(s)
- Disease Focused Research (Colorectal Cancer)
- Ancestry
Scientific Approaches
We will construct the CRC PRS from published, publicly available GWAS summary statistics, such as those now available in the GWAS Catalog. We will then validate and test the predictive ability of the PRS in a multi-ethnic sample of All of Us study participants, as well as in samples from the All of Us cohort that are homogeneous in genetic ancestry. The predictive power of the PRS will then be compared across subpopulations to determine the generalizability of a CRC PRS derived from a predominantly European cohort.
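As a rough illustration of the approach described above, the sketch below scores each participant as a weighted sum of effect-allele dosages and then compares discrimination (AUC) across ancestry groups. It is not the project's actual pipeline: the variant IDs, effect sizes, group labels, and participants are all hypothetical, and the helper names (`polygenic_score`, `auc`, `auc_by_group`) are introduced here only for illustration.

```python
from collections import defaultdict


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1, or 2) over the variants in `weights`."""
    return sum(beta * dosages.get(variant, 0.0) for variant, beta in weights.items())


def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count as 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def auc_by_group(records, weights):
    """records: iterable of (ancestry_group, dosages_dict, case_status) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for group, dosages, status in records:
        scores, labels = grouped[group]
        scores.append(polygenic_score(dosages, weights))
        labels.append(status)
    return {group: auc(s, y) for group, (s, y) in grouped.items()}


# Hypothetical effect sizes (as if taken from GWAS summary statistics).
weights = {"var_a": 0.20, "var_b": 0.10}

# Hypothetical participants: (ancestry group, dosages, 1 = case / 0 = control).
records = [
    ("group_1", {"var_a": 2, "var_b": 1}, 1),
    ("group_1", {"var_a": 0, "var_b": 1}, 0),
    ("group_1", {"var_a": 1, "var_b": 2}, 1),
    ("group_1", {"var_a": 0, "var_b": 0}, 0),
    ("group_2", {"var_a": 1, "var_b": 0}, 1),
    ("group_2", {"var_a": 1, "var_b": 2}, 0),
    ("group_2", {"var_a": 0, "var_b": 1}, 1),
    ("group_2", {"var_a": 0, "var_b": 0}, 0),
]

print(auc_by_group(records, weights))
```

In this toy data the same weights separate cases from controls perfectly in `group_1` but no better than chance in `group_2`, which is the kind of gap in predictive power the comparison across subpopulations is meant to detect. A real analysis would also need to handle allele alignment, ancestry assignment, and covariate adjustment, none of which is shown here.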
Anticipated Findings
We anticipate describing the utility of PRSs as a clinical tool, given that the goal of this study is to assess the predictive ability of a PRS in multiple ethnic groups. PRSs have been increasingly used in clinical settings to quantify risk and inform patient recommendations, such as changes to lifestyle factors that also contribute to disease outcomes (CRC risk, in this study). However, the reliability of PRSs derived from large homogeneous samples when applied to various genetic ancestral groups is not well understood. Our findings have the potential to support the utility of PRSs in clinical settings and to inform the caution needed when using them.
Demographic Categories of Interest
- Race / Ethnicity
Data Set Used
Controlled Tier
Colorectal Cancer PRS Analysis
Scientific Questions Being Studied
Colorectal cancer (CRC) is the third most common cancer in the United States and the fourth most common cause of cancer death globally. In the United States, CRC incidence is higher among non-Hispanic Black individuals than in any other racial or ethnic group. Genetic information gained from genome-wide association studies (GWAS) has been used to develop polygenic risk scores (PRS). However, current PRS have far greater predictive value in individuals of European descent than in individuals of other ancestries. There is a critical need to determine the genetic factors that confer CRC risk, yet it is unclear whether these factors differ across ancestries. The overall objective of this project is to determine whether genetic risk for CRC in different genetic ancestral groups can be predicted by the same PRS. We hypothesize that a PRS generated from a white European cohort will show decreased predictive power for CRC risk in other ancestral groups.
Project Purpose(s)
- Disease Focused Research (Colorectal Cancer)
- Ancestry
Scientific Approaches
We will construct the CRC PRS from published, publicly available GWAS summary statistics, such as those now available in the GWAS Catalog. We will then validate and test the predictive ability of the PRS in a multi-ethnic sample of All of Us study participants, as well as in samples from the All of Us cohort that are homogeneous in genetic ancestry. The predictive power of the PRS will then be compared across subpopulations to determine the generalizability of a CRC PRS derived from a predominantly European cohort.
Anticipated Findings
We anticipate describing the utility of PRSs as a clinical tool, given that the goal of this study is to assess the predictive ability of a PRS in multiple ethnic groups. PRSs have been increasingly used in clinical settings to quantify risk and inform patient recommendations, such as changes to lifestyle factors that also contribute to disease outcomes (CRC risk, in this study). However, the reliability of PRSs derived from large homogeneous samples when applied to various genetic ancestral groups is not well understood. Our findings have the potential to support the utility of PRSs in clinical settings and to inform the caution needed when using them.
Demographic Categories of Interest
- Race / Ethnicity
Data Set Used
Controlled Tier
Research Team
Owner:
- Lauren Lautermilch - Graduate Trainee, Yale University
- Katerina Santiago - Graduate Trainee, Yale University
Duplicate of X and Y Chromosome Intensity
Scientific Questions Being Studied
Not applicable - these notebooks demonstrate, through example analyses, how to use Hail and PLINK to perform genome-wide association studies using the All of Us genomic and phenotypic data.
Project Purpose(s)
- Other Purpose (Demonstrate to All of Us Researcher Workbench users how to get started with the All of Us genomic data and tools. It includes an overview of all the All of Us genomic data and shows some simple examples of how to use these data.)
Scientific Approaches
Not applicable - these notebooks demonstrate, through example analyses, how to use Hail and PLINK to perform genome-wide association studies using the All of Us genomic and phenotypic data.
Anticipated Findings
Not applicable - these notebooks demonstrate, through example analyses, how to use Hail and PLINK to perform genome-wide association studies using the All of Us genomic and phenotypic data.
Demographic Categories of Interest
This study will not center on underrepresented populations.
Data Set Used
Controlled Tier
Duplicate of How to Get Started with Registered Tier Data (v6)
Scientific Questions Being Studied
We recommend that all researchers explore the notebooks in this workspace to learn the basics of All of Us Program Data.
What should you expect? This notebook will give you an overview of what data is available in the current Curated Data Repository (CDR). It will also teach you how to retrieve information about Electronic Health Record (EHR), Physical Measurements (PM), and Survey data.
Project Purpose(s)
- Educational
- Methods Development
- Other Purpose (This is an All of Us Tutorial Workspace. It is meant to provide instruction for key Researcher Workbench components and All of Us data representation.)
Scientific Approaches
This Tutorial Workspace contains two Jupyter Notebooks (one written in Python, the other in R). Each notebook is divided into the following sections:
1. Setup: How to set up this notebook, install and import software packages, and select the correct version of the CDR.
2. Data Availability Part 1: How to summarize the number of unique participants with major data types: Physical Measurements, Survey, and EHR.
3. Data Availability Part 2: How to delve a little deeper into data availability within each major data type.
4. Data Organization: An explanation of how data is organized according to our common data model.
5. Example Queries: How to directly query the CDR, using two examples of SQL queries to extract demographic data.
6. Expert Tip: How to access the base version of the CDR, for users who want to do their own cleaning.
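To give a feel for the kind of demographic query mentioned in the Example Queries section, here is a minimal sketch. It does not connect to the CDR: an in-memory SQLite database stands in for it, the `person` and `concept` tables follow OMOP-style naming as an assumption about the common data model, and the rows and concept IDs are made up for illustration.

```python
import sqlite3

# Toy stand-in for the CDR: two OMOP-style tables with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concept (
        concept_id   INTEGER PRIMARY KEY,
        concept_name TEXT
    );
    CREATE TABLE person (
        person_id         INTEGER PRIMARY KEY,
        gender_concept_id INTEGER,
        year_of_birth     INTEGER
    );
    INSERT INTO concept VALUES (8507, 'MALE'), (8532, 'FEMALE');
    INSERT INTO person VALUES (1, 8532, 1960), (2, 8507, 1975), (3, 8532, 1988);
""")

# Resolve each participant's concept_id to a readable name, then count participants.
DEMOGRAPHICS_SQL = """
    SELECT c.concept_name AS gender,
           COUNT(DISTINCT p.person_id) AS participants
    FROM person AS p
    JOIN concept AS c ON c.concept_id = p.gender_concept_id
    GROUP BY c.concept_name
    ORDER BY c.concept_name
"""


def gender_counts(connection):
    """Return {concept_name: number of unique participants}."""
    return dict(connection.execute(DEMOGRAPHICS_SQL).fetchall())


counts = gender_counts(conn)
print(counts)
```

The point of the join is that the `person` table stores only numeric identifiers; the human-readable label comes from looking each identifier up in the `concept` table. The tutorial notebooks themselves show how to run queries against the actual CDR.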
Anticipated Findings
By reading and running the notebooks in this Tutorial Workspace, you will understand the following:
- All of Us data are made available in a Curated Data Repository.
- Participants may contribute any combination of survey, physical measurement, and electronic health record data.
- Not all participants contribute all possible data types.
- Each unique piece of health information is given a unique identifier called a concept_id and organized into specific tables according to our common data model.
- You can use these concept_ids to query the CDR and pull data on specific health information relevant to your analysis.
See our support article Learning the Basics of the All of Us Dataset for more info.
Demographic Categories of Interest
This study will not center on underrepresented populations.
Data Set Used
Registered Tier