Spencer Boris
Undergraduate Student, Brigham Young University
3 active projects
Duplicate of Germline Mutations that Increase Cancer Risk
Scientific Questions Being Studied
Previous studies have reported numerous, heritable gene variants that can increase risk of developing cancer. We look to increase understanding of these gene variants and their connection to cancer risk in a more diverse population. We are also interested in exploring how these variants connect to other reported health problems in individuals who later develop cancer. Specifically, we intend to ask the following questions:
1. Are harmful germline gene variants a good predictor of whether an individual will develop cancer during their lifetime?
2. Are there other commonly reported health problems that can be linked to greater risk for cancer in people with these gene variants?
3. Do these findings hold across a diverse population?
Project Purpose(s)
- Disease Focused Research (cancer)
- Ancestry
Scientific Approaches
We will create workflows that will align, intersect, extract, integrate, and analyze known predisposition cancer mutations in the All of Us cohort.
Align: We will use the UCSC genome browser tool to ensure that predisposition variants match the human reference build of All of Us.
Intersect: We will use “bedtools intersect” and “BigQuery” to identify predisposition variants in whole genome sequencing mutation files (VCFs).
Extract: We will store all suspected cancer predisposition variants as a first “data freeze”. This dataset will be our “training-set”. We will use subsequent All of Us data releases as “test-sets” for any novel associations or statistical models we identify.
Integrate: Using genomics data and insurance billing codes, we will visualize the relationships between predisposition variants, cancer occurrences, and other reported health problems.
Analyze: We will build custom scripts in Python and R to identify associations found when combining genomics and phenotypic data.
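The Intersect step can be illustrated with a minimal sketch. The project itself names “bedtools intersect” and “BigQuery” for this step; the pure-Python version below is only a stand-in showing the underlying idea of matching a list of known predisposition variants against VCF records. The names `Variant`, `parse_vcf_lines`, and `intersect_variants` are hypothetical, and the coordinates are made-up toy values, not real predisposition variants.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based position, as in a VCF file
    ref: str
    alt: str


def parse_vcf_lines(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield one Variant per ALT allele from VCF-formatted text lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alts = fields[:5]
        for alt in alts.split(","):  # multi-allelic sites list ALTs comma-separated
            yield Variant(chrom, int(pos), ref, alt)


def intersect_variants(known: Iterable[Variant],
                       observed: Iterable[Variant]) -> List[Variant]:
    """Return observed variants that exactly match a known variant."""
    known_set = set(known)
    return sorted(v for v in set(observed) if v in known_set)


# Toy inputs (invented coordinates, for illustration only).
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t1000\t.\tA\tG",
    "chr2\t500\t.\tC\tT,G",
    "chr3\t42\t.\tG\tA",
]
known_variants = [
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 500, "C", "G"),
    Variant("chr9", 7, "T", "C"),
]

hits = intersect_variants(known_variants, parse_vcf_lines(toy_vcf))
for hit in hits:
    print(hit)
```

Exact matching on chromosome, position, reference allele, and alternate allele only works if both inputs use the same reference build and the same variant representation, which is what the Align step is meant to ensure.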
Anticipated Findings
We expect to see that the presence of pathogenic gene variants can help predict a person’s risk for developing cancer. We anticipate that this finding will hold across a diverse population. We also expect to find other frequently reported health problems that associate with increased occurrence of cancer.
Demographic Categories of Interest
This study will not center on underrepresented populations.
Data Set Used
Controlled Tier
Duplicate of How to Work with All of Us Genomic Data (Hail - Plink)(v7)
Scientific Questions Being Studied
Not applicable - these notebooks demonstrate example analyses showing how to use Hail and PLINK to perform genome-wide association studies using the All of Us genomic and phenotypic data.
Project Purpose(s)
- Other Purpose (Demonstrate to All of Us Researcher Workbench users how to get started with the All of Us genomic data and tools. It includes an overview of all the All of Us genomic data and shows some simple examples of how to use these data.)
Scientific Approaches
Not applicable - these notebooks demonstrate example analyses showing how to use Hail and PLINK to perform genome-wide association studies using the All of Us genomic and phenotypic data.
Anticipated Findings
Not applicable - these notebooks demonstrate example analyses showing how to use Hail and PLINK to perform genome-wide association studies using the All of Us genomic and phenotypic data.
Demographic Categories of Interest
This study will not center on underrepresented populations.
Data Set Used
Controlled Tier
Phe-WAS | ERAP2, HLA-DQB1, PPIL3
Scientific Questions Being Studied
I will perform a phenome-wide association study that focuses on three genes: ERAP2, HLA-DQB1, and PPIL3. I will look at variants of these genes in participants within the All of Us dataset and compare them with the participants' diagnosed diseases to determine whether the two are connected.
Project Purpose(s)
- Ancestry
Scientific Approaches
I plan to start by gathering the data of all participants within the All of Us dataset who have genetic data and locate those three genes within each of their genomes. From there I will track variants within those genes and perform an analysis of their association with various diseases.
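The association step described above can be sketched as follows. The project description does not name a statistical test, so this example assumes a two-sided Fisher's exact test on a carrier-by-diagnosis 2x2 table for each phenotype; a full phenome-wide analysis would more commonly use regression models that adjust for covariates. The function names, participant IDs, and phenotype labels are all hypothetical toy values.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def phewas(carriers: Dict[str, bool],
           diagnoses: Dict[str, Set[str]]) -> List[Tuple[str, Tuple[int, int, int, int], float]]:
    """Test one variant against every phenotype seen in `diagnoses`."""
    phenotypes = sorted(set().union(*diagnoses.values()))
    results = []
    for phenotype in phenotypes:
        a = b = c = d = 0
        for pid, is_carrier in carriers.items():
            diagnosed = phenotype in diagnoses.get(pid, set())
            if is_carrier and diagnosed:
                a += 1
            elif is_carrier:
                b += 1
            elif diagnosed:
                c += 1
            else:
                d += 1
        results.append((phenotype, (a, b, c, d), fisher_exact_two_sided(a, b, c, d)))
    return results


# Toy cohort (invented): p1-p4 carry the variant, p5-p8 do not.
toy_carriers = {f"p{i}": i <= 4 for i in range(1, 9)}
toy_diagnoses = {
    "p1": {"phenotype_X"}, "p2": {"phenotype_X"}, "p3": {"phenotype_X"},
    "p5": {"phenotype_X", "phenotype_Y"}, "p6": {"phenotype_Y"},
}

results = phewas(toy_carriers, toy_diagnoses)
# Bonferroni correction: divide the significance level by the number of tests.
threshold = 0.05 / len(results)
for phenotype, table, p_value in results:
    print(phenotype, table, round(p_value, 4), p_value < threshold)
```

Because every phenotype is tested separately, some form of multiple-testing correction is needed; Bonferroni is used here only because it is the simplest to show.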
Anticipated Findings
I anticipate finding connections between certain variants and diseases based on the nature of these three genes. This would contribute to the body of scientific knowledge by providing evidence of a correlation between genetic variants and specific diseases.
Demographic Categories of Interest
This study will not center on underrepresented populations.
Data Set Used
Controlled Tier