Shivam Sharma
Graduate Trainee, Georgia Institute of Technology
10 active projects
GeneticAncestryInference
Scientific Questions Being Studied
As a demonstration project, this project will describe, characterize, and validate the extent of diversity in the All of Us cohort with respect to participants' race & ethnicity (which are socially defined) and genetic ancestry (which can be objectively inferred from participants' genomes). Socially defined race & ethnicity and genetically inferred ancestry are both relevant to health outcomes. Race & ethnicity shape individuals’ lived experience and social environment, e.g., structural inequities, environmental injustice, and barriers to healthcare access. Genetic ancestry can affect health outcomes via differences in the frequencies of variants associated with disease and drug response. Specifically, we will ask:
1. What is the extent of racial, ethnic, and genetic diversity in the All of Us cohort?
2. How do genetic ancestry and admixture change over geography and with age in the US?
3. Are there associations between genetic ancestry and health outcomes in the All of Us cohort?
Project Purpose(s)
- Population Health
- Methods Development
- Ancestry
- Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)
Scientific Approaches
To characterize the diversity of the All of Us cohort, we analyzed participant genetic, demographic, and geographic data.
Here is a brief list of methods used:
1. All of Us participants' genome-wide genotype data were merged and harmonized with global reference population data.
2. Unsupervised clustering analysis techniques (Hopkins statistic, visual assessment of clustering tendency, K-means clustering, and UMAP) were used to assess the extent of genetic structure in the cohort.
3. Supervised genetic ancestry inference was performed using global reference populations, principal components analysis, and the Rye (Rapid ancestrY Estimation) program.
4. Genetic ancestry was compared to participants' self-identified race & ethnicity.
5. Geocoded data and participant age were used to measure how genetic ancestry and admixture vary with respect to participant geography and age.
6. Admixture regression was used to associate participants' health outcomes, drawn from electronic health records, with their genetic ancestry.
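The Hopkins statistic from step 2 can be sketched as a minimal brute-force NumPy function, assuming the input is a matrix of genetic principal components for a subsample of participants (the pairwise-distance approach does not scale to the full cohort; a KD-tree would be needed there). The function name and the `sample_frac` parameter are illustrative, and published implementations differ in convention: some raise distances to the power of the dimension, or report 1 - H.

```python
import numpy as np


def hopkins_statistic(X, sample_frac=0.1, rng=None):
    """Hopkins statistic H for an (n_samples, n_features) array.

    Convention used here: H = sum(u) / (sum(u) + sum(w)), where u are
    nearest-neighbour distances from uniform random probe points to the data
    and w are nearest-neighbour distances from sampled data points to the
    rest of the data. H near 0.5 suggests no cluster structure; H near 1
    suggests strong clustering.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    m = max(1, int(sample_frac * n))

    # Uniform probe points drawn inside the bounding box of the data.
    lo, hi = X.min(axis=0), X.max(axis=0)
    probes = rng.uniform(lo, hi, size=(m, d))

    # Distinct data points sampled without replacement.
    idx = rng.choice(n, size=m, replace=False)
    sampled = X[idx]

    # u: distance from each probe to its nearest data point.
    u = np.sqrt(((probes[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)).min(axis=1)

    # w: distance from each sampled point to its nearest *other* data point.
    dist = np.sqrt(((sampled[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    dist[np.arange(m), idx] = np.inf  # ignore each point's zero distance to itself
    w = dist.min(axis=1)

    return u.sum() / (u.sum() + w.sum())
```

Tightly clustered PCs should push H toward 1, while points spread evenly across the PC space should give values near 0.5.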
Anticipated Findings
1. The All of Us participant cohort will be racially, ethnically, and genetically diverse, consistent with the project’s aim to recruit groups underrepresented in biomedical research in support of health equity.
2. All of Us participant genetic variation will be highly structured and best modeled by clusters rather than a continuum of variation.
3. All of Us participants will show patterns of genetically inferred ancestry that are correlated with their socially defined race and ethnicity.
4. All of Us participants’ genetic ancestry and admixture will change over geography and with age.
5. All of Us participants’ genetic ancestry will be associated with a variety of health outcomes.
Demographic Categories of Interest
- Race / Ethnicity
Data Set Used
Controlled Tier
GeoDisparity
Scientific Questions Being Studied
Specifically, I want to study how chronic kidney disease (CKD) varies between racial and ethnic groups. This is especially important when considering health disparities and their impact on different communities. Being able to take multiple contributing factors into consideration is a key part of creating this workspace and carrying out this research. I also want to understand how CKD can be prevented or its progression slowed, as the overarching goal is to promote better living and access to health resources so that this problem can be solved or at least stalled.
Project Purpose(s)
- Population Health
- Ancestry
Scientific Approaches
First, I will create a workspace with a cohort of individuals who reported having chronic kidney disease or chronic kidney failure. I will then use these data to make graphs of the prevalence of these diseases across different ethnic groups. The prevalence data will also be used to chart the different risk factors for developing CKD and to identify which are statistically most strongly associated with CKD prevalence in communities. Finally, I will use these data to compare the different formulas for diagnosing CKD with the traditional method, to see which one diagnoses CKD best while also diagnosing CKD accurately in minority groups.
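One way to carry out the formula comparison is sketched below, assuming the formulas of interest are the CKD-EPI 2009 creatinine equation, which multiplies eGFR by 1.159 for patients recorded as Black, and the race-free CKD-EPI 2021 refit. The text does not say which formulas or which "traditional method" will be used, so these choices and the function names are assumptions. An eGFR below 60 mL/min/1.73 m² is the usual CKD threshold, although a diagnosis also requires it to persist for at least three months.

```python
def ckd_epi_2009(scr_mg_dl: float, age: float, female: bool, black: bool) -> float:
    """CKD-EPI 2009 creatinine equation (mL/min/1.73 m^2), with race coefficient."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = 141 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.209 * 0.993 ** age
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159  # race coefficient, dropped in the 2021 refit
    return egfr


def ckd_epi_2021(scr_mg_dl: float, age: float, female: bool) -> float:
    """CKD-EPI 2021 race-free creatinine equation (mL/min/1.73 m^2)."""
    kappa = 0.7 if female else 0.9
    alpha = -0.241 if female else -0.302
    ratio = scr_mg_dl / kappa
    egfr = 142 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.200 * 0.9938 ** age
    if female:
        egfr *= 1.012
    return egfr


def below_ckd_cutoff(egfr: float, cutoff: float = 60.0) -> bool:
    """True if a single eGFR value is under the CKD cutoff.

    A clinical diagnosis additionally needs the low value to persist for
    at least three months, which is not checked here.
    """
    return egfr < cutoff
```

For a 60-year-old Black man with serum creatinine of 1.4 mg/dL, the 2009 equation gives an eGFR just above 60 while the 2021 equation gives one just below it, which is the kind of discrepancy this comparison is meant to surface.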
Anticipated Findings
I anticipate that the findings will show that certain groups are more prone to developing CKD, and will also reveal which factors contribute to this. In particular, I want to examine participants' creatinine levels and determine whether there is a direct relationship between individuals' creatinine levels and their CKD diagnosis. These data could provide newer, more relevant evidence for diagnosing CKD in individuals, and could also shed light on why this disease is so prevalent in certain communities.
Demographic Categories of Interest
- Race / Ethnicity
Data Set Used
Controlled Tier
Apportionment of Human Genomic Diversity
Scientific Questions Being Studied
For this study, we are attempting to reconcile two distinct views of human genetic diversity: (1) variance-partitioning analysis, by Lewontin, and (2) classification analysis, by A. W. F. Edwards. It is now well appreciated that the dissonance between variance partitioning and classification rests on the difference between single-locus and multilocus approaches to genetic diversity (14). In our project, we extended a previously developed approach for multilocus variance partitioning to evaluate how the partitioning changes with the number of loci considered. We considered the apportionment of genetic variation first at two levels (within and between groups) and then at three levels (within populations, among populations within regional groups, and between regional groups).
Project Purpose(s)
- Population Health
- Ancestry
Scientific Approaches
Partitioning variation, as in multifactorial ANOVA, is particularly powerful for testing hypotheses in complex ecological systems. Because populations are treated as subgroups of super-populations, we extend MANOVA (multivariate ANOVA) with a nested ANOVA design on the All of Us dataset. In place of the traditional two-level ANOVA (within and between populations), we include three levels: among groups, among populations within groups, and within populations. Normalization approaches (e.g., min-max normalization) can be applied and evaluated before the downstream statistics.
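A simplified version of the three-level partition can be sketched as a nested sum-of-squares decomposition over a multilocus matrix (for example, genotype dosages), with optional min-max scaling first. This is not a full MANOVA or AMOVA: it reports fractions of the total sum of squares and does not convert them into variance components using degrees of freedom. The function names and input layout are hypothetical.

```python
import numpy as np


def min_max_normalize(X):
    """Scale each column (locus) to [0, 1]; constant columns become 0."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero
    return (X - lo) / span


def nested_ss_partition(X, pop, group):
    """Fractions of the total sum of squares at three nested levels.

    X: (n_individuals, n_loci) matrix
    pop, group: per-individual population and regional-group labels,
    with populations nested inside groups.
    """
    X = np.asarray(X, dtype=float)
    pop, group = np.asarray(pop), np.asarray(group)
    grand_mean = X.mean(axis=0)
    ss_total = ((X - grand_mean) ** 2).sum()
    ss_groups = ss_pops = ss_within = 0.0
    for g in np.unique(group):
        in_g = group == g
        mean_g = X[in_g].mean(axis=0)
        # Among groups: group means around the grand mean, weighted by size.
        ss_groups += in_g.sum() * ((mean_g - grand_mean) ** 2).sum()
        for p in np.unique(pop[in_g]):
            in_p = in_g & (pop == p)
            mean_p = X[in_p].mean(axis=0)
            # Among populations within groups: population means around the group mean.
            ss_pops += in_p.sum() * ((mean_p - mean_g) ** 2).sum()
            # Within populations: individuals around their population mean.
            ss_within += ((X[in_p] - mean_p) ** 2).sum()
    return {
        "among_groups": ss_groups / ss_total,
        "among_pops_within_groups": ss_pops / ss_total,
        "within_pops": ss_within / ss_total,
    }
```

The three fractions sum to 1, and recomputing them on random subsets of columns is one way to see how the partition behaves as the number of loci changes.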
Anticipated Findings
We anticipate that the apportionment of human genomic diversity between populations will be drastically larger than that within populations.
Demographic Categories of Interest
- Race / Ethnicity
Data Set Used
Registered Tier
Research Team
Owner:
- Shivam Sharma - Graduate Trainee, Georgia Institute of Technology
- Geetha Priyanka Yerradoddi - Graduate Trainee, Georgia Institute of Technology
Chronic Kidney Disease
Scientific Questions Being Studied
Specifically, I want to study how chronic kidney disease (CKD) varies between racial and ethnic groups. This is especially important when considering health disparities and their impact on different communities. Being able to take multiple contributing factors into consideration is a key part of creating this workspace and carrying out this research. I also want to understand how CKD can be prevented or its progression slowed, as the overarching goal is to promote better living and access to health resources so that this problem can be solved or at least stalled.
Project Purpose(s)
- Disease Focused Research (Chronic Kidney Disease)
- Population Health
- Social / Behavioral
- Drug Development
- Methods Development
- Ancestry
Scientific Approaches
First, I will create a workspace with a cohort of individuals who reported having chronic kidney disease or chronic kidney failure. I will then use these data to make graphs of the prevalence of these diseases across different ethnic groups. The prevalence data will also be used to chart the different risk factors for developing CKD and to identify which are statistically most strongly associated with CKD prevalence in communities. Finally, I will use these data to compare the different formulas for diagnosing CKD with the traditional method, to see which one diagnoses CKD best while also diagnosing CKD accurately in minority groups.
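The prevalence step can be sketched as a crude prevalence per group with a Wilson score 95% interval. This assumes each participant has already been reduced to a (group label, CKD yes/no) pair, for example from a condition-based cohort; the record format and function name are hypothetical, and the estimates are not adjusted for age or any other factor.

```python
import math
from collections import defaultdict


def prevalence_by_group(records, z=1.96):
    """Crude prevalence per group with a Wilson score interval.

    records: iterable of (group_label, has_ckd) pairs, one per participant.
    Returns {group: (prevalence, lower, upper, n)}.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [cases, total]
    for group, has_ckd in records:
        counts[group][0] += int(bool(has_ckd))
        counts[group][1] += 1

    result = {}
    for group, (cases, n) in counts.items():
        p = cases / n
        # Wilson score interval: better behaved than p +/- z*SE for small p.
        denom = 1 + z**2 / n
        centre = (p + z**2 / (2 * n)) / denom
        half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
        result[group] = (p, centre - half, centre + half, n)
    return result
```

The per-group tuples can be plotted directly as point estimates with error bars; age standardization would be a natural next step before comparing groups.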
Anticipated Findings
I anticipate that the findings will show that certain groups are more prone to developing CKD, and will also reveal which factors contribute to this. In particular, I want to examine participants' creatinine levels and determine whether there is a direct relationship between individuals' creatinine levels and their CKD diagnosis. These data could provide newer, more relevant evidence for diagnosing CKD in individuals, and could also shed light on why this disease is so prevalent in certain communities.
Demographic Categories of Interest
- Race / Ethnicity
- Age
- Sex at Birth
- Gender Identity
- Sexual Orientation
- Geography
- Disability Status
- Access to Care
- Education Level
- Income Level
Data Set Used
Controlled Tier
Research Team
Owner:
- Shivam Sharma - Graduate Trainee, Georgia Institute of Technology
- Jeremiah Longino - Other, Georgia Institute of Technology
GeneticAncestryDemoProject
Scientific Questions Being Studied
As a demonstration project, this project will describe, characterize, and validate the extent of diversity in the All of Us cohort with respect to participants' race & ethnicity (which are socially defined) and genetic ancestry (which can be objectively inferred from participants' genomes). Socially defined race & ethnicity and genetically inferred ancestry are both relevant to health outcomes. Race & ethnicity shape individuals’ lived experience and social environment, e.g., structural inequities, environmental injustice, and barriers to healthcare access. Genetic ancestry can affect health outcomes via differences in the frequencies of variants associated with disease and drug response. Specifically, we will ask:
1. What is the extent of racial, ethnic, and genetic diversity in the All of Us cohort?
2. How do genetic ancestry and admixture change over geography and with age in the US?
3. Are there associations between genetic ancestry and health outcomes in the All of Us cohort?
Project Purpose(s)
- Population Health
- Methods Development
- Ancestry
- Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)
Scientific Approaches
To characterize the diversity of the All of Us cohort, we analyzed participant genetic, demographic, and geographic data.
Here is a brief list of methods used:
1. All of Us participants' genome-wide genotype data were merged and harmonized with global reference population data.
2. Unsupervised clustering analysis techniques (Hopkins statistic, visual assessment of clustering tendency, K-means clustering, and UMAP) were used to assess the extent of genetic structure in the cohort.
3. Supervised genetic ancestry inference was performed using global reference populations, principal components analysis, and the Rye (Rapid ancestrY Estimation) program.
4. Genetic ancestry was compared to participants' self-identified race & ethnicity.
5. Geocoded data and participant age were used to measure how genetic ancestry and admixture vary with respect to participant geography and age.
6. Admixture regression was used to associate participants' health outcomes, drawn from electronic health records, with their genetic ancestry.
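Step 6 can be illustrated with a logistic admixture regression, in which a binary health outcome is regressed on a participant's global ancestry proportion plus covariates. The sketch fits the model by iteratively reweighted least squares in NumPy; the function names and the covariate set are assumptions rather than the project's actual model, and in practice a maintained GLM implementation would be preferable. Because global ancestry is correlated with social and environmental exposures, an association from this model is not by itself evidence of a genetic effect.

```python
import numpy as np


def fit_logistic_irls(X, y, max_iter=50, tol=1e-8):
    """Logistic regression by Newton-Raphson (IRLS).

    X must already include an intercept column.
    Returns (coefficients, standard_errors).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        information = X.T @ (X * (mu * (1.0 - mu))[:, None])  # Fisher information
        step = np.linalg.solve(information, X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors from the information matrix at the final estimate.
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    information = X.T @ (X * (mu * (1.0 - mu))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(information)))
    return beta, se


def admixture_regression(ancestry_fraction, outcome, covariates=None):
    """Log-odds change in outcome per unit (0 to 1) change in global ancestry."""
    n = len(outcome)
    columns = [np.ones(n), np.asarray(ancestry_fraction, dtype=float)]
    if covariates is not None:
        columns.append(np.asarray(covariates, dtype=float).reshape(n, -1))
    X = np.column_stack(columns)
    beta, se = fit_logistic_irls(X, outcome)
    return beta[1], se[1]
```

The returned coefficient is on the log-odds scale; exponentiating it gives the odds ratio comparing a participant with 100% of the ancestry component to one with 0%.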
Anticipated Findings
1. The All of Us participant cohort will be racially, ethnically, and genetically diverse, consistent with the project’s aim to recruit groups underrepresented in biomedical research in support of health equity.
2. All of Us participant genetic variation will be highly structured and best modeled by clusters rather than a continuum of variation.
3. All of Us participants will show patterns of genetically inferred ancestry that are correlated with their socially defined race and ethnicity.
4. All of Us participants’ genetic ancestry and admixture will change over geography and with age.
5. All of Us participants’ genetic ancestry will be associated with a variety of health outcomes.
Demographic Categories of Interest
- Race / Ethnicity
Data Set Used
Controlled Tier
Research Team
Owner:
- Shivam Sharma - Graduate Trainee, Georgia Institute of Technology
- Ashley Green - Project Personnel, All of Us Program Operational Use
GeneticAncestry
Scientific Questions Being Studied
As a demonstration project, this project will describe, characterize, and validate the extent of diversity in the All of Us cohort with respect to participants' race & ethnicity (which are socially defined) and genetic ancestry (which can be objectively inferred from participants' genomes). Socially defined race & ethnicity and genetically inferred ancestry are both relevant to health outcomes. Race & ethnicity shape individuals’ lived experience and social environment, e.g., structural inequities, environmental injustice, and barriers to healthcare access. Genetic ancestry can affect health outcomes via differences in the frequencies of variants associated with disease and drug response. Specifically, we will ask:
1. What is the extent of racial, ethnic, and genetic diversity in the All of Us cohort?
2. How do genetic ancestry and admixture change over geography and with age in the US?
3. Are there associations between genetic ancestry and health outcomes in the All of Us cohort?
Project Purpose(s)
- Population Health
- Methods Development
- Ancestry
- Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)
Scientific Approaches
To characterize the diversity of the All of Us cohort, we analyzed participant genetic, demographic, and geographic data.
Here is a brief list of methods used:
1. All of Us participants' genome-wide genotype data were merged and harmonized with global reference population data.
2. Unsupervised clustering analysis techniques (Hopkins statistic, visual assessment of clustering tendency, K-means clustering, and UMAP) were used to assess the extent of genetic structure in the cohort.
3. Supervised genetic ancestry inference was performed using global reference populations, principal components analysis, and the Rye (Rapid ancestrY Estimation) program.
4. Genetic ancestry was compared to participants' self-identified race & ethnicity.
5. Geocoded data and participant age were used to measure how genetic ancestry and admixture vary with respect to participant geography and age.
6. Admixture regression was used to associate participants' health outcomes, drawn from electronic health records, with their genetic ancestry.
Anticipated Findings
1. The All of Us participant cohort will be racially, ethnically, and genetically diverse, consistent with the project’s aim to recruit groups underrepresented in biomedical research in support of health equity.
2. All of Us participant genetic variation will be highly structured and best modeled by clusters rather than a continuum of variation.
3. All of Us participants will show patterns of genetically inferred ancestry that are correlated with their socially defined race and ethnicity.
4. All of Us participants’ genetic ancestry and admixture will change over geography and with age.
5. All of Us participants’ genetic ancestry will be associated with a variety of health outcomes.
Demographic Categories of Interest
- Race / Ethnicity
Data Set Used
Controlled Tier
Research Team
Owner:
- Shivam Sharma - Graduate Trainee, Georgia Institute of Technology
Collaborators:
- Vincent Lam - Research Fellow, National Institutes of Health (NIH)
- Sonali Gupta - Research Assistant, National Institutes of Health (NIH)
UnsupervisedRye
Scientific Questions Being Studied
We will implement an extension of our existing method, Rye (Genetic ancestry inference at biobank scale), that performs unsupervised clustering on principal component (PC) data.
Project Purpose(s)
- Methods Development
- Ancestry
Scientific Approaches
We will use All of Us genetic data in PLINK v2 to generate PC vectors. PC data can be obtained in the same way for the 1000 Genomes populations, which could potentially serve as reference points within the unsupervised analysis.
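A minimal sketch of how reference populations could anchor an unsupervised analysis: Lloyd's K-means on PC coordinates, with one initial centroid per 1000 Genomes reference population. It assumes the cohort and reference samples have already been projected into the same PC space (for example with PLINK 2's `--pca`); the function name and the seeding strategy are illustrative assumptions, not a description of the Rye extension itself.

```python
import numpy as np


def kmeans_reference_init(pcs, ref_pcs, ref_labels, n_iter=100):
    """Lloyd's K-means on PC coordinates, seeded at reference-population centroids.

    pcs:        (n_cohort, n_pcs) cohort PC coordinates
    ref_pcs:    (n_ref, n_pcs) reference-panel PC coordinates
    ref_labels: population label for each reference sample
    Returns (cluster label per cohort sample, final centroids).
    """
    pcs = np.asarray(pcs, dtype=float)
    ref_pcs = np.asarray(ref_pcs, dtype=float)
    ref_labels = np.asarray(ref_labels)
    names = np.unique(ref_labels)
    # One starting centroid per reference population.
    centroids = np.array([ref_pcs[ref_labels == name].mean(axis=0) for name in names])

    assign = None
    for _ in range(n_iter):
        d2 = ((pcs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        new_assign = d2.argmin(axis=1)
        if assign is not None and np.array_equal(new_assign, assign):
            break  # assignments stable: converged
        assign = new_assign
        for k in range(len(centroids)):
            members = pcs[assign == k]
            if len(members):  # keep the previous centroid if a cluster empties
                centroids[k] = members.mean(axis=0)
    return names[assign], centroids
```

The returned label only records which reference population seeded a cluster; a cluster can drift away from its seed during the iterations, so the final centroids are worth inspecting against the reference panel.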
Anticipated Findings
We expect that unsupervised clustering will successfully identify reference points and generate meaningful clusters in the genetic PC data.
Demographic Categories of Interest
- Race / Ethnicity
Data Set Used
Controlled Tier
GeneBurdenAdmixedPopulations
Scientific Questions Being Studied
The objective of this aim is to identify genes with a high burden of evolutionarily constrained variants for each disease and ancestry group. This will be done by grouping individuals by dominant ancestry, or as admixed, and then performing rare-variant gene-burden tests on each group for each disease. A mask will be applied to the exome variants to retain those with a high Combined Annotation-Dependent Depletion (CADD) score, which indicates that the variants are more likely to be deleterious and potentially constrained by evolution. An appropriate regression model that accounts for imbalanced case/control ratios will be selected. The gene-trait associations identified for each group and disease will be compared for level of significance and direction of effect.
Project Purpose(s)
- Population Health
- Ancestry
Scientific Approaches
We will first characterize the global and local ancestries of All of Us participants. We will use these to define admixed individuals (two-way admixture among European, African, and Native American ancestries) and non-admixed individuals. We will then define masks for the burden tests and perform burden testing in these cohorts.
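The masking and collapsing step that precedes each burden test can be sketched as follows. The CADD PHRED cutoff of 20 and the 1% MAF ceiling are assumed example thresholds, not values stated in the project, and the function name is hypothetical. The resulting per-gene scores would then be computed separately within each ancestry group and passed to a case/control regression suited to imbalanced designs, such as Firth-penalized logistic regression; that step is not shown.

```python
import numpy as np


def gene_burden_scores(genotypes, variant_gene, cadd_phred, maf,
                       cadd_min=20.0, maf_max=0.01):
    """Collapse masked rare variants into one burden score per gene.

    genotypes:    (n_individuals, n_variants) alternate-allele dosages (0/1/2)
    variant_gene: gene label for each variant
    cadd_phred:   CADD PHRED-scaled score for each variant
    maf:          minor allele frequency for each variant
    Returns (genes, scores), where scores[i, k] is the number of qualifying
    alleles carried by individual i in gene genes[k].
    """
    G = np.asarray(genotypes, dtype=float)
    variant_gene = np.asarray(variant_gene)
    # Mask: keep variants that are both likely deleterious and rare.
    keep = (np.asarray(cadd_phred) >= cadd_min) & (np.asarray(maf) <= maf_max)
    genes = np.unique(variant_gene[keep])
    scores = np.zeros((G.shape[0], len(genes)))
    for k, gene in enumerate(genes):
        scores[:, k] = G[:, keep & (variant_gene == gene)].sum(axis=1)
    return genes, scores
```

Counting qualifying alleles is the simplest burden definition; a carrier indicator (score greater than 0) or a CADD-weighted sum are common alternatives.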
Anticipated Findings
We expect gene burdens to differ between admixed and non-admixed populations, and it will be exciting to learn how burden changes with ancestry estimates across these populations.
Demographic Categories of Interest
- Race / Ethnicity
Data Set Used
Controlled Tier
Pharmacogenomics
Scientific Questions Being Studied
Our project aims to analyze the utility of self-identified race and ethnicity labels in genetically informed drug predictions for different individuals.
Project Purpose(s)
- Population Health
- Drug Development
Scientific Approaches
We will use whole-genome sequencing data filtered to retain only pharmacogenomic variants. We will then apply principal component analysis, followed by a suite of machine learning models, to predict self-identified race and ethnicity (SIRE) labels from the PC data.
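As a stand-in for the suite of models, the sketch below uses PCA via singular value decomposition followed by a nearest-centroid classifier, in NumPy only. The genotype matrix is assumed to hold 0/1/2 dosages at pharmacogenomic variants (the filtering list is not specified here), and the function names are hypothetical. Disagreement between predicted and self-identified labels is itself informative for this project, since SIRE labels are social categories.

```python
import numpy as np


def fit_pca(X, n_components):
    """Column means and top principal axes of X (rows = individuals)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(X, mean, axes):
    """Coordinates of X on the fitted principal axes."""
    return (np.asarray(X, dtype=float) - mean) @ axes.T


def fit_nearest_centroid(Z, labels):
    """One centroid per label in PC space."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.array([Z[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_nearest_centroid(Z, model):
    """Assign each row of Z to the label of the closest centroid."""
    classes, centroids = model
    d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return classes[d2.argmin(axis=1)]
```

The PCA is fit on the training individuals only, and test individuals are projected onto those axes, so no information from the held-out set leaks into the components.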
Anticipated Findings
We anticipate that the PC vectors will capture pharmacogenomic variation and predict the SIRE labels in the All of Us dataset.
Demographic Categories of Interest
- Race / Ethnicity
Data Set Used
Controlled Tier
ChronicKidneyDisease
Scientific Questions Being Studied
We want to investigate whether chronic kidney disease has disparate prevalence in the United States. If it does, we will ask which genetic factors contribute to this disparity and analyze how environmental factors might be confounding these results.
Project Purpose(s)
- Disease Focused Research (Chronic kidney disease)
- Population Health
- Ancestry
Scientific Approaches
We will perform fine-scale local ancestry painting using the genotype data. This will let us use admixture mapping to narrow down ancestry-associated loci in the human genome that might be responsible for chronic kidney disease.
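Admixture mapping can be illustrated as one regression per locus of a trait on local ancestry dosage (0, 1, or 2 copies of the ancestry of interest), adjusting for global ancestry so that genome-wide ancestry differences are not mistaken for locus effects. This sketch assumes a quantitative trait such as eGFR and uses ordinary least squares; the function name and inputs are hypothetical, and a binary CKD outcome would call for a logistic model instead.

```python
import numpy as np


def admixture_mapping(local_anc, global_anc, trait):
    """Per-locus OLS of a quantitative trait on local ancestry dosage.

    local_anc:  (n, L) copies (0, 1, 2) of the ancestry of interest per locus
    global_anc: (n,) genome-wide proportion of that ancestry (covariate)
    trait:      (n,) quantitative phenotype
    Returns (betas, t_stats), each of length L.
    """
    local_anc = np.asarray(local_anc, dtype=float)
    global_anc = np.asarray(global_anc, dtype=float)
    y = np.asarray(trait, dtype=float)
    n, n_loci = local_anc.shape
    betas, t_stats = np.empty(n_loci), np.empty(n_loci)
    for j in range(n_loci):
        # Design: intercept, global ancestry, local ancestry at locus j.
        X = np.column_stack([np.ones(n), global_anc, local_anc[:, j]])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        sigma2 = resid @ resid / (n - X.shape[1])
        cov = sigma2 * np.linalg.inv(X.T @ X)
        betas[j] = coef[2]
        t_stats[j] = coef[2] / np.sqrt(cov[2, 2])
    return betas, t_stats
```

Because local ancestry is strongly correlated across neighbouring loci, significance thresholds for admixture mapping are usually less stringent than genome-wide association thresholds, and peaks span broad regions that need finer mapping afterwards.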
Anticipated Findings
We anticipate finding genetic loci associated with different ancestries (e.g., African and European). These loci can then be tested in causal statistical models to assess causality.
Demographic Categories of Interest
- Race / Ethnicity
Data Set Used
Controlled Tier