Shivam Sharma

Graduate Trainee, Georgia Institute of Technology

10 active projects

GeneticAncestryInference

Scientific Questions Being Studied

This demonstration project will describe, characterize, and validate the extent of diversity in the All of Us cohort with respect to the participants' race & ethnicity (which are socially defined) and genetic ancestry (which can be objectively inferred from participants' genomes). Socially defined race & ethnicity and genetically inferred ancestry are both relevant to health outcomes. Race & ethnicity shape individuals’ lived experience and social environment, e.g., structural inequities, environmental injustice, and barriers to healthcare access. Genetic ancestry can affect health outcomes via differences in the frequencies of variants associated with disease and drug response. Specifically, we will ask:

1. What is the extent of racial, ethnic, and genetic diversity in the All of Us cohort?

2. How do genetic ancestry and admixture change over geography and with age in the US?

3. Are there associations between genetic ancestry and health outcomes in the All of Us cohort?

Project Purpose(s)

  • Population Health
  • Methods Development
  • Ancestry
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Approaches

To characterize the diversity of the All of Us cohort, we analyzed participant genetic, demographic, and geographic data.

Here is a brief list of methods used:

1. All of Us participant genome-wide genotype data were merged and harmonized with global reference population data.

2. Unsupervised clustering analysis techniques (Hopkins statistic, visual assessment of clustering tendency, K-means clustering, and UMAP) were used to assess the extent of genetic structure in the cohort.

3. Supervised genetic ancestry inference was performed using global reference populations, principal components analysis, and the Rye (Rapid ancestrY Estimation) program.

4. Genetic ancestry was compared to participants' self-identified race & ethnicity.

5. Geocoded data and participant age were used to measure how genetic ancestry and admixture vary with respect to participant geography and age.

6. Admixture regression was used to associate participant health outcomes, gleaned from electronic health records, with their genetic ancestry.
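
As an illustration of the clustering-tendency step in the list above, here is a minimal sketch of one common formulation of the Hopkins statistic. It is not the project's code: the function names, the toy two-cluster data, and the choice of sample size `m` are all assumptions made for the example. Values near 0.5 suggest no cluster structure, and values near 1 suggest strong clustering.

```python
import math
import random


def nearest_distance(point, data, skip_index=None):
    """Distance from `point` to its nearest neighbour in `data`."""
    return min(
        math.dist(point, other)
        for i, other in enumerate(data)
        if i != skip_index
    )


def hopkins(data, m, seed=0):
    """Hopkins statistic: about 0.5 for uniform data, near 1 for clustered data."""
    rng = random.Random(seed)
    dims = len(data[0])
    lows = [min(p[d] for p in data) for d in range(dims)]
    highs = [max(p[d] for p in data) for d in range(dims)]

    # w: nearest-neighbour distances for m real points sampled from the data.
    sampled = rng.sample(range(len(data)), m)
    w = [nearest_distance(data[i], data, skip_index=i) for i in sampled]

    # u: nearest-data-point distances for m uniform points in the bounding box.
    uniform = [[rng.uniform(lows[d], highs[d]) for d in range(dims)] for _ in range(m)]
    u = [nearest_distance(p, data) for p in uniform]

    return sum(u) / (sum(u) + sum(w))


def toy_clustered_pcs(seed=1):
    """Hypothetical toy data: two tight clusters in a 2-D PC space."""
    rng = random.Random(seed)
    centres = [(0.0, 0.0), (10.0, 10.0)]
    return [
        [cx + rng.gauss(0, 0.1), cy + rng.gauss(0, 0.1)]
        for cx, cy in centres
        for _ in range(50)
    ]


h = hopkins(toy_clustered_pcs(), m=20)
print(f"Hopkins statistic: {h:.3f}")
```

On real data the input would be each participant's principal component coordinates rather than simulated points.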

Anticipated Findings

1. The All of Us participant cohort will be racially, ethnically, and genetically diverse, consistent with the project’s aim to recruit groups underrepresented in biomedical research in support of health equity.

2. All of Us participant genetic variation will be highly structured and best modeled by clusters rather than a continuum of variation.

3. All of Us participants will show patterns of genetically inferred ancestry that are correlated with their socially defined ancestry (i.e., race and ethnicity).

4. All of Us participants’ genetic ancestry and admixture will change over geography and with age.

5. All of Us participants’ genetic ancestry will be associated with a variety of health outcomes.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

  • Shivam Sharma - Graduate Trainee, Georgia Institute of Technology

GeoDisparity

Scientific Questions Being Studied

Specifically, I want to study how chronic kidney disease (CKD) varies among ethnic and racial groups. This is especially important when considering health disparities and their impacts on different communities. Taking multiple factors into consideration is a key part of creating this workspace and carrying out this research. I also want to know how CKD can be stopped or slowed down, as the overarching goal is to promote better living and access to health resources so that this problem can be solved or at least stalled.

Project Purpose(s)

  • Population Health
  • Ancestry

Scientific Approaches

First, I will create a workspace containing a cohort of individuals who reported having chronic kidney disease or chronic kidney failure. I will then use these data to graph the prevalence of these diseases across ethnic groups. The prevalence data will also be used to chart the factors involved in developing CKD and to identify which are statistically most likely to drive CKD prevalence in communities. Finally, I will compare the different formulas for diagnosing CKD with the traditional method, to see which one best diagnoses CKD overall while also performing well for diagnosing CKD in minority groups.
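
The prevalence comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the record layout (`group`, `has_ckd`), the group labels, and the toy counts are hypothetical, and computing prevalence requires the total number of participants in each group, not just the CKD cohort.

```python
from collections import Counter


def prevalence_by_group(records):
    """Fraction of participants with CKD in each group.

    `records` is an iterable of (group_label, has_ckd) pairs.
    """
    totals = Counter()
    cases = Counter()
    for group, has_ckd in records:
        totals[group] += 1
        if has_ckd:
            cases[group] += 1
    return {group: cases[group] / totals[group] for group in totals}


# Hypothetical toy records, not All of Us data.
toy_records = [
    ("Group A", True), ("Group A", False), ("Group A", False), ("Group A", False),
    ("Group B", True), ("Group B", True), ("Group B", False), ("Group B", False),
]

prevalence = prevalence_by_group(toy_records)
for group, value in sorted(prevalence.items()):
    print(f"{group}: {value:.2f}")
```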

Anticipated Findings

I anticipate finding that certain groups are more prone to developing CKD, and identifying which factors contribute to that risk. In particular, I want to examine participants' creatinine levels and find a direct relationship between an individual's creatinine level and their CKD diagnosis. These data could provide newer and more relevant evidence for diagnosing CKD in individuals, and could also show how prevalent this disease is in certain communities.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

  • Shivam Sharma - Graduate Trainee, Georgia Institute of Technology

Apportionment of Human Genomic Diversity

Scientific Questions Being Studied

In this study, we attempt to reconcile two distinct views of human genetic diversity: (1) the variance-partitioning analysis of Lewontin and (2) the classification analysis of A. W. F. Edwards. It is now well appreciated that the dissonance between variance-partitioning and classification rests on the difference between single-locus and multilocus approaches to genetic diversity [14]. In our project, we extended a previously developed approach for variance-partitioning using multiple loci to evaluate how variance-partitioning changes with the number of loci being considered. We consider the apportionment of genetic variation first at two levels (within and between groups) and then at three levels (within populations, among populations within regional groups, and among regional groups).

Project Purpose(s)

  • Population Health
  • Ancestry

Scientific Approaches

Partitioning variation, as in multifactorial ANOVA, is particularly powerful for testing hypotheses in a complex ecological system. Because the populations are subgroups of superpopulations, we have to extend MANOVA (multivariate ANOVA) with nested ANOVA on the All of Us dataset. In place of the traditional ANOVA levels (within and between populations), we include three levels: among groups, among populations within groups, and within populations. Preliminary normalization (min-max normalization) can be evaluated before the downstream statistics.
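
To make the three-level partition concrete, here is a minimal single-locus sketch. It uses a heterozygosity-based hierarchical partition as a simple stand-in for the nested ANOVA described above, so it is an assumption rather than the project's method. The function names, the equal weighting of populations, and the toy allele frequencies are all hypothetical, and the locus must be polymorphic overall (otherwise the total is zero).

```python
def het(p):
    """Expected heterozygosity at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def apportion(groups):
    """Split diversity at one locus into three hierarchical components.

    `groups` maps a regional group name to a list of allele frequencies,
    one per population. Every population gets equal weight (an assumption).
    """
    pops = [p for freqs in groups.values() for p in freqs]
    h_total = het(mean(pops))
    h_within_pops = mean(het(p) for p in pops)
    # Group-level heterozygosity, weighted by the number of populations per group.
    h_within_groups = sum(len(f) * het(mean(f)) for f in groups.values()) / len(pops)
    return {
        "within_populations": h_within_pops / h_total,
        "among_populations_within_groups": (h_within_groups - h_within_pops) / h_total,
        "among_groups": (h_total - h_within_groups) / h_total,
    }


# Hypothetical allele frequencies for two regional groups of two populations each.
toy_groups = {"G1": [0.1, 0.2], "G2": [0.6, 0.7]}
components = apportion(toy_groups)
for name, share in components.items():
    print(f"{name}: {share:.3f}")
```

The three shares sum to one by construction. A multilocus analysis would repeat or aggregate this over many loci, which is where the dependence on the number of loci enters.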

Anticipated Findings

We anticipate that the share of human genomic diversity apportioned between populations will be drastically larger than the share within populations.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Registered Tier

Research Team

Owner:

Chronic Kidney Disease

Scientific Questions Being Studied

Specifically, I want to study how chronic kidney disease (CKD) varies among ethnic and racial groups. This is especially important when considering health disparities and their impacts on different communities. Taking multiple factors into consideration is a key part of creating this workspace and carrying out this research. I also want to know how CKD can be stopped or slowed down, as the overarching goal is to promote better living and access to health resources so that this problem can be solved or at least stalled.

Project Purpose(s)

  • Disease Focused Research (Chronic Kidney Disease)
  • Population Health
  • Social / Behavioral
  • Drug Development
  • Methods Development
  • Ancestry

Scientific Approaches

First, I will create a workspace containing a cohort of individuals who reported having chronic kidney disease or chronic kidney failure. I will then use these data to graph the prevalence of these diseases across ethnic groups. The prevalence data will also be used to chart the factors involved in developing CKD and to identify which are statistically most likely to drive CKD prevalence in communities. Finally, I will compare the different formulas for diagnosing CKD with the traditional method, to see which one best diagnoses CKD overall while also performing well for diagnosing CKD in minority groups.
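
The description above does not say which diagnostic formulas will be compared. As one plausible illustration, the sketch below assumes the two CKD-EPI creatinine equations for estimated glomerular filtration rate (eGFR): the 2009 equation, which includes a race coefficient, and the race-free 2021 equation. The function names and the example inputs are hypothetical; serum creatinine is in mg/dL and eGFR is in mL/min/1.73 m².

```python
def egfr_ckd_epi_2009(scr_mg_dl, age, female, black):
    """CKD-EPI 2009 creatinine equation (includes a race coefficient)."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = 141.0 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.209 * 0.993 ** age
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


def egfr_ckd_epi_2021(scr_mg_dl, age, female):
    """CKD-EPI 2021 creatinine equation (no race coefficient)."""
    kappa = 0.7 if female else 0.9
    alpha = -0.241 if female else -0.302
    ratio = scr_mg_dl / kappa
    egfr = 142.0 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.200 * 0.9938 ** age
    if female:
        egfr *= 1.012
    return egfr


# Hypothetical participant: same creatinine, different equations.
scr, age = 1.4, 60
old_black = egfr_ckd_epi_2009(scr, age, female=False, black=True)
old_other = egfr_ckd_epi_2009(scr, age, female=False, black=False)
new_value = egfr_ckd_epi_2021(scr, age, female=False)
print(f"2009 (Black): {old_black:.1f}")
print(f"2009 (non-Black): {old_other:.1f}")
print(f"2021: {new_value:.1f}")
```

Because an eGFR persistently below 60 is a common CKD criterion, the same creatinine value can fall on different sides of that threshold depending on the equation, which is the kind of difference such a comparison would look for.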

Anticipated Findings

I anticipate finding that certain groups are more prone to developing CKD, and identifying which factors contribute to that risk. In particular, I want to examine participants' creatinine levels and find a direct relationship between an individual's creatinine level and their CKD diagnosis. These data could provide newer and more relevant evidence for diagnosing CKD in individuals, and could also show how prevalent this disease is in certain communities.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

GeneticAncestryDemoProject

Scientific Questions Being Studied

This demonstration project will describe, characterize, and validate the extent of diversity in the All of Us cohort with respect to the participants' race & ethnicity (which are socially defined) and genetic ancestry (which can be objectively inferred from participants' genomes). Socially defined race & ethnicity and genetically inferred ancestry are both relevant to health outcomes. Race & ethnicity shape individuals’ lived experience and social environment, e.g., structural inequities, environmental injustice, and barriers to healthcare access. Genetic ancestry can affect health outcomes via differences in the frequencies of variants associated with disease and drug response. Specifically, we will ask:

1. What is the extent of racial, ethnic, and genetic diversity in the All of Us cohort?

2. How do genetic ancestry and admixture change over geography and with age in the US?

3. Are there associations between genetic ancestry and health outcomes in the All of Us cohort?

Project Purpose(s)

  • Population Health
  • Methods Development
  • Ancestry
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Approaches

To characterize the diversity of the All of Us cohort, we analyzed participant genetic, demographic, and geographic data.

Here is a brief list of methods used:

1. All of Us participant genome-wide genotype data were merged and harmonized with global reference population data.

2. Unsupervised clustering analysis techniques (Hopkins statistic, visual assessment of clustering tendency, K-means clustering, and UMAP) were used to assess the extent of genetic structure in the cohort.

3. Supervised genetic ancestry inference was performed using global reference populations, principal components analysis, and the Rye (Rapid ancestrY Estimation) program.

4. Genetic ancestry was compared to participants' self-identified race & ethnicity.

5. Geocoded data and participant age were used to measure how genetic ancestry and admixture vary with respect to participant geography and age.

6. Admixture regression was used to associate participant health outcomes, gleaned from electronic health records, with their genetic ancestry.
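
To give a feel for the supervised ancestry inference step in the list above, here is a deliberately simplified sketch. It is not Rye's algorithm: it assumes only two reference populations and estimates an ancestry fraction by projecting a participant's principal component coordinates onto the axis between the two reference centroids. All names and coordinates are hypothetical.

```python
def centroid(points):
    """Mean position of a list of PC coordinate vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ancestry_fraction(sample_pc, ref_a, ref_b):
    """Estimated fraction of ancestry from reference A, clipped to [0, 1].

    Two-reference simplification: the sample is projected onto the line
    joining the centroid of reference B (fraction 0) to that of A (fraction 1).
    """
    a = centroid(ref_a)
    b = centroid(ref_b)
    axis = [ai - bi for ai, bi in zip(a, b)]
    relative = [si - bi for si, bi in zip(sample_pc, b)]
    t = dot(relative, axis) / dot(axis, axis)
    return min(1.0, max(0.0, t))


# Hypothetical reference panels in a 2-D PC space.
reference_a = [[1.0, 0.0], [1.0, 0.2]]
reference_b = [[-1.0, 0.0], [-1.0, 0.2]]

midpoint_fraction = ancestry_fraction([0.0, 0.1], reference_a, reference_b)
print(f"Fraction from reference A: {midpoint_fraction:.2f}")
```

With more than two reference populations, the fractions would have to be estimated jointly under non-negativity constraints, which this sketch does not attempt.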

Anticipated Findings

1. The All of Us participant cohort will be racially, ethnically, and genetically diverse, consistent with the project’s aim to recruit groups underrepresented in biomedical research in support of health equity.

2. All of Us participant genetic variation will be highly structured and best modeled by clusters rather than a continuum of variation.

3. All of Us participants will show patterns of genetically inferred ancestry that are correlated with their socially defined ancestry (i.e., race and ethnicity).

4. All of Us participants’ genetic ancestry and admixture will change over geography and with age.

5. All of Us participants’ genetic ancestry will be associated with a variety of health outcomes.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

  • Shivam Sharma - Graduate Trainee, Georgia Institute of Technology
  • Ashley Green - Project Personnel, All of Us Program Operational Use

GeneticAncestry

Scientific Questions Being Studied

This demonstration project will describe, characterize, and validate the extent of diversity in the All of Us cohort with respect to the participants' race & ethnicity (which are socially defined) and genetic ancestry (which can be objectively inferred from participants' genomes). Socially defined race & ethnicity and genetically inferred ancestry are both relevant to health outcomes. Race & ethnicity shape individuals’ lived experience and social environment, e.g., structural inequities, environmental injustice, and barriers to healthcare access. Genetic ancestry can affect health outcomes via differences in the frequencies of variants associated with disease and drug response. Specifically, we will ask:

1. What is the extent of racial, ethnic, and genetic diversity in the All of Us cohort?

2. How do genetic ancestry and admixture change over geography and with age in the US?

3. Are there associations between genetic ancestry and health outcomes in the All of Us cohort?

Project Purpose(s)

  • Population Health
  • Methods Development
  • Ancestry
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Approaches

To characterize the diversity of the All of Us cohort, we analyzed participant genetic, demographic, and geographic data.

Here is a brief list of methods used:

1. All of Us participant genome-wide genotype data were merged and harmonized with global reference population data.

2. Unsupervised clustering analysis techniques (Hopkins statistic, visual assessment of clustering tendency, K-means clustering, and UMAP) were used to assess the extent of genetic structure in the cohort.

3. Supervised genetic ancestry inference was performed using global reference populations, principal components analysis, and the Rye (Rapid ancestrY Estimation) program.

4. Genetic ancestry was compared to participants' self-identified race & ethnicity.

5. Geocoded data and participant age were used to measure how genetic ancestry and admixture vary with respect to participant geography and age.

6. Admixture regression was used to associate participant health outcomes, gleaned from electronic health records, with their genetic ancestry.
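
The admixture regression step at the end of the list above can be illustrated with a minimal sketch: an ordinary least-squares fit of an outcome on a single ancestry fraction. This is an assumption made for illustration; a real analysis would more likely use logistic regression for case/control outcomes and adjust for covariates such as age and sex. The function name and the toy values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: one ancestry fraction and one quantitative outcome per participant.
ancestry = [0.0, 0.25, 0.5, 0.75, 1.0]
outcome = [1.0, 1.5, 2.0, 2.5, 3.0]

slope, intercept = fit_line(ancestry, outcome)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The slope estimates how the outcome changes as the ancestry fraction goes from 0 to 1 in this toy setting.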

Anticipated Findings

1. The All of Us participant cohort will be racially, ethnically, and genetically diverse, consistent with the project’s aim to recruit groups underrepresented in biomedical research in support of health equity.

2. All of Us participant genetic variation will be highly structured and best modeled by clusters rather than a continuum of variation.

3. All of Us participants will show patterns of genetically inferred ancestry that are correlated with their socially defined ancestry (i.e., race and ethnicity).

4. All of Us participants’ genetic ancestry and admixture will change over geography and with age.

5. All of Us participants’ genetic ancestry will be associated with a variety of health outcomes.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

  • Shivam Sharma - Graduate Trainee, Georgia Institute of Technology

Collaborators:

  • Vincent Lam - Research Fellow, National Institutes of Health (NIH)
  • Sonali Gupta - Research Assistant, National Institutes of Health (NIH)

UnsupervisedRye

Scientific Questions Being Studied

We aim to implement unsupervised clustering as an extension of our existing method, Rye (genetic ancestry inference at biobank scale). The unsupervised clustering will use principal component (PC) data.

Project Purpose(s)

  • Methods Development
  • Ancestry

Scientific Approaches

We will use All of Us genetic data, which will be processed with PLINK v2 to generate PC vector data. Similar PC data can be obtained for the 1000 Genomes populations, which can potentially be used as reference points in the unsupervised analysis.
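
As a rough illustration of unsupervised clustering on PC data, the sketch below runs plain k-means (Lloyd's algorithm) with the starting centroids supplied by the caller, for example from reference-population PC coordinates. It is an assumed stand-in and not the planned Rye extension. The function name, the toy coordinates, and the fixed iteration count are hypothetical.

```python
import math


def kmeans(points, initial_centroids, iterations=20):
    """Plain k-means with caller-supplied starting centroids.

    Returns the cluster label of each point and the final centroids.
    """
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = [sum(column) / len(members) for column in zip(*members)]
    return labels, centroids


# Hypothetical PC coordinates for six samples and two reference start points.
toy_pcs = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
reference_starts = [[0.0, 0.0], [5.0, 5.0]]

labels, centroids = kmeans(toy_pcs, reference_starts)
print(labels)
```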

Anticipated Findings

We expect to find that the unsupervised analysis can identify the reference points successfully and generate meaningful clusters in the genetic PC data.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

  • Shivam Sharma - Graduate Trainee, Georgia Institute of Technology

GeneBurdenAdmixedPopulations

Scientific Questions Being Studied

The objective of this aim is to identify genes with a high burden of evolutionarily constrained variants for each disease and ancestry group. Individuals will be grouped by dominant ancestry, or as admixed where applicable, and rare-variant gene-burden tests will then be performed on each group for each disease. A mask will be applied to the exome variants to select those with a high Combined Annotation-Dependent Depletion (CADD) score, which indicates that a variant is more likely to be deleterious and potentially constrained by evolution. An appropriate regression model that accounts for imbalanced case/control ratios will be selected. The gene-trait associations identified for each group and disease will be compared for level of significance and direction of effect.

Project Purpose(s)

  • Population Health
  • Ancestry

Scientific Approaches

We will first characterize the global and local ancestries of All of Us participants. We will use these to define admixed individuals (two-way admixture among European, African, and Native American ancestries) and non-admixed individuals. We will then define the variant masks needed for the burden tests. Burden tests will then be performed and compared between these cohorts.
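
The masking and burden-scoring idea can be sketched as follows. This is a minimal illustration under stated assumptions: the CADD cutoff of 20, the allele-frequency cutoff of 0.01, the data layout, and all identifiers are hypothetical, and the association test itself (the regression step) is not shown.

```python
CADD_THRESHOLD = 20.0  # hypothetical cutoff for "high CADD"
MAX_ALLELE_FREQ = 0.01  # hypothetical cutoff for "rare"


def burden_scores(genotypes, variants, gene,
                  cadd_threshold=CADD_THRESHOLD, max_af=MAX_ALLELE_FREQ):
    """Per-sample count of rare, high-CADD alternate alleles in one gene.

    `genotypes` maps sample -> {variant_id: alternate allele count (0, 1, 2)}.
    `variants` maps variant_id -> {"gene": str, "cadd": float, "af": float}.
    """
    # The mask keeps only variants in the gene that pass both filters.
    masked = [
        variant_id
        for variant_id, info in variants.items()
        if info["gene"] == gene
        and info["cadd"] >= cadd_threshold
        and info["af"] <= max_af
    ]
    return {
        sample: sum(calls.get(variant_id, 0) for variant_id in masked)
        for sample, calls in genotypes.items()
    }


# Hypothetical annotations and genotype calls.
toy_variants = {
    "v1": {"gene": "GENE_A", "cadd": 25.0, "af": 0.001},
    "v2": {"gene": "GENE_A", "cadd": 5.0, "af": 0.001},   # fails the CADD filter
    "v3": {"gene": "GENE_A", "cadd": 30.0, "af": 0.2},    # fails the rarity filter
    "v4": {"gene": "GENE_B", "cadd": 28.0, "af": 0.002},  # different gene
}
toy_genotypes = {
    "s1": {"v1": 1, "v2": 2},
    "s2": {"v3": 1, "v4": 1},
    "s3": {"v1": 2},
}

scores = burden_scores(toy_genotypes, toy_variants, "GENE_A")
print(scores)
```

The resulting per-sample scores would then serve as the predictor in a regression against case/control status within each ancestry group.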

Anticipated Findings

We expect gene burdens to differ between admixed and non-admixed populations. It will be exciting to learn how burden changes with ancestry estimates across these populations.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

  • Shivam Sharma - Graduate Trainee, Georgia Institute of Technology

Pharmacogenomics

Scientific Questions Being Studied

Our project aims to analyze the utility of self-identified race and ethnicity labels in genetically informed drug predictions for different individuals.

Project Purpose(s)

  • Population Health
  • Drug Development

Scientific Approaches

We will use whole genome sequencing data filtered to retain only pharmacogenomic variants. We will then apply principal component analysis (PCA), followed by a suite of machine learning models that predict self-identified race and ethnicity (SIRE) labels from the PC data.
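
As a stand-in for the unspecified suite of machine learning models, the sketch below uses a nearest-centroid classifier to predict a label from PC coordinates. The choice of classifier, the function names, the placeholder labels, and the toy coordinates are all assumptions for illustration.

```python
import math


def fit_centroids(pcs, labels):
    """Mean PC vector for each label in the training data."""
    sums = {}
    counts = {}
    for pc, label in zip(pcs, labels):
        acc = sums.setdefault(label, [0.0] * len(pc))
        for d, value in enumerate(pc):
            acc[d] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, pc):
    """Label whose centroid is closest to the given PC vector."""
    return min(centroids, key=lambda label: math.dist(pc, centroids[label]))


# Hypothetical training data with placeholder labels.
train_pcs = [[0.0, 0.1], [0.2, 0.0], [3.0, 3.1], [3.2, 2.9]]
train_labels = ["Label A", "Label A", "Label B", "Label B"]

model = fit_centroids(train_pcs, train_labels)
test_pcs = [[0.1, 0.2], [2.8, 3.0]]
predictions = [predict(model, pc) for pc in test_pcs]
print(predictions)
```

Prediction accuracy on held-out participants would then indicate how much of the label information is recoverable from pharmacogenomic variation alone.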

Anticipated Findings

We anticipate that the PC vectors will capture the pharmacogenomic variation and predict the SIRE labels in the All of Us dataset.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

  • Shivam Sharma - Graduate Trainee, Georgia Institute of Technology

ChronicKidneyDisease

Scientific Questions Being Studied

We want to investigate whether chronic kidney disease has disparate prevalence in the United States. If so, we want to identify the genetic factors that contribute to this disparity and analyze how environmental factors might be confounding these results.

Project Purpose(s)

  • Disease Focused Research (Chronic kidney disease)
  • Population Health
  • Ancestry

Scientific Approaches

We will be performing fine-scale local ancestry painting using the genotype data. This will let us use admixture mapping to narrow down ancestral loci on the human genome that might be responsible for chronic kidney disease.
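
The core idea of an admixture mapping scan can be sketched as follows: at each locus, compare the average local-ancestry dosage between cases and controls. This is a simplified illustration, not the planned analysis, which would normally use a regression adjusted for global ancestry and other covariates. The function name and the toy dosages are hypothetical.

```python
def ancestry_case_control_difference(dosages, is_case):
    """Per-locus difference in mean local-ancestry dosage (cases minus controls).

    `dosages[i][j]` is the number of haplotype copies (0, 1, or 2) of a given
    ancestry carried by individual i at locus j.
    """
    n_loci = len(dosages[0])
    cases = [d for d, flag in zip(dosages, is_case) if flag]
    controls = [d for d, flag in zip(dosages, is_case) if not flag]
    return [
        sum(d[j] for d in cases) / len(cases)
        - sum(d[j] for d in controls) / len(controls)
        for j in range(n_loci)
    ]


# Hypothetical local-ancestry dosages for four individuals at three loci.
toy_dosages = [
    [2, 1, 0],
    [2, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
]
toy_is_case = [True, True, False, False]

differences = ancestry_case_control_difference(toy_dosages, toy_is_case)
top_locus = max(range(len(differences)), key=lambda j: abs(differences[j]))
print(differences, top_locus)
```

Loci with the largest differences are the candidates that would be followed up with formal association testing.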

Anticipated Findings

We anticipate finding genetic loci associated with different ancestries, such as African and European. These loci can then be tested for causality in causal statistical models.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

  • Shivam Sharma - Graduate Trainee, Georgia Institute of Technology