Research Projects Directory

5,357 active projects

This information was updated 6/7/2023

The Research Projects Directory includes information about all projects that currently exist in the Researcher Workbench to help provide transparency about how the Workbench is being used. Each project specifies whether Registered Tier or Controlled Tier data are used.

Note: Researcher Workbench users provide information about their research projects independently. Views expressed in the Research Projects Directory belong to the relevant users and do not necessarily represent those of the All of Us Research Program. Information in the Research Projects Directory is also cross-posted on AllofUs.nih.gov in compliance with the 21st Century Cures Act.

Duplicate of Vaping and Contact Dermatitis

Scientific Questions Being Studied

Prior research has demonstrated a possible link between vaping and contact dermatitis; however, a population-based study has not been conducted. Our research aims to explore this potential relationship and quantify its extent. We intend to study two questions: What is the relationship between vaping and contact dermatitis? Is vaping also associated with other skin diseases?

Project Purpose(s)

  • Disease Focused Research (urticaria)

Scientific Approaches

We plan to use the All of Us database to identify patients who vape and explore any associated increase in skin disease prevalence. We will begin by using survey data to determine which patients vape and how frequently they vape.

Anticipated Findings

With limited research on this topic, we are unsure what to anticipate. We expect that any associations found in this study can guide preventive strategies for vape use.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Mihir Patil - Graduate Trainee, University of Illinois at Urbana Champaign

Collaborators:

  • Carlos Salazar - Graduate Trainee, Harvard Medical School

Vaping and Contact Dermatitis

Scientific Questions Being Studied

Prior research has demonstrated a possible link between vaping and contact dermatitis; however, a population-based study has not been conducted. Our research aims to explore this potential relationship and quantify its extent. We intend to study two questions: What is the relationship between vaping and contact dermatitis? Is vaping also associated with other skin diseases?

Project Purpose(s)

  • Disease Focused Research (urticaria)

Scientific Approaches

We plan to use the All of Us database to identify patients who vape and explore any associated increase in skin disease prevalence. We will begin by using survey data to determine which patients vape and how frequently they vape.

Anticipated Findings

With limited research on this topic, we are unsure what to anticipate. We expect that any associations found in this study can guide preventive strategies for vape use.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Mihir Patil - Graduate Trainee, University of Illinois at Urbana Champaign

Collaborators:

  • Carlos Salazar - Graduate Trainee, Harvard Medical School

Duplicate of Demo - Family History in EHR & PPI Data

Scientific Questions Being Studied

As a demonstration project, this study will summarize structured data elements available in the All of Us registered tier and compare to published survey results to describe data for reuse in disease specific outcomes. Specific questions include:

1. Could harnessing informatics tools like predictive modeling and clinical decision support to detect and alert healthcare providers to these preventative measures significantly improve the precise care we deliver to patients?
2. How can one evaluate the availability of family medical history information within the All of Us registered tier data and characterize the structured data elements from both data sources?

Project Purpose(s)

  • Methods Development
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Approaches

We utilize the Family Medical History PPI survey to capture self-reported information but exclude participants who did not know any of their family history or who skipped every survey question. We pay particular attention to the disease/relative pairings that map to the American College of Medical Genetics and Genomics’ (ACMG) list of important diseases.

We define EHR family history information as the collection of registered tier observations with "family+history" or "FH:" anywhere in their OMOP concept name. We exclude observations of “Family social history” and remove duplicate observation and value concept pairings from the same healthcare organization regarding the same participant as these were likely due to repeated entries across multiple routine annual physical exams.

We aim to compare the data sources by summarizing the type and amount of family history information gained.
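
The EHR selection rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the record fields (`person_id`, `src_id`, `observation_concept_id`, `concept_name`, `value_concept_id`) and the sample rows are hypothetical, and the search string "family+history" is read here as the phrase "family history".

```python
from typing import Dict, Iterable, List


def is_family_history(concept_name: str) -> bool:
    """Return True when a concept name mentions family history.

    Assumption: "family+history" in the project text means the phrase
    "family history"; "Family social history" is excluded explicitly.
    """
    name = concept_name.lower()
    if "family social history" in name:
        return False
    return "family history" in name or "fh:" in name


def select_family_history(rows: Iterable[Dict]) -> List[Dict]:
    """Keep family-history observations, dropping repeated
    (participant, organization, observation concept, value concept) entries."""
    seen = set()
    kept = []
    for row in rows:
        if not is_family_history(row["concept_name"]):
            continue
        key = (
            row["person_id"],
            row["src_id"],  # hypothetical identifier of the healthcare organization
            row["observation_concept_id"],
            row["value_concept_id"],
        )
        if key in seen:
            continue  # likely a repeated entry from a routine annual exam
        seen.add(key)
        kept.append(row)
    return kept


# Hypothetical toy rows, for illustration only.
rows = [
    {"person_id": 1, "src_id": "A", "observation_concept_id": 10,
     "concept_name": "Family history of diabetes mellitus", "value_concept_id": 0},
    {"person_id": 1, "src_id": "A", "observation_concept_id": 10,
     "concept_name": "Family history of diabetes mellitus", "value_concept_id": 0},
    {"person_id": 1, "src_id": "B", "observation_concept_id": 10,
     "concept_name": "Family history of diabetes mellitus", "value_concept_id": 0},
    {"person_id": 2, "src_id": "A", "observation_concept_id": 11,
     "concept_name": "FH: Breast cancer", "value_concept_id": 0},
    {"person_id": 2, "src_id": "A", "observation_concept_id": 12,
     "concept_name": "Family social history", "value_concept_id": 0},
    {"person_id": 3, "src_id": "A", "observation_concept_id": 13,
     "concept_name": "Body weight", "value_concept_id": 0},
]
kept = select_family_history(rows)
```

In this toy input, the repeated entry from organization "A" is dropped, while the same observation reported by organization "B" is kept because the rule deduplicates only within one organization.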

Anticipated Findings

This description of the family medical history data in the All of Us registered tier database will assist future investigators in understanding All of Us data methods and give feedback to the program on the utility of participant survey and EHR data.

We hypothesize that the survey data will provide a more complete look at family medical history due to its structured nature. However, we are also interested in determining how much overlap there is between the PPI and EHR data. It’s plausible that the free-form nature of EHR family history information yields more detailed records. We would ultimately like to determine if a gold standard method for defining a participant’s family medical history is attainable within the All of Us registered tier data.

We anticipate facing informatics challenges in collecting data from different sources, mapping these data to a common data model, and reconciling the sources to find a common source of truth.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Duplicate of Genetics of substance use and substance use disorders

Scientific Questions Being Studied

We are affiliated with the Psychiatric Genomics Consortium Substance Use Disorders (PGC SUD) working group and our major goal is to better understand the genetic and biological factors underlying substance use disorders. This research may potentially lead to improved treatments and prevention efforts.

Project Purpose(s)

  • Disease Focused Research (Substance use disorders and correlated psychiatric disorders)
  • Ancestry

Scientific Approaches

We plan to use approaches such as genome-wide association analyses, polygenic risk score approaches, Mendelian randomization, GCTA, and other statistical genetics methods for analyzing SNP- and gene-based associations, generating polygenic predictors, estimating causal paths, and estimating heritability. We anticipate using the electronic health records data and other health- and behavior-related data in All of Us. We will also be using the whole genome sequence data and the genotype array data, as well as the ancestry assignments and principal components provided by All of Us.

Anticipated Findings

We know that there are many genetic variants that contribute to substance use disorders (SUDs). Many of these variants exert effects on multiple SUDs (e.g., cannabis use disorder and tobacco use disorder), as well as other psychiatric disorders, while some variants seem to be substance-specific. We are still uncovering the gene networks and biological pathways implicated by these risk variants, and larger samples are still needed to detect common genetic risk variants, especially for under-represented populations in addiction genetic research. The All of Us Research Program provides a valuable resource for us to make progress on these questions and expand our research to include individuals of diverse ancestries. This research could eventually lead to better prevention methods and treatments for SUDs, and it is essential that all populations benefit from these findings.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Sarah Colbert - Project Personnel, Icahn School of Medicine at Mount Sinai
  • Renato Polimanti - Mid-career Tenured Researcher, Yale University
  • Howard Edenberg - Late Career Tenured Researcher, Indiana University
  • Dongbing Lai - Project Personnel, Indiana University
  • Alexander Hatoum - Research Fellow, Washington University in St. Louis
  • Alex Miller - Research Fellow, Washington University in St. Louis

AFib epidemiology (AOU v4)

Scientific Questions Being Studied

The overall goal of this study, as a Demonstration project, is to evaluate the ability of the All of Us Research Program data to replicate epidemiologic patterns of atrial fibrillation (AF), a common arrhythmia, previously described in other settings. We will address this goal with these two aims:
• Specific Aim 1. To determine the association of race and ethnicity with the prevalence and incidence of atrial fibrillation (AF). We hypothesize that non-whites will have lower prevalence and incidence of AF than whites.
• Specific Aim 2. To estimate associations of established risk factors for AF with the prevalence and incidence of AF. We hypothesize that increased body mass index, higher blood pressure, diabetes, smoking and a prior history of cardiovascular diseases will be associated with increased prevalence and incidence of AF.

Project Purpose(s)

  • Population Health
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Approaches

We will select all All of Us participants who self-reported sex at birth as male or female and whose self-reported race was white, black or Asian, as well as those who self-reported being Hispanic.

Atrial fibrillation (AF) will be identified from self-reports in the medical survey or from electronic health records (EHR).

Clinical factors will be identified from EHR and study measurements (blood pressure, weight, height).

We will evaluate the association of demographic (age, sex, race/ethnicity) and clinical (body mass index, blood pressure, smoking, cardiovascular diseases) factors with prevalence of self-reported AF and prevalence of AF in the EHR, as well as incident AF ascertained from the EHR.
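
As a rough illustration of the prevalence comparison described above, the sketch below computes AF prevalence per group separately for each ascertainment source. The field names (`group`, `af_self_report`, `af_ehr`), the neutral group labels, and the toy records are hypothetical and carry no real results. Incidence and covariate adjustment are not covered.

```python
from collections import defaultdict


def prevalence_by_group(participants, group_key, case_key):
    """Return {group: proportion of participants flagged as cases}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [cases, total]
    for person in participants:
        tally = counts[person[group_key]]
        tally[1] += 1
        if person[case_key]:
            tally[0] += 1
    return {group: cases / total for group, (cases, total) in counts.items()}


# Hypothetical toy records, for illustration only.
participants = [
    {"group": "group_a", "af_self_report": True, "af_ehr": True},
    {"group": "group_a", "af_self_report": False, "af_ehr": False},
    {"group": "group_a", "af_self_report": False, "af_ehr": True},
    {"group": "group_a", "af_self_report": False, "af_ehr": False},
    {"group": "group_b", "af_self_report": True, "af_ehr": False},
    {"group": "group_b", "af_self_report": False, "af_ehr": False},
]

self_report_prevalence = prevalence_by_group(participants, "group", "af_self_report")
ehr_prevalence = prevalence_by_group(participants, "group", "af_ehr")
```

Keeping the two sources separate mirrors the plan to report self-reported AF and EHR-derived AF side by side rather than merging them into one case definition.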

Anticipated Findings

The overall goal of this project is to evaluate the prevalence and incidence of atrial fibrillation (AF), overall and by race/ethnicity, as well as to confirm the association of established risk factors for AF in the All of Us Research participants. We expect to confirm associations between demographic and clinical variables previously reported in the literature, demonstrating the value of the All of Us Research Program data to address questions regarding this common cardiovascular disease.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Data Set Used

Registered Tier

Research Team

Owner:

Collaborators:

  • Vignesh Subbian - Early Career Tenure-track Researcher, University of Arizona
  • Francis Ratsimbazafy - Other, All of Us Program Operational Use
  • Aymone Kouame - Other, All of Us Program Operational Use
  • Aniqa Alam
  • Konstantinos Sidiropoulos - Other, Nova Southeastern University

Introductory to All of Us

Scientific Questions Being Studied

This workspace is to help me understand how to use the All of Us Research program. I am specifically interested in determining the relationships between chronic kidney diseases, diabetes, cardiovascular diseases, and other types of medical problems.

Project Purpose(s)

  • Disease Focused Research (chronic kidney disease, diabetes)

Scientific Approaches

I am looking for datasets with information on the diseases mentioned above. I plan to use machine learning to develop models that detect these diseases and find the relationships between them.

Anticipated Findings

I anticipate that patients with diabetes may also have other types of diseases. If this is correct, then our work will add to the medical literature a new approach for detecting diabetes or chronic kidney disease.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Quan Nguyen - Graduate Trainee, Institute for Systems Biology

Duplicate of Identity-by-descent in the United States

Scientific Questions Being Studied

We are leveraging the full genomic and population diversity of the All of Us project to understand the genetic ancestral basis of diversity in the causes, etiology and treatment of health outcomes. All of Us provides the racial and ethnic background of participants, but these are inaccurate proxies for genetic ancestry, which will help us understand the contribution of genetic ancestral differences among individuals to the biological basis of health outcomes. Therefore, we will measure genetic diversity and identify the genetic ancestry of All of Us participants throughout the United States. This information will help us better understand biological variation that contributes to differences in health outcomes.

Project Purpose(s)

  • Population Health
  • Ancestry

Scientific Approaches

We are first quantifying fine-scale population substructure using genomic approaches that measure: a) global genetic diversity, or the total proportion of different global ancestries represented in an individual's genome; b) local genetic ancestry, or where in the genome this ancestry is located in an individual; c) genomic segments shared identity-by-descent (IBD), which are segments of DNA that individuals share because they inherited them from a common ancestor. We are using Hail, PLINK, ADMIXTURE, RFMix, MOSAIC, TBWPT, in-house Python and R scripts, and other genomic software to capture this variation.

Anticipated Findings

We anticipate that we will identify founder populations that are distributed differently across the United States, and distinguish population subgroups that are finer grained than either racial categories or continental ancestry categories. For example, the Latino ethnicity comprises individuals who are Dominican, Puerto Rican, Mexican, Cuban, etc. We anticipate being able to distinguish these groups, as well as the admixture among these groups, to more accurately understand the contribution of ancestry to health outcomes. Quantifying this ancestry is the first step to understanding the biological diversity within the United States.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • chenxin zhang - Project Personnel, Albert Einstein College of Medicine

Covid-19 vaccine uptake among cancer survivors V

Scientific Questions Being Studied

The purpose of the study is to evaluate the modifiable, multilevel factors associated with COVID-19 vaccine uptake among cancer survivors from the All of Us dataset.

Project Purpose(s)

  • Disease Focused Research (cancer)
  • Population Health
  • Social / Behavioral

Scientific Approaches

A cohort of cancer survivors will be identified using the database. Various survey questions will aid in answering our research aims. In addition, the COVID-19 survey questionnaires will be used to determine our outcome of interest.

Anticipated Findings

Multilevel factors are anticipated to be associated with vaccine uptake and hesitancy. These results can help to identify specific characteristics of cancer survivors that make them more or less likely to experience vaccine hesitancy and inform efforts to target, adapt and tailor interventions to their needs.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

  • Angel Arizpe - Graduate Trainee, University of Southern California
  • Albert Farias - Early Career Tenure-track Researcher, University of Southern California

Duplicate of Data Wrangling in All of Us Program (v7)

Scientific Questions Being Studied

For educational purposes: to show best practices when using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, optimization), and to address other research support issues that are useful to multiple AoU researchers.

Project Purpose(s)

  • Educational
  • Other Purpose (For use with office hours. Notebooks for adding code snippets useful to researchers. This is a placeholder for creating notebooks on best practices, among other things.)

Scientific Approaches

For educational purposes: to show best practices when using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, optimization), and to address other research support issues that are useful to multiple AoU researchers.

Anticipated Findings

For educational purposes: to show best practices when using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, optimization), and to address other research support issues that are useful to multiple AoU researchers.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Registered Tier

Research Team

Owner:

  • Vida Pourmand - Graduate Trainee, University of California, Irvine

Glaucoma polygenic risk

Scientific Questions Being Studied

Glaucoma, a progressive optic neuropathy, is the leading cause of irreversible vision loss worldwide. It is associated with a poor quality of life, decreased mobility, increased falls, and increased economic burden. Primary open angle glaucoma (POAG) is a complex disease with a heterogeneous presentation and disease course. POAG is also one of the most heritable of all complex human diseases, with over 127 risk loci identified in multiethnic populations. Genome-wide polygenic risk scores (PRSs) have been used to effectively identify individuals at high risk for POAG. To date, little is known about clinical features of individuals with high and low genetic burden for POAG. We hypothesize that POAG genetic risk may modulate and be modulated by clinical features of individuals and their environment; identifying these characteristics would improve our understanding of how genetic variants may impact POAG pathogenesis.

Project Purpose(s)

  • Disease Focused Research (glaucoma)
  • Ancestry

Scientific Approaches

We will construct a POAG polygenic risk score (PRS) using genome-wide association study summary statistics from the prior large cross-ancestry meta-analysis. We will identify glaucoma cases based on glaucoma codes. Stratifying by PRS deciles, models will be constructed to assess associations with a range of demographic (age, gender, ethnicity), metabolic and systemic (body mass index, forced vital capacity, peak expiratory flow, heart rate, blood pressure, diabetes, autoimmune disease), environmental and nutritional (caffeine, alcohol, nicotine and cannabis use), and ocular (lasers and surgery) features, as well as medication use. False discovery rate thresholds will be used to adjust P-values for multiple comparisons.
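
The two mechanical steps named above, decile stratification and false discovery rate adjustment, can be sketched as follows. The project text does not name a specific FDR procedure; the Benjamini-Hochberg step-up adjustment is assumed here because it is a common choice. The function names and example inputs are hypothetical.

```python
def decile_labels(scores):
    """Assign each score a decile from 1 (lowest) to 10 (highest) by rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = rank * 10 // n + 1
    return labels


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used for the FDR adjustment; the source does not say.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over all ranks at or above its own.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical inputs, for illustration only.
prs_scores = [float(i) for i in range(20)]
deciles = decile_labels(prs_scores)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A feature would then be reported as associated with PRS decile when its adjusted p-value falls below the chosen FDR threshold.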

Anticipated Findings

Differences found through this analysis may provide insights into disease specific pathogenesis and generate new hypotheses for further research.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Yan Zhao - Other, Mass General Brigham

Stimulant & Ca+ channel blocker Impact on CVD

Scientific Questions Being Studied

I am exploring data to formalize a specific research question pertaining to the concurrent use of stimulants and anti-hypertensive medications and its impact on delayed-onset diastolic heart failure.

Project Purpose(s)

  • Disease Focused Research (diastolic heart failure)
  • Drug Development
  • Methods Development

Scientific Approaches

I will use the All of Us data to explore and solidify my research question. My dataset will consist of adults aged 18 and older who were prescribed amphetamine with and without an anti-hypertensive medication. I will use retrospective analysis to determine whether there is a significant association.

Anticipated Findings

I anticipate finding that the concurrent use of stimulant and anti-hypertensive medication will result in a later onset and/or reduced occurrence of diastolic heart failure later in life compared to those prescribed stimulants alone. These findings could inform revisions to amphetamine and anti-hypertensive prescribing for adults and may reduce healthcare burden later in patients' lives.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Disability Status

Data Set Used

Controlled Tier

Research Team

Owner:

  • Danich Qadir - Graduate Trainee, Philadelphia College of Osteopathic Medicine

RTI Type I Diabetes PRS Validation Analysis

Scientific Questions Being Studied

Based on the research in the Sharp et al. 2019 paper and the Qu et al. paper, polygenic risk scores (PRSs) were developed for European and African biological and ancestral groups. We would like to validate the PRSs against an independent cohort of individuals with Type I Diabetes to recalculate the PRS threshold using the All of Us database. Additionally, we want to further validate the PRSs in other biological ancestral groups, such as East Asian and Hispanic.

Project Purpose(s)

  • Disease Focused Research (type 1 diabetes mellitus)
  • Methods Development
  • Control Set
  • Ancestry

Scientific Approaches

We will create cohorts representing individuals with Type I Diabetes from various backgrounds and races based on PTA ancestral predictions and run a customized PRS calculation script on the candidate PRS markers described in Qu et al., 2022.

Anticipated Findings

The study could either confirm that all 72 PRS candidate markers are sufficient across multiple ancestral populations, or show that confounding affects the overall effect size of the PRS markers for specific ancestral populations, thus requiring new thresholds to be established for ancestral populations other than European.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

  • Kristin Glaze - Project Personnel, All of Us Researcher Academy/RTI International
  • Javan Carter - Research Associate, All of Us Researcher Academy/RTI International

Environmental Barriers Impacting Genetic Influences on Educational Attainment

Scientific Questions Being Studied

Researchers have identified a polygenic index, representing genetic propensity, that accounts for significant variance in educational attainment (EA). The educational attainment polygenic index (EA PGI) accounts for about 16% of the variance in educational attainment. When the EA PGI was measured in samples of African ancestry, it accounted for disproportionately lower variance. Investigation into this identifies environmental barriers manifesting as repressive gene-environment correlations; environments complementary to one’s genome are less available to those of African ancestry. The breadth of available environments complementary to one’s genotype varies across genetic ancestry because of systemic barriers. Therefore, the relationship between the EA PGI and EA may be moderated by an environment's “fit” with one’s genome, and the effect sizes of these moderators may vary across genetic ancestries. Quantitatively validating these barriers informs decisions aimed at supporting equity.

Project Purpose(s)

  • Population Health
  • Social / Behavioral
  • Ancestry

Scientific Approaches

The data will be used to generate the polygenic index for educational attainment utilizing the 3,952 uncorrelated single-nucleotide polymorphisms (SNPs) reaching the significance threshold of p < 1 x 10^-8 displayed in Okbay et al. (2022); these SNPs will be weighted and summed using standard polygenic index generation methodology. This polygenic index will be used to predict the amount of variance accounted for in the number of years of education completed by the participants in the sample. Measures of the environment included in the phenotype database will be used to test for moderators that impact the relationship between the EA PGI and measured educational attainment across ancestries.
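
The "weighted and summed" step and the variance-explained calculation can be sketched as below. This is a minimal illustration, assuming each participant has an allele dosage (0, 1, or 2) per SNP and each SNP has a published effect-size weight; the three weights, four genotypes, and years-of-education values are made-up toy numbers, not values from Okbay et al. (2022). The moderation analysis is not covered.

```python
def polygenic_index(dosages, weights):
    """Weighted sum of allele dosages across SNPs for one participant."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def r_squared(x, y):
    """Share of variance in y explained by a simple linear regression on x
    (the squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical toy data, for illustration only.
weights = [0.02, -0.01, 0.03]  # per-SNP effect sizes
genotypes = [                  # one row of allele dosages per participant
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [1, 2, 2],
]
years_of_education = [14, 12, 18, 16]

pgi = [polygenic_index(row, weights) for row in genotypes]
variance_explained = r_squared(pgi, years_of_education)
```

Computing `variance_explained` separately within each ancestry group would give the per-group comparison that the project describes.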

Anticipated Findings

This proposed project has two specific aims. Aim 1 is to demonstrate disproportionately less variance in EA accounted for by the EA PGI in an independent sample of participants of African ancestry when compared to samples of European ancestry. Additionally, this will replicate findings shown in Okbay et al. (2022) and Rabinowitz et al. (2019) with a notably larger sample. Aim 2 will seek to identify measured environmental influences, such as economic factors related to zip code or perceived stress from the environment, that moderate the relationship between the EA PGI and educational attainment across ancestries. Quantitatively validating the repressive effect of environmental disadvantage while controlling for individual differences in underlying genetic influences will provide valuable insight on the impact of these barriers, while additionally informing hypotheses and decisions aimed at minimizing societal barriers to equal opportunity and achievement.

Demographic Categories of Interest

  • Race / Ethnicity
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

  • Alex Olejko - Graduate Trainee, Case Western Reserve University

Problematic Alcohol Use and Related Psychiatric and Cardiometabolic Conditions

Scientific Questions Being Studied

The aim of this proposal is to identify new genetic and non-genetic factors that are leveraged through a machine-learning-based approach to predict individual 1) risk for problematic alcohol use, 2) outcome from problematic alcohol use (e.g., recurrence of alcohol-related hospitalization or death), and 3) genetic and non-genetic alcohol risk factors that are shared with other psychiatric diseases and obesity-related cardiometabolic phenotypes. Prediction will utilize a combination of genetic, clinical, and lifestyle risk factors. Ultimately, we aim not only to identify individual predictors but also to build a novel risk prediction model that improves on currently developed polygenic risk scores, which show clinical promise but have methodological shortcomings that limit their accuracy.

Project Purpose(s)

  • Disease Focused Research (alcohol misuse and associated diseases)
  • Social / Behavioral
  • Ancestry

Scientific Approaches

We are integrating data from the UK Biobank and the NIH All of Us Research Program. We will conduct genome-wide association studies (GWAS) with REGENIE or Tractor to identify new variants associated with our alcohol use phenotypes. Potential clinical predictors will be identified from literature and machine learning (e.g., gradient boosting) analyses of separate lifestyle domains. For individual risk prediction, we will use neural networks to reduce the dimensionality of genetic data into interpretable independent latent factors, demonstrate that the latent factors recapitulate the alcohol-related GWAS associations that we identified, and finally use the latent factors in a multi-layer perceptron model to predict broad alcohol use phenotypes. We will also use neural networks to predict different sub-components of problematic alcohol use and negative alcohol use outcomes. Similar stepwise approaches will be performed for Mondrian cross-conformal prediction and gradient boosting classification.
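
For readers unfamiliar with Mondrian conformal prediction, the sketch below shows the class-conditional idea on a single calibration split. It is a simplified stand-in, not the cross-conformal variant named above (which repeats the calibration across folds and combines the resulting p-values), and not the project's code. The nonconformity scores, labels, and significance level are hypothetical.

```python
def mondrian_p_value(calibration, label, score):
    """Conformal p-value computed only against calibration examples of the
    same class (the Mondrian, or label-conditional, taxonomy).

    calibration: list of (label, nonconformity_score) pairs.
    score: nonconformity of the test object if it had the given label.
    """
    same_class = [s for lab, s in calibration if lab == label]
    at_least_as_strange = sum(1 for s in same_class if s >= score)
    return (at_least_as_strange + 1) / (len(same_class) + 1)


def prediction_set(calibration, candidate_scores, epsilon):
    """Labels whose p-value exceeds the significance level epsilon.

    candidate_scores: {label: nonconformity score under that label}.
    """
    return {
        label
        for label, score in candidate_scores.items()
        if mondrian_p_value(calibration, label, score) > epsilon
    }


# Hypothetical calibration scores, for illustration only.
calibration = [
    ("case", 0.1), ("case", 0.4), ("case", 0.7),
    ("control", 0.2), ("control", 0.3), ("control", 0.5), ("control", 0.9),
]
candidate_scores = {"case": 0.3, "control": 0.95}

p_case = mondrian_p_value(calibration, "case", candidate_scores["case"])
p_control = mondrian_p_value(calibration, "control", candidate_scores["control"])
predicted = prediction_set(calibration, candidate_scores, epsilon=0.25)
```

Calibrating within each class keeps the error rate controlled separately for cases and controls, which matters when one class is much rarer than the other.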

Anticipated Findings

There is growing interest in using genetic risk scores in clinical practice, especially for decision making surrounding early intervention in high-risk individuals as well as triaging alcohol- and cardiometabolic-risk after an alcohol-related medical event for increased follow-up. Current genetic risk models can stratify individuals into large buckets of risk, but many of these methods discard useful information in favor of simple models or fail to generalize to real-world settings. A further limitation broadly plaguing the field of genomics research is the drastic overrepresentation of individuals of white European ancestry. By leveraging the diversity of the All of Us Research Program, we seek to build reproducible and equitable risk prediction models that may benefit a diverse patient population.

Demographic Categories of Interest

  • Race / Ethnicity
  • Gender Identity
  • Geography
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

  • Gordon Ye - Undergraduate Student, University of California, San Diego
  • Eric Zorrilla - Early Career Tenure-track Researcher, Scripps Research

Collaborators:

  • Sandra Sanchez-Roige - Early Career Tenure-track Researcher, University of California, San Diego
  • Emily Zhu - Other, Scripps Research
  • John Huber - Other, Scripps Research
  • Jennifer Zhang - Project Personnel, All of Us Program Operational Use
  • Eli Browne - Other, Scripps Research
  • Bhadra Rupesh - Other, Scripps Research
  • Paulina Ai - Other, Scripps Research

Problematic Alcohol Use and Related Psychiatric and Cardiometabolic Conditions 7

Scientific Questions Being Studied

The aim of this proposal is to identify new genetic and non-genetic factors and leverage them through a machine-learning-based approach to predict an individual's 1) risk for problematic alcohol use and 2) outcome from problematic alcohol use (e.g., recurrence of alcohol-related hospitalization or death), and to identify 3) genetic and non-genetic alcohol risk factors that are shared with other psychiatric diseases and obesity-related cardiometabolic phenotypes. Prediction will utilize a combination of genetic, clinical, and lifestyle risk factors. Ultimately, we aim not only to identify individual predictors but also to build a novel risk prediction model that improves on current polygenic risk scores, which show clinical promise but have methodological shortcomings that limit their accuracy.

Project Purpose(s)

  • Disease Focused Research (alcohol misuse and associated diseases)
  • Social / Behavioral
  • Ancestry

Scientific Approaches

We are integrating data from the UK Biobank and the NIH All of Us Research Program. We will conduct genome-wide association studies (GWAS) with REGENIE or Tractor to identify new variants associated with our alcohol use phenotypes. Potential clinical predictors will be identified from the literature and from machine learning (e.g., gradient boosting) analyses of separate lifestyle domains. For individual risk prediction, we will use neural networks to reduce the dimensionality of the genetic data into interpretable, independent latent factors, demonstrate that the latent factors recapitulate the alcohol-related GWAS associations that we identified, and finally use the latent factors in a multi-layer perceptron model to predict broad alcohol use phenotypes. We will also use neural networks to predict different sub-components of problematic alcohol use and negative alcohol use outcomes. Similar stepwise approaches will be applied to Mondrian cross-conformal prediction and gradient boosting classification.

Anticipated Findings

There is growing interest in using genetic risk scores in clinical practice, especially for decision making surrounding early intervention in high-risk individuals as well as triaging alcohol- and cardiometabolic-risk after an alcohol-related medical event for increased follow-up. Current genetic risk models can stratify individuals into large buckets of risk, but many of these methods discard useful information in favor of simple models or fail to generalize to real-world settings. A further limitation broadly plaguing the field of genomics research is the drastic overrepresentation of individuals of white European ancestry. By leveraging the diversity of the All of Us Research Program, we seek to build reproducible and equitable risk prediction models that may benefit a diverse patient population.

Demographic Categories of Interest

  • Race / Ethnicity
  • Gender Identity
  • Geography
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

  • Gordon Ye - Undergraduate Student, University of California, San Diego
  • Eric Zorrilla - Early Career Tenure-track Researcher, Scripps Research

Collaborators:

  • Sandra Sanchez-Roige - Early Career Tenure-track Researcher, University of California, San Diego
  • Emily Zhu - Other, Scripps Research
  • John Huber - Other, Scripps Research
  • Jennifer Zhang - Project Personnel, All of Us Program Operational Use
  • Eli Browne - Other, Scripps Research
  • Bhadra Rupesh - Other, Scripps Research
  • Paulina Ai - Other, Scripps Research

Duplicate of Duplicate of Vaping study

Scientific Questions Being Studied

Vaping and skin cancer. Unsure of the exact parameters, but we want to look at everything, including melanoma, basal cell carcinoma, and squamous cell carcinoma.

Project Purpose(s)

  • Disease Focused Research (Skin cancers)

Scientific Approaches

Odds ratios. We will probably look mainly at association analyses, and will also include regression and other statistical tests that may link vaping to increased odds of skin cancer.
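
For illustration, an odds ratio and its confidence interval can be computed from a 2x2 table of exposure by outcome. The sketch below uses the standard Woolf (log-based) interval; the function name and all counts are hypothetical and are not drawn from the All of Us data.

```python
import math


def odds_ratio_with_ci(exposed_cases, exposed_controls,
                       unexposed_cases, unexposed_controls, z=1.96):
    """Odds ratio and Woolf (log-based) confidence interval for a 2x2 table.

    z=1.96 gives an approximate 95% interval.
    """
    a, b, c, d = exposed_cases, exposed_controls, unexposed_cases, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this formula")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Made-up counts: 20 of 100 people who vape and 10 of 100 who do not have the outcome.
or_, lo, hi = odds_ratio_with_ci(20, 80, 10, 90)
print(f"OR = {or_:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

An interval that includes 1 would not support an association at that confidence level, however large the point estimate.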

Anticipated Findings

Given the equivocal data on vaping and skin cancers, I am unsure what results to expect. It is possible that there is an increased risk, but vaping may not have been around long enough to determine this.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Kelvin Zhou - Graduate Trainee, Baylor College of Medicine
  • Harrison Zhu - Graduate Trainee, Baylor College of Medicine

Pediatric Hematuria

Scientific Questions Being Studied

What causes microscopic hematuria? In pediatric patients, one of the most common known causes is Alport syndrome, caused by a mutation in a gene required for extracellular matrix (ECM) integrity. A more recent discovery showed that a different gene that is also required for ECM integrity causes hematuria if mutated. This study will try to identify other genes that are required for ECM function and determine whether any are associated with hematuria. Discovering the specific cause of hematuria can help determine the best treatment for patients, rather than treating all cases of hematuria the same way without knowing the root cause.

Project Purpose(s)

  • Disease Focused Research (Microscopic Hematuria)

Scientific Approaches

To determine genes associated with microscopic hematuria, this project will focus on patients with persistent microscopic hematuria without Alport syndrome. All ECM genes will be analyzed and mutations identified. The rate of mutations in these genes will be compared to that of a control population with similar history and characteristics to determine whether any of the genes have a higher percentage of mutations than in the control population.
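
One way the case-versus-control comparison of mutation rates could be tested for a single gene is Fisher's exact test on carrier counts. The project description does not name a test, so this choice, the function name, and the example counts are assumptions made purely for illustration.

```python
from math import comb


def fisher_exact_greater(case_carriers, case_noncarriers,
                         control_carriers, control_noncarriers):
    """One-sided Fisher's exact test for enrichment of carriers among cases.

    Returns the probability, under no association, of seeing at least
    `case_carriers` carriers among the cases given the table margins.
    """
    a, b, c, d = case_carriers, case_noncarriers, control_carriers, control_noncarriers
    n_cases = a + b
    n_carriers = a + c
    total = a + b + c + d
    denominator = comb(total, n_cases)
    max_carriers_in_cases = min(n_cases, n_carriers)
    # Sum hypergeometric probabilities for tables at least as extreme.
    numerator = sum(
        comb(n_carriers, k) * comb(total - n_carriers, n_cases - k)
        for k in range(a, max_carriers_in_cases + 1)
    )
    return numerator / denominator


# Hypothetical counts for one gene: 3 of 4 cases and 1 of 4 controls carry a variant.
p_value = fisher_exact_greater(3, 1, 1, 3)
print(f"one-sided p = {p_value:.4f}")
```

Because many ECM genes would be tested, the resulting p-values would typically need a multiple-testing correction such as Bonferroni.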

Anticipated Findings

I anticipate that other ECM genes such as agrins or proteins needed for glycosylation will be identified. This will open new avenues for researchers to study the roles of these proteins and how mutations may contribute to disease.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Machine Learning Model for Heart Attack Risk

Scientific Questions Being Studied

Our research project aims to design a machine learning model to estimate the risk of myocardial infarction (heart attack) between regular doctor visits. This model will incorporate a wide range of data, such as individual health history, demographic information, and actigraphy data (which captures rest/activity cycles). By utilizing these diverse data sources, we aim to identify patterns, correlations, and risk factors that might otherwise remain hidden in traditional analytical methodologies. Health history can provide insights into predisposing conditions, while demographic data can shed light on socio-economic and lifestyle factors. Simultaneously, actigraphy data will allow us to understand the influence of physical activity and sleep patterns on cardiovascular health. The ultimate goal of this project is to enhance the predictive accuracy of heart attack risk, providing healthcare professionals with a powerful tool for early intervention.

Project Purpose(s)

  • Disease Focused Research (myocardial infarction)
  • Social / Behavioral

Scientific Approaches

We believe that medical history can reveal predispositions due to past health conditions; demographic data can highlight socio-economic, age, or ethnicity-related risk factors, while actigraphy data can offer insights into the influences of lifestyle, physical activity, and sleep patterns on heart health. We will examine various machine learning algorithms utilizing the sklearn platform in Python. This will include traditional models like decision trees, random forests, support vector machines, and more contemporary incremental learning methods like passive-aggressive algorithms. Simultaneously, we aim to leverage the TensorFlow library to explore deep learning architectures. Deep learning models, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and long short-term memory networks (LSTMs), offer the potential to detect complex, non-linear relationships within our data, often unseen by traditional methods.
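
To make the incremental-learning idea concrete, here is a from-scratch Python sketch of a binary passive-aggressive (PA-I) classifier, which updates its weights one sample at a time. It is not the sklearn implementation and not the project's code: the function names, the `aggressiveness` parameter (the cap usually written as C), and the toy feature vectors are all hypothetical.

```python
def pa_fit(samples, labels, aggressiveness=1.0, epochs=5):
    """Train a binary passive-aggressive (PA-I) linear classifier.

    labels must be +1 or -1. No separate bias term is learned; append a
    constant 1.0 feature to each sample if a bias is needed.
    """
    weights = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * sum(w * xi for w, xi in zip(weights, x))
            loss = max(0.0, 1.0 - margin)  # hinge loss
            if loss == 0.0:
                continue  # passive: the margin is already satisfied
            sq_norm = sum(xi * xi for xi in x)
            if sq_norm == 0.0:
                continue  # an all-zero sample cannot move the weights
            # Aggressive: smallest step that fixes the margin, capped at C.
            tau = min(aggressiveness, loss / sq_norm)
            weights = [w + tau * y * xi for w, xi in zip(weights, x)]
    return weights


def pa_predict(weights, x):
    """Return +1 or -1 from the sign of the linear score."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= 0 else -1


# Toy data: two made-up features plus a constant 1.0 acting as the bias input.
samples = [[2.0, 1.0, 1.0], [1.5, 2.0, 1.0], [-1.0, -1.5, 1.0], [-2.0, -0.5, 1.0]]
labels = [1, 1, -1, -1]

weights = pa_fit(samples, labels)
print([pa_predict(weights, x) for x in samples])
```

Because each update touches only the current sample, a model of this kind can keep learning as new records arrive without retraining on the full dataset.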

Anticipated Findings

We hope to find valuable insights into the multifaceted influences of health history, demographic factors, and actigraphy data on myocardial infarction risk. By developing a machine learning model capable of accurately predicting heart attack risk between doctor visits, we aim to contribute a crucial tool to the arsenal of predictive medicine. This could lead to a significant shift in the management of cardiovascular health, moving towards personalized risk assessment, early interventions, and potentially improved patient outcomes. Our exploration of various machine learning and deep learning models also stands to enrich the field's understanding of these methodologies' applications in healthcare. The comparative analysis of traditional, deep learning and incremental learning methods could provide evidence-based guidance for researchers and practitioners on the strengths and limitations of these techniques in medical contexts.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

MSUD

Scientific Questions Being Studied

How do liver transplants improve the metabolic stability of MSUD patients compared to low-protein dieting?

Project Purpose(s)

  • Educational

Scientific Approaches

Levels of BCAAs in the blood (isoleucine, leucine, and valine)
Age (or age group); make sure to specify if their age
Weight (or BMI if weight not possible)
Blood pressure
Ketone levels

Anticipated Findings

N/A

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Jesper Wolski - Undergraduate Student, Arizona State University

Duplicate of Skills Assessment Training Notebooks For Users (v7)

Scientific Questions Being Studied

This workspace contains multiple notebooks that assess users' understanding of the workbench and OMOP. These notebooks are meant to help users check their knowledge not only on Python, R, and SQL, but also on the general data structure and data model used by the All of Us program.

Project Purpose(s)

  • Educational

Scientific Approaches

No scientific approach is used in this workspace because it is meant for educational purposes only. We will cover all aspects of OMOP, and hence will use most datasets available in the workbench.

Anticipated Findings

We do not anticipate any findings. Instead, we are educating people on the use of the workbench and the OMOP common data model used by the program.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

EHR_Indels_Cancer

Scientific Questions Being Studied

The All of Us Research Program is an unprecedented initiative aimed at collecting data from a minimum of 1 million individuals, with the primary objective of expediting health research and facilitating groundbreaking medical discoveries. Utilizing the data derived from this program, we aim to gain deeper insights into the influences of individual variations in lifestyle, environment, and biological composition on health and cancer. The All of Us Research Program has released the medical record data of ~450,000 participants (~250,000 of whom have accompanying blood-derived Whole Genome Sequencing variant calls). Here we plan to apply existing genomic algorithms to this rich resource to discover new patient-centric insights that will elucidate properties of misdiagnosis and disparities in cancer diagnosis.

Project Purpose(s)

  • Disease Focused Research (cancer)
  • Population Health
  • Methods Development
  • Control Set
  • Ancestry
  • Ethical, Legal, and Social Implications (ELSI)

Scientific Approaches

An early cancer diagnosis has repeatedly been shown to be one of the best predictors of better patient outcomes. While finding tumors early has long been a priority in cancer research, little research has been done to predict the age of a tumor or how long a patient has lived with cancer without a diagnosis. Our objective is to build a computational tool that predicts misdiagnosed incidences of cancer. To accomplish this goal, we will apply genomic alignment algorithms to electronic health records (EHR) to predict cancer misdiagnosis in the All of Us Research Program. We hypothesize that insertion-like and deletion-like events are tractable within electronic health records. Furthermore, we surmise that these events will vary drastically across genetic ancestry and socioeconomic demographics. Below we outline three aims that will ultimately enhance cancer diagnosis accuracy. We plan to write and develop code in Python and R to accomplish this task.

Anticipated Findings

Overall, I anticipate that this work will have the following outcomes.
1) It will provide a reliable comparison tool to assess the similarity between patients.
2) It will identify optimal time windows to compare health records.
3) It will use a clear box algorithm that will allow clinicians to identify features that are common to patients with cancer and not being treated for disease so that interventions can be made for varying demographics.
4) This work will eventually result in a predictive algorithm that seeks to catch tumors in early stages.
5) Finally we will build succinct workflows and statistical models to share with the rest of the All of Us community.

Demographic Categories of Interest

  • Race / Ethnicity
  • Geography

Data Set Used

Controlled Tier

Research Team

Owner:

  • Matthew Bailey - Early Career Tenure-track Researcher, Brigham Young University

Black Mental Health Disparities

Scientific Questions Being Studied

I am exploring the All of Us data because I want to know the levels of depression and anxiety diagnoses within black communities and how they compare to those of other races. I also want to know about some of the social determinants of health that most affect depression and anxiety rates within and outside black communities in America.

Project Purpose(s)

  • Population Health

Scientific Approaches

I plan on building a few separate cohorts and comparing regression models and visualizations with others. I will build cohorts of African Americans with moderate to severe depression or anxiety and compare them with the general population of the United States with the same disorders. I will use the health data from these cohorts to visualize summary statistics and build regression models to quantify the rates of depression and anxiety in the United States.

Anticipated Findings

I anticipate finding a higher rate of depression and anxiety within the black community than in most other races in America. My findings would shed more light on the mental health issues that black people face in America and could prompt more action to mitigate some of the negative social determinants of health.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Gender Identity
  • Access to Care
  • Education Level

Data Set Used

Registered Tier

Research Team

Owner:

  • Eddie Ekpoh - Undergraduate Student, University of Arizona

Collaborators:

  • Haiquan Li - Early Career Tenure-track Researcher, University of Arizona

Meta-Analyses of Retrospective Cancer Cohorts

Scientific Questions Being Studied

Cancer is a leading cause of death in the country and racial minority groups endure disproportionately higher death burden, likely because of lack of access to healthcare and higher cancer stage at diagnosis among racial minorities. To improve health outcomes for all cancer patients, it is important to understand and eliminate racial disparities in screening, diagnosis, and treatment, a crucial component of NIH’s new initiative of ending structural racism. Despite overall improvements, however, ethnic and racial inequities continue to increase, suggesting deficiencies in research designs. For example, compared to the US Census, most observational cancer studies are found to overrepresent Whites and underrepresent African Americans and Asians. It remains challenging to utilize these studies for detecting and understanding disparities.

Project Purpose(s)

  • Disease Focused Research (cancer)
  • Population Health
  • Social / Behavioral
  • Methods Development
  • Ancestry

Scientific Approaches

Leveraging the rich resource of the All of Us program, we propose to develop:

Aim 1. Meta-analytical frameworks for unconfounded comparisons of group-specific disease outcomes. We will develop novel concordant-weighted estimators of various group-specific potential outcomes. We will investigate the estimators’ asymptotic properties and design innovative frequentist procedures for uncertainty quantification.

Aim 2. Flexible Bayesian nonparametric frameworks for high dimensional biomarker and censored group-specific cancer outcomes. We will foster biologically interpretable Bayesian nonparametric models to efficiently process information from the clinical, demographic, and multi-omic domains in the retrospective cohorts.

Aim 3. Software to implement our proposed frameworks for meta-analyzing various observational populations with high dimensional biomarkers.

Anticipated Findings

Statistically, while comparing health outcomes between patient groups determined by a characteristic such as treatment or disease subtype, social determinants, environmental forces, and genetic factors may generate confounding due to covariate imbalance among the groups, resulting in biased comparisons. Weighting and matching are covariate-balancing approaches facilitating unconfounded comparisons in a pseudo-population. Existing weighting approaches cannot i) directly analyze multiple observational studies with multiple groups, ii) provide meaningful answers, because their pseudo-populations often differ considerably from the population of interest, iii) incorporate scientific or domain knowledge, or iv) deliver precise inferences for a wide variety of cancer outcomes. Leveraging the cancer cohorts, we will develop methods with a common goal of effectively comparing group-specific cancer outcomes by integrating high dimensional observational studies with multiple groups.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

  • Subharup Guha - Mid-career Tenured Researcher, University of Florida

Social Determinants and Mental Health (v7)

Scientific Questions Being Studied

We will explore the social determinants of health (e.g. social support, neighborhood cohesion, loneliness, housing security, etc.) and their impact on mental disorders such as depression and anxiety by utilizing the survey and EHR data within the All of Us cohort.

Some questions of interest are:

1) Are the determinants associated with risk or protection for mental health disorders such as depression and anxiety?
2) What do the associations look like across different demographics, including age, sex assigned at birth, race and ethnicity, residence (urban, suburban, rural), sexual orientation, income, and education?

In the midst of a mental health crisis, accentuated by the COVID-19 pandemic, it is important to find risk and protective factors for mental illnesses in diverse populations. We hope this study will help elucidate this much-needed topic.

Project Purpose(s)

  • Population Health
  • Social / Behavioral

Scientific Approaches

We will use the EHR data and self-reported survey data on basic demographics and social determinants of health in the All of Us dataset. We will use epidemiological methods to account for possible biases (selection bias, missing data, etc.) in the dataset. We will use R to conduct logistic regression analyses for depression and anxiety separately, adjusting for the covariates mentioned above. A possible limitation is that the reliance on EHR diagnosis of mental disorders may leave room for misclassification.
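
The covariate-adjusted logistic regression described above can be sketched as follows. The project states that it will use R; this Python version is only an illustrative stand-in that fits the model by plain gradient ascent, and the function name, variable names (`low_support`, `female`), and all counts are hypothetical.

```python
import math


def fit_logistic(rows, outcomes, learning_rate=1.0, iterations=2000):
    """Fit logistic regression by gradient ascent on the mean log-likelihood.

    rows: list of feature lists (without intercept); outcomes: 0/1 labels.
    Returns [intercept, coef_1, ..., coef_k]. The fixed step size assumes
    features scaled to roughly the 0-1 range (e.g. binary indicators).
    """
    n, k = len(rows), len(rows[0])
    beta = [0.0] * (k + 1)
    for _ in range(iterations):
        grad = [0.0] * (k + 1)
        for x, y in zip(rows, outcomes):
            features = [1.0] + list(x)
            linear = sum(b * f for b, f in zip(beta, features))
            p = 1.0 / (1.0 + math.exp(-linear))
            for j, f in enumerate(features):
                grad[j] += (y - p) * f
        beta = [b + learning_rate * g / n for b, g in zip(beta, grad)]
    return beta


# Made-up strata: (low_support, female, n_depressed, n_not_depressed).
strata = [(1, 1, 12, 18), (1, 0, 8, 22), (0, 1, 6, 34), (0, 0, 4, 36)]
rows, outcomes = [], []
for low_support, female, n_yes, n_no in strata:
    rows += [[low_support, female]] * (n_yes + n_no)
    outcomes += [1] * n_yes + [0] * n_no

beta = fit_logistic(rows, outcomes)
# exp(coefficient) is the odds ratio for that variable, adjusted for the other.
print(f"Adjusted odds ratio for low social support: {math.exp(beta[1]):.2f}")
```

Fitting depression and anxiety as separate outcomes, as the project describes, would mean running this once per outcome with the same covariates.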

Anticipated Findings

For this study, we anticipate that depression or anxiety status may be associated with varying levels of social determinants. We expect that this relationship may look different depending on the social demographic group. We believe these findings will be important for developing future targeted interventions.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Younga Lee - Research Fellow, Mass General Brigham

Sleep PheWAS RTDv7

Scientific Questions Being Studied

Our primary goal is to develop a model based on the American Heart Association's (AHA) Essential 8 in order to understand the interaction between activity levels and sleep quality with the development and progression of human disease. These analyses will generate hypotheses guiding clinical and research interventions focused on activity and sleep to reduce morbidity and mortality in patients seeking care.

Project Purpose(s)

  • Population Health
  • Social / Behavioral

Scientific Approaches

We will examine the relationship between derived heart scores from the AHA's Essential 8 and the prevalence and progression of coded human diseases. We will perform variable/model selection to study the degree to which each of the AHA's Essential 8 factors impacts outcomes. We will use the Fitbit data, EHR-curated diagnoses, laboratory values, quality of life survey results, and clinical outcomes (hospitalizations/mortality).

Anticipated Findings

We expect to find that lower levels of activity and sleep are associated with a higher prevalence and more rapid progression of chronic diseases. We may find clustering in activity and disease prevalence/severity, which would motivate studies/interventions to reduce these health disparities. We may also find that seasonal, weekly, or daily patterns in physical activity lead to differences in outcomes.

Demographic Categories of Interest

  • Race / Ethnicity
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Registered Tier

Research Team

Owner:

Collaborators:

  • Neil Zheng - Graduate Trainee, Yale University
  • Hiral Master - Project Personnel, All of Us Program Operational Use
Request a Review of this Research Project

You can request that the All of Us Resource Access Board (RAB) review a research purpose description if you have concerns that this research project may stigmatize All of Us participants or violate the Data User Code of Conduct in some other way. To request a review, you must fill in a form, which you can access by selecting ‘request a review’ below.