Research Projects Directory

8,307 active projects

This information was updated 12/9/2023

The Research Projects Directory includes information about all projects that currently exist in the Researcher Workbench to help provide transparency about how the Workbench is being used. Each project specifies whether Registered Tier or Controlled Tier data are used.

Note: Researcher Workbench users provide information about their research projects independently. Views expressed in the Research Projects Directory belong to the relevant users and do not necessarily represent those of the All of Us Research Program. Information in the Research Projects Directory is also cross-posted on AllofUs.nih.gov in compliance with the 21st Century Cures Act.

CS541 Final Project

Scientific Questions Being Studied

We aim to study whether cholesterol and triglyceride levels can be better predicted with transfer learning so as to benefit minority groups.

Project Purpose(s)

  • Educational

Scientific Approaches

We plan to use deep learning, specifically transfer learning. We aim to learn from genomic datasets and survey responses to predict obesity biomarkers and to improve prediction performance for less well-represented racial and ethnic groups.

Anticipated Findings

We anticipate that transfer learning will enable the model to perform better on minority classes than previously possible, and thus enable more accurate identification of individuals at risk for obesity or high cholesterol based on their genetic and EHR data.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

  • Youssef Benchikhi - Undergraduate Student, Worcester Polytechnic Institute
  • Neil Kale - Undergraduate Student, Worcester Polytechnic Institute
  • Cameron Schloer - Graduate Trainee, Worcester Polytechnic Institute

Duplicate of 3. Cirrhosis_v7

Scientific Questions Being Studied

The aim of the proposed work is to characterize the genetic and non-genetic contributions to liver-related disease development in multiple populations.

Project Purpose(s)

  • Educational

Scientific Approaches

I would like to obtain genetic data (whole-genome sequencing [WGS] and genotyping-array data for GWAS) and non-genetic data (clinical and epidemiological parameters) for individuals with various liver-related diseases (cases) and individuals without liver-related disease (controls).

Anticipated Findings

I will implement association tests to identify genetic variants that directly influence susceptibility to liver-related disease, and genetic epistasis analyses to develop genetic interaction models, using PLINK and R.

Demographic Categories of Interest

  • Race / Ethnicity
  • Sex at Birth
  • Gender Identity
  • Geography

Data Set Used

Controlled Tier

Research Team

Owner:

bioage_pred_20231208

Scientific Questions Being Studied

I intend to evaluate key determinants of healthspan and how they vary across population strata using AI-based methods.

Project Purpose(s)

  • Population Health

Scientific Approaches

I plan to use all available data and AI-based methods to uncover variation with respect to markers of healthspan and wellness.

Anticipated Findings

Findings from this work will highlight the variability in healthspan across diverse populations and identify targets to alter the trajectories of those otherwise predisposed to poor health.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

GLP1-A for alcohol use disorder

Scientific Questions Being Studied

Alcohol use disorder (AUD) affects hundreds of millions of individuals worldwide and is associated with a number of health and social complications. There are currently three medications approved by the Food and Drug Administration (FDA) for treatment of AUD; however, the effectiveness of these medications is modest at best.
Emerging evidence suggests that glucagon-like peptide-1 receptor agonists (GLP-1As) may be effective in reducing alcohol use. GLP-1As are currently approved for the treatment of diabetes mellitus and obesity. We aim to review the data of patients with diabetes mellitus and/or obesity who are taking a GLP-1A versus those who are not, and to compare their alcohol use. We will estimate the prevalence and odds ratio of alcohol use disorder among patients with versus without GLP-1A treatment. Other comparisons will include liver enzyme levels, prevalence of hepatic steatosis, and other comorbid conditions among those taking versus not taking a GLP-1A.

Project Purpose(s)

  • Disease Focused Research (alcohol abuse)

Scientific Approaches

A cross-sectional study comparing patients with diabetes mellitus and/or obesity who are receiving a GLP-1A versus those not receiving one, assessing differences in alcohol consumption and the prevalence of alcohol-related liver disease.
Inclusion criteria: Adult patients aged 18 to 70 years with a diagnosis of type 2 diabetes or obesity (defined as a body mass index [BMI] greater than 30 kg/m2 in non-Asians and greater than 25 kg/m2 in Asians).
Analysis steps: Participants will be stratified into GLP1(+) and GLP1(-) groups.
The presence of alcohol use disorder will be collected from medical records, and the degree of alcohol consumption will be assessed.
Demographic and clinical data, including age, sex, BMI, comorbidities, and laboratory data (such as liver enzymes and kidney function tests), will be collected from medical records. Alcohol use disorder will be classified according to the CDC classification system. Differences between groups will be analyzed using the chi-square test or Fisher’s exact test.
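As an illustration only, the inclusion rule and the two-group comparison could be sketched in Python as follows. The function names, arguments, and 2x2 table layout are hypothetical and do not reflect the All of Us data model or the project's own code; the BMI thresholds (kg/m2) come from the inclusion criteria above. Fisher's exact test is written out by hand to show the calculation; in practice `scipy.stats.fisher_exact` (or `scipy.stats.chi2_contingency` for the chi-square test) would normally be used.

```python
from math import comb


def meets_inclusion(age, has_t2d, bmi, is_asian):
    """Return True if a participant meets the stated inclusion criteria.

    Hypothetical helper: adults aged 18 to 70 with type 2 diabetes, or with
    obesity defined as BMI > 30 kg/m2 (non-Asian) or BMI > 25 kg/m2 (Asian).
    """
    if not 18 <= age <= 70:
        return False
    obesity_cutoff = 25 if is_asian else 30
    return has_t2d or bmi > obesity_cutoff


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows = GLP1(+), GLP1(-); columns = AUD present, AUD absent.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def table_prob(x):
        # Hypergeometric probability that the top-left cell equals x
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    observed = table_prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties.
    return sum(
        p
        for p in (table_prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-7)
    )
```

For a toy table, `fisher_exact_two_sided(1, 9, 11, 3)` evaluates to about 0.0028; these counts are illustrative, not study data.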

Anticipated Findings

We hypothesize that patients receiving GLP-1A treatment will exhibit a lower prevalence of alcohol use disorder and alcohol-related liver disease compared to those not receiving GLP-1A.
If the data support our hypothesis, this finding would further support emerging evidence suggesting that GLP-1As have a role in reducing alcohol intake and potentially decreasing alcohol use disorder.
It is important to note that these anticipated findings are based on emerging evidence, and the actual results may differ. The study's outcomes will provide valuable scientific insights into the relationship between GLP-1A treatment, alcohol use disorder, and alcohol-associated liver disease, and could guide future research and clinical practice in this field.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Collaborators:

  • Khaled Zahrawi - Project Personnel, Yale University

Genetics of ZIP8 in Crohn's disease

Scientific Questions Being Studied

How frequent are SLC39A8 (ZIP8 protein) mutations in Crohn's disease and complicated Crohn's disease, compared with mutations in NOD2, ATG16L1, IRGM, and IL23R? Does glucosamine consumption reduce the severity and complications of Crohn's disease specifically in the SLC39A8 population, who have a glycosylation defect?

Project Purpose(s)

  • Disease Focused Research (Crohn's disease)

Scientific Approaches

Concept sets for Crohn's disease with and without complications, genetics of five major genes for CD, glucosamine consumption, and need for steroids, hospitalization, and intestinal resection surgery.

Anticipated Findings

We may find that mutations in SLC39A8 are associated with more complicated and more severe Crohn's disease, and that glucosamine consumption reduces severity in these patients but not in patients with NOD2, IRGM, IL23R, or ATG16L1 variants.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Peter Higgins - Late Career Tenured Researcher, University of Michigan

Lichen planus v7

Scientific Questions Being Studied

Lichen planus (LP) is a common, chronic, immune-mediated inflammatory disease that can involve the skin. Hypertrophic lichen planus (HLP) and cutaneous squamous cell carcinoma (cSCC) share many clinical and histopathologic characteristics, making them difficult to distinguish. Previous studies identified susceptibility loci for cSCC, including genes in the pigmentation pathway and in immune regulation. We are investigating the susceptibility loci associated with LP.

Project Purpose(s)

  • Disease Focused Research (Lichen planus)
  • Ancestry

Scientific Approaches

To address these knowledge gaps, we will perform a genome-wide association study (GWAS) of LP among non-Hispanic white individuals in the Mass General Brigham (MGB) Biobank, All of Us, and UK Biobank (UKB) cohorts and then meta-analyze the results.

Anticipated Findings

We are looking for novel loci that affect LP pathogenesis. Our research will clarify the pathophysiological mechanisms behind LP.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Yuhree Kim - Other, University of Colorado, Denver

Depression, coronary artery disease and WBC counts

Scientific Questions Being Studied

Depression and coronary artery disease (CAD) are prevalent, highly comorbid conditions responsible for significant morbidity and mortality, and both have inflammatory etiologies. Investigating the joint effects of these two conditions on systemic inflammation may elucidate mechanisms of their co-occurrence and provide improved recommendations for prevention and treatment.

In this analysis we investigate several specific questions:
1. Do individuals with comorbid depression and CAD have higher median WBC counts than individuals without comorbid depression-CAD or individuals with only one diagnosis?
2. Do WBC count trajectories over time change in response to diagnosis with depression or CAD, and does the order of diagnosis matter?
3. Do WBC trajectories over time change in response to initiation of treatment for CAD (statins) and depression (antidepressants)?
4. Does underlying genetic susceptibility for depression or CAD modify the associations explored above?
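Question 1 implies four mutually exclusive diagnosis groups. A minimal Python sketch of that grouping and the per-group median follows. It assumes each participant contributes a single WBC value (for example, a per-person median of their measurements) in 10^3 cells/uL; the record layout, group labels, and function names are hypothetical and are not the project's code.

```python
from statistics import median


def comorbidity_group(has_depression, has_cad):
    """Assign one of four mutually exclusive diagnosis groups (hypothetical labels)."""
    if has_depression and has_cad:
        return "both"
    if has_depression:
        return "depression_only"
    if has_cad:
        return "cad_only"
    return "neither"


def median_wbc_by_group(records):
    """Return the median WBC count for each diagnosis group.

    records: iterable of (has_depression, has_cad, wbc) tuples, one per
    participant (assumed layout).
    """
    groups = {}
    for has_dep, has_cad, wbc in records:
        groups.setdefault(comorbidity_group(has_dep, has_cad), []).append(wbc)
    # Only groups that actually contain participants appear in the result.
    return {group: median(values) for group, values in groups.items()}
```

Comparing the "both" median with each of the other three groups mirrors the contrasts in question 1; a formal test, such as the regression models described under Scientific Approaches, would still be needed.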

Project Purpose(s)

  • Disease Focused Research (Depression and coronary artery disease)
  • Ancestry

Scientific Approaches

We plan to use the EHR data from All of Us as well as the genomic data to investigate these questions. Analysis will mainly be completed in R, and statistical techniques include multivariable logistic regressions for non-repeated measures data and linear mixed effects models for longitudinal data.

Anticipated Findings

We anticipate that individuals with comorbid depression and CAD will exhibit higher WBC counts, and that initiation of treatment for these conditions will decrease WBC counts.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Gene-phewas

Scientific Questions Being Studied

The pV122I mutation in the TTR gene is the primary genetic variant associated with hereditary transthyretin-mediated (hATTR) amyloidosis, characterized by abnormal amyloid fibril accumulation leading to cardiac and neurologic complications. Although 3-4% of African Americans (AAs) carry pV122I, there is limited research on hATTR development. AAs with this mutation face challenges in accessing testing, resulting in higher mortality and delayed diagnoses. Often, diagnoses occur at advanced stages with complications like heart failure or cardiomyopathy. The insufficient understanding of hATTR-related diseases among AAs, combined with significant underdiagnosis, worsens health disparities, particularly in a population disproportionately affected by systemic racism in healthcare. This study aims to investigate the disease penetrance of the pV122I mutation in heart failure and cardiomyopathy among AAs in the All of Us Program.

Project Purpose(s)

  • Disease Focused Research (cardiovascular disease)
  • Ancestry

Scientific Approaches

To assess the incidence of hATTR-associated heart failure and cardiomyopathy, we will use Cox proportional hazards models. We will investigate the penetrance of V122I and study the incidence of heart failure or cardiomyopathy among patients with clinical evidence of amyloid deposits. Lastly, we will examine the incidence of heart failure or cardiomyopathy following the first occurrence of the V122I mutation.

Anticipated Findings

We anticipate identifying genotypes associated with a spectrum of phenotypic outcomes, extending beyond cardiovascular manifestations, among individuals of African descent who carry the pV122I mutation. This comprehensive exploration aims to elucidate the broader impact of the mutation on diverse physiological domains. By examining associations with different phenotypic outcomes, our study seeks to provide a nuanced understanding of the multifaceted effects of the pV122I mutation on health in this population.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Data Set Used

Controlled Tier

Research Team

Owner:

Duplicate of Data Wrangling in All of Us Program (v7)

Scientific Questions Being Studied

For educational purposes: to show best practices for using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, optimization), along with other research-support topics useful to multiple AoU researchers.

Project Purpose(s)

  • Educational
  • Other Purpose (For use with office hours: notebooks for adding code snippets useful to researchers. This is a placeholder for creating best-practice notebooks, among other things.)

Scientific Approaches

For educational purposes: to show best practices for using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, optimization), along with other research-support topics useful to multiple AoU researchers.

Anticipated Findings

For educational purposes: to show best practices for using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, optimization), along with other research-support topics useful to multiple AoU researchers.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Duplicate of Data Wrangling in All of Us Program (v7)_lim

Scientific Questions Being Studied

For educational purposes: to show best practices for using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, optimization), along with other research-support topics useful to multiple AoU researchers.

Project Purpose(s)

  • Educational
  • Other Purpose (For use with office hours: notebooks for adding code snippets useful to researchers. This is a placeholder for creating best-practice notebooks, among other things.)

Scientific Approaches

For educational purposes: to show best practices for using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, optimization), along with other research-support topics useful to multiple AoU researchers.

Anticipated Findings

For educational purposes: to show best practices for using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, optimization), along with other research-support topics useful to multiple AoU researchers.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Hyeyeun Lim - Graduate Trainee, Baylor College of Medicine

Duplicate of Workshop: Intro to All of Us Genomics Data

Scientific Questions Being Studied

This workspace is meant to help researchers get familiar with the All of Us Researcher Workbench. There are five hands-on exercises during the workshop, each with a specific notebook.
Exercise 1: Duplicate the workspace & start the cloud environment
Exercise 2: Looking at the genomic data (notebook)
Exercise 3: GWAS - extracting phenotypic data (notebook)
Exercise 4: GWAS - running Hail GWAS (notebook)
Exercise 5: Advanced GWAS (2 notebooks)

By running the exercises in this workspace, researchers will become more familiar with the genomic data, know how to access the genomic data, see how the genomic data and tools can be used in the Researcher Workbench, and be able to start their own genomic data project.

Project Purpose(s)

  • Other Purpose (This workspace is meant for use during the Introduction to Analyzing All of Us Genomic Data workshop. In this workshop, participants will get hands-on experience using the genomic data to run a genome-wide association study (GWAS) with Hail.)

Scientific Approaches

We are using the All of Us dataset in order to run a genome-wide association study (GWAS) using Hail. In the workshop, we will give an introduction to the All of Us Researcher Workbench and demonstrate how to use the Cohort Builder and Jupyter Notebooks to set up a research project. Using Jupyter notebooks, we will create a dataset linking the All of Us phenotypic data to the short-read whole genome sequencing (srWGS) data. After running the GWAS steps using Hail, we will visualize the results.

Anticipated Findings

This workspace runs a genome-wide association study (GWAS) with Hail, using height as the phenotype. We do not anticipate findings from this example workspace, but we expect that workshop participants will be able to apply similar methods in their future research.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Duplicate of Workshop: Intro to All of Us Genomics Data

Scientific Questions Being Studied

This workspace is meant to help researchers get familiar with the All of Us Researcher Workbench. There are five hands-on exercises during the workshop, each with a specific notebook.
Exercise 1: Duplicate the workspace & start the cloud environment
Exercise 2: Looking at the genomic data (notebook)
Exercise 3: GWAS - extracting phenotypic data (notebook)
Exercise 4: GWAS - running Hail GWAS (notebook)
Exercise 5: Advanced GWAS (2 notebooks)

By running the exercises in this workspace, researchers will become more familiar with the genomic data, know how to access the genomic data, see how the genomic data and tools can be used in the Researcher Workbench, and be able to start their own genomic data project.

Project Purpose(s)

  • Other Purpose (This workspace is meant for use during the Introduction to Analyzing All of Us Genomic Data workshop. In this workshop, participants will get hands-on experience using the genomic data to run a genome-wide association study (GWAS) with Hail.)

Scientific Approaches

We are using the All of Us dataset in order to run a genome-wide association study (GWAS) using Hail. In the workshop, we will give an introduction to the All of Us Researcher Workbench and demonstrate how to use the Cohort Builder and Jupyter Notebooks to set up a research project. Using Jupyter notebooks, we will create a dataset linking the All of Us phenotypic data to the short-read whole genome sequencing (srWGS) data. After running the GWAS steps using Hail, we will visualize the results.

Anticipated Findings

This workspace runs a genome-wide association study (GWAS) with Hail, using height as the phenotype. We do not anticipate findings from this example workspace, but we expect that workshop participants will be able to apply similar methods in their future research.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Ghada Soliman - Other, City University of New York (CUNY)
  • Jennifer Zhang - Project Personnel, All of Us Program Operational Use
  • Christopher Lord - Project Personnel, All of Us Program Operational Use
  • Chris Lord - Project Personnel, All of Us Program Operational Use

Collaborators:

  • Genevieve Brandt - Project Personnel, All of Us Program Operational Use

Increased Risk for Postpartum Depression in IMID-affected Women

Scientific Questions Being Studied

Despite evidence that postpartum depression (PPD) has lasting negative outcomes for affected women and their families alike, there is a gross lack of research to enhance clinical care for women’s mental health. The purpose of this inquiry is to use the All of Us database to determine whether there is an association between genomic variants and the increased rates of postpartum depression (diagnosed within 12 months of pregnancy) among those affected by immune-mediated inflammatory diseases (IMIDs), namely multiple sclerosis (MS), Crohn’s disease (CD), rheumatoid disorders (RD) [axSpA, PsA, RA], and allergic rhinitis (AR), after accounting for age, ancestry, BMI, number of steps, access to health resources, household size, and alcohol intake.

Project Purpose(s)

  • Disease Focused Research (multiple sclerosis, Crohn's disease, rheumatoid disorders, allergic rhinitis)

Scientific Approaches

This study will use the surveys, EHR medical histories, and genomic data provided by the amazing participants in the All of Us database. Using genetic analyses (genome-wide association studies), computing tools such as Linux and R, software such as PLINK and ADMIXTURE, and statistical methods (logistic regression), I will identify regions of the genome that have a statistically significant association with risk for postpartum depression among women affected by chronic inflammatory conditions such as multiple sclerosis.

Anticipated Findings

From various genetic analyses, I anticipate finding significant SNPs (risk alleles) that will highlight potential areas of genetic risk for postpartum depression in women affected by immune-mediated inflammatory conditions, as well as genomic regions of interest for further study through PheWAS and in vitro cell lines. Ultimately, this research aims to deepen understanding of the genetic framework of PPD in IMID-affected women, potentially improving risk identification and preventive management in clinical settings as well as contributing to the development of pharmaceutical gene therapy treatments.

Demographic Categories of Interest

  • Race / Ethnicity
  • Access to Care

Data Set Used

Controlled Tier

Research Team

Owner:

  • Mary Davis - Early Career Tenure-track Researcher, Brigham Young University
  • Kayden Hadlock - Undergraduate Student, Brigham Young University
  • Hannah Snarr - Undergraduate Student, Brigham Young University
  • Steven Brugger - Graduate Trainee, Brigham Young University
  • Jacob Gwilliam - Undergraduate Student, Brigham Young University
  • Alyks Odell - Undergraduate Student, Brigham Young University

MS and Pregnancy

Scientific Questions Being Studied

Determine whether MS relapse rates before and after pregnancy are correlated with treatment type. We are also using this workspace to better understand All of Us.

Project Purpose(s)

  • Disease Focused Research (multiple sclerosis)
  • Educational

Scientific Approaches

We will use billing ICD codes to identify individuals more precisely. Among patients who have been diagnosed with multiple sclerosis and have had a pregnancy, we will examine treatment types, hoping to find a correlation between treatment type and relapse before and after pregnancy.

Anticipated Findings

Using these findings, we hope to contribute to understanding the effects of treatment types, specifically how relapse after pregnancy is affected by treatment type.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Geography
  • Access to Care
  • Education Level

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Kayden Hadlock - Undergraduate Student, Brigham Young University
  • Hannah Snarr - Undergraduate Student, Brigham Young University
  • Steven Brugger - Graduate Trainee, Brigham Young University
  • Alyks Odell - Undergraduate Student, Brigham Young University

Pregnancy in women with autoimmune rheumatic conditions

Scientific Questions Being Studied

At this stage, I am exploring the data to formalize a specific research question. However, my ultimate aim is to use this dataset to explore various aspects of pregnancy outcomes in patients with autoimmune rheumatic diseases (ARDs). My major focus is to assess whether there are racial/ethnic differences or disparities in care delivery during pregnancy, and racial differences in comorbidity/multimorbidity burden and its impact on pregnancy outcomes for various ARDs.
Importance: Pregnant women with ARDs are known to have more adverse pregnancy outcomes than the general population of pregnant women. Additionally, studies have shown worse outcomes in women of other racial/ethnic groups compared with White women. It is important to understand the reasons for these disparate outcomes, and whether there are differences in care delivery patterns or comorbidities, so that we can identify ways to mitigate the disparities.

Project Purpose(s)

  • Disease Focused Research (Autoimmune diseases)
  • Population Health

Scientific Approaches

I plan to identify pregnant women with ARDs such as systemic lupus erythematosus, rheumatoid arthritis, spondyloarthropathy, vasculitis, Sjogren's syndrome, etc., along with data on their race/ethnicity and other demographic parameters such as socioeconomic status, where available. I plan to do a comparative analysis of various aspects of care delivery among these patients, for example access to care (timing and number of antenatal visits, visits to rheumatologists, medication management) and insurance coverage, as available in the dataset. I also plan to compare racial differences in comorbidities and pregnancy outcomes across conditions. Other factors, such as outcomes by geographic location, will also be compared.

Anticipated Findings

If there are differences in comorbidity patterns that affect outcomes among pregnant women of different racial groups, across different rheumatic conditions, and across geographic locations, efforts should be made to address these disparities, for example by ensuring equitable pregnancy care delivery and comorbidity management. This could help reduce adverse pregnancy outcomes, which are major issues for pregnant women with ARDs and their infants/children.

Demographic Categories of Interest

  • Race / Ethnicity
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Registered Tier

Research Team

Owner:

  • Rashmi Dhital - Research Fellow, University of California, San Diego

Midlife women

Scientific Questions Being Studied

This project examines racial disparities in healthcare access among midlife women in order to understand health disparities in this group.

Project Purpose(s)

  • Population Health

Scientific Approaches

Women aged 40 to 60 years will be included. Using the survey data, health disparities will be evaluated through questions related to healthcare access. Geographic distribution, income, and education level will also be examined.

Anticipated Findings

We anticipate racial disparities in healthcare access and hormone therapy use among midlife women.

Demographic Categories of Interest

  • Race / Ethnicity
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Registered Tier

Research Team

Owner:

Collaborators:

  • Akshaya Bhagavathula - Early Career Tenure-track Researcher, North Dakota State University

NM_disease_study

NIH All of Us research program opened a new frontier in genomic medicine and holds promise to augment and re-define our understanding of common neuromuscular (NM) diseases and instruct future practice with real-time bedside experiences. The project is funded by…

Scientific Questions Being Studied

The NIH All of Us Research Program has opened a new frontier in genomic medicine and holds promise to augment and redefine our understanding of common neuromuscular (NM) diseases and to inform future practice with real-time bedside experience. The project is funded by the Mayo Clinic DLMP to investigate common NM genetic mutations as a pilot.

Project Purpose(s)

  • Disease Focused Research (neuromuscular disease)
  • Drug Development
  • Methods Development
  • Control Set
  • Ancestry

Scientific Approaches

The aims of this pilot study are as follows:
Aim 1. Examine selected common neuromuscular genetic mutations, including small variants and copy number variants (the latter using CNVpytor).
Aim 2. Apply participant population annotation to compare NM diseases with those in a previously reported Mayo Clinic cohort [PMID: 34706972].
Aim 3. Examine common neuromuscular genetic mutations by age and population group as a model for an efficient clinical diagnosis algorithm.

Anticipated Findings

Neuromuscular (NM) diseases, such as Charcot-Marie-Tooth disease and genetic muscular dystrophies [PMID: 25832668], are relatively common. Our preliminary analysis revealed a high frequency of common NM gene mutations in other genetic databases. This project aims to validate the findings of several Mayo Clinic studies in a larger cohort.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Geography

Data Set Used

Registered Tier

Research Team

Owner:

Duplicate of sociodemographic and genetic determinants of infectious diseases

Scientific Questions Being Studied

For most infectious diseases, there are considerable disparities in outcomes among diverse racial and ethnic communities. The COVID-19 pandemic brought the issue of racial/ethnic disparities in infectious disease outcomes to the forefront of science and medicine with several studies showing that infection incidence was significantly higher, and the outcomes were more severe among African Americans and Hispanics compared to matched white Americans. We showed a similar pattern for other infectious diseases within our health system, where most infectious diseases are significantly more prevalent in Filipino, African American, and Puerto Rican communities compared to white Americans in New York City. Identifying the factors contributing to disparities in infectious disease outcomes and understanding their relative role in each community is essential for reducing the burden of infectious diseases, improving health equity, and reducing the high costs associated with health inequity.

Project Purpose(s)

  • Disease Focused Research (disease by infectious agent)

Scientific Approaches

We will:

  • define fine-scale communities by combining self-reported race/ethnicity with genetically inferred ancestry data;
  • compare the prevalence of infectious diseases across sub-groups and identify infectious diseases that are differentially prevalent among sub-groups after adjusting for demographic, behavioral, and social determinants of health;
  • perform global ancestry inference;
  • test the association of global ancestry proportions with infection outcomes, accounting for demographic, behavioral, and social determinants of health; and
  • perform admixture mapping for infectious diseases that are significantly associated with genetic ancestry.

Anticipated Findings

Our results will be valuable in understanding inequities in environmental and sociodemographic contributors to health and the role of these factors in disparities in infectious disease outcomes. Our findings will broaden our understanding of both human evolutionary history and the human genetic factors contributing to differential susceptibility to present-day pathogens. Together, these results can contribute to developing more effective therapies and public health policies to reduce the burden of infectious disease in an equitable manner. Beyond this project, our fine-scale population structure results, global and local ancestry inferences, and data about the prevalence of infectious diseases and demographic, behavioral, and social determinants of health in communities will be available to the scientific community and All of Us users as a resource for future studies of other diseases.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

  • Samira Asgari - Early Career Tenure-track Researcher, Icahn School of Medicine at Mount Sinai

Collaborators:

  • Abhijith Biji - Graduate Trainee, Icahn School of Medicine at Mount Sinai

Dental and Oral Health in All of Us Data

Scientific Questions Being Studied

I am the co-leader of the OHDSI Dentistry Workgroup. Our group is interested in several areas of observational research related to dentistry and oral health. One proposed use case for the OMOP CDM is to examine disparities in access to dental care as evidenced by the characteristics of people who visit the emergency department for oral care, or who have to receive their oral care as an inpatient due to disabilities. We would like to explore these issues in the All of Us Research study data set.

Project Purpose(s)

  • Disease Focused Research (tooth disease)

Scientific Approaches

This is a highly exploratory and descriptive study. I anticipate using frequencies and percentages. I may also use t-tests, ANOVA, chi-square tests, regression, and correlations, depending on the findings of the exploratory analysis. The condition concepts of interest for this analysis can be browsed here: https://databrowser.researchallofus.org/ehr/conditions/Dental

Anticipated Findings

Since this is highly descriptive, we are not sure what to expect. However, just asking the question and sharing results is a good step in starting the dialogue. We also hope that we can share our methods with the larger OHDSI community to contribute to dental phenotyping and network study activities.

Demographic Categories of Interest

  • Race / Ethnicity
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Registered Tier

Research Team

Owner:

Duplicate of Demo - PheWAS Smoking

Scientific Questions Being Studied

As a demonstration project, this study will present the results of Phenome-Wide Association Studies (PheWAS) to show how the various sources of data contained within the All of Us research dataset can be used to inform scientific discovery. We will perform separate PheWAS with smoking status as the independent variable. Specific questions include:

1. How can one implement a PheWAS within the All of Us Researcher Workbench?
2. How can one use heterogeneous data sources within the All of Us dataset to explore disease associations using self-reported exposures (Participant Provided Information, or "PPI") and exposures captured in the electronic health record (EHR)?

Project Purpose(s)

  • Methods Development
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Approaches

As a method for assessing the health burden of smoking across potential observed phenotypes, we implement a Phenome-Wide Association Study (PheWAS). A PheWAS consists of an array of association tests over an indexed representation of the human phenome. In this analysis, we will conduct PheWAS for the EHR-derived and PPI-derived smoking exposures included in the All of Us research dataset. We will represent "Smoking Exposure" in three ways:

  • EHR smoking ICD billing codes
  • Participant Provided Information (PPI): smoked at least 100 cigarettes in lifetime (yes/no)
  • Participant Provided Information (PPI): lifetime smoking every day

To perform PheWAS, we will map ICD representations of disease to a common vocabulary of PheCodes. We will then use Jupyter Notebooks to create reusable functions that perform PheWAS and generate Manhattan plots to summarize associations.
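The PheWAS workflow described here (map ICD codes to PheCodes, then run one association test per PheCode) can be sketched in plain Python. This is a minimal illustration, not the project's notebook code: the function names `map_to_phecodes` and `phewas`, the ICD-to-PheCode dictionary, and the 20-case minimum are hypothetical, and an unadjusted Pearson chi-square test on a 2x2 exposure-by-case table stands in for the covariate-adjusted logistic regression that PheWAS analyses typically use.

```python
import math
from typing import Dict, List, Set, Tuple


def map_to_phecodes(
    icd_by_person: Dict[str, Set[str]], icd_to_phecode: Dict[str, str]
) -> Dict[str, Set[str]]:
    """Group participants into case sets per PheCode.

    icd_to_phecode is a hypothetical lookup table; a real analysis would
    load a published ICD-to-PheCode map instead.
    """
    cases: Dict[str, Set[str]] = {}
    for person, icds in icd_by_person.items():
        for icd in icds:
            phecode = icd_to_phecode.get(icd)
            if phecode is not None:  # unmapped ICD codes are skipped
                cases.setdefault(phecode, set()).add(person)
    return cases


def phewas(
    cohort: Set[str],
    exposed: Set[str],
    cases_by_phecode: Dict[str, Set[str]],
    min_cases: int = 20,  # hypothetical minimum case count per PheCode
) -> List[Tuple[str, float, float]]:
    """Return (phecode, odds_ratio, p_value) tuples sorted by p-value.

    Uses an unadjusted 2x2 Pearson chi-square test per PheCode, a
    simplification of the covariate-adjusted regression usually used.
    """
    exposed = exposed & cohort
    results: List[Tuple[str, float, float]] = []
    for phecode, cases in cases_by_phecode.items():
        cases = cases & cohort
        if len(cases) < min_cases:
            continue
        controls = cohort - cases
        a = len(cases & exposed)      # exposed cases
        b = len(controls & exposed)   # exposed controls
        c = len(cases) - a            # unexposed cases
        d = len(controls) - b         # unexposed controls
        n = a + b + c + d
        # Haldane-Anscombe correction keeps the odds ratio finite if a cell is 0.
        odds_ratio = ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))
        margins = (a + b) * (c + d) * (a + c) * (b + d)
        chi2 = n * (a * d - b * c) ** 2 / margins if margins else 0.0
        # Upper-tail probability of a chi-square distribution with 1 df.
        p_value = math.erfc(math.sqrt(chi2 / 2.0))
        results.append((phecode, odds_ratio, p_value))
    results.sort(key=lambda r: r[2])
    return results
```

Plotting -log10(p_value) against each PheCode in the returned list gives the kind of Manhattan plot mentioned above; swapping the chi-square test for a logistic regression with age, sex, and other covariates would bring the sketch closer to a standard PheWAS.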

Anticipated Findings

For this study, we anticipate that we will be able to replicate known disease associations with smoking exposure. This will serve to demonstrate the quality, utility, and diversity of the All of Us data and tools and the power of gathering multiple data sources for a single phenotype, providing researchers with options for study design and validation. Importantly, the entire PheWAS package is made available for reuse by researchers in the Workbench for new hypothesis generation.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Caleb Lareau - Senior Researcher, Memorial Sloan Kettering Cancer Center

Working Duplicate of Epidemiology of Inflammatory Skin Conditions v7 Template

Scientific Questions Being Studied

Our focus is on epidemiology of inflammatory skin conditions, such as eczema, psoriasis, hidradenitis suppurativa, and ichthyosis. The scientific questions we intend to study are many, including associations between these inflammatory skin conditions and cardiovascular disease, psychiatric comorbidities, allergic diseases and rheumatologic disease. Some inflammatory skin conditions, such as eczema, are common, while others like ichthyosis are rare. Across the spectrum, epidemiology of dermatologic disease is understudied. We hope to shed light on associations between both common and rare inflammatory skin conditions and potential risk factors.

Project Purpose(s)

  • Disease Focused Research (Inflammatory skin disease)

Scientific Approaches

We plan to use logistic regression in univariate and multivariate analyses. These will primarily be cross-sectional studies or case-control studies. Covariates that we plan to include in our models are disease-related EHR diagnoses as well as demographic factors of age, sex, race and ethnicity, and survey data including smoking history and recreational drug use.

Anticipated Findings

We hope to better describe the burden of inflammatory skin conditions among different racial, ethnic and age groups, and we hope to show novel associations between inflammatory skin conditions and other diseases including cardiovascular disease, autoimmune disease, and metabolic disease. These data will help improve the treatment of inflammatory skin diseases.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Collaborators:

  • Shayaa Muhammad - Project Personnel, City University of New York (CUNY)
  • Mitchel Wride - Graduate Trainee, Yale University

Duplicate of SDoH Subtyper for AoU controlled tier dataset v7

Scientific Questions Being Studied

This project explores 19 social determinants of health (SDoH) variables, defined in a prior dementia workspace, in relation to delayed care and inability to afford care for hypertension, diabetes, and osteoarthritis.

Project Purpose(s)

  • Disease Focused Research (type 2 diabetes mellitus, osteoarthritis, and hypertension)
  • Educational
  • Methods Development
  • Ancestry

Scientific Approaches

Descriptive analyses of the cohort and its demographics will be performed with Seaborn and R. Biclustering will be performed with ExplodeLayout and bipartite modularity.

Anticipated Findings

Certain subtypes within these disease groups may have more SDoH variables answered, which could help guide future interventions. Developing a generalizable method for analyzing AoU data is also an important goal.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Weibin Zhang - Project Personnel, University of Texas Medical Branch (UTMB) at Galveston

Update Rare Variant Computational Predictor Optimization

Scientific Questions Being Studied

The consequence of the vast majority of human genetic variation is unknown. Numerous computational predictors of the effects of genetic variants exist. Understanding the capability of these predictors to detect disease-causing variants is essential in the broader effort to understand how specific genetic variants contribute to human diseases.

Project Purpose(s)

  • Methods Development
  • Ancestry

Scientific Approaches

I will assess the performance of state-of-the-art computational predictors in the detection of rare genetic variants associated with disease.

Anticipated Findings

I anticipate that some predictors will perform better than others. This will inform the use of this class of predictors in studying genetic variation in human disease.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Deep Learning Final Project - Transfer Learning to Better Represent Minorities

Scientific Questions Being Studied

We aim to study whether transfer learning can improve the prediction of health outcomes for minority groups, for whom less data has historically been available to build accurate models. For our final project in CS 541 (Deep Learning) at Worcester Polytechnic Institute, we want to reproduce results similar to those in the following paper:

Gao, Y., & Cui, Y. (2020). Deep transfer learning for reducing health care disparities arising from biomedical data inequality. Nature Communications, 11(1), 5131. https://doi.org/10.1038/s41467-020-18918-3

Project Purpose(s)

  • Educational

Scientific Approaches

We plan to use deep learning, specifically transfer learning. We aim to learn from genomic datasets and survey responses to predict health biomarkers and to improve performance on less well-represented racial and ethnic groups.

Anticipated Findings

We anticipate that the use of transfer learning will enable the model to perform better on minority classes than previously possible, and thus enable more accurate predictions of at-risk individuals who have less data to represent them.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

Duplicate of Phenotype - Breast Cancer (v7)

Scientific Questions Being Studied

The Notebooks in this Workspace can be used to implement well-known phenotype algorithms in one’s own research.

Project Purpose(s)

  • Disease Focused Research (breast cancer)
  • Population Health
  • Educational
  • Methods Development
  • Other Purpose (This is an All of Us Phenotype Library Workspace created by the Researcher Workbench Support team. It is meant to demonstrate the implementation of key phenotype algorithms within the All of Us Research Program cohort.)

Scientific Approaches

Not Applicable

Anticipated Findings

By reading and running the Notebooks in this Phenotype Library Workspace, researchers can implement the following phenotype algorithms:

Ning Shang, George Hripcsak, Chunhua Weng, Wendy K. Chung, & Katherine Crew. Breast Cancer. Retrieved from https://phekb.org/phenotype/breast-cancer.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Data Set Used

Registered Tier

Research Team

Owner:

Request a Review of this Research Project

You can request that the All of Us Resource Access Board (RAB) review a research purpose description if you have concerns that this research project may stigmatize All of Us participants or violate the Data User Code of Conduct in some other way. To request a review, you must fill in a form, which you can access by selecting ‘request a review’ below.