Research Projects Directory

16,047 active projects

This information was updated 2/13/2025

The Research Projects Directory includes information about all projects that currently exist in the Researcher Workbench to help provide transparency about how the Workbench is being used. Each project specifies whether Registered Tier or Controlled Tier data are used.

Note: Researcher Workbench users provide information about their research projects independently. Views expressed in the Research Projects Directory belong to the relevant users and do not necessarily represent those of the All of Us Research Program. Information in the Research Projects Directory is also cross-posted on AllofUs.nih.gov in compliance with the 21st Century Cures Act.

Depression Treatment Efficacy

Scientific Questions Being Studied

I seek to understand how genetic, lifestyle, and demographic factors influence depression treatment efficacy.

Project Purpose(s)

  • Educational

Scientific Approaches

I intend to create a machine learning model. I will need access to mental health data (such as PHQ-9 scores), treatment modalities (such as the SSRI prescribed), and genetic data (such as genes known to affect SSRI metabolism).

Anticipated Findings

Many patients have difficulty finding the correct treatment for Major Depressive Disorder (MDD). Typically, a myriad of factors contribute to whether or not patients respond well to treatment. There is currently no good way for clinicians to decide which treatment a patient will respond to best. This model seeks to understand the features that contribute most to MDD treatment efficacy.

Demographic Categories of Interest

  • Disability Status

Data Set Used

Registered Tier

Research Team

Owner:

[Version 8] Food Insecurity and Neighborhood Deprivation in MASLD

Scientific Questions Being Studied

Metabolic dysfunction-associated steatotic liver disease (MASLD) disproportionately affects socioeconomically disadvantaged populations. While food insecurity and neighborhood deprivation independently predict adverse liver outcomes, their synergistic effect on MASLD development remains unknown. We aimed to investigate whether neighborhood deprivation modifies the association between food insecurity and incident MASLD risk in US adults.

Project Purpose(s)

  • Disease Focused Research (Metabolic Dysfunction-Associated Steatotic Liver Disease)

Scientific Approaches

We will conduct a retrospective cohort study using data from the All of Us Research Program. Food insecurity will be assessed using self-reported surveys, and neighborhood deprivation will be measured using ZIP code-based indices. The primary outcome will be incident MASLD, identified through diagnostic codes and the presence of metabolic conditions. Cox proportional hazards models will estimate the risk of MASLD in food-insecure versus food-secure individuals, adjusting for demographic factors (age, sex, race/ethnicity), socioeconomic factors (insurance, education, income), lifestyle factors (active smoking), and metabolic syndrome-associated conditions (obesity, hypertension, hyperlipidemia, and type 2 diabetes). We will conduct interaction analyses to assess the potential synergistic effect of food insecurity and neighborhood deprivation on MASLD risk.
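As an illustration only, the sketch below shows two common ways to summarize an interaction between two exposures once hazard ratios have been estimated: the relative excess risk due to interaction (RERI, additive scale) and the ratio of hazard ratios (multiplicative scale). The project description does not say which measure will be used, so both functions and their names are assumptions, and the numbers in the usage note are made-up inputs, not study results.

```python
def additive_interaction_reri(hr_a_only: float, hr_b_only: float, hr_both: float) -> float:
    """Relative excess risk due to interaction (RERI).

    Each hazard ratio is relative to the group with neither exposure.
    RERI > 0 suggests a positive (synergistic) interaction on the additive scale.
    """
    for hr in (hr_a_only, hr_b_only, hr_both):
        if hr <= 0:
            raise ValueError("hazard ratios must be positive")
    return hr_both - hr_a_only - hr_b_only + 1.0


def multiplicative_interaction_ratio(hr_a_only: float, hr_b_only: float, hr_both: float) -> float:
    """Ratio of the joint hazard ratio to the product of the single-exposure ratios.

    A value > 1 suggests a positive interaction on the multiplicative scale.
    """
    for hr in (hr_a_only, hr_b_only, hr_both):
        if hr <= 0:
            raise ValueError("hazard ratios must be positive")
    return hr_both / (hr_a_only * hr_b_only)
```

For example, with hypothetical hazard ratios of 1.5 (food insecurity only), 1.3 (neighborhood deprivation only), and 2.4 (both), `additive_interaction_reri(1.5, 1.3, 2.4)` is about 0.6, which would indicate a positive additive interaction.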

Anticipated Findings

We anticipate that MASLD incidence will be higher in the food-insecure group than in the food-secure group. Food insecurity and neighborhood deprivation may show a positive interaction.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Yong Eun - Research Associate, New York City Health & Hospitals

Food Insecurity and Neighborhood Deprivation in MASLD

Scientific Questions Being Studied

Metabolic dysfunction-associated steatotic liver disease (MASLD) disproportionately affects socioeconomically disadvantaged populations. While food insecurity and neighborhood deprivation independently predict adverse liver outcomes, their synergistic effect on MASLD development remains unknown. We aimed to investigate whether neighborhood deprivation modifies the association between food insecurity and incident MASLD risk in US adults.

Project Purpose(s)

  • Disease Focused Research (Metabolic Dysfunction-Associated Steatotic Liver Disease)

Scientific Approaches

We will conduct a retrospective cohort study using data from the All of Us Research Program. Food insecurity will be assessed using self-reported surveys, and neighborhood deprivation will be measured using ZIP code-based indices. The primary outcome will be incident MASLD, identified through diagnostic codes and the presence of metabolic conditions. Cox proportional hazards models will estimate the risk of MASLD in food-insecure versus food-secure individuals, adjusting for demographic factors (age, sex, race/ethnicity), socioeconomic factors (insurance, education, income), lifestyle factors (active smoking), and metabolic syndrome-associated conditions (obesity, hypertension, hyperlipidemia, and type 2 diabetes). We will conduct interaction analyses to assess the potential synergistic effect of food insecurity and neighborhood deprivation on MASLD risk.

Anticipated Findings

We anticipate that MASLD incidence will be higher in the food-insecure group than in the food-secure group. Food insecurity and neighborhood deprivation may show a positive interaction.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Yong Eun - Research Associate, New York City Health & Hospitals

(SOORIN) Demonstration project - controlled tier

Scientific Questions Being Studied

The aim of the study is to identify clinical, environmental, and genetic risk factors for disease and treatment outcomes and to develop precision medicine strategies. We are interested in demographics (sex, race/ethnicity, etc.), socioeconomic factors, environmental factors, clinical factors, and genomic data (WGS/array).

Project Purpose(s)

  • Educational

Scientific Approaches

We plan to identify patients using ICD-9/10 and SNOMED codes. Chi-squared tests and t-tests will be used to compare cases and controls to identify risk factors. Logistic regression analysis and Cox proportional hazards models will be employed to develop prediction models.
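The case-control comparison described above can be sketched for a single binary risk factor as a Pearson chi-squared test on a 2x2 table. This is a minimal, standard-library illustration, not the project's code; the function name and the table layout are assumptions, and no continuity correction is applied.

```python
def chi_squared_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared statistic for the 2x2 table

                    exposed   unexposed
        cases          a          b
        controls       c          d

    Uses the shortcut formula n * (ad - bc)^2 / (row and column totals multiplied),
    without Yates' continuity correction.
    """
    n = a + b + c + d
    row_cases, row_controls = a + b, c + d
    col_exposed, col_unexposed = a + c, b + d
    if 0 in (row_cases, row_controls, col_exposed, col_unexposed):
        raise ValueError("every row and column total must be non-zero")
    return n * (a * d - b * c) ** 2 / (
        row_cases * row_controls * col_exposed * col_unexposed
    )
```

With one degree of freedom, a statistic above roughly 3.84 corresponds to p < 0.05. In practice a statistics library would normally be used, since it also reports the p-value and handles small expected counts.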

Anticipated Findings

We anticipate that these studies will greatly enhance our knowledge of treatment risk factors and contribute to the development of treatment strategies in diverse U.S. populations.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • SOORIN HWANG - Undergraduate Student, Sungkyunkwan University, School of Pharmacy

Collaborators:

  • Jeong Yee - Teacher/Instructor/Professor, Sungkyunkwan University, School of Pharmacy
  • gayeong seo - Graduate Trainee, Sungkyunkwan University, School of Pharmacy

HAP 464 Antidepressant Analysis

Scientific Questions Being Studied

I'm taking a class on how to analyze EHR data, using a research project that investigates antidepressant response in various subpopulations.

Project Purpose(s)

  • Educational

Scientific Approaches

We will use Python and SQL to manipulate and analyze data. We will use likelihood ratios for feature selection. We will create predictive models using logistic regressions. Population characteristics will be explored as well.
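One possible reading of "likelihood ratios for feature selection" is the positive likelihood ratio of a binary feature: how much more often the feature appears among responders than among non-responders. The sketch below assumes that reading. The record layout, the field names (`responded`, `prior_ssri`), and the function name are hypothetical and are not taken from the course or from the All of Us data model.

```python
import math
from typing import Mapping, Sequence


def positive_likelihood_ratio(
    records: Sequence[Mapping[str, bool]], feature: str, outcome: str = "responded"
) -> float:
    """P(feature present | outcome) / P(feature present | no outcome).

    Values well above 1 (or well below 1) suggest the feature is informative
    about the outcome; values near 1 suggest it is not.
    """
    with_outcome = [r for r in records if r[outcome]]
    without_outcome = [r for r in records if not r[outcome]]
    if not with_outcome or not without_outcome:
        raise ValueError("need at least one record with and one without the outcome")

    rate_with = sum(bool(r[feature]) for r in with_outcome) / len(with_outcome)
    rate_without = sum(bool(r[feature]) for r in without_outcome) / len(without_outcome)

    if rate_without == 0:
        # The ratio is unbounded if the feature never occurs without the outcome,
        # and undefined if it never occurs at all.
        return math.inf if rate_with > 0 else math.nan
    return rate_with / rate_without
```

Features could then be ranked by how far their ratio is from 1 (for example, by the absolute value of its logarithm) before fitting a logistic regression on the retained features.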

Anticipated Findings

I should be able to learn the basics of EHR data analysis. For the project, the class will be able to identify predictors of antidepressant response and learn how to predict antidepressant response.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Registered Tier

Research Team

Owner:

  • Ngoc Pham - Undergraduate Student, George Mason University

lung cancer treatment response

Scientific Questions Being Studied

I intend to explore available data on lung cancer to learn more about treatment responses. Lung cancer is the deadliest cancer; therefore, gaining more insight into treatment response could help improve patient survival.

Project Purpose(s)

  • Disease Focused Research (lung cancer)

Scientific Approaches

This study will investigate the relationship between treatment types and survival among lung cancer patients.

Anticipated Findings

I hope to reveal more insights about treatment response for lung cancer. Lung cancer is the deadliest cancer; thus, understanding different treatment methods could help improve patient survival.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Elizabeth Kim - Undergraduate Student, University of California, San Diego

Collaborators:

  • Tianyi Chen - Undergraduate Student, University of California, San Diego

scad_exploration

Scientific Questions Being Studied

Work on building a predictive model for SCAD.

Project Purpose(s)

  • Disease Focused Research (Spontaneous Coronary Dissection)

Scientific Approaches

We are unsure of that at this point

Anticipated Findings

ML Model

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Epilepsy

Scientific Questions Being Studied

What are the most effective strategies for reducing the epilepsy treatment gap in lower-income patients in the United States? It is relevant to public health because I had a sister with epilepsy.

Project Purpose(s)

  • Educational

Scientific Approaches

Scientific approaches, research methods, and tools:

  • Models using rodents
  • Brain imaging techniques such as MRI, along with EEG and genetic analysis
  • Electrophysiological recordings
  • Computational modeling
  • Clinical trials with patients
  • Advanced techniques such as optogenetics to manipulate specific neurons in the brain

I am looking at sex assigned at birth and age at first occurrence as my variables.

The dataset I will be looking at includes different epilepsy conditions: Epilepsy, Localized-related Epilepsy, Generalized-convulsive epilepsy, Idiopathic generalized epilepsy, Epilepsy partialis convulsive, Refractory generalized convulsive, Refractory idiopathic generalized, and Epilepsy not refractory.

Anticipated Findings

Females suffer from epilepsy more than males or people of other sexes assigned at birth.

The graph of epilepsy across age groups appears bell-shaped, which suggests the data are approximately normally distributed.

I can add to the existing research on epilepsy.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Duplicate of Social determinants of health and atrial fibrillation

Scientific Questions Being Studied

This workspace contains notebooks compiling data on patients with atrial fibrillation (AF) including demographic information, baseline characteristics, comorbidities, and medications. The project's primary aim is to explore the relationship between social determinants of health (e.g. socioeconomic status, level of education, healthcare access, race, and gender) and AF burden. Secondary aims include identification of significant differences in management (e.g. rate/rhythm control, ablation procedures, and appropriate anticoagulation) and outcomes among cohorts. By gathering and analyzing these data, we can better understand healthcare disparities among AF patients and consider ways to improve outcomes.

Project Purpose(s)

  • Disease Focused Research (atrial fibrillation)
  • Population Health
  • Educational
  • Methods Development

Scientific Approaches

In this cross-sectional, population-based study, we will use All of Us baseline data from patient-provided information (PPI) surveys, Fitbit, and electronic health records (EHR) for patients older than 18, and we will retrospectively examine the prevalence of AF using OMOP concept IDs. We will extract data on social determinants of health, systemic conditions, and medications for this cohort, as well as physical measurements and vital signs. We will calculate a CHA2DS2-VASc score, using Python and BigQuery to retrieve comorbid conditions. We will use similar methods to compile medications, relevant procedures (e.g., catheter ablation), measurements including systolic blood pressure, diastolic blood pressure, and heart rate, and the race, gender, income, and education reported by participants.
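The CHA2DS2-VASc calculation mentioned above can be sketched in Python once each comorbidity has been reduced to a yes/no flag per patient. This is a minimal illustration under stated assumptions: the `Patient` class and its field names are hypothetical, and the step that derives these flags from OMOP concept IDs in BigQuery is not shown. The point weights follow the standard published score.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical per-patient flags, assumed to be derived upstream from EHR data."""
    age: int
    female: bool
    congestive_heart_failure: bool = False
    hypertension: bool = False
    diabetes: bool = False
    stroke_tia_thromboembolism: bool = False
    vascular_disease: bool = False


def cha2ds2_vasc(p: Patient) -> int:
    """Return the CHA2DS2-VASc score (0 to 9) for one patient."""
    score = 0
    score += 1 if p.congestive_heart_failure else 0    # C: congestive heart failure
    score += 1 if p.hypertension else 0                # H: hypertension
    if p.age >= 75:                                    # A2: age 75 or older
        score += 2
    elif p.age >= 65:                                  # A: age 65 to 74
        score += 1
    score += 1 if p.diabetes else 0                    # D: diabetes mellitus
    score += 2 if p.stroke_tia_thromboembolism else 0  # S2: prior stroke/TIA/thromboembolism
    score += 1 if p.vascular_disease else 0            # V: vascular disease
    score += 1 if p.female else 0                      # Sc: sex category (female)
    return score
```

Keeping the scoring logic in a plain function, separate from the query that builds the flags, makes it straightforward to unit test before applying it to a cohort.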

Anticipated Findings

We anticipate healthcare disparities to be re-demonstrated among AF patients across social determinants of health. By identifying specific ways in which different AF patients are managed, we can formulate targeted strategies to improve AF outcomes in underserved populations.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

  • Taylor Tso - Research Fellow, University of California, Irvine

Duplicate of How to Work with Genomics Data (CRAM and IGV)_v8

Scientific Questions Being Studied

This workspace and its notebooks neither ask nor answer any scientific questions. The purpose of this workspace is to serve as a tutorial which shows how to localize the All of Us (AoU) CRAM files individually or in groups via the CRAM manifest in addition to showing how to render the Integrated Genome Viewer (IGV) on the AoU workbench to explore the CRAM files.

Project Purpose(s)

  • Methods Development

Scientific Approaches

This workspace conducts no study and applies no scientific approaches. This workspace and its notebooks are tutorials for localizing AoU CRAM files with R commands and using IGV to explore their contents. The methods and tools employed include R system commands for localizing individual CRAM files, an R for loop for localizing multiple CRAM files by referencing the manifest, and the commands for importing and rendering IGV to view the localized CRAM files.

Anticipated Findings

There will be no findings or contribution to scientific knowledge as there is no study being conducted nor questions asked. Informal 'findings' include the usability of the aforementioned tools and AoU CRAM files on the All of Us workbench.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Gloria Riechi - Graduate Trainee, North Carolina A&T State University

Eating Disorder Sample

Scientific Questions Being Studied

I am curious to know the updated sample size of participants with eating disorder diagnoses in the new version 8 of the All of Us data.

Project Purpose(s)

  • Disease Focused Research (eating disorder)

Scientific Approaches

I plan to examine the eating disorder diagnostic codes in hopes of producing a sample consisting of the eating disorder participants in the version 8 dataset.

Anticipated Findings

I anticipate finding a larger sample of All of Us participants with eating disorder diagnoses in the version 8 dataset than I found in the version 7 dataset.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Bronte Neal - Undergraduate Student, University of Hawaii at Manoa

BIMI 6400 Sample Workspace

Scientific Questions Being Studied

We are not seeking to answer a scientific question with this workspace. Instead, this workspace will be used to show students from the BIMI 6400 course from Tulane University how to answer research questions using All of Us data.

Project Purpose(s)

  • Educational

Scientific Approaches

Students will interact with sample procedures for accessing and analyzing data using statistical methods. This includes both demographic and genetic information since students will learn how to perform Genome Wide Association Studies and Phenome Wide Association Studies.

Anticipated Findings

We do not anticipate novel scientific findings from the work done in this workspace, since it serves an educational purpose.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Delayed Medical care

Scientific Questions Being Studied

Currently, little is known about the extent of and reasons for delayed medical care in the United States. Estimating the magnitude of the drivers of delay would help to avoid inappropriate delay by informing the design of an intervention that addresses those reasons. The current study will use All of Us data to explore the burden of delayed medical care and its underlying factors. This will in turn help to prevent the consequences of delayed medical care, such as hospitalizations and emergency admissions.

Project Purpose(s)

  • Social / Behavioral

Scientific Approaches

A cross-sectional analysis of the All of Us cohort will be conducted with the aim of estimating the proportion of participants with delayed medical care and the associated factors. The Healthcare Access and Utilization survey section of the AoU dataset will be used to obtain the variables needed for the primary outcomes, and sociodemographic data will be retrieved as needed. The data will be analyzed in R in a Jupyter notebook using the general analysis

Anticipated Findings

The study findings will serve as input for health care policy makers and clinicians to improve continuity of care. In addition, the findings will provide preliminary data for future studies to develop a robust model that is scalable and transportable to clinical practice.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Duplicate of All of Us chronic conditions Fitbit analysis

Scientific Questions Being Studied

Objective: To assess differences in Fitbit measures across various chronic conditions, such as diabetes, COVID-19 and long COVID, hypertension, heart disease, and others. Our hypothesis is that individuals with chronic conditions will have poorer Fitbit-measured health outcomes than those without chronic conditions.

Project Purpose(s)

  • Disease Focused Research (chronic conditions)
  • Population Health
  • Social / Behavioral

Scientific Approaches

Dataset: We will develop a dataset of Fitbit users with and without certain chronic conditions.
We will describe the sample in terms of sociodemographics. We will use a combination of feature engineering and machine learning techniques to assess differences between groups.

Anticipated Findings

We expect to find differences in heart rate, activity levels, and sleep across different disease groups, as well as heterogeneity across sociodemographic groups. The findings will help develop passive characterization and predictive models of chronic conditions.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

  • Citina Liang - Graduate Trainee, University of Southern California

Menopause v8

Scientific Questions Being Studied

We aim to understand how menopause symptoms manifest across the population. How reliable is detection of menopause from retrospective data? What clusters of expression do menopause symptoms manifest in?

Project Purpose(s)

  • Disease Focused Research (menopause)

Scientific Approaches

We will extract women ages 30-55 with menopause-related codes. We will collect the top symptoms for these women and determine what types of clusters appear. We will compare these clusters with patient race and age at menopause onset to determine whether differences relate to these factors.

Anticipated Findings

We hypothesize that there are differences in menopause presentation across the population. We hypothesize that these differences can be detected in retrospective EHR data.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Sarika Pasumarthy - Undergraduate Student, University of California, Berkeley
  • Irene Chen - Graduate Trainee, Massachusetts Institute of Technology

BAP_lrmethyl

Scientific Questions Being Studied

Environmental exposures imprint on our genomes by altering the epigenetic state of genes. These epigenetic events are associated with aging and disease risk. However, historical technological barriers and the data collection scale required for association studies have made it impossible to comprehensively link environmental exposures, epigenetic state, and disease risk, limiting the development of diagnostics and the discovery of therapeutic targets. The All of Us project is uniquely positioned to address this research gap because of the large size and diversity of its human donors, rich metadata, and longitudinal follow-up for disease outcomes.

Project Purpose(s)

  • Methods Development

Scientific Approaches

All of Us generated long-read DNA methylation data on bulk peripheral blood samples from 12,000 donors. We will benchmark existing tools and develop new tools to identify cell types within the methylation data. We will compare these tools with single-cell methods. Using cell type proportions, we will make associations with age, ancestry and diverse environmental exposures. Due to the large sample size and detailed donor metadata, this study provides an opportunity to comprehensively and mechanistically link environment, cell diversity, and propensity for disease. These insights will pave the way for precise diagnostic and therapeutic strategies, potentially leading to novel biomarkers and intervention targets.

Anticipated Findings

With the unique paired metadata and long-read DNA methylation data from the All of Us project, we aim to establish connections between lifestyle factors, disease phenotypes, and individual epigenetic states. Our work will enhance early disease diagnosis and facilitate the discovery of therapeutic targets.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Rui Yang - Research Fellow, Broad Institute

Machine learning approaches to Intersectional LGBTQ+ Mental Health (v8 CT)

Scientific Questions Being Studied

Mental and behavioral health conditions are major concerns among LGBTQ+ people. Post-traumatic stress disorder and co-occurring mental health conditions (such as substance use and suicidal ideation) are more prevalent among LGBTQ+ people than among cisgender and heterosexual people, likely due to the stress of stigma and discrimination and a lack of access to health services and affirming treatment modalities. Additional marginalization, such as experiencing racism or sexism, can lead to compounding stressors and reduced resources, which can increase the risk of discrimination and mental health effects. The research questions will include: What are the associations between LGBTQ-related discrimination and PTSD among LGBTQ+ people compared to their cisgender, heterosexual counterparts? How do these findings change when disaggregating by gender identity and sexual identity? And what is the influence of additional marginalization or area resources on mental health conditions in this population?

Project Purpose(s)

  • Disease Focused Research (Psychiatric disorders (PTSD, GAD, MDD, SUD))
  • Population Health

Scientific Approaches

We will use data from the Basics survey; neighborhood variables linked by 3-digit ZIP code (the area deprivation index and its fractional components); greenspace, measured by the normalized difference vegetation index (NDVI); Electronic Health Record data for diagnoses of PTSD, tobacco use disorder, alcohol use disorder, other substance use disorders, generalized anxiety disorder, and suicidal ideation or attempts; and the Everyday Discrimination Survey. We will use Cox proportional hazards analysis to account for the longitudinal nature of the survey, followed by multi-level analysis of individual heterogeneity and discriminatory accuracy (MAIDHA) analysis, causal forest analysis, mode-recursive subgroup partitioning (SIDES and GUIDE algorithm), and finally structural equation modeling to investigate intersectional effects of discrimination on mental health outcomes.

Anticipated Findings

The goal of the study is to identify which groups, based on their intersectional positionality as LGBTQ+ people, are at increased risk for mental health conditions. The study will report measured experiences of discrimination and note the likely importance of unmeasured discrimination or lack of access that LGBTQ+ people face. It will also identify the influence of additional facets of marginalization and their contribution to the risk of mental health conditions among some LGBTQ+ groups compared to others, to describe the complexity of participants' experiences. Most notably, the focus on discrimination and neighborhood resources will identify potentially modifiable factors that are correlated with the risk of mental health conditions, which may support the case for additional resources.

Demographic Categories of Interest

  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

  • Younga Lee - Research Fellow, Mass General Brigham

Collaborators:

  • Sophia Kim - Research Assistant, Mass General Brigham
  • Henri Garrison-Desany - Research Fellow, Harvard T. H. Chan School of Public Health

Duplicate of CYP2C19 how we metabolize SSRIs

Scientific Questions Being Studied

We intend to study how well the population metabolizes different selective serotonin reuptake inhibitors (SSRIs), tricyclic antidepressants, and benzodiazepines, based on the CYP2C19 gene. This could be used to address and reduce the gap affecting the URB population using AOURP. These data could bring better treatment for our population while representing URBs and promoting precision medicine.
This will be used for preliminary data.

Project Purpose(s)

  • Population Health
  • Social / Behavioral
  • Educational
  • Drug Development
  • Ancestry

Scientific Approaches

This workbench aims to capture the variations in drug metabolism attributed to CYP2C19 genetic profiles across Puerto Rican Hispanic and Caucasian populations for the drug classes of selective serotonin reuptake inhibitors (SSRIs), tricyclic antidepressants, and benzodiazepines, while also taking gender disparities into account. The dataset will combine updated demographic data for the corresponding populations and their CYP2C19 polymorphisms with clinical records detailing drug doses, efficacy measures, and side effect frequency.
Demographic and lifestyle information will also be collected to contextualize the genetic and pharmaceutical findings.
This will be used for preliminary data.

Anticipated Findings

The anticipated finding is that members of our population are slow metabolizers. We hypothesize this because of diagnoses such as non-alcoholic fatty liver (NAFL) and because of our diets and lifestyles. This will help represent URBs in precision medicine and pharmacogenetics. This will be used for preliminary data.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

  • Suaih Morales - Project Personnel, University of Puerto Rico Comprehensive Cancer Centre

Collaborators:

  • Ronnie Ramirez - Administrator, University of Puerto Rico Comprehensive Cancer Centre

BRCA1

AMD is the leading cause of irreversible progressive visual impairment in the elderly, with approximately 196 million people affected worldwide. This number is projected to rise to 288 million in the coming decades. (3,4) AMD primarily affects the macula, a…

Scientific Questions Being Studied

AMD is the leading cause of irreversible progressive visual impairment in the elderly, with approximately 196 million people affected worldwide. This number is projected to rise to 288 million in the coming decades. (3,4) AMD primarily affects the macula, a critical area of the retina responsible for central vision, which is essential for tasks such as reading, driving, and recognizing faces. Given the growing prevalence of AMD, identifying its risk factors is crucial for early detection and intervention.

Project Purpose(s)

  • Educational

Scientific Approaches

BRCA1 and BRCA2 (BRCA1/2) are essential tumor suppressor genes involved in DNA repair, genomic stability, and the regulation of oxidative stress responses. (1, 2)
Mutations in these genes impair DNA repair mechanisms, increasing oxidative stress, genomic instability, and cellular vulnerability. These effects may extend to ocular tissues, including the lens and retina, where the accumulation of oxidative damage can contribute to the development of cataracts and age-related macular degeneration (AMD). (12) Understanding how BRCA1/2 mutations affect DNA repair in retinal and lens cells may uncover new pathways to these age-related ocular diseases.

Anticipated Findings

Age is the most significant risk factor for AMD, with its prevalence increasing with each decade of life. (6,7) Genetic predispositions, particularly specific gene variants, also contribute to the risk, with Caucasians being more susceptible due to genetic and environmental factors. (8) Lifestyle factors, such as smoking, contribute to oxidative stress and vascular damage, further increasing AMD risk. (9) Systemic conditions like cardiovascular disease, obesity, and diabetes also elevate susceptibility by promoting inflammation and reducing retinal health. (10) Furthermore, women may face a slightly higher risk due to hormonal and longevity differences. (11)

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Collaborators:

  • Ying Zheng - Other, University of California, Los Angeles

Mizzu

Our research aims to develop and validate computational methods for analyzing large-scale genomic and clinical data to better understand the genetic and environmental factors influencing disease risk and health outcomes. Specifically, we seek to identify genetic variants, transcriptomic signatures, and…

Scientific Questions Being Studied

Our research aims to develop and validate computational methods for analyzing large-scale genomic and clinical data to better understand the genetic and environmental factors influencing disease risk and health outcomes. Specifically, we seek to identify genetic variants, transcriptomic signatures, and population-level health disparities associated with common and complex diseases, such as cardiovascular disease, diabetes, and cancer.

This research is critical for advancing precision medicine, enabling more accurate risk prediction, early diagnosis, and targeted treatment strategies. By leveraging the All of Us dataset, which includes diverse population data, we aim to ensure that our findings are broadly applicable across different ancestry groups, ultimately reducing health disparities and improving personalized healthcare.

Project Purpose(s)

  • Methods Development

Scientific Approaches

Our study will utilize genomic, clinical, and demographic datasets from the All of Us Research Program to investigate genetic associations, disease risk factors, and health disparities. We will apply bioinformatics and statistical methods, including genome-wide association studies (GWAS), machine learning models, and survival analysis, to identify key genetic and environmental contributors to disease.

We will preprocess and analyze the data using Python, R, and high-performance computing (HPC) platforms. Specific tools include PLINK for genetic data processing, TensorFlow/PyTorch for deep learning models, and statistical frameworks like Scikit-learn and Bioconductor. We will also incorporate single-cell and bulk RNA-seq analysis, ancestry inference, and network-based approaches to improve disease prediction and treatment strategies.

Anticipated Findings

Our study anticipates identifying novel genetic variants, environmental factors, and gene-environment interactions that contribute to disease susceptibility and health disparities. By leveraging large-scale genomic, clinical, and demographic data from the All of Us Research Program, we expect to uncover associations between genetic markers and disease risk, improve ancestry-informed risk predictions, and refine biomarker discovery for precision medicine.

Our findings will contribute to scientific knowledge by enhancing the understanding of genetic predisposition to diseases, informing personalized treatment strategies, and improving health equity by identifying population-specific risk factors. Additionally, the development and validation of computational pipelines for multi-omics integration will provide scalable, reproducible tools for future biomedical research.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Registered Tier

Research Team

Owner:

Dupl. Investigation of gene-environment interactions for cardiometabolic disease

Cardiometabolic diseases are rapidly growing in prevalence worldwide. To facilitate better prevention and treatment strategies, it is critical to understand both the genetic and environmental components that underlie these diseases as well as the interactions between the genetic and environmental…

Scientific Questions Being Studied

Cardiometabolic diseases are rapidly growing in prevalence worldwide. To facilitate better prevention and treatment strategies, it is critical to understand both the genetic and environmental components that underlie these diseases as well as the interactions between the genetic and environmental factors. Here, we plan to investigate previously identified cardiometabolic disease risk variants for interactions with other cardiometabolic disease risk factors to further understand how these known variants impact cardiometabolic predispositions and pinpoint key environmental factors to address in clinical settings.

Project Purpose(s)

  • Disease Focused Research (cardiometabolic diseases)

Scientific Approaches

We will leverage individual-level genotype data and phenotype data for traits and outcomes relevant to cardiometabolic diseases. We will conduct association studies and construct genetic risk scores from risk variants, as well as perform targeted phenotype comparisons and genotype-by-trait interaction analyses.
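As an illustration of the genetic risk score mentioned above, the following is a minimal sketch, assuming the common weighted-sum form in which each variant's risk-allele dosage (0, 1, or 2 copies) is multiplied by an effect-size weight. The function name, variant identifiers, and weights are hypothetical placeholders and are not taken from this project.

```python
def genetic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages.

    dosages: dict mapping variant ID -> risk-allele count (0, 1, or 2)
    weights: dict mapping variant ID -> effect-size weight (for example,
             a log odds ratio from a prior association study)
    """
    missing = set(weights) - set(dosages)
    if missing:
        # Refuse to score a person with ungenotyped variants rather than
        # silently treating them as zero copies.
        raise KeyError(f"missing dosages for: {sorted(missing)}")
    return sum(weights[v] * dosages[v] for v in weights)


# Hypothetical example values, for illustration only.
example_weights = {"variant_A": 0.10, "variant_B": 0.25, "variant_C": 0.05}
example_dosages = {"variant_A": 2, "variant_B": 0, "variant_C": 1}
example_score = genetic_risk_score(example_dosages, example_weights)
```

An unweighted score, which simply counts risk alleles, corresponds to setting every weight to 1.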

Anticipated Findings

Through this research we aim to identify key gene-environment interactions for cardiometabolic diseases involving specific variants or variant groups and environmental factors. This can provide insight into the molecular mechanisms of cardiometabolic disease risk variants that may inform the development of prevention and treatment strategies, as well as the prioritization of modifiable environmental risk factors.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Asha Kar - Graduate Trainee, University of California, Los Angeles

Duplicate of Beginner Intro to AoU Data and the Workbench

This workspace contains multiple notebooks that assess users' understanding of the workbench and OMOP. These notebooks are meant to help users check their knowledge not only on Python, R, and SQL, but also on the general data structure and data…

Scientific Questions Being Studied

This workspace contains multiple notebooks that assess users' understanding of the workbench and OMOP. These notebooks are meant to help users check their knowledge not only on Python, R, and SQL, but also on the general data structure and data model used by the All of Us program.

Project Purpose(s)

  • Educational

Scientific Approaches

No scientific approach is used in this workspace because it is meant for educational purposes only. We will cover all aspects of OMOP and hence will use most datasets available in the workbench.

Anticipated Findings

We do not anticipate any findings. Instead, we are educating people on the use of the workbench and on OMOP, the common data model used by the program.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Built Environment and Health

This study aims to explore the relationship between perceived neighborhood environments and physical activity levels, measured through Fitbit data, and cardiometabolic health outcomes in underrepresented populations. Additionally, we will integrate objective neighborhood environment data from external sources, focusing on active…

Scientific Questions Being Studied

This study aims to explore the relationship between perceived neighborhood environments and physical activity levels, measured through Fitbit data, and cardiometabolic health outcomes in underrepresented populations. Additionally, we will integrate objective neighborhood environment data from external sources, focusing on active transportation and urban planning policies. The scientific question is whether perceived and objective neighborhood environments influence physical activity and health outcomes in underrepresented populations. This is important for promoting equitable public health interventions and urban planning efforts that foster healthy living in vulnerable communities.

Project Purpose(s)

  • Population Health
  • Social / Behavioral

Scientific Approaches

We will utilize data from the All of Us program, including perceived neighborhood environment survey data, objective neighborhood features (e.g., active transportation infrastructure), Fitbit activity data, and health outcomes related to cardiometabolic health. Analytical methods will include multilevel modeling to examine the associations between neighborhood factors, physical activity, and health, accounting for potential confounders. Geographic information systems (GIS) will be used to assess objective neighborhood characteristics, while Fitbit data will provide continuous measures of physical activity.

Anticipated Findings

We anticipate finding that both perceived and objective neighborhood environments are significant predictors of physical activity and cardiometabolic health, especially among underrepresented populations. Individuals from these communities who live in objectively supportive neighborhoods and have positive perceptions of their environment may engage in higher levels of physical activity, leading to better health outcomes. This study will provide critical insights into the role of environmental factors in health disparities and inform policies aimed at creating equitable, health-promoting environments.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

  • Qianxia Jiang - Early Career Tenure-track Researcher, University of Central Florida

Urticaria project

I intend to study whether there is a connection between urticaria and rheumatological disease. Although other studies have found an association of urticaria with SLE and other autoimmune conditions, it is important to understand this risk in order to…

Scientific Questions Being Studied

I intend to study whether there is a connection between urticaria and rheumatological disease. Although other studies have found an association of urticaria with SLE and other autoimmune conditions, it is important to understand this risk in order to avoid ordering unnecessary labs and driving up medical costs for patients. I also want to look at specific demographics (age, gender, race, other medical conditions) to find other predictors that raise the likelihood of someone having an autoimmune condition.

Project Purpose(s)

  • Disease Focused Research (urticaria)

Scientific Approaches

I will perform a case-control analysis of patients with urticaria, using controls without urticaria who are matched by age, ethnicity, and sex. Odds ratios will be reported with 95% confidence intervals, and associations will be tested with the chi-squared test.
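The calculation described above can be sketched as follows. This is a minimal illustration, assuming a simple 2x2 table (autoimmune condition present or absent, by urticaria status), a Wald (Woolf) interval on the log odds ratio, and a Pearson chi-squared test without continuity correction. It treats the table as unmatched; a matched design may instead call for conditional methods. The function name and the example counts are hypothetical.

```python
import math


def odds_ratio_summary(a, b, c, d, z=1.96):
    """Odds ratio, Wald 95% CI, and Pearson chi-squared test for a 2x2 table.

    a: urticaria, outcome present      b: urticaria, outcome absent
    c: no urticaria, outcome present   d: no urticaria, outcome absent
    All four counts must be greater than zero.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(odds_ratio) - z * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + z * se_log_or)
    # Pearson chi-squared statistic for a 2x2 table (1 degree of freedom).
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-squared variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return {
        "odds_ratio": odds_ratio,
        "ci_low": ci_low,
        "ci_high": ci_high,
        "chi2": chi2,
        "p_value": p_value,
    }


# Hypothetical counts, for illustration only.
example = odds_ratio_summary(20, 80, 10, 90)
```

The Wald interval and the chi-squared test rest on different approximations, so they can disagree near the significance threshold.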

Anticipated Findings

I anticipate finding that urticaria is associated with a slightly elevated risk for autoimmune conditions but is generally benign in most patients. I expect to find higher risk in women aged 18-40, as this population is most vulnerable to autoimmune disease. However, due to the association of urticaria with rheumatoid arthritis, women aged 30-60 may also have an increased risk for rheumatological disease. Urticaria can be worked up unnecessarily; this study will help practitioners target the highest-risk populations and give them evidence-based reasoning for diagnostic choices that avoid high medical costs.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Geography
  • Access to Care

Data Set Used

Registered Tier

Research Team

Owner:

Rare immune disease haplotypes (Dataset v8)

Several population-specific variants associated with various immune disorders (e.g., APECED) have been previously identified, conferring different levels of risk to developing the disorder(s), depending on an individual's genetic ancestral population. We would like to investigate rare variants associated with immune…

Scientific Questions Being Studied

Several population-specific variants associated with various immune disorders (e.g., APECED) have been previously identified, conferring different levels of risk to developing the disorder(s), depending on an individual's genetic ancestral population. We would like to investigate rare variants associated with immune diseases, including the haplotype context, to understand the complexity and prevalence of these variants/haplotypes in various populations.

Project Purpose(s)

  • Disease Focused Research (immune system disease)
  • Ancestry

Scientific Approaches

This study will use whole genome sequencing data to determine the frequency and prevalence of rare variants associated with immune disorders, and the number of different haplotypes that exist for a particular variant, by examining the genomes of identified carriers of that variant. To do this we will employ haplotype construction and identity-by-descent methods. If sufficient numbers of each unique haplotype are identified, we will determine the age of the mutation on each mutation-bearing haplotype using the Gandolfo method (Genetics August 1, 2014 vol. 197 no. 4 1315-1327).

Anticipated Findings

Understanding the number of independent mutation-bearing haplotypes and ages of those haplotypes can give us a better perspective of the etiology and genetic history of immune diseases and the likelihood and risk of unrelated individuals having a child who develops the disorder or is a carrier. These findings may help us develop better diagnostic methods for rare immune disorders.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Data Set Used

Controlled Tier

Research Team

Owner:

  • Andrew Oler - Research Associate, National Institute of Allergy and Infectious Diseases (NIH - NIAID)
Request a Review of this Research Project

You can request that the All of Us Resource Access Board (RAB) review a research purpose description if you have concerns that this research project may stigmatize All of Us participants or violate the Data User Code of Conduct in some other way. To request a review, you must fill in a form, which you can access by selecting ‘request a review’ below.