Research Projects Directory

17,903 active projects

This information was updated 5/9/2025

The Research Projects Directory includes information about all projects that currently exist in the Researcher Workbench to help provide transparency about how the Workbench is being used. Each project specifies whether Registered Tier or Controlled Tier data are used.

Note: Researcher Workbench users provide information about their research projects independently. Views expressed in the Research Projects Directory belong to the relevant users and do not necessarily represent those of the All of Us Research Program. Information in the Research Projects Directory is also cross-posted on AllofUs.nih.gov in compliance with the 21st Century Cures Act.

427 projects have 'COVID' in the scientific questions being studied description
Summer 2025 Research

Scientific Questions Being Studied

This research aims to help us understand how health behaviors changed before, during, and after the COVID-19 pandemic. It is important to understand how people’s health behaviors have changed over time, especially since the pandemic disrupted daily life in many ways, affecting physical activity, diet, and mental wellbeing. However, it is still unclear whether people have returned to their pre-pandemic habits or if some changes have lasted longer. Using data from the All of Us Research Hub, this research will explore how different groups adapted to these challenges and aims to identify key trends that could help public health experts better understand recovery patterns and long-term effects.

Project Purpose(s)

  • Educational

Scientific Approaches

Along with reviewing previously published studies similar to this one, this study intends to use machine learning techniques to identify existing patterns in the data. The datasets will primarily consist of surveys conducted between 2018 and 2024 on physical health, mental health, and lifestyle changes. This study will also examine whether different groups, based on factors like age, gender, or income, experienced different challenges and recoveries.

Anticipated Findings

We anticipate that our findings from this study will show significant changes in physical and mental health during and after the COVID-19 pandemic compared to before it. We also anticipate that different groups of people will have different experiences and results. This study hopes to uncover findings that could help design better programs to support people's well-being in the future and share these findings in a way that makes them useful for public health planning.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Summer 2025 Research

Scientific Questions Being Studied

This research aims to help us understand how health behaviors changed before, during, and after the COVID-19 pandemic. It is important to understand how people’s health behaviors have changed over time, especially since the pandemic disrupted daily life in many ways, affecting physical activity, diet, and mental wellbeing. However, it is still unclear whether people have returned to their pre-pandemic habits or if some changes have lasted longer. Using data from the All of Us Research Hub, this research will explore how different groups adapted to these challenges and aims to identify key trends that could help public health experts better understand recovery patterns and long-term effects.

Project Purpose(s)

  • Educational

Scientific Approaches

Along with reviewing previously published studies similar to this one, this study intends to use machine learning techniques to identify existing patterns in the data. The datasets will primarily consist of surveys conducted between 2018 and 2024 on physical health, mental health, and lifestyle changes. This study will also examine whether different groups, based on factors like age, gender, or income, experienced different challenges and recoveries.

Anticipated Findings

We anticipate that our findings from this study will show significant changes in physical and mental health during and after the COVID-19 pandemic compared to before it. We also anticipate that different groups of people will have different experiences and results. This study hopes to uncover findings that could help design better programs to support people's well-being in the future and share these findings in a way that makes them useful for public health planning.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Olivia King - Undergraduate Student, College of Charleston

Replication and validation of combinatorial genetic risk factors for long COVID

Scientific Questions Being Studied

Long COVID is a debilitating chronic condition that has affected over 100 million people globally. Despite considerable global research, traditional genetic studies have identified only a single gene linked to long COVID, with little insight into the mechanisms underlying this complex heterogeneous disease. Using PrecisionLife’s unique combinatorial approach to analyzing complex, chronic diseases, Taylor et al. (2023) identified 73 genetic associations with long COVID, including mechanistic differences between different patient subgroups. These genetic associations are reflected in combinatorial disease signatures, i.e., combinations of SNP genotypes that are significantly over- or under-enriched in long COVID patients. This study aims to replicate and validate those signatures in a diverse patient population. Validated signatures will then be used as the basis for a clinical decision support tool that can be used to stratify patients based on genetic risk and mechanistic subcategorization.

Project Purpose(s)

  • Disease Focused Research (Long COVID)
  • Methods Development
  • Ancestry

Scientific Approaches

For each long COVID disease signature from Taylor et al. (2023), we will generate summary statistics (e.g., number of cases and controls, odds ratio, p-value) to evaluate the overall degree of replication in a patient cohort composed of long COVID patients and healthy controls. Signatures with an odds ratio <1 will be flagged as non-replicating. We will also test whether the count of disease signatures possessed by each patient is significantly associated with case-control status. This test will be repeated in ancestry-specific cohorts to identify potential challenges for health equity.
For each signature, we will evaluate the contribution of each component SNP to disease risk by comparing the odds ratio for patients with the full signature to the odds ratio for patients with the broader signature excluding the focal SNP. SNPs will be removed from the signature when the odds ratio of the latter exceeds that of the former. This refinement process will be repeated using a 5-fold cross-validation approach.
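
The odds-ratio comparison described above can be illustrated with a minimal sketch. This is not the study's implementation: the data layout (each patient as a mapping from SNP ID to genotype), the function names, and the 0.5 continuity correction (Haldane-Anscombe, added here only to avoid division by zero) are all assumptions made for illustration. P-values and the 5-fold cross-validation step are omitted.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical data layout: a patient maps SNP IDs to genotypes, and a
# signature is a set of (snp_id, genotype) pairs that must all be present.
Patient = Dict[str, str]
Signature = FrozenSet[Tuple[str, str]]


def carries(patient: Patient, signature: Signature) -> bool:
    """True if the patient has every SNP genotype in the signature."""
    return all(patient.get(snp) == genotype for snp, genotype in signature)


def odds_ratio(cases: List[Patient], controls: List[Patient],
               signature: Signature) -> float:
    """Odds ratio of carrying the signature in cases versus controls.

    Assumption: 0.5 is added to every cell of the 2x2 table
    (Haldane-Anscombe correction) so that empty cells do not divide by zero.
    """
    a = sum(carries(p, signature) for p in cases)      # cases with signature
    b = len(cases) - a                                 # cases without
    c = sum(carries(p, signature) for p in controls)   # controls with signature
    d = len(controls) - c                              # controls without
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


def refine(cases: List[Patient], controls: List[Patient],
           signature: Signature) -> Signature:
    """Drop a SNP whenever the signature without it has a higher odds ratio.

    Repeats until no single removal raises the odds ratio or only one SNP
    genotype remains.
    """
    current = signature
    changed = True
    while changed and len(current) > 1:
        changed = False
        full_or = odds_ratio(cases, controls, current)
        for item in sorted(current):
            reduced = current - {item}
            if odds_ratio(cases, controls, reduced) > full_or:
                current = reduced
                changed = True
                break
    return current
```

A signature whose odds ratio from `odds_ratio` falls below 1 would be the kind flagged as non-replicating in the text.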

Anticipated Findings

The main output of this study will be a set of combinatorial disease signatures that are associated with elevated risk of long COVID in multiple datasets. Each signature will be paired with summary statistics (e.g., odds ratio, p-value), allowing us to identify and annotate signatures that are individually significant. We expect to further demonstrate that a risk score based on the cumulative effects of refined signatures is significantly correlated with prevalence of long COVID and that this correlation is significant in all broad ancestry groups and not just patients with European ancestry.

Validated signatures will be further clustered based on shared mechanistic hypotheses as identified in the Taylor et al. (2023) manuscript. We expect to demonstrate that these signatures can be used to stratify the population, opening potential for precision medicine-based treatment of long COVID.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Duplicate of Analysis of COVID-19 clinical outcomes and imputed HLA

Scientific Questions Being Studied

T cells, specifically cytotoxic CD8+ T cells, are critical for the clearance of intracellular viral pathogens like SARS-CoV-2. Despite these obvious benefits, there is limited research on T cell responses to SARS-CoV-2, mainly due to the complexity of human T cell antigen processing and presentation. Recent literature has postulated the considerable variability in host HLA genetics as an explanation for the differential clinical severity of COVID-19 between individuals. Thus, select HLA alleles may have more (or less) propensity to be strongly affected by SARS-CoV-2 and subsequent variants. Our central hypothesis is that differences in CD8+ T-cell epitope recognition mediated by host MHC genes may affect the differential clinical severity of COVID-19 and the risk and clinical presentation of Post-acute sequelae of SARS-CoV-2 infection (PASC) in the All of Us national clinical dataset.

Project Purpose(s)

  • Disease Focused Research (COVID-19)
  • Population Health
  • Methods Development
  • Ancestry

Scientific Approaches

Using electronic health records and genomic information provided by the All of Us dataset, we propose a retrospective cohort study to investigate whether certain HLA alleles are associated with severe or mild COVID-19 infection. This bioinformatics study will use the All of Us Controlled Tier dataset to access anonymized electronic health records and short-read whole genome sequencing data for the MHC Class I and II regions within the All of Us Researcher Workbench, a secure cloud-based resource with statistical analysis software available. HLA genotypes will be imputed using the “R” package HIBAG/HISAT on All of Us-provided single nucleotide polymorphism array and short-read whole genome sequencing data. We plan to compare the alleles determined in All of Us and COVID clinical outcomes against HLA alleles previously reported to protect against, or confer risk of, severe COVID-19 infection.

Anticipated Findings

We anticipate confirming previously published associations between HLA alleles and COVID-19 disease severity and identifying novel ones. We hope these findings will help identify patients who are at greater risk of experiencing severe infection, such as those in need of organ transplantation, cancer patients, and demographic groups already at increased risk.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Grace Kim - Graduate Trainee, Louisiana State University Health Sciences Center, New Orleans

Collaborators:

  • San Chu - Project Personnel, Louisiana State University Pennington Biomedical Research Center
  • Nayane Silva - Student, Louisiana State University Health Sciences Center, New Orleans

Re-purposing Computable Phenotypes for Public Health Disease Surveillance

Scientific Questions Being Studied

This study proposes a novel application for a well-established method of cohort identification in biomedical research, known as computable phenotyping, for EHR-based public health surveillance of chronic diseases. At the core of the proposed research study is the repurposing of already developed and validated EHR-based computable phenotyping algorithms for disease surveillance while assessing those algorithms’ transferability or portability to two national data repositories, All of Us Research Program and National COVID Cohort Collaborative (N3C), and establishing the concordance between repurposed computable phenotypes within and across two distinct data networks. The outcome measure for evaluating computable phenotype performance will be disease prevalence estimates.

Project Purpose(s)

  • Disease Focused Research (hypertension and depression)
  • Population Health
  • Educational
  • Methods Development
  • Control Set

Scientific Approaches

EHR data will be used to apply algorithms designed for patient cohort identification from a number of large research networks (eMERGE/PheKB, PCORnet, OHDSI, and MDPHnet) to ascertain disease prevalence estimates for a number of chronic diseases and conditions. Performance of re-purposed algorithms will be compared within All of Us and between All of Us and N3C. Prevalence estimates will be validated against those from the most recent traditional national surveillance surveys (i.e., American Community Survey, NHANES, BRFSS).
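
As a rough illustration of how a prevalence estimate from a computable phenotype might be compared with a survey benchmark, the sketch below computes a prevalence, a Wilson score confidence interval, and a ratio against a reference value. The function names are hypothetical, and the choice of the Wilson interval and of a simple ratio as the concordance measure is an assumption; the project description does not specify either.

```python
import math
from typing import Tuple


def prevalence(n_cases: int, n_total: int) -> float:
    """Share of the cohort flagged by the phenotype algorithm."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_cases / n_total


def wilson_interval(n_cases: int, n_total: int,
                    z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    p = prevalence(n_cases, n_total)
    denom = 1 + z ** 2 / n_total
    centre = (p + z ** 2 / (2 * n_total)) / denom
    half = z * math.sqrt(p * (1 - p) / n_total
                         + z ** 2 / (4 * n_total ** 2)) / denom
    return centre - half, centre + half


def prevalence_ratio(estimate: float, reference: float) -> float:
    """Ratio of an EHR-based estimate to a reference (e.g. survey) estimate."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return estimate / reference
```

For example, `prevalence_ratio(prevalence(250, 1000), 0.30)` would compare a made-up EHR-based estimate with a made-up survey figure; a ratio near 1 would suggest agreement.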

Anticipated Findings

Using computable phenotyping algorithms for disease surveillance is a novel application. Re-using already developed and validated algorithms for disease surveillance is also a novel approach which would maximize the utility of resources spent in developing and validating each algorithm.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Combinatorial risk factors for ME/CFS and long COVID

Scientific Questions Being Studied

Despite considerable research into ME/CFS and intense global efforts to understand the biological mechanisms of long COVID, traditional genetic studies have not reported replicable genetic findings for either, with little insight into the mechanisms underlying these complex heterogeneous diseases.
PrecisionLife have developed a unique combinatorial analytics approach and used it to identify the first replicable disease signatures for both ME/CFS and long COVID. These disease signatures comprise combinations of SNP genotypes that are significantly over- or under-enriched in ME/CFS and/or long COVID patients.
This study aims to analyze, replicate and validate these signatures in a diverse patient population. Validated signatures will then be used as the basis for a health care tool that can be used to stratify patients based on genetic risk and mechanistic subcategorization.

Project Purpose(s)

  • Disease Focused Research (Post-viral fatigue syndrome)
  • Methods Development
  • Ancestry

Scientific Approaches

For each disease signature, we will generate summary statistics (e.g., number of cases, number of controls, odds ratio, p-value) to evaluate the overall degree of replication in All of Us cohorts composed of ME/CFS or long COVID patients and healthy controls. Signatures with an odds ratio <1 will be flagged as “non-replicating”. We will also test whether the count of disease signatures possessed by each patient is significantly associated with case-control status. This test will be repeated in ancestry-specific cohorts.
For each signature, we will also evaluate the contribution of each component SNP-genotype to disease risk by comparing the odds ratio for patients with the full signature to the odds ratio for patients with the broader signature excluding the focal SNP. SNP-genotypes will be removed from the signature when the odds ratio of the latter exceeds that of the former. This replication/refinement process will be repeated using a 5-fold cross-validation approach.

Anticipated Findings

The main output of this study will be a set of combinatorial disease signatures that have been demonstrated to be associated with elevated risk of ME/CFS or long COVID in multiple datasets. Each signature will be paired with an odds ratio and p-value, allowing us to identify signatures that are individually significant. We expect to further demonstrate that a risk score based on the cumulative effects of refined signatures is significantly correlated with ME/CFS or long COVID and that this correlation is significant in all broad ancestry groups (e.g., African-American, Hispanic, Asian) and not just patients with white European ancestry.
Validated signatures will be further clustered based on shared mechanistic hypotheses, and we will overlay phenotypic data in each of the patient subgroups to assess any differences. We expect to demonstrate that these signatures can be used to stratify the population, opening potential for precision medicine-based treatments.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Sayoni Das - Senior Researcher, PrecisionLife Ltd.

Collaborators:

  • Matthew Pearson - Other, PrecisionLife Ltd.
  • Marianna Sanna - Other, PrecisionLife Ltd.
  • Jason Sardell - Other, PrecisionLife Ltd.

ADRD/MCI and COVID-19 Vaccination

Scientific Questions Being Studied

Research Aim 1: To determine the difference in COVID-19 vaccination rates (at least one dose, two doses, and boosters or three full doses) between ADRD/MCI individuals and those without ADRD/MCI.
Research Aim 2: To assess the variation in adverse reactions such as swelling, tiredness, muscle pain, chills, fever, following the COVID-19 vaccination in individuals with ADRD/MCI compared to those without ADRD/MCI.
Research Aim 3: To investigate the influence of social determinants of health on COVID-19 vaccination rates and vaccine hesitancy in individuals with ADRD/MCI.

Project Purpose(s)

  • Population Health
  • Social / Behavioral

Scientific Approaches

The data source for this study protocol will be the All of Us (AoU) Researcher Workbench. We established a workspace named "ADRD/MCI and COVID-19 Vaccination" within the AoU Researcher Workbench.

A retrospective cross-sectional study will be conducted utilizing data from the All of Us (AoU) Researcher Workbench. Relevant data fields are extracted from sources including demographic information, COVID-19 Vaccine Survey, Basic Survey, Health Access & Utilization, Social Determinants of Health, and Electronic Health Record (EHR) data. Data on vaccination, adverse reactions and vaccine hesitancy will be collected through COVID-19 vaccine survey questionnaires. Propensity score matching and binary logistic regression will be applied to assess the vaccination rates and vaccine hesitancy, while controlling for demographic characteristics and social determinants of health factors.

Anticipated Findings

The proposed study will contribute to scientific knowledge in several ways. First, it will be a comprehensive analysis of vaccination coverage for individuals with ADRD/MCI, examining the first, second, and booster shots. Second, it will assess adverse reactions to COVID-19 vaccines among individuals with ADRD/MCI and those without. This information can offer valuable data on the safety and tolerability of vaccines in vulnerable populations. Healthcare professionals can then make more informed decisions about vaccine administration and monitoring. Third, the proposed study will integrate social determinants of health into its analysis. Recognizing the potential influence of social determinants of health on vaccination rates is crucial.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Data Set Used

Controlled Tier

Research Team

Owner:

T2D and CVD Outcomes Among Black and Hispanic Populations

Scientific Questions Being Studied

The specific scientific research question is: How do type 2 diabetes (T2D) and cardiovascular disease (CVD) affect Black and Hispanic populations in the United States, and how do their effects differ between those with and without COVID-19 and across different regions of the United States? These questions will be assessed by investigating:
a. Is there greater prevalence of T2D and CVD outcomes among Black and Hispanic populations with COVID-19 and in different regions of the United States?
b. Are there genetic susceptibilities to T2D, CVD, and COVID-19 among Black and Hispanic populations that impact outcomes and vary by geographic region?

Project Purpose(s)

  • Population Health
  • Social / Behavioral
  • Ancestry

Scientific Approaches

This study will include a cohort of Black, Hispanic, and non-Hispanic White participants aged 18 and older. The data will include survey responses, electronic health records, physical measurements, and genetic data. Descriptive statistics will be calculated, and logistic regression will be used to assess odds ratios for associations. We will analyze common and rare variants from whole-genome sequencing data to control for genetic susceptibility to T2D and CVD.

Anticipated Findings

The anticipated findings from this study will provide greater insight into the magnitude of the public health burden of COVID-19, T2D, and CVD among Black and Hispanic populations and will illustrate the differences in patient profiles and outcomes. Our goal is to make a significant impact in addressing T2D and CVD disparities among Black and Hispanic populations by quantifying the disproportionate effect of COVID-19 on Black and Hispanic T2D and CVD outcomes in different regions of the United States, thereby informing the need for more health services among these populations. The All of Us Research Program Workbench can provide a unique nationwide and U.S. state-level comparison of these associations.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Geography
  • Access to Care

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Jason Karnes - Early Career Tenure-track Researcher, University of Arizona
  • Grace Leito - Graduate Trainee, University of Arizona
  • Anthony Vicenti - Project Personnel, University of Arizona
  • shuai yang - Graduate Trainee, University of Arizona

DB8 of CRS study

Scientific Questions Being Studied

What are some of the significant characteristics of COVID-19 patients who lost their sense of smell?
Why important: to understand the potential cause of the loss of smell in COVID-19 patients.

Project Purpose(s)

  • Disease Focused Research (covid 19)
  • Methods Development

Scientific Approaches

Build ML models to discover potential patterns among COVID-19 patients who experienced smell loss.

Anticipated Findings

Find significant features that can predict smell loss in COVID-19 patients and potentially guide the patients' recovery process.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Renjie Hu - Early Career Tenure-track Researcher, University of Houston

Collaborators:

  • Zain Mehdi - Graduate Trainee, Houston Methodist Research Institute
  • Wajih Hassan Raza - Graduate Trainee, University of Houston
  • Tania Banerjee - Early Career Tenure-track Researcher, University of Houston
  • Roshan Dongre - Graduate Trainee, Houston Methodist Research Institute
  • Khoa Nguyen - Student, University of Houston
  • Natalia Freire - Undergraduate Student, University of Houston
  • Najm Khan - Graduate Trainee, Rutgers, The State University of New Jersey
  • Meher Gajula - Graduate Trainee, University of Houston
  • Likhitha Reddy Kesara - Graduate Trainee, University of Houston
  • Koyal Ansingkar - Graduate Trainee, Houston Methodist Research Institute
  • Jagan Mohan Reddy Dwarampudi - Graduate Trainee, University of Houston
  • Faizaan Khan - Graduate Trainee, Houston Methodist Research Institute
  • Ethan Hoang - Undergraduate Student, University of Houston
  • Ying Lin - Early Career Tenure-track Researcher, University of Houston
  • Sicong Chang - Graduate Trainee, University of Houston
  • Aatin Dhanda - Graduate Trainee, Rutgers, The State University of New Jersey
  • Aakash Agarwal - Graduate Trainee, University of Houston

Duplicate of Long_Covid_CTDv8

Scientific Questions Being Studied

Our primary goal is to understand the correlation between long COVID and major depressive disorder. This will provide hypotheses guiding clinical and research interventions focused on depression in post-COVID patients.

Project Purpose(s)

  • Disease Focused Research (Long COVID)

Scientific Approaches

We will examine the relationship between major depressive disorder and long COVID diagnosis. We will use EHR-curated diagnoses and laboratory values, as well as machine learning algorithms, to accurately predict phenotypes. Then we will use statistical methods to investigate the correlation between these diagnoses.

Anticipated Findings

We expect to find a higher prevalence of major depressive disorder in patients with long COVID. We also expect to find lower levels of activity and less regular sleep with these diagnoses, which would motivate studies/interventions to reduce these health disparities.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Christopher Chen - Undergraduate Student, Vanderbilt University

CAHB

Scientific Questions Being Studied

What long-term trends in physical activity have emerged in post-COVID health data, and how do these trends correlate with longevity?

Project Purpose(s)

  • Population Health
  • Social / Behavioral

Scientific Approaches

Longitudinal Analysis: Examine changes in physical activity patterns over time and their impact on longevity.
Statistical Software: R and Python for data analysis, leveraging libraries such as Pandas for data manipulation and Matplotlib/Seaborn for visualization.

Anticipated Findings

The analysis is likely to identify significant trends in physical activity post-COVID, detailing how these trends correlate with longevity and health outcomes.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Collaborators:

  • Ashton Carter - Undergraduate Student, Chapman University

Loneliness and Older Adults

Scientific Questions Being Studied

Social isolation and loneliness have had significant health consequences for older adults throughout the COVID-19 pandemic. However, little is known about the prevalence and correlates of loneliness among older adults, especially among those from underrepresented demographics.

Project Purpose(s)

  • Disease Focused Research (major depressive disorder)
  • Population Health
  • Social / Behavioral

Scientific Approaches

Descriptive statistics will be used to characterize the prevalence of loneliness (using the UCLA Loneliness Scale Short Form) among older adults overall and by each sociodemographic characteristic. Logistic regressions will be used to estimate the associations between loneliness and depression and suicidal ideation (using PHQ-9 data), adjusting for age, sex, race, ethnicity, and socioeconomic factors.

Anticipated Findings

We believe that a significant number of older adults will have high scores of loneliness throughout the COVID-19 pandemic, especially among socioeconomically disadvantaged groups. We also hypothesize that the odds of self-reporting high depression as well as suicidal ideation will be elevated among those reporting high levels of loneliness compared to those not reporting high levels of loneliness.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Kevin Yang - Research Fellow, University of California, San Diego
  • Jaclyn Bergstrom - Project Personnel, University of California, San Diego

Collaborators:

  • Khusnnora Satybaldiyeva - Graduate Trainee, University of California, San Diego

Discrimination, Depression, Suicide

Scientific Questions Being Studied

As part of a grad school course, we plan to look at the association of everyday discrimination during COVID with depressive and suicidal symptoms.

Project Purpose(s)

  • Population Health
  • Social / Behavioral
  • Educational

Scientific Approaches

We will use the COVID-19 Participant Experience (COPE) survey and the Patient Health Questionnaire (PHQ-9). We will conduct mixed effects modeling and lagged analyses. We may also conduct mediation analyses.

Anticipated Findings

We anticipate that people who experience higher levels of discrimination will be more likely to have increased symptoms of depression and suicidality.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Registered Tier

Research Team

Owner:

  • Sarah Lee - Graduate Trainee, University of Massachusetts Medical School

COVID + Autoimmune Disease

Scientific Questions Being Studied

We are hoping to see if there is any association between COVID infection and/or vaccination and autoimmune connective tissue diseases such as scleroderma and dermatomyositis.

Project Purpose(s)

  • Disease Focused Research (Dermatomyositis, scleroderma)

Scientific Approaches

We plan to search the database for people with COVID infection and/or vaccination and evaluate if there are any associations with dermatomyositis and scleroderma.

Anticipated Findings

We think there may be a positive correlation. This would be helpful to know for patients with these diseases.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Sophia Manduca - Graduate Trainee, New York University, Grossman School of Medicine

Collaborators:

  • Steven Friedman - Research Associate, New York University, Grossman School of Medicine
  • Soutrik Mandal - Other, New York University, Grossman School of Medicine
  • Jill Shah - Graduate Trainee, New York University, Grossman School of Medicine

COVID-19, Sleep, PAL and Lung Function

COVID-19 presents with scarring of lung tissue, often resulting in reduced lung compliance, which could compromise ventilation and quality of life. This study aims to evaluate the association between the COVID-19 profile and selected clinical and functional parameters (such…

Scientific Questions Being Studied

COVID-19 presents with scarring of lung tissue, often resulting in reduced lung compliance, which could compromise ventilation and quality of life. This study aims to evaluate the association between the COVID-19 profile and selected clinical and functional parameters (such as sleep quality, fatigue, cardiorespiratory fitness, and physical activity level). It will also explore the distribution of these outcomes by race and gender.

Project Purpose(s)

  • Disease Focused Research (COVID-19)
  • Educational

Scientific Approaches

COVID-19 cohort using clinical and functional datasets.
The research method is a cross-sectional design using secondary data analysis of the All of Us dataset.

Research question: What is the association between the COVID-19 profile (previous infection, severity, vaccination status) and lung function, physical activity level (PAL), sleep, and quality of life, and does this association differ by gender or race?

Anticipated Findings

Hypothesis: Previous history of severe COVID-19 infection could be associated with poor quality of life, reduced PAL, and poor sleep quality; and this association could be different by gender and race.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Collaborators:

  • Tafadzwa Machipisa - Research Fellow, University of Pennsylvania
  • Joseph Aneke - Early Career Tenure-track Researcher, Hampton University
  • Graham Chakafana - Early Career Tenure-track Researcher, Hampton University

Risk of Life-Threatening Infections Secondary to COVID-19 Diagnosis

We will investigate the risk of life-threatening infections secondary to COVID-19 diagnosis. Such infections may include sepsis, endocarditis, meningitis, encephalitis, and other central nervous system infections in patients who experienced a wide range of COVID-19 severity, from mild symptoms to…

Scientific Questions Being Studied

We will investigate the risk of life-threatening infections secondary to COVID-19 diagnosis. Such infections may include sepsis, endocarditis, meningitis, encephalitis, and other central nervous system infections in patients who experienced a wide range of COVID-19 severity, from mild symptoms to life-threatening hospitalization. We will include participants in the database who contracted COVID-19 and evaluate their risk of secondary infection over a follow-up period of three months and beyond, a previously understudied temporal relationship. Additionally, we intend to explore participant and public perspectives on COVID-19 for potential associations with secondary infection risk.
This study will be the first investigation on this question using a nationally scaled cohort in the United States, the first to evaluate risk over a follow-up period longer than three months, and an opportunity to contribute to risk findings reported in other health databases.

Project Purpose(s)

  • Disease Focused Research (COVID-19, secondary infections to COVID-19)
  • Social / Behavioral

Scientific Approaches

We will use the All of Us database to build cohorts via its built-in software, identifying individuals with a positive COVID-19 test or COVID-related hospitalization and matching them to individuals without a COVID-19 diagnosis based on factors such as age, sex, and comorbidities. Risk will likely be assessed using hazard ratios through Cox regression models or similar methods, with significant attention given to accounting for confounding factors such as co-infections and other causes of life-threatening infections.
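The matching step described above can be illustrated with a minimal sketch. It assumes a greedy 1:1 match on exact sex and nearest age; the `Person` record, the `match_controls` helper, the two-year age window, and the toy data are all hypothetical and are not part of the project description or the Workbench's built-in cohort tools. The Cox regression step is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    pid: int      # hypothetical participant identifier
    age: int
    sex: str
    covid: bool   # True if the person has a COVID-19 diagnosis


def match_controls(people, max_age_gap=2):
    """Greedy 1:1 matching: pair each case with the closest-aged unused
    control of the same sex, within max_age_gap years."""
    cases = [p for p in people if p.covid]
    controls = [p for p in people if not p.covid]
    used = set()
    pairs = []
    for case in cases:
        candidates = [
            c for c in controls
            if c.pid not in used
            and c.sex == case.sex
            and abs(c.age - case.age) <= max_age_gap
        ]
        if not candidates:
            continue  # leave the case unmatched rather than force a poor match
        # Ties on age distance are broken by the smaller identifier.
        best = min(candidates, key=lambda c: (abs(c.age - case.age), c.pid))
        used.add(best.pid)
        pairs.append((case.pid, best.pid))
    return pairs


# Toy data, invented for illustration only.
people = [
    Person(1, 64, "F", True),
    Person(2, 41, "M", True),
    Person(3, 65, "F", False),
    Person(4, 63, "F", False),
    Person(5, 50, "M", False),
    Person(6, 42, "M", False),
]
pairs = match_controls(people)
print(pairs)
```

Greedy matching is simple but order-dependent; a real analysis would likely also match on comorbidities and might use an optimal or propensity-based method instead.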

Anticipated Findings

We anticipate that individuals with a history of COVID-19 will have an increased risk of life-threatening secondary infections compared to those without prior COVID-19. We also hypothesize that greater COVID-19 severity correlates with a higher risk of secondary infection. Such findings would confirm previous investigations on this topic. This study will be the first to provide extensive follow-up on post-COVID-19 infection risk within a nationally representative U.S. cohort. We expect to detect potential risk factors influencing secondary infection susceptibility, such as participant demographics, disease severity, comorbidities, and social determinants of health. Our findings may support prior studies using other databases by investigating previously unstudied COVID-19 patients/cases or reveal novel trends in secondary infections, contributing to a broader understanding of post-COVID-19 complications.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Yesol Sapozhnikov - Research Fellow, University of Idaho
  • Jonathan Moore - Research Fellow, University of Idaho

COVID-19 vaccine and HIV

People living with HIV appear to have an elevated risk of severe coronavirus disease 2019 (COVID-19) outcomes, with a poorer prognosis. It is imperative to achieve high COVID-19 vaccination coverage rates in this group. This project aims to understand the…

Scientific Questions Being Studied

People living with HIV appear to have an elevated risk of severe coronavirus disease 2019 (COVID-19) outcomes, with a poorer prognosis. It is imperative to achieve high COVID-19 vaccination coverage rates in this group. This project aims to understand COVID-19 vaccine hesitancy and uptake among people living with HIV compared with people living without HIV. Understanding the determinants of vaccine hesitancy among people living with HIV and designing tailored measures to alleviate hesitancy would help improve COVID-19 vaccination coverage in this population.

Project Purpose(s)

  • Disease Focused Research (Human immunodeficiency virus infectious disease, COVID-19, vaccine)
  • Population Health
  • Social / Behavioral

Scientific Approaches

We will build HIV and COVID-19 datasets using data from all the different domains of EHR and surveys. We will use R or Python to program and code the datasets. The statistical methods involve descriptive statistics (e.g., chi-square, t-test), regression models (e.g., logistic regression, Cox proportional hazards modelling), advanced matching methods (e.g., propensity score matching), and other advanced statistical methods.
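As one illustration of the descriptive statistics named above, here is a minimal sketch of a Pearson chi-square test on a 2x2 table, written with the Python standard library only. The function name `chi_square_2x2`, the absence of a continuity correction, and the counts are assumptions made for the example; they are not taken from the project or from All of Us data.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
        [[a, b],
         [c, d]]
    Returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column needs a non-zero total")
    statistic = n * (a * d - b * c) ** 2 / denominator
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows are groups (with HIV, without HIV),
# columns are (vaccine hesitant, not hesitant).
statistic, p_value = chi_square_2x2(30, 70, 50, 50)
print(round(statistic, 3), round(p_value, 4))
```

In practice a statistics package in R or Python would be used for this test; the hand-written version only shows what the statistic computes.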

Anticipated Findings

Understanding whether and how people living with HIV differ in COVID-19 vaccine hesitancy and/or uptake will inform tailored messaging to build vaccine confidence, address questions about vaccine benefits, and support informed vaccination decision-making to promote COVID-19 vaccine uptake in this population, particularly among underrepresented people living with HIV.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Registered Tier

Research Team

Owner:

  • Xueying YANG - Research Fellow, University of South Carolina

Collaborators:

  • Ruilie Cai - Graduate Trainee, University of South Carolina
  • Jiajia Zhang - Late Career Tenured Researcher, University of South Carolina

tabpfn_controlled

We intend to investigate how genetic data would influence long-term COVID prediction. We hypothesise that genetic data might improve long-term COVID prediction performance.

Scientific Questions Being Studied

We intend to investigate how genetic data would influence long-term COVID prediction. We hypothesise that genetic data might improve long-term COVID prediction performance.

Project Purpose(s)

  • Disease Focused Research (Long COVID)

Scientific Approaches

We included all patients with EHR data and at least one of the mobile, survey, or genetic data. We will use machine learning models for such predictions.

Anticipated Findings

We hypothesize that genetic data would improve model performance. We will identify important features that are related to long COVID.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Srushti Gangireddy - Project Personnel, Vanderbilt University Medical Center
  • Christopher Guardo - Project Personnel, Vanderbilt University Medical Center

Spring 2025 of FC Gamma Receptor (IIA) Mutations and HIV

The intent of this project is to study how mutations of the Fc Gamma Receptor (IIA) affect an individual's susceptibility to HIV. Since Fc Gamma Receptors have a hand in the humoral and innate immune response, polymorphisms have been associated…

Scientific Questions Being Studied

The intent of this project is to study how mutations of the Fc Gamma Receptor (IIA) affect an individual's susceptibility to HIV. Since Fc Gamma Receptors have a hand in the humoral and innate immune response, polymorphisms have been associated with susceptibility to certain conditions and illnesses, such as COVID-19 and specific cancers. In this case, we wish to determine if there is a relationship between polymorphisms and susceptibility to HIV, a global health issue that has claimed the lives of millions.

Project Purpose(s)

  • Educational

Scientific Approaches

We plan to use data sets from the human genome database to analyze the prevalence of HIV in individuals with FcγRIIA polymorphism. The tools we plan to use are simple data analysis strategies and R programming to conduct this study.

Anticipated Findings

We anticipate finding a possible relationship between FcγRIIA polymorphisms and susceptibility to HIV, given that these polymorphisms have links to other chronic illnesses and increased susceptibility to infection. Our findings could potentially alter the way we think about conditions such as HIV, as well as disease prevention itself.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Yanett Alegria - Undergraduate Student, Arizona State University
  • Mohga Talib - Undergraduate Student, Arizona State University
  • Gerardo Rodriguez - Undergraduate Student, Arizona State University

Activity and sleep differences in wearable data

We wish to characterize sleep, activity, and other wearable data within the All of Us Research Program Cohort, to better understand population averages and differences across demographic groups. This may include time series analyses to determine how different life events…

Scientific Questions Being Studied

We wish to characterize sleep, activity, and other wearable data within the All of Us Research Program Cohort, to better understand population averages and differences across demographic groups. This may include time series analyses to determine how different life events (e.g. the COVID-19 pandemic, an individual diagnosis, seasonality) affect wearable data.

Project Purpose(s)

  • Population Health

Scientific Approaches

We will begin with basic characterization of the distributions of wearable data across different demographic groups and expand as time and billing credits allow.

Anticipated Findings

To our knowledge, the WEAR study includes the largest study-sponsored distribution of wearable devices to participants. This presents a unique opportunity to understand the data of people who would not otherwise purchase the devices for themselves.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Registered Tier

Research Team

Owner:

Covid + DM + Scleroderma

We are hoping to see if there is any association between COVID infection and/or vaccination and autoimmune connective tissue diseases such as scleroderma and dermatomyositis.

Scientific Questions Being Studied

We are hoping to see if there is any association between COVID infection and/or vaccination and autoimmune connective tissue diseases such as scleroderma and dermatomyositis.

Project Purpose(s)

  • Disease Focused Research (Dermatomyositis, scleroderma)

Scientific Approaches

We plan to search the database for people with COVID infection and/or vaccination and evaluate if there are any associations with dermatomyositis and scleroderma.

Anticipated Findings

We think there may be a positive correlation. This would be helpful to know for patients with these diseases.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Sophia Manduca - Graduate Trainee, New York University, Grossman School of Medicine

Collaborators:

  • Kaitlin Martins - Research Assistant, New York University, Grossman School of Medicine

Impact of Hominin-specific variants

Modern humans differ significantly from our closest evolutionary relatives, such as Neandertals and Denisovans. Although genomic changes unique to modern or archaic humans have been identified, their functional implications remain largely unknown. We focus on changes that may affect neurodevelopment,…

Scientific Questions Being Studied

Modern humans differ significantly from our closest evolutionary relatives, such as Neandertals and Denisovans. Although genomic changes unique to modern or archaic humans have been identified, their functional implications remain largely unknown. We focus on changes that may affect neurodevelopment, metabolism, and behavior to identify changes that result in phenotypes that differ between modern and extinct humans. Moreover, studies have shown that around 2% of the genome of present-day non-African humans is contributed by Neandertals, and many of these archaic variants are associated with disease-related phenotypes, such as pain sensation or severe COVID. Hence, studying these modern or archaic hominin-specific changes will help us to understand not only the biological roots of the difference between us and Neandertals but also the underlying mechanism of disease-related phenotypes.

Project Purpose(s)

  • Ancestry

Scientific Approaches

We will use large-scale genomic datasets, such as All of Us, to investigate the persistence and effects of archaic variants - those inherited from Denisovans, Neandertals, or common ancestral populations. By integrating high-coverage archaic hominin genomes with present-day human genotype data, we will identify variants of archaic origin that are still segregating at low frequency in modern populations. Using statistical and population genetics methods, including allele frequency analysis and phenotype association testing, we aim to assess the functional impact of these variants. We will apply custom Python/R scripts for data processing and analysis. This approach allows us to test hypotheses about archaic introgression and its contribution to human phenotypic diversity and disease susceptibility in contemporary populations.

Anticipated Findings

This study will enhance our understanding of hominin-specific genetic variants and their physiological consequences. By investigating the modern and archaic human-specific changes, we aim to elucidate how these genetic modifications influence metabolic pathways, cellular function, and overall physiological traits. Our findings may reveal evolutionary trade-offs associated with these variants, shedding light on their potential roles in energy metabolism, adaptation to environmental pressures, and disease susceptibility. By integrating genomic, biochemical, and functional analyses, this research will provide new insights into the evolutionary forces shaping human physiology. More broadly, it will contribute to the growing knowledge of how archaic introgression and modern human-specific adaptations have shaped genetic diversity and trait variation in present-day populations.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Shin-Yu Lee - Research Fellow, Okinawa Institute of Science and Technology School Corporation

Duplicate of AOU_Recover_Long_Covid_v6-[uses_v8]

The purpose of this workspace was to implement the published XGBoost machine learning (ML) model, which was developed using the National COVID Cohort Collaborative’s (N3C) EHR repository, to identify potential patients with PASC/Long COVID in the All of Us Research Program.

Scientific Questions Being Studied

The purpose of this workspace was to implement the published XGBoost machine learning (ML) model, which was developed using the National COVID Cohort Collaborative’s (N3C) EHR repository, to identify potential patients with PASC/Long COVID in the All of Us Research Program.

Project Purpose(s)

  • Disease Focused Research (Long COVID)

Scientific Approaches

To achieve this objective, data science workflows were used to apply ML algorithms on the Researcher Workbench. This effort expanded the number of participants used to evaluate the ML models that identify risk of PASC/Long COVID; it also serves to validate the efforts of one team and provide insight to other teams. These models were implemented within the All of Us Controlled Tier data (C2022Q2R2), which was last refreshed on June 22, 2022. We intend to provide a step-by-step guide for the implementation of N3C's ML Model for identification of PASC/Long COVID Phenotype in the All of Us dataset.

Anticipated Findings

We intend to provide a step-by-step guide for the implementation of N3C's ML Model for identification of PASC/Long COVID Phenotype in the All of Us dataset.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

(CDRv8) Leveraging wearables for secondary prevention

We are interested in utilizing data from a wearable Fitbit device to develop a model to investigate 1) how daily step count, night-time sleep duration, and resting heart rate prior to and immediately after disease diagnosis are associated with disease…

Scientific Questions Being Studied

We are interested in utilizing data from a wearable Fitbit device to develop a model to investigate 1) how daily step count, night-time sleep duration, and resting heart rate prior to and immediately after disease diagnosis are associated with disease complications. We will look at multiple diseases, including Type 2 Diabetes and COVID-19.

Project Purpose(s)

  • Disease Focused Research (COVID-19, Type 2 Diabetes)

Scientific Approaches

We will use Fitbit, covariate (age, sex, SES, etc.), and outcome data.

We will use self-supervised learning to train a model to determine which features are most important for secondary prevention.

Anticipated Findings

We anticipate that this analysis will inform guidance for preventing disease progression and for identifying individuals at risk of disease complications.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Evelynne Fulda - Graduate Trainee, National Human Genome Research Institute (NIH - NHGRI)
  • Bennett Waxse - Research Fellow, National Institute of Allergy and Infectious Diseases (NIH - NIAID)

Duplicate of AOU_Recover_Long_Covid_v6

The purpose of this workspace was to implement the published XGBoost machine learning (ML) model, which was developed using the National COVID Cohort Collaborative’s (N3C) EHR repository, to identify potential patients with PASC/Long COVID in the All of Us Research Program.

Scientific Questions Being Studied

The purpose of this workspace was to implement the published XGBoost machine learning (ML) model, which was developed using the National COVID Cohort Collaborative’s (N3C) EHR repository, to identify potential patients with PASC/Long COVID in the All of Us Research Program.

Project Purpose(s)

  • Disease Focused Research (Long COVID)

Scientific Approaches

To achieve this objective, data science workflows were used to apply ML algorithms on the Researcher Workbench. This effort expanded the number of participants used to evaluate the ML models that identify risk of PASC/Long COVID; it also serves to validate the efforts of one team and provide insight to other teams. These models were implemented within the All of Us Controlled Tier data (C2022Q2R2), which was last refreshed on June 22, 2022. We intend to provide a step-by-step guide for the implementation of N3C's ML Model for identification of PASC/Long COVID Phenotype in the All of Us dataset.

This workspace uses the older v6 version because it was used in the original study I'm duplicating here.

Anticipated Findings

We intend to provide a step-by-step guide for the implementation of N3C's ML Model for identification of PASC/Long COVID Phenotype in the All of Us dataset.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

1 - 25 of 427
Request a Review of this Research Project

You can request that the All of Us Resource Access Board (RAB) review a research purpose description if you have concerns that this research project may stigmatize All of Us participants or violate the Data User Code of Conduct in some other way. To request a review, you must fill in a form, which you can access by selecting ‘request a review’ below.