Research Projects Directory

3,684 active projects

This information was updated 2/5/2023

The Research Projects Directory includes information about all projects that currently exist in the Researcher Workbench to help provide transparency about how the Workbench is being used. Each project specifies whether Registered Tier or Controlled Tier data are used.

Note: Researcher Workbench users provide information about their research projects independently. Views expressed in the Research Projects Directory belong to the relevant users and do not necessarily represent those of the All of Us Research Program. Information in the Research Projects Directory is also cross-posted on AllofUs.nih.gov in compliance with the 21st Century Cures Act.

Climate and Dengue Relations

Our research group chose the topic of dengue fever to research for the purpose of a group project as an assignment in our biology class at Arizona State University. Each of us has our own reasons for our interest in…

Scientific Questions Being Studied

Our research group chose the topic of dengue fever to research for the purpose of a group project as an assignment in our biology class at Arizona State University. Each of us has our own reasons for our interest in this disease, and we found the focus of climatological effects on dengue to be something worth studying. Furthermore, the topic of vector-borne illness is an area we collectively wanted to research. After some preliminary research, we found that little research has been completed on dengue within the United States. While this disease isn’t currently considered to be epidemic in the U.S., we found it to be common in U.S. territories such as Puerto Rico.

The research question we wish to address is:

In areas of the United States impacted by dengue, do external factors such as temperature and humidity have an effect on the case numbers and the severity of dengue fever symptoms?

Project Purpose(s)

  • Disease Focused Research (dengue disease)
  • Control Set

Scientific Approaches

We plan to use the data on this platform to research dengue cases within the United States. We want to compare symptom severity, case prevalence, and transmission rates to local climate data provided by the National Oceanic and Atmospheric Administration (NOAA). Our aim is to contribute to dengue research within the U.S., especially regarding its relation to climatological factors such as rainfall and temperature.

The data provided by NOAA will be used to track weather in U.S. regions affected by dengue. Our primary focus is on the U.S. territory of Puerto Rico. The variables we are tracking are temperature, precipitation, and humidity. We plan to compile data starting in 2021 and establish a monthly average for each of the variables listed. Correlation analysis will be used to relate the weather variables to dengue cases.
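As a rough sketch of the monthly-average and correlation step described above, the following Python example averages daily temperature readings by month and correlates the monthly means with monthly case counts. Every value and name here is hypothetical and comes from neither NOAA nor All of Us data. The project does not say which correlation coefficient will be used, so the choice of Pearson's r is an assumption.

```python
from collections import defaultdict
from math import sqrt


def monthly_means(daily_readings):
    """Average (month, value) pairs into one mean value per month."""
    buckets = defaultdict(list)
    for month, value in daily_readings:
        buckets[month].append(value)
    return {month: sum(vals) / len(vals) for month, vals in buckets.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical daily temperature readings as (month, degrees C); not NOAA data.
daily_temps = [
    ("2021-01", 24.0), ("2021-01", 26.0),
    ("2021-02", 25.0), ("2021-02", 27.0),
    ("2021-03", 27.0), ("2021-03", 29.0),
]
# Hypothetical monthly dengue case counts; not All of Us data.
monthly_cases = {"2021-01": 10, "2021-02": 14, "2021-03": 20}

temps = monthly_means(daily_temps)
months = sorted(temps)
r = pearson([temps[m] for m in months], [monthly_cases[m] for m in months])
print(f"Pearson r between monthly temperature and cases: {r:.3f}")
```

The same `pearson` helper could be applied to precipitation or humidity by swapping in a different list of daily readings. A correlation computed this way describes association only and says nothing about causation.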

Anticipated Findings

We anticipate finding that dengue cases increase during months with higher humidity, precipitation, and temperatures. Our findings would contribute to current research, since there is limited research on dengue within the U.S. in relation to climate. This disease has a high global impact, and while it is thoroughly researched in areas where it is considered to be epidemic, there are gaps in the research completed within the U.S.

With climate change, the risk of vector-borne illnesses increases, especially for diseases that are transmitted by arthropods. Climate-related study of vector-borne diseases could be an important area of research in the near future. We aim to improve the current research on this topic and raise awareness of this issue.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Example

This is a test workspace to let me become familiar with using All of Us. These results will not be sent to anyone.

Scientific Questions Being Studied

This is a test workspace to let me become familiar with using All of Us. These results will not be sent to anyone.

Project Purpose(s)

  • Educational

Scientific Approaches

This is a test workspace to let me become familiar with using All of Us. These results will not be sent to anyone.

Anticipated Findings

This is a test workspace to let me become familiar with using All of Us. These results will not be sent to anyone.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Benjamin Hu - Undergraduate Student, Ohio State University

Minority Patients and COPD Risk Factors Prediction

This study aims to explore minority patients (Hispanic/Latino/Spanish – Black/African/African American) who have been diagnosed with Chronic Obstructive Pulmonary Disease (COPD) to answer the following questions: 1. What are the common risk factors between patients who have been diagnosed with…

Scientific Questions Being Studied

This study aims to explore minority patients (Hispanic/Latino/Spanish – Black/African/African American) who have been diagnosed with Chronic Obstructive Pulmonary Disease (COPD) to answer the following questions:

1. What are the common risk factors between patients who have been diagnosed with COPD?
2. What is the impact of social determinants of health on surviving COPD patients?
3. What are the most important causes of the metabolic syndrome in COPD?
4. Is there a relationship between COPD and other diseases such as lung cancer and cardiovascular disease?
5. What is the relationship between COPD and physical activity?
6. What is the relationship between COPD and hospital readmission?

Project Purpose(s)

  • Educational

Scientific Approaches

In this study, we will build a cohort of minority patients who have been diagnosed with COPD and use data analytics methods to investigate how strongly certain factors affect patients' health.

Anticipated Findings

This study anticipates findings that will contribute to the body of knowledge surrounding COPD. Also, this study expects to show positive associations between COPD and some risk factors.

Demographic Categories of Interest

  • Race / Ethnicity
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Registered Tier

Research Team

Owner:

Collaborators:

  • Gregory Ramsey - Mid-career Tenured Researcher, Morgan State University

GSD XV

The purpose of this project is to investigate health outcomes of glycogen storage disease type XV, a rare disorder described in fewer than 100 patients that is related to pathogenic variants in the GYG1 gene. The penetrance of the cardiac…

Scientific Questions Being Studied

The purpose of this project is to investigate health outcomes of glycogen storage disease type XV, a rare disorder described in fewer than 100 patients that is related to pathogenic variants in the GYG1 gene. The penetrance of the cardiac pathogenic variant is unknown. This dataset could provide a more accurate estimate of cardiac disease in patients with pathogenic variants.

Project Purpose(s)

  • Disease Focused Research (glycogen storage disease XV)

Scientific Approaches

I will investigate health outcomes in participants with pathogenic variants in the GYG1 gene that underlie GSD XV. In particular, I am interested in the presence or absence of cardiac disease (abnormal ECGs, cardiac diagnoses, etc.) in individuals with pathogenic variants.

Anticipated Findings

I anticipate that I will be able to identify a cohort with GYG1 variants and, from this cohort, determine the prevalence of cardiac disease and the age of onset of symptoms, if any, associated with GYG1 pathogenic variants.

Demographic Categories of Interest

  • Age
  • Disability Status

Data Set Used

Controlled Tier

Research Team

Owner:

Chronic Biological Stress and Adult-onset Allergies

I will be exploring whether biological stress caused by chronic conditions can affect or induce allergies in adults. Despite the increase in cases in recent years, I understand that there has been limited research on adult-onset allergies. I hope to…

Scientific Questions Being Studied

I will be exploring whether biological stress caused by chronic conditions can affect or induce allergies in adults. Despite the increase in cases in recent years, I understand that there has been limited research on adult-onset allergies. I hope to understand whether there is any correlation between these two variables and, if so, whether it is positive or negative.

Project Purpose(s)

  • Disease Focused Research (hypersensitivity reaction type I disease)

Scientific Approaches

I will be looking into the data on individuals who have chronic diseases and have also been diagnosed with adult-onset allergies. I also hope to use genomic data to examine how the genes have mutated in these individuals.

Anticipated Findings

I anticipate that there may be a positive correlation between adult individuals having chronic diseases and their later diagnosis of allergies. I hope my findings can assist in improving therapies for individuals with allergies.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

PheWAS

A Phenome-wide association study consists of an array of association tests over an indexed representation of the human phenome. We want to ask what the connection between SNP and phenotype is. In our research, we are interested in It is…

Scientific Questions Being Studied

A phenome-wide association study consists of an array of association tests over an indexed representation of the human phenome. We want to ask what the connection between SNP and phenotype is. In our research, we are interested in … It is important because it can identify gene variance and provide new ideas for genetic treatment.

Project Purpose(s)

  • Ancestry

Scientific Approaches

We will use the PheWAS R package and the PLINK data in All of Us, connect the results with the EHR records, and find the connections between phenotypes and SNPs (76 genes).

Anticipated Findings

Based on the gene list our collaborators provide, we expect to find potential connections between specific SNPs and phenotypes.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Puran Nepal - Project Personnel, University of Miami
  • Mary Davis - Early Career Tenure-track Researcher, Brigham Young University
  • Jacob McCauley - Mid-career Tenured Researcher, University of Miami

General Data Preparation for UTMB/UTH Collaboration

Exploring 19 SDoH variables defined on prior Dementia Workspace with relation to delayed care and inability to afford care within the realm of hypertension, diabetes, and osteoarthritis.

Scientific Questions Being Studied

Exploring 19 SDoH variables defined on prior Dementia Workspace with relation to delayed care and inability to afford care within the realm of hypertension, diabetes, and osteoarthritis.

Project Purpose(s)

  • Disease Focused Research (type 2 diabetes mellitus, osteoarthritis, and hypertension)
  • Educational
  • Methods Development
  • Ancestry

Scientific Approaches

Descriptive analyses of the cohort and subsequent demographics will be done with Seaborn and R. Bi-clustering will be done with ExplodeLayout and Bipartite Modularity.

This specific workspace is going to be my (bokov's) place to run normal (non-genomic) code as an individual researcher using data pre-computed on Genomic Data Extraction for UTMB/UTH Collaboration. In this project it will be merged with other AoU data sources. Production code and results will be uploaded to the production bucket (URI will be communicated to team members) into the SHARED_PRODUCTION_CODE for code and SHARED_STAGING_DATA for data.

Anticipated Findings

Certain subtypes of these disease groups may have more SDoH variables answered that may help with future interventions. Developing a generalizable method to analyze AoU data is also important.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Weibin Zhang - Project Personnel, University of Texas Medical Branch (UTMB) at Galveston
  • Daniel Bao - Graduate Trainee, University of Texas Medical Branch (UTMB) at Galveston
  • Alex Bokov - Other, University of Texas Health Science Center, San Antonio
  • Suresh Bhavnani - Late Career Tenured Researcher, University of Texas Medical Branch (UTMB) at Galveston

COVID racial health disparities

I am interested in exploring whether the racial health disparity patterns we saw for COVID in the SF Bay Area are similar to those in the All of Us Data. Our study, "Racial/Ethnic, Biomedical, and Sociodemographic Risk Factors for COVID‑19…

Scientific Questions Being Studied

I am interested in exploring whether the racial health disparity patterns we saw for COVID in the SF Bay Area are similar to those in the All of Us Data. Our study, "Racial/Ethnic, Biomedical, and Sociodemographic Risk Factors for COVID‑19 Positivity and Hospitalization in the San Francisco Bay Area," was published in the Journal of Racial and Ethnic Health Disparities.

Project Purpose(s)

  • Population Health

Scientific Approaches

I plan to use a logistic regression. The dataset will largely mirror the one we used from the UC San Francisco EHR.

Anticipated Findings

I am hoping to further understand why racial health disparities manifest across various data sets.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

  • Wendy Cho - Mid-career Tenured Researcher, University of Illinois at Urbana Champaign

potential risk factor gastric cancer

Tumorigenesis of gastric cancer follows the sequence of hyperplasia, metaplasia, dysplasia, and finally neoplasia. The underlying genetic, environmental, and even socioeconomic backgrounds have not been fully understood and linked yet. Here, we are trying to answer this by using the controlled…

Scientific Questions Being Studied

Tumorigenesis of gastric cancer follows the sequence of hyperplasia, metaplasia, dysplasia, and finally neoplasia. The underlying genetic, environmental, and even socioeconomic backgrounds have not been fully understood and linked yet. Here, we are trying to answer this by using the Controlled Tier dataset from All of Us.

Project Purpose(s)

  • Educational

Scientific Approaches

To answer this question, we are going to include cohorts with gastric metaplasia, gastric dysplasia, and gastric cancers. We are first going to analyze the GWAS of these individuals and see whether there are common genetic variants within the same groups and between different stages of gastric cancers. We are also planning to combine their survey and monitoring data to answer these questions.

Anticipated Findings

We anticipate finding multiple already known mutations from the GWAS but we would expect to find other mutations that span across different stages of gastric tumorigenesis. Moreover, we would expect to find links between environmental and genetic backgrounds. These could provide 1) screening for the possibilities of the occurrence of gastric cancer, and 2) ways to lower the risk of getting this.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Genomic Data Extraction for UTMB/UTH Collaboration

Exploring 19 SDoH variables defined on prior Dementia Workspace with relation to delayed care and inability to afford care within the realm of hypertension, diabetes, and osteoarthritis.

Scientific Questions Being Studied

Exploring 19 SDoH variables defined on prior Dementia Workspace with relation to delayed care and inability to afford care within the realm of hypertension, diabetes, and osteoarthritis.

Project Purpose(s)

  • Disease Focused Research (type 2 diabetes mellitus, osteoarthritis, and hypertension)
  • Educational
  • Methods Development
  • Ancestry

Scientific Approaches

Descriptive analyses of the cohort and subsequent demographics will be done with Seaborn and R. Bi-clustering will be done with ExplodeLayout and Bipartite Modularity.

This specific workspace is going to be my (bokov's) hub for sharing pre-computed datasets and production code with the team in order to simplify development and conserve credits. The team will be provided with paths to the appropriate buckets.

This workspace uses a DataProc cluster, so it should only be used for working with genomic data, and cheaper workspaces can then copy the results out of this workspace's default bucket (currently, the SHARED_STAGED_DATA subfolder) to their local environments instead of everyone having to build it up each time.

Anticipated Findings

Certain subtypes of these disease groups may have more SDoH variables answered that may help with future interventions. Developing a generalizable method to analyze AoU data is also important.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Weibin Zhang - Project Personnel, University of Texas Medical Branch (UTMB) at Galveston
  • Daniel Bao - Graduate Trainee, University of Texas Medical Branch (UTMB) at Galveston
  • Alex Bokov - Other, University of Texas Health Science Center, San Antonio
  • Suresh Bhavnani - Late Career Tenured Researcher, University of Texas Medical Branch (UTMB) at Galveston

Duplicate again of Introductory example of GWAS with type 2 diabetes phenotype

Not applicable - this workspace is intended to be an introductory example of how to do a genome-wide association study on the All of Us genomic data that individuals can easily click through and understand.

Scientific Questions Being Studied

Not applicable - this workspace is intended to be an introductory example of how to do a genome-wide association study on the All of Us genomic data that individuals can easily click through and understand.

Project Purpose(s)

  • Educational

Scientific Approaches

Not applicable - this workspace is intended to be an introductory example of how to do a genome-wide association study on the All of Us genomic data that individuals can easily click through and understand.

Anticipated Findings

Not applicable - this workspace is intended to be an introductory example of how to do a genome-wide association study on the All of Us genomic data that individuals can easily click through and understand.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

PRS Uncertainty

We plan to explore how uncertainty in polygenic risk scores applies across population models, including admixed populations.

Scientific Questions Being Studied

We plan to explore how uncertainty in polygenic risk scores applies across population models, including admixed populations.

Project Purpose(s)

  • Educational
  • Methods Development
  • Ancestry

Scientific Approaches

We plan to calculate polygenic risk scores and their uncertainty on an individual and population-basis using genetic and phenotypic data across diverse populations.
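To make the individual-level calculation concrete, here is a minimal sketch of a polygenic risk score as a weighted sum of allele dosages, with a simple standard error attached. The project does not state how uncertainty will be estimated. The approach below, which propagates effect-size standard errors under an independence assumption, is only one illustrative possibility. All variant names and numbers are hypothetical.

```python
from math import sqrt


def polygenic_score(dosages, effects):
    """Weighted sum of allele dosages (0, 1, or 2) and per-variant effect sizes."""
    return sum(dosages[v] * beta for v, (beta, _se) in effects.items())


def score_standard_error(dosages, effects):
    """Standard error of the score, assuming independent effect-size estimates."""
    return sqrt(sum((dosages[v] * se) ** 2 for v, (_beta, se) in effects.items()))


# Hypothetical variant IDs mapped to (effect size, standard error).
effects = {"var_a": (0.20, 0.03), "var_b": (-0.10, 0.04), "var_c": (0.05, 0.02)}
# Hypothetical effect-allele counts for one individual.
dosages = {"var_a": 2, "var_b": 1, "var_c": 0}

score = polygenic_score(dosages, effects)
se = score_standard_error(dosages, effects)
print(f"score = {score:.3f}, standard error = {se:.3f}")
```

The independence assumption ignores linkage disequilibrium between variants, so a real analysis would likely need a method that accounts for correlated effect estimates.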

Anticipated Findings

We anticipate finding that existing polygenic risk scoring methods underperform in diverse and admixed populations, and hope to motivate development of new methods that will reduce the uncertainty and/or error for individuals from such backgrounds.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Karen Feng - Graduate Trainee, Stanford University

Genomic Analysis for Females Having Cancer

Cancer is a major burden of disease worldwide. Each year, tens of millions of people are diagnosed with cancer around the world, and more than half of the patients eventually die from it. I aim at the genetic differences in…

Scientific Questions Being Studied

Cancer is a major burden of disease worldwide. Each year, tens of millions of people are diagnosed with cancer around the world, and more than half of the patients eventually die from it. I aim to study the genetic differences between social groups among females who are at risk for cancers. My questions are: 1. Do genetic differences affect the survival rate and mortality for cancers? 2. Are there environmental factors that affect genes that cause cancers?

Project Purpose(s)

  • Educational

Scientific Approaches

We will focus on females with some of the main cancers; build datasets from survey, genomic, and EHR data, such as Family Health History and Conditions data; stratify them by demographic characteristics and environmental factors; apply logistic models, differential analysis, and survival analysis in Python and R; and visualize and interpret the final results.

Anticipated Findings

Genetic differences will affect the survival rate and mortality for cancers; there are some environmental factors that affect genes that cause cancers; females of low socioeconomic status may be vulnerable to some cancers. Risk factors for cancers may vary with environmental factors, so different groups of people can follow different cancer prevention plans based on the results of this research. In addition, authorities can refer to these results when distributing public resources for cancer prevention.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

  • Yaqi Zou - Project Personnel, New Jersey Institute of Technology

Collaborators:

  • Qingzhao Yu - Mid-career Tenured Researcher, Louisiana State University Health Sciences Center, New Orleans

Chronic Osteomyelitis Prevalence and Risk Factors

We are interested in investigating the prevalence and pre-disposing risk factors for chronic osteomyelitis in the adult population. This is important to better understand which people may be at a higher risk for developing chronic osteomyelitis after the initial infection.…

Scientific Questions Being Studied

We are interested in investigating the prevalence and pre-disposing risk factors for chronic osteomyelitis in the adult population. This is important to better understand which people may be at a higher risk for developing chronic osteomyelitis after the initial infection. Physicians will then be able to utilize more aggressive therapies for patients at higher risk and hopefully improve their overall treatment.

Project Purpose(s)

  • Disease Focused Research (Chronic Osteomyelitis)

Scientific Approaches

We will conduct a cross sectional analysis of the demographic, health, and survey information for adults with chronic osteomyelitis. This will allow us to run statistical analysis of predisposing conditions and prevalence.

Anticipated Findings

We expect to find that the incidence of chronic osteomyelitis in the US adult population has continued to increase due to increasing rates of comorbid conditions. We also hope to identify any other factors that may predispose individuals to developing chronic osteomyelitis in an effort to better guide therapy for higher risk patients.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Collaborators:

  • Vikram Shaw - Graduate Trainee, Baylor College of Medicine
  • David Hsiou - Graduate Trainee, Baylor College of Medicine

OA Access to Care

In adult patients with osteoarthritis of the knee or hip, how does limited access to care affect their disease control compared to patients with no reported issues accessing care? This question will offer insight into the treatment of osteoarthritis and…

Scientific Questions Being Studied

In adult patients with osteoarthritis of the knee or hip, how does limited access to care affect their disease control compared to patients with no reported issues accessing care?

This question will offer insight into the treatment of osteoarthritis and the importance of investigating a wide scope of factors for disease control.

Project Purpose(s)

  • Disease Focused Research (osteoarthritis)

Scientific Approaches

The database will be used to organize cohorts of patients based on the inclusion criteria of osteoarthritis, and then separate them further based on self-reported access to healthcare. Patients from the two groups will be matched based on treatment modalities, and reported pain control will be analyzed.

Anticipated Findings

I suspect that access to care will have a statistically significant impact on the control of osteoarthritis, irrespective of treatment modality. This will encourage the consideration of patient factors in the treatment of osteoarthritis that are outside of the simplicity of treatment choices.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Vikram Shaw - Graduate Trainee, Baylor College of Medicine
  • David Hsiou - Graduate Trainee, Baylor College of Medicine

Explore Hypertension Data

This workspace is intended for educational purposes at BCM's 2023 UBR Faculty Summit to learn how to use the Researcher Workbench by viewing and analyzing data for hypertension.

Scientific Questions Being Studied

This workspace is intended for educational purposes at BCM's 2023 UBR Faculty Summit to learn how to use the Researcher Workbench by viewing and analyzing data for hypertension.

Project Purpose(s)

  • Educational

Scientific Approaches

This workspace is intended for educational purposes at BCM's 2023 UBR Faculty Summit to learn how to use the Researcher Workbench by viewing and analyzing data for hypertension.

Anticipated Findings

This workspace is intended for educational purposes at BCM's 2023 UBR Faculty Summit to learn how to use the Researcher Workbench by viewing and analyzing data for hypertension.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Duplicate of How to Backup Notebooks and Intermediate Results

Not applicable - these utility notebooks do not perform any analyses.

Scientific Questions Being Studied

Not applicable - these utility notebooks do not perform any analyses.

Project Purpose(s)

  • Other Purpose (Demonstrate to workbench users how to create snapshots of notebooks and backups of intermediate results stored in other files such as plot images and derived data.)

Scientific Approaches

Not applicable - these utility notebooks do not perform any analyses.

Anticipated Findings

Not applicable - these utility notebooks do not perform any analyses.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Duplicate of How to Work with All of Us Survey Data (v6)

We recommend that all researchers explore the notebooks in this workspace to learn the basics of All of Us Program Data. What should you expect? By running the notebooks in this workspace, you should get familiar with how to query…

Scientific Questions Being Studied

We recommend that all researchers explore the notebooks in this workspace to learn the basics of All of Us Program Data.

What should you expect?
By running the notebooks in this workspace, you should become familiar with how to query PPI questions/surveys and with the frequencies of answers for each question in each PPI module.

Project Purpose(s)

  • Educational
  • Methods Development
  • Other Purpose (This is an All of Us Tutorial Workspace created by the Researcher Workbench Support team. It is meant to provide instruction for key Researcher Workbench components and All of Us data representation.)

Scientific Approaches

By running the notebooks in this workspace, you should become familiar with how to query PPI questions/surveys and with the frequencies of answers for each question in each PPI module.
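As a small illustration of what tallying answer frequencies per question and module can look like, the sketch below counts answers in a handful of rows. It does not reproduce the tutorial notebooks. The row layout, module names, questions, and answers are all hypothetical stand-ins, not the actual All of Us survey schema.

```python
from collections import Counter, defaultdict


def answer_frequencies(rows):
    """Count how often each answer was given, grouped by (module, question)."""
    counts = defaultdict(Counter)
    for module, question, answer in rows:
        counts[(module, question)][answer] += 1
    return counts


# Hypothetical survey rows as (module, question, answer).
rows = [
    ("module_a", "question_1", "yes"),
    ("module_a", "question_1", "no"),
    ("module_a", "question_1", "yes"),
    ("module_b", "question_2", "sometimes"),
]

freqs = answer_frequencies(rows)
for (module, question), counter in sorted(freqs.items()):
    print(module, question, dict(counter))
```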

Anticipated Findings

By reading and running the notebooks in this Tutorial Workspace, researchers will learn the following:
- how to query the survey data,
- how to summarize PPI modules and questions.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • PAUL THURAS - Project Personnel, University of Minnesota

Readmission in Heart Failure

Readmission rates among patients hospitalized with acute heart failure have been notoriously resistant to quality improvement and administrative initiatives - and some of these programs have been associated with adverse consequences. Part of the problem is that prediction of readmission…

Scientific Questions Being Studied

Readmission rates among patients hospitalized with acute heart failure have been notoriously resistant to quality improvement and administrative initiatives - and some of these programs have been associated with adverse consequences. Part of the problem is that prediction of readmission risk, and accordingly, selecting patients for additional interventions, has been limited by the fact that predictive models based on biomedical variables have limited accuracy for readmission.

The research question that we will try to answer by using the AoU database is whether the richer information on social determinants of health included in this database can improve prediction of patients who are at higher risk for short-term readmission and therefore provide the basis for selecting patients in need for additional support or more rigorous follow-up. A secondary goal of the project is to quantify the relative contribution of biomedical and socioeconomic factors to heart failure readmission risk.

Project Purpose(s)

  • Disease Focused Research (Heart Failure)
  • Population Health

Scientific Approaches

For this purpose, we will select patients who have an inpatient encounter associated with a standard set of ICD codes that are used to capture heart failure in AoU (ICD-10 CM I50 codes and ICD-9 CM 428 codes). We will use this as the index encounter to describe the patient characteristics and variables that are potential predictors of readmission.

We will then identify patients who have a repeat inpatient encounter associated with the same codes within two predefined periods: 30 and 90 days. Although the 30-day readmission rate is the administrative standard, we have previously demonstrated that biological vulnerability after an admission for heart failure persists for up to 90 days.
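The index-encounter and readmission-window logic described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code. It treats the earliest heart failure encounter as the index and measures each window from index discharge to the next heart failure admission, neither of which the description specifies. The encounter tuples and dates are hypothetical.

```python
from datetime import date

# ICD-10-CM I50 and ICD-9-CM 428 code families named in the text above.
HF_PREFIXES = ("I50", "428")


def is_heart_failure(code):
    """True if the diagnosis code belongs to one of the heart failure families."""
    return code.startswith(HF_PREFIXES)


def readmission_flags(encounters, windows=(30, 90)):
    """Flag heart failure readmissions within each window (in days).

    encounters: list of (admit_date, discharge_date, icd_code) for one patient.
    The earliest heart failure encounter is treated as the index encounter.
    Returns {window: bool}, or None if the patient has no such encounter.
    """
    hf = sorted(e for e in encounters if is_heart_failure(e[2]))
    if not hf:
        return None
    index_discharge = hf[0][1]
    gaps = [(admit - index_discharge).days for admit, _discharge, _code in hf[1:]]
    return {w: any(0 < gap <= w for gap in gaps) for w in windows}


# Hypothetical inpatient encounters for a single patient.
encounters = [
    (date(2022, 1, 1), date(2022, 1, 5), "I50.9"),
    (date(2022, 1, 20), date(2022, 1, 22), "J18.9"),  # not heart failure
    (date(2022, 3, 1), date(2022, 3, 4), "428.0"),
]

flags = readmission_flags(encounters)
print(flags)
```

In this example the second heart failure admission falls 55 days after index discharge, so it counts toward the 90-day window but not the 30-day one. Matching on a raw code prefix is a simplification. Real data would normally distinguish ICD-9 from ICD-10 vocabularies explicitly.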

We will deploy both traditional logistic regression and machine learning models (boosted trees) to predict readmission in the specified time frames. The goal of the traditional model is to compare predictive accuracy over existing models and analyze the relative contribution of each factor to readmission.

Anticipated Findings

We expect that the results of this study will allow us to:

1. Quantify the ability of standard biomedical variables that are available in the electronic health record during an admission for heart failure to predict readmission. This may allow further research to focus on specific components of care (for example, preservation or improvement of renal function) that are disproportionately affecting the readmission risk.

2. Identify social determinants of health that can substantially improve our ability to predict readmission risk for heart failure and identify patients who might benefit from additional interventions and more intensive follow-up. We expect that this is an important component of readmission risk, as previous models have demonstrated notoriously low accuracy for readmissions, which is in contrast with the reasonable accuracy of models that predict mortality in heart failure.

Demographic Categories of Interest

  • Age

Data Set Used

Registered Tier

Research Team

Owner:

PRS for Alzheimer's

Scientific Questions Being Studied

PRS have been demonstrated to have low portability between ancestry groups (Duncan et al., 2019). This is largely due to the fact that 78% of GWAS has been performed in individuals of European ancestry (Gurdasani et al., 2019). Generous participants in the All of Us research program are from diverse genetic ancestry backgrounds, allowing for the statistical power necessary to test methods to improve PRS performance in individuals of non-European ancestry.

Specifically, I will be exploring the performance of the PRS for Alzheimer's Disease derived in European ancestry cohorts in various genetic ancestry groups. Increasing PRS performance in different ancestry groups is imperative to equitable implementation of PRS to improve health outcomes.

Project Purpose(s)

  • Disease Focused Research (Alzheimer's disease)
  • Ancestry

Scientific Approaches

Datasets:

Cohort: Whole Genome Sequencing

Alzheimer's vs Control: Non-Alzheimer's

Derive PRS that includes all ancestry groups and then individual PRS for each ancestry group. Generate a generalized linear model that includes age, sex at birth, and PRS as covariates. Additionally, generate a model that also includes the number of APOE-ε4 and APOE-ε2 alleles.
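As a rough illustration of the score itself, a PRS is commonly computed as a weighted sum of effect-allele dosages, and it is often standardized before being entered into a model as a covariate. The sketch below assumes that form; the variant IDs, weights, and function names are hypothetical and are not taken from this project.

```python
from statistics import mean, pstdev

def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages over variants that have a weight."""
    return sum(weights[v] * d for v, d in dosages.items() if v in weights)

def standardize(scores):
    """Convert raw scores to z-scores (mean 0, SD 1) within the given sample."""
    mu, sd = mean(scores), pstdev(scores)
    if sd == 0:
        raise ValueError("scores have zero variance")
    return [(s - mu) / sd for s in scores]

# Hypothetical per-variant weights (for example, GWAS effect sizes)
weights = {"v1": 0.2, "v2": -0.1, "v3": 0.05}

# Hypothetical dosages (0, 1, or 2 copies of the effect allele) per person
people = {
    "person_a": {"v1": 2, "v2": 1, "v3": 0},
    "person_b": {"v1": 0, "v2": 2, "v3": 1},
    "person_c": {"v1": 1, "v2": 0, "v3": 2},
}

raw = {pid: polygenic_score(d, weights) for pid, d in people.items()}
z = standardize(list(raw.values()))
```

In this sketch, APOE allele counts would be kept as separate covariates rather than folded into the score, which matches the two-model comparison described above.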

Anticipated Findings

We expect that PRS trained in European ancestry cohorts will have lower accuracy for predicting Alzheimer's disease in cohorts with non-European ancestry. We also predict that training a new PRS in more diverse populations will produce a PRS that is more accurate across all ancestries.

Overall, these findings would contribute to the growing body of literature supporting the need for including more diverse ancestry groups in genomic studies. Importantly, this will improve health outcomes in minoritized groups.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Data Set Used

Controlled Tier

Research Team

Owner:

  • Micah Hysong - Graduate Trainee, University of North Carolina, Chapel Hill

AOU_DATAPROD_CLUSTER

Scientific Questions Being Studied

We would like to conduct genome-wide association studies (GWAS) on PAD case and control data. Individual PAD GWAS will be meta-analyzed to identify important loci associated with PAD. We will conduct GWAS using Regenie and meta-analyze the results using Metal.

Project Purpose(s)

  • Disease Focused Research (peripheral artery disease)
  • Ancestry

Scientific Approaches

We will curate PAD cases and controls, conduct GWAS using Regenie, and meta-analyze the results using Metal.
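For readers unfamiliar with the meta-analysis step, the arithmetic of a fixed-effects, inverse-variance-weighted meta-analysis for a single variant can be sketched as below. This is only an illustration of one commonly used weighting scheme, not a description of how Metal will be configured here, since the project text does not say which scheme will be used. The function name and the example effect sizes are made up.

```python
import math

def inverse_variance_meta(estimates):
    """Combine per-cohort (beta, se) pairs for one variant.

    Returns (pooled_beta, pooled_se, z, two_sided_p) under a fixed-effects model.
    """
    weights = [1.0 / (se * se) for _, se in estimates]  # weight = 1 / variance
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p

# Made-up effect estimates for the same variant in two cohorts
beta, se, z, p = inverse_variance_meta([(0.10, 0.05), (0.20, 0.10)])
```

Cohorts with smaller standard errors receive larger weights, so the pooled estimate sits closer to the more precise cohort.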

Anticipated Findings

We aim to meta-analyze the data with other cohorts and plan to identify new loci associated with PAD.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Buu Truong - Research Fellow, Broad Institute

PRS for Blood Cell Traits

Scientific Questions Being Studied

PRS have been demonstrated to have low portability between ancestry groups (Duncan et al., 2019). This is largely due to the fact that 78% of GWAS has been performed in individuals of European ancestry (Gurdasani et al., 2019). Generous participants in the All of Us research program are from diverse genetic ancestry backgrounds, allowing for the statistical power necessary to test methods to improve PRS performance in individuals of non-European ancestry. Specifically, I will be exploring the performance of the PRS for a variety of test quantitative traits, including blood cell indices. Moreover, I will use the All of Us data to test/train new PRS in non-European ancestry groups and assess if this leads to higher performance. Increasing PRS performance in different ancestry groups is imperative to equitable implementation of PRS to improve health outcomes.

Project Purpose(s)

  • Ancestry

Scientific Approaches

Datasets:

Cohort: Whole Genome Sequencing

Derive PRS that includes all ancestry groups and then individual PRS for each ancestry group. Generate a generalized linear model that includes age, sex at birth, and PRS as covariates.
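One simple way to compare PRS performance for a quantitative trait across ancestry groups is the squared correlation between the score and the trait within each group. The sketch below assumes that metric and deliberately ignores the age and sex covariates for brevity; the group labels, values, and function names are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def r2_by_group(records):
    """records: iterable of (group, prs, trait). Returns {group: r squared}."""
    groups = {}
    for group, prs, trait in records:
        xs, ys = groups.setdefault(group, ([], []))
        xs.append(prs)
        ys.append(trait)
    return {g: pearson_r(xs, ys) ** 2 for g, (xs, ys) in groups.items()}

# Made-up records: (ancestry group label, PRS, trait value)
records = [
    ("group_a", 1.0, 2.0), ("group_a", 2.0, 4.0), ("group_a", 3.0, 6.0),
    ("group_b", 1.0, 1.0), ("group_b", 2.0, 3.0), ("group_b", 3.0, 2.0),
]
r2 = r2_by_group(records)
```

A fuller analysis would more likely report the gain in variance explained when the PRS is added to a model that already contains age and sex at birth.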

Anticipated Findings

We expect that PRS trained in European ancestry cohorts will have lower accuracy for predicting blood cell indices in cohorts with non-European ancestry. We also predict that training a new PRS in more diverse populations will produce a PRS that is more accurate across all ancestries. Overall, these findings would contribute to the growing body of literature supporting the need for including more diverse ancestry groups in genomic studies. Importantly, this will improve health outcomes in minoritized groups.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Data Set Used

Controlled Tier

Research Team

Owner:

  • Micah Hysong - Graduate Trainee, University of North Carolina, Chapel Hill

Exploring Hypertension Data Types

Scientific Questions Being Studied

Not applicable – this Workspace is intended for educational purposes for the 2023 UBR Faculty Summit to learn how to use the Researcher Workbench by analyzing a data type for hypertension.

Project Purpose(s)

  • Educational

Scientific Approaches

Not applicable – this Workspace is intended for educational purposes for the 2023 UBR Faculty Summit to learn how to use the Researcher Workbench by analyzing a data type for hypertension.

Anticipated Findings

Not applicable – this Workspace is intended for educational purposes for the 2023 UBR Faculty Summit to learn how to use the Researcher Workbench by analyzing a data type for hypertension.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Genetics and Glaucoma

Scientific Questions Being Studied

We are studying genomic variation in glaucoma and would like to use the AllofUs dataset to validate prior GWASes. This question is important because a lot of studies have been performed on populations of European descent, but it is important to look at genetic variation in non-European populations as well. AllofUs has a diverse enrollment, which is perfect for this study.

Project Purpose(s)

  • Disease Focused Research (Glaucoma)
  • Ancestry

Scientific Approaches

For this study, we plan on performing a GWAS to identify variants associated with glaucoma in the AllofUs dataset. For cases, we plan on including patients with POAG diagnosis codes and surgical codes and genomic data, while the control cohort would consist of adults with genomic data but without POAG. We plan to use Hail, in accordance with the All of Us example notebooks, to perform the GWAS. Data visualization methods will include Manhattan plots and QQ plots, followed by identification of significant genetic variants associated with glaucoma.
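The QQ plot and significance steps mentioned above reduce to simple arithmetic on the GWAS p-values. The sketch below is independent of Hail and only shows how the plotted points and the list of significant variants could be derived. The function names, variant IDs, and p-values are hypothetical, and the 5e-8 cutoff is the conventional genome-wide threshold, assumed here rather than stated by the project.

```python
import math

GENOME_WIDE_THRESHOLD = 5e-8  # conventional cutoff, an assumption for this sketch

def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, most significant first.

    Expected values come from uniformly spaced quantiles (i + 0.5) / n.
    All p-values must be greater than zero.
    """
    n = len(p_values)
    ordered = sorted(p_values)
    return [
        (-math.log10((i + 0.5) / n), -math.log10(p))
        for i, p in enumerate(ordered)
    ]

def significant_variants(p_by_variant, threshold=GENOME_WIDE_THRESHOLD):
    """Return variant IDs whose p-value is below the threshold, sorted by ID."""
    return sorted(v for v, p in p_by_variant.items() if p < threshold)

# Made-up p-values keyed by hypothetical variant IDs
p_by_variant = {"var1": 0.5, "var2": 0.05, "var3": 1e-9, "var4": 0.005}
points = qq_points(list(p_by_variant.values()))
hits = significant_variants(p_by_variant)
```

Points that rise well above the diagonal only in the upper tail suggest true associations, while a departure along the whole curve usually points to inflation that needs correction.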

Anticipated Findings

Using the AllofUs dataset, which contains data from diverse populations, we are hoping to validate previous GWAS results that have mostly been performed on European populations. Our findings would contribute by identifying genetic variants associated with glaucoma in non-European populations. Furthermore, in the future we also hope to use these results to study the relative contribution of genetic variation versus social determinants to glaucoma.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

  • Sally Baxter - Research Fellow, University of California, San Diego
  • Bonnie Huang - Graduate Trainee, Northwestern University

Insurance Fraud

Scientific Questions Being Studied

What is the best way to identify insurance fraud?

My colleagues and I would like to use machine learning to validate or focus the identification of fraud, thereby streamlining the process and reducing the waiting time for those who have filed legitimate claims.

Project Purpose(s)

  • Educational
  • Methods Development

Scientific Approaches

A dataset has not been identified yet. We hope to use a random forest algorithm, along with a few other machine learning algorithms, to identify fraud.

Anticipated Findings

The anticipated finding is that these methods will be able to discern fraudulent claim applications, thereby streamlining the process and resulting in lower waiting times for legitimate users.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Request a Review of this Research Project

You can request that the All of Us Resource Access Board (RAB) review a research purpose description if you have concerns that this research project may stigmatize All of Us participants or violate the Data User Code of Conduct in some other way. To request a review, you must fill in a form, which you can access by selecting ‘request a review’ below.