Research Projects Directory

Information about each research project within the Workbench is available in the Research Projects Directory below. Approved researchers provide their project’s research purpose, description, populations of interest and more. This information helps All of Us ensure transparency on the type of research being conducted.

At this time, all listed projects are using data in the Registered Tier. The Registered Tier contains individual-level data from electronic health records, survey answers, and physical measurements. These data have been altered to protect participant privacy.

Note: Researcher Workbench users provide information about their research projects independently. Any views expressed in the Research Projects Directory belong to the relevant users and do not necessarily represent those of the All of Us Research Program.

Information in the Research Projects Directory is also cross-posted on AllofUs.nih.gov in compliance with the 21st Century Cures Act.

There are currently 291 active workspaces. This information was updated on 12/5/2020.

Kidney Stone Diagnosis by Race

Project Purpose(s)

  • Disease Focused Research (Kidney Stones)
  • Population Health ...

Scientific Questions Being Studied

Kidney stone incidence differs among ethnicities. We are assessing the differences in incidence of kidney stones among different ethnicities as a first step toward reconciling health disparities.

Scientific Approaches

We will use the All of Us cohort to estimate the incidence of kidney stones, stratify by ethnicity, compare the differences between groups, and test those differences for statistical significance.
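The project does not say which significance test it will use. As a minimal sketch, assuming a Pearson chi-square test on a group-by-outcome contingency table, the comparison could look like the following. The function name and the counts are hypothetical and chosen only for illustration; they are not All of Us data.

```python
def chi_square_statistic(table):
    """Return the Pearson chi-square statistic and degrees of freedom.

    `table` is a list of rows. In this sketch each row is one (hypothetical)
    group and the two columns are counts of participants with and without
    a kidney stone diagnosis.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the null hypothesis of no association.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Made-up counts for illustration only: [with stone, without stone] per group.
example_counts = [
    [10, 20],
    [20, 10],
]
stat, dof = chi_square_statistic(example_counts)
print(f"chi-square = {stat:.3f}, degrees of freedom = {dof}")
```

The statistic would then be compared against a chi-square distribution with `dof` degrees of freedom to obtain a p-value, for example with a statistics library.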

Anticipated Findings

We anticipate that kidney stone incidence will differ among ethnicities. These findings will help inform efforts to reduce health disparities.

Demographic Categories of Interest

  • Race / Ethnicity
  • Geography
  • Access to Care

Research Team

Owner:

  • Connor Forbes - Research Fellow, Vanderbilt University Medical Center

Learning How to Back Up Notebooks and Intermediate Results

Project Purpose(s)

  • Other Purpose (I am using this workbench to learn how to create versions (revisions) of notebooks and intermediate results stored in other files such as plot images and derived data.) ...

Scientific Questions Being Studied

I am using these featured utility notebooks to learn how to save notebooks and intermediate results.

Scientific Approaches

Not available.

Anticipated Findings

To learn notebook features and result-saving options. (These utility notebooks do not perform any analyses.)

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Md Mesbah Uddin - Research Fellow, The Broad Institute

Learning "Cardiovascular Risk Scoring" using the Demo

Project Purpose(s)

  • Disease Focused Research (Cardiovascular disease)
  • Other Purpose (This workbench is a clone of the demo workbench "Cardiovascular Risk Scoring". I am using this workbench to get familiar with the CVD phenotypes available, and how to perform analyses using the available tools in the platform.) ...

Scientific Questions Being Studied

The following research questions were explored (in the original analysis) using this workbench:
1- Can we use All of Us data to calculate the cardiovascular pooled score?
2- Can we identify the scores that we calculate within a year of All of Us enrollment?
3- Will the risk score per race group be different?

Scientific Approaches

Here, my scientific interest is to learn how to perform analyses on this platform using this featured workbench.
In the original analysis:
>>In this project, we plan to use the AHA algorithm/equation to calculate cardiovascular risk scores (https://ahajournals.org/doi/full/10.1161/01.cir.0000437741.48606.98). Further, we want to demonstrate the use of the smoking and race data collected by the program, data that researchers usually have to extract with natural language processing, to facilitate the calculation of the cardiovascular risk score.
We will calculate the scores in two steps. 1- Data manipulation: using Python and BigQuery to retrieve medications (diabetes), lab measurements including systolic blood pressure, diastolic blood pressure, and cholesterol, and the race and smoking information provided by participants. 2- Visualization: creating a histogram of the calculated scores using the Python visualization library Matplotlib.<<
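To make the risk calculation concrete, here is a minimal sketch of the Pooled Cohort Equations from the linked guideline, for a single sex/race group (white men). The coefficients are transcribed as I recall them from the published equations and should be verified against the guideline before any real use. The function name is hypothetical, and the notebook's actual implementation is not shown in the directory.

```python
import math

# Assumption: coefficients for white men only, as recalled from the published
# Pooled Cohort Equations. Other sex/race groups use different coefficients.
MEAN_COEFFICIENT_SUM = 61.18
BASELINE_10_YEAR_SURVIVAL = 0.9144


def ten_year_ascvd_risk_white_male(age, total_chol, hdl, systolic_bp,
                                   treated_bp, smoker, diabetes):
    """Estimate 10-year ASCVD risk (0 to 1) for one participant.

    Cholesterol values are in mg/dL and systolic blood pressure in mm Hg.
    `treated_bp`, `smoker`, and `diabetes` are booleans.
    """
    ln_age = math.log(age)
    ln_tc = math.log(total_chol)
    ln_hdl = math.log(hdl)
    ln_sbp = math.log(systolic_bp)

    total = (
        12.344 * ln_age
        + 11.853 * ln_tc
        - 2.664 * ln_age * ln_tc
        - 7.990 * ln_hdl
        + 1.769 * ln_age * ln_hdl
        # Different blood pressure coefficient for treated vs. untreated.
        + (1.797 if treated_bp else 1.764) * ln_sbp
    )
    if smoker:
        total += 7.837 - 1.795 * ln_age
    if diabetes:
        total += 0.658

    return 1.0 - BASELINE_10_YEAR_SURVIVAL ** math.exp(total - MEAN_COEFFICIENT_SUM)


# Illustrative inputs only, not participant data.
risk = ten_year_ascvd_risk_white_male(
    age=55, total_chol=213, hdl=50, systolic_bp=120,
    treated_bp=False, smoker=False, diabetes=False,
)
print(f"Estimated 10-year risk: {risk:.1%}")
```

A list of such scores could then be passed to Matplotlib's `hist` function to produce the histogram described above.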

Anticipated Findings

This workbench will facilitate onboarding to the platform before performing actual analyses.
Original anticipated findings as reported in the workbench:
>>For this study, we anticipate demonstrating the validity and importance of data that are collected by the program and can be challenging to extract from medical records (smoking status) by calculating the 10-year cardiovascular risk. We expect to find: 1) ease in using data from different sources (EHR and survey data) to build a model or calculate a risk; 2) heterogeneity in the All of Us population, where populations underrepresented in clinical trials or clinical data sets are better represented; 3) cardiovascular risk scores that differ among racial groups.<<

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Md Mesbah Uddin - Research Fellow, The Broad Institute

legacy_codes

Project Purpose(s)

  • Disease Focused Research (hypertension) ...

Scientific Questions Being Studied

What is the prevalence of hypertension (HTN) defined using an electronic health record definition from eMERGE among UBR groups defined by race/ethnicity, income and education?

Do treatment patterns for HTN (using medication sequencing analysis) vary by UBR groups defined by race/ethnicity, income and education, and in geographic regions based on grouping states?

Scientific Approaches

Not available.

Anticipated Findings

There may be disparities in HTN across racial and income groups of policy interest.

Demographic Categories of Interest

  • Income Level

Research Team

Owner:

  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital

LIPIDS_GWAS

Project Purpose(s)

  • Disease Focused Research (CVD, diabetes, lipids, liver disease)
  • Population Health ...
  • Drug Development
  • Ancestry

Scientific Questions Being Studied

We will investigate lipids and the role they play in cardiovascular disease risk. The lipid studies we have conducted so far are mostly based on either European cohorts such as UK Biobank or multi-ethnic cohorts such as TOPMed. We would like to understand lipid patterns and their role in the mechanisms of cardiovascular disease risk.
1.) Are there genetic risk factors specific to the All of Us cohort that are not observed in other cohorts?
2.) How are other diseases, such as diabetes and obesity, linked to lipids?
3.) Are there underlying mechanisms that help us understand disease prognosis and point toward therapeutics?

This study will give an idea of how lipid patterns play a role in understanding disease risk in the United States.

Scientific Approaches

1.) We will use Hail (developed by the Broad Institute) to run initial quality control checks on the dataset.
2.) We will generate principal components (PCs) and a relatedness matrix using the GENESIS package in R.
3.) We will extract lipid levels and statin use status from health records (if these have already been extracted, we will use the existing data).
4.) We will conduct a genome-wide association study of four lipid traits (HDL, LDL, TC, TG), with lipid levels adjusted for statin use and the model adjusted for age, sex, and PCs (tool not yet decided).
5.) We will replicate the significant variants in other cohorts such as TOPMed and UK Biobank.
6.) We will also perform rare-variant burden tests using available tools such as the STAAR package in R.
7.) Based on the initial findings, we will conduct downstream analyses, which are yet to be determined.

Anticipated Findings

For this study, we expect to find novel genes/variants that are specific to the US population. Given the diverse population structure, we expect to find many more previously undiscovered genes, which will help us understand the underlying mechanisms of disease.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • akhil pampana - Project Personnel, The Broad Institute

Collaborators:

  • Thomas Gilliland - Research Fellow, The Broad Institute

Liver Transplant

Project Purpose(s)

  • Disease Focused Research (Liver Transplant) ...

Scientific Questions Being Studied

Our specific questions concern whether the labs that make up the MELDNa score contribute to the sex differences seen in liver transplant. We plan to use lab values (creatinine, INR, bilirubin, and sodium) to reconstruct MELDNa scores and compare lab and MELDNa values between sexes within controls, liver disease cases, and transplant recipients. We next plan to build a sex-adjusted MELDNa score in conjunction with VUMC.

Scientific Approaches

We plan to use lab values, ICD codes, and CPT codes to define lab traits, liver disease, and liver transplant.
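The project does not state which version of the score it will reconstruct. As a sketch, assuming the commonly published MELD-Na formula (lab values floored at 1.0, creatinine capped at 4.0 mg/dL, sodium bounded to 125-137 mmol/L, and the sodium adjustment applied only when MELD exceeds 11), the reconstruction from the four labs might look like this. The function names are hypothetical, rounding conventions differ between sources, and this sketch ignores dialysis status.

```python
import math


def clamp(value, low, high):
    """Restrict `value` to the closed interval [low, high]."""
    return max(low, min(high, value))


def meld_na(creatinine, bilirubin, inr, sodium):
    """Reconstruct a MELD-Na score from four lab values.

    Units assumed: creatinine and bilirubin in mg/dL, sodium in mmol/L.
    """
    cr = clamp(creatinine, 1.0, 4.0)
    bili = max(bilirubin, 1.0)
    inr_value = max(inr, 1.0)
    na = clamp(sodium, 125.0, 137.0)

    meld = round(10 * (
        0.957 * math.log(cr)
        + 0.378 * math.log(bili)
        + 1.120 * math.log(inr_value)
        + 0.643
    ))

    # Sodium adjustment is applied only above a MELD of 11 in this variant.
    if meld > 11:
        meld = meld + 1.32 * (137 - na) - 0.033 * meld * (137 - na)

    return min(round(meld), 40)


# Illustrative lab values only, not participant data.
print(meld_na(creatinine=2.0, bilirubin=3.0, inr=1.5, sodium=130))
```

Comparing each input lab and the resulting score between sexes, within each of the three groups named above, would then follow from applying this function per participant.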

Anticipated Findings

Our findings will contribute to the body of scientific knowledge by showing that all labs in the MELDNa score show sex differences that contribute to lower MELDNa scores in females. A sex-adjusted MELDNa score will help close the gap between males and females in transplant.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Julia Sealock - Graduate Trainee, Vanderbilt University

Collaborators:

  • Francis Ratsimbazafy - Other, All of Us Program Operational Use

Longitudinal Aging Biomarkers

Project Purpose(s)

  • Disease Focused Research (Age-related disease) ...

Scientific Questions Being Studied

Biomarkers of aging predict current and future age-related morbidity and mortality, distinguishing between individuals of the same chronological age. However, most of these biomarkers were developed using cross-sectional data. Our questions are: 1) what is the normative trajectory of biomarkers of aging based on panels of clinical lab tests (glucose, albumin, etc), 2) are there different types of trajectories for these biomarkers, and 3) what factors predict the longitudinal change in these biomarkers.

Scientific Approaches

We will calculate previously published biomarkers of aging, selected based on the availability of longitudinal data in the All of Us data set. Individuals will need at least 3 independent measurements. Then we will fit linear and LOESS models to individuals, cluster individuals based on their trajectories, and determine what baseline characteristics determine slopes of the biomarkers.
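The linear part of this approach can be sketched as follows: keep only individuals with at least three measurements and fit an ordinary least-squares slope to each person's biomarker values over time. The data layout and function names are assumptions made for illustration. The LOESS fits and the clustering step are not shown.

```python
def least_squares_slope(times, values):
    """Ordinary least-squares slope of `values` regressed on `times`."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def slopes_per_person(measurements, min_points=3):
    """Fit one slope per person.

    `measurements` maps a person identifier to a list of (time, value)
    pairs. People with fewer than `min_points` measurements, or with all
    measurements at the same time point, are skipped.
    """
    slopes = {}
    for person_id, points in measurements.items():
        if len(points) < min_points:
            continue
        times, values = zip(*points)
        if len(set(times)) < 2:
            continue  # slope is undefined without variation in time
        slopes[person_id] = least_squares_slope(times, values)
    return slopes


# Made-up values for illustration: (years since first measurement, biomarker).
example = {
    "person_a": [(0, 1.0), (1, 3.0), (2, 5.0)],
    "person_b": [(0, 1.0), (1, 2.0)],  # too few measurements, skipped
}
print(slopes_per_person(example))
```

The resulting per-person slopes are the quantities that baseline characteristics would then be tested against.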

Anticipated Findings

We anticipate that different factors will determine the rate of change of aging biomarkers compared to the cross-sectional values of biomarkers. We also anticipate that there will be distinct subtypes of aging with distinct risk factors and outcomes.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Albert Higgins-Chen - Research Fellow, Yale University

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do

Project Purpose(s)

  • Disease Focused Research (cancer)
  • Ancestry ...
  • Other Purpose (Testing UI, UX, and programming tools.)

Scientific Questions Being Studied

Testing UI, UX, and programming tools.

Scientific Approaches

Test ABC

Anticipated Findings

I guess we'll find out.

Demographic Categories of Interest

  • Race / Ethnicity
  • Sex at Birth
  • Disability Status

Research Team

Owner:

  • Dylan Klomparens - Project Personnel, All of Us Program Operational Use

LR_Pancreatic Cancer Prediction

Project Purpose(s)

  • Disease Focused Research (pancreatic cancer) ...

Scientific Questions Being Studied

Most clinical decision support systems currently used in practice depend on simple statistical models and machine learning algorithms such as logistic regression. With advances in artificial intelligence and the availability of powerful computational resources, we can develop more accurate and personalized prediction models. We propose to develop a deep learning based predictive model for pancreatic cancer that achieves high prediction accuracy with minimal false-negative and false-positive rates.

Scientific Approaches

We propose to develop a sequential deep learning model based on either a recurrent neural network (RNN) or a Transformer architecture. We will train the pancreatic cancer predictive model on a cohort from this large and diverse dataset. We will define the study cohort and identify eligible cases as patients who have been diagnosed with pancreatic cancer based on ICD codes, further validated using relevant laboratory results or medication records. Our controls will be patients who have never been diagnosed with any cancer in their health history.

Anticipated Findings

The key innovation of our proposed project is not limited to developing a deep learning based model to predict a patient's risk of developing pancreatic cancer; it also lies in the focus on the implementability of the developed model. Translational research has always been an area of interest, with most of the focus on transferring findings from laboratory settings to clinical trials in hospital settings. Similarly, our approach to model development considers the feasibility of implementing the model in clinical settings rather than focusing only on prediction accuracy, as most existing studies do.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Laila Rasmy - Graduate Trainee, University of Texas Health Science Center, Houston

LungCancerScreening

Project Purpose(s)

  • Disease Focused Research (lung cancer)
  • Methods Development ...

Scientific Questions Being Studied

Lung cancer continues to be the leading cause of deaths from malignancy worldwide. There have been widespread efforts to develop safe and effective screening methods to detect lung cancer at an earlier stage. The US Preventive Services Task Force (USPSTF) recommends screening for lung cancer in individuals aged 55-80 years who have a smoking history of 30 pack-years or more and who either currently smoke or quit within the past 15 years. However, data have shown that only a third of patients diagnosed with lung cancer in the USA meet the USPSTF screening criteria, suggesting that many potentially high-risk individuals are not eligible for low-dose CT screening. Therefore, there is an urgent need for more sophisticated risk assessment methods that incorporate clinical data, identify those at high risk, and optimize the lung cancer screening criteria.
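The screening criteria quoted above translate directly into a simple eligibility check. The sketch below encodes only the thresholds stated in the text; the function and parameter names are hypothetical, and "within the past 15 years" is read here as 15 years or fewer.

```python
def meets_uspstf_criteria(age, pack_years, currently_smokes,
                          years_since_quit=None):
    """Return True if a person meets the screening criteria quoted above.

    `years_since_quit` is only consulted for people who no longer smoke;
    if it is unknown (None), the person is treated as not eligible.
    """
    if not 55 <= age <= 80:
        return False
    if pack_years < 30:
        return False
    if currently_smokes:
        return True
    return years_since_quit is not None and years_since_quit <= 15


# Illustrative inputs only.
print(meets_uspstf_criteria(age=60, pack_years=35, currently_smokes=True))
print(meets_uspstf_criteria(age=60, pack_years=35, currently_smokes=False,
                            years_since_quit=20))
```

Applying such a check to a cohort is one way to flag which diagnosed patients fall outside the criteria, the group this project is concerned with.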

Scientific Approaches

Recently, the wide availability of electronic health records (EHRs) has created a continuously growing repository of clinical data, which provides new opportunities for large-scale, low-cost, population-based studies. Comprehensive and longitudinal data captured in EHRs, such as patient demographics, diagnoses, medications, laboratory results, and procedures, provide unique opportunities to construct inexpensive risk screening profiles for patients. The proposed project will develop and evaluate an advanced informatics platform to optimize lung cancer screening criteria using EHRs. We will then create methods and tools to conduct PheWAS to characterize the phenotypic abnormalities associated with patient eligibility for screening and risk factors.

Anticipated Findings

This project will produce an integrated informatics platform with novel methods to enable efficient and effective cancer-phenotype data-model creation, cancer-study clinical data normalization, novel PheWAS study, and intelligent cancer patient risk prediction. If successful, the project will significantly impact systematic cancer data integration and discovery, optimize lung cancer screening criteria, and ultimately improve the outcomes of lung cancer screening.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Guoqian Jiang - Mid-career Tenured Researcher, Mayo Clinic

Collaborators:

  • jie na - Project Personnel, Mayo Clinic

Lupus Manifestations and Social Determinants of Health in Men and Women

Project Purpose(s)

  • Disease Focused Research (systemic lupus erythematosus) ...

Scientific Questions Being Studied

We want to identify and characterize important disease manifestations, serologic profiles, and treatments for patients with systemic lupus erythematosus (SLE). These are known to differ between men and women, and also among racial groups within the United States. We are also interested in studying how they vary with sex, race, and socioeconomic factors. Additionally, comorbidities and other health characteristics such as BMI or smoking history are believed to be important in disease outcomes, including involvement of other organ systems or cardiovascular events. Few large databases or study cohorts of SLE patients include detailed information about these clinically important factors, so the All of Us database provides a unique opportunity for observational study of them.

Scientific Approaches

We plan to report observational data on a cross-sectional cohort of all adult patients from the All of Us database who have a diagnosis of systemic lupus erythematosus (SLE) in their EHR. We will perform a descriptive analysis of baseline characteristics including sex, age, race, annual household income, education level, self-reported quality of life, and smoking status. Subsequent analyses will compare the presence of organ-specific manifestations of SLE and of cardiovascular events between male and female participants and between races, using the presence of EHR codes. We will use self-reported survey data to describe the frequency of barriers to healthcare access. We will also estimate the validity of self-reported SLE by comparing patient-reported history of SLE with EHR documentation of this diagnosis.
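The project does not say how validity of self-report will be quantified. One common option, sketched here as an assumption, is to treat EHR documentation as the reference and report sensitivity, positive predictive value, and Cohen's kappa. The function name and the example data are hypothetical.

```python
def self_report_validity(pairs):
    """Compare self-reported SLE against EHR-documented SLE.

    `pairs` is an iterable of (self_reported, ehr_documented) booleans,
    one per participant. EHR documentation is treated as the reference.
    This sketch assumes every cell needed for a ratio is non-empty.
    """
    tp = fp = fn = tn = 0
    for self_reported, ehr_documented in pairs:
        if self_reported and ehr_documented:
            tp += 1
        elif self_reported and not ehr_documented:
            fp += 1
        elif not self_reported and ehr_documented:
            fn += 1
        else:
            tn += 1

    n = tp + fp + fn + tn
    observed_agreement = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    expected_agreement = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2

    return {
        "sensitivity": tp / (tp + fn),
        "ppv": tp / (tp + fp),
        "kappa": (observed_agreement - expected_agreement)
                 / (1 - expected_agreement),
    }


# Made-up data for illustration: 8 agree-yes, 2 self-report only,
# 2 EHR only, 8 agree-no.
example = ([(True, True)] * 8 + [(True, False)] * 2
           + [(False, True)] * 2 + [(False, False)] * 8)
print(self_report_validity(example))
```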

Anticipated Findings

Male patients with systemic lupus erythematosus (SLE) are described across multiple studies as often developing more severe disease. Patients with SLE in the United States who belong to specific racial groups, including Black and Hispanic patients, have also been described as developing more severe disease. These differences are attributed to a combination of genetic variance in the underlying disease and social and environmental factors. Therefore, we anticipate that men may have more frequent organ-specific disease manifestations than women with SLE. Cardiovascular events are more common in men, so we anticipate a higher rate of events in men with SLE. There is limited data on whether serologic positivity important to SLE is more prevalent among men than women, so this prediction is less clear. This work will be valuable because there is limited data characterizing SLE in men in a large cohort and characterizing the barriers to healthcare access among patients with SLE, which are potential targets for care improvement.

Demographic Categories of Interest

  • Race / Ethnicity
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Deepak Nag Ayyala - Early Career Tenure-track Researcher, Augusta University