Research Projects Directory

Information about each research project within the Workbench is available in the Research Projects Directory below. Approved researchers provide their project’s research purpose, description, populations of interest and more. This information helps All of Us ensure transparency on the type of research being conducted.

At this time, all listed projects are using data in the Registered Tier. The Registered Tier contains individual-level data from electronic health records, survey answers, and physical measurements. These data have been altered to protect participant privacy.

Note: Researcher Workbench users provide information about their research projects independently. Any views expressed in the Research Projects Directory belong to the relevant users and do not necessarily represent those of the All of Us Research Program.

Information in the Research Projects Directory is also cross-posted on AllofUs.nih.gov in compliance with the 21st Century Cures Act.

There are currently 291 active workspaces. This information was updated on 12/5/2020.

Access to care and ophthalmology outcomes

Project Purpose(s)

  • Population Health ...

Scientific Questions Being Studied

I am seeking to understand how barriers to healthcare and other socioeconomic factors influence outcomes in common ophthalmologic diseases.

Scientific Approaches

I plan to use data gathered from research surveys to compare patients with unrestricted access to care to patients who face significant challenges to accessing regular care.

Anticipated Findings

Socioeconomic factors such as health insurance, income, and employment significantly influence outcomes in ophthalmologic diseases.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Alison Chan - Graduate Trainee, University of California, San Diego

ADHD

Project Purpose(s)

  • Disease Focused Research (attention deficit hyperactivity disorder) ...

Scientific Questions Being Studied

ADHD in adults is not a well-studied disease. The bulk of ADHD research focuses on the youth (< 18 years) and the All of Us dataset contains an important cohort of ~2700 people diagnosed with ADHD. Through this adult cohort, we will examine the relationship between medications taken specifically for ADHD and other medications (e.g. opioids, SSRIs) that may be indicative of other mental health diseases. We hope to elucidate any relationships between ADHD and various mental health diseases such as depression through this exploration of medication data.

Scientific Approaches

We will subset our dataset to people who have been diagnosed with ADHD or more specific subtypes. From this cohort, we will examine the medication history of each individual and whether they have any other medical diagnoses. Then, we will run statistical analyses to determine whether non-prescribed ADHD medications are significantly increased in usage within the ADHD cohort and identify the features of those individuals that may be risk factors for such medications.

Anticipated Findings

We hope to discover whether adults diagnosed with ADHD are more susceptible to other mental health diseases or drug abuse through their medication history. With this knowledge, individuals diagnosed with ADHD may be further evaluated for other mental health diseases and given the appropriate course of treatment.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Jonathan Lam - Graduate Trainee, University of California, San Diego

Collaborators:

  • Caitlin Guccione - Graduate Trainee, University of California, San Diego
  • Cindy Reynolds - Research Fellow, University of California, San Diego

ADHD and Healthcare

Project Purpose(s)

  • Disease Focused Research (attention deficit hyperactivity disorder) ...

Scientific Questions Being Studied

Do individuals with attention-deficit hyperactivity disorder (ADHD) have the same access to healthcare as individuals without ADHD? How does this access change across the lifespan?

Scientific Approaches

We plan to use the All of Us dataset to compare healthcare access in individuals with and without ADHD using regression analysis.

Anticipated Findings

We expect individuals with ADHD to have reduced access to healthcare due to the effects of ADHD symptoms on everyday functioning. This study would help identify potential ways to improve the health of those with ADHD.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Chanelle Gordon - Research Associate, Boys Town National Research Hospital

Aevus Precision Rx

Project Purpose(s)

  • Disease Focused Research (Type 2 Diabetes)
  • Methods Development ...
  • Control Set

Scientific Questions Being Studied

Aevus is engineering an ML-powered prescription guidance platform, Aevus Precision Rx, for endocrinologists and PCPs to tailor prescriptions for Type 2 diabetes (T2D) patients. In doing so, Aevus intends to improve T2D drug outcomes, reduce medication switching, and lower treatment costs. There are 11 drug classes used to treat T2D, with over 100 different medications for a doctor to choose from when prescribing a treatment plan to a patient. The doctor is thus faced with a problem of plenty: on average, it takes an endocrinologist or PCP 18-24 months to arrive at a stable drug compound and dosage of that compound to treat a T2D patient. A portion of the astronomical health care spending also goes into treating adverse drug reactions, which are frequent and drain the health system of $3.5 billion annually. Precision Rx will act as a real-time, point-of-care clinical decision support tool that serves as a software interface between the physician and the EHR.

Scientific Approaches

Approach:
1) Initial data exploration through visualizations of variable distributions in relation to frequency among Type 2 diabetes patients, as well as medications.
2) Statistical testing of variables for significance with regard to the outcome, including correlation, covariance, univariate linear regressions, univariate logistic regressions, multivariate linear and logistic regressions with lasso regularization, and Bayesian model averaging.
3) Evaluation and scaled significance of variable interactions from statistical testing with respect to the outcome.
4) First testing of prototype machine learning models and evaluation of the models against a performance benchmark.
5) Model selection narrowed to a small subset of machine learning algorithms.
6) Testing of combinations of the top significant variables found during statistical exploration in the chosen model to determine the final set of input predictors.
7) Tuning of the final prediction model.

Anticipated Findings

The goal of our modelling outcome is to generate a scale that will rank the 11 drug classes used to treat Type 2 diabetes in decreasing order of potential toxicity to the patient. The above-mentioned overarching ML model will comprise several sub-models, the ensemble of which will produce an umbrella model that can rank all the drug classes. Each sub-model will tackle the question "Is drug class X suitable for unique patient XYZ or not?" Precision Rx is intended to be a tool that can easily integrate into the existing clinical workflow of a physician with minimal change to their routine. There have been some advancements in automated clinical decision support for Type 2 diabetes in the past: tools like Epocrates have EHR integration capabilities that allow physicians to make more accurate prescription decisions based purely on ADA guidelines. These are thus static tools that do not make use of EHR and big data analysis to reach a more data-driven decision.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Nayyar Ahmed - Other, University of Pittsburgh

AFib epidemiology

Project Purpose(s)

  • Population Health
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.) ...

Scientific Questions Being Studied

The overall goal of this study, as a Demonstration Project, is to evaluate the ability of the All of Us Research Program data to replicate epidemiologic patterns of atrial fibrillation (AF), a common arrhythmia, previously described in other settings. We will address this goal with these two aims:
• Specific Aim 1. To determine the association of race and ethnicity with the prevalence and incidence of atrial fibrillation (AF). We hypothesize that non-whites will have lower prevalence and incidence of AF than whites.
• Specific Aim 2. To estimate associations of established risk factors for AF with the prevalence and incidence of AF. We hypothesize that increased body mass index, higher blood pressure, diabetes, smoking, and a prior history of cardiovascular diseases will be associated with increased prevalence and incidence of AF.

Scientific Approaches

We will select all All of Us participants who self-reported sex at birth as male or female and whose self-reported race was white, black, or Asian, as well as those who self-reported being Hispanic.

Atrial fibrillation (AF) will be identified from self-reports in the medical survey or from electronic health records (EHR).

Clinical factors will be identified from EHR and study measurements (blood pressure, weight, height).

We will evaluate the association of demographic (age, sex, race/ethnicity) and clinical (body mass index, blood pressure, smoking, cardiovascular diseases) factors with prevalence of self-reported AF and prevalence of AF in the EHR, as well as incident AF ascertained from the EHR.
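As a rough illustration of the two quantities named above, prevalence can be read as the share of participants who already have AF, and incidence as new AF cases per unit of person-time among those initially free of AF. The sketch below uses invented records; the field names (`has_af_at_baseline`, `developed_af`, `followup_years`) are hypothetical and are not All of Us variables, and the project's actual definitions may differ.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical fields for illustration only; not All of Us column names.
    has_af_at_baseline: bool
    developed_af: bool       # new AF diagnosis during follow-up
    followup_years: float    # time at risk until AF or end of follow-up


def prevalence(participants):
    """Proportion of all participants who have AF at baseline."""
    cases = sum(p.has_af_at_baseline for p in participants)
    return cases / len(participants)


def incidence_rate_per_1000(participants):
    """New AF cases per 1,000 person-years among those free of AF at baseline."""
    at_risk = [p for p in participants if not p.has_af_at_baseline]
    new_cases = sum(p.developed_af for p in at_risk)
    person_years = sum(p.followup_years for p in at_risk)
    return 1000 * new_cases / person_years


# Toy data, invented for the example.
cohort = [
    Participant(True, False, 0.0),
    Participant(False, True, 2.0),
    Participant(False, False, 5.0),
    Participant(False, False, 3.0),
]

print(prevalence(cohort))
print(incidence_rate_per_1000(cohort))
```

Running the same two functions on subsets of the cohort (for example, one subset per race/ethnicity group) would give the stratified figures the aims describe.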

Anticipated Findings

The overall goal of this project is to evaluate the prevalence and incidence of atrial fibrillation (AF), overall and by race/ethnicity, as well as to confirm the association of established risk factors for AF in the All of Us Research participants. We expect to confirm associations between demographic and clinical variables previously reported in the literature, demonstrating the value of the All of Us Research Program data to address questions regarding this common cardiovascular disease.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Research Team

Owner:

  • Alvaro Alonso - Late Career Tenured Researcher, Emory University

Collaborators:

  • Vignesh Subbian - Early Career Tenure-track Researcher, University of Arizona
  • Peter Buto - Graduate Trainee, Emory University
  • Aniqa Alam

agreement demonstration project u of florida

Project Purpose(s)

  • Methods Development
  • Other Purpose (AoURP Demonstration Project) ...

Scientific Questions Being Studied

We will measure the agreement of participant-provided information (PPI) with EHR data and of physical measurements (PM) with EHR data. We will analyze whether various participant characteristics are associated with higher or lower levels of agreement, broken out by type of information (demographics, conditions, procedures, prescriptions, etc.).
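The project does not say which agreement statistic it will use. As one plausible illustration, the sketch below computes simple percent agreement and Cohen's kappa (agreement corrected for chance) between two sources that record the same yes/no fact. The variable names and values are invented for the example.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of participants whose two sources give the same value."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    observed = percent_agreement(a, b)
    freq_a = Counter(a)
    freq_b = Counter(b)
    # Agreement expected if the two sources were statistically independent.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Toy data (invented): does each source indicate a given condition?
survey = ["yes", "yes", "no", "no", "yes", "no", "no", "no"]
ehr = ["yes", "no", "no", "no", "yes", "no", "yes", "no"]

print(percent_agreement(survey, ehr))
print(cohens_kappa(survey, ehr))
```

Kappa is undefined when chance agreement equals 1 (both sources give a single identical value for everyone), so real data would need a guard for that case.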

Scientific Approaches

Not available.

Anticipated Findings

We will be able to report to future AoURP data users what levels of agreement exist between PPI and EHR data and between PM and EHR data, to inform their study designs.

Demographic Categories of Interest

  • Sex at Birth
  • Geography
  • Education Level
  • Income Level

Research Team

Owner:

  • William Hogan - Late Career Tenured Researcher, University of Florida

Collaborators:

  • Dominick Lemas - Project Personnel, University of Florida
  • Jiang Bian
  • Matthew McConnell - Project Personnel, University of Florida

AIAN & Covid

Project Purpose(s)

  • Population Health ...

Scientific Questions Being Studied

We will try to identify the American Indian/Alaska Native (AI/AN) population in the All of Us database and potentially look at health disparities in this population.

Scientific Approaches

Retrospective observational study. We will extract the AI/AN population from the database and compare Covid disease severity and mortality between the AI/AN population and other races/ethnicities.

Anticipated Findings

Hypothesis: the AI/AN population has worse health outcomes than other races/ethnicities, partially because of the higher prevalence of diabetes and obesity in the AI/AN population.

Demographic Categories of Interest

  • Race / Ethnicity

Research Team

Owner:

  • Jenny Chang - Project Personnel, University of California, Irvine

Antibiotic resistance patterns

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

I am exploring this data in the context of a graduate school course to analyze factors contributing to antibiotic resistance development. Antibiotic resistance is one of the foremost problems in infectious disease treatments. I seek to analyze patterns in antibiotic prescription regimens that lead to antibiotic resistance. Using these patterns, I plan to implement a machine learning model to identify these patterns in clinical practice and recommend alternative practices to healthcare providers.

Scientific Approaches

I will examine cohorts of inpatients who have been prescribed antibiotics for confirmed bacterial infections. Patients will be separated into those with antibiotic resistance development, requiring the prescription of a new class of antibiotics and/or broad-spectrum antibiotics, and those without antibiotic resistance development. Hierarchical clustering will be used to analyze separation of these cohorts by various factors. A machine learning model such as a neural network will then be implemented with these factors to predict antibiotic resistance in a testing set.

Anticipated Findings

I anticipate finding specific prescription patterns in the antibiotic resistance cohort, such as prescription of a specific antibiotic or early discontinuation of antibiotics, that are not present in the non-resistant cohort. Development of a model to predict antibiotic resistance development would be helpful in clinical practice to advise healthcare providers on choosing treatment regimens that avoid antibiotic resistance development.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Joanna Coker - Graduate Trainee, University of California, San Diego

AoU Common Data Elements

Project Purpose(s)

  • Methods Development ...

Scientific Questions Being Studied

We want to analyze the quality of the All of Us data using the common data elements from the eMERGE project.

Scientific Approaches

We will use Python to run basic analyses that calculate the prevalence of each common data element in the All of Us data set.
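A minimal sketch of such a calculation, assuming that "prevalence" here means the fraction of participant records with a non-missing value for each element. The record layout and element names below are invented for the example and are not eMERGE or All of Us identifiers.

```python
def element_coverage(records, elements):
    """Fraction of records with a non-missing value for each data element."""
    total = len(records)
    coverage = {}
    for element in elements:
        # A key that is absent or set to None counts as missing.
        present = sum(1 for r in records if r.get(element) is not None)
        coverage[element] = present / total
    return coverage


# Toy participant records; keys and values are invented for the example.
records = [
    {"height_cm": 170, "weight_kg": 65, "smoking_status": "never"},
    {"height_cm": 182, "weight_kg": None, "smoking_status": "former"},
    {"height_cm": None, "weight_kg": 80},
    {"height_cm": 160, "weight_kg": 55, "smoking_status": None},
]
elements = ["height_cm", "weight_kg", "smoking_status"]

coverage = element_coverage(records, elements)
for name, share in coverage.items():
    print(f"{name}: {share:.0%}")
```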

Anticipated Findings

I expect the coverage of common data elements to be high.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Xinzhuo Jiang - Project Personnel, Columbia University

Collaborators:

  • Karthik Natarajan - Other, All of Us Program Operational Use

Applied biomedical informatics graduate course

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

I'm exploring the Workbench as part of an applied biomedical informatics graduate course and will be leveraging AoU for educational purposes.

Scientific Approaches

I'm exploring the Workbench as part of an applied biomedical informatics graduate course and will be leveraging AoU for educational purposes.

Anticipated Findings

I'm exploring the Workbench as part of an applied biomedical informatics graduate course and will be leveraging AoU for educational purposes.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Eugene Jeong - Graduate Trainee, Vanderbilt University

ARI Disease Sets

Project Purpose(s)

  • Disease Focused Research (Autoimmune diseases) ...

Scientific Questions Being Studied

The goal of our research is to determine prevalence of autoimmune diseases, individually and as a class of disease, in the US. This work will help understand the likelihood of having autoimmune disease and we hope it will improve the ability of doctors to diagnose patients as it will establish the prior probability of having one of these many diseases.

Scientific Approaches

We will create three data sets for analysis:

1. A list of diseases rated in the following ways:

a. Evidence Class
i. Strong evidence it is autoimmune
ii. Moderate evidence it is autoimmune
iii. Weak evidence for autoimmunity
iv. A comorbidity of autoimmune disease
v. Symptom or symptom set with no known mechanism

b. Autoinflammatory versus autoimmune flag

c. “Not always autoimmune” flag – to indicate diseases that could have alternative mechanisms of cause

2. A list of patients, anonymized, with socioeconomic, geographic and other data that would be of interest to patients and public health officials to understand which communities are affected by these diseases
3. Outcomes data for patients over time assessing quality of life using PROMIS metrics

Anticipated Findings

The current NIH estimate of 23.5 million people with autoimmune disease was a guess by a knowledgeable clinician, but has no scientific support. As a consequence, there are numerous figures in the public sphere and nobody knows which one is correct.

Many reports say autoimmune diseases are on the increase, but since the number is unknown, it is impossible to say whether this is a public health issue or not. Having a methodology that can be used to recompute the number of people with autoimmune disease will help us understand if these reports are true.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Aaron Abend - Senior Researcher, Autoimmune Registry

Collaborators:

  • Jeffrey Green - Project Personnel, Autoimmune Registry

ARI Workspace

Project Purpose(s)

  • Disease Focused Research (Autoimmune diseases) ...

Scientific Questions Being Studied

The goal of our research is to determine prevalence of autoimmune diseases, individually and as a class of disease, in the US. This work will help understand the likelihood of having autoimmune disease and we hope it will improve the ability of doctors to diagnose patients as it will establish the prior probability of having one of these many diseases.

Scientific Approaches

We will create three data sets for analysis:

1. A list of diseases rated in the following ways:

a. Evidence Class
i. Strong evidence it is autoimmune
ii. Moderate evidence it is autoimmune
iii. Weak evidence for autoimmunity
iv. A comorbidity of autoimmune disease
v. Symptom or symptom set with no known mechanism

b. Autoinflammatory versus autoimmune flag

c. “Not always autoimmune” flag – to indicate diseases that could have alternative mechanisms of cause

2. A list of patients, anonymized, with socioeconomic, geographic and other data that would be of interest to patients and public health officials to understand which communities are affected by these diseases
3. Outcomes data for patients over time assessing quality of life using PROMIS metrics

Anticipated Findings

The current NIH estimate of 23.5 million people with autoimmune disease was a guess by a knowledgeable clinician, but has no scientific support. As a consequence, there are numerous figures in the public sphere and nobody knows which one is correct.

Many reports say autoimmune diseases are on the increase, but since the number is unknown, it is impossible to say whether this is a public health issue or not. Having a methodology that can be used to recompute the number of people with autoimmune disease will help us understand if these reports are true.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Aaron Abend - Senior Researcher, Autoimmune Registry

Collaborators:

  • Jeffrey Green - Project Personnel, Autoimmune Registry
  • Eric Chen - Project Personnel, Autoimmune Registry
  • Priya Padathula - Project Personnel, Autoimmune Registry
  • Darrison Haftarczyk - Research Assistant, Autoimmune Registry

ascvd_underrepresented

Project Purpose(s)

  • Disease Focused Research (arteriosclerotic cardiovascular disease)
  • Population Health ...
  • Social / Behavioral
  • Drug Development
  • Methods Development
  • Ancestry

Scientific Questions Being Studied

It is unclear if traditional clinical risk scores can adequately quantify the risk of atherosclerotic cardiovascular disease in underrepresented study populations. Additionally, the genetic determinants of risk factors for atherosclerotic cardiovascular disease, such as polygenic risk scores and monogenic mutations, have been well studied only in European populations.

Questions:
1. How does the risk of atherosclerotic cardiovascular disease differ between commonly represented and underrepresented populations?
2. Do traditional clinical risk scores for atherosclerotic cardiovascular disease perform similarly in diverse populations? If not, can this be addressed with new or modified clinical risk scores?
3. Can we develop polygenic risk scores for atherosclerotic cardiovascular disease and related risk factors for diverse populations?
4. What is the prevalence of genetic cardiovascular disorders in the population? Are there disparities in the recognition of these disorders between populations?

Scientific Approaches

Study methodology will include:
1. Prospective cohort analyses for incident atherosclerotic cardiovascular disease with subgroup analyses for distinct populations. Cox proportional hazards models will be used for time-to-event analyses and logistic regression will be used for case-control status. Models will be adjusted for covariates such as age and sex.

2. Comparison of observed versus predicted risk of atherosclerotic cardiovascular disease using clinical risk tools such as the Pooled Cohort Equations, QRISK3, and the modified Framingham risk score.

3. Comparison of the genetic architecture of optimized polygenic risk scores in commonly represented and underrepresented populations.

4. Case-control and continuous genetic association studies for ASCVD and related risk factors.
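One simple way to compare observed and predicted risk, as in step 2, is an observed-to-expected ratio per population group. The sketch below is only an illustration under stated assumptions: it takes each participant's predicted risk as already computed by some risk tool (it does not implement the Pooled Cohort Equations, QRISK3, or Framingham), and the group labels and numbers are invented.

```python
def observed_expected_ratio(predicted_risks, outcomes):
    """Ratio of observed events to the number of events the tool predicted.

    A ratio near 1 suggests good calibration; above 1 means the tool
    underestimates risk, below 1 means it overestimates risk.
    """
    observed = sum(outcomes)
    expected = sum(predicted_risks)
    return observed / expected


# Toy data (invented): (group, predicted risk, 1 if an event occurred else 0).
rows = [
    ("group_a", 0.10, 0), ("group_a", 0.20, 0),
    ("group_a", 0.30, 1), ("group_a", 0.40, 0),
    ("group_b", 0.10, 1), ("group_b", 0.10, 0),
    ("group_b", 0.20, 1), ("group_b", 0.10, 0),
]

ratios = {}
for group in sorted({g for g, _, _ in rows}):
    risks = [r for g, r, _ in rows if g == group]
    events = [e for g, _, e in rows if g == group]
    ratios[group] = observed_expected_ratio(risks, events)

for group, ratio in ratios.items():
    print(f"{group}: observed/expected = {ratio:.2f}")
```

In this toy data the tool is well calibrated for `group_a` and underestimates risk for `group_b`, which is the kind of gap the comparison is meant to surface.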

Anticipated Findings

The goals of this study’s outcomes are to:

1. Identify potential disparities and gaps in the current practice of atherosclerotic cardiovascular disease between study populations.

2. Develop tools to improve the prediction of atherosclerotic cardiovascular risk in underrepresented populations so that these individuals can be appropriately identified for preventative medicine.

3. Develop polygenic risk scores for atherosclerotic cardiovascular disease and related risk factors in underrepresented populations. This outcome is critical to improve the equitability of genomic medicine in clinical practice.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Mark Trinder - Graduate Trainee, The Broad Institute

Collaborators:

  • Romit Bhattacharya - Research Fellow, The Broad Institute

Asian Americans and Type 2 Diabetes

Project Purpose(s)

  • Disease Focused Research (diabetes mellitus) ...

Scientific Questions Being Studied

Despite having a lower body mass index (BMI) than other racial/ethnic groups, Asian Americans are more likely to develop Type 2 diabetes. In fact, Asian Americans have a 60% higher risk of Type 2 diabetes than non-Hispanic whites. Given the growing public health attention on the elevated risk of Type 2 diabetes among Asian Americans, clinicians are expected to intensify efforts to screen this population. In 2015, the American Diabetes Association (ADA) changed the BMI cut point for screening Asian Americans for prediabetes and Type 2 diabetes to 23 kg/m2 (vs. 25 kg/m2) based on the evidence that this population is at an increased risk for diabetes at lower BMI levels relative to the general population. In this study, we aim to examine the prevalence of Type 2 diabetes at a BMI of ≥ 23 kg/m2 in the Asian American cohort of the All of Us Research Program. Are Asian Americans more likely to develop Type 2 diabetes at a lower BMI?

Scientific Approaches

First, we will select the Asian American cohort from the demographic dataset. We will then select a subset of participants who had a medical condition of Type 2 diabetes, or diabetes, or glycated hemoglobin greater than or equal to 6.5%, or serum glucose greater than or equal to 126 mg/dL. We will use their height and weight from the physical measurements to calculate their BMI. We will examine the prevalence of Type 2 diabetes for BMI < 23, BMI 23-25, and BMI > 25, as well as by gender and age group (18-44 years old, 45-64 years old, and > 65 years old). Furthermore, we will assess their employment status and health insurance status.
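The selection and stratification steps above can be sketched as follows. BMI is weight in kilograms divided by height in meters squared, and the thresholds (HbA1c ≥ 6.5%, serum glucose ≥ 126 mg/dL, BMI strata < 23, 23-25, > 25) come from the text. The dictionary keys and participant values are invented for the example, and treating a BMI of exactly 25 as part of the 23-25 stratum is an assumption.

```python
def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def bmi_group(value):
    """BMI strata named in the text: < 23, 23-25, > 25."""
    if value < 23:
        return "<23"
    if value <= 25:  # assumption: exactly 25 belongs to the middle stratum
        return "23-25"
    return ">25"


def meets_diabetes_criteria(has_diagnosis, hba1c_percent=None, glucose_mg_dl=None):
    """Diagnosis on record, HbA1c >= 6.5%, or serum glucose >= 126 mg/dL."""
    if has_diagnosis:
        return True
    if hba1c_percent is not None and hba1c_percent >= 6.5:
        return True
    return glucose_mg_dl is not None and glucose_mg_dl >= 126


def prevalence_by_bmi_group(participants):
    """Share of participants meeting the diabetes criteria in each BMI stratum."""
    counts = {}
    for p in participants:
        group = bmi_group(bmi(p["weight_kg"], p["height_cm"]))
        is_case = meets_diabetes_criteria(
            p["diagnosis"], p.get("hba1c"), p.get("glucose")
        )
        cases, total = counts.get(group, (0, 0))
        counts[group] = (cases + int(is_case), total + 1)
    return {g: cases / total for g, (cases, total) in counts.items()}


# Toy participants (invented values; keys are hypothetical).
participants = [
    {"weight_kg": 55, "height_cm": 160, "diagnosis": False, "hba1c": 5.4},
    {"weight_kg": 58, "height_cm": 165, "diagnosis": False, "hba1c": 6.8},
    {"weight_kg": 68, "height_cm": 168, "diagnosis": True},
    {"weight_kg": 80, "height_cm": 170, "diagnosis": False, "glucose": 130},
    {"weight_kg": 85, "height_cm": 172, "diagnosis": False, "glucose": 95},
]

print(prevalence_by_bmi_group(participants))
```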

Anticipated Findings

We expect to find a higher number of Asian Americans diagnosed with Type 2 diabetes at a lower BMI, below the recommended screening cutpoint BMI >25. We also expect to find a higher number of Asian Americans diagnosed with Type 2 diabetes at a younger age, before the recommended screening age 45 years old.

Demographic Categories of Interest

  • Race / Ethnicity

Research Team

Owner:

  • Fornessa Randal - Senior Researcher, Asian Health Coalition

Collaborators:

  • Helen Lam - Senior Researcher, Asian Health Coalition

Asian Representation

Project Purpose(s)

  • Other Purpose (Describe the demographic characteristics of the Asian American cohort in the All of Us Research Program.) ...

Scientific Questions Being Studied

The ability of the All of Us Research Program to collect a wide range of patient information is critical, as is increasing the number and proportion of racial/ethnic minority participants. Without representation from diverse racial/ethnic minority populations to ensure the generalizability of findings, the advance of precision medicine will only magnify health disparities. Asian Americans are one of the fastest-growing racial/ethnic groups. Unlike other racial/ethnic minority groups, the diversity and complexity of the Asian Americans are remarkable and represent more than 30 different. Thus, the specific aim of this study is to find out whether the Asian American cohort in the All of Us Research Program reflects the general Asian population in the U.S.

Scientific Approaches

First, we will select the Asian American cohort from the demographic dataset. We will then summarize the demographic characteristics (age, gender, educational attainment, and annual household income) of the Asian American cohort using the self-reported data from the Participant Provided Information dataset. Finally, we will compare the findings to the 2013-2017 American Community Survey 5-year estimates.
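The project does not state how the comparison will be made. One simple option, sketched below, is to convert cohort counts to proportions and report the gap from the reference proportions in percentage points. All numbers are placeholders invented for the example; they are NOT real American Community Survey or All of Us values.

```python
def proportions(counts):
    """Convert category counts to proportions that sum to 1."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def difference_in_points(cohort_counts, reference_shares):
    """Cohort share minus reference share, in percentage points."""
    cohort_shares = proportions(cohort_counts)
    return {
        category: round(
            100 * (cohort_shares[category] - reference_shares[category]), 1
        )
        for category in reference_shares
    }


# Placeholder numbers invented for the example; not real survey estimates.
cohort_age_counts = {"18-44": 600, "45-64": 300, "65+": 100}
reference_age_shares = {"18-44": 0.50, "45-64": 0.30, "65+": 0.20}

gaps = difference_in_points(cohort_age_counts, reference_age_shares)
for category, gap in gaps.items():
    print(f"{category}: {gap:+.1f} points")
```

A positive gap means the category is overrepresented in the cohort relative to the reference population; the same functions apply to gender, education, or income categories.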

Anticipated Findings

We expect that the Asian American cohort in the All of Us Research Program will be younger and have higher educational attainment than the general Asian population in the U.S.

Demographic Categories of Interest

  • Race / Ethnicity

Research Team

Owner:

  • Helen Lam - Senior Researcher, Asian Health Coalition

Collaborators:

  • Paula Lozano - Early Career Tenure-track Researcher, University of Chicago
  • Fornessa Randal - Senior Researcher, Asian Health Coalition
  • Roxana Loperena Cortes - Other, All of Us Program Operational Use
  • Francis Ratsimbazafy - Other, All of Us Program Operational Use

Asthma

Project Purpose(s)

  • Disease Focused Research (Asthma)
  • Methods Development ...
  • Ancestry

Scientific Questions Being Studied

We are exploring the diversity of childhood-onset asthma and would like to see the prevalence of the condition in the current dataset.

Scientific Approaches

We would like to explore genetic associations with childhood-onset asthma.

Anticipated Findings

Identifying population specific risk factors for asthma.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Research Team

Owner:

  • Keoki Williams - Late Career Tenured Researcher, Henry Ford Health System

Asthma and COPD Demonstration Project

Project Purpose(s)

  • Disease Focused Research (Asthma, COPD)
  • Other Purpose (This work is the result of an All of Us Research Program Demonstration Project. Demonstration Projects are efforts by the All of Us Research Program designed to meet the goal of ensuring the quality and utility of the Research Hub as a resource for accelerating precision medicine. This work has been approved, reviewed, and overseen by the All of Us Research Program Science Committee and Data and Research Center to ensure compliance with program policy.) ...

Scientific Questions Being Studied

Asthma and Chronic Obstructive Pulmonary Disease (COPD) are the two most common respiratory diseases. Therefore, understanding the characteristics and frequency of participants with either of these diseases in the US by analyzing the All of Us database has public health implications. We will analyze the relationship of diagnosis for each disease to known risk factors and covariates including gender, age, race, BMI, smoking status, level of education and common comorbidities.

Scientific Approaches

Standard statistical analysis will be used to determine the frequency of asthma or COPD diagnoses in the EHR and surveys in relation to gender, age, BMI, race, educational status as a reflection of socioeconomic status (SES), region of the country, comorbidities, and smoking status. Results will be compared to published data, including data from the CDC. Since COPD is underdiagnosed in the US, we will compare the demographics and health characteristics of participants with a significant history of smoking (but no diagnosis of COPD) to participants with a COPD diagnosis. In addition, we will examine the medication data in the EHR for comparison with the published guidelines for treatment of both of these diseases.

Anticipated Findings

We anticipate a higher frequency of asthma in women compared with men, but that men are more likely to have reported that asthma started in childhood. In addition, we anticipate a higher frequency of asthma and less medication use in African Americans compared to non-Hispanic whites. Since COPD is mainly due to cigarette smoking, we anticipate that participants with a diagnosis of COPD are more likely to be men of middle to older age with a significant history of tobacco exposure. In addition, we expect that a proportion of participants with a significant history of smoking will not have a diagnosis of COPD either by EHR or survey but may report symptoms of airways disease. Also, the degree of misclassification of the two diseases will be determined; for example, a young adult with a limited degree of tobacco exposure diagnosed with COPD is more likely to have asthma. In both diseases, we expect an increased frequency of common comorbidities, including increased BMI.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Lisa White - Project Personnel, University of Arizona

Collaborators:

  • Deborah Meyers

Atrial Fibrillation and Race

Project Purpose(s)

  • Disease Focused Research (atrial fibrillation) ...

Scientific Questions Being Studied

Exploring the number of minorities who have atrial fibrillation recorded anywhere in the electronic health record.

Scientific Approaches

Gathering total number of people who have had atrial fibrillation within the database and then looking at the number of people from each race with AF. Also looking at inflammation.

Anticipated Findings

Prior research has found that minorities, particularly African Americans and Hispanics, have a lower risk of AF. Looking at why this is.

Demographic Categories of Interest

  • Race / Ethnicity

Research Team

Owner:

  • Lisa White - Project Personnel, University of Arizona

Automated Phenotyping

Project Purpose(s)

  • Methods Development ...

Scientific Questions Being Studied

To compare existing automated phenotyping algorithms, compare their performance across different cohorts over different diseases, and develop novel automated phenotyping algorithms thereof.

Scientific Approaches

Traditional statistical and epidemiological methods as well as machine learning methods applied to patients' EHR data.

Anticipated Findings

Models that predict or categorize disease likelihood or score patients for a specific disease. Identified disease subtypes based off of those models.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Henry Zheng - Graduate Trainee, University of California, Los Angeles

Back Pain in women

Project Purpose(s)

  • Other Purpose (Training for new position with research support.) ...

Scientific Questions Being Studied

back pain in pregnant women - using as a demo for practice using the workbench tools

Scientific Approaches

learning to use workbench tools for new role with research support

Anticipated Findings

I will learn how to use the tools so that I may assist our researchers with questions in the future

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Melissa Patrick - Project Personnel, All of Us Program Operational Use

Barrett's Esophagus

Project Purpose(s)

  • Other Purpose (To conduct research to understand the demographics, risk factors, and outcomes of people diagnosed with Barrett's esophagus. The purpose of this research is to increase our scientific understanding of this condition with the hope of improving management and potential risks. ) ...

Scientific Questions Being Studied

To conduct research to understand the demographics, risk factors, and outcomes of people diagnosed with Barrett's esophagus. The purpose of this research is to increase our scientific understanding of this condition with the hope of improving management and potential risks. This research is exploratory in nature at this point. Specific hypotheses to be tested will be added here at a later date and will be fully outlined before any investigation of such has begun.

Scientific Approaches

The primary dataset will be a Barrett's esophagus cohort. This may be compared with matched cohorts from the general AOU population or individuals with gastroesophageal reflux, for example. Descriptive statistics and regression models will be used to assess demographics, risk factors and risks of Barrett's esophagus. This research is exploratory in nature at this point. Specific hypotheses to be tested will be added here at a later date and will be fully outlined before any investigation of such has begun.

Anticipated Findings

This research is exploratory in nature at this point. Anticipated findings and contributions to scientific knowledge will become more clear when specific hypotheses to be tested are added here at a later date.

Demographic Categories of Interest

  • Age
  • Sex at Birth
  • Geography
  • Education Level
  • Income Level

Research Team

Owner:

  • Michael Cook - Mid-career Tenured Researcher, NIH

Blood Pressure Audit

Project Purpose(s)

  • Disease Focused Research (hypertension)
  • Ancestry ...

Scientific Questions Being Studied

To explore the genetic determinants of blood pressure

Scientific Approaches

GWAS for consortium inclusion. Examining systolic, diastolic, and pulse pressure phenotypes.

Anticipated Findings

Adding to SNP associations for blood pressure traits, to be included in meta-analysis with other large data sets

Demographic Categories of Interest

  • Race / Ethnicity
  • Sex at Birth

Research Team

Owner:

  • Jacob Keaton - Research Fellow, NIH

BMI_test

Project Purpose(s)

  • Disease Focused Research (metabolic dysfunction as a comorbidity of cancer)
  • Population Health ...
  • Ancestry

Scientific Questions Being Studied

Obesity as a phenotype is heterogeneous. In the study of obesity-related cancer, markers of metabolically dysfunctional obesity (lipid, adipokine, etc.) will alter the association between exposures and outcome.

Scientific Approaches

I will use the All of Us data set v3 to explore the link between BMI, weight, and other markers of obesity and the occurrence of cancer.

Anticipated Findings

I anticipate that exposure variables will be differential between healthy obese and unhealthy obese, and that these differences will have an effect upon cancer. These findings may help to serve as markers of high-risk obesity as a comorbidity in cancer, and have the potential for dietary intervention.

Demographic Categories of Interest

  • Race / Ethnicity
  • Geography
  • Education Level
  • Income Level

Research Team

Owner:

  • Michael Behring - Research Fellow, University of Alabama at Birmingham

BMIF 7391

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

I am exploring this data as part of a graduate course on clinical and translational biomedical informatics. Currently, we are learning about existing tools for cohort building. The AOU workbench provides a unique opportunity to gain hands-on experience working with a cohort building tool.

Scientific Approaches

I will use the workbench to explore the different tools, processes, and data available.

Anticipated Findings

I am hoping to learn about the design and implementation of this workbench and how these factors might promote collaboration and innovation in the future.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Leigh Anne Tang - Graduate Trainee, Vanderbilt University

BMIF 7391 Clinical Informatics Course Workspace

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

I will be using this workspace to explore tools, data, and processes across the All of Us cohort as part of a graduate-level course on biomedical informatics. I would also like to understand the space of the data and the types of questions that can be formulated using it, pursuant to a more-developed class project later in the semester.

Scientific Approaches

I anticipate using a broad range of datasets and tools as I work on assignments for my course. I am interested in qualitative as well as quantitative methods, and would like to make more use of various machine learning packages available through Python to increase my familiarity with them.

Anticipated Findings

As I will be using this workspace for educational purposes, I do not have a defined set of findings; I expect, however, that learning how to best leverage the AoU dataset will contribute to my skills as a researcher and data scientist, and help me construct future studies using AoU data that will contribute to the general body of scientific knowledge.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Kimberley Kondratieff - Graduate Trainee, Vanderbilt University

BMIF Graduate Course

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

I am exploring the All of Us workbench as part of coursework in biomedical informatics at the graduate level.

Scientific Approaches

I will use the tools, processes, and data available through the workbench to review longitudinal characteristics for the cohort of All of Us participants, such as BMI trends and survey responses.

Anticipated Findings

I anticipate this will guide my education in how to use All of Us Data. This educational experience will help me design future projects that may utilize All of Us data to gain insights into drug safety and pharmacogenomics.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Aileen Wright - Research Fellow, Vanderbilt University Medical Center

Body Temperature 121719

Project Purpose(s)

  • Educational
  • Other Purpose (This work is the result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data Research Center to ensure compliance with program policy, including policies for acceptable data access and use.) ...

Scientific Questions Being Studied

As a demonstration project, this study will examine the distribution of normal body temperature by sex, age, and race from data available in the All of Us research dataset. The results will be compared to those reported in several other large epidemiologic studies to see if the same established patterns of these distributions are demonstrable in the All of Us dataset. Specific questions include:

1. What are the distributions of normal body temperature demonstrated in the All of Us research dataset by sex, age and race?
2. How do the distributions of normal body temperature demonstrated in the All of Us research dataset compare with those reported in other large epidemiologic studies?

Scientific Approaches

To examine the distributions of normal body temperature in different age, sex, and race groups represented in the All of Us research dataset, we will identify normal body temperatures as the lowest oral temperature recorded for each individual with a body temperature value in the dataset and stratify the cohort by sex, race, and age at time of temperature recording (in 10-year intervals). The mean, standard deviation, median, and interquartile range of temperatures will be determined for each stratum.
Sources for the data required for the analysis include:

  • EHR Oral Body Temperature values and date of measurement
  • Participant Provided Information (PPI) for date of birth
  • Participant Provided Information (PPI) for demographics
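
The stratified summary described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the record layout, the toy values, and the names `RECORDS`, `lowest_temperature_per_person`, and `summarize_by_stratum` are hypothetical and are not the Workbench schema or the project's actual code.

```python
from collections import defaultdict
from statistics import mean, median, quantiles, stdev

# Hypothetical toy records: (person_id, sex, race, age_at_measurement, oral_temp_c).
# Field names and values are illustrative only, not the Workbench schema.
RECORDS = [
    ("p1", "F", "A", 34, 36.9),
    ("p1", "F", "A", 35, 36.6),
    ("p2", "F", "A", 38, 36.8),
    ("p3", "M", "B", 61, 36.4),
    ("p3", "M", "B", 62, 36.7),
    ("p4", "M", "B", 67, 36.2),
    ("p5", "F", "A", 31, 37.0),
]


def lowest_temperature_per_person(records):
    """Keep, for each person, the record with the lowest oral temperature."""
    lowest = {}
    for rec in records:
        person_id, temp = rec[0], rec[4]
        if person_id not in lowest or temp < lowest[person_id][4]:
            lowest[person_id] = rec
    return lowest


def summarize_by_stratum(records):
    """Summarize lowest temperatures by (sex, race, 10-year age band)."""
    strata = defaultdict(list)
    for _, sex, race, age, temp in lowest_temperature_per_person(records).values():
        decade = (age // 10) * 10  # e.g. 34 -> 30, meaning ages 30-39
        strata[(sex, race, decade)].append(temp)

    summary = {}
    for key, temps in strata.items():
        if len(temps) >= 2:
            q1, _, q3 = quantiles(temps, n=4)  # quartile cut points
            sd = stdev(temps)
        else:
            q1 = q3 = sd = None  # spread is undefined for a single value
        summary[key] = {
            "n": len(temps),
            "mean": mean(temps),
            "sd": sd,
            "median": median(temps),
            "iqr": (q1, q3),
        }
    return summary


SUMMARY = summarize_by_stratum(RECORDS)
```

Age is taken from the same record as the lowest temperature, which matches "age at time of temperature recording"; a real analysis would also need to settle units and implausible values, which this sketch ignores.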

Anticipated Findings

For this study, we anticipate that we will be able to replicate the previously established patterns in distributions of body temperature in adults by age, sex, and race. This will serve to demonstrate that physiologic measures derived from EHR data within the All of Us dataset are valid for epidemiologic study. It will also provide a basis for further study of body temperature trends within individuals to include investigations into possible individualized definitions of normal and febrile temperature ranges.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Jill Waalen - Mid-career Tenured Researcher, Scripps Research

breast cancer demo

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

how breast cancer phenotype can be implemented in AoU

Scientific Approaches

Not available.

Anticipated Findings

That AoU has great data!

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital

breast cancer

Project Purpose(s)

  • Disease Focused Research (breast cancer) ...

Scientific Questions Being Studied

breast cancer

Scientific Approaches

Not available.

Anticipated Findings

breast cancer

Demographic Categories of Interest

  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Xinzhuo Jiang - Project Personnel, Columbia University

Collaborators:

  • Noah Engel - Other, All of Us Program Operational Use

Breast Cancer

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

How breast cancer phenotype can be implemented in AoU.

Scientific Approaches

Not available.

Anticipated Findings

That AoU has great data.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Tsung-Ting Kuo - Early Career Tenure-track Researcher, University of California, San Diego

Breast Cancer

Project Purpose(s)

  • Disease Focused Research (Breast Cancer) ...

Scientific Questions Being Studied

How breast cancer phenotype in AoU

Scientific Approaches

Not available.

Anticipated Findings

Phenotype

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Nyia Noel - Mid-career Tenured Researcher, Boston Medical Center

Breast Cancer Demo

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

How can breast cancer phenotypes be studied in AoU

Scientific Approaches

Not available.

Anticipated Findings

Learning about the platform

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Shashwat Deepali Nagar - Graduate Trainee, Georgia Tech

breast cancer demo

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

How breast cancer phenotypes can be implemented in AoU

Scientific Approaches

Not available.

Anticipated Findings

breast cancer phenotypes

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Jihoon Kim - Project Personnel, University of California, San Diego

Breast Cancer Demo

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

How breast cancer is diagnosed in different ancestral populations

Scientific Approaches

Not available.

Anticipated Findings

Diagnoses are detected at later stages in URM populations

Demographic Categories of Interest

Not available.

Research Team

Owner:

  • Cenai Zhang - Project Personnel, Cornell University

Collaborators:

  • Margaret Ross - Late Career Tenured Researcher, Cornell University

Breast cancer demo

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

That AoU has great data

Scientific Approaches

Not available.

Anticipated Findings

xxx

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Elizabeth Karlson - Late Career Tenured Researcher, Massachusetts General Hospital

Breast Cancer Demo

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

Learning

Scientific Approaches

Not available.

Anticipated Findings

Learning

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • King Jordan - Mid-career Tenured Researcher, Georgia Tech

Breast Cancer Phenotype Tutorial (mkn)

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

following guided breast cancer tutorial

Scientific Approaches

Not available.

Anticipated Findings

following guided breast cancer tutorial

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Mary Nielsen - Project Personnel, University of Arizona

Breast Cancer prevalence

Project Purpose(s)

  • Disease Focused Research (breast cancer) ...

Scientific Questions Being Studied

Breast Cancer prevalence

Scientific Approaches

Not available.

Anticipated Findings

Breast Cancer prevalence

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Jihoon Kim - Project Personnel, University of California, San Diego

Breast Cancer Treatment

Project Purpose(s)

  • Disease Focused Research (breast cancer)
  • Methods Development ...

Scientific Questions Being Studied

We aim to apply machine learning and optimization to the personalization of treatment sequences for patients with metastatic breast cancer. There are several decisions that must be made when designing a treatment regimen for an individual, such as which treatment to start with, when to change treatments, and how to account for response uncertainty. We hypothesize that a patient's response trajectory is dependent on not only the regimens used for treatment, but also the sequence of these regimens. By applying machine learning, we can accurately predict how a patient might respond to different regimens based on their clinical features. We can then use this knowledge to select the most promising treatment sequences.

Scientific Approaches

We leverage state-of-the-art machine learning algorithms to gain insight into predictors of treatment response length, fitting separate models for distinct regimens. We create a feature space of patient demographics, disease characteristics, clinical variables, and treatment regimens and define our outcome variable as the duration of the regimen. We will train various algorithms to predict regimen duration and determine the best modeling approach with consideration of both quantitative performance and clinical interpretability. We then couple the regimen-specific machine learning models with mixed-integer optimization to identify the best sequence of treatment regimens for an individual.

Anticipated Findings

The All of Us dataset will allow us to gain a better understanding of treatment response on an individualized level through synthesis of a large, diverse cohort of patients with metastatic breast cancer. We hope to get insights into clinical indicators of treatment success and the interdependence of successive treatment regimens. Ultimately, we hope to propose personalized treatment recommendations that can improve treatment duration and success.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Holly Wiberg - Graduate Trainee, Massachusetts Institute of Technology

breast cancer tutorial

Project Purpose(s)

  • Population Health
  • Social / Behavioral ...

Scientific Questions Being Studied

epidemiology of breast cancer

Scientific Approaches

Not available.

Anticipated Findings

disparities

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Lizette Mendez - Project Personnel, Boston Medical Center

BreastCancerDemo

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

How breast cancer phenotype can be implemented in AoU

Scientific Approaches

Not available.

Anticipated Findings

That AoU has great data

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Luca Bonomi - Research Fellow, University of California, San Diego

Burden of mental health among US adults with and without hypertension

Project Purpose(s)

  • Disease Focused Research (hypertension)
  • Population Health ...

Scientific Questions Being Studied

Mental health diseases contribute to a significant proportion of disease burden and are a leading cause of years lived with disability globally. Mental disorders are risk factors for a number of chronic diseases. In addition, poor mental health status is more prevalent among women than men. We aim to examine the prevalence of mental problems among US adults with and without hypertension disorders. We will also stratify by gender and other demographic factors, including age, race/ethnicity, and income.

Scientific Approaches

We will first identify participants who answered the following questions from the overall health questionnaire.
-PPI1585729. In general, how would you rate your mental health, including your mood and your ability to think?
-PPI1585760. In the past 7 days, how often have you been bothered by emotional problems such as feeling anxious, depressed or irritable?

Then we will extract the conditions of hypertension disorders from the EHR condition domain. We will examine the prevalence of mental disorders among participants with and without hypertension disorders. We will also stratify the population by gender and age.
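
A stratified prevalence comparison of this kind might look like the sketch below. The participant tuples are made up, and treating a "Fair" or "Poor" self-rating as a mental health problem is an assumption made purely for illustration; the project description does not say how survey answers are turned into a case definition.

```python
from collections import Counter

# Hypothetical participants: (sex, has_hypertension, mental_health_rating).
# Values are invented for illustration and do not come from the dataset.
PARTICIPANTS = [
    ("F", True, "Poor"),
    ("F", True, "Good"),
    ("F", False, "Fair"),
    ("F", False, "Excellent"),
    ("F", False, "Good"),
    ("M", True, "Fair"),
    ("M", True, "Good"),
    ("M", True, "Very good"),
    ("M", False, "Good"),
    ("M", False, "Excellent"),
]

# Assumption for illustration only: "Fair" or "Poor" counts as a problem.
PROBLEM_RATINGS = {"Fair", "Poor"}


def prevalence_by_group(participants):
    """Return the share of 'problem' ratings per (sex, has_hypertension) group."""
    totals = Counter()
    cases = Counter()
    for sex, has_hypertension, rating in participants:
        group = (sex, has_hypertension)
        totals[group] += 1
        if rating in PROBLEM_RATINGS:
            cases[group] += 1
    # Counter returns 0 for groups with no cases, so every group gets a value.
    return {group: cases[group] / totals[group] for group in totals}


PREVALENCE = prevalence_by_group(PARTICIPANTS)
```

Adding age bands or race/ethnicity to the group key would extend the same pattern to the other strata mentioned in the project description.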

Anticipated Findings

We expect to find a higher prevalence of mental disorders among participants with hypertension, especially among women with hypertension.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth

Research Team

Owner:

  • Xiang Li - Research Fellow, Tulane University

Collaborators:

  • Christopher Pottle - Project Personnel, Tulane University

Burden of mental health US adults with and without hypertension

Project Purpose(s)

  • Disease Focused Research (hypertension)
  • Population Health ...

Scientific Questions Being Studied

Mental health diseases contribute to a significant proportion of disease burden and are a leading cause of years lived with disability globally. Mental disorders are risk factors for a number of chronic diseases. In addition, poor mental health status is more prevalent among women than men. We aim to examine the prevalence of mental problems among US adults with and without hypertension disorders. We will also stratify by gender and other demographic factors, including age, race/ethnicity, and income.

Scientific Approaches

We will first identify participants who answered the following questions from the overall health questionnaire.
-PPI1585729. In general, how would you rate your mental health, including your mood and your ability to think?
-PPI1585760. In the past 7 days, how often have you been bothered by emotional problems such as feeling anxious, depressed or irritable?

Then we will extract the conditions of hypertension disorders from the EHR condition domain. We will examine the prevalence of mental disorders among participants with and without hypertension disorders. We will also stratify the population by gender and age.

Anticipated Findings

We expect to find a higher prevalence of mental disorders among participants with hypertension, especially among women with hypertension.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth

Research Team

Owner:

  • Xiang Li - Research Fellow, Tulane University

BWH_HTN

Project Purpose(s)

  • Disease Focused Research (hypertension) ...

Scientific Questions Being Studied

What is the prevalence of hypertension (HTN) defined using an electronic health record definition from eMERGE among UBR groups defined by race/ethnicity, income and education?

Do treatment patterns for HTN (using medication sequencing analysis) vary by UBR groups defined by race/ethnicity, income and education, and in geographic regions based on grouping states?

Scientific Approaches

Not available.

Anticipated Findings

There may be disparities in HTN across racial and income groups of policy interest.

Demographic Categories of Interest

  • Income Level

Research Team

Owner:

  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital
  • Elizabeth Karlson - Late Career Tenured Researcher, Massachusetts General Hospital

Cancer

Project Purpose(s)

  • Population Health ...

Scientific Questions Being Studied

We intend to explore differences in the prevalence of cancer within the AoU population. In particular, we will be looking at the difference between the entire population, the subset with medical records, and the subset with self-reported data.

Scientific Approaches

We intend to select a list of SNOMED codes corresponding to primary cancers to get the subset with cancer in the medical record.

We intend to select the survey question asking about self-reported cancer to get the subset with self-reported cancer.
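
Once the two subsets are defined, the comparison reduces to simple set arithmetic. The sketch below uses invented participant IDs and hypothetical names (`HAS_EHR`, `EHR_CANCER`, and so on); how the SNOMED code list and the survey question are actually queried is not shown and is outside what the project description specifies.

```python
# Hypothetical participant IDs; none of these come from the actual dataset.
ALL_PARTICIPANTS = set(range(1, 11))      # ten participants in total
HAS_EHR = {1, 2, 3, 4, 5, 6, 7, 8}        # participants with medical records
HAS_SURVEY = {3, 4, 5, 6, 7, 8, 9, 10}    # participants who answered the survey
EHR_CANCER = {1, 3, 4}                    # a selected cancer code appears in the EHR
SELF_REPORTED_CANCER = {4, 5, 9}          # answered "yes" to the cancer question


def prevalence(cases, denominator):
    """Share of the denominator population that appears among the cases."""
    return len(cases & denominator) / len(denominator)


RESULTS = {
    "ehr": prevalence(EHR_CANCER, HAS_EHR),
    "self_report": prevalence(SELF_REPORTED_CANCER, HAS_SURVEY),
    "either_source": prevalence(EHR_CANCER | SELF_REPORTED_CANCER, ALL_PARTICIPANTS),
}

# Agreement can only be judged among participants who have both data sources.
BOTH_SOURCES = HAS_EHR & HAS_SURVEY
AGREEMENT = {
    "both": len(EHR_CANCER & SELF_REPORTED_CANCER & BOTH_SOURCES),
    "ehr_only": len((EHR_CANCER - SELF_REPORTED_CANCER) & BOTH_SOURCES),
    "self_report_only": len((SELF_REPORTED_CANCER - EHR_CANCER) & BOTH_SOURCES),
}
```

Restricting the agreement counts to participants with both sources avoids counting a missing record as a disagreement.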

Anticipated Findings

We expect the measured prevalence of cancer to differ between self-report and medical record, which could have implications for how cancer is measured at the population level.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Paul Zakin - Project Personnel, University of Chicago

Collaborators:

  • Sameep Shah - Project Personnel, University of Chicago

Cancer Prevalence and Family History

Project Purpose(s)

  • Disease Focused Research (cancer)
  • Population Health ...
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Questions Being Studied

As a demonstration project, we seek to understand the regional, demographic, and family history characteristics in prevalence and incidence of both solid and hematologic (blood) cancers. Our questions are:
1. How do rates of cancer differ based on self-report (participant provided information, or “PPI”) and electronic health records?
2. Are characteristics related to cancer similar or different among the people represented in All of Us compared to other national cohorts?

Scientific Approaches

We will analyze all types of cancer in adults to compare incidence (new cases) and prevalence (current levels in the population). We will compare our results with the published information from the SEER national cancer registry and national surveys (e.g., National Health Interview Survey). We will analyze sociodemographic and geographic differences.

We will identify cancer cases based on self-report from the PPI individual medical history survey and from diagnosis codes plus lab results from the EHR.
We will identify family history of cancer from the PPI family medical history survey.
We will map the cancer categories by physiologic site used for the SEER registry to the SNOMED and ICD codes used in the EHR and to the cancer conditions in the PPI. We use Jupyter notebooks to generate reusable code for the mapping.

Anticipated Findings

For this study, we anticipate we will be able to replicate the relative prevalence and incidence rates of cancer and family history of cancer. This will serve to demonstrate the quality and utility of All of Us data and tools for conducting epidemiologic analyses.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Education Level
  • Income Level

Research Team

Owner:

  • Jihoon Kim - Project Personnel, University of California, San Diego

Collaborators:

  • Paulina Paul - Project Personnel, University of California, San Diego
  • Katherine Kim - Early Career Tenure-track Researcher, University of California, Davis

Cancer Rates

Project Purpose(s)

  • Disease Focused Research (cancer)
  • Population Health ...
  • Methods Development
  • Ancestry

Scientific Questions Being Studied

Distribution, comparison to SEER, and consideration of the feasibility of ascertainment from EHR.

Scientific Approaches

Not available.

Anticipated Findings

Understanding of demographic differences in case ascertainment and differences by PPI and EHR.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Brisa Aschebrook-Kilfoy - Mid-career Tenured Researcher, University of Chicago

Cancer Survivorship

Project Purpose(s)

  • Disease Focused Research (Complications arising from cancer and cancer treatment)
  • Social / Behavioral ...
  • Ancestry

Scientific Questions Being Studied

With improvements in the early detection and treatment of malignancies, the number of cancer survivors is anticipated to increase dramatically over the next decade. However, cancer survivors are often at risk of long-term morbidities, related directly to the cancer itself, to pre-existing co-morbidities and especially to their anti-cancer therapies. Toxicities from anti-cancer treatment are detrimental and survivors are often plagued by a wide range of treatment-induced toxicities that lead to functional impairment at significant economic, emotional and social cost. Clearly, the healthcare needs of survivors are tremendous during survivorship. This study is designed to investigate the predictors of long-term morbidities in cancer survivors, with a focus on clinical, demographic and genetic predispositions.

Scientific Approaches

A cohort of cancer survivors will be nested using the database. Various clinical, demographic and genetic variables will be evaluated for their associations with complications in cancer survivors. Complications in cancer survivors will be investigated using a number of outcomes, including patient-related outcomes, healthcare utilization (unnecessary admissions and medication use), as well as economic outcomes. Factors associated with higher risks for complications of cancer treatment will be investigated.

Anticipated Findings

It is anticipated that certain clinical outcomes (such as comorbidities as well as certain cancers) will predispose survivors to higher risks of having complications. This will allow interventional or prevention studies to be designed to cater for specific patient populations.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Alexandre Chan - Late Career Tenured Researcher, University of California, Irvine

Collaborators:

  • Ding Quan Ng - Graduate Trainee, University of California, Irvine
  • Stanley Jia - Undergraduate Student, University of California, Irvine

Cardiovascular Risk Prediction

Project Purpose(s)

  • Disease Focused Research (arteriosclerotic cardiovascular disease)
  • Population Health ...

Scientific Questions Being Studied

Cardiovascular disease risk is an important determinant of how we treat patients. Preventive medications can reduce future risk, but they are associated with both financial and biological side-effects. Previous methods of predicting risk, notably the Pooled Cohort Equation (PCE), tend to perform worse in racial and ethnic minorities than in populations of European ancestry. As such, we plan to evaluate the accuracy of cardiovascular risk prediction in the diverse cohort of the All of Us study, and determine which, if any, variables might aid in refining risk in racial and ethnic minorities.

Scientific Approaches

Atherosclerotic cardiovascular disease (ASCVD), as defined by the American College of Cardiology and American Heart Association (ACC/AHA), includes stroke, transient ischemic attack (TIA), documented coronary artery disease (CAD) with stable angina, acute coronary syndromes (ACS), coronary or other arterial revascularization, peripheral vascular disease with or without claudication, and aortic aneurysm. In clinical practice, physicians will also include asymptomatic CAD (meaning, without angina) with demonstrable ischemia on a stress test.

We plan to utilize data from adults greater than 18 years of age in the All of Us Research study who are free of ASCVD, and have complete data for inputs in the Pooled Cohort Equation (age, gender, race, cholesterol levels, blood pressure (BP), BP medications, diabetes status, and smoking status).

We propose to then use the PCE to assess accuracy of risk prediction in the multi-ethnic All of Us cohort.
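One simple way to look at accuracy across groups is to compare the mean predicted risk with the observed event rate within each group. The sketch below is illustrative only and is not the research team's code: the function name `calibration_by_group`, the record layout, and the toy values are all assumptions, and the predicted risks are taken as given rather than computed from the PCE.

```python
from collections import defaultdict


def calibration_by_group(records):
    """Return {group: (mean_predicted_risk, observed_event_rate, n)}.

    Each record is a (group, predicted_risk, had_event) tuple, where
    predicted_risk is a risk in [0, 1] produced by some risk model and
    had_event is True if the participant had an event during follow-up.
    """
    sums = defaultdict(lambda: [0.0, 0, 0])  # predicted sum, events, count
    for group, predicted_risk, had_event in records:
        acc = sums[group]
        acc[0] += predicted_risk
        acc[1] += int(had_event)
        acc[2] += 1
    return {g: (p / n, e / n, n) for g, (p, e, n) in sums.items()}


# Hypothetical toy records for illustration; not All of Us data.
toy_records = [
    ("group_a", 0.10, False),
    ("group_a", 0.20, True),
    ("group_b", 0.05, False),
    ("group_b", 0.15, False),
]
result = calibration_by_group(toy_records)
```

A group whose observed event rate sits well above or below its mean predicted risk would suggest under- or over-prediction for that group.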

Anticipated Findings

The anticipated findings are that the PCE will not perform equally across all racial and ethnic categories. This will provide an opportunity for further refinement of risk-prediction algorithms using commonly collected clinical information.

An improved risk prediction algorithm would improve our ability to target the most needed preventive therapies to those at greatest risk of future events, and minimize harms from unnecessary therapies.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Romit Bhattacharya - Research Fellow, The Broad Institute

Collaborators:

  • Sarah Urbut - Research Fellow, The Broad Institute
  • Mark Trinder - Graduate Trainee, The Broad Institute

Causal Inference using EHR Data

Project Purpose(s)

  • Methods Development ...

Scientific Questions Being Studied

Our goal is to build a better risk prediction and treatment effect estimation model to help patients choose medications. In particular, high-dimensional EHR data bring both opportunities and challenges for developing a risk prediction model (e.g., 1-year mortality or time to CVD). Our project aims to develop a general framework for building a robust and interpretable risk prediction model to identify risk factors associated with the disease of interest. This study will help us better understand the disease etiology and provide new insight for treatment selection.

Scientific Approaches

We will develop new statistical/machine learning models for unbiased estimation of heterogeneous treatment effects with high-dimensional covariates. In addition, we will develop interpretable models for disease risk prediction and stratification, based on boosting or neural network models with structural constraints to enhance interpretability.

Anticipated Findings

Our method can provide a general framework for determining individualized treatment rules using large-scale observational data (EHR data) and help patients choose between treatments (e.g., metformin vs. insulin) based on characteristics such as age, BMI, and lab test values. Since one treatment may not fit all, our method can help patients and physicians make patient-centered, evidence-based treatment decisions.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Guanhua Chen - Early Career Tenure-track Researcher, University of Wisconsin, Madison

Childhood Obesity

Project Purpose(s)

  • Disease Focused Research (obesity)
  • Drug Development ...

Scientific Questions Being Studied

Childhood obesity is a major public health problem across the globe as well as in the US. Childhood obesity can continue into adulthood and is known to be a major risk factor for chronic diseases such as diabetes, cancer, and cardiovascular diseases. Preventing childhood obesity has been actively pursued in pediatric programs. However, decades of rigorous research have shown that prevention and management of obesity is not easy. This is partly due to our limited understanding of obesity and the complex interactions among the many factors, biological and environmental, that are known to contribute to it. The motivation of this work is to predict childhood obesity early on and study the causes and consequences of obesity.

Scientific Approaches

We are planning to use deep machine learning models to study the causes and consequences of obesity and to predict childhood obesity.

Anticipated Findings

We are looking to study the causes and consequences of childhood obesity. We will look into factors that will help predict childhood obesity early on. This will help prevent obesity and other chronic diseases that are the consequence of childhood obesity.

Demographic Categories of Interest

  • Age

Research Team

Owner:

  • Mehak Gupta - Graduate Trainee, University of Delaware

Collaborators:

  • Raphael Poulain - Graduate Trainee, University of Delaware

Chronic Kidney Disease

Project Purpose(s)

  • Disease Focused Research (chronic kidney disease)
  • Control Set ...

Scientific Questions Being Studied

The long-term goal of this Genome Wide Association Study (GWAS) pilot project is to identify genetic loci associated with chronic kidney disease (CKD). Although diabetes and hypertension are primary risk factors for CKD, genetic factors have also been associated with CKD-defining traits. In this study, we will compare genome sequence variation between individuals diagnosed with CKD and those not diagnosed with CKD. Our project seeks to: 1) identify putative genetic loci that increase risk for CKD that have not been previously reported; 2) examine associations between known CKD risk loci among individuals in the study; and 3) compare genetic data with other CKD studies as a way of assessing whether individuals included in the study are a representative cohort of CKD subjects. Genetic risk loci identified through this study will form the basis for basic biomedical research to define the molecular determinants of disease and provide a platform for the development of effective treatments.

Scientific Approaches

We plan to conduct a Genome Wide Association Study (GWAS) to identify genetic loci associated with chronic kidney disease (CKD). To conduct this study, we will combine de-identified data we plan to generate for 100 CKD patients from Maine with All of Us data. We plan to use All of Us data for both cases and controls. The overall statistical analysis approach consists of three components as follows. First, we will analyze distributions of relevant clinical data from participants, such as eGFR and age separated by sex. Second, we will analyze population stratification using ADMIXTURE (Shringapure et al, 2016) and PLINK (Purcell et al, 2007) to identify any subpopulations that differ genetically and if disease prevalence differs among those subpopulations (Marchini et al, 2004). Distributions of alleles among participants will be compared to reference panels in order to estimate individual ancestry proportions. For the GWAS analysis, genetic associations will be tested using PLINK.

Anticipated Findings

We seek to: 1) identify putative genetic loci that increase risk for chronic kidney disease (CKD) that have not been previously reported; 2) examine associations between known CKD risk loci among individuals in the study; and 3) compare genetic data with other CKD studies as a way of assessing whether individuals included in the study are a representative cohort of CKD subjects. Genetic risk loci identified through this study will form the basis for basic biomedical research in animal models to define the molecular determinants of disease and provide a platform for the development of effective treatments.

Demographic Categories of Interest

  • Geography

Research Team

Owner:

  • Benjamin King - Early Career Tenure-track Researcher, University of Maine

Chronic pain prevalence and treatment in Hispanic subpopulations

Project Purpose(s)

  • Disease Focused Research (Chronic Pain)
  • Population Health ...

Scientific Questions Being Studied

Hispanic subpopulations vary in the prevalence of chronic disabling conditions, as well as in socioeconomic status and health care utilization. Yet few studies have examined in detail the epidemiology of pain in these subgroups. There are no studies comparing chronic pain prevalence and treatment in these subpopulations, nor studies examining the influence of race on potential associations with pain in Hispanics. The present study will provide data on the prevalence and treatment of chronic pain in racial and Hispanic populations that are often underserved by answering the following questions:
1. Are U.S. adults who self-identify as Puerto Rican more likely to report severe chronic pain than individuals of other Hispanic ancestry as well as non-Hispanic populations?
2. Do pain prevalence and treatment vary by racial identity when holding Hispanic ancestry constant?
3. Do pain prevalence and treatment vary by Hispanic ancestry when holding racial identity constant?

Scientific Approaches

ICD codes will be used to identify participants with chronic pain. CPT codes will be used to identify treatments associated with this pain. Contingency tables will be used to assess the relationships between Race*Hispanic Ancestry subpopulations and both chronic pain prevalence and treatment. Multivariable logistic regression will be used to assess the relationship between pain and race/ethnicity, with race and ethnicity added to the model as both individual terms and as an interaction term (race*ethnicity). Additional covariates in the model will include, at a minimum, age, educational attainment, survey language, health insurance status, region of country, sex, and survey year. The influence of BMI, income/wealth, comorbidity, health behaviors, perceived discrimination, and social networks will also be explored in preliminary analysis but only included in the final stratified analyses if participant confidentiality is maintained.
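The contingency-table step can be pictured with a small sketch. This is only an illustration under stated assumptions, not the investigator's analysis code: the function names, the `(subgroup, has_chronic_pain)` row layout, and the placeholder subgroup labels are hypothetical, and the rows are toy values rather than All of Us data.

```python
from collections import Counter


def contingency_table(rows):
    """Count participants in each (subgroup, has_chronic_pain) cell."""
    return Counter((subgroup, has_pain) for subgroup, has_pain in rows)


def prevalence(table, subgroup):
    """Share of a subgroup with chronic pain, computed from cell counts."""
    with_pain = table[(subgroup, True)]
    total = with_pain + table[(subgroup, False)]
    return with_pain / total if total else float("nan")


# Hypothetical toy rows: each subgroup is a (race, Hispanic ancestry) pair.
toy_rows = [
    (("race_1", "ancestry_1"), True),
    (("race_1", "ancestry_1"), False),
    (("race_1", "ancestry_2"), False),
    (("race_2", "ancestry_1"), True),
]
table = contingency_table(toy_rows)
share = prevalence(table, ("race_1", "ancestry_1"))
```

Keying each cell on a (race, ancestry) pair makes it straightforward to hold one dimension constant while comparing across the other.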

Anticipated Findings

We anticipate that substantial differences in chronic pain prevalence and treatment will be seen across Race*Hispanic Ancestry subpopulations, and that these differences will be maintained even after adjustment for demographic and socioeconomic factors.

The present study will contribute to the Institute of Medicine and DHHS National Pain Strategy calls for better national data on the prevalence of chronic pain in racial and ethnic minority populations and provide comprehensive data on the treatment of pain in underserved populations.

Demographic Categories of Interest

  • Race / Ethnicity

Research Team

Owner:

  • Richard Nahin - Late Career Tenured Researcher, NIH

Chronic sinusitis

Project Purpose(s)

  • Disease Focused Research (Chronic sinusitis) ...

Scientific Questions Being Studied

Looking for evidence regarding the epidemiology of chronic sinusitis, including associated comorbid conditions and need for procedures.

Scientific Approaches

Will use association statistics (correlation, t-tests) to look at the prevalence of comorbid disease with chronic sinusitis.

Anticipated Findings

Anticipate that asthma, bronchitis, allergic rhinitis, and other airway diseases will be over-represented in the CRS population.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Naweed Chowdhury - Early Career Tenure-track Researcher, Vanderbilt University Medical Center

CKD

Project Purpose(s)

  • Population Health
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use) ...

Scientific Questions Being Studied

Chronic kidney disease (CKD) is a growing public health problem that affects over 27 million adults in the US and 11% of the population worldwide. It is well established that CKD disproportionately affects racial/ethnic minorities, as well as women. Furthermore, in addition to an increased risk for kidney failure, persons with CKD experience poor quality of life and physical function, and a very high risk of morbidity and mortality. In this demonstration project, we focus on the prevalence of CKD and its awareness, treatment, and control in a large and diverse participant sample of the All of Us Research Program. Specific questions include:
1) What is the prevalence of CKD among participants in the All of Us Research Program?
2) Among CKD participants, what is the prevalence of awareness, treatment and control?
3) How do these estimates compare to the general US population assessed in the National Health and Nutrition Examination Survey (NHANES), 2015-2016?

Scientific Approaches

This descriptive analysis is based on eGFR and blood pressure measurements from the participants’ physical measurement evaluations, and data derived from participant provided information (PPI) and electronic health records (EHR).
1) Demographic factors such as age, sex, race/ethnicity, educational attainment, income and health insurance were assessed in the PPI questionnaire.
2) PPI questionnaire data were also used to define self-reported doctor diagnosis of CKD and self-reported medication use.
3) EHR evidence of CKD diagnosis was defined as the presence of ICD9/ICD10 codes corresponding to CKD any time before baseline.
4) EHR evidence of CKD medication use was defined as at least 1 drug exposure to CKD medications any time before baseline.
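The two EHR definitions in items 3 and 4 can be sketched as simple predicates. This is a minimal illustration, not the project's code. It assumes that CKD diagnoses are coded under ICD-9 585.x and ICD-10 N18.x, that "before baseline" means strictly earlier than the baseline date, and that the caller supplies the list of CKD medications (the entry does not say which drugs were counted). All function names and the placeholder drug name are hypothetical.

```python
from datetime import date

# Assumption: CKD diagnosis codes are ICD-9 585.x and ICD-10 N18.x.
CKD_CODE_PREFIXES = ("585", "N18")


def has_ehr_ckd_diagnosis(diagnoses, baseline):
    """True if any CKD code is dated strictly before baseline.

    diagnoses: iterable of (icd_code, date) pairs.
    """
    return any(
        code.upper().startswith(CKD_CODE_PREFIXES) and when < baseline
        for code, when in diagnoses
    )


def has_ehr_ckd_medication(exposures, baseline, ckd_drugs):
    """True if at least one exposure to a listed drug precedes baseline.

    exposures: iterable of (drug_name, date) pairs.
    ckd_drugs: set of lowercase drug names supplied by the caller.
    """
    return any(
        drug.lower() in ckd_drugs and when < baseline
        for drug, when in exposures
    )


# Hypothetical toy inputs for illustration; not All of Us data.
baseline = date(2019, 6, 1)
toy_diagnoses = [("N18.3", date(2018, 5, 1)), ("I10", date(2019, 1, 1))]
toy_exposures = [("drug_a", date(2019, 1, 1))]
diagnosed = has_ehr_ckd_diagnosis(toy_diagnoses, baseline)
medicated = has_ehr_ckd_medication(toy_exposures, baseline, {"drug_a"})
```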

Anticipated Findings

For this study, we anticipate that the prevalence, awareness, treatment and control of CKD will be different across demographic strata. This will help to identify health disparities and improve health equity in vulnerable populations. We also anticipate that estimates will be different between the All of Us Research program and the general US population assessed in NHANES 2015-2016. Understanding these differences will help to characterize potential selection bias and demonstrate the quality and utility of the All of Us data and tools.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Madhawa Saranadasa - Graduate Trainee, University of Illinois at Chicago

Class Workspace

Project Purpose(s)

  • Population Health
  • Educational ...

Scientific Questions Being Studied

I will be exploring the workbench as part of an applied biomedical informatics graduate course. I will be using AoU for educational purposes.

Scientific Approaches

Some of the work over the course of the semester will include review of tools, processes, and data across the longitudinal cohort of AoU participants.

Anticipated Findings

Only exploratory findings for educational purposes.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Lindsey Knake - Graduate Trainee, Vanderbilt University Medical Center

Clinical & Translational Research

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

I am exploring the workbench as part of an applied biomedical informatics graduate course and I will be leveraging AoU for educational purposes. Some of my work over the course of the semester will include review of tools, processes, and data across the longitudinal cohort of AoU participants.

Scientific Approaches

I am exploring the workbench as part of an applied biomedical informatics graduate course and I will be leveraging AoU for educational purposes. Some of my work over the course of the semester will include review of tools, processes, and data across the longitudinal cohort of AoU participants.

Anticipated Findings

I am exploring the workbench as part of an applied biomedical informatics graduate course and I will be leveraging AoU for educational purposes. Some of my work over the course of the semester will include review of tools, processes, and data across the longitudinal cohort of AoU participants.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Candace Crawford - Research Fellow, Vanderbilt University Medical Center

Code_share

Project Purpose(s)

  • Disease Focused Research (hypertension) ...

Scientific Questions Being Studied

What is the prevalence of hypertension (HTN), defined using an electronic health record definition from eMERGE, among underrepresented in biomedical research (UBR) groups defined by race/ethnicity, income and education?

Do treatment patterns for HTN (using medication sequencing analysis) vary by UBR groups defined by race/ethnicity, income and education, and in geographic regions based on grouping states?

Scientific Approaches

Not available.

Anticipated Findings

There may be disparities in HTN across racial and income groups of policy interest.

Demographic Categories of Interest

  • Income Level

Research Team

Owner:

  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital
  • Lizette Mendez - Project Personnel, Boston Medical Center
  • Confidence Achilike - Project Personnel, Boston Medical Center
  • Nyia Noel - Mid-career Tenured Researcher, Boston Medical Center

Cognitive Impairment

Project Purpose(s)

  • Disease Focused Research (cognitive impairment) ...

Scientific Questions Being Studied

Our hypothesis is that a machine learning tool can predict which patients are at risk for developing cognitive impairment in the future.

Scientific Approaches

We will use demographic data (age, sex) as well as clinical data (labs, vitals) and diagnosis codes.

Anticipated Findings

We anticipate that we will be able to predict cognitive impairment within a 10-year horizon. This work may be used to develop an integrated tool that will improve modifiable risk factors for patients at high risk of cognitive impairment.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Thomas Byrd, IV - Early Career Tenure-track Researcher, Northwestern University

Connections between Depression and Autoimmune diseases

Project Purpose(s)

  • Disease Focused Research (Depressive and autoimmune disorders (lupus, celiac, rheumatoid arthritis, IBD)) ...

Scientific Questions Being Studied

The goal of this study is to examine the correlations between depressive disorder and various inflammatory autoimmune diseases. We will see whether the presence of depressive disorder can increase the likelihood of the same person developing an autoimmune disease. Additionally, we will look for common lifestyle or environmental factors that those in the cohorts of depressive disorder are exposed to, and whether this will also increase the likelihood of developing an autoimmune disorder.

Scientific Approaches

EHRs
Surveys of environmental exposures
Statistical analysis using python
Other lifestyle or environmental triggers that make an autoimmune disease more likely with those with a history of mental illness

Anticipated Findings

With this information, we will be able to monitor or test for autoimmune diseases in those who have depressive disorder. We will also be able to recommend lifestyle or environmental changes to those with depression, in hopes of reducing their risk of autoimmune disease.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Catherine Hubbard - Undergraduate Student, University of California, Irvine

Collaborators:

  • Steven Hiek - Project Personnel, University of California, Irvine
  • Argyrios Ziogas - Late Career Tenured Researcher, University of California, Irvine

Covid

Project Purpose(s)

  • Disease Focused Research (Covid) ...

Scientific Questions Being Studied

Understanding how social determinants of health and medical conditions are associated with Covid.

Scientific Approaches

Statistical and descriptive analysis of SDOH and medical comorbidities to understand Covid prevalence.

Anticipated Findings

It will help us further understand Covid spread and the factors that influence susceptibility to and/or transmission of Covid.

Demographic Categories of Interest

  • Race / Ethnicity
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level
  • Others

Research Team

Owner:

  • Yalini Senathirajah - Early Career Tenure-track Researcher, University of Pittsburgh

COVID-19 Test Variation

Project Purpose(s)

  • Disease Focused Research (COVID-19)
  • Educational ...
  • Methods Development

Scientific Questions Being Studied

I wish to examine the temporal patterns of SARS-CoV-2 test positivity and negativity to distinguish recrudescence, recurrence and reinfection.

Scientific Approaches

I will be using laboratory results and looking for markers of disease severity among medications, problem lists, procedures, etc.

Anticipated Findings

I expect to be able to develop phenotype definitions for resolution, recurrence, recrudescence and reinfection that can be used by others studying patients with COVID-19.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • James Cimino - Late Career Tenured Researcher, University of Alabama at Birmingham

CP descriptives_practice

Project Purpose(s)

  • Disease Focused Research (Cerebral palsy) ...

Scientific Questions Being Studied

This project aims to determine whether adults with cerebral palsy (CP) can be identified along with their Gross Motor Function Classification System level or ambulatory ability. It also explores whether adult CP research can be performed adequately using All of Us.

Scientific Approaches

Adults 18+ years with CP and exploration of count and responses to surveys, anthropometrics, biospecimen data availability, medications, and co-occurring morbidities.

Anticipated Findings

This is exploratory; no anticipated findings. There are many clinical knowledge gaps that exist regarding healthful aging for adults with CP. By knowing if an adult CP sample with adequate information can be obtained from All of Us, future research can be designed using the All of Us database.

Demographic Categories of Interest

  • Disability Status

Research Team

Owner:

  • Daniel Whitney - Early Career Tenure-track Researcher, University of Michigan

CRI_course

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

My purposes are primarily educational. As part of an applied biomedical informatics graduate course, I plan to become familiar with the workbench and processes, tools, and data of AoU participants.

Scientific Approaches

I plan to use a variety of tools, including NLP and statistical methods. The research methods will focus on statistical case-control analysis.

Anticipated Findings

The results of the study will be increased personal knowledge of applied biomedical informatics techniques.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Thomas Brown - Graduate Trainee, Vanderbilt University

CVD Risk Factor Profiles

Project Purpose(s)

  • Disease Focused Research (Cardiovascular Disease Risk Factors) ...

Scientific Questions Being Studied

The purpose of this research effort is to explore cardiovascular risk factor profiles among diverse racial/ethnic groups within the All of Us sample. Our hypothesis is that Latinos will have a worse cardiovascular risk factor profile compared to other racial/ethnic groups. These findings would be important from a public health perspective because they will identify which racial/ethnic groups are in greatest need of clinical and public health interventions to control the risk factors.

Scientific Approaches

We will analyze the frequency of traditional cardiovascular risk factors, defined as serum cholesterol >240 mg/dL or taking cholesterol lowering medication; systolic blood pressure >140 mmHg or diastolic blood pressure >90 mmHg or taking antihypertensive medication; current cigarette smoking; body mass index >30.0 kg/m2; diabetes mellitus; and ECG abnormalities. We will create risk scores for each of the diverse racial ethnic groups by counting the risk factors within each group and then make comparisons to test our hypothesis.
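The risk factor definitions above translate directly into a per-participant count. The sketch below is an illustration under assumptions, not the study team's implementation: the `Participant` fields and the function name `count_risk_factors` are hypothetical, and only the thresholds come from the description.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical field names; thresholds below follow the description.
    serum_cholesterol_mg_dl: float
    on_cholesterol_med: bool
    systolic_mm_hg: float
    diastolic_mm_hg: float
    on_bp_med: bool
    current_smoker: bool
    bmi_kg_m2: float
    has_diabetes: bool
    ecg_abnormal: bool


def count_risk_factors(p):
    """Count how many of the six traditional risk factors are present."""
    flags = [
        p.serum_cholesterol_mg_dl > 240 or p.on_cholesterol_med,
        p.systolic_mm_hg > 140 or p.diastolic_mm_hg > 90 or p.on_bp_med,
        p.current_smoker,
        p.bmi_kg_m2 > 30.0,
        p.has_diabetes,
        p.ecg_abnormal,
    ]
    return sum(flags)


# Hypothetical toy participant for illustration; not All of Us data.
example = Participant(
    serum_cholesterol_mg_dl=250,
    on_cholesterol_med=False,
    systolic_mm_hg=120,
    diastolic_mm_hg=80,
    on_bp_med=False,
    current_smoker=False,
    bmi_kg_m2=31.0,
    has_diabetes=False,
    ecg_abnormal=False,
)
score = count_risk_factors(example)
```

Per-participant counts like this could then be summarized within each group, for example as a mean or a distribution, before groups are compared.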

Anticipated Findings

We anticipate that Hispanics will have a worse cardiovascular risk profile in comparison to other racial/ethnic groups, and we will identify the risk factors that are of greatest importance to each racial/ethnic group. This will inform both clinical and public health practitioners about which populations and which risk factors are of greatest importance in their community.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Geography
  • Education Level
  • Income Level

Research Team

Owner:

  • Gregory Talavera - Late Career Tenured Researcher, San Diego State University

Collaborators:

  • Margaret Pichardo - Graduate Trainee, Yale University
  • Tania Pena Ortiz - Undergraduate Student, New York University
  • Maria Lopez-Gurrola - Project Personnel, San Diego State University
  • Jonathan Helm - Early Career Tenure-track Researcher, San Diego State University
  • Catherine Pichardo - Graduate Trainee, University of Illinois at Chicago