Research Projects Directory

Information about each research project within the Workbench is available in the Research Projects Directory below. Approved researchers provide their project’s research purpose, description, populations of interest and more. This information helps All of Us ensure transparency on the type of research being conducted.

At this time, all listed projects are using data in the Registered Tier. The Registered Tier contains individual-level data from electronic health records, survey answers, and physical measurements. These data have been altered to protect participant privacy.

Note: Researcher Workbench users provide information about their research projects independently. Any views expressed in the Research Projects Directory belong to the relevant users and do not necessarily represent those of the All of Us Research Program.

Information in the Research Projects Directory is also cross-posted on AllofUs.nih.gov in compliance with the 21st Century Cures Act.

There are currently 291 active workspaces. This information was updated on 12/5/2020.

D014 - Opioids

Project Purpose(s)

  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.) ...

Scientific Questions Being Studied

As a demonstration project, this study will present the results of prevalence of opioid use in the United States. Specific questions include:

1. What is the prevalence of prescription opioids received from healthcare systems?
2. What is the prevalence of opioid misuse, including nonmedical prescription opioid use and street opioid use?
3. How do the answers to both questions above vary when stratified by geographic region?

Scientific Approaches

We will estimate the prevalence of opioid use in two ways, each stratified by state.
First, we will use EHR Drug Exposures to capture prescription opioid use.
Second, we will use the lifestyle survey questionnaire to capture substance use reported by patients themselves:
1. In your LIFETIME, which of the following substances have you ever used?
2. In the PAST THREE MONTHS, how often have you used this substance?
Because prevalence will be stratified by state, the EHR Observation Table will be used to capture state information.
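
The state-level stratification described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (a state code paired with a yes/no opioid-use flag) and the toy values are hypothetical and do not reflect actual Workbench tables or results.

```python
from collections import defaultdict


def prevalence_by_state(records):
    """Return {state: share of participants flagged as opioid users}.

    `records` is an iterable of (state, used_opioid) pairs. Both fields
    are hypothetical stand-ins, not actual Workbench column names.
    """
    totals = defaultdict(int)
    cases = defaultdict(int)
    for state, used_opioid in records:
        totals[state] += 1
        if used_opioid:
            cases[state] += 1
    # Every state in `totals` has at least one record, so no division by zero.
    return {state: cases[state] / totals[state] for state in totals}


# Made-up toy data for illustration only.
toy_records = [
    ("MI", True), ("MI", False), ("MI", True),
    ("OH", False), ("OH", False), ("OH", True),
]
state_prevalence = prevalence_by_state(toy_records)
```

The same function could be applied once to an EHR-derived flag and once to a survey-derived flag to compare the two sources.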

Anticipated Findings

For this study, we anticipate that we will be able to replicate previous national estimates of the prevalence of opioid use. All of Us Workbench research data also provide an alternative tool for assessing the prevalence of substance use and prescription opioid use in the US population.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Hsueh-Han Yeh - Research Associate, Henry Ford Health System

D015-housing

Project Purpose(s)

  • Population Health
  • Other Purpose (The data can provide evidence of AOU ability to replicate findings around social determinants and the ability to identify vulnerable populations in our cohort.) ...

Scientific Questions Being Studied

What is the prevalence of housing insecurity among current participants in the All of Us study? What individual-level factors are related to housing insecurity, including demographics, indicators of health care access, and perceived health status?

Scientific Approaches

We will determine the prevalence of housing insecurity in the All of Us study sample using data collected in the Basics module (“worried or concerned about not having a place to live”). We will use housing insecurity as the dependent variable in a multivariate analysis to determine the relationship of healthcare access and health services utilization. Finally, we will report the independent relationship between housing insecurity and healthcare access, adjusting for the covariates and conducting stratified analyses as appropriate.

Anticipated Findings

Recently, investigators examined the relationship of housing insecurity using the 2011-2015 BRFSS and found a 12.6% prevalence among the >228,000 in the study sample. All of Us can replicate these findings among its core participants using questionnaire items similar to those used by investigators.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Amy Tang - Early Career Tenure-track Researcher, Henry Ford Health System

D027-MS

Project Purpose(s)

  • Disease Focused Research (multiple sclerosis)
  • Other Purpose (Provide evidence of AOU ability to replicate findings on the prevalence and demographics of MS ) ...

Scientific Questions Being Studied

Objective: Determine the prevalence, demographics and regional distribution of multiple sclerosis (MS) in the All of Us Research Program.

Scientific Approaches

Study population: All of Us Research Program participants who have given access to their electronic health record information and who have answered both the Basics survey and the Personal Medical History survey.

Data analysis: We will determine the prevalence of multiple sclerosis in the All of Us Research Program electronic medical record data and Personal Medical History survey using three different cohorts: participants with MS recorded in the EHR only, in the survey only, and in both the EHR and the survey. These data will then be stratified by age, sex, race/ethnicity and region as self-reported in the Basics PPI survey.
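
The three-cohort split described above amounts to simple set operations. The sketch below assumes two hypothetical collections of participant IDs, one for participants with MS recorded in the EHR and one for participants reporting MS in the survey; the IDs are made up.

```python
def split_cohorts(ehr_ids, survey_ids):
    """Partition participants into EHR-only, survey-only, and both.

    `ehr_ids` and `survey_ids` are hypothetical collections of participant
    IDs with MS recorded in the EHR and in the survey, respectively.
    """
    ehr_ids, survey_ids = set(ehr_ids), set(survey_ids)
    return {
        "ehr_only": ehr_ids - survey_ids,
        "survey_only": survey_ids - ehr_ids,
        "both": ehr_ids & survey_ids,
    }


# Made-up IDs for illustration only.
cohorts = split_cohorts([1, 2, 3, 4], [3, 4, 5])
```

Each of the three resulting sets can then be stratified separately by age, sex, race/ethnicity and region.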

Anticipated Findings

We anticipate that the prevalence and demographics of MS in the AoURP will be similar to those reported in recent studies. We further anticipate that findings regarding MS in AoURP participants' EHR will be similar to those in the survey data.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Cathryn Peltz - Other, Henry Ford Health System

Collaborators:

  • Amy Tang - Early Career Tenure-track Researcher, Henry Ford Health System

D029

Project Purpose(s)

  • Disease Focused Research (cardiovascular disease, cancer (all types), diabetes)
  • Population Health ...

Scientific Questions Being Studied

The overall goal of this project is to examine whether there is evidence of the Latino Epidemiological Paradox within the All of Us Research Program (AoURP) cohort. In this proposal, we will perform analyses that examine this phenomenon. We will address the following aims:
• Specific Aim 1. To determine whether Latinos have a lower prevalence of gender-stratified, age-adjusted CVD versus non-Hispanic whites (NHWs) and non-Hispanic blacks in the cohort.
• Specific Aim 2. To determine whether Latinos have a lower prevalence of gender-stratified, age-adjusted cancer (overall) versus NHWs and non-Hispanic blacks in the cohort.
• Specific Aim 3. To determine whether Latinos have a higher prevalence of gender-stratified, age-adjusted diabetes and obesity (overall) versus NHWs and non-Hispanic blacks in the cohort.
• Specific Aim 4. To the extent possible, to examine differences by Latino subgroups and among foreign-born versus US-born Latinos.

Scientific Approaches

Not available.

Anticipated Findings

To determine whether there is evidence of the Latino epidemiological paradox in the AoURP cohort.

Demographic Categories of Interest

Not available.

Research Team

Owner:

  • Olveen Carrasquillo - Late Career Tenured Researcher, University of Miami

D16_HTN_revision_after_code_review

Project Purpose(s)

  • Other Purpose (This work is an AoU demo project. Demo projects are efforts by the AoU Research Program designed to meet the program goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. As an approved demo project, this work was reviewed and overseen by the AoU Research Program Science Committee and the AoU Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use. ) ...

Scientific Questions Being Studied

We are using the All of Us Researcher Workbench interface to answer the question, "Is hypertension prevalence in the All of Us Research Program similar to hypertension prevalence in the 2015–2016 National Health and Nutrition Examination Survey (NHANES)?" Clinical approaches to understanding and treating hypertension may benefit from a precision medicine approach that integrates data on environments, social determinants of health, behaviors, and genomic factors that contribute to hypertension risk. Hypertension is a major public health concern and remains a leading risk factor for stroke and cardiovascular disease.

Scientific Approaches

In this cross-sectional, population-based study, we used All of Us baseline data from participant (age >18) provided information (PPI) surveys and electronic health record (EHR) blood pressure measurements, and retrospectively examined the prevalence of hypertension in the EHR cohort using Systematized Nomenclature of Medicine (SNOMED) codes and blood pressure medications recorded in the EHR. We used the EHR data (SNOMED codes on 2 distinct dates and at least one hypertension medication) as the primary definition, and then added subjects with elevated systolic or elevated diastolic blood pressure on measurements 2 and 3 from PPI. We extracted each participant’s detailed dates of SNOMED codes for essential hypertension from the Researcher Workbench table ‘cb_search_all_events’. We calculated an age-standardized HTN prevalence according to the age distribution of the U.S. Census, using 3 groups (18-39, 40-59, ≥ 60).
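
Age standardization over the three age groups can be illustrated with direct standardization, in which each group's crude prevalence is weighted by that group's share of a standard population. The sketch below is an assumption about the general technique, not the project's actual code; the counts and weights are made up and are not U.S. Census or All of Us figures.

```python
def age_standardized_prevalence(stratum_counts, standard_weights):
    """Directly standardized prevalence.

    stratum_counts: {age_group: (cases, participants)} in the study cohort.
    standard_weights: {age_group: share of the standard population}.
    Weights are normalized here, so they need not sum exactly to 1.
    """
    total_weight = sum(standard_weights.values())
    adjusted = 0.0
    for group, weight in standard_weights.items():
        cases, participants = stratum_counts[group]
        adjusted += (weight / total_weight) * (cases / participants)
    return adjusted


# Hypothetical counts and weights, for illustration only.
counts = {"18-39": (10, 100), "40-59": (30, 100), "60+": (60, 100)}
weights = {"18-39": 0.40, "40-59": 0.35, "60+": 0.25}
adjusted_prevalence = age_standardized_prevalence(counts, weights)
```

With these toy inputs the crude prevalence is 100/300, while the standardized value is pulled toward the younger groups because they carry more weight in the assumed standard population.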

Anticipated Findings

The prevalence of hypertension in the All of Us cohort is similar to that of published literature. All of Us age-adjusted HTN prevalence was 27.9% compared to 29.6% in National Health and Nutrition Examination Survey. The All of Us cohort is a growing source of diverse longitudinal data that can be utilized to study hypertension nationwide. The prevalence of hypertension varies in the United States (U.S.) by age, sex, and socioeconomic status. Hypertension can often be treated successfully with medication, and prevented or delayed with lifestyle modifications. Even with these established hypertension intervention and prevention strategies, the prevalence of hypertension continues to be at levels of public health concern. The diversity within All of Us may provide insight into factors relevant to hypertension prevention and treatments in a variety of social and geographic contexts and population strata in the U.S.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital
  • Roxana Loperena Cortes - Other, All of Us Program Operational Use
  • Elizabeth Karlson - Late Career Tenured Researcher, Massachusetts General Hospital

Data element analysis of AllofUS

Project Purpose(s)

  • Methods Development ...

Scientific Questions Being Studied

Exploring the All of Us datasets to discover the usage of different data elements and how they are used to define and categorize different cohorts of patients. The analysis will also aim to identify different phenotyping methods based on the available data elements in the dataset.

Scientific Approaches

Using the complete dataset, we will study the volume and usage of data elements potentially used for phenotyping. This includes analyzing the volume of different data elements and the diversity of values used for distinct data elements.

Anticipated Findings

Our findings will include an understanding of how each data element is used, how commonly and how often values are populated and what are common values for different elements, all in an effort to discover different techniques for phenotype development.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Craig Mayer - Project Personnel, NIH

Data Management

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

Students may be asked to verbally give a brief summary of what they learned from the reading during the lecture portion of the class. This summary, along with discussion during class and engagement over Teams will contribute to Instructors’ subjective assessment of students’ participation.
A final project will be assigned around the fourth week of the class. This project will tie together multiple concepts introduced in the course. In the last class session, each student will turn in a writeup of their final project and present their work to the class.

Scientific Approaches

Some of the work over the course of the semester will include review of tools, processes, and data across the longitudinal cohort of AoU participants.

Anticipated Findings

A final project will be assigned around the fourth week of the class. This project will tie together multiple concepts introduced in the course. In the last class session, each student will turn in a writeup of their final project and present their work to the class.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Marily Barron - Graduate Trainee, Vanderbilt University

Data Quality and Data Characterization

Project Purpose(s)

  • Educational
  • Methods Development ...

Scientific Questions Being Studied

This research project will use AoU data to test data quality methods. It will also use AoU to provide reference benchmark data for testing data quality. The analysis will include data characterization. Data will be analyzed to check whether they conform to expected patterns.

Scientific Approaches

The Achilles R package developed by OHDSI is an example of a data quality and data characterization tool. The approach will include running SQL or other analytical queries on the AoU dataset.

Anticipated Findings

We will understand how the data are structured as a whole and how data differ between sites.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Vojtech Huser - Other, NIH

Collaborators:

  • Craig Mayer - Project Personnel, NIH

DataExploration

Project Purpose(s)

  • Social / Behavioral
  • Educational ...
  • Methods Development

Scientific Questions Being Studied

Explore the data set collected so far and determine the types of research and education activities we can perform in future work.

Scientific Approaches

Descriptive statistics will be calculated to understand the data. In certain cases, we will also use data visualization. We will use Python and R packages for the data analysis.

Anticipated Findings

A clear understanding about the current data set.

Demographic Categories of Interest

  • Age
  • Geography
  • Disability Status
  • Access to Care

Research Team

Owner:

  • Leming Zhou - Project Personnel, University of Pittsburgh

Dementia-Hypertension-Diabetes-2

Project Purpose(s)

  • Disease Focused Research (dementia)
  • Methods Development ...
  • Other Purpose (This work is the result of an All of Us Research Program Demonstration Project. Demonstration Projects are efforts by the All of Us Research Program designed to meet the goal of ensuring the quality and utility of the Research Hub as a resource for accelerating precision medicine. This work has been approved, reviewed, and overseen by the All of Us Research Program Science Committee and Data and Research Center to ensure compliance with program policy.)

Scientific Questions Being Studied

Alzheimer’s disease is a neurodegenerative condition characterized by a progressive decline in cognitive function (dementia). Studies suggest that patients with elevated blood pressure (hypertension) are at risk of Alzheimer’s disease type dementias. High blood sugar levels or Type 2 Diabetes Mellitus may also be associated with an increased risk of dementia. Some minority populations may have an increased incidence of hypertension and diabetes. For example, African Americans have a higher incidence of hypertension. Therefore, we will investigate the grouping of racial and ethnic categories with respect to the incidence of hypertension, diabetes and dementia, to determine whether minority groups have a stronger association between dementia and co-morbidities by race/ethnicity.
The goal of this demonstration project is to validate previous research showing potential interactions between dementia, diabetes, and hypertension, with an explicit consideration of race/ ethnicity.

Scientific Approaches

Data from participants (aged 40 or over) will be subjected to statistical analysis to identify interactions between the incidence of dementia, Diabetes, and Hypertension, and self-identified Race/ Ethnicity. We will only analyze participants in this age group, because the incidence of dementia is very low in patients younger than 40. We will only analyze patients with electronic health care data, because we have to ensure that patients have not had a diagnosis of hypertension, dementia or diabetes.

The statistical analysis package R will be used to create contingency tables, perform chi-squared and Cochran-Mantel-Haenszel tests. Figures will be created in R.
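
To make the contingency-table step concrete, the following sketch computes the Pearson chi-squared statistic for a single 2x2 table. The project names R as its analysis tool; Python is used here only to show the arithmetic, and the counts are made up. No continuity correction is applied, and the Cochran-Mantel-Haenszel test (which pools several such tables across strata) is not shown.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]].

    Rows could be, for example, hypertension yes/no and columns dementia
    yes/no; that labeling is an illustrative assumption.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    denominator = row1 * row2 * col1 * col2
    if denominator == 0:
        raise ValueError("every row and column must have a nonzero total")
    # Shortcut formula, algebraically equal to sum((O - E)^2 / E) for 2x2 tables.
    return n * (a * d - b * c) ** 2 / denominator


# Made-up counts for illustration only.
statistic = chi_squared_2x2(20, 80, 10, 90)
```

The statistic has one degree of freedom for a 2x2 table; a table whose rows have identical proportions yields a statistic of zero.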

Anticipated Findings

We expect that our data will confirm an increased rate of dementia in African Americans with hypertension and diabetes, compared to white participants. We will determine whether other minorities also see a difference in the incidence of dementia, hypertension, and diabetes, and in the interactions between them.

If there is an increased incidence of dementia in people with hypertension or diabetes, this may suggest that populations with these disorders need more careful monitoring of their conditions, as they may increase the chance of developing dementia. Future All of Us projects may be able to determine whether long-term control of hypertension (or diabetes/blood glucose) may reduce the potential for developing dementia.

Demographic Categories of Interest

  • Race / Ethnicity

Research Team

Owner:

  • Robert Meller - Mid-career Tenured Researcher, Morehouse School of Medicine

Collaborators:

  • Shashwat Deepali Nagar - Graduate Trainee, Georgia Tech
  • Roxana Loperena Cortes - Other, All of Us Program Operational Use
  • Francis Ratsimbazafy - Other, All of Us Program Operational Use
  • King Jordan - Mid-career Tenured Researcher, Georgia Tech

Demographics of Mammography 2020_04

Project Purpose(s)

  • Population Health
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use) ...

Scientific Questions Being Studied

Mammography is an effective screening tool for breast cancer, often identifying tumors that can be treated before they develop invasive potential. Across the United States, it is estimated that 65% of women aged 40 and above have received a screening mammogram. However, smaller studies using data from electronic health records suggest that (1) the actual screening rate may be lower and (2) mammography screening differs by racial, ethnic, and sociodemographic characteristics, and lower rates of mammography screening may contribute to disparities in breast cancer mortality.

In this demonstration project, we will describe the distribution of mammography screening captured by the submitted electronic health records in the large and diverse participant sample of the All of Us Research Program. Further, we will describe the participant characteristics that are associated with mammography rates in women during the ages in which national guidelines suggest routine screening.

Scientific Approaches

After limiting ourselves to All of Us research participants with electronic health record information, we will identify rates of mammography screening using the procedure and diagnosis tables. Using the participant provided information from the surveys, we will use logistic regression to identify participant characteristics that are associated with higher or lower rates of screening.

Anticipated Findings

Some prior research has attempted to validate self-reported mammography screening against electronic health record verification of the screening. Largely, this research has found that (1) mammography rates are likely lower than self-report suggests and (2) certain patient characteristics are associated with lower rates of screening.

We anticipate that these findings will largely hold in the All of Us study population, and that the diversity of the All of Us participants will allow us to better identify those who may need more assistance to achieve the recommended screening frequency.

Demographic Categories of Interest

  • Race / Ethnicity
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Molly Scannell Bryan - Early Career Tenure-track Researcher, University of Illinois at Chicago

Depression

Project Purpose(s)

  • Disease Focused Research (major depressive disorder)
  • Social / Behavioral ...
  • Ancestry

Scientific Questions Being Studied

My purpose is to investigate underlying genetic architecture of Major Depressive Disorder in AllofUs participants.

Scientific Approaches

My primary approach is to use a GWAS in available ancestral groups, probably using PLINK or GEMMA depending on the structure of the data. I will use summary statistics from this approach to investigate overlap between other large cohorts and traits.

I would also like to apply polygenic risk scores to assess genetic risk prediction in an independent cohort.

Anticipated Findings

This could identify novel risk loci for depression, in combination with other available datasets for depression. Downstream in silico analysis will look to better understand the complex underlying biology of depression. I am particularly interested in pushing forward the current state of the field in African and Hispanic ancestries, which are currently underrepresented.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Research Team

Owner:

  • Daniel Levey - Other, Yale University

Depression

Project Purpose(s)

  • Disease Focused Research (major depressive disorder)
  • Social / Behavioral ...
  • Ancestry

Scientific Questions Being Studied

My purpose is to investigate underlying genetic architecture of Major Depressive Disorder in AllofUs participants.

Scientific Approaches

My primary approach is to use a GWAS in available ancestral groups, probably using PLINK or GEMMA depending on the structure of the data. I will use summary statistics from this approach to investigate overlap between other large cohorts and traits.

I would also like to apply polygenic risk scores to assess genetic risk prediction in an independent cohort.

Anticipated Findings

This could identify novel risk loci for depression, in combination with other available datasets for depression. Downstream in silico analysis will look to better understand the complex underlying biology of depression. I am particularly interested in pushing forward the current state of the field in African and Hispanic ancestries, which are currently underrepresented.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Research Team

Owner:

  • Daniel Levey - Other, Yale University

Depression

Project Purpose(s)

  • Disease Focused Research (major depressive disorder)
  • Social / Behavioral ...
  • Ancestry

Scientific Questions Being Studied

My purpose is to investigate underlying genetic architecture of Major Depressive Disorder in AllofUs participants.

Scientific Approaches

My primary approach is to use a GWAS in available ancestral groups, probably using PLINK or GEMMA depending on the structure of the data. I will use summary statistics from this approach to investigate overlap between other large cohorts and traits.

I would also like to apply polygenic risk scores to assess genetic risk prediction in an independent cohort.

Anticipated Findings

This could identify novel risk loci for depression, in combination with other available datasets for depression. Downstream in silico analysis will look to better understand the complex underlying biology of depression. I am particularly interested in pushing forward the current state of the field in African and Hispanic ancestries, which are currently underrepresented.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Research Team

Owner:

  • Daniel Levey - Other, Yale University

Depression

Project Purpose(s)

  • Disease Focused Research (major depressive disorder)
  • Social / Behavioral ...
  • Ancestry

Scientific Questions Being Studied

My purpose is to investigate underlying genetic architecture of Major Depressive Disorder in AllofUs participants.

Scientific Approaches

My primary approach is to use a GWAS in available ancestral groups, probably using PLINK or GEMMA depending on the structure of the data. I will use summary statistics from this approach to investigate overlap between other large cohorts and traits.

I would also like to apply polygenic risk scores to assess genetic risk prediction in an independent cohort.

Anticipated Findings

This could identify novel risk loci for depression, in combination with other available datasets for depression. Downstream in silico analysis will look to better understand the complex underlying biology of depression. I am particularly interested in pushing forward the current state of the field in African and Hispanic ancestries, which are currently underrepresented.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Research Team

Owner:

  • Daniel Levey - Other, Yale University

Depression

Project Purpose(s)

  • Disease Focused Research (major depressive disorder)
  • Social / Behavioral ...
  • Ancestry

Scientific Questions Being Studied

My purpose is to investigate underlying genetic architecture of Major Depressive Disorder in AllofUs participants.

Scientific Approaches

My primary approach is to use a GWAS in available ancestral groups, probably using PLINK or GEMMA depending on the structure of the data. I will use summary statistics from this approach to investigate overlap between other large cohorts and traits.

I would also like to apply polygenic risk scores to assess genetic risk prediction in an independent cohort.

Anticipated Findings

This could identify novel risk loci for depression, in combination with other available datasets for depression. Downstream in silico analysis will look to better understand the complex underlying biology of depression. I am particularly interested in pushing forward the current state of the field in African and Hispanic ancestries, which are currently underrepresented.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Research Team

Owner:

  • Daniel Levey - Other, Yale University

Dermatology

Project Purpose(s)

  • Disease Focused Research (psoriasis, atopic dermatitis, acne, and more) ...

Scientific Questions Being Studied

Dermatologic diseases in minority populations

Scientific Approaches

Initially exploratory. Plan on conducting cohort comparisons amongst psoriasis, acne, atopic dermatitis, and other rare dermatologic diseases as well as looking at cross-sectional data for these diseases.

Anticipated Findings

Expand the known information of dermatologic disease in minority populations.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Ahmed Yousaf - Research Fellow, West Virginia University

Determinants of cardiovascular disease across minority populations

Project Purpose(s)

  • Disease Focused Research (cardiovascular system disease)
  • Population Health ...
  • Social / Behavioral
  • Ancestry

Scientific Questions Being Studied

Cardiovascular diseases (CVD) are responsible for a substantial proportion of the morbidity and mortality observed in the general population. Mounting evidence indicates that this impact disproportionately affects minority populations. This disproportionate effect is not only present in minorities defined by race/ethnicity, but also in those defined by age, sexual orientation, and other characteristics. The main questions of this study are: (1) Can we use All of Us to identify novel risk factors for cardiovascular disease that are specific to a given minority group? (2) Are existing risk factors for CVD shared across all minority groups? (3) How do the effects of these risk factors vary when considering more than one minority group? These questions are important to (1) identify groups of persons at particularly high risk of sustaining these conditions that may benefit from tailored diagnostic and therapeutic interventions; and (2) identify new treatments for these conditions.

Scientific Approaches

We will use the All of Us dataset V3. We will identify variables that represent (1) cardiovascular disease (myocardial infarction, coronary artery disease, stroke); (2) all the known risk factors for each of these conditions; (3) physiological variables that either define a risk factor or are associated with risk of cardiovascular disease (blood pressure, cholesterol levels, hemoglobin A1C); and (4) the minority groups of interest. We will use linear and logistic regression to test for association between risk factors and the conditions of interest.

Anticipated Findings

We expect to find that: (1) a substantial number of the known vascular risk factors increase risk of cardiovascular disease across all evaluated groups; (2) known risk factors for cardiovascular disease disproportionately affect some minority groups; and (3) the effect of these risk factors will be stronger in some minority groups. These findings will help us to (1) identify groups of persons at particularly high risk of sustaining these conditions that may benefit from tailored diagnostic and therapeutic interventions; and (2) identify new treatments for these conditions.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Guido Falcone - Early Career Tenure-track Researcher, Yale University

Collaborators:

  • Julian Acosta - Research Fellow, Yale University
  • Audrey Leasure - Graduate Trainee, Yale University

Diabetes and CVD Risk Factor Control and Treatment Patterns

Project Purpose(s)

  • Disease Focused Research (Cardiovascular disease and diabetes) ...

Scientific Questions Being Studied

Understanding the spectrum of risks found in persons with diabetes and ASCVD in a contemporary cohort of US adults can help target the intensity of therapeutic approaches to prevent future adverse outcomes.

The project will examine the following questions:

1. Within those with diabetes, how many are at higher risk (two or more risk factors), and within those with ASCVD, how many are at very high risk according to the recent 2018 cholesterol guidelines?
2. Among these higher versus lower risk patients with ASCVD and DM, what is the adherence to non-smoking, healthy diet, and regular physical activity?
3. Among these groups, what is the proportion at target for BP (<130/80 mmHg), LDL-C (< 70 mg/dl if CVD, <100 mg/dl otherwise), HbA1c (<8% if with CVD or <7% if not)?
4. How does the extent of recommended medication use compare among these groups, including statins, BP medication, DM medication, antiplatelet/aspirin therapy, and influenza vaccination?
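
The targets named in question 3 can be written down as a small check. The following is only a minimal sketch: the `Participant` record and its field names are hypothetical and do not reflect the actual Workbench schema or the authors' code; only the thresholds come from the question above.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical record layout; field names are illustrative only.
    has_cvd: bool
    systolic: float   # mmHg
    diastolic: float  # mmHg
    ldl_c: float      # mg/dl
    hba1c: float      # percent


def targets_met(p: Participant) -> dict:
    """Return which of the three targets from question 3 a participant meets."""
    ldl_limit = 70 if p.has_cvd else 100      # LDL-C < 70 mg/dl if CVD, < 100 otherwise
    hba1c_limit = 8.0 if p.has_cvd else 7.0   # HbA1c < 8% if CVD, < 7% otherwise
    return {
        "bp": p.systolic < 130 and p.diastolic < 80,
        "ldl_c": p.ldl_c < ldl_limit,
        "hba1c": p.hba1c < hba1c_limit,
    }


def proportion_at_target(cohort, key):
    """Share of a cohort meeting one target ('bp', 'ldl_c', or 'hba1c')."""
    flags = [targets_met(p)[key] for p in cohort]
    return sum(flags) / len(flags) if flags else float("nan")


# Made-up participant with CVD: meets the BP and HbA1c targets, misses LDL-C.
example = Participant(has_cvd=True, systolic=128, diastolic=78, ldl_c=85, hba1c=7.5)
result = targets_met(example)
```

Keeping the CVD-dependent limits in one function makes it easy to compute the proportion at target separately for each risk group.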

Scientific Approaches

The dataset we would like to use comprises participant provided information (PPI), electronic health records (EHR), and physical measurements, which provide information about each participant’s overall health status, lifestyle, medication history, serum biochemical measures, and demographic characteristics.

We will categorize the sample by the presence or absence of diabetes mellitus (and, within diabetes, those with vs. without multiple risk factors) or cardiovascular disease (and, among such persons, those at high vs. very high risk), and use the Chi-square test to compare the extent of adherence to lifestyle measures (#2 above), risk factor control (#3 above), and recommended medication use (#4 above). Multiple logistic regression will be used to examine whether the extent of single and multiple risk factor control differs across these conditions and, within these conditions, between those at higher versus lower risk, and to test differences in behaviors and risk factors among the groups.
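
As an illustration of the Chi-square comparison described above, here is a minimal sketch for a single 2x2 table (for example, risk group by adherence to one lifestyle measure). It uses only the Python standard library and no continuity correction; the counts are made up, and the project itself may well use a statistics package instead.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts: rows = higher vs. lower risk, columns = non-smoker vs. smoker.
chi2, p_value = chi_square_2x2(60, 40, 80, 20)
```

The closed-form p-value only holds for one degree of freedom, so tables with more than two categories would need a general chi-square distribution function.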

Anticipated Findings

We anticipate different patterns of health behaviors and risk factor control across the spectrum of diabetes and cardiovascular diseases, which will help identify possible prevention strategies for different kinds of DM and CVD and advance precision medicine among these patients.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Yufan Gong - Graduate Trainee, University of California, Los Angeles

Collaborators:

  • Nathan Wong - Other, University of California, Irvine

Diabetes Comparison Workspace

Project Purpose(s)

  • Other Purpose (This workspace's main purpose will be to provide a place to learn first hand how to create and analyze data from All of Us. The "research aim" of this project will be to compare diabetes patients and control patients, however this is only meant as a directive for the ultimate purpose of better understanding workspace creation and analysis in AoU. ) ...

Scientific Questions Being Studied

What are the differences in A1c levels between diabetic and control populations, and how do these comparisons vary when controlling for other covariates (age, gender, race, demographic information)?

Scientific Approaches

We plan to use simple comparative statistical analyses such as t-tests and Bayesian analyses to explore group differences. Linear regression and more advanced modeling techniques (regularized regression, tree based methods) may be used to further define differences between the groups. Most of the analysis will be conducted in R.

Anticipated Findings

Anticipated findings are that A1c levels are higher among diabetes and prediabetes patients than controls.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Kyle Webb - Project Personnel, NIH

Collaborators:

  • Josh Denny - Other, All of Us Program Operational Use

DiabetesAndEyeDiseases

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

People who have diabetes are more likely to develop several eye diseases or conditions, such as diabetic retinopathy, cataracts, and open-angle glaucoma. For people who have diabetes, it is important to get regular comprehensive dilated eye exams to identify these conditions and receive early treatment. This study will survey cohorts of AoU participants who do and do not have diabetes, compare the incidence rates of these eye conditions, and examine their EHR data for regular eye exams as a means of prevention.

Scientific Approaches

1) Build datasets: identify participants who have type 2 diabetes and randomly select an equal number of participants without type 2 diabetes
2) Search the EHR records of the participants to identify eye diseases and conditions
3) Run statistical analysis to find the difference in incidence rates between the two populations

Anticipated Findings

Type 2 diabetic participants will have a higher incidence rate of eye diseases or conditions compared to non-diabetic participants. If the EHR data show that type 2 diabetic participants do not have more frequent eye exams than non-diabetic participants, the findings of the study will provide evidence for recommending regular eye exams for diabetic patients.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • James Gao - Administrator, NIH

Diabetic Foot ulcer

Project Purpose(s)

  • Disease Focused Research (diabetes mellitus) ...

Scientific Questions Being Studied

Evaluate the differences in healthcare access among adults with diabetes and a foot ulcer, based on race/ethnicity, geography, and other socioeconomic factors

Scientific Approaches

We will utilize the existing survey data from the All of Us Research Program (AOURP) to determine the disparities in healthcare access and utilization among participants with diabetes and those with foot ulcerations.

Anticipated Findings

Racial/ethnic minorities, those living in rural areas, and those of low socioeconomic status will experience disparities in access to care and health care utilization compared to whites and the general population.

Demographic Categories of Interest

  • Race / Ethnicity
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Tze-Woei Tan - Mid-career Tenured Researcher, University of Arizona

Collaborators:

  • Chiu-Hsieh Hsu - Late Career Tenured Researcher, University of Arizona

Disease_convergence_and_lifestyle

Project Purpose(s)

  • Population Health
  • Methods Development ...
  • Ancestry

Scientific Questions Being Studied

Multiple genetic polymorphisms have been identified for complex diseases, but relationships, such as the biological underpinning of genetic interactions, are still elusive. Epigenomic studies have shown that genetic variants may have convergent effects, which increase the risk of developing complex diseases and comorbidities. We aim to prioritize the genetic variants with convergent effects and diseases of excess epigenomic similarity from the abundant biological resources, such as ENCODE and GTEx. We will then study the agreement between the convergent effects and interactions of genetic variants in AllofUsRP and the agreement between disease epigenomic similarity and disease comorbidities in AllofUsRP. Lifestyle and environment exposures are critical risk factors, and their effects will be modeled as well. The research will help us understand disease mechanisms and missing heritability and foster applications like drug repositioning.

Scientific Approaches

We have developed an information-theoretic similarity measure for quantifying the similarity of genetic variants and disease pairs from GTEx data. We have also developed a multi-omics integration method to quantify the overall similarity of genetic variants in ENCODE. We will extend the latter method to quantify the epigenomic similarity for disease pairs. We aim to use AllofUsRP for validating the genetic interactions between genetic variants and comorbidities. Further, we will use logistic regression, LASSO, and deep learning methods to model diseases from lifestyles and genetic interactions.

Anticipated Findings

We expect to find many unexpected biological links between the effects of distinct genetic variants, which may explain the increased risk of diseases and comorbidities. With machine learning models, we will build disease prediction models, particularly those impacted heavily by lifestyles, such as cancers. The research will generate candidates for novel drug targets and drug repositioning approaches.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Haiquan Li - Early Career Tenure-track Researcher, University of Arizona

Collaborators:

  • Edwin Baldwin - Graduate Trainee, University of Arizona

Diseases with similar symptoms as COVID-19

Project Purpose(s)

  • Disease Focused Research (COVID-19) ...

Scientific Questions Being Studied

We want to find diseases with symptoms similar to those of COVID-19. Such diseases can be used as features in COVID-19 prediction models.

Scientific Approaches

Use mutual information to find diseases/conditions that have high co-occurrence with COVID-19's common symptoms.
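
A minimal sketch of the mutual information score between a symptom flag and a condition flag follows. The per-participant flags are made up, and how the project actually encodes symptoms and conditions is not stated here, so the input layout is an assumption.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equally long sequences of discrete labels."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    # Sum p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed pairs only.
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


# Toy per-participant flags (made up): 1 = present, 0 = absent.
has_symptom = [1, 1, 1, 0, 0, 0, 1, 0]
has_condition = [1, 1, 0, 0, 0, 0, 1, 0]
score = mutual_information(has_symptom, has_condition)
```

Mutual information is also high when two flags are strongly negatively associated, so a ranking based on it would typically be paired with a check of the direction of co-occurrence.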

Anticipated Findings

We expect to find ~100 diseases/conditions that have high co-occurrence with COVID-19's common symptoms. Considering these diseases/conditions when building COVID-19 prediction models can help to reduce false-positive predictions.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Jifan Gao - Graduate Trainee, University of Wisconsin, Madison

Disparities in Kidney, Bladder, and Prostate Cancer

Project Purpose(s)

  • Disease Focused Research (cancer)
  • Population Health ...
  • Social / Behavioral
  • Control Set
  • Ancestry

Scientific Questions Being Studied

What are underlying factors that drive disparities in kidney, bladder, and prostate cancer? How do these factors result in increased risk? Identifying risk factors is important as it may allow for more targeted screening, early detection, and personalized therapies.

Scientific Approaches

We plan to look at datasets of kidney, bladder, and prostate cancers. We would like to include analysis of associations for genetic, socioeconomic or geographic variables.

Anticipated Findings

We anticipate that we may identify novel genetic, socioeconomic, or geographic factors as contributors to risk in genitourinary cancers. These findings may improve risk assessment in determining personalized screening, allow for earlier identification of disease, or facilitate targeted therapies.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Jacob Knorr - Graduate Trainee, Cleveland Clinic

Distress and T2D

Project Purpose(s)

  • Disease Focused Research (type 2 diabetes mellitus) ...

Scientific Questions Being Studied

Depression, anxiety, and other forms of mental distress are frequently co-morbid with type 2 diabetes. Are there common risk factors between the two?

Scientific Approaches

Comparison of demographics and medications of individuals diagnosed with type 2 diabetes with and without mental distress.

Anticipated Findings

If we know that someone is at risk for mental distress, we might be able to provide increased support to mitigate the effects.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Sara Taylor - Research Fellow, Massachusetts Institute of Technology

Diversity within Eating Disorders

Project Purpose(s)

  • Population Health ...

Scientific Questions Being Studied

We will explore sociodemographic variables (e.g., gender, sexual orientation, race/ethnicity) in relation to eating disorder diagnoses. We specifically are interested in disparities in the occurrence of eating disorder diagnoses, access to treatment, age of initial diagnosis, and associated distress and impairment. That is, are some sociodemographic groups more likely to receive a diagnosis of eating disorders, receive care, have varying age of initial diagnosis, and/or experience disproportionate distress/impairment, than other sociodemographic groups? These research questions are important, as there is limited research exploring intersecting identities within eating disorders.

Scientific Approaches

Within the All of Us dataset V3, we will extract sociodemographic variables (gender, sexual orientation, race/ethnicity), eating disorder diagnosis and treatment, and items from the 'Overall Health' survey to assess distress/impairment and quality of life. Initially, descriptive statistics will be used to report the frequencies of eating disorder diagnoses and treatment as a function of the aforementioned sociodemographic variables. Should there be adequate statistical power, logistic regression models will be employed with sociodemographic variables set as 'predictors' of binary eating disorder 'outcomes.' Additionally, within individuals diagnosed with an eating disorder, we will examine sociodemographic differences in distress/impairment and quality of life via linear regression. Effect size estimates will also be reported. Should statistical power allow, interaction terms between sociodemographic variables will also be tested.

Anticipated Findings

There is limited research on intersecting identities among individuals diagnosed with eating disorders. By employing the All of Us dataset, we may be able to identify health disparities in the occurrence of eating disorders and/or associated distress/impairment and quality of life. Results may help guide additional research efforts into understanding the mechanisms which may place some populations at disproportionate risk, which subsequently could lead to refined and tailored eating disorder prevention and treatment approaches.

Demographic Categories of Interest

  • Race / Ethnicity
  • Gender Identity
  • Sexual Orientation
  • Access to Care

Research Team

Owner:

  • Aaron Blashill - Mid-career Tenured Researcher, San Diego State University

Collaborators:

  • Melissa Simone - Research Fellow, University of Minnesota
  • Alexandra Convertino - Graduate Trainee, San Diego State University
  • Jamie-Lee Pennesi - Research Fellow, San Diego State University
  • Jonathan Helm - Early Career Tenure-track Researcher, San Diego State University
  • Autumn Askew - Project Personnel, University of Minnesota

DJS: Duplicate of JAMA PheWAS Final Review 05-21-2020

Project Purpose(s)

  • Methods Development
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.) ...

Scientific Questions Being Studied

As a demonstration project, this study will present the results of Phenome-Wide Association Studies (PheWAS) to show how the various sources of data contained within All of Us research dataset can be used to inform scientific discovery. We will perform separate PheWAS studies with smoking status as the independent variable. Specific questions include:

1. How can one implement a PheWAS within the All of Us Researcher Workbench?
2. How can one use heterogeneous data sources within the All of Us dataset to explore disease associations using self-reported exposures (Participant Provided Information, or “PPI”) and exposures captured in the electronic medical record (EHR)?

There is no pre-specified hypothesis. It is important to determine if PheWAS can be conducted within the All of Us workbench.

Scientific Approaches

As a demonstration project, this study will present the results of Phenome-Wide Association Studies (PheWAS) to show how the various sources of data contained within All of Us research dataset can be used to inform scientific discovery. We will perform separate PheWAS studies with smoking status as the independent variable. Specific questions include:

1. How can one implement a PheWAS within the All of Us Researcher Workbench?
2. How can one use heterogeneous data sources within the All of Us dataset to explore disease associations using self-reported exposures (Participant Provided Information, or “PPI”) and exposures captured in the electronic medical record (EHR)?

There is no pre-specified hypothesis. It is important to determine if PheWAS can be conducted within the All of Us workbench.

Anticipated Findings

For this study, we anticipate that we will be able to replicate known disease associations with smoking exposure. This will serve to demonstrate the quality, utility, and diversity of the All of Us data and tools and the power of gathering multiple data sources for a single phenotype, providing researchers options for study design and validation. Importantly the entire PheWAS package is made available for reuse by researchers in the Workbench, for new hypothesis generation.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • David Schlueter - Research Fellow, NIH

DRC_Duplicate of For_HTN_code_review

Project Purpose(s)

  • Other Purpose (This work is the result of an All of Us Research Program Demonstration Project. Demonstration Projects are efforts by the All of Us Research Program designed to meet the goal of ensuring the quality and utility of the Research Hub as a resource for accelerating precision medicine. This work has been approved, reviewed, and overseen by the All of Us Research Program Science Committee and Data and Research Center to ensure compliance with program policy.) ...

Scientific Questions Being Studied

We are using the All of Us Researcher Workbench interface to answer the question, "Is hypertension prevalence in the All of Us Research Program similar to hypertension prevalence in the 2015–2016 National Health and Nutrition Examination Survey (NHANES)?" Clinical approaches to understanding and treating hypertension may benefit from the integration of a precision medicine approach that integrates data on environments, social determinants of health, behaviors, and genomic factors that contribute to hypertension risk. Hypertension is a major public health concern and remains a leading risk factor for stroke and cardiovascular disease.

Scientific Approaches

In this cross-sectional, population-based study, we used All of Us baseline data from patient (age >18) provided information (PPI) surveys and electronic health record (EHR) blood pressure measurements and retrospectively examined the prevalence of hypertension in the EHR cohort using Systematized Nomenclature of Medicine (SNOMED) codes and blood pressure medications recorded in the EHR. We used the EHR data (SNOMED codes on 2 distinct dates and at least one hypertension medication) as the primary definition, and then added subjects with elevated systolic or elevated diastolic blood pressure on measurements 2 and 3 from PPI. We extracted the detailed dates of each participant’s SNOMED codes for essential hypertension from the Researcher Workbench table ‘cb_search_all_events’. We calculated an age-standardized HTN prevalence according to the age distribution of the U.S. Census, using 3 groups (18-39, 40-59, ≥ 60).
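
The age standardization step can be sketched as direct standardization over the three age groups. This is a minimal illustration under stated assumptions: the case counts, group totals, and standard population weights below are placeholders, not All of Us results or actual U.S. Census shares.

```python
def age_standardized_prevalence(cases, totals, standard_weights):
    """Direct standardization: weight each age group's crude prevalence
    by that group's share of a standard population."""
    if not (cases.keys() == totals.keys() == standard_weights.keys()):
        raise ValueError("all inputs must use the same age groups")
    weight_sum = sum(standard_weights.values())
    return sum(
        (cases[g] / totals[g]) * (standard_weights[g] / weight_sum)
        for g in totals
    )


# Placeholder inputs for the three age groups named above (all values made up).
cases = {"18-39": 100, "40-59": 300, ">=60": 600}
totals = {"18-39": 1000, "40-59": 1000, ">=60": 1000}
standard_weights = {"18-39": 0.40, "40-59": 0.35, ">=60": 0.25}

adjusted = age_standardized_prevalence(cases, totals, standard_weights)
```

Because the weights are normalized inside the function, raw population counts per age group can be passed in place of shares.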

Anticipated Findings

The prevalence of hypertension in the All of Us cohort is similar to that of published literature. All of Us age-adjusted HTN prevalence was 27.9% compared to 29.6% in National Health and Nutrition Examination Survey. The All of Us cohort is a growing source of diverse longitudinal data that can be utilized to study hypertension nationwide. The prevalence of hypertension varies in the United States (U.S.) by age, sex, and socioeconomic status. Hypertension can often be treated successfully with medication, and prevented or delayed with lifestyle modifications. Even with these established hypertension intervention and prevention strategies, the prevalence of hypertension continues to be at levels of public health concern. The diversity within All of Us may provide insight into factors relevant to hypertension prevention and treatments in a variety of social and geographic contexts and population strata in the U.S.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Roxana Loperena Cortes - Other, All of Us Program Operational Use
  • Francis Ratsimbazafy - Other, All of Us Program Operational Use
  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital

DRC_Duplicate of for_obesity_code_review

Project Purpose(s)

  • Disease Focused Research (obesity)
  • Educational ...
  • Methods Development
  • Other Purpose (This work is the result of an All of Us Research Program Demonstration Project. Demonstration Projects are efforts by the All of Us Research Program designed to meet the goal of ensuring the quality and utility of the Research Hub as a resource for accelerating precision medicine. This work has been approved, reviewed, and overseen by the All of Us Research Program Science Committee and Data and Research Center to ensure compliance with program policy.)

Scientific Questions Being Studied

National obesity prevention and intervention strategies may benefit from precision medicine approaches that incorporate integrated data on environments, social determinants of health, and genomic factors. We examined the quality and utility of the All of Us Research Hub Workbench for accelerating precision medicine by replicating methods from existing studies that examine the prevalence of obesity at the population level. We evaluated the measurements of obesity in the participant measurement (PM) data set and the electronic health record (EHR) data set using methods similar to the Ward et al. NEJM December 2019 publication that assessed prevalence of obesity in the US by state using BRFSS data.

Scientific Approaches

For this population-based cross-sectional study of All of Us Research Workbench participants, we excluded individuals with measurements obtained during pregnancy or inpatient visits and individuals from states with fewer than 100 participants. Physical measurements (PM) of height and weight at the time of program enrollment of 142,116 participants and measured weight and height extracted from electronic health records (EHR) of 40,885 individuals were used to calculate body-mass index (BMI). We did a complete case analysis for All of Us participants with known sex (male or female), race, income and education levels and estimated state-specific and demographic subgroup-specific prevalence of categories of BMI [obesity (BMI ≥30) and extreme obesity (BMI ≥ 35)] nationwide and for each state: overall and by subgroups, male and female. We examined the difference between EHR and PM calculated BMI by state.
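
The BMI calculation and the two prevalence thresholds described above can be sketched as follows. Units (kilograms and centimeters) and the sample measurements are assumptions for illustration; the exclusion rules and the complete case analysis are not reproduced here.

```python
def bmi(weight_kg, height_cm):
    """Body-mass index: weight in kilograms divided by height in meters squared."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def prevalence(bmi_values, threshold):
    """Share of participants whose BMI is at or above the threshold."""
    if not bmi_values:
        raise ValueError("need at least one BMI value")
    return sum(v >= threshold for v in bmi_values) / len(bmi_values)


# Made-up (weight_kg, height_cm) pairs.
measurements = [(70, 175), (95, 170), (120, 165)]
bmi_values = [bmi(w, h) for w, h in measurements]

obesity_prevalence = prevalence(bmi_values, 30)          # BMI >= 30 (includes BMI >= 35)
extreme_obesity_prevalence = prevalence(bmi_values, 35)  # BMI >= 35
```

Using cumulative thresholds rather than exclusive categories mirrors the text, where the obesity figure includes participants with extreme obesity.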

Anticipated Findings

Using states with at least 100 participants, PM data included 142,116 individuals (mean [SD] age, 51.2 [16.6]) and EHR data on height and weight included 40,885 individuals (mean [SD] age, 52.5 [16.5]). The median BMI for PM participants was 28.4 [24.4 to 33.7]; the median BMI for EHR was 29.0 [24.8 to 34.5]. The PM national prevalence for obesity (includes BMI >30 and BMI >35) and extreme obesity (BMI >35) were 41.2% (95% Confidence Interval [CI], 40.9 to 41.4) and 20.8% (95% CI, 20.6 to 21.0), respectively, with large variations across states. Women had higher prevalence of extreme obesity than men in all selected states. Subgroups with extreme obesity (BMI >35) prevalence greater than 25% included (subgroup, N, prevalence %, [95% CI]): Black NH, 8,913, 28.9 (25.8 to 32.0); individuals with income less than $25,000, 13,244, 25.1 (22.1 to 28.1); education of high school to some college, 17,272, 26.1 (23.1 to 29.1); and the region of the South, 6,639, 25.3 (22.3 to 28.3).

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Education Level
  • Income Level

Research Team

Owner:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Roxana Loperena Cortes - Other, All of Us Program Operational Use
  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital
  • Jun Qian - Other, All of Us Program Operational Use

Duplicate (latest ver, for testing) of Systemic Disease and Glaucoma

Project Purpose(s)

  • Disease Focused Research (Primary open angle glaucoma)
  • Other Purpose (This work is the result of an All of Us Research Program Demonstration Project. Demonstration Projects are efforts by the All of Us Research Program designed to meet the goal of ensuring the quality and utility of the Research Hub as a resource for accelerating precision medicine. This work has been approved, reviewed, and overseen by the All of Us Research Program Science Committee and Data and Research Center to ensure compliance with program policy. ) ...

Scientific Questions Being Studied

We have previously published a predictive model of glaucoma progression using electronic health record (EHR) data pertaining to systemic attributes from a single institution. We aim to use the All of Us dataset to 1) serve as external validation for this single-center model and 2) to train new models focused on predicting glaucoma progression using systemic predictors. This is important to understand whether the original findings are generalizable and provide additional knowledge about the utility of systemic predictors on a national-level dataset.

Scientific Approaches

We plan to primarily work with EHR data contained in All of Us for a cohort of adult participants diagnosed with primary open-angle glaucoma. We will extract data on systemic conditions and medications for this cohort, as well as physical measurements and vital signs. We will clean the data such that the format is consistent with the data from our previously published model. Then, we will use this data as an external validation of a logistic regression model derived from our prior study that was based at a single academic center. Next, we will use All of Us data to train a new set of models, using techniques such as logistic regression, random forests, and artificial neural networks. We will optimize these models using feature selection methods and class balancing procedures. By evaluating performance metrics such as area under the curve (AUC), precision, recall, and accuracy, we will assess whether we can achieve superior predictive performance when training models using All of Us.
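
The performance metrics named above can be computed directly from labels and model scores. The sketch below is only illustrative: the labels, scores, and the 0.5 decision threshold are made up, and AUC is computed with the pairwise-ranking formulation rather than any particular library.

```python
def classification_metrics(y_true, y_pred):
    """Precision, recall, and accuracy from binary labels and binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return {
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
        "recall": tp / (tp + fn) if tp + fn else float("nan"),
        "accuracy": (tp + tn) / len(pairs),
    }


def auc(y_true, scores):
    """Area under the ROC curve: the probability that a random positive
    scores higher than a random negative (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = progression) and model scores.
y_true = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.8, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

metrics = classification_metrics(y_true, y_pred)
area = auc(y_true, scores)
```

The pairwise AUC is quadratic in cohort size, so a sort-based implementation would be preferable for large validation sets.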

Anticipated Findings

We anticipate that the All of Us data will validate the findings from the model, which demonstrated that blood pressure-related metrics and certain medication classes had predictive value for glaucoma progression. In addition, we anticipate that the models trained with All of Us data will outperform the model trained with single institution data due to larger sample size and greater diversity. These findings will support further investigation in understanding the relationship between systemic conditions like blood pressure with glaucoma progression.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Bharanidharan Radha Saseendrakumar - Project Personnel, University of California, San Diego

Duplicate of ARI Workspace -7-29-20 #1

Project Purpose(s)

  • Disease Focused Research (Autoimmune diseases) ...

Scientific Questions Being Studied

The goal of our research is to determine prevalence of autoimmune diseases, individually and as a class of disease, in the US. This work will help understand the likelihood of having autoimmune disease and we hope it will improve the ability of doctors to diagnose patients as it will establish the prior probability of having one of these many diseases.

Scientific Approaches

We will create three data sets for analysis:

1. A list of diseases rated in the following ways:

a. Evidence Class
i. Strong evidence it is autoimmune
ii. Moderate evidence it is autoimmune
iii. Weak evidence for autoimmunity
iv. A comorbidity of autoimmune disease
v. Symptom or symptom set with no known mechanism

b. Autoinflammatory versus autoimmune flag

c. “Not always autoimmune” flag – to indicate diseases that could have alternative mechanisms of cause

2. A list of patients, anonymized, with socioeconomic, geographic and other data that would be of interest to patients and public health officials to understand which communities are affected by these diseases
3. Outcomes data for patients over time assessing quality of life using PROMIS metrics

Anticipated Findings

The current NIH estimate of 23.5 million people with autoimmune disease was a guess by a knowledgeable clinician, but has no scientific support. As a consequence, there are numerous figures in the public sphere and nobody knows which one is correct.

Many reports say autoimmune diseases are on the increase, but since the number is unknown, it is impossible to say whether this is a public health issue or not. Having a methodology that can be used to recompute the number of people with autoimmune disease will help us understand if these reports are true.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Aaron Abend - Senior Researcher, Autoimmune Registry

Duplicate of Childhood Obesity

Project Purpose(s)

  • Disease Focused Research (obesity)
  • Educational ...
  • Drug Development

Scientific Questions Being Studied

Childhood obesity is a major public health problem across the globe as well as in the US. Childhood obesity can continue into adulthood and is known to be a major risk factor for chronic diseases such as diabetes, cancer, and cardiovascular diseases. Preventing childhood obesity has been actively pursued in pediatric programs. However, decades of rigorous research have shown that prevention and management of obesity is not easy. This is partly due to our limited understanding of obesity and the complex interactions among a myriad of factors, including biological and environmental ones, that are known to contribute to obesity. The motivation of this work is to predict childhood obesity early on and study the causes and consequences of obesity.

Scientific Approaches

We are planning to use deep learning models to study the causes and consequences of obesity and predict childhood obesity.

Anticipated Findings

We are looking to study the causes and consequences of childhood obesity. We will look into factors that will help predict childhood obesity early on. This will help prevent obesity and other chronic diseases that are the consequence of childhood obesity.

Demographic Categories of Interest

  • Age

Research Team

Owner:

  • Raphael Poulain - Graduate Trainee, University of Delaware

Duplicate of Demo - Medication Sequencing

Project Purpose(s)

  • Disease Focused Research (type 2 diabetes, depression)
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.) ...

Scientific Questions Being Studied

1- What are the main prescribed medication sequences that participants with type 2 diabetes and depression took over three years of treatment?
For this question, we extract the antidiabetic and antidepressant medications used to treat participants who have T2D and depression codes. We retrieved medications prescribed after the first diagnosis code for each disease. We represented the medications using their ATC 4th level.
2- What are the most common first antidiabetic and antidepressant medications prescribed to All of Us participants? We extracted the first medications prescribed to treat T2D and depression. We identified the most common first medication as the one prescribed to the highest number of participants.
3- Between 2000 and 2018, is there a change in the percentages of participants who were prescribed the first common medication, were treated using one medication, or were treated using only the one common medication?

Scientific Approaches

In this project, we plan on using the medication sequencing developed at Columbia University and the OHDSI network as a means to characterize treatment pathways at scale. Further, we want to demonstrate implementation of these medication sequencing algorithms in the All of Us research dataset to show how the various sources of data contained within the program can be used to characterize treatment pathways at scale. We will perform separate medication sequence analyses for common, complex diseases: type 2 diabetes and depression.
1- Data manipulation
Using Python and BigQuery to:
A- Retrieve medications and their classes
B- Create the medication sequences

2- Visualization:
A- Creating sunburst charts to visualize the sequences
B- Plotting the percentages of participants prescribed the first common medication and treated with one medication over three years
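
As an illustration only, the sequence-building step described above might look like the following Python sketch. It is not the workspace's actual code: the function name `build_sequences`, the in-memory toy records, and the rule of keeping each drug class once in order of first use are all assumptions made for this example. In the Workbench, the records would come from BigQuery rather than from Python literals.

```python
from collections import Counter
from datetime import date


def build_sequences(first_diagnosis, prescriptions, window_days=3 * 365):
    """Return {person_id: tuple of drug classes in order of first use}.

    Hypothetical helper: keeps only prescriptions dated on or after the
    first diagnosis and within `window_days` of it.
    """
    sequences = {}
    for person_id, dx_date in first_diagnosis.items():
        # Sort this person's prescriptions by date.
        rows = sorted((d, c) for (p, d, c) in prescriptions if p == person_id)
        ordered_classes = []
        for rx_date, drug_class in rows:
            days_since_dx = (rx_date - dx_date).days
            in_window = 0 <= days_since_dx <= window_days
            if in_window and drug_class not in ordered_classes:
                ordered_classes.append(drug_class)
        sequences[person_id] = tuple(ordered_classes)
    return sequences


# Toy data (invented): person_id -> date of first diagnosis code.
first_diagnosis = {1: date(2010, 1, 1), 2: date(2012, 6, 1), 3: date(2015, 3, 1)}

# Toy data (invented): (person_id, prescription date, ATC 4th-level class).
prescriptions = [
    (1, date(2009, 5, 1), "A10AB"),  # before diagnosis, ignored
    (1, date(2010, 2, 1), "A10BA"),
    (1, date(2011, 1, 1), "A10BB"),
    (2, date(2012, 7, 1), "A10BA"),
    (2, date(2012, 9, 1), "A10BA"),  # repeat of the same class
    (3, date(2015, 4, 1), "A10BB"),
    (3, date(2019, 1, 1), "A10BA"),  # outside the three-year window
]

sequences = build_sequences(first_diagnosis, prescriptions)
pathway_counts = Counter(sequences.values())  # input for a sunburst chart
first_line = Counter(seq[0] for seq in sequences.values() if seq)
print(pathway_counts)
print(first_line.most_common(1))
```

The `pathway_counts` tally is the kind of summary a sunburst chart would be drawn from, and `first_line` corresponds to the "most common first medication" question.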

Anticipated Findings

For this study, we anticipate demonstrating the validity of the data by showing expected treatment patterns despite gathering data from over 30 individual EHR sites. Specifically, we expect to find:
1- Variation in the medication sequences prescribed to treat All of Us participants who had type 2 diabetes and depression.
2- The most common first-line medication used to treat participants with a type 2 diabetes or depression diagnosis.
3- A trend or change over time in prescribing the first common medication over the study period.
4- A trend over time in the percentage of participants.
Importantly, the detailed code developed herein is made available within the Researcher Workbench to researchers, so that they may more easily extract medication data and class information using a common medication ontology, an approach useful in many discovery studies.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Vojtech Huser - Other, NIH

Duplicate of Demo - Medication Sequencing

Project Purpose(s)

  • Disease Focused Research (type 2 diabetes, depression)
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.) ...

Scientific Questions Being Studied

1- What are the main prescribed medication sequences that participants with type 2 diabetes and depression took over three years of treatment?
For this question, we extract the antidiabetic and antidepressant medications used to treat participants who have T2D and depression codes. We retrieved medications prescribed after the first diagnosis code for each disease. We represented the medications using their ATC 4th level.
2- What are the most common first antidiabetic and antidepressant medications prescribed to All of Us participants? We extracted the first medications prescribed to treat T2D and depression. We identified the most common first medication as the one prescribed to the highest number of participants.
3- Between 2000 and 2018, is there a change in the percentages of participants who were prescribed the first common medication, were treated using one medication, or were treated using only the one common medication?

Scientific Approaches

In this project, we plan on using the medication sequencing developed at Columbia University and the OHDSI network as a means to characterize treatment pathways at scale. Further, we want to demonstrate implementation of these medication sequencing algorithms in the All of Us research dataset to show how the various sources of data contained within the program can be used to characterize treatment pathways at scale. We will perform separate medication sequence analyses for common, complex diseases: type 2 diabetes and depression.
1- Data manipulation
Using Python and BigQuery to:
A- Retrieve medications and their classes
B- Create the medication sequences

2- Visualization:
A- Creating sunburst charts to visualize the sequences
B- Plotting the percentages of participants prescribed the first common medication and treated with one medication over three years

Anticipated Findings

For this study, we anticipate demonstrating the validity of the data by showing expected treatment patterns despite gathering data from over 30 individual EHR sites. Specifically, we expect to find:
1- Variation in the medication sequences prescribed to treat All of Us participants who had type 2 diabetes and depression.
2- The most common first-line medication used to treat participants with a type 2 diabetes or depression diagnosis.
3- A trend or change over time in prescribing the first common medication over the study period.
4- A trend over time in the percentage of participants.
Importantly, the detailed code developed herein is made available within the Researcher Workbench to researchers, so that they may more easily extract medication data and class information using a common medication ontology, an approach useful in many discovery studies.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Vojtech Huser - Other, NIH

Duplicate of Demo - Medication Sequencing

Project Purpose(s)

  • Disease Focused Research (type 2 diabetes, depression)
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.) ...

Scientific Questions Being Studied

1- What are the main prescribed medication sequences that participants with type 2 diabetes and depression took over three years of treatment?
For this question, we extract the antidiabetic and antidepressant medications used to treat participants who have T2D and depression codes. We retrieved medications prescribed after the first diagnosis code for each disease. We represented the medications using their ATC 4th level.
2- What are the most common first antidiabetic and antidepressant medications prescribed to All of Us participants? We extracted the first medications prescribed to treat T2D and depression. We identified the most common first medication as the one prescribed to the highest number of participants.
3- Between 2000 and 2018, is there a change in the percentages of participants who were prescribed the first common medication, were treated using one medication, or were treated using only the one common medication?

Scientific Approaches

In this project, we plan on using the medication sequencing developed at Columbia University and the OHDSI network as a means to characterize treatment pathways at scale. Further, we want to demonstrate implementation of these medication sequencing algorithms in the All of Us research dataset to show how the various sources of data contained within the program can be used to characterize treatment pathways at scale. We will perform separate medication sequence analyses for common, complex diseases: type 2 diabetes and depression.
1- Data manipulation
Using Python and BigQuery to:
A- Retrieve medications and their classes
B- Create the medication sequences

2- Visualization:
A- Creating sunburst charts to visualize the sequences
B- Plotting the percentages of participants prescribed the first common medication and treated with one medication over three years

Anticipated Findings

For this study, we anticipate demonstrating the validity of the data by showing expected treatment patterns despite gathering data from over 30 individual EHR sites. Specifically, we expect to find:
1- Variation in the medication sequences prescribed to treat All of Us participants who had type 2 diabetes and depression.
2- The most common first-line medication used to treat participants with a type 2 diabetes or depression diagnosis.
3- A trend or change over time in prescribing the first common medication over the study period.
4- A trend over time in the percentage of participants.
Importantly, the detailed code developed herein is made available within the Researcher Workbench to researchers, so that they may more easily extract medication data and class information using a common medication ontology, an approach useful in many discovery studies.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Stanley Jia - Undergraduate Student, University of California, Irvine

Duplicate of Demo - Medication Sequencing

Project Purpose(s)

  • Disease Focused Research (type 2 diabetes, depression)
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.) ...

Scientific Questions Being Studied

1- What are the main prescribed medication sequences that participants with type 2 diabetes and depression took over three years of treatment?
For this question, we extract the antidiabetic and antidepressant medications used to treat participants who have T2D and depression codes. We retrieved medications prescribed after the first diagnosis code for each disease. We represented the medications using their ATC 4th level.
2- What are the most common first antidiabetic and antidepressant medications prescribed to All of Us participants? We extracted the first medications prescribed to treat T2D and depression. We identified the most common first medication as the one prescribed to the highest number of participants.
3- Between 2000 and 2018, is there a change in the percentages of participants who were prescribed the first common medication, were treated using one medication, or were treated using only the one common medication?

Scientific Approaches

In this project, we plan on using the medication sequencing developed at Columbia University and the OHDSI network as a means to characterize treatment pathways at scale. Further, we want to demonstrate implementation of these medication sequencing algorithms in the All of Us research dataset to show how the various sources of data contained within the program can be used to characterize treatment pathways at scale. We will perform separate medication sequence analyses for common, complex diseases: type 2 diabetes and depression.
1- Data manipulation
Using Python and BigQuery to:
A- Retrieve medications and their classes
B- Create the medication sequences

2- Visualization:
A- Creating sunburst charts to visualize the sequences
B- Plotting the percentages of participants prescribed the first common medication and treated with one medication over three years

Anticipated Findings

For this study, we anticipate demonstrating the validity of the data by showing expected treatment patterns despite gathering data from over 30 individual EHR sites. Specifically, we expect to find:
1- Variation in the medication sequences prescribed to treat All of Us participants who had type 2 diabetes and depression.
2- The most common first-line medication used to treat participants with a type 2 diabetes or depression diagnosis.
3- A trend or change over time in prescribing the first common medication over the study period.
4- A trend over time in the percentage of participants.
Importantly, the detailed code developed herein is made available within the Researcher Workbench to researchers, so that they may more easily extract medication data and class information using a common medication ontology, an approach useful in many discovery studies.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Vojtech Huser - Other, NIH

Duplicate of Demo - PheWAS Smoking

Project Purpose(s)

  • Methods Development
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.) ...

Scientific Questions Being Studied

As a demonstration project, this study will present the results of Phenome-Wide Association Studies (PheWAS) to show how the various sources of data contained within All of Us research dataset can be used to inform scientific discovery. We will perform separate PheWAS studies with smoking status as the independent variable. Specific questions include:

1. How can one implement a PheWAS within the All of Us Researcher Workbench?
2. How can one use heterogeneous data sources within the All of Us dataset to explore disease associations using self-reported exposures (Participant Provided Information, or “PPI”) and exposures captured in the electronic health record (EHR)?

Scientific Approaches

As a method for assessing the health burden of smoking on potential observed phenotypes, we implement a Phenome-Wide Association Study (PheWAS). A PheWAS consists of an array of association tests over an indexed representation of the human phenome. In this analysis, we will conduct PheWAS for the EHR-derived and PPI-derived smoking exposures included in the All of Us research dataset. We will be representing “Smoking Exposure” in three ways:
1. EHR smoking ICD billing codes
2. Participant Provided Information (PPI) smoking: lifetime 100 cigarettes (yes/no)
3. Participant Provided Information (PPI) smoking: lifetime smoking every day
To perform PheWAS, we will map ICD representations of disease to a common vocabulary of PheCodes. We then use Jupyter Notebooks to create reusable functions to perform PheWAS and generate Manhattan Plots to summarize associations.
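
To make the "array of association tests" idea concrete, here is a minimal Python sketch under stated assumptions. It is not the reusable PheWAS package mentioned in this entry: it swaps in a simple unadjusted chi-square test on a 2x2 table for each phenotype, applies a Bonferroni threshold, and uses invented counts and placeholder labels (`phecode_A`, `phecode_B`). A real analysis would typically adjust for covariates such as age and sex. The sketch also assumes every cell of each 2x2 table is nonzero.

```python
import math


def association(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Odds ratio and chi-square p-value (1 degree of freedom) for a 2x2 table."""
    a, b, c, d = exposed_cases, exposed_controls, unexposed_cases, unexposed_controls
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, p_value


def phewas(counts_by_phecode, alpha=0.05):
    """Run one association test per phenotype; flag Bonferroni-significant hits."""
    threshold = alpha / len(counts_by_phecode)
    results = []
    for phecode, counts in counts_by_phecode.items():
        odds_ratio, p_value = association(*counts)
        results.append({
            "phecode": phecode,
            "odds_ratio": odds_ratio,
            "p_value": p_value,
            # Height of the point on a Manhattan plot.
            "neg_log10_p": -math.log10(p_value) if p_value > 0 else math.inf,
            "significant": p_value < threshold,
        })
    return sorted(results, key=lambda r: r["p_value"])


# Invented counts: (smokers with phenotype, smokers without,
#                   non-smokers with phenotype, non-smokers without).
toy_counts = {
    "phecode_A": (60, 140, 30, 270),
    "phecode_B": (20, 180, 30, 270),
}

results = phewas(toy_counts)
for row in results:
    print(row["phecode"], round(row["odds_ratio"], 2), row["significant"])
```

Plotting `neg_log10_p` against the phenotype index, grouped by disease category, gives the usual Manhattan-style summary.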

Anticipated Findings

For this study, we anticipate that we will be able to replicate known disease associations with smoking exposure. This will serve to demonstrate the quality, utility, and diversity of the All of Us data and tools and the power of gathering multiple data sources for a single phenotype, providing researchers options for study design and validation. Importantly, the entire PheWAS package is made available for reuse by researchers in the Workbench, for new hypothesis generation.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Jie Chen - Late Career Tenured Researcher, Augusta University

Duplicate of Genetics_of_comorbidity

Project Purpose(s)

  • Methods Development
  • Ancestry ...
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Questions Being Studied

As a demonstration project, this study shows the reproducibility and utility of comorbidities as candidate phenotypes for mining datasets derived from clinical practice. Disease annotations extracted from billing data have been shown by the Electronic Medical Records and Genomics Network to be unreliable phenotypes for GWAS. We propose that co-occurrences of specific diseases can be designed and validated with accuracy and reliability in high throughput, as they represent known and sometimes novel clinical syndromes. Specific questions:

Can we reproduce comorbidities discovered from external datasets (e.g., Healthcare Cost and Utilization Project - HCUP) through statistical regression in the All of Us dataset, while controlling for potential confounders such as gender, age, race, and ethnicity?

Can we reproduce candidate medical genetics syndromes derived from mining of GWAS jointly with other biomolecular datasets and previously confirmed as comorbid in clinical datasets?

Scientific Approaches

As a method to assess comorbidities, we will implement previously validated analyses to reliably assign the presence of specific diseases to a subject and to calculate their excess co-occurrence in the dataset with statistical significance and effect sizes. We will:

1. Utilize the OMOP code corresponding to the SNOMED-coded mapping that we curated from 262 GWAS.
2. Query the All of Us dataset to identify subjects with any of these diseases.
3. Use our published logistic regression (controlled for age, gender, and race) to calculate the effect size and significance of each pair of diseases (comorbidities).
4. Compare and contrast the significant comorbidities discovered in the All of Us dataset with those observed as reproducible in the HCUP dataset. A subset of comorbidities that we previously confirmed with converging molecular genetics from trans-eQTL patterns across chromosomes is also studied as candidate medical genetic syndromes among comorbid syndromes.

Anticipated Findings

For this study, we anticipate that we will confirm that the All of Us clinical practice dataset reliably recapitulates comorbidities that were reproducibly observed in independent datasets. These "confirmed comorbidities" will serve as reliable phenotypes for future studies, such as conducting a phenome-wide association study or a genome-wide association study to demonstrate the genetic underpinning of new clinical syndromes consisting of more than one disease (e.g., the metabolic syndrome).
Importantly, the curated phenotypes and the tested logistic regression are made available for reuse by researchers in the Workbench, for new hypothesis generation.

Findings will be disseminated through scientific journals, GitHub, and the Workbench.

Outcomes anticipated from the research: a method for reliably querying approximately 400 to 500 reproducible comorbidities.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth

Research Team

Owner:

  • Jianrong Li - Project Personnel, University of Arizona

Duplicate of How to Work with All of Us Physical Measurements Data

Project Purpose(s)

  • Educational
  • Methods Development ...

Scientific Questions Being Studied

How to navigate around physical measurements?

Scientific Approaches

N/A

Anticipated Findings

N/A

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Harry Hochheiser - Mid-career Tenured Researcher, University of Pittsburgh

Duplicate of How to Work with All of Us Physical Measurements Data

Project Purpose(s)

  • Educational
  • Methods Development ...

Scientific Questions Being Studied

How to navigate around physical measurements?

Scientific Approaches

N/A

Anticipated Findings

N/A

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Raphael Poulain - Graduate Trainee, University of Delaware

Duplicate of How to Work with All of Us Physical Measurements Data

Project Purpose(s)

  • Educational
  • Methods Development ...

Scientific Questions Being Studied

How to navigate around physical measurements?

Scientific Approaches

N/A

Anticipated Findings

N/A

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Jie Chen - Late Career Tenured Researcher, Augusta University

Duplicate of How to Work with All of Us Physical Measurements Data

Project Purpose(s)

  • Educational
  • Methods Development ...

Scientific Questions Being Studied

The purpose of this workspace is to get familiar with the available data as well as its structure in the All of Us cohort. There are no active research questions being pursued at this stage, but experience obtained through this workspace will help us formulate plans for how best to utilize this rich dataset to answer scientific questions.

Scientific Approaches

I am using the R programming language within a Jupyter notebook provided in the Workbench to better understand the available data elements as well as their structure.

Anticipated Findings

The only anticipated finding from this workspace is more experience with the All of Us cohort dataset as well as its workbench which will enable us to be better equipped to carry out future studies using data collected from All of Us participants.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Ozan Dikilitas - Research Fellow, Mayo Clinic

Duplicate of How to Work with All of Us Physical Measurements Data

Project Purpose(s)

  • Educational
  • Methods Development ...

Scientific Questions Being Studied

How to navigate around physical measurements?

Scientific Approaches

N/A

Anticipated Findings

N/A

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • James Geller - Late Career Tenured Researcher, New Jersey Institute of Technology

Duplicate of How to Get Started with Registered Tier Data

Project Purpose(s)

  • Educational
  • Methods Development ...
  • Other Purpose (This is an All of Us Tutorial Workspace. It is meant to provide instruction for key Researcher Workbench components and All of Us data representation.)

Scientific Questions Being Studied

We recommend that all researchers explore the notebooks in this workspace to learn the basics of All of Us Program Data.

What should you expect? This notebook will give you an overview of what data is available in the current Curated Data Repository (CDR). It will also teach you how to retrieve information about Electronic Health Record (EHR), Physical Measurements (PM), and Survey data.

Scientific Approaches

This Tutorial Workspace contains two Jupyter Notebooks (one written in Python, the other in R). Each notebook is divided into the following sections:

1. Setup: How to set up this notebook, install and import software packages, and select the correct version of the CDR.
2. Data Availability Part 1: How to summarize the number of unique participants with major data types: Physical Measurements, Survey, and EHR.
3. Data Availability Part 2: How to delve a little deeper into data availability within each major data type.
4. Data Organization: An explanation of how data is organized according to our common data model.
5. Example Queries: How to directly query the CDR, using two examples of SQL queries to extract demographic data.
6. Expert Tip: How to access the base version of the CDR, for users that want to do their own cleaning.

Anticipated Findings

By reading and running the notebooks in this Tutorial Workspace, you will understand the following:

All of Us data are made available in a Curated Data Repository. Participants may contribute any combination of survey, physical measurement, and electronic health record data. Not all participants contribute all possible data types. Each unique piece of health information is given a unique identifier called a concept_id and organized into specific tables according to our common data model. You can use these concept_ids to query the CDR and pull data on specific health information relevant to your analysis. See our support article Learning the Basics of the All of Us Dataset for more info.
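
As a rough illustration of how a concept_id ties a coded value to its human-readable name, the following Python sketch joins a toy `person` table to a toy `concept` table. It is an assumption-laden stand-in, not the tutorial notebook's code: it uses an in-memory SQLite database instead of BigQuery and the CDR, the rows are invented, and only a few columns from the OMOP common data model are included.

```python
import sqlite3

# In-memory stand-in for the CDR; the real Workbench queries run on BigQuery.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (
        concept_id   INTEGER PRIMARY KEY,
        concept_name TEXT
    );
    CREATE TABLE person (
        person_id         INTEGER PRIMARY KEY,
        gender_concept_id INTEGER,
        year_of_birth     INTEGER
    );
    -- 8507 and 8532 are the standard OMOP gender concepts.
    INSERT INTO concept VALUES (8507, 'MALE'), (8532, 'FEMALE');
    -- Invented participants for illustration only.
    INSERT INTO person VALUES (1, 8532, 1970), (2, 8507, 1985), (3, 8532, 1992);
    """
)

# Resolve each coded value to its name through concept_id, then count.
rows = conn.execute(
    """
    SELECT c.concept_name, COUNT(*) AS participants
    FROM person AS p
    JOIN concept AS c ON p.gender_concept_id = c.concept_id
    GROUP BY c.concept_name
    ORDER BY c.concept_name
    """
).fetchall()

for concept_name, participants in rows:
    print(concept_name, participants)
conn.close()
```

The same join pattern applies to other coded columns, since each stores a concept_id that resolves through the `concept` table.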

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Vojtech Huser - Other, NIH

Duplicate of How to Get Started with Registered Tier Data

Project Purpose(s)

  • Educational
  • Methods Development ...
  • Other Purpose (This is an All of Us Tutorial Workspace. It is meant to provide instruction for key Researcher Workbench components and All of Us data representation.)

Scientific Questions Being Studied

We recommend that all researchers explore the notebooks in this workspace to learn the basics of All of Us Program Data.

What should you expect? This notebook will give you an overview of what data is available in the current Curated Data Repository (CDR). It will also teach you how to retrieve information about Electronic Health Record (EHR), Physical Measurements (PM), and Survey data.

Scientific Approaches

This Tutorial Workspace contains two Jupyter Notebooks (one written in Python, the other in R). Each notebook is divided into the following sections:

1. Setup: How to set up this notebook, install and import software packages, and select the correct version of the CDR.
2. Data Availability Part 1: How to summarize the number of unique participants with major data types: Physical Measurements, Survey, and EHR.
3. Data Availability Part 2: How to delve a little deeper into data availability within each major data type.
4. Data Organization: An explanation of how data is organized according to our common data model.
5. Example Queries: How to directly query the CDR, using two examples of SQL queries to extract demographic data.
6. Expert Tip: How to access the base version of the CDR, for users that want to do their own cleaning.

Anticipated Findings

By reading and running the notebooks in this Tutorial Workspace, you will understand the following:

All of Us data are made available in a Curated Data Repository. Participants may contribute any combination of survey, physical measurement, and electronic health record data. Not all participants contribute all possible data types. Each unique piece of health information is given a unique identifier called a concept_id and organized into specific tables according to our common data model. You can use these concept_ids to query the CDR and pull data on specific health information relevant to your analysis. See our support article Learning the Basics of the All of Us Dataset for more info.

Demographic Categories of Interest

  • Race / Ethnicity

Research Team

Owner:

  • Janis Geary - Research Fellow, Arizona State University

Duplicate of How to Get Started with Registered Tier Data

Project Purpose(s)

  • Educational
  • Methods Development ...
  • Other Purpose (This is an All of Us Tutorial Workspace. It is meant to provide instruction for key Researcher Workbench components and All of Us data representation.)

Scientific Questions Being Studied

We recommend that all researchers explore the notebooks in this workspace to learn the basics of All of Us Program Data.

What should you expect? This notebook will give you an overview of what data is available in the current Curated Data Repository (CDR). It will also teach you how to retrieve information about Electronic Health Record (EHR), Physical Measurements (PM), and Survey data.

Scientific Approaches

This Tutorial Workspace contains two Jupyter Notebooks (one written in Python, the other in R). Each notebook is divided into the following sections:

1. Setup: How to set up this notebook, install and import software packages, and select the correct version of the CDR.
2. Data Availability Part 1: How to summarize the number of unique participants with major data types: Physical Measurements, Survey, and EHR.
3. Data Availability Part 2: How to delve a little deeper into data availability within each major data type.
4. Data Organization: An explanation of how data is organized according to our common data model.
5. Example Queries: How to directly query the CDR, using two examples of SQL queries to extract demographic data.
6. Expert Tip: How to access the base version of the CDR, for users that want to do their own cleaning.

Anticipated Findings

By reading and running the notebooks in this Tutorial Workspace, you will understand the following:

All of Us data are made available in a Curated Data Repository. Participants may contribute any combination of survey, physical measurement, and electronic health record data. Not all participants contribute all possible data types. Each unique piece of health information is given a unique identifier called a concept_id and organized into specific tables according to our common data model. You can use these concept_ids to query the CDR and pull data on specific health information relevant to your analysis. See our support article Learning the Basics of the All of Us Dataset for more info.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Laura Goetz - Early Career Tenure-track Researcher, Translational Genomics Research Institute

Duplicate of How to Work with All of Us Survey Data

Project Purpose(s)

  • Educational
  • Methods Development ...
  • Other Purpose (This is an All of Us Tutorial Workspace created by the Researcher Workbench Support team. It is meant to provide instruction for key Researcher Workbench components and All of Us data representation.)

Scientific Questions Being Studied

We recommend that all researchers explore the notebooks in this workspace to learn the basics of All of Us Program Data.

What should you expect?
By running the notebooks in this workspace, you should become familiar with how to query PPI questions and surveys and how to find the frequencies of answers for each question in each PPI module.

Scientific Approaches

Not available.

Anticipated Findings

By reading and running the notebooks in this Tutorial Workspace, researchers will learn the following:
- how to query the survey data,
- how to summarize PPI modules and questions.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Jie Chen - Late Career Tenured Researcher, Augusta University

Duplicate of Mental Health Demonstration Project

Project Purpose(s)

  • Disease Focused Research (generalized anxiety disorder, depressive disorder, bipolar disorder)
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.) ...

Scientific Questions Being Studied

As a demonstration project, this study aimed to explore the usability of the All of Us dataset and examined the prevalence of mental health conditions in the All of Us Research Program cohort. Specifically, we explored the lifetime prevalence of depressive disorder, bipolar disorder, and generalized anxiety disorder.

Our study looked at prevalence rates for the above conditions in the following ways:
1. Prevalence in EHR data available by various demographic factors
2. Cohort characteristics
3. Congruency for diagnoses in EHR and self-report questionnaire
4. Among individuals who self-report as having been diagnosed with a mental health condition listed above, the percentage of individuals in treatment and associations between treatment and various demographic factors

Scientific Approaches

In this analysis, we calculated prevalence of mental health conditions by leveraging demographic information, questionnaire responses, and EHR data. Specifically, we utilized the following surveys: Basics, Overall Health, Personal Medical History, and Healthcare Access PPIs. We utilized EHR data by creating a cohort of individuals with specific diagnosis codes in their EHR. We referenced all relevant parent and child SNOMED codes for each mental health condition under investigation (documented in Concept Set). Associations were calculated using chi-square tests.
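As an illustration of the chi-square association step mentioned above, here is a minimal sketch for a single 2x2 table. It is not the project's notebook code: the function name `chi_square_2x2`, the table layout, and the counts are hypothetical, and the sketch assumes a Pearson chi-square test without continuity correction.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Assumed (hypothetical) layout:
                    condition   no condition
        group 1         a            b
        group 2         c            d
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Made-up counts for illustration only; these are not All of Us results.
stat, p = chi_square_2x2(30, 70, 20, 80)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

Larger tables (for example, a diagnosis by several demographic categories) need the general chi-square distribution, which a statistics library would normally provide.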

Anticipated Findings

We anticipated that the prevalence rates found in All of Us would be consistent with previous large-scale studies, such as the National Comorbidity Survey. We found that the All of Us dataset is sensitive to detecting mood disorders and is usable for examining mental health conditions.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Kai Yin Ho - Project Personnel, Northwestern University

Duplicate of Phenotype - Breast Cancer

Project Purpose(s)

  • Educational
  • Methods Development ...
  • Other Purpose (This is an All of Us Phenotype Library Workspace created by the Researcher Workbench Support team. It is meant to demonstrate the implementation of key phenotype algorithms within the All of Us Research Program cohort.)

Scientific Questions Being Studied

The Notebooks in this Workspace can be used to implement well-known phenotype algorithms in one’s own research.

Scientific Approaches

Not Applicable

Anticipated Findings

By reading and running the Notebooks in this Phenotype Library Workspace, researchers can implement the following phenotype algorithms:

Ning Shang, George Hripcsak, Chunhua Weng, Wendy K. Chung, & Katherine Crew. Breast Cancer. Retrieved from https://phekb.org/phenotype/breast-cancer.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Jie Chen - Late Career Tenured Researcher, Augusta University

Duplicate of Phenotype - Ischemic Heart Disease

Project Purpose(s)

  • Educational
  • Methods Development ...
  • Other Purpose (This is an All of Us Phenotype Library Workspace created by the Researcher Workbench Support team. It is meant to demonstrate the implementation of key phenotype algorithms within the All of Us Research Program cohort.)

Scientific Questions Being Studied

The Notebooks in this workspace can be used to implement well-known phenotype algorithms in one’s own research.

Scientific Approaches

Not Applicable

Anticipated Findings

By reading and running the Notebooks in this Phenotype Library Workspace, researchers can implement the following phenotype algorithms:

Christianne L. Roumie; Jana Shirey-Rice, Sunil Kripalani. Vanderbilt University. MidSouth CDRN - Coronary Heart Disease Algorithm. PheKB; 2014. Available from https://phekb.org/phenotype/234

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Romit Bhattacharya - Research Fellow, The Broad Institute

Collaborators:

  • Sarah Urbut - Research Fellow, The Broad Institute

Duplicate of Phenotype - Ischemic Heart Disease

Project Purpose(s)

  • Educational
  • Methods Development ...
  • Other Purpose (This is an All of Us Phenotype Library Workspace created by the Researcher Workbench Support team. It is meant to demonstrate the implementation of key phenotype algorithms within the All of Us Research Program cohort.)

Scientific Questions Being Studied

The Notebooks in this workspace can be used to implement well-known phenotype algorithms in one’s own research.

Scientific Approaches

Not Applicable

Anticipated Findings

By reading and running the Notebooks in this Phenotype Library Workspace, researchers can implement the following phenotype algorithms:

Christianne L. Roumie; Jana Shirey-Rice, Sunil Kripalani. Vanderbilt University. MidSouth CDRN - Coronary Heart Disease Algorithm. PheKB; 2014. Available from https://phekb.org/phenotype/234

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Henry Zheng - Graduate Trainee, University of California, Los Angeles

Duplicate of Phenotype - Type 2 Diabetes

Project Purpose(s)

  • Educational
  • Methods Development ...
  • Other Purpose (This is an All of Us Phenotype Library Workspace created by the Researcher Workbench Support team. It is meant to demonstrate the implementation of key phenotype algorithms within the All of Us Research Program cohort.)

Scientific Questions Being Studied

The Notebooks in this Workspace can be used to implement well-known phenotype algorithms in one’s own research.

Scientific Approaches

Not Applicable

Anticipated Findings

By reading and running the Notebooks in this Phenotype Library Workspace, researchers can implement the following phenotype algorithms:

Jennifer Pacheco and Will Thompson. Northwestern University. Type 2 Diabetes Mellitus. PheKB; 2012. Available from: https://phekb.org/phenotype/18

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Nayyar Ahmed - Other, University of Pittsburgh

Duplicate of Systemic Disease and Glaucoma

Project Purpose(s)

  • Disease Focused Research (primary open angle glaucoma)
  • Other Purpose (This work is the result of an All of Us Research Program Demonstration Project. Demonstration Projects are efforts by the All of Us Research Program designed to meet the goal of ensuring the quality and utility of the Research Hub as a resource for accelerating precision medicine. This work has been approved, reviewed, and overseen by the All of Us Research Program Science Committee and Data and Research Center to ensure compliance with program policy.) ...

Scientific Questions Being Studied

We have previously published a predictive model of glaucoma progression using electronic health record (EHR) data pertaining to systemic attributes from a single institution. We aim to use the All of Us dataset to 1) serve as external validation for this single-center model and 2) to train new models focused on predicting glaucoma progression using systemic predictors. This is important to understand whether the original findings are generalizable and provide additional knowledge about the utility of systemic predictors on a national-level dataset.

Scientific Approaches

We will develop predictive models using the All of Us dataset using multivariable logistic regression, random forests, and artificial neural networks.

Anticipated Findings

We anticipate that the All of Us data will validate the findings from the model, which demonstrated that blood pressure-related metrics and certain medication classes had predictive value for glaucoma progression. In addition, we anticipate that the models trained with All of Us data will outperform the model trained with single institution data due to larger sample size and greater diversity.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Sally Baxter - Research Fellow, University of California, San Diego

Collaborators:

  • Tsung-Ting Kuo - Early Career Tenure-track Researcher, University of California, San Diego
  • Roxana Loperena Cortes - Other, All of Us Program Operational Use
  • Paulina Paul - Project Personnel, University of California, San Diego
  • Lucila Ohno-Machado
  • Luca Bonomi - Research Fellow, University of California, San Diego
  • Katherine Kim - Early Career Tenure-track Researcher, University of California, Davis
  • Jihoon Kim - Project Personnel, University of California, San Diego
  • Bharanidharan Radha Saseendrakumar - Project Personnel, University of California, San Diego

Duplicate of Test Workspace 2

Project Purpose(s)

  • Control Set ...

Scientific Questions Being Studied

Test

Scientific Approaches

Not available.

Anticipated Findings

Test

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Eric Song - Administrator, All of Us Program Operational Use

Duplicate of Working with All of Us Physical Measurements Data v1

Project Purpose(s)

  • Methods Development ...

Scientific Questions Being Studied

How to navigate around physical measurements?

Scientific Approaches

Not available.

Anticipated Findings

N/A

Demographic Categories of Interest

  • Sex at Birth
  • Geography

Research Team

Owner:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Cheryl Clark

Duplicate of Working with All of Us Physical Measurements Data v1

Project Purpose(s)

  • Methods Development ...

Scientific Questions Being Studied

How to navigate around physical measurements?

Scientific Approaches

Not available.

Anticipated Findings

N/A

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital

Duplicate version of Demo - PheWAS Smoking for learning

Project Purpose(s)

  • Methods Development
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.) ...

Scientific Questions Being Studied

As a demonstration project, this study will present the results of Phenome-Wide Association Studies (PheWAS) to show how the various sources of data contained within All of Us research dataset can be used to inform scientific discovery. We will perform separate PheWAS studies with smoking status as the independent variable. Specific questions include:

1. How can one implement a PheWAS within the All of Us Researcher Workbench?
2. How can one use heterogeneous data sources within the All of Us dataset to explore disease associations using self-reported exposures (Participant Provided Information, or “PPI”) and exposures captured in the electronic health record (EHR)?

Scientific Approaches

As a method for assessing the health burden of smoking on potential observed phenotypes, we implement a Phenome-Wide Association Study (PheWAS). A PheWAS consists of an array of association tests over an indexed representation of the human phenome. In this analysis, we will conduct PheWAS for EHR-derived and PPI-derived smoking exposures included in the All of Us research dataset. We will represent “Smoking Exposure” in three ways:
1. EHR Smoking ICD Billing Codes
2. Participant Provided Information (PPI) Smoking lifetime 100 cigarettes yes/no
3. Participant Provided Information (PPI) Smoking lifetime smoking everyday
To perform PheWAS, we will map ICD representations of disease to a common vocabulary of PheCodes. We then use Jupyter Notebooks to create reusable functions to perform PheWAS and generate Manhattan Plots to summarize associations.
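The core loop of a PheWAS can be sketched as one association test per phenotype code with a multiple-testing correction. The sketch below is a simplification under stated assumptions, not the workspace's reusable package: it uses a 2x2 chi-square test and a Haldane-Anscombe-corrected odds ratio instead of covariate-adjusted regression, applies a Bonferroni threshold, and all names (`phewas`, `CODE_A`, `CODE_B`) and data are hypothetical.

```python
import math


def phewas(participants, alpha=0.05):
    """Run one 2x2 association test per phenotype code.

    participants: list of (exposed, codes) pairs, where `exposed` is a bool
    (for example, smoker yes/no) and `codes` is the set of phenotype codes
    recorded for that participant.
    Returns a list of result dicts sorted by p-value.
    """
    all_codes = sorted({code for _, codes in participants for code in codes})
    if not all_codes:
        return []
    threshold = alpha / len(all_codes)  # Bonferroni correction
    results = []
    for code in all_codes:
        a = sum(1 for exposed, codes in participants if exposed and code in codes)
        b = sum(1 for exposed, codes in participants if exposed and code not in codes)
        c = sum(1 for exposed, codes in participants if not exposed and code in codes)
        d = sum(1 for exposed, codes in participants if not exposed and code not in codes)
        margins = (a + b) * (c + d) * (a + c) * (b + d)
        if margins == 0:
            continue  # no variation in exposure or phenotype; test is undefined
        stat = (a + b + c + d) * (a * d - b * c) ** 2 / margins
        p_value = math.erfc(math.sqrt(stat / 2))  # chi-square, 1 degree of freedom
        # Haldane-Anscombe correction: add 0.5 to each cell to avoid division by zero.
        odds_ratio = ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))
        results.append({
            "phecode": code,
            "chi2": stat,
            "p_value": p_value,
            "odds_ratio": odds_ratio,
            "significant": p_value < threshold,
        })
    return sorted(results, key=lambda r: r["p_value"])


# Made-up cohort for illustration only.
participants = (
    [(True, {"CODE_A", "CODE_B"})] * 20
    + [(True, {"CODE_A"})] * 10
    + [(True, set())] * 10
    + [(False, {"CODE_A", "CODE_B"})] * 10
    + [(False, {"CODE_B"})] * 10
    + [(False, set())] * 20
)
for row in phewas(participants):
    print(row["phecode"], round(row["chi2"], 2), row["significant"])
```

A Manhattan plot would then place each code on the x-axis and the negative base-10 logarithm of its p-value on the y-axis.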

Anticipated Findings

For this study, we anticipate that we will be able to replicate known disease associations with smoking exposure. This will serve to demonstrate the quality, utility, and diversity of the All of Us data and tools and the power of gathering multiple data sources for a single phenotype, providing researchers options for study design and validation. Importantly, the entire PheWAS package is made available for reuse by researchers in the Workbench for new hypothesis generation.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • jie na - Project Personnel, Mayo Clinic

Collaborators:

  • Guoqian Jiang - Mid-career Tenured Researcher, Mayo Clinic

Duplicate_for_DRC_obesity_code_review

Project Purpose(s)

  • Disease Focused Research (obesity)
  • Educational ...
  • Methods Development

Scientific Questions Being Studied

National obesity prevention and intervention strategies may benefit from precision medicine approaches that incorporate integrated data on environments, social determinants of health, and genomic factors. We examined the quality and utility of the All of Us Research Hub Workbench for accelerating precision medicine by replicating methods from existing studies that examine the prevalence of obesity at the population level. We evaluated the measurements of obesity in the participant measurement (PM) data set and the electronic health record (EHR) data set using methods similar to the Ward et al. NEJM December 2019 publication that assessed prevalence of obesity in the US by state using BRFSS data.

Scientific Approaches

For this population-based cross-sectional study of All of Us Research Workbench participants, we excluded individuals with measurements obtained during pregnancy or inpatient visits and individuals from states with fewer than 100 participants. Physical measurements (PM) of height and weight at the time of program enrollment of 142,116 participants and measured weight and height extracted from electronic health records (EHR) of 40,885 individuals were used to calculate body-mass index (BMI). We did a complete case analysis for All of Us participants with known sex (male or female), race, income and education levels and estimated state-specific and demographic subgroup-specific prevalence of categories of BMI [obesity (BMI ≥30) and extreme obesity (BMI ≥ 35)] nationwide and for each state: overall and by subgroups, male and female. We examined the difference between EHR and PM calculated BMI by state.
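The BMI and prevalence calculations described above can be sketched as follows. This is an illustrative sketch only: the function names and sample measurements are hypothetical, and the Wald (normal-approximation) confidence interval is an assumption, since the description does not state which interval method the study used.

```python
import math


def bmi(weight_kg, height_cm):
    """Body-mass index: weight in kilograms divided by height in meters squared."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def prevalence_with_ci(values, threshold, z=1.96):
    """Share of values at or above `threshold`, with a Wald 95% interval.

    Returns (prevalence, lower, upper), with the bounds clipped to [0, 1].
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    p = sum(1 for v in values if v >= threshold) / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Made-up (weight_kg, height_cm) pairs for illustration only.
measurements = [(70, 175), (95, 170), (120, 165), (60, 160)]
bmis = [bmi(w, h) for w, h in measurements]

obesity = prevalence_with_ci(bmis, 30)          # BMI >= 30
extreme_obesity = prevalence_with_ci(bmis, 35)  # BMI >= 35
print(obesity, extreme_obesity)
```

With a sample this small the Wald interval is very wide; it only becomes informative at cohort sizes like those reported in the study.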

Anticipated Findings

Using states with at least 100 participants, PM data included 142,116 individuals (mean [SD] age, 51.2 [16.6]) and EHR data on height and weight included 40,885 individuals (mean [SD] age, 52.5 [16.5]). The median BMI for PM participants was 28.4 [24.4 to 33.7]; the median BMI for EHR was 29.0 [24.8 to 34.5]. The PM national prevalences for obesity (includes BMI >30 and BMI >35) and extreme obesity (BMI >35) were 41.2% (95% Confidence Interval [CI], 40.9 to 41.4) and 20.8% (95% CI, 20.6 to 21.0), respectively, with large variations across states. Women had higher prevalence of extreme obesity than men in all selected states. Subgroups with extreme obesity (BMI >35) prevalence greater than 25% included (subgroup, N, prevalence %, [95% CI]): Black NH, 8,913, 28.9 (25.8 to 32.0); individuals with income less than $25,000, 13,244, 25.1 (22.1 to 28.1); education of high school to some college, 17,272, 26.1 (23.1 to 29.1); and the region of the South, 6,639, 25.3 (22.3 to 28.3).

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Education Level
  • Income Level

Research Team

Owner:

  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital

earlyonsetcolorectcalcancer

Project Purpose(s)

  • Disease Focused Research (colorectal cancer) ...

Scientific Questions Being Studied

Examine the demographic, geographic, and inflammatory biomarker differences of early-onset versus late-onset colorectal cancer to determine potential biomarkers for identification of individuals at increased risk for colorectal cancer who may benefit from early screening.

Scientific Approaches

Compare individuals with and without colorectal cancer overall in the All of Us cohort by demographics, geography, and biomarkers associated with increased risk of CRC (ESR, triglycerides, BMI, systolic blood pressure, waist circumference, ApoB100, hemoglobin A1C). Look at biomarkers at least 2 years prior to year of diagnosis.

Anticipated Findings

Identify biomarkers that may guide future research into the biology of early onset colorectal cancer.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital

Effect of Pyridoxine in Type 2 Diabetics

Project Purpose(s)

  • Disease Focused Research (Diabetes mellitus) ...

Scientific Questions Being Studied

The aim of this study is to investigate whether pyridoxine use can benefit diabetics in preventing long-term complications by inhibiting formation of advanced glycation end products (AGEs) and improving clinical outcomes.

Diabetes and hyperglycemia affect over 415 million people worldwide, and by 2040 the number is expected to increase to 642 million. Chronic hyperglycemia results in the glycation of proteins and other biomolecules, which generates AGEs. Glycation can be identified as the core reason for diabetes-associated disorders. The interaction of AGEs with their receptor elicits oxidative stress and, as a result, evokes proliferative, inflammatory, thrombotic, and fibrotic reactions in a variety of cells. Therefore, inhibiting the glycation process might be an effective way to prevent the complications of chronic hyperglycemia.

Scientific Approaches

Dataset: type 2 DM patients
Inclusion Criteria: Diabetics not on insulin, age > 18 and < 70, A1c > 6.5%
Exclusion Criteria: History of uncontrolled DM with A1c > 8.5, hemoglobinopathy, sickle cell disease, thalassemia, anemia (iron deficiency, pernicious anemia, B12 deficiency, folate deficiency), blood transfusion in the last 9 months, coagulopathy, blood thinner treatment, treatment with B6/B12/folate/iron in the last 3 months, treatment for TB or INH treatment, asplenia, pregnant or planning pregnancy in the next 6 months.

Research Method:
Blood and urine labs such as HGB A1c, HGB/HCT, fructosamine, fasting lipids, microalbuminuria, 24-hour creatinine/ protein, reticulocyte count and glycated albumin will be analysed.
Lab parameters and outcomes with patients on pyridoxine 100 mg po daily will be compared with subjects not on pyridoxine.

Anticipated Findings

1. Pyridoxine can decrease HbA1c in Type 2 diabetics
2. Pyridoxine can decrease Glycated albumin, Glycomark, microalbuminuria

If pyridoxine can reduce AGEs without the risk of hypoglycemia and without other side effects, pyridoxine should be used in diabetic patients. This will save diabetic patients from the known complications of hyperglycemia without the side effects of anti-diabetic medication.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Bijun Kannadath - Early Career Tenure-track Researcher, University of Arizona

Collaborators:

  • Jiali Ling - Project Personnel, University of Arizona

End of Life Prediction

Project Purpose(s)

  • Methods Development ...

Scientific Questions Being Studied

We plan to test the compatibility of the omop-learn library with the All of Us dataset. This library was developed by the Clinical Machine Learning Group at MIT, and facilitates rapid prototyping in Python of predictive machine-learning models using longitudinal medical data from an OMOP CDM-standard database. This test is important to verify that other researchers will be able to use the omop-learn library to run their own predictive tasks on this particular dataset, as well as other OMOP datasets.

Scientific Approaches

We plan to test the library with the task of predicting mortality over a six-month window for patients over the age of 70. Using the omop-learn library, we will choose a cohort based on their age and enrollment during the training and outcome windows. Then, we will build a sparse feature matrix of drug, condition, procedure, and specialty features. Using this dataset, we will use two methods to predict outcomes: a simple logistic regression model and the SARD model, which is a deep-learning model.

Anticipated Findings

We anticipate that omop-learn will be compatible with the All of Us dataset. Our findings here will allow us to identify and resolve any compatibility issues we encounter, which will enable researchers to use omop-learn for their own predictive modeling tasks.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • justin Lim - Graduate Trainee, Massachusetts Institute of Technology

Epidemiology of PCOS

Project Purpose(s)

  • Disease Focused Research (Polycystic ovary syndrome)
  • Population Health ...

Scientific Questions Being Studied

Polycystic ovary syndrome (PCOS) is the most common endocrine disorder in women of reproductive age and one of the leading causes of infertility. Minority females with PCOS are more at risk of developing detrimental metabolic outcomes. Therefore, our scientific question is: what PCOS risk factors differ by race and/or ethnicity?

Scientific Approaches

We will leverage the All of Us data to identify females with PCOS and characterize their risk factors using demographic data, ICD codes, lab values, and socioeconomic status.

Anticipated Findings

We anticipate that we will find phenotypic differences related to metabolic dysfunction between racially and ethnically diverse females with PCOS.

Demographic Categories of Interest

  • Race / Ethnicity
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Ky'Era Actkins - Graduate Trainee, Meharry Medical College

Evidence of the Latino Epidemiologic Paradox in the All of Us Research Project

Project Purpose(s)

  • Disease Focused Research (Cardiovascular disease)
  • Population Health ...
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Questions Being Studied

The overall goal of this project is to examine whether there is evidence of the Latino Epidemiological Paradox within the All of Us Research Project (AoURP) cohort. The specific aims are:

Specific Aim 1
To determine whether Latinos have lower prevalence of gender stratified age-adjusted CVD versus NHWs and non-Hispanic blacks in the cohort.

Specific Aim 2
To determine whether Latinos have lower prevalence of gender stratified age-adjusted cancer (overall) versus NHWs and non-Hispanic blacks in the cohort.

Specific Aim 3
To determine whether Latinos have higher prevalence of gender stratified age-adjusted diabetes and obesity (overall) versus NHWs and non-Hispanic blacks in the cohort.

Specific Aim 4
To the extent possible, examine differences by Latino subgroups and among foreign-born versus US-born Latinos.

Scientific Approaches

Study population. All of Us Research Project core participants. We will examine data from different data sources including electronic health records (EHR) and participant provided information (PPI) and physical measurements.

Main outcome variables: we will work with the DRC Research Support Team to obtain support for their existing classification scheme for common complex diseases, which in this project would include cardiovascular disease, cancer (including subtypes to the extent possible), and diabetes (Type 2). For the definition of diseases, we will use EHR data to preserve objective outcomes, excluding survey data for now.

Statistical analysis
We will present all data stratified by gender and age-adjusted using direct standardization. BMI categories would be <25, 25-30, 30-35, and >35. For diabetes, A1C data will be categorized (A1C <7, A1C 7-9, and A1C >9).
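Direct standardization weights each age stratum's observed rate by that stratum's share of a chosen standard population. The sketch below illustrates the idea together with the BMI and A1C categories listed above. The function names, age groups, weights, and counts are hypothetical, and the handling of boundary values (for example, which category a BMI of exactly 30 falls into) is an assumption, because the description does not specify it.

```python
def directly_standardized_rate(stratum_counts, standard_weights):
    """Age-adjusted rate by direct standardization.

    stratum_counts: {age_group: (cases, population)} observed in the study group.
    standard_weights: {age_group: weight} taken from a standard population;
    weights are normalized here, so they need not sum to exactly 1.
    """
    total_weight = sum(standard_weights.values())
    if total_weight <= 0:
        raise ValueError("standard weights must sum to a positive number")
    rate = 0.0
    for group, weight in standard_weights.items():
        cases, population = stratum_counts[group]
        if population <= 0:
            raise ValueError(f"stratum {group!r} has no population")
        rate += (weight / total_weight) * (cases / population)
    return rate


def bmi_category(bmi):
    # Assumption: lower bounds are inclusive, so 35 and above maps to ">35".
    if bmi < 25:
        return "<25"
    if bmi < 30:
        return "25-30"
    if bmi < 35:
        return "30-35"
    return ">35"


def a1c_category(a1c):
    # Assumption: both 7 and 9 fall in the middle category.
    if a1c < 7:
        return "<7"
    if a1c <= 9:
        return "7-9"
    return ">9"


# Made-up strata and weights for illustration only.
observed = {"18-44": (10, 100), "45-64": (30, 100), "65+": (50, 100)}
weights = {"18-44": 0.5, "45-64": 0.3, "65+": 0.2}
adjusted = directly_standardized_rate(observed, weights)
print(f"age-adjusted prevalence: {adjusted:.3f}")
```

Running the same function separately for each gender and each comparison group, with the same standard weights, yields rates that can be compared without confounding by age structure.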

Anticipated Findings

We expect to find evidence of the Latino Epidemiological Paradox within the All of Us Research Project (AoURP) cohort. We expect to find that despite multiple social and economic disadvantages, overall on many measures of population health Latinos seem to have a more favorable health advantage than other racial/ethnic minority groups such as blacks and in some measures even better health status than Non-Hispanic Whites (NHWs).

Previous studies like the Study of Latinos (SOL), which is the largest study of Latinos (16,000), aimed to examine this paradox but had the limitation that they only included Latinos, and thus comparative data on non-Latinos was not collected. With 40,000 Latino core participants in the All of Us study (as well as 160,000 non-Latinos), the AoURP study is uniquely positioned to contribute to our knowledge and further understanding of this paradox.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth

Research Team

Owner:

  • Raul Montanez Valverde - Graduate Trainee, University of Miami

Collaborators:

  • olveen carrasquillo - Late Career Tenured Researcher, University of Miami

Exercise, HIV, Mental Health, Medication Adherence, Substance use

Project Purpose(s)

  • Social / Behavioral ...

Scientific Questions Being Studied

The aims of these analyses will be to determine if poor mental health and substance use mediate the relationship between exercise and medication adherence in people living with HIV.

I hypothesize that people living with HIV who exercise will have better mental health and fewer substance use behaviors and thus more consistent medication adherence. I expect the opposite for people living with HIV who exercise less.

Scientific Approaches

The dataset will include all people living with HIV who also answer questionnaires about their exercise, mental health, substance use, and medication adherence.

Anticipated Findings

I hypothesize that people living with HIV who exercise will have better mental health and fewer substance use behaviors and thus more consistent medication adherence. I expect the opposite for people living with HIV who exercise less.

Demographic Categories of Interest

  • Race / Ethnicity
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Nick SantaBarbara - Research Fellow, University of California, Los Angeles

Exploration of data for use in predicting cancer diagnosis

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

This is my initial project, and is set up to explore the possibilities with All of Us data.

My scientific interest is to better understand why some people develop specific cancers and some do not.

Scientific Approaches

I plan to use machine learning applied to germ line DNA data to answer these questions.

Anticipated Findings

Ultimately the findings from this line of work should lead to better predictive tests for cancer. An example is that someday you might be able to take a blood sample from a young adult and tell them that they will probably develop colon cancer sometime in the next 40 years. This might lead them to screen for colon polyps more rigorously and ultimately let them avoid a late stage colon cancer diagnosis.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • James Brody - Mid-career Tenured Researcher, University of California, Irvine

Exploratory Analysis

Project Purpose(s)

  • Population Health
  • Methods Development ...

Scientific Questions Being Studied

Lab reference ranges have traditionally been derived from healthy populations. To the extent that a patient fits into that population, the reference range is pertinent to the patient. Unfortunately, for patients with chronic diseases or multiple comorbidities, their "normal" (usual value) may not fall into the reference interval for a healthy cohort and may be flagged as “abnormal” even though it does not represent a significant or actionable change in the patient’s pathophysiology. In an internal stakeholder analysis at Veterans Affairs, we found 1-17% (median 5%) of the abnormal alerts were clinically useful; the rest (95%) represent noise (values that had to be reviewed but have no clinical significance). We believe that novel techniques may enable the creation of precision reference intervals that are unique to each patient.

Scientific Approaches

Starting with population based reference intervals, as we acquire information about a patient from prior clinical lab work, diagnostic coding information, and disease evolution, we should be able to set individualized (precision) reference intervals. We hope to adopt methods from statistical process control and Bayesian statistics to adapt population level reference ranges to the individual.
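One way to picture adapting a population reference range to an individual is a normal-normal Bayesian update, sketched below. This is not the project's method, which the description leaves open; it is a textbook model chosen for illustration. It assumes the patient's usual value is drawn from a normal population distribution, that repeated measurements scatter normally around that usual value with a known within-person standard deviation, and all names and numbers are hypothetical.

```python
import math


def personalized_interval(prior_mean, between_sd, within_sd, measurements, z=1.96):
    """Posterior-predictive 95% interval for a patient's next lab value.

    prior_mean, between_sd: population mean and between-person SD (the prior).
    within_sd: assumed known SD of repeated measurements within one person.
    measurements: the patient's prior results for this lab test.
    Returns (posterior_mean, lower, upper).
    """
    prior_precision = 1 / between_sd ** 2
    data_precision = len(measurements) / within_sd ** 2
    post_precision = prior_precision + data_precision
    # Precision-weighted average of the population mean and the patient's data.
    post_mean = (
        prior_mean * prior_precision + sum(measurements) / within_sd ** 2
    ) / post_precision
    # Predictive spread = uncertainty about the usual value + measurement noise.
    predictive_sd = math.sqrt(1 / post_precision + within_sd ** 2)
    return post_mean, post_mean - z * predictive_sd, post_mean + z * predictive_sd


# Made-up values for illustration only (units are arbitrary).
population = dict(prior_mean=1.0, between_sd=0.2, within_sd=0.1)
no_history = personalized_interval(measurements=[], **population)
with_history = personalized_interval(measurements=[1.5, 1.6, 1.4, 1.5], **population)
print(no_history)
print(with_history)
```

With no history the interval reduces to the population-based one; as results accumulate, it shifts toward and narrows around the patient's own usual value, so a stable but elevated value would no longer be flagged on every draw.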

Anticipated Findings

We hope to create a computational model for establishing precision medicine reference intervals for three common outpatient laboratory tests with low signal to noise ratios in patients with chronic diseases. This model would provide better reference ranges for the unique physiology of these patients and would provide physicians with a better understanding of abnormality in their context.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Alistair Johnson - Early Career Tenure-track Researcher, Massachusetts Institute of Technology

explore

Project Purpose(s)

  • Disease Focused Research (disease of mental health)
  • Population Health ...
  • Social / Behavioral
  • Drug Development
  • Methods Development

Scientific Questions Being Studied

This research is focused on leveraging digital phenotyping techniques to understand users' behavioral patterns and their relationship to mental health. We would like to understand the behavioral markers of the disease as manifested in mobile phone usage. The research focuses on understanding the patterns of behavior as manifested in sensor data. Moreover, since the All of Us dataset offers data from a wide range of experimental modalities, it offers the opportunity to find correlations in the data never explored before, such as the relationship between behavior and clinical records. Also, the dataset allows performing research across a wide range of cultural backgrounds and age ranges.

Scientific Approaches

At the Institute for Medical and Engineering Sciences at MIT, we are interested in a data science approach to understanding human behavior via digital phenotyping. We are also interested in the rich context of the individual. We plan to leverage new methods in machine learning such as deep learning to make sense of longitudinal behavioral data.

Anticipated Findings

As of today, we do not know what the biomarkers or behavioral markers of mental disease are. This research hopes to shed light on the most basic understanding of human behavior and its relationship to mental disease as measured with mobile sensing technology. The All of Us dataset allows us to perform this research in a wider population and with more experimental variables than ever before.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Disability Status
  • Education Level
  • Income Level

Research Team

Owner:

  • Omar Costilla Reyes - Research Fellow, Massachusetts Institute of Technology

Exploring

Project Purpose(s)

  • Educational
  • Other Purpose (Exploring the Workbench as part of an applied biomedical informatics graduate course and leveraging AoU for educational purposes.) ...

Scientific Questions Being Studied

How useful is All of Us data in biomedical and public health research? In this workspace I intend to look around and understand how the information in All of Us will help formulate new hypotheses. I intend to use BMI data and perhaps other types of data to help me in this analysis.

Scientific Approaches

I intend to use the workspace to review tools, processes, and data across the longitudinal cohort of AoU participants.

Anticipated Findings

In exploring the All of Us workspace, I will learn whether it is a viable tool for my research. If it is, I may continue to use this tool in the future.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Geography
  • Disability Status
  • Access to Care
  • Income Level

Research Team

Owner:

  • Michelle Gomez - Graduate Trainee, Vanderbilt University

Exploring All of Us

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

This workspace will be used to get to know the tools and features of All of Us. We hope that by getting this experience, we can better help researchers at our institution who are using the Workbench for research.

Scientific Approaches

We are interested in understanding how to work with this data in R and Jupyter notebooks.

Anticipated Findings

As this is an exploration of All of Us and its features, there are no anticipated findings. However, by doing this exploration we may be better able to support researchers producing findings from their research.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Amy Yarnell - Other, University of Maryland, Baltimore

Collaborators:

  • Jean-Paul Courneya - Other, University of Maryland, Baltimore

Eye Related

Project Purpose(s)

  • Disease Focused Research (eye disease) ...

Scientific Questions Being Studied

I am currently exploring the data to determine how much eye health related information exists in All of Us. This will help determine future research questions.

Scientific Approaches

At this stage, my use of the workbench is exploratory. I will be using data mining techniques to look for eye health related data to inform future research.

Anticipated Findings

I expect to find some data relevant to eye health to provide a basis for further study.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Kerry Goetz - Project Personnel, NIH

First

Project Purpose(s)

  • Other Purpose (Learn how to use workbench (first execution)) ...

Scientific Questions Being Studied

This is my first execution of the workbench. There are no specific research questions. I hope to learn how to use the workbench. As a new user, I don't know what to expect or what functions the workbench has.

Scientific Approaches

There are no formal approaches planned for this project. This is the first project of this user, and the purpose is to learn how to use all possible workbench functions.

Anticipated Findings

Demonstrate the ability to execute workbench tools and functions. The output will consist of queries and the results that flow from them, e.g., SQL query results.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Vojtech Huser - Other, NIH

First Test Workspace

Project Purpose(s)

  • Methods Development ...

Scientific Questions Being Studied

Exploratory data analysis to start, to see whether the data can support research questions around clinical decision support applications:

- Can the results of microbiology culture tests be accurately predicted based on available patient / clinical data at the time of test ordering?

- Can the clinical orders from new specialty consultation visits be predicted based on available patient / clinical data at the time of referral from a generalist?

Scientific Approaches

Supervised and unsupervised machine learning models (e.g., collaborative filtering) applied to clinical data sources to predict subsequent labels in the form of clinical test orders and results.
Cases where patients receive empiric antibiotic prescriptions (simultaneous antibiotics with new diagnostic microbiology culture tests).
Cases where a patient is referred to and then subsequently sees a specialist (e.g., endocrinology or hematology).

Anticipated Findings

Clinical orders and test results are sufficiently predictable given available data that they can power clinical decision support information retrieval tools to aid clinical decision making under uncertainty.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Jonathan Chen - Early Career Tenure-track Researcher, Stanford University

For training and learning

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

The workspace aims to develop a learning module and to give students exposure to potential social science based implications for occupational choices.

Scientific Approaches

The workspace aims to use traditional statistical methods in Python.

Anticipated Findings

Developing a deeper understanding of the dataset and the baseline descriptives related to occupational choice.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Pankaj Patel - Late Career Tenured Researcher, Villanova University

For_DRC_obesity_code_review

Project Purpose(s)

  • Disease Focused Research (obesity)
  • Educational ...
  • Methods Development

Scientific Questions Being Studied

National obesity prevention and intervention strategies may benefit from precision medicine approaches that incorporate integrated data on environments, social determinants of health, and genomic factors. We examined the quality and utility of the All of Us Research Hub Workbench for accelerating precision medicine by replicating methods from existing studies that examine the prevalence of obesity at the population level. We evaluated the measurements of obesity in the participant measurement (PM) data set and the electronic health record (EHR) data set using methods similar to the Ward et al. NEJM December 2019 publication that assessed prevalence of obesity in the US by state using BRFSS data.

Scientific Approaches

For this population-based cross-sectional study of All of Us Research Workbench participants, we excluded individuals with measurements obtained during pregnancy or inpatient visits and individuals from states with fewer than 100 participants. Physical measurements (PM) of height and weight at the time of program enrollment of 142,116 participants and measured weight and height extracted from electronic health records (EHR) of 40,885 individuals were used to calculate body-mass index (BMI). We did a complete case analysis for All of Us participants with known sex (male or female), race, income and education levels and estimated state-specific and demographic subgroup-specific prevalence of categories of BMI [obesity (BMI ≥30) and extreme obesity (BMI ≥ 35)] nationwide and for each state: overall and by subgroups, male and female. We examined the difference between EHR and PM calculated BMI by state.
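The BMI calculation and prevalence estimates described above can be sketched as follows. This is a minimal illustration, not the study's code: the function names and sample participants are hypothetical, and the Wald (normal approximation) confidence interval is an assumption, since the description does not say which interval method was used.

```python
import math

def bmi(weight_kg, height_cm):
    """Body-mass index in kg/m^2 from weight in kilograms and height in centimeters."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2

def prevalence_with_ci(flags, z=1.96):
    """Proportion of True flags with a Wald 95% confidence interval (assumed method)."""
    n = len(flags)
    p = sum(flags) / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Hypothetical participants as (weight in kg, height in cm); not All of Us data.
participants = [(70.0, 175.0), (95.0, 170.0), (110.0, 165.0), (60.0, 160.0)]
bmis = [bmi(w, h) for w, h in participants]

# Thresholds follow the text: obesity is BMI >= 30, extreme obesity is BMI >= 35.
obese = [b >= 30 for b in bmis]
extreme_obese = [b >= 35 for b in bmis]

obesity_prevalence = prevalence_with_ci(obese)
extreme_prevalence = prevalence_with_ci(extreme_obese)
```

Running the same two functions within each state or demographic subgroup would give the subgroup-specific estimates the paragraph refers to.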

Anticipated Findings

Using states with at least 100 participants, PM data included 142,116 individuals (mean [SD] age, 51.2 [16.6]) and EHR data on height and weight included 40,885 individuals (mean [SD] age, 52.5 [16.5]). The median BMI for PM participants was 28.4 [24.4 to 33.7]; the median BMI for EHR was 29.0 [24.8 to 34.5]. The PM national prevalence of obesity (BMI ≥30, which includes BMI ≥35) and extreme obesity (BMI ≥35) was 41.2% (95% Confidence Interval [CI], 40.9 to 41.4) and 20.8% (95% CI, 20.6 to 21.0), respectively, with large variations across states. Women had a higher prevalence of extreme obesity than men in all selected states. Subgroups with extreme obesity (BMI ≥35) prevalence greater than 25% were (subgroup, N, prevalence %, [95% CI]): Black NH, 8,913, 28.9 (25.8 to 32.0); individuals with income less than $25,000, 13,244, 25.1 (22.1 to 28.1); education of high school to some college, 17,272, 26.1 (23.1 to 29.1); and the region of the South, 6,639, 25.3 (22.3 to 28.3).

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Education Level
  • Income Level

Research Team

Owner:

  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital

For_HTN_code_review

Project Purpose(s)

  • Other Purpose (This work is an AoU demo project. Demo projects are efforts by the AoU Research Program designed to meet the program goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. As an approved demo project, this work was reviewed and overseen by the AoU Research Program Science Committee and the AoU Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use. ) ...

Scientific Questions Being Studied

We are using the All of Us Researcher Workbench interface to answer the question, "Is hypertension prevalence in the All of Us Research Program similar to hypertension prevalence in the 2015–2016 National Health and Nutrition Examination Survey (NHANES)?" Clinical approaches to understanding and treating hypertension may benefit from a precision medicine approach that integrates data on environments, social determinants of health, behaviors, and genomic factors that contribute to hypertension risk. Hypertension is a major public health concern and remains a leading risk factor for stroke and cardiovascular disease.

Scientific Approaches

In this cross-sectional, population-based study, we used All of Us baseline data from participant (age >18) provided information (PPI) surveys and electronic health record (EHR) blood pressure measurements and retrospectively examined the prevalence of hypertension in the EHR cohort using Systematized Nomenclature of Medicine (SNOMED) codes and blood pressure medications recorded in the EHR. We used the EHR data (SNOMED codes on 2 distinct dates and at least one hypertension medication) as the primary definition, and then added participants with elevated systolic or elevated diastolic blood pressure on measurements 2 and 3 from PPI. We extracted the detailed dates of each participant’s SNOMED codes for essential hypertension from the Researcher Workbench table ‘cb_search_all_events’. We calculated an age-standardized HTN prevalence according to the age distribution of the U.S. Census, using 3 age groups (18-39, 40-59, ≥60).
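The age standardization step can be illustrated with direct standardization over the three age groups named above. This is a sketch only: the function name is hypothetical, and the group prevalences and standard-population weights below are made-up placeholders, not All of Us results or actual U.S. Census proportions.

```python
def age_standardized_prevalence(group_prevalence, standard_weights):
    """Direct standardization: weight each age group's prevalence by the
    share of that age group in the standard population."""
    total_weight = sum(standard_weights.values())
    return sum(
        group_prevalence[group] * weight / total_weight
        for group, weight in standard_weights.items()
    )

# Hypothetical crude prevalence within each age group (placeholder values).
group_prevalence = {"18-39": 0.10, "40-59": 0.30, "60+": 0.60}

# Hypothetical standard-population shares (placeholders, not Census figures).
standard_weights = {"18-39": 0.40, "40-59": 0.35, "60+": 0.25}

standardized = age_standardized_prevalence(group_prevalence, standard_weights)
```

Because the weights come from a fixed standard population, two cohorts with different age structures can be compared on the same footing, which is what makes a comparison with a survey such as NHANES meaningful.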

Anticipated Findings

The prevalence of hypertension in the All of Us cohort is similar to that of published literature. All of Us age-adjusted HTN prevalence was 27.9% compared to 29.6% in the National Health and Nutrition Examination Survey. The All of Us cohort is a growing source of diverse longitudinal data that can be utilized to study hypertension nationwide. The prevalence of hypertension varies in the United States (U.S.) by age, sex, and socioeconomic status. Hypertension can often be treated successfully with medication, and prevented or delayed with lifestyle modifications. Even with these established hypertension intervention and prevention strategies, the prevalence of hypertension continues to be at levels of public health concern. The diversity within All of Us may provide insight into factors relevant to hypertension prevention and treatments in a variety of social and geographic contexts and population strata in the U.S.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital
  • Elizabeth Karlson - Late Career Tenured Researcher, Massachusetts General Hospital
  • Cheryl Clark

for_obesity_code_review

Project Purpose(s)

  • Disease Focused Research (obesity)
  • Educational ...
  • Methods Development

Scientific Questions Being Studied

National obesity prevention and intervention strategies may benefit from precision medicine approaches that incorporate integrated data on environments, social determinants of health, and genomic factors. We examined the quality and utility of the All of Us Research Hub Workbench for accelerating precision medicine by replicating methods from existing studies that examine the prevalence of obesity at the population level. We evaluated the measurements of obesity in the participant measurement (PM) data set and the electronic health record (EHR) data set using methods similar to the Ward et al. NEJM December 2019 publication that assessed prevalence of obesity in the US by state using BRFSS data.

Scientific Approaches

For this population-based cross-sectional study of All of Us Research Workbench participants, we excluded individuals with measurements obtained during pregnancy or inpatient visits and individuals from states with fewer than 100 participants. Physical measurements (PM) of height and weight at the time of program enrollment of 142,116 participants and measured weight and height extracted from electronic health records (EHR) of 40,885 individuals were used to calculate body-mass index (BMI). We did a complete case analysis for All of Us participants with known sex (male or female), race, income and education levels and estimated state-specific and demographic subgroup-specific prevalence of categories of BMI [obesity (BMI ≥30) and extreme obesity (BMI ≥ 35)] nationwide and for each state: overall and by subgroups, male and female. We examined the difference between EHR and PM calculated BMI by state.

Anticipated Findings

Using states with at least 100 participants, PM data included 142,116 individuals (mean [SD] age, 51.2 [16.6]) and EHR data on height and weight included 40,885 individuals (mean [SD] age, 52.5 [16.5]). The median BMI for PM participants was 28.4 [24.4 to 33.7]; the median BMI for EHR was 29.0 [24.8 to 34.5]. The PM national prevalence of obesity (BMI ≥30, which includes BMI ≥35) and extreme obesity (BMI ≥35) was 41.2% (95% Confidence Interval [CI], 40.9 to 41.4) and 20.8% (95% CI, 20.6 to 21.0), respectively, with large variations across states. Women had a higher prevalence of extreme obesity than men in all selected states. Subgroups with extreme obesity (BMI ≥35) prevalence greater than 25% were (subgroup, N, prevalence %, [95% CI]): Black NH, 8,913, 28.9 (25.8 to 32.0); individuals with income less than $25,000, 13,244, 25.1 (22.1 to 28.1); education of high school to some college, 17,272, 26.1 (23.1 to 29.1); and the region of the South, 6,639, 25.3 (22.3 to 28.3).

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Education Level
  • Income Level

Research Team

Owner:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital

Functional GI Disorders in African American/Black Patients

Project Purpose(s)

  • Disease Focused Research (Functional Gastrointestinal Disorders) ...

Scientific Questions Being Studied

We are currently exploring the prevalence of Functional Gastrointestinal Disorders (FGIDs) in African American/Black patients, as well as any correlated comorbidities and other factors, such as lifestyle and healthcare utilization. Although comorbid conditions have been well described in white FGID patients, they have not been well described in African American/Black FGID patients. We hypothesize that African American/Black FGID patients will have significantly different comorbidities compared to white FGID patients. We hope to better describe the distribution of FGIDs in African American/Black patients and identify risk factors for the development of FGIDs within these patients in order to help clinicians to better identify and treat FGIDs in African American/Black patients.

Scientific Approaches

We plan to identify African American/Black patients with Functional Gastrointestinal Disorder (FGID) diagnoses, such as Irritable Bowel Syndrome, Functional Dyspepsia, Chronic Idiopathic Constipation, etc., within the All of Us database. We also intend to collect these patients' comorbidity data, as well as survey data, specifically Lifestyle, Personal Medical History, and Health Care Access & Utilization survey data. We will then use multivariate regression modeling to identify comorbidities and other factors, such as degree of healthcare utilization, that are associated with specific FGID diagnoses. We will also use a control dataset of white FGID patients from the All of Us database in order to identify any key differences in risk factors for FGIDs when compared to the African American/Black FGID cohort.

Anticipated Findings

We hope to better describe and characterize comorbidities and other factors associated with Functional Gastrointestinal Disorders (FGID) within an African American/Black patient population, which has not been well described in the scientific literature up until this point. We hope to identify risk factors for the development of specific FGID diagnoses, which could help clinicians better identify and treat FGIDs in African American/Black patients. We also hope to show key differences in the prevalence of different FGIDs and their associated comorbidities within an African American/Black population compared to a white population.

Demographic Categories of Interest

  • Race / Ethnicity

Research Team

Owner:

  • Casey Silvernale - Graduate Trainee, Massachusetts General Hospital