Research Projects Directory

At this time, all listed projects are using data in the registered tier. The registered tier contains individual-level data from electronic health records, survey answers, and physical measurements. These data have been altered to protect participant privacy.

Note: Researcher Workbench users provide information about their research projects independently. Any views expressed in the Research Projects Directory belong to the relevant users and do not necessarily represent those of the All of Us Research Program.

Information in the Research Projects Directory is also cross-posted on AllofUs.nih.gov in compliance with the 21st Century Cures Act.

There are currently 214 active workspaces. This information was updated on 9/24/2020.

D014 - Opioids

Project Purpose(s)

  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Questions Being Studied

As a demonstration project, this study will estimate the prevalence of opioid use in the United States. Specific questions include:

1. What is the prevalence of prescription opioids received from healthcare systems?
2. What is the prevalence of opioid misuse, including nonmedical use of prescription opioids and use of street opioids?
3. Results for both previous questions will also be stratified by geographic region.

Scientific Approaches

We will estimate the prevalence of opioid use in two ways, stratified by state.
First, we will use EHR drug exposures to capture prescription opioid use.
Second, we will use the Lifestyle survey questionnaire to capture substance use reported by participants themselves:
1. In your LIFETIME, which of the following substances have you ever used?
2. In the PAST THREE MONTHS, how often have you used this substance?
Because prevalence will be stratified by state, the EHR observation table will be used to capture each participant's state.
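A minimal sketch of the state-stratified tally, assuming each participant has already been reduced to a state of residence, a yes/no flag for any prescription-opioid drug exposure in the EHR, and a yes/no flag for self-reported use from the Lifestyle survey. The `Participant` record and its field names are hypothetical, not Workbench table columns. Participants without a given data source are left out of that measure's denominator.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Participant(NamedTuple):
    person_id: int
    state: str                         # state of residence (hypothetical field)
    rx_opioid: Optional[bool]          # any prescription-opioid exposure; None if no EHR data
    self_reported_use: Optional[bool]  # Lifestyle survey answer; None if not answered


def prevalence_by_state(people: List[Participant]) -> Dict[str, Dict[str, Optional[float]]]:
    """Per-state prevalence of EHR prescription-opioid use and self-reported opioid use."""
    # state -> measure -> [cases, denominator]
    tallies: Dict[str, Dict[str, List[int]]] = defaultdict(
        lambda: {"rx": [0, 0], "self_report": [0, 0]}
    )
    for p in people:
        t = tallies[p.state]
        for measure, value in (("rx", p.rx_opioid), ("self_report", p.self_reported_use)):
            if value is not None:  # only participants with data for this measure are counted
                t[measure][0] += int(value)
                t[measure][1] += 1
    return {
        state: {m: (cases / n if n else None) for m, (cases, n) in t.items()}
        for state, t in tallies.items()
    }


people = [
    Participant(1, "MI", True, None),
    Participant(2, "MI", False, False),
    Participant(3, "OH", None, True),
    Participant(4, "OH", True, True),
]
result = prevalence_by_state(people)
print(result)
```

Keeping separate denominators matters here because the EHR and survey measures come from overlapping but different subsets of participants.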

Anticipated Findings

For this study, we anticipate being able to replicate previous national estimates of the prevalence of opioid use. All of Us Researcher Workbench data also provide an alternative tool for assessing the prevalence of substance use and prescription opioid use in the US population.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Hsueh-Han Yeh - Research Associate, Henry Ford Health System

D015-housing

Project Purpose(s)

  • Population Health
  • Other Purpose (The data can provide evidence of AoU's ability to replicate findings around social determinants and its ability to identify vulnerable populations in our cohort.)

Scientific Questions Being Studied

What is the prevalence of housing insecurity among current participants in the All of Us study? What individual-level factors are related to housing insecurity, including demographics, indicators of health care access, and perceived health status?

Scientific Approaches

We will determine the prevalence of housing insecurity in the All of Us study sample using data collected in the Basics module (“worried or concerned about not having a place to live”). We will use housing insecurity as the dependent variable in a multivariate analysis to determine its relationship with healthcare access and health services utilization. Finally, we will report the independent relationship between housing insecurity and healthcare access, adjusting for covariates and conducting stratified analyses as appropriate.
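As a first step before the adjusted model, the crude prevalence and an unadjusted odds ratio can be sketched as below. This assumes a hypothetical binary access-barrier variable and made-up counts; it does not adjust for covariates, which the planned multivariate analysis would do (for example with logistic regression).

```python
import math
from typing import Sequence, Tuple


def prevalence(flags: Sequence[bool]) -> float:
    """Share of participants flagged as housing-insecure."""
    return sum(flags) / len(flags)


def crude_odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Woolf (log-scale) 95% confidence interval.

    2x2 layout for a hypothetical access-barrier variable:
        a = barrier & housing-insecure,    b = barrier & housing-secure
        c = no barrier & housing-insecure, d = no barrier & housing-secure
    """
    if min(a, b, c, d) == 0:
        raise ValueError("Woolf interval is undefined when a cell is zero")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


print(prevalence([True, False, False, False]))  # 0.25
or_, lo, hi = crude_odds_ratio(20, 80, 10, 90)
print(f"OR={or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```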

Anticipated Findings

Recently, investigators examined housing insecurity using 2011-2015 BRFSS data and found a prevalence of 12.6% among the more than 228,000 respondents in the study sample. All of Us can test whether these findings replicate among its core participants using questionnaire items similar to those used by those investigators.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Amy Tang - Early Career Tenure-track Researcher, Henry Ford Health System

D027-MS

Project Purpose(s)

  • Disease Focused Research (multiple sclerosis)
  • Other Purpose (Provide evidence of AoU's ability to replicate findings on the prevalence and demographics of MS)

Scientific Questions Being Studied

Objective: Determine the prevalence, demographics, and regional distribution of multiple sclerosis (MS) in the All of Us Research Program.

Scientific Approaches

Study population: All of Us Research Program participants who have shared their electronic health record information, who have answered the Basics survey, and who have answered the Personal Medical History survey.

Data analysis: We will determine the prevalence of multiple sclerosis using All of Us Research Program electronic medical record data and the Personal Medical History survey in three cohorts: participants with EHR data only, survey data only, and both EHR and survey data. Prevalence will then be stratified by age, sex, race/ethnicity, and region as self-reported in the Basics PPI survey.
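The three-cohort split can be sketched with set operations, assuming participant IDs and MS case flags from each source have already been extracted. How an MS case is defined in the EHR (which condition codes) and in the survey (which answer) is not specified here, so the case sets are hypothetical inputs. In the "both" cohort the sketch reports EHR-based and survey-based prevalence side by side so the two sources can be compared.

```python
from typing import Dict, Optional, Set


def ms_prevalence_by_cohort(
    ehr_ids: Set[int],
    survey_ids: Set[int],
    ms_in_ehr: Set[int],
    ms_in_survey: Set[int],
) -> Dict[str, Dict[str, Optional[float]]]:
    """Split participants into EHR-only, survey-only, and both cohorts; report MS prevalence."""
    cohorts = {
        "ehr_only": ehr_ids - survey_ids,
        "survey_only": survey_ids - ehr_ids,
        "both": ehr_ids & survey_ids,
    }
    report: Dict[str, Dict[str, Optional[float]]] = {}
    for name, members in cohorts.items():
        n = len(members)
        row: Dict[str, Optional[float]] = {"n": n}
        if name != "survey_only":  # cohorts with EHR data
            row["ehr_prevalence"] = len(members & ms_in_ehr) / n if n else None
        if name != "ehr_only":     # cohorts with survey data
            row["survey_prevalence"] = len(members & ms_in_survey) / n if n else None
        report[name] = row
    return report


report = ms_prevalence_by_cohort(
    ehr_ids={1, 2, 3, 4},
    survey_ids={3, 4, 5, 6},
    ms_in_ehr={1, 3},
    ms_in_survey={3, 5, 6},
)
print(report)
```

Stratification by age, sex, race/ethnicity, and region would repeat the same calculation within each subgroup's ID set.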

Anticipated Findings

We anticipate that the prevalence and demographics of MS in the AoURP will be similar to those reported in recent studies. We further anticipate that findings regarding MS in AoURP participants' EHR data will be similar to those in the survey data.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Cathryn Peltz - Other, Henry Ford Health System

Collaborators:

  • Amy Tang - Early Career Tenure-track Researcher, Henry Ford Health System

D029

Project Purpose(s)

  • Disease Focused Research (cardiovascular disease, cancer (all types), diabetes)
  • Population Health

Scientific Questions Being Studied

The overall goal of this project is to examine whether there is evidence of the Latino Epidemiological Paradox within the All of Us Research Program (AoURP) cohort.
In this proposal, we will perform analyses to examine this phenomenon, addressing the following aims:
• Specific Aim 1: To determine whether Latinos have a lower gender-stratified, age-adjusted prevalence of cardiovascular disease (CVD) than non-Hispanic whites (NHWs) and non-Hispanic blacks in the cohort.
• Specific Aim 2: To determine whether Latinos have a lower gender-stratified, age-adjusted prevalence of cancer (overall) than NHWs and non-Hispanic blacks in the cohort.
• Specific Aim 3: To determine whether Latinos have a higher gender-stratified, age-adjusted prevalence of diabetes and obesity (overall) than NHWs and non-Hispanic blacks in the cohort.
• Specific Aim 4: To the extent possible, examine differences by Latino subgroup and between foreign-born and US-born Latinos.

Scientific Approaches

Not available.

Anticipated Findings

We will determine whether there is evidence of the Latino epidemiological paradox in the AoURP cohort.

Demographic Categories of Interest

Not available.

Research Team

Owner:

  • Olveen Carrasquillo - Late Career Tenured Researcher, University of Miami

D16_HTN_revision_after_code_review

Project Purpose(s)

  • Other Purpose (This work is an AoU demo project. Demo projects are efforts by the AoU Research Program designed to meet the program goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. As an approved demo project, this work was reviewed and overseen by the AoU Research Program Science Committee and the AoU Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Questions Being Studied

We are using the All of Us Researcher Workbench interface to answer the question, "Is hypertension prevalence in the All of Us Research Program similar to hypertension prevalence in the 2015–2016 National Health and Nutrition Examination Survey (NHANES)?" Clinical approaches to understanding and treating hypertension may benefit from a precision medicine approach that integrates data on environments, social determinants of health, behaviors, and genomic factors that contribute to hypertension risk. Hypertension is a major public health concern and remains a leading risk factor for stroke and cardiovascular disease.

Scientific Approaches

In this cross-sectional, population-based study, we used All of Us baseline data from participant provided information (PPI) surveys and electronic health record (EHR) blood pressure measurements for participants older than 18, and retrospectively examined the prevalence of hypertension in the EHR cohort using Systematized Nomenclature of Medicine (SNOMED) codes and blood pressure medications recorded in the EHR. We used the EHR data (SNOMED codes on 2 distinct dates and at least one hypertension medication) as the primary definition, and then added subjects with elevated systolic or elevated diastolic blood pressure on measurements 2 and 3 from PPI. We extracted the dates of each participant’s SNOMED codes for essential hypertension from the Researcher Workbench table ‘cb_search_all_events’. We calculated an age-standardized HTN prevalence according to the age distribution of the U.S. Census, using 3 age groups (18-39, 40-59, ≥ 60).
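The case definition and direct age standardization described above can be sketched as follows. Several details are assumptions not stated in the project text: the 140/90 mmHg cut-offs for "elevated", averaging readings 2 and 3 before applying them, and the standard-population weights, which are placeholders to be replaced with the actual U.S. Census age distribution.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Assumed cut-offs for "elevated" blood pressure (mmHg); the project text does not state them.
SBP_THRESHOLD = 140
DBP_THRESHOLD = 90


def ehr_hypertension(snomed_dates: Sequence[str], n_htn_meds: int) -> bool:
    """Primary definition: essential-hypertension SNOMED code on >= 2 distinct dates
    and at least one hypertension medication."""
    return len(set(snomed_dates)) >= 2 and n_htn_meds >= 1


def measured_hypertension(readings: List[Tuple[float, float]]) -> bool:
    """Supplementary definition from (systolic, diastolic) readings in recorded order.
    Assumption: readings 2 and 3 are averaged before comparison with the cut-offs."""
    if len(readings) < 3:
        return False
    sbp = (readings[1][0] + readings[2][0]) / 2
    dbp = (readings[1][1] + readings[2][1]) / 2
    return sbp >= SBP_THRESHOLD or dbp >= DBP_THRESHOLD


def age_group(age: float) -> Optional[str]:
    """Map an age to the three standardization groups; None for ages under 18."""
    if age < 18:
        return None
    if age < 40:
        return "18-39"
    if age < 60:
        return "40-59"
    return "60+"


def age_standardized(crude: Dict[str, float], weights: Dict[str, float]) -> float:
    """Direct standardization: weight each age-specific prevalence by its standard-population share."""
    total = sum(weights.values())
    return sum(crude[g] * w for g, w in weights.items()) / total


# Hypothetical placeholder weights and age-specific prevalences, for illustration only.
weights = {"18-39": 0.40, "40-59": 0.35, "60+": 0.25}
crude = {"18-39": 0.10, "40-59": 0.30, "60+": 0.60}

print(ehr_hypertension(["2018-03-01", "2018-09-15"], n_htn_meds=1))  # True
print(measured_hypertension([(150, 95), (142, 88), (139, 85)]))      # True (mean SBP 140.5)
print(age_group(45))                                                  # 40-59
print(round(age_standardized(crude, weights), 3))                     # 0.295
```

A participant would count as hypertensive if `ehr_hypertension` is true, or, under the supplementary rule, if `measured_hypertension` is true.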

Anticipated Findings

The prevalence of hypertension in the All of Us cohort is similar to that reported in the published literature: All of Us age-adjusted HTN prevalence was 27.9%, compared with 29.6% in the National Health and Nutrition Examination Survey. The All of Us cohort is a growing source of diverse longitudinal data that can be used to study hypertension nationwide. The prevalence of hypertension in the United States (U.S.) varies by age, sex, and socioeconomic status. Hypertension can often be treated successfully with medication, and prevented or delayed with lifestyle modifications. Even with these established intervention and prevention strategies, the prevalence of hypertension remains at levels of public health concern. The diversity within All of Us may provide insight into factors relevant to hypertension prevention and treatment in a variety of social and geographic contexts and population strata in the U.S.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital
  • Roxana Loperena Cortes - Other, All of Us Program Operational Use
  • Elizabeth Karlson - Late Career Tenured Researcher, Massachusetts General Hospital

Data element analysis of AllofUS

Project Purpose(s)

  • Methods Development

Scientific Questions Being Studied

We will explore the All of Us datasets to discover how different data elements are used and how they are used to define and categorize different cohorts of patients. The analysis will also aim to identify different phenotyping methods based on the data elements available in the dataset.

Scientific Approaches

Using the complete dataset, we will study the volume and usage of data elements potentially used for phenotyping. This will include analyzing the volume of different data elements and the diversity of values used for distinct data elements.
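A small profiling sketch of the volume and value-diversity measures, assuming rows have already been pulled into dictionaries keyed by data element. The element names in the example are hypothetical; against the real dataset the same counts would more likely come from queries on the underlying tables.

```python
from collections import Counter
from typing import Any, Dict, List, Sequence


def profile_elements(rows: Sequence[Dict[str, Any]], elements: List[str],
                     top_n: int = 3) -> Dict[str, Dict[str, Any]]:
    """For each data element: fraction of rows populated, distinct values, most common values."""
    report = {}
    for element in elements:
        # Treat missing keys, None, and empty strings as "not populated".
        populated = [r[element] for r in rows if r.get(element) not in (None, "")]
        counts = Counter(populated)
        report[element] = {
            "populated_fraction": len(populated) / len(rows) if rows else 0.0,
            "distinct_values": len(counts),
            "top_values": counts.most_common(top_n),
        }
    return report


rows = [
    {"sex": "F", "race": "Asian"},
    {"sex": "M", "race": ""},
    {"sex": "F"},
]
profile = profile_elements(rows, ["sex", "race"])
print(profile)
```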

Anticipated Findings

Our findings will include an understanding of how each data element is used, how often its values are populated, and which values are most common for different elements, all in an effort to discover different techniques for phenotype development.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Craig Mayer - Project Personnel, NIH

Data Management

Project Purpose(s)

  • Educational

Scientific Questions Being Studied

Students may be asked to verbally give a brief summary of what they learned from the reading during the lecture portion of the class. This summary, along with discussion during class and engagement over Teams, will contribute to the instructors’ subjective assessment of students’ participation.
A final project will be assigned around the fourth week of the class. This project will tie together multiple concepts introduced in the course. In the last class session, each student will turn in a writeup of their final project and present their work to the class.

Scientific Approaches

Work over the course of the semester will include reviewing tools, processes, and data across the longitudinal cohort of AoU participants.

Anticipated Findings

A final project will be assigned around the fourth week of the class. This project will tie together multiple concepts introduced in the course. In the last class session, each student will turn in a writeup of their final project and present their work to the class.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Marily Barron - Graduate Trainee, Vanderbilt University

Data Quality and Data Characterization

Project Purpose(s)

  • Educational
  • Methods Development

Scientific Questions Being Studied

This research project will use AoU data to test data quality methods. It will also use AoU data as reference benchmark data for testing data quality. The analysis will include data characterization. Data will be analyzed to determine whether they conform to expected patterns.

Scientific Approaches

The Achilles R package developed by OHDSI is an example of a data quality and data characterization tool. The approach will include running SQL or other analytical queries on the AoU dataset.
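The kind of rule-based plausibility check such tools run can be sketched in a few lines. The three rules below are hypothetical examples written in the spirit of those checks, not rules taken from Achilles, and the flat record layout with a `site` field is an assumption made so that violations can be compared across contributing sites.

```python
from collections import defaultdict
from datetime import date
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

# Hypothetical plausibility rules; each returns True when a record violates the rule.
RULES: Dict[str, Callable[[Record], bool]] = {
    "birth_year_in_future": lambda r: r["year_of_birth"] > date.today().year,
    "visit_ends_before_start": lambda r: r["visit_end"] < r["visit_start"],
    "negative_measurement_value": lambda r: r["value"] < 0,
}


def violations_by_site(records: List[Record],
                       rules: Dict[str, Callable[[Record], bool]] = RULES) -> Dict[str, Dict[str, int]]:
    """Count rule violations per contributing site; sites with no violations are omitted."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for record in records:
        for name, rule in rules.items():
            if rule(record):
                counts[record["site"]][name] += 1
    return {site: dict(per_rule) for site, per_rule in counts.items()}


records = [
    {"site": "A", "year_of_birth": 1980, "visit_start": date(2020, 1, 2),
     "visit_end": date(2020, 1, 1), "value": 5.0},
    {"site": "B", "year_of_birth": date.today().year + 1, "visit_start": date(2020, 1, 1),
     "visit_end": date(2020, 1, 3), "value": -1.0},
]
violations = violations_by_site(records)
print(violations)
```

Normalizing these counts by each site's record volume would make the per-site rates comparable.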

Anticipated Findings

We will understand how the data are structured as a whole and how the data differ across sites.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Vojtech Huser - Other, NIH

Collaborators:

  • Craig Mayer - Project Personnel, NIH

DataExploration

Project Purpose(s)

  • Social / Behavioral
  • Educational
  • Methods Development

Scientific Questions Being Studied

Explore the data collected so far and determine the types of research and education activities we can perform in future work.

Scientific Approaches

Descriptive statistics will be calculated to understand the data. In certain cases, we will also use data visualization. We will use Python and R packages for the data analysis.

Anticipated Findings

A clear understanding of the current data set.

Demographic Categories of Interest

  • Age
  • Geography
  • Disability Status
  • Access to Care

Research Team

Owner:

  • Leming Zhou - Project Personnel, University of Pittsburgh

Dementia-Hypertension-Diabetes-2

Project Purpose(s)

  • Disease Focused Research (dementia)
  • Methods Development
  • Other Purpose (This work is the result of an All of Us Research Program Demonstration Project. Demonstration Projects are efforts by the All of Us Research Program designed to meet the goal of ensuring the quality and utility of the Research Hub as a resource for accelerating precision medicine. This work has been approved, reviewed, and overseen by the All of Us Research Program Science Committee and Data and Research Center to ensure compliance with program policy.)

Scientific Questions Being Studied

Alzheimer’s disease is a neurodegenerative condition characterized by a progressive decline in cognitive function (dementia). Studies suggest that patients with elevated blood pressure (hypertension) are at increased risk of Alzheimer’s-type dementias. High blood sugar levels or type 2 diabetes mellitus may also be associated with an increased risk of dementia. Some minority populations may have an increased incidence of hypertension and diabetes; for example, African Americans have a higher incidence of hypertension. Therefore, we will investigate racial and ethnic groups with respect to the incidence of hypertension, diabetes, and dementia, to determine whether the association between dementia and these co-morbidities is stronger in minority groups.
The goal of this demonstration project is to validate previous research showing potential interactions between dementia, diabetes, and hypertension, with explicit consideration of race/ethnicity.

Scientific Approaches

Data from participants aged 40 or over will be subjected to statistical analysis to identify interactions between the incidence of dementia, diabetes, and hypertension, and self-identified race/ethnicity. We will analyze only participants in this age group because the incidence of dementia is very low in people younger than 40. We will analyze only participants with electronic health record data, because these records are needed to establish whether participants have had a diagnosis of hypertension, dementia, or diabetes.

The statistical analysis package R will be used to create contingency tables, perform chi-squared and Cochran-Mantel-Haenszel tests. Figures will be created in R.
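The project plans to run the Cochran-Mantel-Haenszel test in R. To show what the test computes, here is a self-contained Python sketch for a set of 2x2 tables, one per stratum. It assumes exposure = hypertension, outcome = dementia, and strata = race/ethnicity groups; the counts are made up. The continuity correction is applied only when the summed deviation is at least 0.5, and R's `mantelhaen.test` applies a similar correction by default.

```python
import math
from typing import Iterable, Tuple

# One 2x2 table per stratum (e.g., per race/ethnicity group):
#   a = exposed & case,   b = exposed & non-case
#   c = unexposed & case, d = unexposed & non-case
Table = Tuple[int, int, int, int]


def cmh_test(strata: Iterable[Table], correct: bool = True) -> Tuple[float, float, float]:
    """Return (CMH chi-squared statistic, p-value on 1 df, Mantel-Haenszel common odds ratio)."""
    delta = variance = or_num = or_den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum with fewer than 2 people carries no information
        delta += a - (a + b) * (a + c) / n                                   # observed minus expected
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))  # hypergeometric variance
        or_num += a * d / n
        or_den += b * c / n
    if variance == 0:
        raise ValueError("all strata are degenerate; the test is undefined")
    yates = 0.5 if correct and abs(delta) >= 0.5 else 0.0
    stat = (abs(delta) - yates) ** 2 / variance
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-squared with 1 df
    mh_or = or_num / or_den if or_den > 0 else math.inf
    return stat, p_value, mh_or


# Hypothetical counts for two strata.
strata = [(10, 20, 30, 40), (15, 25, 20, 40)]
stat, p, mh_or = cmh_test(strata)
print(f"CMH = {stat:.3f}, p = {p:.3f}, MH OR = {mh_or:.3f}")
```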

Anticipated Findings

We expect that our data will confirm an increased rate of dementia in African Americans with hypertension and diabetes, compared to white participants. We will determine whether other minority groups also show differences in the incidence of dementia, hypertension, and diabetes, and in the interactions between them.

If there is an increased incidence of dementia in people with hypertension or diabetes, this may suggest that populations with these disorders need more careful monitoring of their conditions, as these conditions may increase the chance of developing dementia. Future All of Us projects may be able to determine whether long-term control of hypertension (or diabetes/blood glucose) may reduce the potential for developing dementia.

Demographic Categories of Interest

  • Race / Ethnicity

Research Team

Owner:

  • Robert Meller - Mid-career Tenured Researcher, Morehouse School of Medicine

Collaborators:

  • Shashwat Deepali Nagar - Graduate Trainee, Georgia Tech
  • Roxana Loperena Cortes - Other, All of Us Program Operational Use
  • Francis Ratsimbazafy - Other, All of Us Program Operational Use
  • King Jordan - Mid-career Tenured Researcher, Georgia Tech

Demographics of Mammography 2020_04

Project Purpose(s)

  • Population Health
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use)

Scientific Questions Being Studied

Mammography is an effective screening tool for breast cancer, often identifying tumors that can be treated before they develop invasive potential. Across the United States, it is estimated that 65% of women aged 40 and above have received a screening mammogram. However, smaller studies using data from electronic health records suggest that (1) the actual screening rate may be lower and (2) mammography screening differs by racial, ethnic, and sociodemographic characteristics, and that lower rates of mammography screening may contribute to disparities in breast cancer mortality.

In this demonstration project, we will describe the distribution of mammography screening captured by the submitted electronic health records in the large and diverse participant sample of the All of Us Research Program. Further, we will describe the participant characteristics that are associated with mammography rates in women during the ages in which national guidelines suggest routine screening.

Scientific Approaches

After restricting the sample to All of Us research participants with electronic health record information, we will identify rates of mammography screening using the procedure and diagnosis tables. Using participant provided information from the surveys, we will use logistic regression to identify participant characteristics associated with higher or lower rates of screening.
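For readers unfamiliar with the model, here is a minimal logistic regression fitted by batch gradient ascent on the log-likelihood. It is written by hand only to stay self-contained; in practice a statistics package (for example R's `glm` with a binomial family) would be used. The data are hypothetical: one binary participant characteristic and whether an EHR-verified screening mammogram was found. With a single binary predictor, the fitted odds ratio should match the one computed directly from the 2x2 counts, (6/4)/(3/7) = 3.5.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 1.0, n_iter: int = 5000) -> List[float]:
    """Maximum-likelihood logistic regression by batch gradient ascent.

    Returns [intercept, coef_1, ..., coef_k] on the log-odds scale.
    """
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            residual = yi - sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            grad[0] += residual
            for j, xj in enumerate(xi):
                grad[j + 1] += residual * xj
        # Step along the mean gradient of the log-likelihood.
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


# Hypothetical data: 10 participants without and 10 with the characteristic.
X = [[0] for _ in range(10)] + [[1] for _ in range(10)]
y = [1] * 3 + [0] * 7 + [1] * 6 + [0] * 4
coef = fit_logistic(X, y)
print(f"odds ratio = {math.exp(coef[1]):.2f}")  # 3.50
```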

Anticipated Findings

Some prior research has attempted to validate self-reported mammography screening against electronic health record verification of the screening. Largely, this research has found that (1) mammography rates are likely lower than self-report suggests and (2) certain patient characteristics are associated with lower rates of screening.

We anticipate that these findings will largely hold in the All of Us study population, and that the diversity of the All of Us participants will allow us to better identify those who may need more assistance to achieve the recommended screening frequency.

Demographic Categories of Interest

  • Race / Ethnicity
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Molly Scannell Bryan - Early Career Tenure-track Researcher, University of Illinois at Chicago

Depression

Project Purpose(s)

  • Disease Focused Research (major depressive disorder)
  • Social / Behavioral
  • Ancestry

Scientific Questions Being Studied

My purpose is to investigate the underlying genetic architecture of Major Depressive Disorder in All of Us participants.

Scientific Approaches

My primary approach is to use a GWAS in available ancestral groups, probably using PLINK or GEMMA depending on the structure of the data. I will use summary statistics from this approach to investigate overlap between other large cohorts and traits.

I would also like to apply polygenic risk scores to assess genetic risk prediction in an independent cohort.
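At its core, a polygenic risk score is a weighted sum of a person's effect-allele dosages, with weights taken from GWAS summary statistics. A minimal sketch with hypothetical variant IDs and effect sizes is below. It deliberately leaves out steps that real PRS pipelines perform, such as allele and strand harmonization and LD clumping or shrinkage of effect sizes.

```python
from typing import Mapping


def polygenic_score(dosages: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum of effect-allele dosages (0, 1, or 2) over variants present in both inputs.

    `weights` are per-variant effect sizes (e.g., log odds ratios) for the effect allele,
    taken from GWAS summary statistics; variants missing from either input are skipped.
    """
    shared = dosages.keys() & weights.keys()
    return sum(weights[v] * dosages[v] for v in shared)


# Hypothetical variant IDs and effect sizes, for illustration only.
weights = {"var_1": 0.05, "var_2": -0.02, "var_3": 0.10}
person = {"var_1": 2, "var_2": 1, "var_4": 1}
print(round(polygenic_score(person, weights), 4))  # 0.08
```

For prediction in an independent cohort, the weights should come from a GWAS that does not include the target participants.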

Anticipated Findings

Combined with other available depression datasets, this could identify novel risk loci for depression. Downstream in-silico analysis will aim to better understand the complex underlying biology of depression. I am particularly interested in pushing forward the current state of the field in African and Hispanic ancestries, which are currently underrepresented.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Research Team

Owner:

  • Daniel Levey - Other, Yale University

Dermatology

Project Purpose(s)

  • Disease Focused Research (psoriasis, atopic dermatitis, acne, and more)

Scientific Questions Being Studied

Dermatologic diseases in minority populations

Scientific Approaches

The work will initially be exploratory. We plan to conduct cohort comparisons among psoriasis, acne, atopic dermatitis, and other rare dermatologic diseases, as well as to examine cross-sectional data for these diseases.

Anticipated Findings

Expand what is known about dermatologic disease in minority populations.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Ahmed Yousaf - Research Fellow, West Virginia University

Diseases with similar symptoms as COVID-19

Project Purpose(s)

  • Disease Focused Research (COVID-19)

Scientific Questions Being Studied

We want to find diseases with symptoms similar to those of COVID-19. Such diseases can be used as features in COVID-19 prediction models.

Scientific Approaches

We will use mutual information to find diseases/conditions that have high co-occurrence with common COVID-19 symptoms.
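Mutual information between a condition indicator and a symptom indicator, both measured on the same people, quantifies how much knowing one tells you about the other. A minimal sketch with hypothetical per-person 0/1 indicators is below. The condition names and the choice of fever as the symptom are illustrative only, and MI is reported in bits.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Mutual information (in bits) between two discrete variables observed on the same people."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def rank_conditions(symptom: Sequence[int],
                    conditions: Dict[str, Sequence[int]]) -> List[Tuple[str, float]]:
    """Rank conditions by mutual information with a symptom indicator, highest first."""
    scores = [(name, mutual_information(flags, symptom)) for name, flags in conditions.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical per-person indicators (1 = recorded, 0 = not recorded).
fever = [1, 1, 0, 0, 1, 0]
conditions = {
    "condition_a": [1, 1, 0, 0, 1, 0],  # always co-occurs with fever
    "condition_b": [1, 0, 1, 0, 1, 0],  # only partly related
}
ranked = rank_conditions(fever, conditions)
for name, score in ranked:
    print(f"{name}: {score:.3f} bits")
```

MI is symmetric and also rises for conditions that are strongly *negatively* associated with a symptom, so a sign check (for example, comparing joint and expected counts) may be needed before treating a high score as co-occurrence.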

Anticipated Findings

We expect to find ~100 diseases/conditions that have high co-occurrence with COVID-19's common symptoms. Considering these diseases/conditions when building COVID-19 prediction models can help to reduce false-positive predictions.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Jifan Gao - Graduate Trainee, University of Wisconsin, Madison

Disparities in Kidney, Bladder, and Prostate Cancer

Project Purpose(s)

  • Disease Focused Research (cancer)
  • Population Health
  • Social / Behavioral
  • Control Set
  • Ancestry

Scientific Questions Being Studied

What are underlying factors that drive disparities in kidney, bladder, and prostate cancer? How do these factors result in increased risk? Identifying risk factors is important as it may allow for more targeted screening, early detection, and personalized therapies.

Scientific Approaches

We plan to look at datasets of kidney, bladder, and prostate cancers. We would like to include analysis of associations for genetic, socioeconomic or geographic variables.

Anticipated Findings

We anticipate that we may identify novel genetic, socioeconomic, or geographic factors as contributors to risk in genitourinary cancers. These findings may improve risk assessment in determining personalized screening, allow for earlier identification of disease, or facilitate targeted therapies.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Jacob Knorr - Graduate Trainee, Cleveland Clinic

Distress and T2D

Project Purpose(s)

  • Disease Focused Research (type 2 diabetes mellitus)

Scientific Questions Being Studied

Depression, anxiety, and other forms of mental distress are frequently co-morbid with type 2 diabetes. Are there common risk factors between the two?

Scientific Approaches

Comparison of the demographics and medications of individuals diagnosed with type 2 diabetes with and without mental distress.

Anticipated Findings

If we know that someone is at risk for mental distress, we might be able to provide increased support to mitigate the effects.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Sara Taylor - Research Fellow, Massachusetts Institute of Technology

Diversity within Eating Disorders

Project Purpose(s)

  • Population Health

Scientific Questions Being Studied

We will explore sociodemographic variables (e.g., gender, sexual orientation, race/ethnicity) in relation to eating disorder diagnoses. We are specifically interested in disparities in the occurrence of eating disorder diagnoses, access to treatment, age at initial diagnosis, and associated distress and impairment. That is, are some sociodemographic groups more likely than others to receive an eating disorder diagnosis, receive care, be diagnosed at a different age, and/or experience disproportionate distress/impairment? These research questions are important, as there is limited research exploring intersecting identities within eating disorders.

Scientific Approaches

Within the All of Us dataset V3, we will extract sociodemographic variables (gender, sexual orientation, race/ethnicity), eating disorder diagnoses and treatment, and items from the 'Overall Health' survey to assess distress/impairment and quality of life. Initially, descriptive statistics will be used to report the frequencies of eating disorder diagnoses and treatment as a function of the aforementioned sociodemographic variables. Should there be adequate statistical power, logistic regression models will be employed with sociodemographic variables set as 'predictors' of binary eating disorder 'outcomes.' Additionally, among individuals diagnosed with an eating disorder, we will examine sociodemographic differences in distress/impairment and quality of life via linear regression. Effect size estimates will also be reported. Should statistical power allow, interaction terms between sociodemographic variables will also be tested.
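The descriptive step and a simple effect size can be sketched as below. This is a minimal, unadjusted illustration, not the study's analysis code: it assumes per-participant records of the form (group label, has eating disorder diagnosis), the group labels and counts are hypothetical, and the crude odds ratio with a Woolf (log-scale) 95% CI is only a stand-in for the adjusted logistic regression models described above.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def diagnosis_frequencies(
    records: Iterable[Tuple[str, bool]],
) -> Dict[str, Tuple[int, int, float]]:
    """Count diagnoses per sociodemographic group.

    Each record is (group_label, has_eating_disorder_dx).
    Returns {group: (n_diagnosed, n_total, proportion_diagnosed)}.
    """
    counts: Dict[str, list] = defaultdict(lambda: [0, 0])
    for group, has_dx in records:
        counts[group][1] += 1
        if has_dx:
            counts[group][0] += 1
    return {g: (dx, n, dx / n) for g, (dx, n) in counts.items()}


def crude_odds_ratio(
    dx_a: int, no_dx_a: int, dx_b: int, no_dx_b: int, z: float = 1.96
) -> Tuple[float, float, float]:
    """Unadjusted odds ratio of a diagnosis in group A vs. reference group B,
    with a Woolf (log-scale Wald) confidence interval."""
    if 0 in (dx_a, no_dx_a, dx_b, no_dx_b):
        raise ValueError("zero cell count; the Woolf interval is undefined")
    odds_ratio = (dx_a * no_dx_b) / (no_dx_a * dx_b)
    se_log = math.sqrt(1 / dx_a + 1 / no_dx_a + 1 / dx_b + 1 / no_dx_b)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)


# Hypothetical data: 10 of 100 in group_a and 5 of 100 in group_b are diagnosed.
records = [("group_a", i < 10) for i in range(100)] + [("group_b", i < 5) for i in range(100)]
freqs = diagnosis_frequencies(records)
dx_a, n_a, _ = freqs["group_a"]
dx_b, n_b, _ = freqs["group_b"]
print(freqs)
print(crude_odds_ratio(dx_a, n_a - dx_a, dx_b, n_b - dx_b))
```

For the adjusted models, the same per-participant records would typically be passed to a logistic regression routine in a statistics package, with the sociodemographic variables as covariates.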

Anticipated Findings

There is limited research on intersecting identities among individuals diagnosed with eating disorders. By employing the All of Us dataset, we may be able to identify health disparities in the occurrence of eating disorders and/or associated distress/impairment and quality of life. Results may help guide additional research efforts into understanding the mechanisms which may place some populations at disproportionate risk, which subsequently could lead to refined and tailored eating disorder prevention and treatment approaches.

Demographic Categories of Interest

  • Race / Ethnicity
  • Gender Identity
  • Sexual Orientation
  • Access to Care

Research Team

Owner:

  • Aaron Blashill - Mid-career Tenured Researcher, San Diego State University

Collaborators:

  • Melissa Simone - Research Fellow, University of Minnesota
  • Alexandra Convertino - Graduate Trainee, San Diego State University
  • Jamie-Lee Pennesi - Research Fellow, San Diego State University
  • Jonathan Helm - Early Career Tenure-track Researcher, San Diego State University
  • Autumn Askew - Project Personnel, University of Minnesota

DRC_Duplicate of For_HTN_code_review

Project Purpose(s)

  • Other Purpose (This work is the result of an All of Us Research Program Demonstration Project. Demonstration Projects are efforts by the All of Us Research Program designed to meet the goal of ensuring the quality and utility of the Research Hub as a resource for accelerating precision medicine. This work has been approved, reviewed, and overseen by the All of Us Research Program Science Committee and Data and Research Center to ensure compliance with program policy.)

Scientific Questions Being Studied

We are using the All of Us Researcher Workbench interface to answer the question, "Is hypertension prevalence in the All of Us Research Program similar to hypertension prevalence in the 2015–2016 National Health and Nutrition Examination Survey (NHANES)?" Clinical approaches to understanding and treating hypertension may benefit from a precision medicine approach that integrates data on environments, social determinants of health, behaviors, and genomic factors that contribute to hypertension risk. Hypertension is a major public health concern and remains a leading risk factor for stroke and cardiovascular disease.

Scientific Approaches

In this cross-sectional, population-based study, we used All of Us baseline data from patient-provided information (PPI) surveys (age >18) and electronic health record (EHR) blood pressure measurements, and retrospectively examined the prevalence of hypertension in the EHR cohort using Systematized Nomenclature of Medicine (SNOMED) codes and blood pressure medications recorded in the EHR. We used the EHR data (SNOMED codes on 2 distinct dates and at least one hypertension medication) as the primary definition, and then added subjects with elevated systolic or elevated diastolic blood pressure on measurements 2 and 3 from PPI. We extracted the dates of each participant’s SNOMED codes for essential hypertension from the Researcher Workbench table ‘cb_search_all_events’. We calculated an age-standardized HTN prevalence according to the age distribution of the U.S. Census, using 3 age groups (18-39, 40-59, ≥ 60).
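The case definition and the direct age standardization described above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the per-participant inputs have already been extracted; the `Participant` fields, the 140/90 mmHg thresholds, averaging readings 2 and 3, and the standard-population weights are all hypothetical placeholders (the actual U.S. Census age proportions would be substituted).

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Tuple

AGE_GROUPS = ("18-39", "40-59", "60+")


@dataclass
class Participant:
    age: int
    # Dates with an essential-hypertension SNOMED code (hypothetical field).
    htn_code_dates: List[date] = field(default_factory=list)
    # Number of hypertension-medication records (hypothetical field).
    n_htn_meds: int = 0
    # (systolic, diastolic) readings in the order taken (hypothetical field).
    bp_readings: List[Tuple[float, float]] = field(default_factory=list)


def has_hypertension(p: Participant, sbp_cut: float = 140, dbp_cut: float = 90) -> bool:
    # Primary EHR definition: code on >= 2 distinct dates AND >= 1 medication.
    if len(set(p.htn_code_dates)) >= 2 and p.n_htn_meds >= 1:
        return True
    # Supplementary definition (assumed): mean of readings 2 and 3 is elevated.
    if len(p.bp_readings) >= 3:
        sbp = (p.bp_readings[1][0] + p.bp_readings[2][0]) / 2
        dbp = (p.bp_readings[1][1] + p.bp_readings[2][1]) / 2
        return sbp >= sbp_cut or dbp >= dbp_cut
    return False


def age_group(age: int) -> str:
    if age < 40:
        return "18-39"
    if age < 60:
        return "40-59"
    return "60+"


def age_standardized_prevalence(
    participants: List[Participant], std_weights: Dict[str, float]
) -> float:
    """Direct standardization: sum over age groups of weight * crude group prevalence."""
    cases = {g: 0 for g in AGE_GROUPS}
    totals = {g: 0 for g in AGE_GROUPS}
    for p in participants:
        if p.age < 18:  # participants younger than 18 are skipped
            continue
        g = age_group(p.age)
        totals[g] += 1
        cases[g] += int(has_hypertension(p))
    if any(totals[g] == 0 for g in AGE_GROUPS):
        raise ValueError("every age group needs at least one participant")
    weight_sum = sum(std_weights[g] for g in AGE_GROUPS)
    return sum(std_weights[g] / weight_sum * cases[g] / totals[g] for g in AGE_GROUPS)


d = date
people = [
    Participant(30, [d(2018, 1, 1), d(2018, 6, 1)], 1),               # EHR case
    Participant(35),                                                   # non-case
    Participant(45, bp_readings=[(150, 85), (145, 88), (141, 86)]),    # BP case
    Participant(50),                                                   # non-case
    Participant(65, [d(2017, 3, 1), d(2019, 3, 1)], 2),               # EHR case
    Participant(70, [d(2017, 3, 1), d(2017, 3, 1)], 1, [(150, 95)] * 3),  # one code date; BP case
]
weights = {"18-39": 0.5, "40-59": 0.25, "60+": 0.25}  # hypothetical, NOT U.S. Census figures
print(age_standardized_prevalence(people, weights))  # 0.625
```

Direct standardization reweights each age group's crude prevalence by the standard population's age mix, so cohorts with different age structures become comparable.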

Anticipated Findings

The prevalence of hypertension in the All of Us cohort is similar to that reported in the published literature: the All of Us age-adjusted HTN prevalence was 27.9%, compared to 29.6% in the National Health and Nutrition Examination Survey. The All of Us cohort is a growing source of diverse longitudinal data that can be utilized to study hypertension nationwide. In the United States (U.S.), the prevalence of hypertension varies by age, sex, and socioeconomic status. Hypertension can often be treated successfully with medication, and prevented or delayed with lifestyle modifications. Even with these established intervention and prevention strategies, the prevalence of hypertension remains at levels of public health concern. The diversity within All of Us may provide insight into factors relevant to hypertension prevention and treatment in a variety of social and geographic contexts and population strata in the U.S.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Roxana Loperena Cortes - Other, All of Us Program Operational Use
  • Francis Ratsimbazafy - Other, All of Us Program Operational Use
  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital

DRC_Duplicate of for_obesity_code_review

Project Purpose(s)

  • Disease Focused Research (obesity)
  • Educational
  • Methods Development
  • Other Purpose (This work is the result of an All of Us Research Program Demonstration Project. Demonstration Projects are efforts by the All of Us Research Program designed to meet the goal of ensuring the quality and utility of the Research Hub as a resource for accelerating precision medicine. This work has been approved, reviewed, and overseen by the All of Us Research Program Science Committee and Data and Research Center to ensure compliance with program policy.)

Scientific Questions Being Studied

National obesity prevention and intervention strategies may benefit from precision medicine approaches that incorporate integrated data on environments, social determinants of health, and genomic factors. We examined the quality and utility of the All of Us Research Hub Workbench for accelerating precision medicine by replicating methods from existing studies that examine the prevalence of obesity at the population level. We evaluated the measurements of obesity in the participant measurement (PM) data set and the electronic health record (EHR) data set using methods similar to the Ward et al. NEJM December 2019 publication that assessed prevalence of obesity in the US by state using BRFSS data.

Scientific Approaches

For this population-based cross-sectional study of All of Us Research Workbench participants, we excluded individuals with measurements obtained during pregnancy or inpatient visits and individuals from states with fewer than 100 participants. Physical measurements (PM) of height and weight at the time of program enrollment of 142,116 participants and measured weight and height extracted from electronic health records (EHR) of 40,885 individuals were used to calculate body-mass index (BMI). We did a complete case analysis for All of Us participants with known sex (male or female), race, income and education levels and estimated state-specific and demographic subgroup-specific prevalence of categories of BMI [obesity (BMI ≥30) and extreme obesity (BMI ≥ 35)] nationwide and for each state: overall and by subgroups, male and female. We examined the difference between EHR and PM calculated BMI by state.
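A minimal sketch of the BMI calculation, exclusions, and state-level prevalence follows. It assumes each record carries (state, weight in kg, height in cm, pregnancy flag, inpatient flag), applies the 100-participant state threshold after the other exclusions, and uses a simple Wald interval; these are illustrative choices rather than the study's documented method, and the sample data are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, float, float, bool, bool]  # (state, weight_kg, height_cm, pregnant, inpatient)


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body-mass index: weight (kg) divided by height (m) squared."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m * height_m)


def wald_ci(k: int, n: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Proportion k/n with a normal-approximation (Wald) confidence interval."""
    p = k / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half), min(1.0, p + half)


def prevalence_by_state(records: Iterable[Record], min_n: int = 100) -> Dict[str, dict]:
    bmis_by_state: Dict[str, List[float]] = defaultdict(list)
    for state, weight_kg, height_cm, pregnant, inpatient in records:
        if pregnant or inpatient:  # exclude pregnancy and inpatient measurements
            continue
        bmis_by_state[state].append(bmi(weight_kg, height_cm))
    results = {}
    for state, values in bmis_by_state.items():
        n = len(values)
        if n < min_n:  # drop states with too few participants
            continue
        results[state] = {
            "n": n,
            "obesity": wald_ci(sum(v >= 30 for v in values), n),          # BMI >= 30
            "extreme_obesity": wald_ci(sum(v >= 35 for v in values), n),  # BMI >= 35
        }
    return results


records = (
    [("state_a", 95.0, 170.0, False, False)] * 40     # BMI about 32.9
    + [("state_a", 110.0, 170.0, False, False)] * 20  # BMI about 38.1
    + [("state_a", 60.0, 170.0, False, False)] * 40   # BMI about 20.8
    + [("state_a", 110.0, 170.0, True, False)] * 5    # excluded: pregnancy
    + [("state_b", 95.0, 170.0, False, False)] * 50   # excluded: state has fewer than 100
)
result = prevalence_by_state(records)
print(result)
```

Because obesity is defined as BMI ≥30, participants counted under extreme obesity (BMI ≥35) are also counted under obesity.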

Anticipated Findings

Using states with at least 100 participants, PM data included 142,116 individuals (mean [SD] age, 51.2 [16.6]) and EHR data on height and weight included 40,885 individuals (mean [SD] age, 52.5 [16.5]). The median BMI was 28.4 [24.4 to 33.7] for PM participants and 29.0 [24.8 to 34.5] for EHR participants. The PM national prevalences of obesity (BMI ≥30, which includes BMI ≥35) and extreme obesity (BMI ≥35) were 41.2% (95% confidence interval [CI], 40.9 to 41.4) and 20.8% (95% CI, 20.6 to 21.0), respectively, with large variations across states. Women had a higher prevalence of extreme obesity than men in all selected states. Subgroups with an extreme obesity prevalence greater than 25% (reported as subgroup, N, prevalence %, and 95% CI) included: Black NH, 8,913, 28.9 (25.8 to 32.0); individuals with income less than $25,000, 13,244, 25.1 (22.1 to 28.1); education of high school to some college, 17,272, 26.1 (23.1 to 29.1); and the region of the South, 6,639, 25.3 (22.3 to 28.3).

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Education Level
  • Income Level

Research Team

Owner:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Roxana Loperena Cortes - Other, All of Us Program Operational Use
  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital
  • Jun Qian - Other, All of Us Program Operational Use

Duplicate of ARI Workspace -7-29-20 #1

Project Purpose(s)

  • Disease Focused Research (Autoimmune diseases)

Scientific Questions Being Studied

The goal of our research is to determine the prevalence of autoimmune diseases in the US, individually and as a class of disease. This work will help us understand the likelihood of having an autoimmune disease, and we hope it will improve the ability of doctors to diagnose patients by establishing the prior probability of having one of these many diseases.

Scientific Approaches

We will create three data sets for analysis:

1. A list of diseases rated in the following ways:

a. Evidence Class
i. Strong evidence it is autoimmune
ii. Moderate evidence it is autoimmune
iii. Weak evidence for autoimmunity
iv. A comorbidity of autoimmune disease
v. Symptom or symptom set with no known mechanism

b. Autoinflammatory versus autoimmune flag

c. “Not always autoimmune” flag – to indicate diseases that could have alternative mechanisms of cause

2. An anonymized list of patients with socioeconomic, geographic, and other data that would help patients and public health officials understand which communities are affected by these diseases
3. Outcomes data for patients over time assessing quality of life using PROMIS metrics
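The disease rating scheme in item 1 maps naturally onto a small data structure. The sketch below is a hypothetical illustration (the class and field names and the example diseases are placeholders, not the registry's schema) of how such ratings could be used to compute the prevalence of a class of diseases while counting each participant only once, which is what distinguishes class-level prevalence from a sum of per-disease prevalences.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Set, Tuple


class EvidenceClass(Enum):
    STRONG = "strong evidence it is autoimmune"
    MODERATE = "moderate evidence it is autoimmune"
    WEAK = "weak evidence for autoimmunity"
    COMORBIDITY = "a comorbidity of autoimmune disease"
    SYMPTOM_SET = "symptom or symptom set with no known mechanism"


@dataclass(frozen=True)
class DiseaseRating:
    name: str
    evidence: EvidenceClass
    autoinflammatory: bool       # True = autoinflammatory, False = autoimmune
    not_always_autoimmune: bool  # may have alternative mechanisms of cause


def class_prevalence(
    diagnoses: Iterable[Tuple[int, str]],
    ratings: Iterable[DiseaseRating],
    classes: Set[EvidenceClass],
    n_participants: int,
) -> float:
    """Share of participants with at least one disease in the given evidence classes."""
    included = {r.name for r in ratings if r.evidence in classes}
    cases = {person_id for person_id, disease in diagnoses if disease in included}
    return len(cases) / n_participants


# Hypothetical ratings and (person_id, disease) diagnoses among 10 participants.
ratings = [
    DiseaseRating("disease_x", EvidenceClass.STRONG, False, False),
    DiseaseRating("disease_y", EvidenceClass.STRONG, False, True),
    DiseaseRating("disease_z", EvidenceClass.WEAK, True, False),
]
diagnoses = [(1, "disease_x"), (1, "disease_y"), (2, "disease_x"), (3, "disease_z")]
print(class_prevalence(diagnoses, ratings, {EvidenceClass.STRONG}, 10))  # 0.2, not 2/10 + 1/10
```

Participant 1 has two strong-evidence diseases but contributes a single case, so the class prevalence is lower than the sum of the per-disease prevalences.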

Anticipated Findings

The current NIH estimate of 23.5 million people with autoimmune disease was a guess by a knowledgeable clinician and has no scientific support. As a consequence, numerous figures circulate in the public sphere, and nobody knows which one is correct.

Many reports say autoimmune diseases are on the increase, but since the number is unknown, it is impossible to say whether this is a public health issue or not. Having a methodology that can be used to recompute the number of people with autoimmune disease will help us understand if these reports are true.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Aaron Abend - Senior Researcher, Autoimmune Registry, Inc.

Duplicate of Demo - Medication Sequencing

Project Purpose(s)

  • Disease Focused Research (type 2 diabetes, depression)
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Questions Being Studied

1- What are the main prescribed medication sequences that participants with type 2 diabetes and depression took over three years of treatment?
For this question, we extracted the antidiabetic and antidepressant medications used to treat participants who have T2D and depression diagnosis codes. We retrieved medications prescribed after the first diagnosis code for each disease and represented them at their ATC 4th level.
2- What are the most common first antidiabetic and antidepressant medications prescribed to All of Us participants? We extracted the first medications prescribed to treat T2D and depression and identified the most common first medication as the one prescribed to the highest number of participants.
3- Between 2000 and 2018, did the percentages of participants change who were prescribed the most common first medication, who were treated with one medication, and who were treated only with one common medication?

Scientific Approaches

In this project, we plan to use the medication sequencing methods developed at Columbia University and the OHDSI network as a means to characterize treatment pathways at scale. Further, we want to demonstrate implementation of these medication sequencing algorithms in the All of Us research dataset to show how the various sources of data contained within the program can be used for this purpose. We will perform separate medication sequence analyses for common, complex diseases: type 2 diabetes and depression.
1- Data manipulation:
Using Python and BigQuery to:
A- Retrieve medications and their classes
B- Create the medication sequences

2- Visualization:
A- Create sunburst plots to visualize the sequences
B- Plot the percentages of participants prescribed the most common first medication and treated with one medication during three years
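The sequence-construction step can be sketched in plain Python, downstream of whatever BigQuery extraction produces the rows. This is a simplified illustration in the spirit of a treatment-pathway analysis, not the Columbia/OHDSI implementation: it assumes each drug exposure has already been mapped to its ATC 4th-level class, keeps exposures from the first diagnosis date through a three-year window, and records each class once, in order of first use. The ATC codes in the example (A10BA biguanides, A10BB sulfonylureas, A10BH DPP-4 inhibitors) are real classes, but the participants and dates are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta
from typing import Dict, Iterable, Tuple

Exposure = Tuple[int, date, str]  # (person_id, exposure_start_date, atc4_class)


def medication_sequences(
    exposures: Iterable[Exposure],
    first_dx: Dict[int, date],
    window_days: int = 3 * 365,
) -> Dict[int, Tuple[str, ...]]:
    """Ordered, de-duplicated ATC-4 class sequence per person within the window."""
    rows_by_person = defaultdict(list)
    for person_id, start, atc4 in exposures:
        dx_date = first_dx.get(person_id)
        if dx_date is None:
            continue  # no qualifying diagnosis code
        if dx_date <= start <= dx_date + timedelta(days=window_days):
            rows_by_person[person_id].append((start, atc4))
    sequences = {}
    for person_id, rows in rows_by_person.items():
        ordered = []
        for _, atc4 in sorted(rows):  # chronological; ties broken by class code
            if atc4 not in ordered:
                ordered.append(atc4)
        sequences[person_id] = tuple(ordered)
    return sequences


exposures = [
    (1, date(2010, 2, 1), "A10BA"),
    (1, date(2010, 6, 1), "A10BB"),
    (1, date(2011, 1, 1), "A10BA"),  # repeat class: not added again
    (1, date(2014, 1, 1), "A10BH"),  # outside the three-year window
    (2, date(2012, 1, 1), "A10BB"),  # before the first diagnosis
    (2, date(2012, 5, 10), "A10BA"),
    (3, date(2015, 1, 1), "A10BA"),  # person 3 has no diagnosis date
]
first_dx = {1: date(2010, 1, 1), 2: date(2012, 5, 1)}

seqs = medication_sequences(exposures, first_dx)
pathway_counts = Counter(seqs.values())                # input for a sunburst plot
first_line = Counter(seq[0] for seq in seqs.values())  # most common first medication
print(seqs)
print(first_line.most_common(1))  # [('A10BA', 2)]
```

Each ring of a sunburst plot would correspond to one position in these sequences, with segment sizes taken from `pathway_counts`.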

Anticipated Findings

For this study, we anticipate demonstrating the validity of the data by showing expected treatment patterns despite gathering data from over 30 individual EHR sites. Specifically, we expect to find:
1- Variation in the medication sequences prescribed to treat All of Us participants who had type 2 diabetes and depression.
2- The most common first-line medications for participants diagnosed with type 2 diabetes and depression.
3- A trend or change in the prescribing of the most common first medication over the study period.
4- Trends over time in the percentage of participants.
Importantly, the detailed code developed herein is made available to researchers within the Researcher Workbench so that they can more easily extract medication data and class information using a common medication ontology, an approach useful in many discovery studies.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Vojtech Huser - Other, NIH

Duplicate of Demo - Medication Sequencing

Project Purpose(s)

  • Disease Focused Research (type 2 diabetes, depression)
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Questions Being Studied

1- What are the main prescribed medication sequences that participants with type 2 diabetes and depression took over three years of treatment?
For this question, we extracted the antidiabetic and antidepressant medications used to treat participants who have T2D and depression diagnosis codes. We retrieved medications prescribed after the first diagnosis code for each disease and represented them at their ATC 4th level.
2- What are the most common first antidiabetic and antidepressant medications prescribed to All of Us participants? We extracted the first medications prescribed to treat T2D and depression and identified the most common first medication as the one prescribed to the highest number of participants.
3- Between 2000 and 2018, did the percentages of participants change who were prescribed the most common first medication, who were treated with one medication, and who were treated only with one common medication?

Scientific Approaches

In this project, we plan to use the medication sequencing methods developed at Columbia University and the OHDSI network as a means to characterize treatment pathways at scale. Further, we want to demonstrate implementation of these medication sequencing algorithms in the All of Us research dataset to show how the various sources of data contained within the program can be used for this purpose. We will perform separate medication sequence analyses for common, complex diseases: type 2 diabetes and depression.
1- Data manipulation:
Using Python and BigQuery to:
A- Retrieve medications and their classes
B- Create the medication sequences

2- Visualization:
A- Create sunburst plots to visualize the sequences
B- Plot the percentages of participants prescribed the most common first medication and treated with one medication during three years

Anticipated Findings

For this study, we anticipate demonstrating the validity of the data by showing expected treatment patterns despite gathering data from over 30 individual EHR sites. Specifically, we expect to find:
1- Variation in the medication sequences prescribed to treat All of Us participants who had type 2 diabetes and depression.
2- The most common first-line medications for participants diagnosed with type 2 diabetes and depression.
3- A trend or change in the prescribing of the most common first medication over the study period.
4- Trends over time in the percentage of participants.
Importantly, the detailed code developed herein is made available to researchers within the Researcher Workbench so that they can more easily extract medication data and class information using a common medication ontology, an approach useful in many discovery studies.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Vojtech Huser - Other, NIH

Duplicate of Demo - Medication Sequencing

Project Purpose(s)

  • Disease Focused Research (type 2 diabetes, depression)
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Questions Being Studied

1- What are the main prescribed medication sequences that participants with type 2 diabetes and depression took over three years of treatment?
For this question, we extracted the antidiabetic and antidepressant medications used to treat participants who have T2D and depression diagnosis codes. We retrieved medications prescribed after the first diagnosis code for each disease and represented them at their ATC 4th level.
2- What are the most common first antidiabetic and antidepressant medications prescribed to All of Us participants? We extracted the first medications prescribed to treat T2D and depression and identified the most common first medication as the one prescribed to the highest number of participants.
3- Between 2000 and 2018, did the percentages of participants change who were prescribed the most common first medication, who were treated with one medication, and who were treated only with one common medication?

Scientific Approaches

In this project, we plan to use the medication sequencing methods developed at Columbia University and the OHDSI network as a means to characterize treatment pathways at scale. Further, we want to demonstrate implementation of these medication sequencing algorithms in the All of Us research dataset to show how the various sources of data contained within the program can be used for this purpose. We will perform separate medication sequence analyses for common, complex diseases: type 2 diabetes and depression.
1- Data manipulation:
Using Python and BigQuery to:
A- Retrieve medications and their classes
B- Create the medication sequences

2- Visualization:
A- Create sunburst plots to visualize the sequences
B- Plot the percentages of participants prescribed the most common first medication and treated with one medication during three years

Anticipated Findings

For this study, we anticipate demonstrating the validity of the data by showing expected treatment patterns despite gathering data from over 30 individual EHR sites. Specifically, we expect to find:
1- Variation in the medication sequences prescribed to treat All of Us participants who had type 2 diabetes and depression.
2- The most common first-line medications for participants diagnosed with type 2 diabetes and depression.
3- A trend or change in the prescribing of the most common first medication over the study period.
4- Trends over time in the percentage of participants.
Importantly, the detailed code developed herein is made available to researchers within the Researcher Workbench so that they can more easily extract medication data and class information using a common medication ontology, an approach useful in many discovery studies.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Vojtech Huser - Other, NIH

Duplicate of Genetics_of_comorbidity

Project Purpose(s)

  • Methods Development
  • Ancestry
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Questions Being Studied

As a demonstration project, this study shows the reproducibility and utility of comorbidities as candidate phenotypes for mining datasets derived from clinical practice. Disease annotations extracted from billing data have been shown by the Electronic Medical Records and Genomics (eMERGE) Network to be unreliable phenotypes for GWAS. We propose that co-occurrences of specific diseases can be designed and validated accurately and reliably at high throughput, as they represent known and sometimes novel clinical syndromes. Specific questions:

Can we reproduce comorbidities discovered in external datasets (e.g., the Healthcare Cost and Utilization Project, HCUP) through statistical regression in the All of Us dataset, while controlling for potential confounders such as gender, age, race, and ethnicity?

Can we reproduce candidate medical genetics syndromes derived from mining of GWAS jointly with other biomolecular datasets and previously confirmed as comorbid in clinical datasets?

Scientific Approaches

As a method to assess comorbidities, we will implement previously validated analyses to reliably assign the presence of specific diseases to a subject and to calculate their excess co-occurrence in the dataset with statistical significance and effect sizes. We will:

1. Use the OMOP codes corresponding to the SNOMED-coded mapping that we curated from 262 GWAS.
2. Query the All of Us dataset to identify subjects with any of these diseases.
3. Use our published logistic regression (controlled for age, gender, and race) to calculate the effect size and significance of each pair of diseases (comorbidities).
4. Compare and contrast the significant comorbidities discovered in the All of Us dataset with those observed as reproducible in the HCUP dataset. A subset of comorbidities that we previously confirmed with converging molecular genetics from trans-eQTL patterns across chromosomes is also studied as candidate medical genetic syndromes among comorbid syndromes.
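Step 3 can be illustrated with a self-contained logistic regression fitted by Newton-Raphson. This is a minimal sketch, not the authors' published model: each row of `X` holds an intercept and an indicator for disease A (age, gender, and race would be appended as extra columns), the outcome is an indicator for disease B, the odds ratio for the pair is the exponential of the disease-A coefficient, and significance comes from a Wald test. The counts in the example are hypothetical.

```python
import math
from typing import List, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _score_and_information(X, y, beta):
    """Gradient of the log-likelihood and the Fisher information X'WX."""
    p = len(beta)
    score = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
        w = mu * (1.0 - mu)
        for j in range(p):
            score[j] += (yi - mu) * xi[j]
            for k in range(p):
                info[j][k] += w * xi[j] * xi[k]
    return score, info


def fit_logistic(X, y, max_iter=50, tol=1e-10) -> Tuple[List[float], List[float]]:
    """Return (coefficients, Wald standard errors); X rows must include an intercept."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        score, info = _score_and_information(X, y, beta)
        step = solve(info, score)  # Newton step: info^-1 @ score
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = _score_and_information(X, y, beta)
    # Diagonal of info^-1, one solve per unit vector.
    se = [math.sqrt(solve(info, [1.0 if i == j else 0.0 for i in range(p)])[j]) for j in range(p)]
    return beta, se


# Hypothetical counts: (has disease A, has disease B, number of participants).
X, y = [], []
for has_a, has_b, count in [(1, 1, 30), (1, 0, 20), (0, 1, 40), (0, 0, 110)]:
    X += [[1.0, float(has_a)] for _ in range(count)]
    y += [float(has_b)] * count

beta, se = fit_logistic(X, y)
odds_ratio = math.exp(beta[1])
z = beta[1] / se[1]
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
print(round(odds_ratio, 3))  # 4.125
```

Repeating the fit for every disease pair, with the covariate columns added to each row, would give the pairwise effect sizes and p-values to compare against HCUP.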

Anticipated Findings

For this study, we anticipate confirming that the All of Us clinical practice dataset reliably recapitulates comorbidities that were reproducibly observed in independent datasets. These "confirmed comorbidities" will serve as reliable phenotypes for future studies, such as phenome-wide association studies (PheWAS) or genome-wide association studies (GWAS) demonstrating the genetic underpinnings of new clinical syndromes consisting of more than one disease (e.g., the metabolic syndrome).
Importantly, the curated phenotypes and the tested logistic regression are made available for reuse by researchers in the Workbench, for new hypothesis generation.

Findings will be disseminated through scientific journals, GitHub, and the Workbench.

Anticipated outcome of the research: a method for reliably querying ~400 to 500 reproducible comorbidities.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth

Research Team

Owner:

  • Jianrong Li - Project Personnel, University of Arizona

Duplicate of How to Work with All of Us Physical Measurements Data

Project Purpose(s)

  • Educational
  • Methods Development

Scientific Questions Being Studied

How do we navigate the physical measurements data?

Scientific Approaches

N/A

Anticipated Findings

N/A

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Harry Hochheiser - Mid-career Tenured Researcher, University of Pittsburgh

Duplicate of How to Work with All of Us Physical Measurements Data

Project Purpose(s)

  • Educational
  • Methods Development

Scientific Questions Being Studied

The purpose of this workspace is to become familiar with the available data and its structure in the All of Us cohort. No active research questions are being pursued at this stage, but the experience gained through this workspace will help us plan how best to use this rich dataset to answer scientific questions.

Scientific Approaches

I am using the R programming language within the Jupyter notebooks provided in the Workbench to better understand the available data elements and their structure.

Anticipated Findings

The only anticipated finding from this workspace is more experience with the All of Us cohort dataset and its Workbench, which will better equip us to carry out future studies using data collected from All of Us participants.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Ozan Dikilitas - Research Fellow, Mayo Clinic

Duplicate of How to Work with All of Us Physical Measurements Data

Project Purpose(s)

  • Educational
  • Methods Development

Scientific Questions Being Studied

How do we navigate the physical measurements data?

Scientific Approaches

N/A

Anticipated Findings

N/A

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • James Geller - Late Career Tenured Researcher, New Jersey Institute of Technology

Duplicate of How to Get Started with Registered Tier Data

Project Purpose(s)

  • Educational
  • Methods Development
  • Other Purpose (This is an All of Us Tutorial Workspace. It is meant to provide instruction for key Researcher Workbench components and All of Us data representation.)

Scientific Questions Being Studied

We recommend that all researchers explore the notebooks in this workspace to learn the basics of All of Us Program Data.

What should you expect? This notebook will give you an overview of what data is available in the current Curated Data Repository (CDR). It will also teach you how to retrieve information about Electronic Health Record (EHR), Physical Measurements (PM), and Survey data.

Scientific Approaches

This Tutorial Workspace contains two Jupyter Notebooks (one written in Python, the other in R). Each notebook is divided into the following sections:

1. Setup: How to set up this notebook, install and import software packages, and select the correct version of the CDR.
2. Data Availability Part 1: How to summarize the number of unique participants with each major data type: Physical Measurements, Survey, and EHR.
3. Data Availability Part 2: How to delve a little deeper into data availability within each major data type.
4. Data Organization: An explanation of how data is organized according to our common data model.
5. Example Queries: How to directly query the CDR, using two example SQL queries that extract demographic data.
6. Expert Tip: How to access the base version of the CDR, for users who want to do their own cleaning.
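The Data Availability steps amount to counting unique participants per data type and their overlaps. The sketch below shows that logic on in-memory sets of participant IDs; in the tutorial notebooks these IDs would come from queries against the CDR, and the IDs used here are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set


def data_availability(ids_by_type: Dict[str, Set[int]]) -> Dict[str, int]:
    """Unique participant counts per data type, per pair of types, and overall."""
    summary = {name: len(ids) for name, ids in ids_by_type.items()}
    for (a, ids_a), (b, ids_b) in combinations(ids_by_type.items(), 2):
        summary[f"{a} & {b}"] = len(ids_a & ids_b)
    summary["all types"] = len(set.intersection(*ids_by_type.values()))
    summary["any type"] = len(set.union(*ids_by_type.values()))
    return summary


# Hypothetical participant IDs for each major data type.
ids_by_type = {
    "Physical Measurements": {1, 2, 3, 4},
    "Survey": {1, 2, 3, 4, 5, 6},
    "EHR": {2, 4, 6},
}
summary = data_availability(ids_by_type)
print(summary)
```

Because participants may contribute any combination of data types, the "all types" count is typically much smaller than any single-type count.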

Anticipated Findings

By reading and running the notebooks in this Tutorial Workspace, you will understand the following:

All of Us data are made available in a Curated Data Repository. Participants may contribute any combination of survey, physical measurement, and electronic health record data. Not all participants contribute all possible data types. Each unique piece of health information is given a unique identifier called a concept_id and organized into specific tables according to our common data model. You can use these concept_ids to query the CDR and pull data on specific health information relevant to your analysis. See our support article "Learning the Basics of the All of Us Dataset" for more information.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Vojtech Huser - Other, NIH

Duplicate of Phenotype - Depression

Project Purpose(s)

  • Disease Focused Research (Major Depressive Disorder)
  • Educational
  • Methods Development
  • Other Purpose (This is an All of Us Phenotype Library Workspace created by the Researcher Workbench Support team. It is meant to demonstrate the implementation of key phenotype algorithms within the All of Us Research Program cohort.)

Scientific Questions Being Studied

The Notebooks in this Workspace can be used to implement well-known phenotype algorithms of depression in one’s own research.

Scientific Approaches

Not Applicable

Anticipated Findings

By reading and running the Notebooks in this Phenotype Library Workspace, researchers can implement the following phenotype algorithms:

This Workspace contains an implementation of a phenotype algorithm for depression obtained from the eMERGE network. Citation: TBA. KPWA/UW. Depression. PheKB; 2018. Available from: https://phekb.org/phenotype/1095

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Sara Taylor - Research Fellow, Massachusetts Institute of Technology

Duplicate of Phenotype - Type 2 Diabetes

Project Purpose(s)

  • Educational
  • Methods Development
  • Other Purpose (This is an All of Us Phenotype Library Workspace created by the Researcher Workbench Support team. It is meant to demonstrate the implementation of key phenotype algorithms within the All of Us Research Program cohort.)

Scientific Questions Being Studied

The Notebooks in this Workspace can be used to implement well-known phenotype algorithms in one’s own research.

Scientific Approaches

Not Applicable

Anticipated Findings

By reading and running the Notebooks in this Phenotype Library Workspace, researchers can implement the following phenotype algorithms:

Jennifer Pacheco and Will Thompson. Northwestern University. Type 2 Diabetes Mellitus. PheKB; 2012. Available from: https://phekb.org/phenotype/18

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Nayyar Ahmed - Other, University of Pittsburgh

Duplicate of Systemic Disease and Glaucoma

Project Purpose(s)

  • Disease Focused Research (primary open angle glaucoma)
  • Other Purpose (This work is the result of an All of Us Research Program Demonstration Project. Demonstration Projects are efforts by the All of Us Research Program designed to meet the goal of ensuring the quality and utility of the Research Hub as a resource for accelerating precision medicine. This work has been approved, reviewed, and overseen by the All of Us Research Program Science Committee and Data and Research Center to ensure compliance with program policy.)

Scientific Questions Being Studied

We have previously published a predictive model of glaucoma progression using electronic health record (EHR) data on systemic attributes from a single institution. We aim to use the All of Us dataset to (1) externally validate this single-center model and (2) train new models that predict glaucoma progression from systemic predictors. This is important for understanding whether the original findings generalize and for providing additional knowledge about the utility of systemic predictors in a national-level dataset.

Scientific Approaches

We will develop predictive models on the All of Us dataset using multivariable logistic regression, random forests, and artificial neural networks.

Anticipated Findings

We anticipate that the All of Us data will validate the findings from the original model, which showed that blood pressure-related metrics and certain medication classes had predictive value for glaucoma progression. In addition, we anticipate that models trained on All of Us data will outperform the model trained on single-institution data because of the larger sample size and greater diversity.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Sally Baxter - Research Fellow, University of California, San Diego

Collaborators:

  • Tsung-Ting Kuo - Early Career Tenure-track Researcher, University of California, San Diego
  • Roxana Loperena Cortes - Other, All of Us Program Operational Use
  • Paulina Paul - Project Personnel, University of California, San Diego
  • Lucila Ohno-Machado
  • Luca Bonomi - Research Fellow, University of California, San Diego
  • Katherine Kim - Early Career Tenure-track Researcher, University of California, Davis
  • Jihoon Kim - Project Personnel, University of California, San Diego

Duplicate of Test Workspace 2

Project Purpose(s)

  • Control Set

Scientific Questions Being Studied

Test

Scientific Approaches

Not available.

Anticipated Findings

Test

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Eric Song - Administrator, All of Us Program Operational Use

Duplicate of Working with All of Us Physical Measurements Data v1

Project Purpose(s)

  • Methods Development

Scientific Questions Being Studied

How can one navigate the physical measurements data?

Scientific Approaches

Not available.

Anticipated Findings

N/A

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital

Duplicate of Working with All of Us Physical Measurements Data v1

Project Purpose(s)

  • Methods Development

Scientific Questions Being Studied

How can one navigate the physical measurements data?

Scientific Approaches

Not available.

Anticipated Findings

N/A

Demographic Categories of Interest

  • Sex at Birth
  • Geography

Research Team

Owner:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Cheryl Clark

Duplicate version of Demo - PheWAS Smoking for learning

Project Purpose(s)

  • Methods Development
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Questions Being Studied

As a demonstration project, this study will present the results of Phenome-Wide Association Studies (PheWAS) to show how the various sources of data contained within the All of Us research dataset can be used to inform scientific discovery. We will perform separate PheWAS with smoking status as the independent variable. Specific questions include:

1. How can one implement a PheWAS within the All of Us Researcher Workbench?
2. How can one use heterogeneous data sources within the All of Us dataset to explore disease associations using self-reported exposures (Participant Provided Information, or “PPI”) and exposures captured in the electronic health record (EHR)?

Scientific Approaches

As a method for assessing the health burden of smoking on potential observed phenotypes, we implement a phenome-wide association study (PheWAS). A PheWAS consists of an array of association tests over an indexed representation of the human phenome. In this analysis, we will conduct PheWAS for EHR-derived and PPI-derived smoking exposures included in the All of Us research dataset. We will represent "smoking exposure" in three ways:
  • EHR smoking ICD billing codes
  • Participant Provided Information (PPI) smoking: lifetime 100 cigarettes (yes/no)
  • Participant Provided Information (PPI) smoking: lifetime smoking every day
To perform PheWAS, we will map ICD representations of disease to a common vocabulary of PheCodes. We then use Jupyter Notebooks to create reusable functions to perform PheWAS and generate Manhattan plots to summarize associations.
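The per-PheCode association loop described above can be sketched as follows. This is a minimal, unadjusted illustration, not this project's code: it assumes the smoking exposure (from ICD billing codes or one of the PPI answers) and each PheCode's cases have already been reduced to sets of participant IDs. The function name, participant IDs, and case assignments are hypothetical, and the PheCode strings are used only as labels. A full PheWAS would typically fit a covariate-adjusted logistic regression per PheCode instead of the 2x2 chi-square test used here.

```python
import math


def phewas_2x2(exposed, phecode_cases, cohort):
    """Unadjusted PheWAS sketch (hypothetical helper).

    exposed: set of participant IDs with the smoking exposure.
    phecode_cases: dict mapping each PheCode to the set of case participant IDs.
    cohort: set of all participant IDs analyzed.
    Returns (phecode, odds_ratio, p_value) tuples sorted by p-value.
    """
    exposed = exposed & cohort
    unexposed = cohort - exposed
    results = []
    for phecode, cases in phecode_cases.items():
        cases = cases & cohort
        a = len(exposed & cases)      # exposed cases
        b = len(exposed - cases)      # exposed non-cases
        c = len(unexposed & cases)    # unexposed cases
        d = len(unexposed - cases)    # unexposed non-cases
        n = a + b + c + d
        # Haldane-Anscombe correction (+0.5 per cell) keeps the odds ratio finite.
        odds_ratio = ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))
        # Pearson chi-square statistic for a 2x2 table (1 degree of freedom).
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
        # Upper tail of a chi-square with 1 df equals erfc(sqrt(chi2 / 2)).
        p_value = math.erfc(math.sqrt(chi2 / 2))
        results.append((phecode, odds_ratio, p_value))
    return sorted(results, key=lambda r: r[2])


# Toy example with made-up participants and case assignments.
cohort = {f"p{i}" for i in range(100)}
exposed = {f"p{i}" for i in range(50)}
phecode_cases = {
    "401.1": {f"p{i}" for i in range(30)} | {f"p{i}" for i in range(50, 60)},
    "250.2": {f"p{i}" for i in range(10)} | {f"p{i}" for i in range(50, 60)},
}
for phecode, odds_ratio, p_value in phewas_2x2(exposed, phecode_cases, cohort):
    # -log10(p) is the usual y-axis of a Manhattan plot.
    print(phecode, round(odds_ratio, 2), round(-math.log10(p_value), 2))
```

Sorting by p-value puts the strongest associations first; the -log10(p) values are what a Manhattan plot would draw for each PheCode.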

Anticipated Findings

For this study, we anticipate that we will be able to replicate known disease associations with smoking exposure. This will serve to demonstrate the quality, utility, and diversity of the All of Us data and tools, and the power of gathering multiple data sources for a single phenotype, giving researchers options for study design and validation. Importantly, the entire PheWAS package is made available for reuse by researchers in the Workbench for new hypothesis generation.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • jie na - Project Personnel, Mayo Clinic

Collaborators:

  • Guoqian Jiang - Mid-career Tenured Researcher, Mayo Clinic

Duplicate_for_DRC_obesity_code_review

Project Purpose(s)

  • Disease Focused Research (obesity)
  • Educational
  • Methods Development

Scientific Questions Being Studied

National obesity prevention and intervention strategies may benefit from precision medicine approaches that incorporate integrated data on environments, social determinants of health, and genomic factors. We examined the quality and utility of the All of Us Research Hub Workbench for accelerating precision medicine by replicating methods from existing studies that examine the prevalence of obesity at the population level. We evaluated measurements of obesity in the physical measurement (PM) data set and the electronic health record (EHR) data set using methods similar to those of Ward et al. (NEJM, December 2019), who assessed the prevalence of obesity in the US by state using BRFSS data.

Scientific Approaches

For this population-based cross-sectional study of All of Us Research Workbench participants, we excluded individuals with measurements obtained during pregnancy or inpatient visits and individuals from states with fewer than 100 participants. Physical measurements (PM) of height and weight taken at program enrollment for 142,116 participants, and measured height and weight extracted from electronic health records (EHR) for 40,885 individuals, were used to calculate body-mass index (BMI). We performed a complete-case analysis of All of Us participants with known sex (male or female), race, income, and education level, and estimated state-specific and demographic subgroup-specific prevalence of BMI categories [obesity (BMI ≥30) and extreme obesity (BMI ≥35)] nationwide and for each state, overall and by subgroups, including male and female. We examined the difference between EHR-calculated and PM-calculated BMI by state.
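The BMI step and the category thresholds stated above can be sketched as follows. This is a minimal illustration, assuming height in centimeters and weight in kilograms (the units stored in the Workbench measurement data may differ and would need conversion). The helper names and example measurements are hypothetical; only the cutoffs (obesity BMI ≥30, extreme obesity BMI ≥35) come from the text.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body-mass index: weight in kg divided by height in meters squared."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def bmi_category(value: float) -> str:
    """Classify BMI with this project's cutoffs: obesity >= 30, extreme obesity >= 35."""
    if value >= 35:
        return "extreme obesity"
    if value >= 30:
        return "obesity"
    return "not obese"


def prevalence(bmis: list[float], threshold: float) -> float:
    """Share of participants with BMI at or above the threshold.
    Obesity (>= 30) therefore also counts everyone with extreme obesity (>= 35)."""
    if not bmis:
        raise ValueError("no BMI values")
    return sum(1 for b in bmis if b >= threshold) / len(bmis)


# Hypothetical (weight kg, height cm) measurements for three participants.
values = [bmi(70, 175), bmi(95, 170), bmi(120, 165)]
print([round(v, 1) for v in values])
print([bmi_category(v) for v in values])
print(prevalence(values, 30), prevalence(values, 35))
```

Counting obesity as BMI ≥30 without excluding the ≥35 group matches the description of obesity prevalence as including extreme obesity.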

Anticipated Findings

Using states with at least 100 participants, PM data included 142,116 individuals (mean [SD] age, 51.2 [16.6] years) and EHR data on height and weight included 40,885 individuals (mean [SD] age, 52.5 [16.5] years). The median BMI for PM participants was 28.4 [24.4 to 33.7]; the median BMI for EHR participants was 29.0 [24.8 to 34.5]. The PM national prevalences of obesity (BMI ≥30, which includes extreme obesity) and extreme obesity (BMI ≥35) were 41.2% (95% confidence interval [CI], 40.9 to 41.4) and 20.8% (95% CI, 20.6 to 21.0), respectively, with large variations across states. Women had a higher prevalence of extreme obesity than men in all selected states. Subgroups with an extreme obesity prevalence greater than 25% included the following, listed as subgroup, N, prevalence % (95% CI): non-Hispanic Black, 8,913, 28.9 (25.8 to 32.0); income less than $25,000, 13,244, 25.1 (22.1 to 28.1); education of high school to some college, 17,272, 26.1 (23.1 to 29.1); and the South region, 6,639, 25.3 (22.3 to 28.3).

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Education Level
  • Income Level

Research Team

Owner:

  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital

earlyonsetcolorectcalcancer

Project Purpose(s)

  • Disease Focused Research (colorectal cancer)

Scientific Questions Being Studied

Examine demographic, geographic, and inflammatory biomarker differences between early-onset and late-onset colorectal cancer to identify potential biomarkers for individuals at increased risk of colorectal cancer who may benefit from early screening.

Scientific Approaches

Compare individuals with and without colorectal cancer (CRC) in the overall All of Us cohort by demographics, geography, and biomarkers associated with increased CRC risk (ESR, triglycerides, BMI, systolic blood pressure, waist circumference, ApoB100, hemoglobin A1C). Biomarkers will be examined at least 2 years before the year of diagnosis.
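The lookback rule above ("at least 2 years prior to year of diagnosis") can be sketched as a simple date filter. This is a minimal illustration under one assumed reading: a measurement qualifies when its calendar year is at least two years before the diagnosis year. A day-based window (for example, 730 days before the diagnosis date) would be an equally plausible reading. The function name and example values are hypothetical.

```python
from datetime import date


def pre_diagnosis_measurements(measurements, diagnosis_date, min_years_before=2):
    """Keep (measurement_date, value) pairs whose calendar year is at least
    `min_years_before` years before the year of diagnosis (assumed reading)."""
    cutoff_year = diagnosis_date.year - min_years_before
    return [(d, v) for d, v in measurements if d.year <= cutoff_year]


# Hypothetical BMI history for one participant diagnosed in 2019.
bmi_history = [
    (date(2016, 5, 1), 31.2),
    (date(2017, 12, 31), 30.8),
    (date(2018, 1, 1), 30.1),
]
print(pre_diagnosis_measurements(bmi_history, date(2019, 3, 1)))
```

Filtering on calendar year rather than exact dates mirrors the wording "year of diagnosis"; switching to a day-based window would only change the cutoff comparison.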

Anticipated Findings

Identify biomarkers that may guide future research into the biology of early onset colorectal cancer.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital

Effect of Pyridoxine in Type 2 Diabetics

Project Purpose(s)

  • Disease Focused Research (Diabetes mellitus)

Scientific Questions Being Studied

The aim of this study is to investigate whether pyridoxine use can benefit diabetics by inhibiting the formation of advanced glycation end products (AGEs), thereby preventing long-term complications and improving clinical outcomes.

Diabetes and hyperglycemia affect over 415 million people worldwide, and by 2040 the number is expected to increase to 642 million. Chronic hyperglycemia leads to the glycation of proteins and other biomolecules, resulting in the generation of AGEs. Glycation has been identified as the core cause of diabetes-associated disorders. The interaction of AGEs with their receptor elicits oxidative stress and, as a result, evokes proliferative, inflammatory, thrombotic, and fibrotic reactions in a variety of cells. Therefore, inhibiting the glycation process might be an effective way to prevent the complications of chronic hyperglycemia.

Scientific Approaches

Dataset: type 2 DM patients
Inclusion criteria: diabetics not on insulin, age >18 and <70 years, and A1c >6.5%.
Exclusion criteria: history of uncontrolled DM with A1c >8.5%, hemoglobinopathy, sickle cell disease, thalassemia, anemia (iron deficiency, pernicious anemia, B12 deficiency, folate deficiency), blood transfusion in the last 9 months, coagulopathy, blood thinner treatment, treatment with B6/B12/folate/iron in the last 3 months, treatment for TB or INH treatment, asplenia, and pregnancy or planned pregnancy in the next 6 months.
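The inclusion and exclusion criteria above can be sketched as a single eligibility check. This is a minimal illustration only: the `Candidate` record, its fields, and the flag names are hypothetical stand-ins, not Workbench tables. The text does not say which A1c value drives inclusion, so this sketch assumes the latest A1c is used for inclusion and the highest recorded A1c for the "uncontrolled DM" exclusion.

```python
from dataclasses import dataclass, field

# Hypothetical flags standing in for the listed exclusion conditions.
EXCLUSION_FLAGS = {
    "hemoglobinopathy", "sickle_cell", "thalassemia", "anemia",
    "transfusion_last_9_months", "coagulopathy", "blood_thinner",
    "b6_b12_folate_iron_last_3_months", "tb_or_inh_treatment",
    "asplenia", "pregnant_or_planning_6_months",
}


@dataclass
class Candidate:
    age: float
    on_insulin: bool
    latest_a1c: float   # percent; assumed to drive inclusion
    max_a1c: float      # highest recorded A1c, percent; assumed to drive exclusion
    flags: set[str] = field(default_factory=set)


def is_eligible(c: Candidate) -> bool:
    """Apply the stated inclusion criteria, then the exclusion criteria."""
    included = (not c.on_insulin) and 18 < c.age < 70 and c.latest_a1c > 6.5
    excluded = c.max_a1c > 8.5 or bool(c.flags & EXCLUSION_FLAGS)
    return included and not excluded


print(is_eligible(Candidate(55, False, 7.2, 7.8)))
print(is_eligible(Candidate(55, False, 7.2, 9.1)))
print(is_eligible(Candidate(55, False, 7.2, 7.8, {"asplenia"})))
```

Keeping the exclusion conditions as a set of flags makes it easy to add or drop criteria without changing the eligibility logic.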

Research Method:
Blood and urine labs such as HGB A1c, HGB/HCT, fructosamine, fasting lipids, microalbuminuria, 24-hour creatinine/protein, reticulocyte count, and glycated albumin will be analyzed.
Lab parameters and outcomes in patients taking pyridoxine 100 mg PO daily will be compared with those in subjects not taking pyridoxine.

Anticipated Findings

1. Pyridoxine can decrease HbA1c in type 2 diabetics.
2. Pyridoxine can decrease glycated albumin, GlycoMark, and microalbuminuria.

If pyridoxine can reduce AGEs without the risk of hypoglycemia and without other side effects, pyridoxine should be used in diabetic patients. This would protect diabetic patients from the known complications of hyperglycemia without the side effects of antidiabetic medication.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Bijun Kannadath - Early Career Tenure-track Researcher, University of Arizona

Collaborators:

  • Jiali Ling - Project Personnel, University of Arizona

End of Life Prediction

Project Purpose(s)

  • Methods Development

Scientific Questions Being Studied

We plan to test the compatibility of the omop-learn library with the All of Us dataset. This library was developed by the Clinical Machine Learning Group at MIT and facilitates rapid Python prototyping of predictive machine-learning models using longitudinal medical data from an OMOP CDM-standard database. This test is important to verify that other researchers will be able to use the omop-learn library to run their own predictive tasks on this dataset, as well as on other OMOP datasets.

Scientific Approaches

We plan to test the library on the task of predicting mortality over a six-month window for patients over the age of 70. Using the omop-learn library, we will select a cohort based on age and enrollment during the training and outcome windows. We will then build a sparse feature matrix of drug, condition, procedure, and specialty features. Using this dataset, we will apply two methods to predict outcomes: a simple logistic regression model and the SARD model, a deep-learning model.

Anticipated Findings

We anticipate that omop-learn will be compatible with the All of Us dataset. Our findings here will allow us to identify and resolve any compatibility issues we encounter, which will enable researchers to use omop-learn for their own predictive modeling tasks.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • justin Lim - Graduate Trainee, Massachusetts Institute of Technology

Evidence of the Latino Epidemiologic Paradox in the All of Us Research Project

Project Purpose(s)

  • Disease Focused Research (Cardiovascular disease)
  • Population Health
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Questions Being Studied

The overall goal of this project is to examine whether there is evidence of the Latino Epidemiological Paradox within the All of Us Research Project (AoURP) cohort. The specific aims are:

Specific Aim 1
To determine whether Latinos have a lower gender-stratified, age-adjusted prevalence of cardiovascular disease (CVD) than non-Hispanic Whites (NHWs) and non-Hispanic Blacks in the cohort.

Specific Aim 2
To determine whether Latinos have a lower gender-stratified, age-adjusted prevalence of cancer (overall) than NHWs and non-Hispanic Blacks in the cohort.

Specific Aim 3
To determine whether Latinos have a higher gender-stratified, age-adjusted prevalence of diabetes and obesity (overall) than NHWs and non-Hispanic Blacks in the cohort.

Specific Aim 4
To the extent possible, examine differences by Latino subgroup and between foreign-born and US-born Latinos.

Scientific Approaches

Study population: All of Us Research Project core participants. We will examine data from several sources, including electronic health records (EHR), participant provided information (PPI), and physical measurements.

Main outcome variables: we will work with the DRC Research Support Team to obtain support for their existing classification scheme for common complex diseases, which in this project would include cardiovascular disease, cancer (including subtypes to the extent possible), and diabetes (type 2). To define diseases, we will use EHR data to preserve objective outcomes, excluding survey data for now.

Statistical analysis
We will present all data stratified by gender and age-adjusted using direct standardization. BMI categories will be <25, 25-30, 30-35, and >35. For diabetes, A1C data will be categorized as A1C <7, A1C 7-9, and A1C >9.
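Direct standardization, as named above, weights each age group's crude prevalence by that group's share of a fixed standard population. The text does not give the standard population or the age bands, so everything below (age bands, counts, standard shares, and the helper name) is a placeholder for illustration; in this project the function would be run separately within each gender stratum.

```python
def direct_standardize(cases, totals, standard_pop):
    """Directly age-standardized prevalence.

    cases[g] / totals[g] is the crude prevalence in age group g.
    standard_pop[g] is the size (or share) of group g in the standard population.
    Returns the standard-population-weighted average of the group prevalences.
    """
    weight_sum = sum(standard_pop.values())
    adjusted = 0.0
    for group, weight in standard_pop.items():
        if totals[group] == 0:
            raise ValueError(f"no participants in age group {group}")
        rate = cases[group] / totals[group]
        adjusted += rate * weight / weight_sum
    return adjusted


# Placeholder counts for one gender stratum and placeholder standard shares.
cases = {"18-39": 10, "40-59": 30, "60+": 60}
totals = {"18-39": 100, "40-59": 100, "60+": 100}
standard = {"18-39": 0.40, "40-59": 0.35, "60+": 0.25}
print(round(direct_standardize(cases, totals, standard), 4))
```

Normalizing by `weight_sum` lets the standard population be given either as raw counts or as shares that sum to 1.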

Anticipated Findings

We expect to find evidence of the Latino Epidemiological Paradox within the All of Us Research Project (AoURP) cohort: despite multiple social and economic disadvantages, Latinos seem to have a health advantage on many measures of population health over other racial/ethnic minority groups such as Blacks, and on some measures even better health status than non-Hispanic Whites (NHWs).

Previous studies such as the Study of Latinos (SOL), the largest study of Latinos (16,000 participants), aimed to examine this paradox but were limited in that they included only Latinos, so comparative data on non-Latinos were not collected. With 40,000 Latino core participants in All of Us (as well as 160,000 non-Latinos), the AoURP is uniquely positioned to contribute to our knowledge and further our understanding of this paradox.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth

Research Team

Owner:

  • Raul Montanez Valverde - Graduate Trainee, University of Miami

Collaborators:

  • olveen carrasquillo - Late Career Tenured Researcher, University of Miami

Exploration of data for use in predicting cancer diagnosis

Project Purpose(s)

  • Educational

Scientific Questions Being Studied

This is my initial project, and is set up to explore the possibilities with All of Us data.

My scientific interest is to better understand why some people develop specific cancers and some do not.

Scientific Approaches

I plan to use machine learning applied to germline DNA data to answer these questions.

Anticipated Findings

Ultimately the findings from this line of work should lead to better predictive tests for cancer. An example is that someday you might be able to take a blood sample from a young adult and tell them that they will probably develop colon cancer sometime in the next 40 years. This might lead them to screen for colon polyps more rigorously and ultimately let them avoid a late stage colon cancer diagnosis.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • James Brody - Mid-career Tenured Researcher, University of California, Irvine

Exploring

Project Purpose(s)

  • Educational
  • Other Purpose (Exploring the Workbench as part of an applied biomedical informatics graduate course and leveraging AoU for educational purposes.)

Scientific Questions Being Studied

How useful is All of Us data in biomedical and public health research? For this workspace, I intend to explore the workspace and understand how the information in All of Us can help formulate new hypotheses. I intend to use BMI data, and perhaps other types of data, in this analysis.

Scientific Approaches

I intend to use the workspace to review tools, processes, and data across the longitudinal cohort of AoU participants.

Anticipated Findings

By exploring the All of Us workspace, I will determine whether it is a viable tool for my research. If it is, I may continue to use this tool in the future.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Geography
  • Disability Status
  • Access to Care
  • Income Level

Research Team

Owner:

  • Michelle Gomez - Graduate Trainee, Vanderbilt University

FHIRCat-LungCancer

Project Purpose(s)

  • Disease Focused Research (lung cancer)
  • Methods Development
  • Control Set
  • Ancestry

Scientific Questions Being Studied

Lung cancer continues to be the leading cause of deaths from malignancy worldwide. There have been widespread efforts to develop safe and effective screening methods to detect lung cancer at an earlier stage. The US Preventive Services Task Force (USPSTF) recommends screening for lung cancer in individuals aged 55-80 years who have a smoking history of 30 pack-years or more and who either currently smoke or quit within the past 15 years. However, data have shown that only a third of patients diagnosed with lung cancer in the USA meet the USPSTF screening criteria, suggesting that many potentially high-risk individuals are not eligible for low-dose CT screening. Therefore, there is an urgent need for more sophisticated risk assessment methods that incorporate clinical data, to identify those at high risk and to optimize the lung cancer screening criteria.

Scientific Approaches

We plan to create methods and tools to characterize the phenotypic abnormalities associated with patient eligibility and risk factors using a phenome-wide association study (PheWAS). We will explore the All of Us datasets to answer our scientific questions on lung cancer screening.

Anticipated Findings

We anticipate that we can identify patient cohorts with lung cancer risk factors and demonstrate the feasibility of using All of Us datasets to conduct a PheWAS that characterizes the phenotypic abnormalities associated with patient eligibility and risk factors.

Demographic Categories of Interest

  • Race / Ethnicity

Research Team

Owner:

  • Guoqian Jiang - Mid-career Tenured Researcher, Mayo Clinic

Collaborators:

  • jie na - Project Personnel, Mayo Clinic

First

Project Purpose(s)

  • Other Purpose (Learn how to use workbench (first execution))

Scientific Questions Being Studied

This is my first execution of the Workbench. There are no specific research questions; I hope to learn how to use the Workbench. As a new user, I don't know what to expect or what functions the Workbench has.

Scientific Approaches

There are no formal approaches planned for this project. This is this user's first project, and the purpose is to learn how to use all available Workbench functions.

Anticipated Findings

Demonstrate the ability to execute Workbench tools and functions. This will consist of queries and the results that flow from them (e.g., SQL query results).

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Vojtech Huser - Other, NIH

First Test Workspace

Project Purpose(s)

  • Methods Development

Scientific Questions Being Studied

Exploratory data analysis to start, to see whether the data can support research questions around clinical decision support applications:

- Can the results of microbiology culture tests be accurately predicted based on available patient / clinical data at the time of test ordering?

- Can the clinical orders from new specialty consultation visits be predicted based on available patient / clinical data at the time of referral from a generalist?

Scientific Approaches

Supervised and unsupervised machine learning models (e.g., collaborative filtering) applied to clinical data sources to predict subsequent labels in the form of clinical test orders and results.
Cases where patients receive empiric antibiotic prescriptions (simultaneous antibiotics with new diagnostic microbiology culture tests).
Cases where a patient is referred to and subsequently sees a specialist (e.g., endocrinology or hematology).

Anticipated Findings

Clinical orders and test results are sufficiently predictable from available data that they can power clinical decision support information retrieval tools to aid clinical decision making under uncertainty.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Jonathan Chen - Early Career Tenure-track Researcher, Stanford University

For training and learning

Project Purpose(s)

  • Educational

Scientific Questions Being Studied

The workspace aims to develop a learning module and to give students exposure to potential social-science-based implications of occupational choices.

Scientific Approaches

The workspace aims to use traditional statistical methods in Python.

Anticipated Findings

Developing a deeper understanding of the dataset and the baseline descriptives related to occupational choice.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Pankaj Patel - Late Career Tenured Researcher, Villanova University

For_DRC_obesity_code_review

Project Purpose(s)

  • Disease Focused Research (obesity)
  • Educational
  • Methods Development

Scientific Questions Being Studied

National obesity prevention and intervention strategies may benefit from precision medicine approaches that incorporate integrated data on environments, social determinants of health, and genomic factors. We examined the quality and utility of the All of Us Research Hub Workbench for accelerating precision medicine by replicating methods from existing studies that examine the prevalence of obesity at the population level. We evaluated measurements of obesity in the physical measurement (PM) data set and the electronic health record (EHR) data set using methods similar to those of Ward et al. (NEJM, December 2019), who assessed the prevalence of obesity in the US by state using BRFSS data.

Scientific Approaches

For this population-based cross-sectional study of All of Us Research Workbench participants, we excluded individuals with measurements obtained during pregnancy or inpatient visits and individuals from states with fewer than 100 participants. Physical measurements (PM) of height and weight taken at program enrollment for 142,116 participants, and measured height and weight extracted from electronic health records (EHR) for 40,885 individuals, were used to calculate body-mass index (BMI). We performed a complete-case analysis of All of Us participants with known sex (male or female), race, income, and education level, and estimated state-specific and demographic subgroup-specific prevalence of BMI categories [obesity (BMI ≥30) and extreme obesity (BMI ≥35)] nationwide and for each state, overall and by subgroups, including male and female. We examined the difference between EHR-calculated and PM-calculated BMI by state.

Anticipated Findings

Using states with at least 100 participants, PM data included 142,116 individuals (mean [SD] age, 51.2 [16.6] years) and EHR data on height and weight included 40,885 individuals (mean [SD] age, 52.5 [16.5] years). The median BMI for PM participants was 28.4 [24.4 to 33.7]; the median BMI for EHR participants was 29.0 [24.8 to 34.5]. The PM national prevalences of obesity (BMI ≥30, which includes extreme obesity) and extreme obesity (BMI ≥35) were 41.2% (95% confidence interval [CI], 40.9 to 41.4) and 20.8% (95% CI, 20.6 to 21.0), respectively, with large variations across states. Women had a higher prevalence of extreme obesity than men in all selected states. Subgroups with an extreme obesity prevalence greater than 25% included the following, listed as subgroup, N, prevalence % (95% CI): non-Hispanic Black, 8,913, 28.9 (25.8 to 32.0); income less than $25,000, 13,244, 25.1 (22.1 to 28.1); education of high school to some college, 17,272, 26.1 (23.1 to 29.1); and the South region, 6,639, 25.3 (22.3 to 28.3).

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Education Level
  • Income Level

Research Team

Owner:

  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital

For_HTN_code_review

Project Purpose(s)

  • Other Purpose (This work is an AoU demo project. Demo projects are efforts by the AoU Research Program designed to meet the program goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. As an approved demo project, this work was reviewed and overseen by the AoU Research Program Science Committee and the AoU Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Questions Being Studied

We are using the All of Us Researcher Workbench interface to answer the question, "Is hypertension prevalence in the All of Us Research Program similar to hypertension prevalence in the 2015–2016 National Health and Nutrition Examination Survey (NHANES)?" Clinical approaches to understanding and treating hypertension may benefit from a precision medicine approach that integrates data on environments, social determinants of health, behaviors, and genomic factors that contribute to hypertension risk. Hypertension is a major public health concern and remains a leading risk factor for stroke and cardiovascular disease.

Scientific Approaches

In this cross-sectional, population-based study, we used All of Us baseline data from participant-provided information (PPI) surveys of adults (age >18) and electronic health record (EHR) blood pressure measurements, and retrospectively examined the prevalence of hypertension in the EHR cohort using Systematized Nomenclature of Medicine (SNOMED) codes and blood pressure medications recorded in the EHR. Our primary definition was EHR based (a SNOMED code for hypertension on 2 distinct dates and at least one hypertension medication); we then added participants with elevated systolic or elevated diastolic blood pressure on measurements 2 and 3 from PPI. We extracted the dates of each participant's SNOMED codes for essential hypertension from the Researcher Workbench table 'cb_search_all_events'. We calculated an age-standardized hypertension (HTN) prevalence according to the age distribution of the U.S. Census, using 3 age groups (18-39, 40-59, and ≥60 years).
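The case definition above combines an EHR rule with a blood-pressure rule. The Python sketch below illustrates that logic under stated assumptions: participant records are assumed to have already been extracted (for example from 'cb_search_all_events') into a hypothetical `Participant` structure, and because the text says only "elevated", the 140/90 mmHg cutoffs and the averaging of readings 2 and 3 are placeholders, not the project's documented thresholds.

```python
from dataclasses import dataclass, field
from datetime import date

# ASSUMPTION: the project text says only "elevated" blood pressure; 140/90 mmHg
# is a placeholder cutoff chosen for illustration.
SBP_CUTOFF = 140
DBP_CUTOFF = 90


@dataclass
class Participant:
    """Hypothetical per-participant record built from EHR and PPI extracts."""
    person_id: int
    # Dates on which an essential-hypertension SNOMED code was recorded.
    htn_snomed_dates: list[date] = field(default_factory=list)
    # Number of hypertension medication exposures found in the EHR.
    n_htn_medications: int = 0
    # (systolic, diastolic) readings in the order they were taken.
    bp_readings: list[tuple[int, int]] = field(default_factory=list)


def ehr_hypertension(p: Participant) -> bool:
    """Primary definition: SNOMED code on >= 2 distinct dates and >= 1 medication."""
    return len(set(p.htn_snomed_dates)) >= 2 and p.n_htn_medications >= 1


def measured_hypertension(p: Participant) -> bool:
    """Supplementary definition from readings 2 and 3 (averaging is an assumption)."""
    if len(p.bp_readings) < 3:
        return False
    second, third = p.bp_readings[1], p.bp_readings[2]
    mean_sbp = (second[0] + third[0]) / 2
    mean_dbp = (second[1] + third[1]) / 2
    return mean_sbp >= SBP_CUTOFF or mean_dbp >= DBP_CUTOFF


def has_hypertension(p: Participant) -> bool:
    """EHR-defined cases, plus participants added by the measurement rule."""
    return ehr_hypertension(p) or measured_hypertension(p)


# Toy records (made up, not All of Us data).
cohort = [
    Participant(1, [date(2018, 1, 5), date(2018, 6, 1)], 1),
    Participant(2, [date(2018, 1, 5)], 0, [(150, 85), (142, 88), (138, 86)]),
    Participant(3, [], 0, [(120, 80), (118, 76), (122, 78)]),
]
print([has_hypertension(p) for p in cohort])  # [True, True, False]
```

Counting distinct dates with `set()` keeps two codes recorded on the same day from satisfying the "2 distinct dates" requirement.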

Anticipated Findings

The prevalence of hypertension in the All of Us cohort is similar to that reported in the published literature: the age-adjusted hypertension prevalence was 27.9% in All of Us, compared with 29.6% in NHANES. The All of Us cohort is a growing source of diverse longitudinal data that can be used to study hypertension nationwide. The prevalence of hypertension in the United States (U.S.) varies by age, sex, and socioeconomic status. Hypertension can often be treated successfully with medication and prevented or delayed with lifestyle modifications. Despite these established intervention and prevention strategies, hypertension prevalence remains at a level of public health concern. The diversity within All of Us may provide insight into factors relevant to hypertension prevention and treatment across a variety of social and geographic contexts and population strata in the U.S.
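The age-adjusted figure is an age-standardized prevalence computed over three age groups. A common way to do this is direct standardization: each age group's prevalence is weighted by that group's share of a standard population (here, the U.S. Census age distribution named in the approach). The sketch below assumes direct standardization; the case counts and standard-population shares in the example are made up, not All of Us or Census values.

```python
AGE_GROUPS = ("18-39", "40-59", "60+")


def age_standardized_prevalence(
    cases: dict[str, int],
    totals: dict[str, int],
    standard_pop: dict[str, float],
) -> float:
    """Direct age standardization.

    Weights each age group's prevalence (cases / totals) by that group's share
    of the standard population, so cohorts with different age structures can
    be compared on a common footing. standard_pop may hold counts or shares;
    it is normalized to sum to 1.
    """
    total_standard = sum(standard_pop[g] for g in AGE_GROUPS)
    return sum(
        (cases[g] / totals[g]) * (standard_pop[g] / total_standard)
        for g in AGE_GROUPS
    )


# Hypothetical example: all numbers below are illustrative only.
cases = {"18-39": 50, "40-59": 300, "60+": 600}
totals = {"18-39": 1000, "40-59": 1000, "60+": 1000}
standard = {"18-39": 0.40, "40-59": 0.35, "60+": 0.25}
print(f"{age_standardized_prevalence(cases, totals, standard):.1%}")  # 27.5%
```

Because the weights come from an external standard, the All of Us and NHANES estimates can be compared even though the two samples have different age distributions.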

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital
  • Elizabeth Karlson - Late Career Tenured Researcher, Massachusetts General Hospital
  • Cheryl Clark

for_obesity_code_review

Project Purpose(s)

  • Disease Focused Research (obesity)
  • Educational ...
  • Methods Development

Scientific Questions Being Studied

National obesity prevention and intervention strategies may benefit from precision medicine approaches that incorporate integrated data on environments, social determinants of health, and genomic factors. We examined the quality and utility of the All of Us Research Hub Workbench for accelerating precision medicine by replicating methods from existing studies of population-level obesity prevalence. We evaluated obesity measurements in the physical measurement (PM) data set and the electronic health record (EHR) data set using methods similar to those of Ward et al. (NEJM, December 2019), who assessed the state-level prevalence of obesity in the US using Behavioral Risk Factor Surveillance System (BRFSS) data.

Scientific Approaches

For this population-based cross-sectional study of All of Us Researcher Workbench participants, we excluded individuals with measurements obtained during pregnancy or inpatient visits and individuals from states with fewer than 100 participants. Body-mass index (BMI) was calculated from height and weight physical measurements (PM) taken at program enrollment for 142,116 participants and from measured height and weight extracted from electronic health records (EHR) for 40,885 individuals. We performed a complete-case analysis of participants with known sex (male or female), race, income, and education level, and estimated the prevalence of two BMI categories, obesity (BMI ≥30) and extreme obesity (BMI ≥35), nationwide and for each state, overall, by sex, and by demographic subgroup. We also examined the difference between EHR- and PM-calculated BMI by state.
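The exclusion and estimation steps above can be sketched as follows. This is an illustration, not the project's code: the record keys are hypothetical, and the 95% CI uses a simple normal-approximation (Wald) interval, which is an assumed method since the text does not say how the intervals were computed.

```python
import math
from collections import defaultdict

MIN_STATE_N = 100  # states with fewer eligible participants are dropped
REQUIRED_FIELDS = ("sex", "race", "income", "education")  # complete-case covariates


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body-mass index: weight in kilograms divided by height in metres squared."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def prevalence_with_ci(n_cases: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Proportion and a normal-approximation (Wald) 95% CI, clipped to [0, 1]."""
    p = n_cases / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def state_prevalence(records, cutoff: float) -> dict:
    """Prevalence of BMI >= cutoff per state, after the exclusions described above.

    Each record is a dict with hypothetical keys: 'state', 'weight_kg',
    'height_cm', 'pregnant', 'inpatient', plus the REQUIRED_FIELDS.
    """
    bmis_by_state = defaultdict(list)
    for r in records:
        if r["pregnant"] or r["inpatient"]:
            continue  # exclude measurements taken during pregnancy or inpatient visits
        if any(r.get(k) is None for k in REQUIRED_FIELDS):
            continue  # complete-case analysis: drop records with missing covariates
        bmis_by_state[r["state"]].append(bmi(r["weight_kg"], r["height_cm"]))

    results = {}
    for state, values in bmis_by_state.items():
        if len(values) < MIN_STATE_N:
            continue  # drop states with fewer than 100 participants
        n_cases = sum(v >= cutoff for v in values)
        results[state] = prevalence_with_ci(n_cases, len(values))
    return results
```

Calling `state_prevalence(records, 30)` and `state_prevalence(records, 35)` would give per-state (prevalence, lower, upper) tuples for obesity and extreme obesity; running it separately on PM and EHR records would support the state-level PM versus EHR comparison.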

Anticipated Findings

Using states with at least 100 participants, the PM data included 142,116 individuals (mean [SD] age, 51.2 [16.6] years) and the EHR height and weight data included 40,885 individuals (mean [SD] age, 52.5 [16.5] years). The median BMI was 28.4 [24.4 to 33.7] for PM participants and 29.0 [24.8 to 34.5] for EHR participants. The PM national prevalences of obesity (BMI ≥30, which includes extreme obesity) and extreme obesity (BMI ≥35) were 41.2% (95% confidence interval [CI], 40.9 to 41.4) and 20.8% (95% CI, 20.6 to 21.0), respectively, with large variations across states. Women had a higher prevalence of extreme obesity than men in all selected states. Subgroups with an extreme obesity (BMI ≥35) prevalence greater than 25% were as follows (subgroup, N, prevalence % [95% CI]): non-Hispanic (NH) Black, 8,913, 28.9% [25.8 to 32.0]; individuals with income less than $25,000, 13,244, 25.1% [22.1 to 28.1]; education of high school to some college, 17,272, 26.1% [23.1 to 29.1]; and the South region, 6,639, 25.3% [22.3 to 28.3].

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Education Level
  • Income Level

Research Team

Owner:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital