Research Projects Directory

Information about each research project within the Workbench is available in the Research Projects Directory below. Approved researchers provide their project’s research purpose, description, populations of interest, and more. This information helps All of Us ensure transparency on the type of research being conducted.

At this time, all listed projects are using data in the Registered Tier. The Registered Tier contains individual-level data from electronic health records, survey answers, physical measurements, and Fitbit. These data have been altered to protect participant privacy.

Note: Researcher Workbench users provide information about their research projects independently. Any views expressed in the Research Projects Directory belong to the relevant users and do not necessarily represent those of the All of Us Research Program.

Information in the Research Projects Directory is also cross-posted on AllofUs.nih.gov in compliance with the 21st Century Cures Act.

There are currently 422 active workspaces. This information was updated on 3/2/2021.

D014 - Opioids

Project Purpose(s)

  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Questions Being Studied

As a demonstration project, this study will estimate the prevalence of opioid use in the United States. Specific questions include:

1. What is the prevalence of prescription opioids received from healthcare systems?
2. What is the prevalence of opioid misuse, including nonmedical use of prescription opioids and street opioid use?
3. Estimates for both questions will also be stratified by geographic region.

Scientific Approaches

We will estimate the prevalence of opioid use in two ways, stratified by state.
First, we will use EHR drug exposure records to capture prescription opioid use.
Second, we will use the Lifestyle survey to capture self-reported substance use, based on the following questions:
1. In your LIFETIME, which of the following substances have you ever used?
2. In the PAST THREE MONTHS, how often have you used this substance?
Because prevalence will be stratified by state, the EHR Observation table will be used to capture each participant's state.
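The two-source, state-stratified prevalence described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's code: the `Participant` record and its field names are hypothetical stand-ins for values pulled from the EHR drug exposure data, the Lifestyle survey, and the EHR Observation table, and participants with no data for a given source (`None`) are assumed to be excluded from that source's denominator.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    # Hypothetical, simplified record; not the Researcher Workbench schema.
    person_id: int
    state: str                     # state of residence (e.g. from the EHR Observation table)
    ehr_opioid: Optional[bool]     # any prescription opioid in EHR drug exposures; None = no EHR data
    survey_opioid: Optional[bool]  # self-reported opioid use; None = did not answer the survey


def prevalence_by_state(participants, source):
    """Return {state: (cases, denominator, prevalence)} for one data source.

    `source` names a boolean attribute of Participant ("ehr_opioid" or
    "survey_opioid"). Participants with no data for that source are skipped,
    so each source gets its own denominator.
    """
    counts = defaultdict(lambda: [0, 0])  # state -> [cases, denominator]
    for p in participants:
        value = getattr(p, source)
        if value is None:
            continue
        counts[p.state][1] += 1
        if value:
            counts[p.state][0] += 1
    return {state: (cases, n, cases / n) for state, (cases, n) in counts.items()}


# Hypothetical participants for illustration only.
cohort = [
    Participant(1, "MI", True, None),
    Participant(2, "MI", False, False),
    Participant(3, "FL", True, True),
    Participant(4, "FL", False, True),
]
for source in ("ehr_opioid", "survey_opioid"):
    print(source, prevalence_by_state(cohort, source))
```

Keeping separate denominators per source means the EHR and survey estimates describe somewhat different groups of participants, which is worth keeping in mind when comparing them.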

Anticipated Findings

For this study, we anticipate that we will be able to replicate previous national estimates of the prevalence of opioid use. All of Us Researcher Workbench data also provide an alternative tool for assessing the prevalence of substance use and prescription opioid use in the US population.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Hsueh-Han Yeh - Research Associate, Henry Ford Health System

D015-housing

Project Purpose(s)

  • Population Health
  • Other Purpose (The data can provide evidence of AoU's ability to replicate findings on social determinants of health and to identify vulnerable populations in our cohort.)

Scientific Questions Being Studied

What is the prevalence of housing insecurity among current participants in the All of Us study? What individual-level factors are related to housing insecurity, including demographics, indicators of health care access, and perceived health status?

Scientific Approaches

We will determine the prevalence of housing insecurity in the All of Us study sample using data collected in the Basics module (“worried or concerned about not having a place to live”). We will use housing insecurity as the dependent variable in a multivariable analysis to determine its relationship with healthcare access and health services utilization. Finally, we will report the independent relationship between housing insecurity and healthcare access, adjusting for covariates and conducting stratified analyses as appropriate.

Anticipated Findings

Recently, investigators examined housing insecurity using the 2011-2015 Behavioral Risk Factor Surveillance System (BRFSS) and found a prevalence of 12.6% among the more than 228,000 respondents in the study sample. All of Us can replicate these findings among its core participants using questionnaire items similar to those used by these investigators.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Amy Tang - Early Career Tenure-track Researcher, Henry Ford Health System

D027-MS

Project Purpose(s)

  • Disease Focused Research (multiple sclerosis)
  • Other Purpose (Provide evidence of AoU's ability to replicate findings on the prevalence and demographics of MS)

Scientific Questions Being Studied

Objective: Determine the prevalence, demographics, and regional distribution of multiple sclerosis (MS) in the All of Us Research Program.

Scientific Approaches

Study population: All of Us Research Program participants who have shared their electronic health record (EHR) information, have answered the Basics survey, and have answered the Personal Medical History survey.

Data analysis: We will determine the prevalence of multiple sclerosis in the All of Us Research Program electronic health record data and Personal Medical History survey data in three cohorts: participants with EHR data only, survey data only, and both EHR and survey data. These estimates will then be stratified by age, sex, race/ethnicity, and region as self-reported in the Basics participant provided information (PPI) survey.
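One way to set up the three cohorts is with set operations on participant IDs. The sketch below is a hypothetical illustration: the ID sets are invented, and evaluating each case source separately within the overlap cohort is an assumption about how the EHR and survey estimates would be compared, not a detail stated by the project.

```python
from typing import Dict, Optional, Set


def partition_by_source(ehr_ids: Set[int], survey_ids: Set[int]) -> Dict[str, Set[int]]:
    """Split participants by which data sources they contributed."""
    return {
        "ehr_only": ehr_ids - survey_ids,
        "survey_only": survey_ids - ehr_ids,
        "both": ehr_ids & survey_ids,
    }


def prevalence(cases: Set[int], cohort: Set[int]) -> Optional[float]:
    """Share of the cohort that is a case; None for an empty cohort."""
    if not cohort:
        return None
    return len(cases & cohort) / len(cohort)


# Hypothetical participant IDs for illustration only.
ehr_ids = {1, 2, 3, 4, 5, 6}   # participants with EHR data
survey_ids = {4, 5, 6, 7, 8}   # participants who answered the Personal Medical History survey
ehr_ms = {2, 5}                # MS recorded in the EHR
survey_ms = {5, 6, 8}          # MS self-reported in the survey

cohorts = partition_by_source(ehr_ids, survey_ids)
print("EHR only:", prevalence(ehr_ms, cohorts["ehr_only"]))
print("Survey only:", prevalence(survey_ms, cohorts["survey_only"]))
# In the overlap cohort, each source can be evaluated separately and compared.
print("Both (EHR-defined):", prevalence(ehr_ms, cohorts["both"]))
print("Both (survey-defined):", prevalence(survey_ms, cohorts["both"]))
```

Stratifying by age, sex, race/ethnicity, or region would amount to calling `prevalence` on the subset of each cohort that falls in a given stratum.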

Anticipated Findings

We anticipate that the prevalence and demographics of MS in the AoURP will be similar to those reported in recent studies. We further anticipate that findings regarding MS in AoURP participants' EHR data will be similar to those in the survey data.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Cathryn Peltz - Other, Henry Ford Health System

Collaborators:

  • Amy Tang - Early Career Tenure-track Researcher, Henry Ford Health System

D029

Project Purpose(s)

  • Disease Focused Research (cardiovascular disease, cancer (all types), diabetes)
  • Population Health

Scientific Questions Being Studied

The overall goal of this project is to examine whether there is evidence of the Latino Epidemiological Paradox within the All of Us Research Program (AoURP) cohort. In this proposal, we will perform analyses to examine this phenomenon. We will address the following aims:
• Specific Aim 1: To determine whether Latinos have a lower gender-stratified, age-adjusted prevalence of cardiovascular disease (CVD) than non-Hispanic whites (NHWs) and non-Hispanic blacks in the cohort.
• Specific Aim 2: To determine whether Latinos have a lower gender-stratified, age-adjusted prevalence of cancer (overall) than NHWs and non-Hispanic blacks in the cohort.
• Specific Aim 3: To determine whether Latinos have a higher gender-stratified, age-adjusted prevalence of diabetes and obesity (overall) than NHWs and non-Hispanic blacks in the cohort.
• Specific Aim 4: To the extent possible, to examine differences by Latino subgroup and between foreign-born and US-born Latinos.

Scientific Approaches

Not available.

Anticipated Findings

This project will determine whether there is evidence of the Latino epidemiological paradox in the AoURP cohort.

Demographic Categories of Interest

Not available.

Research Team

Owner:

  • Olveen Carrasquillo - Late Career Tenured Researcher, University of Miami

D16_HTN_revision_after_code_review

Project Purpose(s)

  • Other Purpose (This work is an AoU demo project. Demo projects are efforts by the AoU Research Program designed to meet the program goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. As an approved demo project, this work was reviewed and overseen by the AoU Research Program Science Committee and the AoU Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Questions Being Studied

We are using the All of Us Researcher Workbench interface to answer the question, "Is hypertension prevalence in the All of Us Research Program similar to hypertension prevalence in the 2015–2016 National Health and Nutrition Examination Survey (NHANES)?" Clinical approaches to understanding and treating hypertension may benefit from a precision medicine approach that integrates data on environments, social determinants of health, behaviors, and genomic factors that contribute to hypertension risk. Hypertension is a major public health concern and remains a leading risk factor for stroke and cardiovascular disease.

Scientific Approaches

In this cross-sectional, population-based study, we used All of Us baseline data from participant provided information (PPI) surveys (participants aged >18) and electronic health record (EHR) blood pressure measurements, and retrospectively examined the prevalence of hypertension in the EHR cohort using Systematized Nomenclature of Medicine (SNOMED) codes and blood pressure medications recorded in the EHR. We used the EHR data (SNOMED codes on 2 distinct dates and at least one hypertension medication) as the primary definition, and then added participants with elevated systolic or elevated diastolic blood pressure on measurements 2 and 3 from PPI. We extracted the dates of each participant's SNOMED codes for essential hypertension from the Researcher Workbench table ‘cb_search_all_events’. We calculated an age-standardized hypertension (HTN) prevalence according to the age distribution of the U.S. Census, using 3 age groups (18-39, 40-59, ≥ 60).
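The case definition and age standardization described above can be sketched in Python. This is a minimal illustration, not the authors' code: the `Person` record is hypothetical, the 140/90 mmHg cut-offs and the averaging of measurements 2 and 3 are assumptions (the text says only "elevated"), and the standard-population weights in the example are placeholders rather than actual U.S. Census proportions. Direct standardization weights each age group's crude prevalence by that group's share of the standard population.

```python
import math
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set, Tuple

# Assumed cut-offs (mmHg); the project description does not state its thresholds.
SBP_CUTOFF, DBP_CUTOFF = 140, 90
# Age groups used for standardization: 18-39, 40-59, >= 60.
AGE_GROUPS = [(18, 39), (40, 59), (60, math.inf)]


@dataclass
class Person:
    # Hypothetical, simplified record; not the Researcher Workbench schema.
    age: int
    htn_code_dates: Set[date] = field(default_factory=set)  # dates with an essential-HTN SNOMED code
    n_htn_meds: int = 0                                     # HTN medications recorded in the EHR
    bp_readings: List[Tuple[int, int]] = field(default_factory=list)  # (systolic, diastolic), in order


def has_hypertension(p: Person) -> bool:
    # Primary EHR definition: code on >= 2 distinct dates and >= 1 HTN medication.
    if len(p.htn_code_dates) >= 2 and p.n_htn_meds >= 1:
        return True
    # Supplementary definition (assumed reading): mean of measurements 2 and 3 is elevated.
    if len(p.bp_readings) >= 3:
        (s2, d2), (s3, d3) = p.bp_readings[1], p.bp_readings[2]
        return (s2 + s3) / 2 >= SBP_CUTOFF or (d2 + d3) / 2 >= DBP_CUTOFF
    return False


def age_group(age: int) -> Optional[int]:
    for i, (lo, hi) in enumerate(AGE_GROUPS):
        if lo <= age <= hi:
            return i
    return None  # younger than 18: outside the standardization groups


def age_standardized_prevalence(people: List[Person], std_weights: List[float]) -> float:
    """Direct standardization: sum over groups k of w_k * (cases_k / n_k)."""
    if not math.isclose(sum(std_weights), 1.0):
        raise ValueError("standard-population weights must sum to 1")
    cases = [0] * len(AGE_GROUPS)
    totals = [0] * len(AGE_GROUPS)
    for p in people:
        g = age_group(p.age)
        if g is None:
            continue
        totals[g] += 1
        cases[g] += int(has_hypertension(p))
    if 0 in totals:
        raise ValueError("every age group needs at least one participant")
    return sum(w * c / n for w, c, n in zip(std_weights, cases, totals))


people = [
    Person(25),
    Person(30, {date(2019, 1, 1), date(2019, 6, 1)}, 1),
    Person(45, bp_readings=[(150, 95), (145, 92), (141, 88)]),
    Person(50, bp_readings=[(120, 80), (118, 91), (122, 90)]),
    Person(65, {date(2020, 3, 3)}, 2),
    Person(70, {date(2018, 1, 1), date(2018, 2, 1)}, 1),
]
# Illustrative weights only, NOT actual U.S. Census proportions.
weights = [0.4, 0.35, 0.25]
print(age_standardized_prevalence(people, weights))
```

Storing code dates in a set makes the "2 distinct dates" rule a simple length check, since repeated codes on the same day collapse to one entry.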

Anticipated Findings

The prevalence of hypertension in the All of Us cohort is similar to that reported in the published literature. The All of Us age-adjusted HTN prevalence was 27.9%, compared with 29.6% in NHANES. The All of Us cohort is a growing source of diverse longitudinal data that can be used to study hypertension nationwide. The prevalence of hypertension in the United States (U.S.) varies by age, sex, and socioeconomic status. Hypertension can often be treated successfully with medication, and prevented or delayed with lifestyle modifications. Even with these established intervention and prevention strategies, the prevalence of hypertension remains at a level of public health concern. The diversity within All of Us may provide insight into factors relevant to hypertension prevention and treatment in a variety of social and geographic contexts and population strata in the U.S.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital
  • Roxana Loperena Cortes - Other, All of Us Program Operational Use
  • Elizabeth Karlson - Late Career Tenured Researcher, Massachusetts General Hospital

Data element analysis of AllofUS

Project Purpose(s)

  • Methods Development

Scientific Questions Being Studied

We will explore the All of Us datasets to discover how different data elements are used and how they are used to define and categorize different cohorts of patients. The analysis will also aim to identify different phenotyping methods based on the data elements available in the dataset.

Scientific Approaches

Using the complete dataset, we will study the volume and usage of data elements potentially used for phenotyping. This will include an analysis of the volume of each data element and the diversity of values used for distinct data elements.
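A minimal sketch of this kind of element profiling in Python, assuming each record is available as a dictionary: for every element it reports the fill rate (share of records with a populated value), the number of distinct values, and the most common values. The record layout and element names are hypothetical; the same counts could also be computed with SQL queries over the relevant tables.

```python
from collections import Counter


def profile_elements(rows, elements, top_n=3):
    """Summarize volume and value diversity for each data element.

    rows: list of dicts (one per record); elements: element names to profile.
    A value counts as populated unless it is missing, None, or an empty string.
    """
    n = len(rows)
    profile = {}
    for element in elements:
        populated = [r.get(element) for r in rows if r.get(element) not in (None, "")]
        counts = Counter(populated)
        profile[element] = {
            "fill_rate": len(populated) / n if n else 0.0,
            "distinct_values": len(counts),
            "most_common": counts.most_common(top_n),
        }
    return profile


# Hypothetical records; element names are illustrative, not the AoU schema.
records = [
    {"unit": "mg", "route": "oral"},
    {"unit": "mg", "route": ""},
    {"unit": "mL", "route": None},
    {"unit": "mg"},
]
for name, stats in profile_elements(records, ["unit", "route"]).items():
    print(name, stats)
```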

Anticipated Findings

Our findings will include an understanding of how each data element is used, how often values are populated, and what the common values are for different elements, all in an effort to discover different techniques for phenotype development.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Craig Mayer - Project Personnel, NIH

Data Exploration and Familiarization

Project Purpose(s)

  • Social / Behavioral
  • Other Purpose (The purpose of this workspace is for me to become familiar with the dataset, the workspace and the tools)

Scientific Questions Being Studied

At this point I do not have a specific hypothesis. I'd like to explore the dataset and become familiar with the environment and tools to better understand what research studies I can do with these data in the future.

Scientific Approaches

I am currently most interested in the time-series data from wearable devices. In particular, I'd like to explore the use of intraday time-series heart rate and activity data to detect stress exposure. I plan to develop Python scripts to process and analyze these data.

Anticipated Findings

The expected outcome will be better familiarity with tools and data elements and better understanding of the possibilities and limitations of the dataset.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Serguei Pakhomov - Late Career Tenured Researcher, University of Minnesota

Data Management

Project Purpose(s)

  • Educational

Scientific Questions Being Studied

Students may be asked to verbally give a brief summary of what they learned from the reading during the lecture portion of the class. This summary, along with discussion during class and engagement over Teams, will contribute to the instructors’ subjective assessment of students’ participation.
A final project will be assigned around the fourth week of the class. This project will tie together multiple concepts introduced in the course. In the last class session, each student will turn in a writeup of their final project and present their work to the class.

Scientific Approaches

Some of the work over the course of the semester will include review of tools, processes, and data across the longitudinal cohort of AoU participants.

Anticipated Findings

A final project will be assigned around the fourth week of the class. This project will tie together multiple concepts introduced in the course. In the last class session, each student will turn in a writeup of their final project and present their work to the class.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Marily Barron - Graduate Trainee, Vanderbilt University

Data Quality and Data Characterization

Project Purpose(s)

  • Educational
  • Methods Development

Scientific Questions Being Studied

This research project will use AoU data to test data quality methods. It will also use AoU data as reference benchmark data for testing data quality. The analysis will include data characterization. Data will be analyzed to determine whether they conform to expected patterns.

Scientific Approaches

The Achilles R package developed by OHDSI is an example of a data quality and data characterization tool. The approach will include running SQL or other analytical queries on the AoU dataset.

Anticipated Findings

We will understand how the data are structured as a whole and how the data differ across sites.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Vojtech Huser - Other, NIH

Collaborators:

  • Craig Mayer - Project Personnel, NIH

DataExploration

Project Purpose(s)

  • Social / Behavioral
  • Educational
  • Methods Development

Scientific Questions Being Studied

Explore the data collected so far and determine the types of research and education activities we can perform in future work.

Scientific Approaches

Descriptive statistics will be calculated to understand the data. In certain cases, we will also use data visualization. We will use Python and R packages for the data analysis.

Anticipated Findings

A clear understanding of the current data set.

Demographic Categories of Interest

  • Age
  • Geography
  • Disability Status
  • Access to Care

Research Team

Owner:

  • Leming Zhou - Project Personnel, University of Pittsburgh

Dataset V4

Project Purpose(s)

  • Population Health
  • Ancestry

Scientific Questions Being Studied

I propose to develop and apply a local ancestry mapping approach to jointly characterize the genetic and environmental contributions to health disparities in diverse cosmopolitan populations. The approach I propose is powered by biobank-scale data sets, which combine genome-wide genetic information with rich sources of environmental, lifestyle and clinical data for many thousands of participants represented in electronic health records. My approach entails a combination of algorithm development, genome analytics, and electronic health record analysis, with an emphasis on the development and application of genetic ancestry-inference algorithms to address specific questions regarding the relationship between genetic ancestry, environment and health outcomes.
The specific aims for the initial phase of my research program are:
Genetic ancestry inference at scale. Develop algorithms for genetic ancestry inference at biobank scale, with an emphasis on local ancestry inference.

Scientific Approaches

Step 1: Objectively identify coherent ancestry groups without relying on self-identified race/ethnicity (SIRE) labels or ancestral reference populations.
Algorithm development and approach: Principal component analysis (PCA) of All of Us genotypes followed by clustering can be used to objectively identify 'coherent' ancestry groups. Once groups are ascertained using this unbiased approach, group labels can be determined using SIRE or reference populations.
Step 2: Characterize the remaining 'admixed' individuals as a combination of ancestry components from the coherent ancestry groups.
Algorithm development and approach: Once coherent groups have been identified, the remaining admixed individuals can be characterized as combinations of haplotypes with origins from the coherent ancestry groups. The current state-of-the-art approach available to the research community, RFMix, is highly accurate but does not scale well to larger datasets.

Anticipated Findings

I will initially apply this approach broadly to all existing health-related phenotypes in the All of Us dataset and later focus on specific health-related phenotypes as informed by the data. Subsequent work, especially with the All of Us program, will include a more hypothesis-driven approach that relies on the analysis of specific health-related phenotypes of interest, particularly as they relate to known health disparities and/or conditions of interest to specific communities.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Geography

Research Team

Owner:

  • Leonardo Marino-Ramirez - Senior Researcher, NIH

Dementia-Hypertension-Diabetes-2

Project Purpose(s)

  • Disease Focused Research (dementia)
  • Methods Development
  • Other Purpose (This work is the result of an All of Us Research Program Demonstration Project. Demonstration Projects are efforts by the All of Us Research Program designed to meet the goal of ensuring the quality and utility of the Research Hub as a resource for accelerating precision medicine. This work has been approved, reviewed, and overseen by the All of Us Research Program Science Committee and Data and Research Center to ensure compliance with program policy.)

Scientific Questions Being Studied

Alzheimer’s disease is a neurodegenerative condition characterized by a progressive decline in cognitive function (dementia). Studies suggest that patients with elevated blood pressure (hypertension) are at increased risk of Alzheimer’s disease-type dementias. High blood sugar levels or type 2 diabetes mellitus may also be associated with an increased risk of dementia. Some minority populations may have an increased incidence of hypertension and diabetes; for example, African Americans have a higher incidence of hypertension. Therefore, we will investigate the incidence of hypertension, diabetes, and dementia across racial and ethnic categories to determine whether minority groups have a stronger association between dementia and these comorbidities.
The goal of this demonstration project is to validate previous research showing potential interactions between dementia, diabetes, and hypertension, with an explicit consideration of race/ethnicity.

Scientific Approaches

Data from participants aged 40 or over will be subjected to statistical analysis to identify interactions between the incidence of dementia, diabetes, and hypertension, and self-identified race/ethnicity. We will only analyze participants in this age group because the incidence of dementia is very low in people younger than 40. We will only analyze participants with electronic health record data, because these data are needed to establish whether participants have had a diagnosis of hypertension, dementia, or diabetes.

The statistical analysis package R will be used to create contingency tables, perform chi-squared and Cochran-Mantel-Haenszel tests. Figures will be created in R.
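The Cochran-Mantel-Haenszel test named above can be sketched in pure Python to show what it computes: it pools 2x2 tables across strata (here, hypothetically, race/ethnicity groups) and asks whether exposure and outcome are associated after conditioning on stratum. This is an illustration with invented counts, not the project's R code; for real analyses R's `mantelhaen.test` would be the natural choice. The optional 0.5 continuity correction is included because R's implementation applies one by default, though exact agreement with R's output is not claimed.

```python
import math


def cmh_test(strata, correct=True):
    """Cochran-Mantel-Haenszel test for K stratified 2x2 tables.

    Each stratum is ((a, b), (c, d)): rows = exposure yes/no,
    columns = outcome yes/no. Returns (statistic, p_value) for a
    chi-squared distribution with 1 degree of freedom.
    """
    delta = 0.0     # sum over strata of (observed a - expected a)
    variance = 0.0  # sum over strata of Var(a) under conditional independence
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        if n < 2:
            continue  # variance is undefined for strata with fewer than 2 people
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        delta += a - row1 * col1 / n
        variance += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if variance == 0:
        raise ValueError("no informative strata")
    # Optional continuity correction of 0.5, applied only when |delta| >= 0.5.
    adj = 0.5 if correct and abs(delta) >= 0.5 else 0.0
    statistic = (abs(delta) - adj) ** 2 / variance
    # Upper tail of chi-squared(1): P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts, one table per race/ethnicity group:
# rows = hypertension yes/no, columns = dementia yes/no.
strata = [
    ((10, 20), (5, 25)),
    ((8, 12), (4, 16)),
]
print(cmh_test(strata, correct=False))
print(cmh_test(strata))
```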

Anticipated Findings

We expect that our data will confirm an increased rate of dementia in African Americans with hypertension and diabetes, compared with white participants. We will determine whether other minority groups also show differences in the incidence of dementia, hypertension, and diabetes, and in the interactions between them.

If there is an increased incidence of dementia in people with hypertension or diabetes, this may suggest that populations with these disorders need more careful monitoring of their conditions, as these conditions may increase the chance of developing dementia. Future All of Us projects may be able to determine whether long-term control of hypertension (or of diabetes/blood glucose) reduces the potential for developing dementia.

Demographic Categories of Interest

  • Race / Ethnicity

Research Team

Owner:

  • Robert Meller - Mid-career Tenured Researcher, Morehouse School of Medicine

Collaborators:

  • Shashwat Deepali Nagar - Graduate Trainee, Georgia Tech
  • Roxana Loperena Cortes - Other, All of Us Program Operational Use
  • Francis Ratsimbazafy - Other, All of Us Program Operational Use
  • King Jordan - Mid-career Tenured Researcher, Georgia Tech

Demo for webinar-2.11-2021

Project Purpose(s)

  • Educational

Scientific Questions Being Studied

This is a demo for potential research users of the All of Us research data hub. The demo shows how to set up a workspace using a query for the following dataset:

  • women
  • Hispanic
  • ages 20-55
  • diagnosed with type 2 diabetes
  • diagnosed with depression (uncomplicated)
  • prescribed metformin
  • prescribed injectable insulin
  • prescribed an antidepressant (citalopram)
  • living at or below the federal poverty level

Scientific Approaches

This data set is being developed for demonstration purposes. The data set will include participants with the following characteristics:

  • women
  • Hispanic
  • ages 20-55
  • diagnosed with type 2 diabetes
  • diagnosed with depression (uncomplicated)
  • prescribed metformin
  • prescribed injectable insulin
  • prescribed an antidepressant (citalopram)
  • living at or below the federal poverty level

Simple quantitative analyses will be conducted for demonstration purposes.

Anticipated Findings

As this is a demonstration example only, there are no expectations of further analysis of the data or that it will contribute to the body of scientific knowledge in the field.

Demographic Categories of Interest

  • Race / Ethnicity
  • Income Level

Research Team

Owner:

  • Mary Helen Mays - Mid-career Tenured Researcher, University of Puerto Rico Medical Sciences

Demo- Healthy Obesity

Project Purpose(s)

  • Disease Focused Research (obesity)
  • Methods Development
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. This work was reviewed and overseen by the All of Us Research Program Science Committee and the All of Us Data and Research Center to ensure compliance with program policy, including acceptable data access and use.)

Scientific Questions Being Studied

The objective of this study was to evaluate the validity of self-reported criteria of healthy obesity in the All of Us cohort using their medical records as the gold standard.

Scientific Approaches

Data for this study were drawn from the All of Us (AoU) Research Program and included self-reported health questionnaire responses collected at the time of enrollment as well as routinely collected health information extracted from electronic health records (EHRs). Self-reported data to characterize healthy obesity were derived from the Personal Medical History Questionnaire based on the following health conditions: congestive heart failure, coronary heart disease, heart attack, high cholesterol, hypertension, prediabetes, type 1 diabetes, type 2 diabetes, and obesity. Our statistical analysis will calculate the positive predictive value (PPV): the number of participants who self-reported healthy obesity and whose status was subsequently confirmed by their medical records (true positives), divided by the total number of participants who self-reported healthy obesity.
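The PPV calculation described above can be sketched in Python. This is a hypothetical illustration: treating "healthy obesity" as obesity with none of the other listed conditions is an assumption about how the listed conditions are combined, the same rule is assumed to apply to the EHR as the gold standard, and the participant data are invented.

```python
# Conditions from the Personal Medical History list that, when present,
# are assumed here to rule out "healthy" obesity.
CARDIOMETABOLIC = frozenset({
    "congestive heart failure", "coronary heart disease", "heart attack",
    "high cholesterol", "hypertension", "prediabetes",
    "type 1 diabetes", "type 2 diabetes",
})


def is_healthy_obesity(conditions):
    """Obesity with none of the listed cardiometabolic conditions (assumed rule)."""
    return "obesity" in conditions and not (conditions & CARDIOMETABOLIC)


def positive_predictive_value(self_report, ehr):
    """PPV = true positives / all self-reported positives.

    self_report, ehr: dicts mapping participant ID -> set of condition names.
    Only participants present in both sources are evaluated (an assumption).
    """
    shared = self_report.keys() & ehr.keys()
    reported = [pid for pid in shared if is_healthy_obesity(self_report[pid])]
    if not reported:
        raise ValueError("no self-reported cases of healthy obesity")
    confirmed = [pid for pid in reported if is_healthy_obesity(ehr[pid])]
    return len(confirmed) / len(reported)


# Hypothetical data for illustration only.
survey = {1: {"obesity"}, 2: {"obesity"}, 3: {"obesity", "hypertension"}, 4: {"obesity"}}
records = {1: {"obesity"}, 2: {"obesity", "prediabetes"}, 3: {"obesity"}, 4: {"obesity"}}
print(positive_predictive_value(survey, records))
```

Here participants 1, 2, and 4 self-report healthy obesity, and the EHR confirms 1 and 4, so the PPV is 2/3.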

Anticipated Findings

The results of this study will provide methods for population-based estimates of healthy obesity using self-reported data.

Demographic Categories of Interest

  • Geography

Research Team

Owner:

  • Dominick Lemas - Project Personnel, University of Florida

Collaborators:

  • Matthew McConnell - Project Personnel, University of Florida

Demographics of Mammography 2020_04

Project Purpose(s)

  • Population Health
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Questions Being Studied

Mammography is an effective screening tool for breast cancer, often identifying tumors that can be treated before they develop invasive potential. Across the United States, it is estimated that 65% of women aged 40 and above have received a screening mammogram. However, smaller studies using data from electronic health records suggest that (1) the actual screening rate may be lower and (2) mammography screening differs by racial, ethnic, and sociodemographic characteristics, and that lower rates of mammography screening may contribute to disparities in breast cancer mortality.

In this demonstration project, we will describe the distribution of mammography screening captured by the submitted electronic health records in the large and diverse participant sample of the All of Us Research Program. Further, we will describe the participant characteristics that are associated with mammography rates in women during the ages in which national guidelines suggest routine screening.

Scientific Approaches

After restricting our sample to All of Us research participants with electronic health record information, we will identify rates of mammography screening using the procedure and diagnosis tables. Using the participant provided information from the surveys, we will use logistic regression to identify participant characteristics that are associated with higher or lower rates of screening.

Anticipated Findings

Some prior research has attempted to validate self-reported mammography screening against electronic health record verification of the screening. Largely, this research has found that (1) mammography rates are likely lower than self-report suggests and (2) certain patient characteristics are associated with lower rates of screening.

We anticipate that these findings will largely hold in the All of Us study population, and that the diversity of the All of Us participants will allow us to better identify those who may need more assistance to achieve the recommended screening frequency.

Demographic Categories of Interest

  • Race / Ethnicity
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Molly Scannell Bryan - Early Career Tenure-track Researcher, University of Illinois at Chicago

Demonstration Project - Pregnancy and Medicaid Expansion

Project Purpose(s)

  • Disease Focused Research (Pregnancy)
  • Population Health

Scientific Questions Being Studied

We plan to compare pregnancy health outcomes between states that have and have not pursued Medicaid expansion. Based on previous studies, Medicaid expansion improves health outcomes in pregnant women by increasing access to prenatal and postnatal care, increasing utilization of medical services, and reducing racial disparities.

Scientific Approaches

Not available.

Anticipated Findings

We expect to replicate previous studies that have shown Medicaid expansion improves health outcomes in pregnant women by increasing access to prenatal and postnatal care, increasing utilization of medical services, and reducing racial disparities.

Demographic Categories of Interest

  • Geography
  • Access to Care
  • Income Level

Research Team

Owner:

  • Julia Vogel - Project Personnel, Scripps Research

Collaborators:

  • Shaquille Peters - Project Personnel, Scripps Research
  • Lauren Ariniello

Depression

Project Purpose(s)

  • Disease Focused Research (major depressive disorder)
  • Social / Behavioral
  • Ancestry

Scientific Questions Being Studied

My purpose is to investigate the underlying genetic architecture of major depressive disorder in All of Us participants.

Scientific Approaches

My primary approach is to use a GWAS in available ancestral groups, probably using PLINK or GEMMA depending on the structure of the data. I will use summary statistics from this approach to investigate overlap between other large cohorts and traits.

I would also like to apply polygenic risk scores to assess genetic risk prediction in an independent cohort.

Anticipated Findings

Combined with other available depression datasets, this could identify novel risk loci for depression. Downstream in silico analysis will seek to better understand the complex underlying biology of depression. I am particularly interested in advancing the current state of the field in African and Hispanic ancestries, which are currently underrepresented.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Research Team

Owner:

  • Daniel Levey - Other, Yale University

depression

Project Purpose(s)

  • Population Health ...

Scientific Questions Being Studied

Building on the framework of marginalization-related diminished returns (MDRs), my research uses All of Us data with the following aims:
1- to test diminished returns of SES for marginalized groups compared with US-born heterosexual White people
2- to test differential associations between SES and depression across diverse racial and ethnic groups
3- to test differential associations between depression and chronic disease across diverse racial and ethnic groups

Scientific Approaches

I use a cross-sectional design to test group differences in the associations among demographic factors (age and sex), SES (education and income), chronic disease (heart disease, asthma, etc.), and depression. I use regression to test whether the protective effects of education against depression, chronic disease, or health risk behaviors are smaller for Black than for White people. A similar question is asked comparing Hispanic and non-Hispanic people. The models include main effects of SES, main effects of race/ethnicity, and statistical interactions between race/ethnicity and SES.
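Suppose SES is a binary indicator (for example, high vs. low education) and group membership is also binary. Then a model with both main effects and an SES × group interaction is saturated, and its least-squares coefficients equal simple contrasts of the four cell means. The interaction term is the difference-in-differences that quantifies a diminished return. This sketch assumes 0/1 coding for SES and group and a numeric outcome; a 0/1 depression outcome turns it into a linear probability model. Continuous SES, covariates, or logistic models would need a regression package. Names and numbers are hypothetical.

```python
from statistics import mean


def saturated_interaction(records):
    """Coefficients of y = b0 + b_ses*SES + b_group*GROUP + b_int*SES*GROUP.

    records: iterable of (high_ses, marginalized, outcome) with 0/1 indicators.
    With both predictors binary, OLS estimates equal these cell-mean contrasts.
    """
    cells = {(s, g): [] for s in (0, 1) for g in (0, 1)}
    for high_ses, marginalized, outcome in records:
        cells[(high_ses, marginalized)].append(outcome)
    if any(len(values) == 0 for values in cells.values()):
        raise ValueError("every SES-by-group cell needs at least one observation")
    m = {key: mean(values) for key, values in cells.items()}
    return {
        "intercept": m[(0, 0)],                      # low SES, reference group
        "ses": m[(1, 0)] - m[(0, 0)],                # SES effect in reference group
        "group": m[(0, 1)] - m[(0, 0)],              # group gap at low SES
        # How much the SES effect differs for the marginalized group
        "ses_x_group": (m[(1, 1)] - m[(0, 1)]) - (m[(1, 0)] - m[(0, 0)]),
    }


# Hypothetical depression scores: high SES lowers scores by 4 in the reference group
# but only by 1 in the marginalized group, giving an interaction of +3.
data = [(0, 0, 10), (0, 0, 10), (1, 0, 6), (1, 0, 6),
        (0, 1, 11), (0, 1, 11), (1, 1, 10), (1, 1, 10)]
coefs = saturated_interaction(data)
```

When higher scores mean worse depression, a positive interaction means SES reduces depression less in the marginalized group. That is the pattern the diminished-returns hypothesis predicts.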

Anticipated Findings

My expected findings are:
1- diminished returns of SES for marginalized groups
2- differential associations between SES and depression across diverse groups
3- differential associations between depression and chronic disease across diverse groups
In general, associations are expected to be strongest in White and weakest in Black populations. The same pattern is expected among children, youth, adults, and older adults.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Shervin Assari - Mid-career Tenured Researcher, Charles R. Drew University of Medicine and Science

Dermatology

Project Purpose(s)

  • Disease Focused Research (psoriasis, atopic dermatitis, acne, and more) ...

Scientific Questions Being Studied

Dermatologic diseases in minority populations

Scientific Approaches

The work will be exploratory at first. We plan to conduct cohort comparisons among psoriasis, acne, atopic dermatitis, and other rare dermatologic diseases, and to examine cross-sectional data for these diseases.

Anticipated Findings

We expect to expand what is known about dermatologic disease in minority populations.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Ahmed Yousaf - Research Fellow, West Virginia University

Determinants of cardiovascular disease across minority populations

Project Purpose(s)

  • Disease Focused Research (cardiovascular system disease)
  • Population Health ...
  • Social / Behavioral
  • Ancestry

Scientific Questions Being Studied

Cardiovascular diseases (CVD) are responsible for a substantial proportion of the morbidity and mortality observed in the general population. Mounting evidence indicates that this burden disproportionately affects minority populations. This disproportionate effect is present not only in minorities defined by race/ethnicity, but also in those defined by age, sexual orientation, and other characteristics. The main questions of this study are: (1) Can we use All of Us to identify novel risk factors for cardiovascular disease that are specific to a given minority group? (2) Are existing risk factors for CVD shared across all minority groups? (3) How do the effects of these risk factors vary when considering more than one minority group? These questions are important to (1) identify groups of persons at particularly high risk of these conditions who may benefit from tailored diagnostic and therapeutic interventions; and (2) identify new treatments for these conditions.

Scientific Approaches

We will use the All of Us dataset V3. We will identify variables that represent (1) cardiovascular disease (myocardial infarction, coronary artery disease, stroke); (2) all known risk factors for each of these conditions; (3) physiological variables that either define a risk factor or are associated with risk of cardiovascular disease (blood pressure, cholesterol levels, hemoglobin A1C); and (4) the minority groups of interest. We will use linear and logistic regression to test for associations between risk factors and the conditions of interest.
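As a simplified illustration of the logistic-regression step, consider a single binary risk factor with no covariates. The logistic coefficient then equals the log odds ratio from a 2x2 table, and its Wald standard error is sqrt(1/a + 1/b + 1/c + 1/d). The sketch computes that unadjusted odds ratio separately within each minority group, so the effect of a risk factor can be compared across groups. It assumes the analyst has already tabulated counts per group. The models described above would add covariates, which requires a regression package. Function names and counts are hypothetical.

```python
import math


def odds_ratio_wald(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval (default ~95%)."""
    a, b, c, d = exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: Wald interval undefined; consider a continuity correction")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def odds_ratios_by_group(tables):
    """tables: {group_name: (exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases)}."""
    return {group: odds_ratio_wald(*counts) for group, counts in tables.items()}


# Hypothetical counts for one risk factor (e.g. hypertension) and one outcome (e.g. stroke)
results = odds_ratios_by_group({
    "group_a": (20, 80, 10, 90),
    "group_b": (30, 70, 10, 90),
})
```

Non-overlapping intervals across groups hint at effect modification. A formal test would add a risk factor × group interaction term to the regression.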

Anticipated Findings

We expect to find that: (1) a substantial number of the known vascular risk factors increase the risk of cardiovascular disease across all evaluated groups; (2) known risk factors for cardiovascular disease disproportionately affect some minority groups; and (3) the effects of these risk factors will be stronger in some minority groups. These findings will help us (1) identify groups of persons at particularly high risk of these conditions who may benefit from tailored diagnostic and therapeutic interventions; and (2) identify new treatments for these conditions.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Guido Falcone - Early Career Tenure-track Researcher, Yale University

Collaborators:

  • Julian Acosta - Research Fellow, Yale University
  • Audrey Leasure - Graduate Trainee, Yale University

Determinants of cardiovascular disease across minority populations V4

Project Purpose(s)

  • Disease Focused Research (cardiovascular system disease)
  • Population Health ...
  • Social / Behavioral
  • Ancestry

Scientific Questions Being Studied

Cardiovascular diseases (CVD) are responsible for a substantial proportion of the morbidity and mortality observed in the general population. Mounting evidence indicates that this burden disproportionately affects minority populations. This disproportionate effect is present not only in minorities defined by race/ethnicity, but also in those defined by age, sexual orientation, and other characteristics. The main questions of this study are: (1) Can we use All of Us to identify novel risk factors for cardiovascular disease that are specific to a given minority group? (2) Are existing risk factors for CVD shared across all minority groups? (3) How do the effects of these risk factors vary when considering more than one minority group? These questions are important to (1) identify groups of persons at particularly high risk of these conditions who may benefit from tailored diagnostic and therapeutic interventions; and (2) identify new treatments for these conditions.

Scientific Approaches

We will use the All of Us dataset V4. We will identify variables that represent (1) cardiovascular disease (myocardial infarction, coronary artery disease, stroke); (2) all known risk factors for each of these conditions; (3) physiological variables that either define a risk factor or are associated with risk of cardiovascular disease (blood pressure, cholesterol levels, hemoglobin A1C); and (4) the minority groups of interest. We will use linear and logistic regression to test for associations between risk factors and the conditions of interest.

Anticipated Findings

We expect to find that: (1) a substantial number of the known vascular risk factors increase the risk of cardiovascular disease across all evaluated groups; (2) known risk factors for cardiovascular disease disproportionately affect some minority groups; and (3) the effects of these risk factors will be stronger in some minority groups. These findings will help us (1) identify groups of persons at particularly high risk of these conditions who may benefit from tailored diagnostic and therapeutic interventions; and (2) identify new treatments for these conditions.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Julian Acosta - Research Fellow, Yale University

Collaborators:

  • Guido Falcone - Early Career Tenure-track Researcher, Yale University
  • Audrey Leasure - Graduate Trainee, Yale University

Diabetes and CVD Risk Factor Control and Treatment Patterns

Project Purpose(s)

  • Disease Focused Research (Cardiovascular disease and diabetes) ...

Scientific Questions Being Studied

Understanding the spectrum of risk found in persons with diabetes and atherosclerotic cardiovascular disease (ASCVD) in a contemporary cohort of US adults can help target the intensity of therapeutic approaches to prevent future adverse outcomes.

The project will examine the following questions:

1. Among those with diabetes, how many are at higher risk (two or more risk factors), and among those with ASCVD, how many are at very high risk according to the 2018 cholesterol guidelines?
2. Among these higher- versus lower-risk patients with ASCVD and DM, what is the adherence to non-smoking, a healthy diet, and regular physical activity?
3. Among these groups, what proportion is at target for BP (<130/80 mmHg), LDL-C (<70 mg/dL if CVD, <100 mg/dL otherwise), and HbA1c (<8% with CVD or <7% without)?
4. How does the extent of recommended medication use, including statins, BP medication, DM medication, antiplatelet/aspirin therapy, and influenza vaccination, compare among these groups?
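The target definitions in question 3 can be expressed as a per-patient check. This sketch assumes systolic and diastolic BP in mmHg, LDL-C in mg/dL, HbA1c in percent, and a boolean CVD flag. It also assumes that "<130/80" requires both BP components to be below threshold. The function names are hypothetical.

```python
def target_attainment(sbp, dbp, ldl_mg_dl, hba1c_pct, has_cvd):
    """Check BP, LDL-C, and HbA1c targets; LDL-C and HbA1c thresholds depend on CVD status."""
    bp_ok = sbp < 130 and dbp < 80
    ldl_ok = ldl_mg_dl < (70 if has_cvd else 100)
    a1c_ok = hba1c_pct < (8.0 if has_cvd else 7.0)
    return {"bp": bp_ok, "ldl": ldl_ok, "hba1c": a1c_ok,
            "all_three": bp_ok and ldl_ok and a1c_ok}


def proportion_at_all_targets(patients):
    """patients: list of dicts with the keyword arguments of target_attainment."""
    results = [target_attainment(**p) for p in patients]
    return sum(r["all_three"] for r in results) / len(results)


# Hypothetical patients with identical measurements but different CVD status
cohort = [
    dict(sbp=125, dbp=78, ldl_mg_dl=65, hba1c_pct=7.5, has_cvd=True),
    dict(sbp=125, dbp=78, ldl_mg_dl=65, hba1c_pct=7.5, has_cvd=False),
]
share = proportion_at_all_targets(cohort)
```

The same HbA1c of 7.5% meets the target with CVD (<8%) but not without it (<7%). This is why CVD status must be assigned before computing the proportions.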

Scientific Approaches

We will use participant-provided information (PPI), electronic health records (EHR), and physical measurements, which provide information about participants' overall health status, lifestyle, medication history, serum biochemistry, and demographic characteristics.

We will categorize the sample by the presence or absence of diabetes mellitus (and, within diabetes, those with vs. without multiple risk factors) or cardiovascular disease (and, among such persons, those at high vs. very high risk), and use chi-square tests to compare adherence to lifestyle measures (#2 above), risk factor control (#3 above), and recommended medication use (#4 above). Multiple logistic regression will be used to examine whether single and multiple risk factor control differ across these conditions and, within each condition, between those at higher versus lower risk, and to test differences in health behaviors and risk factors among groups.
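The chi-square comparisons described above work on a contingency table of counts: risk groups in rows, and adherent vs. non-adherent (or at target vs. not) in columns. This sketch computes the Pearson statistic and degrees of freedom for an r x c table. The p-value would then come from a chi-square distribution, for example via SciPy's `scipy.stats.chi2_contingency`. Note that this SciPy function applies Yates' continuity correction to 2x2 tables by default, so its statistic can differ from this uncorrected one. The function name and counts are hypothetical.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("empty row or column: expected counts would be zero")
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand  # count under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = lower-risk / higher-risk, columns = adherent / not adherent
stat, dof = chi_square_independence([[10, 20], [20, 10]])
```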

Anticipated Findings

We anticipate different patterns of health behaviors and risk factor control across the spectrum of diabetes and cardiovascular disease, which will help identify possible prevention strategies for different kinds of DM and CVD and advance precision medicine for these patients.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Yufan Gong - Graduate Trainee, University of California, Los Angeles

Collaborators:

  • Nathan Wong - Other, University of California, Irvine

Diabetes and CVD Risk Factor Control and Treatment Patterns using v4 data

Project Purpose(s)

  • Disease Focused Research (Cardiovascular disease and diabetes) ...

Scientific Questions Being Studied

Understanding the spectrum of risk found in persons with diabetes and atherosclerotic cardiovascular disease (ASCVD) in a contemporary cohort of US adults can help target the intensity of therapeutic approaches to prevent future adverse outcomes.

The project will examine the following questions:

1. Among those with diabetes, how many are at higher risk (two or more risk factors), and among those with ASCVD, how many are at very high risk according to the 2018 cholesterol guidelines?
2. Among these higher- versus lower-risk patients with ASCVD and DM, what is the adherence to non-smoking, a healthy diet, and regular physical activity?
3. Among these groups, what proportion is at target for BP (<130/80 mmHg), LDL-C (<70 mg/dL if CVD, <100 mg/dL otherwise), and HbA1c (<8% with CVD or <7% without)?
4. How does the extent of recommended medication use, including statins, BP medication, DM medication, antiplatelet/aspirin therapy, and influenza vaccination, compare among these groups?

Scientific Approaches

We will use participant-provided information (PPI), electronic health records (EHR), and physical measurements, which provide information about participants' overall health status, lifestyle, medication history, serum biochemistry, and demographic characteristics.

We will categorize the sample by the presence or absence of diabetes mellitus (and, within diabetes, those with vs. without multiple risk factors) or cardiovascular disease (and, among such persons, those at high vs. very high risk), and use chi-square tests to compare adherence to lifestyle measures (#2 above), risk factor control (#3 above), and recommended medication use (#4 above). Multiple logistic regression will be used to examine whether single and multiple risk factor control differ across these conditions and, within each condition, between those at higher versus lower risk, and to test differences in health behaviors and risk factors among groups.

Anticipated Findings

We anticipate different patterns of health behaviors and risk factor control across the spectrum of diabetes and cardiovascular disease, which will help identify possible prevention strategies for different kinds of DM and CVD and advance precision medicine for these patients.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Yufan Gong - Graduate Trainee, University of California, Los Angeles

Collaborators:

  • Nathan Wong - Other, University of California, Irvine

Diabetes Comparison Workspace

Project Purpose(s)

  • Other Purpose (This workspace's main purpose is to provide a place to learn first-hand how to create and analyze data in All of Us. The "research aim" of this project will be to compare diabetes patients and control patients; however, this is only meant as a directive toward the ultimate purpose of better understanding workspace creation and analysis in AoU.) ...

Scientific Questions Being Studied

What are the differences in A1c levels between diabetic and control populations, and how do these comparisons vary when controlling for other covariates (age, gender, race, and other demographic information)?

Scientific Approaches

We plan to use simple comparative statistical analyses, such as t-tests and Bayesian analyses, to explore group differences. Linear regression and more advanced modeling techniques (regularized regression, tree-based methods) may be used to further characterize differences between the groups. Most of the analysis will be conducted in R.
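As a concrete form of the t-test comparison, here is a sketch of Welch's unequal-variance t statistic with Welch-Satterthwaite degrees of freedom. This is the variant that R's `t.test` runs by default. It assumes two lists of A1c values (in percent) with at least two observations each and not both constant. It returns only the statistic and degrees of freedom; the p-value would come from a t distribution. The function name and values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(group_a), len(group_b)
    se1_sq = variance(group_a) / n1  # squared standard error of each group mean
    se2_sq = variance(group_b) / n2
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se1_sq + se2_sq)
    df = (se1_sq + se2_sq) ** 2 / (se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1))
    return t, df


# Hypothetical A1c values (%) for a diabetic group and a control group
t, df = welch_t([7.0, 8.0, 9.0], [5.0, 5.0, 5.0])
```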

Anticipated Findings

We anticipate that A1c levels will be higher among patients with diabetes and prediabetes than among controls.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Kyle Webb - Project Personnel, NIH

Collaborators:

  • Josh Denny - Other, All of Us Program Operational Use

DiabetesAndEyeDiseases

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

People who have diabetes are more likely to develop several eye diseases or conditions, such as diabetic retinopathy, cataracts, and open-angle glaucoma. For people with diabetes, it is important to get regular comprehensive dilated eye exams so these conditions can be identified and treated early. This study will survey cohorts of AoU participants with and without diabetes, compare the incidence rates of these eye conditions, and examine their EHR data for regular eye exams as a means of prevention.

Scientific Approaches

1) Building datasets: identify participants who have type 2 diabetes and randomly select an equal number of participants without type 2 diabetes
2) Search the EHR records of these participants to identify eye diseases and conditions
3) Run statistical analyses to find the difference in incidence rates between the two populations
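Steps 1 and 3 above can be sketched as follows. A fixed-seed random sample draws an equal-sized comparison group, and the groups are then compared by the proportion with any recorded eye condition. This is a simplification: a true incidence rate would count only new diagnoses per unit of person-time. The sketch assumes participant ID lists and a set of IDs with an eye-condition code already pulled from the EHR. All names are hypothetical.

```python
import random


def select_comparison_group(case_ids, non_case_ids, seed=0):
    """Randomly sample as many non-diabetic participants as there are diabetic ones."""
    if len(non_case_ids) < len(case_ids):
        raise ValueError("not enough non-diabetic participants to sample from")
    rng = random.Random(seed)  # fixed seed makes the selection reproducible
    return rng.sample(list(non_case_ids), len(case_ids))


def proportion_with_condition(ids, ids_with_condition):
    """Share of the given participants who have a recorded eye condition."""
    ids = list(ids)
    return sum(1 for pid in ids if pid in ids_with_condition) / len(ids)


def compare_groups(case_ids, control_ids, ids_with_condition):
    """Return (proportion in cases, proportion in controls, difference)."""
    p_cases = proportion_with_condition(case_ids, ids_with_condition)
    p_controls = proportion_with_condition(control_ids, ids_with_condition)
    return p_cases, p_controls, p_cases - p_controls
```

The two proportions could then be compared with a chi-square or two-proportion test.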

Anticipated Findings

We expect participants with type 2 diabetes to have a higher incidence of eye diseases or conditions than non-diabetic participants. If the EHR data show that participants with type 2 diabetes do not have more frequent eye exams than non-diabetic participants, the findings will provide evidence to recommend regular eye exams for diabetic patients.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • James Gao - Administrator, NIH

Diabetic Foot ulcer

Project Purpose(s)

  • Disease Focused Research (diabetes mellitus) ...

Scientific Questions Being Studied

We will evaluate differences in healthcare access among adults with diabetes and a foot ulcer by race/ethnicity, geography, and other socioeconomic factors.

Scientific Approaches

We will utilize the existing survey data from the All of Us Research Program (AOURP) to determine the disparities in healthcare access and utilization among participants with diabetes and those with foot ulcerations.

Anticipated Findings

We expect that racial/ethnic minorities, people living in rural areas, and people of low socioeconomic status will experience disparities in access to care and health care utilization compared with White people and the general population.

Demographic Categories of Interest

  • Race / Ethnicity
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Tze-Woei Tan - Mid-career Tenured Researcher, University of Arizona

Collaborators:

  • Chiu-Hsieh Hsu - Late Career Tenured Researcher, University of Arizona

Disease_convergence_and_lifestyle

Project Purpose(s)

  • Population Health
  • Methods Development ...
  • Ancestry

Scientific Questions Being Studied

Multiple genetic polymorphisms have been identified for complex diseases, but their relationships, such as the biological underpinnings of genetic interactions, remain elusive. Epigenomic studies have shown that genetic variants may have convergent effects, which increase the risk of developing complex diseases and comorbidities. We aim to prioritize genetic variants with convergent effects, and disease pairs with excess epigenomic similarity, using abundant biological resources such as ENCODE and GTEx. We will then study the agreement between the convergent effects and interactions of genetic variants in AllofUsRP, and the agreement between disease epigenomic similarity and disease comorbidities in AllofUsRP. Lifestyle and environmental exposures are critical risk factors, and their effects will be modeled as well. The research will help us understand disease mechanisms and missing heritability and foster applications such as drug repositioning.

Scientific Approaches

We have developed an information-theoretic similarity measure for quantifying the similarity of genetic variants and disease pairs from GTEx data. We have also developed a multi-omics integration method to quantify the overall similarity of genetic variants in ENCODE, and we will extend this method to quantify the epigenomic similarity of disease pairs. We aim to use AllofUsRP to validate the genetic interactions between genetic variants and comorbidities. Further, we will use logistic regression, LASSO, and deep learning methods to model diseases from lifestyle factors and genetic interactions.

Anticipated Findings

We expect to find many previously unrecognized biological links between the effects of distinct genetic variants, which may explain the increased risk of diseases and comorbidities. Using machine learning, we will build disease prediction models, particularly for diseases heavily influenced by lifestyle, such as cancers. The research will generate candidates for novel drug targets and drug repositioning approaches.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Haiquan Li - Early Career Tenure-track Researcher, University of Arizona

Collaborators:

  • Edwin Baldwin - Graduate Trainee, University of Arizona

Diseases with similar symptoms as COVID-19

Project Purpose(s)

  • Disease Focused Research (COVID-19) ...

Scientific Questions Being Studied

We want to find diseases with similar symptoms as COVID-19. Such diseases can be used as features in COVID-19 prediction models.

Scientific Approaches

We will use mutual information to find diseases/conditions that have high co-occurrence with COVID-19's common symptoms.
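For two binary indicators per participant (symptom present, condition present), mutual information measures how much knowing one reduces uncertainty about the other. The sketch below computes it in bits from empirical frequencies and ranks candidate conditions against one symptom. Mutual information is symmetric and is also high for strong negative association, so the direction of each top-ranked pair would still need checking. The 0/1 indicator lists and all names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (bits) between two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def rank_conditions(symptom, conditions):
    """conditions: {name: 0/1 list aligned with symptom}. Highest MI first."""
    scores = {name: mutual_information(symptom, flags) for name, flags in conditions.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical indicators for four participants
fever = [0, 0, 1, 1]
ranking = rank_conditions(fever, {"condition_a": [0, 0, 1, 1], "condition_b": [0, 1, 0, 1]})
```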

Anticipated Findings

We expect to find ~100 diseases/conditions that have high co-occurrence with COVID-19's common symptoms. Considering these diseases/conditions when building COVID-19 prediction models can help to reduce false-positive predictions.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Jifan Gao - Graduate Trainee, University of Wisconsin, Madison

Disparities in Kidney, Bladder, and Prostate Cancer

Project Purpose(s)

  • Disease Focused Research (cancer)
  • Population Health ...
  • Social / Behavioral
  • Control Set
  • Ancestry

Scientific Questions Being Studied

What are the underlying factors that drive disparities in kidney, bladder, and prostate cancer? How do these factors increase risk? Identifying risk factors is important because it may allow for more targeted screening, earlier detection, and personalized therapies.

Scientific Approaches

We plan to examine data on kidney, bladder, and prostate cancers, including analyses of associations with genetic, socioeconomic, and geographic variables.

Anticipated Findings

We anticipate that we may identify novel genetic, socioeconomic, or geographic factors that contribute to risk of genitourinary cancers. These findings may improve risk assessment for personalized screening, allow earlier identification of disease, or facilitate targeted therapies.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Jacob Knorr - Graduate Trainee, Cleveland Clinic

Distress and T2D

Project Purpose(s)

  • Disease Focused Research (type 2 diabetes mellitus) ...

Scientific Questions Being Studied

Depression, anxiety, and other forms of mental distress are frequently co-morbid with type 2 diabetes. Are there common risk factors between the two?

Scientific Approaches

We will compare the demographics and medications of individuals diagnosed with type 2 diabetes with and without mental distress.

Anticipated Findings

If we know that someone is at risk for mental distress, we might be able to provide increased support to mitigate the effects.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Sara Taylor - Research Fellow, Massachusetts Institute of Technology

DIVERS

Project Purpose(s)

  • Disease Focused Research (Baseline analysis of the population with documented vaccination record.)
  • Population Health ...
  • Social / Behavioral

Scientific Questions Being Studied

Immunizations are one of the most important and effective preventive health measures available but, relative to public health goals, are underutilized in adults. A better understanding of how vaccines are used will position us to develop strategies that mitigate modifiable risk factors and improve vaccination rates. Our study will address a knowledge gap in understanding the data characteristics available in the All of Us dataset for patients who have received one or more vaccines. Results from this study are intended to provide stronger justification for access to medical and pharmacy claims data, in order to develop prediction models identifying which variables have the greatest impact.

Scientific Approaches

The All of Us database will be used as the source population for a convenience sample in this cross-sectional study, to characterize the sociodemographic, health-related, and lifestyle characteristics of adults who receive single and/or multiple types of vaccines. Data on a cohort of patients who received vaccine doses during a period of time that may vary depending on the indicated use of each vaccine will be collected from the All of Us database and reviewed for trends. These categories of factors will also be used to compare participants who completed the hepatitis B and HPV vaccine series with those who started, but did not complete, these series. Descriptive statistics, correlations, and cross-tabulations will be used to describe specific differences in racial/ethnic, socioeconomic, gender-based, and health- and lifestyle-related determinants of vaccine use among patients included in this unique database.
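The completed-vs-started comparison above could be tabulated as in this sketch. It assumes the analyst has already reduced each participant to a group label, the number of doses received, and the number of doses the series requires. Because required doses for hepatitis B and HPV vary with age and product, the required count is an input here and is not looked up. People with zero doses are excluded, since the comparison covers only those who started a series. All names and records are hypothetical.

```python
from collections import defaultdict


def completion_crosstab(records):
    """Cross-tabulate series completion by group.

    records: iterable of (group, doses_received, doses_required).
    Returns {group: {"completed": n, "incomplete": n}} for people who started the series.
    """
    table = defaultdict(lambda: {"completed": 0, "incomplete": 0})
    for group, received, required in records:
        if received < 1:
            continue  # never started; outside the started-vs-completed comparison
        status = "completed" if received >= required else "incomplete"
        table[group][status] += 1
    return {group: dict(counts) for group, counts in table.items()}


def completion_rate(counts):
    """Proportion of starters in one group who completed the series."""
    return counts["completed"] / (counts["completed"] + counts["incomplete"])


# Hypothetical records: (group label, doses received, doses required)
tab = completion_crosstab([("A", 3, 3), ("A", 1, 3), ("B", 2, 2), ("B", 0, 2)])
```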

Anticipated Findings

This proposed study provides a cross-sectional evaluation of All of Us program data to develop a baseline understanding of the relevant and available vaccination data. Most existing literature describing adult vaccination rates is based on self-reports, and few studies focus on multiple vaccines with correlated health data. Comprehensive data on vaccination rates exist but have not been examined in relation to specific lifestyle, health, and sociodemographic characteristics. More importantly, vaccine studies have mainly been limited to individual vaccine types. A baseline assessment of individuals with vaccine data in the All of Us program will provide insight into which characteristics are modifiable, as well as a description of the available data. This is intended to serve as a starting point for future research.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Alexandre Chan - Late Career Tenured Researcher, University of California, Irvine

Collaborators:

  • Stanley Jia - Undergraduate Student, University of California, Irvine
  • Ding Quan Ng - Graduate Trainee, University of California, Irvine

Diversity within Eating Disorders

Project Purpose(s)

  • Population Health ...

Scientific Questions Being Studied

We will explore sociodemographic variables (e.g., gender, sexual orientation, race/ethnicity) in relation to eating disorder diagnoses. We are specifically interested in disparities in the occurrence of eating disorder diagnoses, access to treatment, age at initial diagnosis, and associated distress and impairment. That is, are some sociodemographic groups more likely than others to receive an eating disorder diagnosis, receive care, be diagnosed at a different age, and/or experience disproportionate distress or impairment? These research questions are important because there is limited research exploring intersecting identities within eating disorders.

Scientific Approaches

Within the All of Us dataset V3, we will extract sociodemographic variables (gender, sexual orientation, race/ethnicity), eating disorder diagnoses and treatment, and items from the 'Overall Health' survey to assess distress/impairment and quality of life. Initially, descriptive statistics will be used to report the frequencies of eating disorder diagnoses and treatment as a function of these sociodemographic variables. If there is adequate statistical power, logistic regression models will be fit with sociodemographic variables as 'predictors' of binary eating disorder 'outcomes.' Additionally, among individuals diagnosed with an eating disorder, we will examine sociodemographic differences in distress/impairment and quality of life using linear regression. Effect size estimates will also be reported. If statistical power allows, interaction terms between sociodemographic variables will also be tested.
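As a minimal sketch of one effect-size step, the snippet below computes a crude (unadjusted) odds ratio with a Woolf (log-scale) 95% confidence interval from a 2x2 table of one binary sociodemographic indicator against a binary eating disorder diagnosis. For a single binary predictor, the exponentiated logistic regression coefficient equals this crude odds ratio; the adjusted models and interaction terms described above would need a regression library. The function name and the counts are hypothetical, not the authors' code or data.

```python
import math


def odds_ratio_woolf(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Crude odds ratio and Woolf confidence interval for a 2x2 table.

    Hypothetical layout:
                      diagnosis   no diagnosis
        group = 1         a            b
        group = 0         c            d
    All cells must be positive; zero cells need a correction not shown here.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, not All of Us data.
print(odds_ratio_woolf(30, 970, 20, 980))
```

The interval is symmetric on the log scale, which is why it is computed from `log(OR)` and then exponentiated.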

Anticipated Findings

There is limited research on intersecting identities among individuals diagnosed with eating disorders. By employing the All of Us dataset, we may be able to identify health disparities in the occurrence of eating disorders and/or associated distress/impairment and quality of life. Results may help guide additional research efforts into understanding the mechanisms which may place some populations at disproportionate risk, which subsequently could lead to refined and tailored eating disorder prevention and treatment approaches.

Demographic Categories of Interest

  • Race / Ethnicity
  • Gender Identity
  • Sexual Orientation
  • Access to Care

Research Team

Owner:

  • Aaron Blashill - Mid-career Tenured Researcher, San Diego State University

Collaborators:

  • Melissa Simone - Research Fellow, University of Minnesota
  • Alexandra Convertino - Graduate Trainee, San Diego State University
  • Jamie-Lee Pennesi - Research Fellow, San Diego State University
  • Jonathan Helm - Early Career Tenure-track Researcher, San Diego State University
  • Autumn Askew - Project Personnel, University of Minnesota

DJS: Duplicate of JAMA PheWAS Final Review 05-21-2020

Project Purpose(s)

  • Methods Development
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.) ...

Scientific Questions Being Studied

As a demonstration project, this study will present the results of Phenome-Wide Association Studies (PheWAS) to show how the various sources of data contained within All of Us research dataset can be used to inform scientific discovery. We will perform separate PheWAS studies with smoking status as the independent variable. Specific questions include:

1. How can one implement a PheWAS within the All of Us Researcher Workbench?
2. How can one use heterogeneous data sources within the All of Us dataset to explore disease associations using self-reported exposures (Participant Provided Information, or “PPI”) and exposures captured in the electronic health record (EHR)?

There is no pre-specified hypothesis. It is important to determine whether PheWAS can be conducted within the All of Us Researcher Workbench.

Scientific Approaches

As a demonstration project, this study will present the results of Phenome-Wide Association Studies (PheWAS) to show how the various sources of data contained within All of Us research dataset can be used to inform scientific discovery. We will perform separate PheWAS studies with smoking status as the independent variable. Specific questions include:

1. How can one implement a PheWAS within the All of Us Researcher Workbench?
2. How can one use heterogeneous data sources within the All of Us dataset to explore disease associations using self-reported exposures (Participant Provided Information, or “PPI”) and exposures captured in the electronic health record (EHR)?

There is no pre-specified hypothesis. It is important to determine whether PheWAS can be conducted within the All of Us Researcher Workbench.
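A PheWAS repeats one association test across many phenotypes. The sketch below is a simplified illustration under stated assumptions, not the authors' PheWAS package: each participant is reduced to a binary smoking indicator and a set of phenotype codes (the codes are hypothetical placeholders), an unadjusted 2x2 Pearson chi-square test is run for every phenotype, and a Bonferroni threshold flags significant results. PheWAS analyses commonly use covariate-adjusted logistic regression instead of the unadjusted test shown here.

```python
import math
from collections import Counter


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2x2 table, with 1-df p-value."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # an empty margin means no test is possible
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def phewas(participants, alpha=0.05):
    """participants: list of (smoker: bool, phenotype_codes: set of str).

    Returns (code, chi2, p, significant) tuples sorted by p-value, using a
    Bonferroni threshold of alpha / number of phenotypes tested."""
    n_smokers = sum(1 for smoker, _ in participants if smoker)
    n_nonsmokers = len(participants) - n_smokers
    cases_smokers, cases_nonsmokers = Counter(), Counter()
    for smoker, codes in participants:
        (cases_smokers if smoker else cases_nonsmokers).update(codes)
    all_codes = sorted(set(cases_smokers) | set(cases_nonsmokers))
    if not all_codes:
        return []
    threshold = alpha / len(all_codes)
    results = []
    for code in all_codes:
        a = cases_smokers[code]       # smokers with the phenotype
        b = n_smokers - a             # smokers without it
        c = cases_nonsmokers[code]    # non-smokers with the phenotype
        d = n_nonsmokers - c          # non-smokers without it
        chi2, p = chi2_2x2(a, b, c, d)
        results.append((code, chi2, p, p < threshold))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy cohort (not All of Us data): 100 smokers, 100 non-smokers.
toy = ([(True, {"PHE_A"})] * 40 + [(True, set())] * 60
       + [(False, {"PHE_A"})] * 10 + [(False, {"PHE_B"})] * 5
       + [(False, set())] * 85)
for row in phewas(toy):
    print(row)
```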

Anticipated Findings

For this study, we anticipate being able to replicate known disease associations with smoking exposure. This will demonstrate the quality, utility, and diversity of the All of Us data and tools and the power of gathering multiple data sources for a single phenotype, giving researchers options for study design and validation. Importantly, the entire PheWAS package is made available for reuse by researchers in the Workbench for new hypothesis generation.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • David Schlueter - Research Fellow, NIH

DRC_Duplicate of For_HTN_code_review

Project Purpose(s)

  • Other Purpose (This work is the result of an All of Us Research Program Demonstration Project. Demonstration Projects are efforts by the All of Us Research Program designed to meet the goal of ensuring the quality and utility of the Research Hub as a resource for accelerating precision medicine. This work has been approved, reviewed, and overseen by the All of Us Research Program Science Committee and Data and Research Center to ensure compliance with program policy.) ...

Scientific Questions Being Studied

We are using the All of Us Researcher Workbench interface to answer the question, "Is hypertension prevalence in the All of Us Research Program similar to hypertension prevalence in the 2015–2016 National Health and Nutrition Examination Survey (NHANES)?" Clinical approaches to understanding and treating hypertension may benefit from a precision medicine approach that integrates data on environments, social determinants of health, behaviors, and genomic factors that contribute to hypertension risk. Hypertension is a major public health concern and remains a leading risk factor for stroke and cardiovascular disease.

Scientific Approaches

In this cross-sectional, population-based study, we used All of Us baseline data from participant provided information (PPI) surveys and electronic health record (EHR) blood pressure measurements for participants (age >18), and retrospectively examined the prevalence of hypertension in the EHR cohort using Systematized Nomenclature of Medicine (SNOMED) codes and blood pressure medications recorded in the EHR. We used the EHR data (SNOMED codes on 2 distinct dates and at least one hypertension medication) as the primary definition, and then added subjects with elevated systolic or elevated diastolic blood pressure on measurements 2 and 3 from PPI. We extracted the detailed dates of each participant’s SNOMED codes for essential hypertension from the Researcher Workbench table ‘cb_search_all_events’. We calculated an age-standardized hypertension (HTN) prevalence according to the age distribution of the U.S. Census, using 3 age groups (18-39, 40-59, ≥ 60).
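The EHR case definition and the direct age standardization described above can be sketched as follows. Everything here is an assumption for illustration: the record layout, the helper names, and especially the standard-population weights, which are placeholders rather than actual U.S. Census proportions (the real analysis would substitute the census shares for the three age groups). The blood-pressure-based additions from measurements 2 and 3 are omitted.

```python
from datetime import date


def meets_ehr_definition(htn_code_dates, n_htn_medications):
    """EHR rule from the text: an essential hypertension SNOMED code on at least
    2 distinct dates AND at least one hypertension medication."""
    return len(set(htn_code_dates)) >= 2 and n_htn_medications >= 1


def age_group(age):
    """The three age groups used for standardization; None if under 18."""
    if age >= 60:
        return ">=60"
    if age >= 40:
        return "40-59"
    if age >= 18:
        return "18-39"
    return None


def age_standardized_prevalence(people, standard_weights):
    """Direct age standardization.

    people: iterable of (age, has_htn) pairs.
    standard_weights: age group -> share of the standard population (sums to 1).
    Each age-specific crude prevalence is weighted by its standard share."""
    counts = {group: [0, 0] for group in standard_weights}  # [cases, total]
    for age, has_htn in people:
        group = age_group(age)
        if group in counts:
            counts[group][0] += int(has_htn)
            counts[group][1] += 1
    total = 0.0
    for group, weight in standard_weights.items():
        cases, n = counts[group]
        if n == 0:
            raise ValueError(f"no participants in age group {group}")
        total += weight * cases / n
    return total


print(meets_ehr_definition([date(2019, 1, 1), date(2019, 6, 1)], 1))

# Placeholder weights, NOT actual U.S. Census proportions; toy cohort, not All of Us data.
weights = {"18-39": 0.40, "40-59": 0.35, ">=60": 0.25}
cohort = ([(30, True)] + [(30, False)] * 9
          + [(50, True)] * 3 + [(50, False)] * 7
          + [(70, True)] * 6 + [(70, False)] * 4)
print(age_standardized_prevalence(cohort, weights))
```

Direct standardization removes differences in age structure between the cohort and the reference population, which is what makes a comparison with NHANES meaningful.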

Anticipated Findings

The prevalence of hypertension in the All of Us cohort is similar to that reported in the published literature. The All of Us age-adjusted HTN prevalence was 27.9%, compared with 29.6% in the National Health and Nutrition Examination Survey. The All of Us cohort is a growing source of diverse longitudinal data that can be used to study hypertension nationwide. The prevalence of hypertension in the United States (U.S.) varies by age, sex, and socioeconomic status. Hypertension can often be treated successfully with medication and prevented or delayed with lifestyle modifications. Even with these established intervention and prevention strategies, the prevalence of hypertension remains at levels of public health concern. The diversity within All of Us may provide insight into factors relevant to hypertension prevention and treatment across a variety of social and geographic contexts and population strata in the U.S.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Roxana Loperena Cortes - Other, All of Us Program Operational Use
  • Francis Ratsimbazafy - Other, All of Us Program Operational Use
  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital

DRC_Duplicate of for_obesity_code_review

Project Purpose(s)

  • Disease Focused Research (obesity)
  • Educational ...
  • Methods Development
  • Other Purpose (This work is the result of an All of Us Research Program Demonstration Project. Demonstration Projects are efforts by the All of Us Research Program designed to meet the goal of ensuring the quality and utility of the Research Hub as a resource for accelerating precision medicine. This work has been approved, reviewed, and overseen by the All of Us Research Program Science Committee and Data and Research Center to ensure compliance with program policy.)

Scientific Questions Being Studied

National obesity prevention and intervention strategies may benefit from precision medicine approaches that incorporate integrated data on environments, social determinants of health, and genomic factors. We examined the quality and utility of the All of Us Research Hub Workbench for accelerating precision medicine by replicating methods from existing studies that examine the prevalence of obesity at the population level. We evaluated measurements of obesity in the physical measurement (PM) data set and the electronic health record (EHR) data set using methods similar to those of the Ward et al. NEJM December 2019 publication, which assessed the prevalence of obesity in the US by state using BRFSS data.

Scientific Approaches

For this population-based cross-sectional study of All of Us Researcher Workbench participants, we excluded individuals with measurements obtained during pregnancy or inpatient visits and individuals from states with fewer than 100 participants. Physical measurements (PM) of height and weight taken at program enrollment for 142,116 participants, and measured height and weight extracted from electronic health records (EHR) for 40,885 individuals, were used to calculate body-mass index (BMI). We performed a complete case analysis of All of Us participants with known sex (male or female), race, income, and education level, and estimated the prevalence of BMI categories [obesity (BMI ≥30) and extreme obesity (BMI ≥35)] nationwide and for each state, overall and by demographic subgroup, including male and female. We examined the difference between EHR- and PM-calculated BMI by state.
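A minimal sketch of the BMI steps, assuming height in centimeters and weight in kilograms (the unit handling in the actual analysis is not described). BMI is weight in kilograms divided by the square of height in meters; the category cutoffs follow the text, so extreme obesity is a subset of obesity. The state-exclusion and PM-versus-EHR comparison helpers, their names, and the input layout (state mapped to a list of BMI values) are hypothetical.

```python
from statistics import median


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body-mass index: weight in kilograms divided by height in meters squared."""
    if weight_kg <= 0 or height_cm <= 0:
        raise ValueError("weight and height must be positive")
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def bmi_flags(value: float) -> dict:
    """Category flags with the thresholds from the text (obesity includes extreme obesity)."""
    return {"obesity": value >= 30.0, "extreme_obesity": value >= 35.0}


def drop_small_states(bmis_by_state: dict, min_n: int = 100) -> dict:
    """Exclude states with fewer than min_n participants."""
    return {state: v for state, v in bmis_by_state.items() if len(v) >= min_n}


def state_median_difference(pm_bmis: dict, ehr_bmis: dict) -> dict:
    """Per-state difference in median BMI (PM minus EHR) for states in both sources."""
    shared = sorted(pm_bmis.keys() & ehr_bmis.keys())
    return {state: median(pm_bmis[state]) - median(ehr_bmis[state]) for state in shared}


print(round(bmi(81, 180), 2))   # 81 / 1.8**2
print(bmi_flags(bmi(100, 170)))
```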

Anticipated Findings

Using states with at least 100 participants, PM data included 142,116 individuals (mean [SD] age, 51.2 [16.6] years) and EHR height and weight data included 40,885 individuals (mean [SD] age, 52.5 [16.5] years). The median BMI was 28.4 [24.4 to 33.7] for PM participants and 29.0 [24.8 to 34.5] for EHR participants. The national PM prevalence of obesity (BMI ≥30, which includes BMI ≥35) and of extreme obesity (BMI ≥35) was 41.2% (95% confidence interval [CI], 40.9 to 41.4) and 20.8% (95% CI, 20.6 to 21.0), respectively, with large variation across states. Women had a higher prevalence of extreme obesity than men in all selected states. Subgroups with an extreme obesity prevalence greater than 25% (reported as subgroup, N, prevalence %, 95% CI) included: Black NH, 8,913, 28.9 (25.8 to 32.0); income less than $25,000, 13,244, 25.1 (22.1 to 28.1); education of high school to some college, 17,272, 26.1 (23.1 to 29.1); and the South region, 6,639, 25.3 (22.3 to 28.3).
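The text does not state how its confidence intervals were computed, so the sketch below shows only one common option, an unweighted normal-approximation (Wald) interval for a prevalence; it is not necessarily the method used here, and survey weighting or other interval methods give different widths. The function name and counts are hypothetical.

```python
import math


def prevalence_wald_ci(cases: int, n: int, z: float = 1.96):
    """Prevalence with a normal-approximation (Wald) confidence interval.

    Returns (prevalence, lower, upper) as proportions, clipped to [0, 1].
    z = 1.96 gives an approximate 95% interval."""
    if n <= 0 or not 0 <= cases <= n:
        raise ValueError("need n > 0 and 0 <= cases <= n")
    p = cases / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts, not All of Us data.
p, lo, hi = prevalence_wald_ci(412, 1000)
print(f"{100 * p:.1f}% (95% CI, {100 * lo:.1f} to {100 * hi:.1f})")
```

The half-width shrinks with the square root of the sample size, so very large cohorts yield narrow intervals under this formula.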

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Education Level
  • Income Level

Research Team

Owner:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Roxana Loperena Cortes - Other, All of Us Program Operational Use
  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital
  • Jun Qian - Other, All of Us Program Operational Use

Drug Repurposing Validation Study

Project Purpose(s)

  • Drug Development
  • Methods Development ...

Scientific Questions Being Studied

New drug development is expensive, takes a long time, and often fails. An alternative approach to finding effective drugs for diseases is to find new indications for existing drugs, a method known as drug repurposing. Since existing drugs have well-characterized safety profiles, drug repurposing can potentially reduce the risk of failure caused by adverse reactions and save costs associated with studies aimed at gathering safety data. However, a known technical barrier to successfully repurposing drugs is the identification of good drug candidates. Clinical data stored in electronic health records (EHR) offer an opportunity to systematically identify drug repurposing candidates.

Scientific Approaches

We plan to use the All of Us dataset to validate an approach to identify drug repurposing candidates. We will look at the effect of drug repurposing candidates on lab biomarkers, using a self-controlled case series study design. To identify patient cohorts, we will use SQL code designed to extract data from EHR data organized using the OMOP/OHDSI Common Data Model. We will then use an R package that we have developed to process the extracted EHR data and conduct statistical analyses.
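To make the cohort-extraction step concrete, the sketch below runs an illustrative SQL query against a tiny in-memory SQLite copy of two OMOP Common Data Model tables, `drug_exposure` and `measurement`, using their standard column names. It finds each person's first exposure to a drug of interest and averages a lab value before and after that date, the kind of within-person comparison a self-controlled design builds on. The concept IDs and values are hypothetical, and this is neither the authors' SQL nor their R package; a real self-controlled case series also defines risk windows and models time-varying effects.

```python
import sqlite3

DRUG_ID = 1001  # hypothetical drug_concept_id
LAB_ID = 2001   # hypothetical measurement_concept_id

QUERY = """
WITH first_exposure AS (
    SELECT person_id, MIN(drug_exposure_start_date) AS index_date
    FROM drug_exposure
    WHERE drug_concept_id = ?
    GROUP BY person_id
)
SELECT f.person_id,
       AVG(CASE WHEN m.measurement_date <  f.index_date THEN m.value_as_number END) AS mean_before,
       AVG(CASE WHEN m.measurement_date >= f.index_date THEN m.value_as_number END) AS mean_after
FROM first_exposure f
JOIN measurement m ON m.person_id = f.person_id
WHERE m.measurement_concept_id = ?
GROUP BY f.person_id
ORDER BY f.person_id
"""


def before_after(conn):
    """Per-person mean lab value before and on/after the first drug exposure."""
    return conn.execute(QUERY, (DRUG_ID, LAB_ID)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE drug_exposure (person_id INTEGER, drug_concept_id INTEGER,
                            drug_exposure_start_date TEXT);
CREATE TABLE measurement (person_id INTEGER, measurement_concept_id INTEGER,
                          measurement_date TEXT, value_as_number REAL);
""")
# ISO-format date strings compare correctly as text in SQLite.
conn.executemany("INSERT INTO drug_exposure VALUES (?, ?, ?)", [
    (1, 1001, "2019-03-01"), (1, 1001, "2019-06-01"), (2, 1001, "2020-01-15"),
])
conn.executemany("INSERT INTO measurement VALUES (?, ?, ?, ?)", [
    (1, 2001, "2019-01-10", 8.0), (1, 2001, "2019-02-10", 7.8),
    (1, 2001, "2019-04-10", 7.0), (2, 2001, "2019-12-01", 9.0),
    (2, 2001, "2020-03-01", 8.2), (2, 9999, "2020-03-01", 1.0),
])
for row in before_after(conn):
    print(row)
```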

Anticipated Findings

We hope to validate the drug repurposing candidates that we have identified in our local EHR database, and to demonstrate the portability of our pipeline in the All of Us dataset. The main contribution of this study would be a generalizable approach to mine clinical data in EHRs to identify promising drug repurposing candidates, drugs that future experiments can be designed to verify.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Patrick Wu - Graduate Trainee, Vanderbilt University

Duplicate (latest ver, for testing) of Systemic Disease and Glaucoma

Project Purpose(s)

  • Disease Focused Research (Primary open angle glaucoma)
  • Other Purpose (This work is the result of an All of Us Research Program Demonstration Project. Demonstration Projects are efforts by the All of Us Research Program designed to meet the goal of ensuring the quality and utility of the Research Hub as a resource for accelerating precision medicine. This work has been approved, reviewed, and overseen by the All of Us Research Program Science Committee and Data and Research Center to ensure compliance with program policy. ) ...

Scientific Questions Being Studied

We have previously published a predictive model of glaucoma progression that uses electronic health record (EHR) data on systemic attributes from a single institution. We aim to use the All of Us dataset to 1) externally validate this single-center model and 2) train new models that predict glaucoma progression from systemic predictors. This will help determine whether the original findings are generalizable and will provide additional knowledge about the utility of systemic predictors in a national-level dataset.

Scientific Approaches

We plan to primarily work with EHR data contained in All of Us for a cohort of adult participants diagnosed with primary open-angle glaucoma. We will extract data on systemic conditions and medications for this cohort, as well as physical measurements and vital signs. We will clean the data such that the format is consistent with the data from our previously published model. Then, we will use this data as an external validation of a logistic regression model derived from our prior study that was based at a single academic center. Next, we will use All of Us data to train a new set of models, using techniques such as logistic regression, random forests, and artificial neural networks. We will optimize these models using feature selection methods and class balancing procedures. By evaluating performance metrics such as area under the curve (AUC), precision, recall, and accuracy, we will assess whether we can achieve superior predictive performance when training models using All of Us.
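A dependency-free sketch of the evaluation metrics named above, assuming binary labels (1 = progression) and predicted probabilities, with a 0.5 cutoff for the threshold-based metrics (the cutoff is an assumption). AUC is computed through its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. In practice a library such as scikit-learn would normally be used; the data below are hypothetical.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative
    (ties count 1/2). O(n_pos * n_neg), fine for illustration only."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(labels, scores, threshold=0.5):
    """Precision, recall, and accuracy after classifying score >= threshold as positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = s >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(labels),
    }


# Hypothetical labels (1 = progression) and model scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
print(auc(labels, scores))
print(threshold_metrics(labels, scores))
```

AUC is threshold-free, while precision, recall, and accuracy depend on the chosen cutoff, which matters when class balancing shifts the score distribution.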

Anticipated Findings

We anticipate that the All of Us data will validate the findings from the model, which showed that blood pressure-related metrics and certain medication classes had predictive value for glaucoma progression. In addition, we anticipate that models trained on All of Us data will outperform the model trained on single-institution data because of the larger sample size and greater diversity. These findings will support further investigation of the relationship between systemic factors, such as blood pressure, and glaucoma progression.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Bharanidharan Radha Saseendrakumar - Project Personnel, University of California, San Diego

Duplicate of ARI Workspace -7-29-20 #1

Project Purpose(s)

  • Disease Focused Research (Autoimmune diseases) ...

Scientific Questions Being Studied

The goal of our research is to determine the prevalence of autoimmune diseases in the US, individually and as a class of disease. This work will help us understand the likelihood of having an autoimmune disease, and we hope it will improve doctors' ability to diagnose patients by establishing the prior probability of having one of these many diseases.

Scientific Approaches

We will create three data sets for analysis:

1. A list of diseases rated in the following ways:

a. Evidence Class
i. Strong evidence it is autoimmune
ii. Moderate evidence it is autoimmune
iii. Weak evidence for autoimmunity
iv. A comorbidity of autoimmune disease
v. Symptom or symptom set with no known mechanism

b. Autoinflammatory versus autoimmune flag

c. “Not always autoimmune” flag – to indicate diseases that could have alternative mechanisms of cause

2. An anonymized list of patients with socioeconomic, geographic, and other data of interest to patients and public health officials seeking to understand which communities are affected by these diseases
3. Outcomes data for patients over time assessing quality of life using PROMIS metrics
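The disease ratings in item 1 map naturally onto a small record type, and an ordered evidence class makes it easy to recompute prevalence at different evidence cutoffs. The sketch below is one hypothetical encoding: the class names, field names, helper function, and placeholder diseases are ours, not the registry's actual schema or ratings.

```python
from dataclasses import dataclass
from enum import IntEnum


class EvidenceClass(IntEnum):
    """Evidence classes from item 1a, ordered from strongest (1) to weakest (5)."""
    STRONG = 1
    MODERATE = 2
    WEAK = 3
    COMORBIDITY = 4
    SYMPTOM_SET = 5  # symptom or symptom set with no known mechanism


@dataclass(frozen=True)
class DiseaseRating:
    name: str
    evidence: EvidenceClass
    autoinflammatory: bool       # item 1b: autoinflammatory (True) vs autoimmune (False)
    not_always_autoimmune: bool  # item 1c: alternative mechanisms of cause possible


def at_least(ratings, cutoff: EvidenceClass):
    """Names of diseases rated at or above an evidence cutoff (lower value = stronger)."""
    return [r.name for r in ratings if r.evidence <= cutoff]


# Placeholder entries, not actual registry ratings.
ratings = [
    DiseaseRating("Disease A", EvidenceClass.STRONG, False, False),
    DiseaseRating("Disease B", EvidenceClass.WEAK, False, True),
    DiseaseRating("Disease C", EvidenceClass.MODERATE, True, False),
]
print(at_least(ratings, EvidenceClass.MODERATE))
```

Using `IntEnum` lets evidence classes be compared with `<=`, so the same list can feed a strict count (strong evidence only) or a broader one.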

Anticipated Findings

The current NIH estimate of 23.5 million people with autoimmune disease was a guess by a knowledgeable clinician and has no scientific support. As a consequence, numerous figures circulate in the public sphere, and nobody knows which one is correct.

Many reports say autoimmune diseases are on the increase, but since the number is unknown, it is impossible to say whether this is a public health issue or not. Having a methodology that can be used to recompute the number of people with autoimmune disease will help us understand if these reports are true.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Aaron Abend - Senior Researcher, Autoimmune Registry

Duplicate of Atrial Fibrillation and Race

Project Purpose(s)

  • Disease Focused Research (atrial fibrillation) ...

Scientific Questions Being Studied

Exploring the number of participants from minority groups who have atrial fibrillation (AF) recorded anywhere in the electronic health record.

Scientific Approaches

Counting the total number of people in the database who have had atrial fibrillation, and then the number of people with AF in each racial group. Also examining inflammation.

Anticipated Findings

Prior research has found that minorities, particularly African Americans and Hispanics, have a lower risk of AF. We will look at why this is the case.

Demographic Categories of Interest

  • Race / Ethnicity

Research Team

Owner:

  • Lisa White - Project Personnel, University of Arizona

Duplicate of Barrett's Esophagus

Project Purpose(s)

  • Other Purpose (To conduct research to understand the demographics, risk factors, and outcomes of people diagnosed with Barrett's esophagus. The purpose of this research is to increase our scientific understanding of this condition with the hope of improving management and potential risks. ) ...

Scientific Questions Being Studied

To conduct research to understand the demographics, risk factors, and outcomes of people diagnosed with Barrett's esophagus. The purpose of this research is to increase our scientific understanding of this condition with the hope of improving management and potential risks. This research is exploratory in nature at this point. Specific hypotheses to be tested will be added here at a later date and will be fully outlined before any investigation of such has begun.

Scientific Approaches

The primary dataset will be a Barrett's esophagus cohort. This may be compared with matched cohorts drawn, for example, from the general AOU population or from individuals with gastroesophageal reflux. Descriptive statistics and regression models will be used to assess the demographics, risk factors, and risks of Barrett's esophagus. This research is exploratory in nature at this point. Specific hypotheses to be tested will be added here at a later date and will be fully outlined before any investigation of such has begun.

Anticipated Findings

This research is exploratory in nature at this point. Anticipated findings and contributions to scientific knowledge will become clearer when specific hypotheses to be tested are added here at a later date.

Demographic Categories of Interest

  • Age
  • Sex at Birth
  • Geography
  • Education Level
  • Income Level

Research Team

Owner:

  • Michael Cook - Mid-career Tenured Researcher, NIH

Duplicate of Dementia-Hypertension-Diabetes-2_DatasetV3

Project Purpose(s)

  • Disease Focused Research (dementia)
  • Methods Development ...
  • Other Purpose (This work is the result of an All of Us Research Program Demonstration Project. Demonstration Projects are efforts by the All of Us Research Program designed to meet the goal of ensuring the quality and utility of the Research Hub as a resource for accelerating precision medicine. This work has been approved, reviewed, and overseen by the All of Us Research Program Science Committee and Data and Research Center to ensure compliance with program policy.)

Scientific Questions Being Studied

Alzheimer’s disease is a neurodegenerative condition characterized by a progressive decline in cognitive function (dementia). Studies suggest that patients with elevated blood pressure (hypertension) are at risk of Alzheimer’s disease-type dementias. High blood sugar levels or type 2 diabetes mellitus may also be associated with an increased risk of dementia. Some minority populations may have an increased incidence of hypertension and diabetes; for example, African Americans have a higher incidence of hypertension. Therefore, we will investigate racial and ethnic groups with respect to the incidence of hypertension, diabetes, and dementia, to determine whether minority groups show a stronger association between dementia and these comorbidities.
The goal of this demonstration project is to validate previous research showing potential interactions between dementia, diabetes, and hypertension, with explicit consideration of race/ethnicity.

Scientific Approaches

Data from participants aged 40 or over will be subjected to statistical analysis to identify interactions between the incidence of dementia, diabetes, and hypertension and self-identified race/ethnicity. We will only analyze participants in this age group because the incidence of dementia is very low in people younger than 40. We will only analyze participants with electronic health record data, because these data are needed to establish whether each participant has or has not had a diagnosis of hypertension, dementia, or diabetes.

The statistical analysis package R will be used to create contingency tables, perform chi-squared and Cochran-Mantel-Haenszel tests. Figures will be created in R.
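The Cochran-Mantel-Haenszel (CMH) test asks whether a binary exposure (for example, hypertension) is associated with a binary outcome (dementia) after stratifying by a third variable (for example, race/ethnicity group). The project plans to run it in R, whose `mantelhaen.test` function provides it; the dependency-free Python sketch below computes the same 2x2xK statistic, including R's default continuity correction, plus the Mantel-Haenszel pooled odds ratio. The function names and stratum counts are hypothetical.

```python
import math


def cmh_test(strata, correct=True):
    """Cochran-Mantel-Haenszel chi-square (1 df) for K stratified 2x2 tables.

    strata: list of (a, b, c, d), where rows are exposure yes/no and
    columns are outcome yes/no within each stratum."""
    delta = 0.0     # sum over strata of observed minus expected a
    variance = 0.0  # sum of hypergeometric variances of a
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum with fewer than 2 people adds no information
        row1, row0 = a + b, c + d
        col1, col0 = a + c, b + d
        delta += a - row1 * col1 / n
        variance += row1 * row0 * col1 * col0 / (n ** 2 * (n - 1))
    if variance == 0:
        raise ValueError("no informative strata")
    correction = 0.5 if correct and abs(delta) >= 0.5 else 0.0
    stat = (abs(delta) - correction) ** 2 / variance
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail, chi-square with 1 df
    return stat, p_value


def mh_common_odds_ratio(strata):
    """Mantel-Haenszel pooled odds ratio across strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata if a + b + c + d)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata if a + b + c + d)
    if den == 0:
        raise ValueError("pooled odds ratio is undefined")
    return num / den


# Hypothetical strata (e.g., two race/ethnicity groups); rows = hypertension
# yes/no, columns = dementia yes/no. Not All of Us data.
strata = [(20, 80, 10, 90), (15, 35, 10, 40)]
print(cmh_test(strata))
print(mh_common_odds_ratio(strata))
```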

Anticipated Findings

We expect that our data will confirm an increased rate of dementia in African Americans with hypertension and diabetes compared with white participants. We will determine whether other minority groups also show differences in the incidence of dementia, hypertension, and diabetes, and in the interactions between them.

If there is an increased incidence of dementia in people with hypertension or diabetes, this may suggest that populations with these disorders need more careful monitoring of their conditions, as these conditions may increase the chance of developing dementia. Future All of Us projects may be able to determine whether long-term control of hypertension (or diabetes/blood glucose) reduces the risk of developing dementia.

Demographic Categories of Interest

  • Race / Ethnicity

Research Team

Owner:

  • Robert Meller - Mid-career Tenured Researcher, Morehouse School of Medicine

Collaborators:

  • Shashwat Deepali Nagar - Graduate Trainee, Georgia Tech
  • Roxana Loperena Cortes - Other, All of Us Program Operational Use
  • Francis Ratsimbazafy - Other, All of Us Program Operational Use
  • King Jordan - Mid-career Tenured Researcher, Georgia Tech
  • Kelsey Mayo - Other, All of Us Program Operational Use

Duplicate of Demo - All of Us Descriptive Statistics

Project Purpose(s)

  • Educational
  • Methods Development ...
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the All of Us Data and Research Center to ensure compliance with program policy, including acceptable data access and use.)

Scientific Questions Being Studied

As a demonstration project, this study will present an overview of the data types available, based on participant count. The surveys are separated into Part 1, the first three surveys participants completed (“The Basics,” “Overall Health,” and “Lifestyle”), and Part 2, the second set of three surveys made available 90 days after enrollment (“Healthcare Access & Utilization,” “Family History,” and “Personal Medical History”). This study will also present an overview of the electronic health record (EHR) data available and the physical measurement (PM) data obtained at enrollment. Finally, we will count the total number of participants who have survey, PM, and EHR data combined, break this number down by age, race, sex at birth, and gender identity, and examine the breakdown by groups underrepresented in biomedical research (UBR).

Scientific Approaches

In this study, we will use data visualization libraries to aggregate information about the cohort. Age will be measured as each participant's age when the Curated Data Repository (CDR) was generated. A data type (survey, PM, or EHR) will be counted as present if at least one observation exists in that category. We will use "The Basics" survey to select race and ethnicity, and responses will be mapped to the race variable in the OMOP Person table. All participants responding ‘American Indian or Alaska Native’ will be removed from the CDR as All of Us engages the NIH Tribal Council on the research use of data. Program designations of UBR status will be adapted to the data available in the CDR.
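The presence rule in the paragraph above (a data type counts as present if at least one observation exists) can be sketched as set operations over participant IDs. The input layout, one person ID per observation row for each data type, and the helper names are hypothetical; a real analysis would pull these IDs from the CDR tables.

```python
from collections import Counter


def presence_summary(observations: dict) -> dict:
    """observations: data type -> iterable of person_id, one entry per observation row.

    A data type counts as present for a participant if at least one observation
    exists. Returns per-type participant counts plus the number of participants
    with all types and with any type."""
    present = {dtype: set(ids) for dtype, ids in observations.items()}
    summary = {dtype: len(ids) for dtype, ids in present.items()}
    if present:
        summary["all_types"] = len(set.intersection(*present.values()))
        summary["any_type"] = len(set.union(*present.values()))
    return summary


def breakdown(person_ids, attribute: dict) -> dict:
    """Participant counts by a demographic attribute; missing values become 'Unknown'."""
    return dict(Counter(attribute.get(pid, "Unknown") for pid in person_ids))


# Hypothetical observation rows keyed by person_id.
obs = {"survey": [1, 1, 2, 3], "pm": [1, 3], "ehr": [3, 3, 4]}
print(presence_summary(obs))
print(breakdown({1, 2, 3, 4}, {1: "18-29", 2: "18-29", 3: "65+"}))
```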

Anticipated Findings

In this study, we anticipate creating plots that describe the breakdown of all participants by age, race, ethnicity, gender, sex at birth, and data type. We will use these plots in our All of Us Research Program Demonstration Projects publication as visuals describing the initial cohort released at the Beta launch of the Researcher Workbench.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Patricia Izbicki - Project Personnel, University of Miami

Duplicate of Demo - Medication Sequencing

Project Purpose(s)

  • Disease Focused Research (type 2 diabetes, depression)
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Questions Being Studied

1- What are the main prescribed medication sequences that participants with type 2 diabetes and depression took over three years of treatment?
In this question, we extract the anti-diabetic and antidepressant medications used to treat participants who have T2D and depression codes. We retrieved medications prescribed after the first diagnosis code for each disease and represented the medications using their ATC 4th level.
2- What are the most common first anti-diabetic and antidepressant medications prescribed to All of Us participants? We extracted the first medications prescribed to treat T2D and depression and identified the most common first medication as the one prescribed to the highest number of participants.
3- Is there a change between 2000 and 2018 in the percentages of participants who were prescribed the most common first medication, who were treated with one medication, and who were treated only with the most common medication?
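The sequence extraction described in questions 1 and 2 can be sketched as follows. This is a simplified illustration, not the Columbia/OHDSI implementation: it assumes drug exposures are available as hypothetical `(start_date, drug_name)` pairs, uses a small hand-written drug-to-ATC-level-4 dictionary in place of the OMOP vocabulary, approximates three years as 3 × 365 days, includes exposures starting on the diagnosis date, and collapses consecutive repeats of the same class.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical drug -> ATC level-4 lookup (5-character codes); a real analysis
# would derive classes from the OMOP vocabulary instead of a hand-made dict.
ATC4 = {
    "metformin": "A10BA",   # biguanides
    "glipizide": "A10BB",   # sulfonylureas
    "sertraline": "N06AB",  # selective serotonin reuptake inhibitors
}


def medication_sequence(exposures, first_dx, window_days=3 * 365):
    """Ordered ATC4 classes started within the window after first diagnosis.

    exposures: iterable of (start_date, drug_name). Consecutive repeats of
    one class are collapsed so the sequence records changes in treatment.
    """
    end = first_dx + timedelta(days=window_days)
    seq = []
    for start, drug in sorted(exposures):
        atc = ATC4.get(drug)
        if atc is None or not first_dx <= start <= end:
            continue  # unknown drug, pre-diagnosis, or past the window
        if not seq or seq[-1] != atc:
            seq.append(atc)
    return tuple(seq)


def most_common_first_medication(sequences):
    """ATC4 class that starts the most sequences, with its participant count."""
    firsts = Counter(seq[0] for seq in sequences if seq)
    return firsts.most_common(1)[0] if firsts else None


if __name__ == "__main__":
    dx = date(2015, 1, 1)
    p1 = medication_sequence(
        [(date(2015, 3, 1), "metformin"), (date(2015, 6, 1), "metformin"),
         (date(2016, 1, 1), "glipizide"), (date(2019, 1, 1), "sertraline")], dx)
    p2 = medication_sequence([(date(2014, 12, 1), "glipizide"),
                              (date(2015, 2, 1), "metformin")], dx)
    print(p1)  # ('A10BA', 'A10BB')
    print(p2)  # ('A10BA',)
    print(most_common_first_medication([p1, p2]))  # ('A10BA', 2)
```

The most frequent full sequences (the input to a sunburst plot) can then be tallied with `Counter(sequences).most_common(n)`.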

Scientific Approaches

In this project, we plan to use the medication sequencing approach developed at Columbia University and the OHDSI network as a means to characterize treatment pathways at scale. Further, we want to demonstrate the implementation of these medication sequencing algorithms in the All of Us research dataset to show how the various sources of data contained within the program can be used to characterize treatment pathways at scale. We will perform separate medication sequence analyses for two common, complex diseases: type 2 diabetes and depression.
1- Data manipulation
Using Python and BigQuery to:
A- Retrieve medications and their classes
B- Create the medication sequences

2- Visualization:
A- Creating sunburst plots to visualize the sequences
B- Plotting the percentages of participants who received the most common first medication and who were treated with one medication over three years
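Step 2B, the yearly percentages behind question 3, can be sketched as below. This minimal version assumes each participant has already been reduced to a hypothetical `(treatment_start_year, sequence)` record, where `sequence` is a tuple of ATC level-4 classes; the function name and the choice of grouping by the year treatment started are assumptions for illustration, and plotting is left to whichever library the notebook uses.

```python
from collections import defaultdict


def yearly_percentages(records, common_class):
    """Per year, percentages of participants whose sequence
    (a) starts with common_class, (b) contains a single class, and
    (c) contains only common_class.

    records: iterable of (treatment_start_year, sequence) pairs.
    """
    totals = defaultdict(int)
    first_common = defaultdict(int)
    single = defaultdict(int)
    only_common = defaultdict(int)
    for year, seq in records:
        if not seq:
            continue  # participants with no qualifying medication are skipped
        totals[year] += 1
        if seq[0] == common_class:
            first_common[year] += 1
        if len(seq) == 1:
            single[year] += 1
            if seq[0] == common_class:
                only_common[year] += 1
    return {
        year: (100.0 * first_common[year] / n,
               100.0 * single[year] / n,
               100.0 * only_common[year] / n)
        for year, n in sorted(totals.items())
    }


if __name__ == "__main__":
    records = [(2000, ("A10BA",)), (2000, ("A10BB", "A10BA")),
               (2000, ("A10BB",)), (2018, ("A10BA",)), (2018, ("A10BA", "A10BB"))]
    for year, pcts in yearly_percentages(records, "A10BA").items():
        print(year, [round(p, 1) for p in pcts])
    # 2000 [33.3, 66.7, 33.3]
    # 2018 [100.0, 50.0, 50.0]
```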

Anticipated Findings

For this study, we anticipate demonstrating the validity of the data by showing expected treatment patterns despite the data being gathered from over 30 individual EHR sites. Specifically, we expect to find:
1- Variation in the medication sequences prescribed to treat All of Us participants who had type 2 diabetes and depression.
2- The most common first-line medication used to treat participants with type 2 diabetes and depression diagnoses.
3- A trend or change over the study period in the prescribing of the most common first medication.
4- Trends over time in the percentages of participants.
Importantly, the detailed code developed herein is made available within the Researcher Workbench to researchers, so that they may more easily extract medication data and class information using a common medication ontology, an approach useful in many discovery studies.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Vojtech Huser - Other, NIH

Duplicate of Demo - Medication Sequencing

Project Purpose(s)

  • Disease Focused Research (type 2 diabetes, depression)
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Questions Being Studied

1- What are the main prescribed medication sequences that participants with type 2 diabetes and depression took over three years of treatment?
In this question, we extract the anti-diabetic and antidepressant medications used to treat participants who have T2D and depression codes. We retrieved medications prescribed after the first diagnosis code for each disease and represented the medications using their ATC 4th level.
2- What are the most common first anti-diabetic and antidepressant medications prescribed to All of Us participants? We extracted the first medications prescribed to treat T2D and depression and identified the most common first medication as the one prescribed to the highest number of participants.
3- Is there a change between 2000 and 2018 in the percentages of participants who were prescribed the most common first medication, who were treated with one medication, and who were treated only with the most common medication?

Scientific Approaches

In this project, we plan to use the medication sequencing approach developed at Columbia University and the OHDSI network as a means to characterize treatment pathways at scale. Further, we want to demonstrate the implementation of these medication sequencing algorithms in the All of Us research dataset to show how the various sources of data contained within the program can be used to characterize treatment pathways at scale. We will perform separate medication sequence analyses for two common, complex diseases: type 2 diabetes and depression.
1- Data manipulation
Using Python and BigQuery to:
A- Retrieve medications and their classes
B- Create the medication sequences

2- Visualization:
A- Creating sunburst plots to visualize the sequences
B- Plotting the percentages of participants who received the most common first medication and who were treated with one medication over three years

Anticipated Findings

For this study, we anticipate demonstrating the validity of the data by showing expected treatment patterns despite the data being gathered from over 30 individual EHR sites. Specifically, we expect to find:
1- Variation in the medication sequences prescribed to treat All of Us participants who had type 2 diabetes and depression.
2- The most common first-line medication used to treat participants with type 2 diabetes and depression diagnoses.
3- A trend or change over the study period in the prescribing of the most common first medication.
4- Trends over time in the percentages of participants.
Importantly, the detailed code developed herein is made available within the Researcher Workbench to researchers, so that they may more easily extract medication data and class information using a common medication ontology, an approach useful in many discovery studies.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Vojtech Huser - Other, NIH

Duplicate of Demo - Medication Sequencing

Project Purpose(s)

  • Disease Focused Research (type 2 diabetes, depression)
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Questions Being Studied

1- What are the main prescribed medication sequences that participants with type 2 diabetes and depression took over three years of treatment?
In this question, we extract the anti-diabetic and antidepressant medications used to treat participants who have T2D and depression codes. We retrieved medications prescribed after the first diagnosis code for each disease and represented the medications using their ATC 4th level.
2- What are the most common first anti-diabetic and antidepressant medications prescribed to All of Us participants? We extracted the first medications prescribed to treat T2D and depression and identified the most common first medication as the one prescribed to the highest number of participants.
3- Is there a change between 2000 and 2018 in the percentages of participants who were prescribed the most common first medication, who were treated with one medication, and who were treated only with the most common medication?

Scientific Approaches

In this project, we plan to use the medication sequencing approach developed at Columbia University and the OHDSI network as a means to characterize treatment pathways at scale. Further, we want to demonstrate the implementation of these medication sequencing algorithms in the All of Us research dataset to show how the various sources of data contained within the program can be used to characterize treatment pathways at scale. We will perform separate medication sequence analyses for two common, complex diseases: type 2 diabetes and depression.
1- Data manipulation
Using Python and BigQuery to:
A- Retrieve medications and their classes
B- Create the medication sequences

2- Visualization:
A- Creating sunburst plots to visualize the sequences
B- Plotting the percentages of participants who received the most common first medication and who were treated with one medication over three years

Anticipated Findings

For this study, we anticipate demonstrating the validity of the data by showing expected treatment patterns despite the data being gathered from over 30 individual EHR sites. Specifically, we expect to find:
1- Variation in the medication sequences prescribed to treat All of Us participants who had type 2 diabetes and depression.
2- The most common first-line medication used to treat participants with type 2 diabetes and depression diagnoses.
3- A trend or change over the study period in the prescribing of the most common first medication.
4- Trends over time in the percentages of participants.
Importantly, the detailed code developed herein is made available within the Researcher Workbench to researchers, so that they may more easily extract medication data and class information using a common medication ontology, an approach useful in many discovery studies.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Stanley Jia - Undergraduate Student, University of California, Irvine

Duplicate of Demo - Medication Sequencing

Project Purpose(s)

  • Disease Focused Research (type 2 diabetes, depression)
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Questions Being Studied

1- What are the main prescribed medication sequences that participants with type 2 diabetes and depression took over three years of treatment?
In this question, we extract the anti-diabetic and antidepressant medications used to treat participants who have T2D and depression codes. We retrieved medications prescribed after the first diagnosis code for each disease and represented the medications using their ATC 4th level.
2- What are the most common first anti-diabetic and antidepressant medications prescribed to All of Us participants? We extracted the first medications prescribed to treat T2D and depression and identified the most common first medication as the one prescribed to the highest number of participants.
3- Is there a change between 2000 and 2018 in the percentages of participants who were prescribed the most common first medication, who were treated with one medication, and who were treated only with the most common medication?

Scientific Approaches

In this project, we plan to use the medication sequencing approach developed at Columbia University and the OHDSI network as a means to characterize treatment pathways at scale. Further, we want to demonstrate the implementation of these medication sequencing algorithms in the All of Us research dataset to show how the various sources of data contained within the program can be used to characterize treatment pathways at scale. We will perform separate medication sequence analyses for two common, complex diseases: type 2 diabetes and depression.
1- Data manipulation
Using Python and BigQuery to:
A- Retrieve medications and their classes
B- Create the medication sequences

2- Visualization:
A- Creating sunburst plots to visualize the sequences
B- Plotting the percentages of participants who received the most common first medication and who were treated with one medication over three years

Anticipated Findings

For this study, we anticipate demonstrating the validity of the data by showing expected treatment patterns despite the data being gathered from over 30 individual EHR sites. Specifically, we expect to find:
1- Variation in the medication sequences prescribed to treat All of Us participants who had type 2 diabetes and depression.
2- The most common first-line medication used to treat participants with type 2 diabetes and depression diagnoses.
3- A trend or change over the study period in the prescribing of the most common first medication.
4- Trends over time in the percentages of participants.
Importantly, the detailed code developed herein is made available within the Researcher Workbench to researchers, so that they may more easily extract medication data and class information using a common medication ontology, an approach useful in many discovery studies.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Vojtech Huser - Other, NIH

Duplicate of Demo - PheWAS Smoking

Project Purpose(s)

  • Methods Development
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Questions Being Studied

As a demonstration project, this study will present the results of Phenome-Wide Association Studies (PheWAS) to show how the various sources of data contained within All of Us research dataset can be used to inform scientific discovery. We will perform separate PheWAS studies with smoking status as the independent variable. Specific questions include:

1. How can one implement a PheWAS within the All of Us Researcher Workbench?
2. How can one use heterogeneous data sources within the All of Us dataset to explore disease associations using self-reported exposures (Participant Provided Information, or “PPI”) and exposures captured in the electronic health record (EHR)?

Scientific Approaches

As a method for assessing the health burden of smoking on potential observed phenotypes, we implement a Phenome-Wide Association Study (PheWAS). A PheWAS consists of an array of association tests over an indexed representation of the human phenome. In this analysis, we will conduct PheWAS for the EHR-derived and PPI-derived smoking exposures included in the All of Us research dataset. We will represent “Smoking Exposure” in three ways:
  • EHR: smoking ICD billing codes
  • Participant Provided Information (PPI): smoked at least 100 cigarettes in lifetime (yes/no)
  • Participant Provided Information (PPI): lifetime smoking every day
To perform PheWAS, we will map ICD representations of disease to a common vocabulary of PheCodes. We then use Jupyter Notebooks to create reusable functions that perform PheWAS and generate Manhattan plots to summarize the associations.
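The core loop of a PheWAS (map ICD codes to PheCodes, then run one association test per PheCode) can be sketched in standard-library Python. This is a simplified illustration, not the demonstration project's package: the ICD-to-PheCode dictionary is a small hypothetical stand-in for a published mapping, and each test is an unadjusted 2×2 comparison (Haldane-corrected odds ratio, 1-df Pearson chi-square p-value) rather than the covariate-adjusted logistic regression a full PheWAS often uses. The `-log10(p)` values are what a Manhattan plot places on its y-axis.

```python
import math

# Hypothetical ICD-10 -> PheCode map (illustrative only).
ICD_TO_PHECODE = {"J44.9": "496", "C34.90": "165.1", "I10": "401.1"}


def phecodes_by_person(icd_by_person, icd_to_phecode=ICD_TO_PHECODE):
    """Translate each person's ICD codes into the set of PheCodes they carry."""
    return {pid: {icd_to_phecode[c] for c in codes if c in icd_to_phecode}
            for pid, codes in icd_by_person.items()}


def association(exposed, cases, population):
    """Unadjusted 2x2 test of exposure vs. case status; returns (odds_ratio, p)."""
    a = len(population & exposed & cases)        # exposed cases
    b = len((population & exposed) - cases)      # exposed controls
    c = len((population - exposed) & cases)      # unexposed cases
    d = len((population - exposed) - cases)      # unexposed controls
    odds_ratio = (a + 0.5) * (d + 0.5) / ((b + 0.5) * (c + 0.5))  # Haldane correction
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return odds_ratio, 1.0  # an empty margin: no information
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    return odds_ratio, math.erfc(math.sqrt(chi2 / 2))  # upper tail, 1 df


def phewas(exposed, phecodes):
    """One test per PheCode, sorted by p-value: (code, OR, p, -log10 p)."""
    population = set(phecodes)
    results = []
    for code in set().union(*phecodes.values()):
        cases = {pid for pid, codes in phecodes.items() if code in codes}
        odds_ratio, p = association(exposed, cases, population)
        results.append((code, odds_ratio, p, -math.log10(max(p, 1e-300))))
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    icd = {1: ["J44.9"], 2: ["J44.9"], 3: ["J44.9", "I10"], 4: ["I10"],
           5: ["J44.9"], 6: [], 7: ["I10"], 8: []}
    smokers = {1, 2, 3, 4}  # e.g. derived from EHR codes or PPI answers
    for code, odds_ratio, p, score in phewas(smokers, phecodes_by_person(icd)):
        print(code, round(odds_ratio, 2), round(p, 4), round(score, 2))
```

Published PheWAS workflows also typically exclude related PheCodes from the control group; that refinement is omitted here to keep the sketch short.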

Anticipated Findings

For this study, we anticipate that we will be able to replicate known disease associations with smoking exposure. This will serve to demonstrate the quality, utility, and diversity of the All of Us data and tools and the power of gathering multiple data sources for a single phenotype, providing researchers options for study design and validation. Importantly, the entire PheWAS package is made available for reuse by researchers in the Workbench for new hypothesis generation.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Jie Chen - Late Career Tenured Researcher, Augusta University

Duplicate of Depression Cohort Demo

Project Purpose(s)

  • Other Purpose (Demo)

Scientific Questions Being Studied

Demo

Scientific Approaches

Demo

Anticipated Findings

Demo

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Bin Yang - Graduate Trainee, Columbia University

Duplicate of Disease_convergence_and_lifestyle

Project Purpose(s)

  • Population Health
  • Methods Development
  • Ancestry

Scientific Questions Being Studied

Multiple genetic polymorphisms have been identified for complex diseases, but their relationships, such as the biological underpinnings of genetic interactions, remain elusive. Epigenomic studies have shown that genetic variants may have convergent effects, which increase the risk of developing complex diseases and comorbidities. We aim to use abundant biological resources, such as ENCODE and GTEx, to prioritize genetic variants with convergent effects and diseases with excess epigenomic similarity. We will then study the agreement between the convergent effects and interactions of genetic variants in AllofUsRP, and the agreement between disease epigenomic similarity and disease comorbidities in AllofUsRP. Lifestyle and environmental exposures are critical risk factors, and their effects will be modeled as well. The research will help us understand disease mechanisms and missing heritability and foster applications such as drug repositioning.

Scientific Approaches

We have developed an information-theoretic similarity measure for quantifying the similarity of genetic variants and disease pairs from GTEx data. We have also developed a multi-omics integration method to quantify the overall similarity of genetic variants in ENCODE, and we will extend the latter method to quantify the epigenomic similarity of disease pairs. We aim to use AllofUsRP to validate both the genetic interactions between genetic variants and the disease comorbidities. Further, we will use logistic regression, LASSO, and deep learning methods to model diseases from lifestyles and genetic interactions.
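For readers unfamiliar with information-theoretic similarity, normalized mutual information between two discrete profiles is one standard example. The sketch below is a generic illustration, not the authors' measure (which is not specified here): it assumes each variant or disease is summarized as a hypothetical vector of discrete labels over the same set of tissues or samples, and it scores similarity as I(X;Y) / sqrt(H(X) H(Y)).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum(k / n * math.log(k / n) for k in Counter(labels).values())


def mutual_information(x, y):
    """I(X;Y) in nats from paired discrete observations."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # p(a,b) / (p(a) p(b)) simplifies to k * n / (count(a) * count(b)).
    return sum(k / n * math.log(k * n / (px[a] * py[b]))
               for (a, b), k in joint.items())


def normalized_mi(x, y):
    """I(X;Y) / sqrt(H(X) H(Y)); returns 0.0 when either profile is constant."""
    if len(x) != len(y):
        raise ValueError("profiles must be the same length")
    hx, hy = entropy(x), entropy(y)
    if hx == 0 or hy == 0:
        return 0.0
    return mutual_information(x, y) / math.sqrt(hx * hy)


if __name__ == "__main__":
    # Hypothetical active (1) / inactive (0) calls across six tissues.
    variant_a = [1, 1, 0, 0, 1, 0]
    variant_b = [1, 1, 0, 0, 0, 0]
    print(round(normalized_mi(variant_a, variant_b), 3))
```

The score is 1 for profiles that determine each other exactly (including label swaps) and 0 for independent profiles.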

Anticipated Findings

We expect to uncover many unanticipated biological links between the effects of distinct genetic variants, which may explain the increased risk of diseases and comorbidities. Using machine learning, we will build disease prediction models, particularly for diseases heavily influenced by lifestyle, such as cancers. The research will generate candidates for novel drug targets and drug repositioning approaches.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Haiquan Li - Early Career Tenure-track Researcher, University of Arizona

Duplicate of Epidemiology of PCOS

Project Purpose(s)

  • Disease Focused Research (Polycystic ovary syndrome)
  • Population Health

Scientific Questions Being Studied

Polycystic ovary syndrome (PCOS) is the most common endocrine disorder in women of reproductive age and one of the leading causes of infertility. Minority females with PCOS are at greater risk of developing detrimental metabolic outcomes. Therefore, our scientific question is: which PCOS risk factors differ by race and/or ethnicity?

Scientific Approaches

We will leverage the All of Us data to identify females with PCOS and characterize their risk factors using demographic data, ICD codes, lab values, and socioeconomic status.
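A simple version of the cohort step can be sketched as follows. This is an illustration under stated assumptions, not the study's phenotype definition: it flags participants recorded as female at birth who carry ICD-10-CM E28.2 (polycystic ovarian syndrome) or ICD-9-CM 256.4 (polycystic ovaries), then compares the share of cases with one binary risk factor across race/ethnicity groups. Input shapes, labels, and function names are hypothetical, and a validated case definition may require more than a single code.

```python
from collections import Counter

# Diagnosis codes used in this sketch (ICD-10-CM and ICD-9-CM).
PCOS_CODES = {"E28.2", "256.4"}


def pcos_cases(conditions, sex_at_birth):
    """person_ids with a PCOS code who are recorded as female at birth.

    conditions: iterable of (person_id, icd_code) pairs.
    sex_at_birth: dict person_id -> label (labels here are assumptions).
    """
    return {pid for pid, code in conditions
            if code in PCOS_CODES and sex_at_birth.get(pid) == "Female"}


def share_with_factor(cases, has_factor, group_of):
    """Per group, fraction of PCOS cases that have a binary risk factor."""
    totals = Counter(group_of.get(pid, "Unknown") for pid in cases)
    hits = Counter(group_of.get(pid, "Unknown") for pid in cases if pid in has_factor)
    return {g: hits[g] / n for g, n in sorted(totals.items())}


if __name__ == "__main__":
    conditions = [(1, "E28.2"), (2, "256.4"), (3, "E28.2"), (4, "I10"), (5, "E28.2")]
    sex = {1: "Female", 2: "Female", 3: "Female", 4: "Female", 5: "Male"}
    race = {1: "Group A", 2: "Group B", 3: "Group B"}
    obese = {2, 3}  # hypothetical risk-factor flag, e.g. derived from BMI
    cases = pcos_cases(conditions, sex)
    print(sorted(cases))                          # [1, 2, 3]
    print(share_with_factor(cases, obese, race))  # {'Group A': 0.0, 'Group B': 1.0}
```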

Anticipated Findings

We anticipate that we will find phenotypic differences related to metabolic dysfunction between racially and ethnically diverse females with PCOS.

Demographic Categories of Interest

  • Race / Ethnicity
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Ky'Era Actkins - Graduate Trainee, Meharry Medical College

Duplicate of Exploring All of Us

Project Purpose(s)

  • Educational

Scientific Questions Being Studied

This workspace will be used to get to know the tools and features of All of Us. We hope that by getting this experience, we can better help researchers at our institution who are using the Workbench for research.

Scientific Approaches

We are interested in understanding how to work with this data in R and Jupyter notebooks.

Anticipated Findings

As this is an exploration of All of Us and its features, there are no anticipated findings. However, by doing this exploration we may be better able to support researchers producing findings from their research.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Amy Yarnell - Other, University of Maryland, Baltimore

Collaborators:

  • Jean-Paul Courneya - Other, University of Maryland, Baltimore

Duplicate of Genetics_of_comorbidity

Project Purpose(s)

  • Methods Development
  • Ancestry
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Questions Being Studied

As a demonstration project, this study shows the reproducibility and utility of comorbidities as candidate phenotypes for mining datasets derived from clinical practice. The Electronic Medical Records and Genomics (eMERGE) Network has shown disease annotations extracted from billing data to be unreliable phenotypes for GWAS. We propose that co-occurrences of specific diseases can be designed and validated accurately, reliably, and at high throughput, as they represent known and sometimes novel clinical syndromes. Specific questions:

Can we reproduce comorbidities discovered from external datasets (e.g., Healthcare Cost and Utilization Project - HCUP) through statistical regression in the All of Us dataset, while controlling for potential confounders such as gender, age, race, and ethnicity?

Can we reproduce candidate medical genetic syndromes derived from mining GWAS jointly with other biomolecular datasets and previously confirmed as comorbid in clinical datasets?

Scientific Approaches

As a method to assess comorbidities, we will implement previously validated analyses to reliably assign the presence of specific diseases to a subject and to calculate their excess co-occurrence in the dataset with statistical significance and effect sizes. We will:

1. Use the OMOP codes corresponding to the SNOMED-coded disease mapping that we curated from 262 GWAS.
2. Query the All of Us dataset to identify subjects with any of these diseases.
3. Use our published logistic regression (controlled for age, gender, and race) to calculate the effect size and significance of each disease pair (comorbidity).
4. Compare and contrast the significant comorbidities discovered in the All of Us dataset with those observed as reproducible in the HCUP dataset. A subset of comorbidities that we previously confirmed with converging molecular genetics from trans-eQTL patterns across chromosomes will also be studied as candidate medical genetic syndromes.

Anticipated Findings

For this study, we anticipate confirming that the All of Us clinical practice dataset can reliably recapitulate comorbidities that were reproducibly observed in independent datasets. These "confirmed comorbidities" will serve as reliable phenotypes for future studies, such as a Phenome-wide association study or a Genome-wide association study to demonstrate the genetic underpinnings of new clinical syndromes consisting of more than one disease (e.g., the metabolic syndrome).
Importantly, the curated phenotypes and the tested logistic regression are made available for reuse by researchers in the Workbench, for new hypothesis generation.

Findings will be disseminated through scientific journals, GitHub, and the Workbench.

Outcomes anticipated from the research: a method for reliably querying approximately 400 to 500 reproducible comorbidities.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth

Research Team

Owner:

  • Jianrong Li - Project Personnel, University of Arizona

Duplicate of How to Work with All of Us Physical Measurements Data

Project Purpose(s)

  • Educational
  • Methods Development

Scientific Questions Being Studied

How can one navigate the physical measurements data?

Scientific Approaches

N/A

Anticipated Findings

N/A

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Jie Chen - Late Career Tenured Researcher, Augusta University

Duplicate of How to Work with All of Us Physical Measurements Data

Project Purpose(s)

  • Educational
  • Methods Development

Scientific Questions Being Studied

How can one navigate the physical measurements data?

Scientific Approaches

N/A

Anticipated Findings

N/A

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Harry Hochheiser - Mid-career Tenured Researcher, University of Pittsburgh

Duplicate of How to Work with All of Us Physical Measurements Data

Project Purpose(s)

  • Educational
  • Methods Development

Scientific Questions Being Studied

The purpose of this workspace is to become familiar with the available data and its structure in the All of Us cohort. No active research questions are being pursued at this stage, but the experience obtained through this workspace will help us formulate plans for how best to use this rich dataset to answer scientific questions.

Scientific Approaches

I am using the R programming language within the Jupyter notebooks provided in the Workbench to better understand the available data elements and their structure.

Anticipated Findings

The only anticipated finding from this workspace is more experience with the All of Us cohort dataset and its Workbench, which will better equip us to carry out future studies using data collected from All of Us participants.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Ozan Dikilitas - Research Fellow, Mayo Clinic

Duplicate of How to Work with All of Us Physical Measurements Data

Project Purpose(s)

  • Educational
  • Methods Development

Scientific Questions Being Studied

How can one navigate the physical measurements data?

Scientific Approaches

N/A

Anticipated Findings

N/A

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • James Geller - Late Career Tenured Researcher, New Jersey Institute of Technology

Duplicate of How to Backup Notebooks and Intermediate Results

Project Purpose(s)

  • Other Purpose (Demonstrate to workbench users how to create snapshots of notebooks and backups of intermediate results stored in other files such as plot images and derived data.)

Scientific Questions Being Studied

Not applicable - these utility notebooks do not perform any analyses.

Scientific Approaches

Not available.

Anticipated Findings

Not applicable - these utility notebooks do not perform any analyses.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Jie Chen - Late Career Tenured Researcher, Augusta University

Duplicate of How to Get Started with Registered Tier Data

Project Purpose(s)

  • Educational
  • Methods Development
  • Other Purpose (This is an All of Us Tutorial Workspace. It is meant to provide instruction for key Researcher Workbench components and All of Us data representation.)

Scientific Questions Being Studied

We recommend that all researchers explore the notebooks in this workspace to learn the basics of All of Us Program Data.

What should you expect? This notebook will give you an overview of what data is available in the current Curated Data Repository (CDR). It will also teach you how to retrieve information about Electronic Health Record (EHR), Physical Measurements (PM), and Survey data.

Scientific Approaches

This Tutorial Workspace contains two Jupyter Notebooks (one written in Python, the other in R). Each notebook is divided into the following sections:

1. Setup: How to set up this notebook, install and import software packages, and select the correct version of the CDR.
2. Data Availability Part 1: How to summarize the number of unique participants with major data types: Physical Measurements, Survey, and EHR.
3. Data Availability Part 2: How to delve a little deeper into data availability within each major data type.
4. Data Organization: An explanation of how data is organized according to our common data model.
5. Example Queries: How to directly query the CDR, using two examples of SQL queries to extract demographic data.
6. Expert Tip: How to access the base version of the CDR, for users that want to do their own cleaning.

Anticipated Findings

By reading and running the notebooks in this Tutorial Workspace, you will understand the following:

All of Us data are made available in a Curated Data Repository. Participants may contribute any combination of survey, physical measurement, and electronic health record data. Not all participants contribute all possible data types. Each unique piece of health information is given a unique identifier called a concept_id and organized into specific tables according to our common data model. You can use these concept_ids to query the CDR and pull data on specific health information relevant to your analysis. See our support article “Learning the Basics of the All of Us Dataset” for more information.
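The concept_id idea can be tried out locally with a toy version of two OMOP tables. This sketch uses Python's built-in `sqlite3` module and invented rows purely for illustration; the real CDR is queried on BigQuery with dataset-qualified table names, and All of Us demographic fields may use different concept_ids than the standard OMOP gender concepts (8507 = MALE, 8532 = FEMALE) assumed here.

```python
import sqlite3

# Demographic count by concept name: join person to concept on concept_id.
DEMOGRAPHIC_SQL = """
SELECT c.concept_name AS gender, COUNT(DISTINCT p.person_id) AS n_participants
FROM person AS p
JOIN concept AS c ON p.gender_concept_id = c.concept_id
GROUP BY c.concept_name
ORDER BY c.concept_name
"""


def toy_cdr():
    """In-memory database with a hypothetical slice of the OMOP person/concept tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT);
        CREATE TABLE person (person_id INTEGER PRIMARY KEY,
                             gender_concept_id INTEGER, year_of_birth INTEGER);
        INSERT INTO concept VALUES (8507, 'MALE'), (8532, 'FEMALE');
        INSERT INTO person VALUES (1, 8532, 1970), (2, 8507, 1985), (3, 8532, 1990);
    """)
    return conn


if __name__ == "__main__":
    conn = toy_cdr()
    for gender, n in conn.execute(DEMOGRAPHIC_SQL):
        print(gender, n)  # FEMALE 2, then MALE 1
    conn.close()
```

The person table stores only the integer concept_id; the human-readable label comes from joining to the concept table, which is the pattern the tutorial's example queries follow.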

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Vojtech Huser - Other, NIH

Duplicate of How to Get Started with Registered Tier Data

Project Purpose(s)

  • Educational
  • Methods Development
  • Other Purpose (This is an All of Us Tutorial Workspace. It is meant to provide instruction for key Researcher Workbench components and All of Us data representation.)

Scientific Questions Being Studied

We recommend that all researchers explore the notebooks in this workspace to learn the basics of All of Us Program Data.

What should you expect? This notebook will give you an overview of what data is available in the current Curated Data Repository (CDR). It will also teach you how to retrieve information about Electronic Health Record (EHR), Physical Measurements (PM), and Survey data.

Scientific Approaches

This Tutorial Workspace contains two Jupyter Notebooks (one written in Python, the other in R). Each notebook is divided into the following sections:

1. Setup: How to set up this notebook, install and import software packages, and select the correct version of the CDR.
2. Data Availability Part 1: How to summarize the number of unique participants with major data types: Physical Measurements, Survey, and EHR;
3. Data Availability Part 2: How to delve a little deeper into data availability within each major data type;
4. Data Organization: An explanation of how data is organized according to our common data model.
5. Example Queries: How to directly query the CDR, using two examples of SQL queries to extract demographic data.
6. Expert Tip: How to access the base version of the CDR, for users that want to do their own cleaning.
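At its core, Data Availability Part 1 counts distinct participants per data type. The snippet below is a minimal sketch of that idea on an in-memory list of hypothetical (person_id, data_type) pairs. It is not the tutorial notebooks' actual code, which queries the CDR directly, and the row format is an assumption made for illustration.

```python
def data_availability(rows, data_types):
    """Count unique participants per data type and those with every listed type.

    rows: iterable of (person_id, data_type) pairs (hypothetical format).
    data_types: the data types to summarize, e.g. ["Survey", "EHR"].
    """
    people_by_type = {t: set() for t in data_types}
    for person_id, data_type in rows:
        if data_type in people_by_type:
            people_by_type[data_type].add(person_id)  # a set ignores repeat rows
    counts = {t: len(ids) for t, ids in people_by_type.items()}
    # Participants who contributed all requested data types.
    all_types = set.intersection(*people_by_type.values()) if data_types else set()
    return counts, len(all_types)
```

Because not all participants contribute all data types, the per-type counts and the "all types" count can differ substantially.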

Anticipated Findings

By reading and running the notebooks in this Tutorial Workspace, you will understand the following:

All of Us data are made available in a Curated Data Repository. Participants may contribute any combination of survey, physical measurement, and electronic health record data. Not all participants contribute all possible data types. Each unique piece of health information is given a unique identifier called a concept_id and organized into specific tables according to our common data model. You can use these concept_ids to query the CDR and pull data on specific health information relevant to your analysis. See our support article Learning the Basics of the All of Us Dataset for more info.

Demographic Categories of Interest

  • Race / Ethnicity

Research Team

Owner:

  • Janis Geary - Research Fellow, Arizona State University

Duplicate of How to Get Started with Registered Tier Data

Project Purpose(s)

  • Educational
  • Methods Development
  • Other Purpose (This is an All of Us Tutorial Workspace. It is meant to provide instruction for key Researcher Workbench components and All of Us data representation.)

Scientific Questions Being Studied

We recommend that all researchers explore the notebooks in this workspace to learn the basics of All of Us Program Data.

What should you expect? Each notebook gives you an overview of what data is available in the current Curated Data Repository (CDR). It also teaches you how to retrieve information about Electronic Health Record (EHR), Physical Measurements (PM), and Survey data.

Scientific Approaches

This Tutorial Workspace contains two Jupyter Notebooks (one written in Python, the other in R). Each notebook is divided into the following sections:

1. Setup: How to set up this notebook, install and import software packages, and select the correct version of the CDR.
2. Data Availability Part 1: How to summarize the number of unique participants with major data types: Physical Measurements, Survey, and EHR;
3. Data Availability Part 2: How to delve a little deeper into data availability within each major data type;
4. Data Organization: An explanation of how data is organized according to our common data model.
5. Example Queries: How to directly query the CDR, using two examples of SQL queries to extract demographic data.
6. Expert Tip: How to access the base version of the CDR, for users that want to do their own cleaning.

Anticipated Findings

By reading and running the notebooks in this Tutorial Workspace, you will understand the following:

All of Us data are made available in a Curated Data Repository. Participants may contribute any combination of survey, physical measurement, and electronic health record data. Not all participants contribute all possible data types. Each unique piece of health information is given a unique identifier called a concept_id and organized into specific tables according to our common data model. You can use these concept_ids to query the CDR and pull data on specific health information relevant to your analysis. See our support article Learning the Basics of the All of Us Dataset for more info.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Laura Goetz - Early Career Tenure-track Researcher, Translational Genomics Research Institute

Duplicate of How to Work with All of Us Survey Data

Project Purpose(s)

  • Educational
  • Methods Development
  • Other Purpose (This is an All of Us Tutorial Workspace created by the Researcher Workbench Support team. It is meant to provide instruction for key Researcher Workbench components and All of Us data representation.)

Scientific Questions Being Studied

We recommend that all researchers explore the notebooks in this workspace to learn the basics of All of Us Program Data.

What should you expect?
By running the notebooks in this workspace, you should become familiar with how to query PPI survey questions and how to compute the frequency of answers to each question in each PPI module.

Scientific Approaches

Not available.

Anticipated Findings

By reading and running the notebooks in this Tutorial Workspace, researchers will learn the following:
- how to query the survey data;
- how to summarize PPI modules and questions.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Jie Chen - Late Career Tenured Researcher, Augusta University

Duplicate of Hypertension (version 4)

Project Purpose(s)

  • Population Health
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Questions Being Studied

Uncontrolled hypertension is a primary contributor to coronary heart disease, stroke, and heart failure. Hypertension can be treated successfully in many cases with medication and prevented or delayed with lifestyle modifications. Even with this success, the prevalence of hypertension continues to be at levels of public health concern, and its control in the United States is far below what is possible. In this demonstration project, we focus on the prevalence of hypertension and its awareness, treatment, and control in a large and diverse participant sample of the All of Us Research Program. Specific questions include:
1) What is the prevalence of hypertension among participants in the All of Us Research Program?
2) Among hypertensive participants, what is the prevalence of awareness, treatment, and control?
3) How do these estimates compare to the general US population assessed in the National Health and Nutrition Examination Survey (NHANES), 2015-2016?

Scientific Approaches

This descriptive analysis is based on blood pressure measurements from the participants’ physical measurement evaluations and on data derived from participant-provided information (PPI) and electronic health records (EHR).
1) Demographic factors such as age, sex, race/ethnicity, educational attainment, income and health insurance were assessed in the PPI questionnaire.
2) PPI questionnaire data was also used to define self-reported doctor diagnosis of hypertension and self-reported hypertension medication use.
3) EHR evidence of hypertension diagnosis was defined as the presence of ICD-9/ICD-10 codes corresponding to hypertension any time before baseline.
4) EHR evidence of hypertension medication use was defined as at least one drug exposure to hypertension medications any time before baseline.
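The definitions above can be combined into the four estimates named in the specific questions. The sketch below is illustrative only and rests on stated assumptions: the record fields are hypothetical; a participant counts as hypertensive if measured blood pressure is at or above an assumed 140/90 mmHg cutoff or there is any diagnosis or medication evidence; and "control" means measured blood pressure below that cutoff. The project text does not give its exact thresholds or how self-report and EHR evidence were combined.

```python
from dataclasses import dataclass

# Assumed cutoffs (140/90 mmHg); the project text does not state its thresholds.
SBP_CUTOFF = 140
DBP_CUTOFF = 90


@dataclass
class Participant:
    """Hypothetical per-participant summary; field names are illustrative."""
    systolic: float       # measured systolic BP, mmHg
    diastolic: float      # measured diastolic BP, mmHg
    diagnosed: bool       # self-reported or EHR hypertension diagnosis
    on_medication: bool   # self-reported or EHR antihypertensive use


def elevated_bp(p):
    return p.systolic >= SBP_CUTOFF or p.diastolic >= DBP_CUTOFF


def is_hypertensive(p):
    # Assumed definition: elevated measured BP or any diagnosis/medication evidence.
    return elevated_bp(p) or p.diagnosed or p.on_medication


def hypertension_summary(participants):
    """Prevalence over everyone; awareness, treatment, control among hypertensives."""
    if not participants:
        raise ValueError("no participants")
    hypertensive = [p for p in participants if is_hypertensive(p)]

    def share(condition):
        if not hypertensive:
            return float("nan")
        return sum(1 for p in hypertensive if condition(p)) / len(hypertensive)

    return {
        "prevalence": len(hypertensive) / len(participants),
        "awareness": share(lambda p: p.diagnosed),
        "treatment": share(lambda p: p.on_medication),
        "control": share(lambda p: not elevated_bp(p)),
    }
```

Stratifying these estimates by demographic factors amounts to calling `hypertension_summary` on each subgroup.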

Anticipated Findings

For this study, we anticipate that the prevalence, awareness, treatment, and control of hypertension will differ across demographic strata. This will help identify health disparities and improve health equity in vulnerable populations. We also anticipate that estimates will differ between the All of Us Research Program and the general US population assessed in NHANES 2015-2016. Understanding these differences will help characterize potential selection bias and demonstrate the quality and utility of the All of Us data and tools.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Maria Argos - Mid-career Tenured Researcher, University of Illinois at Chicago

Duplicate of Mental Health Demonstration Project V3

Project Purpose(s)

  • Disease Focused Research (generalized anxiety disorder, depressive disorder, bipolar disorder)
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Questions Being Studied

As a demonstration project, this study aimed to explore the usability of the All of Us dataset and examined the prevalence of mental health conditions in the All of Us Research Program cohort. Specifically, we explored the lifetime prevalence of depressive disorder, bipolar disorder, and generalized anxiety disorder.

Our study looked at prevalence rates for the above conditions in the following ways:
1. Prevalence in available EHR data, by various demographic factors
2. Cohort characteristics
3. Congruence between EHR diagnoses and self-reported diagnoses from the questionnaire
4. Among individuals who self-report as having been diagnosed with a mental health condition listed above, the percentage of individuals in treatment and associations between treatment and various demographic factors

Scientific Approaches

In this analysis, we calculated the prevalence of mental health conditions by leveraging demographic information, questionnaire responses, and EHR data. Specifically, we used the following surveys: Basics, Overall Health, Personal Medical History, and Healthcare Access PPIs. We used EHR data by creating a cohort of individuals with specific diagnosis codes in their EHR. We referenced all relevant parent and child SNOMED codes for each mental health condition under investigation (documented in a Concept Set). Associations were calculated using chi-square tests.
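For readers unfamiliar with the test, a chi-square test of association compares the observed counts in a contingency table (for example, treatment status by a demographic group) with the counts expected if the two variables were independent. The following is a minimal pure-Python sketch of the Pearson statistic. It is not the project's code, and the example counts below are made up. Note that `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, so its statistic will differ from this uncorrected one.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic (no continuity correction) and degrees of
    freedom for an r x c contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    if min(row_totals) <= 0 or min(col_totals) <= 0:
        raise ValueError("every row and column total must be positive")
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / N.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def p_value_1df(stat):
    """Upper-tail p-value for 1 degree of freedom: a chi-square(1) variable is a
    squared standard normal, so P(X >= stat) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2))
```

For the hypothetical table `[[10, 20], [30, 40]]`, `chi_square_independence` returns a statistic of 50/63 (about 0.79) with 1 degree of freedom.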

Anticipated Findings

We anticipated that the prevalence rates found in All of Us would be consistent with those of previous large-scale studies, such as the National Comorbidity Survey. We found that the All of Us dataset is sensitive in detecting mood disorders and is usable for examining mental health conditions.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Kai Yin Ho - Project Personnel, Northwestern University

Duplicate of Mental Health Demonstration Project V4

Project Purpose(s)

  • Disease Focused Research (generalized anxiety disorder, depressive disorder, bipolar disorder)
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Questions Being Studied

As a demonstration project, this study aimed to explore the usability of the All of Us dataset and examined the prevalence of mental health conditions in the All of Us Research Program cohort. Specifically, we explored the lifetime prevalence of depressive disorder, bipolar disorder, and generalized anxiety disorder.

Our study looked at prevalence rates for the above conditions in the following ways:
1. Prevalence in available EHR data, by various demographic factors
2. Cohort characteristics
3. Congruence between EHR diagnoses and self-reported diagnoses from the questionnaire
4. Among individuals who self-report as having been diagnosed with a mental health condition listed above, the percentage of individuals in treatment and associations between treatment and various demographic factors

Scientific Approaches

In this analysis, we calculated the prevalence of mental health conditions by leveraging demographic information, questionnaire responses, and EHR data. Specifically, we used the following surveys: Basics, Overall Health, Personal Medical History, and Healthcare Access PPIs. We used EHR data by creating a cohort of individuals with specific diagnosis codes in their EHR. We referenced all relevant parent and child SNOMED codes for each mental health condition under investigation (documented in a Concept Set). Associations were calculated using chi-square tests.

Anticipated Findings

We anticipated that the prevalence rates found in All of Us would be consistent with those of previous large-scale studies, such as the National Comorbidity Survey. We found that the All of Us dataset is sensitive in detecting mood disorders and is usable for examining mental health conditions.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Kai Yin Ho - Project Personnel, Northwestern University

Duplicate of NOVA

Project Purpose(s)

  • Methods Development

Scientific Questions Being Studied

Historically, dietary quality has been assessed via single nutrient categories. Recently, more attention has turned to ultra-processed food (UPF), as defined by the NOVA food classification groups, as a major factor in multiple noncommunicable diseases. Its many facets include calorie-dense and inexpensive convenience items.
However, little centralized data is available on the impacts of industrial ingredients, and their interactions, on human health. From maternal nutrition through the rest of the life cycle, this is a wildly under-researched lens on dietary quality. This may be due to limited consensus and a lack of collaboration or awareness in non-nutrition fields. Using code recently converted from Stata to Python, I would like to investigate whether any statistically significant patterns emerge when NOVA is applied to All of Us datasets.

Scientific Approaches

Datasets include various populations, descriptive statistics, and dietary assessment data. With major input from the creators of NOVA, I developed NOVA coding that is interoperable with Python.

Anticipated Findings

Examining populations by race, ethnicity, age, geographic location, and other comorbidities will reveal phenotypic differences between high UPF consumers and low UPF consumers. High and low categories will be defined per cohort. Results from the exploratory analyses may inform future precision nutrition interventions and policy related to food industry marketing to children.
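One possible way to define "high" and "low" UPF consumers within a cohort is to compute each participant's share of dietary energy from NOVA group 4 (ultra-processed foods) and split at the cohort median. This is a hypothetical sketch. The input format, the energy-share metric, and the median cutoff are all assumptions, since the project states only that high and low categories will be defined per cohort.

```python
from statistics import median


def upf_energy_share(kcal_by_group):
    """Share of energy from NOVA group 4 (ultra-processed foods).

    kcal_by_group: dict mapping NOVA group number (1-4) to kcal (hypothetical format).
    """
    total = sum(kcal_by_group.values())
    return kcal_by_group.get(4, 0) / total if total > 0 else 0.0


def classify_upf_consumers(cohort):
    """Label each participant 'high' or 'low' relative to the cohort median share.

    cohort: dict mapping participant id -> kcal_by_group dict.
    The median split is an assumed, illustrative cutoff.
    """
    shares = {pid: upf_energy_share(groups) for pid, groups in cohort.items()}
    cutoff = median(shares.values())
    return {pid: "high" if share > cutoff else "low" for pid, share in shares.items()}
```

Because the cutoff is computed from each cohort's own distribution, the same energy share can be "high" in one cohort and "low" in another.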

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Kathryn Whyte - Research Fellow, Columbia University

Duplicate of Phenotype - Breast Cancer

Project Purpose(s)

  • Educational
  • Methods Development
  • Other Purpose (This is an All of Us Phenotype Library Workspace created by the Researcher Workbench Support team. It is meant to demonstrate the implementation of key phenotype algorithms within the All of Us Research Program cohort.)

Scientific Questions Being Studied

The Notebooks in this Workspace can be used to implement well-known phenotype algorithms in one’s own research.

Scientific Approaches

Not Applicable

Anticipated Findings

By reading and running the Notebooks in this Phenotype Library Workspace, researchers can implement the following phenotype algorithms:

Ning Shang, George Hripcsak, Chunhua Weng, Wendy K. Chung, & Katherine Crew. Breast Cancer. Retrieved from https://phekb.org/phenotype/breast-cancer.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Jie Chen - Late Career Tenured Researcher, Augusta University

Duplicate of Phenotype - Depression

Project Purpose(s)

  • Educational
  • Methods Development
  • Other Purpose (This is an All of Us Phenotype Library Workspace created by the Researcher Workbench Support team. It is meant to demonstrate the implementation of key phenotype algorithms within the All of Us Research Program cohort.)

Scientific Questions Being Studied

The Notebooks in this Workspace can be used to implement well-known phenotype algorithms in one’s own research.

Scientific Approaches

Not Applicable

Anticipated Findings

By reading and running the Notebooks in this Phenotype Library Workspace, researchers can implement the following phenotype algorithms:

This Workspace contains an implementation of a phenotype algorithm for depression, obtained from the eMERGE network. Citation: TBA. KPWA/UW. Depression. PheKB; 2018. Available from: https://phekb.org/phenotype/1095

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • David Schlueter - Research Fellow, NIH

Duplicate of Phenotype - Ischemic Heart Disease

Project Purpose(s)

  • Educational
  • Methods Development
  • Other Purpose (This is an All of Us Phenotype Library Workspace created by the Researcher Workbench Support team. It is meant to demonstrate the implementation of key phenotype algorithms within the All of Us Research Program cohort.)

Scientific Questions Being Studied

The Notebooks in this workspace can be used to implement well-known phenotype algorithms in one’s own research.

Scientific Approaches

Not Applicable

Anticipated Findings

By reading and running the Notebooks in this Phenotype Library Workspace, researchers can implement the following phenotype algorithms:

Christianne L. Roumie, Jana Shirey-Rice, Sunil Kripalani. Vanderbilt University. MidSouth CDRN - Coronary Heart Disease Algorithm. PheKB; 2014. Available from: https://phekb.org/phenotype/234

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Xin Wang - Project Personnel, Massachusetts General Hospital

Duplicate of Phenotype - Ischemic Heart Disease

Project Purpose(s)

  • Educational
  • Methods Development
  • Other Purpose (This is an All of Us Phenotype Library Workspace created by the Researcher Workbench Support team. It is meant to demonstrate the implementation of key phenotype algorithms within the All of Us Research Program cohort.)

Scientific Questions Being Studied

The Notebooks in this workspace can be used to implement well-known phenotype algorithms in one’s own research.

Scientific Approaches

Not Applicable

Anticipated Findings

By reading and running the Notebooks in this Phenotype Library Workspace, researchers can implement the following phenotype algorithms:

Christianne L. Roumie, Jana Shirey-Rice, Sunil Kripalani. Vanderbilt University. MidSouth CDRN - Coronary Heart Disease Algorithm. PheKB; 2014. Available from: https://phekb.org/phenotype/234

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Henry Zheng - Graduate Trainee, University of California, Los Angeles

Duplicate of Phenotype - Ischemic Heart Disease

Project Purpose(s)

  • Educational
  • Methods Development
  • Other Purpose (This is an All of Us Phenotype Library Workspace created by the Researcher Workbench Support team. It is meant to demonstrate the implementation of key phenotype algorithms within the All of Us Research Program cohort.)

Scientific Questions Being Studied

The Notebooks in this workspace can be used to implement well-known phenotype algorithms in one’s own research.

Scientific Approaches

Not Applicable

Anticipated Findings

By reading and running the Notebooks in this Phenotype Library Workspace, researchers can implement the following phenotype algorithms:

Christianne L. Roumie, Jana Shirey-Rice, Sunil Kripalani. Vanderbilt University. MidSouth CDRN - Coronary Heart Disease Algorithm. PheKB; 2014. Available from: https://phekb.org/phenotype/234

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Romit Bhattacharya - Research Fellow, The Broad Institute

Collaborators:

  • Sarah Urbut - Research Fellow, The Broad Institute

Duplicate of Phenotype - Type 2 Diabetes

Project Purpose(s)

  • Educational
  • Methods Development
  • Other Purpose (This is an All of Us Phenotype Library Workspace created by the Researcher Workbench Support team. It is meant to demonstrate the implementation of key phenotype algorithms within the All of Us Research Program cohort.)

Scientific Questions Being Studied

The Notebooks in this Workspace can be used to implement well-known phenotype algorithms in one’s own research.

Scientific Approaches

Not Applicable

Anticipated Findings

By reading and running the Notebooks in this Phenotype Library Workspace, researchers can implement the following phenotype algorithms:

Jennifer Pacheco and Will Thompson. Northwestern University. Type 2 Diabetes Mellitus. PheKB; 2012. Available from: https://phekb.org/phenotype/18

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Xin Wang - Project Personnel, Massachusetts General Hospital

Duplicate of Phenotype - Type 2 Diabetes

Project Purpose(s)

  • Educational
  • Methods Development
  • Other Purpose (This is an All of Us Phenotype Library Workspace created by the Researcher Workbench Support team. It is meant to demonstrate the implementation of key phenotype algorithms within the All of Us Research Program cohort.)

Scientific Questions Being Studied

The Notebooks in this Workspace can be used to implement well-known phenotype algorithms in one’s own research.

Scientific Approaches

Not Applicable

Anticipated Findings

By reading and running the Notebooks in this Phenotype Library Workspace, researchers can implement the following phenotype algorithms:

Jennifer Pacheco and Will Thompson. Northwestern University. Type 2 Diabetes Mellitus. PheKB; 2012. Available from: https://phekb.org/phenotype/18

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Nayyar Ahmed - Other, University of Pittsburgh

Duplicate of R2019Q4R3 - Phenotype - Depression

Project Purpose(s)

  • Educational
  • Methods Development
  • Other Purpose (This is an All of Us Phenotype Library Workspace created by the Researcher Workbench Support team. It is meant to demonstrate the implementation of key phenotype algorithms within the All of Us Research Program cohort.)

Scientific Questions Being Studied

The Notebooks in this Workspace can be used to implement well-known phenotype algorithms in one’s own research.

Scientific Approaches

Not Applicable

Anticipated Findings

By reading and running the Notebooks in this Phenotype Library Workspace, researchers can implement the following phenotype algorithms:

This Workspace contains an implementation of a phenotype algorithm for depression, obtained from the eMERGE network. Citation: TBA. KPWA/UW. Depression. PheKB; 2018. Available from: https://phekb.org/phenotype/1095

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Omar Costilla Reyes - Research Fellow, Massachusetts Institute of Technology

Duplicate of Self-reported fractures in RA

Project Purpose(s)

  • Disease Focused Research (rheumatoid arthritis)

Scientific Questions Being Studied

We seek to determine the accuracy of self-reported fractures in both men and women with rheumatoid arthritis. We will describe variations in the validity of self-reported fracture according to sex, race, socioeconomic factors, and other clinical factors. Fragility fractures are a significant cause of morbidity and mortality in patients with rheumatoid arthritis. Self-report of fractures is often used to identify fracture outcomes in large-scale studies of fracture prevention or treatment; hence, research findings related to fracture are predicated on the accuracy of the self-report of fractures. In the general population, there is evidence of variation in accuracy of self-reported fractures by gender, race, age, and clinical characteristics such as smoking status and BMI. However, to the best of our knowledge, there are no prior studies evaluating the validity of self-reported fractures in rheumatoid arthritis patients, a particularly vulnerable population for fractures.

Scientific Approaches

For all persons with rheumatoid arthritis who reported a fracture in the last 5 years on the enrollment survey, we will search their EMR data for all ICD-9 and ICD-10 codes corresponding to a new incident fracture in the 5 years before survey completion. Descriptive analyses of baseline characteristics (age, race, annual household income, clinical characteristics) will be presented for all persons with a self-reported fracture, stratified by whether that fracture was confirmed or unconfirmed in the EMR. To assess the accuracy of self-reported fractures, confirmation rates, overall and by covariate level, will be calculated as the proportion of self-reported fractures with a fracture confirmed at any site. False positive rates and positive predictive values will be computed similarly. Logistic regression models will be used to compute odds ratios (ORs) and 95% confidence intervals (95% CIs) for predictors of unconfirmed fractures (i.e., false positives).
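Because every person in this design has a self-reported fracture, the confirmation rate is the proportion of self-reports confirmed in the EMR (which here also equals the positive predictive value of self-report), and the false positive rate is its complement. A minimal sketch of computing both overall and by covariate level follows. The record format (dicts with a boolean `confirmed` flag) is a hypothetical assumption, and the EMR code search itself is not shown.

```python
from collections import defaultdict


def confirmation_summary(records, key=None):
    """Confirmation and false positive rates among self-reported fractures.

    records: iterable of dicts with a boolean 'confirmed' flag (EMR confirmation)
             plus covariates such as 'sex' (hypothetical format).
    key: covariate to stratify by; None gives a single 'overall' row.
    """
    counts = defaultdict(lambda: [0, 0])  # level -> [confirmed, total]
    for record in records:
        level = "overall" if key is None else record[key]
        counts[level][0] += int(record["confirmed"])
        counts[level][1] += 1
    return {
        level: {
            "confirmation_rate": confirmed / total,
            "false_positive_rate": 1 - confirmed / total,
        }
        for level, (confirmed, total) in counts.items()
    }
```

The same per-record `confirmed` flag could also serve as the outcome (inverted, for unconfirmed fractures) in the planned logistic regression.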

Anticipated Findings

In previous studies of self-reported fracture validation in the general population, false positive rates range widely, from 5% to 50%, depending on the fracture site. We anticipate lower rates of false positive self-reported fractures among rheumatoid arthritis patients because they are followed more closely for musculoskeletal complaints. Although being male, being a Black woman, having lower educational attainment, and other characteristics have been associated with higher false positive rates of self-reported fracture in the general population, it is unclear whether these associations will hold in the rheumatoid arthritis patient population. Our study will be the first to characterize the validity of self-reported fractures specifically in rheumatoid arthritis patients, a population particularly vulnerable to fracture.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Education Level
  • Income Level

Research Team

Owner:

  • Deepak Nag Ayyala - Early Career Tenure-track Researcher, Augusta University

Duplicate of Self-reported fractures in RA

Project Purpose(s)

  • Disease Focused Research (rheumatoid arthritis)

Scientific Questions Being Studied

We seek to determine the accuracy of self-reported fractures in both men and women with rheumatoid arthritis. We will describe variations in the validity of self-reported fracture according to sex, race, socioeconomic factors, and other clinical factors. Fragility fractures are a significant cause of morbidity and mortality in patients with rheumatoid arthritis. Self-report of fractures is often used to identify fracture outcomes in large-scale studies of fracture prevention or treatment; hence, research findings related to fracture are predicated on the accuracy of the self-report of fractures. In the general population, there is evidence of variation in accuracy of self-reported fractures by gender, race, age, and clinical characteristics such as smoking status and BMI. However, to the best of our knowledge, there are no prior studies evaluating the validity of self-reported fractures in rheumatoid arthritis patients, a particularly vulnerable population for fractures.

Scientific Approaches

For all persons with rheumatoid arthritis who reported a fracture in the last 5 years on the enrollment survey, we will search their EMR data for all ICD-9 and ICD-10 codes corresponding to a new incident fracture in the 5 years before survey completion. Descriptive analyses of baseline characteristics (age, race, annual household income, clinical characteristics) will be presented for all persons with a self-reported fracture, stratified by whether that fracture was confirmed or unconfirmed in the EMR. To assess the accuracy of self-reported fractures, confirmation rates, overall and by covariate level, will be calculated as the proportion of self-reported fractures with a fracture confirmed at any site. False positive rates and positive predictive values will be computed similarly. Logistic regression models will be used to compute odds ratios (ORs) and 95% confidence intervals (95% CIs) for predictors of unconfirmed fractures (i.e., false positives).

Anticipated Findings

In previous studies of self-reported fracture validation in the general population, false positive rates range widely, from 5% to 50%, depending on the fracture site. We anticipate lower rates of false positive self-reported fractures among rheumatoid arthritis patients because they are followed more closely for musculoskeletal complaints. Although being male, being a Black woman, having lower educational attainment, and other characteristics have been associated with higher false positive rates of self-reported fracture in the general population, it is unclear whether these associations will hold in the rheumatoid arthritis patient population. Our study will be the first to characterize the validity of self-reported fractures specifically in rheumatoid arthritis patients, a population particularly vulnerable to fracture.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Education Level
  • Income Level

Research Team

Owner:

  • Deepak Nag Ayyala - Early Career Tenure-track Researcher, Augusta University

Duplicate of Spectrum of tumors after breast cancer

Project Purpose(s)

  • Disease Focused Research (breast cancer)
  • Population Health
  • Other Purpose (We are interested in which secondary or recurrent tumors develop after breast cancer and whether any demographic, behavioral, medication, or medical-condition factors modify the risk of secondary (recurrent or new) tumors.)

Scientific Questions Being Studied

We are interested in which secondary or recurrent tumors develop after breast cancer and whether any demographic, behavioral, medication, or medical-condition factors modify the risk of secondary (recurrent or new) tumors.

Scientific Approaches

We plan to:
1. Create a cohort of all participants diagnosed with breast cancer (this is a case-only study).
2. For this cohort, collect all tumors that occurred after the breast cancer diagnosis.
3. Use a Cox model to determine the risk of subsequent tumors, with time to the second tumor as the outcome.
4. Because the range of second or recurrent tumors is large, first look at the spectrum of tumors and then analyze specific second tumors, such as a recurrence.
5. Examine which factors are associated with the second tumor.
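A Cox model needs, for each participant, a follow-up time and an event indicator. The sketch below shows one hypothetical way to derive these from dates: time runs from the breast cancer diagnosis to the first later tumor (event = 1), or to the last follow-up date if no later tumor is recorded (event = 0, censored). The function name and inputs are illustrative assumptions, not the project's code. The resulting pairs could then be passed to a Cox implementation such as `lifelines.CoxPHFitter`.

```python
from datetime import date


def time_to_second_tumor(bc_diagnosis, tumor_dates, last_followup):
    """Return (days, event) for one participant.

    bc_diagnosis, last_followup: datetime.date objects.
    tumor_dates: dates of any tumors recorded for the participant.
    event is 1 if a tumor is recorded after the breast cancer diagnosis,
    otherwise 0 (censored at last_followup).
    """
    later = [d for d in tumor_dates if d > bc_diagnosis]
    if later:
        return (min(later) - bc_diagnosis).days, 1
    return (last_followup - bc_diagnosis).days, 0
```

For example, `time_to_second_tumor(date(2015, 1, 1), [date(2014, 6, 1), date(2016, 1, 1)], date(2020, 1, 1))` returns `(365, 1)`; the 2014 tumor precedes the diagnosis and is ignored.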

Anticipated Findings

We anticipate that this work will help determine guidelines on the risk of second tumors after a breast cancer diagnosis.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Argyrios Ziogas - Late Career Tenured Researcher, University of California, Irvine

Duplicate of Test Workspace 2

Project Purpose(s)

  • Control Set

Scientific Questions Being Studied

Test

Scientific Approaches

Not available.

Anticipated Findings

Test

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Eric Song - Administrator, All of Us Program Operational Use

Duplicate of Uncovering disease factors related to NF1, NF2 and Schwannomatosis

Project Purpose(s)

  • Disease Focused Research (neurofibromatosis type 1, neurofibromatosis type 2, schwannomatosis)
  • Drug Development
  • Methods Development
  • Ancestry

Scientific Questions Being Studied

We intend to study the relationship between genetic factors, health records, and the symptoms of neurofibromatosis type 1, type 2, and schwannomatosis (NF). At present, we are exploring the data to formulate a specific research question. We hope to identify specific predictive biomarkers or therapies for NF.

Scientific Approaches

We intend to use health record, survey, and (when available) genomic data from participants with NF and to analyze these datasets using statistical modeling and machine learning approaches for categorical and continuous data.

Anticipated Findings

We hope and anticipate that this study will increase our understanding of neurofibromatosis type 1, type 2, and schwannomatosis. Our findings would contribute to the body of scientific knowledge by revealing new biological causes for the symptoms associated with these diseases, which may lead to new ways to treat these symptoms.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Robert Allaway - Senior Researcher, Sage Bionetworks

Duplicate of Working with All of Us Physical Measurements Data v1

Project Purpose(s)

  • Methods Development

Scientific Questions Being Studied

How can one navigate the physical measurements data?

Scientific Approaches

Not available.

Anticipated Findings

N/A

Demographic Categories of Interest

  • Sex at Birth
  • Geography

Research Team

Owner:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Cheryl Clark

Duplicate of Working with All of Us Physical Measurements Data v1

Project Purpose(s)

  • Methods Development

Scientific Questions Being Studied

How can one navigate the physical measurements data?

Scientific Approaches

Not available.

Anticipated Findings

N/A

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital

Duplicate version of Demo - PheWAS Smoking for learning

Project Purpose(s)

  • Methods Development
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Questions Being Studied

As a demonstration project, this study will present the results of Phenome-Wide Association Studies (PheWAS) to show how the various sources of data contained within the All of Us research dataset can be used to inform scientific discovery. We will perform separate PheWAS studies with smoking status as the independent variable. Specific questions include:

1. How can one implement a PheWAS within the All of Us Researcher Workbench?
2. How can one use heterogeneous data sources within the All of Us dataset to explore disease associations using self-reported exposures (Participant Provided Information, or “PPI”) and exposures captured in the electronic health record (EHR)?

Scientific Approaches

As a method for assessing the health burden of smoking on observed phenotypes, we implement a Phenome-Wide Association Study. A PheWAS consists of an array of association tests over an indexed representation of the human phenome. In this analysis, we will conduct PheWAS for the EHR-derived and PPI-derived smoking exposures included in the All of Us research dataset. We will represent “Smoking Exposure” in three ways:
  • EHR smoking ICD billing codes
  • Participant Provided Information (PPI) smoking: smoked 100 cigarettes in a lifetime (yes/no)
  • Participant Provided Information (PPI) smoking: lifetime smoking every day
To perform PheWAS, we will map ICD representations of disease to a common vocabulary of PheCodes. We then use Jupyter Notebooks to create reusable functions to perform PheWAS and generate Manhattan Plots to summarize associations.
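The per-PheCode testing step can be sketched as follows. This is a minimal illustration, not the demonstration project's code: it assumes a binary exposure per participant (for example, the PPI 100-cigarettes question) and a set of case participants per PheCode, and it substitutes an unadjusted 1-degree-of-freedom chi-square test with a Bonferroni threshold for the covariate-adjusted logistic regression a full PheWAS would usually run. The function name and data layout are hypothetical.

```python
import math

def phewas_chi2(exposed, phecode_cases, alpha=0.05):
    """Run one 2x2 chi-square association test per PheCode.

    exposed: dict participant_id -> bool (smoking exposure)
    phecode_cases: dict phecode -> set of participant_ids with that PheCode
    Returns (phecode, odds_ratio, p_value, significant) tuples sorted by p-value.
    """
    ids = list(exposed)
    threshold = alpha / max(len(phecode_cases), 1)  # Bonferroni correction
    results = []
    for code, cases in phecode_cases.items():
        a = sum(1 for i in ids if exposed[i] and i in cases)            # exposed cases
        b = sum(1 for i in ids if exposed[i] and i not in cases)        # exposed non-cases
        c = sum(1 for i in ids if not exposed[i] and i in cases)        # unexposed cases
        d = sum(1 for i in ids if not exposed[i] and i not in cases)    # unexposed non-cases
        n = a + b + c + d
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        if denom == 0:
            continue  # a row or column is empty; the test is undefined
        chi2 = n * (a * d - b * c) ** 2 / denom
        p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
        odds_ratio = (a * d) / (b * c) if b * c else math.inf
        results.append((code, odds_ratio, p, p < threshold))
    return sorted(results, key=lambda r: r[2])
```

Each returned p-value could then be plotted as -log10(p) by PheCode to produce a Manhattan plot.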

Anticipated Findings

For this study, we anticipate that we will be able to replicate known disease associations with smoking exposure. This will serve to demonstrate the quality, utility, and diversity of the All of Us data and tools and the power of gathering multiple data sources for a single phenotype, providing researchers options for study design and validation. Importantly, the entire PheWAS package is made available for reuse by researchers in the Workbench, for new hypothesis generation.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • jie na - Project Personnel, Mayo Clinic

Collaborators:

  • Guoqian Jiang - Mid-career Tenured Researcher, Mayo Clinic

Duplicate_for_DRC_obesity_code_review

Project Purpose(s)

  • Disease Focused Research (obesity)
  • Educational
  • Methods Development

Scientific Questions Being Studied

National obesity prevention and intervention strategies may benefit from precision medicine approaches that incorporate integrated data on environments, social determinants of health, and genomic factors. We examined the quality and utility of the All of Us Research Hub Workbench for accelerating precision medicine by replicating methods from existing studies that examine the prevalence of obesity at the population level. We evaluated the measurements of obesity in the physical measurement (PM) data set and the electronic health record (EHR) data set using methods similar to the Ward et al. NEJM December 2019 publication that assessed prevalence of obesity in the US by state using BRFSS data.

Scientific Approaches

For this population-based cross-sectional study of All of Us Research Workbench participants, we excluded individuals with measurements obtained during pregnancy or inpatient visits and individuals from states with fewer than 100 participants. Height and weight from physical measurements (PM) taken at program enrollment for 142,116 participants, and measured height and weight extracted from electronic health records (EHR) for 40,885 individuals, were used to calculate body-mass index (BMI). We performed a complete-case analysis of All of Us participants with known sex (male or female), race, income, and education level and estimated the prevalence of BMI categories [obesity (BMI ≥30) and extreme obesity (BMI ≥35)] nationwide and for each state, overall and by demographic subgroup, including male and female. We examined the difference between EHR- and PM-calculated BMI by state.
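The BMI calculation and prevalence estimate described above can be sketched as follows. This is a minimal illustration under stated assumptions: weight in kilograms, height in centimeters, the BMI ≥30 and ≥35 cut points given above, and a simple Wald (normal-approximation) 95% interval. The study may have used a different interval method or weighting, and the function names are hypothetical.

```python
import math

def bmi(weight_kg, height_cm):
    """Body-mass index: weight (kg) divided by height (m) squared."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2

def prevalence_ci(flags, z=1.96):
    """Proportion of True values with a Wald (normal-approximation) confidence interval."""
    n = len(flags)
    p = sum(flags) / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

def obesity_prevalence(measurements):
    """measurements: list of (weight_kg, height_cm) pairs for complete cases."""
    values = [bmi(w, h) for w, h in measurements]
    obese = [v >= 30 for v in values]    # obesity: BMI >= 30
    extreme = [v >= 35 for v in values]  # extreme obesity: BMI >= 35
    return prevalence_ci(obese), prevalence_ci(extreme)
```

State- or subgroup-specific estimates would follow by calling `obesity_prevalence` on each subset of participants.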

Anticipated Findings

Using states with at least 100 participants, PM data included 142,116 individuals (mean [SD] age, 51.2 [16.6]) and EHR data on height and weight included 40,885 individuals (mean [SD] age, 52.5 [16.5]). The median BMI was 28.4 [24.4 to 33.7] for PM participants and 29.0 [24.8 to 34.5] for EHR participants. The PM national prevalences of obesity (includes BMI >30 and BMI >35) and extreme obesity (BMI >35) were 41.2% (95% confidence interval [CI], 40.9 to 41.4) and 20.8% (95% CI, 20.6 to 21.0), respectively, with large variations across states. Women had a higher prevalence of extreme obesity than men in all selected states. Subgroups with an extreme obesity (BMI >35) prevalence greater than 25% (subgroup, N, prevalence % [95% CI]) included: non-Hispanic Black, 8,913, 28.9 (25.8 to 32.0); income less than $25,000, 13,244, 25.1 (22.1 to 28.1); education of high school to some college, 17,272, 26.1 (23.1 to 29.1); and the South region, 6,639, 25.3 (22.3 to 28.3).

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Education Level
  • Income Level

Research Team

Owner:

  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital

earlyonsetcolorectcalcancer

Project Purpose(s)

  • Disease Focused Research (colorectal cancer)

Scientific Questions Being Studied

Examine the demographic, geographic, and inflammatory biomarker differences between early-onset and late-onset colorectal cancer to identify potential biomarkers for individuals at increased risk of colorectal cancer who may benefit from early screening.

Scientific Approaches

Compare individuals with and without colorectal cancer in the overall All of Us cohort by demographics, geography, and biomarkers associated with increased risk of CRC (ESR, triglycerides, BMI, systolic blood pressure, waist circumference, ApoB100, hemoglobin A1C). Biomarkers will be examined at least 2 years before the year of diagnosis.
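The lookback rule above can be expressed as a simple filter. This is a minimal sketch assuming calendar-year granularity (a measurement qualifies if its year is at least 2 years before the diagnosis year) and a hypothetical list of (date, biomarker, value) tuples; the study's exact window definition may differ.

```python
from datetime import date

def eligible_biomarkers(measurements, diagnosis_year, min_years_before=2):
    """Keep measurements taken at least `min_years_before` calendar years before diagnosis.

    measurements: list of (measurement_date, biomarker_name, value) tuples
    """
    cutoff_year = diagnosis_year - min_years_before
    return [m for m in measurements if m[0].year <= cutoff_year]

# Example: for a 2018 diagnosis, only measurements from 2016 or earlier are kept
labs = [(date(2015, 3, 1), "ESR", 12.0), (date(2017, 6, 1), "ESR", 18.0)]
kept = eligible_biomarkers(labs, diagnosis_year=2018)
```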

Anticipated Findings

Identify biomarkers that may guide future research into the biology of early onset colorectal cancer.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital

Effect of Pyridoxine in Type 2 Diabetics V4

Project Purpose(s)

  • Disease Focused Research (Diabetes mellitus)

Scientific Questions Being Studied

The aim of this study is to investigate whether pyridoxine use can benefit diabetics by preventing long-term complications, through inhibiting the formation of advanced glycation end products (AGEs), and by improving clinical outcomes.

Diabetes and hyperglycemia affect over 415 million people worldwide, and by 2040 the number is expected to increase to 642 million. Chronic hyperglycemia results in the glycation of proteins and other biomolecules, generating AGEs. Glycation is considered a core mechanism of diabetes-associated disorders. The interaction of AGEs with their receptor elicits oxidative stress and, as a result, evokes proliferative, inflammatory, thrombotic, and fibrotic reactions in a variety of cells. Therefore, inhibiting the glycation process might be an effective way to prevent the complications of chronic hyperglycemia.

Scientific Approaches

Dataset: patients with type 2 DM
Inclusion criteria: diabetics not on insulin, age >18 and <70 years, A1c >6.5%.
Exclusion criteria: history of uncontrolled DM with A1c >8.5%, hemoglobinopathy, sickle cell disease, thalassemia, anemia (iron deficiency, pernicious anemia, B12 deficiency, folate deficiency), blood transfusion in the last 9 months, coagulopathy, blood thinner treatment, treatment with B6/B12/folate/iron in the last 3 months, treatment for TB or INH treatment, asplenia, and pregnancy or planned pregnancy in the next 6 months.

Research Method:
Blood and urine labs such as HbA1c, HGB/HCT, fructosamine, fasting lipids, microalbuminuria, 24-hour creatinine/protein, reticulocyte count, and glycated albumin will be analyzed.
Lab parameters and outcomes in patients taking pyridoxine 100 mg orally (po) daily will be compared with those of subjects not taking pyridoxine.

Anticipated Findings

1. Pyridoxine can decrease HbA1c in type 2 diabetics.
2. Pyridoxine can decrease glycated albumin, GlycoMark, and microalbuminuria.

If pyridoxine can reduce AGEs without the risk of hypoglycemia or other side effects, pyridoxine should be used in diabetic patients. This would spare diabetic patients the known complications of hyperglycemia without the side effects of anti-diabetic medication.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Bijun Kannadath - Early Career Tenure-track Researcher, University of Arizona

Collaborators:

  • Jiali Ling - Project Personnel, University of Arizona

EHRs and Drug Exposures

Project Purpose(s)

  • Population Health
  • Social / Behavioral

Scientific Questions Being Studied

We intend to study the interaction of drug exposures (especially controlled substances such as oxycodone and fentanyl) with other prescription drug-related covariates. We are interested in the interactions among these and other prescription drugs. We believe these data are a potentially rich source of text and other covariate data for studying this question.

Scientific Approaches

We will use statistical machine learning to model the data and perform inference on the parameters of these models to infer scientific trends and relationships. In particular, we will fit multinomial logistic regression to predict prescription drug usage from other covariates in the data. Interpretation of the model parameters should tell us whether these relationships are statistically significant.

Anticipated Findings

We hope to build knowledge that governmental and non-governmental organizations can use to accurately understand the factors behind potential prescription drug interactions and their related effects. We hope this information will help inform the relevant parties and help focus limited resources more equitably on the affected groups and problems.

Demographic Categories of Interest

  • Geography

Research Team

Owner:

  • Greg Hunt - Early Career Tenure-track Researcher, College of William and Mary

End of Life Prediction

Project Purpose(s)

  • Methods Development

Scientific Questions Being Studied

We plan to test the compatibility of the omop-learn library with the All of Us dataset. This library was developed by the Clinical Machine Learning Group at MIT, and facilitates rapid prototyping in Python of predictive machine-learning models using longitudinal medical data from an OMOP CDM-standard database. This test is important to verify that other researchers will be able to use the omop-learn library to run their own predictive tasks on this particular dataset, as well as other OMOP datasets.

Scientific Approaches

We plan to test the library on the task of predicting mortality over a six-month window for patients over the age of 70. Using the omop-learn library, we will select a cohort based on age and enrollment during the training and outcome windows. Then we will build a sparse feature matrix of drug, condition, procedure, and specialty features. Using this dataset, we will apply two methods to predict outcomes: a simple logistic regression model and the SARD model, a deep-learning model.

Anticipated Findings

We anticipate that omop-learn will be compatible with the All of Us dataset. Our findings here will allow us to identify and resolve any compatibility issues we encounter, which will enable researchers to use omop-learn for their own predictive modeling tasks.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • justin Lim - Graduate Trainee, Massachusetts Institute of Technology

Epidemiology of PCOS

Project Purpose(s)

  • Disease Focused Research (Polycystic ovary syndrome)
  • Population Health

Scientific Questions Being Studied

Polycystic ovary syndrome (PCOS) is the most common endocrine disorder in women of reproductive age and one of the leading causes of infertility. Minority females with PCOS are at greater risk of developing detrimental metabolic outcomes. Therefore, our scientific question is: which PCOS risk factors differ by race and/or ethnicity?

Scientific Approaches

We will leverage the All of Us data to identify females with PCOS and characterize their risk factors using demographic data, ICD codes, lab values, and socioeconomic status.

Anticipated Findings

We anticipate that we will find phenotypic differences related to metabolic dysfunction between racially and ethnically diverse females with PCOS.

Demographic Categories of Interest

  • Race / Ethnicity
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Ky'Era Actkins - Graduate Trainee, Meharry Medical College

Erwin Update: DJS: Duplicate of JAMA PheWAS Final Review 05-21-2020

Project Purpose(s)

  • Methods Development
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Questions Being Studied

As a demonstration project, this study will present the results of Phenome-Wide Association Studies (PheWAS) to show how the various sources of data contained within the All of Us research dataset can be used to inform scientific discovery. We will perform separate PheWAS studies with smoking status as the independent variable. Specific questions include:

1. How can one implement a PheWAS within the All of Us Researcher Workbench?
2. How can one use heterogeneous data sources within the All of Us dataset to explore disease associations using self-reported exposures (Participant Provided Information, or “PPI”) and exposures captured in the electronic health record (EHR)?

There is no pre-specified hypothesis. It is important to determine whether PheWAS can be conducted within the All of Us Workbench.

Scientific Approaches

As a demonstration project, this study will present the results of Phenome-Wide Association Studies (PheWAS) to show how the various sources of data contained within the All of Us research dataset can be used to inform scientific discovery. We will perform separate PheWAS studies with smoking status as the independent variable. Specific questions include:

1. How can one implement a PheWAS within the All of Us Researcher Workbench?
2. How can one use heterogeneous data sources within the All of Us dataset to explore disease associations using self-reported exposures (Participant Provided Information, or “PPI”) and exposures captured in the electronic health record (EHR)?

There is no pre-specified hypothesis. It is important to determine whether PheWAS can be conducted within the All of Us Workbench.

Anticipated Findings

For this study, we anticipate that we will be able to replicate known disease associations with smoking exposure. This will serve to demonstrate the quality, utility, and diversity of the All of Us data and tools and the power of gathering multiple data sources for a single phenotype, providing researchers options for study design and validation. Importantly the entire PheWAS package is made available for reuse by researchers in the Workbench, for new hypothesis generation.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • David Schlueter - Research Fellow, NIH

Collaborators:

  • Andrea Ramirez - Other, All of Us Program Operational Use
  • Jianglin Feng - Other, University of Arizona
  • Jason Karnes - Early Career Tenure-track Researcher, University of Arizona

Evaluating the Role of Potential Neuroprotective Agents in Glaucoma

Project Purpose(s)

  • Disease Focused Research (Glaucoma)

Scientific Questions Being Studied

Glaucoma, a chronic neurodegenerative disease of retinal ganglion cells (RGCs), is a leading cause of irreversible blindness worldwide. Its management currently focuses on lowering intraocular pressure to slow disease progression. However, disease-modifying, neuroprotective treatments for glaucoma remain a major unmet need. Several studies have demonstrated the potential for "repurposing" existing medications to protect RGCs and mitigate the risk of developing glaucoma and/or slow its progression. However, many of these studies have been performed in basic science settings using animal models and lack large-scale human data for validation. In this workspace, we plan to explore existing data on a diverse nationwide cohort to further evaluate the potential validity of candidate neuroprotective medications for influencing risk of glaucoma.

Scientific Approaches

We will primarily employ methods in clinical epidemiology, biostatistics, and machine learning. For questions relating to risk of developing glaucoma, we will examine data from a general older adult population and investigate whether use of candidate medications decreases the risk of developing glaucoma. For questions relating to risk of progressing glaucoma, we will examine data from a cohort of participants with existing diagnoses of glaucoma and evaluate whether use of candidate medications increases the risk of diagnosis codes related to greater severity. Depending on cohort size and the specific medication of interest, we may use cohort study or case-control study approaches. To examine associations of risk, we may use regression modeling, chi-squared analyses, longitudinal modeling, or machine learning methods, depending on the available cohort sizes and the characteristics of the predictor and outcome variables.

Anticipated Findings

We anticipate that this work will provide some validation of findings from the basic science literature by using real-world clinical data to support findings in animal models. Some of the medications may prove to be supported by human data, whereas others may not. Furthermore, we may even identify novel candidate medications not previously identified in the literature. This will advance the scientific knowledge of novel therapeutics for the prevention and treatment of glaucoma.

Demographic Categories of Interest

  • Age

Research Team

Owner:

  • Sally Baxter - Research Fellow, University of California, San Diego

Collaborators:

  • Bharanidharan Radha Saseendrakumar - Project Personnel, University of California, San Diego

Evidence of the Latino Epidemiologic Paradox in the All of Us Research Project

Project Purpose(s)

  • Disease Focused Research (Cardiovascular disease)
  • Population Health
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)

Scientific Questions Being Studied

The overall goal of this project is to examine whether there is evidence of the Latino Epidemiological Paradox within the All of Us Research Project (AoURP) cohort. The specific aims are:

Specific Aim 1
To determine whether Latinos have a lower gender-stratified, age-adjusted prevalence of CVD than non-Hispanic whites (NHWs) and non-Hispanic blacks in the cohort.

Specific Aim 2
To determine whether Latinos have a lower gender-stratified, age-adjusted prevalence of cancer (overall) than NHWs and non-Hispanic blacks in the cohort.

Specific Aim 3
To determine whether Latinos have a higher gender-stratified, age-adjusted prevalence of diabetes and obesity (overall) than NHWs and non-Hispanic blacks in the cohort.

Specific Aim 4
To the extent possible, examine differences by Latino subgroup and between foreign-born and US-born Latinos.

Scientific Approaches

Study population: All of Us Research Project core participants. We will examine data from different sources, including electronic health records (EHR), participant provided information (PPI), and physical measurements.

Main outcome variables: we will work with the DRC Research Support Team to obtain support for their existing classification scheme for common complex diseases, which in this project would include cardiovascular disease, cancer (including subtypes to the extent possible), and diabetes (type 2). To define diseases, we will use EHR data to keep outcomes as objective as possible, excluding survey data for now.

Statistical analysis
We will present all data stratified by gender and age-adjusted using direct standardization. BMI categories will be <25, 25-30, 30-35, and >35. For diabetes, A1C data will be categorized as A1C <7, A1C 7-9, and A1C >9.
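Direct standardization and the categorizations above can be sketched as follows. This is a minimal illustration, assuming crude prevalences per age group and a chosen standard population given as counts per age group; it would be run separately for each gender to give gender-stratified estimates. The text does not say which side of each cut point the boundary values fall on, so the category functions make an explicit assumption, and all names are hypothetical.

```python
def direct_standardize(stratum_prevalence, standard_pop):
    """Directly age-standardized prevalence.

    stratum_prevalence: dict age group -> crude prevalence in the study group
    standard_pop: dict age group -> count in the chosen standard population
    Returns the prevalence the study group would have with the standard age structure.
    """
    total = sum(standard_pop[group] for group in stratum_prevalence)
    weighted = sum(stratum_prevalence[group] * standard_pop[group]
                   for group in stratum_prevalence)
    return weighted / total

def bmi_category(bmi):
    # Assumption: each cut point belongs to the higher category (e.g. 25 <= BMI < 30).
    if bmi < 25:
        return "<25"
    if bmi < 30:
        return "25-30"
    if bmi < 35:
        return "30-35"
    return ">35"

def a1c_category(a1c):
    # Assumption: A1C values of exactly 7 and 9 fall in the middle "7-9" band.
    if a1c < 7:
        return "<7"
    if a1c <= 9:
        return "7-9"
    return ">9"
```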

Anticipated Findings

We expect to find evidence of the Latino Epidemiological Paradox within the All of Us Research Project (AoURP) cohort. Specifically, we expect to find that, despite multiple social and economic disadvantages, Latinos seem to have a health advantage on many measures of population health over other racial/ethnic minority groups such as blacks, and on some measures even better health status than Non-Hispanic Whites (NHWs).

Previous studies such as the Study of Latinos (SOL), the largest study of Latinos (16,000 participants), aimed to examine this paradox but were limited in that they included only Latinos, so comparative data on non-Latinos were not collected. With 40,000 Latino core participants in All of Us (as well as 160,000 non-Latinos), the AoURP is uniquely positioned to contribute to our knowledge and further understanding of this paradox.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth

Research Team

Owner:

  • Raul Montanez Valverde - Graduate Trainee, University of Miami

Collaborators:

  • olveen carrasquillo - Late Career Tenured Researcher, University of Miami

Exercise, HIV, Mental Health, Medication Adherence, Substance use

Project Purpose(s)

  • Social / Behavioral

Scientific Questions Being Studied

The aims of these analyses will be to determine if poor mental health and substance use mediate the relationship between exercise and medication adherence in people living with HIV.

I hypothesize that people living with HIV who exercise will have better mental health and fewer substance use behaviors, and thus more consistent medication adherence. We expect the opposite for people living with HIV who exercise less.

Scientific Approaches

The dataset will include all people living with HIV who also answer questionnaires about their exercise, mental health, substance use, and medication adherence.

Anticipated Findings

I hypothesize that people living with HIV who exercise will have better mental health and fewer substance use behaviors, and thus more consistent medication adherence. We expect the opposite for people living with HIV who exercise less.

Demographic Categories of Interest

  • Race / Ethnicity
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Nick SantaBarbara - Research Fellow, University of California, Los Angeles

Exploration of data for use in predicting cancer diagnosis

Project Purpose(s)

  • Educational

Scientific Questions Being Studied

This is my initial project, and is set up to explore the possibilities with All of Us data.

My scientific interest is to better understand why some people develop specific cancers and some do not.

Scientific Approaches

I plan to apply machine learning to germline DNA data to answer these questions.

Anticipated Findings

Ultimately the findings from this line of work should lead to better predictive tests for cancer. An example is that someday you might be able to take a blood sample from a young adult and tell them that they will probably develop colon cancer sometime in the next 40 years. This might lead them to screen for colon polyps more rigorously and ultimately let them avoid a late stage colon cancer diagnosis.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • James Brody - Mid-career Tenured Researcher, University of California, Irvine

Exploratory Analysis

Project Purpose(s)

  • Population Health
  • Methods Development

Scientific Questions Being Studied

Lab reference ranges have traditionally been derived from healthy populations. To the extent that a patient fits into that population, the reference range is pertinent to the patient. Unfortunately, for patients with chronic diseases or multiple comorbidities, their “normal” (usual value) may not fall within the reference interval for a healthy cohort and may be flagged as “abnormal” even though it does not represent a significant or actionable change in the patient’s pathophysiology. In an internal stakeholder analysis at Veterans Affairs, we found that 1-17% (median 5%) of abnormal alerts were clinically useful; the rest (about 95%) represented noise (values that had to be reviewed but had no clinical significance). We believe that novel techniques may enable the creation of precision reference intervals that are unique to each patient.

Scientific Approaches

Starting with population-based reference intervals, as we acquire information about a patient from prior clinical lab work, diagnostic coding information, and disease evolution, we should be able to set individualized (precision) reference intervals. We hope to adopt methods from statistical process control and Bayesian statistics to adapt population-level reference ranges to the individual.
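One way the Bayesian part of this idea could look is a conjugate normal-normal update that shrinks the population interval toward a patient's own history. This is a minimal sketch, not the project's model: it assumes normally distributed results, treats the population interval as mean ± 1.96 SD, and requires a known within-person SD (a hypothetical input). The statistical process control component is not shown, and the function name is hypothetical.

```python
import math

def personalized_interval(pop_low, pop_high, within_sd, prior_results, z=1.96):
    """Individualized reference interval from a population interval and prior results.

    Model: each patient has a latent setpoint drawn from the population, and
    individual results scatter around that setpoint with a known within-person SD.
    """
    pop_mean = (pop_low + pop_high) / 2
    pop_sd = (pop_high - pop_low) / (2 * z)                   # total population SD
    within_var = within_sd ** 2
    between_var = max(pop_sd ** 2 - within_var, 1e-12)        # variance of setpoints
    n = len(prior_results)
    # Conjugate normal-normal posterior for this patient's setpoint
    post_precision = 1 / between_var + n / within_var
    post_mean = (pop_mean / between_var + sum(prior_results) / within_var) / post_precision
    post_var = 1 / post_precision
    # Predictive interval for the next result: setpoint uncertainty plus within-person noise
    half_width = z * math.sqrt(post_var + within_var)
    return post_mean - half_width, post_mean + half_width
```

With no prior results the function returns the population interval unchanged; as a patient accumulates results, the interval centers on that patient's own values and narrows toward the within-person variability.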

Anticipated Findings

We hope to create a computational model for establishing precision medicine reference intervals for three common outpatient laboratory tests with low signal to noise ratios in patients with chronic diseases. This model would provide better reference ranges for the unique physiology of these patients and would provide physicians with a better understanding of abnormality in their context.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Alistair Johnson - Early Career Tenure-track Researcher, Massachusetts Institute of Technology

Explore

Project Purpose(s)

  • Other Purpose (The data will be used to explore the Workbench to better understand what kind of information I can use for further research.)

Scientific Questions Being Studied

I'm learning the Workbench workflow and trying to get basic information such as data structure, information content, and summary statistics.

Scientific Approaches

I will use the full datasets for exploration, observing the data and trying all the tools I can to determine which are suitable for our future studies.

Anticipated Findings

This exploring study will help me find what kind of data the workbench can provide and what kind of tools I can use.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Yujia Zhou - Project Personnel, University of Texas Health Science Center, Houston

explore

Project Purpose(s)

  • Disease Focused Research (disease of mental health)
  • Population Health
  • Social / Behavioral
  • Drug Development
  • Methods Development

Scientific Questions Being Studied

This research is focused on leveraging digital phenotyping techniques to understand users' behavioral patterns and their relationship to mental health. We would like to understand the behavioral markers of the disease as manifested in mobile phone usage. The research focuses on understanding the patterns of behavior as manifested in sensor data. Moreover, since the All of Us dataset offers data from a wide range of experimental modalities, it offers the opportunity to find correlations never explored before, such as the relationship between behavior and clinical records. The dataset also allows research across a wide range of cultural backgrounds and age ranges.

Scientific Approaches

At the Institute for Medical and Engineering Sciences at MIT, we are interested in a data science approach to understanding human behavior via digital phenotyping. We are also interested in the rich context of the individual. We plan to leverage new methods in machine learning such as deep learning to make sense of longitudinal behavioral data.

Anticipated Findings

As of today, we do not know what the biomarkers or behavioral markers of mental disease are. This research hopes to shed light on the most basic understanding of human behavior and its relationship to mental disease as measured with mobile sensing technology. The All of Us dataset allows us to perform this research with a wider audience and with more experimental variables than has ever been done before.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Disability Status
  • Education Level
  • Income Level

Research Team

Owner:

  • Omar Costilla Reyes - Research Fellow, Massachusetts Institute of Technology

Exploring

Project Purpose(s)

  • Educational
  • Other Purpose (Exploring the workbench as part of an applied biomedical informatics graduate course and leveraging AoU for educational purposes.) ...

Scientific Questions Being Studied

How useful is All of Us data in biomedical and public health research? For this workspace, I intend to explore the workspace and understand how the information in All of Us can help formulate new hypotheses. I intend to use BMI data, and perhaps other types of data, in this analysis.

Scientific Approaches

I intend to use the workspace to review tools, processes, and data across the longitudinal cohort of AoU participants.

Anticipated Findings

In exploring the All of Us workspace, I will understand if it is a viable tool for my research. If it is, I may continue to use this tool in the future.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Geography
  • Disability Status
  • Access to Care
  • Income Level

Research Team

Owner:

  • Michelle Gomez - Graduate Trainee, Vanderbilt University

Exploring All of Us

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

This workspace will be used to get to know the tools and features of All of Us. We hope that by getting this experience, we can better help researchers at our institution who are using the Workbench for research.

Scientific Approaches

We are interested in understanding how to work with this data in R and Jupyter notebooks.

Anticipated Findings

As this is an exploration of All of Us and its features, there are no anticipated findings. However, by doing this exploration we may be better able to support researchers producing findings from their research.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Amy Yarnell - Other, University of Maryland, Baltimore

Collaborators:

  • Jean-Paul Courneya - Other, University of Maryland, Baltimore

Eye Related

Project Purpose(s)

  • Disease Focused Research (eye disease) ...

Scientific Questions Being Studied

I am currently exploring the data to determine how much eye health-related information exists in All of Us. This will help determine future research questions.

Scientific Approaches

At this stage, my use of the workbench is exploratory. I will be using data mining techniques to look for eye health-related data to inform future research.

Anticipated Findings

I expect to find some data relevant to eye health to provide a basis for further study.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Kerry Goetz - Project Personnel, NIH

Fecal Incontinence

Project Purpose(s)

  • Disease Focused Research (fecal incontinence) ...

Scientific Questions Being Studied

I am exploring the data to see what is available in this field and what questions can be asked, specifically regarding racial differences in fecal incontinence prevalence and treatments.

Scientific Approaches

The approach is unclear at this point; much depends on the availability of a robust set of data for this disease, but I would likely use simple univariable and multivariable analyses to answer questions about racial differences in this disease.

Anticipated Findings

We anticipate that racial differences in the prevalence of fecal incontinence are real and have an impact on treatment.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Geography
  • Disability Status
  • Access to Care
  • Education Level

Research Team

Owner:

  • Kyle Staller - Early Career Tenure-track Researcher, Massachusetts General Hospital

First

Project Purpose(s)

  • Other Purpose (Learn how to use workbench (first execution)) ...

Scientific Questions Being Studied

This is my first use of the workbench. There are no specific research questions; I hope to learn how to use the workbench. As a new user, I don't know what to expect or what functions the workbench has.

Scientific Approaches

There are no formal approaches planned for this project. This is this user's first project, and its purpose is to learn how to use all available workbench functions.

Anticipated Findings

The findings will demonstrate the ability to execute workbench tools and functions. They will consist of queries and the results that flow from them (e.g., SQL query results).

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Vojtech Huser - Other, NIH

First Test Workspace

Project Purpose(s)

  • Methods Development ...

Scientific Questions Being Studied

Exploratory data analysis to start, to see whether the data can support research questions around clinical decision support applications:

- Can the results of microbiology culture tests be accurately predicted based on available patient / clinical data at the time of test ordering?

- Can the clinical orders from new specialty consultation visits be predicted based on available patient / clinical data at the time of referral from a generalist?

Scientific Approaches

Supervised and unsupervised machine learning models (e.g., collaborative filtering) applied to clinical data sources to predict subsequent labels in the form of clinical test orders and results, focusing on:

- Cases where patients receive empiric antibiotic prescriptions (antibiotics ordered simultaneously with new diagnostic microbiology culture tests).

- Cases where a patient is referred to, and subsequently sees, a specialist (e.g., endocrinology or hematology).
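Collaborative filtering, named above as one candidate technique, can be illustrated as item-based recommendation over co-occurring clinical orders. The sketch below is a minimal illustration under stated assumptions, not this project's model: the patient histories and order names (`cbc`, `blood_culture`, `tsh`, and so on) are hypothetical, orders are treated as binary presence per patient, and similarity is plain cosine similarity over shared patients.

```python
from math import sqrt

# Hypothetical patient -> set of clinical orders (binary presence only).
histories = {
    "p1": {"cbc", "blood_culture", "urine_culture"},
    "p2": {"cbc", "blood_culture"},
    "p3": {"tsh", "free_t4"},
    "p4": {"tsh", "free_t4", "cbc"},
}


def item_similarities(histories):
    """Cosine similarity between every pair of orders, based on shared patients."""
    patients_by_item = {}
    for patient, items in histories.items():
        for item in items:
            patients_by_item.setdefault(item, set()).add(patient)
    sims = {}
    for i, pi in patients_by_item.items():
        for j, pj in patients_by_item.items():
            if i != j:
                # |shared patients| / sqrt(|patients with i| * |patients with j|)
                sims[(i, j)] = len(pi & pj) / sqrt(len(pi) * len(pj))
    return sims


def recommend(observed, sims, k=3):
    """Score unseen orders by summed similarity to the orders already observed."""
    candidates = {j for (_, j) in sims} - set(observed)
    scores = {c: sum(sims.get((o, c), 0.0) for o in observed) for c in candidates}
    # Highest score first; ties broken alphabetically for a stable ordering.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


sims = item_similarities(histories)
top = recommend({"cbc"}, sims, k=3)
print(top)
```

A real decision-support model would restrict inputs to data available at the time of ordering or referral, as the questions above require; this toy version ignores timing entirely.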

Anticipated Findings

Clinical orders and test results are sufficiently predictable given available data that they can power clinical decision support information retrieval tools to aid clinical decision making under uncertainty.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Jonathan Chen - Early Career Tenure-track Researcher, Stanford University

For training and learning

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

The workspace aims to develop a learning module and to provide students with exposure to the potential social-science-based implications of occupational choices.

Scientific Approaches

The workspace aims to use traditional statistical methods in Python.

Anticipated Findings

A deeper understanding of the dataset and of the baseline descriptive statistics related to occupational choice.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Pankaj Patel - Late Career Tenured Researcher, Villanova University

For_DRC_obesity_code_review

Project Purpose(s)

  • Disease Focused Research (obesity)
  • Educational ...
  • Methods Development

Scientific Questions Being Studied

National obesity prevention and intervention strategies may benefit from precision medicine approaches that incorporate integrated data on environments, social determinants of health, and genomic factors. We examined the quality and utility of the All of Us Research Hub Workbench for accelerating precision medicine by replicating methods from existing studies that examine the prevalence of obesity at the population level. We evaluated the measurements of obesity in the physical measurement (PM) data set and the electronic health record (EHR) data set using methods similar to the Ward et al. NEJM December 2019 publication that assessed the prevalence of obesity in the US by state using BRFSS data.

Scientific Approaches

For this population-based cross-sectional study of All of Us Research Workbench participants, we excluded individuals with measurements obtained during pregnancy or inpatient visits and individuals from states with fewer than 100 participants. Physical measurements (PM) of height and weight at the time of program enrollment for 142,116 participants, and measured weight and height extracted from electronic health records (EHR) for 40,885 individuals, were used to calculate body-mass index (BMI). We performed a complete-case analysis of All of Us participants with known sex (male or female), race, income, and education level, and estimated state-specific and demographic subgroup-specific prevalence of BMI categories [obesity (BMI ≥30) and extreme obesity (BMI ≥35)] nationwide and for each state, overall and by subgroup, including male and female. We examined the difference between EHR- and PM-calculated BMI by state.
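The BMI step and the category thresholds described above can be sketched as follows. This is a minimal illustration assuming weight in kilograms and height in centimeters; the four records are hypothetical, and the simple unweighted Wald interval is an assumption chosen for brevity. The intervals reported for this study may have been computed differently (for example, with weighting or a different interval method), so this sketch is not expected to reproduce them.

```python
from math import sqrt

OBESITY_BMI = 30.0          # obesity: BMI >= 30 (includes extreme obesity)
EXTREME_OBESITY_BMI = 35.0  # extreme obesity: BMI >= 35


def bmi(weight_kg, height_cm):
    """Body-mass index: weight in kilograms divided by height in meters squared."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def prevalence_wald_ci(cases, n, z=1.96):
    """Unweighted prevalence with a simple Wald 95% confidence interval."""
    p = cases / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical (weight_kg, height_cm) records for four participants.
records = [(70, 175), (95, 170), (110, 165), (60, 160)]
bmis = [bmi(w, h) for w, h in records]

n_obese = sum(b >= OBESITY_BMI for b in bmis)
n_extreme = sum(b >= EXTREME_OBESITY_BMI for b in bmis)

obesity = prevalence_wald_ci(n_obese, len(bmis))
extreme = prevalence_wald_ci(n_extreme, len(bmis))
print([round(b, 1) for b in bmis], obesity, extreme)
```

Comparing EHR- and PM-derived BMI by state would apply the same `bmi` function to each source and then difference the per-state summaries.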

Anticipated Findings

Using states with at least 100 participants, PM data included 142,116 individuals (mean [SD] age, 51.2 [16.6]) and EHR data on height and weight included 40,885 individuals (mean [SD] age, 52.5 [16.5]). The median BMI for PM participants was 28.4 [24.4 to 33.7]; the median BMI for EHR participants was 29.0 [24.8 to 34.5]. The PM national prevalence of obesity (BMI ≥30, which includes extreme obesity) and of extreme obesity (BMI ≥35) was 41.2% (95% confidence interval [CI], 40.9 to 41.4) and 20.8% (95% CI, 20.6 to 21.0), respectively, with large variations across states. Women had a higher prevalence of extreme obesity than men in all selected states. Subgroups with an extreme obesity prevalence greater than 25% (reported as subgroup, N, prevalence % [95% CI]) included: non-Hispanic Black, 8,913, 28.9 (25.8 to 32.0); income less than $25,000, 13,244, 25.1 (22.1 to 28.1); education of high school to some college, 17,272, 26.1 (23.1 to 29.1); and the South region, 6,639, 25.3 (22.3 to 28.3).

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Education Level
  • Income Level

Research Team

Owner:

  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital

For_HTN_code_review

Project Purpose(s)

  • Other Purpose (This work is an AoU demo project. Demo projects are efforts by the AoU Research Program designed to meet the program goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. As an approved demo project, this work was reviewed and overseen by the AoU Research Program Science Committee and the AoU Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use. ) ...

Scientific Questions Being Studied

We are using the All of Us Researcher Workbench interface to answer the question, "Is hypertension prevalence in the All of Us Research Program similar to hypertension prevalence in the 2015–2016 National Health and Nutrition Examination Survey (NHANES)?" Clinical approaches to understanding and treating hypertension may benefit from a precision medicine approach that integrates data on environments, social determinants of health, behaviors, and genomic factors that contribute to hypertension risk. Hypertension is a major public health concern and remains a leading risk factor for stroke and cardiovascular disease.

Scientific Approaches

In this cross-sectional, population-based study, we used All of Us baseline data from participant-provided information (PPI) surveys of adults (age >18) and electronic health record (EHR) blood pressure measurements, and retrospectively examined the prevalence of hypertension in the EHR cohort using Systematized Nomenclature of Medicine (SNOMED) codes and blood pressure medications recorded in the EHR. We used the EHR data (SNOMED codes on 2 distinct dates and at least one hypertension medication) as the primary definition, and then added participants with elevated systolic or elevated diastolic blood pressure on measurements 2 and 3 from PPI. We extracted the dates of each participant's SNOMED codes for essential hypertension from the Researcher Workbench table 'cb_search_all_events'. We calculated an age-standardized HTN prevalence according to the age distribution of the U.S. Census, using 3 age groups (18–39, 40–59, and ≥60).
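The two-part case definition described above can be expressed as a small rule. The sketch below is hypothetical throughout: the `Participant` fields are illustrative rather than Workbench table columns, the 140/90 mmHg cutoffs are placeholders (the text does not state its thresholds), and averaging the 2nd and 3rd readings is one assumed reading of "measurements 2 and 3".

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Participant:
    # Hypothetical fields, not actual Workbench column names.
    htn_code_dates: list = field(default_factory=list)  # dates of essential-hypertension SNOMED codes
    on_htn_medication: bool = False                      # at least one hypertension medication in the EHR
    systolic: list = field(default_factory=list)         # ordered readings: 1st, 2nd, 3rd
    diastolic: list = field(default_factory=list)


def ehr_definition(p):
    """Primary definition: SNOMED code on 2 distinct dates AND >= 1 HTN medication."""
    return len(set(p.htn_code_dates)) >= 2 and p.on_htn_medication


def elevated_bp(p, sbp_cutoff=140.0, dbp_cutoff=90.0):
    """Assumed rule: mean of readings 2 and 3 at or above a placeholder cutoff."""
    if len(p.systolic) < 3 or len(p.diastolic) < 3:
        return False
    sbp = (p.systolic[1] + p.systolic[2]) / 2
    dbp = (p.diastolic[1] + p.diastolic[2]) / 2
    return sbp >= sbp_cutoff or dbp >= dbp_cutoff


def has_hypertension(p):
    # EHR definition first, then add participants with elevated measured BP.
    return ehr_definition(p) or elevated_bp(p)


a = Participant(htn_code_dates=[date(2019, 1, 5), date(2019, 6, 2)], on_htn_medication=True)
b = Participant(htn_code_dates=[date(2019, 1, 5), date(2019, 1, 5)], on_htn_medication=True)
c = Participant(systolic=[150, 138, 144], diastolic=[85, 80, 82])
d = Participant(systolic=[150, 130, 128], diastolic=[95, 85, 84])
print([has_hypertension(x) for x in (a, b, c, d)])
```

Participant `b` illustrates why distinct dates matter: two codes recorded on the same day do not satisfy the primary definition.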

Anticipated Findings

The prevalence of hypertension in the All of Us cohort is similar to that reported in the published literature: All of Us age-adjusted HTN prevalence was 27.9%, compared with 29.6% in NHANES. The All of Us cohort is a growing source of diverse longitudinal data that can be used to study hypertension nationwide. The prevalence of hypertension in the United States (U.S.) varies by age, sex, and socioeconomic status. Hypertension can often be treated successfully with medication, and prevented or delayed with lifestyle modifications. Even with these established intervention and prevention strategies, the prevalence of hypertension remains at levels of public health concern. The diversity within All of Us may provide insight into factors relevant to hypertension prevention and treatment in a variety of social and geographic contexts and population strata in the U.S.
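An age-adjusted prevalence such as the one reported here is typically obtained by direct standardization: each age group's prevalence is weighted by that group's share of a reference population. The sketch below uses the three age groups named in the approach, but the case counts and the reference weights are hypothetical placeholders, not actual U.S. Census proportions or All of Us data.

```python
def age_standardized_prevalence(cases, totals, std_weights):
    """Direct standardization: sum over groups of (reference share) * (group prevalence)."""
    weight_sum = sum(std_weights.values())
    adjusted = 0.0
    for group, weight in std_weights.items():
        group_prevalence = cases[group] / totals[group]
        adjusted += (weight / weight_sum) * group_prevalence
    return adjusted


# Hypothetical cohort counts per age group.
cases = {"18-39": 10, "40-59": 30, "60+": 60}
totals = {"18-39": 100, "40-59": 100, "60+": 100}

# Placeholder reference-population shares (NOT real Census figures).
std_weights = {"18-39": 0.40, "40-59": 0.35, "60+": 0.25}

crude = sum(cases.values()) / sum(totals.values())
adjusted = age_standardized_prevalence(cases, totals, std_weights)
print(round(crude, 3), round(adjusted, 3))
```

Here the crude prevalence (about 0.333) exceeds the adjusted one (0.295) because the hypothetical cohort is older than the hypothetical reference population; adjusting removes that age imbalance before comparing with another survey.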

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital
  • Elizabeth Karlson - Late Career Tenured Researcher, Massachusetts General Hospital
  • Cheryl Clark

for_obesity_code_review

Project Purpose(s)

  • Disease Focused Research (obesity)
  • Educational ...
  • Methods Development

Scientific Questions Being Studied

National obesity prevention and intervention strategies may benefit from precision medicine approaches that incorporate integrated data on environments, social determinants of health, and genomic factors. We examined the quality and utility of the All of Us Research Hub Workbench for accelerating precision medicine by replicating methods from existing studies that examine the prevalence of obesity at the population level. We evaluated the measurements of obesity in the physical measurement (PM) data set and the electronic health record (EHR) data set using methods similar to the Ward et al. NEJM December 2019 publication that assessed the prevalence of obesity in the US by state using BRFSS data.

Scientific Approaches

For this population-based cross-sectional study of All of Us Research Workbench participants, we excluded individuals with measurements obtained during pregnancy or inpatient visits and individuals from states with fewer than 100 participants. Physical measurements (PM) of height and weight at the time of program enrollment for 142,116 participants, and measured weight and height extracted from electronic health records (EHR) for 40,885 individuals, were used to calculate body-mass index (BMI). We performed a complete-case analysis of All of Us participants with known sex (male or female), race, income, and education level, and estimated state-specific and demographic subgroup-specific prevalence of BMI categories [obesity (BMI ≥30) and extreme obesity (BMI ≥35)] nationwide and for each state, overall and by subgroup, including male and female. We examined the difference between EHR- and PM-calculated BMI by state.

Anticipated Findings

Using states with at least 100 participants, PM data included 142,116 individuals (mean [SD] age, 51.2 [16.6]) and EHR data on height and weight included 40,885 individuals (mean [SD] age, 52.5 [16.5]). The median BMI for PM participants was 28.4 [24.4 to 33.7]; the median BMI for EHR participants was 29.0 [24.8 to 34.5]. The PM national prevalence of obesity (BMI ≥30, which includes extreme obesity) and of extreme obesity (BMI ≥35) was 41.2% (95% confidence interval [CI], 40.9 to 41.4) and 20.8% (95% CI, 20.6 to 21.0), respectively, with large variations across states. Women had a higher prevalence of extreme obesity than men in all selected states. Subgroups with an extreme obesity prevalence greater than 25% (reported as subgroup, N, prevalence % [95% CI]) included: non-Hispanic Black, 8,913, 28.9 (25.8 to 32.0); income less than $25,000, 13,244, 25.1 (22.1 to 28.1); education of high school to some college, 17,272, 26.1 (23.1 to 29.1); and the South region, 6,639, 25.3 (22.3 to 28.3).

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Education Level
  • Income Level

Research Team

Owner:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital

Functional GI Disorders Among Black/AA Patients

Project Purpose(s)

  • Disease Focused Research (Functional GI Diseases) ...

Scientific Questions Being Studied

The aim of this study is to explore and characterize common functional GI diseases among patients who identify as African American or Black. We are interested specifically in irritable bowel syndrome (IBS). As part of this research study, we will compare comorbidities, demographic and socioeconomic information, and medical management among patients who identify as African American or Black, as well as Caucasians. Minority populations are more likely to face health disparities and issues related to access to care; however, they are frequently underrepresented in clinical research. Research investigating diseases that affect people from an array of racial and ethnic backgrounds is one way to help ensure quality of clinical care for all patients.

Scientific Approaches

To complete this study, we plan to use several datasets, based primarily on four cohort populations from the All of Us database: Black patients with IBS, Black patients without IBS, White patients with IBS, and White patients without IBS. By creating an age- and sex-matched control group, we will be able to compare not only differences in overall health status, but also differences in patient socioeconomic status, perceptions of the health care they receive (based on survey data), and basic demographics. After creating the cohorts, we will use the exported data, All of Us notebooks, and statistical software to identify whether there are meaningful differences between the groups.
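Age- and sex-matched controls, as described above, can be drawn by exact matching within strata. The sketch below is one possible approach and is not the study's stated procedure: it assumes exact matching on sex and on 5-year age bands, sampling controls without replacement. The participant records, field names, band width, and matching ratio are all hypothetical.

```python
import random
from collections import defaultdict


def match_controls(cases, pool, ratio=1, age_band=5, seed=0):
    """Exact matching on sex and age band; controls are used at most once."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in pool:
        strata[(person["sex"], person["age"] // age_band)].append(person)
    for members in strata.values():
        rng.shuffle(members)  # random choice of controls within each stratum
    matched = []
    for case in cases:
        key = (case["sex"], case["age"] // age_band)
        available = strata.get(key, [])
        # Take up to `ratio` controls from the same stratum, without replacement.
        controls = [available.pop() for _ in range(min(ratio, len(available)))]
        matched.append((case["id"], [c["id"] for c in controls]))
    return matched


# Hypothetical IBS cases and non-IBS control pool within one racial group.
cases = [{"id": "c1", "sex": "F", "age": 42}, {"id": "c2", "sex": "M", "age": 67}]
pool = [
    {"id": "k1", "sex": "F", "age": 44},
    {"id": "k2", "sex": "F", "age": 51},
    {"id": "k3", "sex": "M", "age": 65},
    {"id": "k4", "sex": "M", "age": 30},
]
matches = match_controls(cases, pool)
print(matches)
```

Fixed age bands are simple but can separate neighboring ages (44 and 45 fall in different bands); a caliper on the age difference is a common alternative.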

Anticipated Findings

Few studies of functional GI diseases, or of IBS in general, have focused specifically on Black populations. However, one epidemiological study from 2005 demonstrated that IBS occurs less frequently among African Americans, and that although IBS affects quality of life in both groups, the degree of impairment is similar. A second population-based study explored racial differences in the overlap between IBS and dyspepsia among African Americans and Caucasians. Comorbid functional GI disorders other than dyspepsia have not been explored. Specific perceptions of care have also not been thoroughly explored in this population. Our study using All of Us data will contribute to the field by creating a more holistic picture of the characteristics, medical management, and health perspectives of Black patients living with IBS, using a generalizable population database.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity

Research Team

Owner:

  • Taylor Boyd - Graduate Trainee, Massachusetts General Hospital

Collaborators:

  • Casey Silvernale - Graduate Trainee, Massachusetts General Hospital
  • Kyle Staller - Early Career Tenure-track Researcher, Massachusetts General Hospital

Functional GI Disorders Among Black/AA Patients

Project Purpose(s)

  • Disease Focused Research (Functional GI Diseases) ...

Scientific Questions Being Studied

The aim of this study is to explore and characterize common functional GI diseases among patients who identify as African American or Black. We are interested in comparing comorbidities, demographic and socioeconomic information, and medical management in this group. Minority populations are more likely to face health disparities and issues related to access to care; however, they are frequently underrepresented in clinical research. Research investigating diseases that affect these populations is one way to help ensure better quality of clinical care for all patients.

Scientific Approaches

To complete this study, we plan to use several datasets, based primarily on four cohort populations from the All of Us database: Black patients with IBS, Black patients without IBS, White patients with IBS, and White patients without IBS. By creating an age- and sex-matched control group, we will be able to compare not only differences in overall health status, but also differences in patient socioeconomic status, perceptions of the health care they receive (based on survey data), and basic demographics. After creating the cohorts, we will use the exported data, All of Us notebooks, and statistical software to identify whether there are meaningful differences between the groups.

Anticipated Findings

Few studies of functional GI diseases, or of IBS in general, have focused specifically on Black populations. However, one epidemiological study from 2005 demonstrated that IBS occurs less frequently among African Americans, and that although IBS affects quality of life in both groups, the degree of impairment is similar. A second population-based study explored racial differences in the overlap between IBS and dyspepsia among African Americans and Caucasians. Comorbid functional GI disorders other than dyspepsia have not been explored. Our study using All of Us data will contribute to the field by creating a more holistic picture of the characteristics, medical management, and health perspectives of Black patients living with IBS, using a generalizable population database.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity

Research Team

Owner:

  • Taylor Boyd - Graduate Trainee, Massachusetts General Hospital