Research Projects Directory

At this time, all listed projects are using data in the registered tier. The registered tier contains individual-level data from electronic health records, survey answers, and physical measurements. These data have been altered to protect participant privacy.

Note: Researcher Workbench users provide information about their research projects independently. Any views expressed in the Research Projects Directory belong to the relevant users and do not necessarily represent those of the All of Us Research Program.

Information in the Research Projects Directory is also cross-posted on AllofUs.nih.gov in compliance with the 21st Century Cures Act.

There are currently 73 active workspaces. This information was updated on 7/2/2020.

Gender Identity Algorithm

Project Purpose(s)

  • Population Health
  • Methods Development ...

Scientific Questions Being Studied

Understanding the unique health needs of gender minorities, including transgender and gender non-binary individuals, is critical. Gender minorities face greater health disparities due in part to a lack of research into population-specific health concerns. Consequently, it is critical to understand health outcomes such as cancer risks in this population. However, the primary challenge of studying cancer and other rare diseases among gender minorities is that they represent a hard-to-reach population and gender identity is often not collected in national surveys or healthcare databases. Previous investigations have used diagnosis codes to create algorithms to identify non-binary gender identity in large databases but lacked patient-reported gender identity. We seek to build an algorithm in All of Us, which uses self-reported gender identity (the gold standard), to identify and characterize the health of gender minorities more accurately.

Scientific Approaches

We propose to create an algorithm with diagnostic, procedure, and medication codes from electronic health records to identify transgender individuals, using self-reported gender identity as the gold standard. We will use machine learning techniques to classify patients as transgender/non-binary or cisgender among the >100,000 individuals, including 400 transgender and 520 non-binary participants, in All of Us. Predictors of transgender status will be selected based on consultations with clinicians with expertise in transgender healthcare. These variables include sociodemographic factors (age, race/ethnicity), ICD-9/10 diagnosis codes (gender dysphoria), procedure codes for gender-affirming procedures (e.g. hysterectomy), and prescriptions for gender-affirming hormone therapy. We will use 10-fold cross-validation for the internal validation of the models. We will calculate sensitivity, specificity, positive predictive value, and negative predictive value to assess model performance.
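The validation metrics named above can be illustrated with a minimal sketch. It is not the project's code: the function names `k_fold_indices` and `classification_metrics` and the toy labels are hypothetical, and the fold split is a plain shuffled split rather than whatever procedure the researchers actually use.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def classification_metrics(y_true, y_pred):
    """Return sensitivity, specificity, PPV, and NPV for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Undefined when the denominator is zero (e.g. no predicted positives).
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

# Toy labels for illustration only (1 = gold-standard positive).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
print(classification_metrics(y_true, y_pred))
```

In 10-fold cross-validation, each fold would serve once as the held-out set and the metrics would be computed on its predictions. Because the positive class here is rare (920 of more than 100,000 participants), a stratified split that keeps the class ratio similar in every fold may be a better choice than the plain shuffle shown.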

Anticipated Findings

Previous investigations into the use of diagnosis codes for identifying non-binary gender identities did not have a gold standard for defining gender identity. We aim to create an algorithm that accurately defines transgender gender identity in large administrative databases to aid in future research efforts. The next step will be to apply this algorithm to other electronic health data such as Medicaid, Medicare, and private insurance databases. The ability to identify a population of transgender patients in large healthcare datasets will be a boon to health research on transgender and non-binary individuals.

Demographic Categories of Interest

  • Sex at Birth
  • Gender Identity

Research Team

Owner:

  • Sarah Jackson - Research Fellow, NIH

Genetic data

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

We are studying how genetic data may be incorporated into electronic health records, exploring the data formats and interoperability issues as well as data presentation needs for clinicians and patients.

Scientific Approaches

We will use various methods of data visualization and think-aloud usability protocols to test how well the data can be visualized in ways that are meaningful and valuable to the clinician decision maker. We may also use the FHIR standard for data interoperability.

Anticipated Findings

The findings will help us understand how to incorporate genetic data with other clinical data for decision making, and which ways of visualizing data are most efficient and usable for this purpose.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Yalini Senathirajah - Early Career Tenure-track Researcher, University of Pittsburgh

Health Disparities

Project Purpose(s)

  • Population Health ...

Scientific Questions Being Studied

Are there any disparities in pregnancy outcomes between Medicaid expansion and non-expansion states?

Scientific Approaches

Not available.

Anticipated Findings

Medicaid expansion states will have better outcomes.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Shaquille Peters - Project Personnel, Scripps Research

HK_workspace_train_v1

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

How breast cancer is diagnosed in different ancestral populations.

Scientific Approaches

Not available.

Anticipated Findings

Diagnoses are detected at different stages in different ancestral populations.

Demographic Categories of Interest

Not available.

Research Team

Owner:

  • Hooman Kamel - Mid-career Tenured Researcher, Cornell University

hma4

Project Purpose(s)

  • Disease Focused Research (cardiovascular diseases, diabetes and dementia)
  • Population Health ...
  • Social / Behavioral
  • Educational
  • Drug Development
  • Methods Development
  • Control Set
  • Ancestry
  • Commercial
  • Ethical, Legal, and Social Implications (ELSI)

Scientific Questions Being Studied

Diabetes, cardiovascular diseases, and dementia are major challenges to human health. They are determined by genetic susceptibility, environmental risk factors, and their interactions. However, evidence on G×E (genetic factors × environmental factors) interactions and unconfounded estimates of the effects of modifiable exposures is still lacking. This study aims to investigate whether modifiable factors for these diseases interact with genetic variation in relation to the risks of diabetes, cardiovascular diseases, and dementia.

Scientific Approaches

Datasets: all the genotype- and phenotype-related datasets
Research method: I plan to use G×E interaction analysis, Cox models, and logistic models in my study.
Tools: R.
Scientific question: whether modifiable factors could modify the association between genetic risk and disease risks (diabetes, cardiovascular diseases, and dementia)
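
One common way to express a G×E interaction in a logistic model is logit(p) = b0 + bG·G + bE·E + bGE·G·E, where the interaction coefficient bGE describes how the effect of the exposure E changes with genetic risk G. The sketch below assumes that form; the function name and the coefficient values are hypothetical illustrations, not estimates from this project, and the project itself lists R rather than Python as its tool.

```python
import math

def exposure_odds_ratio(beta_e, beta_ge, g):
    """Odds ratio for a one-unit increase in exposure E at genetic risk level g,
    under logit(p) = b0 + bG*G + bE*E + bGE*G*E."""
    return math.exp(beta_e + beta_ge * g)

# Hypothetical coefficients for illustration only.
beta_e, beta_ge = 0.30, 0.15
for g in (0, 1, 2):
    print(g, round(exposure_odds_ratio(beta_e, beta_ge, g), 3))
```

If bGE is zero, the exposure odds ratio is the same at every level of G, which corresponds to no interaction on the multiplicative scale.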

Anticipated Findings

Some individual environmental factors or an overall modifiable-risk-factor profile may modify the association between genetic risk and disease risk.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Hao Ma - Research Fellow, Tulane University

HTN_stroke_race

Project Purpose(s)

  • Disease Focused Research (stroke)
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use) ...

Scientific Questions Being Studied

The exact blood pressure threshold at which to begin antihypertensive therapy remains controversial, as does the ideal blood pressure target. Regardless of the specific thresholds and targets chosen, current guidelines and clinical practice patterns do not account for race and ethnicity when managing blood pressure. However, prior studies indicate that there are racial differences in the impact of elevated blood pressure on stroke risk: in a large, longitudinal cohort study, every 10-mm Hg increase in systolic blood pressure was associated with an 8% increase in stroke risk among white individuals versus a 24% increase in black individuals. These findings suggest that blood pressure targets may need to be personalized, at least based on race/ethnicity and ideally based on genetics, vascular risk factors, and lifestyle factors.

Scientific Approaches

The study population comprised 108,322 participants with a systolic blood pressure (SBP) measurement, of whom 369 had a stroke before or after the measurement. In an unadjusted logistic regression model, SBP was significantly associated with stroke (OR per mm Hg, 1.01; 95% CI, 1.00-1.01; P < 0.001). This confirms a well-established finding from numerous prior studies. We then examined this association stratified by race (black versus non-black). Among black participants, SBP was significantly associated with stroke (OR per mm Hg, 1.01; 95% CI, 1.00-1.02; P = 0.002); in participants of other races, SBP was non-significantly associated with stroke (OR per mm Hg, 1.00; 95% CI, 1.00-1.01; P = 0.11). The lack of association in non-black participants is most likely due to insufficient power, but the different strength of association between black and non-black participants is consistent with prior findings in other cohorts such as REGARDS.
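
The odds ratios above are reported per 1 mm Hg, while prior studies are often summarized per 10 mm Hg. Because a logistic model is linear in the log-odds, an odds ratio per unit can be rescaled by raising it to the power of the number of units. The sketch below shows that arithmetic; the function name is hypothetical, and because the published values are rounded to two decimals, the rescaled figure is only approximate.

```python
def scale_odds_ratio(or_per_unit, units):
    """Rescale a per-unit odds ratio to a larger increment.

    Under a logistic model the log-odds are linear in the predictor,
    so OR(units) = OR(1) ** units.
    """
    return or_per_unit ** units

# Overall estimate reported above: OR 1.01 per mm Hg.
per_10 = scale_odds_ratio(1.01, 10)
print(f"Approximate OR per 10 mm Hg: {per_10:.2f}")  # prints 1.10
```

The same rescaling applies to each confidence limit, with the same caveat about rounding in the reported values.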

Anticipated Findings

We anticipate that the association between hypertension and stroke is stronger among African Americans than among patients of other races. These results suggest that ethnicity-specific blood pressure thresholds may be superior to a uniform population-wide threshold and promise to inform current uncertainties about blood pressure management.

Demographic Categories of Interest

  • Race / Ethnicity

Research Team

Owner:

  • Cenai Zhang - Project Personnel, Cornell University

Collaborators:

  • Margaret Ross - Late Career Tenured Researcher, Cornell University
  • Hooman Kamel - Mid-career Tenured Researcher, Cornell University

Hypertension

Project Purpose(s)

  • Population Health
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use) ...

Scientific Questions Being Studied

Uncontrolled hypertension is a primary contributor to coronary heart disease, stroke, and heart failure. Hypertension can be treated successfully in many cases with medication and prevented or delayed with lifestyle modifications. Even with this success, the prevalence of hypertension continues to be at levels of public health concern, and its control in the United States is far below what is possible. In this demonstration project, we focus on the prevalence of hypertension and its awareness, treatment, and control in a large and diverse participant sample of the All of Us Research Program. Specific questions include:
1) What is the prevalence of hypertension among participants in the All of Us Research Program?
2) Among hypertensive participants, what is the prevalence of awareness, treatment, and control?
3) How do these estimates compare to the general US population assessed in the National Health and Nutrition Examination Survey (NHANES), 2015-2016?

Scientific Approaches

This descriptive analysis is based on blood pressure measurements from the participants’ physical measurement evaluations, and data derived from participant-provided information (PPI) and electronic health records (EHR).
1) Demographic factors such as age, sex, race/ethnicity, educational attainment, income and health insurance were assessed in the PPI questionnaire.
2) PPI questionnaire data was also used to define self-reported doctor diagnosis of hypertension and self-reported hypertension medication use.
3) EHR evidence of hypertension diagnosis was defined as the presence of ICD9/ICD10 codes corresponding to hypertension any time before baseline.
4) EHR evidence of hypertension medication use was defined as at least one drug exposure to hypertension medications any time before baseline.
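
The four quantities in the research questions (prevalence, awareness, treatment, and control) can be sketched as simple proportions. The listing does not state the blood pressure cut-offs or the exact case definition, so the sketch assumes a conventional definition: hypertension is SBP of at least 140 mm Hg, DBP of at least 90 mm Hg, or current medication use. The field names, thresholds, and toy records are all hypothetical.

```python
SBP_THRESHOLD = 140  # mm Hg, assumed cut-off (not stated in the listing)
DBP_THRESHOLD = 90   # mm Hg, assumed cut-off (not stated in the listing)

def is_elevated(p):
    """True if measured blood pressure meets the assumed cut-offs."""
    return p["sbp"] >= SBP_THRESHOLD or p["dbp"] >= DBP_THRESHOLD

def share(count, total):
    """Proportion, or NaN when the denominator is zero."""
    return count / total if total else float("nan")

def hypertension_summary(participants):
    """Prevalence among all participants; awareness, treatment, and
    control among those classified as hypertensive."""
    hypertensive = [p for p in participants if is_elevated(p) or p["on_medication"]]
    n_h = len(hypertensive)
    return {
        "prevalence": share(n_h, len(participants)),
        "awareness": share(sum(p["diagnosed"] for p in hypertensive), n_h),
        "treatment": share(sum(p["on_medication"] for p in hypertensive), n_h),
        "control": share(sum(not is_elevated(p) for p in hypertensive), n_h),
    }

# Toy records for illustration only.
participants = [
    {"sbp": 150, "dbp": 95, "on_medication": False, "diagnosed": False},
    {"sbp": 130, "dbp": 80, "on_medication": True, "diagnosed": True},
    {"sbp": 145, "dbp": 85, "on_medication": True, "diagnosed": True},
    {"sbp": 118, "dbp": 76, "on_medication": False, "diagnosed": False},
    {"sbp": 122, "dbp": 78, "on_medication": False, "diagnosed": False},
]
print(hypertension_summary(participants))
```

Here `diagnosed` stands in for either a self-reported doctor diagnosis or EHR diagnosis codes, and `on_medication` for either self-reported or EHR medication use; a real analysis would need to decide how to combine those sources.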

Anticipated Findings

For this study, we anticipate that the prevalence, awareness, treatment, and control of hypertension will be different across demographic strata. This will help to identify health disparities and improve health equity in vulnerable populations. We also anticipate that estimates will be different between the All of Us Research Program and the general US population assessed in NHANES 2015-2016. Understanding these differences will help to characterize potential selection bias and demonstrate the quality and utility of the All of Us data and tools.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Madhawa Saranadasa - Graduate Trainee, University of Illinois at Chicago

Collaborators:

  • Maria Argos - Mid-career Tenured Researcher, University of Illinois at Chicago