Research Projects Directory

Information about each research project within the Workbench is available in the Research Projects Directory below. Approved researchers provide their project’s research purpose, description, populations of interest, and more. This information helps All of Us ensure transparency on the type of research being conducted.

At this time, all listed projects are using data in the Registered Tier. The Registered Tier contains individual-level data from electronic health records, survey answers, physical measurements, and Fitbit. These data have been altered to protect participant privacy.

Note: Researcher Workbench users provide information about their research projects independently. Any views expressed in the Research Projects Directory belong to the relevant users and do not necessarily represent those of the All of Us Research Program.

Information in the Research Projects Directory is also cross-posted on AllofUs.nih.gov in compliance with the 21st Century Cures Act.

There are currently 422 active workspaces. This information was updated on 3/2/2021.

Patterns of insomnia

Project Purpose(s)

  • Disease Focused Research (psychiatric disorders) ...

Scientific Questions Being Studied

I intend to study the distribution of insomnia and other sleep disorders in the US, and their relationships with psychiatric disorders such as depression and anxiety. It is well known that people with mental health conditions frequently have difficulty sleeping. We are trying to better understand these relationships so we can develop better approaches to the prevention and treatment of sleep and psychiatric disorders.

Scientific Approaches

At this initial stage we will primarily be examining associations among variables related to sleep and mental health. As we better understand the All of Us data we will develop more sophisticated analysis plans.

Anticipated Findings

As we better understand the relationships between sleep and mental health in the US, we will be able to develop models for identifying individuals at risk for these types of problems. We also hope to learn new approaches for the prevention and treatment of sleep and psychiatric disorders.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Geography

Research Team

Owner:

  • Philip Gehrman - Mid-career Tenured Researcher, University of Pennsylvania

Collaborators:

  • Man Wing Chung - Graduate Trainee, University of Pennsylvania
  • Christine Ramsey - Other, Yale University

Pdc obesity map 11102019

Project Purpose(s)

  • Population Health ...

Scientific Questions Being Studied

Exploring the data to determine obesity patterns by region in the USA and by race/ethnicity.

Scientific Approaches

Not available.

Anticipated Findings

I expect that the regional obesity maps generated with All of Us data will parallel the CDC maps.

Demographic Categories of Interest

Not available.

Research Team

Owner:

  • Paulette Chandler - Early Career Tenure-track Researcher, Massachusetts General Hospital

Collaborators:

  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital

Phenome-wide associations of metabolic disorder measurements_v3

Project Purpose(s)

  • Population Health
  • Social / Behavioral ...

Scientific Questions Being Studied

The aims of this project are to identify known and novel disease associations with cardiometabolic traits using the All of Us (AoU) dataset, and to evaluate whether known racial/ethnic, educational, and socioeconomic differences in cardiometabolic disorders can be replicated in the AoU dataset. We hope to expand the scope to include all relevant measures related to cardiometabolic disorders and to assess the possibility of selection bias and issues of generalizability in cohort participant selection. There are well-established disparities in rates of metabolic disorders related to race/ethnicity, gender, and socioeconomic status. There is also a general lack of diversity and the potential for healthy-patient bias in large epidemiological datasets. For these reasons we seek to use All of Us data to lay the groundwork for projects that are more inclusive and that facilitate change for populations traditionally underrepresented in research.

Scientific Approaches

Utilizing the CDC National Health and Nutrition Examination Survey (NHANES), a nationally representative sample, we will compare prevalence rates and racial/ethnic and gender group distributions of key metabolic disorder parameters. To quantitatively investigate the generalizability of the AoU data, we will assess differences in the demographic and healthy-lifestyle characteristics between the AoU data and the NHANES data. We will use linear, logistic, and Poisson regression where appropriate to compare differences between groups.
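As a rough illustration of the prevalence comparison described above, the sketch below contrasts the prevalence of a condition in two samples. It uses a two-proportion z-test as a simpler stand-in for the regression models the project names, and the function name and all counts are hypothetical rather than taken from AoU or NHANES.

```python
import math


def two_proportion_z(cases_a, n_a, cases_b, n_b):
    """Compare prevalence in two samples with a pooled two-proportion z-test.

    Returns (prevalence difference, z statistic, two-sided p-value).
    This is an illustrative stand-in, not the project's regression analysis.
    """
    p_a = cases_a / n_a
    p_b = cases_b / n_b
    pooled = (cases_a + cases_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a - p_b, z, p_value


# Hypothetical counts: 30 of 100 in one cohort, 20 of 100 in the reference.
diff, z, p_value = two_proportion_z(30, 100, 20, 100)
print(f"difference={diff:.3f}, z={z:.3f}, p={p_value:.3f}")
```

A regression model, as the project proposes, would additionally allow adjustment for covariates such as age or sex, which this unadjusted comparison does not.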

Anticipated Findings

This project will serve as a springboard for future collaborations and grant applications utilizing AoU data and will generate information that will help future researchers better understand both the internal and external validity of the AoU dataset. This formative work will build a foundation for the scientific community’s understanding of this novel data source. We anticipate that this work will be highly cited and useful for future generations of researchers.

Demographic Categories of Interest

  • Race / Ethnicity
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Jo-el Banini - Undergraduate Student, University of Arizona

Collaborators:

  • Yann Klimentidis - Mid-career Tenured Researcher, University of Arizona
  • Amit Arora - Graduate Trainee, University of Arizona
  • Victoria Bland - Graduate Trainee, University of Arizona

Phenome-wide associations of metabolic disorder measurements_v4

Project Purpose(s)

  • Population Health
  • Social / Behavioral ...

Scientific Questions Being Studied

The aims of this project are to identify known and novel disease associations with cardiometabolic traits using the All of Us (AoU) dataset, and to evaluate whether known racial/ethnic, educational, and socioeconomic differences in cardiometabolic disorders can be replicated in the AoU dataset. We hope to expand the scope to include all relevant measures related to cardiometabolic disorders and to assess the possibility of selection bias and issues of generalizability in cohort participant selection. There are well-established disparities in rates of metabolic disorders related to race/ethnicity, gender, and socioeconomic status. There is also a general lack of diversity and the potential for healthy-patient bias in large epidemiological datasets. For these reasons we seek to use All of Us data to lay the groundwork for projects that are more inclusive and that facilitate change for populations traditionally underrepresented in research.

Scientific Approaches

Utilizing the CDC National Health and Nutrition Examination Survey (NHANES), a nationally representative sample, we will compare prevalence rates and racial/ethnic and gender group distributions of key metabolic disorder parameters. To quantitatively investigate the generalizability of the AoU data, we will assess differences in the demographic and healthy-lifestyle characteristics between the AoU data and the NHANES data. We will use linear, logistic, and Poisson regression where appropriate to compare differences between groups.

Anticipated Findings

This project will serve as a springboard for future collaborations and grant applications utilizing AoU data and will generate information that will help future researchers better understand both the internal and external validity of the AoU dataset. This formative work will build a foundation for the scientific community’s understanding of this novel data source. We anticipate that this work will be highly cited and useful for future generations of researchers.

Demographic Categories of Interest

  • Race / Ethnicity
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Amit Arora - Graduate Trainee, University of Arizona

Collaborators:

  • Yann Klimentidis - Mid-career Tenured Researcher, University of Arizona
  • Jo-el Banini - Undergraduate Student, University of Arizona
  • Victoria Bland - Graduate Trainee, University of Arizona

Phenotype Library

Project Purpose(s)

  • Methods Development ...

Scientific Questions Being Studied

We want to validate phenotype algorithms we developed using different datasets. If the algorithms are valid in the All of Us dataset, we will conduct disease-focused research using the resulting phenotypes.

Scientific Approaches

Not Applicable

Anticipated Findings

The validation of our phenotype algorithms in the All of Us cohort will be assessed. Using valid phenotype algorithms can make our research method consistent across different datasets/cohorts. Our previous findings thus can be validated using the All of Us data. New disease-focused research could be conducted in the All of Us program.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Xin Wang - Project Personnel, Massachusetts General Hospital

PheWAS of mCNV/VNTR/STRs across populations

Project Purpose(s)

  • Ancestry ...

Scientific Questions Being Studied

The aim of our research is to link both common and rare tandem repeat (TR) expansions across the human genome to disease phenotypes across a varied and diverse patient population. Furthermore, we wish to model the modulation of these repeat expansions to explain how variations in repeat size and copy number translate to variable disease states, and develop genotype groupings based on these repeat expansion categories.

Scientific Approaches

We plan to use the vast phenotypic disease data available with the whole genome sequencing data to perform phenome-wide association studies (PheWAS) using a number of bioinformatic tools, including BOLT-LMM and REGENIE. We then plan to analyze these results with R to identify statistically significant associations between rare tandem repeat variants and disease phenotypes. Additionally, we will attempt to identify whether common tandem repeat copy number variations are associated with phenotypic expression.

Anticipated Findings

Our hope is that this study will identify several novel short tandem repeat (STR) and variable number tandem repeat (VNTR) variant candidates that may be explanatory for a number of human diseases, and potentially reveal targetable genomic regions/sequences for these diseases' treatment. Additionally, we hope to demonstrate that this kind of genetic survey of common and rare tandem repeats, which are generally ignored variant types, provides key scientific and clinical insight into human genetics and disease.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Adrian Bubie - Project Personnel, Icahn School of Medicine at Mount Sinai

Physical Measurements + Cancer

Project Purpose(s)

  • Other Purpose (Data is proposed to be used for preliminary data analysis.) ...

Scientific Questions Being Studied

I am exploring the data to determine if correlations exist between physical measurements and cancer diagnosis. If so, I hope to determine how exercise may improve cancer outcomes.

Scientific Approaches

My scientific approach includes:
1) Data exploration of obesity physical measurements (waist circumference, BMI, etc.) and cancer diagnosis type (breast, colon, cervical, etc.),
2) Analysis to observe correlations,
3) Additional analysis to determine differences in treatments within group correlations.

Anticipated Findings

I hope to determine how increasing physical activity improves cancer prognosis by observing changes in physical measurements and cancer treatments over time. Considering minority populations are disproportionately afflicted by cancer, this research seeks to reduce cancer health disparities in underrepresented biomedical research populations by improving cancer outcomes and reducing disease burden while promoting physical activity.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Christina Jordan - Early Career Tenure-track Researcher, University of Mississippi Medical Center

PMEPC_health_status

Project Purpose(s)

  • Population Health
  • Educational ...

Scientific Questions Being Studied

At root, how are individual health survey responses different between volunteering adults and the larger population of the United States? The current social-medical climate calls for expanded digital citizenship from individuals. Researchers, governments, and corporations have the expectation that data resulting from expanded digital citizenship efforts will revolutionize our understanding of health[2], but procedures should be developed to first ensure these data sets represent the full spectrum of individual health, from poor through excellent. What systemic tools are best to compare the health of populations, and what are the results when All of Us volunteer data are evaluated with these tools? Also, how effective are schemes by data aggregators in targeting certain individuals? Should health status diversity be actively curated in addition to more common factors like geography, age, and race?

Scientific Approaches

First, I will identify a population within the All of Us data with health survey responses, like the SF-36[3], and also control data from the source population of those volunteers. I am most interested in self-reported health, not health from the perspective of care providers. Health status should include physical and mental measures. In the second phase, I will conduct hypothesis testing to identify differences and, in addition, conduct a data quality assessment. With enough data, I hope to use stratification to see if any health status differences are amplified by race/ethnicity, gender, and geography. This methodological plan is achievable in my time as a socially distanced student and will structure my growth as a biostatistician. Access to clinical data via the All of Us Researcher Workbench will sharpen my quantitative skills and set me up to synthesize an ethical understanding of the new generation of electronic health research platforms.

Anticipated Findings

Observations on EHR quality and diversity, and a call to action for data collection from persons across the full range of health status, from poor through excellent.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Colby Lewis V - Graduate Trainee, Columbia University

PopHealthAnalysisUNCC

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

This workspace is used to test out didactic methods for the teaching of population health data analysis methodologies.

Scientific Approaches

Data quality assessment methods and statistical regression model development. We will assess data availability in the AllOfUs database according to existing research and didactic questions. After this, we will match potential data analysis methods with the existing data to assess the viability of these methods in the AllOfUs workbench.

Anticipated Findings

We anticipate finding information about the viability of running advanced statistical modeling in the AllOfUs workbench. We will also have benchmark information about processing speeds to inform future projects.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Education Level
  • Income Level

Research Team

Owner:

  • Franck Diaz-Garelli - Early Career Tenure-track Researcher, University of North Carolina, Charlotte

Postmenopausal women in AoU

Project Purpose(s)

  • Disease Focused Research (cardiovascular disease; aging; somatic mutations; menopause)
  • Population Health ...
  • Ancestry

Scientific Questions Being Studied

We are hoping to use the All of Us dataset to understand how age at menopause influences a diverse array of age-related conditions and outcomes across women of different races/ethnicities.

Scientific Approaches

1. Test associations of age at menopause with a variety of chronic disease outcomes (cardiovascular disease, cancer) using survival methods (e.g., Cox proportional hazard models) overall and stratified by race/ethnicity
2. Validate a prediction model for acquisition of somatic mutations among postmenopausal women (with prediction model derived in other cohorts)

Anticipated Findings

We hope to identify novel risk factors for accelerated biologic aging in women and identify precision medicine approaches to maximize overall long-term health in postmenopausal women.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Disability Status
  • Education Level
  • Income Level

Research Team

Owner:

  • Michael Honigberg - Early Career Tenure-track Researcher, The Broad Institute

Postop Osteolysis

Project Purpose(s)

  • Disease Focused Research (Postoperative Osteolysis)
  • Educational ...
  • Ancestry

Scientific Questions Being Studied

Are there genetic predispositions to experiencing postoperative osteolysis following total joint arthroplasty? The question is important in the quest toward providing personalized medicine and for better establishing, in the preoperative period, the risk of developing a postoperative complication.

Scientific Approaches

We will compare the cohort of patients who underwent total joint arthroplasty without experiencing a postoperative osteolysis complication with the cohort who underwent total joint arthroplasty and did experience a postoperative osteolysis complication.

Anticipated Findings

We anticipate that there are genetic markers that predispose patients to postoperative osteolysis. Past literature has pointed toward proteins that regulate the RANK/RANKL bone remodeling pathway. We intend to test this theory, as well as try to identify any other genetic markers that predispose toward this complication.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Khaled Saleh - Senior Researcher, FAJR Scientific

potassium_level

Project Purpose(s)

  • Disease Focused Research (PMDD, PMS, ADHD, periodic paralysis) ...

Scientific Questions Being Studied

We find that there might be some correlations between potassium level and four common diseases: PMS, PMDD, ADHD, and periodic paralysis. We think our findings can help us better understand these diseases and find some effective treatments.

Scientific Approaches

We will use the datasets available on All of Us to find patterns at the population level. We will examine potassium levels and their correlations with the four common diseases.

Anticipated Findings

We think potassium level can affect the severity of the symptoms that different people might experience. Our findings can help us better understand the four common diseases.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Chang Liu - Undergraduate Student, University of California, San Diego

Practice Analyses

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

Interested in the intersection of human genetics and statistics, specifically in Alzheimer's, hormone responses, and lipedema.

Scientific Approaches

Interested in looking at the incidence of Alzheimer's, hormone responses, and lipedema within the framework of their genetic basis.

Anticipated Findings

Hoping to connect genetic conditions with a particular causative factor and support better medical practices for all communities affected.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Romeo B Celaya - Research Fellow, University of Arizona

Practice Analyses

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

Interested in the intersection of human genetics and statistics, specifically in Alzheimer's, hormone responses, and lipedema.

Scientific Approaches

Interested in looking at the incidence of Alzheimer's, hormone responses, and lipedema within the framework of their genetic basis.

Anticipated Findings

Hoping to connect genetic conditions with a particular causative factor and support better medical practices for all communities affected.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Romeo B Celaya - Research Fellow, University of Arizona

Practice workspace

Project Purpose(s)

  • Other Purpose (I am creating this workspace to explore the steps and details for using the AoU workbench. This knowledge will later be used to work on projects.) ...

Scientific Questions Being Studied

The goal of creating this workspace is to understand the steps for analysis using the workbench. These learnings will further be used to work on research projects.

Scientific Approaches

This task will mainly focus on reviewing cohort and data from AoU participants, tools, and processes to familiarize myself with the process.

Anticipated Findings

Only exploratory findings for learning purposes. This task will be helpful in assisting other new researchers in my workgroup.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Geetika Singh - Graduate Trainee, Duke University

Practice20200722

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

Navigate and understand interface. Just learning how to use workbench.

Scientific Approaches

Navigate and understand interface. Just learning how to use workbench.

Anticipated Findings

None. Navigate and understand interface. Just learning how to use workbench.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Michelle Newell - Graduate Trainee, University of Arizona

Practice20200722

Project Purpose(s)

  • Educational ...

Scientific Questions Being Studied

Navigate and understand interface. Just learning how to use workbench.

Scientific Approaches

Navigate and understand interface. Just learning how to use workbench.

Anticipated Findings

None. Navigate and understand interface. Just learning how to use workbench.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Michelle Newell - Graduate Trainee, University of Arizona

Precision Health Outcomes Realized (PHOuR) Breast Project

Project Purpose(s)

  • Disease Focused Research (breast cancer)
  • Methods Development ...

Scientific Questions Being Studied

Despite advances in breast cancer screening, prevention, and treatment, over 40,000 women still die of breast cancer each year in the United States. Growing interest in risk-based screening creates an urgent mandate to determine the effectiveness of a personalized, risk-based approach to breast cancer screening. A pivotal factor for improving breast cancer risk prediction is determining the maximum predictive power that can be obtained by using more explanatory genetic variants combined with variables extracted from data inherent in electronic health records (EHR). Analytics using genetic variants and intermediate phenotypes like mammographic breast density and EHR variables have the potential to augment existing risk based models. The project is designed to harness the power of predictive modeling to enable personalized, tailored screening protocols with the ultimate goal of improving breast cancer outcomes for women.

Scientific Approaches

This project will develop and refine a new model for estimating breast cancer risk using genetic variants (single nucleotide polymorphisms, SNPs) combined with electronic health record (EHR) variables to inform polygenic risk scores (PRSs). The study will employ a standardized format (Observational Medical Outcomes Partnership), which provides a framework for translating data from disparate coding systems to a standardized vocabulary. We will extract variables from the All of Us data. The extracted variables will be used to obtain a parsimonious set of variables identified to be most strongly associated with breast cancer. We will determine the most important SNPs contributing to PRSs and develop a power calculation. We will then test the model and demonstrate proof of principle when applied to an internal/local dataset. The model’s performance will be gauged by positive predictive value, negative predictive value, sensitivity, specificity, and area under the ROC curve.
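The performance measures named above can be sketched in a few lines. The example below is a minimal illustration, assuming binary labels (1 = case) and a made-up set of risk scores. The function names, the 0.5 threshold, and the data are hypothetical and are not drawn from the project.

```python
def confusion_metrics(y_true, y_pred):
    """Return sensitivity, specificity, PPV, and NPV for binary labels (1 = case)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def roc_auc(y_true, scores):
    """Area under the ROC curve as the probability that a random case
    scores higher than a random non-case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and risk scores, thresholded at an arbitrary 0.5.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = confusion_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

The pairwise AUC computation is quadratic in sample size, which is fine for a sketch. A rank-based formula would be preferable for large cohorts.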

Anticipated Findings

This project aims to develop advanced algorithms to contribute to personalized approaches to breast cancer screening. We anticipate the ability to stratify risk by examining variables and data points that may not be readily observable, but interact with genetics to predict future outcomes. Genome-wide association studies (GWAS) have detected multiple genetic variants associated with breast cancer risk. Typically, GWAS techniques identify straightforward statistical associations between SNPs and diseases rather than leveraging biological mechanisms or SNP interactions. Risk models using high dimensional variables, EHR data, SNPs, and intermediate phenotypes like mammographic breast density, have the potential to improve risk stratification. Implementation of these advanced models will contribute to a clinical paradigm that uses knowledge gained from analyzing genomic sequence data and/or other large scale datasets to improve breast cancer outcomes.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Terry Little - Project Personnel, University of Wisconsin, Madison

Collaborators:

  • Elizabeth Burnside - Mid-career Tenured Researcher, University of Wisconsin, Madison
  • Yeonhee Park - Early Career Tenure-track Researcher, University of Wisconsin, Madison
  • Qiongshi Lu - Early Career Tenure-track Researcher, University of Wisconsin, Madison
  • Julia Carlson - Graduate Trainee, University of Wisconsin, Madison
  • Eric Mischo - Project Personnel, University of Wisconsin, Madison

Precision Health Outcomes Realized (PHOuR) Breast Project-Original

Project Purpose(s)

  • Disease Focused Research (breast cancer)
  • Methods Development ...

Scientific Questions Being Studied

Despite advances in breast cancer screening, prevention, and treatment, over 40,000 women still die of breast cancer each year in the United States. Growing interest in risk-based screening creates an urgent mandate to determine the effectiveness of a personalized, risk-based approach to breast cancer screening. A pivotal factor for improving breast cancer risk prediction is determining the maximum predictive power that can be obtained by using more explanatory genetic variants combined with variables extracted from data inherent in electronic health records (EHR). Analytics using genetic variants and intermediate phenotypes like mammographic breast density and EHR variables have the potential to augment existing risk based models. The project is designed to harness the power of predictive modeling to enable personalized, tailored screening protocols with the ultimate goal of improving breast cancer outcomes for women.

Scientific Approaches

This project will develop and refine a new model for estimating breast cancer risk using genetic variants (single nucleotide polymorphisms, SNPs) combined with electronic health record (EHR) variables to inform polygenic risk scores (PRSs). The study will employ a standardized format (Observational Medical Outcomes Partnership), which provides a framework for translating data from disparate coding systems to a standardized vocabulary. We will extract variables from the All of Us data. The extracted variables will be used to obtain a parsimonious set of variables identified to be most strongly associated with breast cancer. We will determine the most important SNPs contributing to PRSs and develop a power calculation. We will then test the model and demonstrate proof of principle when applied to an internal/local dataset. The model’s performance will be gauged by positive predictive value, negative predictive value, sensitivity, specificity, and area under the ROC curve.

Anticipated Findings

This project aims to develop advanced algorithms to contribute to personalized approaches to breast cancer screening. We anticipate the ability to stratify risk by examining variables and data points that may not be readily observable, but interact with genetics to predict future outcomes. Genome-wide association studies (GWAS) have detected multiple genetic variants associated with breast cancer risk. Typically, GWAS techniques identify straightforward statistical associations between SNPs and diseases rather than leveraging biological mechanisms or SNP interactions. Risk models using high dimensional variables, EHR data, SNPs, and intermediate phenotypes like mammographic breast density, have the potential to improve risk stratification. Implementation of these advanced models will contribute to a clinical paradigm that uses knowledge gained from analyzing genomic sequence data and/or other large scale datasets to improve breast cancer outcomes.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Terry Little - Project Personnel, University of Wisconsin, Madison

Collaborators:

  • Elizabeth Burnside - Mid-career Tenured Researcher, University of Wisconsin, Madison
  • Yeonhee Park - Early Career Tenure-track Researcher, University of Wisconsin, Madison
  • Qiongshi Lu - Early Career Tenure-track Researcher, University of Wisconsin, Madison
  • Julia Carlson - Graduate Trainee, University of Wisconsin, Madison
  • Eric Mischo - Project Personnel, University of Wisconsin, Madison

PrecisionNutrition

Project Purpose(s)

  • Methods Development ...

Scientific Questions Being Studied

This is an exploratory project in which we will look at how the data are interconnected and how well we can link genetic variation and nutrition-related measures to health risks. In particular, we are interested in the challenges that will need to be addressed to link basic science research, such as biochemical mechanisms and pathways, all the way to the foods we eat. Without proper models to describe cooking, digestion, and other metabolic and physiological processes, it is difficult to describe even the basic associations for how the nutrients in, e.g., a single blueberry end up being distributed throughout the body.

Scientific Approaches

We will develop techniques to build network structures out of the All of Us data, and import existing resources such as Chemical databases and ontologies (PubChem and ChEBI), along with Pathway and Reaction resources (Gene Ontology, Reactome, KEGG, etc), and build query tools that can use both SQL and graph query language to interrogate the data. We will apply common Graph Theory algorithms from Mathematics and Computer Science, such as Shortest Paths or Max Flow to estimate the scale of metabolic processes.
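As an illustration of the shortest-path idea mentioned above, here is a minimal sketch using Dijkstra's algorithm from the Python standard library. The node names, edges, and weights in `toy_graph` are hypothetical placeholders, not entities from All of Us, PubChem, ChEBI, Reactome, or KEGG, and the project's actual tooling is not described in the listing.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a dict-of-dicts graph with non-negative weights.

    Returns (path, cost), or (None, inf) when the target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    queue = [(0.0, source)]
    visited = set()
    while queue:
        d, node = heapq.heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))
    if target not in dist:
        return None, float("inf")
    # Walk the predecessor links back from the target to the source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]

# Hypothetical toy network: food -> nutrient -> reactions -> pathway -> gene -> variant.
toy_graph = {
    "blueberry": {"anthocyanin": 1.0},
    "anthocyanin": {"reaction_A": 1.0, "reaction_B": 2.0},
    "reaction_A": {"pathway_X": 1.0},
    "reaction_B": {"pathway_X": 0.5},
    "pathway_X": {"gene_G": 1.0},
    "gene_G": {"variant_V": 1.0},
}

path, cost = shortest_path(toy_graph, "blueberry", "variant_V")
print(path, cost)
```

A missing link in such a network shows up as an unreachable target, which is one way the "missing or ill-defined components" of a path could surface in practice.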

Anticipated Findings

This work will identify the missing or ill-defined components of a "path" between nutrients in foods and disease variants, and more specifically where the creation of new identifiers, ontologies, and other resources may aid future Precision Nutrition projects on the All of Us data. With this information, we can link genetic variants to nutrition response, suggesting dietary modifications that may reduce risk of disease.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Jeremy Jay - Other, University of North Carolina, Charlotte

Predicting Antimicrobial Resistance

Project Purpose(s)

  • Methods Development ...

Scientific Questions Being Studied

Given the prolific use of beta-lactams within the inpatient setting, it is worth investigating how clinical use of beta-lactams can predict the development of resistance. If a model is able to highlight and depict which groups, individuals, or clinical practices lead to a high risk of developing beta-lactam resistance, such a model would be an invaluable tool to any healthcare institution.

Scientific Approaches

Inpatient EHR data from patients prescribed beta-lactams will be aggregated and analyzed. Research methods include descriptive statistics, data cleaning, feature engineering, and machine learning methods to predict AMR from factors including, but not limited to, the following: 1) broad-spectrum antibiotics prescribed before culture, 2) identity of the beta-lactam antibiotic used, 3) patient demographics.

Anticipated Findings

We expect to develop an analysis pipeline and predictive models to evaluate risks of AMR, curtail antibiotic overprescription, catch potential cases of AMR development ahead of time, and reduce the cost of patient care attributed to AMR.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Ruosi Feng - Graduate Trainee, University of California, San Diego

Collaborators:

  • Zachariah Tman - Graduate Trainee, University of California, San Diego
  • Joanna Coker - Graduate Trainee, University of California, San Diego

Predictors of cognitive decline

Project Purpose(s)

  • Disease Focused Research (Cognition, depression, vascular risk factors) ...

Scientific Questions Being Studied

We would like to explore the data to examine predictors of cognitive decline, including depression, vascular risk factors (hypertension, diabetes, hyperlipidemia, heart disease/stroke, etc.), and demographic factors (sex/gender, age, socio-economic status, race/ethnicity, etc.).

Scientific Approaches

We will use survey data, diagnostic codes, and vitals/electronic health record data to examine depression, cognition, demographics, and vascular risk factors. We would like to describe the participant characteristics of the study sample and examine longitudinal data to determine cognitive decline over time.

Anticipated Findings

We anticipate that we will be able to describe the patient population using descriptive statistics and determine predictors of cognitive decline including clinical (depression, blood pressure, diabetes, hyperlipidemia, heart disease/ stroke, etc.) and demographic risk factors (sex/ gender, age, socio-economic status, race/ ethnicity, etc.) from the participants in the All of Us dataset.

Demographic Categories of Interest

  • Age

Research Team

Owner:

  • Seema Aggarwal - Early Career Tenure-track Researcher, University of Texas Health Science Center, Houston

Predictors of Endometriosis

Project Purpose(s)

  • Disease Focused Research (endometriosis) ...

Scientific Questions Being Studied

We aim to quantify predictors of endometriosis and investigate the association between race/ethnicity, urban/rural hospital status, hospital bed size, marital status, census region, infertility, and PCOS diagnosis with the diagnosis of endometriosis.

Historically, Black and Hispanic women have had lower rates of diagnosed endometriosis. We hypothesize that when barriers to accessing care are accounted for, the rate of endometriosis will be the same across racial/ethnic groups of women and this disparity in the diagnosis of endometriosis will be attenuated. We predict that rural hospitals will have a lower diagnostic rate of endometriosis than urban hospitals.

Scientific Approaches

We hope to assemble a cohort of women who have had a well-woman exam in the past 3 years. Within this cohort, we would like to see how many women have a diagnosis of endometriosis and compare them to women with no diagnosis. We would like to compare these two groups on demographic data to assess whether rates of diagnosis differ between them. Assembling these two groups will allow us to gather more accurate information about the true prevalence of endometriosis, which has thus far been difficult to quantify.

Anticipated Findings

We anticipate that, with this dataset, there will be no disparity between racial/ethnic groups. We believe the current disparity reflected in the literature represents issues in accessing quality care. Our findings can help guide clinical practice and help address health disparities between those who are able to receive a diagnosis for endometriosis and those who are not.

Demographic Categories of Interest

  • Race / Ethnicity
  • Geography
  • Access to Care
  • Income Level

Research Team

Owner:

  • Sana Khan - Graduate Trainee, University of Arizona

Pregnancy After ACL Injury

Project Purpose(s)

  • Disease Focused Research (anterior cruciate ligament injury, osteoarthritis)
  • Population Health ...

Scientific Questions Being Studied

Do pregnancy-related outcomes differ among women with and without a history of a knee injury (specifically an anterior cruciate ligament [ACL] injury) and if so does the presence of knee osteoarthritis or other related diseases/disorders (e.g., high blood pressure, obesity) mediate this relationship?
This is exploratory at this point to help formalize a research question. Evidence suggests that people with a history of an ACL injury are less physically active and report lower quality of life than peers without a history of an ACL injury. There is limited data about whether pregnancy outcomes may differ based on injury history.

Scientific Approaches

The All of Us dataset will be used to identify females with and without a history of an ACL injury. The two groups will be matched on age and other possible factors that may influence injury risk and pregnancy outcomes.

Anticipated Findings

This exploratory analysis will help inform future studies to better understand whether pregnancy outcomes differ between females with and without a history of ACL injury. If so, this project will provide preliminary evidence about which factors may help explain why there is a difference.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Jeffrey Driban - Mid-career Tenured Researcher, Tufts Medical Center

PregnancyOutcomeRisk

Project Purpose(s)

  • Disease Focused Research (Pregnancy-related Diseases and Outcomes)
  • Drug Development ...
  • Methods Development
  • Other Purpose (Understand drug effects on pregnancy progression and outcomes, as well as child development and outcomes.)

Scientific Questions Being Studied

We are interested in exploring data related to pregnancy in order to understand the effect of drug use (e.g. antibiotics, corticosteroids, antidepressants, etc.) on pregnancy and pregnancy outcomes. Specifically, we are interested in factors that may increase the risk of preterm birth and birth defects. By better understanding the risks and mechanisms of preterm birth, we can better predict preterm birth risk for patients, advise changes to individual and systemic risk factors, and develop hypotheses to motivate therapeutic or preventative approaches.

Scientific Approaches

We plan to build a pregnancy cohort, with inclusion of preterm birth and complicated birth patients. We then plan to look at all factors within a timeframe around the birth, especially drug use, but also diagnosis, demographic factors, and lab values. If the data is available, we plan to link the pregnancy with the child in order to determine child development outcomes. We will then perform statistical tests to quantify extent of differences between cohorts and build models to determine covariates that affect outcome.

Anticipated Findings

Currently, there is literature suggesting that exposure to certain medications during pregnancy, such as corticosteroids and antidepressants, is associated with preterm birth. We therefore expect to see these medications enriched in preterm cohorts. Due to changes in the maternal immune system during pregnancy, and the roles that the immune system may play in preterm birth, we expect immune-related disorders and infections during pregnancy to modify the risk of preterm birth. Lastly, there are theories that socioeconomic status and stress can also modify the risk of preterm birth. We expect to better understand the role of demographic factors in the risk of birth outcomes. Our findings will help support or identify associations with preterm birth, help generate hypotheses for the mechanisms that lead to preterm birth, and ultimately aid in the development of therapeutic strategies.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Alice Tang - Graduate Trainee, University of California, San Francisco

Prehypertension Epidemiology

Project Purpose(s)

  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use) ...

Scientific Questions Being Studied

In this demonstration project, we propose to replicate the association between race, prehypertension, and associated risk factors, using the All of Us (AoU) participant provided information as well as clinical data. Specific questions of interest include:
1. What is the prevalence of prehypertension in the AoU data?
2. How should prehypertensive, normotensive, and hypertensive cohorts be defined in the AoU data?
3. What is the association between race and prehypertension?

Scientific Approaches

We will use internationally defined blood pressure ranges to characterize prehypertensive, normotensive, and hypertensive groups. We will generate summary statistics for the various hypertension groups. Race will be categorized according to the definitions of the US Census Bureau. We will stratify results by race to assess the interaction between race and prehypertension. Jupyter Notebook and R will be used to perform the analyses.
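The grouping step can be sketched as follows. The listing does not say which blood pressure ranges the project uses, so the cutoffs below (prehypertension at 120-139 mmHg systolic or 80-89 mmHg diastolic, hypertension at 140/90 mmHg or above, in the style of JNC 7) are an assumption, and the readings are made up for illustration. The project itself names R and Jupyter Notebook; Python is used here only for the sketch.

```python
from collections import Counter

def classify_bp(systolic, diastolic):
    """Assign a blood pressure group using assumed JNC 7 style cutoffs (mmHg)."""
    if systolic >= 140 or diastolic >= 90:
        return "hypertensive"
    if systolic >= 120 or diastolic >= 80:
        return "prehypertensive"
    return "normotensive"

# Hypothetical (systolic, diastolic) readings, not All of Us data.
readings = [(118, 76), (125, 70), (150, 85), (119, 92), (132, 84)]

# Count how many readings fall into each group.
summary = Counter(classify_bp(s, d) for s, d in readings)
print(summary)
```

The higher category wins when systolic and diastolic values disagree, which is why the hypertensive check runs first.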

Anticipated Findings

We anticipate the prevalence of prehypertension to be associated with age, race and ethnicity, heart disease, and diabetes as reported in previous literature.

Demographic Categories of Interest

  • Race / Ethnicity

Research Team

Owner:

  • Vignesh Subbian - Early Career Tenure-track Researcher, University of Arizona

Collaborators:

  • John Ehiri
  • Baran Balkan - Project Personnel, University of Arizona

Prevalence of Autoinflammatory Disease

Project Purpose(s)

  • Disease Focused Research (Autoinflammatory Disease)
  • Population Health ...

Scientific Questions Being Studied

Diagnosis of a rare disease typically occurs at the specialist level, and therefore may poorly represent historically underrepresented populations. Here I hope to explore the prevalence of rare autoinflammatory diagnoses (e.g. FMF, FCAS, DADA2, Behçet's disease) in such populations, for comparison to an existing cohort at the NIH.

This will be an exploratory study to summarize disease prevalence across:
- Age
- Sex
- Race
- Access to medical care

Scientific Approaches

I will use All of Us Researcher Workbench to build a cohort of participants diagnosed with autoinflammatory disease. Seeing as this is my first workspace, I'll limit myself to basic summary statistics to compare subgroups.

Anticipated Findings

Autoinflammatory diseases are generally rare, so I don't expect to find many participants, let alone significant trends. Nonetheless, this work could lead to a more inclusive and nuanced understanding of disease presentation within the field.

Demographic Categories of Interest

  • Race / Ethnicity
  • Access to Care

Research Team

Owner:

  • Ryan Laird - Project Personnel, NIH

prevalence of surgical site infection after hernia repair

Project Purpose(s)

  • Disease Focused Research (abdominal inguinal hernia)
  • Control Set ...

Scientific Questions Being Studied

Describe demographic and clinical features of patients undergoing repair of abdominal or inguinal hernia. Examine the prevalence of postoperative surgical site infections. Outcomes may be stratified by approach (laparoscopic, robotic, etc.) if sufficient numbers are available.

Scientific Approaches

Summary statistics describing demographic and clinical features of patient cohorts extracted from AoU will be generated. These features may be compared against the National Surgical Quality Improvement Program database.

Anticipated Findings

This study will identify potential under-reporting in either database (AoU or NSQIP). Additionally, AoU data may be used as a validation tool for results derived from analysis of the NSQIP. Since the AoU data is anticipated to be more 'fine grained' than the NSQIP data, we will attempt to ascertain whether clinically meaningful information is lost during the NSQIP data abstraction process.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Andrew Borgert - Project Personnel, Gundersen Health System

Psoriasis and MS

Project Purpose(s)

  • Disease Focused Research (Psoriasis and MS) ...

Scientific Questions Being Studied

Psoriasis patients have previously been reported at greater risk of MS (OR~1.3). I intend to investigate this risk in greater detail, controlling for different environmental factors.

Research questions:
- Are psoriasis patients at greater risk of MS?
- Are MS patients at greater risk of psoriasis?
- What are the main environmental factors that affect this risk?

Scientific Approaches

I intend to conduct an epidemiological study using data from the All of Us initiative and applying statistical methods, including multivariate logistic regression and survival analysis.
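For orientation, the odds ratio quoted above (OR~1.3) can be computed from a 2x2 table as sketched below. This is the unadjusted calculation with a Woolf (log-based) confidence interval, a simpler stand-in for the multivariate logistic regression the project plans to use. The counts are hypothetical and do not come from All of Us.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-based) confidence interval.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper

# Hypothetical counts: psoriasis patients with/without MS, then controls with/without MS.
estimate, lower, upper = odds_ratio(30, 970, 20, 980)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Controlling for environmental factors, as the project intends, requires a regression model rather than this single-table calculation.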

Anticipated Findings

Understanding the shared pathophysiology of psoriasis and MS will enable more effective precision medicine and optimal disease management for both diseases.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Matthew Patrick - Research Fellow, University of Michigan

Racial and Ethnic Disparities in Infertility

Project Purpose(s)

  • Disease Focused Research (infertility) ...

Scientific Questions Being Studied

Are there racial and ethnic disparities in infertility? What drives these disparities?

Scientific Approaches

I will use regression-based approaches to examine associations while adjusting for confounders.

Anticipated Findings

Prior literature suggests that there are racial and ethnic disparities in the burden of infertility. This work will contribute to scientific knowledge as there is likely to be a large enough sample of women of racial and ethnic minorities to have adequate statistical power to examine this issue.

Demographic Categories of Interest

  • Race / Ethnicity

Research Team

Owner:

  • Courtney Lynch - Mid-career Tenured Researcher, Ohio State University

RacialEthnicDifferences_AnthropoLipidALT

Project Purpose(s)

  • Disease Focused Research (Obesity)
  • Other Purpose (This work is the result of an All of Us Research Program Demonstration Project. Demonstration Projects are efforts by the All of Us Research Program designed to meet the goal of ensuring the quality and utility of the Research Hub as a resource for accelerating precision medicine. This work has been approved, reviewed, and overseen by the All of Us Research Program Science Committee and Data and Research Center to ensure compliance with program policy.) ...

Scientific Questions Being Studied

Obesity is one of the most important risk factors for many diseases in the United States and across the world. Differences in body weight and shape across gender and race/ethnicity have been extensively described. We sought to replicate these differences and evaluate newly emerging data from the All of Us Research Program (AoU). In this project, we ask the scientific question: How do individuals of different genders and different racial/ethnic groups in the All of Us dataset differ with respect to weight, waist and hip circumferences, cholesterol levels, and levels of alanine aminotransferase?

Scientific Approaches

Within each ethnic/racial group and each gender group, we first visually examined histograms of each outcome variable to determine the presence of any major outliers that may represent measurement errors. Then we tabulated the mean values and other descriptive statistics for continuous variables such as waist and hip circumferences. We also determined the proportion of individuals with abdominal obesity. To formally test for differences among groups and to adjust for age and other covariates, we will use linear regression, transforming variables to conform to the assumptions of linear regression. Data for race and ethnicity were obtained from participant-provided information (PPI). Biological sex at birth, height, weight, waist circumference (WC), and hip circumference measurements were obtained according to AoU baseline visit protocols. Levels of alanine aminotransferase (ALT) were obtained from the participants' EHRs.

Anticipated Findings

For this study, we anticipate that we will be able to replicate known differences in body weight and shape across gender and race/ethnicity. We anticipate that we will find racial/ethnic and gender disparities related to ALT, a surrogate marker of hepatic steatosis. We anticipate the ability to evaluate the consistency of the All of Us cohort with national averages related to obesity and indicate that this resource is likely to be a major source of scientific inquiry and discovery. This project will serve to demonstrate the quality, utility, and diversity of the All of Us data and tools and the power of gathering multiple data sources for a single set of phenotypes, providing researchers options for study design and validation.

Demographic Categories of Interest

  • Race / Ethnicity
  • Sex at Birth

Research Team

Owner:

  • Yann Klimentidis - Mid-career Tenured Researcher, University of Arizona

Collaborators:

  • Roxana Loperena Cortes - Other, All of Us Program Operational Use
  • Jason Karnes - Early Career Tenure-track Researcher, University of Arizona
  • Andrea Ramirez - Other, All of Us Program Operational Use
  • Amit Arora - Graduate Trainee, University of Arizona
  • Lina Sulieman - Other, All of Us Program Operational Use
  • Jianglin Feng - Other, University of Arizona

Repurposing medications for COVID-19

Project Purpose(s)

  • Disease Focused Research (COVID-19) ...

Scientific Questions Being Studied

We are interested in repurposing medications for treating COVID-19. The primary question is whether groups of patients taking specific medications are less likely to get COVID-19 or, if they do get COVID-19, whether they are hospitalized less often than other groups.

Scientific Approaches

We are interested in drugs like Keytruda, which is used in cancer treatment to help boost the immune system. We will select patients who are taking Keytruda and see how many of those patients have also been diagnosed with or hospitalized for COVID-19. We will then examine whether patients taking Keytruda get COVID-19 or are hospitalized less often than the general population.

Anticipated Findings

We anticipate that patients taking immune-boosting drugs like Keytruda may be less likely to get COVID-19 or, if they do get COVID-19, would need to be hospitalized less frequently. Identifying already approved drugs that could potentially be used to treat COVID-19 would be very useful.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Becky Kinkead - Other, Emory University

Research Program for Vision Surveillance: Diabetes and Diabetic Retinopathy

Project Purpose(s)

  • Control Set ...

Scientific Questions Being Studied

How do data from the All of Us database compare against known data sources that are considered to be representative of the general population and have been traditionally used in vision health surveillance activities (such as NHANES, NHIS, etc.)? How does All of Us compare to existing big-data sources such as IQVIA?

There is increasing interest in understanding how social factors impact health and vision outcomes. Social determinants of health are important considerations for disease management and prognosis, and our representative use case (diabetes and diabetic retinopathy) has huge implications for our health system as the leading cause of blindness and visual impairment among working-age adults in the United States. By answering the above questions, we can determine whether the All of Us database is representative and may be broadly generalizable for future studies.

Scientific Approaches

- Develop standard cohort definition for diabetes
- Develop standard cohort definition for diabetic retinopathy
- Determine prevalence of diabetes and compare across different data sources – All of Us, NHANES, NHIS, IQVIA
  o Numerator: Number of adults with diabetes
  o Denominator: Total number of adults available in data source
- Determine prevalence of diabetic retinopathy and compare across different data sources – All of Us, NHANES, NHIS, IQVIA
  o Numerator: Number of adults with diabetic retinopathy
  o Denominator: Total number of adults available in data source vs. total number of adults with diabetes
- For prevalence calculations, will need to establish defined study periods and ensure consistency across data sources
- Potential analyses:
  o Look at state/regional variations
  o Examine demographics (age, gender, race, ethnicity) of cohorts across data sources
- Identify areas of similarity/alignment vs. differences
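The numerator and denominator choices listed above can be made concrete with a short sketch. The counts are hypothetical and stand in for whatever a single data source would yield within a defined study period; the function and variable names are illustrative only.

```python
def prevalence(numerator, denominator):
    """Return prevalence as a proportion; the denominator must be positive."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator

# Hypothetical counts for one data source and one study period.
counts = {"adults": 10_000, "diabetes": 1_200, "diabetic_retinopathy": 300}

# Diabetes prevalence among all adults in the source.
diabetes_prev = prevalence(counts["diabetes"], counts["adults"])

# Diabetic retinopathy prevalence under the two candidate denominators.
dr_prev_all_adults = prevalence(counts["diabetic_retinopathy"], counts["adults"])
dr_prev_among_diabetes = prevalence(counts["diabetic_retinopathy"], counts["diabetes"])

print(f"diabetes: {diabetes_prev:.1%}")
print(f"DR among all adults: {dr_prev_all_adults:.1%}")
print(f"DR among adults with diabetes: {dr_prev_among_diabetes:.1%}")
```

The same numerator gives very different figures under the two denominators, so comparisons across All of Us, NHANES, NHIS, and IQVIA only make sense when every source uses the same one.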

Anticipated Findings

If we are able to demonstrate that the All of Us database is representative and aligns with existing nationwide data sources, then findings regarding links between social determinants and vision health outcomes using All of Us could be considered more broadly generalizable. On the other hand, if there are major discrepancies between All of Us and previously established data sources, this would be important information for the vision research community to be aware of, and it could even inform future efforts to make the database more representative.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Geography
  • Access to Care

Research Team

Owner:

  • Michael Paap - Graduate Trainee, University of California, San Diego

Researcher Workbench learning

Project Purpose(s)

  • Other Purpose (Learning Researcher Workbench and exploring AllOfUs data.) ...

Scientific Questions Being Studied

This workspace will be used for learning the Researcher Workbench and exploring the data available in the AllOfUs tools.
I am planning to explore how the prevalence of some medical conditions in AllOfUs compares to the national data reported in various medical publications.
I also plan to explore the availability of data for pediatric ages in All of Us. This will help me understand whether the data can be used for neonatal research.

Scientific Approaches

I plan to explore the EHR data and the survey data to learn the platform and explore the data.

Anticipated Findings

I anticipate finding that the prevalence of various medical conditions is within the published ranges of prevalence in the US.
In terms of the ages for which AllOfUs data is available, I expect to find that the younger the pediatric patient, the fewer subjects will be available in AllOfUs.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Corneliu Antonescu - Mid-career Tenured Researcher, University of Arizona

Researcher Workbench learning

Project Purpose(s)

  • Other Purpose (Learning Researcher Workbench and exploring AllOfUs data.) ...

Scientific Questions Being Studied

This workspace will be used for learning the Researcher Workbench and exploring the data available in the AllOfUs tools.
I am planning to explore how the prevalence of some medical conditions in AllOfUs compares to the national data reported in various medical publications.
I also plan to explore the availability of data for pediatric ages in All of Us. This will help me understand whether the data can be used for neonatal research.

Scientific Approaches

I plan to explore the EHR data and the survey data to learn the platform and explore the data.

Anticipated Findings

I anticipate finding that the prevalence of various medical conditions is within the published ranges of prevalence in the US.
In terms of the ages for which AllOfUs data is available, I expect to find that the younger the pediatric patient, the fewer subjects will be available in AllOfUs.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Corneliu Antonescu - Mid-career Tenured Researcher, University of Arizona

Retention of Participants Underrepresented in Biomedical Research

Project Purpose(s)

  • Social / Behavioral
  • Ethical, Legal, and Social Implications (ELSI) ...

Scientific Questions Being Studied

Generally, digital studies have been hindered by substantial participant drop-out (Druce et al., 2019). Indeed, participant retention has consistently been an issue of concern for remote research endeavors, both traditional and digital (Pratap et al., 2020). This is especially true for populations underrepresented in biomedical research (Booker et al., 2011; Nicholson et al., 2011), which are a key part of the All of Us Research Program. Preliminary data from several sites of the All of Us Research Program revealed that income and education play a major role in completion of retention activities. Specifically, sites with a greater number of participants with income below 200% of the federal poverty level and with education below the high school level (or equivalent) experience difficulty in retaining participants. Thus, the purpose of this study is to determine whether there are additional social and behavioral factors that affect retention of participants in the All of Us Research Program.

Scientific Approaches

Responses to questions from the following surveys will be analyzed: The Basics, Lifestyle, Overall Health, Healthcare Access and Utilization, and COVID-19 Participant Experience Survey (COPE). Linear mixed models will primarily be used to determine the relationship between select social and behavioral factors and retention rates.

Anticipated Findings

The anticipated findings from this study will demonstrate additional behavioral and social factors that influence retention of participants underrepresented in biomedical research in the All of Us Research Program. Findings will broaden the considerations of developing digital and/or longitudinal research programs that represent the rich demographic diversity of the United States.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Patricia Izbicki - Project Personnel, University of Miami

Revision_after_HTN_code_review

Project Purpose(s)

  • Other Purpose (This work is an AoU demo project. Demo projects are efforts by the AoU Research Program designed to meet the program goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. As an approved demo project, this work was reviewed and overseen by the AoU Research Program Science Committee and the AoU Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use. ) ...

Scientific Questions Being Studied

We are using the All of Us Researcher Workbench interface to answer the question, "Is hypertension prevalence in the All of Us Research Program similar to hypertension prevalence in the 2015–2016 National Health and Nutrition Examination Survey (NHANES)?" Clinical approaches to understanding and treating hypertension may benefit from a precision medicine approach that integrates data on environments, social determinants of health, behaviors, and genomic factors that contribute to hypertension risk. Hypertension is a major public health concern and remains a leading risk factor for stroke and cardiovascular disease.

Scientific Approaches

In this cross-sectional, population-based study, we used All of Us baseline data from participant-provided information (PPI) surveys and electronic health record (EHR) blood pressure measurements for participants older than 18 years, and retrospectively examined the prevalence of hypertension in the EHR cohort using Systematized Nomenclature of Medicine (SNOMED) codes and blood pressure medications recorded in the EHR. We used the EHR data (SNOMED codes on 2 distinct dates and at least one hypertension medication) as the primary definition, and then added subjects with elevated systolic or elevated diastolic blood pressure on measurements 2 and 3 from PPI. We extracted the dates of each participant’s SNOMED codes for essential hypertension from the Researcher Workbench table ‘cb_search_all_events’. We calculated an age-standardized HTN prevalence according to the age distribution of the U.S. Census, using 3 groups (18-39, 40-59, ≥ 60).
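Age standardization over the three groups can be sketched as direct standardization: each group's prevalence is weighted by that group's share of a standard population. The listing does not give the calculation itself, so treating it as direct standardization is an assumption, and the case counts and weights below are hypothetical placeholders, not All of Us results or actual U.S. Census proportions.

```python
import math

def age_standardized_prevalence(cases, totals, std_weights):
    """Direct standardization: sum of group prevalence times standard weight."""
    if not math.isclose(sum(std_weights.values()), 1.0):
        raise ValueError("standard population weights must sum to 1")
    return sum(
        std_weights[group] * cases[group] / totals[group]
        for group in std_weights
    )

# Hypothetical counts per age group.
cases = {"18-39": 100, "40-59": 300, "60+": 500}
totals = {"18-39": 1000, "40-59": 1000, "60+": 1000}

# Hypothetical standard population shares (placeholders for Census proportions).
std_weights = {"18-39": 0.40, "40-59": 0.35, "60+": 0.25}

crude = sum(cases.values()) / sum(totals.values())
adjusted = age_standardized_prevalence(cases, totals, std_weights)
print(f"crude: {crude:.3f}, age-standardized: {adjusted:.3f}")
```

In this toy example the standardized figure is lower than the crude one because the sample has proportionally more people in the oldest group than the standard population does.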

Anticipated Findings

The prevalence of hypertension in the All of Us cohort is similar to that of published literature. All of Us age-adjusted HTN prevalence was 27.9% compared to 29.6% in National Health and Nutrition Examination Survey. The All of Us cohort is a growing source of diverse longitudinal data that can be utilized to study hypertension nationwide. The prevalence of hypertension varies in the United States (U.S.) by age, sex, and socioeconomic status. Hypertension can often be treated successfully with medication, and prevented or delayed with lifestyle modifications. Even with these established hypertension intervention and prevention strategies, the prevalence of hypertension continues to be at levels of public health concern. The diversity within All of Us may provide insight into factors relevant to hypertension prevention and treatments in a variety of social and geographic contexts and population strata in the U.S.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Guohai Zhou - Early Career Tenure-track Researcher, Massachusetts General Hospital

Rheumatoid Arthritis Analysis

Project Purpose(s)

  • Disease Focused Research (rheumatoid arthritis) ...

Scientific Questions Being Studied

We will study the prevalence of rheumatoid arthritis (RA) in AoU and the phenotypes associated with it after conditioning on different co-morbidities, medications, gender, race, etc.

Scientific Approaches

We will mainly use statistical and machine learning modeling and PheWAS software to determine RA diagnoses, associated diagnoses, and outcomes. The goal of this research is to compare findings with those from other datasets, especially the differences in phenotypes across different sites, as explained by the models.

Anticipated Findings

We expect that, on average, our models will find consistent markers for RA diagnoses across sites; however, there will likely be large outliers. The RA phenotypes are unlikely to be a surprise, since this is a commonly researched disease area.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Kyle Webb - Project Personnel, NIH

Rheumatoid Arthritis Analysis 2

Project Purpose(s)

  • Disease Focused Research (rheumatoid arthritis) ...

Scientific Questions Being Studied

We will study the prevalence of rheumatoid arthritis (RA) in AoU and the phenotypes associated with it after conditioning on different co-morbidities, medications, gender, race, etc.

Scientific Approaches

We will mainly use statistical and machine learning modeling and PheWAS software to determine RA diagnoses, associated diagnoses, and outcomes. The goal of this research is to compare findings with those from other datasets, especially the differences in phenotypes across different sites, as explained by the models.

Anticipated Findings

We expect that, on average, our models will find consistent markers for RA diagnoses across sites; however, there will likely be large outliers. The RA phenotypes are unlikely to be a surprise, since this is a commonly researched disease area.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Kyle Webb - Project Personnel, NIH

risk factors and pregnancy outcomes

Project Purpose(s)

  • Disease Focused Research (preterm birth) ...

Scientific Questions Being Studied

What risk factors in this data set, such as smoking, drinking, and other health conditions, are associated with preterm birth?

Scientific Approaches

Create a cohort of women who have known pregnancy outcomes.
Clean up the samples with inclusion/exclusion conditions.
Apply linear regression models to identify risk factors associated with pregnancy outcomes, such as gestational duration and birth weight.
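
The three steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's pipeline: the record fields, the age limits used as inclusion criteria, and the data are all hypothetical, and only a single predictor is fitted.

```python
# Minimal sketch: filter a cohort with inclusion/exclusion criteria, then fit
# a one-predictor least-squares line. Field names, age limits, and data are
# hypothetical placeholders chosen for illustration.

def apply_criteria(records, min_age=18, max_age=45):
    """Keep records with a known gestational duration and an age in range."""
    return [
        r for r in records
        if r["gestational_weeks"] is not None and min_age <= r["age"] <= max_age
    ]


def fit_simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variation")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Invented records: smoker is coded 0/1, outcome is gestational duration in weeks.
records = [
    {"age": 30, "smoker": 0, "gestational_weeks": 40},
    {"age": 28, "smoker": 0, "gestational_weeks": 39},
    {"age": 35, "smoker": 1, "gestational_weeks": 37},
    {"age": 22, "smoker": 1, "gestational_weeks": 36},
    {"age": 50, "smoker": 0, "gestational_weeks": 38},    # excluded: age out of range
    {"age": 31, "smoker": 1, "gestational_weeks": None},  # excluded: outcome unknown
]

cohort = apply_criteria(records)
intercept, slope = fit_simple_ols(
    [r["smoker"] for r in cohort],
    [r["gestational_weeks"] for r in cohort],
)
print(f"n={len(cohort)}, intercept={intercept}, slope={slope}")
```

With a binary predictor, the slope is simply the difference in mean outcome between the two groups. An analysis with several risk factors would need a multiple regression fitted with a statistics library.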

Anticipated Findings

Many epidemiological studies have examined risk factors for pregnancy outcomes. This study, with a very limited number of samples, serves as a test case for utilizing EMR data for this type of epidemiological study.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Research Team

Owner:

  • Jing Chen - Senior Researcher, Cincinnati Children's Hospital Medical Center

Risk Prediction Models Across Common Complex Diseases

Project Purpose(s)

  • Population Health
  • Social / Behavioral ...
  • Methods Development
  • Ancestry

Scientific Questions Being Studied

The aim of this research is to develop and validate models for predicting risks of common complex diseases, such as cancers, heart disease, and type 2 diabetes, and to evaluate the potential utility of such models for developing risk-based approaches to disease prevention through lifestyle modification, screening, and medication. We will leverage the large size and diversity of the All of Us study to develop and validate comprehensive multi-ethnic models that incorporate information on sociodemographic indicators, lifestyle factors, environmental exposures, family and medical history, biomarkers, and whole-genome genotyping and sequencing profiles of individuals. Integrating information across multiple domains of data is expected to improve models for risk prediction and thus to maximize the benefits and minimize the harms and economic costs associated with the various types of available interventions for disease prevention.

Scientific Approaches

We will develop predictive models based on classical statistical methods as well as advanced machine learning algorithms. We will build "cohorts" of individuals who are free of specific diseases of interest at the time of entry to All of Us. We will then link information on "baseline" variables for these individuals to prospectively collected data on disease outcomes (e.g., those captured through electronic medical records). Disease-specific models will incorporate available information on the corresponding well-established risk factors, such as age, family history, smoking, BMI, and alcohol consumption. In addition, when genetic data become available, the models will incorporate information on emerging polygenic risk scores from genome-wide association studies. Finally, we will explore the potential role of high-dimensional biomarkers, such as blood metabolites, in risk prediction beyond risk factors that are easy to ascertain and are evaluated in more parsimonious models.
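
The cohort construction described above (excluding individuals who already have the disease at entry, then linking to later outcomes) can be sketched in Python. This is a minimal illustration only: the function name, participant identifiers, and dates are hypothetical and do not reflect the Workbench data model.

```python
from datetime import date

# Minimal sketch: build a disease-free-at-entry cohort and flag incident
# diagnoses. Identifiers, dates, and the input layout are hypothetical.


def build_incident_cohort(enrollment_dates, diagnoses):
    """enrollment_dates: dict of participant id -> enrollment date.
    diagnoses: iterable of (participant id, diagnosis date) for one disease.
    Returns dict of participant id -> True if first diagnosed after enrollment,
    False if never diagnosed. Participants diagnosed on or before their
    enrollment date (prevalent cases) are excluded."""
    first_dx = {}
    for pid, dx_date in diagnoses:
        if pid not in first_dx or dx_date < first_dx[pid]:
            first_dx[pid] = dx_date

    cohort = {}
    for pid, enrolled in enrollment_dates.items():
        dx = first_dx.get(pid)
        if dx is not None and dx <= enrolled:
            continue  # prevalent case: not disease-free at entry
        cohort[pid] = dx is not None
    return cohort


# Invented example data.
enrollment_dates = {
    "A": date(2018, 6, 1),
    "B": date(2018, 7, 1),
    "C": date(2019, 1, 1),
}
diagnoses = [
    ("A", date(2019, 1, 1)),
    ("A", date(2017, 1, 1)),  # earliest record is before enrollment
    ("B", date(2019, 5, 1)),
]

cohort = build_incident_cohort(enrollment_dates, diagnoses)
print(cohort)
```

The resulting outcome flags could then be joined to baseline variables such as age, smoking, and BMI before any model is fitted.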

Anticipated Findings

Our study will lead to comprehensive multi-ethnic risk prediction models across a number of common chronic diseases. Using results from the study, we will further develop online risk calculators for potential clinical applications.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Research Team

Owner:

  • Nilanjan Chatterjee - Mid-career Tenured Researcher, Johns Hopkins University

RWC_first_workspace

Project Purpose(s)

  • Disease Focused Research (autoimmune diseases) ...

Scientific Questions Being Studied

I am interested in learning the formats and types of data available through AOU to formulate specific research questions related to auto-immune disease.

Scientific Approaches

The specific scientific approach is not yet determined. I will learn more about available data and disease processes and use that to formalize a scientific approach.

Anticipated Findings

The anticipated findings could potentially be previously unknown risk factors for auto-immune disease or under-appreciated treatments.

Demographic Categories of Interest

  • Race / Ethnicity

Research Team

Owner:

  • Robert Corty - Research Fellow, Vanderbilt University Medical Center