Research Projects Directory

4,332 active projects

This information was updated 4/1/2023

The Research Projects Directory includes information about all projects that currently exist in the Researcher Workbench to help provide transparency about how the Workbench is being used. Each project specifies whether Registered Tier or Controlled Tier data are used.

Note: Researcher Workbench users provide information about their research projects independently. Views expressed in the Research Projects Directory belong to the relevant users and do not necessarily represent those of the All of Us Research Program. Information in the Research Projects Directory is also cross-posted on AllofUs.nih.gov in compliance with the 21st Century Cures Act.

Hereditary blood disorders

Scientific Questions Being Studied

The genetic etiology of many hereditary blood disorders is not comprehensively known. Furthermore, when the genes are known, there is often variable expressivity in terms of phenotypic manifestation of the disease. We want to explore the interaction between rare disease-causing genetic variants and common genetic variants governing related traits and how they impact phenotypic manifestation of the disease. This is important to gain a comprehensive understanding of the effects of genetic background on disease incidence and susceptibility. This information is valuable when making decisions about using genetic testing to diagnose diseases or for genetically informed personalized therapies.

Project Purpose(s)

  • Ancestry

Scientific Approaches

We are going to use GWAS summary statistics from blood cell traits to calculate a polygenic score for individuals for each blood cell trait. We are also going to perform RVAS analyses for blood diseases. We will then perform additional statistical tests and analyses to explore the relationship between rare variants and common variants involved in disease and blood cell traits.
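As a rough illustration of the polygenic score step described above, the sketch below computes a score as a weighted sum of effect-allele dosages. The variant names, effect sizes, and dosages are hypothetical, and the rule for missing variants is an assumption, not the project's stated method.

```python
from typing import Dict


def polygenic_score(effect_sizes: Dict[str, float], dosages: Dict[str, float]) -> float:
    """Sum of (GWAS effect size x effect-allele dosage) over scored variants."""
    score = 0.0
    for variant, beta in effect_sizes.items():
        # Assumption: a variant absent from the genotype data contributes 0.
        score += beta * dosages.get(variant, 0.0)
    return score


# Hypothetical per-allele effect sizes for one blood cell trait.
example_effects = {"variant_a": 0.12, "variant_b": -0.05, "variant_c": 0.30}
# Hypothetical effect-allele counts (0, 1, or 2) for one individual.
example_dosages = {"variant_a": 2.0, "variant_b": 1.0, "variant_c": 0.0}

example_score = polygenic_score(example_effects, example_dosages)
print(f"polygenic score: {example_score:.2f}")
```

In practice one such score would be computed per individual and per trait, typically after harmonizing effect alleles between the summary statistics and the genotype data.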

Anticipated Findings

We anticipate understanding the interaction between rare disease-causing variants and common genetic variants for hereditary blood disorders.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Uma Arora - Research Fellow, Broad Institute

Social and biological determinants of leukemia disease

Scientific Questions Being Studied

The main questions of this study are: (1) Can we use All of Us to identify risk factors for leukemia that are specific to gender? (2) Are major risk factors for leukemia shared across all racial groups? These questions are important for developing preventive healthcare strategies for each racial group by identifying high-risk individuals before leukemia develops.

Project Purpose(s)

  • Population Health
  • Educational

Scientific Approaches

We will use the All of Us dataset V6. We will identify variables that represent (1) leukemia; (2) all the known risk factors for each of these conditions; (3) physiological variables that either define a risk factor or are associated with risk of leukemia. We will use linear and logistic regression to test for association between risk factors, social determinants of health, genomic risk scores, and the conditions of interest.

Anticipated Findings

We expect to find that: (1) a substantial number of the known social and biological risk factors increase the risk of leukemia across all evaluated groups; (2) the effects of social determinants of health interact with biological contributions to disease. These findings will help us detect high-risk individuals and deliver preventive healthcare through well-timed interventions.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Jieun Jang - Graduate Trainee, George Mason University

Congestive Heart Failure

Scientific Questions Being Studied

We will identify the temporal sequence of events, perform temporal and statistical analysis, and create a causal network for clusters of conditions that predict our response variable, using LASSO regression to estimate parameters.

Using LASSO, regress the response variable on all independent variables, and pairwise or triple cluster of independent variables that precede the response variable.
Using LASSO, regress each variable that is a direct predictor of response/outcome variable on all preceding variables (demographics and conditions). In these regressions, statistically significant variables are parents in the Markov blanket of the regression response variable.
Draw the network using Netica.
Estimate the parameters of the network.
Using the LASSO regression, calculate the predicted value for all combinations of the parents in the Markov blanket of the regression's response variables.
Enter the parameters into Netica Tables.

Project Purpose(s)

  • Population Health
  • Social / Behavioral
  • Educational

Scientific Approaches

Using LASSO, regress the response variable on all independent variables, and pairwise or triple cluster of independent variables that precede the response variable.
Using LASSO, regress each variable that is a direct predictor of response/outcome variable on all preceding variables (demographics and conditions). In these regressions, statistically significant variables are parents in the Markov blanket of the regression response variable.
Draw the network using Netica.
Estimate the parameters of the network.
Using the LASSO regression, calculate the predicted value for all combinations of the parents in the Markov blanket of the regression's response variables.
Enter the parameters into Netica Tables.
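The LASSO step in the list above can be sketched with a small coordinate-descent solver. This is a minimal illustration on made-up, centered data with no intercept; it treats predictors with nonzero coefficients as the retained candidates, which is an assumption (the description speaks of statistical significance), and it does not involve Netica.

```python
from typing import List


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero by threshold; values inside the band become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X: List[List[float]], y: List[float],
                             lam: float, n_iter: int = 200) -> List[float]:
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic updates."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(p):
            column = [row[j] for row in X]
            scale = sum(v * v for v in column) / n
            if scale == 0.0:
                continue  # a constant-zero column carries no information
            # Correlation of column j with the residual that excludes its own fit.
            rho = sum(column[i] * (residual[i] + column[i] * beta[j])
                      for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / scale
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= column[i] * delta
                beta[j] = new_beta
    return beta


# Hypothetical centered predictors: only the first one drives the response.
X_example = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y_example = [2.0, -2.0, 2.0, -2.0]

coefficients = lasso_coordinate_descent(X_example, y_example, lam=0.5)
retained = [j for j, b in enumerate(coefficients) if b != 0.0]
print(coefficients, retained)
```

Repeating this regression with each retained predictor as the new response, restricted to variables that precede it in time, would yield the parent sets that the steps above describe.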

Anticipated Findings

Anticipated findings include:
How statistically significant the variables related to congestive heart failure are.
Which variables impact the independent variable, in what ways, and what are the outcomes of those impacts.
These findings will home in on congestive heart failure and will help us understand how the specific statistical methods we use produce the results.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Cervical Cancer Prediction Model

Scientific Questions Being Studied

We will use the All of Us database to perform an analysis of cervical cancer. We will explore variables related to social determinants of health and outcomes in the diagnosis and/or progression of cervical cancer.

Project Purpose(s)

  • Disease Focused Research (Cervical Cancer)
  • Population Health
  • Social / Behavioral
  • Educational

Scientific Approaches

We will use methods based on regression and network modeling; these may include propensity score matching, LASSO regression, temporal analysis, and networks of regressions.

Anticipated Findings

These findings may support diagnosis and treatment decisions for cervical cancer.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

association between Alzheimer's and infection

Scientific Questions Being Studied

There is ample evidence to suggest that some link may exist between chronic inflammation of the brain and development of Alzheimer's disease later in life. A potential trigger for such inflammation is an infectious pathogen, and understanding the potential links between infection and Alzheimer's will enable prevention of disease before symptoms even begin and a better understanding of how to treat the disease once it has already developed. My dataset explores the relative risk of developing Alzheimer's disease in individuals that have experienced a viral infection compared to those who have not.
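For readers unfamiliar with the measure, relative risk is the ratio of disease risk in the exposed group to disease risk in the unexposed group. The sketch below uses hypothetical counts, not results from this project.

```python
def relative_risk(exposed_cases: int, exposed_total: int,
                  unexposed_cases: int, unexposed_total: int) -> float:
    """Risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


# Hypothetical counts: Alzheimer's cases among participants with and
# without a recorded viral infection.
rr_example = relative_risk(exposed_cases=40, exposed_total=1000,
                           unexposed_cases=20, unexposed_total=1000)
print(f"relative risk: {rr_example:.2f}")
```

A value above 1 would indicate higher risk in the exposed group; on its own it says nothing about causation or confounding.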

Project Purpose(s)

  • Disease Focused Research (Alzheimer's Disease)
  • Population Health
  • Educational

Scientific Approaches

This is a preliminary exploration of the data, not an exhaustive study, so I will make some basic plots of the dataset I generate as I start planning a broader study.

Anticipated Findings

We expect to see that individuals that have experienced an infection of the brain or nervous system are more likely to develop AD than those who have not. This would shed greater light on how AD develops in the first place and how we could prevent it or lessen its impact using existing treatments or vaccinations for pathogens that may be linked to AD.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

CHD in AA

Scientific Questions Being Studied

We want to establish the genetic architecture of coronary artery disease in populations of African ancestry. Hundreds of genetic loci associated with coronary artery disease have been identified in populations of European ancestry and East Asian ancestry; only a few loci have been identified so far in populations of African ancestry, mainly due to small sample sizes. This creates a disparity in translating genetic information into the control and prevention of coronary artery disease across populations.

Project Purpose(s)

  • Disease Focused Research (coronary artery disease)
  • Ancestry

Scientific Approaches

We will perform a genome-wide association study of coronary artery disease in populations of African ancestry using the whole-genome sequencing data generated by the All of Us project.

Anticipated Findings

We will identify loci associated with coronary artery disease in populations of African ancestry, especially African ancestry-specific variants. The knowledge gained from our study will help with the control and prevention of coronary artery disease in populations of African ancestry.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Data Set Used

Controlled Tier

Research Team

Owner:

  • Yingchang Lu - Early Career Tenure-track Researcher, Vanderbilt University Medical Center

Dilated Cardiomyopathy Genomic Data

Scientific Questions Being Studied

Our group is exploring the contribution of known genetic variants to the presence of dilated cardiomyopathy in the All of Us database. Previous groups have described the frequency of these pathologic variants in referral populations of European descent, primarily through the UK Biobank. We are the first group to explore if the frequency of these variants is different in more diverse populations as captured in the All of Us database.

Project Purpose(s)

  • Disease Focused Research (dilated cardiomyopathy)
  • Ancestry

Scientific Approaches

We will use the All of Us database to collect genomic data for patients who have dilated cardiomyopathy. We will identify these patients within AoU using a previously published algorithm that makes use of ICD codes, medication information, and cardiac imaging data (TTE, CMR). We have 19 genes of interest that have previously been identified as having Moderate to Definite association with dilated cardiomyopathy. We will also collect data on heart failure risk factors (hypertension, atrial fibrillation, etc.) within this population in AoU.

Anticipated Findings

No group has previously described the frequency of causal variants outside of patients with European ancestry referred for genetic testing. Our group will contribute to scientific knowledge by enhancing our understanding of the contributions of these causal variants in a non-European population. Our anticipated finding is that the frequency of dilated cardiomyopathy cases attributable to known causal variants in a non-referral, non-European population will be lower than that reported in European referral populations. This finding would then support our group's project identifying novel disease-causing variants through additional analysis within the AoU database. The results of both projects would better inform the utility of genetic testing in patients with dilated cardiomyopathy in the US, as well as help with the interpretation of genetic variants identified in patients who undergo genetic testing without dilated cardiomyopathy.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

  • Mark Sonderman - Early Career Tenure-track Researcher, Vanderbilt University Medical Center
  • Brandon Lowery - Other, Vanderbilt University Medical Center

Fellowship project

Scientific Questions Being Studied

I am exploring the data at this stage to formalize a specific research question. Potential research questions may be determined based on what data are available. They include exploring the outcomes of participants who receive pharmacogenomic results related to their medical care.

Project Purpose(s)

  • Population Health
  • Ancestry

Scientific Approaches

Scientific approaches will depend on the specific research question and will be updated at a later date.

Anticipated Findings

Anticipated findings will depend on the specific research question and will be updated at a later date.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

PAH genetic predisposition

Scientific Questions Being Studied

Pulmonary arterial hypertension is a rare disease (~2-5 deaths per million people annually in the US).

There are infrequent but known genetically mediated/familial forms of PAH with what is believed to be moderately low penetrance of ~20% (BMPR2, ACVR1, AQ1, ATP13A3, CAV1, EIF2AK4, ENG, GDF2, KCNK3, SMAD9, SOX17, TBX4, BMPR1B).

The true frequency of these mutations in the population is not known. The intent of the current research is to better understand the population frequency of mutations in genes known to be associated with pulmonary arterial hypertension. This will help us to better understand important concepts related to the penetrance of disease and begin to understand whether or not there are genetically mediated mild phenotypes of disease that are currently unrecognized.

Project Purpose(s)

  • Disease Focused Research (Pulmonary arterial hypertension)

Scientific Approaches

The WGS dataset will be used.

Population frequency of mutations in genes that are known to be associated with pulmonary arterial hypertension will be calculated including: BMPR2, ACVR1, AQ1, ATP13A3, CAV1, EIF2AK4, ENG, GDF2, KCNK3, SMAD9, SOX17, TBX4, BMPR1B.
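One simple way to express such a population frequency is the fraction of participants who carry at least one qualifying variant in each gene. The sketch below assumes a hypothetical list of (participant ID, gene) calls and a hypothetical cohort size; how variants are judged to qualify is outside its scope.

```python
from typing import Dict, Iterable, Set, Tuple


def carrier_frequencies(variant_calls: Iterable[Tuple[str, str]],
                        n_participants: int) -> Dict[str, float]:
    """Fraction of participants carrying at least one variant, per gene."""
    carriers: Dict[str, Set[str]] = {}
    for participant_id, gene in variant_calls:
        # A set ensures a participant with several variants in the same
        # gene is counted once.
        carriers.setdefault(gene, set()).add(participant_id)
    return {gene: len(ids) / n_participants for gene, ids in carriers.items()}


# Hypothetical calls: participant p1 has two variants in the same gene.
example_calls = [("p1", "BMPR2"), ("p1", "BMPR2"), ("p2", "BMPR2"), ("p3", "TBX4")]
example_frequencies = carrier_frequencies(example_calls, n_participants=1000)
print(example_frequencies)
```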

Anticipated Findings

Understanding the population frequency of mutations in genes known to be associated with pulmonary arterial hypertension will help contextualize (but not directly inform) the penetrance of disease in these mutations and may ultimately inform a population for study with mild phenotypes or early disease.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Peter Leary - Mid-career Tenured Researcher, University of Washington

Fitbit_DST_Sleep

Scientific Questions Being Studied

1. How does daylight saving time affect people's activity levels before and after the time change?
2. How long does it take for the Fitbit population to recover from daylight saving time changes?

Project Purpose(s)

  • Population Health
  • Social / Behavioral

Scientific Approaches

Descriptive statistics (mean, standard deviation, and frequency) will be used to summarize the demographic and activity data. A paired t-test will be used to compare the mean activity levels before and after DST changes. Subgroup analyses will be conducted based on gender, age, and occupation.
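The paired t-test mentioned above compares each participant with themselves, so it works on the per-person differences. The sketch below computes only the t statistic and its degrees of freedom from hypothetical activity values; obtaining a p-value would require a t-distribution, which the Python standard library does not provide.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t_statistic(before: Sequence[float],
                       after: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for paired measurements."""
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)
    # Standard error of the mean difference uses the sample standard deviation.
    standard_error = stdev(differences) / math.sqrt(n)
    return mean(differences) / standard_error, n - 1


# Hypothetical mean daily steps (in thousands) for four participants,
# the week before and the week after a DST change.
steps_before = [10.0, 12.0, 9.0, 11.0]
steps_after = [8.0, 11.0, 9.0, 8.0]

t_value, dof = paired_t_statistic(steps_before, steps_after)
print(f"t = {t_value:.3f} with {dof} degrees of freedom")
```

A negative statistic here means activity was lower after the change than before it.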

An average value of decay will be modeled to find out how long in general a population would take to go back to a normal circadian rhythm.

Anticipated Findings

The anticipated findings are that workforce productivity is lost due to daylight saving time, that health is affected by reduced or irregular activity, and that some people may take longer than desired to adjust to daylight saving time changes.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Ke Wang - Graduate Trainee, Duke University

William O - Equity in Precision Health

Scientific Questions Being Studied

I am creating a workspace as part of the Equity and Precision Health Spring 2023 Quarter course at the University of Chicago. I am a master's student in the Precision Health program. I am learning to use the Workbench. I plan to update this description as I frame my research question.

Project Purpose(s)

  • Educational

Scientific Approaches

I am going to explore a question of equity in precision health using all available data in the workbench. I will refine my data and analytical requirements as I frame my research question.

Anticipated Findings

I am unclear what to anticipate. This is mainly a class exercise to understand how to use the workbench.

Demographic Categories of Interest

  • Geography
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

Diabetes - HAP 823

Scientific Questions Being Studied

The goal of this project is to investigate diabetes in the adult population. We plan on identifying a temporal sequence of events and creating a causal network.

Project Purpose(s)

  • Population Health
  • Social / Behavioral
  • Methods Development

Scientific Approaches

The temporal sequence of events will be identified by counting, for each pair of conditions (including diabetes), the number of times one occurs before the other in the same person. We will use these pairwise counts to establish the sequence in which conditions occur.
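The pairwise counting described above can be sketched as follows. The data layout (each person's first recorded date per condition) and the condition names are hypothetical, and using first occurrences is an assumption about how "occurs before" is defined.

```python
from collections import Counter
from datetime import date
from itertools import permutations
from typing import Dict, Tuple


def count_precedence(
    first_dates_by_person: Dict[str, Dict[str, date]]
) -> Counter:
    """Count, per ordered pair (a, b), the persons in whom a precedes b."""
    counts: Counter = Counter()
    for first_dates in first_dates_by_person.values():
        for a, b in permutations(first_dates, 2):
            # Ties (same date) count for neither direction.
            if first_dates[a] < first_dates[b]:
                counts[(a, b)] += 1
    return counts


# Hypothetical first-diagnosis dates for three people.
example_people = {
    "p1": {"hypertension": date(2015, 1, 1), "diabetes": date(2018, 5, 1)},
    "p2": {"diabetes": date(2016, 3, 1), "hypertension": date(2019, 7, 1)},
    "p3": {"hypertension": date(2010, 2, 1), "diabetes": date(2012, 9, 1),
           "neuropathy": date(2020, 4, 1)},
}

precedence = count_precedence(example_people)
print(precedence[("hypertension", "diabetes")],
      precedence[("diabetes", "hypertension")])
```

Comparing the two counts for a pair suggests which condition more often comes first, which is the ordering the causal network then builds on.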

A causal network will be built in order to better understand and predict the response variable, diabetes. We will use LASSO regression to regress the response variable on all independent variables, and pairwise or triple clusters of independent variables that precede the response variable. We will regress each variable that is a direct predictor of response/outcome variable on all preceding variables (demographic and conditions).

Using Netica, we will map the network. We will estimate the parameters and calculate the predicted value for all combinations of the parents in the Markov blanket of the response variable with the use of LASSO regression.

Anticipated Findings

The results from this project may help to map out the network of relationships surrounding diabetes.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Sean Grimm - Graduate Trainee, George Mason University

test

Scientific Questions Being Studied

Interested in investigating heart disease as related to EHR variables and genetics.

Project Purpose(s)

  • Disease Focused Research (heart disease)

Scientific Approaches

Phenotyping, statistics, data mining, GWAS.

Anticipated Findings

Advance knowledge of heart disease in scientific literature.

Demographic Categories of Interest

  • Race / Ethnicity
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

Relationships between Risky Drinking & Medical Conditions

Scientific Questions Being Studied

Our group previously identified that females with AUD and females engaging in heavy or extreme binge drinking were more likely to self-report cancers and other medical conditions compared to their male counterparts. This analysis aims to extend our previous findings to examine relationships between sex and consumption of alcohol and the presence of EHR-confirmed health conditions.

Project Purpose(s)

  • Disease Focused Research (Alcohol Use Disorder, Liver, Cardiovascular, Cancer, Pain, Respiratory)

Scientific Approaches

We want to use All of Us to examine EHR and survey data on risky drinking/AUD and EHR-confirmed medical conditions. We will evaluate associations between sex (female vs. male) and alcohol (yes vs. no AUD; yes vs. no risky drinking) on ongoing or new EHR-confirmed liver, cardiovascular, cancer, respiratory, pain or other medical conditions. We will use binary logistic regression for statistical analysis.
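To make the binary logistic regression step concrete, the sketch below fits a model with an intercept and a single 0/1 predictor by Newton's method. It is a simplification: the project describes two factors (sex and alcohol), while this example uses one hypothetical exposure and made-up counts. With a single binary predictor, the exponentiated slope equals the odds ratio from the corresponding 2x2 table.

```python
import math
from typing import Sequence, Tuple


def fit_logistic(x: Sequence[float], y: Sequence[int],
                 n_iter: int = 25) -> Tuple[float, float]:
    """Fit P(y=1) = sigmoid(b0 + b1 * x) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p             # gradient with respect to the intercept
            g1 += (yi - p) * xi      # gradient with respect to the slope
            h00 += w                 # entries of the 2x2 information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system by explicit inversion.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical 2x2 table: exposure = risky drinking, outcome = a condition.
# Exposed: 30 with the condition, 70 without. Unexposed: 20 with, 80 without.
exposure = [1.0] * 100 + [0.0] * 100
outcome = [1] * 30 + [0] * 70 + [1] * 20 + [0] * 80

intercept, slope = fit_logistic(exposure, outcome)
odds_ratio = math.exp(slope)
print(f"odds ratio: {odds_ratio:.3f}")
```

A fuller model would add sex, an interaction term, and covariates, at which point a statistics package is the more practical choice.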

Anticipated Findings

Our group previously identified that females with AUD and females engaging in heavy or extreme binge drinking were more likely to self-report cancers and other medical conditions compared to their male counterparts. However, these were self-reported medical conditions and we could not verify the presence or absence of medical conditions against EHR data. We hope to extend our findings using All of Us to examine relationships between sex and consumption of alcohol and the presence of EHR-confirmed health conditions. We anticipate that results will be consistent with and validate our previous findings that AUD status and risky drinking be considered in the clinical care of individuals with poorer health, especially in women.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

AOU_PAD_forLeland

Scientific Questions Being Studied

We would like to conduct genome-wide association studies (GWAS) on PAD case and control data. Individual PAD GWAS will be meta-analyzed to identify important loci associated with PAD. We will conduct GWAS using Regenie and meta-analyze the results using Metal.

Project Purpose(s)

  • Disease Focused Research (peripheral artery disease)
  • Ancestry

Scientific Approaches

We will curate PAD cases and controls, conduct GWAS using Regenie, and meta-analyze the results using Metal.
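The description does not say which weighting scheme the meta-analysis will use. As one common possibility, the sketch below applies the standard inverse-variance weighted fixed-effect formula to a single variant, using hypothetical per-cohort effect sizes and standard errors; it is not a reproduction of any tool's internals.

```python
import math
from typing import Sequence, Tuple


def inverse_variance_meta(betas: Sequence[float],
                          standard_errors: Sequence[float]
                          ) -> Tuple[float, float, float]:
    """Combine per-cohort estimates; return (beta, standard error, z score)."""
    weights = [1.0 / (se * se) for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


# Hypothetical effect estimates for one variant from two cohorts.
meta_beta, meta_se, meta_z = inverse_variance_meta([0.10, 0.20], [0.05, 0.05])
print(f"beta = {meta_beta:.3f}, se = {meta_se:.4f}, z = {meta_z:.2f}")
```

More precise cohorts (smaller standard errors) receive more weight, so the pooled estimate leans toward them.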

Anticipated Findings

We aim to meta-analyze the data with other cohorts and plan to identify new loci associated with PAD.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Leland Hull - Early Career Tenure-track Researcher, Mass General Brigham

Group 8: PTC Mutations and Haplotypes Distribution

Scientific Questions Being Studied

The question we intend to study is the ethnic distribution of the TAS2R38 gene and whether it shows a national trend. This is important for understanding how the bitter taste gene is distributed throughout the population and whether it differs by ethnicity.

Project Purpose(s)

  • Ancestry

Scientific Approaches

In our project, we will use datasets provided by All of Us to study a variety of ethnicities and their haplotypes of PTC mutations across the United States. Using these datasets and programming techniques supported by R/RStudio, we will conduct a data analysis that produces visuals and statistics from which to draw conclusions about each ethnic group's frequency of PTC mutations and the unique characteristics of a community.

Anticipated Findings

The study's expected outcomes are to find a possible link between various ethnographic locations in the US and specific PTC mutations, and to map existing data on PTC mutations ethnographically to see whether trends emerge in different groups. If the findings support the study's premise, they might imply that particular groups have a high incidence of PTC mutations and that these mutations may have health repercussions beyond the capacity to detect bitterness. The findings would contribute to the body of scientific knowledge by developing a better understanding of the occurrence of PTC mutations in different populations, particularly in the genetics and health fields (for example, personalized health).

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Data Set Used

Controlled Tier

Research Team

Owner:

  • Rya Kiernan - Undergraduate Student, Arizona State University
  • Emily Druce - Undergraduate Student, Arizona State University

Collaborators:

  • Khailene Amisone - Undergraduate Student, Arizona State University
  • Gerardo Rodriguez - Undergraduate Student, Arizona State University

Duplicate of Data Wrangling in All of Us Program (v6)

Scientific Questions Being Studied

For educational purposes: to show best practices when using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, optimization), and other research support issues that are useful for multiple AoU researchers.

Project Purpose(s)

  • Educational
  • Other Purpose (For use with Office hours. notebooks for adding code snippets useful for researchers. This is a placeholder for creating notebooks for best practices among other things)

Scientific Approaches

For educational purposes: to show best practices when using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, optimization), and other research support issues that are useful for multiple AoU researchers.

Anticipated Findings

For educational purposes: to show best practices when using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, optimization), and other research support issues that are useful for multiple AoU researchers.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Spencer Boris - Undergraduate Student, Brigham Young University

CHD Polygenic Risk Score & statin effectiveness on ASCVD (control tierV6)

Scientific Questions Being Studied

The primary objective is to determine how statin effectiveness is modified by CHD polygenic risk score in a real-world cohort of primary prevention participants.
We will investigate coronary heart disease polygenic risk scores for statin effectiveness of atherosclerotic cardiovascular disease in stratified race/ethnicity and age groups so that we can (1) study this relationship in subsets of the population that are traditionally excluded from statin randomized controlled trials and (2) determine the impact of social determinants of health which vary across these stratified groups.

Project Purpose(s)

  • Disease Focused Research (arteriosclerotic cardiovascular disease)
  • Drug Development
  • Ancestry

Scientific Approaches

Health record data, lab results, and prescription dispensing records will be used to define phenotypes. Different statistical methods (such as logistic regression and survival analysis) will be used to analyze the data.

We shall calculate CHD polygenic risk scores of participants in the cohort using genotype data. Covariate-adjusted Cox regression models will be used to compare the risk of cardiovascular outcomes between statin users and matched nonusers.

We are mainly going to use R for the analysis.

Anticipated Findings

We anticipate that for primary prevention patients undergoing routine care, CHD polygenic risk modifies statin relative risk reduction of incident myocardial infarction independent of statin LDL-C lowering. Our findings will replicate our prior work which identified a subset of patients with attenuated clinical benefit from statins.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Data Set Used

Controlled Tier

Research Team

Owner:

CYP2C19 Clopidogrel Response

We will use Controlled Tier data to identify associations between CYP2C19 variants, clopidogrel usage, and the rate of cardiac/clotting events post-MI/PCI.

Scientific Questions Being Studied

We will use Controlled Tier data to identify associations between CYP2C19 variants, clopidogrel usage, and the rate of cardiac/clotting events post-MI/PCI.

Project Purpose(s)

  • Disease Focused Research (Post-MI/PCI Clopidogrel interaction with CYP2C19 variants)

Scientific Approaches

Whole genome sequencing (WGS) data to identify participants with CYP2C19 variants;
EHR data to identify participants with a history of MI/PCI/stenting;
Kaplan-Meier survival analysis;
PheWAS
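
For readers unfamiliar with the Kaplan-Meier method named above, the estimator multiplies, at each event time, the fraction of at-risk participants who did not have the event. The following is a minimal Python sketch under stated assumptions, not the project's code: the follow-up times and event flags are made-up values, and `kaplan_meier` is a name introduced here for illustration.

```python
from typing import List, Tuple


def kaplan_meier(durations: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is True if the event was observed, False if the record was censored.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        leaving = 0
        # Group all participants sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            observed += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if observed:
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        # Both events and censored records leave the risk set after time t.
        at_risk -= leaving
    return curve


# Hypothetical follow-up times (e.g., months) and event indicators.
durations = [5, 8, 8, 12, 15]
events = [True, True, False, True, False]
km_curve = kaplan_meier(durations, events)
for time, prob in km_curve:
    print(time, round(prob, 3))
```

In practice the curves would be computed separately per genotype group and compared, which this sketch does not attempt.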

Anticipated Findings

We anticipate identifying how CYP2C19 variants interact with clopidogrel response post-MI/PCI, with the hope of further clarifying the role of CYP2C19 variants in drug processing and risk post-MI/stent.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Anav Babbar - Other, National Human Genome Research Institute (NIH-NHGRI)

Solid tumor Research

Cancer is the second most common cause of death in the United States. More than 600,000 Americans are expected to die of cancer in 2020. About 58% of these deaths could be prevented if cancer is detected at an early…

Scientific Questions Being Studied

Cancer is the second most common cause of death in the United States. More than 600,000 Americans are expected to die of cancer in 2020. About 58% of these deaths could be prevented if cancer is detected at an early stage. At present, population-based screening tests are in place for early diagnosis of cancer. Screening is in part responsible for an overall 25% reduction in cancer rates from 1990 to 2005. However, population-based screening remains an imperfect method. In many parts of the world, participation in screening programs is lower than desired, so additional methods for early cancer detection are needed. Machine learning algorithms have been shown to help improve early detection of cancer. Our research effort aims to identify, evaluate, and validate machine learning algorithms to predict the incidence, prognosis, and complications of cancer, so as to create a more proactive approach to the management of cancer.

Project Purpose(s)

  • Disease Focused Research (Solid tumors)
  • Population Health

Scientific Approaches

The purpose of this study is to utilize machine learning algorithms to predict the incidence, prognosis and complications of solid tumors in adults.
Aims:
- to create an alternative and efficient screening tool for cancer detection
- to diagnose solid tumors at an early stage
- to reduce cancer morbidity and mortality
- to utilize lab values and basic patient information to detect people at risk for cancer
- to reduce healthcare costs in the long term

Anticipated Findings

As a result of this study, we anticipate detecting cancer at an early stage and thereby reducing the morbidity and mortality associated with it. All significant findings will be published in a high-impact journal and presented at academic conferences. The results of this study may give way to a new screening test for cancer.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Bijun Kannadath - Early Career Tenure-track Researcher, University of Arizona

Duplicate of Schizophrenia and PTC Mutation

We intend to study the relationship between the PTC mutation and schizophrenia. Our research question is "Are people who are non-tasters more likely to develop schizophrenia?" We are specifically looking into hallucinogenics and enlarged ventricles.

Scientific Questions Being Studied

We intend to study the relationship between the PTC mutation and schizophrenia. Our research question is "Are people who are non-tasters more likely to develop schizophrenia?" We are specifically looking into hallucinogenics and enlarged ventricles.

Project Purpose(s)

  • Educational

Scientific Approaches

We plan to use data of PTC mutations in schizophrenics and the data of non-tasters and schizophrenics.

Anticipated Findings

We are not sure what the findings of the study will be. We hope that it will give a better understanding of the causes of schizophrenia.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Gabby Bucci - Undergraduate Student, Arizona State University
  • Alyssa Standage - Undergraduate Student, Arizona State University
  • Alex Gilchrist - Undergraduate Student, Arizona State University
  • Madhumanti Chowdhury - Undergraduate Student, Arizona State University

Investigation on Suicide in the COVID-19 pandemic Phase 2

The outbreak of Coronavirus Disease 2019 (COVID-19) has caused a new psychological burden. The Patient Health Questionnaire (PHQ-9) can be used to evaluate mood status, monitor changes in signs/symptoms of suicide, and assess suicidal ideation. Here our study aims to describe the…

Scientific Questions Being Studied

The outbreak of Coronavirus Disease 2019 (COVID-19) has caused a new psychological burden. The Patient Health Questionnaire (PHQ-9) can be used to evaluate mood status, monitor changes in signs/symptoms of suicide, and assess suicidal ideation. Here our study aims to describe the basic statistics of PHQ-9 scores and the inferred depression or suicide risk for all participants in the All of Us COPE survey.

Project Purpose(s)

  • Disease Focused Research (Suicidal behaviors/thoughts)

Scientific Approaches

PHQ-9 questions and answers will be retrieved for participants surveyed at six different time points. The response to each question will be converted to a numeric score (0, 1, 2, or 3), and the scores will be summed to derive the PHQ-9 total score. Participants missing any individual item score will not be included in this study. Binary suicidal ideation status will be defined using the item 9 answer (i.e., yes for a score >0). Distributions of the PHQ-9 total score and suicidal ideation status at each time point will be reported using descriptive statistics stratified by age, sex, and ancestry. Changes across time points will be tested with the Kruskal-Wallis (KW) test, Friedman test, or chi-square test. Multivariable analyses will be conducted using generalized linear mixed models.
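
The scoring rules described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the study's code: the answer labels are the standard PHQ-9 response options and may not match the exact strings stored in the COPE survey data, and the names `ANSWER_SCORES` and `score_phq9` are introduced here for illustration.

```python
from typing import Dict, List, Optional

# Assumed mapping from answer text to item score; actual COPE answer labels may differ.
ANSWER_SCORES = {
    "Not at all": 0,
    "Several days": 1,
    "More than half the days": 2,
    "Nearly every day": 3,
}


def score_phq9(answers: List[Optional[str]]) -> Optional[Dict[str, object]]:
    """Score one participant's nine PHQ-9 answers at one time point.

    Returns None when any item is missing or unrecognized, mirroring the
    exclusion of participants who lack any individual item score.
    """
    if len(answers) != 9:
        return None
    scores = [ANSWER_SCORES.get(a) for a in answers]
    if any(s is None for s in scores):
        return None
    # Item 9 is the last answer; any score above 0 counts as suicidal ideation.
    return {"total": sum(scores), "suicidal_ideation": scores[8] > 0}


complete = ["Several days"] * 8 + ["Not at all"]
with_ideation = ["Not at all"] * 8 + ["Nearly every day"]
incomplete = ["Several days"] * 8 + [None]

print(score_phq9(complete))
print(score_phq9(with_ideation))
print(score_phq9(incomplete))
```

Returning None for incomplete responses keeps the exclusion rule explicit, so downstream summaries can drop those records before stratifying.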

Anticipated Findings

We anticipate that the descriptive statistics, pairwise correlations, and multivariable model fitting results will describe the trajectories of suicidal thoughts and behaviors during the COVID-19 pandemic. They will not only help to verify the known relationships between suicide and gender or age, but will also provide new evidence of mood status changes over the course of the COVID-19 pandemic at both the population and individual levels.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Hongsheng Gui - Early Career Tenure-track Researcher, Henry Ford Health System

Collaborators:

  • Hsueh-Han Yeh - Research Associate, Henry Ford Health System

Discrimination and mental health

This study aims to understand the interplay between gene x environment (perceived discrimination) on mental health outcomes in diverse populations. We will evaluate sociodemographic phenotypic correlations with perceived discrimination across populations, conduct a genome-wide association and genome-wide interaction study of…

Scientific Questions Being Studied

This study aims to understand the interplay between gene x environment (perceived discrimination) on mental health outcomes in diverse populations. We will evaluate sociodemographic phenotypic correlations with perceived discrimination across populations, conduct a genome-wide association and genome-wide interaction study of mental health disorders and perceived discrimination, and integrate genomic and sociodemographic data to identify risk factors for mental health disorders in diverse populations. Underrepresented groups are widely disproportionally affected by discrimination, which can have long-lasting influences on the individual’s wellbeing and mental health. We will leverage the diversity of the All of Us cohort to focus on the study of non-European populations, which are widely underrepresented in genomics studies, to investigate the interplay between genetic risk factors and perceived discrimination in the context of mental health disorders.

Project Purpose(s)

  • Disease Focused Research (psychiatric disorders)
  • Social / Behavioral
  • Ancestry

Scientific Approaches

Datasets: All of Us participants who answered the Everyday Discrimination Scale from the COVID-19 Participant Experience (COPE) Survey and have genetic data available through the workbench.

Research Methods: genetic analysis (genome-wide association study [GWAS], polygenic risk score [PRS], phenome-wide association study [PheWAS], PRS-PheWAS, genome-wide by environment interaction study [GWEIS])

Tools: R, Python, Hail, PHESANT package, PRSice

Anticipated Findings

A recent study in JAMA Psychiatry reported increased levels of discrimination associated with higher depressive symptoms, disproportionally affecting underrepresented Hispanic or Latino and non-Hispanic Asian groups from the All of Us cohort (PMID: 35895053). We expect to find phenotypic associations between discrimination and mental health disorders, and identify genetic factors that interact with discrimination in the context of mental health, including depression, anxiety, and substance use disorders. We expect to identify socio-demographic characteristics that are associated with the gene x discrimination interplay.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Cameron Atighetchi - Graduate Trainee, California State University, Northridge

Genomic risk prediction of opioid use disorder in diverse populations

The opioid epidemic is an ongoing crisis in the United States, but a better understanding of opioid use and opioid use disorder (OUD) has to consider both biological and social factors. There is still a lack of actionable biomarkers or genomic risk scores…

Scientific Questions Being Studied

The opioid epidemic is an ongoing crisis in the United States, but a better understanding of opioid use and opioid use disorder (OUD) has to consider both biological and social factors. There is still a lack of actionable biomarkers or genomic risk scores for OUD screening or treatment. The specific questions we will ask are: (1) What are the most significant risk factors associated with OUD onset in the All of Us retrospective cohort? (2) How can we combine clinical/social risk factors and genetic/genomic risk factors in a risk prediction framework for prospective OUD development across different population groups? We hypothesize that a risk model incorporating both non-genetic and genetic factors will have better power to explain and predict OUD. This study will develop and evaluate end-to-end predictive models for the OUD phenotype in diverse groups, and will then suggest new strategies for precision OUD screening and prevention.

Project Purpose(s)

  • Disease Focused Research (Opioid use disorder and related mental comorbidities)
  • Population Health
  • Social / Behavioral
  • Methods Development
  • Ancestry

Scientific Approaches

We will include all participants in the All of Us project. First, self-reported surveys (e.g., lifestyle and medical history) and electronic health records (EHR) will be used to determine OUD cases, opioid-exposed controls, and general controls. Exposure variables will be assessed, harmonized across resources, and pre-processed for missing values and outliers. Second, OUD risk factors will be interrogated by regularized regression (non-genetic) and genome-wide association tests (genetic). Third, a clinical risk score and a genomic risk score will each be constructed as a weighted sum of the significant risk factors retained, and then combined by logistic regression and random forest. Fourth, a set of candidate risk models will be validated and then optimized in a prospective sample, using C statistics and other diagnostic metrics. The whole framework will also be repeated in stratified samples to tune the parameters for specific groups (e.g., by age, sex, and race).
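
To make the validation metric concrete: for a binary outcome, the C statistic is the proportion of case-control pairs in which the case receives the higher predicted risk, with ties counted as half. The following is a minimal Python sketch with made-up scores and labels, not the project's evaluation code; the function name `c_statistic` is introduced here for illustration.

```python
from typing import List


def c_statistic(scores: List[float], labels: List[int]) -> float:
    """Concordance (C statistic) for a binary outcome; labels are 1 for cases, 0 for controls."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5  # ties count as half-concordant
    return concordant / (len(cases) * len(controls))


# Hypothetical predicted risks and outcomes for four participants.
example_scores = [0.9, 0.4, 0.3, 0.2]
example_labels = [1, 0, 1, 0]
print(c_statistic(example_scores, example_labels))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation. The pairwise loop is quadratic, so a rank-based formulation would be preferable for large cohorts.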

Anticipated Findings

Findings from this project would establish a predictive model to identify individuals at low, moderate, and high risk for developing OUD. We expect to find: (1) significant clinical/social factors and biomarkers that are associated with opioid use initiation and OUD status; (2) polygenic risk scores tailored for the overall population and for different subgroups; (3) a risk model comprising non-genetic risk factors and polygenic risk scores.

We also anticipate that some risk factors may show different effect sizes across populations. The general-population risk model may also need to be revised to improve its performance in specific groups. However, differences observed across populations may be caused by social determinants rather than biology, and some of these determinants may not be well covered in the All of Us project. We will mitigate the potential stigmatization risk of our models in future publications and presentations by discussing their benefits and limitations in more detail.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Hongsheng Gui - Early Career Tenure-track Researcher, Henry Ford Health System

Collaborators:

  • Ze Meng - Early Career Tenure-track Researcher, Henry Ford Health System

Aortic Stenosis Genetics

Calcific aortic valve disease (CAVD) is the most common etiology of aortic stenosis and has a high worldwide morbidity and mortality. There are no effective medical therapies to slow disease progression or reverse the disease course of CAVD. Population level…

Scientific Questions Being Studied

Calcific aortic valve disease (CAVD) is the most common etiology of aortic stenosis and has a high worldwide morbidity and mortality. There are no effective medical therapies to slow disease progression or reverse the disease course of CAVD. Population-level genetic studies provide biologic insights that have enabled new therapeutic targets, such as LPA and PCSK9 for coronary artery disease. The goal of this research workspace in All of Us is to elucidate the genetic basis of CAVD, with the aim of enabling medical therapies, none of which has been demonstrated to be efficacious to date.

Project Purpose(s)

  • Disease Focused Research (aortic valve stenosis)

Scientific Approaches

All of Us data will be used to 1) develop a meaningful biological definition of calcific aortic stenosis using EHR data, 2) use this EHR phenotype to perform a discovery genome wide association study for calcific aortic stenosis, and 3) perform a suite of in-silico trans-omics analyses to prioritize causal variants and causal genes.

Anticipated Findings

This project extends prior work by 1) maximizing power through the inclusion of a much larger population of individuals with CAVD and the use of contemporary methods for causal variant and causal gene prioritization, and 2) incorporating individuals from multiple genetic ancestries, with analyses stratified by both genetic ancestry and sex, to clarify population-level differences in the biology of disease.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Buu Truong - Research Fellow, Broad Institute
Request a Review of this Research Project

You can request that the All of Us Resource Access Board (RAB) review a research purpose description if you have concerns that this research project may stigmatize All of Us participants or violate the Data User Code of Conduct in some other way. To request a review, you must fill in a form, which you can access by selecting ‘request a review’ below.