Research Projects Directory

11,984 active projects

This information was updated 6/23/2024

The Research Projects Directory includes information about all projects that currently exist in the Researcher Workbench to help provide transparency about how the Workbench is being used. Each project specifies whether Registered Tier or Controlled Tier data are used.

Note: Researcher Workbench users provide information about their research projects independently. Views expressed in the Research Projects Directory belong to the relevant users and do not necessarily represent those of the All of Us Research Program. Information in the Research Projects Directory is also cross-posted on AllofUs.nih.gov in compliance with the 21st Century Cures Act.

RT- Social Determinants of Health and Average Steps

Scientific Questions Being Studied

These scientific inquiries aim to elucidate the intricate connections between social factors and individuals' levels of physical activity. The relevance of these questions lies in their potential to uncover health disparities among diverse populations based on various social determinants, such as income, education, and neighborhood environment. Understanding these disparities is vital for devising targeted interventions that can effectively reduce health inequities.

Project Purpose(s)

  • Methods Development

Scientific Approaches

Fitbit data will be merged with social determinants data, ensuring that each individual's physical activity records are linked with their corresponding social determinants information. We will conduct an initial descriptive analysis to understand the basic statistics and patterns in the data. We will also use statistical methods to determine the correlations between social determinants and average step counts, and regression analysis will be performed to quantify the impact of social determinants on average step counts while controlling for potential confounding variables.
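
As an illustration only, the merge, correlation, and regression steps described above could look like the following minimal Python sketch. The record layout, the field names (`income_k`, `avg_steps`), and the toy values are assumptions made for this example; they are not Workbench column names or project data. The sketch fits a single-predictor regression and does not adjust for confounders.

```python
from statistics import mean

# Hypothetical toy inputs keyed by person id (not real Workbench data).
steps = {
    1: [4000, 5000, 6000],
    2: [8000, 9000, 10000],
    3: [7000, 7000, 7000],
    4: [11000, 12000, 13000],
}
sdoh = {
    1: {"income_k": 25},
    2: {"income_k": 60},
    3: {"income_k": 45},
    4: {"income_k": 90},
    5: {"income_k": 70},  # no step data, so dropped by the inner join
}


def merge_records(steps_by_person, sdoh_by_person):
    """Inner-join both sources on person id and attach average daily steps."""
    merged = []
    for pid in sorted(steps_by_person.keys() & sdoh_by_person.keys()):
        row = {"person_id": pid, "avg_steps": mean(steps_by_person[pid])}
        row.update(sdoh_by_person[pid])
        merged.append(row)
    return merged


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def ols_slope_intercept(xs, ys):
    """Least-squares line y = intercept + slope * x for one predictor."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


merged = merge_records(steps, sdoh)
incomes = [row["income_k"] for row in merged]
avg_steps = [row["avg_steps"] for row in merged]
r = pearson(incomes, avg_steps)
slope, intercept = ols_slope_intercept(incomes, avg_steps)
```

Controlling for confounders, as the project describes, would require a multivariable model rather than this one-predictor fit.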

Anticipated Findings

Identification of Significant Correlations: The study may reveal statistically significant correlations between specific social determinants (such as income, education, neighborhood environment, or access to recreational facilities) and average daily step counts. These correlations may vary across demographic groups.
Impact on Health Disparities: It is likely that the study will find evidence of health disparities, with certain groups facing lower physical activity levels due to disadvantaged social determinants. This could include individuals from lower-income neighborhoods or with limited access to healthy food options.
Mediating and Moderating Factors: The research might identify factors that mediate or moderate the relationship between social determinants and physical activity. For example, access to parks may mediate the impact of neighborhood environment on step counts, or age may moderate the relationship between education and physical activity.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

gene_environment_interactions

Scientific Questions Being Studied

Genetic epidemiology is concerned with the impact of genetics on health measures. We are interested in studying new methods of genetic stratification and measuring the differences in shared environment and what sort of health impacts there are. We are also interested in studying relationships between collected metadata and genetic data.

Project Purpose(s)

  • Methods Development
  • Ancestry

Scientific Approaches

We will use different methods of dimensionality reduction such as principal components analysis, uniform manifold approximation and projection, as well as clustering approaches like density clustering and Louvain community detection. We will compare to other established population genetics measures like identity-by-descent sharing. With these methods, we will visualize data using scatter plots, tree plots, and other techniques. We will also make use of standard statistical practices such as regression, summary statistics, and statistical testing.
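
Of the methods listed above, principal components analysis is the simplest to sketch. The following pure-Python example finds the first principal component by power iteration on the covariance matrix. The toy matrix and all function names are assumptions for illustration; a real analysis of genetic data would use an established library and far more features, and this sketch does not cover UMAP, density clustering, or Louvain community detection.

```python
import math


def center(rows):
    """Subtract each column's mean so every feature has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_principal_component(cov, iterations=200):
    """Approximate the leading eigenvector of cov by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical toy samples: two strongly correlated features plus one noisy one.
samples = [
    [2.0, 1.9, 0.1],
    [1.0, 1.1, 0.0],
    [0.0, 0.1, 0.1],
    [-1.0, -0.9, 0.0],
    [-2.0, -2.2, 0.1],
]
centered = center(samples)
pc1 = first_principal_component(covariance(centered))
# Projection of each sample onto the first component (its "PC1 score").
scores = [sum(a * b for a, b in zip(row, pc1)) for row in centered]
```

Plotting `scores` against a second component would give the kind of scatter plot the project mentions.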

Anticipated Findings

We believe that different visualizations will find unexplored relationships between environment and genetics, such as gradients in geography or access to healthcare. We anticipate that there will also be connections between metadata and genetics, such as what type of facility was used for sequencing.

Demographic Categories of Interest

  • Race / Ethnicity
  • Geography
  • Access to Care

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Shevaughn Holness - Graduate Trainee, Brown University

Sexual Dysfunction

Scientific Questions Being Studied

To study factors contributing to erectile dysfunction

Project Purpose(s)

  • Disease Focused Research (Erectile Dysfunction)
  • Population Health
  • Methods Development

Scientific Approaches

We will use the All of Us dataset to cross-reference factors influencing erectile dysfunction (ED).

Anticipated Findings

We anticipate that cardiovascular and other lifestyle factors influence erectile dysfunction.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Registered Tier

Research Team

Owner:

Capstone Project

Scientific Questions Being Studied

Women with fibroid issues. This is of importance to my personal family health history.

Project Purpose(s)

  • Educational

Scientific Approaches

Dataset, data analysis, SQL, Python.

Anticipated Findings

This study aims to inform viewers of the difference in the proportion of minority women receiving hysterectomies compared with women of other racial groups.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Income Level

Data Set Used

Registered Tier

Research Team

Owner:

WCUCOMAOUGroup6

Scientific Questions Being Studied

Our research addresses the barriers to diagnosing and treating Diabetic Retinopathy in rural and non-metropolitan regions. We are specifically focusing on the lack of knowledge, socioeconomic status, and other factors preventing underrepresented communities from seeking medical care, leading to worsening diabetes complications such as blindness.

Project Purpose(s)

  • Disease Focused Research (Diabetic Retinopathy)
  • Population Health
  • Educational

Scientific Approaches

We will use the All of Us database to look at the epidemiology of Diabetic Retinopathy (DR) treatment adherence in the cohort of adult patients diagnosed with DR since the inception of the research program in 2018. We will also investigate how geographical differences are associated with various barriers to DR care. This research will directly involve underrepresented residents of established rural and non-metropolitan zip codes, based on the HRSA Federal Office of Rural Health Policy data files. Our research will require use of the Health Care Access & Utilization survey and the Social Determinants of Health survey.

Anticipated Findings

Our study will likely find that patients in rural areas face greater barriers to accessing diabetic retinopathy care, such as fewer healthcare facilities, financial and transportation challenges, and lower awareness and education levels. These findings will highlight disparities and inform targeted interventions and policies to improve access and outcomes for rural patients.

Demographic Categories of Interest

  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Registered Tier

Research Team

Owner:

Collaborators:

  • Niharika Dar - Graduate Trainee, William Carey University
  • Ketsia Kimbimbi - Graduate Trainee, William Carey University
  • Daniel Thompson - Graduate Trainee, William Carey University
  • Don Rubin - Graduate Trainee, William Carey University
  • Danielle Fastring - Other, William Carey University

Strabismus GWAS Project - Whitman Lab - Viola Lee

Scientific Questions Being Studied

Strabismus is generally understood to be a multifactorial disease attributable to dysfunction of cranial nerves 3, 4, and 6, which have motor control of ocular muscles. Neurodevelopmental diseases have been shown to increase the risk of strabismus, but the underlying mechanism of its pathophysiology remains to be elucidated. This project aims to apply bioinformatics analysis and statistical methods toward elucidating the genetic biomarkers of strabismus. We anticipate that these findings can help inform patient care and the clinical management of strabismus. Some specific goals for the project may include 1) identifying a subset of genetic loci that are associated with esotropia, exotropia, or both, 2) identifying copy number variants and epigenetic changes associated with esotropia, exotropia, or both, and 3) investigating heterogeneity in functional vision and phenotype in strabismus patients.

Project Purpose(s)

  • Disease Focused Research (strabismus)

Scientific Approaches

I hope to utilize the genome sequence data in the All of Us dataset and perform a genome-wide association study along with pathway analyses and other bioinformatics tools to achieve the goals of this study.

Anticipated Findings

While there have been some genome-wide association studies (GWAS) that have identified several gene loci, these findings had low reproducibility and small sample size. Past studies have shown some exome sequencing efforts that identified risk variants in genes FAT3, KCNH2, CELSR1, and TTYH1, suggesting their role in familial strabismus, but the biomolecular relationships between these genes and their proposed mechanism in the underlying etiology of strabismus remain poorly understood. Also, the differences in the mechanism between different subtypes of strabismus (esotropia and exotropia) are not fully delineated. I hope to fill these knowledge gaps by using bioinformatics analysis to better understand the molecular mechanisms behind strabismus onset, specifically looking at the differences in mechanisms between its subtypes.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Kyoung A Lee - Graduate Trainee, Boston Children's Hospital

Collaborators:

  • Inas Aboobakar - Research Fellow, Mass General Brigham

Concussion and neurodegeneration 6.22.24

Scientific Questions Being Studied

This study will investigate whether sustaining a concussion is associated with experiencing neurodegeneration later in life.

Project Purpose(s)

  • Population Health

Scientific Approaches

The plan is to conduct a cohort study where the exposure is sustaining a concussion and the outcome is the onset of a form of neurodegeneration.
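
For a cohort design like the one described above, a common crude summary is the risk ratio comparing exposed and unexposed groups. The sketch below is illustrative only: the counts are invented, the function name is hypothetical, and the estimate is unadjusted, so it does not address the confounding and bias that a full analysis would need to handle.

```python
import math


def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Crude risk ratio with an approximate 95% confidence interval.

    The interval uses the usual large-sample standard error of log(RR).
    """
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    rr = risk_exposed / risk_unexposed
    se_log_rr = math.sqrt(
        1 / exposed_cases
        - 1 / exposed_total
        + 1 / unexposed_cases
        - 1 / unexposed_total
    )
    lower = math.exp(math.log(rr) - 1.96 * se_log_rr)
    upper = math.exp(math.log(rr) + 1.96 * se_log_rr)
    return rr, lower, upper


# Invented counts: outcome cases among exposed and unexposed participants.
rr, lower, upper = risk_ratio(30, 1000, 20, 1000)
```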

Anticipated Findings

We are not in a position to anticipate whether the findings will suggest that concussion is a risk factor for developing neurodegeneration. Whatever the nature of the findings, this study will be positioned to contribute to the literature, because it will use, to the extent possible, a rigorous cohort study design with attention to sources of potential bias.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Effect of Major SNPs on NASH

Scientific Questions Being Studied

We are interested in investigating the effect of four major SNPs (PNPLA3-I148M, TM6SF2-E167K, GCKR-P446L, and MBOAT7 rs641738) on NASH in the population. This question is relevant to science because it may help to further understand the pathophysiology of NASH and provide insight into potential therapeutics.

Project Purpose(s)

  • Disease Focused Research (Nonalcoholic fatty liver disease)
  • Ancestry

Scientific Approaches

We will quantify the “N” of steatosis and steatosis levels, subdivided by BMI level, for each of the four major NASH SNPs (PNPLA3-I148M, TM6SF2-E167K, GCKR-P446L, and MBOAT7 rs641738). Furthermore, we may look at quantitative lab measurements in each of the cohorts.
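
The counting step described above amounts to a stratified tabulation. A minimal Python sketch follows, assuming a hypothetical record layout (`risk_alleles`, `bmi`, `steatosis`) and BMI cut-offs of 25 and 30 chosen only for illustration; neither comes from the project description.

```python
from collections import Counter


def bmi_category(bmi):
    """Assign an illustrative BMI band (cut-offs are an assumption)."""
    if bmi < 25:
        return "<25"
    if bmi < 30:
        return "25-29.9"
    return ">=30"


def count_by_genotype_and_bmi(rows):
    """Return {(risk_alleles, bmi_band): (steatosis_cases, total)}."""
    totals, cases = Counter(), Counter()
    for row in rows:
        key = (row["risk_alleles"], bmi_category(row["bmi"]))
        totals[key] += 1
        if row["steatosis"]:
            cases[key] += 1
    return {key: (cases[key], totals[key]) for key in totals}


# Invented participants for one SNP; risk_alleles is the count of risk alleles (0, 1, or 2).
participants = [
    {"risk_alleles": 0, "bmi": 22.0, "steatosis": False},
    {"risk_alleles": 0, "bmi": 31.5, "steatosis": True},
    {"risk_alleles": 1, "bmi": 27.3, "steatosis": False},
    {"risk_alleles": 1, "bmi": 27.9, "steatosis": True},
    {"risk_alleles": 2, "bmi": 33.0, "steatosis": True},
    {"risk_alleles": 2, "bmi": 34.2, "steatosis": True},
]
table = count_by_genotype_and_bmi(participants)
```

The same function could be run once per SNP to produce one table for each of the four variants.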

Anticipated Findings

We anticipate gaining an understanding of the population characteristics of each SNP in relation to its effect on NASH. These findings will provide an important orthogonal perspective to help validate in-vivo results derived from organoid models of NASH.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Ismael Assi - Graduate Trainee, Cincinnati Children's Hospital Medical Center

GWAS Cohort and Fitbit Data(Controlled)

Scientific Questions Being Studied

We analyze individual-level data to examine the heterogeneous treatment effects of soda tax exposure on exercise levels, specifically among people with and without sugar addiction genes, compared to those not subject to the tax. Research concerning genetics (i.e. the study of genes, genetic variations, and heredity) in the context of diseases or ancestry.

Project Purpose(s)

  • Ancestry

Scientific Approaches

We want to conduct a genome-wide association study (GWAS) to find the relationship. We can also create a regression model or scatter plots to examine the association.

Anticipated Findings

By organizing and analyzing this data, we aim to uncover the varied impacts of soda tax on different population segments and understand the broader implications of such policies, specifically on changing lifestyles.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Using AI to Predict Cardiovascular Disease

Scientific Questions Being Studied

I want to use Artificial Intelligence to predict at what age a person is most likely to develop moderate to severe heart disease based on a variety of inputs, including their age, family history, other medical conditions, and lifestyle. Cardiovascular diseases are the leading cause of death globally, according to data from the World Health Organization (WHO). Leveraging modern-day technology against such a widespread and preventable disease will help people take preventative steps earlier and reduce the chances of an untimely death.

Project Purpose(s)

  • Disease Focused Research (Cardiovascular Disease)
  • Methods Development

Scientific Approaches

Our training data will be the group of people who have already been diagnosed with a form of heart disease, and at what age they were given this diagnosis. Our inputs to the model will include traditional risk factors of cardiovascular disease, such as age, height, and weight of a person, if they’ve already been diagnosed with high blood pressure, high cholesterol, and/or diabetes, family history of high cholesterol, high blood pressure and/or diabetes, their smoking and drinking habits, and their physical lifestyle. We will also explore various other inputs included within the All of Us dataset to investigate whether there are additional inputs which may be a significant indicator of developing heart disease. Modern machine learning methods such as supervised classification algorithms and supervised regression algorithms will be implemented using Python. Our output will be a potential age range of when they are likely to develop a moderate to severe diagnosis of heart disease.

Anticipated Findings

From this study, I hope to have created a model for predicting when a person may develop heart disease with an accuracy of at least 90%. This model would be a valuable contribution in utilizing artificial intelligence and machine learning in the realm of cardiovascular health. I'd also be interested in incorporating Fitbit data and seeing how that impacts performance.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Collaborators:

  • Ethan Steinberg - Project Personnel, Stanford University

Test Liver SNP

Scientific Questions Being Studied

Testing extracting SNP data

Project Purpose(s)

  • Educational

Scientific Approaches

Testing extracting SNP data

Anticipated Findings

Testing extracting SNP data

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Chris Hemme - Project Personnel, University of Rhode Island

Shared Lifestyles and Genetic Risk Factors of MAFLD and Diabetes

Scientific Questions Being Studied

Metabolic dysfunction-associated fatty liver disease (MAFLD) and diabetes are two common types of metabolic disorder diseases. Unhealthy lifestyles and genetic mutations affecting biochemical processes are the causes of MAFLD and diabetes. The question is to see how the shared lifestyles and genetic risk factors contribute to the development of MAFLD and diabetes compared to those without the diseases.

Project Purpose(s)

  • Disease Focused Research (metabolic dysfunction-associated fatty liver disease and diabetes)

Scientific Approaches

Depending on the data available, I plan to compare the lifestyles (including sedentary time, physical activity, high-fat diet, etc.) and genetics of those with both MAFLD and diabetes to those with only one of the two diseases or those without either disease, to identify potentially associated lifestyle risk factors and genes. I will use data from All of Us only and perform statistical analysis on the data.

Anticipated Findings

We anticipate that lack of exercise and a high-fat diet may be shared risk factors for MAFLD and diabetes. Through our analysis, we expect to derive recommendations for physical activity time and dietary fat intake to decrease the incidence of MAFLD and diabetes. Meanwhile, I hope this research can identify possible shared genes that contribute to the development of MAFLD and diabetes, which can help us prevent or treat MAFLD and diabetes more precisely.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Guangqin Xiao - Research Fellow, Harvard T. H. Chan School of Public Health

Reproductive and Metabolic Subtypes of Polycystic Ovary Syndrome (PCOS)

Scientific Questions Being Studied

Polycystic Ovary Syndrome (PCOS) is the most common disorder of reproductive-age women, affecting up to 15% of this population worldwide. It is a leading cause of anovulatory infertility, obesity and type 2 diabetes. The cause of PCOS remains unknown, so diagnoses are based on expert opinion rather than on knowledge of disease mechanisms. We have recently stratified our cohort of women with PCOS into two subtypes (named reproductive and metabolic) and have reproduced these subtypes within other cohorts. PCOS is a highly heritable complex genetic disorder. Using genome-wide association studies, we have identified candidate genes that are specific to the PCOS subtypes and have replicated these findings. Genomic specificity points to the possibility that PCOS risk is through either reproductive or metabolic mechanisms. This project will use the All of Us cohort to assess genetic variants in reproductive and metabolic subtypes of PCOS to attempt to identify possible disease mechanisms.

Project Purpose(s)

  • Disease Focused Research (polycystic ovary syndrome)
  • Control Set
  • Ancestry

Scientific Approaches

Individuals with and without PCOS as determined by survey data and health records will be compared in analyses. Hierarchical clustering and prediction modeling will be performed in order to classify PCOS women into subtypes. Genomic data will be used to investigate PCOS GWAS candidate gene regions and evaluate new and previously identified genetic variants within the subtypes. Findings will be followed up with functional assessments.
  • Burden tests of previously identified regions of interest, for each subtype and for both together
  • GWAS of other PCOS features in the subtypes (AMH, CVD) (subtype-specific PheWAS)
  • Case-control GWAS

Anticipated Findings

We anticipate that we will find genomic variants that are relevant to and help to explain causal pathways of PCOS.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

  • Kelly Brewer - Project Personnel, Icahn School of Medicine at Mount Sinai

lung cancer

Scientific Questions Being Studied

Lung cancer remains the leading cause of cancer-related mortality globally. Epidemiological data reveal profound disparities in lung cancer outcomes across different racial and ethnic populations. Socioeconomic factors exacerbate this disparity. Certain populations exhibit elevated lung cancer burdens but have been underrepresented in lung cancer research. Existing lung cancer risk prediction models primarily rely on clinical and genetic data, often overlooking socioeconomic and environmental contexts that disproportionately affect these populations. Consequently, the models exhibit inherent biases. Thus, we propose to develop a robust deep learning model for lung cancer risk prediction across multiethnic populations. We hypothesize that the existing risk models do not appropriately predict the lung cancer risk for minority groups. We will use deep learning to integrate diverse datasets and develop an unbiased and comprehensive lung cancer risk prediction model across multiethnic populations.

Project Purpose(s)

  • Disease Focused Research (lung carcinoma)
  • Population Health
  • Ancestry

Scientific Approaches

We will conduct a longitudinal study involving participants from All of Us. Initially, we will assess the performance of existing lung cancer prediction models, specifically analyzing their efficacy across race/ethnicity. Subsequently, we will utilize widely recognized machine learning techniques (including Multi-Layer Perceptrons, Recurrent Neural Networks, and Hybrid Models) to develop lung cancer risk prediction models. These models will incorporate a comprehensive set of predictors encompassing demographic characteristics, lifestyle and health information, laboratory data, environmental factors, anthropometric data, and genomic variants to enhance prediction accuracy. Model development will begin with data processing. A thorough literature review will identify key predictors, which the XGBoost algorithm will then evaluate and rank by importance to guide feature selection. Model performance will be evaluated and validated through cross-validation.
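
The cross-validation step mentioned above can be sketched in a few lines of Python. This is a generic k-fold splitter written for illustration; the function name, the fold count, and the seed are assumptions, and the project does not state which cross-validation scheme it will use.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Indices are shuffled once, then dealt into k folds; each fold serves
    as the held-out test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


# Example: 10 samples split into 5 folds of 2 held-out samples each.
splits = list(k_fold_indices(10, 5))
```

A model would be fitted on each `train_idx` and scored on the matching `test_idx`, with the per-fold scores averaged. When evaluating fairness across race/ethnicity, stratifying the folds by group is a common refinement that this sketch omits.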

Anticipated Findings

The study anticipates refining lung cancer prediction by evaluating existing models, focusing on their performance across different ethnicities, potentially uncovering biases. By employing advanced machine learning techniques and integrating comprehensive predictors from demographics to genomic variants, the study aims to develop models with enhanced accuracy and generalizability. This could lead to improvements in early detection and personalized intervention strategies for lung cancer, particularly benefiting diverse populations. Contributions to scientific knowledge include methodological advancements in predictive modeling. The integration of a broad spectrum of predictors and the use of deep machine learning techniques may set new standards for lung cancer prediction and foster interdisciplinary research. These developments could have far-reaching implications for medical research, policy-making, and clinical practices, driving innovations in disease prevention and management.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

  • Ting Gong - Research Fellow, University of Hawaii at Manoa

Menopause, APOE and dementia

Scientific Questions Being Studied

My question:
Are premature menopause and its interaction with APOE-ε4 carrier status associated with risk of dementia in Black and Hispanic women?

Why this is important:
Earlier menopause is associated with increased dementia risk. APOE-ε4, a genetic risk factor for dementia, influences this relationship; people who are APOE-ε4 carriers have a higher dementia risk compared to non-carriers. However, most of this evidence comes from studies with White populations. This is problematic, because we know there are racial and ethnic differences in menopause onset, APOE-ε4, and dementia. Black women reach menopause earlier than White women, and are twice as likely to develop dementia. Hispanic women are 1.5 times more likely than White women to develop dementia. The influence of APOE-ε4 on dementia is also stronger in White populations. Given these uncertainties, it is important to examine the relationship between early menopause, APOE-ε4, and dementia in non-White populations.

Project Purpose(s)

  • Disease Focused Research (dementia)

Scientific Approaches

We plan to use longitudinal datasets of women who experienced early menopause and have been characterized for APOE-ε4 carrier status, with longitudinal cognitive outcomes. We will use Cox regression models to test associations between early menopause and dementia outcomes, and the interaction of APOE-ε4. Models will be adjusted for potential confounders such as age, education, age at menarche, and current or past hormone therapy use. Data will be analyzed using R.
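
One way to write the Cox model with an interaction term described above is shown below. The symbols are assumptions introduced for this sketch: M is an indicator for early menopause, A is an indicator for APOE-ε4 carrier status, and z collects the adjustment covariates (for example age, education, age at menarche, and hormone therapy use).

```latex
h(t \mid M, A, \mathbf{z}) = h_0(t)\,
  \exp\bigl(\beta_M M + \beta_A A + \beta_{MA}\,(M \times A)
  + \boldsymbol{\gamma}^{\top}\mathbf{z}\bigr)
```

Under this parameterization, exp(β_M) is the hazard ratio for early menopause among non-carriers, exp(β_M + β_MA) is the corresponding hazard ratio among carriers, and a test of β_MA = 0 assesses whether carrier status modifies the association on the multiplicative scale.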

Anticipated Findings

Anticipated findings:
We aim to show whether premature menopause is associated with risk of cognitive impairment or dementia in Black and Hispanic women, and if this association is influenced by APOE-ε4 carrier status.

Contribution to the field:
Results will add critically needed evidence regarding the relationship between early menopause, APOE-ε4 and dementia in non-White populations. We hope that our findings will help to improve health equity in dementia research and contribute to future risk assessment and prevention strategies (e.g., developing interventions for Black or Hispanic women who experience early menopause and are APOE-ε4 carriers).

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Registered Tier

Research Team

Owner:

Fitbit Data Feasibility Queries

Scientific Questions Being Studied

I intend to study how wearable Fitbit data can inform how we monitor recovery from orthopedic surgeries, such as total hip arthroplasty. I am hoping to use individual data as well to determine how social determinants of health can determine a person’s recovery and to model such a trajectory for other patients (e.g., how age can determine progress of recovery post-operatively). This can help improve healthcare accessibility and allow us to better understand how different patients recover in accordance with objective Fitbit health metrics.

Project Purpose(s)

  • Educational

Scientific Approaches

I am using the wearable and EHR datasets provided in All of Us. My research will use Python for statistical modeling and machine learning analyses.

Anticipated Findings

We anticipate that older age will lead to slower recovery. We are hoping that this research can help surgeons better manage patient expectations and allow transparency on the recovery process for various orthopedic surgeries.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Data Set Used

Registered Tier

Research Team

Owner:

  • Jennifer Yu - Graduate Trainee, Icahn School of Medicine at Mount Sinai

SDOH of Spinal Arthrodesis

Scientific Questions Being Studied

Are there racial and socioeconomic disparities in post-operative outcomes after spinal arthrodesis? This is important to understand, both to compare with previous SDOH data and to better identify gaps in healthcare.

Project Purpose(s)

  • Population Health

Scientific Approaches

We plan to use the All of Us (AoU) dataset, focusing on arthrodesis and outcomes within 4 weeks of surgery. We will primarily use statistical analyses in R to answer our question.

Anticipated Findings

We anticipate finding that patients in lower socioeconomic groups and patients with minority racial and ethnic identities will have higher rates of postoperative complications and substantially less access to arthrodesis.

Demographic Categories of Interest

  • Race / Ethnicity
  • Sex at Birth
  • Gender Identity
  • Access to Care

Data Set Used

Controlled Tier

Research Team

Owner:

  • Divy Kumar - Graduate Trainee, Northwestern University

Cervical Cancer project

Scientific Questions Being Studied

The following questions are under consideration in this study:

1. What demographic, genetic, and lifestyle factors are most predictive of cervical cancer risk in African American women?
2. How do different machine learning models (e.g., decision trees, support vector machines, neural networks) perform in predicting precancerous conditions and early-stage cervical cancer in an African American female population, and which model offers the highest accuracy and efficiency?
3. What are the barriers to implementing machine learning-based screening tools in clinical settings for the early detection of cervical cancer among African American women, and how can these barriers be overcome?
4. How does the incorporation of machine learning algorithms into cervical cancer screening programs affect the early detection rates, treatment outcomes, and overall survival rates among African American women compared to current screening practices?

Project Purpose(s)

  • Disease Focused Research (cervical cancer)
  • Methods Development

Scientific Approaches

To address these questions, the study will employ the following approaches:

1. Datasets: We will utilize datasets that include demographic, genetic, and lifestyle information from African American women, along with medical records detailing cervical cancer diagnoses and outcomes.
2. Research Methods: Statistical analysis will be used to identify key predictive factors. Machine learning models (decision trees, support vector machines, neural networks) will be trained and tested on the data to predict precancerous conditions and early-stage cervical cancer.
3. Tools: Data analysis will be conducted using statistical software (e.g., Python), and machine learning models will be developed using platforms such as TensorFlow and Scikit-learn.

Anticipated Findings

The anticipated findings from this study are:

1. Identification of specific demographic, genetic, and lifestyle factors that significantly predict cervical cancer risk in African American women.
2. Determination of the most accurate and efficient machine learning model for predicting precancerous conditions and early-stage cervical cancer in this population.
3. Insight into the barriers to implementing machine learning-based screening tools in clinical settings and strategies to overcome these barriers.
4. Evidence that incorporating machine learning algorithms into cervical cancer screening programs can improve early detection rates, treatment outcomes, and overall survival rates among African American women.

These findings will contribute to scientific knowledge by enhancing our understanding of cervical cancer risk factors in African American women, improving screening techniques, and addressing health disparities.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Registered Tier

Research Team

Owner:

  • Fadekemi Osaye - Early Career Tenure-track Researcher, Alabama State University

Depression and Activity Monitor Data

Scientific Questions Being Studied

Currently learning about the All of Us platform and using this workspace to look at different subgroupings of depression and their associations with activity monitor data.

Project Purpose(s)

  • Disease Focused Research (Psychiatric Disorders)

Scientific Approaches

Unknown at this time. Currently learning more about the All of Us platform and possible methodologies to conduct this analysis. This will be updated when the project details are finalized.

Anticipated Findings

Create phenotypes of depression based on activity monitor data and possibly conduct outcome analysis.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Vedant Agrawal - Graduate Trainee, University of Texas Medical Branch (UTMB) at Galveston

Duplicate of CVD/DM Composite Risk Factor Control

Scientific Questions Being Studied

Diabetes mellitus (DM) remains an overwhelming public health burden in the United States, with the prevalence of diagnosed diabetes reaching an estimated 8.7% of the adult population in 2019. Among patients with DM, cardiovascular disease (CVD) is the leading cause of mortality and morbidity. Importantly, the burden of disease is not shared equally across the population; racial and socioeconomic disparities are evident in recent DM data.

The goal of this project is to use the All of Us Research Program database to explore racial and socioeconomic disparities in composite CVD risk factor control among DM patients with and without CVD. We want to know: 1. Among patients with diabetes mellitus with and without CVD, what is the extent of composite risk factor control (BP, HbA1c, LDL-C, BMI, tobacco status, activity level)? 2. How are race, ethnicity, sex, income, and healthcare access associated with the number of risk factors at target for control?

Project Purpose(s)

  • Population Health

Scientific Approaches

Demographic data will be compared in patients with and without CVD using a survey t test for continuous variables or a chi-squared test for categorical variables. We will use a chi-squared test of proportions to compare the proportion of patients at target for individual and composite risk factor control between patients with and without CVD. Multiple logistic regression will be used to identify whether race, sex, socioeconomic status, and access to healthcare are associated with individual and composite risk factor control.
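
As a rough illustration of the chi-squared test of proportions mentioned above, here is a minimal Python sketch using only the standard library. It is not the project's actual code: the function name `chi_squared_2x2` and the counts are hypothetical, the test is the uncorrected Pearson statistic on a 2x2 table, and it ignores the survey weighting that the description implies.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]. Returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column needs at least one observation")
    # Shortcut formula for the Pearson statistic on a 2x2 table.
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the chi-squared survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts for illustration only (not All of Us data):
# rows = with CVD / without CVD, columns = at target / not at target.
stat, p = chi_squared_2x2(30, 70, 45, 55)
print(f"chi-squared = {stat:.3f}, p = {p:.4f}")
```

A small p-value here would suggest that the proportion at target differs between the two groups; the logistic regression step would then adjust that comparison for the listed covariates.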

Anticipated Findings

We anticipate that there are different patterns in terms of risk factor control across DM patients with and without CVD depending on social determinants of health, which will help to identify the possible prevention strategies for DM and CVD and address social determinants to improve health outcomes.

Demographic Categories of Interest

  • Race / Ethnicity
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Registered Tier

Research Team

Owner:

Collaborators:

  • Cammie Young - Student, University of California, Irvine

galc variants

Scientific Questions Being Studied

Which variants in the GALC gene contribute to Krabbe disease? I want to see which variants are found in the genomes included so far in All of Us.

Project Purpose(s)

  • Disease Focused Research (krabbe disease)
  • Population Health
  • Educational
  • Methods Development
  • Ancestry

Scientific Approaches

Express variants in human cells to see which ones damage the GALC enzyme. Make a library of variants, express them in HEK293 cells, and measure GALC activity in these cells.

Anticipated Findings

Better newborn screening and prognosis/diagnosis of Krabbe disease. When genotype info is obtained, better interpretation of genotypes.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Michael Gelb - Late Career Tenured Researcher, University of Washington

Rare Variant Burden Analyses Templates and Tutorials

Scientific Questions Being Studied

Endogenous retroviruses (ERVs) comprise approximately 8% of the human genome. For years, ERVs have been considered silent and were therefore deemed irrelevant for study by most researchers. However, in recent years, we have shown that ERV expression signatures have a larger volume than previously thought. ERV RNA and protein expression has been demonstrated in particular for malignancies like cancers, neurological conditions, and autoimmune diseases, as well as in immune-privileged organs such as the testes, placenta, brain, and thyroid. While overexpression of certain ERV families, which can encompass loci from multiple chromosomes, has been linked to disease, single-locus associations remain scarce. We will study the mutational burden of specific ERVs. This research aims to identify mutational profiles of ERVs in various diseases.

Project Purpose(s)

  • Methods Development
  • Ancestry

Scientific Approaches

Common variant frequency associations, rare-variant burden analyses, and detection of ERV integration in the large whole genome sequencing dataset from the All of Us Research Program.

Anticipated Findings

We hypothesize that certain ERVs have disease specific roles and will be more mutated in disease.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Pragati Kore - Graduate Trainee, Baylor College of Medicine
  • Nirav Shah - Graduate Trainee, Baylor College of Medicine
  • Hatoon Al Ali - Graduate Trainee, Baylor College of Medicine
  • Elizabeth Atkinson - Early Career Tenure-track Researcher, Baylor College of Medicine
  • Aishi Ayyanathan - Undergraduate Student, Baylor College of Medicine

Tutorial_AD_Study

Scientific Questions Being Studied

I am exploring the All of Us dataset to formalize a research question centered around AD.

The specific scientific questions are:
1) How do sleep quality and physical activity levels influence AD progression in underserved communities?
2) Are there disparities in AD progression based on social determinants of health within these populations?

This exploration is crucial for identifying modifiable behavioral factors and addressing health disparities, ultimately aiming to improve the quality of life for individuals in these communities.

Project Purpose(s)

  • Educational

Scientific Approaches

For this exploratory phase study, I will use the All of Us dataset to assess the available data on sleep quality, physical activity, and Alzheimer’s Disease (AD) progression in underserved communities. The dataset includes comprehensive health data, such as electronic health records, survey responses, and wearable device data.

Anticipated Findings

From this data exploration, I anticipate identifying key variables and potential correlations between sleep quality, physical activity, and Alzheimer’s Disease (AD) progression in underserved communities. These preliminary findings will help refine my research questions and hypotheses.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography

Data Set Used

Registered Tier

Research Team

Owner:

Marijuana use and neuropsychiatric outcomes among cancer survivors

Scientific Questions Being Studied

How do reported neuropsychiatric outcomes (cognitive impairment, depression, etc.) differ between cancer survivors with frequent, infrequent, or no marijuana usage? Are there any polymorphisms associated with different neuropsychiatric outcomes among cancer survivors with frequent marijuana usage?

Project Purpose(s)

  • Disease Focused Research (cancer)
  • Ancestry

Scientific Approaches

Univariate and multivariate logistic regression

Anticipated Findings

We hypothesize that cancer survivors with frequent marijuana usage will demonstrate different trends in neuropsychiatric outcomes in comparison to those with infrequent or no usage. Furthermore, we hypothesize that polymorphisms in genes where variability is known to impact the effect of cannabis on the individual (e.g., COMT, CYP2C9) will be associated with altered neuropsychiatric outcomes among cancer survivors with marijuana usage. Our findings will provide information on how neuropsychiatric symptoms are related to marijuana use in cancer survivors, potentially shedding light on their reasons for usage and/or possible therapeutic applications of cannabis.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Julia Trudeau - Graduate Trainee, University of California, Irvine
  • Ding Quan Ng - Graduate Trainee, University of California, Irvine
  • Carolyn Nguyen - Undergraduate Student, University of California, Irvine

Familial Hypercholesterolemia

Scientific Questions Being Studied

Our primary goals are to achieve a better understanding of the prevalence and penetrance of FH and rates of underdiagnosis and undertreatment in the U.S.

Project Purpose(s)

  • Disease Focused Research (familial hypercholesterolemia)
  • Population Health
  • Ancestry

Scientific Approaches

We will use whole genome sequencing, survey, and electronic health record (EHR) data.

Anticipated Findings

We hypothesize that genetic screening for FH using the most up-to-date genetic P/LP variant classifications will reveal higher rates of FH prevalence, underdiagnosis, and undertreatment in our cohort compared to previous studies. Our study will result in a better understanding of the current state of FH prevalence, underdiagnosis and undertreatment in the U.S. and potential disparities, which we hope will raise awareness and inform the design, implementation, and policy of future studies and interventions to increase the rates of genetic screening, diagnosis and treatment, especially among individuals historically underrepresented in biomedical research.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Hannah Park - Other, University of California, Irvine
Request a Review of this Research Project

You can request that the All of Us Resource Access Board (RAB) review a research purpose description if you have concerns that this research project may stigmatize All of Us participants or violate the Data User Code of Conduct in some other way. To request a review, you must fill in a form, which you can access by selecting ‘request a review’ below.