Research Projects Directory

9,423 active projects

This information was updated 2/27/2024

The Research Projects Directory includes information about all projects that currently exist in the Researcher Workbench to help provide transparency about how the Workbench is being used. Each project specifies whether Registered Tier or Controlled Tier data are used.

Note: Researcher Workbench users provide information about their research projects independently. Views expressed in the Research Projects Directory belong to the relevant users and do not necessarily represent those of the All of Us Research Program. Information in the Research Projects Directory is also cross-posted on AllofUs.nih.gov in compliance with the 21st Century Cures Act.

D-PRISM-Team-BandV (v7)

Scientific Questions Being Studied

Large-scale genome-wide association studies have identified a large number of genetic variants associated with complex diseases. Aggregating all the variants known to contribute to a disease into polygenic risk scores (PRS) improves the prediction of a range of complex diseases. Most PRS have been developed within European ancestry study samples, further exacerbating health disparities across ancestries. There is a critical need to responsively and proactively expand access to accurate PRS. Diabetes and its associated complications are one of the biggest global health problems of the 21st century. In fact, type 1 and type 2 diabetes (T1D and T2D), gestational diabetes (GDM), and related complications are excellent disease models to study the utility of PRS for predicting heterogeneous and complex health outcomes in a setting where dramatic racial/ethnic and socioeconomic disparities exist.

Project Purpose(s)

  • Disease Focused Research (Lifespan diabetes and diabetes complications)

Scientific Approaches

To address the disparities in PRS across ancestries, we have assembled a multi-disciplinary team to aggregate and analyze genetic data from more than with T1D, T2D, GDM and glycemia-related complications and quantitative traits to improve the PRS prediction of diabetes and its progression across the lifespan in diverse ancestries, with these Aims: (1) Collection, harmonization, and integration of large-scale, multi-ancestry cohorts with diabetes traits across the lifespan and genomics for developing, training, and testing PRS for diverse ancestries; (2) Development of methods to improve PRS prediction in non-European populations by using Bayesian approaches that allow integration of linkage disequilibrium and summary statistics from several ancestries; and (3) Development, testing, and comparison of the performance of PRS for each trait, development of risk prediction tools that integrate clinical and genetic risk factors, and assessment of scenarios where PRS improve the prediction.
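
As a rough illustration of the aggregation behind a PRS, the score is commonly computed as a weighted sum of risk-allele dosages. The sketch below assumes per-variant effect sizes (for example, log odds ratios from summary statistics) and dosages coded 0, 1, or 2. The function name, variant IDs, and numbers are hypothetical and are not taken from this project.

```python
def polygenic_risk_score(effect_sizes, dosages):
    """Weighted sum of risk-allele dosages over variants present in both inputs."""
    shared = effect_sizes.keys() & dosages.keys()
    return sum(effect_sizes[v] * dosages[v] for v in shared)


# Hypothetical effect sizes (log odds ratios) and one person's allele dosages.
weights = {"rs_a": 0.10, "rs_b": -0.05, "rs_c": 0.20}
person = {"rs_a": 2, "rs_b": 1, "rs_c": 0}

score = polygenic_risk_score(weights, person)
print(round(score, 4))
```

Methods that borrow information across ancestries, such as the Bayesian approaches named in Aim 2, change how the weights are estimated. They do not change this final summation step.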

Anticipated Findings

Accomplishing the aims of this proposal will demonstrate how genomic data can inform more efficient and targeted preventive strategies within healthcare systems and across ethnically diverse populations. Findings are expected to advance precision care of patients with diabetes and related conditions in people of diverse ancestral background and serve as a paradigm for many other complex diseases.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Data Set Used

Controlled Tier

Research Team

Owner:

  • Josep Mercader - Early Career Tenure-track Researcher, Broad Institute
  • Alisa Manning - Early Career Tenure-track Researcher, Mass General Brigham

Collaborators:

  • Ravi Mandla - Project Personnel, Broad Institute
  • Raymond Kreienkamp - Research Fellow, Broad Institute
  • Philip Schroeder - Project Personnel, Broad Institute
  • Maheak Vora - Project Personnel, Broad Institute
  • Maggie Ng - Mid-career Tenured Researcher, Vanderbilt University Medical Center
  • Lukasz Szczerbinski - Research Fellow, Broad Institute
  • Kenneth Westerman - Research Fellow, Mass General Brigham
  • Yingchang Lu - Early Career Tenure-track Researcher, Vanderbilt University Medical Center
  • Joohyun Kim - Other, Vanderbilt University Medical Center
  • Jaehyun Park - Research Fellow, Vanderbilt University Medical Center
  • Kaavya Ashok - Early Career Tenure-track Researcher, Broad Institute
  • Alicia Huerta - Research Fellow, Broad Institute
  • Aaron Deutsch - Research Fellow, Mass General Brigham

ADHD

Scientific Questions Being Studied

What is the effectiveness of Lion's Mane mushrooms as a supplementary treatment for ADHD symptom management, and what physiological and neurochemical mechanisms underlie its therapeutic potential?

Project Purpose(s)

  • Disease Focused Research (attention deficit hyperactivity disorder)
  • Educational
  • Drug Development

Scientific Approaches

Look at data from those affected by ADHD and those on ADHD medication to determine how to reduce symptoms of ADHD with Lion's Mane.

Anticipated Findings

The population using ADHD medication is expected to rise compared to those with ADHD who aren't on medication. Additionally, there's potential for cognitive benefits from natural remedies.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Grant Proposal

Scientific Questions Being Studied

Review the data fields provided by the All of Us program and identify potential research questions, data, and methods for the grant proposal.

Project Purpose(s)

  • Other Purpose (Review data fields for an All of Us grant proposal)

Scientific Approaches

The method is to be decided. I haven't had a chance to review the data fields yet and would like to make this decision later, after seeing the dataset.

The tools will be Python and R.

Anticipated Findings

Prostate cancer disparities in Hispanic Patients' and Partners' health outcomes and psychosocial factors

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Data Set Used

Controlled Tier

Research Team

Owner:

  • Xiaomeng Wang - Project Personnel, University of Texas Health Science Center, San Antonio

COVID-19 Vaccine Hesitancy

Scientific Questions Being Studied

Research question: What factors contribute to vaccine hesitancy among racialized communities in COVID-19?

This question is important because addressing vaccine hesitancy is crucial for achieving widespread vaccination and controlling the COVID-19 pandemic. Racialized communities have been disproportionately affected by the pandemic, experiencing higher infection rates and worse health outcomes. Identifying the factors that contribute to vaccine hesitancy in these communities will help public health officials and policymakers develop targeted interventions to increase vaccination rates and reduce disparities in COVID-19 outcomes. This research will provide valuable insights that can inform public health strategies and improve health equity for all communities.

Project Purpose(s)

  • Population Health
  • Social / Behavioral
  • Educational

Scientific Approaches

For our study of vaccine hesitancy among racialized communities during COVID-19, we will use quantitative analysis of survey datasets to identify trends. We will use R for analysis. This approach will provide a comprehensive understanding of vaccine hesitancy in these communities, aiding targeted interventions for public health improvement.

Anticipated Findings

The study anticipates identifying key factors influencing vaccine hesitancy including access to healthcare, historical mistrust, and cultural beliefs. Understanding these factors and how they interact will be critical for developing targeted interventions to improve vaccination rates and reduce disparities in COVID-19 outcomes.

The findings will contribute to the scientific knowledge by providing insights into the impact of vaccine hesitancy on vaccination rates and COVID-19 outcomes among racialized communities, informing public health interventions, and highlighting the importance of addressing social determinants of health in promoting vaccine uptake.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Registered Tier

Research Team

Owner:

Accelerometer-based Indicators of Long-Term Physical Activity Maintenance

Scientific Questions Being Studied

We will test a framework for evaluating longitudinal accelerometer-based indicators to classify physical activity maintenance.
Aim 1: Compare the prevalence of days and individuals classified as physical activity maintenance across different values of accelerometer indicators.
Aim 2: Compare the validity of different values of accelerometer indicators at predicting future behavior change success and meaningful health outcomes.
Aim 3: Compare the construct validity of different values of accelerometer indicators against theory-based behavioral and psychosocial mechanisms of physical activity maintenance.
Aim 4: Compare the values of accelerometer indicators of physical activity maintenance across different populations, historical contexts/time periods, devices/algorithms.

Project Purpose(s)

  • Population Health
  • Social / Behavioral
  • Methods Development

Scientific Approaches

Aim 1: Exploratory and descriptive analyses of how combinations of values of accelerometer-based indicators (i.e., threshold, duration above the threshold, allowance below the threshold) will impact the prevalence of days and individuals that are classified as maintenance.
Aim 2: Compare the extent to which values predict future behavior change success and meaningful health outcomes through receiver operating characteristic curves that compare the likelihood of behavior change success occurring across different levels of inputs.
Aim 3: Test the extent to which different accelerometer indicators of maintenance thresholds, durations above the threshold, and allowances below the threshold correspond to psychological mechanisms of physical activity maintenance.
Aim 4: Compare the values obtained in the previous steps across a range of different populations varying in sex, gender, age, race/ethnicity, socioeconomic status, weight status and fitness level, able-bodied status, and country.
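
The three indicator values named in Aim 1 (threshold, duration above the threshold, allowance below the threshold) can be sketched as a simple day-level classifier. This is only one possible reading and is not the project's definition. It assumes a maintenance bout is a stretch of days that contains at least `min_duration` days at or above `threshold`, with no run of more than `allowance` consecutive days below it. The function name and the step counts are hypothetical.

```python
def maintenance_days(values, threshold, min_duration, allowance):
    """Flag each day that falls inside a qualifying maintenance bout."""
    flags = [False] * len(values)
    start = last_above = None
    n_above = 0

    def close_bout():
        # Keep the bout only if it has enough days at or above the threshold.
        if start is not None and n_above >= min_duration:
            for i in range(start, last_above + 1):
                flags[i] = True

    for i, v in enumerate(values):
        if v >= threshold:
            if start is None:
                start, n_above = i, 0
            n_above += 1
            last_above = i
        elif start is not None and i - last_above > allowance:
            # Too many consecutive days below the threshold: the bout ends.
            close_bout()
            start = None
    close_bout()
    return flags


# Hypothetical daily step counts for one participant.
steps = [9000, 8000, 3000, 8500, 2000, 1000, 9000]
flags = maintenance_days(steps, threshold=7500, min_duration=3, allowance=1)
prevalence = sum(flags) / len(flags)
print(flags, round(prevalence, 3))
```

Re-running the function over a grid of the three parameters gives the prevalence comparison described in Aim 1.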

Anticipated Findings

Widespread application of a common framework to evaluate longitudinal accelerometer-based indicators of physical activity maintenance is necessary to develop standardized operational definitions of physical activity maintenance that can be used across surveillance, epidemiological, and intervention studies.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Genevieve Dunton - Late Career Tenured Researcher, University of Southern California

research topic

Scientific Questions Being Studied

To explore data on health disparities and how they relate to maternal health.

Project Purpose(s)

  • Population Health
  • Social / Behavioral
  • Educational
  • Ethical, Legal, and Social Implications (ELSI)

Scientific Approaches

Examine the races of women who have died during childbirth.

Anticipated Findings

A comparison of the target group and their outcomes.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Registered Tier

Research Team

Owner:

Iron deficiency in anticoagulated menstruators

Scientific Questions Being Studied

The specific scientific question is: What is the prevalence (frequency) of iron deficiency among menstruating individuals using anticoagulation? This is important because iron deficiency is very common among menstruating individuals, particularly those with heavy periods, which includes 70% or more of menstruating individuals on anticoagulation, although this condition is commonly underdiagnosed. Iron deficiency may represent a major unmet need in this population, and we wish to define the problem so interventions can be better designed.

Project Purpose(s)

  • Disease Focused Research (iron deficiency in anticoagulated menstruating individuals)

Scientific Approaches

We plan to search for ferritin levels in menstruating individuals (using AFAB individuals aged 18-55 as a proxy) on anticoagulants including dabigatran, apixaban, rivaroxaban, edoxaban, and low molecular weight heparin.

Anticipated Findings

This information will be used to appropriately power interventional studies of iron replacement in anticoagulated menstruating individuals. Upon publication, we are hopeful this will also encourage prescribers of anticoagulants to evaluate and treat patients for iron deficiency.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Collaborators:

  • Wesley Stoller - Project Personnel, Oregon Health & Science University

Lloyd_Wound_Healing

Scientific Questions Being Studied

I would like to explore the relationship between wound healing and cancer development and progression.

Project Purpose(s)

  • Disease Focused Research (cancer)
  • Population Health

Scientific Approaches

I will perform various case-control assessments (chi-square tests, logistic regression) using keloids as a proxy for aberrant wound healing, testing for association with various cancers.
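
As a minimal sketch of the case-control comparison described above, the Pearson chi-square statistic and the odds ratio for a 2x2 table can be computed directly. The counts below are invented for illustration and are not results from this project. The function names are hypothetical, and no continuity correction is applied.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def odds_ratio(a, b, c, d):
    """Odds of the outcome in row 1 divided by the odds in row 2."""
    return (a * d) / (b * c)


# Hypothetical counts: rows = keloid (yes, no), columns = cancer (yes, no).
a, b, c, d = 30, 70, 20, 80
stat = chi_square_2x2(a, b, c, d)
ratio = odds_ratio(a, b, c, d)
print(round(stat, 3), round(ratio, 3))
```

The statistic is compared against a chi-square distribution with one degree of freedom. A logistic regression would be needed to adjust the same association for covariates.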

Anticipated Findings

We hypothesize that there will be an association between aberrant wound healing (keloid presentation) and specific solid tumors.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

  • Stacy Lloyd - Early Career Tenure-track Researcher, Tuskegee University

Collaborators:

  • Deyana Lewis - Project Personnel, Morehouse School of Medicine

WorkbenchBackChannel

Scientific Questions Being Studied

This workspace is to be used for researchers to share scripts with each other. We will copy over notebooks of interest for each other to use.

Project Purpose(s)

  • Ancestry

Scientific Approaches

This workspace is to be used for researchers to share scripts with each other. We will copy over notebooks of interest for each other to use.

Anticipated Findings

This team-science approach will advance research by harmonizing methods and avoiding redundancy in script writing.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Micah Hysong - Graduate Trainee, University of North Carolina, Chapel Hill

Collaborators:

  • Laura Raffield - Other, University of North Carolina, Chapel Hill
  • Bjoernar Tuftin - Project Personnel, University of North Carolina, Chapel Hill

Missing Analysis

Scientific Questions Being Studied

A key objective is to develop data-driven methods to determine the most effective adherence thresholds. This is crucial for ensuring that the data under analysis meet a certain quality standard and are relevant for the intended research purposes. Another significant aim is to thoroughly define and analyze the patterns of missingness in the data, both on an intraday and daily basis. This involves applying clustering techniques to discern these patterns among individuals and across the population. This approach will not only help in understanding the nuances of data missingness but also in identifying potential systemic biases or errors in data collection and processing.

Project Purpose(s)

  • Methods Development

Scientific Approaches

Filter data based on HR filtering (used in AoU) or other definitions in the literature to ensure data quality.
Visualization: Developing figures, such as (looking like) elbow diagrams, to identify optimal adherence thresholds.

Defining and analyzing missingness patterns (intraday & daily)
Applying clustering to understand the missingness pattern both at the individual level and across the population (what Peter did).
Making associations (correlation) with outcome variables or covariates
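
The filtering and threshold steps above can be sketched roughly as follows. This assumes each participant has, for every day, the fraction of minutes with a non-missing reading, and that a day counts as valid when that fraction meets a chosen cutoff. The function names, the 0.6 cutoff, and the data are hypothetical and do not represent the project's criteria.

```python
def count_valid_days(daily_coverage, min_coverage):
    """Number of days whose share of non-missing minutes is at least min_coverage."""
    return sum(1 for c in daily_coverage if c >= min_coverage)


def retained_by_threshold(coverage_by_person, min_coverage, min_valid_days_options):
    """For each candidate adherence threshold (minimum valid days),
    count how many participants would be kept."""
    valid = {
        person: count_valid_days(days, min_coverage)
        for person, days in coverage_by_person.items()
    }
    return [
        (t, sum(1 for n in valid.values() if n >= t))
        for t in min_valid_days_options
    ]


# Hypothetical daily coverage fractions for three participants over four days.
coverage = {
    "p1": [0.9, 0.8, 0.2, 0.95],
    "p2": [0.1, 0.5, 0.3, 0.0],
    "p3": [0.7, 0.7, 0.7, 0.1],
}
curve = retained_by_threshold(coverage, 0.6, [1, 2, 3, 4])
print(curve)
```

Plotting the retained count against the threshold gives the kind of elbow-style figure mentioned above. The per-person coverage vectors could also serve as input features for clustering missingness patterns.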

Anticipated Findings

Firstly, we expect to identify optimal data filtering criteria that align with HR filtering standards or other reputable definitions, which will set a benchmark for data quality in subsequent analyses. Through the development of visualizations like elbow diagrams, the study aims to pinpoint precise adherence thresholds that ensure data reliability, offering a methodological advancement in data preprocessing for health research.

In analyzing missingness patterns, both intraday and daily, we anticipate uncovering significant insights into the nature and extent of data gaps. By applying clustering techniques, we aim to categorize individuals and populations based on their missingness profiles, potentially revealing underlying reasons for data absence, such as demographic factors, study design issues, or data collection methodologies. This could lead to recommendations for improving data collection processes and minimizing missing data in future research.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Shun Sakai - Undergraduate Student, Duke University
  • Yuyou Wu - Graduate Trainee, Duke University
  • Lori Liu - Graduate Trainee, Duke University
  • Jiamu Yang - Graduate Trainee, Duke University
  • Harrison Kane - Undergraduate Student, Duke University

Validation of Phenotype Risk Scores

Scientific Questions Being Studied

Many genetic syndromes (described in OMIM) manifest as constellations of disease. Because most of these genetic diseases are individually rare but not uncommon as a whole, many patients may be misdiagnosed and may not have received the correct genetic tests. In this project, we want to use the diagnostic codes (conditions) recorded in AoU to identify potential missed cases and test them with the genomic data.

The phenotype risk score (PheRS) is calculated from the weighted presence of disease manifestations, as described in the following publication.

Bastarache, L., Hughey, J.J., Goldstein, J.A., Bastarache, J.A., Das, S., Zaki, N.C., Zeng, C., Tang, L.A., Roden, D.M., Denny, J.C. (2019). Improving the phenotype risk score as a scalable approach to identifying patients with Mendelian disease. J Am Med Inform Assoc, 26(12):1437-1447.

Project Purpose(s)

  • Methods Development

Scientific Approaches

1. Conditions / diagnostic codes are used to calculate PheRS
2. Genomic data: to identify carriers of undiagnosed genetic diseases.
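
A minimal sketch of the first step follows. It assumes each manifestation is weighted by the log of its inverse prevalence in the cohort, which is how PheRS is commonly described. The cited publication should be consulted for the exact definition. The function names, phenotype labels, and prevalences are hypothetical.

```python
import math


def phenotype_weights(prevalence):
    """Weight each phenotype by log(1 / prevalence), so rarer findings count more."""
    return {ph: math.log(1.0 / p) for ph, p in prevalence.items()}


def phers(patient_phenotypes, disease_phenotypes, weights):
    """Sum the weights of the disease's manifestations observed in the patient."""
    return sum(weights[ph] for ph in disease_phenotypes if ph in patient_phenotypes)


# Hypothetical cohort prevalences and one syndrome's manifestation list.
weights = phenotype_weights({"A": 0.1, "B": 0.01, "C": 0.5})
syndrome = {"A", "B"}
patient = {"B", "C"}

score = phers(patient, syndrome, weights)
print(round(score, 3))
```

Participants with high scores would then be checked against the genomic data for variants in the corresponding gene, which is the second step listed above.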

Anticipated Findings

Validation of the application of PheRS in predicting carrier status. If this approach is valid, it would be useful to implement in clinical decision support (CDS).

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Huan Mo - Research Fellow, National Human Genome Research Institute (NIH-NHGRI)

Collaborators:

  • Cassia Williams-Rogers - Research Assistant, National Human Genome Research Institute (NIH-NHGRI)
  • Tam Tran - Other, National Human Genome Research Institute (NIH-NHGRI)
  • Molly Goldwasser - Undergraduate Student, National Human Genome Research Institute (NIH-NHGRI)
  • Henry Taylor - Graduate Trainee, National Human Genome Research Institute (NIH-NHGRI)
  • David Schlueter - Research Fellow, National Human Genome Research Institute (NIH-NHGRI)
  • Dayo Shittu - Project Personnel, National Human Genome Research Institute (NIH-NHGRI)
  • Chiamaka Diala - Graduate Trainee, National Human Genome Research Institute (NIH-NHGRI)
  • Katharine Chaillet - Research Assistant, National Human Genome Research Institute (NIH-NHGRI)
  • Anav Babbar - Other, National Human Genome Research Institute (NIH-NHGRI)
  • Anas Awan - Project Personnel, National Human Genome Research Institute (NIH-NHGRI)

Asian Health Coalition ALL of US ambassador outreach use

Scientific Questions Being Studied

As a medical student ambassador of the All of Us Research Program, I am tasked with seeing whether there are any unique characteristics among patients with depression, especially in the population of Asians and Pacific Islanders with disabilities.

Project Purpose(s)

  • Educational

Scientific Approaches

Potentially use a chi-square test to see whether there is a relationship between age, race, gender, and the presentation of depressive episodes, since the survey items should be categorical data.

Anticipated Findings

There should be characteristic differences in the presentation of depressive episodes across sex, age, and race.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Geography
  • Disability Status

Data Set Used

Registered Tier

Research Team

Owner:

  • Tsz Chun Chung - Graduate Trainee, A.T. Still University of Health Sciences

Duplicate of How to Get Started with Controlled Tier Data (v7)

Scientific Questions Being Studied

1. Socio-Economic Metrics: How to retrieve participants' socio-economic data from the CDR.
2. Observation Date: How to query and plot an observation date using survey completion date as example.
3. Demographics: Examples of how to query and plot participant demographic data.
4. Death Cause: How to retrieve and plot deceased participants' death causes.

Project Purpose(s)

  • Educational
  • Methods Development
  • Other Purpose (This is an All of Us Featured Workspace: - teaches the users how to set up this notebook, install and import software packages, and select the correct version of the CDR. - gives an overview of the data types available in the current Controlled Tier Curated Data Repository (CDR) that are not available in the Registered Tier - shows how to retrieve and summarize this data.)

Scientific Approaches

We recommend that all researchers explore the notebooks in this workspace to learn the basics of All of Us Program Data. The tutorial Workspace contains two Jupyter Notebooks (one written in Python, the other in R). It contains helper functions for repeatability, code readability, and efficiency.

Anticipated Findings

By reading and running the notebooks in this Tutorial Workspace, you will understand the following: All of Us data are made available in two Curated Data Repositories: the Registered Tier and the Controlled Tier. The latter is subject to more relaxed privacy rules relative to the Registered Tier. As a result, you can expect to find more concept ids in certain data types such as EHR and Survey.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Jay Levy - Project Personnel, Columbia University

Family Medicine Research Interests

Scientific Questions Being Studied

This effort is dedicated to identifying potential research questions within the interests of UCI Family Medicine faculty.
Research areas include but are not limited to:
1. hypertension, diabetes, and obesity in underserved communities
2. Health literacy
3. Fitness

Project Purpose(s)

  • Educational

Scientific Approaches

Based on the finalized research questions and possible areas of interest, various statistical analyses and predictive modeling with techniques such as machine learning and deep learning are potential choices.

Anticipated Findings

This is an effort to clarify and finalize potential research questions; the associated anticipated findings can be inferred once hypotheses are generated. Potential findings would concern disparities among different demographic groups.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Genetic Variant Exploring with Alzheimer's Disease Sequencing Project

Scientific Questions Being Studied

Compare the called variants obtained from our Alzheimer's Disease Sequencing Project dataset against those available in All of Us to study genetic differences across cohorts and populations.

Project Purpose(s)

  • Disease Focused Research (neurodegenerative diseases)
  • Ancestry

Scientific Approaches

Datasets: Alzheimer’s Disease Sequencing Project variant dataset, as well as the All of Us summary statistics.
Research methods/tools: BCFtools, Hail, R, Python.

Anticipated Findings

Datasets: Alzheimer’s Disease Sequencing Project variant dataset, as well as the All of Us summary statistics.
Research methods/tools: BCFtools, Hail, R, Python.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Wan-Ping Lee - Early Career Tenure-track Researcher, University of Pennsylvania

Prostate Cancer Genetics v7

Scientific Questions Being Studied

It is well understood that inherited genetic variation is an important risk factor for prostate cancer. While much work has been done on predicting prostate cancer risk based on genetics, this approach suffers from the problem of overdiagnosis -- identifying men at risk of prostate cancer that, if left untreated, would grow slowly and cause no harm. To address this problem, we are focusing on identifying and characterizing genetic variants that are specific for more aggressive forms of prostate cancer -- those that lead to metastatic disease or death. Such variants may be especially important in Black populations, who are known to be at higher risk of prostate cancer overall and more aggressive disease in particular. To address this question, we are currently estimating the number of such aggressive prostate cancer cases with genetic data available in All of Us.

Project Purpose(s)

  • Disease Focused Research (Prostate Cancer)
  • Population Health
  • Ancestry

Scientific Approaches

To address these questions, we will identify All of Us participants who 1) have aggressive prostate cancer and 2) have genetic information available. Once we determine the sample sizes available, we will use standard statistical approaches to identify genetic variants that are associated with aggressive disease in particular and ask if genetic predictors (polygenic scores) can specifically identify men at risk of aggressive prostate cancer.

Anticipated Findings

We anticipate developing and benchmarking predictive scores for aggressive prostate cancer. Such a score will have translational potential, and could be used to guide prostate cancer screening decisions.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

  • Robert Klein - Mid-career Tenured Researcher, Icahn School of Medicine at Mount Sinai

HLA diversity and obesity in women with breast cancer

Scientific Questions Being Studied

In a Phase II clinical trial in women with metastatic breast cancer, we observed better responses in women with elevated BMI and in self-identified Black women treated with immunotherapy. We also observed higher levels of major histocompatibility complex I (MHCI) protein expression in tumors from those responding to therapy. The immune system recognizes mutated proteins in cancer by presenting these altered proteins on the MHCI protein. The DNA region that encodes MHCI, the human leukocyte antigen (HLA) locus, is highly variable. We will test the hypothesis that Black women with different levels of African ancestry will have differing abilities to recognize tumor antigens based on their HLA diversity.

Project Purpose(s)

  • Disease Focused Research (breast cancer)

Scientific Approaches

We will build a cohort of Black women diagnosed with breast cancer, with BMI status. De-identified genomic data (PLINK BED format) will be used for HLA allelic imputation with SNP2HLA to identify each individual's HLA haplotypes. West African ancestry data will be used to test for associations with HLA diversity and BMI status. Using the imputed HLA haplotypes for the alleles at class I loci (HLA-A, -B, -C), we will run computational antigen-binding prediction algorithms (NetMHCpan) to determine tumor peptides potentially bound by the MHC alleles under investigation.

Anticipated Findings

We expect that HLA/MHC diversity will vary with West African ancestry and reveal associations with the respective demographics. MHC genotypes with more divergent alleles would allow broader antigen presentation to immune effector cells, increasing immunocompetence and potentially affecting the effectiveness of immunotherapy.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

  • Brian Lehmann - Early Career Tenure-track Researcher, Vanderbilt University Medical Center

Polygenic Risk Scores and Physical Activity (Extended) CTDv7

Our primary goal is to understand the interaction between activity levels and polygenic risk score with the development and progression of human disease. Both physical activity and polygenic risk scores have been shown to be associated with prevalence and outcomes…

Scientific Questions Being Studied

Our primary goal is to understand the interaction between activity levels and polygenic risk score with the development and progression of human disease. Both physical activity and polygenic risk scores have been shown to be associated with prevalence and outcomes in many human diseases. These analyses will generate hypotheses guiding clinical and research interventions focused on activity to reduce morbidity and mortality in patients seeking care.

Project Purpose(s)

  • Population Health
  • Social / Behavioral
  • Ancestry

Scientific Approaches

We will examine the relationship between daily activity (steps, activity intensity) over time and the prevalence and progression of coded human diseases, which may be modified by genetics. We will use the Fitbit data, EHR-curated diagnoses, laboratory values, quality of life survey results, clinical outcomes (hospitalizations/mortality), and polygenic risk scores derived from the WGS dataset in AoU.
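The project does not say how its polygenic risk scores are computed. As a minimal sketch, assuming the common additive form (a weighted sum of effect-allele dosages), the calculation for one participant could look like the following. The variant IDs, weights, and the function name are hypothetical and used only for illustration.

```python
from typing import Mapping


def polygenic_risk_score(dosages: Mapping[str, float],
                         weights: Mapping[str, float]) -> float:
    """Additive PRS: sum of (effect weight x effect-allele dosage).

    Only variants present in both mappings contribute; variants missing
    from either input are skipped rather than imputed.
    """
    shared = dosages.keys() & weights.keys()
    return sum(weights[variant] * dosages[variant] for variant in shared)


# Hypothetical per-variant effect weights (for example, log odds ratios).
weights = {"rs0001": 0.12, "rs0002": -0.05, "rs0003": 0.30}

# Hypothetical effect-allele dosages (0, 1, or 2 copies) for one participant.
participant = {"rs0001": 2, "rs0002": 1, "rs0003": 0}

score = polygenic_risk_score(participant, weights)
print(f"PRS = {score:.2f}")
```

In practice the raw score is usually standardized within the study population before it is used as a covariate or effect modifier.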

Anticipated Findings

We expect to find that lower levels of activity are associated with a higher prevalence and more rapid progression of certain diseases and that this risk may be modified by polygenic risk score. These data will provide the rationale to link wearables data with electronic health records nationwide as a window into behavioral activity choice as a modifiable risk factor for chronic diseases. We may find substantial variation in activity and disease prevalence/severity by socioeconomic status, which would motivate studies/interventions to reduce these health disparities.

Demographic Categories of Interest

  • Race / Ethnicity
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Lide Han - Project Personnel, Vanderbilt University Medical Center

SDoH Analysis

Rheumatic diseases such as Systemic lupus erythematosus (SLE), rheumatoid arthritis (RA), and osteoarthritis (OA) have large personal, financial, and societal impacts and are the leading cause of work disability and chronic pain. High rates of potentially avoidable complications occur in…

Scientific Questions Being Studied

Rheumatic diseases such as Systemic lupus erythematosus (SLE), rheumatoid arthritis (RA), and osteoarthritis (OA) have large personal, financial, and societal impacts and are the leading cause of work disability and chronic pain. High rates of potentially avoidable complications occur in individuals with rheumatic diseases, and there are many disparities among these individuals. We aim to determine the impact of social determinants of health (SDoH) on acute care use and avoidable hospitalizations among individuals with SLE, RA, and OA. These factors may contribute significantly to modifiable disparities, but they are not systematically screened for or addressed by care providers.

Project Purpose(s)

  • Disease Focused Research (rheumatic disease)
  • Population Health
  • Social / Behavioral

Scientific Approaches

We will use datasets of individuals with SLE, RA, and OA identified by diagnosis codes. Acute care use will be defined as ED visits and hospitalizations. We will include covariates such as demographic factors and social determinants of health information from surveys. We will use descriptive statistics to examine our variables and Poisson regression models to estimate incidence rate ratios for social risk factors and acute care use. We will adjust for covariates and run separate models for each rheumatic disease category. We will also use area-level social risk data and multilevel Poisson models to adjust for both individual-level and area-level data.
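To make the incidence rate ratio concrete: for a Poisson model with a single binary risk factor and a log person-time offset, the fitted rate ratio equals the crude ratio of the two groups' event rates. The sketch below computes that unadjusted ratio with a Wald confidence interval on the log scale. It is an illustration only, not the project's adjusted or multilevel models, and the counts and function name are hypothetical.

```python
import math


def incidence_rate_ratio(events_exposed: int, time_exposed: float,
                         events_unexposed: int, time_unexposed: float,
                         z: float = 1.96):
    """Crude incidence rate ratio with a Wald interval on the log scale.

    Rates are events per unit of person-time. The standard error of
    log(IRR) is sqrt(1/a + 1/b), where a and b are the event counts.
    """
    rate_exposed = events_exposed / time_exposed
    rate_unexposed = events_unexposed / time_unexposed
    irr = rate_exposed / rate_unexposed
    se_log = math.sqrt(1 / events_exposed + 1 / events_unexposed)
    lower = math.exp(math.log(irr) - z * se_log)
    upper = math.exp(math.log(irr) + z * se_log)
    return irr, lower, upper


# Hypothetical counts: ED visits and person-years, with and without
# a given social risk factor.
irr, lower, upper = incidence_rate_ratio(60, 200.0, 40, 400.0)
print(f"IRR = {irr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that excludes 1 suggests the two groups' acute care rates differ, before any covariate adjustment.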

Anticipated Findings

We anticipate finding differences in avoidable acute care use by social risk factor, with individuals who have social risk factors having more acute care use. These findings will help shed light on the burden of social risk factors in this national cohort of individuals with rheumatic diseases and lay the groundwork for interventions that help specific groups of people achieve better outcomes.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

Duplicate of Best Practices for AoU Data Science

For educational purposes: to show best practices when using Jupyter notebooks for data access, storage, data queries, and other research-support issues that are useful to multiple AoU researchers.

Scientific Questions Being Studied

For educational purposes: to show best practices when using Jupyter notebooks for data access, storage, data queries, and other research-support issues that are useful to multiple AoU researchers.

Project Purpose(s)

  • Educational

Scientific Approaches

For educational purposes: to show best practices when using Jupyter notebooks for data access, storage, data queries, and other research-support issues that are useful to multiple AoU researchers.

Anticipated Findings

For educational purposes: to show best practices when using Jupyter notebooks for data access, storage, data queries, and other research-support issues that are useful to multiple AoU researchers.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Yury Garcia - Research Fellow, University of California, Davis

Duplicate of Understanding the Genetic Basis of Borderline Personality Disorder

Borderline personality disorder (BOR) is the most common of the 10 personality disorders described in the DSM-5, with an incidence of between 1-6% across most populations. Individuals with BOR display extreme emotional instability and although the symptoms can vary between…

Scientific Questions Being Studied

Borderline personality disorder (BOR) is the most common of the 10 personality disorders described in the DSM-5, with an incidence of 1-6% across most populations. Individuals with BOR display extreme emotional instability, and although the symptoms can vary between individuals, they typically include unstable and intense relationships, impulsive behaviors, self-harm, frequent mood swings, and paranoia or dissociation. These behaviors and responses are stable across time and situational context and do not arise as a result of substance abuse or in response to a pharmacological agent. At any one time, patients with BOR can account for up to 20% of outpatients and 25% of inpatients at psychiatric facilities. BOR has an estimated heritability of 46%, so we plan to use All of Us data to identify genes and rare variants of high effect size that are associated with, and potentially involved in, the manifestation of borderline personality disorder traits.

Project Purpose(s)

  • Disease Focused Research (borderline personality disorder)
  • Ancestry

Scientific Approaches

We will use standard genome-wide association approaches to investigate the association between common genetic variants and borderline personality disorder (BDPD) whilst controlling for population effects that can lead to p-value inflation. Additionally, we will investigate the contribution of rare variants to BDPD in a gene-wise approach by aggregating rare variants within genes using previously validated tools. Positive hits from both the GWAS and rare variant analyses will be followed up using pathway analysis to identify the biological processes that underpin BDPD.
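One common way to check for the p-value inflation mentioned above is the genomic inflation factor (lambda), the ratio of the median observed 1-df chi-square statistic to its expected median under the null. The project does not say which diagnostic it uses, so the following is a minimal sketch under that assumption, with synthetic p-values and a hypothetical function name.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549),
# obtained from the standard normal quantile at 0.75.
MEDIAN_CHI2_1DF = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation_factor(p_values):
    """Lambda = median(observed chi-square) / expected null median.

    Each two-sided p-value is converted to a 1-df chi-square statistic
    via the squared standard normal quantile at p / 2.
    """
    chi2 = []
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"p-value out of range (0, 1]: {p}")
        chi2.append(_STD_NORMAL.inv_cdf(p / 2) ** 2)
    return median(chi2) / MEDIAN_CHI2_1DF


# Synthetic example: evenly spaced p-values mimic a well-calibrated null,
# and squaring them mimics systematically inflated test statistics.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
inflated_p = [p ** 2 for p in null_p]

lambda_null = genomic_inflation_factor(null_p)
lambda_inflated = genomic_inflation_factor(inflated_p)
print(f"null: {lambda_null:.3f}, inflated: {lambda_inflated:.3f}")
```

A lambda close to 1 is consistent with well-calibrated statistics, while values well above 1 may indicate uncorrected population structure or relatedness.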

Anticipated Findings

We anticipate identifying novel loci that contribute to the development of BDPD. GWAS will identify loci/genes through linkage-disequilibrium between common variants and functional/causative variants with lower effect size. Rare variant analysis will likely identify functional variants with high effect size. Pathway analysis using identified loci will provide additional information related to potential mechanisms through which the identified loci promote the development of BDPD.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Qiaoping Yuan - Project Personnel, National Institute on Alcohol Abuse and Alcoholism (NIH - NIAAA)

Study 1: Determinants of CGM Use

I intend to investigate associations between social or structural determinants and the utilization of diabetes technology among people with type 2 diabetes and to explore variations in these determinants based on race/ethnicity, sex/gender, and class. My research questions are: 1.…

Scientific Questions Being Studied

I intend to investigate associations between social or structural determinants and the utilization of diabetes technology among people with type 2 diabetes and to explore variations in these determinants based on race/ethnicity, sex/gender, and class.

My research questions are:
1. Which determinants are associated with diabetes technology use? Do determinants differ by race/ethnicity, sex/gender, and class?
2. Are clinical factors and structural determinants significantly stronger predictors of use among people of color, women, and those of lower socio-economic status?
3. Are intersections of race/ethnicity, sex/gender, and class significant predictors of diabetes technology use?

Project Purpose(s)

  • Population Health
  • Social / Behavioral

Scientific Approaches

I will use survey data for my independent variables and EHR data to assess diabetes technology use.

Inclusion Criteria: All adult AoU participants aged 18+ with a diagnosis of type 2 diabetes who have completed the baseline surveys “The Basics,” “Overall Health,” “Healthcare Access and Utilization,” and “Social Determinants of Health” as of the start date of this study will be included. Diagnoses will be determined using ICD-10 code E11.

Exclusion Criteria: Those with a recorded diagnosis of type 1 diabetes or gestational diabetes (ICD-10 codes E10, O24.4, and E13) will be excluded.

Primary outcome: CGM initiation between 2021 and 2023, operationalized using current procedural terminology (CPT) codes indicative of CGM setup/training or CGM data interpretation.
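The inclusion and exclusion criteria above can be expressed as a simple filter. This is a minimal sketch over hypothetical in-memory records, not the Workbench data model or the project's actual query. The `Participant` fields are assumptions, the ICD-10 codes are written in their standard form (E11, E10, O24.4, E13), and prefix matching is assumed so that subcodes such as E11.9 count.

```python
from dataclasses import dataclass, field

REQUIRED_SURVEYS = {
    "The Basics",
    "Overall Health",
    "Healthcare Access and Utilization",
    "Social Determinants of Health",
}
INCLUDE_PREFIXES = ("E11",)                 # type 2 diabetes
EXCLUDE_PREFIXES = ("E10", "O24.4", "E13")  # exclusion codes listed above


@dataclass
class Participant:
    # Hypothetical record layout, for illustration only.
    person_id: int
    age: int
    icd10_codes: set = field(default_factory=set)
    surveys_completed: set = field(default_factory=set)


def has_code(codes, prefixes) -> bool:
    """True if any recorded code starts with one of the given prefixes."""
    return any(code.startswith(prefixes) for code in codes)


def is_eligible(p: Participant) -> bool:
    """Apply the age, diagnosis, exclusion, and survey-completion criteria."""
    return (
        p.age >= 18
        and has_code(p.icd10_codes, INCLUDE_PREFIXES)
        and not has_code(p.icd10_codes, EXCLUDE_PREFIXES)
        and REQUIRED_SURVEYS <= p.surveys_completed
    )


participants = [
    Participant(1, 54, {"E11.9"}, set(REQUIRED_SURVEYS)),
    Participant(2, 47, {"E11.9", "E10.9"}, set(REQUIRED_SURVEYS)),  # excluded: E10
    Participant(3, 61, {"E11.65"}, {"The Basics"}),                 # surveys missing
    Participant(4, 17, {"E11.9"}, set(REQUIRED_SURVEYS)),           # under 18
]

eligible_ids = [p.person_id for p in participants if is_eligible(p)]
print(eligible_ids)
```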

Anticipated Findings

The findings of this study will help uncover social and structural barriers to diabetes technology use beyond access and affordability.

Demographic Categories of Interest

  • Race / Ethnicity
  • Gender Identity
  • Disability Status
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Robert Cavanaugh - Project Personnel, Northeastern University
  • Louisa Smith - Other, Northeastern University

Duplicate of How to Get Started with Controlled Tier Data (v7)

1. Socio-Economic Metrics: How to retrieve participants' socio-economic data from the CDR. 2. Observation Date: How to query and plot an observation date using survey completion date as example. 3. Demographics: Examples of how to query and plot participant demographic…

Scientific Questions Being Studied

1. Socio-Economic Metrics: How to retrieve participants' socio-economic data from the CDR.
2. Observation Date: How to query and plot an observation date using survey completion date as example.
3. Demographics: Examples of how to query and plot participant demographic data.
4. Death Cause: How to retrieve and plot deceased participants' death causes.

Project Purpose(s)

  • Educational
  • Methods Development
  • Other Purpose (This is an All of Us Featured Workspace: - teaches the users how to set up this notebook, install and import software packages, and select the correct version of the CDR. - gives an overview of the data types available in the current Controlled Tier Curated Data Repository (CDR) that are not available in the Registered Tier - shows how to retrieve and summarize this data.)

Scientific Approaches

We recommend that all researchers explore the notebooks in this workspace to learn the basics of All of Us Program Data. The tutorial Workspace contains two Jupyter Notebooks (one written in Python, the other in R). It contains helper functions for repeatability, code readability, and efficiency.

Anticipated Findings

By reading and running the notebooks in this Tutorial Workspace, you will understand the following: All of Us data are made available in two Curated Data Repositories, the Registered Tier and the Controlled Tier. The latter is subject to more relaxed privacy rules than the Registered Tier. As a result, you can expect to find more concept ids in certain data types such as EHR and Survey.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Duplicate of How to Work with All of Us Genomic Data (Hail - Plink)(v7)

Not applicable - these notebooks demonstrate example analyses showing how to use Hail and PLINK to perform genome-wide association studies using the All of Us genomic data and phenotypic data.

Scientific Questions Being Studied

Not applicable - these notebooks demonstrate example analyses showing how to use Hail and PLINK to perform genome-wide association studies using the All of Us genomic data and phenotypic data.

Project Purpose(s)

  • Other Purpose (Demonstrate to the All of Us Researcher Workbench users how to get started with the All of Us genomic data and tools. It includes an overview of all the All of Us genomic data and shows some simple examples on how to use these data.)

Scientific Approaches

Not applicable - these notebooks demonstrate example analyses showing how to use Hail and PLINK to perform genome-wide association studies using the All of Us genomic data and phenotypic data.

Anticipated Findings

Not applicable - these notebooks demonstrate example analyses showing how to use Hail and PLINK to perform genome-wide association studies using the All of Us genomic data and phenotypic data.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Cecile Avery - Research Fellow, Vanderbilt University Medical Center

Duplicate of How to Work with Long Read Data (v7)

This workspace and its notebooks neither ask nor answer any scientific questions. The purpose of this workspace is to serve as a tutorial which shows how to localize the All of Us (AoU) long read BAM files individually in addition…

Scientific Questions Being Studied

This workspace and its notebooks neither ask nor answer any scientific questions. The purpose of this workspace is to serve as a tutorial that shows how to localize the All of Us (AoU) long read BAM files individually and how to render the Integrative Genomics Viewer (IGV) on the AoU workbench to explore the BAM files.

Project Purpose(s)

  • Educational
  • Methods Development

Scientific Approaches

This workspace conducts no study and applies no scientific approaches. This workspace and its notebooks are tutorials for localizing AoU BAM files with Python commands and using IGV to explore their contents. The methods and tools employed include Python code and system commands for localizing individual BAM files, and the commands for importing and rendering IGV to view the localized BAM files.

Anticipated Findings

There will be no findings or contribution to scientific knowledge as there is no study being conducted nor questions asked. Informal 'findings' include the usability of the aforementioned tools and AoU BAM files on the All of Us workbench.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Data Set Used

Controlled Tier

Research Team

Owner:

  • Feseha Abebe-Akele - Early Career Tenure-track Researcher, Elizabeth City State University
Request a Review of this Research Project

You can request that the All of Us Resource Access Board (RAB) review a research purpose description if you have concerns that this research project may stigmatize All of Us participants or violate the Data User Code of Conduct in some other way. To request a review, you must fill in a form, which you can access by selecting ‘request a review’ below.