Research Projects Directory

1,934 active projects

This information was updated 7/1/2022

The Research Projects Directory includes information about all projects that currently exist in the Researcher Workbench to help provide transparency about how the Workbench is being used. Each project specifies whether Registered Tier or Controlled Tier data are used.

Note: Researcher Workbench users provide information about their research projects independently. Views expressed in the Research Projects Directory belong to the relevant users and do not necessarily represent those of the All of Us Research Program. Information in the Research Projects Directory is also cross-posted on AllofUs.nih.gov in compliance with the 21st Century Cures Act.

Duplicate of How to Run Python Notebooks in the Background

Scientific Questions Being Studied

Some analyses take a long time to run. Currently, researchers have to wait for their job to finish, because if they are logged out of the system their code stops and is not executed. This is problematic for users working with data types such as Fitbit and Genomics.

To avoid this interruption, this notebook runs code in the background.

Project Purpose(s)

  • Educational
  • Other Purpose (The notebook in this workspace shows how to run notebooks in the background even if the user is logged out of the workbench.)

Scientific Approaches

To run notebooks in the background, we use a Python library called nbconvert. Users specify the name of the notebook they want executed; after that, they only need to run every cell in this notebook.
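
The approach described above might look roughly like the following sketch. It assumes the jupyter nbconvert command-line tool is available in the environment; the helper names, the notebook file names, and the log file name are hypothetical and are not taken from the workspace itself.

```python
import subprocess


def build_nbconvert_command(notebook, output):
    """Build the command that executes a notebook and saves the executed copy."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",                         # run every cell from top to bottom
        "--ExecutePreprocessor.timeout=-1",  # disable the per-cell timeout
        "--output", output,
        notebook,
    ]


def run_in_background(notebook, output, log_path="nbconvert.log"):
    """Start the run as a detached process and return immediately."""
    with open(log_path, "w") as log:
        return subprocess.Popen(
            build_nbconvert_command(notebook, output),
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # detach from the calling session
        )
```

A call such as run_in_background("my_analysis.ipynb", "my_analysis_executed.ipynb") would return at once while the notebook keeps executing. Whether the process survives a logout depends on the compute environment staying up, which this sketch does not control.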

Anticipated Findings

There are no anticipated findings, as this workspace is for educational purposes only.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Rose Hill - Research Fellow, Scripps Research

Vaccination Demographics

Scientific Questions Being Studied

The specific question being studied is whether a feature selection method can identify factors associated with differences in opinion or behavior regarding vaccination, using variables such as participants' vaccination status and demographics. The results will in turn be used to design a clinical-trial-style experiment to determine what kinds of interventions would change attitudes about HPV vaccination and to provide evidence of which factors affect overall vaccine attitudes.

Project Purpose(s)

  • Population Health
  • Social / Behavioral
  • Methods Development

Scientific Approaches

The main datasets examined in this study will be those containing participants' vaccination-related variables together with their demographic information.

Anticipated Findings

The anticipated finding is a feature selection method based on participants' vaccination status and demographics, which will later support a clinical-trial-style study to determine what kinds of interventions would change attitudes about HPV vaccination and to provide evidence of which factors affect overall vaccine attitudes.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Luis Gomez - Undergraduate Student, Brown University

Duplicate of Underdiagnosis of Dementias

Scientific Questions Being Studied

The national prevalence of Alzheimer’s disease is estimated to be 11% in individuals over 65 years old and 1.5-2.0 times higher in underrepresented racial and ethnic groups including Black and Hispanic/Latinx. In real world medical care based on prior studies and our work, less than half of patients with dementia have been formally diagnosed and significantly more so in underrepresented racial and ethnic groups. Our scientific questions are:
1. Are dementias underdiagnosed in the electronic health record (EHR) compared to population estimates?
2. What are the risk factors for patients to not be formally diagnosed with dementias?
3. How do the underdiagnosis and risk factors differ across racial and ethnic groups?

We hypothesize that dementias are underdiagnosed in the EHR, more so in underrepresented racial and ethnic groups, and that risk factors include demographics, comorbidities, medications, and healthcare access.

Project Purpose(s)

  • Disease Focused Research (dementia)
  • Population Health

Scientific Approaches

We will define dementia as probable based on ICD codes and as possible based on a combination of ICD codes, medication use, and personal medical history. We will calculate the prevalence of dementias across racial and ethnic groups and compare it to population estimates. We will compare risk factors between patients who have probable dementia and those who have possible dementia using logistic regression, correcting for demographics. Risk factors tested will include drug exposures, lab measurements, demographics, health access, and other survey questions. We will compare risk factors across racial and ethnic groups.
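
As a rough illustration of the prevalence comparison described above (not the team's actual code), the sketch below computes the recorded prevalence per group and divides it by a population estimate. The group labels, the toy records, and the expected prevalences are all made-up placeholders.

```python
from collections import Counter


def prevalence_by_group(records):
    """Return {group: fraction of participants with a recorded dementia code}."""
    totals = Counter()
    cases = Counter()
    for group, has_dementia in records:
        totals[group] += 1
        if has_dementia:
            cases[group] += 1
    return {g: cases[g] / totals[g] for g in totals}


def observed_to_expected(observed, expected):
    """A ratio below 1 suggests fewer recorded diagnoses than the population estimate."""
    return {g: observed[g] / expected[g] for g in observed if g in expected}


# Hypothetical toy data: (group label, has a dementia ICD code)
records = [("A", True), ("A", False), ("A", False), ("A", False),
           ("B", True), ("B", True), ("B", False), ("B", False)]
observed = prevalence_by_group(records)
# Placeholder population estimates, not real figures
ratios = observed_to_expected(observed, {"A": 0.5, "B": 0.5})
```

In this toy example group A has a ratio of 0.5, which would be read as underdiagnosis relative to the placeholder estimate, while group B has a ratio of 1.0.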

Anticipated Findings

We expect to find that dementias are underdiagnosed in the EHR, more so in underrepresented racial and ethnic groups, and that risk factors for underdiagnosis include demographics, comorbidities, medications, and healthcare access, which will differ across racial and ethnic groups. These findings may lead to racially and ethnically specific strategies to improve the early and appropriate diagnosis of dementias, which can initiate multi-disciplinary care and treatment.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Registered Tier

Research Team

Owner:

  • Samantha Shah - Project Personnel, University of California, Los Angeles

V6 Duplicate of Cannabis-glaucoma study

Scientific Questions Being Studied

Background:
1. With recent changes in legal regulation, the prevalence of cannabis use is increasing in the US
2. Cannabis has previously shown potential IOP-reducing and neuroprotective effects in some prior studies
3. Although only anecdotal evidence is available, glaucoma patients may be increasingly using cannabis as a therapeutic option

Relevance:
There is a large knowledge gap regarding the public health impact and the risks and benefits of cannabis use in glaucoma patients. To fill this gap, it is important to understand the epidemiology of cannabis use among patients with glaucoma, as well as the potential benefits and risks of such use.

Aims:
1. To estimate the prevalence and frequency of cannabis use in patients with glaucoma
2. To characterize the social and demographic factors associated with cannabis use in these patients
3. To evaluate the differences in the prevalence of cannabis use disorder in glaucoma patients compared to healthy individuals

Project Purpose(s)

  • Disease Focused Research (glaucoma)

Scientific Approaches

We will define the cohort of glaucoma patients based on diagnosis codes. Data from the All of Us Lifestyle Survey will be used to characterize cannabis exposure.
1. We will ascertain the overall prevalence of cannabis use and then perform descriptive analyses to understand the variations of marijuana use by age, race, ethnicity, gender, and geographic region.
2. Data will be extracted for additional variables such as income, education, insurance, access to eye doctors, and access to general medical care based on responses to the All of Us Basics Survey and the Healthcare Access and Utilization Survey. We will then perform analyses to understand which variables are significantly associated with cannabis use.
3. We will query the EHR data for diagnosis codes related to marijuana use disorders and manifestations of adverse effects. We will compare the prevalence of marijuana use disorder between patients with glaucoma and healthy controls.
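
One common way to compare the prevalence of a diagnosis between two groups, as in step 3, is an odds ratio with a log-based (Woolf) confidence interval. The sketch below is a generic illustration under that assumption; the source does not state which statistic the team will use, and the function name and argument layout are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI for a 2x2 table.

    a: glaucoma patients with the disorder, b: glaucoma patients without,
    c: controls with the disorder,          d: controls without.
    All four counts must be greater than zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper
```

An interval that excludes 1 would suggest a difference in prevalence between the two groups; confounders such as age would still need to be handled separately.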

Anticipated Findings

1. We anticipate a higher prevalence and frequency of cannabis use in glaucoma patients than in healthy individuals.
2. We also anticipate that several socioeconomic factors will be associated with increased cannabis use in this cohort, such as older age, certain races, certain geographic regions with greater accessibility to cannabis products, and lower access to medical care and eye care.
3. The prevalence of marijuana use disorders and manifestations of adverse effects should be slightly higher than or similar to that in the healthy cohort.

This study will help fill the knowledge gap regarding the public health impact and the risks and benefits of cannabis use in glaucoma patients.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Jo-Hsuan Wu - Research Fellow, University of California, San Diego

Collaborators:

  • Bharanidharan Radha Saseendrakumar - Project Personnel, University of California, San Diego

Healthcare Access, Usage, & CVD Risk among Hispanic Adults

Scientific Questions Being Studied

Among Hispanic/Latino adults
• what is the relationship between health care usage, quality of healthcare, barriers to health care and lifestyle behaviors?
• what is the relationship between health care usage, quality of healthcare, barriers to health care and self-reported CVD risk?
• what is the relationship between health care usage, quality of healthcare, barriers to health care and observed CVD risk?

Project Purpose(s)

  • Population Health
  • Other Purpose (The data will be used to conduct an exploratory study on the relationship between health care (usage, quality of experience, barriers) and cardiovascular health. The findings will be published in a peer-reviewed scientific journal and presented at professional conferences.)

Scientific Approaches

A retrospective study design will be used to explore the Tier 1 All of Us data set to describe the relationship between health care experience (usage, quality of experience, barriers) and indicators of cardiovascular health among Hispanic/Latino adult participants.

Anticipated Findings

This study will provide evidence on how aspects of access to quality health care are related to cardiovascular health outcomes in the Hispanic/Latino population, a group that is currently underrepresented in the health behavior literature and that, compared to non-Hispanic White adults, disproportionately develops chronic conditions (type II diabetes, hypertension, obesity) that directly impact cardiovascular health.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Geography
  • Access to Care

Data Set Used

Registered Tier

Research Team

Owner:

  • Sheri Rowland - Senior Researcher, University of Nebraska Medical Center

Collaborators:

  • Nicole Hollander - Project Personnel, University of Nebraska Medical Center
  • Kevin Kupzyk - Project Personnel, University of Nebraska Medical Center

Metastatic Recurrence of Breast Cancer

Scientific Questions Being Studied

I intend to rerun the recurrence model developed by Banerjee et al. in the paper (Natural Language Processing Approaches to Detect the Timeline of Metastatic Recurrence of Breast Cancer) to test its generalizability and to assess the model's bias and fairness. The authors intended for their model to be reproduced, tested, and applied in research to ultimately improve cancer outcomes.

Project Purpose(s)

  • Disease Focused Research (Metastatic Breast Cancer)
  • Population Health

Scientific Approaches

I plan to run the model using the All of Us data. This process starts with identifying a cohort of individuals who have been diagnosed with breast cancer within a specified period, using concepts from their medical records. I will then build a dataset from the necessary concepts and finally analyze the data using the methods outlined in the original paper.

Anticipated Findings

I anticipate that the model will be reasonably generalizable to the All of Us cohort. I expect that since the All of Us cohort is weighted for underrepresented populations, there may be slightly lower or higher predictive metrics than found in the original paper.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Registered Tier

Research Team

Owner:

Collaborators:

  • Tina Hernandez-Boussard - Mid-career Tenured Researcher, Stanford University

T6 Update of COPC_New

Scientific Questions Being Studied

Chronic pain conditions often overlap with one another, forming chronic overlapping pain conditions (COPC). The project will use All of Us data to identify COPC development trajectories and, once the genetic data are available, the underlying genetic mechanisms.

Project Purpose(s)

  • Disease Focused Research (Chronic overlapping pain conditions (COPC))
  • Educational

Scientific Approaches

We will use logistic regression to study pairwise overlap between conditions, taking into account the time series of disease occurrence. We will also use similar models with lasso regularization to identify the most relevant pairs and their trajectories.
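
Before any regression is fitted, the temporal ordering of condition pairs has to be tabulated. The sketch below shows one possible way to do that preprocessing step; it is not the regression or lasso model itself, and the data layout, the function name, and the condition names are illustrative assumptions.

```python
from collections import Counter
from itertools import permutations


def ordered_pair_counts(first_onset):
    """Count participants in whom condition X was first recorded before condition Y.

    first_onset maps participant id -> {condition: day of first record}.
    """
    counts = Counter()
    for onsets in first_onset.values():
        for x, y in permutations(onsets, 2):
            if onsets[x] < onsets[y]:
                counts[(x, y)] += 1
    return counts


# Hypothetical toy data: day offsets of the first record of each condition
toy = {
    "p1": {"migraine": 10, "ibs": 50},
    "p2": {"ibs": 5, "migraine": 30},
    "p3": {"migraine": 1, "ibs": 2, "fibromyalgia": 3},
}
pair_counts = ordered_pair_counts(toy)
```

Counts like these could then serve as inputs or sanity checks for pairwise models; pairs first recorded on the same day are deliberately left uncounted here.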

Anticipated Findings

We expect to identify true COPC development pairs and clusters, providing insights into how COPC develop and into underlying conditions such as mental status.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth

Data Set Used

Registered Tier

Research Team

Owner:

  • Jungwei Fan - Early Career Tenure-track Researcher, Mayo Clinic
  • Haiquan Li - Early Career Tenure-track Researcher, University of Arizona
  • Edwin Baldwin - Graduate Trainee, University of Arizona

Collaborators:

  • Wenting Luo - Graduate Trainee, University of Arizona
  • Reid Loeffler - Undergraduate Student, University of Arizona

Cannabis-glaucoma study

Scientific Questions Being Studied

Background:
1. With recent changes in legal regulation, the prevalence of cannabis use is increasing in the US
2. Cannabis has previously shown potential IOP-reducing and neuroprotective effects in some prior studies
3. Although only anecdotal evidence is available, glaucoma patients may be increasingly using cannabis as a therapeutic option

Relevance:
There is a large knowledge gap regarding the public health impact and the risks and benefits of cannabis use in glaucoma patients. To fill this gap, it is important to understand the epidemiology of cannabis use among patients with glaucoma, as well as the potential benefits and risks of such use.

Aims:
1. To estimate the prevalence and frequency of cannabis use in patients with glaucoma
2. To characterize the social and demographic factors associated with cannabis use in these patients
3. To evaluate the differences in the prevalence of cannabis use disorder in glaucoma patients compared to healthy individuals

Project Purpose(s)

  • Disease Focused Research (glaucoma)

Scientific Approaches

We will define the cohort of glaucoma patients based on diagnosis codes. Data from the All of Us Lifestyle Survey will be used to characterize cannabis exposure.
1. We will ascertain the overall prevalence of cannabis use and then perform descriptive analyses to understand the variations of marijuana use by age, race, ethnicity, gender, and geographic region.
2. Data will be extracted for additional variables such as income, education, insurance, access to eye doctors, and access to general medical care based on responses to the All of Us Basics Survey and the Healthcare Access and Utilization Survey. We will then perform analyses to understand which variables are significantly associated with cannabis use.
3. We will query the EHR data for diagnosis codes related to marijuana use disorders and manifestations of adverse effects. We will compare the prevalence of marijuana use disorder between patients with glaucoma and healthy controls.

Anticipated Findings

1. We anticipate a higher prevalence and frequency of cannabis use in glaucoma patients than in healthy individuals.
2. We also anticipate that several socioeconomic factors will be associated with increased cannabis use in this cohort, such as older age, certain races, certain geographic regions with greater accessibility to cannabis products, and lower access to medical care and eye care.
3. The prevalence of marijuana use disorders and manifestations of adverse effects should be slightly higher than or similar to that in the healthy cohort.

This study will help fill the knowledge gap regarding the public health impact and the risks and benefits of cannabis use in glaucoma patients.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Jo-Hsuan Wu - Research Fellow, University of California, San Diego

Collaborators:

  • Bharanidharan Radha Saseendrakumar - Project Personnel, University of California, San Diego

Demo - Assessment of pathogenic variants across the All of Us Research Program

Scientific Questions Being Studied

We will assess the relative frequency of positive findings across AoU samples and compare the aggregate findings with those from other cohorts, e.g., gnomAD. The frequencies of positive findings will be further broken down by the ancestry background of the participants to understand the impact that diverse backgrounds have on the rate of pathogenic findings.

Project Purpose(s)

  • Other Purpose (Demonstrate the potential utility of Researcher Workbench data by describing the frequency of known pathogenic & pharmacogenomic variants in the current genomic dataset.)

Scientific Approaches

We will annotate genomic variants from AoU participants with variant curations that have been recorded by the HGSC-CL clinical annotation team in its ‘VIP’ database or in ClinVar. We will assess the frequency of these previously-known pathogenic mutations, and provide breakdowns by ancestry.

Anticipated Findings

These data will likely identify groups that are underrepresented and overrepresented in the current knowledge of pathogenic variants, and may provide important directions for prioritizing future research. Additionally, they may point to systematic differences between the AoU resource and other resources, such as gnomAD.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

  • Eric Venner - Early Career Tenure-track Researcher, Baylor College of Medicine

Collaborators:

  • Yi-Ju Chen - Project Personnel, Baylor College of Medicine
  • Philip Empey - Mid-career Tenured Researcher, University of Pittsburgh
  • Neha Mittal - Project Personnel, Baylor College of Medicine
  • Karynne Patterson - Project Personnel, University of Washington
  • Joshua Smith - Late Career Tenured Researcher, University of Washington
  • Divya Kalra - Project Personnel, Baylor College of Medicine
  • Andrew Haddad - Graduate Trainee, University of Pittsburgh

Duplicate of How to Work with All of Us Genomic Data (Hail - Plink)(v6)

Scientific Questions Being Studied

Not applicable - these notebooks demonstrate, through example analyses, how to use Hail and PLINK to perform genome-wide association studies using the All of Us genomic data and phenotypic data.

Project Purpose(s)

  • Other Purpose (Demonstrate to All of Us Researcher Workbench users how to get started with the All of Us genomic data and tools. It includes an overview of all the All of Us genomic data and shows some simple examples of how to use these data.)

Scientific Approaches

Not applicable - these notebooks demonstrate, through example analyses, how to use Hail and PLINK to perform genome-wide association studies using the All of Us genomic data and phenotypic data.

Anticipated Findings

Not applicable - these notebooks demonstrate, through example analyses, how to use Hail and PLINK to perform genome-wide association studies using the All of Us genomic data and phenotypic data.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Duplicate of How to Work with Genomics Data (CRAM_Processing and IGV)

Scientific Questions Being Studied

This workspace and its notebooks neither ask nor answer any scientific questions. The purpose of this workspace is to serve as a tutorial which shows how to localize the All of Us (AoU) CRAM files individually or in groups via the CRAM manifest in addition to showing how to render the Integrated Genome Viewer (IGV) on the AoU workbench to explore the CRAM files.

Project Purpose(s)

  • Methods Development

Scientific Approaches

This workspace conducts no study and applies no scientific approaches. This workspace and its notebooks are tutorials for localizing AoU CRAM files with R commands and using IGV to explore their contents. The methods and tools employed include R system commands for localizing individual CRAM files, an R for loop for localizing multiple CRAM files by referencing the manifest, and the commands for importing and rendering IGV to view the localized CRAM files.

Anticipated Findings

There will be no findings or contribution to scientific knowledge as there is no study being conducted nor questions asked. Informal 'findings' include the usability of the aforementioned tools and AoU CRAM files on the All of Us workbench.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Duplicate of How to Work with All of Us Genomic Data (Hail - Plink)(v6)

Scientific Questions Being Studied

Not applicable - these notebooks demonstrate, through example analyses, how to use Hail and PLINK to perform genome-wide association studies using the All of Us genomic data and phenotypic data.

Project Purpose(s)

  • Other Purpose (Demonstrate to All of Us Researcher Workbench users how to get started with the All of Us genomic data and tools. It includes an overview of all the All of Us genomic data and shows some simple examples of how to use these data.)

Scientific Approaches

Not applicable - these notebooks demonstrate, through example analyses, how to use Hail and PLINK to perform genome-wide association studies using the All of Us genomic data and phenotypic data.

Anticipated Findings

Not applicable - these notebooks demonstrate, through example analyses, how to use Hail and PLINK to perform genome-wide association studies using the All of Us genomic data and phenotypic data.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

IDD disparities in cancer

Scientific Questions Being Studied

Healthcare disparities in those living with intellectual or developmental disabilities (IDD) are well documented in the literature. Factors contributing to these observed inequities include but are not limited to difficulties accessing care, lack of providers prepared to meet their needs, and the intersectionality that exists between an IDD diagnosis and poverty, race, and gender. However, the impact of a cancer diagnosis on the receipt of or access to cancer care is less understood. In the US, there are no studies looking at cancer mortality in the IDD population nor studies looking at cancer-related healthcare use.

We propose the following aims:
1. Assess the incidence of new cancer diagnoses for individuals living with IDD.
2. Describe behaviors and screening patterns for cancers with an available screening test.
3. Document the use of cancer-related care for individuals with a concomitant IDD and cancer diagnosis.

Project Purpose(s)

  • Population Health

Scientific Approaches

We propose to use the All of Us database to abstract data. Using ICD-10 codes, we will build our cohort to include individuals with an IDD and cancer diagnosis. The IDD cohort with cancer will be age-matched with a non-IDD cohort. Looking at insurance claims filed during the study timeframe, we will assess healthcare utilization among both cohorts and compare adherence to guidelines for cancer screening. Clinical and sociodemographic variables, with a special focus on variables accounting for social determinants of health, will also be abstracted to look at their impact on the outcomes of interest listed above.

Anticipated Findings

We anticipate finding that a disparity in cancer care utilization does exist within the population with intellectual and developmental disabilities when compared to the rest of the population. We hope to provide a meaningful contribution to the paucity of data in the US by outlining healthcare utilization for individuals living with IDD and cancer. As previously reported, disparities in the care that this community receives are well documented. However, the exact phase of care at highest risk for disparities in the continuum of cancer care is not yet identified. By examining the differences in utilization of cancer-related care, we can provide a more detailed understanding of the areas of highest need in the IDD community (i.e., prevention vs. screening vs. treatment vs. survivorship). We also believe that this will provide a foundation for future research endeavors looking to create targeted interventions to address the disparities identified.

Demographic Categories of Interest

  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Registered Tier

Research Team

Owner:

  • Steven Hiek - Project Personnel, University of California, Irvine
  • Eduardo Garcia - Research Fellow, University of California, Irvine
  • Argyrios Ziogas - Late Career Tenured Researcher, University of California, Irvine
  • Amanda Leung - Graduate Trainee, University of California, Irvine

Duplicate of Preparing to Investigate the Genetics of NDDs in the AoU Cohort

Scientific Questions Being Studied

Querying the AoU WGS data to quantify the statistical power of using the AoU cohort dataset to independently investigate the genetic architecture of neurodegenerative diseases such as Alzheimer’s Disease (AD), Parkinson’s Disease (PD), Frontotemporal Dementia (FTD), and other age-associated neurodegenerative diseases.

Project Purpose(s)

  • Disease Focused Research (neurodegenerative disease, Ageing)
  • Ancestry

Scientific Approaches

We will use this workspace to become familiar with how to retrieve phenotypic and genomic data from the AoU cohort. Using Python and/or R (via Jupyter notebooks), we will perform a prospective statistical power analysis of the AoU dataset to determine the applicability of future secondary analyses that help bridge the gap between genetic association and functional impact, with methods such as GWAS, PheWAS, and Polygenic Risk Score (PRS). Additionally, our lab is interested in investigating the role of genetics in neurodegenerative diseases in underrepresented populations, and we will therefore investigate whether the diversity and size of the AoU cohort will allow us to perform population-specific analyses of the genetic contributions to neurodegenerative diseases.
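
To give a flavor of what a prospective power calculation involves, the sketch below uses a textbook normal approximation for a two-sided test comparing two proportions (for example, carrier frequency in cases versus controls). This is a generic stand-in chosen for illustration: the source does not say which power method the team uses, genetic association studies usually rely on more specialized calculations, and the function name and parameters are hypothetical.

```python
import math
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test for two proportions, equal group sizes."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    # Standard error under the null (pooled) and under the alternative
    se_null = math.sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = math.sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    return nd.cdf((abs(p1 - p2) - z_crit * se_null) / se_alt)
```

Evaluating this function over a range of group sizes shows how quickly power grows with cohort size for a given effect, which is the kind of question the paragraph above raises for population-specific subgroups.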

Anticipated Findings

We expect to find that the AoU cohort is large enough to provide the statistical power necessary to perform our desired secondary analyses.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Eve Gardner - Project Personnel, Van Andel Research Institute

Disparities in Cervical Cancer Screening Among Hispanic Women

Scientific Questions Being Studied

We propose using the All of Us Research Database to answer two primary scientific questions:
1. What was the impact of the COVID-19 pandemic on the cervical cancer screening behaviors of Hispanic women, compared to non-Hispanic White women, in the United States?
a. Is this impact different between subgroups? We will specifically examine geographic regions of the country, immigration status, socioeconomic variables such as home community area deprivation index, and the impact of COVID-19 on both mental and physical health.
2. Have Hispanic women re-emerged into preventative screenings as the COVID-19 pandemic has continued?
a. Comparing screening rates for 2019, 2020 and 2021, we plan to examine if there is a responsive surge in catch-up screenings among Hispanic women, compared to non-Hispanic White women.

Project Purpose(s)

  • Population Health

Scientific Approaches

Our study will include Hispanic and non-Hispanic White women ages 18+ who are eligible for cervical cancer screening during 2019, 2020, and 2021. Cervical cancer screening will be identified from procedure records for cervical cancer screening. Exposure to the COVID-19 pandemic will first be classified based on the onset of COVID-19 in the U.S. Specific COVID-19 impact will be defined based on COPE survey questions. Descriptive analyses will be conducted to assess the screening rate over the study period and across US regions. The Cochran-Armitage test will be performed to detect a linear trend in the screening rate. Multivariable logistic regression will be conducted to assess the association between COVID-19 impact and the likelihood of screening among Hispanic and non-Hispanic White women, adjusting for sociodemographic and socioeconomic factors. Maps showing the screening rates during 2019-2021 across US regions will be created using visualization software, if possible.
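
The trend test named above can be sketched as follows. This is a minimal illustration of the standard Cochran-Armitage statistic using a normal approximation, not the study team's code. The function name, the equally spaced year scores (0, 1, 2), and the counts are all made up for illustration and are not All of Us data.

```python
import math
from typing import Sequence, Tuple


def cochran_armitage(
    events: Sequence[int], totals: Sequence[int], scores: Sequence[float]
) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for a linear trend in proportions.

    events[i] is the number screened in group i, totals[i] the number
    eligible, and scores[i] the ordinal score assigned to group i.
    """
    n = sum(totals)
    p = sum(events) / n  # pooled screening proportion
    # Score-weighted sum of observed minus expected events.
    t_stat = sum(s * (r - m * p) for s, r, m in zip(scores, events, totals))
    s1 = sum(m * s for s, m in zip(scores, totals))
    s2 = sum(m * s * s for s, m in zip(scores, totals))
    variance = p * (1 - p) * (s2 - s1 * s1 / n)
    z = t_stat / math.sqrt(variance)
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts for 2019, 2020, and 2021 (illustration only).
screened = [50, 40, 30]
eligible = [100, 100, 100]
z, p_value = cochran_armitage(screened, eligible, scores=[0, 1, 2])
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

A negative z indicates a declining screening rate across the scored years. Some software scales the variance by N/(N-1), so values may differ slightly from this sketch.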

Anticipated Findings

Studies have shown that the COVID-19 pandemic has led to a significant decrease in cervical cancer screenings across the country. However, few studies have examined the impact of COVID-19 on the Hispanic population, which, even without the pandemic, bears a disproportionate cervical cancer disease burden. It is anticipated that cervical cancer incidence will sharply increase in the future without a responsive surge in catch-up screening. Furthermore, the direct impact of COVID-19 has likely exacerbated the existing disparities in the cervical cancer burden due to compounding socioeconomic factors that were affected by the pandemic, such as insurance status, employment, and income. Our project is unique in that it addresses a vulnerable population that was made more vulnerable during the COVID-19 pandemic. It is also distinctive in that we aim to use these results to inform our own outreach efforts to expand cervical cancer screening among Hispanic women in our community.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

  • Tong Han Chung - Project Personnel, University of Texas Health Science Center, Houston
  • Linh Nguyen - Project Personnel, University of Texas Health Science Center, Houston

Collaborators:

  • Yen-Chi Le - Other, University of Texas Health Science Center, Houston
  • Ogochukwu Ezeigwe - Graduate Trainee, University of Texas Health Science Center, Houston

Epigenetics of Rheumatoid Arthritis in Hispanic/Latino Population

In Hispanic/Latino adults who have a family history of rheumatoid arthritis, what is the relationship between a healthy lifestyle and the development of rheumatoid arthritis?

Scientific Questions Being Studied

In Hispanic/Latino adults who have a family history of rheumatoid arthritis, what is the relationship between a healthy lifestyle and the development of rheumatoid arthritis?

Project Purpose(s)

  • Educational

Scientific Approaches

The study population is Hispanic/Latino adults who have a family history of RA. I will use surveys on family medical history, personal medical history, lifestyle, and basics to identify the lifestyle factors to assess and to describe the demographics of the sample.

Anticipated Findings

We anticipate discovering health behaviors that promote or reduce the development of RA in those who are genetically predisposed. This can help with prevention counseling for those with a family history of autoimmune disorders such as RA. The project will also develop research with a minority population, which addresses a general gap in research.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Sheri Rowland - Senior Researcher, University of Nebraska Medical Center

Individual- and neighborhood-level chronic stressors and lipids

Previous research has demonstrated a connection between chronic stress and health outcomes such as cardiovascular disease. Chronic stress is also known to associate with inflammation and other markers of biologic dysfunction. We seek to investigate the relationship between chronic stress…

Scientific Questions Being Studied

Previous research has demonstrated a connection between chronic stress and health outcomes such as cardiovascular disease. Chronic stress is also known to associate with inflammation and other markers of biologic dysfunction. We seek to investigate the relationship between chronic stress and lipid levels in All of Us participants. We will also explore whether chronic stressors impact the effectiveness of cholesterol-lowering treatments. These findings have relevant public health and clinical implications. By better understanding the connection between individual- and neighborhood-level stressors and lipids, work can be done to tailor stress-reduction and community-based interventions to improve health outcomes. Additionally, addressing chronic stressors can be incorporated into clinical practice to prevent cardiovascular disease and poor outcomes.

Project Purpose(s)

  • Population Health
  • Social / Behavioral

Scientific Approaches

We plan to use “Social Determinants of Health” survey data and lipid, statin therapy, and demographic data from the Registered Tier. We will use linear regression modeling and other statistical methods to examine the relationship between chronic stressors and lipid levels, explore effect modification by race/ethnicity and gender, and investigate the impact of chronic stress on statin therapy effectiveness.
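
One simple way to look at effect modification is to fit the same regression within each stratum and compare the slopes. The sketch below does this with ordinary least squares on made-up numbers. The variable names and values are hypothetical, and the project's actual models may differ (for example, a single model with an interaction term and additional covariates).

```python
from typing import Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Slope of the least-squares line of y on x (single predictor)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical stress scores and lipid levels for two strata.
stress = [0.0, 1.0, 2.0, 3.0]
lipid_group_a = [100.0, 102.0, 104.0, 106.0]
lipid_group_b = [100.0, 105.0, 110.0, 115.0]

slope_a = ols_slope(stress, lipid_group_a)
slope_b = ols_slope(stress, lipid_group_b)
# A nonzero difference in slopes suggests the association varies by stratum.
slope_difference = slope_b - slope_a
print(slope_a, slope_b, slope_difference)
```

With one predictor and two strata, the slope difference equals the interaction coefficient from a single model that includes stress, group, and their product.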

Anticipated Findings

We anticipate that individual- and neighborhood-level chronic stressors, such as discrimination or neighborhood disorder, will associate with lipid levels in All of Us participants, with race/ethnicity and gender potentially impacting these patterns. We also anticipate that statin therapy will be less effective in lowering lipid levels in individuals experiencing a high burden of chronic stress. These findings will provide insight into the biological impact of the social determinants and how these changes might play a role in cardiovascular disease disparities.

Demographic Categories of Interest

  • Race / Ethnicity
  • Sex at Birth
  • Gender Identity
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Registered Tier

Research Team

Owner:

Duplicate of Antidepressant Response Datav6

Treatment of major depressive disorder typically begins with antidepressants. Response to antidepressants is highly variable. Around half of individuals will not respond to the first antidepressant they are prescribed, starting a long treatment odyssey to find a drug or…

Scientific Questions Being Studied

Treatment of major depressive disorder typically begins with antidepressants. Response to antidepressants is highly variable. Around half of individuals will not respond to the first antidepressant they are prescribed, starting a long treatment odyssey to find a drug or drug combination that works for them. EHRs provide detailed information on the medications individuals take; however, it is not always clear in an EHR how well a patient responds to a particular medication. Sometimes, physicians will administer a survey that aims to quantify a patient's depression symptoms (the Patient Health Questionnaire, or PHQ). The responses to the PHQ are stored in EHRs. At Vanderbilt, we developed an algorithm that aims to infer antidepressant treatment response based on drug switching. We hope to implement our algorithm in the All of Us data and then use PHQ responses to validate how well our response variables track with survey questions on depression.

Project Purpose(s)

  • Disease Focused Research (major depressive disorder)

Scientific Approaches

Our approach is to implement our drug switching algorithm for antidepressants and then use PHQ outcomes to determine how well our response outcome tracks with depression symptoms. This will require longitudinal data on antidepressants and PHQ responses.

Anticipated Findings

We hope that our algorithm will be a valid proxy for treatment response that can help increase sample sizes in antidepressant response studies. Future studies could integrate genetic information to determine if there are genetic variants contributing to treatment response. Overall, we hope the algorithm can be used by other EHR researchers and can serve as a paradigm for future treatment response algorithms for other medications.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Complex Traits GWAS and Polygenic Scores (v6)

Genome-wide association studies (GWAS) have identified tens of thousands of genotype-phenotype associations for human complex traits. Polygenic risk score (PRS) for a trait is typically calculated as a weighted sum of trait-associated allele counts across numerous loci in the genome,…

Scientific Questions Being Studied

Genome-wide association studies (GWAS) have identified tens of thousands of genotype-phenotype associations for human complex traits. Polygenic risk score (PRS) for a trait is typically calculated as a weighted sum of trait-associated allele counts across numerous loci in the genome, where the weight is obtained from a corresponding GWAS. PRS is an effective tool to quantify the aggregated genetic propensity for a trait or disease. With rapid advances in GWAS sample size and statistical methodologies, PRS has shown substantially improved prediction accuracy and great potential in disease risk screening and precision medicine. The main goals of this project are 1) to run GWAS on numerous complex traits to identify and interpret genetic associations through integrative modeling of annotation data, and 2) to produce a set of PRS for hundreds of complex traits using newly released genomic data in AllofUs.
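
The weighted-sum definition of a PRS given above can be sketched in a few lines. This is a minimal illustration of the basic formula only. The variant IDs and weights are hypothetical, and the sketch does not reproduce the weight adjustment that methods such as PRS-CS perform.

```python
from typing import Dict


def polygenic_score(allele_counts: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of trait-associated allele counts.

    allele_counts maps a variant ID to the number of effect alleles (0, 1, or 2).
    weights maps a variant ID to its GWAS-derived weight.
    Variants without a weight are skipped.
    """
    return sum(
        weights[variant] * count
        for variant, count in allele_counts.items()
        if variant in weights
    )


# Hypothetical variant IDs and weights (illustration only).
weights = {"variant_a": 0.5, "variant_b": -0.25, "variant_c": 0.125}
person = {"variant_a": 2, "variant_b": 1, "variant_c": 0, "variant_d": 1}
score = polygenic_score(person, weights)
print(score)
```

Here `variant_d` has no weight and therefore contributes nothing to the score.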

Project Purpose(s)

  • Social / Behavioral
  • Methods Development
  • Ancestry

Scientific Approaches

We will use software such as Hail, Regenie, and/or BOLT-LMM to run GWAS. We will implement a state-of-the-art method named PRS-CS to compute PRS for each GWAS trait. We will benchmark and optimize the performance of PRS models using a summary statistics-based cross-validation approach called PUMAS developed by our group (Zhao et al. Genome Biology 22(1), 2021). AllofUs genomic data will undergo rigorous quality control (QC) procedures, including removing variants with low sequencing depth or low variant calling quality.

Anticipated Findings

We will produce GWAS summary statistics for numerous complex traits and disorders. We will also produce PRS for all individuals with whole-genome sequencing (WGS) data in AllofUs. Every individual will have hundreds of scores quantifying their genetic propensity for a large collection of diseases and traits. These scores will be immediately applicable in future studies. For example, one planned future study is to integrate breast cancer PRS with electronic health record data in AllofUs to improve risk screening accuracy.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Yuchang Wu - Research Fellow, University of Wisconsin, Madison

Collaborators:

  • Qiongshi Lu - Early Career Tenure-track Researcher, University of Wisconsin, Madison
  • YAO FU - Undergraduate Student, University of Wisconsin, Madison

Duplicate of Wearables Data and COVID-19 - Controlled Tier

Our primary goal is to understand the interaction between activity levels and the development, progression, and societal effects of COVID-19. These analyses will generate hypotheses guiding clinical and research interventions focused on activity and sleep to reduce morbidity and mortality…

Scientific Questions Being Studied

Our primary goal is to understand the interaction between activity levels and the development, progression, and societal effects of COVID-19. These analyses will generate hypotheses guiding clinical and research interventions focused on activity and sleep to reduce morbidity and mortality in patients seeking care.

Project Purpose(s)

  • Population Health
  • Social / Behavioral

Scientific Approaches

We will examine the relationship between daily activity (steps, activity intensity) over time and the prevalence of COVID-19. We will use the Fitbit data, EHR-curated diagnoses, laboratory values, quality of life survey results, and clinical outcomes (hospitalizations/mortality).

Anticipated Findings

We may find substantial variation in activity and disease prevalence/severity by socioeconomic status and/or location which would motivate studies/interventions to reduce these health disparities.

Demographic Categories of Interest

  • Race / Ethnicity
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

Duplicate of Diabetic Retinopathy

Broadly, we will be studying the factors associated with diabetic retinopathy amongst patients with diabetes. More specifically, we will consider the factors associated with vision loss in patients with diabetic retinopathy. Identifying these factors may generate knowledge that will help…

Scientific Questions Being Studied

Broadly, we will be studying the factors associated with diabetic retinopathy amongst patients with diabetes. More specifically, we will consider the factors associated with vision loss in patients with diabetic retinopathy. Identifying these factors may generate knowledge that will help us better understand the relationship between the severity of diabetic retinopathy and vision impairment. Also, the results can help with modifying health care systems to enhance eye care utilization among diabetic patients.

Project Purpose(s)

  • Disease Focused Research (diabetic retinopathy)

Scientific Approaches

The approach to our retrospective study will consist of the following: Construction of datasets using patients from the All of Us dataset with a diagnosis of diabetes. After compiling the dataset, we will analyze it utilizing Python in Jupyter Notebook. The analysis will start by gathering descriptive statistics for the dataset. Once complete, the focus of the study will shift to developing predictive models to identify those at the highest risk of vision loss or diabetic retinopathy.

Anticipated Findings

From this study, we expect to be able to identify the factors associated with diabetic retinopathy and vision loss, improving our understanding of the relationship between these outcomes. Also, we expect the study to yield insights into factors that influence the progression of diabetic retinopathy. Ultimately, these findings will help with developing strategies to improve patient outcomes.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Registered Tier

Research Team

Owner:

  • Brandon Grover - Undergraduate Student, University of Wisconsin, Madison

Collaborators:

  • Mozhdeh Bahrainian - Project Personnel, University of Wisconsin, Madison

covid

At this stage, we are still exploring the data to study the long-term effects of COVID-19.

Scientific Questions Being Studied

At this stage, we are still exploring the data to study the long-term effects of COVID-19.

Project Purpose(s)

  • Disease Focused Research (COVID-19)

Scientific Approaches

Using the All of Us database, we plan to compare the EHR, survey, and genetic data of participants who were infected with COVID-19 and continue to experience symptoms with data from participants who recovered quickly.

Anticipated Findings

We hope to find the possible reasons for different long-term effects of COVID-19 and therefore contribute to precision medicine.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Yahui Zhang - Graduate Trainee, University of Michigan

Oculoplastic Conditions Prevalence

Research questions: What is the prevalence of oculoplastic conditions in the general population? Is All of Us a useful research tool in determining this? Question importance: Understanding disease prevalence is important in understanding a disease itself. Learning the prevalence of…

Scientific Questions Being Studied

Research questions: What is the prevalence of oculoplastic conditions in the general population? Is All of Us a useful research tool in determining this?
Question importance: Understanding disease prevalence is important in understanding a disease itself. Learning the prevalence of oculoplastic conditions allows for more informed resource allocation, clinician education, and contextualization for physicians, researchers, and patients. Thus, if the All of Us database is a useful resource for determining oculoplastic condition prevalence, this would be beneficial to patients and the oculoplastics field.

Project Purpose(s)

  • Disease Focused Research (Oculoplastic conditions)

Scientific Approaches

I will search the general dataset using broad terms related to oculoplastics. Prevalence of resulting diagnoses in the dataset will be recorded and grouped into larger categories.

Anticipated Findings

I anticipate that prevalence in the general population and within different diagnostic categories will align with previous literature. Limited data exist on the prevalence of oculoplastic conditions in the general population, so I hope this information will contribute meaningfully to the understanding of these conditions.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Sara Suhl - Graduate Trainee, Columbia University

Impact of global and local ancestry on genome-wide association V6 studies

We have recently founded the Center for Admixture Science and Technology (CAST), whose overarching goal is to understand health disparities between individuals of different races, ethnicities and socio-economic status. We will take advantage of the diversity of the AoU cohort…

Scientific Questions Being Studied

We have recently founded the Center for Admixture Science and Technology (CAST), whose overarching goal is to understand health disparities between individuals of different races, ethnicities and socio-economic status.
We will take advantage of the diversity of the AoU cohort to develop and apply new methods that incorporate global and local ancestry, single nucleotide polymorphisms (SNPs), complex variants, including short tandem repeats (STRs), variable number tandem repeats (VNTRs), haplotypes and HLA types, and social determinants in multivariate models to perform genome-wide association studies (GWAS) and investigate the associations between genetic variation and complex traits and/or disease. Our efforts will culminate in the development of ancestry-aware polygenic risk scores, which we will benchmark to determine their accuracy across ancestries and admixed individuals.

Project Purpose(s)

  • Methods Development
  • Control Set
  • Ancestry
  • Ethical, Legal, and Social Implications (ELSI)

Scientific Approaches

Our goal is to understand the impact of global ancestry and local ancestry on traits of interest, towards the identification of actionable genetic determinants, and to define whether there are any modifiable social determinants for important traits. Our models are relevant to both binary outcomes (e.g., cancer) and continuous phenotypes (e.g., cholesterol levels).

We will use a generalized linear mixed model. We will model various fixed effects (genetic variants including SNPs, STRs, VNTRs, haplotypes, and HLA types, as well as sex, age, and social determinants) and random effects (based on similarity between individuals) in linear or logistic models for continuous (e.g., cholesterol levels) or binary (e.g., cancer) dependent variables, respectively, with a corresponding specification of a link function.
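
In generic notation, a model of this kind can be written as below. The symbols are assumptions introduced here for illustration and are not taken from the project description: $g$ is the link function (identity for continuous traits, logit for binary traits), $x_i$ holds the fixed-effect covariates for individual $i$, $\beta$ holds their coefficients, and $u$ is a random effect whose covariance is proportional to a matrix $K$ of similarity between individuals.

```latex
g\big(\mathbb{E}[\, y_i \mid u \,]\big) = x_i^{\top}\beta + u_i,
\qquad
u \sim \mathcal{N}\!\big(0,\ \sigma_g^{2} K\big)
```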

We will develop an ancestry-aware polygenic risk score (aaPRS). Typically, PRS is computed by summing over all independent variants meeting a prespecified significance level.

Anticipated Findings

Existing association testing frameworks (eQTL, GWAS, and PRS) handle global ancestry, but do not typically account for the patchwork of local ancestry characteristic of recently admixed populations. Here, we will develop frameworks incorporating both global and local ancestry into association tests and risk prediction. Our analyses will result in better stratification of each individual’s risk for disease and will provide a framework to investigate and characterize the genetic associations for admixed and diverse individuals. In general, we will be among the first to incorporate complex variant types (STRs, VNTRs, haplotypes and HLA types) into genome-wide association studies and polygenic risk scores.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Melissa Gymrek - Early Career Tenure-track Researcher, University of California, San Diego
  • Matteo D'Antonio - Project Personnel, University of California, San Diego

Collaborators:

  • Yang Li - Project Personnel, University of California, San Diego
  • Michael Lamkin - Graduate Trainee, University of California, San Diego
  • Arya Massarat - Graduate Trainee, University of California, San Diego

Test V2

The current project aims to create some tables for an NIH grant proposal on polygenic risk scores for hypertension in diverse populations.

Scientific Questions Being Studied

The current project aims to create some tables for an NIH grant proposal on polygenic risk scores for hypertension in diverse populations.

Project Purpose(s)

  • Ancestry

Scientific Approaches

For the current project, we are making some cross-tabs between the number of All of Us cohort participants who have genetic data and a variety of phenotypes (hypertension, alcohol, smoking, exercise, education). These tables are used as supporting materials for a grant proposal. No other analyses will be carried out as part of this workspace.
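
A cross-tab of this kind simply counts participants in each combination of two attributes. The sketch below shows the idea with made-up records. The field names are hypothetical and do not correspond to actual Workbench column names.

```python
from collections import Counter
from typing import Dict, List


def cross_tab(records: List[Dict[str, bool]], row_key: str, col_key: str) -> Counter:
    """Count records in each (row value, column value) cell."""
    return Counter((rec[row_key], rec[col_key]) for rec in records)


# Made-up participant records (illustration only).
participants = [
    {"has_genetic_data": True, "hypertension": True},
    {"has_genetic_data": True, "hypertension": False},
    {"has_genetic_data": True, "hypertension": True},
    {"has_genetic_data": False, "hypertension": True},
    {"has_genetic_data": False, "hypertension": False},
]

table = cross_tab(participants, "has_genetic_data", "hypertension")
for (genetic, hypertensive), count in sorted(table.items()):
    print(f"genetic_data={genetic} hypertension={hypertensive}: {count}")
```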

Anticipated Findings

The goal of these tables is to submit a grant proposal that is competitive and, hopefully, gets funded. If this proposal gets funded, we will return to the All of Us workbench and create a workspace that has more targeted analysis goals. However, as mentioned above, the current workspace will only generate a couple of descriptive tables.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

  • Charles Kooperberg - Late Career Tenured Researcher, Fred Hutchinson Cancer Research Center