Srushti Gangireddy

Project Personnel, Vanderbilt University Medical Center

7 active projects

Duplicate of Demo - Siloed Analysis of All of Us and UK Biobank Genomic Data

Scientific Questions Being Studied

Historically, researchers responded to limitations in genomic data sharing policy and practice by conducting meta-analyses on summary outputs from isolated genomic datasets. Recent work has demonstrated the increased power of individual-level genetic analysis on pooled datasets. In addition, advances in data access and sharing policies, coupled with technological advances in cloud-based environments for data access and analysis, have opened up new possibilities for pooled analysis of large-scale genomic datasets. The NIH All of Us Research Program and UK Biobank are two leading examples of large, population-scale studies that combine genomic data with deep phenotypic health data. There is a grand opportunity to demonstrate how the world’s largest research-ready biomedical datasets can create more value together and advance discovery in genome science.

Project Purpose(s)

  • Other Purpose (This is a demonstration project meant to support research with All of Us Genomic Data)

Scientific Approaches

The primary goal of this project is to demonstrate the potential of the All of Us Researcher Workbench for pooled analyses of All of Us and UK Biobank data. Specifically, we aim to:
1. Develop and describe an approved, secure path for connecting UK Biobank data to the All of Us Researcher Workbench.
2. Conduct a genome-wide association study of blood lipids on the pooled dataset aimed at demonstrating that biomedical researchers can be more productive when permitted to analyze the union of the cohorts, as opposed to computing aggregate results in separate data silos for each cohort and then combining those aggregates.

Anticipated Findings

The secondary goal of this project is to demonstrate and measure the experience when the same analyses are repeated in a siloed manner. Specifically, we aim to:
3. Repeat the previously described genome-wide association study on the All of Us Researcher Workbench when working with the All of Us data and on UK Biobank’s DNAnexus when working with the UK Biobank data.
4. Conduct a meta-analysis on the aggregate results for each cohort (in accordance with each program’s data use policies) and compare the result of combining those aggregates to the results from the pooled analysis. Evaluate not only differences in results, but also differences in analysis cost and analyst productivity.
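As an illustration of what combining per-cohort aggregates can look like, the following is a minimal sketch of a fixed-effect, inverse-variance-weighted meta-analysis for a single variant. The project description does not name a meta-analysis method, so the choice of method, the function name, and the summary statistics shown here are all assumptions made for illustration only.

```python
import math


def fixed_effect_meta(estimates):
    """Combine per-cohort (beta, standard_error) pairs with inverse-variance weights.

    Assumption: a fixed-effect model; the project does not specify its method.
    """
    weights = [1.0 / (se ** 2) for _, se in estimates]
    total_weight = sum(weights)
    # Weighted mean of the per-cohort effect sizes.
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    # Standard error of the combined estimate.
    se = math.sqrt(1.0 / total_weight)
    return beta, se, beta / se


# Hypothetical summary statistics for one variant (not real results).
cohort_a = (0.10, 0.02)  # (effect size, standard error) from one silo
cohort_b = (0.14, 0.02)  # (effect size, standard error) from the other silo
beta, se, z = fixed_effect_meta([cohort_a, cohort_b])
```

Only the per-cohort effect sizes and standard errors cross the boundary between silos in this sketch, which is the property that makes the siloed approach compatible with restrictive data use policies.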

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Practice Notebook to Explore AoU dataset

Scientific Questions Being Studied

This project will explore the scope of patients with COVID-19 and the characteristics of patients with PASC.

Project Purpose(s)

  • Educational
  • Other Purpose (practice notebook to familiarize with RW)

Scientific Approaches

We will apply algorithms developed by the RECOVER PCORnet Adult Cohort and compare the overlap in cohorts with the set derived through the N3C algorithm.

Anticipated Findings

We expect to find a high degree of concordance between the RECOVER Adult Cohort algorithm and the N3C algorithm, even though the approaches were developed through different machine learning methods on different source patient data sets.
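One simple way to quantify the overlap between two algorithm-derived cohorts is to compare their sets of patient identifiers. The sketch below is an assumption about how such a comparison could be done, not the project's actual code; the function name, the Jaccard index as the concordance measure, and the identifiers are all hypothetical.

```python
def cohort_overlap(cohort_a, cohort_b):
    """Summarize the overlap between two cohorts given as iterables of patient IDs."""
    a, b = set(cohort_a), set(cohort_b)
    both = a & b
    union = a | b
    return {
        "both": len(both),        # flagged by both algorithms
        "only_a": len(a - b),     # flagged only by the first algorithm
        "only_b": len(b - a),     # flagged only by the second algorithm
        # Jaccard index: shared patients divided by all flagged patients.
        "jaccard": len(both) / len(union) if union else 1.0,
    }


# Hypothetical patient IDs (not real data).
recover_ids = [1, 2, 3, 4, 5]
n3c_ids = [3, 4, 5, 6]
result = cohort_overlap(recover_ids, n3c_ids)
```

A Jaccard index near 1 would indicate that the two algorithms select nearly the same patients, while the "only" counts show where they disagree.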

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Srushti Gangireddy - Project Personnel, Vanderbilt University Medical Center
  • Mark Weiner - Mid-career Tenured Researcher, Cornell University
  • Hiral Master - Project Personnel, All of Us Program Operational Use

Exploring_AOU_Data

Scientific Questions Being Studied

This study will identify patients who tested positive for COVID-19 and the conditions that occur after infection. We would like to explore genomic and Fitbit data to see whether they add value in determining long COVID status.

Project Purpose(s)

  • Disease Focused Research (long covid-19)

Scientific Approaches

This project uses the RECOVER algorithm to identify patients with long COVID.
We use the XGBoost library to run the model developed by N3C.

Anticipated Findings

We plan to study how different data types, such as Fitbit and genomic data, contribute to predicting whether a patient has long COVID.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Srushti Gangireddy - Project Personnel, Vanderbilt University Medical Center
  • QiPing Feng - Early Career Tenure-track Researcher, Vanderbilt University Medical Center

AOU_Recover_Long_Covid_v6

Scientific Questions Being Studied

Using the National COVID Cohort Collaborative’s (N3C) EHR repository, we developed XGBoost machine learning (ML) models to identify potential long-COVID patients with high accuracy, as measured by the area under the receiver operating characteristic curve.

Project Purpose(s)

  • Disease Focused Research (Long COVID)

Scientific Approaches

Using the National COVID Cohort Collaborative’s (N3C) EHR repository, we developed XGBoost machine learning (ML) models to identify potential long-COVID patients with high accuracy, as measured by the area under the receiver operating characteristic curve.
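For readers unfamiliar with the metric, the area under the receiver operating characteristic curve equals the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The sketch below computes it directly from that definition in plain Python; it is an illustration with made-up labels and scores, not the project's evaluation code.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = long COVID) and model scores (not real data).
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
value = auroc(labels, scores)
```

The pairwise loop is quadratic in cohort size, so for large cohorts a rank-based implementation from an established library would be the practical choice.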

Anticipated Findings

Using the National COVID Cohort Collaborative’s (N3C) EHR repository, we developed XGBoost machine learning (ML) models to identify potential long-COVID patients with high accuracy, as measured by the area under the receiver operating characteristic curve.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • WeiQi Wei - Other, All of Us Program Operational Use
  • Vern Kerchberger - Early Career Tenure-track Researcher, Vanderbilt University Medical Center
  • Srushti Gangireddy - Project Personnel, Vanderbilt University Medical Center
  • Mark Weiner - Mid-career Tenured Researcher, Cornell University
  • Hiral Master - Project Personnel, All of Us Program Operational Use
  • Gabriel Anaya - Administrator, National Heart, Lung, and Blood Institute (NIH - NHLBI)

Collaborators:

  • Chris Lunt - Other, All of Us Program Operational Use

Implementing Recover Algorithm on AOU Data

Scientific Questions Being Studied

Identify potential long COVID patients among three groups in the database: all COVID-19 patients, patients hospitalized with COVID-19, and patients who had COVID-19 but were not hospitalized. The models proved to be accurate, as people identified as at risk for long COVID were similar to patients seen at long COVID clinics.

Project Purpose(s)

  • Disease Focused Research (Long COVID)

Scientific Approaches

An XGBoost machine learning model is developed to identify potential patients with long COVID.
The base population is defined as any non-deceased adult patient (age ≥18 years) with either an International Classification of Diseases-10-Clinical Modification COVID-19 diagnosis code (U07.1) from an inpatient or emergency visit, or a positive SARS-CoV-2 PCR or antigen test, and for whom at least 90 days have passed since the COVID-19 index date.
The model examines demographics, health-care utilization, diagnoses, and medications for adults with COVID-19.
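The base population criteria above can be expressed as a simple eligibility check. The sketch below is illustrative only: the `Patient` record, its field names, and the `as_of` reference date are hypothetical and do not correspond to the OMOP tables or to the project's actual query code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Patient:
    # Hypothetical, simplified record; real data would come from OMOP tables.
    age: int
    deceased: bool
    u071_inpatient_or_emergency: bool  # U07.1 code from an inpatient or emergency visit
    positive_pcr_or_antigen: bool      # positive SARS-CoV-2 PCR or antigen test
    index_date: Optional[date]         # COVID-19 index date, if any


def in_base_population(p, as_of):
    """Return True if the patient meets the base population criteria on the as_of date."""
    if p.deceased or p.age < 18 or p.index_date is None:
        return False
    if not (p.u071_inpatient_or_emergency or p.positive_pcr_or_antigen):
        return False
    # At least 90 days must have passed since the COVID-19 index date.
    return (as_of - p.index_date).days >= 90


as_of = date(2022, 6, 1)
eligible = Patient(45, False, False, True, date(2022, 1, 1))
too_recent = Patient(45, False, True, False, date(2022, 4, 1))
minor = Patient(17, False, True, True, date(2022, 1, 1))
```

Here `in_base_population(eligible, as_of)` is true because 151 days have elapsed, while `too_recent` fails the 90-day rule and `minor` fails the age rule.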

Anticipated Findings

Identify, with high accuracy, patients who potentially have long COVID, and determine which features are most important.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • WeiQi Wei - Other, All of Us Program Operational Use
  • Vern Kerchberger - Early Career Tenure-track Researcher, Vanderbilt University Medical Center
  • Srushti Gangireddy - Project Personnel, Vanderbilt University Medical Center
  • Hiral Master - Project Personnel, All of Us Program Operational Use
  • Gabriel Anaya - Administrator, National Heart, Lung, and Blood Institute (NIH - NHLBI)

Collaborators:

  • Chris Lunt - Other, All of Us Program Operational Use

RECOVER+AoU

Scientific Questions Being Studied

This initial cross-platform testing effort focuses on expanding the analytical capability of available data sources that have collected data on SARS-CoV-2. As we gather data across the US, we can use independent data sources to better understand PASC in our population and identify possible interventions. As a first step, we hope to leverage available RECOVER data tools and apply them within the All of Us Researcher Workbench to assess cross-platform interoperability and analytical equivalence. This would provide a path to engage our research community and guide research towards our understanding of PASC.

Project Purpose(s)

  • Population Health
  • Methods Development
  • Control Set
  • Other Purpose (Testing PASC ML Algorithm from N3C-RECOVER in AoU Platform)

Scientific Approaches

Bring existing data query code and data analytics code from the RECOVER research team into the All of Us Researcher Workbench. Use “equivalent” code sets to explore and expand our understanding of PASC and its effects on the US population. Share reproducible findings through programming “notebooks” and analysis of standardized datasets (OMOP).

Anticipated Findings

This research activity will be developed in conjunction with an awareness campaign of the collaborative efforts undertaken by both RECOVER and AoU. We intend to highlight the available datasets with SARS-CoV-2 data, as well as the cloud-based researcher workspaces (RECOVER, AoU). With the awareness campaign and cross-platform testing, we intend to create an on-ramp for experienced and young researchers within two large and diverse datasets.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • WeiQi Wei - Other, All of Us Program Operational Use
  • Vern Kerchberger - Early Career Tenure-track Researcher, Vanderbilt University Medical Center
  • Srushti Gangireddy - Project Personnel, Vanderbilt University Medical Center
  • Hiral Master - Project Personnel, All of Us Program Operational Use
  • Gabriel Anaya - Administrator, National Heart, Lung, and Blood Institute (NIH - NHLBI)
  • Chris Lunt - Other, All of Us Program Operational Use

Srushti_LongCovid

Scientific Questions Being Studied

Train machine learning models to identify potential long-COVID patients among (1) all COVID-19 patients, (2) patients hospitalized with COVID-19, and (3) patients who had COVID-19 but were not hospitalized.

Project Purpose(s)

  • Disease Focused Research (Long COVID)

Scientific Approaches

To reflect that long-COVID may look different depending on the severity of the patient’s acute COVID-19, we built three different ML models using the three-site subset: (1) all patients, (2) patients who had been hospitalized with acute COVID-19, and (3) patients who were not hospitalized. The intent of each model is to identify the patients most likely to have long-COVID, using attendance at a long-COVID specialty clinic as a proxy for long-COVID diagnosis. To train and test each model, patients were randomly sampled to yield similar patient counts in both classes (long-COVID clinic patients and patients who did not attend the long-COVID clinic). For the all-patient model, data were also sampled to yield similar numbers of hospitalized and non-hospitalized patients.
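The class-balancing step described above can be sketched as random undersampling of the larger class. The description says only that patients were randomly sampled to yield similar counts, so sampling both classes down to exactly the size of the smaller one, the function name, and the fixed seed are all assumptions made for illustration.

```python
import random


def balance_classes(positives, negatives, seed=0):
    """Randomly sample both classes down to the size of the smaller one.

    Assumption: exact equalization by undersampling; the source says only
    that counts were made "similar".
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    n = min(len(positives), len(negatives))
    return rng.sample(positives, n), rng.sample(negatives, n)


# Hypothetical patient IDs (not real data).
clinic_patients = list(range(10))        # attended a long-COVID clinic
other_patients = list(range(100, 150))   # did not attend the clinic
clinic_sample, other_sample = balance_classes(clinic_patients, other_patients)
```

The same helper could be applied a second time within the all-patient model to equalize hospitalized and non-hospitalized patients.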

Anticipated Findings

The combined demographics of the long-COVID clinic patients show significant differences from the COVID-19 patients at those sites who did not attend the long-COVID clinic (third and fourth columns of Table 1). Notably, non-hospitalized long-COVID clinic patients are disproportionately female.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:
