Srushti Gangireddy
Project Personnel, Vanderbilt University Medical Center
8 active projects
V7 PASC Workspace
Scientific Questions Being Studied
This project will explore the scope of patients with COVID-19 and the characteristics of patients with PASC.
Project Purpose(s)
- Educational
- Ancestry
- Other Purpose (practice notebook to familiarize with RW)
Scientific Approaches
We will apply algorithms developed by the RECOVER PCORnet Adult Cohort and compare the overlap in cohorts with the set derived through the N3C algorithm.
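The cohort comparison described above can be sketched as a set overlap. This is a minimal illustration, not the project's actual code: the function name `cohort_concordance`, the choice of Jaccard similarity as the concordance measure, and the person IDs are all assumptions made for the example.

```python
def cohort_concordance(cohort_a, cohort_b):
    """Summarize the overlap between two cohorts given as iterables of person IDs."""
    a, b = set(cohort_a), set(cohort_b)
    both = a & b
    union = a | b
    return {
        "n_a": len(a),
        "n_b": len(b),
        "n_both": len(both),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Jaccard similarity: shared members divided by all members of either cohort.
        "jaccard": len(both) / len(union) if union else 1.0,
    }


# Hypothetical person IDs, for illustration only.
recover_ids = [101, 102, 103, 104]
n3c_ids = [103, 104, 105]
summary = cohort_concordance(recover_ids, n3c_ids)
print(summary)
```

A Jaccard value near 1 would indicate that the two algorithms select nearly the same patients; the counts of patients found by only one algorithm show where they disagree.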
Anticipated Findings
We expect to find a high degree of concordance between the RECOVER Adult Cohort algorithm and the N3C algorithm, even though the approaches were developed through different machine learning methods on different source patient data sets.
Demographic Categories of Interest
This study will not center on underrepresented populations.
Data Set Used
Controlled Tier
Research Team
Owner:
- Xingbo Wang - Research Fellow, Cornell University
- Srushti Gangireddy - Project Personnel, Vanderbilt University Medical Center
- Mark Weiner - Mid-career Tenured Researcher, Cornell University
- Hiral Master - Project Personnel, All of Us Program Operational Use
- Aashri Aggarwal - Undergraduate Student, Cornell University
Collaborators:
- Lina Sulieman - Other, All of Us Program Operational Use
AOU_Recover_Long_Covid_v6
Scientific Questions Being Studied
The purpose of this workspace was to implement, in the All of Us Research Program, the published XGBoost machine learning (ML) model that was developed using the National COVID Cohort Collaborative’s (N3C) EHR repository to identify potential patients with PASC/Long COVID.
Project Purpose(s)
- Disease Focused Research (Long COVID)
Scientific Approaches
To achieve this objective, data science workflows were used to apply ML algorithms on the Researcher Workbench. This effort expanded the number of participants used to evaluate the ML models that identify risk of PASC/Long COVID; it also served to validate the efforts of one team and to provide insight to other teams. These models were implemented within the All of Us Controlled Tier data (C2022Q2R2), which was last refreshed on June 22, 2022. We intend to provide a step-by-step guide for the implementation of N3C's ML Model for identification of PASC/Long COVID Phenotype in the All of Us dataset.
Anticipated Findings
We intend to provide a step-by-step guide for the implementation of N3C's ML Model for identification of PASC/Long COVID Phenotype in the All of Us dataset.
Demographic Categories of Interest
This study will not center on underrepresented populations.
Data Set Used
Controlled Tier
Research Team
Owner:
- WeiQi Wei - Other, All of Us Program Operational Use
- Vern Kerchberger - Early Career Tenure-track Researcher, Vanderbilt University Medical Center
- Srushti Gangireddy - Project Personnel, Vanderbilt University Medical Center
- Mark Weiner - Mid-career Tenured Researcher, Cornell University
- Hiral Master - Project Personnel, All of Us Program Operational Use
- Gabriel Anaya - Administrator, National Heart, Lung, and Blood Institute (NIH - NHLBI)
- David Mohs - Other, All of Us Program Operational Use
- Christopher Lord - Project Personnel, All of Us Program Operational Use
- Chenchal Subraveti - Project Personnel, All of Us Program Operational Use
Collaborators:
- Jun Qian - Other, All of Us Program Operational Use
- Chris Lunt - Other, All of Us Program Operational Use
Practice Notebook to Explore AoU dataset
Scientific Questions Being Studied
This project will explore the scope of patients with COVID-19 and the characteristics of patients with PASC.
Project Purpose(s)
- Educational
- Other Purpose (practice notebook to familiarize with RW)
Scientific Approaches
We will apply algorithms developed by the RECOVER PCORnet Adult Cohort and compare the overlap in cohorts with the set derived through the N3C algorithm.
Anticipated Findings
We expect to find a high degree of concordance between the RECOVER Adult Cohort algorithm and the N3C algorithm, even though the approaches were developed through different machine learning methods on different source patient data sets.
Demographic Categories of Interest
This study will not center on underrepresented populations.
Data Set Used
Controlled Tier
Research Team
Owner:
- Srushti Gangireddy - Project Personnel, Vanderbilt University Medical Center
- Mark Weiner - Mid-career Tenured Researcher, Cornell University
- Hiral Master - Project Personnel, All of Us Program Operational Use
Collaborators:
- Aashri Aggarwal - Undergraduate Student, Cornell University
Exploring_AOU_Data
Scientific Questions Being Studied
This study will identify patients who tested positive for COVID-19 in order to identify conditions occurring after infection. We would like to explore genomic and Fitbit data to see whether they add any value in determining long COVID status.
Project Purpose(s)
- Disease Focused Research (long covid-19)
Scientific Approaches
This project uses the RECOVER algorithm to identify patients with long COVID.
We are using the XGBoost library to run the model developed by N3C.
Anticipated Findings
We plan to study how different data types, such as Fitbit and genomic data, contribute to determining whether a patient has long COVID.
Demographic Categories of Interest
This study will not center on underrepresented populations.
Data Set Used
Controlled Tier
Research Team
Owner:
- Srushti Gangireddy - Project Personnel, Vanderbilt University Medical Center
- QiPing Feng - Early Career Tenure-track Researcher, Vanderbilt University Medical Center
Collaborators:
- Elliot Outland - Project Personnel, Vanderbilt University Medical Center
Duplicate of Demo - Siloed Analysis of All of Us and UK Biobank Genomic Data
Scientific Questions Being Studied
Historically, researchers responded to limitations in genomic data sharing policy and practice by conducting meta analysis on summary outputs from isolated genomic datasets. Recent work has demonstrated the increased power of individual-level genetic analysis on pooled datasets. In addition, advancements in data access and sharing policies coupled with technological advancements in cloud-based environments for data access and analysis have opened up new possibilities for pooled analysis of large-scale genomic datasets. The NIH All of Us Research Program and UK Biobank are two leading examples of large, population scale studies which combine genomic data with deep phenotypic health data. There is a grand opportunity to demonstrate how the world’s largest research-ready biomedical datasets can create more value together and advance discovery in genome science.
Project Purpose(s)
- Other Purpose (This is a demonstration project meant to support research with All of Us Genomic Data)
Scientific Approaches
The primary goal of this project is to demonstrate the potential of the All of Us Researcher Workbench for pooled analyses of All of Us and UK Biobank data. Specifically, we aim to: 1. Develop and describe an approved, secure path for connecting UK Biobank data to the All of Us Researcher Workbench. 2. Conduct a genome-wide association study of blood lipids on the pooled dataset aimed at demonstrating that biomedical researchers can be more productive when permitted to analyze the union of the cohorts, as opposed to computing aggregate results in separate data silos for each cohort and then combining those aggregates.
Anticipated Findings
The secondary goal of this project is to demonstrate and measure the experience when the same analyses are repeated in a siloed manner. Specifically we aim to: 3. Repeat the previously described genome-wide association study on the All of Us Researcher Workbench when working with the All of Us data and on UK Biobank’s DNAnexus when working with the UK Biobank data. 4. Conduct a meta analysis on the aggregate results for each cohort (in accordance with each program’s data use policies) and compare the result of combining those aggregates to the results from the pooled analysis. Evaluate not only differences in results, but also differences in analysis cost and analyst productivity.
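As an illustration of step 4, per-cohort summary statistics for a single variant can be combined with an inverse-variance-weighted fixed-effect meta-analysis. The workspace description does not say which meta-analysis method is used, so this choice, the function name `fixed_effect_meta`, and the example numbers are assumptions made only for the sketch.

```python
import math


def fixed_effect_meta(estimates):
    """Combine per-cohort (beta, standard_error) pairs for one variant.

    Uses inverse-variance weighting: each cohort's effect estimate is
    weighted by 1 / SE^2, so more precise cohorts count for more.
    Returns the pooled beta, its standard error, and the z statistic.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    return beta, se, beta / se


# Hypothetical summary statistics from two siloed analyses of the same variant.
per_cohort = [(0.10, 0.02), (0.14, 0.02)]
beta, se, z = fixed_effect_meta(per_cohort)
print(f"beta={beta:.3f} se={se:.4f} z={z:.2f}")
```

Only aggregate values (an effect size and its standard error) cross the boundary between platforms here, which is what makes this approach usable when individual-level data cannot be pooled.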
Demographic Categories of Interest
This study will not center on underrepresented populations.
Data Set Used
Controlled Tier
Implementing Recover Algorithm on AOU Data
Scientific Questions Being Studied
Identify potential long COVID patients among three groups in the database: all COVID-19 patients, patients hospitalized with COVID-19, and patients who had COVID-19 but were not hospitalized. The models proved to be accurate, as people identified as at risk for long COVID were similar to patients seen at long COVID clinics.
Project Purpose(s)
- Disease Focused Research (Long COVID)
Scientific Approaches
An XGBoost machine learning model is developed to identify potential patients with long COVID.
The base population is defined as any non-deceased adult patient (age ≥18 years) with either an International Classification of Diseases-10-Clinical Modification COVID-19 diagnosis code (U07.1) from an inpatient or emergency visit, or a positive SARS-CoV-2 PCR or antigen test, and for whom at least 90 days have passed since the COVID-19 index date.
The model examines demographics, health-care utilization, diagnoses, and medications for adults with COVID-19.
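The base population rule above can be sketched as a simple filter. This is a hedged illustration: the `Patient` record and its field names are hypothetical, the index date is assumed to be the earliest qualifying diagnosis or positive test, and age is assumed to be evaluated on the reference date, none of which is specified in the description.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Patient:
    birth_date: date
    deceased: bool
    # Earliest U07.1 diagnosis from an inpatient or emergency visit, if any.
    u071_inpatient_or_ed_date: Optional[date] = None
    # Earliest positive SARS-CoV-2 PCR or antigen test, if any.
    positive_test_date: Optional[date] = None


def covid_index_date(p: Patient) -> Optional[date]:
    """Assumed index date: the earliest qualifying COVID-19 event."""
    events = [d for d in (p.u071_inpatient_or_ed_date, p.positive_test_date) if d]
    return min(events) if events else None


def age_in_years(birth: date, on: date) -> int:
    # Subtract one year if the birthday has not yet occurred in the year of `on`.
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))


def in_base_population(p: Patient, as_of: date) -> bool:
    """Non-deceased adult with a qualifying event at least 90 days before `as_of`."""
    index = covid_index_date(p)
    if p.deceased or index is None:
        return False
    if age_in_years(p.birth_date, as_of) < 18:
        return False
    return (as_of - index).days >= 90


# Hypothetical patient, for illustration only.
example = Patient(date(1980, 1, 1), False, positive_test_date=date(2022, 1, 10))
print(in_base_population(example, as_of=date(2022, 6, 22)))
```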
Anticipated Findings
Identify, with high accuracy, patients who potentially have long COVID, and find the most important features.
Demographic Categories of Interest
This study will not center on underrepresented populations.
Data Set Used
Controlled Tier
Research Team
Owner:
- WeiQi Wei - Other, All of Us Program Operational Use
- Vern Kerchberger - Early Career Tenure-track Researcher, Vanderbilt University Medical Center
- Srushti Gangireddy - Project Personnel, Vanderbilt University Medical Center
- Hiral Master - Project Personnel, All of Us Program Operational Use
- Gabriel Anaya - Administrator, National Heart, Lung, and Blood Institute (NIH - NHLBI)
Collaborators:
- Chris Lunt - Other, All of Us Program Operational Use
RECOVER+AoU
Scientific Questions Being Studied
The goal of this initial cross-platform testing effort is to expand the analytical capability of available data sources that have collected data on SARS-CoV-2. As we gather data across the US, we can use independent data sources to better understand PASC in our population and identify possible interventions. As a first step, we hope to leverage available RECOVER data tools and apply them within the All of Us Researcher Workbench to assess cross-platform interoperability and analytical equivalence. This would provide a path to engage our research community and guide research towards our understanding of PASC.
Project Purpose(s)
- Population Health
- Methods Development
- Control Set
- Other Purpose (Testing PASC ML Algorithm from N3C-RECOVER in AoU Platform)
Scientific Approaches
Bring existing data query code and data analytics code from the RECOVER researcher team into the All of Us Researcher Workbench. Use “equivalent” code sets to explore and expand our understanding of PASC and its effects on the US population. Share reproducible findings through programming “notebooks” and analysis of standardized datasets (OMOP).
Anticipated Findings
This research activity will be developed in conjunction with an awareness campaign of the collaborative efforts undertaken by both RECOVER and AoU. We intend to highlight the available datasets with SARS-CoV-2 data, as well as the cloud-based researcher workspaces (RECOVER, AoU). With the awareness campaign and cross-platform testing, we intend to create an on-ramp for experienced and young researchers within two large and diverse datasets.
Demographic Categories of Interest
This study will not center on underrepresented populations.
Data Set Used
Registered Tier
Research Team
Owner:
- WeiQi Wei - Other, All of Us Program Operational Use
- Vern Kerchberger - Early Career Tenure-track Researcher, Vanderbilt University Medical Center
- Srushti Gangireddy - Project Personnel, Vanderbilt University Medical Center
- Hiral Master - Project Personnel, All of Us Program Operational Use
- Gabriel Anaya - Administrator, National Heart, Lung, and Blood Institute (NIH - NHLBI)
- Chris Lunt - Other, All of Us Program Operational Use
Srushti_LongCovid
Scientific Questions Being Studied
Train machine learning models to identify potential long-COVID patients among (1) all COVID-19 patients, (2) patients hospitalized with COVID-19, and (3) patients who had COVID-19 but were not hospitalized.
Project Purpose(s)
- Disease Focused Research (Long COVID)
Scientific Approaches
To reflect that long-COVID may look different depending on the severity of the patient’s acute COVID-19, we built three different ML models using the three-site subset: (1) all patients, (2) patients who had been hospitalized with acute COVID-19, and (3) patients who were not hospitalized. The intent of each model is to identify the patients most likely to have long-COVID, using attendance at a long-COVID specialty clinic as a proxy for long-COVID diagnosis. To train and test each model, patients were randomly sampled to yield similar patient counts in both classes (long-COVID clinic patients and patients who did not attend the long-COVID clinic). For the all-patient model, data were also sampled to yield similar numbers of hospitalized and non-hospitalized patients.
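The class balancing described above can be sketched as random downsampling. The description says only that patients were randomly sampled to yield similar counts in both classes; downsampling every class to the size of the smallest one, the function name `balanced_sample`, and the record layout are assumptions for this example.

```python
import random


def balanced_sample(records, label_key, seed=0):
    """Randomly downsample so every label value has the same number of records."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    groups = {}
    for record in records:
        groups.setdefault(record[label_key], []).append(record)
    if not groups:
        return []
    # Target size is the size of the smallest class.
    n = min(len(group) for group in groups.values())
    sample = []
    for group in groups.values():
        sample.extend(rng.sample(group, n))
    rng.shuffle(sample)
    return sample


# Hypothetical records: 3 long-COVID clinic patients and 10 non-clinic patients.
patients = (
    [{"person_id": i, "clinic": True} for i in range(3)]
    + [{"person_id": 100 + i, "clinic": False} for i in range(10)]
)
train = balanced_sample(patients, "clinic", seed=42)
print(len(train), sum(p["clinic"] for p in train))
```

For the all-patient model, the same function could be applied a second time with a hospitalization flag as the label to equalize hospitalized and non-hospitalized counts.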
Anticipated Findings
The combined demographics of the long-COVID clinic patients show significant differences from the COVID-19 patients at those sites who did not attend the long-COVID clinic (third and fourth columns of Table 1). Notably, non-hospitalized long-COVID clinic patients are disproportionately female.
Demographic Categories of Interest
This study will not center on underrepresented populations.
Data Set Used
Registered Tier