Eric Banks

Senior Researcher, Arboretum Life Sciences

3 active projects

Improving Ancestry Prediction for New Samples


Scientific Questions Being Studied

This research aims to explore and validate methods for genetic ancestry prediction using the All of Us dataset. The primary questions are: (1) How well do current ancestry prediction methods perform across the diverse populations represented in All of Us? (2) Can we develop improved methods that better capture ancestry information, particularly for underrepresented groups and people with multiple ancestries?
These questions are important because accurate ancestry predictions are essential for many clinical applications, including Polygenic Risk Scores (PRS). Current methods were developed primarily using European-ancestry data or smaller sample sizes than are available in All of Us, potentially limiting their accuracy.
The exploratory phase will evaluate existing methods' performance and identify areas for improvement, laying groundwork for developing more accurate and inclusive ancestry prediction tools.

Project Purpose(s)

  • Educational

Scientific Approaches

We will analyze genetic data from the All of Us dataset using both established and novel computational methods for ancestry prediction. Datasets will include All of Us genetic data (genotypes from sequence data), the associated participant demographic and ancestry information, and comparison data from public resources (1000 Genomes Project, HGDP). Methods will include Principal Component Analysis (PCA) for dimensionality reduction and ancestry visualization, and ancestry prediction with AIPS (https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-017-4166-8), among others. We will validate predictions statistically, using cross-validation and performance metrics to assess accuracy. The analysis will be implemented in Python with established bioinformatics libraries, plus custom analysis code for method development. We will apply rigorous statistical approaches to ensure reproducibility and reliability of results.
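As a rough illustration of the cross-validation step, the sketch below scores a simple ancestry classifier per group, so that accuracy for smaller groups is not hidden by an overall average. It is not the AIPS method and not the workspace's actual code: the nearest-centroid classifier, the function names, the labels "A" and "B", and the synthetic two-dimensional "principal component" coordinates are all assumptions made for the example.

```python
import random
from collections import defaultdict


def nearest_centroid_fit(points, labels):
    """Return {label: centroid}, where each centroid is the mean of that label's points."""
    sums = {}
    counts = defaultdict(int)
    for point, label in zip(points, labels):
        if label not in sums:
            sums[label] = [0.0] * len(point)
        sums[label] = [s + x for s, x in zip(sums[label], point)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def nearest_centroid_predict(centroids, point):
    """Predict the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, point))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cross_validated_accuracy(points, labels, k=5, seed=0):
    """k-fold cross-validation; returns accuracy per true label."""
    indices = list(range(len(points)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    hits = defaultdict(int)
    totals = defaultdict(int)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        centroids = nearest_centroid_fit(
            [points[i] for i in train], [labels[i] for i in train]
        )
        for i in held_out:
            totals[labels[i]] += 1
            if nearest_centroid_predict(centroids, points[i]) == labels[i]:
                hits[labels[i]] += 1
    return {label: hits[label] / totals[label] for label in totals}


# Synthetic stand-in for the top two principal components of two groups.
rng = random.Random(42)
points, labels = [], []
for label, center in (("A", (0.0, 0.0)), ("B", (10.0, 10.0))):
    for _ in range(20):
        points.append([rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)])
        labels.append(label)

accuracy_by_group = cross_validated_accuracy(points, labels, k=5)
print(accuracy_by_group)
```

With real data the coordinates would come from a PCA of the genotype matrix, and groups would overlap far more than in this toy example, which is where per-group reporting becomes informative.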

Anticipated Findings

We anticipate this study will provide valuable learning opportunities about:

  • The practical challenges and complexities of working with large-scale genomic datasets through hands-on experience with All of Us data.
  • The technical nuances of existing ancestry prediction methods, deepening our understanding of their mathematical and computational foundations through direct application and testing.

These learning outcomes will:

  • Enhance our computational and statistical skills through real-world application
  • Build practical experience in handling and analyzing diverse population genetic data
  • Strengthen our understanding of the challenges in ancestry inference
  • Provide hands-on experience with a major biobank dataset

This work will serve as an educational foundation for future research in ancestry prediction.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Exploring AoU Data


Scientific Questions Being Studied

The purpose of this workspace is to familiarize ourselves with how to incorporate All of Us data into our analyses. As this is our first time using the Researcher Workbench, we would like to explore how to access phenotype-level data and how to use the Data and Cohort browsers.

We intend to access data associated with Cardiac Heart Disease, but without a specific research plan in this workspace. The work here is intended solely for our own educational purposes and will not be published or used in any downstream studies.

Project Purpose(s)

  • Educational

Scientific Approaches

We will explore how to access phenotype-level data and how to use the Data and Cohort browsers. As an example, we would like to show that we can select data for all participants with a loss-of-function mutation in the PCSK9 gene, and then that we can access cardiac-related phenotypes for those participants.
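The two-step selection described above can be sketched on toy in-memory records. This is only an illustration of the logic, not the Researcher Workbench API: the row layout, the field names (`person_id`, `gene`, `consequence`, `condition`), the example records, and the set of consequence terms treated as loss of function are all assumptions.

```python
# Assumed definition of "loss of function" for this sketch; a real analysis
# would follow whatever annotation and criteria the study adopts.
LOF_CONSEQUENCES = {
    "stop_gained",
    "frameshift_variant",
    "splice_donor_variant",
    "splice_acceptor_variant",
}


def lof_carriers(variant_rows, gene="PCSK9"):
    """Return the set of person IDs carrying a loss-of-function variant in `gene`."""
    return {
        row["person_id"]
        for row in variant_rows
        if row["gene"] == gene and row["consequence"] in LOF_CONSEQUENCES
    }


def phenotypes_for(person_ids, phenotype_rows):
    """Map each selected person ID to the list of conditions recorded for them."""
    result = {pid: [] for pid in person_ids}
    for row in phenotype_rows:
        if row["person_id"] in result:
            result[row["person_id"]].append(row["condition"])
    return result


# Hypothetical toy records standing in for annotated variants and phenotypes.
variants = [
    {"person_id": 1, "gene": "PCSK9", "consequence": "stop_gained"},
    {"person_id": 2, "gene": "PCSK9", "consequence": "missense_variant"},
    {"person_id": 3, "gene": "LDLR", "consequence": "frameshift_variant"},
    {"person_id": 4, "gene": "PCSK9", "consequence": "frameshift_variant"},
]
phenotypes = [
    {"person_id": 1, "condition": "hyperlipidemia"},
    {"person_id": 2, "condition": "myocardial infarction"},
    {"person_id": 4, "condition": "angina"},
    {"person_id": 4, "condition": "hypertension"},
]

carriers = lof_carriers(variants)
carrier_phenotypes = phenotypes_for(carriers, phenotypes)
print(carrier_phenotypes)
```

Keeping the carrier selection separate from the phenotype lookup means the same cohort can be reused against other phenotype tables.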

Anticipated Findings

This is purely for educational purposes. Once we understand how to use the All of Us data we will create new workspaces related to studies of interest.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Calculate PRS to Mimic Drug Efficacy


Scientific Questions Being Studied

Cardiovascular diseases remain leading causes of mortality worldwide. While traditional risk factors provide valuable insights into disease susceptibility, they fall short in predicting individual disease trajectories. Polygenic Risk Scores (PRS) have emerged as promising tools for aggregating genetic information into clinically meaningful metrics. Current research has primarily focused on using PRS for disease onset prediction, with limited exploration of their utility in predicting disease progression. This gap is particularly significant for coronary artery disease (CAD), where understanding progression could inform therapeutic strategies and resource allocation. However, the relationship between PRS and disease trajectory remains inadequately characterized.
Using All of Us (AoU) data, we will evaluate whether PRS can enhance our ability to predict disease progression in CAD patients. We will then explore similar models in other common disease phenotypes.

Project Purpose(s)

  • Drug Development

Scientific Approaches

Our research will employ a multi-stage analytical approach combining genetic and clinical data analysis. Initially, we will focus on reproducing the results of the most recent publications on CAD-specific PRS, using established genome-wide association study (GWAS) summary statistics and validated methodologies.

Next, we hope to extend the published PRS to disease progression analysis. We will define objective endpoints, including progression to severe CAD (determined by coronary intervention requirements) and major adverse cardiovascular events. We intend to use standard methods for the PRS calculations, incorporating the most recent meta-analyses of CAD-associated variants. We will assess PRS prediction performance through calibration plots and area under the receiver operating characteristic curve (AUC-ROC).
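To make the PRS and AUC-ROC steps concrete, here is a minimal sketch under stated assumptions: the PRS is taken to be the common weighted sum of effect-allele dosages, and AUC is computed as the fraction of case/control pairs in which the case has the higher score. The variant IDs, weights, dosages, and outcomes are invented for the example, and the source does not specify which PRS method will actually be used.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1, or 2) over variants that have a weight."""
    return sum(weights[v] * d for v, d in dosages.items() if v in weights)


def auc_roc(scores, outcomes):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in controls
    )
    return wins / (len(cases) * len(controls))


# Hypothetical per-variant effect sizes, e.g. taken from GWAS summary statistics.
weights = {"variant_a": 0.2, "variant_b": -0.1, "variant_c": 0.05}

# Hypothetical participants: effect-allele dosages and a binary progression endpoint.
participants = [
    ({"variant_a": 2, "variant_b": 0, "variant_c": 1}, 1),
    ({"variant_a": 0, "variant_b": 2, "variant_c": 0}, 0),
    ({"variant_a": 1, "variant_b": 1, "variant_c": 2}, 1),
    ({"variant_a": 2, "variant_b": 1, "variant_c": 0}, 0),
]

scores = [polygenic_risk_score(dosages, weights) for dosages, _ in participants]
outcomes = [outcome for _, outcome in participants]
auc = auc_roc(scores, outcomes)
print(scores, auc)
```

The pairwise AUC is quadratic in sample size, which is fine for a sketch; at biobank scale a rank-based implementation from an established statistics library would be the practical choice.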

Anticipated Findings

We anticipate more accurate prediction of CAD progression. The findings could have immediate clinical applications by identifying high-risk individuals who might benefit from more intensive monitoring or aggressive intervention strategies. This aligns with the public interest by potentially reducing healthcare costs through better resource allocation and improving patient outcomes through personalized risk stratification.

The results will contribute to the broader understanding of genetic influences on disease progression, potentially informing drug development and clinical trial design. Furthermore, the methodological framework developed could be adapted for studying progression patterns in other complex diseases, maximizing the public health impact of this research.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:
