Yuhang Chen

Graduate Trainee, Yale University

1 active project

FL-WD

Scientific Questions Being Studied

Training machine learning models collaboratively on healthcare and clinical data from diverse sources requires handling data that are not independent and identically distributed (non-IID) across the contributing sources. At the same time, individual privacy must be protected, because healthcare data contain sensitive personal and health-related information. To address both challenges, we propose a federated learning approach. The framework uses Differentially Private Variational Autoencoders to generate anonymized, simulated datasets, with the goal of eliminating re-identification risk. It also removes mixed effects, both fixed and random, by projecting the data into a unified embedding space.
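
As a rough illustration of how federated aggregation and differential privacy can fit together, the sketch below clips each site's model update, adds Gaussian noise, and then averages the results across sites. This is a generic clip-and-noise scheme with FedAvg-style weighting, not the project's DP-VAE method. All function names, parameters, and numbers are hypothetical, and the sketch does no privacy accounting.

```python
import math
import random


def clip_update(update, max_norm):
    """Scale an update vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]


def privatize(update, max_norm, noise_multiplier, rng):
    """Clip the update, then add Gaussian noise with
    standard deviation noise_multiplier * max_norm (assumed scheme)."""
    clipped = clip_update(update, max_norm)
    sigma = noise_multiplier * max_norm
    return [u + rng.gauss(0.0, sigma) for u in clipped]


def federated_average(updates, weights):
    """Weighted mean of per-site update vectors (FedAvg-style)."""
    total = float(sum(weights))
    dim = len(updates[0])
    return [
        sum(w * u[i] for u, w in zip(updates, weights)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    # Hypothetical updates from three sites, weighted by local sample count.
    site_updates = [[0.5, -1.2], [2.0, 0.3], [-0.4, 0.9]]
    site_sizes = [100, 250, 50]
    noisy = [privatize(u, 1.0, 0.5, rng) for u in site_updates]
    print(federated_average(noisy, site_sizes))
```

Only the clipped, noised updates leave each site in this sketch; the raw records stay local. Choosing the clipping norm and noise level to meet a target privacy budget would need a separate accounting step.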

Project Purpose(s)

  • Methods Development

Scientific Approaches

The framework is designed to apply across modalities, and we plan to generalize the strategy to several types of healthcare datasets:

  • Electronic Health Record (EHR) data.
  • Time-series data, such as EEG and heartbeat recordings.
  • Image data, such as CT and fMRI scans.

The EHR data and the Fitbit data (time series collected by digital wearable devices) come from the All of Us (AoU) Research Program. For medical images, we will use the COVID-QU-Ex Dataset from Kaggle, which compiles COVID-19 data from four publicly available tables and consists of 33,920 chest X-ray (CXR) images.

Anticipated Findings

The anticipated findings from applying federated learning with Differentially Private Variational Autoencoders (DP-VAEs) to healthcare data integration fall into two areas.
Application of Federated Learning in Healthcare:
By implementing a federated learning framework, the study expects to show the practicality and efficiency of this approach in a healthcare context. This could pave the way for broader adoption of federated learning in medical research, especially in scenarios where data sharing is restricted.
Insights from Multimodal Data Analysis:
By integrating diverse types of healthcare data (EHR, EEG, heartbeats, CT, fMRI, CXR images), the study aims to provide comprehensive insights into patient health and disease states. This could lead to a more holistic understanding of patient health and the development of more effective, personalized treatment plans.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:
