Haoran Zhang

Graduate Trainee, Massachusetts Institute of Technology

2 active projects

Deep Metric Learning for Diabetes Subtyping

Scientific Questions Being Studied

The International Diabetes Federation estimates that 10% of the world's population will have diabetes by 2035. Patients living with diabetes are at higher risk for many acute and chronic complications, which may lead to more hospital or emergency department (ED) visits. Accurate subtyping of patients with type 2 diabetes is crucial to understanding which patient characteristics lead to increased risk of adverse outcomes, and is key to more effective and targeted treatment of diabetes and its complications.

Project Purpose(s)

  • Disease Focused Research (type 2 diabetes mellitus)
  • Methods Development

Scientific Approaches

We will use a machine learning method called Deep Metric Learning (DML). DML seeks to learn a representation of each patient's state by maximizing its similarity to the representations of other patients with the same label. DML has previously been shown to be effective in subtyping several diseases in the medical imaging domain. However, DML has not been widely applied to Electronic Health Records (EHR) or genetic data. Here, we propose using DML to learn subtypes of type 2 diabetes from time-series EHR data, as well as survey and genetic data.
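The idea of pulling same-label patients together in representation space can be illustrated with a triplet loss, one common DML objective. This is a minimal sketch only: the project description does not say which DML objective will be used, and the function names, the margin value, and the toy 2-D embeddings below are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (illustrative, not the project's stated objective).

    The loss is zero once the negative (different label) is at least
    `margin` farther from the anchor than the positive (same label).
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy, made-up 2-D patient embeddings.
anchor = [0.0, 0.0]
same_label = [0.0, 1.0]   # distance 1 from the anchor
other_label = [3.0, 4.0]  # distance 5 from the anchor

# Same-label patient is already much closer: no penalty.
well_separated = triplet_loss(anchor, same_label, other_label)

# Roles swapped: the "same-label" patient is far away, so the loss is large.
poorly_separated = triplet_loss(anchor, other_label, same_label)
```

In a real model the embeddings would come from a neural network applied to each patient's features, and the loss would be minimized over many sampled triplets.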

Anticipated Findings

We anticipate that the DML representations we learn will form natural clusters corresponding to patient subtypes. We anticipate that patients within a subtype will be similar in their input features (e.g. demographics, labs, vitals, surveys, or genetics). Patients in different subtypes may also have different outcomes (e.g. number of hospital visits, complications, mortality). We believe that characterizing such subtypes will help clinicians provide targeted treatments to different patients, which may improve health outcomes. Characterizing such subtypes may also support more accurate diagnosis of diabetes.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Qixuan Jin - Graduate Trainee, Massachusetts Institute of Technology
  • Haoran Zhang - Graduate Trainee, Massachusetts Institute of Technology

Creating Clinical Checklists Using Machine Learning

Scientific Questions Being Studied

Checklists and risk scores are ubiquitous in the clinical setting for various disease diagnosis and prediction tasks. However, the vast majority of checklists are developed by panels of experts based on domain knowledge, which is very time-consuming. In [1], we proposed a machine learning method to learn predictive checklists from data. Here, we hope to validate this method on clinical data. We hope that more accurate data-driven checklists will lead to better clinical risk assessments, and ultimately to greater adoption of machine learning models in the clinical setting.

[1] Zhang, Haoran, et al. "Learning optimal predictive checklists." Advances in Neural Information Processing Systems 34 (2021): 1215-1229.

Project Purpose(s)

  • Methods Development

Scientific Approaches

We intend to create cohorts for various chronic diseases (e.g. type 2 diabetes, depression), containing multimodal features (e.g. tabular, survey, genetics). We will apply the method from [1] to create checklists for diagnosis of each disease from these features. We will compare the performance (e.g. accuracy, FPR, FNR) of the checklists against simple baselines (e.g. logistic regression, decision trees). We will investigate: 1) the benefit of multimodality (e.g. how much does genetic data help in diagnosing a particular disease?), 2) the interpretability-accuracy tradeoff (e.g. how much more accurate are longer checklists?), and 3) comparison with checklists currently used by domain experts (i.e. how do existing checklists compare with ours, and can we improve over them?).
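To make the evaluation above concrete, the sketch below scores a toy checklist with the three metrics named in the text (accuracy, FPR, FNR). It assumes a checklist of the "at least M of N items checked" form; the items, threshold, patients, and labels are all hypothetical, and the checklist here is fixed by hand rather than learned with the method from [1].

```python
def checklist_predict(items, threshold):
    """Predict positive (1) if at least `threshold` binary items are checked."""
    return int(sum(items) >= threshold)


def evaluate(y_true, y_pred):
    """Return accuracy, false positive rate, and false negative rate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
        "fnr": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Made-up data: each row is one patient's answers to three binary items.
patients = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
    [0, 1, 1],
    [0, 1, 0],
]
labels = [1, 0, 0, 1, 0, 1]  # hypothetical ground-truth diagnoses

# Hand-picked "2 of 3" checklist, purely for illustration.
preds = [checklist_predict(row, threshold=2) for row in patients]
metrics = evaluate(labels, preds)
```

The same `evaluate` helper could be applied to predictions from a baseline such as logistic regression, so that all models are compared on identical metrics.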

Anticipated Findings

We anticipate that we will be able to create accurate checklists that outperform existing checklists created from domain knowledge in predictive performance. We anticipate that our work will have two main contributions. First, it will validate our checklist creation method on clinical data, and provide insights into its behavior based on questions 1-3 above. This will hopefully motivate other practitioners to create checklists and other interpretable clinical models in a data-driven fashion. Second, we hope to derive disease-specific insights from the form of our final checklist models. This could lead to improvements to existing clinical checklists or risk scores (e.g. by including an additional feature, or adjusting a threshold), ultimately leading to better disease diagnosis and risk estimation.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Qixuan Jin - Graduate Trainee, Massachusetts Institute of Technology
  • Haoran Zhang - Graduate Trainee, Massachusetts Institute of Technology