Carl Yang

Early Career Tenure-track Researcher, Emory University

2 active projects

Diabetes analysis

Scientific Questions Being Studied

We aim to understand the heterogeneity of diabetes by integrative analysis over clinical, genomic and behavioral data in All of Us.

Project Purpose(s)

  • Disease Focused Research (diabetes)
  • Population Health
  • Social / Behavioral
  • Methods Development
  • Ancestry

Scientific Approaches

We will create a diabetes cohort with clinical, genomic, and behavioral features. We will then develop neural network models based on heterogeneous hypergraphs to jointly learn patient clusters and predict patient risk.
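
As a rough illustration of the hypergraph idea, the sketch below assumes that each distinct clinical, genomic, or behavioral feature value forms one hyperedge connecting every patient who has it, and then runs a single round of mean aggregation (patients to hyperedges, then back to patients). The patient records, feature names, and the `build_hyperedges` and `propagate` helpers are hypothetical; the project description does not specify the actual model, and a real hypergraph neural network would add learned weights and nonlinearities.

```python
from collections import defaultdict


def build_hyperedges(patients):
    """Group patient ids by (feature_type, value); each group is one hyperedge."""
    edges = defaultdict(set)
    for pid, features in patients.items():
        for ftype, values in features.items():
            for value in values:
                edges[(ftype, value)].add(pid)
    return dict(edges)


def propagate(scores, hyperedges):
    """One round of mean aggregation: patients -> hyperedges -> patients."""
    # Each hyperedge takes the mean score of its member patients.
    edge_mean = {
        edge: sum(scores[p] for p in members) / len(members)
        for edge, members in hyperedges.items()
    }
    # Each patient takes the mean over the hyperedges it belongs to.
    incident = defaultdict(list)
    for edge, members in hyperedges.items():
        for p in members:
            incident[p].append(edge_mean[edge])
    return {
        p: (sum(incident[p]) / len(incident[p]) if incident[p] else scores[p])
        for p in scores
    }


# Toy, made-up records (not All of Us data); three heterogeneous feature types.
patients = {
    "p1": {"clinical": ["high_hba1c"], "genomic": ["variant_a"], "behavioral": ["low_activity"]},
    "p2": {"clinical": ["high_hba1c"], "genomic": ["variant_b"], "behavioral": ["low_activity"]},
    "p3": {"clinical": ["normal_hba1c"], "genomic": ["variant_b"], "behavioral": ["high_activity"]},
}

hyperedges = build_hyperedges(patients)
scores = {"p1": 1.0, "p2": 0.0, "p3": 0.0}  # hypothetical initial risk signal
smoothed = propagate(scores, hyperedges)
```

In this toy setup, a patient who shares more hyperedges with a high-scoring patient receives a higher smoothed score, which is the intuition behind using shared features to relate patients.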

Anticipated Findings

We hope to find novel, predictive and interpretable diabetes subtypes and the corresponding patient groups, which can help risk prediction and treatment design.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Carl Yang - Early Career Tenure-track Researcher, Emory University

Emory Gene Clustering Work

Scientific Questions Being Studied

We intend to study problems relating to gene data. The main goal of our project is to use machine learning and computational algorithms to find a representative subset of the data that is as accurate as, or more accurate than, the whole data set while reducing computational cost. Further research could then be conducted on these subsets to address future problems (e.g., drug analysis, regression).

Project Purpose(s)

  • Ancestry
  • Other Purpose (The purpose of using this workspace is to find potential new research on gene clustering data with the intent to publish)

Scientific Approaches

The first data set we want to use contains diabetes gene data. We will use it to see whether we can find subsets that perform well within the data set. We will use different subset selection techniques, such as k-means and topological data analysis (TDA). For example, we would (1) compute embeddings of the observations, (2) find centroids using k-means, and (3) use TDA to extract important features. With these methods we hope to identify the most important features and the most representative subsets. Methods that will support this process include SHAP, t-SNE, and UMAP. SHAP is a method, based on Shapley values from game theory, for explaining the predictions of machine learning models. t-SNE is a statistical method for visualizing data by giving each data point a location in a two- or three-dimensional map. UMAP is a dimension reduction algorithm based on manifold learning techniques and ideas from topological data analysis.
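
The first two steps described above can be sketched as follows. This is a minimal, dependency-free illustration and not the project's implementation: it assumes the embeddings are already computed (the toy 2-D points below are made up), runs plain Lloyd's k-means, and then takes the real observation closest to each centroid as the representative subset. Choosing the nearest observation is an assumption, since the description does not say how the subset is drawn from the centroids, and the TDA step is omitted. The `kmeans` and `representative_subset` names are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns a list of k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            [sum(col) / len(cluster) for col in zip(*cluster)] if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def representative_subset(points, centroids):
    """Indices of the real observations closest to each centroid."""
    return sorted(
        {min(range(len(points)), key=lambda i: math.dist(points[i], c)) for c in centroids}
    )


# Toy stand-in for precomputed embeddings: two well-separated groups.
embeddings = [
    (0.0, 0.0), (0.2, 0.0), (0.1, 0.1),
    (5.0, 5.0), (5.2, 5.0), (5.1, 5.1),
]

centroids = kmeans(embeddings, k=2)
subset = representative_subset(embeddings, centroids)
```

Returning indices rather than centroid coordinates keeps the subset made of actual observations, so any downstream model can be retrained on those rows and compared against the full data set.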

Anticipated Findings

The anticipated finding is a subset of the given data set that performs as well as, if not better than, the full data set. This will help reduce computational cost in the future and can provide better results on various problems relating to the given disease.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Carl Yang - Early Career Tenure-track Researcher, Emory University
  • Ethan Young - Undergraduate Student, University of California, Los Angeles
  • Mathias Heider - Undergraduate Student, University of Delaware