Xingbo Wang

Research Fellow, Cornell University

2 active projects

SMILE-PD

Scientific Questions Being Studied

This workspace is part of the NIH-funded SMILE-PD project -- Similarity Matching in Longitudinal Electronic Patient Data. The goal is to enable AoU users to select "control" patients that match "case" patients on demographic and clinical characteristics in a manner that is intuitive and easy to use while also being rigorous, reproducible and scientifically appropriate.

Project Purpose(s)

  • Methods Development

Scientific Approaches

Analogous to the Word2Vec technique used in Natural Language Processing, this patient-matching algorithm implements a "Patient2Vec" model in which we first define the temporal “context” around each event in the EHR sequence. The “context” around event A is the collection of events happening before and after A within a certain time window in the patient EHR corpus. Deriving effective word representations by incorporating contextual information is a fundamental problem in NLP and has been extensively studied. One recent advance that addresses this problem is the “Word2Vec” technique, which trains a two-layer neural network on a text corpus to map each word into a vector space that encodes the word's contextual correlations. Similarities evaluated in this embedded vector space (usually cosine similarity) reflect contextual associations (e.g., a high similarity between words A and B suggests that they tend to appear in the same context).
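As a rough illustration of the two ideas above (the temporal context window and the cosine similarity between embedded vectors), here is a minimal Python sketch. It is not the project's implementation: the function names, the (day, code) event format, and the window measured in days are all assumptions made for the example.

```python
from math import sqrt


def temporal_contexts(events, window_days):
    """Pair each event with the other events inside its time window.

    `events` is a list of (day, code) tuples for one patient; this
    format is a hypothetical stand-in for an EHR sequence.
    Returns a list of (target_code, [context_codes]) pairs.
    """
    ordered = sorted(events)
    pairs = []
    for i, (day_i, code_i) in enumerate(ordered):
        # Context = events before and after the target, within the window.
        context = [
            code_j
            for j, (day_j, code_j) in enumerate(ordered)
            if j != i and abs(day_j - day_i) <= window_days
        ]
        pairs.append((code_i, context))
    return pairs


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Made-up event codes and days, for illustration only.
example_events = [(0, "A"), (3, "B"), (40, "C")]
example_pairs = temporal_contexts(example_events, window_days=7)
```

In a Word2Vec-style setup, (target, context) pairs like these would be the training examples for the embedding network, and cosine similarity would then be computed on the learned vectors.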

Anticipated Findings

We expect this work to support all users of AoU data who develop risk assessment models or conduct comparative effectiveness research. In both use cases, analyses need to compare exposed cohorts with statistically similar unexposed cohorts. SMILE-PD will help users create an appropriately matched control cohort.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

V7 PASC Workspace

Scientific Questions Being Studied

This project will explore the scope of patients with COVID-19 and the characteristics of patients with PASC.

Project Purpose(s)

  • Educational
  • Ancestry
  • Other Purpose (practice notebook to familiarize with RW)

Scientific Approaches

We will apply algorithms developed by the RECOVER PCORnet Adult Cohort and compare the overlap in cohorts with the set derived through the N3C algorithm.
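The description does not say how cohort overlap will be measured. As one possible sketch, assuming each algorithm yields a set of patient identifiers, the overlap could be summarized with simple set statistics. The function name and the example identifiers below are hypothetical.

```python
def cohort_overlap(cohort_a, cohort_b):
    """Summarize the overlap between two cohorts of patient identifiers."""
    a, b = set(cohort_a), set(cohort_b)
    both = a & b
    union = a | b
    return {
        "n_a": len(a),
        "n_b": len(b),
        "n_both": len(both),
        # Jaccard index: shared patients over all patients in either cohort.
        "jaccard": len(both) / len(union) if union else 0.0,
        # Fraction of each cohort that the other algorithm also selected.
        "share_of_a": len(both) / len(a) if a else 0.0,
        "share_of_b": len(both) / len(b) if b else 0.0,
    }


# Made-up identifiers standing in for the two algorithm outputs.
summary = cohort_overlap([1, 2, 3], [2, 3, 4])
```

Reporting the per-cohort shares alongside the Jaccard index helps when the two cohorts differ greatly in size, since a small cohort fully contained in a large one still yields a low Jaccard value.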

Anticipated Findings

We expect to find a high degree of concordance between the RECOVER Adult Cohort algorithm and the N3C algorithm, even though the two approaches were developed with different machine learning methods on different source patient data sets.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Lina Sulieman - Other, All of Us Program Operational Use