Cong Liu

Research Fellow, Columbia University

1 active project

Validation for OARD

Scientific Questions Being Studied

Diagnosis of rare genetic diseases often relies on phenotype-driven methods, which hinge on the accuracy and completeness of the rare disease phenotypes in the underlying annotation knowledgebase. Existing knowledgebases are often manually curated, with additional annotations drawn from published case reports. We recently developed Open Annotation for Rare Diseases (OARD), a real-world, data-derived resource with annotations for rare disease-related phenotypes. This resource is derived from the EHRs of two academic health institutions, covering more than 10 million individuals spanning wide age ranges and different disease subgroups. We have identified more than 1 million novel disease-phenotype association pairs that were previously missed by human annotation. Here, we want to leverage the data collected by the All of Us program to further validate the novel associations identified in OARD and provide more robust evidence to the rare disease community.

Project Purpose(s)

  • Population Health
  • Methods Development
  • Ancestry

Scientific Approaches

Datasets: We will use the OMOP CDM-based EHR data collected by All of Us to identify the concepts observed in different age subgroups (from demographics), genetic ancestry subgroups (from demographics), and disease subgroups (from predefined OMOP condition concepts).
Research methods: Genomics-based diagnostic pipelines rely on standardized phenotype and disease concepts, such as those from HPO, MONDO, OMIM, and Orphanet, as input. We will convert OMOP condition and measurement concepts to HPO and MONDO terms using cross-reference annotations. We will then derive prevalence and co-occurrence frequencies of concepts, as well as association measures among concepts (e.g., Jaccard index, observed/expected ratio, chi-squared statistic, relative frequency). We will compare the All of Us-derived statistics with our OARD results.
Tools: The code is written mainly in Python. We developed most of it previously for the OARD project. Because both OARD and All of Us are based on the OMOP CDM, we expect the existing code to be reusable with little modification.
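
The association measures named under Research methods can be illustrated with a minimal Python sketch. This is not the OARD code, which is not shown here. The function names, the record layout, the cross-reference table, and the placeholder concept and term identifiers are all assumptions made for illustration. The two conditional frequencies are one possible reading of "relative frequency".

```python
from collections import Counter, defaultdict
from itertools import combinations


def map_to_ontology(records, xref):
    """Group mapped ontology terms by person.

    records: iterable of (person_id, omop_concept_id) pairs (assumed layout).
    xref: dict from OMOP concept id to a set of HPO/MONDO term ids
          (hypothetical cross-reference table).
    """
    patients = defaultdict(set)
    for person_id, omop_id in records:
        # Unmapped OMOP concepts add no terms, but the person still
        # counts toward the population size used as the denominator.
        patients[person_id].update(xref.get(omop_id, ()))
    return patients


def association_stats(patients):
    """Return per-term prevalence and pairwise association measures."""
    n = len(patients)
    single = Counter()
    pair = Counter()
    for terms in patients.values():
        single.update(terms)
        pair.update(combinations(sorted(terms), 2))

    prevalence = {term: count / n for term, count in single.items()}
    pairs = {}
    for (a, b), n_ab in pair.items():
        n_a, n_b = single[a], single[b]
        # Pearson chi-squared for the 2x2 table, without continuity correction.
        o12, o21 = n_a - n_ab, n_b - n_ab
        o22 = n - n_a - n_b + n_ab
        denom = n_a * n_b * (n - n_a) * (n - n_b)
        chi2 = n * (n_ab * o22 - o12 * o21) ** 2 / denom if denom else 0.0
        pairs[(a, b)] = {
            "co_occurrence": n_ab,
            "jaccard": n_ab / (n_a + n_b - n_ab),
            "obs_exp_ratio": n_ab / (n_a * n_b / n),
            "chi2": chi2,
            "freq_b_given_a": n_ab / n_a,
            "freq_a_given_b": n_ab / n_b,
        }
    return prevalence, pairs


# Toy data with placeholder identifiers (not real OMOP, HPO, or MONDO ids).
records = [(1, 101), (1, 201), (2, 101), (2, 201), (3, 201), (4, 202)]
xref = {101: {"MONDO:D1"}, 201: {"HP:P1"}, 202: {"HP:P2"}}
prevalence, pairs = association_stats(map_to_ontology(records, xref))
```

Pairs are keyed by the sorted tuple of term ids, so each unordered pair appears once. A production version would also need to respect any small-count reporting rules that apply to the data source.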

Anticipated Findings

By comparing the pairwise associations derived from OARD and All of Us, we expect to identify true signals (i.e., edges of the phenotype-disease knowledge graph) previously missed by human curation and the case report literature. The enhanced and extended phenotype-disease knowledge graph will provide a shareable resource with better phenotypic annotation coverage for the rare disease research community, and will complement the existing HPO-Jax-based diagnosis pipelines with novel disease-phenotype associations that have not been reported in current knowledgebases. We expect other researchers to incorporate the findings from this study to improve their existing phenotype-driven rare disease diagnosis methods or to design novel data-driven rare disease diagnosis algorithms.
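
One way to picture the comparison is a replication filter over the two sets of pairwise statistics. The sketch below is only an illustration under stated assumptions. The project description does not specify a replication criterion, so the observed/expected threshold, the function name, and the input layout are hypothetical.

```python
def replicated_novel_pairs(oard, all_of_us, curated, min_ratio=2.0):
    """Return pairs absent from the curated set that pass in both sources.

    oard, all_of_us: dicts from (term_a, term_b) to an observed/expected
        ratio (assumed layout).
    curated: set of pairs already present in the curated knowledgebase.
    min_ratio: hypothetical threshold, not taken from the project.
    """
    shared = oard.keys() & all_of_us.keys()
    return sorted(
        pair
        for pair in shared
        if pair not in curated
        and oard[pair] >= min_ratio
        and all_of_us[pair] >= min_ratio
    )


# Toy inputs with placeholder identifiers.
oard = {
    ("HP:P1", "MONDO:D1"): 5.0,
    ("HP:P2", "MONDO:D1"): 3.0,
    ("HP:P3", "MONDO:D1"): 4.0,
}
all_of_us = {
    ("HP:P1", "MONDO:D1"): 2.5,
    ("HP:P2", "MONDO:D1"): 1.1,
    ("HP:P3", "MONDO:D1"): 6.0,
}
curated = {("HP:P3", "MONDO:D1")}
novel = replicated_novel_pairs(oard, all_of_us, curated)
```

In this toy example, only the first pair is both uncurated and above the threshold in both sources. A real analysis would likely also account for multiple testing and for differences in cohort composition between the two sources.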

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Data Set Used

Controlled Tier

Research Team

Owner:

  • Cong Liu - Research Fellow, Columbia University