Francis Ratsimbazafy
All of Us Program Operational Use
4 active projects
Duplicate of How to Get Started with Registered Tier Data (tier 5)
Scientific Questions Being Studied
We recommend that all researchers explore the notebooks in this workspace to learn the basics of All of Us Program Data.
What should you expect? This notebook will give you an overview of what data is available in the current Curated Data Repository (CDR). It will also teach you how to retrieve information about Electronic Health Record (EHR), Physical Measurements (PM), and Survey data.
Project Purpose(s)
- Educational
- Methods Development
- Other Purpose (This is an All of Us Tutorial Workspace. It is meant to provide instruction for key Researcher Workbench components and All of Us data representation.)
Scientific Approaches
This Tutorial Workspace contains two Jupyter Notebooks (one written in Python, the other in R). Each notebook is divided into the following sections:
1. Setup: How to set up this notebook, install and import software packages, and select the correct version of the CDR.
2. Data Availability Part 1: How to summarize the number of unique participants with each major data type: Physical Measurements, Survey, and EHR.
3. Data Availability Part 2: How to delve a little deeper into data availability within each major data type.
4. Data Organization: An explanation of how data is organized according to our common data model.
5. Example Queries: How to directly query the CDR, using two examples of SQL queries to extract demographic data.
6. Expert Tip: How to access the base version of the CDR, for users who want to do their own cleaning.
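As a rough illustration of the "Data Availability" steps listed above, the sketch below counts unique participants per data type. The records, the data type labels, and the function name `participants_by_type` are hypothetical and are not taken from the tutorial notebooks; the real notebooks query the CDR rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical toy records: (person_id, data_type). Not real All of Us data.
records = [
    (1, "survey"), (1, "ehr"), (1, "physical_measurement"),
    (2, "survey"),
    (3, "survey"), (3, "ehr"), (3, "ehr"),  # repeated rows count once per participant
]


def participants_by_type(rows):
    """Map each data type to the set of distinct participants that have it."""
    by_type = defaultdict(set)
    for person_id, data_type in rows:
        by_type[data_type].add(person_id)
    return dict(by_type)


by_type = participants_by_type(records)

# Number of unique participants contributing each data type.
counts = {data_type: len(ids) for data_type, ids in by_type.items()}

# Participants who contributed every data type seen in the records.
with_all_types = set.intersection(*by_type.values())

print(counts)
print(with_all_types)
```

Using sets rather than row counts matters here because a participant can have many rows of the same data type.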
Anticipated Findings
By reading and running the notebooks in this Tutorial Workspace, you will understand the following:
All of Us data are made available in a Curated Data Repository. Participants may contribute any combination of survey, physical measurement, and electronic health record data. Not all participants contribute all possible data types. Each unique piece of health information is given a unique identifier called a concept_id and organized into specific tables according to our common data model. You can use these concept_ids to query the CDR and pull data on specific health information relevant to your analysis. See our support article Learning the Basics of the All of Us Dataset for more info.
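To make the role of concept_ids more concrete, here is a minimal sketch of a demographic query that joins a participant table to a concept lookup table. It runs against a tiny in-memory SQLite database, not the CDR. The table and column names (`person`, `concept`, `gender_concept_id`) and the two concept rows are assumptions modeled on common-data-model conventions, and all rows are invented for illustration.

```python
import sqlite3

# In-memory stand-in for the CDR; every row below is invented toy data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT);
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        gender_concept_id INTEGER,
        year_of_birth INTEGER
    );
    INSERT INTO concept VALUES (8507, 'MALE'), (8532, 'FEMALE');
    INSERT INTO person VALUES (1, 8532, 1960), (2, 8507, 1975), (3, 8532, 1988);
    """
)

# Resolve each participant's gender concept_id to a readable name, then count.
query = """
SELECT c.concept_name AS gender,
       COUNT(DISTINCT p.person_id) AS n_participants
FROM person AS p
JOIN concept AS c ON c.concept_id = p.gender_concept_id
GROUP BY c.concept_name
ORDER BY c.concept_name
"""
gender_counts = conn.execute(query).fetchall()
conn.close()

print(gender_counts)
```

The same join pattern applies to any table that stores a concept_id: the identifier is what gets queried, and the lookup table supplies the human-readable label.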
Demographic Categories of Interest
This study will not center on underrepresented populations.
Data Set Used
Registered Tier
Research Team
Owner:
- Francis Ratsimbazafy - Other, All of Us Program Operational Use
- Neha Saxena - Undergraduate Student, All of Us Program Operational Use
- Jun Qian - Other, All of Us Program Operational Use
- Giselle Routhier - Research Fellow, New York University, Grossman School of Medicine
Systemic Disease and Glaucoma (Cloned)
Scientific Questions Being Studied
We have previously published a predictive model of glaucoma progression using electronic health record (EHR) data pertaining to systemic attributes from a single institution. We aim to use the All of Us dataset to 1) externally validate this single-center model and 2) train new models focused on predicting glaucoma progression using systemic predictors. This is important for understanding whether the original findings are generalizable, and it will provide additional knowledge about the utility of systemic predictors in a national-level dataset.
Project Purpose(s)
- Disease Focused Research (Primary open angle glaucoma)
- Other Purpose (This work is the result of an All of Us Research Program Demonstration Project. Demonstration Projects are efforts by the All of Us Research Program designed to meet the goal of ensuring the quality and utility of the Research Hub as a resource for accelerating precision medicine. This work has been approved, reviewed, and overseen by the All of Us Research Program Science Committee and Data and Research Center to ensure compliance with program policy.)
Scientific Approaches
We plan to primarily work with EHR data contained in All of Us for a cohort of adult participants diagnosed with primary open-angle glaucoma. We will extract data on systemic conditions and medications for this cohort, as well as physical measurements and vital signs. We will clean the data such that the format is consistent with the data from our previously published model. Then, we will use this data as an external validation of a logistic regression model derived from our prior study that was based at a single academic center. Next, we will use All of Us data to train a new set of models, using techniques such as logistic regression, random forests, and artificial neural networks. We will optimize these models using feature selection methods and class balancing procedures. By evaluating performance metrics such as area under the curve (AUC), precision, recall, and accuracy, we will assess whether we can achieve superior predictive performance when training models using All of Us.
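The performance metrics named above can be computed as in the following sketch. It is an illustration only, not the project's evaluation code: the labels and scores are invented, the 0.5 decision threshold is an assumption, and the function names are hypothetical. AUC is computed here as the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def precision_recall_accuracy(y_true, y_pred):
    """Precision, recall, and accuracy; undefined ratios fall back to 0.0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, accuracy


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Invented example: 1 = progressed, 0 = did not progress.
y_true = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.4, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

precision, recall, accuracy = precision_recall_accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(precision, recall, accuracy, auc)
```

Unlike precision, recall, and accuracy, AUC does not depend on the chosen threshold, which is one reason it is often reported alongside them when classes are imbalanced.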
Anticipated Findings
We anticipate that the All of Us data will validate the findings from the model, which demonstrated that blood pressure-related metrics and certain medication classes had predictive value for glaucoma progression. In addition, we anticipate that the models trained with All of Us data will outperform the model trained with single institution data due to larger sample size and greater diversity. These findings will support further investigation in understanding the relationship between systemic conditions like blood pressure with glaucoma progression.
Demographic Categories of Interest
This study will not center on underrepresented populations.
Data Set Used
Registered Tier
Research Team
Owner:
- Tsung-Ting Kuo - Early Career Tenure-track Researcher, University of California, San Diego
- Sally Baxter - Research Fellow, University of California, San Diego
- Roxana Loperena Cortes - Other, All of Us Program Operational Use
- Francis Ratsimbazafy - Other, All of Us Program Operational Use
- Paulina Paul - Project Personnel, University of California, San Diego
- Melissa Patrick - Project Personnel, All of Us Program Operational Use
- Lucila Ohno-Machado
- Luca Bonomi - Project Personnel, Vanderbilt University Medical Center
- Kelsey Mayo - Other, All of Us Program Operational Use
- Jihoon Kim - Project Personnel, University of California, San Diego
- Bharanidharan Radha Saseendrakumar - Project Personnel, University of California, San Diego
- Ashley Able - Other, All of Us Program Operational Use
Collaborators:
- Chenjie Zeng - Research Fellow, National Human Genome Research Institute (NIH-NHGRI)
Duplicate of Cancer
Scientific Questions Being Studied
We intend to explore how the prevalence of cancer differs across subsets of the AoU population. In particular, we will compare the entire population, the subset with medical records, and the subset with self-reported data.
Project Purpose(s)
- Population Health
Scientific Approaches
We intend to select a list of SNOMED codes corresponding to primary cancers to identify the subset with cancer in the medical record.
We intend to select the survey question asking about self-reported cancer to identify the subset with self-reported cancer.
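The comparison described above can be sketched with plain set operations. All participant IDs below are invented, and the variable and function names are hypothetical; the sketch only shows how prevalence depends on which subset is used as the denominator.

```python
# Invented participant IDs; not real All of Us data.
all_participants = set(range(1, 11))
has_ehr = {1, 2, 3, 4, 5, 6}                   # participants with medical records
answered_survey = {3, 4, 5, 6, 7, 8, 9, 10}    # participants with self-reported data
ehr_cancer = {1, 3, 4}                         # cancer SNOMED code in the record
self_reported_cancer = {3, 5, 8}               # cancer reported on the survey


def prevalence(cases, denominator):
    """Share of the denominator population that appears among the cases."""
    return len(cases & denominator) / len(denominator)


prev_ehr = prevalence(ehr_cancer, has_ehr)
prev_self = prevalence(self_reported_cancer, answered_survey)
prev_any = prevalence(ehr_cancer | self_reported_cancer, all_participants)

# Agreement between the two sources.
both_sources = ehr_cancer & self_reported_cancer
ehr_only = ehr_cancer - self_reported_cancer
self_report_only = self_reported_cancer - ehr_cancer

print(prev_ehr, prev_self, prev_any)
print(both_sources, ehr_only, self_report_only)
```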
Anticipated Findings
We expect the prevalence of cancer to differ between self-report and the medical record, which could have implications for how cancer is measured at the population level.
Demographic Categories of Interest
This study will not center on underrepresented populations.
Data Set Used
Registered Tier
Research Team
Owner:
- Sameep Shah - Project Personnel, University of Chicago
- Francis Ratsimbazafy - Other, All of Us Program Operational Use
- Ashley Able - Other, All of Us Program Operational Use
Collaborators:
- Jun Qian - Other, All of Us Program Operational Use
Demo - Hypertension Prevalence
Scientific Questions Being Studied
We are using the All of Us Researcher Workbench interface to answer the question, "Is hypertension prevalence in the All of Us Research Program similar to hypertension prevalence in the 2015–2016 National Health and Nutrition Examination Survey (NHANES)?" Clinical approaches to understanding and treating hypertension may benefit from a precision medicine approach that integrates data on environments, social determinants of health, behaviors, and genomic factors that contribute to hypertension risk. Hypertension is a major public health concern and remains a leading risk factor for stroke and cardiovascular disease.
Project Purpose(s)
- Other Purpose (This work is an AoU demo project. Demo projects are efforts by the AoU Research Program designed to meet the program goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. As an approved demo project, this work was reviewed and overseen by the AoU Research Program Science Committee and the AoU Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)
Scientific Approaches
In this cross-sectional, population-based study, we used All of Us baseline data for adults (age > 18) from patient-provided information (PPI) surveys and electronic health record (EHR) blood pressure measurements, and retrospectively examined the prevalence of hypertension in the EHR cohort using Systematized Nomenclature of Medicine (SNOMED) codes and blood pressure medications recorded in the EHR. We used the EHR data (SNOMED codes on 2 distinct dates and at least one hypertension medication) as the primary definition, and then added subjects with elevated systolic or elevated diastolic blood pressure on measurements 2 and 3 from PPI. We extracted the dates of each participant’s SNOMED codes for essential hypertension from the Researcher Workbench table ‘cb_search_all_events’. We calculated an age-standardized HTN prevalence according to the age distribution of the U.S. Census, using 3 groups (18-39, 40-59, ≥ 60).
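The primary EHR definition and the age standardization described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the function names are hypothetical, the case counts are invented, and the age-group weights are placeholders rather than actual U.S. Census proportions. The sketch covers only the EHR-based definition and leaves out the added blood pressure measurement criterion. Standardization is done by the direct method, a weighted sum of age-specific prevalences.

```python
from datetime import date


def meets_ehr_definition(diagnosis_dates, n_htn_medications):
    """Hypertension codes on >= 2 distinct dates and >= 1 hypertension medication."""
    return len(set(diagnosis_dates)) >= 2 and n_htn_medications >= 1


def age_standardized_prevalence(cases, totals, weights):
    """Direct standardization: sum of weight * (cases / total) over age groups."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("standard population weights must sum to 1")
    return sum(weights[g] * cases[g] / totals[g] for g in weights)


# Invented counts per age group; not study results.
cases = {"18-39": 50, "40-59": 150, "60+": 300}
totals = {"18-39": 500, "40-59": 500, "60+": 500}
# Placeholder weights for the standard population (NOT real Census values).
weights = {"18-39": 0.40, "40-59": 0.35, "60+": 0.25}

crude = sum(cases.values()) / sum(totals.values())
standardized = age_standardized_prevalence(cases, totals, weights)

# Two codes on the same date do not satisfy the two-distinct-dates rule.
same_day = meets_ehr_definition([date(2019, 1, 5), date(2019, 1, 5)], 1)
two_days = meets_ehr_definition([date(2019, 1, 5), date(2019, 6, 2)], 1)

print(round(crude, 4), round(standardized, 4), same_day, two_days)
```

In this toy example the crude and standardized values differ because the sample's age mix does not match the assumed standard population, which is the reason for standardizing before comparing against another survey.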
Anticipated Findings
The prevalence of hypertension in the All of Us cohort is similar to that reported in the published literature. All of Us age-adjusted HTN prevalence was 27.9%, compared to 29.6% in the National Health and Nutrition Examination Survey. The All of Us cohort is a growing source of diverse longitudinal data that can be utilized to study hypertension nationwide. The prevalence of hypertension varies in the United States (U.S.) by age, sex, and socioeconomic status. Hypertension can often be treated successfully with medication, and prevented or delayed with lifestyle modifications. Even with these established hypertension intervention and prevention strategies, the prevalence of hypertension continues to be at levels of public health concern. The diversity within All of Us may provide insight into factors relevant to hypertension prevention and treatments in a variety of social and geographic contexts and population strata in the U.S.
Demographic Categories of Interest
This study will not center on underrepresented populations.
Data Set Used
Registered Tier
Research Team
Owner:
- yonghao yu - Administrator, All of Us Program Operational Use
- Francis Ratsimbazafy - Other, All of Us Program Operational Use
- Jun Qian - Other, All of Us Program Operational Use