Christopher Lord
Project Personnel, All of Us Program Operational Use
13 active projects
Duplicate of Workshop: Intro to All of Us Genomics Data
Scientific Questions Being Studied
This workspace is meant to help researchers get familiar with the All of Us Researcher Workbench. There are five hands-on exercises during the workshop, each with a specific notebook.
Exercise 1: Duplicate the workspace & start the cloud environment
Exercise 2: Looking at the genomic data (notebook)
Exercise 3: GWAS - extracting phenotypic data (notebook)
Exercise 4: GWAS - running Hail GWAS (notebook)
Exercise 5: Advanced GWAS (2 notebooks)
By running the exercises in this workspace, researchers will become more familiar with the genomic data, know how to access the genomic data, see how the genomic data and tools can be used in the Researcher Workbench, and be able to start their own genomic data project.
Project Purpose(s)
- Other Purpose (This workspace is meant for use during the Introduction to Analyzing All of Us Genomic Data workshop. In this workshop, participants will get hands-on experience using the genomic data to run a genome-wide association study (GWAS) with Hail.)
Scientific Approaches
We are using the All of Us dataset to run a genome-wide association study (GWAS) using Hail. In the workshop, we will give an introduction to the All of Us Researcher Workbench and demonstrate how to use the Cohort Builder and Jupyter Notebooks to set up a research project. Using Jupyter notebooks, we will create a dataset linking the All of Us phenotypic data to the short-read whole genome sequencing (srWGS) data. After running the GWAS steps in Hail, we will visualize the results.
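At its core, a GWAS runs one association test per variant. The sketch below shows that computation in plain Python as a simplified illustration. For each variant, it regresses the phenotype (height, in this workshop) on genotype dosage, meaning the count of alternate alleles (0, 1, or 2), and it reports the slope, its standard error, and a t-statistic. The variant IDs, dosages, and heights are made up, and covariates are omitted. This is not Hail's implementation.

```python
import math


def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = alpha + beta * x; returns (beta, se, t)."""
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need matching x and y with at least 3 samples")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("genotype has no variance (monomorphic variant)")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    # Residual variance uses n - 2 degrees of freedom (intercept and slope).
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    t = beta / se if se > 0 else math.copysign(math.inf, beta)
    return beta, se, t


def toy_gwas(genotypes, phenotype):
    """One regression per variant; genotypes maps variant ID -> alt-allele dosages."""
    return {vid: simple_linear_regression(d, phenotype) for vid, d in genotypes.items()}


# Made-up example: 6 participants, 2 variants, height in cm.
heights_cm = [160.0, 165.0, 170.0, 172.0, 168.0, 180.0]
genotypes = {
    "chr1:1000:A:G": [0, 0, 1, 1, 1, 2],
    "chr2:2000:C:T": [1, 0, 1, 0, 1, 0],
}
for vid, (beta, se, t) in toy_gwas(genotypes, heights_cm).items():
    print(f"{vid}\tbeta={beta:.2f}\tse={se:.2f}\tt={t:.2f}")
```

In Hail, `hl.linear_regression_rows` runs this regression for every variant in a MatrixTable. It also accepts covariates, and the intercept must be passed explicitly as `1.0`; this toy version leaves both out.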
Anticipated Findings
This study runs a genome-wide association study (GWAS) using Hail, with height as the selected phenotype. We do not anticipate findings from this example workspace, but we expect that workshop participants will be able to apply similar methods to their future research.
Demographic Categories of Interest
This study will not center on underrepresented populations.
Data Set Used
Controlled Tier
Research Team
Owner:
- Ghada Soliman - Other, City University of New York (CUNY)
- Jennifer Zhang - Project Personnel, All of Us Program Operational Use
- Christopher Lord - Project Personnel, All of Us Program Operational Use
- Chris Lord - Project Personnel, All of Us Program Operational Use
Collaborators:
- Genevieve Brandt - Project Personnel, All of Us Program Operational Use
Duplicate of Data Wrangling in All of Us Program (v7)
Scientific Questions Being Studied
For educational purposes: to show best practices when using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, optimization), along with other research-support issues that are useful to many AoU researchers.
Project Purpose(s)
- Educational
- Other Purpose (For use with office hours: notebooks for adding code snippets that are useful to researchers. This is a placeholder for creating best-practice notebooks, among other things.)
Scientific Approaches
For educational purposes: to show best practices when using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, optimization), along with other research-support issues that are useful to many AoU researchers.
Anticipated Findings
For educational purposes: to show best practices when using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, optimization), along with other research-support issues that are useful to many AoU researchers.
Demographic Categories of Interest
This study will not center on underrepresented populations.
Data Set Used
Registered Tier
Research Team
Owner:
- Izabelle Humes - Other, All of Us Program Operational Use
- Obinna Theophilus Nwankwo - Other, New York City Health & Hospitals
- Christopher Lord - Project Personnel, All of Us Program Operational Use
- Aymone Kouame - Other, All of Us Program Operational Use
Collaborators:
- Jun Qian - Other, All of Us Program Operational Use
regenie_ldl_gwas_with_cromwell_aouv7_controlled
Scientific Questions Being Studied
The goal of this workspace is to provide another example of how to do a GWAS analysis, this time using Cromwell to run tools outside of the notebook.
Project Purpose(s)
- Educational
Scientific Approaches
This workspace will feature notebooks using Cromwell to run regenie via WDL and a notebook to do the analysis of the regenie GWAS results. The phenotype of interest is LDL cholesterol and we'll be using participant age and sex assigned at birth as covariates along with the top 15 ancestry PCs.
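regenie reads phenotypes and covariates from whitespace-delimited text files whose first two columns are `FID` and `IID`. The sketch below writes both files using only Python's standard library. Several details are illustrative assumptions, not the workspace's actual code: the column names (`LDL`, `age`, `sex_at_birth`, `PC1` to `PC15`), the 0/1 sex coding, the participant records, and the use of the person ID for both `FID` and `IID`. The resulting files would then be passed to regenie with `--phenoFile` and `--covarFile`.

```python
import csv
import io

N_PCS = 15  # top 15 ancestry PCs, as described above


def write_regenie_inputs(records, pheno_out, covar_out):
    """Write space-delimited phenotype and covariate tables for regenie.

    records: list of dicts with hypothetical keys 'person_id', 'ldl',
    'age', 'sex_at_birth' (coded 0/1 here) and 'pcs' (list of N_PCS floats).
    A missing LDL value is written as NA.
    """
    pheno = csv.writer(pheno_out, delimiter=" ", lineterminator="\n")
    covar = csv.writer(covar_out, delimiter=" ", lineterminator="\n")
    pheno.writerow(["FID", "IID", "LDL"])
    pc_cols = [f"PC{i}" for i in range(1, N_PCS + 1)]
    covar.writerow(["FID", "IID", "age", "sex_at_birth"] + pc_cols)
    for r in records:
        if len(r["pcs"]) != N_PCS:
            raise ValueError(f"expected {N_PCS} PCs for {r['person_id']}")
        pid = str(r["person_id"])
        ldl = "NA" if r["ldl"] is None else r["ldl"]
        pheno.writerow([pid, pid, ldl])
        covar.writerow([pid, pid, r["age"], r["sex_at_birth"]] + list(r["pcs"]))


# Made-up records; in practice these would come from the phenotype query.
records = [
    {"person_id": 1001, "ldl": 120.5, "age": 54, "sex_at_birth": 0, "pcs": [0.01] * N_PCS},
    {"person_id": 1002, "ldl": None, "age": 61, "sex_at_birth": 1, "pcs": [-0.02] * N_PCS},
]
pheno_buf, covar_buf = io.StringIO(), io.StringIO()
write_regenie_inputs(records, pheno_buf, covar_buf)
print(pheno_buf.getvalue())
```

To write real files instead of in-memory buffers, pass handles from `open("pheno.txt", "w", newline="")`. These file names are also hypothetical.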
Anticipated Findings
I expect to be able to reproduce the GWAS results found in TOPMed and in the v6 regenie featured workspace. Because the v7 data contain more samples than v6, there will likely be additional hits beyond those discovered in v6, owing to the increased statistical power.
Demographic Categories of Interest
This study will not center on underrepresented populations.
Data Set Used
Controlled Tier
Research Team
Owner:
- Laura Gauthier - Project Personnel, Broad Institute
- Christopher Lord - Project Personnel, All of Us Program Operational Use
Collaborators:
- Sophie Schwartz - Project Personnel, All of Us Program Operational Use
- Jun Qian - Other, All of Us Program Operational Use
- Genevieve Brandt - Project Personnel, All of Us Program Operational Use
- Chris Lord - Project Personnel, All of Us Program Operational Use
Duplicate of Workshop: Intro to All of Us Genomics Data
Scientific Questions Being Studied
This workspace is meant to help researchers get familiar with the All of Us Researcher Workbench. There are five hands-on exercises during the workshop, each with a specific notebook.
Exercise 1: Duplicate the workspace & start the cloud environment
Exercise 2: Looking at the genomic data (notebook)
Exercise 3: GWAS - extracting phenotypic data (notebook)
Exercise 4: GWAS - running Hail GWAS (notebook)
Exercise 5: Advanced GWAS (2 notebooks)
By running the exercises in this workspace, researchers will become more familiar with the genomic data, know how to access the genomic data, see how the genomic data and tools can be used in the Researcher Workbench, and be able to start their own genomic data project.
Project Purpose(s)
- Other Purpose (This workspace is meant for use during the Introduction to Analyzing All of Us Genomic Data workshop. In this workshop, participants will get hands-on experience using the genomic data to run a genome-wide association study (GWAS) with Hail.)
Scientific Approaches
We are using the All of Us dataset to run a genome-wide association study (GWAS) using Hail. In the workshop, we will give an introduction to the All of Us Researcher Workbench and demonstrate how to use the Cohort Builder and Jupyter Notebooks to set up a research project. Using Jupyter notebooks, we will create a dataset linking the All of Us phenotypic data to the short-read whole genome sequencing (srWGS) data. After running the GWAS steps in Hail, we will visualize the results.
Anticipated Findings
This study runs a genome-wide association study (GWAS) using Hail, with height as the selected phenotype. We do not anticipate findings from this example workspace, but we expect that workshop participants will be able to apply similar methods to their future research.
Demographic Categories of Interest
This study will not center on underrepresented populations.
Data Set Used
Controlled Tier
Research Team
Owner:
- Jennifer Zhang - Project Personnel, All of Us Program Operational Use
- Tabitha Harrison - Graduate Trainee, University of Washington
- Christopher Lord - Project Personnel, All of Us Program Operational Use
- Chris Lord - Project Personnel, All of Us Program Operational Use
Collaborators:
- Genevieve Brandt - Project Personnel, All of Us Program Operational Use
Duplicate of How to Work with All of Us Genomic Data (Hail - Plink)(v7)
Scientific Questions Being Studied
Not applicable - these notebooks demonstrate, through example analyses, how to use Hail and PLINK to perform genome-wide association studies using the All of Us genomic and phenotypic data.
Project Purpose(s)
- Other Purpose (Demonstrate to All of Us Researcher Workbench users how to get started with the All of Us genomic data and tools. It includes an overview of all the All of Us genomic data and shows some simple examples of how to use these data.)
Scientific Approaches
Not applicable - these notebooks demonstrate, through example analyses, how to use Hail and PLINK to perform genome-wide association studies using the All of Us genomic and phenotypic data.
Anticipated Findings
Not applicable - these notebooks demonstrate, through example analyses, how to use Hail and PLINK to perform genome-wide association studies using the All of Us genomic and phenotypic data.
Demographic Categories of Interest
This study will not center on underrepresented populations.
Data Set Used
Controlled Tier
Research Team
Owner:
- Jennifer Zhang - Project Personnel, All of Us Program Operational Use
- Tabitha Harrison - Graduate Trainee, University of Washington
- Christopher Lord - Project Personnel, All of Us Program Operational Use
Polygenic_Risk_Score_Genetic_Ancestry_Calibration
Scientific Questions Being Studied
Polygenic risk scores (PRS) are available for a wide array of traits and conditions, offering many potential applications including preventative medicine. There is, however, a serious concern that clinical use of PRS could contribute to health disparities due to the poorer performance of PRS in non-European ancestry individuals.
We aim to improve our ability to correct the genetic ancestry-dependent bias in PRS for 10 conditions (Asthma, Atrial fibrillation, Breast Cancer, Chronic Kidney Disease, Coronary heart disease, Hypercholesterolemia, Obesity/BMI, Prostate cancer, Type 1 Diabetes, Type 2 Diabetes). We will use the AoU dataset to produce a resource that can be used to reduce the ancestry-dependent bias in these 10 PRS. This resource will initially be used by the eMERGE IV consortium, which is an NIH-funded consortium of clinical centers across the United States, with an aim to enroll a prospective cohort of 25,000 individuals.
Project Purpose(s)
- Control Set
Scientific Approaches
Arrays will be imputed using the phasing and imputation tools Eagle2 and Minimac4. Polygenic risk scores will then be calculated using the population genomics tool PLINK. A simple linear model, which attempts to describe the macroscopic relationship between genetic ancestry and observed polygenic scores, will then be fit to the scores. The fitted parameters of this model can then be used to reduce genetic ancestry-dependent bias when calculating these scores in a clinical setting.
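The scoring and calibration steps can be sketched in plain Python. A polygenic score is a weighted sum of effect-allele dosages. The sketch fits ordinary least squares of the raw scores on a single genetic ancestry principal component (PC1). It then takes each participant's residual, divided by the residual standard deviation, as an ancestry-adjusted score. This is a simplified stand-in, since the text above describes the project's model only as a "simple linear model". The single PC, the weights, the dosages, and the PC values are all hypothetical, and in the project the scores come from PLINK rather than from this code.

```python
import math


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0-2) over the scored variants."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must be the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def fit_line(x, y):
    """Ordinary least squares fit y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def ancestry_adjusted_scores(raw_scores, pc1):
    """Remove the part of each score predicted by PC1, then scale to unit SD."""
    a, b = fit_line(pc1, raw_scores)
    residuals = [s - (a + b * p) for s, p in zip(raw_scores, pc1)]
    sd = math.sqrt(sum(r * r for r in residuals) / (len(residuals) - 1))
    return [r / sd for r in residuals], (a, b)


weights = [0.12, -0.05, 0.30]  # hypothetical per-variant effect sizes
participants = {  # (dosages, PC1); made-up values for illustration
    "p1": ([0, 1, 2], -0.8),
    "p2": ([1, 1, 1], -0.2),
    "p3": ([2, 0, 1], 0.3),
    "p4": ([2, 2, 2], 0.9),
}
raw = [polygenic_score(d, weights) for d, _ in participants.values()]
pcs = [pc for _, pc in participants.values()]
adjusted, (intercept, slope) = ancestry_adjusted_scores(raw, pcs)
print(f"intercept={intercept:.3f} slope={slope:.3f}")
print([round(s, 2) for s in adjusted])
```

The fitted `(intercept, slope)` pair plays the role of the "fitted parameters" described above. A new individual's raw score could be adjusted with the same pair without refitting. A fuller model might also let the score's variance depend on ancestry, which this sketch does not attempt.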
Anticipated Findings
We will produce a set of fitted parameters for a simple model which attempts to describe the macroscopic relationship between genetic ancestry and observed polygenic scores. The fitted parameters of this model can then be used as a resource to reduce genetic ancestry-dependent bias when calculating these scores in a clinical setting.
Demographic Categories of Interest
- Race / Ethnicity
Data Set Used
Controlled Tier
Research Team
Owner:
- Niall Lennon - Other, Broad Institute
- Jun Qian - Other, All of Us Program Operational Use
- Chris Kachulis - Project Personnel, Broad Institute
- Christopher Lord - Project Personnel, All of Us Program Operational Use
- Ashley Green - Project Personnel, All of Us Program Operational Use
Collaborators:
- Michael Gatzen - Project Personnel, Broad Institute
- Fabio Cunial - Project Personnel, Broad Institute
Demo Project: State-level Activity Inequality [Published Work]
Scientific Questions Being Studied
How is physical activity distributed within states in the US? Analysis of such activity distributions and inequality can reveal important relationships between physical activity disparities, health outcomes, and modifiable factors, as Althoff et al. studied in their paper, "Large-scale physical activity data reveal worldwide activity inequality" (2017).
Project Purpose(s)
- Educational
Scientific Approaches
The cohort will consist of Fitbit users in the US, with analysis being subdivided to the state level. Various graphs will be utilized to help visualize the low- and high-activity trends across states. Well-defined measures such as the Gini coefficient will be used to aid in the analysis of activity inequality.
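For reference, the Gini coefficient can be computed for any set of non-negative values, such as each participant's average daily step count within a state. Sort the values, then apply the rank formula G = sum over i of (2i - n - 1) * x_(i), divided by n times the total, with ranks i = 1..n. G is 0 when everyone is equally active and approaches 1 as activity concentrates in a few people. The sketch below is a plain-Python illustration with made-up state labels and step counts; it is not the workspace's code.

```python
from collections import defaultdict


def gini(values):
    """Gini coefficient of non-negative values using the sorted-rank formula."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    if xs[0] < 0:
        raise ValueError("values must be non-negative")
    # G = sum_i (2i - n - 1) * x_(i) / (n * sum(x)), ranks i = 1..n
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


def gini_by_state(rows):
    """rows: iterable of (state, average_daily_steps) pairs, one per participant."""
    by_state = defaultdict(list)
    for state, steps in rows:
        by_state[state].append(steps)
    return {state: gini(steps) for state, steps in by_state.items()}


# Made-up data: equal activity in one state, unequal in the other.
rows = [("WA", 8000), ("WA", 8000), ("WA", 8000), ("TX", 2000), ("TX", 12000)]
print(gini_by_state(rows))
```

This is the population form of the coefficient. Some analyses multiply it by n / (n - 1) as a small-sample correction.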
Anticipated Findings
The study aims to find relationships between activity inequality and health outcomes, such as obesity levels. With the growing accessibility of fitness trackers and activity sensors built into personal devices, this study hopes to leverage the volume of available data and potentially inform measures to improve population activity and health.
Demographic Categories of Interest
This study will not center on underrepresented populations.
Data Set Used
Controlled Tier
Research Team
Owner:
- Hiral Master - Project Personnel, All of Us Program Operational Use
- Hayoung Jeong - Graduate Trainee, Duke University
- Christopher Lord - Project Personnel, All of Us Program Operational Use
- Aymone Kouame - Other, All of Us Program Operational Use
Duplicate of How to Work with All of Us Genomic Data (Hail - Plink)(v7)
Scientific Questions Being Studied
Not applicable - these notebooks demonstrate, through example analyses, how to use Hail and PLINK to perform genome-wide association studies using the All of Us genomic and phenotypic data.
Project Purpose(s)
- Ancestry
- Other Purpose (Demonstrate to All of Us Researcher Workbench users how to get started with the All of Us genomic data and tools. It includes an overview of all the All of Us genomic data and shows some simple examples of how to use these data.)
Scientific Approaches
Not applicable - these notebooks demonstrate, through example analyses, how to use Hail and PLINK to perform genome-wide association studies using the All of Us genomic and phenotypic data.
Anticipated Findings
Not applicable - these notebooks demonstrate, through example analyses, how to use Hail and PLINK to perform genome-wide association studies using the All of Us genomic and phenotypic data.
Demographic Categories of Interest
This study will not center on underrepresented populations.
Data Set Used
Controlled Tier
Research Team
Owner:
- Jeffrey Haessler - Project Personnel, Fred Hutchinson Cancer Research Center
- Jennifer Zhang - Project Personnel, All of Us Program Operational Use
- Christopher Lord - Project Personnel, All of Us Program Operational Use
Duplicate of How to Work with All of Us Survey Data (v7)
Scientific Questions Being Studied
We recommend that all researchers explore the notebooks in this workspace to learn the basics of All of Us Program Data.
What should you expect?
By running the notebooks in this workspace, you should become familiar with how to query PPI questions/surveys and how to find the frequency of answers to each question in each PPI module.
Project Purpose(s)
- Educational
- Methods Development
- Other Purpose (This is an All of Us Tutorial Workspace created by the Researcher Workbench Support team. It is meant to provide instruction for key Researcher Workbench components and All of Us data representation.)
Scientific Approaches
By running the notebooks in this workspace, you should become familiar with how to query PPI questions/surveys and how to find the frequency of answers to each question in each PPI module.
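Computing answer frequencies amounts to grouping survey responses by module and question and then counting each answer. The sketch below does this with `collections.Counter` on a few in-memory rows. The field names (`survey`, `question`, `answer`) and all values are hypothetical stand-ins for the columns a survey query would return; they are not the exact All of Us schema.

```python
from collections import Counter, defaultdict


def answer_frequencies(rows):
    """Return {(survey, question): [(answer, count, percent), ...]}, most common first.

    Counts are per response row, so a multi-select question can contribute
    several answers for the same participant.
    """
    counts = defaultdict(Counter)
    for row in rows:
        counts[(row["survey"], row["question"])][row["answer"]] += 1
    summary = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        summary[key] = [
            (answer, n, round(100.0 * n / total, 1))
            for answer, n in counter.most_common()
        ]
    return summary


# Made-up responses for illustration.
rows = [
    {"person_id": 1, "survey": "Lifestyle", "question": "question_1", "answer": "No"},
    {"person_id": 2, "survey": "Lifestyle", "question": "question_1", "answer": "Yes"},
    {"person_id": 3, "survey": "Lifestyle", "question": "question_1", "answer": "No"},
    {"person_id": 3, "survey": "The Basics", "question": "question_2", "answer": "Married"},
]
for (survey, question), freqs in answer_frequencies(rows).items():
    print(survey, "|", question, freqs)
```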
Anticipated Findings
By reading and running the notebooks in this Tutorial Workspace, researchers will learn the following:
- how to query the survey data,
- how to summarize PPI modules and questions.
Demographic Categories of Interest
This study will not center on underrepresented populations.
Data Set Used
Registered Tier
Research Team
Owner:
- Jun Qian - Other, All of Us Program Operational Use
- Christopher Lord - Project Personnel, All of Us Program Operational Use
- Chenyu Li - Graduate Trainee, University of Pittsburgh
- Brandy Mapes - Other, All of Us Program Operational Use
AOU_Recover_Long_Covid_v6
Scientific Questions Being Studied
The purpose of this workspace was to implement the published XGBoost machine learning (ML) model, which was developed using the National COVID Cohort Collaborative’s (N3C) EHR repository, to identify potential patients with PASC/Long COVID in the All of Us Research Program.
Project Purpose(s)
- Disease Focused Research (Long COVID)
Scientific Approaches
To achieve this objective, data science workflows were used to apply ML algorithms on the Researcher Workbench. This effort expanded the number of participants used to evaluate the ML models that identify risk of PASC/Long COVID. It also served to validate one team's efforts while providing insight to other teams. These models were implemented within the All of Us Controlled Tier data (C2022Q2R2), which was last refreshed on June 22, 2022. We intend to provide a step-by-step guide for the implementation of N3C's ML model for identification of the PASC/Long COVID phenotype in the All of Us dataset.
Anticipated Findings
We intend to provide a step-by-step guide for the implementation of N3C's ML Model for identification of PASC/Long COVID Phenotype in the All of Us dataset.
Demographic Categories of Interest
This study will not center on underrepresented populations.
Data Set Used
Controlled Tier
Research Team
Owner:
- WeiQi Wei - Other, All of Us Program Operational Use
- Vern Kerchberger - Early Career Tenure-track Researcher, Vanderbilt University Medical Center
- Srushti Gangireddy - Project Personnel, Vanderbilt University Medical Center
- Mark Weiner - Mid-career Tenured Researcher, Cornell University
- Hiral Master - Project Personnel, All of Us Program Operational Use
- Gabriel Anaya - Administrator, National Heart, Lung, and Blood Institute (NIH - NHLBI)
- David Mohs - Other, All of Us Program Operational Use
- Christopher Lord - Project Personnel, All of Us Program Operational Use
- Chenchal Subraveti - Project Personnel, All of Us Program Operational Use
Collaborators:
- Jun Qian - Other, All of Us Program Operational Use
- Chris Lunt - Other, All of Us Program Operational Use
GeneticAncestryDemoProject
Scientific Questions Being Studied
As a demonstration project, this project will describe, characterize, and validate the extent of diversity in the All of Us cohort with respect to the participants' race & ethnicity (which are socially defined) and genetic ancestry (which can be objectively inferred from participants' genomes). Socially defined race & ethnicity and genetically inferred ancestry are both relevant to health outcomes. Race & ethnicity shape individuals’ lived experience and social environment, e.g., structural inequities, environmental injustice, and barriers to healthcare access. Genetic ancestry can affect health outcomes via differences in the frequencies of variants associated with disease and drug response. Specifically, we will ask:
1. What is the extent of racial, ethnic, and genetic diversity in the All of Us cohort?
2. How do genetic ancestry and admixture change over geography and with age in the US?
3. Are there associations between genetic ancestry and health outcomes in the All of Us cohort?
Project Purpose(s)
- Population Health
- Methods Development
- Ancestry
- Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)
Scientific Approaches
To characterize the diversity of the All of Us cohort, we analyzed participant genetic, demographic, and geographic data.
Here is a brief list of methods used:
1. All of Us participant genome-wide genotype was merged and harmonized with global reference population data.
2. Unsupervised clustering analysis techniques - Hopkins statistic, visual assessment of clustering tendency, K-means clustering & UMAP - to assess the extent of genetic structure in the cohort.
3. Supervised genetic ancestry inference using global reference populations, principal components analysis, and the Rye (Rapid ancestrY Estimation) program.
4. Genetic ancestry was compared to participants' self-identified race & ethnicity.
5. Geocoded data and participant age were used to measure how genetic ancestry and admixture vary with respect to participant geography and age.
6. Admixture regression to associate participant health outcomes, gleaned from electronic health records, with their genetic ancestry.
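Of the methods listed, the Hopkins statistic is compact enough to sketch. It compares two sets of nearest-neighbour distances. The first set (u) runs from m uniformly random points in the data's bounding box to the nearest data point. The second set (w) runs from m sampled data points to their nearest other data point. The statistic is H = sum(u) / (sum(u) + sum(w)). Values near 0.5 suggest no clustering tendency, and values near 1 suggest clustered data. Conventions differ between sources: some report 1 - H, and some raise the distances to the power of the data dimension. So this pure-Python version, run on toy 2-D points standing in for genetic PCs, is an illustration rather than the project's implementation.

```python
import math
import random


def nearest_distance(point, data, exclude_index=None):
    """Euclidean distance from point to its nearest neighbour in data."""
    best = math.inf
    for j, other in enumerate(data):
        if j == exclude_index:
            continue
        best = min(best, math.dist(point, other))
    return best


def hopkins(data, m=None, seed=0):
    """Hopkins statistic H = sum(u) / (sum(u) + sum(w)); ~0.5 random, near 1 clustered."""
    rng = random.Random(seed)
    n, dims = len(data), len(data[0])
    m = m or max(1, n // 10)
    lows = [min(p[d] for p in data) for d in range(dims)]
    highs = [max(p[d] for p in data) for d in range(dims)]
    # u: distances from uniform random points in the bounding box to the data.
    u = [nearest_distance([rng.uniform(lo, hi) for lo, hi in zip(lows, highs)], data)
         for _ in range(m)]
    # w: distances from sampled data points to their nearest *other* data point.
    w = [nearest_distance(data[i], data, exclude_index=i) for i in rng.sample(range(n), m)]
    return sum(u) / (sum(u) + sum(w))


# Toy data: two tight clusters versus points spread uniformly.
rng = random.Random(42)
clustered = ([(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(100)]
             + [(rng.gauss(5, 0.1), rng.gauss(5, 0.1)) for _ in range(100)])
uniform = [(rng.uniform(0, 5), rng.uniform(0, 5)) for _ in range(200)]
print(round(hopkins(clustered, m=20), 2), round(hopkins(uniform, m=20), 2))
```

The brute-force nearest-neighbour search is quadratic in the number of points. At cohort scale, a spatial index or an existing library implementation would be the practical choice.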
Anticipated Findings
1. The All of Us participant cohort will be racially, ethnically, and genetically diverse, consistent with the project’s aim to recruit groups underrepresented in biomedical research in support of health equity.
2. All of Us participant genetic variation will be highly structured and best modeled by clusters rather than a continuum of variation.
3. All of Us participants will show patterns of genetically inferred ancestry that are correlated with their socially defined ancestry (i.e., race and ethnicity).
4. All of Us participants’ genetic ancestry and admixture will change over geography and with age.
5. All of Us participants’ genetic ancestry will be associated with a variety of health outcomes.
Demographic Categories of Interest
- Race / Ethnicity
Data Set Used
Controlled Tier
Research Team
Owner:
- Shivam Sharma - Graduate Trainee, Georgia Institute of Technology
- Jun Qian - Other, All of Us Program Operational Use
- Christopher Lord - Project Personnel, All of Us Program Operational Use
- Ashley Green - Project Personnel, All of Us Program Operational Use
Collaborators:
- Jennifer Zhang - Project Personnel, All of Us Program Operational Use
AFib epidemiology (AOU v4)
Scientific Questions Being Studied
The overall goal of this study, as a Demonstration project, is to evaluate the ability of the All of Us Research Program data to replicate epidemiologic patterns of atrial fibrillation (AF), a common arrhythmia, previously described in other settings. We will address this goal with these two aims:
• Specific Aim 1. To determine the association of race and ethnicity with the prevalence and incidence of atrial fibrillation (AF). We hypothesize that non-whites will have lower prevalence and incidence of AF than whites.
• Specific Aim 2. To estimate associations of established risk factors for AF with the prevalence and incidence of AF. We hypothesize that increased body mass index, higher blood pressure, diabetes, smoking and a prior history of cardiovascular diseases will be associated with increased prevalence and incidence of AF.
Project Purpose(s)
- Population Health
- Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the Data and Research Center to ensure compliance with program policy, including policies for acceptable data access and use.)
Scientific Approaches
We will select all All of Us participants who self-reported their sex at birth as male or female and whose self-reported race was white, black, or Asian, as well as those who self-reported being Hispanic.
Atrial fibrillation (AF) will be identified from self-reports in the medical survey or from electronic health records (EHR).
Clinical factors will be identified from EHR and study measurements (blood pressure, weight, height).
We will evaluate the association of demographic (age, sex, race/ethnicity) and clinical (body mass index, blood pressure, smoking, cardiovascular diseases) factors with prevalence of self-reported AF and prevalence of AF in the EHR, as well as incident AF ascertained from the EHR.
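The prevalence comparison in Aim 1 starts from a simple calculation. Prevalence is the share of participants in a group who have AF, and a crude prevalence ratio divides each group's prevalence by that of a reference group. The sketch below uses hypothetical group labels and made-up counts. An analysis that adjusts for age, sex, and clinical factors, or that estimates incidence over follow-up, would typically use regression models instead of this unadjusted ratio.

```python
def prevalence(cases, total):
    """Proportion of participants in a group who have the condition."""
    if total <= 0:
        raise ValueError("group must have at least one participant")
    return cases / total


def crude_prevalence_ratios(groups, reference):
    """groups: {name: (af_cases, participants)}; returns {name: (prevalence, ratio)}."""
    ref_prev = prevalence(*groups[reference])
    if ref_prev == 0:
        raise ValueError("reference group has no cases; ratio is undefined")
    out = {}
    for name, (cases, total) in groups.items():
        p = prevalence(cases, total)
        out[name] = (p, p / ref_prev)
    return out


groups = {  # hypothetical labels and made-up counts, for illustration only
    "group_A": (300, 10000),
    "group_B": (120, 6000),
    "group_C": (20, 1000),
    "group_D": (90, 5000),
}
for name, (p, ratio) in crude_prevalence_ratios(groups, reference="group_A").items():
    print(f"{name}: prevalence={p:.1%}, ratio vs group_A={ratio:.2f}")
```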
Anticipated Findings
The overall goal of this project is to evaluate the prevalence and incidence of atrial fibrillation (AF), overall and by race/ethnicity, as well as to confirm the association of established risk factors for AF in the All of Us Research participants. We expect to confirm associations between demographic and clinical variables previously reported in the literature, demonstrating the value of the All of Us Research Program data to address questions regarding this common cardiovascular disease.
Demographic Categories of Interest
- Race / Ethnicity
- Age
Data Set Used
Registered Tier
Research Team
Owner:
- Peter Buto - Graduate Trainee, Emory University
- Christopher Lord - Project Personnel, All of Us Program Operational Use
- Alvaro Alonso - Late Career Tenured Researcher, Emory University
Collaborators:
- Vignesh Subbian - Early Career Tenure-track Researcher, University of Arizona
- Francis Ratsimbazafy - Other, All of Us Program Operational Use
- Aymone Kouame - Other, All of Us Program Operational Use
- Aniqa Alam
- Konstantinos Sidiropoulos - Other, Nova Southeastern University
Wearables and The Human Phenome (Published Work)
Scientific Questions Being Studied
Our primary goal is to understand the relationship between activity levels and the development and progression of human disease. Higher physical activity is associated with lower prevalence and better outcomes in virtually every human disease. These analyses will generate hypotheses guiding clinical and research interventions focused on activity to reduce morbidity and mortality in patients seeking care.
This workspace is a replication workspace for the Wearables and The Human Phenome project. We replicated the workspace to provide a clean, reduced version of the code that was used to generate the findings, which were published in Nature Medicine (https://www.nature.com/articles/s41591-022-02012-w).
Project Purpose(s)
- Population Health
- Social / Behavioral
Scientific Approaches
We will examine the relationship between daily activity (steps, activity intensity) over time and the prevalence and progression of coded human diseases. We will use the Fitbit data, EHR-curated diagnoses, laboratory values, and survey results.
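Before relating activity to disease, daily Fitbit records need to be summarized per participant. The sketch below computes each participant's mean daily step count. It skips days outside a plausibility range and drops participants with too few valid days. The thresholds (`MIN_STEPS`, `MAX_STEPS`, `MIN_VALID_DAYS`) and the record layout are hypothetical choices for illustration; they are not the filters used in the published analysis.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

MIN_STEPS = 100        # hypothetical lower bound for a plausible day
MAX_STEPS = 45_000     # hypothetical upper bound
MIN_VALID_DAYS = 3     # hypothetical minimum number of valid days per participant


def mean_daily_steps(records):
    """records: iterable of (person_id, date, steps). Returns {person_id: mean steps}."""
    per_person = defaultdict(dict)
    for person_id, day, steps in records:
        if MIN_STEPS <= steps <= MAX_STEPS:
            per_person[person_id][day] = steps  # last record wins for a duplicate day
    return {
        pid: mean(days.values())
        for pid, days in per_person.items()
        if len(days) >= MIN_VALID_DAYS
    }


# Made-up records for two participants.
records = [
    (1, date(2021, 1, 1), 8000), (1, date(2021, 1, 2), 10000),
    (1, date(2021, 1, 3), 9000), (1, date(2021, 1, 4), 50),    # 50 steps: excluded
    (2, date(2021, 1, 1), 4000), (2, date(2021, 1, 2), 5000),  # too few valid days
]
print(mean_daily_steps(records))
```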
Anticipated Findings
We expect to find that lower levels of activity are associated with a higher prevalence and more rapid progression of chronic diseases. These data will provide the rationale to link wearables data with electronic health records nationwide as a window into behavioral activity choice as a modifiable risk factor for chronic diseases. We may find substantial variation in activity and disease prevalence/severity by socioeconomic status, which would motivate studies/interventions to reduce these health disparities.
Demographic Categories of Interest
- Race / Ethnicity
- Geography
- Access to Care
- Education Level
- Income Level
Data Set Used
Registered Tier
Research Team
Owner:
- Hiral Master - Project Personnel, All of Us Program Operational Use
- Christopher Lord - Project Personnel, All of Us Program Operational Use
- Chenchal Subraveti - Project Personnel, All of Us Program Operational Use
- Jeffrey Annis - Other, Vanderbilt University Medical Center
Collaborators:
- Jun Qian - Other, All of Us Program Operational Use