John House

National Institute of Environmental Health Sciences (NIH - NIEHS)

1 active project

Testing Fast Logistic and Double Linear Model Regression


Scientific Questions Being Studied

We are developing methods both for variance quantitative trait locus (vQTL) analysis of binary traits and for fast logistic regression in genome-wide association studies (GWAS). This project will evaluate these methods in a cloud computing environment, using binary and quantitative traits related to the top 10 leading causes of death.
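
For a quantitative trait, a double linear model for vQTL analysis can be sketched as two linked regressions. The first is a mean model fitted by weighted least squares. The second is a log-linear model for the residual variance, from which a variant's effect on variance (its vQTL effect) is read. The sketch below simplifies the double generalized linear model idea: squared residuals are modeled by a gamma GLM with log link, and no leverage adjustment is made. The function name `double_linear_model` and the simulated data are hypothetical. The binary-trait extension that this project proposes is not shown.

```python
import math
import numpy as np


def double_linear_model(y, X, Z, max_iter=100, tol=1e-8):
    """Simplified double linear model: mean ~ X, log(variance) ~ Z.

    Returns (beta, gamma, se_gamma); gamma holds log-variance effects.
    """
    n = y.shape[0]
    ZtZ = Z.T @ Z
    phi = np.ones(n)                            # current per-sample variance estimates
    gamma = None
    for _ in range(max_iter):
        # Mean submodel: weighted least squares with weights 1 / variance
        w = 1.0 / phi
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        d = (y - X @ beta) ** 2                 # squared residuals
        if gamma is None:
            # Start at a constant log(mean(d)); assumes Z contains an intercept column
            gamma = np.linalg.lstsq(Z, np.full(n, math.log(d.mean())), rcond=None)[0]
        gamma_prev = gamma
        # Variance submodel: gamma GLM with log link, fitted by Fisher scoring
        for _ in range(50):
            eta = Z @ gamma
            mu = np.exp(eta)
            z = eta + (d - mu) / mu             # working response; working weights are all 1
            gamma_new = np.linalg.solve(ZtZ, Z.T @ z)
            converged = np.max(np.abs(gamma_new - gamma)) < tol
            gamma = gamma_new
            if converged:
                break
        phi = np.exp(Z @ gamma)
        if np.max(np.abs(gamma - gamma_prev)) < tol:
            break
    # For normal errors d / phi is ~ chi-square(1), so the gamma model's dispersion is 2
    se_gamma = np.sqrt(2.0 * np.diag(np.linalg.inv(ZtZ)))
    return beta, gamma, se_gamma


# Usage on simulated data (hypothetical): the variant raises log-variance by 0.4 per allele
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)                           # a covariate
g = rng.binomial(2, 0.3, size=n).astype(float)   # variant dosage (0, 1, 2)
sd = np.exp(0.5 * (-0.2 + 0.4 * g))              # log-variance = -0.2 + 0.4 * g
y = 1.0 + 0.5 * x + rng.normal(size=n) * sd
X = np.column_stack([np.ones(n), x, g])          # mean model
Z = np.column_stack([np.ones(n), g])             # variance model
beta, gamma, se = double_linear_model(y, X, Z)
z_vqtl = gamma[1] / se[1]
p_vqtl = math.erfc(abs(z_vqtl) / math.sqrt(2.0))  # two-sided normal p-value
```

The standard error relies on an assumption about normally distributed errors. Under that assumption, a squared residual divided by its variance is approximately chi-square with 1 degree of freedom, so the gamma variance model has a known dispersion of 2.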

Project Purpose(s)

  • Methods Development

Scientific Approaches

Our approaches are methodological. We will compare methods both for vQTL analysis (double linear model regressions) and for fast logistic regression. The fast logistic regression uses a Fisher's scoring test and an HDF5 data transformation, which we expect to reduce GWAS compute times by one to two orders of magnitude. This will be of great use in a cloud computing environment such as All of Us, where all compute resources are charged.
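
One common way to make per-variant logistic regression fast in GWAS has two steps. First, fit the covariate-only null model once by Fisher scoring. Then score-test every variant against that fitted model, so no per-variant model has to be refit. The Python/NumPy sketch below illustrates that general idea under stated assumptions; it is not this project's algorithm. The function names `fit_logistic_null` and `score_test_all_snps`, the simulated data, and the dense in-memory genotype matrix are all hypothetical, and the HDF5 data transformation is not shown.

```python
import math
import numpy as np


def fit_logistic_null(X, y, tol=1e-10, max_iter=50):
    """Fit a covariate-only logistic model by Fisher scoring (IRLS).

    Returns the coefficients and the fitted probabilities mu.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = mu * (1.0 - mu)                     # Bernoulli variance = Fisher weights
        score = X.T @ (y - mu)                  # gradient of the log-likelihood
        info = X.T @ (X * w[:, None])           # expected information X^T W X
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return beta, mu


def score_test_all_snps(X, G, y, mu):
    """Score (Rao) chi-square statistic for each column of G, null model held fixed."""
    w = mu * (1.0 - mu)
    U = G.T @ (y - mu)                          # per-variant score, shape (m,)
    XtWX = X.T @ (X * w[:, None])               # shape (p, p)
    XtWG = X.T @ (G * w[:, None])               # shape (p, m)
    # Variance of U after adjusting for the estimated covariate effects
    V = (G * G * w[:, None]).sum(axis=0) - np.sum(
        XtWG * np.linalg.solve(XtWX, XtWG), axis=0
    )
    return U ** 2 / V                           # approx. chi-square(1) under H0


# Usage on simulated data (hypothetical): intercept plus one covariate, 5 variants, variant 0 causal
rng = np.random.default_rng(0)
n, m = 2000, 5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
eta = -0.5 + 0.3 * X[:, 1] + 0.5 * G[:, 0]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta))).astype(float)

beta, mu = fit_logistic_null(X, y)
T = score_test_all_snps(X, G, y, mu)
pvals = [math.erfc(math.sqrt(t / 2.0)) for t in T]   # chi-square(1) upper tail
```

Because the null model is shared, the per-variant work reduces to a few matrix products over the genotype matrix. This is one reason that reading variants in contiguous blocks from an indexed file can pay off.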

Besides our own methods, we will also use existing tools such as R packages, PLINK, and the HDF5 library for creating fully indexed files.

Datasets:

  • Genotype calls on all individuals
  • Questionnaire data
  • Demographic data
  • Height, weight, sex, age, and pre-calculated BMI if available
  • Occupational exposures
  • Top 10 leading causes of death:
      • Heart disease
      • Cancer
      • COVID-19
      • Accidents (unintentional injuries)
      • Stroke (cerebrovascular diseases)
      • Chronic lower respiratory diseases
      • Alzheimer’s disease
      • Diabetes
      • Influenza and pneumonia
      • Nephritis, nephrotic syndrome, and nephrosis

Anticipated Findings

We anticipate validating gene-by-environment (GxE) hits identified in our PEGS cohort here at NIEHS and in the UK Biobank.
We anticipate demonstrating the utility of a novel algorithm for calculating vQTLs in binary traits, which has heretofore been unavailable.
We anticipate that our logistic regression algorithm will run an order of magnitude faster than existing methods, with more utility for complex regression modeling. All of these results will 1) benefit future investigators in the All of Us workspaces and 2) validate findings in the UK Biobank for a vQTL catalog of binary-trait variants most likely to be involved in GxE (candidate variant analysis for future studies).

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • John House - Other, National Institute of Environmental Health Sciences (NIH - NIEHS)