Pitfalls in re-analysis of observational omics studies: a post-mortem of the human pathology atlas

Abstract

Uhlen et al. (Reports, 18 August 2017) published an open-access resource of cancer-specific marker genes that are prognostic for patient survival in seventeen different types of cancer. However, their data analysis workflow is prone to the accumulation of false positives. A more reliable workflow, using flexible Cox proportional hazards models applied to the same data, highlights three distinct problems with today's large-scale, publicly available omics datasets from observational studies: (i) re-analysis results cannot necessarily be taken forward by others, highlighting a need to cross-check important analyses with high-impact outcomes; (ii) current methods are not necessarily optimal for the re-analysis of such data, indicating an urgent need to develop more suitable methods; and (iii) the limited availability of potential confounders in public metadata makes it very difficult (if not impossible) to adequately prioritize clinically relevant genes, which should prompt an in-depth discussion on how such information could be made more readily available while respecting privacy and ethical concerns.

Publication
bioRxiv
Jeroen Gilis
PhD candidate in data science

My research interests include machine learning, metabolic engineering and data science.