Author: Pedro Milanez-Almeida, Andrew J. Martins, Ronald N. Germain, John S. Tsang
Abstract: Disrupted molecular pathways are often robustly associated with disease outcome in cancer [1,2,3]. Although biologically informative transcriptional pathways can be revealed by RNA sequencing (RNA-seq) at up to hundreds-fold lower coverage than is conventionally used [4,5,6], it remains unknown how low-depth sequencing datasets perform in the challenging task of developing transcriptional signatures that predict clinical outcomes. Here we assessed whether cancer prognosis is feasible with shallow tumor RNA-seq, which could enable cost-effective profiling of far larger numbers of samples for deeper biological and predictive insights. By statistically modeling the relative risk of an adverse outcome for thousands of subjects in The Cancer Genome Atlas [7,8,9,10,11,12,13], we present evidence that tumor RNA-seq data subsampled to a few hundred thousand reads per sample provide sufficient information for outcome prediction in several types of cancer. Analysis of the predictive models revealed robust contributions from pathways known to be associated with outcomes. Our findings indicate that predictive models of outcomes in cancer may be developed with dramatically increased sample numbers at low cost, potentially enabling more realistic predictive models that incorporate diverse variables and their interactions. This strategy could also be used, for example, in longitudinal analysis of multiple regions of a tumor over the course of treatment for quantitative modeling and prediction of outcome in personalized oncology.
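The idea of subsampling a deeply sequenced tumor library down to a few hundred thousand reads can be illustrated with a minimal sketch. It assumes the input is a per-sample vector of gene-level read counts rather than raw reads, and it approximates drawing reads without replacement with NumPy's multivariate hypergeometric sampler. The helper name `downsample_counts`, the simulated library, and the 300,000-read target are hypothetical choices for illustration, not the authors' pipeline.

```python
import numpy as np


def downsample_counts(counts, target_depth, seed=None):
    """Subsample a vector of per-gene read counts to `target_depth` reads.

    Hypothetical helper: each read is treated as an item in an urn and
    `target_depth` reads are drawn without replacement, so the new per-gene
    counts follow a multivariate hypergeometric distribution.
    """
    counts = np.asarray(counts, dtype=np.int64)
    total = int(counts.sum())
    if target_depth > total:
        raise ValueError("target_depth exceeds the library size")
    rng = np.random.default_rng(seed)
    # Draw without replacement from the pool of reads defined by `counts`.
    return rng.multivariate_hypergeometric(counts, int(target_depth))


# Usage: simulate a deep library (~20 million reads over 20,000 genes)
# and reduce it to a shallow library of 300,000 reads.
sim_rng = np.random.default_rng(0)
full = sim_rng.poisson(lam=1000, size=20_000)
shallow = downsample_counts(full, 300_000, seed=1)
print(full.sum(), shallow.sum())
```

Sampling without replacement guarantees that no gene ends up with more reads than it had in the full library, which mirrors discarding reads from an existing sequencing run; a downstream risk model would then be fit on these shallow counts.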