Integrative approaches to cancer data analysis remain an active field of research, in which effective integration of heterogeneous data sources, such as clinical, morphologic, and molecular data, is becoming crucial for subtyping and treating cancer. Moreover, measurements need to be combined not only across patients but also across assay types, i.e., both horizontally and vertically, which makes integration a complex problem. Knowledge management, data accessibility and usability, and the lack of standards and common interfaces are further well-recognized challenges in bioinformatics.
We develop a computational framework for the integration of heterogeneous data, in which relations between structurally unrelated data sources are inferred both from the data themselves and from additional external sources, thereby facilitating knowledge discovery. Within this framework, we develop an enhanced, novel universal predictive parameter for survival time prediction in cancer patients, focusing here on The Cancer Genome Atlas (TCGA) data sets. The framework applies multiple regression-based machine learning models and uses cross-validation to benchmark them.
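
To illustrate the benchmarking step only, the following is a minimal sketch of how several regression models might be compared under k-fold cross-validation. It is not the framework's implementation: the two toy models, the function names (k_fold_indices, cross_validated_mae, benchmark), the mean-absolute-error metric, and the synthetic data are all assumptions made for illustration. The sketch also ignores right-censoring, which real survival data typically require handling.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Hypothetical baseline: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Hypothetical model: ordinary least squares on a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cross_validated_mae(fit, xs, ys, k=5, seed=0):
    """Average the held-out mean absolute error over k folds."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_errors.append(mean(abs(model(xs[i]) - ys[i]) for i in held_out))
    return mean(fold_errors)


def benchmark(models, xs, ys, k=5, seed=0):
    """Score every model on identical folds so the results are comparable."""
    return {name: cross_validated_mae(fit, xs, ys, k, seed)
            for name, fit in models.items()}


# Synthetic toy data (an assumption): one feature with a noisy linear target.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [3.0 * x + 5.0 + rng.gauss(0, 1.0) for x in xs]

scores = benchmark({"mean_baseline": fit_mean, "linear": fit_linear}, xs, ys)
for name, mae in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: MAE = {mae:.3f}")
```

Passing the same seed to every model keeps the fold assignment fixed, so differences in the reported error reflect the models rather than the particular split.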