Evaluating the effect of right-censored endpoint transformation for dimensionality reduction of radiomic features of oropharyngeal cancer patients
Abstract

Radiomics is the process of extracting quantitative features from tomographic images (computed tomography [CT], magnetic resonance [MR], or positron emission tomography [PET] images). Thousands of features can be extracted via quantitative image analyses based on intensity, shape, size or volume, and texture. These radiomic features can then be used in combination with demographic, disease, and treatment indicators to increase precision in diagnosis, assessment of prognosis, and prediction of therapy response. However, for models to be effective and the analysis to be statistically sound, it is necessary to reduce the dimensionality of the data through feature selection or feature extraction. Supervised dimensionality reduction methods identify the most relevant features given a label or outcome such as overall survival (OS) or relapse-free survival (RFS) after treatment. For survival data, outcomes are represented using two variables: time-to-event and a censor flag. Patients who have not yet experienced an event are censored, and their recorded time-to-event is their follow-up time. This research evaluates the effect of transforming a right-censored outcome into binary, continuous, and censoring-aware representations for dimensionality reduction of radiomic features to predict OS and RFS of oropharyngeal cancer patients. Both feature selection and feature extraction are considered in this work. For feature selection, eight different methods were applied using a binary outcome indicating event occurrence prior to median follow-up time, a continuous outcome using the martingale residuals from a proportional hazards model, and the raw right-censored time-to-event outcome. For feature extraction, a single covariate was extracted after clustering the patients according to radiomics data. Three different clustering techniques were applied using the same continuous outcome and raw right-censored outcome.
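As an illustration of the binary and continuous outcome transformations described above, the following is a minimal sketch, not the implementation used in this work. It assumes that the median follow-up time is the median of the recorded times, that patients censored before the cutoff are labeled 0, and that the martingale residuals come from a null proportional hazards model (no covariates), where the residual reduces to the event flag minus the Nelson-Aalen cumulative hazard at each patient's time. The function names are hypothetical.

```python
import numpy as np

def binary_outcome(time, event, cutoff=None):
    """Label 1 if an event was observed at or before the cutoff, else 0.

    Assumption: the cutoff defaults to the median of all recorded times,
    and patients censored before the cutoff are labeled 0.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    if cutoff is None:
        cutoff = np.median(time)
    return (event & (time <= cutoff)).astype(int)

def null_martingale_residuals(time, event):
    """Martingale residuals of a covariate-free proportional hazards model.

    Assumption: with no covariates, the residual for patient i is
    event_i - H(t_i), where H is the Nelson-Aalen cumulative hazard.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    event_times = np.unique(time[event == 1])  # sorted distinct event times
    # Number of events and number of patients at risk at each event time.
    d = np.array([np.sum((time == t) & (event == 1)) for t in event_times], dtype=float)
    n = np.array([np.sum(time >= t) for t in event_times], dtype=float)
    cumhaz_steps = np.concatenate(([0.0], np.cumsum(d / n)))
    # Count of event times <= t_i selects the step of the cumulative hazard.
    idx = np.searchsorted(event_times, time, side="right")
    return event - cumhaz_steps[idx]

# Toy data: five patients, censor flag 1 = event observed, 0 = censored.
time = [2, 4, 4, 6, 8]
event = [1, 0, 1, 1, 0]
labels = binary_outcome(time, event)
residuals = null_martingale_residuals(time, event)
```

Under these assumptions the residuals sum to zero across patients, and a large positive residual marks a patient who experienced the event earlier than the model expects, which is what makes the residual usable as a continuous target for standard feature selectors.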
The radiomic signatures are then combined with clinical variables for risk prediction. Three metrics for accuracy and calibration were used to evaluate the performance of five predictive models and an ensemble of the models. Analyses were performed across 529 patients and more than 3,800 radiomic features. The data were preprocessed to remove redundant and low-variance features prior to either selection or clustering. The results show that including a radiomic signature or radiomic cluster label yields better predictions than using clinical data alone. Randomly generating signatures, or generating signatures without considering an outcome, results in poor calibration scores. Among the feature selection methods, random forest feature selectors with the continuous and right-censored outcomes give the best predictive scores for OS and RFS, while hierarchical clustering for feature extraction gives similarly predictive scores with a compact representation of the radiomic feature space.
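The preprocessing step mentioned above, removing low-variance and redundant features, can be sketched as follows. This is a hedged illustration only: the abstract does not state how redundancy was defined, so the sketch assumes a greedy filter on absolute Pearson correlation, and the function name and both thresholds are hypothetical.

```python
import numpy as np

def filter_features(X, var_threshold=1e-8, corr_threshold=0.95):
    """Return the column indices of X kept after two filters.

    Assumptions (not taken from the source): a feature is low-variance if
    its variance is at or below var_threshold, and redundant if its absolute
    Pearson correlation with an already kept feature reaches corr_threshold.
    """
    X = np.asarray(X, dtype=float)
    # Step 1: drop near-constant features.
    keep = np.flatnonzero(X.var(axis=0) > var_threshold)
    if keep.size == 0:
        return keep
    # Step 2: greedily keep a feature only if it is not highly correlated
    # with any feature kept so far.
    corr = np.atleast_2d(np.abs(np.corrcoef(X[:, keep], rowvar=False)))
    selected = []
    for j in range(keep.size):
        if all(corr[j, k] < corr_threshold for k in selected):
            selected.append(j)
    return keep[selected]

# Toy matrix: column 1 is a linear copy of column 0, column 2 is constant.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([a, 2 * a + 1, np.full(5, 7.0), [5.0, 1.0, 4.0, 2.0, 3.0]])
kept = filter_features(X)
```

The greedy pass keeps the first feature of each correlated group, so the result depends on column order; a different tie-breaking rule would be an equally reasonable choice.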