I am working on a benchmark of survival analysis methods, and my repository contains around 43 datasets. These datasets contain both numerical and categorical covariates.
In some datasets, some covariates have a high missing rate and others have a very low one. I want to know which features are highly correlated with my target variables. That will help me understand whether removing a covariate, or imputing it with KNN imputation, will bias the methods' predictions.
Survival analysis has two target variables, event status and event time, so I was unsure which one to use as the target for the correlation analysis. Because some rows of the time variable hold a censoring time rather than an event time, I decided to use a feature vs. event status correlation plot to understand feature importance.
I am thinking of using code like this for the analysis:
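To make the missing rates concrete, here is a minimal sketch of how they could be tabulated per covariate, together with a rough check of whether missingness itself is related to the event. The toy DataFrame and the column names `age` and `stage` are invented for illustration; only `time` and `event` follow the naming used in my datasets.

```python
import numpy as np
import pandas as pd

# Toy data standing in for one benchmark dataset (all values are invented).
df = pd.DataFrame({
    "age":   [61.0, np.nan, 47.0, 70.0, np.nan, 55.0],
    "stage": ["II", "III", None, "I", "II", "II"],
    "time":  [12.0, 30.5, 8.2, 44.0, 19.3, 27.1],
    "event": [1, 0, 1, 0, 1, 0],
})

X = df.drop(columns=["time", "event"])

# Fraction of missing values per covariate, highest first.
missing_rate = X.isna().mean().sort_values(ascending=False)
print(missing_rate)

# Event rate among rows where each covariate is observed vs. missing.
# A large gap hints that the missingness itself carries information.
event_by_missing = pd.DataFrame({
    col: df.groupby(X[col].isna())["event"].mean() for col in X.columns
}).rename(index={False: "observed", True: "missing"})
print(event_by_missing)
```

If the event rate differs clearly between the "observed" and "missing" rows of a covariate, dropping or imputing that covariate is more likely to matter, although this is only a descriptive check and not a formal test.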
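One alternative that uses both targets at once, instead of choosing between event status and time, is a rank-based measure such as Harrell's concordance index computed per feature. This is a different technique from the plain correlation I describe, shown only as a sketch: the function name `harrell_c` and the toy values are my own, and ties in time are simply ignored.

```python
import numpy as np

def harrell_c(time, event, score):
    """Harrell's concordance index between a score and censored survival times.

    A pair (i, j) is comparable when subject i had an observed event and a
    strictly shorter time than subject j. It is concordant when subject i
    also has the higher score. Tied scores count as half.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    score = np.asarray(score, dtype=float)

    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if score[i] > score[j]:
                    concordant += 1.0
                elif score[i] == score[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")

# Toy example (invented values): does higher age go with earlier events?
time = [5, 8, 12, 20, 25]
event = [1, 1, 0, 1, 0]
age = [70, 50, 65, 55, 40]
c_age = harrell_c(time, event, age)
print(c_age)
```

A value near 0.5 suggests no monotone relation between the feature and survival, while values far from 0.5 in either direction suggest a strong one. Rows with a missing feature value would need to be dropped before calling the function, and the double loop is quadratic, so it is meant for illustration rather than for large datasets.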
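```python
import pandas as pd

# Features: every column except the two survival targets
X = df.drop(columns=["time", "event"])

# Pearson correlation of each numeric feature with the event indicator
# (categorical columns are skipped here; they would need encoding first)
corr_event = X.select_dtypes("number").corrwith(df["event"])
print(corr_event.sort_values(ascending=False))
```

Also, to visualize the correlation between the features and the target variable, I am planning to use a heatmap like the one below: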
```python
import seaborn as sns
import matplotlib.pyplot as plt

# Pairwise Pearson correlations of the numeric columns only
corr = df.select_dtypes("number").corr()

plt.figure(figsize=(10, 8))
sns.heatmap(corr, cmap="coolwarm")
plt.show()
```

I want to ask whether my approach of using a correlation plot to identify the importance of each feature with respect to the target variable makes sense. Any advice or suggestions would be a great help.
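Since my datasets mix numerical and categorical covariates and Pearson correlation only covers the numeric ones, a variant I am considering is sketched below: absolute Pearson correlation with `event` for numeric features (equivalent to the point-biserial correlation for a binary target) and Cramér's V for categorical features. The helper `cramers_v`, the toy data, and the columns `age` and `sex` are assumptions for illustration, and the two measures are not strictly on the same scale.

```python
import numpy as np
import pandas as pd

def cramers_v(x, y):
    """Cramér's V between two categorical series (no bias correction)."""
    table = pd.crosstab(x, y).to_numpy(dtype=float)
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k))) if k > 0 else float("nan")

# Toy data (invented values) with one numeric and one categorical feature.
df = pd.DataFrame({
    "age":   [61.0, 72.0, 47.0, 70.0, 66.0, 55.0, 58.0, 49.0],
    "sex":   ["M", "M", "F", "M", "M", "F", "F", "F"],
    "time":  [12.0, 9.5, 40.2, 14.0, 19.3, 37.1, 33.0, 45.8],
    "event": [1, 1, 0, 1, 1, 0, 0, 0],
})

X = df.drop(columns=["time", "event"])
num_cols = X.select_dtypes(include="number").columns
cat_cols = X.columns.difference(num_cols)

# One association score per feature, strongest first.
assoc = pd.concat([
    X[num_cols].corrwith(df["event"]).abs(),
    pd.Series({c: cramers_v(X[c], df["event"]) for c in cat_cols}, dtype=float),
]).sort_values(ascending=False)
print(assoc)
```

The resulting Series could then be drawn as a single-column heatmap, for example with `sns.heatmap(assoc.to_frame("event"), annot=True, cmap="coolwarm")`, which gives a feature vs. target plot rather than the full pairwise matrix.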
I have already searched the internet and found the following post, which does not apply to my case because I want to visualize the correlation between each feature and the target variable.
Estimating correlation with one variable of censored data in R
I have also read a paper titled "Prediction and Survival Analysis of Head and Neck Cancer in Patients Using Epigenomics Data and Advanced Machine Learning Methods" and found the following figure:
I think this type of plot is what I need for my case. I have also read the following paper but did not reach a conclusion.
I have also read this Medium blog post.
Understanding Feature Relationships Through Correlation Analysis: A Guide for Data Enthusiasts

