Numerical meaning of LightGBM gain feature importance


I am using LightGBM to solve a binary classification problem. After training, I use gain importance to analyse the model. The top 20 features by importance (gain) look like this:

| Feature          | Importance (Gain) |
|:-----------------|------------------:|
| hour_mode        |           12689.9 |
| city_mode        |           9648.09 |
| weekend_ratio    |           2498.07 |
| domain_tfidf_766 |           1171.88 |
| domain_tfidf_159 |           1099.91 |
| domain_tfidf_221 |           962.452 |
| domain_tfidf_665 |           962.445 |
| domain_tfidf_49  |           908.868 |
| domain_tfidf_807 |           803.887 |
| domain_tfidf_455 |           790.627 |
| domain_tfidf_384 |            712.87 |
| domain_tfidf_670 |           699.977 |
| domain_tfidf_116 |           699.764 |
| domain_tfidf_897 |           687.591 |
| domain_tfidf_2   |           673.928 |
| domain_tfidf_151 |           661.747 |

However, I don't understand what the numerical values of the gain importance mean. How large does a value have to be for the feature to count as "very important", and below what value is it "not important"? Can anyone help?
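For reference, the LightGBM documentation describes `importance_type='gain'` as the total gain of the splits that use the feature, so each number is a sum of per-split gains over all trees. Below is a minimal sketch of a single split's gain, assuming the standard second-order formula G_L²/(H_L+λ) + G_R²/(H_R+λ) − G²/(H+λ) used by gradient-boosting libraries. LightGBM's exact constants and extra terms (L1 regularisation, `min_gain_to_split`, etc.) are deliberately left out, so treat this as an approximation. It illustrates why there is no universal threshold: G and H are sums over the samples in a node, so with `lam=0` duplicating every training row exactly doubles the gain, and with `lam > 0` it grows roughly in proportion. `leaf_score`, `split_gain` and `logloss_grad_hess` are hypothetical helper names.

```python
def leaf_score(g_sum: float, h_sum: float, lam: float) -> float:
    """G^2 / (H + lambda): the second-order score of a single leaf."""
    return g_sum ** 2 / (h_sum + lam)


def split_gain(grads, hess, go_left, lam=0.0):
    """Approximate gain of splitting one node into left/right children.

    grads, hess: per-sample first/second derivatives of the loss.
    go_left: per-sample booleans saying which child each sample goes to.
    lam: L2 regularisation (assumed; LightGBM calls it lambda_l2).
    """
    gl = sum(g for g, left in zip(grads, go_left) if left)
    hl = sum(h for h, left in zip(hess, go_left) if left)
    gr = sum(grads) - gl
    hr = sum(hess) - hl
    return (leaf_score(gl, hl, lam) + leaf_score(gr, hr, lam)
            - leaf_score(gl + gr, hl + hr, lam))


def logloss_grad_hess(labels, p):
    """Binary log-loss derivatives w.r.t. the raw score, at predicted probability p."""
    grads = [p - y for y in labels]
    hess = [p * (1 - p) for _ in labels]
    return grads, hess


labels = [1, 1, 1, 0, 0, 0, 1, 0]
go_left = [y == 1 for y in labels]  # hypothetical split that separates the classes perfectly

g, h = logloss_grad_hess(labels, p=0.5)
print(split_gain(g, h, go_left))  # 8.0

# Same data, every row duplicated: the gain doubles (with lam=0)
g2, h2 = logloss_grad_hess(labels * 2, p=0.5)
print(split_gain(g2, h2, go_left * 2))  # 16.0
```

Summing such per-split gains over 150 boosting rounds produces totals like the ones in the table, whose scale depends on the number of rows, the number of rounds and the regularisation, not on a fixed "importance" scale.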

Here is the relevant code:

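```python
import lightgbm as lgb
import pandas as pd

# params, train_data, X_tfidf, numeric_cols and cat_cols are defined earlier (not shown).
gbm_fixed = lgb.train(params, train_data, num_boost_round=150)

# 'gain': total gain of all splits that use each feature
importances = gbm_fixed.feature_importance(importance_type='gain')

tfidf_cols = [f"domain_tfidf_{i}" for i in range(X_tfidf.shape[1])]
# These names must follow the same column order as the data passed to lgb.Dataset
feature_names = numeric_cols + cat_cols + tfidf_cols

importance_df = pd.DataFrame({
    'Feature': feature_names,
    'Importance (Gain)': importances,
}).sort_values(by='Importance (Gain)', ascending=False).reset_index(drop=True)

# to_markdown() needs the optional 'tabulate' package
print(importance_df.head(20).to_markdown(index=False))
```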
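Because the raw totals depend on dataset size, number of boosting rounds and regularisation, one way to make them easier to read is to express each feature's gain as a share of the total gain over all features, plus a cumulative share. This is a minimal sketch of that idea, not something LightGBM computes for you; `gain_shares` is a hypothetical helper that takes a list of names and the array returned by `feature_importance(importance_type='gain')` in the same order. The shares must be computed over all features, not only the top 20.

```python
def gain_shares(feature_names, gains):
    """Return (name, gain, share, cumulative_share) rows sorted by gain, descending.

    share = gain / total gain over ALL features, so the shares sum to 1.
    """
    total = float(sum(gains))
    if total == 0:
        raise ValueError("total gain is zero; the model made no splits")
    rows = sorted(zip(feature_names, gains), key=lambda r: r[1], reverse=True)
    out, running = [], 0.0
    for name, gain in rows:
        share = float(gain) / total
        running += share
        out.append((name, float(gain), share, running))
    return out


# Toy numbers with hypothetical feature names
rows = gain_shares(["c", "a", "d", "b"], [15.0, 50.0, 5.0, 30.0])
for name, gain, share, cum in rows:
    print(f"{name:>3} {gain:8.1f} {share:6.1%} {cum:6.1%}")
```

With the variables from the question this would be called as `gain_shares(feature_names, importances)`. A feature's share then reads as "the fraction of all split gain in the model attributed to this feature", which is comparable across models in a way the raw totals are not.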