Using machine learning, you’ll analyze trends in danceability, energy, valence


Overview
Welcome to the mlX 2.0 Regression Challenge!

In this competition, your mission is to predict a song’s popularity score (0-100) using real-world music data—from audio features and artist stats to track metadata. Using machine learning, you’ll analyze trends in danceability, energy, valence, and more to train a model that forecasts how listeners will react before a song even hits the charts.

The dataset includes real Billboard-tracked tracks, and your goal is to build the most accurate regression model possible. Can your algorithm decode the secret formula behind viral hits, or will it flop harder than a one-hit wonder?

The Challenge:

Predict continuous popularity scores (not just hits/flops).
Uncover hidden patterns in audio and artist data.
Good luck, and may your model top the (leaderboard) charts!
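As a starting point, the task above (continuous popularity scores from audio features) can be sketched as a minimal regression baseline. This is a hedged sketch on synthetic data: the feature names (danceability, energy, valence) and the score formula are stand-in assumptions, not the real competition dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in for the real data: three audio features in [0, 1]
# and a popularity score in [0, 100].
rng = np.random.default_rng(42)
n = 500
X = rng.random((n, 3))  # assumed: danceability, energy, valence
y = 100 * (0.5 * X[:, 0] + 0.3 * X[:, 1] + 0.2 * X[:, 2]) + rng.normal(0, 5, n)
y = y.clip(0, 100)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
mae = mean_absolute_error(y_val, model.predict(X_val))
print("Validation MAE:", round(mae, 2))
```

Mean absolute error is a natural metric here, since the target is a continuous 0-100 score rather than a hit/flop label.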

# extract the first name from the Name column

df["first_name"] = df.Name.str.split(" ").map(lambda x: x[0])

# drop columns that won't be used as features

df.drop(['Name', 'Cabin','Ticket'], axis=1, inplace=True)

#### label encoding for categorical variables

from sklearn.preprocessing import LabelEncoder

labelencoder = LabelEncoder()

df['Sex'] = labelencoder.fit_transform(df['Sex'])

#### separate X and y in the train data set

data = df.values

X = data[:,0:8]

Y = data[:,8]

############ separate the numerical and categorical columns

# Numerical columns

num_cols = df.select_dtypes(include=['int64', 'float64']).columns

# Categorical columns

cat_cols = df.select_dtypes(include=['object']).columns
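One common way to use these two column lists is to route each group through its own preprocessing step with scikit-learn's `ColumnTransformer`. The toy DataFrame below is an assumption for illustration; the real `df` would be used instead.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# Toy frame with the same dtype mix the selection above relies on.
df = pd.DataFrame({
    "age": [22.0, 38.0, 26.0],
    "fare": [7.25, 71.28, 7.92],
    "embarked": ["S", "C", "S"],
})

num_cols = df.select_dtypes(include=["int64", "float64"]).columns
cat_cols = df.select_dtypes(include=["object"]).columns

# Scale the numeric columns, one-hot encode the categorical ones.
preproc = ColumnTransformer([
    ("num", StandardScaler(), num_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
])
X_prepped = preproc.fit_transform(df)
print(X_prepped.shape)  # 2 scaled numeric columns + 2 one-hot columns
```

`handle_unknown="ignore"` keeps the encoder from failing when the test set contains a category the training set never saw.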

## feature selection using RFE

from sklearn.linear_model import LogisticRegression

from sklearn.feature_selection import RFE

model_lr = LogisticRegression(max_iter=1000)

recur_fe = RFE(model_lr, n_features_to_select=4)

features = recur_fe.fit(X, Y)

print(features.n_features_)

print(features.support_)

print(features.ranking_)
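Once the RFE fit is done, `support_` is a boolean mask over the columns and `transform` keeps only the selected features. A self-contained sketch (synthetic data, not the competition dataset):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in: 8 features, 4 of them informative.
X, y = make_classification(n_samples=200, n_features=8,
                           n_informative=4, random_state=0)

recur_fe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=4)
recur_fe.fit(X, y)

# support_ is a boolean mask; transform drops the eliminated columns.
X_selected = recur_fe.transform(X)
print(X_selected.shape)         # (200, 4)
print(recur_fe.support_.sum())  # 4
```

The reduced `X_selected` is what you would then feed into the downstream model.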

###### feature selection using Ridge regression coefficients

from sklearn.linear_model import Ridge

ridge_re = Ridge(alpha=1.0)

ridge_re.fit(X, Y)

import numpy as np

def print_coefs(coef, names=None, sort=False):
    if names is None:
        names = ["X%s" % x for x in range(len(coef))]
    lst = zip(coef, names)
    if sort:
        lst = sorted(lst, key=lambda x: -np.abs(x[0]))
    return " + ".join("%s * %s" % (round(c, 3), name) for c, name in lst)

print(print_coefs(ridge_re.coef_, names=df.columns[:-1], sort=True))

########### visualization of feature importance

import matplotlib.pyplot as plt

import pandas as pd

feature_importance = pd.Series(ridge_re.coef_, index=df.columns[:-1])

feature_importance.nlargest(10).plot(kind='barh')

plt.show()

########## K-NN classification

from sklearn.neighbors import KNeighborsClassifier

2️⃣ Prepare data

# X = features, y = target

X = train.drop('target', axis=1)

y = train['target']

3️⃣ Split data (for testing)

from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)

4️⃣ Create K-NN model

model = KNeighborsClassifier(n_neighbors=5) # K = 5 is common

5️⃣ Train model

model.fit(X_train, y_train)

6️⃣ Make predictions

pred = model.predict(X_val)

7️⃣ Evaluate accuracy

from sklearn.metrics import accuracy_score

print("Accuracy:", accuracy_score(y_val, pred))

########################################################

Feature scaling is CRUCIAL for K-NN:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)

X_val = scaler.transform(X_val)

Without scaling, features with large ranges dominate the distance calculation.

Distance metric

Default: Euclidean

Can try Manhattan (metric='manhattan') if it fits the data better
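Swapping the metric is a one-argument change, so comparing the two on scaled data is cheap. A sketch on synthetic data (not the competition dataset):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=400, n_features=6, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2,
                                                  random_state=0)

# Scale first, so neither metric is dominated by a wide-range feature.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)

# Fit the same K-NN model under each distance metric.
results = {}
for metric in ["euclidean", "manhattan"]:
    knn = KNeighborsClassifier(n_neighbors=5, metric=metric).fit(X_train, y_train)
    results[metric] = knn.score(X_val, y_val)
    print(metric, round(results[metric], 3))
```

Whichever metric scores better on validation data is the one to keep for the test predictions.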

🏎 4. Predict on test CSV

# scale test data

test_scaled = scaler.transform(test)

predictions = model.predict(test_scaled)

# save predictions

import pandas as pd

submission = pd.DataFrame({'Prediction': predictions})

submission.to_csv('submission.csv', index=False)

##############################

######scaling

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import StandardScaler

# split

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)

# scale

scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)

X_val = scaler.transform(X_val)

########## logistic regression

model = LogisticRegression(max_iter=1000)

model.fit(X_train, y_train)

###RandomForestClassifier

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=100)

model.fit(X_train, y_train)
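With several candidate models on the table (K-NN, logistic regression, random forest), cross-validation gives a fairer comparison than a single train/validation split. A hedged sketch on synthetic data; the real feature matrix and target would be used instead:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the prepared features and target.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Average accuracy over 5 folds for each candidate model.
cv_means = {}
for name, clf in [("logreg", LogisticRegression(max_iter=1000)),
                  ("forest", RandomForestClassifier(n_estimators=100,
                                                    random_state=0))]:
    cv_means[name] = cross_val_score(clf, X, y, cv=5).mean()
    print(name, round(cv_means[name], 3))
```

The model with the higher cross-validated score is the safer choice for the final submission.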
