Exploring K-Means and Hierarchical Clustering for Effective Marketing Strategies
Background:
To what extent does a platform’s acquisition channel influence the learning outcomes of students?
Are there any geographical locations where most students discover the platform, specifically through social media platforms like YouTube or Facebook?
In this project, I dive into real-world customer data to carry out market segmentation, a vital task for businesses aiming to understand customer behavior and improve marketing effectiveness.
The project covers several stages: data preprocessing, exploratory data analysis (EDA), feature engineering, applying clustering algorithms (k-means and hierarchical clustering), and interpreting the outcomes. Through the Customer Segmentation in Marketing with Python project, I aim to uncover patterns in customer behavior and pinpoint specific segments that can be addressed with tailored marketing approaches.
Methodology:
Hierarchical Clustering Dendrogram
K-means Clustering
Results:
Exploratory Data Analysis
The descriptive statistics of the data (left) and the count of null values (middle) were examined first. After all null values were replaced with zeros, the dataset no longer contains any null values, as shown on the right.
The first 5 rows and the data types of the dataset were then displayed as a starting point for further exploration.
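The inspection and cleaning steps described above can be sketched with pandas roughly as follows. This is a minimal illustration, not the project's actual code: the toy values and the column names (`minutes_watched`, `CLV`, `region`, `channel`) are assumptions standing in for the real dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset; values and column names are assumptions.
df = pd.DataFrame({
    "minutes_watched": [120.0, np.nan, 45.5, 300.0, np.nan],
    "CLV": [60.0, 35.5, np.nan, 80.0, 45.0],
    "region": [0, 1, 2, 0, 1],
    "channel": [1, 3, 4, 2, 1],
})

print(df.head())      # first 5 rows
print(df.dtypes)      # data type of each column
print(df.describe())  # descriptive statistics

null_counts = df.isnull().sum()  # nulls per column before cleaning
df_clean = df.fillna(0)          # replace every null with zero
remaining_nulls = int(df_clean.isnull().sum().sum())
print(null_counts)
print("Nulls after cleaning:", remaining_nulls)
```

Filling with zeros is the approach the post describes. Depending on the column, a median fill or dropping rows could be alternatives, but those are not what was done here.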
Correlation Estimate
The correlation coefficients and the correlation heatmap illustrate the relationships among the minutes watched by students, customer lifetime value (CLV), region, and their social media channels.
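A correlation matrix like the one behind the heatmap can be computed as in the sketch below. The toy data, the column names, and the choice of Pearson correlation are assumptions. It also assumes that region and channel are already encoded as integers.

```python
import pandas as pd

# Toy data; column names and integer encodings are assumptions.
df = pd.DataFrame({
    "minutes_watched": [120.0, 0.0, 45.5, 300.0, 15.0, 80.0],
    "CLV": [60.0, 35.5, 120.0, 80.0, 45.0, 62.5],
    "region": [0, 1, 2, 0, 1, 2],
    "channel": [1, 3, 4, 2, 1, 5],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr(method="pearson")
print(corr.round(2))
```

Passing `corr` to a plotting call such as `seaborn.heatmap(corr, annot=True)` would render it as a heatmap. Note that correlation on integer codes for categories like region or channel depends on the arbitrary code order, so those entries are only a rough guide.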
Raw Data Visualization
The scatter plot above displays the raw dataset without the social channels labeled, so the individual channels cannot be told apart.
Clustering
Hierarchical Clustering
The dendrogram shows the distances at which the different groups of social media channels merge.
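As a rough sketch of how such a dendrogram is built, the example below runs agglomerative clustering with SciPy on synthetic points. The Ward linkage, the feature scaling, and the cut into three clusters are assumptions made for illustration; the post does not state which settings were used.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from sklearn.preprocessing import StandardScaler

# Synthetic data: three well-separated groups of 10 points each (assumption).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(center, 0.3, size=(10, 2)) for center in (0.0, 5.0, 10.0)])

# Scale features so no single feature dominates the distances.
X_scaled = StandardScaler().fit_transform(X)

# Build the merge tree; each row of Z records one merge and its distance.
Z = linkage(X_scaled, method="ward")

# Compute the dendrogram layout without drawing it.
tree = dendrogram(Z, no_plot=True)

# Cut the tree into at most 3 flat clusters.
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)
```

Calling `dendrogram(Z)` without `no_plot=True` draws the tree with matplotlib. The height of each merge on the vertical axis is the distance stored in the third column of `Z`.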
K-means Clustering
The elbow method was applied to choose an appropriate number of clusters for K-means clustering, and 8 clusters were chosen for the model.
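The elbow method can be sketched as follows: fit K-means for a range of cluster counts and record the inertia (within-cluster sum of squares) for each. The synthetic data, the range of 1 to 10 clusters, and the scaling step are assumptions, so the elbow here will not match the 8 clusters found on the real data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic data: four blobs of 15 points each (assumption, not the real data).
rng = np.random.default_rng(42)
centers = [(0, 0), (0, 6), (6, 0), (6, 6)]
X = np.vstack([rng.normal(c, 0.5, size=(15, 2)) for c in centers])
X_scaled = StandardScaler().fit_transform(X)

# Fit K-means for k = 1..10 and collect the inertia of each fit.
inertias = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    inertias.append(km.inertia_)

for k, inertia in zip(range(1, 11), inertias):
    print(k, round(inertia, 2))
```

Plotting `inertias` against `k` gives the elbow curve. The chosen number of clusters is where the curve stops dropping sharply, which is a visual judgment rather than a strict rule.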
The scatter plot depicted the relationship between students' minutes watched, their CLV, and the clusters associated with various social media channels.
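A fit with the 8 clusters mentioned above might look like the sketch below. The random toy data, the column names, and the use of only minutes watched and CLV as features are assumptions; the actual project may have used more features.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Random toy data; column names and value ranges are assumptions.
rng = np.random.default_rng(7)
df = pd.DataFrame({
    "minutes_watched": rng.uniform(0, 500, size=80),
    "CLV": rng.uniform(20, 200, size=80),
})

# Scale the features, then assign each student to one of 8 clusters.
X_scaled = StandardScaler().fit_transform(df[["minutes_watched", "CLV"]])
km = KMeans(n_clusters=8, n_init=10, random_state=7)
df["cluster"] = km.fit_predict(X_scaled)

# Average minutes watched and CLV per cluster, useful for interpreting segments.
profile = df.groupby("cluster")[["minutes_watched", "CLV"]].mean()
print(profile.round(1))
```

A call such as `plt.scatter(df["minutes_watched"], df["CLV"], c=df["cluster"])` with matplotlib would then color each point by its cluster, similar to the scatter plot described here.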
Hey there! If you're curious, why not swing by my GitHub link below? Check it out for more details!