Customer segmentation

In today’s competitive travel industry, understanding customer behavior is essential for business growth. This project analyzes a dataset of 2,000 customers to segment them based on demographic and behavioral attributes such as age, gender, and income.

We used Python for data processing, exploratory data analysis, and visualization. We then applied machine learning techniques, specifically K-Means with k-means++ initialization (referred to here as K-Means++) and Agglomerative Clustering, to identify distinct customer segments. These methods uncovered patterns and relationships within the data, which informed targeted marketing strategies tailored to each customer group.

Skills:

Python
Clustering analysis
Feature engineering
Cluster profiling
Statistical analysis
Data visualization

Exploratory data analysis

Graph 1 illustrates the proportion of customers in each demographic category. Of all customers, 60.4% are female. Half are single, and the other half are non-single (divorced, separated, married, or widowed).

Regarding education, 43.8% have a high school education, 37.9% attended university, and 8.7% completed graduate school, while 9.6% are in the other/unknown category.

In terms of location, most customers live in small cities (56.5%), followed by big cities (39.9%), with only 3.6% in mid-sized cities.

For occupation, 49.6% are unemployed or unskilled, 39.6% are skilled employees or officials, and the remaining 10.8% work in management, are self-employed, or are highly qualified employees.
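The shares reported above are simple category proportions. A minimal pandas sketch of that calculation follows, assuming a DataFrame with hypothetical column names `sex` and `settlement_size` (the project's real schema is not shown here) and toy values rather than the actual data.

```python
import pandas as pd

# Hypothetical toy data; column names and category labels are assumptions,
# not the project's actual schema.
customers = pd.DataFrame({
    "sex": ["female", "male", "female", "female", "male"],
    "settlement_size": ["small city", "big city", "big city", "small city", "small city"],
})

def category_shares(df: pd.DataFrame, column: str) -> pd.Series:
    """Return each category's share of customers as a percentage, rounded to 1 decimal."""
    return (df[column].value_counts(normalize=True) * 100).round(1)

sex_shares = category_shares(customers, "sex")
settlement_shares = category_shares(customers, "settlement_size")
print(sex_shares)
print(settlement_shares)
```

Applied to the full 2,000-row table, the same helper would produce one percentage breakdown per demographic column.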

Graph 3 shows that most customers live in big or small cities. Big-city residents are mostly female, while small-city residents are evenly split by gender. Most customers with a university or graduate school education are women. Regarding occupation, most small-city residents are unemployed, while big-city residents are predominantly skilled employees or in other occupations; most skilled employees are women. The histograms also show that women and big-city residents tend to have higher incomes than men and small-city residents, respectively.
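The comparisons in Graph 3 can also be checked numerically with a cross-tabulation and group means. The sketch below is illustrative only: the column names and toy values are hypothetical, and `pd.crosstab(..., normalize="index")` gives the share of each sex within each settlement type.

```python
import pandas as pd

# Toy data with hypothetical column names; not the project's dataset.
customers = pd.DataFrame({
    "sex": ["female", "female", "male", "male", "female", "male"],
    "settlement_size": ["big city", "big city", "big city",
                        "small city", "small city", "small city"],
    "income": [150_000, 170_000, 120_000, 90_000, 110_000, 100_000],
})

# Share of each sex within each settlement type (each row sums to 1).
sex_by_settlement = pd.crosstab(
    customers["settlement_size"], customers["sex"], normalize="index"
)

# Mean income per group, to compare counterparts (big vs small city, women vs men).
income_by_settlement = customers.groupby("settlement_size")["income"].mean()
income_by_sex = customers.groupby("sex")["income"].mean()

print(sex_by_settlement)
print(income_by_settlement)
print(income_by_sex)
```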

Determining the optimal number of clusters

Elbow method

After standardizing the features, we used the Elbow method to find the optimal number of clusters. The elbow is clearly visible at 3 clusters (Graph 4), suggesting that 3 is the optimal number of clusters for this dataset.
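The Elbow method fits K-Means for a range of k and plots the within-cluster sum of squares (WCSS, exposed by scikit-learn as `inertia_`) against k. The sketch below assumes scikit-learn and uses synthetic data with three well-separated groups in place of the customer table, so the curve should flatten after k = 3; it is not the project's original code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in: three well-separated blobs in two numeric features.
X = np.vstack([
    rng.normal(loc=center, scale=1.0, size=(100, 2))
    for center in ([0, 0], [10, 10], [0, 10])
])

# Standardize so every feature contributes on the same scale.
X_std = StandardScaler().fit_transform(X)

# WCSS for k = 1..10; the "elbow" is where the curve stops dropping sharply.
wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=42)
    km.fit(X_std)
    wcss.append(km.inertia_)

for k, value in zip(range(1, 11), wcss):
    print(k, round(value, 1))
```

Plotting `wcss` against k (for example with matplotlib) gives the kind of curve shown in Graph 4.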

Silhouette plot

Since the Elbow method suggests 3 clusters, we drew silhouette plots for 2, 3, and 4 clusters to compare it with the neighboring solutions.

For 2 clusters, the average silhouette score is 0.54, higher than both 0.45 for 4 clusters and 0.43 for 3 clusters. This suggests that the 2-cluster solution gives the best-defined separation between clusters, so, despite the elbow at 3, we proceeded with 2 clusters for the rest of the analysis.
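The silhouette comparison can be sketched with scikit-learn's `silhouette_score`, which averages the per-point silhouette values (the per-point values, from `silhouette_samples`, are what a silhouette plot draws). Synthetic data with two clear groups stands in for the standardized customer features, so this illustrates the procedure and does not reproduce the 0.54 / 0.45 / 0.43 results.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in: two well-separated groups of 150 points each.
X = np.vstack([
    rng.normal(loc=[0, 0], scale=1.0, size=(150, 2)),
    rng.normal(loc=[8, 8], scale=1.0, size=(150, 2)),
])
X_std = StandardScaler().fit_transform(X)

# Average silhouette score per candidate k; values range from -1 to 1,
# and higher means tighter, better-separated clusters.
scores = {}
for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, init="k-means++", n_init=10,
                    random_state=42).fit_predict(X_std)
    scores[k] = silhouette_score(X_std, labels)

best_k = max(scores, key=scores.get)
print(scores, best_k)
```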

Estimating clusters using K-Means++

Table 3 presents the characteristics of the two clusters identified by the K-means++ algorithm.

Cluster 1 consists of 976 customers. Its typical member is a middle-aged, non-single woman who lives in a big city and has a university education. She is a skilled employee with an above-average income of approximately $173,460 per year.

Cluster 2 consists of 1,024 customers. Its typical member is a single man in his early 30s with a high school education. He lives in a small city, is unemployed, and has a below-average income of around $103,256 per year.
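A minimal sketch of fitting K-Means with k-means++ initialization and building a cluster profile like Table 3, assuming scikit-learn and pandas. The three columns (`age`, `income`, `single`) and their values are hypothetical toy data; the project's feature set also covers sex, education, occupation, and settlement size.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Toy customer table; columns and values are hypothetical.
customers = pd.DataFrame({
    "age":    [45, 50, 48, 52, 30, 28, 33, 31],
    "income": [170_000, 180_000, 165_000, 175_000, 100_000, 95_000, 105_000, 110_000],
    "single": [0, 0, 0, 0, 1, 1, 1, 1],
})

# Cluster on standardized features, then attach labels to the original table.
X_std = StandardScaler().fit_transform(customers)
kmeans = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=42)
customers["cluster"] = kmeans.fit_predict(X_std)

# Profile: cluster size plus means of the original (unscaled) values.
profile = customers.groupby("cluster").agg(
    n_customers=("age", "size"),
    mean_age=("age", "mean"),
    mean_income=("income", "mean"),
    share_single=("single", "mean"),
)
print(profile)
```

Profiling on the unscaled values keeps the table readable in real units (years, dollars), while the clustering itself runs on standardized inputs.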

Estimating clusters using Agglomerative clustering

Table 4 presents the characteristics of the two clusters identified by Agglomerative clustering.

Cluster 1 consists of 1,163 customers. Its typical member is a non-single woman in her late 40s who lives in a big city and has higher education and income (around $168,085).

Cluster 2 consists of 837 customers. Its typical member is a single man in his early 30s who lives in a small city and is unemployed, with lower education and income (around $95,041).
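The agglomerative step can be sketched the same way with scikit-learn's `AgglomerativeClustering`. Ward linkage is an assumption here, since the text does not say which linkage was used; the columns and values are again hypothetical toy data.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

# Toy customer table; columns and values are hypothetical.
customers = pd.DataFrame({
    "age":    [47, 49, 51, 46, 44, 31, 29, 33],
    "income": [165_000, 170_000, 172_000, 160_000, 168_000, 96_000, 92_000, 99_000],
    "single": [0, 0, 0, 0, 0, 1, 1, 1],
})

X_std = StandardScaler().fit_transform(customers)

# Bottom-up clustering: start with one cluster per customer and merge the pair
# that least increases within-cluster variance (Ward) until 2 clusters remain.
agglo = AgglomerativeClustering(n_clusters=2, linkage="ward")
customers["cluster"] = agglo.fit_predict(X_std)

sizes = customers["cluster"].value_counts().sort_index()
mean_income = customers.groupby("cluster")["income"].mean()
print(sizes)
print(mean_income)
```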

Marketing recommendations

Both methods produce clusters with the same overall profiles. They differ only in the exact figures: for example, Cluster 1 contains 976 customers under K-Means++ versus 1,163 under Agglomerative clustering, with an average income of about $173,460 versus $168,085. This consistency across two different algorithms supports segmenting the customer base into these two groups.
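The comparison above is based on cluster profiles. A complementary check, not something the original analysis reports, is to compare the label assignments directly with a contingency table and the adjusted Rand index (ARI), which is 1.0 for identical partitions regardless of how the labels are numbered. The label vectors below are hypothetical.

```python
import pandas as pd
from sklearn.metrics import adjusted_rand_score

# Hypothetical cluster labels for the same 10 customers from each method.
kmeans_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
agglo_labels  = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

# Contingency table: how many customers each pair of labels has in common.
contingency = pd.crosstab(
    pd.Series(kmeans_labels, name="kmeans"),
    pd.Series(agglo_labels, name="agglomerative"),
)

# ARI: 1.0 for identical partitions, about 0 for random agreement.
ari = adjusted_rand_score(kmeans_labels, agglo_labels)
print(contingency)
print(round(ari, 3))
```

Because cluster numbers are arbitrary, K-Means cluster 0 may correspond to agglomerative cluster 1; the contingency table makes that mapping, and the customers who switch clusters, visible.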

Cluster 1:

Offer more exclusive, high-end travel packages and services.
Offer child-friendly travel plans.

Cluster 2:

Offer affordable packages, discounts, and deals.
Promote trips and services related to skill development or career growth.
Run referral programs that offer discounts or rewards for bringing in new customers, to help expand the customer base.
Run location-specific promotions targeting small cities.