Question

I need some tips to get started in the following essay please:

Suppose you had information from customers who shop at a grocery store, and you wanted to perform cluster analysis to identify groups of customers who have similar shopping patterns. The data that you have includes the age, income, and educational level of the customer, and the yearly amounts each customer purchases of the following food types: fruits, vegetables, milk, cereal, peanut butter, and bread. What are some of the data preparation steps that should be taken before performing cluster analysis? What distance measure should be used? Explain why you chose the distance measure. Discuss how the retailer could use the results of this cluster analysis to improve grocery sales.

Explanation / Answer

Some data preparation steps that should be taken are as follows:

- Arrange the data so that each row is one shopper, identified by name or another unique identifier.

- Put each variable in its own column: age, income, educational level, and the yearly purchase amount for each food type.

- Estimate (impute) any missing values, or remove the affected records.

- Standardize or scale the data to make the variables comparable. Standardization transforms each variable so that it has a mean of zero and a standard deviation of 1.
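The missing-value and standardization steps above can be sketched in a few lines of Python. This is a minimal illustration, not a full pipeline: the toy values, the `customers` dictionary, and the choice of mean imputation are all assumptions made for the example, and only the standard library is used.

```python
from statistics import mean, pstdev


def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def standardize(column):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


# Hypothetical toy data: one list per variable, one position per customer.
customers = {
    "age": [23, 45, 31, None],
    "income": [28000, 90000, 52000, 61000],
    "milk": [120.0, 60.0, None, 75.0],
}

# Impute first, then standardize each variable independently.
prepared = {name: standardize(impute_mean(col)) for name, col in customers.items()}
```

After this step, income (in the tens of thousands) and milk purchases (in the tens or hundreds) are on the same scale, so neither dominates a distance calculation simply because of its units.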

We should use a correlation-based distance, because it treats two objects as similar when their features are highly correlated. Grocery buyers whose purchasing patterns are strongly correlated will therefore be clustered together, which makes them easy to identify. Pearson's correlation is quite sensitive to outliers; to mitigate this, we can use Spearman's rank correlation instead.
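To make the correlation-based distance concrete, here is a minimal sketch that defines the distance as 1 minus the correlation between two customers' purchase profiles, with either Pearson's or Spearman's coefficient. The function names and the toy profiles (yearly amounts for the six food types) are assumptions for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(vx * vy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlation_distance(x, y, corr=pearson):
    """Distance in [0, 2]: 0 for perfectly correlated profiles, 2 for opposite ones."""
    return 1.0 - corr(x, y)


# Hypothetical yearly amounts: fruits, vegetables, milk, cereal, peanut butter, bread.
shopper_a = [100, 80, 60, 40, 20, 50]
shopper_b = [200, 160, 120, 80, 40, 100]  # same pattern as A at twice the volume
shopper_c = [20, 40, 60, 80, 100, 50]     # roughly the opposite pattern

d_ab = correlation_distance(shopper_a, shopper_b)
d_ac = correlation_distance(shopper_a, shopper_c, corr=spearman)
```

Note that shoppers A and B end up at a distance of essentially zero even though B buys twice as much: a correlation-based distance compares the shape of the purchasing pattern, not its overall volume.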

Cluster analysis breaks the customer data into a few major sets, or clusters, whose members share similar patterns and behavior. Each cluster has its own distinguishing traits, so it can be used for target marketing to improve conversions and customer stickiness. Targeted mailers or campaigns could be directed at these groups to increase their average ticket size by cross-selling or up-selling products from the store. This also discourages customers from switching to competitors, unless a competitor offers something distinctive enough to lower the customer's switching cost significantly.

For example, say we have identified a group of young males aged 18-25 who purchase a fixed volume of protein-rich milk every week. Targeted messages or mailers can be sent to these protein-loving fitness enthusiasts, offering an additional discount on larger milk packs or cross-selling other protein-rich items such as eggs.
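One way to spot a group like this is to summarize each cluster by the average of its variables once the clustering has been run. The sketch below assumes the cluster labels are already available; the rows, the labels, and the `cluster_profiles` helper are all hypothetical and only illustrate the idea.

```python
from collections import defaultdict
from statistics import mean


def cluster_profiles(rows, labels):
    """Average every variable within each cluster to describe its typical member."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: {key: mean(r[key] for r in members) for key in members[0]}
        for label, members in grouped.items()
    }


# Hypothetical customers (original, unscaled units) and their cluster labels.
rows = [
    {"age": 20, "milk": 300},
    {"age": 22, "milk": 340},
    {"age": 50, "milk": 90},
    {"age": 54, "milk": 110},
]
labels = [0, 0, 1, 1]

profiles = cluster_profiles(rows, labels)
```

In this toy data, cluster 0 averages age 21 with high milk purchases, which is the kind of profile a retailer might target with the larger-pack discount described above.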