Clustering
- Groups similar data points to reveal patterns and relationships within a dataset.
- Commonly used to segment customers and to help detect anomalous transactions.
- Enables targeted actions (e.g., tailored marketing) and identification of outliers.
Definition
Clustering is a machine learning technique that groups data points into distinct clusters based on their similarity, making the structure of the data easier to understand and analyze.
Explanation
Clustering partitions data into groups of similar items so that items within the same cluster share common characteristics, while items in different clusters are less similar. By organizing data this way, clustering helps uncover patterns and relationships that support analysis and decision making.
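The idea that items in the same cluster are more alike than items in different clusters can be checked numerically. The sketch below is illustrative only: the points, the labels, and the helper `mean_pairwise_distances` are hypothetical, and Euclidean distance is assumed as the measure of (dis)similarity.

```python
from itertools import combinations
from math import dist

# Hypothetical 2-D points with cluster labels already assigned.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),   # cluster 0
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]   # cluster 1
labels = [0, 0, 0, 1, 1, 1]


def mean_pairwise_distances(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance)."""
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])  # Euclidean distance between the pair
        (within if labels[i] == labels[j] else between).append(d)
    return sum(within) / len(within), sum(between) / len(between)


within, between = mean_pairwise_distances(points, labels)
print(f"within: {within:.2f}, between: {between:.2f}")
```

For a good grouping, the within-cluster figure should be clearly smaller than the between-cluster figure, as it is for this toy data.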
Examples
Customer segmentation
A company with a large dataset containing demographics, purchasing history, and preferences can use clustering to group customers by similar characteristics and behaviors. For example, one cluster may consist of high-income customers who frequently purchase luxury items, while another may consist of budget-conscious customers who primarily purchase necessities. Identifying these clusters enables the company to tailor marketing and sales strategies for each segment.
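A minimal sketch of this kind of segmentation, using k-means as one possible clustering algorithm (the text above does not prescribe one). The customer records, the two features, and the choice of starting centroids are all hypothetical. Features are assumed to be pre-scaled to the 0-1 range so that neither dominates the distance.

```python
from math import dist

# Hypothetical customers: (income scaled to 0-1, share of spend on luxury items).
customers = {
    "c1": (0.90, 0.80),
    "c2": (0.85, 0.75),
    "c3": (0.95, 0.70),
    "c4": (0.20, 0.05),
    "c5": (0.30, 0.10),
    "c6": (0.25, 0.15),
}


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points]


def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate assignment and centroid-update steps."""
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                # Update step: move the centroid to the mean of its members.
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # leave an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    return assign(points, centroids), centroids


points = list(customers.values())
# Assumed deterministic start: the first and last customers as initial centroids.
labels, centroids = kmeans(points, [points[0], points[-1]])

segments = {}
for name, label in zip(customers, labels):
    segments.setdefault(label, []).append(name)
print(segments)
```

On this toy data the high-income, luxury-heavy customers end up in one segment and the budget-conscious customers in the other. Real data would typically need feature scaling, a considered choice of the number of clusters, and several random restarts.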
Fraud detection
A financial institution can apply clustering to a dataset of transactions and customer information, grouping transactions by similarity in features such as amount, location, and time. This can surface potentially fraudulent activity when transactions appear as outliers or fall into clusters with markedly different characteristics. For example, a cluster of small transactions at local stores may be considered normal, while a cluster of large transactions at foreign merchants may be flagged as potentially fraudulent.
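The outlier case can be sketched by scoring each transaction by its distance to the nearest cluster centroid. Everything concrete here is an assumption made for illustration: the centroids (imagined as the result of an earlier clustering of normal transactions), the two features and their scaling, the `THRESHOLD` value, and the helper `anomaly_score`.

```python
from math import dist

# Hypothetical centroids from an earlier clustering of historical transactions,
# as (amount in USD / 1000, distance from home in km / 1000).
centroids = [(0.03, 0.005), (0.12, 0.02)]
THRESHOLD = 0.5  # assumed cut-off; in practice it would be tuned on real data

transactions = {
    "t1": (0.025, 0.004),  # small local purchase
    "t2": (0.110, 0.030),  # mid-sized local purchase
    "t3": (2.400, 8.500),  # large purchase far from home
}


def anomaly_score(tx, centroids):
    """Distance from a transaction to its nearest cluster centroid."""
    return min(dist(tx, c) for c in centroids)


# Flag transactions that sit far from every known cluster.
flagged = [name for name, tx in transactions.items()
           if anomaly_score(tx, centroids) > THRESHOLD]
print(flagged)
```

Only `t3` lies far from both centroids, so it is the only transaction flagged. A flag of this kind is a prompt for review, not proof of fraud.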
Related terms
- Machine learning
- Customer segmentation
- Fraud detection
- Outliers / outlier detection