K-Means Inverse Regression
- Clusters data into K groups, then applies inverse regression to map each cluster into a low-dimensional subspace for analysis or visualization.
- Enables visualization of cluster relationships (e.g., in a 2-dimensional space) to reveal patterns across groups.
- Related technique: sliced inverse regression, which slices data into K slices before applying inverse regression.
Definition
K-means inverse regression is a method of dimension reduction that clusters data points into a set of K clusters and then uses inverse regression to map each cluster to a low-dimensional subspace.
Explanation
K-means inverse regression combines a clustering step with inverse regression for dimensionality reduction. First, the data are partitioned into K clusters (for example, using K-means). Next, inverse regression is applied to each cluster to produce a mapping from the cluster to a low-dimensional subspace. The resulting low-dimensional representations facilitate visualization and help reveal relationships among clusters.
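The two-step recipe can be sketched in code. The Python snippet below is a minimal, illustrative sketch (using NumPy and scikit-learn's KMeans): it whitens the predictors, forms K clusters, and then extracts low-dimensional directions from the weighted covariance of the cluster means, in the style of sliced inverse regression. The function name kmeans_inverse_regression and the exact weighting scheme are assumptions made for illustration, not a reference implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_inverse_regression(X, n_clusters=5, n_components=2):
    """Illustrative sketch: inverse-regression directions estimated from
    K-means clusters (an assumed formulation, not a reference implementation)."""
    n, p = X.shape

    # Whiten the predictors so that directions are found on a common scale.
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    Z = (X - mean) @ inv_sqrt

    # Step 1: partition the observations into K clusters.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(Z)

    # Step 2 (inverse-regression step): the cluster means of Z estimate
    # E[Z | cluster]; their weighted outer products span the reduced subspace.
    M = np.zeros((p, p))
    for k in range(n_clusters):
        Zk = Z[labels == k]
        weight = Zk.shape[0] / n
        m = Zk.mean(axis=0)
        M += weight * np.outer(m, m)

    # Leading eigenvectors of M give the projection directions; map them
    # back to the original coordinate system.
    _, evecs_m = np.linalg.eigh(M)
    beta = inv_sqrt @ evecs_m[:, ::-1][:, :n_components]

    projected = (X - mean) @ beta  # low-dimensional representation of the data
    return projected, beta, labels
```

The returned beta matrix can also be used to project new observations into the same low-dimensional space for plotting.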
Sliced inverse regression is a related approach: instead of clustering the data points directly, the data are partitioned into K slices by ordering on a single variable (for example, price or volume), and inverse regression is applied independently to each slice.
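To highlight the difference, the only change needed for the sliced variant is how the group labels are formed: instead of K-means clusters, observations are assigned to slices by ordering on a single variable. A small helper might look like the sketch below (slice_labels is an assumed name); its output can be substituted for the K-means labels in the sketch above.

```python
import numpy as np

def slice_labels(y, n_slices=5):
    """Assign each observation to one of K slices by ordering on a single
    variable y (for example, price or volume), as in sliced inverse regression."""
    # Quantile-based boundaries so each slice holds roughly n / K observations.
    boundaries = np.quantile(y, np.linspace(0, 1, n_slices + 1)[1:-1])
    return np.searchsorted(boundaries, y)
```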
Examples
Retail customer dataset
Consider a dataset of customer data from a retail store that includes each customer’s age, income, location, and purchasing habits. Using K-means inverse regression, the customers are first clustered into K groups based on these characteristics; for instance, the groups might be driven primarily by income level or by location.
Next, inverse regression can map each cluster to a 2-dimensional subspace, where the first dimension represents the average income level of the cluster, and the second dimension represents the average location of the cluster. This enables visualization of the clusters in a low-dimensional space and helps to understand relationships between clusters (for example, identifying two clusters with high income levels and similar locations).
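As a rough illustration of this example, the snippet below applies the kmeans_inverse_regression sketch from above to synthetic customer data; the feature names, value ranges, and relationships between them are invented purely for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Synthetic customer features: age, income, location index, purchase frequency.
age = rng.uniform(18, 80, n)
income = rng.normal(55_000, 15_000, n)
location = rng.integers(0, 10, n).astype(float)
purchases = 0.001 * income + 0.5 * location + rng.normal(0, 2, n)
X = np.column_stack([age, income, location, purchases])

# Project the customers into a 2-dimensional subspace and inspect the clusters.
projected, beta, labels = kmeans_inverse_regression(X, n_clusters=4, n_components=2)
for k in range(4):
    centre = projected[labels == k].mean(axis=0)
    print(f"cluster {k}: size={np.sum(labels == k)}, 2-D centre={centre.round(2)}")
```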
Financial portfolio (sliced inverse regression)
Consider a dataset of prices, volumes, and returns for a portfolio of stocks. Using sliced inverse regression, the data can first be sliced into K slices based on stock price or volume; for instance, slices might be defined by price range or by volume range.
Inverse regression is then applied to each slice to map it to a 2-dimensional subspace, where the first dimension represents the average return of the slice and the second dimension represents the average volatility of the slice. Visualizing these slices in a low-dimensional space can reveal relationships such as slices with high returns and low volatility.
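A toy version of this slicing step is shown below on synthetic price and return data; it uses the slice_labels helper sketched earlier and simply summarizes each slice by its mean return and volatility, rather than performing a full sliced-inverse-regression fit.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000
price = rng.lognormal(mean=3.5, sigma=0.6, size=n)        # synthetic stock prices
returns = 0.02 - 0.0001 * price + rng.normal(0, 0.05, n)  # synthetic returns

# Slice the observations into K groups by price using the helper above.
labels = slice_labels(price, n_slices=5)

# Summarize each slice by its average return and volatility (std of returns),
# giving a simple 2-dimensional representation of the slices for plotting.
for k in range(5):
    r = returns[labels == k]
    print(f"slice {k}: mean return={r.mean():.4f}, volatility={r.std():.4f}")
```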
Use cases
- Dimension reduction for complex datasets.
- Visualizing clusters or slices in a low-dimensional space to reveal relationships.
- Uncovering hidden patterns to inform data analysis and decision making.
Related terms
- Sliced inverse regression
- Inverse regression
- K-means (clustering)