Machine Learning: Clustering & Regression

Machine Learning · Intermediate · ~10 min

Apply k-means clustering to group unlabelled data, then fit a straight line to noisy data with linear regression.

Step 1 — Generate labelled clusters

Create two Gaussian blobs separated in 2D space. We will then forget the labels and let k-means recover the grouping.

rng(42);
x1 = randn(1, 50);
y1 = randn(1, 50);
x2 = 3 + randn(1, 50);
y2 = 3 + randn(1, 50);
scatter([x1, x2], [y1, y2]);
title('Raw unlabelled data')

Expected output: Two clouds of points centred near (0, 0) and (3, 3), with slight overlap between them

Step 2 — K-means clustering

[idx, C] = kmeans(X, k) partitions the rows of X into k clusters, returning an index vector idx (one cluster label per row) and the cluster centroids C (one centroid per row).

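rng(42);
x1 = randn(1, 50);
y1 = randn(1, 50);
x2 = 3 + randn(1, 50);
y2 = 3 + randn(1, 50);
X = [[x1, x2]', [y1, y2]'];   % 100-by-2 matrix: one point per row, labels discarded
[idx, C] = kmeans(X, 2);      % idx: cluster label per point, C: 2-by-2 centroids
scatter(X(:,1), X(:,2));
hold on;
scatter(C(:,1), C(:,2), 'r'); % centroids found by k-means, not the known group means
legend('Points', 'Centroids');
title('K-means: k=2')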

Expected output: Scatter plot with two clusters and two red centroids
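
To make the partitioning step concrete, here is a minimal sketch of Lloyd's algorithm, the iteration usually meant by "k-means". It is an illustration only and not necessarily how SimLab's kmeans is implemented. The random initialisation with randperm, the 100-iteration cap, and the variable names (D, Cnew, truth, acc) are assumptions made for this example.

```matlab
% Minimal Lloyd's-algorithm sketch (illustrative; not SimLab's kmeans).
rng(42);
X = [randn(50, 2); 3 + randn(50, 2)];      % 100 points, one per row
truth = [ones(50, 1); 2 * ones(50, 1)];    % true labels, used only for the final check
k = 2;
n = size(X, 1);
C = X(randperm(n, k), :);                  % assumed init: k distinct random points

for iter = 1:100                           % assumed iteration cap
    % Squared Euclidean distance from every point to every centroid (n-by-k).
    D = zeros(n, k);
    for j = 1:k
        D(:, j) = sum((X - repmat(C(j, :), n, 1)).^2, 2);
    end
    [~, idx] = min(D, [], 2);              % assign each point to its nearest centroid

    % Move each centroid to the mean of the points assigned to it.
    Cnew = C;
    for j = 1:k
        if any(idx == j)                   % leave an empty cluster's centroid in place
            Cnew(j, :) = mean(X(idx == j, :), 1);
        end
    end

    if isequal(Cnew, C), break; end        % centroids unchanged: converged
    C = Cnew;
end

% Cluster numbering is arbitrary, so score both possible label matchings.
agree = mean(idx == truth);
acc = max(agree, 1 - agree);
fprintf('Agreement with true labels: %.0f%%\n', 100 * acc);
disp(C);
```

Because the two blobs overlap slightly, a few points near the midpoint will usually be assigned to the "wrong" cluster, so the agreement is typically high but not exactly 100%.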

Step 3 — Linear regression with regress()

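regress(y, X) fits a linear model by ordinary least squares. Include a column of ones in X for the intercept. For a single predictor, polyfit(x, y, 1) finds the same least-squares line and returns it as [slope, intercept]; the code below uses polyfit.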

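rng(42);                            % fix the seed so the noise is reproducible
x = (1:50)';
y = 2*x + 5 + 10*randn(50,1);       % true slope 2, true intercept 5, noise std 10
p = polyfit(x, y, 1);               % p(1) = slope, p(2) = intercept
fprintf('Slope: %.2f, Intercept: %.2f\n', p(1), p(2));
plot(x, y, 'o', x, polyval(p, x));
legend('Data', 'Fit');
title('Linear Regression')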

Expected output: Slope near 2 and intercept roughly 5 (the noise can shift the intercept by a few units), with the fitted line passing through the data
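
To show what the column of ones does, the sketch below builds the design matrix by hand and solves the least-squares problem with the backslash operator, then compares the result with polyfit. It assumes the environment supports backslash on overdetermined systems, as MATLAB and Octave do. The names A and b are chosen for this example.

```matlab
% Ordinary least squares with an explicit design matrix (illustrative sketch).
rng(42);
x = (1:50)';
y = 2*x + 5 + 10*randn(50, 1);

A = [ones(50, 1), x];    % column of ones models the intercept, x models the slope
b = A \ y;               % least-squares solution: b(1) = intercept, b(2) = slope
p = polyfit(x, y, 1);    % same line: p(1) = slope, p(2) = intercept

fprintf('Backslash: slope %.2f, intercept %.2f\n', b(2), b(1));
fprintf('polyfit:   slope %.2f, intercept %.2f\n', p(1), p(2));
```

Note the ordering: with the ones column first, the intercept comes first in b, whereas polyfit lists the highest-degree coefficient (the slope) first.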


MATLAB® is a registered trademark of The MathWorks, Inc. SimLab is an independent project by Simulations4All and is not affiliated with, endorsed by, or sponsored by The MathWorks, Inc.
