Machine Learning: Clustering & Regression
Apply k-means clustering to group unlabelled data, then fit a straight line to noisy data with linear regression.
Step 1 — Generate labelled clusters
Create two Gaussian blobs separated in 2D space. We will then forget the labels and let k-means recover the grouping.
rng(42);
x1 = randn(1, 50);
y1 = randn(1, 50);
x2 = 3 + randn(1, 50);
y2 = 3 + randn(1, 50);
scatter([x1, x2], [y1, y2]);
title('Raw unlabelled data');
Expected output: Two clouds of points, one centred near (0, 0) and one near (3, 3), overlapping slightly
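Because the labels are about to be discarded, it can help to keep a copy for checking the clustering later. The sketch below packs the same points into a matrix with one row per point and stores a ground-truth vector. The names X, trueLabels, muA and muB are illustrative and not part of the original tutorial.

```matlab
rng(42);
x1 = randn(1, 50);      y1 = randn(1, 50);       % blob A, centred near (0, 0)
x2 = 3 + randn(1, 50);  y2 = 3 + randn(1, 50);   % blob B, centred near (3, 3)

% Pack the points into a 100-by-2 matrix: one row per point.
X = [[x1, x2]', [y1, y2]'];

% Illustrative ground-truth vector, kept only for later comparison.
trueLabels = [ones(50, 1); 2 * ones(50, 1)];

% Sample mean of each blob; these should land close to (0, 0) and (3, 3).
muA = mean(X(trueLabels == 1, :), 1);
muB = mean(X(trueLabels == 2, :), 1);
disp(muA);
disp(muB);
```

A row-per-point layout is the shape that kmeans(X, k) expects, so X can be passed to it directly.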
Step 2 — K-means clustering
kmeans(X, k) partitions the rows of X into k clusters, returning an index vector and the cluster centroids C.
rng(42);
x1 = randn(1, 50);
y1 = randn(1, 50);
x2 = 3 + randn(1, 50);
y2 = 3 + randn(1, 50);
X = [[x1, x2]', [y1, y2]'];   % 100-by-2 matrix, one row per point
[idx, C] = kmeans(X, 2);      % idx: cluster index per row, C: one centroid per row
scatter(X(:,1), X(:,2));
hold on;
scatter(C(:,1), C(:,2), 'r');
legend('Points', 'Centroids');
title('K-means: k=2');
Expected output: Scatter plot with two clusters and two red centroids
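To make the partitioning step less of a black box, here is a minimal sketch of Lloyd's algorithm, the iteration most commonly meant by "k-means". It uses only core functions and alternates between assigning points to the nearest centroid and recomputing centroids. It is an illustration under assumptions: the random initialisation, the iteration cap of 100 and all variable names are choices made here, and the tutorial does not say how SimLab's own kmeans is implemented.

```matlab
rng(42);
X = [randn(50, 2); 3 + randn(50, 2)];   % two blobs near (0, 0) and (3, 3)
k = 2;
n = size(X, 1);

% Initialise centroids from k distinct, randomly chosen data points.
perm = randperm(n);
C = X(perm(1:k), :);

for iter = 1:100
    % Assignment step: squared distance from every point to every centroid.
    D = zeros(n, k);
    for j = 1:k
        D(:, j) = sum((X - repmat(C(j, :), n, 1)).^2, 2);
    end
    [~, idx] = min(D, [], 2);

    % Update step: move each centroid to the mean of its assigned points.
    Cnew = C;
    for j = 1:k
        if any(idx == j)
            Cnew(j, :) = mean(X(idx == j, :), 1);
        end
    end

    % Stop once the centroids no longer move.
    if isequal(Cnew, C)
        break;
    end
    C = Cnew;
end
disp(C);
```

The cluster numbering is arbitrary: which blob ends up as cluster 1 depends on the initial centroids, so compare centroids by position rather than by index.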
Step 3 — Linear regression with regress()
regress(y, X) fits a linear model by ordinary least squares. Include a column of ones in X for the intercept.
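rng(42);
x = (1:50)';
y = 2*x + 5 + 10*randn(50,1);
X = [ones(50,1), x];          % column of ones provides the intercept
b = regress(y, X);            % b(1) = intercept, b(2) = slope
fprintf('Slope: %.2f, Intercept: %.2f\n', b(2), b(1));
plot(x, y, 'o', x, X*b);
legend('Data', 'Fit');
title('Linear Regression');
Expected output: Slope near 2 and intercept roughly 5 (the intercept estimate is noisier and can be off by a few units), with the fitted line through the data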
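The same ordinary least-squares fit can be obtained with core operators only, which is a useful cross-check. The sketch below solves the system with the backslash operator and compares the result with polyfit. The names X, b and p are illustrative, and rng(42) is added here only to make the run repeatable.

```matlab
rng(42);
x = (1:50)';
y = 2*x + 5 + 10*randn(50, 1);

% Design matrix: a column of ones (intercept) next to the predictor.
X = [ones(50, 1), x];

% Least-squares solution of X*b = y; b(1) is the intercept, b(2) the slope.
b = X \ y;

% Cross-check against polyfit, which returns [slope, intercept].
p = polyfit(x, y, 1);
fprintf('Backslash: slope %.4f, intercept %.4f\n', b(2), b(1));
fprintf('polyfit:   slope %.4f, intercept %.4f\n', p(1), p(2));
```

Note the ordering difference: with the ones column first, the intercept comes first in b, whereas polyfit lists the highest-degree coefficient first.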