Machine Learning: Clustering & Regression

Machine Learning · Intermediate · ~10 min

Apply k-means clustering to group unlabelled data, then fit a linear model with ordinary least squares regression.

Step 1 — Generate clustered data

Create two Gaussian blobs separated in 2D space. By construction we know which group each point belongs to; we will then discard that knowledge and let k-means recover the grouping.

rng(42);
group1 = randn(50, 2);
group2 = [3 3] + randn(50, 2);
X = [group1; group2];
scatter(X(:,1), X(:,2));
title('Raw unlabelled data')
▶ Run in SimLab

Expected output: Two overlapping clouds of points
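For readers outside SimLab, the same synthetic data can be sketched in Python with NumPy (variable names here are illustrative, not part of the tutorial's API):

```python
import numpy as np

# Two Gaussian blobs: 50 points around the origin, 50 around (3, 3)
rng = np.random.default_rng(42)
group1 = rng.standard_normal((50, 2))
group2 = np.array([3.0, 3.0]) + rng.standard_normal((50, 2))
X = np.vstack([group1, group2])

# The blob means should sit roughly 3 units apart on each axis
print(X.shape)  # (100, 2)
print(group2.mean(axis=0) - group1.mean(axis=0))
```

With 50 samples per blob, the empirical mean separation lands close to (3, 3) but not exactly on it; that sampling noise is what makes the clustering step non-trivial.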

Step 2 — K-means clustering

kmeans(X, k) partitions the rows of X into k clusters, returning an index vector and the cluster centroids C.

rng(42);
X = [randn(50,2); 3+randn(50,2)];
[idx, C] = kmeans(X, 2);
scatter(X(:,1), X(:,2));
hold on;
scatter(C(:,1), C(:,2), 'r', 'filled');
legend('Points', 'Centroids');
title('K-means: k=2')
▶ Run in SimLab

Expected output: Scatter plot with two clusters and two red centroids
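Under the hood, kmeans alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its assigned points (Lloyd's algorithm). A minimal Python/NumPy sketch of that loop — names are illustrative, and a production implementation adds multiple restarts, smarter initialisation, and empty-cluster handling:

```python
import numpy as np

def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's algorithm: returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids as k distinct points drawn from X
    C = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each centroid becomes the mean of its points
        # (a real implementation guards against empty clusters here)
        new_C = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_C, C):
            break  # assignments stable: converged
        C = new_C
    return labels, C

# Same two-blob data as in the SimLab example
rng = np.random.default_rng(42)
X = np.vstack([rng.standard_normal((50, 2)),
               np.array([3.0, 3.0]) + rng.standard_normal((50, 2))])
labels, C = kmeans_lloyd(X, 2)
```

For well-separated blobs like these, the recovered centroids should land near (0, 0) and (3, 3), matching what the red markers show in the SimLab plot.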

Step 3 — Linear regression with regress()

regress(y, X) fits a linear model by ordinary least squares. Include a column of ones in X for the intercept.

rng(42);
x = (1:50)';
y = 2*x + 5 + 10*randn(50,1);
X_des = [ones(50,1) x];
b = regress(y, X_des);
fprintf('Intercept: %.2f, Slope: %.2f\n', b(1), b(2));
plot(x, y, 'o', x, X_des*b);
legend('Data', 'Fit');
title('Linear Regression')
▶ Run in SimLab

Expected output: Intercept near 5, Slope near 2, fitted line through data
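Ordinary least squares picks the coefficient vector b that minimises ||y − Xb||², and the closed-form solution satisfies the normal equations XᵀXb = Xᵀy. A quick Python/NumPy cross-check of the same fit (illustrative names; `np.linalg.lstsq` is the numerically preferred route over forming XᵀX explicitly):

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.arange(1, 51, dtype=float)
y = 2 * x + 5 + 10 * rng.standard_normal(50)

# Design matrix with a column of ones for the intercept
X_des = np.column_stack([np.ones(50), x])

# Normal equations: solve (X'X) b = X'y
b_normal = np.linalg.solve(X_des.T @ X_des, X_des.T @ y)

# Same fit via the numerically safer least-squares routine
b_lstsq, *_ = np.linalg.lstsq(X_des, y, rcond=None)

print(b_normal)  # intercept near 5, slope near 2
```

Both routes give the same coefficients here; with noise of standard deviation 10 on only 50 points, the intercept estimate can easily wander a few units from 5, while the slope is pinned down much more tightly.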

Try SimLab — Free MATLAB® Alternative

466 functions. Runs in your browser. No install.

Open SimLab
