Comment: I have updated the table for dimensionality reduction. There is not much available on the surface level to differentiate between the two. I have included everything that I could find.


Name | Comments on Applicability | Reference
Hierarchical Clustering
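  1. Produces a full hierarchy of N-1 successive merges (a dendrogram), so the number of clusters can be chosen afterwards by cutting the tree.
  2. Expensive and slow: an n×n distance matrix must be computed and stored.
  3. Does not scale to very large datasets.
  4. Results are reproducible: the algorithm is deterministic for a given dataset and linkage.
  5. Not restricted to hyper-spherical clusters; depending on the linkage, it can recover elongated or irregular shapes.
  6. Can provide insight into how the data points are grouped, via the dendrogram.
  7. Can use various linkage methods (apart from centroid), such as single, complete, or average linkage.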
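
The points above can be illustrated with a minimal sketch of naive agglomerative clustering, using only the Python standard library. The function name `agglomerative` and the sample points are assumptions made for illustration; a real project would normally use a library implementation. The sketch records every merge, which shows that N points always yield N-1 merges and that the result is the same on every run.

```python
import math

def agglomerative(points, linkage=min):
    """Naive agglomerative clustering; returns the list of merges performed.

    `linkage` reduces the pairwise distances between two clusters to one
    number: `min` gives single linkage, `max` gives complete linkage.
    """
    clusters = [[i] for i in range(len(points))]  # start with one cluster per point
    merges = []
    while len(clusters) > 1:
        best = None
        # Search every pair of clusters for the smallest linkage distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return merges

# Hypothetical sample data: two tight pairs and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
merges = agglomerative(points)
for left, right, distance in merges:
    print(left, "+", right, "at distance", round(distance, 3))
```

The pairwise search is what makes this approach expensive: every iteration compares all remaining clusters, which is why the method struggles on very large datasets.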

K-Means
  1. The number of clusters must be specified in advance.
  2. Less computationally intensive.
  3. Suited to large datasets.
  4. The starting centroids can be chosen at random, so different runs can give different results.
  5. Works best when the clusters are roughly hyper-spherical (circular in two dimensions).
  6. K-Means simply divides the data into mutually exclusive subsets, without giving much insight into the process of division.
  7. K-Means represents each cluster by its centroid, computed as the mean of its points (the median-based variant is known as K-Medians).
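
As a rough illustration of the sensitivity to the starting point, here is a minimal sketch of the standard K-Means iteration (Lloyd's algorithm) in plain Python. The function names `kmeans` and `inertia`, the explicit `initial_centroids` argument, and the four sample points are all assumptions chosen for the example; random initialization is replaced by two hand-picked starts so that the effect is repeatable.

```python
import math

def kmeans(points, initial_centroids, iterations=100):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    centroids = list(initial_centroids)
    k = len(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if not members:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
            else:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
        converged = new_centroids == centroids
        centroids = new_centroids
        if converged:
            break
    return centroids, labels

def inertia(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))

# Hypothetical data: the corners of a wide rectangle.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

good_centroids, good_labels = kmeans(points, [(0.0, 0.0), (4.0, 1.0)])
bad_centroids, bad_labels = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])

print(good_centroids, inertia(points, good_centroids, good_labels))
print(bad_centroids, inertia(points, bad_centroids, bad_labels))
```

Both runs converge, but the second start splits the rectangle into top and bottom halves and stays there, with a much larger within-cluster sum of squares. This is why K-Means is commonly run several times with different starts, keeping the best result.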

Gaussian Mixture Models
  1. The number of clusters (mixture components) must be specified in advance.
  2. GMMs are somewhat more flexible: with a covariance matrix per component, the cluster boundaries can be elliptical (as opposed to K-Means, which produces circular boundaries).
  3. A GMM is a probabilistic model. By assigning probabilities to data points, we can express how strongly we believe that a given data point belongs to a specific cluster.
  4. GMMs usually tend to be slower than K-Means because they take more iterations to converge. (GMMs can also converge quickly to a poor local optimum. To reduce this risk, they are usually initialized with K-Means.)
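
The probabilistic assignment can be sketched with a small expectation-maximization (EM) loop for a one-dimensional mixture. This is only an illustration under simplifying assumptions: the data are one-dimensional, the function names `normal_pdf` and `fit_gmm_1d` are made up for the example, and the starting means are passed in by hand (in practice they might come from a K-Means run).

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(data, initial_means, iterations=50):
    """EM for a 1-D Gaussian mixture; returns parameters and responsibilities."""
    k = len(initial_means)
    means = list(initial_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iterations):
        # E-step: probability that each point belongs to each component.
        resp = []
        for x in data:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = (
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
                + 1e-6  # small floor so a component cannot collapse to zero variance
            )
            weights[j] = nj / len(data)
    return means, variances, weights, resp

# Hypothetical data: two groups centred near 1 and 5.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
means, variances, weights, resp = fit_gmm_1d(data, initial_means=[0.0, 6.0])

print([round(m, 3) for m in means])
print([round(p, 3) for p in resp[0]])  # membership probabilities of the first point
```

Each row of `resp` sums to one, so it can be read directly as the degree of belief that the point belongs to each cluster, which K-Means cannot provide.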

DBSCAN
  1. No pre-specified number of clusters.
  2. Somewhat computationally intensive.
  3. Cannot efficiently handle large datasets.
  4. Suitable for non-compact, intertwined, arbitrarily shaped clusters.
  5. Density-based: does not work well when the density varies between clusters.
  6. Robust to noise and outliers, which are left unassigned.
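
A minimal DBSCAN-style sketch in plain Python shows how a density-based method finds the number of clusters on its own and sets outliers aside. The function name `dbscan`, the parameter names `eps` and `min_pts`, and the sample points are assumptions for illustration; the neighbour search here is a brute-force scan, which is exactly what becomes costly on large datasets.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Brute-force search: every point within eps of point i (including i).
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels

# Hypothetical data: two dense squares and one isolated outlier.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (50, 50),
]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

A single `eps` value defines what counts as dense everywhere, which is why clusters of very different densities are hard to capture in one run.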

Linear Discriminant Analysis

It is used to find a linear combination of features that characterizes or separates two or more classes of objects or events.

LDA is a supervised method: it uses class labels.

LDA is also sometimes used directly as a classifier. It can outperform logistic regression when the classes are approximately Gaussian with a shared covariance, but not in general.
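
For two classes, the linear combination described above can be sketched with Fisher's discriminant: the direction is the inverse of the within-class scatter matrix applied to the difference of the class means. The sketch below is a simplified two-dimensional, two-class case in plain Python; the function names and the sample points are assumptions for illustration.

```python
def mean_2d(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter_2d(points, m):
    """Entries (sxx, sxy, syy) of the scatter matrix around mean m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy

def fisher_direction(class0, class1):
    """Direction w = Sw^-1 (m1 - m0) for two labelled 2-D classes."""
    m0, m1 = mean_2d(class0), mean_2d(class1)
    a0, b0, d0 = scatter_2d(class0, m0)
    a1, b1, d1 = scatter_2d(class1, m1)
    a, b, d = a0 + a1, b0 + b1, d0 + d1  # within-class scatter Sw = [[a, b], [b, d]]
    det = a * d - b * b
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # Multiply the closed-form inverse of the 2x2 matrix by the mean difference.
    w = ((d * dx - b * dy) / det, (-b * dx + a * dy) / det)
    return w, m0, m1

def project(w, p):
    return w[0] * p[0] + w[1] * p[1]

# Hypothetical labelled data.
class0 = [(1, 2), (2, 3), (3, 3), (2, 1)]
class1 = [(6, 5), (7, 8), (8, 7), (7, 6)]

w, m0, m1 = fisher_direction(class0, class1)
# Classify by comparing the projection with the midpoint of the projected means.
threshold = (project(w, m0) + project(w, m1)) / 2
print(w, threshold)
```

The class labels are needed to compute both the class means and the within-class scatter, which is what makes the method supervised.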

Principal Component Analysis

It performs a linear mapping of the data from a higher-dimensional space to a lower-dimensional space in such a manner that the variance of the data in the low-dimensional representation is maximized.

PCA is an unsupervised method: it does not use class labels.
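
The variance-maximizing mapping can be sketched for two-dimensional data in plain Python: compute the covariance matrix, find its leading eigenvector, and project the centred data onto it. The function names and sample points below are assumptions for illustration, and power iteration is used here simply because it needs no external library.

```python
import math

def covariance_2d(points):
    """Return the mean and the 2x2 sample covariance matrix."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))

def first_component(cov, iterations=100):
    """Leading eigenvector of a symmetric 2x2 matrix via power iteration."""
    v = (1.0, 0.0)
    for _ in range(iterations):
        x = cov[0][0] * v[0] + cov[0][1] * v[1]
        y = cov[1][0] * v[0] + cov[1][1] * v[1]
        norm = math.hypot(x, y)
        v = (x / norm, y / norm)
    return v

# Hypothetical data lying close to the line y = x.
points = [(2.0, 1.9), (1.0, 1.1), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]

(mx, my), cov = covariance_2d(points)
pc1 = first_component(cov)

# Map each 2-D point to a single coordinate along the first component.
scores = [(x - mx) * pc1[0] + (y - my) * pc1[1] for x, y in points]
projected_variance = sum(s ** 2 for s in scores) / (len(points) - 1)
total_variance = cov[0][0] + cov[1][1]

print(pc1)
print(projected_variance / total_variance)  # fraction of variance retained
```

No labels appear anywhere in the computation: the direction is determined by the spread of the data alone.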