# Agglomerative Hierarchical Clustering in Data Mining

One of the benefits of hierarchical clustering is that you don’t need to already know the number of clusters k in your data in advance. Sadly, there doesn’t seem to be much documentation on how to actually use scipy’s hierarchical clustering to make an informed decision and then retrieve the clusters.

Agglomerative hierarchical clustering is a clustering algorithm that builds a cluster hierarchy from the bottom-up. It starts by adding a cluster for each of the data points to be clustered, followed by iterative pair-wise merging of clusters until only one cluster is left at the top of the hierarchy. The choice of clusters to merge at each iteration is decided based on a distance metric.

In this algorithm bottom up method is used.When the similar type of data is found ,it is combined to form a single cluster.This step is repeated at each level. At the end , root is found after all the data is combined into a single cluster. This is done in the form of a dendrogram. A graph is built with the help of the given data .

In the agglomerative hierarchical approach, we define each data point as a cluster and combine existing clusters at each step. Here are four different methods for this approach:

1. Single Linkage: In single linkage, we define the distance between two clusters as the minimum distance between any single data point in the first cluster and any single data point in the second cluster. On the basis of this definition of distance between clusters, at each stage of the process we combine the two clusters with the smallest single linkage distance.
2. Complete Linkage: In complete linkage, we define the distance between two clusters to be the maximum distance between any single data point in the first cluster and any single data point in the second cluster. On the basis of this definition of distance between clusters, at each stage of the process we combine the two clusters that have the smallest complete linkage distance.
3. Average Linkage: In average linkage, we define the distance between two clusters to be the average distance between data points in the first cluster and data points in the second cluster. On the basis of this definition of distance between clusters, at each stage of the process we combine the two clusters that have the smallest average linkage distance.
4. Centroid Method: In centroid method, the distance between two clusters is the distance between the two mean vectors of the clusters. At each stage of the process we combine the two clusters that have the smallest centroid distance.
5. Ward’s Method: This method does not directly define a measure of distance between two points or clusters. It is an ANOVA based approach. One-way univariate ANOVAs are done for each variable with groups defined by the clusters at that stage of the process.  At each stage, two clusters merge that provide the smallest increase in the combined error sum of squares.

This techniques have been used as many clustering problem domain. Usually researchers during their PhD come accros Agglomerative Hierarchical Clustering to cluster their data