Hierarchical clustering is a commonly used statistical tool for exploring relationships
in data. It groups observations according to a user-defined measure
called "distance". The terms "similarity" and "correlation" are sometimes used in place of
"distance", because a user's definition of distance is often derived from a similarity or correlation measure.
There are a large number of variants of hierarchical clustering. They differ in how
distances are defined and in how the computation is carried out (e.g., average linkage, top-down).
Hierarchical clustering can be used to cluster genes or samples in microarray experiments.
It has been integrated into most academic and commercial microarray analysis
software packages. Hierarchical clustering is also a standard component of
statistical software such as SAS, S-PLUS, and R.
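
To make the roles of "distance" and linkage concrete, the following is a minimal sketch of one variant only: bottom-up (agglomerative) clustering with average linkage, using one minus the Pearson correlation as the distance. The function names (pearson, correlation_distance, average_linkage) and the four expression profiles are hypothetical and chosen purely for illustration; they are not taken from any of the packages named above, whose implementations are more general and far more efficient.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_distance(x, y):
    """Turn a similarity into a distance: near 0 for profiles with the
    same shape, near 2 for profiles with opposite shapes."""
    return 1.0 - pearson(x, y)


def average_linkage(profiles, dist=correlation_distance):
    """Naive bottom-up clustering with average linkage.

    Returns the merges in the order they happen, each as
    (members_a, members_b, distance), where members are row indices.
    """
    def between(a, b):
        # Average linkage: mean of all pairwise distances across two clusters.
        total = sum(dist(profiles[i], profiles[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [(i,) for i in range(len(profiles))]  # start with singletons
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them.
        a, b = min(combinations(clusters, 2), key=lambda pair: between(*pair))
        merges.append((a, b, between(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical expression profiles (rows = genes, columns = samples).
profiles = [
    [1.0, 2.0, 3.0, 4.0],  # gene 0: rising
    [2.0, 4.1, 6.0, 8.2],  # gene 1: rising, larger scale
    [4.0, 3.0, 2.0, 1.0],  # gene 2: falling
    [8.0, 6.1, 4.0, 2.1],  # gene 3: falling, larger scale
]
merges = average_linkage(profiles)
for a, b, d in merges:
    print(a, b, round(d, 4))
```

Because the distance here depends on the shape of a profile rather than its magnitude, the two rising genes are joined with each other and the two falling genes with each other before the two groups are finally merged. Swapping in a different distance function or a different linkage rule gives a different variant of the method.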
