Star Republic: Guide for Biologists

Microarray --- Hierarchical clustering

Hierarchical clustering is a commonly used statistical tool for exploring relationships in data. It groups items according to a user-defined measure called "distance". The terms "similarity" and "correlation" are sometimes used in place of "distance", because the distance a user chooses is often derived from a similarity or correlation measure. There are many variants of hierarchical clustering. They differ in how distances are defined and in how the computation is carried out (e.g., the linkage rule, such as average linkage, and whether clusters are built top-down or bottom-up).
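To make the idea of a user-defined distance concrete, here is a minimal sketch of one common variant: bottom-up clustering with average linkage, using one minus the Pearson correlation as the distance. This is an illustration under stated assumptions, not the method of any particular software package. The function names (correlation_distance, average_linkage) and the four expression profiles are hypothetical and chosen only for this example.

```python
import math


def correlation_distance(x, y):
    """Distance derived from correlation: 1 - Pearson r.

    Close to 0 for profiles that rise and fall together, close to 2 for
    profiles that move in opposite directions. Assumes neither profile
    is constant (a constant profile would cause a division by zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (spread_x * spread_y)


def average_linkage(profiles, distance=correlation_distance):
    """Bottom-up (agglomerative) clustering with average linkage.

    profiles maps an item name to its list of measurements.
    Returns the merges in the order they happen, each as
    (members_of_cluster_a, members_of_cluster_b, average_distance).
    """
    names = list(profiles)
    # Pairwise distances between the original items.
    pair = {
        (a, b): distance(profiles[a], profiles[b])
        for a in names
        for b in names
        if a != b
    }
    clusters = [(name,) for name in names]  # every item starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ci, cj = clusters[i], clusters[j]
                # Average linkage: mean distance over all cross-cluster pairs.
                avg = sum(pair[a, b] for a in ci for b in cj) / (len(ci) * len(cj))
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        avg, i, j = best
        merges.append((clusters[i], clusters[j], avg))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical expression profiles (four genes measured in four samples).
genes = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.1, 4.0, 2.1],
}

for left, right, dist in average_linkage(genes):
    print(left, "+", right, "at distance", round(dist, 3))
```

Swapping in a different distance function (for example Euclidean distance) or a different linkage rule changes which items are merged first, which is why the variants can give different trees for the same data. To cluster samples instead of genes, pass a dictionary keyed by sample name whose values are the measurements across genes.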

Hierarchical clustering can be used to cluster genes or samples in microarray experiments. It has been integrated into most academic and commercial microarray analysis software packages. Hierarchical clustering is also a standard component of statistical software such as SAS, S-plus, and R.