These papers propose scoring a clustering algorithm by the biological similarity of the resulting clusters, measured in one way or another, although all of them ignore the issue of stability.
The index proposed in  is based on the idea of mutual information content between statistical clusters and biological attributes.
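To make the idea concrete, the following is a minimal Python sketch of mutual information between cluster labels and a single binary annotation, with entropy as the measure of information content. It is not the cited index itself: the function names, the toy labels, and the use of a single attribute rather than a filtered collection of GO terms are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def mutual_information(clusters, attribute):
    """I(C; A) = H(C) + H(A) - H(C, A) for two equal-length label sequences."""
    joint = list(zip(clusters, attribute))
    return entropy(clusters) + entropy(attribute) - entropy(joint)


# Hypothetical toy data: four genes, two clusters, one yes/no annotation.
clusters = [0, 0, 1, 1]

# The annotation coincides with the clusters, so it is fully informative.
aligned = [True, True, False, False]
mi_aligned = mutual_information(clusters, aligned)

# The annotation is spread evenly across clusters, so it carries no information.
unrelated = [True, False, True, False]
mi_unrelated = mutual_information(clusters, unrelated)
```

Under these assumptions, a larger value means that knowing a gene's cluster says more about its annotation; a full index would have to combine many attributes and account for genes with missing annotations.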
Naturally, the results may be quite varied.
Although UPGMA is used most often with microarray data sets (partly due to its early integration into existing software), the following algorithms are also generally considered to be solid performers in the clustering world and are freely available through various R packages.
Past evaluations of clustering algorithms have been of a general (non-biological) nature.
A good clustering algorithm should have high BHI and moderate to high BSI.
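This section does not define BHI, so the following Python sketch only illustrates one plausible homogeneity-style score: the fraction of within-cluster gene pairs that share at least one functional class, averaged over clusters. The function name, the input layout, and the toy annotations are hypothetical, and the index actually used may differ in detail.

```python
from itertools import combinations


def homogeneity_score(clusters, annotations):
    """Average, over clusters, of the share of annotated gene pairs
    that have at least one functional class in common.

    clusters:    dict mapping gene name -> cluster id
    annotations: dict mapping gene name -> set of functional classes
    Unannotated genes are skipped (an assumption of this sketch).
    """
    by_cluster = {}
    for gene, cluster_id in clusters.items():
        if annotations.get(gene):
            by_cluster.setdefault(cluster_id, []).append(gene)

    scores = []
    for genes in by_cluster.values():
        pairs = list(combinations(genes, 2))
        if not pairs:
            continue  # a cluster with fewer than two annotated genes has no pairs
        shared = sum(1 for a, b in pairs if annotations[a] & annotations[b])
        scores.append(shared / len(pairs))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy example: five genes in two clusters.
clusters = {"g1": 0, "g2": 0, "g3": 0, "g4": 1, "g5": 1}
annotations = {
    "g1": {"transport"},
    "g2": {"transport", "signaling"},
    "g3": {"signaling"},
    "g4": {"metabolism"},
    "g5": {"metabolism"},
}
score = homogeneity_score(clusters, annotations)
```

In this toy case, two of the three pairs in the first cluster share a class and the single pair in the second cluster does, so the score is the mean of 2/3 and 1. A stability-style score could be sketched in the same spirit by comparing such annotations across clusterings of slightly perturbed data.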
The entropy is taken as a measure of information content, and a filtered collection of all GO terms is used as attributes.
Another approach used an ANOVA-based test of equality of means among the cluster members to define a validation index.
We evaluated the performance of ten well-known clustering algorithms using this dual-measure approach on two gene expression data sets and identified the optimal algorithm in each case.
We use publicly available GO tools and databases to obtain the functional information in our illustrative real data examples.
For example, a good clustering algorithm should ideally produce groups with distinct, non-overlapping boundaries, although perfect separation cannot typically be achieved in practice.
Although popular statistical clustering algorithms (e.g., UPGMA) have often been reported to produce clusters of functionally similar genes successfully, it is important to make that requirement part of the evaluation strategy when selecting one from a list of competing clustering algorithms.
Some attempts in this direction have been made in recent years.
The second performance measure is called the biological stability index (BSI). For a given clustering algorithm and expression data set, it measures how consistently the algorithm produces biologically meaningful clusters when applied repeatedly to similar data sets.
While successes of such analyses have been reported in a number of microarray studies (most of which used standard hierarchical clustering, UPGMA, with one minus Pearson's correlation coefficient as the measure of dissimilarity), such groupings can often be misleading.
More importantly, a systematic evaluation of the entire set of clusters produced by such unsupervised procedures is necessary, since they also contain genes that are seemingly unrelated or may have more than one common function.