CLASS-L Archives

March 2002

CLASS-L@LISTS.SUNYSB.EDU

Subject:
From:
"Luca I.G. Toldo" <[log in to unmask]>
Reply To:
Classification, clustering, and phylogeny estimation
Date:
Mon, 11 Mar 2002 07:48:47 +0100
Dear Philip,

A) Dendrograms as trees

Here I assume that the "dendrogram" is the result of a hierarchical
clustering method.
The problem of evaluating cluster quality (or cluster validity) has not yet
been solved entirely, and in my personal opinion it is very unlikely that a
"general solution" for assessing cluster quality exists.
However, you might want to look into the work of Laura Mather (JASIS, 2000),
where she describes a linear algebra measure of cluster quality. Be aware
that her measure relies on a specific property of the document-term matrix
being clustered, so please also read my comment in JASIST (2001).
I have implemented her method in Java and published its application at a
bioinformatics conference last year (MGED3).

B) Dendrograms as ontologies

Here I assume that the "dendrogram" is a manually or automatically generated
ontology in which the links between the nodes (in other words, the edges
between vertices) may belong to different semantic types.
This is the task of "comparing ontologies". Here the matter gets even more
complicated because you have to build, for example, semantic distances,
which is itself a topic still lacking a generally valid solution. I have
found the recent work by Maedche and Staab to be instructive, and perhaps
you might want to use it as a starting point for your further explorations.

Regards

Luca Toldo
[log in to unmask]

References
Mather LA (2000) JASIS 51(7):602-613
Maedche A, Staab S (2001) http://citeseer.nj.nec.com/439130.html
Toldo L (2001) JASIST 52(7):601-602
Toldo L (2001) MGED3 http://www.dnachip.org/mged3/mged3_abstracts_v4.pdf






[log in to unmask]@LISTS.SUNYSB.EDU on 09.03.2002 15:37:12

Please reply to [log in to unmask]

Sent by:  [log in to unmask]

To:   [log in to unmask]
Cc:

Subject:    Re: question: comparing dendrograms


No, having smaller clustering levels does not mean the clusters are "better"
(if it did, then everyone would use only the single-link method). It also
does not make much sense to use any other 'internal' criteria, such as
whether you get the number of clusters you expect or how distinct the
clusters look in the dendrogram.

The criterion has to be either how well the dendrogram summarizes the
information in the original data or dissimilarity matrix or (in a
simulation) how well it reflects known relationships. One concept of
"better" is that the distance relationships among the objects as implied by
the dendrogram ("ultrametric distances" or "cophenetic values") are closely
related to the values in the dissimilarity matrix that you are clustering.
There has been some controversy about this in the past, but I still find the
cophenetic correlation (or some index in a similar spirit) to be the most
useful measure of how well a dendrogram fits a dissimilarity matrix,
especially when you want to take the actual clustering levels into account.
I suspect my response may generate some reactions from others!
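
To make the idea concrete, here is a minimal sketch of a cophenetic
correlation in plain Python. It is only an illustration under stated
assumptions: it uses average linkage rather than the Ward linkage mentioned
in the question, the five 2-D points are made up, and the function names
(cophenetic_matrix, pearson, cophenetic_correlation) are hypothetical
helpers, not part of any package.

```python
import math
from itertools import combinations


def cophenetic_matrix(D):
    """Average-linkage clustering of a symmetric dissimilarity matrix D.

    Returns a matrix whose entry [i][j] is the level at which objects
    i and j first fall into the same cluster (their cophenetic value).
    Assumption: average linkage is used here purely for illustration.
    """
    n = len(D)
    clusters = [[i] for i in range(n)]
    coph = [[0.0] * n for _ in range(n)]
    while len(clusters) > 1:
        # Find the two clusters with the smallest average dissimilarity.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(D[i][j] for i in clusters[a] for j in clusters[b])
            level = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or level < best[0]:
                best = (level, a, b)
        level, a, b = best
        # Every pair split across the two merged clusters joins at this level.
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = level
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return coph


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def cophenetic_correlation(D):
    """Correlate the original dissimilarities with the cophenetic values."""
    coph = cophenetic_matrix(D)
    pairs = list(combinations(range(len(D)), 2))
    return pearson([D[i][j] for i, j in pairs],
                   [coph[i][j] for i, j in pairs])


# Made-up example data: two tight pairs of points and one outlier.
points = [(0, 0), (0, 1), (5, 0), (5, 1), (10, 10)]
D = [[math.dist(p, q) for q in points] for p in points]
c = cophenetic_correlation(D)
print(f"cophenetic correlation: {c:.3f}")
```

A value close to 1 means the clustering levels in the dendrogram reproduce
the dissimilarities well. To compare two dendrograms built from different
data sets, one would compute this index for each against its own
dissimilarity matrix instead of comparing the raw clustering levels.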

A 'classic' reference would be the book by Sneath and Sokal but this
approach is described in other publications.

> -----Original Message-----
> From: Classification, clustering, and phylogeny estimation
> [mailto:[log in to unmask]]On Behalf Of Herriton
> Sent: Friday, March 08, 2002 10:57 AM
> To: [log in to unmask]
> Subject: question: comparing dendrograms
>
>
> I have a question on how to compare dendrograms:
>
> I have two dendrograms using two different sets of data.  They both have
> very similar clustering, say, clusters A, B, and C.  The difference lies in
> the distance between the members of each cluster.  Say, in cluster A, the
> distance between the members in one dendrogram is on average larger.
>
> I used Euclidean distance and Ward linkage.
>
> Is one dendrogram "better" than the other?  That is, if the distance between
> the members of a cluster is on average shorter in one dendrogram, does that
> mean the data used for this dendrogram cluster better?
>
> I'm not very familiar with clustering algorithms in general.  Pointers and
> comments are much appreciated.
>
> Philip
>

