## CLASS-L@LISTS.SUNYSB.EDU

Subject:

a maximum likelihood clustering problem

From:

Classification, clustering, and phylogeny estimation

Date:

Thu, 19 Jun 2008 19:11:32 -0500

```
I have what appears to be a fundamental clustering problem, but I have
not been able to find much relevant literature on it:

Given an n-by-n matrix of probabilities P, find the maximum likelihood
clustering solution represented by an n-by-n binary matrix X, where
Xij = 1 if elements i and j are assigned to the same cluster, and 0
otherwise. That is,

  maximize product{ [Pij*Xij + (1-Pij)*(1-Xij)], over all pairs (i,j): i < j}

  subject to Xij + Xik + Xjk != 2 for all triples (i,j,k): i Pjk.

Do you know of someone who has studied this problem? I am particularly
interested in fast algorithms that solve it optimally (or
near-optimally).

If not, do you know of any closely related problems (beyond traditional
clustering) that could shed some light on this?

If not, maybe you can point me to a person who is likely to know?

Vetle Torvik

PS. We have developed a procedure that seems to work well when applied
to author name disambiguation (Torvik et al., A probabilistic similarity
metric for Medline records: a model for author name disambiguation.
JASIST 2005). This paper describes a model to estimate P. To find a
clustering solution X, we first correct all transitivity violations by
weighted least squares, and then perform agglomerative clustering
(iteratively merge the pair of clusters that gives the greatest
instantaneous increase in likelihood until the likelihood hits max).

----------------------------------------------
CLASS-L list.
Instructions: http://www.classification-society.org/csna/lists.html#class-l
```
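For readers who want to experiment with the objective in the post, here is a minimal Python sketch of the greedy agglomerative step mentioned in the PS. It is not the authors' implementation. The function names (`log_likelihood`, `merge_gain`, `greedy_agglomerate`), the `EPS` clamp, and the toy matrix are all assumptions made for illustration. The weighted-least-squares correction of transitivity violations is left out, and a greedy merge is not guaranteed to reach the optimum. Because clusters are stored as a partition, the implied X satisfies the transitivity constraint automatically.

```python
import math
from itertools import combinations

EPS = 1e-12  # assumption: clamp probabilities so log(0) never occurs


def _clamp(p):
    return min(max(p, EPS), 1.0 - EPS)


def log_likelihood(P, clusters):
    """Log of the product over i < j of Pij if i, j share a cluster, else 1 - Pij."""
    label = {i: c for c, members in enumerate(clusters) for i in members}
    total = 0.0
    for i, j in combinations(range(len(P)), 2):
        p = _clamp(P[i][j])
        total += math.log(p if label[i] == label[j] else 1.0 - p)
    return total


def merge_gain(P, a, b):
    """Change in log-likelihood from merging clusters a and b.

    Each cross pair (i in a, j in b) switches from 1 - Pij to Pij,
    so the gain is the sum of the log-odds of those pairs.
    """
    gain = 0.0
    for i in a:
        for j in b:
            p = _clamp(P[i][j])
            gain += math.log(p) - math.log(1.0 - p)
    return gain


def greedy_agglomerate(P):
    """Start from singletons; merge the best pair until no merge helps."""
    clusters = [[i] for i in range(len(P))]
    while len(clusters) > 1:
        gain, a, b = max(
            (merge_gain(P, clusters[a], clusters[b]), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        if gain <= 0:  # no merge increases the likelihood
            break
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters


# Hypothetical toy input: elements 0,1 and 2,3 are likely pairs.
P_example = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
print(greedy_agglomerate(P_example))  # [[0, 1], [2, 3]]
```

Working with log-odds turns the product into a sum, so each candidate merge can be scored from the cross pairs alone rather than by recomputing the full likelihood.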