CLASS-L Archives

June 2005

CLASS-L@LISTS.SUNYSB.EDU

Options: Use Monospaced Font
Show Text Part by Default
Show All Mail Headers

Message: [<< First] [< Prev] [Next >] [Last >>]
Topic: [<< First] [< Prev] [Next >] [Last >>]
Author: [<< First] [< Prev] [Next >] [Last >>]

Print Reply
Subject:
From:
Gilles CARAUX <[log in to unmask]>
Reply To:
Classification, clustering, and phylogeny estimation
Date:
Thu, 30 Jun 2005 15:43:34 +0200
Content-Type:
text/plain
Parts/Attachments:
text/plain (128 lines)
Your problem seems to be formulated as the maximum weight bipartite matching and the assignment problem.
You are looking to assign n rows to n columns at the greatest total cost.
This problem can be solved by the popular Hungarian-type O(n3) algorithm.
Many references are available one the net.
Ref : H. Kuhn. The Hungarian method for the assignment problem. Naval Res. Logist. Q., 2:83--97, 1955.

Gilles

Pr. Gilles Caraux
Agro Montpellier / LIRMM
France

Le 20:17 29/06/2005, vous avez écrit:
>Let me restate the problem -- I am thinking there is an easier problem
>then what I have been receiving -- though thank you for those.
>
>We have two dendrograms on the same objects and want to find stopping
>rules to cut each such thazt the overlap in the partitions is maxium. My
>thought (which may make this a more complicated problem) was to generate a
>contingency table of the two partitions and reorder the table to put the
>largest intersection counts on the diagonal.
>
>
>Is there a simpler rule to identify which clusters from the two tree have
>maximum overlap?
>
>Thanks
>
>
>Bill
>---
>
>                        Biostatistics Consulting Center
>                http://ilya.wustl.edu/~shannon/bcc_announcement.pdf
>
>
>William D. Shannon, Ph.D.
>
>Associate Professor of Biostatistics in Medicine
>Division of General Medical Sciences and Biostatistics
>
>Washington University School of Medicine
>Campus Box 8005, 660 S. Euclid
>St. Louis, MO   63110
>
>Phone: 314-454-8356
>Fax: 314-454-5113
>e-mail: [log in to unmask]
>web page: http://ilya.wustl.edu/~shannon
>
>
>On Wed, 29 Jun 2005, Michael Brusco wrote:
>
>> This is a particular type of seriation problem for which optimal solutions
>> can typically be obtained for rather large problems.  Depending on computer
>> RAM, Dynamic programming can provide an optimal solution for problems up to
>> about 25x25 (512MB or RAM).  Branch-and-bound can provide guaranteed
>> optimal solutions for even larger problems and is not encumbered by memory
>> limitations.  However, the branch-and-bound approach is more sensitive to
>> the data entries in the matrix and can require more CPU time.  I've also
>> been able to solve some rather large problems using integer
>> programming.  There are a number of references that focus on seriation
>> problems related to the dominance index you've mentioned, anti-Robinson
>> gradient indices, and least-squares unidimensional scaling.  All of these
>> are amenable to optimal solution via branch-and-bound or dynamic
>> programming.  I can send you some fortran programs if you like.
>>
>> Citations:
>> Hubert, Arabie, & Meulman (2001),  Combinatorial Data Analysis:
>> Optimization by Dynamic Programming, SIAM.
>> Brusco & Stahl (in press), Branch-and-Bound Applications in Combinatorial
>> Data Analysis, Springer.
>> Hubert & Golledge (1981), Psychometrika - dynamic programming
>> Brusco & Stahl (2001), Psychometrika, pp. 5-24 - dynamic programming
>> Brusco (2002) Psychometrika, pp. 459-471 - branch and bound (anti-Robinson
>> gradients)
>>
>> Mike
>>
>> At 06:53 PM 6/29/2005 +0200, you wrote:
>> >If your matrix can be reorganize as an Robinson matrix, you have a lot of
>> >good algorithms as I described in the manual of PermutMatrix, a software
>> >for seriation and clustering (http://www.lirmm.fr/~caraux/PermutMatrix/).
>> >I also give some references.
>> >For the general case, I think you have to perform every reordering if you
>> >want an exact solution.
>> >
>> >Gilles
>> >
>> >
>> >Le 18:18 29/06/2005, vous avez écrit:
>> > >Mighty mathematicians we have a question.
>> > >
>> > >We have small matrices of integers that we need to reorder so that the sum
>> > >of the diagonal is maximum.
>> > >
>> > >These matrices are relatively small (say 10x10 max) so we can brute force
>> > >our way by performing every reordering, but would rather not.
>> > >
>> > >Can anyone help we biostatisticians with an algorithm or theorem?
>> > >
>> > >
>> > >Bill
>> > >---
>> > >
>> > >                        Biostatistics Consulting Center
>> > >                http://ilya.wustl.edu/~shannon/bcc_announcement.pdf
>> > >
>> > >
>> > >William D. Shannon, Ph.D.
>> > >
>> > >Associate Professor of Biostatistics in Medicine
>> > >Division of General Medical Sciences and Biostatistics
>> > >
>> > >Washington University School of Medicine
>> > >Campus Box 8005, 660 S. Euclid
>> > >St. Louis, MO   63110
>> > >
>> > >Phone: 314-454-8356
>> > >Fax: 314-454-5113
>> > >e-mail: [log in to unmask]
>> > >web page: http://ilya.wustl.edu/~shannon
>>
>> Michael J. Brusco, Marketing Department, College of Business, Florida State
>> University, Tallahassee, FL 32306-1110.  Voice: (850)644-6512, FAX:
>> (850)644-4098, email: [log in to unmask]
>>

ATOM RSS1 RSS2