Hi Jim

I have not received any response.

The goal is to cluster the sibpairs and apply advanced genetic linkage
programs to the sibpairs within a cluster. The assumptions are as follows:

1. There are multiple genetic models in the population. If treated as a
   single homogeneous population we lose power to detect the genetic effect
   due to the mixture of heterogeneous groups.

2. The covariates are distributed differently across the genetic subgroups.
   Therefore clustering on the covariates will result in homogeneous subsets
   and the within-cluster genetic models will have more power to detect the
   genetic effect.

It is essential in any partitioning that both siblings of a pair fall in the
same subset, since the genetic linkage models are fit to sibpair data.

We are beginning to discuss this problem in my group and one suggestion I
made is based on calculating convex hulls. For example, age measured on the
two siblings can be represented by two points ordered as
(min(age), max(age)) and (max(age), min(age)). By representing each
sibpair's data as the two points we can fit a convex hull to the sibpair.
Similarly we can fit a convex hull around all the data using the two points
for each pair of siblings. We might be able to measure distance as a
function of the two areas covered by the convex hulls.

We hope to have some ideas developed for CSNA.

Bill

---

William D. Shannon, Ph.D.

Assistant Professor of Biostatistics in Medicine
Division of General Medical Sciences and Biostatistics

Washington University School of Medicine
Campus Box 8005, 660 S. Euclid
St. Louis, MO 63110

Phone: 314-454-8356
Fax: 314-454-5113
e-mail: [log in to unmask]
web page: http://ilya.wustl.edu/~shannon

On Tue, 11 Mar 2003, F. James Rohlf wrote:

> Have you received any responses yet?
>
> What are the distances based on? Genetic data or the height and weights you
> mention? The answer to your question must depend on what you wish to do with
> these distances.
> Do you want to cluster the entire matrix or just compute an
> average distance between sibs vs. between families?
>
> Jim
>
> > -----Original Message-----
> > From: Classification, clustering, and phylogeny estimation
> > [mailto:[log in to unmask]]On Behalf Of shannon
> > Sent: Saturday, March 01, 2003 4:09 PM
> > To: [log in to unmask]
> > Subject: Distance measure
> >
> >
> > I have a dataset for which I do not know how to calculate pairwise
> > distances.
> >
> > In sibpair linkage analysis a unit of analysis is the sibpair (2 brothers,
> > 2 sisters, or a brother and sister). The covariate vector contains
> > information on each of the pair (e.g., two ages, two heights, two
> > weights). There is no way to order these covariates.
> >
> > Consider the two ages on two sibpairs. Let age_ij be the age for the i^th
> > sibling in the j^th sibpair. The data matrix could be in any of the
> > following orders:
> >
> >      Sibpair   Age1     Age2
> >         1      age_11   age_21
> >         2      age_12   age_22
> >
> > or      1      age_21   age_11
> >         2      age_12   age_22
> >
> > or      1      age_11   age_21
> >         2      age_22   age_12
> >
> > or      1      age_21   age_11
> >         2      age_22   age_12
> >
> > We want to calculate the distance between the sibpairs since these are the
> > units of analysis.
> >
> > We can arbitrarily invoke a rule (e.g., youngest is always the first age)
> > and use standard distance measures -- but these are ad hoc and the genetic
> > linkage people are unsatisfied with this (though it is done routinely).
> >
> > This is analogous to the difference between correlation and intraclass
> > correlation.
> >
> > Any suggestions?
> >
> > Bill
> > ---
> >
> > William D. Shannon, Ph.D.
> >
> > Assistant Professor of Biostatistics in Medicine
> > Division of General Medical Sciences and Biostatistics
> >
> > Washington University School of Medicine
> > Campus Box 8005, 660 S. Euclid
> > St. Louis, MO 63110
> >
> > Phone: 314-454-8356
> > Fax: 314-454-5113
> > e-mail: [log in to unmask]
> > web page: http://ilya.wustl.edu/~shannon
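
The ordering problem described in the original message can be illustrated with a small sketch. This is not the convex hull approach proposed in the thread; it is one simple order-invariant alternative, shown only to make the requirement concrete: the distance between two sibpairs is taken as the smaller of the two possible sibling-to-sibling matchings, so swapping the siblings within either pair cannot change the result. The function names and the example data (age, height in cm) are hypothetical.

```python
import math


def sibling_sq_dist(x, y):
    """Squared Euclidean distance between two siblings' covariate vectors."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def sibpair_distance(pair_a, pair_b):
    """Distance between two sibpairs that ignores within-pair ordering.

    Each pair is a tuple of two covariate vectors, one per sibling.
    Both ways of matching siblings across the pairs are evaluated and
    the cheaper matching is used (an assumption of this sketch, not a
    method taken from the thread).
    """
    a1, a2 = pair_a
    b1, b2 = pair_b
    straight = sibling_sq_dist(a1, b1) + sibling_sq_dist(a2, b2)
    crossed = sibling_sq_dist(a1, b2) + sibling_sq_dist(a2, b1)
    return math.sqrt(min(straight, crossed))


# Hypothetical data: each sibling is (age in years, height in cm).
pair_1 = ((34.0, 170.0), (29.0, 182.0))
pair_2 = ((31.0, 180.0), (36.0, 168.0))

d = sibpair_distance(pair_1, pair_2)
# Same pair with its siblings listed in the opposite order.
d_swapped = sibpair_distance((pair_1[1], pair_1[0]), pair_2)
```

With a single covariate this matching reduces to the "youngest first" rule, because sorting both pairs already gives the cheaper matching. With several covariates it differs from sorting each covariate separately, since each sibling's covariates are kept together.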