I consider this a clustering problem (aren't all problems clustering problems?).  I have been looking for a solution but have found nothing more sophisticated than pairwise t-tests, which are less than optimal.

The problem we are attacking is the following.  In cancer epidemiology, survival curves are estimated for different strata (i.e., different curves for different tumor types by tumor grade by gender by age category, etc.).  Rob Culverhouse and I have been publishing work on non-linear modeling in genetics and want to apply it to the analysis of this type of cancer survival data.

We are starting with lung cancer data (several tens of thousands of records) with survival time/censoring as well as four tumor types (e.g., adenocarcinoma, small cell) and five tumor grades (grades I - IV and unknown), giving us a 4 x 5 table.  Within each cell is a survival curve.
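
To make "a survival curve within each cell" concrete, here is a minimal sketch of a Kaplan-Meier (product-limit) estimate for a single cell. I am assuming Kaplan-Meier is the estimator in use, which may not match our actual pipeline, and the toy data and the name kaplan_meier are purely illustrative.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed death time.

    times  : follow-up time for each record
    events : 1 if death was observed, 0 if the record was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # everyone leaving the risk set at time t
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data for one (tumor type, grade) cell.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

In the real data there would be one such curve for each of the 20 cells.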

We would like to collapse these 20 cells into a smaller number such that cells collapsed together have equal survival functions.
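
The crude pairwise approach I can come up with looks roughly like the sketch below: compute a two-sample log-rank chi-square for every pair of groups, merge the most similar pair, and repeat until the closest pair differs by more than a cutoff. This is only a heuristic under my own assumptions, not an established method. The function names and toy data are hypothetical, the 3.84 cutoff is just the nominal 5% point of a chi-square with 1 df, and nothing here corrects for the many comparisons being made, which is exactly the weakness I would like to get around.

```python
def logrank_chi2(a, b):
    """Two-sample log-rank chi-square (1 df); a and b are lists of (time, event)."""
    event_times = sorted({t for t, e in a + b if e})
    obs_minus_exp = 0.0
    var = 0.0
    for t in event_times:
        n1 = sum(1 for u, _ in a if u >= t)        # at risk in group a
        n2 = sum(1 for u, _ in b if u >= t)        # at risk in group b
        d1 = sum(1 for u, e in a if u == t and e)  # deaths in group a at t
        d2 = sum(1 for u, e in b if u == t and e)  # deaths in group b at t
        n, d = n1 + n2, d1 + d2
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 given the margins at time t.
            var += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / var if var > 0 else 0.0


def collapse_cells(cells, threshold=3.84):
    """Greedily merge cells; cells maps a label to a list of (time, event)."""
    groups = [([label], list(data)) for label, data in cells.items()]
    while len(groups) > 1:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                stat = logrank_chi2(groups[i][1], groups[j][1])
                if best is None or stat < best[0]:
                    best = (stat, i, j)
        stat, i, j = best
        if stat > threshold:
            break  # even the closest pair looks different; stop merging
        merged = (groups[i][0] + groups[j][0], groups[i][1] + groups[j][1])
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return groups


# Hypothetical toy cells: A and B share a survival pattern, C dies much earlier.
cells = {
    "A": [(t, 1) for t in range(11, 19)],
    "B": [(t, 1) for t in range(11, 19)],
    "C": [(t, 1) for t in range(1, 9)],
}
groups = collapse_cells(cells)
for labels, data in groups:
    print(labels, len(data))
```

The loops are written for readability rather than speed, so this would need to be reworked before running on tens of thousands of records.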

Ideally, I would like a method analogous to post-hoc multiple comparisons in ANOVA or the G^2 statistic (?) in log-linear modeling.
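
For reference, the G^2 I have in mind is the likelihood-ratio (deviance) statistic for a table with observed counts O_i and fitted counts E_i. The second line states the standard result for comparing nested log-linear models; the symbols M_0, M_1, p_0, and p_1 are my own notation.

```latex
G^2 = 2 \sum_i O_i \ln\!\left(\frac{O_i}{E_i}\right)

% For nested models M_0 \subset M_1 with p_0 < p_1 free parameters,
% under M_0 the difference is asymptotically chi-square:
G^2(M_0) - G^2(M_1) \;\sim\; \chi^2_{\,p_1 - p_0}
```

What I am after is something that plays the same role for survival functions: a way to test whether a collapsed table fits as well as the full 4 x 5 table.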

Any hints would be appreciated.


CLASS-L list.