CLASS-L Archives

March 2006


Mel Janowitz <[log in to unmask]>
Reply To:
Classification, clustering, and phylogeny estimation
Fri, 3 Mar 2006 10:23:34 -0500

DIMACS Computational and Mathematical Epidemiology
                Seminar Series Presents


Title: Using cluster analysis to determine the influence of
epidemiological features on medical status of lung cancer patients

Speaker: Dmitriy Fradkin

Date: Monday, March 13, 2006 12:00-1:30

Location: DIMACS Center, CoRE Bldg, Room 431,
          Rutgers University, Busch Campus, Piscataway, NJ


In this work we analyze lung cancer data, obtained from SEER,
for 217,558 patients diagnosed in 1988-2000. Each patient is
characterized by 23 epidemiological (essentially demographic)
and 22 medical features. The main idea of this analysis is to
cluster the data in the space of epidemiological features only,
and then to analyze the influence of the epidemiological
classification on the medical status of patients. The influence is
estimated by using the t-test to determine differences in the
distributions of medical features between clusters.
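
As a rough illustration of the per-feature comparison described above, the
sketch below tests whether one medical feature differs between two clusters.
It is not the speaker's code: the choice of Welch's unequal-variance form, the
normal approximation to the t distribution (reasonable only for large
clusters), the 0.05 threshold, and the sample values are all assumptions made
for the example.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def differs(a, b, alpha=0.05):
    """Two-sided test of equal means.

    Assumption: the p-value uses a normal approximation to the t
    distribution, which is adequate only when both samples are large.
    """
    t = welch_t(a, b)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return p_value < alpha


# Hypothetical values of a single medical feature in two clusters.
cluster_a = [2.1, 2.4, 1.9, 2.2, 2.0, 2.3]
cluster_b = [3.0, 3.2, 2.9, 3.1, 3.3, 2.8]

feature_differs = differs(cluster_a, cluster_b)
```

Running the same test over every medical feature for a given pair of clusters
would yield the set of features that distinguish that pair.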

We partitioned the epidemiological part of the data into 20 clusters.
Out of 190 cluster pairs, there are 2 pairs with only 1 distinguishing
medical feature and 4 pairs with 2 distinguishing features. All other
pairs differ in at least 3 medical features. We also found some
medical features that are not different in any pair of clusters,
and some that take distinct values in many clusters.
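
The pair counts above follow from simple bookkeeping: 20 clusters give
20 * 19 / 2 = 190 unordered pairs, and each pair has some set of medical
features on which it differs. The sketch below shows one way to tabulate such
results. The function name, the feature names, and the toy input are
hypothetical and do not come from the study.

```python
from collections import Counter
from itertools import combinations


def summarize(differing, all_features):
    """Summarize pairwise comparison results.

    differing maps a cluster pair (i, j) to the set of medical features
    found to differ between clusters i and j.
    Returns a histogram {number of distinguishing features: number of pairs}
    and the set of features that differ in no pair at all.
    """
    histogram = Counter(len(features) for features in differing.values())
    ever_differs = set().union(*differing.values())
    never_differs = set(all_features) - ever_differs
    return histogram, never_differs


# All unordered pairs among 20 clusters.
pairs_20 = list(combinations(range(20), 2))

# Toy input with three clusters and made-up feature names.
toy = {
    (0, 1): {"stage"},
    (0, 2): {"stage", "grade"},
    (1, 2): {"stage", "grade", "size"},
}
features = ["stage", "grade", "size", "laterality"]
histogram, never_differs = summarize(toy, features)
```

In the toy input, one pair has a single distinguishing feature, and
"laterality" is a feature that does not differ in any pair.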

Such analysis indicates which medical aspects are most affected
by epidemiological status. It also aids in finding epidemiological
subpopulations (clusters) that are very different from others in
their medical characterization.

This is joint work with Dona Schneider and Ilya Muchnik.

see: DIMACS Computational and Mathematical Epidemiology Seminar Series
2005 - 2006