Adding to this: part of the problem is that how much information there is
for deciding on a distance depends on the pair of sequences.
AC1 and BD1 look very different but are still compatible.
ABC1 and ABCD1 intersect on A and B but contradict on C. There is no
objective, data-driven way to decide which of these pairs should have the
higher distance. Hopefully this can be decided by subject matter
knowledge; if not, the data analyst has to make a bold "random"
decision (or use a formula that makes the decision for him so that he
can pretend objectivity).
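
To make the two examples concrete, here is a minimal Python sketch (the
function names outcomes and overlap are made up for illustration). It uses
the nomenclature from the quoted message, where a trailing 1 marks the last
drug as the one that worked. It only lists the drugs on which two sequences
agree or conflict; it does not decide which pair should be farther apart.

```python
def outcomes(seq):
    """Map each drug to 0 (tried, failed) or 1 (worked).

    A trailing '1' marks the last drug in the string as effective.
    """
    if seq.endswith("1"):
        drugs = seq[:-1]
        result = {d: 0 for d in drugs[:-1]}
        result[drugs[-1]] = 1
        return result
    return {d: 0 for d in seq}


def overlap(a, b):
    """Return (drugs with the same outcome, drugs with conflicting outcomes)."""
    pa, pb = outcomes(a), outcomes(b)
    shared = set(pa) & set(pb)
    agree = {d for d in shared if pa[d] == pb[d]}
    conflict = shared - agree
    return agree, conflict


print(overlap("AC1", "BD1"))     # no shared drugs: nothing to agree or conflict on
print(overlap("ABC1", "ABCD1"))  # agree on A and B, conflict on C
```

The first pair gives no evidence either way, the second gives evidence in
both directions, which is exactly why the comparison is hard.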
Christian
On Thu, 24 Sep 2009, Christian Hennig wrote:
> Hi there,
>
> I would probably approach it like this.
>
> Nomenclature: "ACB1" means that AC were tried and didn't work and B then
> worked. So "CABD" means "all tried in the given order, none worked".
>
> Zero distance: two identical situations (same drugs were tried, the same one
> worked)
> (d(AB1,AB1), but also d(CBDA,CDAB); order of what was tried out and failed
> should generally be ignored... is this appropriate?)
>
> Small distance: same one worked. (Among these, distance should be smaller if
> there are also intersections in terms of "drugs tried and did not work")
> (d(ABC1,AC1) should be smaller than d(ABC1,C1) or d(ABC1,DC1), but probably
> not much smaller.)
>
> Intermediate: not the same drug worked (or in one case one worked and in the
> other one none of them), but there is intersection among the drugs that were
> tried and did not work.
> Distance should be increased if there is a drug that worked in one case but
> was tried and didn't work in the other (creating an incompatibility of
> sequences in the same patient).
> (d(ABD1,ABC1)<d(AD1,ABC1)<d(AD1,ADC1))
>
> Large distance: different drugs worked (or none at all in one of the two
> cases), and there is no intersection, but it is still possible to put all of
> them together in a compatible way in a single patient:
> (d(BA1,CD1); actually this may be assessed to be smaller than the
> incompatible d(AD1,ADC1) from above, which has an intersection.)
>
> Maximum distance: no intersection and incompatible.
> (d(ABCD,B1), d(AB1,BD1))
>
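
As an illustration only, the ordering above could be sketched in Python as
follows. The function names and category labels are made up for this sketch.
It returns an ordinal label rather than a number because the scaling is left
open. How it treats two sequences in which no drug worked but different drugs
were tried is an assumption, since the description does not cover that case.

```python
def parse(seq):
    """Return (set of failed drugs, drug that worked or None) for e.g. 'ACB1'."""
    if seq.endswith("1"):
        return frozenset(seq[:-2]), seq[-2]
    return frozenset(seq), None


def category(a, b):
    """Assign a pair of sequences to one of the ordinal distance categories."""
    fa, wa = parse(a)
    fb, wb = parse(b)
    # Incompatible: a drug worked in one sequence but failed in the other.
    incompatible = wa in fb or wb in fa
    if (fa, wa) == (fb, wb):
        return "zero"  # same drugs tried, same outcome; order ignored
    if wa is not None and wa == wb:
        return "small"  # the same drug worked
    if fa & fb:
        # Some failed drugs are shared; incompatibility increases the distance.
        return "intermediate, incompatible" if incompatible else "intermediate"
    return "maximum" if incompatible else "large"


for pair in [("CBDA", "CDAB"), ("ABC1", "AC1"), ("ABD1", "ABC1"),
             ("AD1", "ADC1"), ("BA1", "CD1"), ("AB1", "BD1")]:
    print(pair, category(*pair))
```

Whether "large" should rank above or below "intermediate, incompatible" is
left open here, as it is in the description.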
> Of course, if this is accepted, a precise scaling is still needed (though if
> the methods used afterwards are invariant under monotone transformations,
> this probably doesn't matter too much.)
> I think that this summarises pretty much all the decisions that have to be
> made, and if possible subject matter knowledge and expert assessment should
> be used to make them.
>
> Just my two cents,
> Christian
>
> On Wed, 23 Sep 2009, Shannon, William wrote:
>
>> A follow-up based on some questions I got from members of the list.
>>
>> The data will be a list of distinct 0's and 1's and missing values.
>> Suppose patient 1 received drug A with no effect and then drug B which was
>> effective; their data would be (0 1 Missing Missing). Patient 2 receives
>> drugs C and D with no effect but A works, and B is never given; their
>> data would be (1 Missing 0 0). Etc.
>>
>> Assume the columns (entries) of the vectors correspond to drugs A, B, C, D,
>> where the entry is 0 if not effective, 1 if effective, and missing if not
>> given. Assume also that the order in which drugs are given is random.
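
A small Python sketch of this encoding, for illustration only: the function
name encode and the use of None for "missing" are assumptions, not part of
the original data description.

```python
DRUGS = "ABCD"  # column order as described in the message


def encode(failed, worked=None, drugs=DRUGS):
    """Return a tuple with 0 = not effective, 1 = effective, None = not given."""
    return tuple(
        1 if d == worked else 0 if d in failed else None
        for d in drugs
    )


patient1 = encode(failed="A", worked="B")   # A failed, then B worked
patient2 = encode(failed="CD", worked="A")  # C and D failed, A worked
print(patient1)  # (0, 1, None, None)
print(patient2)  # (1, None, 0, 0)
```

The order in which the drugs were tried is deliberately not stored, in line
with the assumption that it is random.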
>>
>> It may be that the order and number of ineffective drugs given should be
>> ignored and the distance based only on whether two patients respond to the
>> same drug or to different drugs.
>>
>> Thank you
>>
>> Bill Shannon, PhD
>> Associate Prof. of Biostatistics in Medicine
>> Washington University School of Medicine
>> Director, Biostatistical Consulting Center
>> 3144548356
>> ________________________________________
>> From: Shannon, William
>> Sent: Wednesday, September 23, 2009 11:44 AM
>> To: class l list ([log in to unmask])
>> Cc: Shannon, William; Farrokh Alemi
>> Subject: looking for a distance measure
>>
>> Hi Everyone
>>
>> I may be working with a data set that has the following structure and will
>> need to develop a distance measure. I have not had time to think carefully
>> about it but am hoping someone might have already worked with data like
>> this.
>>
>> Patients present to the doctor with a disease and it is unknown which of
>> four drugs they will respond to (the goal of this project is to improve the
>> ability to predict and be able to give the correct drug first). MDs treat
>> these patients empirically: give them drug A and see if they respond, if
>> not give them drug B and see if they respond, etc.
>>
>> We assume a patient either responds or does not, and that there is no carry
>> over or order of drug effect (i.e., if you respond to drug B it is
>> irrelevant if you had already had drug A). I also assume there is no set
>> order on which drugs are given first.
>>
>> The data for each patient will be a vector of 0's for non-response and a 1
>> for response, with the number of 0's dependent on how many drugs were given
>> empirically before a response occurred.
>>
>> How do we calculate a pairwise distance matrix between patients with this
>> data?
>>
>>
>> Thank you.
>>
>> Bill Shannon, PhD
>> Associate Professor of Biostatistics in Medicine
>> Washington University School of Medicine
>> St. Louis, MO
>>
>> 3144548356
>> [log in to unmask]
>>
>> 
>> CLASSL list.
>> Instructions: http://www.classificationsociety.org/csna/lists.html#classl
>>
>
> ***  ***
> Christian Hennig
> University College London, Department of Statistical Science
> Gower St., London WC1E 6BT, phone +44 207 679 1698
> [log in to unmask], www.homepages.ucl.ac.uk/~ucakche
>
***  ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
[log in to unmask], www.homepages.ucl.ac.uk/~ucakche

