## CLASS-L@LISTS.SUNYSB.EDU


 Subject: Re: Score Matrix
 From: "J. Douglas Carroll"
 Reply To: Classification, clustering, and phylogeny estimation
 Date: Wed, 2 May 2001 14:03:19 -0400
```I'm not familiar with the particular BLOSUM62 score matrix you are using, but
at least one way to normalize your matrix to meet desired constraints (often
used for "substitution data" in psychology, e.g., in studies of word
substitution) is to define a new similarity measure s*_ij as:

s*_ij = s_ij / [(s_ii)(s_jj)]**(1/2)

I am assuming here that your original similarity measure is such that
the off-diagonals are all smaller than the diagonals, at least within
rows and/or columns.  Again, without knowing more details of your original
measure, it's hard to say which normalization would be most appropriate!
You didn't even say, for example, whether the original measure was
symmetric or nonsymmetric, which is a very important factor.  If NOT
symmetric, you could define the measure above and then symmetrize it by
taking the geometric mean of s*_ij and s*_ji,
which is also a measure often used in psychological or linguistic studies
of confusion data.

Doug Carroll.

At 12:29 PM 05/02/2001 -0500, William Shannon wrote:
>Hi experts,
>
>We have the following problem in scoring multiple aligned proteins and
> generating a similarity matrix.
>
>For i = 1,2,...,20 amino acids, a substitution of amino acid i -> j results in a
> similarity s_ij. Note that s_ii >> 0 for most amino acids (i.e., amino acid i
> has similarity to itself of s_ii >> 0).
>
>We can generate a similarity 'score' matrix for a set of N proteins but the
> diagonals >> 1. We would like to scale this matrix so each diagonal is 1, and
> each off-diagonal element is 0 <= s_ij <= 1.
>
>Thanks for any suggestions.
>Bill
>
>PS -- for protein alignment experts we are using the BLOSUM62 score matrix and
> working with already multiple aligned proteins.
>
>--
>
>William D. Shannon, Ph.D.
>
>Assistant Professor of Biostatistics in Medicine
>Division of General Medical Sciences and Biostatistics
>
>Washington University School of Medicine
>Campus Box 8005, 660 S. Euclid
>St. Louis, MO   63110
>
>Phone: 314-454-8356
>Fax: 314-454-5113
>web page: http://ilya.wustl.edu/~shannon
>
>

######################################################################
# J. Douglas Carroll, Board of Governors Professor of Management and #
#Psychology, Rutgers University, Graduate School of Management,      #
#Marketing Dept., MEC125, 111 Washington Street, Newark, New Jersey  #
#07102-3027.  Tel.: (973) 353-5814, Fax: (973) 353-5376.             #
# Home: 14 Forest Drive, Warren, New Jersey 07059-5802.              #
# Home Phone: (908) 753-6441 or 753-1620, Home Fax: (908) 757-1086.  #