Hi,

in the original BLOSUM62 score matrix the identities of the amino acids are weighted differently (I suppose this comes from their different distribution in nature). In the problem you described you want to scale all identities to 1, and you will lose this information, right?

What are the priorities in this transformation? Do you want to get metric properties? Or do you want to preserve the different distribution of the amino acids?

Can you give us a few words about why you want to transform this matrix and how you want to use it?

Uta Bohnebeck

***************************************************************
Uta Bohnebeck
Universität Bremen
TZI, IS / AG-KI
Universitätsallee 21-23
Postfach 330 440
D-28334 Bremen
Tel: +49-421-218-7838/ -7090
Fax: +49-421-218-7196
---------------------------------------------------------------
http://www.informatik.uni-bremen.de/~bohnebec/home.html
---------------------------------------------------------------

-----Original Message-----
From: Classification, clustering, and phylogeny estimation, on behalf of William Shannon
Sent: Wednesday, 2 May 2001 19:40
Subject: More protein stuff

As a follow-up to my previous email, the first 5 rows and columns of the score matrix are:

> blosum62[1:5,1:5]
   A  R  N  D  C
A  4 -1 -2 -2  0
R -1  5  0 -2 -3
N -2  0  6  1 -3
D -2 -2  1  6 -3
C  0 -3 -3 -3  9

Comparing sequence ARN to ADC gives the similarity score:

(s_AA = 4) + (s_RD = -2) + (s_NC = -3) = -1

and ARN to itself:

(s_AA = 4) + (s_RR = 5) + (s_NN = 6) = 15

and ADC to itself:

(s_AA = 4) + (s_DD = 6) + (s_CC = 9) = 19

so the similarity matrix is

      ARN  ADC
ARN    15   -1
ADC    -1   19

--
William D. Shannon, Ph.D.
Assistant Professor of Biostatistics in Medicine
Division of General Medical Sciences and Biostatistics
Washington University School of Medicine
Campus Box 8005, 660 S. Euclid
St. Louis, MO 63110
Phone: 314-454-8356
Fax: 314-454-5113
web page: http://ilya.wustl.edu/~shannon
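
The arithmetic in the quoted message can be checked with a short script. This is only a minimal sketch, written in Python rather than the R/S session shown above. The names `BLOSUM62_SUB`, `pair_score` and `similarity_matrix` are made up for illustration, and the code assumes ungapped sequences of equal length over the five residues listed (A, R, N, D, C).

```python
# First 5x5 block of BLOSUM62, copied from the quoted message.
LETTERS = "ARNDC"
ROWS = [
    [4, -1, -2, -2, 0],
    [-1, 5, 0, -2, -3],
    [-2, 0, 6, 1, -3],
    [-2, -2, 1, 6, -3],
    [0, -3, -3, -3, 9],
]

# Lookup table keyed by residue pair, e.g. ("R", "D") -> -2.
BLOSUM62_SUB = {
    (a, b): ROWS[i][j]
    for i, a in enumerate(LETTERS)
    for j, b in enumerate(LETTERS)
}


def pair_score(seq1, seq2, scores=BLOSUM62_SUB):
    """Sum the position-wise scores of two equal-length, ungapped sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(scores[(a, b)] for a, b in zip(seq1, seq2))


def similarity_matrix(seqs, scores=BLOSUM62_SUB):
    """All-against-all similarity scores, in the order the sequences are given."""
    return [[pair_score(s, t, scores) for t in seqs] for s in seqs]


print(similarity_matrix(["ARN", "ADC"]))  # [[15, -1], [-1, 19]]
```

The two diagonal entries differ (15 for ARN, 19 for ADC) because the identity scores in the matrix differ per residue; this is the information that would be lost if all identities were scaled to 1.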