CLASS-L Archives

April 2007


"Classification, clustering, and phylogeny estimation"
Hillel Coren
Mon, 30 Apr 2007 14:43:19 -0400
Let me start by saying thank you for taking the time to help; it is very
much appreciated.

>It appears that you want to predict a continuous variable rather than a
>nominal level one so I don't see it as a classification problem.

Ideally yes, but I'd be happy to use some sort of discretization process to
convert the sales conversion to a class (i.e., low, medium, or high).
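
A minimal sketch of one such discretization, assuming equal-frequency
(tercile) bins. The function name and the sample rates are hypothetical and
only illustrate the idea; other binning schemes would work as well.

```python
def tercile_labels(rates):
    """Label each rate low/medium/high using equal-frequency cut points."""
    ordered = sorted(rates)
    n = len(ordered)
    # Cut points at roughly the 1/3 and 2/3 positions of the sorted rates.
    low_cut = ordered[n // 3]
    high_cut = ordered[(2 * n) // 3]
    labels = []
    for rate in rates:
        if rate < low_cut:
            labels.append("low")
        elif rate < high_cut:
            labels.append("medium")
        else:
            labels.append("high")
    return labels


# Made-up conversion rates, one per article.
sample_rates = [0.02, 0.10, 0.35, 0.05, 0.50, 0.20]
labels = tercile_labels(sample_rates)
```

With many tied rates (for example, lots of articles with zero sales) the
three classes can end up unevenly sized, so the cut points may need manual
adjustment.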

>Do you have a limited set of words that applies to every case? "top" 
>"10" . . . ? with values yes and no or yes/no/does not apply?

The set of words would continue to grow as more articles are analyzed.

>How many cases (entities, records, lines) do you have in your data set?

About 50,000

>How many variables (attribute fields, columns) do you want to use as
>predictors? Do you have the one nominal level predictor (publication)
>and 6 dichotomous predictors only?

Based on my research, it seemed like converting each word into an attribute
was the way to go (akin to a customer's shopping cart, which holds only a
couple of the store's many products).
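
The shopping-cart analogy can be sketched as a sparse encoding: each article
stores only the ids of the words it contains, and the vocabulary grows as new
words appear. This is an illustration under assumptions, not the actual
pipeline; the class name, the tokenizer, and the sample titles are
hypothetical.

```python
PUNCTUATION = ".,;:!?'\"()"


def tokenize(text):
    """Lowercase the text and split on whitespace, trimming punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


class GrowingVocabulary:
    """Maps words to integer attribute ids, adding new words on the fly."""

    def __init__(self):
        self.index = {}

    def encode(self, text):
        """Return the sorted ids of the distinct words present in text."""
        ids = set()
        for word in tokenize(text):
            if word not in self.index:
                # A previously unseen word becomes a new attribute.
                self.index[word] = len(self.index)
            ids.add(self.index[word])
        return sorted(ids)


vocab = GrowingVocabulary()
first = vocab.encode("Top 10 doctors")      # made-up article title
second = vocab.encode("Top doctor awards")  # made-up article title
```

Each article is then a short list of attribute ids rather than a row with
one column per word, which keeps storage manageable as the word set grows.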

>How many different values does the variable "publication" have?

About 1,000

>What does "sales conversion" rate mean?

For each article I know how many leads were produced and how many ended in
sales. The "sales conversion" is simply the ratio of sales to leads (higher
is better).
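
As a small worked example of that ratio (the function name and the numbers
are hypothetical, and returning None for articles with no leads is an
assumption, since the ratio is undefined in that case):

```python
def sales_conversion(leads, sales):
    """Return sales / leads, or None when the article produced no leads."""
    if leads == 0:
        return None
    return sales / leads


# An article that produced 40 leads, 10 of which ended in sales.
example_rate = sales_conversion(40, 10)
```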

>Why do you think having these words in the article field would be 
>predictive of sales conversion?

Currently we simply use a list of 200 keywords to decide which articles to
use. I'm pretty sure that this list can be improved. A good example is the
word 'doctor'. The products being sold are plaques (the kind that doctors
love to hang on their walls). I feel pretty sure that there are other
patterns in the data waiting to be pulled out.
