Let me start by saying thank you for taking the time out to help; it is very
>It appears that you want to predict a continuous variable rather than a
>nominal-level one, so I don't see it as a classification problem.
Ideally yes, but I'd be happy to use some sort of discretization process to
convert the sales conversion rate into classes (i.e., low, medium and high).
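One simple way to do that discretization is to cut the rates at their tertiles, so each class holds roughly a third of the articles. This is only a sketch: equal-frequency bins are an assumption on my part (fixed business thresholds would work just as well), and `discretize` is a hypothetical helper name.

```python
from statistics import quantiles

def discretize(rates):
    """Label each conversion rate 'low', 'medium' or 'high' using tertile cut points.

    Assumption: equal-frequency (tertile) bins; swap in fixed thresholds if preferred.
    """
    # quantiles(..., n=3) returns the two cut points that split the data into thirds.
    low_cut, high_cut = quantiles(rates, n=3)
    labels = []
    for r in rates:
        if r <= low_cut:
            labels.append("low")
        elif r <= high_cut:
            labels.append("medium")
        else:
            labels.append("high")
    return labels

rates = [0.01, 0.02, 0.03, 0.10, 0.12, 0.15, 0.30, 0.35, 0.40]
print(discretize(rates))
```

With nine evenly spread rates like these, each class ends up with three articles.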
>Do you have a limited set of words that applies to every case? "top"
>"10" . . . ? with values yes and no or yes/no/does not apply?
The set of words would continue to grow as more articles are analyzed.
>How many cases (entities, records, lines) do you have in your data set?
>How many variables (attribute fields, columns) do you want to use as
>predictors? Do you have the one nominal level predictor (publication)
>and 6 dichotomous predictors only?
Based on my research, it seemed like converting each word into an attribute
was the way to go (akin to a customer's shopping cart, which holds only a
couple of the store's many products).
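To make the shopping-cart analogy concrete, here is a rough sketch of turning each article into a row of 0/1 word attributes. The crude lowercase/alphanumeric tokenizer and the helper names (`tokenize`, `build_vocabulary`, `to_binary_rows`) are my own assumptions for illustration; there is no stemming, so "doctor" and "doctors" become separate columns.

```python
import re

def tokenize(text):
    # Lowercase and keep runs of letters/digits; deliberately crude, no stemming.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_vocabulary(articles):
    # One attribute (column) per distinct word seen across all articles.
    return sorted(set().union(*(tokenize(a) for a in articles)))

def to_binary_rows(articles, vocab):
    # 1 if the article contains the word, else 0 (like an item being in a cart).
    index = {w: i for i, w in enumerate(vocab)}
    rows = []
    for a in articles:
        row = [0] * len(vocab)
        for w in tokenize(a):
            if w in index:  # words outside the vocabulary are ignored
                row[index[w]] = 1
        rows.append(row)
    return rows

articles = ["Top 10 doctors", "Doctor plaques for top clinics"]
vocab = build_vocabulary(articles)
print(vocab)
print(to_binary_rows(articles, vocab))
```

Since the word set keeps growing, new words are simply dropped until the vocabulary is rebuilt; in practice most rows will be mostly zeros, so a sparse format (e.g. storing just the set of words per article) is the usual choice.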
>How many different values does the variable "publication" have?
>What does "sales conversion" rate mean?
For each article I know how many leads were produced and how many ended in
sales. The "sales conversion" is simply this ratio (higher is better).
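As a quick sketch, the per-article ratio described above (sales divided by leads) might look like this; `conversion_rate` is a hypothetical name, and returning `None` for articles with no leads is my own assumption about how to handle an undefined rate.

```python
def conversion_rate(leads, sales):
    """Return sales / leads for one article, or None if it produced no leads."""
    if leads < 0 or sales < 0 or sales > leads:
        raise ValueError("expected 0 <= sales <= leads")
    if leads == 0:
        return None  # rate is undefined; such articles are probably best left out
    return sales / leads

print(conversion_rate(40, 10))  # 10 sales from 40 leads
```

Keep in mind that a rate computed from 2 leads is far noisier than one computed from 200, so it may be worth keeping the lead count alongside the ratio.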
>Why do you think having these words in the article field would be
>predictive of sales conversion?
Currently we simply use a list of 200 keywords to decide which articles to
use. I'm pretty sure that this list can be improved. A good example is the
word 'doctor'. The products being sold are plaques (the kind that doctors
love to hang on their walls). I feel pretty sure that there are other
patterns in the data waiting to be pulled out.
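A very simple first pass at finding candidate words beyond the current 200-word list is to compare, for each word, the pooled conversion rate of the articles containing it against the overall rate (a "lift" score). This is one basic technique I'm suggesting, not a full model: it looks at one word at a time and ignores interactions between words. The `(words, leads, sales)` input layout, the `min_leads` cutoff and the function name are all assumptions for illustration.

```python
from collections import defaultdict

def rank_words(articles, min_leads=50):
    """Rank words by lift: pooled conversion of articles containing the word / overall conversion.

    articles: list of (set_of_words, leads, sales) tuples (assumed layout).
    min_leads: skip words whose articles produced too few leads to trust (assumed cutoff).
    """
    total_leads = sum(leads for _, leads, _ in articles)
    total_sales = sum(sales for _, _, sales in articles)
    if total_leads == 0 or total_sales == 0:
        raise ValueError("need some leads and sales to compute a baseline rate")
    baseline = total_sales / total_leads

    # Pool leads and sales over every article that contains each word.
    leads_by_word = defaultdict(int)
    sales_by_word = defaultdict(int)
    for words, leads, sales in articles:
        for w in words:
            leads_by_word[w] += leads
            sales_by_word[w] += sales

    scores = [
        (w, (sales_by_word[w] / leads) / baseline)
        for w, leads in leads_by_word.items()
        if leads >= min_leads
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)  # highest lift first
    return scores

data = [({"doctor", "plaque"}, 100, 20), ({"top", "10"}, 100, 5), ({"doctor", "top"}, 100, 15)]
for word, lift in rank_words(data):
    print(word, round(lift, 3))
```

A lift above 1 means articles with that word converted better than average; words with high lift and plenty of leads behind them would be the ones to consider adding to the list.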