CLASS-L Archives

April 2007

CLASS-L@LISTS.SUNYSB.EDU

Sender: "Classification, clustering, and phylogeny estimation" <[log in to unmask]>
Date: Mon, 30 Apr 2007 13:31:42 -0400
Reply-To: "Classification, clustering, and phylogeny estimation" <[log in to unmask]>
From: Art Kendall <[log in to unmask]>
Organization: Social Research Consultants

I am not sure I understand your query. It looks like it may be a
Categorical Regression or even an ordinary Multiple Regression application.
At first glance, it appears that your data could be moved into
a statistical package such as SPSS.
It appears that you want to predict a continuous variable rather than a
nominal-level one, so I don't see it as a classification problem.
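
As a rough illustration of the regression framing, the sketch below fits an ordinary least-squares model to 0/1 word indicators using only the Python standard library. The function name `fit_least_squares`, the two indicator columns, and every number in it are made up for illustration; none of it is data from this thread, and a statistical package would normally do this fitting for you.

```python
# Toy illustration only: the rows and targets below are made-up values,
# not data from this thread.

def fit_least_squares(rows, targets):
    """Fit y = b0 + b1*x1 + ... by solving the normal equations (X'X) b = X'y."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # prepend intercept column
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; coefficients are not unique")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical 0/1 indicators: does the title contain "top"? "best"?
rows = [(0, 0), (1, 0), (0, 1), (1, 1)]
targets = [0.10, 0.30, 0.15, 0.35]  # made-up conversion rates
coefficients = fit_least_squares(rows, targets)  # [intercept, top, best]
print([round(b, 3) for b in coefficients])
```

The point is only that the target is a number to be predicted, so each coefficient reads as the estimated change in conversion rate when a word is present.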

Do you have a limited set of words that applies to every case ("top",
"10", . . .), with values yes and no, or yes/no/does not apply?
How many cases (entities, records, lines) do you have in your data set?
How many variables (attribute fields, columns) do you want to use as
predictors? Do you have only the one nominal-level predictor (publication)
and 6 dichotomous predictors?
How many different values does the variable "publication" have?
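
The counts asked about above can be read straight off the raw table. The sketch below is one possible way to do it in Python; it uses only the two rows visible in the quoted message (the real table is longer), and the names `records` and `summarize` are hypothetical.

```python
# Only the two rows shown in the quoted message; the real table is longer.
records = [
    ("forbes", "top 10 businesses", 0.283),
    ("newsweek", "best 10 pet stores", 0.347),
]

def summarize(records):
    """Count cases, publication levels, and distinct title words."""
    publications = {pub.lower() for pub, _, _ in records}
    vocabulary = {word for _, title, _ in records for word in title.lower().split()}
    return {
        "cases": len(records),                     # rows in the data set
        "publication_levels": len(publications),   # distinct values of "publication"
        "word_predictors": len(vocabulary),        # one yes/no predictor per word
    }

summary = summarize(records)
print(summary)
```

On the full data set the number of word predictors will usually grow much faster than the number of cases, which is worth knowing before fitting any model.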

What does "sales conversion rate" mean?

Why do you think having these words in the article field would be 
predictive of sales conversion?

Art Kendall
Social Research Consultants

Hillel Coren wrote:
> I'm working on the following problem and was hoping that someone in
> the forum could lend a hand. I've got the following data:
>
> publication | article | sales conversion
> ----------------------------------------------------
> forbes | top 10 businesses | 0.283
> newsweek | best 10 pet stores | 0.347
> … | … | …
>
> The goal is to use the publication and article to select future
> articles which we predict will have a high sales conversion rate. I've
> transformed the data into the following ARFF file.
>
> @relation articles
>
> @attribute publication {forbes, newsweek, ...}
> @attribute top {yes, no}
> @attribute 10 {yes, no}
> @attribute businesses {yes, no}
> @attribute best {yes, no}
> @attribute pet {yes, no}
> @attribute stores {yes, no}
> …
> @attribute sales_conversion NUMERIC
>
> @data
> {0 forbes, 1 yes, 2 yes, 3 yes, 7 0.283}
> {0 newsweek, 2 yes, 4 yes, 5 yes, 6 yes, 7 0.347}
> …
>
> What do you feel would be the best way to approach this problem (I'm
> using WEKA)?
>
> Thank you for your help,
> Hillel
>
> ----------------------------------------------
> CLASS-L list.
> Instructions: http://www.classification-society.org/csna/lists.html#class-l
>
>
>   
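
The transformation described in the quoted message, from a publication/article/conversion table to word-indicator attributes, could be sketched roughly as follows. This is an assumption-laden illustration, not the poster's actual script: the function name `to_arff` is hypothetical, titles are split on whitespace only, and rows are written in dense form so that every yes/no value is stated explicitly.

```python
def to_arff(records, relation="articles"):
    """Render (publication, title, conversion) rows as dense ARFF text."""
    publications = sorted({pub.lower() for pub, _, _ in records})
    # Vocabulary in first-seen order, so the attribute order is stable.
    vocabulary = []
    for _, title, _ in records:
        for word in title.lower().split():
            if word not in vocabulary:
                vocabulary.append(word)

    lines = [f"@relation {relation}", ""]
    lines.append("@attribute publication {" + ", ".join(publications) + "}")
    lines += [f"@attribute {word} {{yes, no}}" for word in vocabulary]
    lines += ["@attribute sales_conversion NUMERIC", "", "@data"]

    for pub, title, conversion in records:
        present = set(title.lower().split())
        flags = ["yes" if word in present else "no" for word in vocabulary]
        lines.append(",".join([pub.lower(), *flags, str(conversion)]))
    return "\n".join(lines)

# The two rows visible in the quoted table; the rest of the data is elided there.
example_rows = [
    ("forbes", "top 10 businesses", 0.283),
    ("newsweek", "best 10 pet stores", 0.347),
]
arff_text = to_arff(example_rows)
print(arff_text)
```

Publication names are lower-cased so that the values in the data rows match the declared nominal values exactly.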
