I have 3 classes and hundreds of attributes, each with ~50 values (continuous, but discretized into quartile bins). After calculating the information gain for each attribute, I sort the attributes by information gain in descending order. My goal is not to generate a tree, but rather to perform instance-based learning using the cumulative list of selected attributes.
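
For concreteness, here is a minimal sketch of that setup. It assumes a NumPy feature matrix `X` (objects × attributes) and an integer label vector `y`; the toy data and all variable names are hypothetical, just to show the gain calculation and the descending sort.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def quartile_bins(values):
    """Discretize a continuous attribute into 4 quartile bins (0..3)."""
    q = np.quantile(values, [0.25, 0.5, 0.75])
    return np.digitize(values, q)

def information_gain(values, labels):
    """Information gain of one attribute after quartile discretization."""
    bins = quartile_bins(values)
    h_before = entropy(labels)
    h_after = 0.0
    for b in np.unique(bins):
        mask = bins == b
        h_after += mask.mean() * entropy(labels[mask])
    return h_before - h_after

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))      # toy data: 300 objects, 20 attributes
y = rng.integers(0, 3, size=300)    # 3 classes

gains = np.array([information_gain(X[:, j], y) for j in range(X.shape[1])])
ranked = np.argsort(gains)[::-1]    # attribute indices, descending by gain
```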

Something I have not seen addressed in the literature is what to do when a majority of the attributes with the greatest information gain owe their low impurity to a single class. Given this problem, is there a commonly used method for weighting or selecting attributes that are purest for a given class? (I have tried selecting the attribute with the greatest gain for each class in turn, cycling through the 3 classes each time I pick the next attribute, and that seemed to work better than simply selecting the attributes with the greatest overall gain; a sketch of that round-robin selection follows this paragraph.) Do I need to "prune" unwanted attributes? If so, are there any papers that describe methods and criteria for pruning unwanted attributes in instance-based learning?
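
In case it clarifies what I mean by looping through the classes, here is a sketch of that round-robin selection. Computing the per-class gain by binarizing the labels one-vs-rest is my own assumption; it reuses `information_gain`, `X`, and `y` from the sketch above.

```python
def per_class_gains(X, y, n_classes=3):
    """gains[c][j] = gain of attribute j for class c vs. the rest (one-vs-rest)."""
    return np.array([
        [information_gain(X[:, j], (y == c).astype(int)) for j in range(X.shape[1])]
        for c in range(n_classes)
    ])

def round_robin_select(X, y, k, n_classes=3):
    """Pick k attributes, cycling over the classes and taking each class's
    best not-yet-selected attribute by one-vs-rest gain."""
    gains = per_class_gains(X, y, n_classes)
    selected = []
    c = 0
    while len(selected) < k:
        order = np.argsort(gains[c])[::-1]
        for j in order:                   # best unused attribute for class c
            if j not in selected:
                selected.append(int(j))
                break
        c = (c + 1) % n_classes           # move on to the next class
    return selected

print(round_robin_select(X, y, k=6))
```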

Last, because my goal is not to build a hierarchical tree, each time I select an attribute I take the accumulated attribute data and loop through all of the training objects, assigning each object to a predicted class. Each time I add an attribute, a confusion matrix is generated for the classification of all objects, from which I obtain accuracy. So I get a confusion matrix for the cumulative attribute list at each step. In this scenario, when should I stop selecting attributes? Recall that I am not building a tree in which I could assess purity at each node, but rather adding attributes one at a time, training, and generating a confusion matrix in instance-based learning.
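
To make the stopping question concrete, here is a sketch of the loop I am running, with one candidate stopping rule. The patience-based rule is my own guess, not something I have seen in the literature, and using leave-one-out 1-NN as the instance-based learner is an assumption; it reuses `ranked`, `quartile_bins`, `X`, and `y` from the first sketch.

```python
def loo_1nn_accuracy(B, y):
    """Leave-one-out 1-NN accuracy on discretized attributes B
    (objects x selected attributes), using Hamming distance."""
    n = B.shape[0]
    correct = 0
    for i in range(n):
        d = (B != B[i]).sum(axis=1)    # Hamming distance to every object
        d[i] = d.max() + 1             # exclude the held-out object itself
        correct += y[int(np.argmin(d))] == y[i]
    return correct / n

B_all = np.column_stack([quartile_bins(X[:, j]) for j in range(X.shape[1])])

best_acc, since_best, patience = 0.0, 0, 3
kept = []
for j in ranked:                       # attributes in descending-gain order
    kept.append(int(j))
    acc = loo_1nn_accuracy(B_all[:, kept], y)
    if acc > best_acc:
        best_acc, since_best = acc, 0
    else:
        since_best += 1
        if since_best >= patience:     # no improvement: stop and roll back
            kept = kept[:len(kept) - patience]
            break

print(len(kept), best_acc)
```

Is a rule like this (stop after k consecutive non-improving additions, then roll back) defensible here, or is there an established criterion for this kind of cumulative attribute selection?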