Which genes are useful for
classification?
This is the classical machine-learning (statistical)
problem of feature (variable) selection.
Rank the genes by how well they separate the
classes, using some kind of t-statistic, or
greedily choose the next best feature from the
list of unused features (forward selection).
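
As a rough illustration of the ranking step, the sketch below scores each gene with Welch's two-sample t-statistic and sorts genes by decreasing absolute value. The choice of Welch's variant, the function names (welch_t, rank_genes) and the toy expression values are all assumptions made for this example; the slide only says "some kind of t-statistic".

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t-statistic for two lists of expression values."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_genes(X, y):
    """Return gene indices sorted by decreasing |t| between class 0 and class 1.

    X: list of samples, each a list of expression values (one per gene).
    y: list of 0/1 class labels, one per sample.
    """
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        scores.append(abs(welch_t(a, b)))
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)


# Toy data (made up): gene 1 separates the classes, genes 0 and 2 do not.
X = [
    [5.0, 1.0, 3.0],
    [5.2, 1.2, 2.9],
    [4.9, 0.9, 3.2],
    [5.1, 4.0, 3.0],
    [5.0, 4.2, 3.1],
    [4.8, 3.9, 2.8],
]
y = [0, 0, 0, 1, 1, 1]

ranking = rank_genes(X, y)
print(ranking)
```

The first entry of ranking is the gene with the largest class separation; the classifiers considered next use the first p entries of this list.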
Consider a series of classifiers incorporating the first
p genes on the list, p = 1,2,…
Estimate the performance of the p-feature classifier
with leave-one-out cross-validation.
Pick the best-performing p.
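
The whole procedure can be sketched as follows. This is a minimal example under several assumptions that are not in the slide: a nearest-centroid classifier stands in for "the p-feature classifier", genes are ranked by absolute Welch t-statistic, the ranking is recomputed inside each leave-one-out fold so the held-out sample does not influence gene selection, and ties between values of p go to the smallest p. All function names and the toy data are hypothetical.

```python
import math
from statistics import mean, variance


def rank_genes(X, y):
    """Gene indices sorted by decreasing |Welch t| between class 0 and class 1."""
    def abs_t(g):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        diff = abs(mean(a) - mean(b))
        se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
        if se == 0:
            # No within-class spread: perfect separation if the means differ.
            return math.inf if diff > 0 else 0.0
        return diff / se
    return sorted(range(len(X[0])), key=abs_t, reverse=True)


def nearest_centroid_predict(X_train, y_train, x, genes):
    """Assign x to the class whose centroid, restricted to `genes`, is closest."""
    best_label, best_dist = None, math.inf
    for label in sorted(set(y_train)):
        rows = [r for r, lab in zip(X_train, y_train) if lab == label]
        dist = sum((x[g] - mean(r[g] for r in rows)) ** 2 for g in genes)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_errors(X, y, max_p):
    """Leave-one-out error count for each classifier size p = 1..max_p."""
    errors = [0] * max_p
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        ranking = rank_genes(X_train, y_train)  # re-rank without sample i
        for p in range(1, max_p + 1):
            pred = nearest_centroid_predict(X_train, y_train, X[i], ranking[:p])
            if pred != y[i]:
                errors[p - 1] += 1
    return errors


def pick_best_p(X, y, max_p):
    """Return (best p, error counts); ties are broken toward the smallest p."""
    errors = loo_errors(X, y, max_p)
    best_p = min(range(1, max_p + 1), key=lambda p: errors[p - 1])
    return best_p, errors


# Toy data (made up): gene 0 separates the classes, gene 1 weakly, gene 2 is noise.
X = [
    [1.0, 2.0, 7.0],
    [1.2, 2.5, 3.0],
    [0.9, 1.8, 5.0],
    [1.1, 2.9, 1.0],
    [5.0, 2.6, 6.0],
    [5.3, 3.1, 2.0],
    [4.8, 2.4, 8.0],
    [5.1, 3.3, 4.0],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

best_p, errors = pick_best_p(X, y, max_p=3)
print(best_p, errors)
```

Re-ranking inside each fold costs more computation than ranking once on all samples, but ranking once lets the held-out sample leak into feature selection and tends to make the leave-one-out estimate optimistic.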