Running cross-validation

Magnus Lie Hetland magnus at hetland.org
Sun Sep 8 10:41:17 EDT 2002


Hi!

I'm working on a classifier developed in GALib, and I'd like to do
some simple cross-validation to select the "winning" classifier
individual, in order to avoid overfitting. In other words, I'd like to
calculate the fitness of all individuals both on a dataset X and a
dataset Y, but only let the fitness from dataset X influence
evolution. The individual with the lowest (best) fitness on dataset Y,
taken over the whole run, will then be chosen. (This is a fairly
standard thing, I guess.)

What is the easiest way of achieving this with GALib? Should I
subclass GAGeneticAlgorithm and override something? Should I just
collect the best of each generation and then (after finishing the
evolution) simply evaluate the fitness on Y "manually"?
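
To make the second option concrete, here is a minimal sketch of it that
does not depend on GALib. The toy one-dimensional threshold classifier,
the names Sample, errorRate and evolveWithHoldout, the population size
of 20, and the simple "keep the better half, mutate it" loop are all
assumptions made for illustration; none of them are GALib API. Only the
error on X drives the evolution, and each generation's champion is also
scored on Y so that the best one on Y can be remembered.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical toy problem: classify x as positive when x > threshold.
struct Sample { double x; bool label; };
using Dataset = std::vector<Sample>;
using Individual = double;  // an individual is just the threshold

// Fitness = misclassification rate on a dataset (lower is better).
double errorRate(Individual t, const Dataset& data) {
    assert(!data.empty());
    std::size_t wrong = 0;
    for (const Sample& s : data)
        if ((s.x > t) != s.label) ++wrong;
    return static_cast<double>(wrong) / data.size();
}

struct Selection {
    Individual winner;  // champion with the lowest error on Y
    double errorOnY;    // its error rate on Y
    int generation;     // generation in which it was the X-champion
};

// Evolve using fitness on X only; pick the winner by fitness on Y.
Selection evolveWithHoldout(const Dataset& X, const Dataset& Y,
                            int generations, unsigned seed) {
    assert(generations > 0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> init(-10.0, 10.0);
    std::normal_distribution<double> mutate(0.0, 0.5);

    std::vector<Individual> pop(20);
    for (Individual& ind : pop) ind = init(rng);

    auto betterOnX = [&X](Individual a, Individual b) {
        return errorRate(a, X) < errorRate(b, X);
    };

    // 2.0 exceeds any possible error rate, so the first champion replaces it.
    Selection best{0.0, 2.0, -1};
    for (int g = 0; g < generations; ++g) {
        // Selection pressure comes from dataset X alone.
        std::sort(pop.begin(), pop.end(), betterOnX);

        // Score this generation's X-champion on Y; Y never feeds back.
        const double eY = errorRate(pop[0], Y);
        if (eY < best.errorOnY) best = {pop[0], eY, g};

        // Replace the worse half with mutated copies of the better half.
        const std::size_t half = pop.size() / 2;
        for (std::size_t i = 0; i < half; ++i)
            pop[half + i] = pop[i] + mutate(rng);
    }
    return best;
}
```

Calling evolveWithHoldout(X, Y, 50, 42) would return the champion that
did best on Y together with the generation it came from. With GALib the
same bookkeeping could presumably live in a loop that advances the GA
one generation at a time, but the exact calls should be checked against
the GALib documentation.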

Also, does anyone have any good references for this sort of
cross-validation to avoid overfitting? (When searching for
cross-validation I only find material about subsequent testing; am I
using the wrong term?)

(Sorry if I've missed some obvious documentation on this...)

-- 
Magnus Lie Hetland                                  The Anygui Project
http://hetland.org                                  http://anygui.org


