[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: speed freak buys farm?



Howard-
Others have asked the question "Are you sure you running the right
algorithm?"  Although Koza has implemented his genetic programming
approach efficiently, I'm not convinced that genetic programming
is the best solution for your task.

I've read most of his papers, and looked at his book, (and seen the
videotape), and I agree that it's a very interesting research
paradigm, and has a lot of potential.  It overcomes a limitation of
typical genetic algorithms in that it works on trees rather than
bit-strings.  The great thing about genetic programming, is its
generality. it can solve a wide variety of problems, from learning
classification rules, curve fitting, robot control, game playing etc.
But I don't think you really care about this generality.  You have a
specific problem in mind:
>The particular problem that I am battling with evaluates a 1000 deep stack
>filter of window 7 across a test series of 600 data points

There's not enough information here to tell exactly what your task
is, but there may be a special purpose algorithm (function optimization?),
that can be used instead of the general purpose one.  For an area,
I know more about (learning classification rules), there are algorithms
(e.g., decision trees) that run thousands of times faster than genetic
programming, and produce more accurate rules.  I would imagine there
might be techniques (e.g., gradient descent, simulated annealing, etc.)
that might also be applied to your problem.

In a previous message, you wrote:
>Ultimately, you will end up with the 'fittest' (optimal) set of factors.

Actually, you'll learn one of a large family of functions that performs
perfectly on the fitness (or training) data.  However, like all
induction algorithms, there are no guarantees about how it will
perform on unseen data drawn from the same population.  A potential
problem with genetic programming is that the function learned is
typically quite complex.  There are reasons to believe (the minimum
description or Occam's razor), that simpler functions will be more
accurate than complex ones on unseen data.  For example, look at the
spiral example in Koza's book.  Although the learned function correctly
distinguishes which of two spirals each point lies on, the resulting
function (illustrated in gray in the figures) doesn't really look like
two interleaved spirals.

Koza rarely discusses how functions created by genetic programming
work on unseen data.  One excuse is that it would be difficult.  To do
this well would require running genetic programming a number of times
on different subsets of the data, and measuring the predictive
accuracy on unseen data.  With repeated trials, you could find the
average accuracy, and compare it to alternative approaches.  When each
trial takes a week on Quadra, I can understand why this isn't done
often.

By the way, it might be the case that a slight modification of genetic
programming would generalize better on unseen data.  The fitness
function typically used in this approach computes something (mean
squared error, accuracy, etc) on the a sample of the data.  A fitness
function that in addition penalized complex functions (e.g., by
subtracting the number of cons cells in the s-exp of the program)
might perform better.  I'll let you know in a few months.

Finally, if you are going to use genetic programming, you really
aren't restricted to lisp.  It is very simple to write an interpreter
in any language for the subset of lisp that genetic programming
program operates on.  All you need is constants and function
application. In fact, that's what the "fast-eval" function does.  For
example, you could write a C program that mutates and breeds
s-expressions, write the "terminal" functions in C, and write an
interpreter to "eval" them.


Mike

P.S.  The above messages isn't meant as a critcism of Koza.  He's
clearly made a significant research contribution.  However, until more
experience is gained with genetic programming, and it is compared to
alternative approaches in AI, (and statistics), I wouldn't recommend
purchasing any hardware to run it on a specific application.  In case
anyone from NSF or DARPA is reading, I'd strongly recommend buying
Koza a connection machine or a farm of Quadra so that he can continue
research in this area.