Symbolic Regression
This is an open, independent investigation of the following question:
Can propositional theories be evolved?
Can evolution, through natural selection, select a theory that fits a set of unanticipated but observed common-sense facts (in the form of propositional facts) and makes predictions that fit unobserved events?
For example, suppose we lived in a world with the following propositions:
PersonInFrontOfCar
YellowLight
RedLight
Policeman
Policecar
Slippery
Snow
Dry
Brake
And a large supply of observed situations showing how these appear together, such as:
YellowLight.
~RedLight.
~Snow.
Dry.
Policecar.
~PersonInFrontOfCar.
Brake.
Can a genetic algorithm, working exclusively from the observations, evolve and select the following unannounced patterns to make predictions of future events?
PersonInFrontOfCar => Brake.
(YellowLight && Policeman && ~Slippery) => Brake.
Policecar => Policeman.
Snow => Slippery.
Slippery => ~Dry.
RedLight => Brake.
Winter => Snow.
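To make this concrete, here is one possible encoding, a sketch of my own rather than anything prescribed above: a situation is a partial truth assignment over the propositions, and a rule is an implication from a conjunction of literals to a single literal.

```python
# A sketch of one possible encoding (illustrative, not prescribed above):
# a situation is a partial truth assignment over the propositions, and a
# rule is an implication from a conjunction of literals to one literal.

# The observed situation listed above: True = asserted, False = negated.
situation = {
    "YellowLight": True,
    "RedLight": False,
    "Snow": False,
    "Dry": True,
    "Policecar": True,
    "PersonInFrontOfCar": False,
    "Brake": True,
}

# A literal is (proposition, polarity); a rule is (antecedent, consequent).
# This encodes: (YellowLight && Policeman && ~Slippery) => Brake.
rule = (
    (("YellowLight", True), ("Policeman", True), ("Slippery", False)),
    ("Brake", True),
)
```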
Genetic Programming
Suppose our individuals are propositional theories, in the form of formulas, starting from random configurations. For example, each individual looks more or less like this:
- Individual 0:
Winter => Dry. Policeman => Brake.
- Individual 1:
Winter => ~Dry. Dry => Snow.
- Individual 2:
YellowLight => ~Dry && Snow.
- ...
- Individual n:
RedLight => YellowLight.
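An initial random population could be generated as below. This is a sketch under assumptions of my own: rules are encoded as (antecedent, consequent) pairs of (proposition, polarity) literals, and the size limits are arbitrary.

```python
import random

# Fixed vocabulary of propositions (from the scenario above, plus Winter).
PROPS = ["PersonInFrontOfCar", "YellowLight", "RedLight", "Policeman",
         "Policecar", "Slippery", "Snow", "Dry", "Brake", "Winter"]

def random_literal(rng):
    """Pick a proposition and a polarity (True = positive, False = negated)."""
    return (rng.choice(PROPS), rng.random() < 0.5)

def random_rule(rng, max_antecedent=3):
    """An implication: a conjunction of 1..max_antecedent literals => one literal."""
    n = rng.randint(1, max_antecedent)
    antecedent = tuple(random_literal(rng) for _ in range(n))
    return (antecedent, random_literal(rng))

def random_theory(rng, max_rules=4):
    """An individual: a small list of random rules."""
    return [random_rule(rng) for _ in range(rng.randint(1, max_rules))]

rng = random.Random(0)
population = [random_theory(rng) for _ in range(10)]
```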
Each individual is evaluated by the degree to which it supports the evidence: how many factual observations fit the theory. Each theory is tested against every situation to see whether it explains it.
For example, suppose that in situation 0 we observe Winter. ~Dry. The theory of Individual 0 directly conflicts with this evidence, making it less useful, evolutionarily speaking. The theory of Individual 1, on the other hand, supports the evidence, making it fitter.
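This evaluation can be sketched as follows. The consistency semantics is my assumption about what "conflicts with the evidence" means: a rule counts against a theory only when its antecedent is fully observed to hold and its consequent is observed with the opposite polarity. Rules and situations are encoded with (proposition, polarity) literals.

```python
def violates(rule, situation):
    """A rule is violated when every antecedent literal is observed to hold
    but the consequent is observed with the opposite polarity."""
    antecedent, (c_name, c_pol) = rule
    fires = all(situation.get(name) == pol for name, pol in antecedent)
    return fires and situation.get(c_name) == (not c_pol)

def fitness(theory, situations):
    """Number of observed situations the whole theory is consistent with."""
    return sum(all(not violates(r, s) for r in theory) for s in situations)

# Situation 0 from the text: Winter. ~Dry.
situation_0 = {"Winter": True, "Dry": False}

individual_0 = [((("Winter", True),), ("Dry", True)),      # Winter => Dry.
                ((("Policeman", True),), ("Brake", True))]  # Policeman => Brake.
individual_1 = [((("Winter", True),), ("Dry", False)),     # Winter => ~Dry.
                ((("Dry", True),), ("Snow", True))]         # Dry => Snow.

fitness(individual_0, [situation_0])  # 0: conflicts with the evidence
fitness(individual_1, [situation_0])  # 1: supports the evidence
```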
If natural selection ran through a number of generations, would we arrive at the following theory?
PersonInFrontOfCar => Brake.
(YellowLight && Policeman && ~Slippery) => Brake.
Policecar => Policeman.
Snow => Slippery.
Slippery => ~Dry.
RedLight => Brake.
Winter => Snow.
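A minimal generational loop over such theories might look like the sketch below. Everything here is an arbitrary assumption of mine rather than a claim about what would actually suffice: truncation selection, point mutation only (no crossover), and the population and generation counts.

```python
import random

PROPS = ["PersonInFrontOfCar", "YellowLight", "RedLight", "Policeman",
         "Policecar", "Slippery", "Snow", "Dry", "Brake", "Winter"]

def random_rule(rng):
    """A random implication: 1-3 antecedent literals => one consequent literal."""
    lit = lambda: (rng.choice(PROPS), rng.random() < 0.5)
    return (tuple(lit() for _ in range(rng.randint(1, 3))), lit())

def violates(rule, situation):
    """True when the antecedent is fully observed but the consequent is contradicted."""
    antecedent, (c_name, c_pol) = rule
    fires = all(situation.get(name) == pol for name, pol in antecedent)
    return fires and situation.get(c_name) == (not c_pol)

def fitness(theory, situations):
    """Number of situations the whole theory is consistent with."""
    return sum(all(not violates(r, s) for r in theory) for s in situations)

def mutate(theory, rng):
    """Point mutation: replace one rule with a fresh random one."""
    child = list(theory)
    child[rng.randrange(len(child))] = random_rule(rng)
    return child

def evolve(situations, rng, pop_size=30, generations=50):
    pop = [[random_rule(rng) for _ in range(rng.randint(1, 4))]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: fitness(t, situations), reverse=True)
        survivors = pop[:pop_size // 2]          # truncation selection
        pop = survivors + [mutate(rng.choice(survivors), rng)
                           for _ in range(pop_size - len(survivors))]
    return max(pop, key=lambda t: fitness(t, situations))
```

One caveat this sketch makes visible: merely being consistent with the evidence is easy, since a rule that never fires is vacuously consistent. A fitness function that selects for the target theory above would also have to reward explanatory power, e.g. rules whose antecedents actually fire; that refinement is left open here.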
Marginal Costs
Arriving at that theory alone isn't sufficient: the algorithm has to work for any input without modification.
For example, if we changed the observations to a completely different scenario, could a foundational genetic algorithm, at zero marginal cost (i.e. without any recompilation cycle), derive theories that fit any propositional evidence?
For example, given:
cat_fur || dog_fur.
~thompson_allergy.
macavity_criminal.
Can the algorithm derive the following theory without any recompilation cycle since the previous scenario?
dog_fur => thompson_allergy.
cat_fur => macavity_criminal.
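One way to get zero marginal cost, a sketch of my own rather than something specified above, is to read the vocabulary off the observations themselves, so the same program runs unchanged on any scenario. Note that a disjunctive observation like `cat_fur || dog_fur.` does not fit a plain truth-assignment encoding; below it is grounded into one of its satisfying assignments as an assumed simplification.

```python
# Sketch: derive the proposition vocabulary from the observations, so no
# part of the program is specific to one scenario.
# "cat_fur || dog_fur." is grounded here into one satisfying assignment
# (an assumed simplification; a faithful encoding would keep disjunctions).
situations = [
    {"cat_fur": True, "dog_fur": False,
     "thompson_allergy": False, "macavity_criminal": True},
]

# Everything downstream (rule generation, mutation) would draw names from
# `props` instead of a hard-coded list, leaving the algorithm scenario-agnostic.
props = sorted({name for s in situations for name in s})
```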
Related Work
- Symbolic Regression
- Symbolic Regression - An Overview
- Eureqa
- Automated Mathematician
- A large benchmark study of recent symbolic regression methods
Open Questions
- Would it work for first order logic too?
- Would it be able to derive taxonomies?
- Would it be able to deal with correlations?
- Would it derive rules that can't be understood because they are missing terms?
- Would the search space be so big that it would need guidance on where to look?
- What is the algorithmic complexity of this? Is it proportional to the number of observations? To the complexity of the rules?
Interested? Find me!