Symbolic Regression

This is an open and independent investigation into the following question:

Can propositional theories be evolved?

Can evolution, through natural selection, arrive at a theory that fits a set of unanticipated but observed common-sense facts (in the form of propositions) and makes predictions that fit unobserved events?

For example, suppose we lived in a world with the following propositions:

• `PersonInFrontOfCar`
• `YellowLight`
• `RedLight`
• `Policeman`
• `Policecar`
• `Slippery`
• `Snow`
• `Dry`
• `Brake`

And a large supply of observed situations showing how these appear together, for example:

• `YellowLight.`
• `~RedLight.`
• `~Snow.`
• `Dry.`
• `Policecar.`
• `~PersonInFrontOfCar.`
• `Brake.`
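One way to make such observation lists machine-readable is a small parser; this is a sketch under my own encoding assumptions (a situation as a map from proposition name to observed truth value), not anything the text prescribes:

```python
# Hedged sketch: turn observation lines like "~RedLight." into a dict
# mapping each observed proposition to its truth value.

def parse_situation(lines):
    situation = {}
    for line in lines:
        lit = line.strip().rstrip(".")
        if lit.startswith("~"):
            situation[lit[1:]] = False   # negated literal: observed false
        else:
            situation[lit] = True        # plain literal: observed true
    return situation

obs = ["YellowLight.", "~RedLight.", "~Snow.", "Dry.",
       "Policecar.", "~PersonInFrontOfCar.", "Brake."]
print(parse_situation(obs))
```

Unmentioned propositions simply stay absent from the map, which lets a theory remain agnostic about them.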

Can a genetic algorithm, working exclusively from the observations, evolve and select the following unannounced patterns to make predictions of future events?

• `PersonInFrontOfCar => Brake.`
• `(YellowLight && Policeman && ~Slippery) => Brake.`
• `Policecar => Policeman.`
• `Snow => Slippery.`
• `Slippery => ~Dry.`
• `RedLight => Brake.`
• `Winter => Snow.`
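One possible encoding of such patterns, with all names my own rather than the author's: a literal is a `(proposition, polarity)` pair, a rule is `(antecedents, consequent)`, and a rule is falsified only when every antecedent is observed true while the consequent is observed false; unobserved propositions never falsify it.

```python
# Hedged sketch of literals, implication rules, and their consistency check.
# A literal is (name, polarity); a rule is (antecedent_list, consequent).

def literal_holds(lit, situation):
    name, positive = lit
    return situation.get(name) == positive  # missing propositions don't hold

def rule_consistent(rule, situation):
    """A rule A1 && ... && An => C is violated only when every antecedent
    is observed true and the consequent is observed false."""
    antecedents, (c_name, c_positive) = rule
    if all(literal_holds(a, situation) for a in antecedents):
        return situation.get(c_name) != (not c_positive)
    return True

# Snow => Slippery, checked against two sample situations
snow_rule = ([("Snow", True)], ("Slippery", True))
print(rule_consistent(snow_rule, {"Snow": True, "Slippery": False}))  # False: violated
print(rule_consistent(snow_rule, {"Snow": False, "Dry": True}))       # True: vacuous
```

Treating an unobserved proposition as "cannot falsify" is one design choice among several; a stricter closed-world reading would change the fitness landscape.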

Genetic Programming

Suppose our individuals are propositional theories, in the form of formulas, which start at random configurations. For example, each individual looks more or less like this:

• Individual 0: `Winter => Dry. Policeman => Brake.`
• Individual 1: `Winter => ~Dry. Dry => Snow.`
• Individual 2: `YellowLight => ~Dry && Snow.`
• ...
• Individual n: `RedLight => YellowLight.`
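Random initialization could look like the sketch below; the proposition list, the rule encoding, and the size limits are illustrative assumptions, not something the text fixes.

```python
# Hedged sketch of random initialization: each individual is a theory,
# i.e. a list of implication rules over random signed propositions.
import random

PROPS = ["PersonInFrontOfCar", "YellowLight", "RedLight", "Policeman",
         "Policecar", "Slippery", "Snow", "Dry", "Brake", "Winter"]

def random_literal(rng):
    return (rng.choice(PROPS), rng.choice([True, False]))

def random_rule(rng, max_antecedents=3):
    antecedents = [random_literal(rng)
                   for _ in range(rng.randint(1, max_antecedents))]
    return (antecedents, random_literal(rng))

def random_theory(rng, max_rules=4):
    return [random_rule(rng) for _ in range(rng.randint(1, max_rules))]

rng = random.Random(0)  # fixed seed only to make the example reproducible
population = [random_theory(rng) for _ in range(10)]
```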

Each individual is evaluated by the degree to which it supports the evidence: how many factual observations fit the theory. For each situation, each theory is tested to see whether it explains it.

For example, suppose that in situation `0` we observe `Winter. ~Dry.`. The theory of `Individual 0` directly conflicts with this evidence, making it less useful, evolutionarily speaking. The theory of `Individual 1`, on the other hand, supports the evidence, making it fitter.
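A minimal fitness sketch under the scoring just described: a theory earns one point per situation it does not contradict. The consistency check below follows the usual reading (a rule is violated only when its antecedents are observed true and its consequent observed false); the encoding is my assumption.

```python
# Hedged sketch: fitness = number of situations a theory is consistent with.

def rule_consistent(rule, situation):
    antecedents, (c_name, c_positive) = rule
    if all(situation.get(n) == pos for n, pos in antecedents):
        return situation.get(c_name) != (not c_positive)
    return True

def fitness(theory, situations):
    return sum(all(rule_consistent(r, s) for r in theory) for s in situations)

situation0 = {"Winter": True, "Dry": False}            # Winter. ~Dry.
individual0 = [([("Winter", True)], ("Dry", True)),    # Winter => Dry.
               ([("Policeman", True)], ("Brake", True))]  # Policeman => Brake.
individual1 = [([("Winter", True)], ("Dry", False)),   # Winter => ~Dry.
               ([("Dry", True)], ("Snow", True))]      # Dry => Snow.
print(fitness(individual0, [situation0]))  # 0: Winter => Dry conflicts
print(fitness(individual1, [situation0]))  # 1: consistent with the evidence
```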

If natural selection ran through a number of generations, would we arrive at the following?

• `PersonInFrontOfCar => Brake.`
• `(YellowLight && Policeman && ~Slippery) => Brake.`
• `Policecar => Policeman.`
• `Snow => Slippery.`
• `Slippery => ~Dry.`
• `RedLight => Brake.`
• `Winter => Snow.`
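A self-contained sketch of such a generational loop (tournament selection, rule-list crossover, point mutation); every operator and parameter here is an assumption chosen for illustration, not the author's prescription.

```python
# Hedged sketch: evolve theories toward consistency with the observations.
import random

PROPS = ["Winter", "Snow", "Slippery", "Dry", "Brake", "RedLight"]

def rule_consistent(rule, situation):
    antecedents, (c_name, c_positive) = rule
    if all(situation.get(n) == pos for n, pos in antecedents):
        return situation.get(c_name) != (not c_positive)
    return True

def fitness(theory, situations):
    return sum(all(rule_consistent(r, s) for r in theory) for s in situations)

def random_rule(rng):
    lit = lambda: (rng.choice(PROPS), rng.choice([True, False]))
    return ([lit()], lit())

def mutate(theory, rng):
    child = list(theory)
    child[rng.randrange(len(child))] = random_rule(rng)  # point mutation
    return child

def crossover(a, b, rng):
    child = a[:rng.randint(0, len(a))] + b[rng.randint(0, len(b)):]
    return child or [random_rule(rng)]  # never breed an empty theory

def evolve(situations, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [[random_rule(rng) for _ in range(rng.randint(1, 3))]
           for _ in range(pop_size)]
    for _ in range(generations):
        def pick():  # tournament of 3
            return max(rng.sample(pop, 3),
                       key=lambda t: fitness(t, situations))
        pop = [mutate(crossover(pick(), pick(), rng), rng)
               for _ in range(pop_size)]
    return max(pop, key=lambda t: fitness(t, situations))

# Three situations in which Snow co-occurs with Slippery and ~Dry
situations = [{"Snow": True, "Slippery": True, "Dry": False},
              {"Snow": False, "Dry": True},
              {"Snow": True, "Slippery": True}]
best = evolve(situations)
print(fitness(best, situations), best)
```

Note that this selects for mere consistency; rewarding rules that actually fire (predict something) rather than hold vacuously would be needed to push the search toward informative patterns like `Snow => Slippery`.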

Marginal Costs

Arriving at that alone isn't sufficient: the algorithm has to work for any input without any modification.

For example, if we changed the observations to a completely different scenario, can a foundational genetic algorithm, with zero marginal costs (i.e. without any recompilation cycle), derive theories that fit any propositional evidence?

For example, given:

• `cat_fur || dog_fur.`
• `~thompson_allergy.`
• `macavity_criminal.`

Can the algorithm derive the following theory without any recompilation cycle from the previous run?

• `dog_fur => thompson_allergy.`
• `cat_fur => macavity_criminal.`

Open Questions

• Would it work for first order logic too?
• Would it be able to derive taxonomies?
• Would it be able to deal with correlations?
• Would it derive rules that can't be understood because they are missing terms?
• Would the search space be so big that it would need guidance on where to look?
• What is the algorithmic complexity of this? Is it proportional to the number of observations? To the complexity of the rules?

Interested? Find me!