# Symbolic Regression

This is an open and independent investigation around the following question:

Can propositional theories be evolved?

Can evolution, through natural selection, select a theory that fits a set of unanticipated but observed common-sense facts (in propositional form) and makes predictions that fit unobserved events?

For example, suppose we lived in a world with the following propositions:

`PersonInFrontOfCar`

`YellowLight`

`RedLight`

`Policeman`

`Policecar`

`Slippery`

`Snow`

`Dry`

`Brake`

And a large supply of observed situations in which these propositions appear together, like:

`YellowLight.`

`~RedLight.`

`~Snow.`

`Dry.`

`Policecar.`

`~PersonInFrontOfCar.`

`Brake.`

Can a genetic algorithm evolve and select the following unannounced patterns, exclusively from the observations, to make predictions of future events?

`PersonInFrontOfCar => Brake.`

`(YellowLight && Policeman && ~Slippery) => Brake.`

`Policecar => Policeman.`

`Snow => Slippery.`

`Slippery => ~Dry.`

`RedLight => Brake.`

`Winter => Snow.`

# Genetic Programming

Suppose our individuals are propositional theories, in the form of formulas, starting from random configurations. For example, each individual looks more or less like this:

- Individual 0:
`Winter => Dry. Policeman => Brake.`

- Individual 1:
`Winter => ~Dry. Dry => Snow.`

- Individual 2:
`YellowLight => ~Dry && Snow.`

- …
- Individual n:
`RedLight => YellowLight.`
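One way to encode such individuals, sketched below in Python. The representation, helper names, and rule-count bounds are my own assumptions, not part of the proposal: a literal is an atom plus a sign, and a rule is a conjunction of antecedent literals implying one consequent literal.

```python
import random

# A literal is an (atom, sign) pair; a rule is (antecedents, consequent).
# E.g. ([("Winter", True)], ("Dry", False)) encodes `Winter => ~Dry.`
ATOMS = ["PersonInFrontOfCar", "YellowLight", "RedLight", "Policeman",
         "Policecar", "Slippery", "Snow", "Dry", "Brake", "Winter"]

def random_literal():
    return (random.choice(ATOMS), random.choice([True, False]))

def random_rule(max_antecedents=3):
    # A rule with 1..max_antecedents conjoined antecedents.
    n = random.randint(1, max_antecedents)
    return ([random_literal() for _ in range(n)], random_literal())

def random_individual(max_rules=4):
    """An individual: a small, randomly built list of implication rules."""
    return [random_rule() for _ in range(random.randint(1, max_rules))]

def show(rule):
    # Render a rule in the document's notation.
    ants, (c_atom, c_sign) = rule
    lhs = " && ".join(("" if s else "~") + a for a, s in ants)
    return f"{lhs} => {'' if c_sign else '~'}{c_atom}."

print(show(([("Winter", True)], ("Dry", False))))  # Winter => ~Dry.
```

Crossover and mutation then operate on rule lists: swapping rules between individuals, or flipping a literal's sign.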

Each individual is evaluated by the degree to which it supports the evidence: how many factual observations fit the theory. For each situation, each theory is tested to see whether it explains it.

For example, suppose that in situation `0` we observe `Winter. ~Dry.` The theory of `Individual 0` directly conflicts with this evidence, making it less useful, evolutionarily speaking. The theory of `Individual 1`, on the other hand, supports the evidence, making it more fit.
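A minimal fitness sketch for this worked example. The scoring scheme and encoding here are my assumptions: a situation is a partial truth assignment, a rule that fires and is confirmed scores +1, a rule that fires and is falsified scores -1, and a rule that does not fire (or whose consequent is unobserved) is neutral.

```python
# A rule is (antecedents, consequent), each literal an (atom, sign) pair;
# ([("Winter", True)], ("Dry", False)) encodes `Winter => ~Dry.`

def rule_score(rule, situation):
    ants, (atom, sign) = rule
    # The rule fires only if every antecedent is observed and true.
    if any(situation.get(a) != s for a, s in ants):
        return 0                                  # does not fire: neutral
    if atom not in situation:
        return 0                                  # consequent unobserved
    return 1 if situation[atom] == sign else -1   # support vs. conflict

def fitness(theory, situations):
    return sum(rule_score(r, s) for r in theory for s in situations)

# Situation 0: `Winter. ~Dry.`
situation0 = {"Winter": True, "Dry": False}
# Individual 0: `Winter => Dry. Policeman => Brake.`
individual0 = [([("Winter", True)], ("Dry", True)),
               ([("Policeman", True)], ("Brake", True))]
# Individual 1: `Winter => ~Dry. Dry => Snow.`
individual1 = [([("Winter", True)], ("Dry", False)),
               ([("Dry", True)], ("Snow", True))]

print(fitness(individual0, [situation0]))  # -1: conflicts with the evidence
print(fitness(individual1, [situation0]))  #  1: supports the evidence
```

Many other scoring schemes are possible (e.g. penalizing theory size to discourage bloat); this one only illustrates the conflict/support distinction in the text.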

If natural selection ran through a number of generations, would we arrive at the following theory?

`PersonInFrontOfCar => Brake.`

`(YellowLight && Policeman && ~Slippery) => Brake.`

`Policecar => Policeman.`

`Snow => Slippery.`

`Slippery => ~Dry.`

`RedLight => Brake.`

`Winter => Snow.`

# Marginal Costs

Arriving at that theory alone isn't sufficient: the algorithm has to work for any input, without modification.

For example, if we changed the observations to a completely different scenario, could a foundational genetic algorithm, with zero marginal cost (i.e., without any recompilation cycle), derive theories that fit any propositional evidence?

For example, given:

`cat_fur || dog_fur.`

`~thompson_allergy.`

`macavity_criminal.`

Can the algorithm, unmodified from the run that produced the previous theory, derive the following theory without any recompilation cycle?

`dog_fur => thompson_allergy.`

`cat_fur => macavity_criminal.`
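Nothing in a fitness function of the kind sketched earlier needs to know the vocabulary in advance: the atoms can be read off the observations themselves, which is what makes the zero-marginal-cost requirement plausible. A hedged sketch (the parsing convention and function names are my assumptions; disjunctive observations like `cat_fur || dog_fur.` would need a richer parser than this):

```python
def parse_fact(line):
    """Parse a simple fact like `~thompson_allergy.` into (atom, sign)."""
    lit = line.strip().rstrip(".")
    return (lit.lstrip("~"), not lit.startswith("~"))

# The same machinery ingests a completely different scenario unchanged:
raw = ["~thompson_allergy.", "macavity_criminal."]
situation = dict(parse_fact(f) for f in raw)
atoms = sorted(situation)   # vocabulary discovered from the data itself
print(atoms)                # ['macavity_criminal', 'thompson_allergy']
```

Since the atom set is derived per run, swapping traffic propositions for allergy propositions changes the data, not the program.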

# Related Work

- Symbolic Regression
- Symbolic Regression - An Overview
- Eureqa
- Automated Mathematician
- A large benchmark study of recent symbolic regression methods

# Open Questions

- Would it work for first order logic too?
- Would it be able to derive taxonomies?
- Would it be able to deal with correlations?
- Would it derive rules that can’t be understood because they are missing terms?
- Would the search space be so big that it would need guidance on where to look?
- What is the algorithmic complexity of this? Is it proportional to the number of observations? To the complexity of the rules?

Interested? Find me!