Evolutionary Algorithms

1. Main Concept

An evolutionary algorithm is an optimisation algorithm for black-box functions, where the gradient of the objective is unknown. This setting arises, for example, in reinforcement learning. The concept is:

  1. We maintain a population of individuals (different genotypes).
  2. We evaluate each individual's fitness on the black-box function (the expressed individual is the phenotype).
  3. The fittest individuals are selected to reproduce and create the next generation.

There are three main operators:

| Type | Genotype | Crossover | Mutation |
| --- | --- | --- | --- |
| Genetic Algorithm | Binary string | Swap portions of string | Random bit-flip |
| Genetic Programming | Program tree | Swap subtrees | Random symbol changes in tree |
| Evolutionary Strategies | String of reals | Not used | Draw from Gaussian distribution |

They are easy to parallelise, but slower than gradient descent when the gradient is known and the problem is simple.

2. Genetic Algorithms

To define a genetic algorithm, we need to specify how parents are selected and which individuals survive to the next generation:

2.1 Biased Roulette Wheel Selection

The probability of selecting individual $i$ is $p_i = \frac{f_i}{\sum_j f_j}$, where $f_i$ is the fitness of individual $i$. We pick a random number $r \in [0, 1)$ and select the first individual $k$ whose cumulative probability exceeds $r$: $\sum_{i=1}^{k} p_i > r$.
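
A minimal sketch of this scheme in Python (the function name and inputs are illustrative; fitness values are assumed non-negative):

```python
import random

def roulette_wheel_select(population, fitnesses):
    """Select one individual with probability proportional to its fitness."""
    total = sum(fitnesses)                    # assumes non-negative fitnesses
    r = random.uniform(0, total)              # random point on the wheel
    cumulative = 0.0
    for individual, fitness in zip(population, fitnesses):
        cumulative += fitness
        if cumulative > r:                    # first individual whose cumulative share exceeds r
            return individual
    return population[-1]                     # guard against floating-point round-off
```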

2.2 Tournament Selection

  1. Randomly draw two individuals from the population.
  2. Select the one with higher fitness as a parent.
  3. Repeat until enough parents are selected.

This method is less susceptible to premature convergence than roulette wheel selection, and it is easier to parallelise.
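
A sketch of binary tournament selection under the same illustrative conventions; each tournament is independent of the others, which is what makes the method easy to parallelise:

```python
import random

def tournament_select(population, fitnesses, n_parents):
    """Binary tournament: the fitter of two randomly drawn individuals wins."""
    parents = []
    for _ in range(n_parents):
        i = random.randrange(len(population))
        j = random.randrange(len(population))
        winner = i if fitnesses[i] >= fitnesses[j] else j
        parents.append(population[winner])
    return parents
```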

2.3 Elitism

Elitism ensures the best individuals are carried over to the next generation without alteration. This prevents losing the best solutions due to random chance. The number of elites is usually a small, fixed fraction of the population.
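
As a sketch, elitism only changes how the next generation is assembled (the `n_elites` parameter is a hypothetical name for the elite count):

```python
def next_generation(population, fitnesses, offspring, n_elites):
    """Copy the n_elites fittest individuals over unchanged, then fill up with offspring."""
    ranked = sorted(zip(population, fitnesses), key=lambda pair: pair[1], reverse=True)
    elites = [individual for individual, _ in ranked[:n_elites]]
    return elites + offspring[:len(population) - n_elites]
```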

3. Evolutionary Strategies

Here, the genotype is a list of reals, parent selection is uniform, and mutations are drawn from a Gaussian distribution.

3.1 (μ + λ) ES

  1. Randomly generate $\lambda$ individuals.
  2. Evaluate the population.
  3. Select the $\mu$ best individuals as parents.
  4. Generate $\lambda$ offspring $x' = x + \sigma\epsilon$, where $\epsilon \sim \mathcal{N}(0, I)$ and $x$ is a randomly selected parent.
  5. The new population is the $\mu$ parents together with the $\lambda$ offspring.
  6. Repeat from (2).

Usually $\mu < \lambda$. The main challenge of $(\mu + \lambda)$-ES is choosing $\sigma$:

  • A large $\sigma$ converges quickly but makes solutions hard to refine.
  • A small $\sigma$ refines well, but takes longer to converge and can get stuck in local optima.
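
A minimal sketch of the full loop, assuming maximisation and a fixed $\sigma$ (all names and defaults are illustrative):

```python
import numpy as np

def mu_plus_lambda_es(fitness_fn, n_dims, mu=5, lam=20, sigma=0.1, n_gens=100):
    """A minimal (mu + lambda)-ES with a fixed mutation step size sigma."""
    population = np.random.randn(lam, n_dims)                    # 1. random individuals
    for _ in range(n_gens):
        scores = np.array([fitness_fn(x) for x in population])  # 2. evaluate
        parents = population[np.argsort(scores)[-mu:]]           # 3. keep the mu best
        picks = np.random.randint(mu, size=lam)                  # 4. draw lam random parents
        offspring = parents[picks] + sigma * np.random.randn(lam, n_dims)  # Gaussian mutation
        population = np.vstack([parents, offspring])             # 5. mu parents + lam offspring
    scores = np.array([fitness_fn(x) for x in population])
    return population[np.argmax(scores)]
```

For example, `mu_plus_lambda_es(lambda x: -np.sum(x**2), n_dims=10)` should drive the best genotype towards the origin.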

We could update $\sigma$ over time by adding it to the genotype and mutating it alongside:

$$\sigma' = \sigma \exp(\tau \epsilon), \qquad \epsilon \sim \mathcal{N}(0, 1)$$

where $\tau$ is the learning rate. Heuristically, $\tau = 1/\sqrt{n}$, where $n$ is the dimension of the genotype.
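
A sketch of this self-adaptive mutation, using the heuristic $\tau = 1/\sqrt{n}$ above:

```python
import numpy as np

def mutate_self_adaptive(x, sigma):
    """Mutate the step size first, then use the new step size to mutate the genotype."""
    tau = 1.0 / np.sqrt(len(x))                          # heuristic learning rate
    new_sigma = sigma * np.exp(tau * np.random.randn())  # log-normal update keeps sigma > 0
    new_x = x + new_sigma * np.random.randn(len(x))
    return new_x, new_sigma
```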

4. Novelty Search

A novelty search algorithm uses an archive to store previously seen behaviours. The novelty of an individual is measured by its average distance to its $k$ nearest neighbours in the archive:

$$\rho(x) = \frac{1}{k} \sum_{i=1}^{k} \mathrm{dist}(x, \xi_i)$$

where $\xi_1, \dots, \xi_k$ are the $k$ nearest neighbours of $x$ in the archive.

For the distance metric, we define a space of behavioral descriptors, which capture important aspects of the phenotype.

For example, in a robot maze navigation task, the behavioral descriptor could be the final position of the robot. This is not directly related to the fitness (e.g. reaching the goal), but captures the essence of the behavior.

Instead of optimising the quality of the solutions, NS optimises their novelty. We do this by using the novelty score instead of fitness for evaluation.
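
A sketch of the novelty computation, assuming Euclidean distance between behavioral descriptors (the default `k = 15` is illustrative):

```python
import numpy as np

def novelty_score(descriptor, archive, k=15):
    """Average Euclidean distance to the k nearest neighbours in the archive."""
    if len(archive) == 0:
        return float("inf")                  # everything is novel while the archive is empty
    dists = np.linalg.norm(np.asarray(archive) - np.asarray(descriptor), axis=1)
    return float(np.mean(np.sort(dists)[:k]))
```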

5. Quality-Diversity Optimisation

We want both diverse and high quality solutions. QD algorithms follow these steps:

  1. A stochastic selection is made from the collection of solutions. It is usually uniform, but can be biased towards quality or novelty.
  2. Offspring are generated via mutation/crossover.
  3. Each offspring is evaluated for quality (fitness) and behavioral descriptor.
  4. The offspring is added to the collection if it is either:
    • More novel than existing solutions (based on a novelty threshold).
    • Higher quality than the existing solution in the same behavioral niche.

5.1 MAP-Elites

Multidimensional Archive of Phenotypic Elites (MAP-Elites) is a simple QD algorithm that discretises the behavioral descriptor space into a grid and tries to fill it with the best solutions. Each new solution is placed in the cell corresponding to its behavioral descriptor. If the cell is already occupied, the new solution replaces the existing one only if it has higher fitness.

It takes the size of the grid as a hyperparameter. It is easy to implement, but the density of solutions across the grid is not necessarily uniform.
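
A minimal sketch, assuming a 2-D behavioral descriptor normalised to $[0, 1]^2$ and Gaussian mutation only (no crossover); names and defaults are illustrative:

```python
import numpy as np

def map_elites(fitness_fn, descriptor_fn, n_dims,
               grid_shape=(20, 20), n_evals=10_000, sigma=0.1):
    """Minimal MAP-Elites: one elite per cell of a discretised descriptor space."""
    grid = {}                                            # cell index -> (genotype, fitness)
    for _ in range(n_evals):
        if grid:                                         # uniform selection from the archive
            cells = list(grid)
            parent, _ = grid[cells[np.random.randint(len(cells))]]
            x = parent + sigma * np.random.randn(n_dims)     # Gaussian mutation
        else:
            x = np.random.randn(n_dims)                  # bootstrap with random solutions
        fitness = fitness_fn(x)
        desc = np.clip(np.asarray(descriptor_fn(x)), 0.0, 1.0 - 1e-9)
        cell = tuple((desc * np.asarray(grid_shape)).astype(int))  # discretise the descriptor
        if cell not in grid or fitness > grid[cell][1]:
            grid[cell] = (x, fitness)                    # keep only the elite of each cell
    return grid
```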

5.2 Quantifying Performance

We can measure diversity (the archive size) and performance (the maximum or mean fitness), as well as the convergence speed of both. These can be summarised in a single QD-score: the sum of the fitness of all solutions in the archive.
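
For the MAP-Elites archive sketched above, the QD-score is a one-liner (assuming non-negative fitness, so adding a solution never lowers the score):

```python
def qd_score(grid):
    """Sum of the fitness of all elites in the archive."""
    return sum(fitness for _, fitness in grid.values())
```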

The tradeoff between these can be represented in a Pareto front.
