View on GitHub

GECCO-2019-tag-accessed-memory

Source code and supplemental material for our 2019 GECCO submission, Tag-accessed memory for GP

Genetic Programming System

Here, we provide a more detailed description of the linear GP system we used in this work. This document also reports configuration details used in this work. Our exact configuration files along with experiment source code can be found in our online GitHub repository: https://github.com/amlalejini/GECCO-2019-tag-accessed-memory.

GP Representation

Programs

Programs are linear sequences of instructions, and each instruction has three arguments that may modify its behavior. Our instruction set supports basic computations (e.g., addition, subtraction, multiplication, etc.) and allows programs to control the flow of execution (e.g., conditional branching, looping, etc.).

Virtual CPU

Programs are executed in the context of a simple virtual CPU, which manages the flow of execution (e.g., looping, current instruction, etc.) and gives programs access to 16 memory registers used for storing data and for performing computations.

Tag-accessed Memory

Many traditional GP systems that give genetic programs access to memory (e.g., indexable memory registers) use rigid naming schemes where memory is numerically indexed, and mutation operators must guarantee the validity of memory-referencing instructions. Tag-accessed memory allows programs to use tag-based referencing to index into memory registers. Tags are evolvable labels that give genetic programs a flexible mechanism for specification. Tags allow for inexact referencing; a referring tag references the closest matching referent. To facilitate inexact referencing, the similarity (or dissimilarity) between any two tags must be quantifiable; thus, a referring tag can always reference the closest matching referent tag. This ensures that all possible tags are valid references. When using a tag-accessed memory model, each of the 16 memory registers in the virtual CPU are statically tagged with length-16 bit strings. Tags used for memory registers were generated using the Hadamard matrix and were as follows:

Register 0: 1111111111111111
Register 1: 0101010101010101
Register 2: 0011001100110011
Register 3: 1001100110011001
Register 4: 0000111100001111
Register 5: 1010010110100101
Register 6: 1100001111000011
Register 7: 0110100101101001
Register 8: 0000000011111111
Register 9: 1010101001010101
Register 10: 1100110000110011
Register 11: 0110011010011001
Register 12: 1111000000001111
Register 13: 0101101010100101
Register 14: 0011110011000011
Register 15: 1001011001101001

Each program instruction has three tag arguments (i.e., each instruction argument is a length-16 bit string). Tag-based instruction arguments reference the memory position with the closest matching tag; as such, argument tags need not exactly match any of the tags with memory positions.

Direct-indexed Memory

Direct-indexed memory is the traditional form of memory access in linear GP. Each program instruction has three numeric arguments (0 through 15) that are used to directly specify memory registers.

Below is a cartoon contrasting tag-accessed memory with direct accessed memory.

../media/memory-access-cartoon.png

Instruction Set

We used identical instruction sets in both memory model conditions (tag-accessed and direct-indexed). However, in conditions using the tag-accessed memory, instructions used tag arguments that used tag-based referencing to index into the virtual CPU’s memory registers, and in conditions using the direct-indexed memory, instructions used numeric arguments that directly indexed into the virtual CPU’s memory registers.

Below, we describe our default instruction set (used across all problems), and all problem-specific instructions.

Default Instructions

Note:

EOP: End of program
Reg: Register
Reg[0] indicates the value at the register specified by an instruction’s first argument (either tag-based or numeric), Reg[1] indicates the value at the register specified by an instruction’s second argument, and Reg[2] indicates the value at the register specified by the instruction’s third argument.
Reg[0], Reg[1], etc: Register 0, Register 1, etc.

Instructions that would produce undefined behavior (e.g., division by zero) are treated as no operations.

Instruction	Arguments Used	Description
`Add`	3	Reg[2] = Reg[0] + Reg[1]
`Sub`	3	Reg[2] = Reg[0] - Reg[1]
`Mult`	3	Reg[2] = Reg[0] * Reg[1]
`Div`	3	Reg[2] = Reg[0] / Reg[1]
`Mod`	3	Reg[2] = Reg[0] % Reg[1]
`TestNumEqu`	3	Reg[2] = Reg[0] == Reg[1]
`TestNumNEqu`	3	Reg[2] = Reg[0] != Reg[1]
`TestNumLess`	3	Reg[2] = Reg[0] < Reg[1]
`TestNumLessTEqu`	3	Reg[2] = Reg[0] <= Reg[1]
`TestNumGreater`	3	Reg[2] = Reg[0] > Reg[1]
`TestNumGreaterTEqu`	3	Reg[2] = Reg[0] >= Reg[1]
`Floor`	1	Floor(Reg[0])
`Not`	1	Reg[0] = !Reg[0]
`Inc`	1	Reg[0] = Reg[0] + 1
`Dec`	1	Reg[0] = Reg[0] - 1
`CopyMem`	2	Reg[0] = Reg[1]
`SwapMem`	2	Swap(Reg[0], Reg[1])
`If`	1	If Reg[0] != 0, proceed. Otherwise skip to the next `Close` or EOP.
`IfNot`	1	If Reg[0] == 0, proceed. Otherwise skip to the next `Close` or EOP.
`While`	1	While Reg[0] != 0, loop. Otherwise skip to next `Close` or EOP.
`Countdown`	1	Same as `While`, but decrement Reg[0] if Reg[0] != 0.
`Close`	0	Indicate the end of a control block of code (e.g., loop, if).
`Break`	0	Break out of current control flow (e.g., loop).
`Return`	0	Return from program execution (exit program execution).
`Set-0`	1	Reg[0] = 0
`Set-1`	1	Reg[0] = 1
`Set-2`	1	Reg[0] = 2
`Set-3`	1	Reg[0] = 3
`Set-4`	1	Reg[0] = 4
`Set-5`	1	Reg[0] = 5
`Set-6`	1	Reg[0] = 6
`Set-7`	1	Reg[0] = 7
`Set-8`	1	Reg[0] = 8
`Set-9`	1	Reg[0] = 9
`Set-10`	1	Reg[0] = 10
`Set-11`	1	Reg[0] = 11
`Set-12`	1	Reg[0] = 12
`Set-13`	1	Reg[0] = 13
`Set-14`	1	Reg[0] = 14
`Set-15`	1	Reg[0] = 15
`Set-16`	1	Reg[0] = 16

Problem-specific Instructions

Problem - Number IO

Instruction	# Arguments Used	Description
`LoadInt`	1	Reg[0] = integer input
`LoadDouble`	1	Reg[0] = double input
`SubmitNum`	1	Output Reg[0]

Problem - For Loop Index

Instruction	# Arguments Used	Description
`LoadStart`	1	Reg[0] = start input
`LoadEnd`	1	Reg[0] = end input
`LoadStep`	1	Reg[0] = step input
`SubmitNum`	1	Output Reg[0]

Problem - Grade

Instruction	# Arguments Used	Description
`SubmitA`	0	Classify grade as ‘A’
`SubmitB`	0	Classify grade as ‘B’
`SubmitC`	0	Classify grade as ‘C’
`SubmitD`	0	Classify grade as ‘D’
`SubmitF`	0	Classify grade as ‘F’
`LoadThreshA`	1	Reg[0] = input threshold for ‘A’
`LoadThreshB`	1	Reg[0] = input threshold for ‘B’
`LoadThreshC`	1	Reg[0] = input threshold for ‘C’
`LoadThreshD`	1	Reg[0] = input threshold for ‘D’
`LoadGrade`	1	Reg[0] = input grade to classify

Problem - Median

Instruction	# Arguments Used	Description
`LoadNum1`	1	Reg[0] = input 1
`LoadNum2`	1	Reg[0] = input 2
`LoadNum3`	1	Reg[0] = input 3
`SubmitNum`	1	Output Reg[0]

Problem - Smallest

Instruction	# Arguments Used	Description
`LoadNum1`	1	Reg[0] = input 1
`LoadNum2`	1	Reg[0] = input 2
`LoadNum3`	1	Reg[0] = input 3
`LoadNum4`	1	Reg[0] = input 4
`SubmitNum`	1	Output Reg[0]

Experiment Configuration Details

Here, we discuss only the configuration details for the experiments reported in our extended abstract. For details on preliminary experiments, see our data analysis supplemental material.

We used the lexicase parent selection algorithm to solve five problems from Helmuth and Spector’s general program synthesis benchmark suite (Helmuth and Spector, 2015): number IO, smallest, median, grade, and for loop index. We used identical training and testing sets as in (Helmuth and Spector, 2015). Refer to (Helmuth and Spector, 2015) for more details about these problems.

For each problem, we evolved 50 replicate populations of 500 individuals at a range of mutation rate conditions. In all but the number IO problem, we evolved programs for 500 generations. Because number IO is substantially easier than each of the other problems (Helmuth and Spector, 2015), we only evolved these programs for 100 generations.

We propagated programs asexually and applied mutations to offspring. We applied single-instruction insertions, deletions, and substitutions at a per-instruction rate of 0.005 each and multi-instruction sequence duplications and deletions at a per-program rate of 0.05. Because the relative performance of these two techniques depends heavily on the chosen mutation operations and rates, we used a wide range of argument (numeric and tag) mutation rates for each technique. We mutated tag-based arguments at the following per-bit rates: 0.0001, 0.001, 0.0025, 0.005, 0.0075, 0.01, 0.025, 0.05, 0.075, and 0.1. We mutated numeric arguments at the following per-argument rates: 0.0001, 0.001, 0.0025, 0.005, 0.0075, 0.01, 0.025, 0.05, 0.075, and 0.1.

Each problem is defined by a set of test cases in which programs are given specified input data and are scored on how close their output is to the correct output; depending on the problem, we measured scores either on a gradient or on a binary pass-fail basis. During an evaluation, we limited the total number of instructions a program could execute; this limit varied by problem. Programs were not required to stop on their own as long as they output their results before reaching their execution limit.

Below we give the configuration details specific to each problem.

Configuration Details - Number IO

Maximum allowed program length: 32
Maximum number of instruction-execution steps: 32
Test scores were measured on a pass/fail basis.
Generations: 100

Configuration Details - For loop index

Maximum allowed program length: 128
Maximum number of instruction-execution steps: 256
Test scores were measured on a gradient, using the Levenshtein distance between the program’s output and the correct output sequence.
Generations: 500

Configuration Details - Grade

Maximum allowed program length: 128
Maximum number of instruction-execution steps: 128
Test scores were measured on a pass/fail basis.
Generations: 500

Configuration Details - Median

Maximum allowed program length: 64
Maximum number of instruction-execution steps: 64
Test scores were measured on a pass/fail basis.
Generations: 500

Configuration Details - Smallest

Maximum allowed program length: 64
Maximum number of instruction-execution steps: 64
Test scores were measured on a pass/fail basis.
Generations: 500

References

Helmuth, T., & Spector, L. (2015). General Program Synthesis Benchmark Suite. In Proceedings of the 2015 on Genetic and Evolutionary Computation Conference - GECCO ’15 (pp. 1039–1046). New York, New York, USA: ACM Press. https://doi.org/10.1145/2739480.2754769