Genetic Programming System
Here, we provide a more detailed description of the linear GP system we used in this work. This document also reports configuration details used in this work. Our exact configuration files along with experiment source code can be found in our online GitHub repository.
Contents
GP Representation
In this work, we used a tag-based linear GP representation. Our representation’s extensive use of tag-based referencing makes it unique relative to traditional linear GP representations.
Tag-based Referencing
Tags are evolvable labels that can be mutated, and the similarity (or dissimilarity) between any two tags is quantifiable. Spector et al. initially demonstrated tags in GP as a mechanism for labeling and referring to program modules (Spector et al., 2011a; Spector et al., 2011b; Spector et al., 2012b). The use of tags in GP was further expanded in SignalGP (Lalejini and Ofria, 2018) where tag-based referencing was used to facilitate signal-driven program execution. Tags allow for inexact referencing; a referring tag can always link to the program module labeled with the most similar tag, ensuring that all possible tags are valid references. Here, tags are represented as length-16 bit strings, and the Hamming distance between two tags specifies their dissimilarity. As in Spector et al.’s previous work (Spector et al., 2011a; Spector et al., 2011b; Spector et al., 2012b) and in SignalGP (Lalejini and Ofria, 2018), we use tags to label and reference program modules; however, we take the use of tags one step further, using them to specify locations in memory.
Programs
Programs in our representation are linear sequences of instructions, and each instruction has up to three tag-based arguments that may modify its behavior.
Our instruction set supports the declaration of modules, which are named (tagged) segments of code that can be triggered (i.e., called by other instructions) during execution.
Modules are declared and labeled by Module-Def
instructions whose tag-based arguments are used to tag the declared module.
Virtual CPU
Programs are executed in the context of a virtual CPU, which maintains a call stack that stores state information about the program’s active module calls (e.g., local memory contents and a pointer to the current position in the program). The virtual CPU also gives programs access to four types of memory buffers, each consisting of 16 statically tagged positions: input memory, output memory, working memory, and global memory. Input, output, and working memory are local to a particular module call, while changes to global memory are shared across module calls. Input memory is used to pass information to a triggered module from the calling state, and output memory is used to return information from triggered modules back to the calling state. Working memory is used for performing local operations (e.g., addition, multiplication, etc.).
Tag-accessed Memory
Many traditional GP systems that give genetic programs access to memory (e.g., indexable memory registers) use rigid naming schemes where memory is numerically indexed, and mutation operators must guarantee the validity of memory-referencing instructions. Tag-accessed memory allows programs to use tag-based referencing to index into memory registers. Tags are evolvable labels that give genetic programs a flexible mechanism for specification. Tags allow for inexact referencing; a referring tag references the closest matching referent. To facilitate inexact referencing, the similarity (or dissimilarity) between any two tags must be quantifiable; thus, a referring tag can always reference the closest matching referent tag. This ensures that all possible tags are valid references. When using a tag-accessed memory model, each of the 16 memory registers in any of the four types of memory buffers (input, output, working, and global) are statically tagged with length-16 bit strings. Tags used for memory registers were generated using the Hadamard matrix and were as follows:
- Register 0:
1111111111111111
- Register 1:
0101010101010101
- Register 2:
0011001100110011
- Register 3:
1001100110011001
- Register 4:
0000111100001111
- Register 5:
1010010110100101
- Register 6:
1100001111000011
- Register 7:
0110100101101001
- Register 8:
0000000011111111
- Register 9:
1010101001010101
- Register 10:
1100110000110011
- Register 11:
0110011010011001
- Register 12:
1111000000001111
- Register 13:
0101101010100101
- Register 14:
0011110011000011
- Register 15:
1001011001101001
Note that register 0 has the same tag in each of the four types of memory buffers (input, output, working, and global). The same is true for all other registers.
Each program instruction has three tag arguments (i.e., each instruction argument is a length-16 bit string). Tag-based instruction arguments reference the memory position with the closest matching tag; as such, argument tags need not exactly match any of the tags with memory positions.
Because many problems in the general program synthesis benchmark suite (Helmuth and Spector, 2015) require the use of a variety of data types,
our virtual CPU allows all positions in memory to contain any one of three data types: numbers, strings, or lists (which can contain a mix of numbers and strings).
Instruction arguments specify positions in memory using tag-based referencing, searching for and retrieving the contents of the closest matching memory position of the required type.
For example, the Add
instruction’s first two arguments specify which two numbers in memory to add together, and the third argument specifies where to store the result.
If there are no numbers in memory, the instruction does nothing.
The same is true for instruction arguments that specify strings or lists.
In this work, we mutated all tag arguments at a per-bit rate (of 0.001). The tags on memory registers never changed.
Module Execution
When a module is triggered by a Call
instruction (using the instruction’s first argument tag to reference the module with the closest matching tag-based label), a new call state is created and pushed onto the virtual CPU’s call stack.
The working memory of the calling state is copied into the input memory of the new call state (i.e., the arguments to the called module are the full contents of the calling state’s working memory); the working and output memory of the new call state are initially empty.
Module calls may return by either executing a Return
instruction or by reaching the end of the module’s instruction sequence.
When a module call returns, its call state is popped from the call stack, and anything stored in the output memory of the returning call state is copied to the working memory of the caller state (otherwise leaving the state of the caller’s working memory unchanged).
Alternatively, modules can be triggered by executing a Routine
instruction, which behaves in the same way as the Call
instruction, except the called module shares its input, working, and output memory with the caller.
Instruction Set
Below, we describe our default instruction set (used across all problems), and all problem-specific instructions.
Default Instructions
Note:
- EOP: End of program
- wReg: Working memory register
- iReg: Input memory register
- oReg: Output memory register
- gReg: Global memory register
- wReg[0]/iReg[0]/oReg[0]/gReg[0] indicates the value at the register specified by an instruction’s first argument (either tag-based or numeric), Reg[1] indicates the value at the register specified by an instruction’s second argument, and Reg[2] indicates the value at the register specified by the instruction’s third argument.
- wReg[0], wReg[1], etc: working register 0, working register 1, etc.
Instructions that would produce undefined behavior (e.g., division by zero) are treated as no operations.
Instruction | Arguments Used | Description |
---|---|---|
Add |
3 | wReg[2] = wReg[0] + wReg[1] |
Sub |
3 | wReg[2] = wReg[0] - wReg[1] |
Mult |
3 | wReg[2] = wReg[0] * wReg[1] |
Div |
3 | wReg[2] = wReg[0] / wReg[1] |
Mod |
3 | wReg[2] = wReg[0] % wReg[1] |
TestNumEqu |
3 | wReg[2] = wReg[0] == wReg[1] |
TestNumNEqu |
3 | wReg[2] = wReg[0] != wReg[1] |
TestNumLess |
3 | wReg[2] = wReg[0] < wReg[1] |
TestNumLessTEqu |
3 | wReg[2] = wReg[0] <= wReg[1] |
TestNumGreater |
3 | wReg[2] = wReg[0] > wReg[1] |
TestNumGreaterTEqu |
3 | wReg[2] = wReg[0] >= wReg[1] |
Floor |
1 | Floor(wReg[0]) |
Not |
1 | wReg[0] = !wReg[0] |
Inc |
1 | wReg[0] = wReg[0] + 1 |
Dec |
1 | wReg[0] = wReg[0] - 1 |
CopyMem |
2 | wReg[0] = wReg[1] |
SwapMem |
2 | Swap(wReg[0], wReg[1]) |
Input |
||
Output |
||
CommitGlobal |
||
PullGlobal |
||
TestMemEqu |
||
TestmemNEqu |
||
If |
1 | If wReg[0] != 0, proceed. Otherwise skip to the next Close or EOP. |
IfNot |
1 | If wReg[0] == 0, proceed. Otherwise skip to the next Close or EOP. |
While |
1 | While wReg[0] != 0, loop. Otherwise skip to next Close or EOP. |
Countdown |
1 | Same as While , but decrement wReg[0] if wReg[0] != 0. |
Foreach |
||
Close |
0 | Indicate the end of a control block of code (e.g., loop, if). |
Break |
0 | Break out of current control flow (e.g., loop). |
Call |
||
Routine |
||
Return |
0 | Return from program execution (exit program execution). |
ModuleDef |
||
MakeVector |
||
VecGet |
||
VecSet |
||
VecLen |
||
VecAppend |
||
VecPop |
||
VecRemove |
||
VecReplaceAll |
||
VecIndexOf |
||
VecOccurencesOf |
||
VecReverse |
||
VecSwapIfLess |
||
VecGetFront |
||
VecGetBack |
||
IsNum |
||
IsVec |
||
Set-0 |
1 | wReg[0] = 0 |
Set-1 |
1 | wReg[0] = 1 |
Set-2 |
1 | wReg[0] = 2 |
Set-3 |
1 | wReg[0] = 3 |
Set-4 |
1 | wReg[0] = 4 |
Set-5 |
1 | wReg[0] = 5 |
Set-6 |
1 | wReg[0] = 6 |
Set-7 |
1 | wReg[0] = 7 |
Set-8 |
1 | wReg[0] = 8 |
Set-9 |
1 | wReg[0] = 9 |
Set-10 |
1 | wReg[0] = 10 |
Set-11 |
1 | wReg[0] = 11 |
Set-12 |
1 | wReg[0] = 12 |
Set-13 |
1 | wReg[0] = 13 |
Set-14 |
1 | wReg[0] = 14 |
Set-15 |
1 | wReg[0] = 15 |
Set-16 |
1 | wReg[0] = 16 |
String-specific Instructions
These instructions were included only in problems where programs needed to manipulate strings.
Instruction | Arguments Used | Description |
---|---|---|
StrLength |
||
StrConcat |
||
IsStr |
Problem-specific Instructions
Problem - Number IO
Instruction | # Arguments Used | Description |
---|---|---|
LoadInt |
1 | wReg[0] = integer input |
LoadDouble |
1 | wReg[0] = double input |
SubmitNum |
1 | Output wReg[0] |
Problem - For Loop Index
Instruction | # Arguments Used | Description |
---|---|---|
LoadStart |
1 | wReg[0] = start input |
LoadEnd |
1 | wReg[0] = end input |
LoadStep |
1 | wReg[0] = step input |
SubmitNum |
1 | Output wReg[0] |
Problem - Compare String Lengths
Instruction | # Arguments Used | Description |
---|---|---|
LoadStr1 |
1 | wReg[0] = string-1 |
LoadStr2 |
1 | wReg[0] = string-2 |
LoadStr3 |
1 | wReg[0] = string-3 |
SubmitTrue |
0 | Output true |
SubmitFalse |
0 | Output false |
SubmitVal |
1 | Output logical value at wReg[0] |
Problem - Grade
Instruction | # Arguments Used | Description |
---|---|---|
SubmitA |
0 | Classify grade as ‘A’ |
SubmitB |
0 | Classify grade as ‘B’ |
SubmitC |
0 | Classify grade as ‘C’ |
SubmitD |
0 | Classify grade as ‘D’ |
SubmitF |
0 | Classify grade as ‘F’ |
LoadThreshA |
1 | wReg[0] = input threshold for ‘A’ |
LoadThreshB |
1 | wReg[0] = input threshold for ‘B’ |
LoadThreshC |
1 | wReg[0] = input threshold for ‘C’ |
LoadThreshD |
1 | wReg[0] = input threshold for ‘D’ |
LoadGrade |
1 | wReg[0] = input grade to classify |
Problem - Median
Instruction | # Arguments Used | Description |
---|---|---|
LoadNum1 |
1 | wReg[0] = input 1 |
LoadNum2 |
1 | wReg[0] = input 2 |
LoadNum3 |
1 | wReg[0] = input 3 |
SubmitNum |
1 | Output wReg[0] |
Problem - Smallest
Instruction | # Arguments Used | Description |
---|---|---|
LoadNum1 |
1 | wReg[0] = input 1 |
LoadNum2 |
1 | wReg[0] = input 2 |
LoadNum3 |
1 | wReg[0] = input 3 |
LoadNum4 |
1 | wReg[0] = input 4 |
SubmitNum |
1 | Output wReg[0] |
Evolution
Across all of our experiments, we evolved populations of 512 programs for varied numbers of generations depending on the experimental treatment. We propagated programs asexually and applied mutations to offspring. We applied single-instruction substitutions, insertions, and deletions at a per-instruction rate of 0.005 each and multi-instruction sequence duplications and deletions (slip mutations~\citep{Lalejini2017}) at a per-program rate of 0.05. We applied single-module duplications and deletions at a per-module rate of 0.05, and we mutated argument tags at a per-bit rate of 0.001. During mutation, we limited the maximum length of programs (which varied by problem).
References
Helmuth, T., & Spector, L. (2015). General Program Synthesis Benchmark Suite. In Proceedings of the 2015 on Genetic and Evolutionary Computation Conference - GECCO ’15 (pp. 1039–1046). New York, New York, USA: ACM Press. https://doi.org/10.1145/2739480.2754769
Lalejini, A., & Ofria, C. (2018). Evolving event-driven programs with SignalGP. In Proceedings of the Genetic and Evolutionary Computation Conference on - GECCO ’18 (pp. 1135–1142). New York, New York, USA: ACM Press. https://doi.org/10.1145/3205455.3205523
Spector, L., Martin, B., Harrington, K., & Helmuth, T. (2011a). Tag-based modules in genetic programming. In Proceedings of the 13th annual conference on Genetic and evolutionary computation - GECCO ’11 (p. 1419). New York, New York, USA: ACM Press. https://doi.org/10.1145/2001576.2001767
Spector, L., Harrington, K., Martin, B., & Helmuth, T. (2011b). What’s in an Evolved Name? The Evolution of Modularity via Tag-Based Reference. In R. Riolo, E. Vladislavleva, & J. H. Moore (Eds.), Genetic Programming Theory and Practice IX (pp. 1–16). New York, NY: Springer New York. https://doi.org/10.1007/978-1-4614-1770-5_1
Spector, L., Harrington, K., & Helmuth, T. (2012). Tag-based modularity in tree-based genetic programming. In Proceedings of the fourteenth international conference on Genetic and evolutionary computation conference - GECCO ’12 (p. 815). New York, New York, USA: ACM Press. https://doi.org/10.1145/2330163.2330276