View on GitHub

GECCO-2019-cohort-lexicase

This repository contains materials for our cohort lexicase project.

Genetic Programming System

Here, we provide a more detailed description of the linear GP system we used in this work. This document also reports configuration details used in this work. Our exact configuration files along with experiment source code can be found in our online GitHub repository.

Contents

GP Representation

In this work, we used a tag-based linear GP representation. Our representation’s extensive use of tag-based referencing makes it unique relative to traditional linear GP representations.

Tag-based Referencing

Tags are evolvable labels that can be mutated, and the similarity (or dissimilarity) between any two tags is quantifiable. Spector et al. initially demonstrated tags in GP as a mechanism for labeling and referring to program modules (Spector et al., 2011a; Spector et al., 2011b; Spector et al., 2012b). The use of tags in GP was further expanded in SignalGP (Lalejini and Ofria, 2018) where tag-based referencing was used to facilitate signal-driven program execution. Tags allow for inexact referencing; a referring tag can always link to the program module labeled with the most similar tag, ensuring that all possible tags are valid references. Here, tags are represented as length-16 bit strings, and the Hamming distance between two tags specifies their dissimilarity. As in Spector et al.’s previous work (Spector et al., 2011a; Spector et al., 2011b; Spector et al., 2012b) and in SignalGP (Lalejini and Ofria, 2018), we use tags to label and reference program modules; however, we take the use of tags one step further, using them to specify locations in memory.

Programs

Programs in our representation are linear sequences of instructions, and each instruction has up to three tag-based arguments that may modify its behavior. Our instruction set supports the declaration of modules, which are named (tagged) segments of code that can be triggered (i.e., called by other instructions) during execution. Modules are declared and labeled by Module-Def instructions whose tag-based arguments are used to tag the declared module.

Virtual CPU

Programs are executed in the context of a virtual CPU, which maintains a call stack that stores state information about the program’s active module calls (e.g., local memory contents and a pointer to the current position in the program). The virtual CPU also gives programs access to four types of memory buffers, each consisting of 16 statically tagged positions: input memory, output memory, working memory, and global memory. Input, output, and working memory are local to a particular module call, while changes to global memory are shared across module calls. Input memory is used to pass information to a triggered module from the calling state, and output memory is used to return information from triggered modules back to the calling state. Working memory is used for performing local operations (e.g., addition, multiplication, etc.).

Tag-accessed Memory

Many traditional GP systems that give genetic programs access to memory (e.g., indexable memory registers) use rigid naming schemes where memory is numerically indexed, and mutation operators must guarantee the validity of memory-referencing instructions. Tag-accessed memory allows programs to use tag-based referencing to index into memory registers. Tags are evolvable labels that give genetic programs a flexible mechanism for specification. Tags allow for inexact referencing; a referring tag references the closest matching referent. To facilitate inexact referencing, the similarity (or dissimilarity) between any two tags must be quantifiable; thus, a referring tag can always reference the closest matching referent tag. This ensures that all possible tags are valid references. When using a tag-accessed memory model, each of the 16 memory registers in any of the four types of memory buffers (input, output, working, and global) are statically tagged with length-16 bit strings. Tags used for memory registers were generated using the Hadamard matrix and were as follows:

Note that register 0 has the same tag in each of the four types of memory buffers (input, output, working, and global). The same is true for all other registers.

Each program instruction has three tag arguments (i.e., each instruction argument is a length-16 bit string). Tag-based instruction arguments reference the memory position with the closest matching tag; as such, argument tags need not exactly match any of the tags with memory positions.

Because many problems in the general program synthesis benchmark suite (Helmuth and Spector, 2015) require the use of a variety of data types, our virtual CPU allows all positions in memory to contain any one of three data types: numbers, strings, or lists (which can contain a mix of numbers and strings). Instruction arguments specify positions in memory using tag-based referencing, searching for and retrieving the contents of the closest matching memory position of the required type. For example, the Add instruction’s first two arguments specify which two numbers in memory to add together, and the third argument specifies where to store the result. If there are no numbers in memory, the instruction does nothing. The same is true for instruction arguments that specify strings or lists.

In this work, we mutated all tag arguments at a per-bit rate (of 0.001). The tags on memory registers never changed.

Module Execution

When a module is triggered by a Call instruction (using the instruction’s first argument tag to reference the module with the closest matching tag-based label), a new call state is created and pushed onto the virtual CPU’s call stack. The working memory of the calling state is copied into the input memory of the new call state (i.e., the arguments to the called module are the full contents of the calling state’s working memory); the working and output memory of the new call state are initially empty. Module calls may return by either executing a Return instruction or by reaching the end of the module’s instruction sequence. When a module call returns, its call state is popped from the call stack, and anything stored in the output memory of the returning call state is copied to the working memory of the caller state (otherwise leaving the state of the caller’s working memory unchanged). Alternatively, modules can be triggered by executing a Routine instruction, which behaves in the same way as the Call instruction, except the called module shares its input, working, and output memory with the caller.

Instruction Set

Below, we describe our default instruction set (used across all problems), and all problem-specific instructions.

Default Instructions

Note:

Instructions that would produce undefined behavior (e.g., division by zero) are treated as no operations.

Instruction Arguments Used Description
Add 3 wReg[2] = wReg[0] + wReg[1]
Sub 3 wReg[2] = wReg[0] - wReg[1]
Mult 3 wReg[2] = wReg[0] * wReg[1]
Div 3 wReg[2] = wReg[0] / wReg[1]
Mod 3 wReg[2] = wReg[0] % wReg[1]
TestNumEqu 3 wReg[2] = wReg[0] == wReg[1]
TestNumNEqu 3 wReg[2] = wReg[0] != wReg[1]
TestNumLess 3 wReg[2] = wReg[0] < wReg[1]
TestNumLessTEqu 3 wReg[2] = wReg[0] <= wReg[1]
TestNumGreater 3 wReg[2] = wReg[0] > wReg[1]
TestNumGreaterTEqu 3 wReg[2] = wReg[0] >= wReg[1]
Floor 1 Floor(wReg[0])
Not 1 wReg[0] = !wReg[0]
Inc 1 wReg[0] = wReg[0] + 1
Dec 1 wReg[0] = wReg[0] - 1
CopyMem 2 wReg[0] = wReg[1]
SwapMem 2 Swap(wReg[0], wReg[1])
Input    
Output    
CommitGlobal    
PullGlobal    
TestMemEqu    
TestmemNEqu    
If 1 If wReg[0] != 0, proceed. Otherwise skip to the next Close or EOP.
IfNot 1 If wReg[0] == 0, proceed. Otherwise skip to the next Close or EOP.
While 1 While wReg[0] != 0, loop. Otherwise skip to next Close or EOP.
Countdown 1 Same as While, but decrement wReg[0] if wReg[0] != 0.
Foreach    
Close 0 Indicate the end of a control block of code (e.g., loop, if).
Break 0 Break out of current control flow (e.g., loop).
Call    
Routine    
Return 0 Return from program execution (exit program execution).
ModuleDef    
MakeVector    
VecGet    
VecSet    
VecLen    
VecAppend    
VecPop    
VecRemove    
VecReplaceAll    
VecIndexOf    
VecOccurencesOf    
VecReverse    
VecSwapIfLess    
VecGetFront    
VecGetBack    
IsNum    
IsVec    
Set-0 1 wReg[0] = 0
Set-1 1 wReg[0] = 1
Set-2 1 wReg[0] = 2
Set-3 1 wReg[0] = 3
Set-4 1 wReg[0] = 4
Set-5 1 wReg[0] = 5
Set-6 1 wReg[0] = 6
Set-7 1 wReg[0] = 7
Set-8 1 wReg[0] = 8
Set-9 1 wReg[0] = 9
Set-10 1 wReg[0] = 10
Set-11 1 wReg[0] = 11
Set-12 1 wReg[0] = 12
Set-13 1 wReg[0] = 13
Set-14 1 wReg[0] = 14
Set-15 1 wReg[0] = 15
Set-16 1 wReg[0] = 16

String-specific Instructions

These instructions were included only in problems where programs needed to manipulate strings.

Instruction Arguments Used Description
StrLength    
StrConcat    
IsStr    

Problem-specific Instructions

Problem - Number IO

Instruction # Arguments Used Description
LoadInt 1 wReg[0] = integer input
LoadDouble 1 wReg[0] = double input
SubmitNum 1 Output wReg[0]

Problem - For Loop Index

Instruction # Arguments Used Description
LoadStart 1 wReg[0] = start input
LoadEnd 1 wReg[0] = end input
LoadStep 1 wReg[0] = step input
SubmitNum 1 Output wReg[0]

Problem - Compare String Lengths

Instruction # Arguments Used Description
LoadStr1 1 wReg[0] = string-1
LoadStr2 1 wReg[0] = string-2
LoadStr3 1 wReg[0] = string-3
SubmitTrue 0 Output true
SubmitFalse 0 Output false
SubmitVal 1 Output logical value at wReg[0]

Problem - Grade

Instruction # Arguments Used Description
SubmitA 0 Classify grade as ‘A’
SubmitB 0 Classify grade as ‘B’
SubmitC 0 Classify grade as ‘C’
SubmitD 0 Classify grade as ‘D’
SubmitF 0 Classify grade as ‘F’
LoadThreshA 1 wReg[0] = input threshold for ‘A’
LoadThreshB 1 wReg[0] = input threshold for ‘B’
LoadThreshC 1 wReg[0] = input threshold for ‘C’
LoadThreshD 1 wReg[0] = input threshold for ‘D’
LoadGrade 1 wReg[0] = input grade to classify

Problem - Median

Instruction # Arguments Used Description
LoadNum1 1 wReg[0] = input 1
LoadNum2 1 wReg[0] = input 2
LoadNum3 1 wReg[0] = input 3
SubmitNum 1 Output wReg[0]

Problem - Smallest

Instruction # Arguments Used Description
LoadNum1 1 wReg[0] = input 1
LoadNum2 1 wReg[0] = input 2
LoadNum3 1 wReg[0] = input 3
LoadNum4 1 wReg[0] = input 4
SubmitNum 1 Output wReg[0]

Evolution

Across all of our experiments, we evolved populations of 512 programs for varied numbers of generations depending on the experimental treatment. We propagated programs asexually and applied mutations to offspring. We applied single-instruction substitutions, insertions, and deletions at a per-instruction rate of 0.005 each and multi-instruction sequence duplications and deletions (slip mutations~\citep{Lalejini2017}) at a per-program rate of 0.05. We applied single-module duplications and deletions at a per-module rate of 0.05, and we mutated argument tags at a per-bit rate of 0.001. During mutation, we limited the maximum length of programs (which varied by problem).

References

Helmuth, T., & Spector, L. (2015). General Program Synthesis Benchmark Suite. In Proceedings of the 2015 on Genetic and Evolutionary Computation Conference - GECCO ’15 (pp. 1039–1046). New York, New York, USA: ACM Press. https://doi.org/10.1145/2739480.2754769

Lalejini, A., & Ofria, C. (2018). Evolving event-driven programs with SignalGP. In Proceedings of the Genetic and Evolutionary Computation Conference on - GECCO ’18 (pp. 1135–1142). New York, New York, USA: ACM Press. https://doi.org/10.1145/3205455.3205523

Spector, L., Martin, B., Harrington, K., & Helmuth, T. (2011a). Tag-based modules in genetic programming. In Proceedings of the 13th annual conference on Genetic and evolutionary computation - GECCO ’11 (p. 1419). New York, New York, USA: ACM Press. https://doi.org/10.1145/2001576.2001767

Spector, L., Harrington, K., Martin, B., & Helmuth, T. (2011b). What’s in an Evolved Name? The Evolution of Modularity via Tag-Based Reference. In R. Riolo, E. Vladislavleva, & J. H. Moore (Eds.), Genetic Programming Theory and Practice IX (pp. 1–16). New York, NY: Springer New York. https://doi.org/10.1007/978-1-4614-1770-5_1

Spector, L., Harrington, K., & Helmuth, T. (2012). Tag-based modularity in tree-based genetic programming. In Proceedings of the fourteenth international conference on Genetic and evolutionary computation conference - GECCO ’12 (p. 815). New York, New York, USA: ACM Press. https://doi.org/10.1145/2330163.2330276