Overview

Here, we analyze the experimental results from our GECCO 2018 paper, “Evolving Event-driven Programs with SignalGP” (Lalejini and Ofria, 2018).

(just a little) Background

Our paper introduces SignalGP, a new genetic programming (GP) technique designed to provide evolution access to the event-driven programming paradigm. Additionally, we use SignalGP to demonstrate the value of evolving event-driven programs by evolving fully event-driven SignalGP programs and imperative variants of SignalGP programs to solve two problems: the changing environment problem and the distributed leader election problem. This document fully details the data analyses performed to analyze our experimental results for both of these problems.

For the full context of these analyses, a detailed description of our experimental design, and musings on SignalGP, see our paper (Lalejini and Ofria, 2018). A pre-print can be found here.

Source Code

The GitHub repository that contains the source code for this document and our experiments can be found here: https://github.com/amlalejini/GECCO-2018-Evolving-Event-driven-Programs-with-SignalGP.

Dependencies & General Setup

If you haven’t noticed already, this is an R markdown file. We used R version 3.3.2 (2016-10-31) for all statistical analyses (R Core Team, 2016).

All Dunn’s tests were performed using the FSA package (Ogle, 2017).

library(FSA)
## ## FSA v0.8.20. See citation('FSA') if used in publication.
## ## Run fishR() for related website and fishR('IFAR') for related book.

In addition to R, we used Python 3 for data manipulation and visualization. To get Python and R to place nice together, we use the reticulate R package.

library(reticulate)
# We have to tell reticulate which python we want it to use.
use_python("/anaconda3/bin/python")
knitr::knit_engines$set(python = reticulate::eng_python)

All visualizations use the seaborn Python package (Waskom et al., 2017), which wraps Python’s matplotlib library, providing a “high-level interface for drawing attractive statistical graphics”.

import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

To handle our data on the Python side of things, we used the pandas Python package, which provides “high-performance, easy-to-use data structures and data analysis tools for the Python programming language”.

import pandas as pd

Problem: Changing Environment

Description

The changing environment problem is a toy problem to test GP programs’ capacity to respond appropriately to environmental signals. The changing environment problem is explicitly designed to be solved using the event-driven paradigm. Thus, under-performance by imperative SignalGP variants was expected.

From our paper (Lalejini and Ofria, 2018):

The changing environment problem requires agents to coordinate to coordinate their behavior with a randomly changing environment. The environment can be in one of K possible states; to maximize fitness, agents must match their internal state to the current state of the environment. The environment is initialized to a random state and has a 12.5% chance of changing to a random state at every subsequent time step. Successful agents must adjust their internal state whenever an environmental change occurs.

Agents adjust their internal state by executing on of K state-altering instructions. Thus, to be successful, agents must monitor the environment state while adjusting their internal state as appropriate.

To explore the value of giving evolution access to the event-driven programming paradigm, we compared the performance of programs with three different mechanisms for sensing the environment:

  1. an event-driven treatment where environmental changes produce signals that have environment-specific tags and can trigger SignalGP program functions
  2. an imperative control where programs need to actively poll the environment to determine its state (using K sensory instructions to test if the environment is in a particular state)
  3. a combined treatment where agents are capable of using environmental signals or actively polling the environment

We evolved SignalGP agents to solve the changing environment problem at K equal to 2, 4, 8, and 16 environmental states. The changing environment problem is designed to scale in difficulty as the number of environments increases. We ran 100 replicate populations of each treatment at K equal to 2, 4, 8, and 16 environmental states. In all replicates and across all treatments, we evolved populations of 1000 agents for 10,000 generations, starting from a simple ancestor program.

Statistical Methods

For every run, we extracted the program with the highest fitness after 10,000 generations of evolution. To account for environmental stochasticity, we tested each program 100 times, using a program’s average performance as its fitness in our analyses. For each environmental size, we compared the performances of evolved programs across treatments. For our analyses, we set our significance level, \(\alpha\), to:

sig_level <- 0.05

To determine if any of the treatments were significant within a set (p < 0.05), we performed a Kruskall-Wallis rank sum test. For a set in which the Kruskal-Wallis test was significant, we performed a post-hoc Dunn’s test, applying a Bonferroni correction for multiple comparisons.

Results

Finally, enough with all of that mumbo-jumbo explanation stuff! Results! Pretty pictures! P values or whatever!

But, before all that, we’ll load in our data:

chgenv_data <- read.csv("../data/chg_env/mt_final_fitness.csv")

A note about how treatments are named within the data: treatment names describe the parameters and their values used when running the experiment. Parameters and their values are adjacent in the name, and parameter-value combinations are separated by underscores. For example, ED1_AS1_ENV2_TSK0 indicates that event-driven (ED) signals were enabled (1), active sensors (AS) were enabled (1), and there were two environments states. In other words, ED1_AS1_ENV2_TSK0 indicates the two-state environment combined treatment. The trailing TSK0 can be ignored.

Two-state Environment

Let’s partition out the two-state environment data:

chgenv_s2_data <- chgenv_data[grep("_ENV2_", chgenv_data$treatment),]
chgenv_s2_data <- chgenv_s2_data[chgenv_s2_data$analysis == "fdom",]
chgenv_s2_data$treatment <- relevel(chgenv_s2_data$treatment, ref="ED1_AS1_ENV2_TSK0")

Accio, visualization!

From the boxplot, the event-driven and combined treatments alone are able to produce optimally-performing programs. But, we’ll be responsible and perform some actual statistical analyses.

First, we’ll do a Kruskal-Wallis test to confirm that at least one of the treatments within the set is different from the others.

chgenv_s2_kt <- kruskal.test(fitness ~ treatment, chgenv_s2_data)
chgenv_s2_kt
## 
##  Kruskal-Wallis rank sum test
## 
## data:  fitness by treatment
## Kruskal-Wallis chi-squared = 283.26, df = 2, p-value < 2.2e-16

According to the Kruskal-Wallis test, at least one treatment is significantly different from the others. Next, we’ll do a post-hoc Dunn’s test, applying a Bonferroni correction for multiple comparisons.

chgenv_s2_dt <- dunnTest(fitness ~ treatment, data=chgenv_s2_data, method="bonferroni")
chgenv_s2_dt_comps <- chgenv_s2_dt$res$Comparison
chgenv_s2_dt
## Dunn (1964) Kruskal-Wallis multiple comparison
##   p-values adjusted with the Bonferroni method.
##                              Comparison         Z      P.unadj
## 1 ED0_AS1_ENV2_TSK0 - ED1_AS0_ENV2_TSK0 -14.57562 4.014904e-48
## 2 ED0_AS1_ENV2_TSK0 - ED1_AS1_ENV2_TSK0 -14.57562 4.014904e-48
## 3 ED1_AS0_ENV2_TSK0 - ED1_AS1_ENV2_TSK0   0.00000 1.000000e+00
##          P.adj
## 1 1.204471e-47
## 2 1.204471e-47
## 3 1.000000e+00

We’ll use some Python code (most of which is under the hood of this document; check out the source code on Git to stare directly into the sun) to pretty-print our Dunn’s test results.

## ===== Significant comparisons: =====
## Imperative - Event-driven
##   adjusted p-value: 1.2044712461463627e-47
##   Z statistic: -14.575615035405951
## Imperative - Combined
##   adjusted p-value: 1.2044712461463627e-47
##   Z statistic: -14.575615035405951
## ===== Not significant comparisons: =====
## Event-driven - Combined
##  adjusted p-value: 1.0
##  Z statistic: 0.0

What we saw in the boxplot is confirmed: the imperative treatment produced significantly lower performing than the combined or event-driven treatment.

Teasing Apart the Combined Treatment

In the combined treatment, evolution had access to both the event-driven (signal-based) strategy and the imperative (sensor-polling) strategy. As shown above, performance in the event-driven and combined treatments did not significantly differ. However, this result along does not reveal what strategies were favored in the combined treatment.

To tease this apart, we re-evaluated programs evolved under the combined treatment in two distinct conditions: one in which we deactivated polling sensors and one in which we prevented environment change signals from triggering functions in SignalGP functions.

First, we’ll extract the relevant data.

chgenv_s2_comb_data <- chgenv_data[grep("ED1_AS1_ENV2_", chgenv_data$treatment),]

Visualize!

When we take away signals, performance decreases; however, when we take away sensors, there’s no effect.

Stats to confirm!

First, we’ll do a Kruskal-Wallis test to confirm that at least one of the treatments within the set is different from the others.

chgenv_s2_comb_kt <- kruskal.test(fitness ~ analysis, chgenv_s2_comb_data)
chgenv_s2_comb_kt
## 
##  Kruskal-Wallis rank sum test
## 
## data:  fitness by analysis
## Kruskal-Wallis chi-squared = 283.27, df = 2, p-value < 2.2e-16

According to the Kruskal-Wallis test, at least one treatment is significantly different from the others. Next, we’ll do a post-hoc Dunn’s test, applying a Bonferroni correction for multiple comparisons.

chgenv_s2_comb_dt <- dunnTest(fitness ~ analysis, data=chgenv_s2_comb_data, method="bonferroni")
chgenv_s2_comb_dt_comps <- chgenv_s2_comb_dt$res$Comparison
chgenv_s2_comb_dt
## Dunn (1964) Kruskal-Wallis multiple comparison
##   p-values adjusted with the Bonferroni method.
##                Comparison        Z      P.unadj        P.adj
## 1       fdom - no_sensors  0.00000 1.000000e+00 1.000000e+00
## 2       fdom - no_signals 14.57563 4.014092e-48 1.204228e-47
## 3 no_sensors - no_signals 14.57563 4.014092e-48 1.204228e-47

Yup!

Four-state Environment

Spoiler alert! The four-, eight-, and sixteen- state environments are all pretty much the same story as the two-state environment. I’m going to cut out most of the commentary for the rest of these results.

Let’s partition out the four-state environment data:

chgenv_s4_fdom_data <- chgenv_data[grep("_ENV4_", chgenv_data$treatment),]
chgenv_s4_fdom_data <- chgenv_s4_fdom_data[chgenv_s4_fdom_data$analysis == "fdom",]
chgenv_s4_fdom_data$treatment <- relevel(chgenv_s4_fdom_data$treatment, ref="ED1_AS1_ENV4_TSK0")

Accio, visualization!

From the boxplot, the event-driven and combined treatments alone are able to produce optimally-performing programs. But, we’ll be responsible and perform some actual statistical analyses.

First, we’ll do a Kruskal-Wallis test to confirm that at least one of the treatments within the set is different from the others.

chgenv_s4_fdom_kt <- kruskal.test(fitness ~ treatment, chgenv_s4_fdom_data)
chgenv_s4_fdom_kt
## 
##  Kruskal-Wallis rank sum test
## 
## data:  fitness by treatment
## Kruskal-Wallis chi-squared = 283.26, df = 2, p-value < 2.2e-16

According to the Kruskal-Wallis test, at least one treatment is significantly different from the others. Next, we’ll do a post-hoc Dunn’s test, applying a Bonferroni correction for multiple comparisons.

chgenv_s4_fdom_dt <- dunnTest(fitness ~ treatment, data=chgenv_s4_fdom_data, method="bonferroni")
chgenv_s4_fdom_dt_comps <- chgenv_s4_fdom_dt$res$Comparison
chgenv_s4_fdom_dt
## Dunn (1964) Kruskal-Wallis multiple comparison
##   p-values adjusted with the Bonferroni method.
##                              Comparison         Z      P.unadj
## 1 ED0_AS1_ENV4_TSK0 - ED1_AS0_ENV4_TSK0 -14.57561 4.015039e-48
## 2 ED0_AS1_ENV4_TSK0 - ED1_AS1_ENV4_TSK0 -14.57561 4.015039e-48
## 3 ED1_AS0_ENV4_TSK0 - ED1_AS1_ENV4_TSK0   0.00000 1.000000e+00
##          P.adj
## 1 1.204512e-47
## 2 1.204512e-47
## 3 1.000000e+00
## ===== Significant comparisons: =====
## Imperative - Event-driven
##   adjusted p-value: 1.2045118388700292e-47
##   Z statistic: -14.575612733980755
## Imperative - Combined
##   adjusted p-value: 1.2045118388700292e-47
##   Z statistic: -14.575612733980755
## ===== Not significant comparisons: =====
## Event-driven - Combined
##  adjusted p-value: 1.0
##  Z statistic: 0.0
Teasing Apart the Combined Treatment

In the combined treatment, evolution had access to both the event-driven (signal-based) strategy and the imperative (sensor-polling) strategy. As shown above, performance in the event-driven and combined treatments did not significantly differ. However, this result along does not reveal what strategies were favored in the combined treatment.

To tease this apart, we re-evaluated programs evolved under the combined treatment in two distinct conditions: one in which we deactivated polling sensors and one in which we prevented environment change signals from triggering functions in SignalGP functions.

First, we’ll extract the relevant data.

chgenv_s4_comb_data <- chgenv_data[grep("ED1_AS1_ENV4_", chgenv_data$treatment),]

Visualize!

When we take away signals, performance decreases; however, when we take away sensors, there’s no effect.

Stats to confirm!

First, we’ll do a Kruskal-Wallis test to confirm that at least one of the treatments within the set is different from the others.

chgenv_s4_comb_kt <- kruskal.test(fitness ~ analysis, chgenv_s4_comb_data)
chgenv_s4_comb_kt
## 
##  Kruskal-Wallis rank sum test
## 
## data:  fitness by analysis
## Kruskal-Wallis chi-squared = 283.27, df = 2, p-value < 2.2e-16

According to the Kruskal-Wallis test, at least one treatment is significantly different from the others. Next, we’ll do a post-hoc Dunn’s test, applying a Bonferroni correction for multiple comparisons.

chgenv_s4_comb_dt <- dunnTest(fitness ~ analysis, data=chgenv_s4_comb_data, method="bonferroni")
chgenv_s4_comb_dt_comps <- chgenv_s4_comb_dt$res$Comparison
chgenv_s4_comb_dt
## Dunn (1964) Kruskal-Wallis multiple comparison
##   p-values adjusted with the Bonferroni method.
##                Comparison        Z      P.unadj        P.adj
## 1       fdom - no_sensors  0.00000 1.000000e+00 1.000000e+00
## 2       fdom - no_signals 14.57564 4.013687e-48 1.204106e-47
## 3 no_sensors - no_signals 14.57564 4.013687e-48 1.204106e-47

Eight-state Environment

Let’s partition out the eight-state environment data:

chgenv_s8_fdom_data <- chgenv_data[grep("_ENV8_", chgenv_data$treatment),]
chgenv_s8_fdom_data <- chgenv_s8_fdom_data[chgenv_s8_fdom_data$analysis == "fdom",]
chgenv_s8_fdom_data$treatment <- relevel(chgenv_s8_fdom_data$treatment, ref="ED1_AS1_ENV8_TSK0")

Accio, visualization!