Chapter 8 Contextual-signal problem analysis

Here, we give an overview of the contextual-signal diagnostic problem, and we provide our data analyses for related experiments. All of our source code for statistical analyses and data visualizations is embedded in this document. The raw data are available in the OSF project associated with this work (Lalejini, Moreno, and Ofria 2020).

Please file an issue or make a pull request on GitHub to report any mistakes, ask questions, request more explanation, et cetera.

8.1 Overview

In the contextual-signal problem, programs must respond appropriately to a sequence of two input signals, where the first ("contextual") signal dictates how a program should respond to each possible second ("response") signal. In this work, there are a total of four possible input signals and four possible output responses. Programs output these responses by executing one of four response instructions.

The dataframe below gives the correct output for each combination of input signals.

##          input output  type
## 1  OP:S0;OP:S0      0 S0;S0
## 2  OP:S0;OP:S1      1 S0;S1
## 3  OP:S0;OP:S2      2 S0;S2
## 4  OP:S0;OP:S3      3 S0;S3
## 5  OP:S1;OP:S0      1 S1;S0
## 6  OP:S1;OP:S1      2 S1;S1
## 7  OP:S1;OP:S2      3 S1;S2
## 8  OP:S1;OP:S3      0 S1;S3
## 9  OP:S2;OP:S0      2 S2;S0
## 10 OP:S2;OP:S1      3 S2;S1
## 11 OP:S2;OP:S2      0 S2;S2
## 12 OP:S2;OP:S3      1 S2;S3
## 13 OP:S3;OP:S0      3 S3;S0
## 14 OP:S3;OP:S1      0 S3;S1
## 15 OP:S3;OP:S2      1 S3;S2
## 16 OP:S3;OP:S3      2 S3;S3
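
The table above follows a regular pattern: if signal S0 through S3 is read as the integer 0 through 3, each correct output equals the sum of the two signal indices modulo 4. This is an observation about the printed table, not a description of how the authors generated it. A minimal R sketch that rebuilds the same rows under that assumption (the names `signals` and `expected` are illustrative):

```r
# Sketch: rebuild the lookup table, assuming output = (first + second) mod 4.
n_signals <- 4

# expand.grid varies its first argument fastest, so `second` cycles within
# each value of `first`, matching the row order of the table above.
signals <- expand.grid(second = 0:(n_signals - 1), first = 0:(n_signals - 1))

expected <- data.frame(
  input = paste0("OP:S", signals$first, ";OP:S", signals$second),
  output = (signals$first + signals$second) %% n_signals,
  type = paste0("S", signals$first, ";S", signals$second),
  stringsAsFactors = FALSE
)
print(expected)
```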

8.2 Analysis Dependencies

Load all required R libraries.
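
The library-loading code does not appear in this rendering. Judging from the functions called later in this chapter (`filter`, `theme_set`, `theme_cowplot`, and the igraph package named in Section 8.7.1), a plausible minimal set is sketched below. The exact package list the authors used is an assumption.

```r
# Assumed package list, inferred from functions called later in this chapter.
required_pkgs <- c("dplyr", "ggplot2", "cowplot", "igraph")

# Check availability first so a missing package is reported up front
# rather than failing partway through the analysis.
installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  for (pkg in required_pkgs) {
    library(pkg, character.only = TRUE)
  }
}
```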

These analyses were conducted in the following computing environment:

##                _                           
## platform       x86_64-pc-linux-gnu         
## arch           x86_64                      
## os             linux-gnu                   
## system         x86_64, linux-gnu           
## status                                     
## major          4                           
## minor          0.4                         
## year           2021                        
## month          02                          
## day            15                          
## svn rev        80002                       
## language       R                           
## version.string R version 4.0.4 (2021-02-15)
## nickname       Lost Library Book

8.3 Setup

Load the data, perform initial data cleanup, and configure some global settings.

####### Load max fit program data #######
data_loc <- paste0(working_directory, "data/max_fit_orgs.csv")
data <- read.csv(data_loc, na.strings="NONE")

# Specify factors (not all of these matter for this set of runs).
data$matchbin_thresh <- factor(
  data$matchbin_thresh,
  levels=c(0, 25, 50, 75)
)

data$TAG_LEN <- factor(
  data$TAG_LEN,
  levels=c(32, 64, 128, 256)
)

data$task <- factor(
  data$task,
  levels=c("S2", "S3", "S4")
)

# Filter down to only data we use in paper.
data <- filter(data, task=="S4")

# Define function to summarize regulation/memory configurations.
get_con <- function(reg, mem) {
  if (reg == "0" && mem == "0") {
    return("none")
  } else if (reg == "0" && mem=="1") {
    return("memory")
  } else if (reg=="1" && mem=="0") {
    return("regulation")
  } else if (reg=="1" && mem=="1") {
    return("both")
  } else {
    return("UNKNOWN")
  }
}
# Specify experimental condition for each datum.
data$condition <- mapply(
  get_con,
  data$USE_FUNC_REGULATION,
  data$USE_GLOBAL_MEMORY
)

data$condition <- factor(
  data$condition,
  levels=c("regulation", "memory", "none", "both")
)

# Given knockout info, what strategy does a program use?
get_strategy <- function(use_reg, use_mem) {
  if (use_reg=="0" && use_mem=="0") {
    return("use neither")
  } else if (use_reg=="0" && use_mem=="1") {
    return("use memory")
  } else if (use_reg=="1" && use_mem=="0") {
    return("use regulation")
  } else if (use_reg=="1" && use_mem=="1") {
    return("use both")
  } else {
    return("UNKNOWN")
  }
}

# Label the strategy used by each program (to make labeling easier).
data$strategy <- mapply(
  get_strategy,
  data$relies_on_regulation,
  data$relies_on_global_memory
)

data$strategy <- factor(
  data$strategy,
  levels=c(
    "use regulation",
    "use memory",
    "use neither",
    "use both"
  )
)

# Filter data to include only replicates labeled as solutions
sol_data <- filter(data, solution=="1")

####### Load instruction execution data #######
inst_exec_data <- read.csv(paste0(working_directory, "data/exec_trace_summary.csv"), na.strings="NA")

inst_exec_data$condition <- mapply(
  get_con,
  inst_exec_data$USE_FUNC_REGULATION,
  inst_exec_data$USE_GLOBAL_MEMORY
)

inst_exec_data$condition <- factor(
  inst_exec_data$condition,
  levels=c("regulation", "memory", "none", "both")
)

inst_exec_data$task <- factor(
  inst_exec_data$task,
  levels=c("S2", "S3", "S4")
)

####### Load network data #######
reg_network_data <- read.csv(paste0(working_directory, "data/reg_graphs_summary.csv"), na.strings="NA")
reg_network_data <- filter(reg_network_data, run_id %in% data$SEED)

get_task <- function(seed) {
  return(filter(data, SEED==seed)$task)
}

reg_network_data$task <- mapply(
  get_task,
  reg_network_data$run_id
)

reg_network_data$task <- factor(reg_network_data$task)

####### misc #######
# Configure our default graphing theme
theme_set(theme_cowplot())

8.4 Problem-solving success

The number of successful replicates by condition:

Test for significance using Fisher’s exact test.

##              success fail
## reg-enabled      200    0
## reg-disabled     173   27
## 
##  Fisher's Exact Test for Count Data
## 
## data:  perf_table
## p-value = 5.818e-09
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  7.714282      Inf
## sample estimates:
## odds ratio 
##        Inf
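
The chunk that produced this output is not shown in this rendering. A minimal sketch that rebuilds the contingency table from the printed counts and runs base R's `fisher.test` on it (the name `perf_table` comes from the output above; `fisher_result` is illustrative):

```r
# Contingency table of replicate outcomes, copied from the printed output.
# matrix() fills column by column: successes first, then failures.
perf_table <- matrix(
  c(200, 173, 0, 27),
  nrow = 2,
  dimnames = list(
    c("reg-enabled", "reg-disabled"),
    c("success", "fail")
  )
)

# Two-sided Fisher's exact test (the default alternative).
fisher_result <- fisher.test(perf_table)
print(perf_table)
print(fisher_result)
```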

8.5 How many generations elapse before solutions evolve?

Test for a statistical difference between conditions using a Wilcoxon rank sum test.

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  update by condition
## W = 28950, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  557.9999 764.0000
## sample estimates:
## difference in location 
##                    657
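
The `data:  update by condition` line indicates a formula call of the form `update ~ condition`, and the reported confidence interval and location estimate suggest `conf.int = TRUE`. The sketch below shows that call shape on a small toy data frame. The values are invented for illustration only, are not the experimental data, and imply nothing about the direction of the real effect.

```r
# Illustrative values only; these are NOT the experimental data.
toy <- data.frame(
  update = c(120, 340, 95, 410, 230, 1500, 980, 1210, 875, 1340),
  condition = factor(
    rep(c("regulation", "memory"), each = 5),
    levels = c("memory", "regulation")
  )
)

# conf.int = TRUE adds the confidence interval and the
# "difference in location" estimate to the result.
toy_test <- wilcox.test(update ~ condition, data = toy, conf.int = TRUE)
print(toy_test)
```

With samples this small and no ties, R reports an exact test rather than the continuity-corrected normal approximation shown in the output above.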

8.6 Evolved strategies

8.6.3 Gene regulatory networks

Here, we look only at successful programs that rely on regulation. At a glance, what do their gene regulatory networks look like?

First, the total edges found in networks:

Next, let’s look at edges by type.

Test for a statistical difference between edge types using a Wilcoxon rank sum test:

## [1] "Median # repressed edges: 62"
## [1] "Median # promoting edges: 73"
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  reg_edges_cnt by reg_edge_type
## W = 15690, p-value = 0.0001927
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  -13.000026  -4.000018
## sample estimates:
## difference in location 
##              -8.000026

8.6.4 Program instruction execution traces

8.6.4.1 Execution time

How many time steps do evolved programs use to solve the contextual-signal task?

Test for a significant difference between conditions using a Wilcoxon rank sum test:

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  total_execution_time by condition
## W = 30794, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  634.0001 810.0000
## sample estimates:
## difference in location 
##               722.8488

8.6.4.2 What types of instructions do successful programs execute?

Here, we look at the distribution of instruction types executed by successful programs. We’re primarily interested in the proportion of control flow instructions, so let’s look at that first.

Test for significant difference between conditions using a Wilcoxon rank sum test:

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  control_flow_inst_prop by condition
## W = 30479, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  0.04280319 0.05431491
## sample estimates:
## difference in location 
##             0.04838185

In case you’re curious, here are all categories of instructions:

8.7 Visualizing an evolved gene regulatory network

Let’s take a closer look at a successful gene regulatory network.

Specifically, we’ll be looking at the solution evolved in run id 23997 (arbitrarily selected).

8.7.1 Evolved regulatory network

We use the igraph package to draw this program’s gene regulatory network.


References

Lalejini, Alexander M, Matthew A Moreno, and Charles Ofria. 2020. “Tag-Based Genetic Regulation for Genetic Programming.” OSF. https://doi.org/10.17605/OSF.IO/928FX.