Repeated Signal Task Analysis

Overview

Analysis Dependencies

Setup

What conditions produce solutions to the repeated signal task?

Number of successful replicates by condition

2 -signal task - statistical analysis of solution counts
4 -signal task - statistical analysis of solution counts
8 -signal task - statistical analysis of solution counts
16 -signal task - statistical analysis of solution counts
Takeaway

Scores by condition

How many generations elapse before solutions arise?

Two-signal task
Four-signal task
Eight-signal task
Sixteen-signal task

Teasing apart evolved digital organism strategies

Do solutions rely on genetic regulation or global memory access for signal-response plasticity?
Are evolved programs relying on stochastic strategies?
What forms of genetic regulation do evolved digital organisms rely on?

Gene regulatory network edges
Knockout experiments

Case study: visualizing regulation in an evolved digital organism

Function regulation over time
Environmental signal tag-match score over time
Evolved regulatory network

This document is a companion to our submission to the 2020 Artificial Life conference, ‘Genetic regulation facilitates the evolution of signal-response plasticity in digital organisms’. Here, we give an overview of the repeated signal diagnostic task (mirroring the overview given in the paper), and we provide our data analyses for our repeated signal task experiments. All of our source code for statistical analyses and data visualizations is embedded in this document. Please file an issue or make a pull request on github to report any mistakes or request more explanation.

Overview

# Experimental parameters referenced in-text all in one convenient place.
time_steps <- 128
replicates <- 100
population_size <- 1000
generations <- 10000
env_complexities <- c(2, 4, 8, 16)

The repeated signal task is one of several diagnostics we used to evaluate whether genetic regulation faculties contribute to, and potentially detract from, the functionality of evolved SignalGP digital organisms. The repeated signal task, specifically, requires organisms to exhibit signal-response plasticity; that is, they must shift their response to a repeated environmental signal during their lifetime.

The repeated signal task requires organisms to express the appropriate (distinct) response to a single environmental signal each of the $K$ times that it is repeated. Organisms express reponses by executing one of $K$ response instructions. For example, if organisms receive four signals from the environment (i.e., $K=2$ ), a maximally fit organism will express Response-1 after the first signal, Response-2 after the second, Response-3 after the third, and Response-4 after the fourth. The figure below depicts examples of optimal behavior in two- and four-signal tasks.

repeated-signal-task-overview

Requiring organisms to execute a distinct instruction for each repetition of the environmental signal represents organisms having to perform distinct behaviors. Note that each repetition of the environmental signal is identical (i.e., each has an identical tag), and as such, organisms must track how many times the signal has been repeated and dynamically shift their responses accordingly.

We allowed organisms 128 time steps to express the appropriate response after receiving an environmental signal. Once the allotted time to respond expires or the organism expresses any response, the organism’s threads of execution are reset, resulting in a loss of all thread-local memory. Only the contents of an organism’s global memory and each function’s regulatory state persist. The environment then produces the next signal (identical to all previous environment signals) to which the organism may respond. An organism must use their global memory buffer or genetic regulation to correctly shift its response to each subsequent environmental signal. An organism’s fitness is equal to the number of correct responses expressed during evaluation.

We evolved populations of 1000 SignalGP organisms to solve the repeated signal task at four complexity levels: $K=2$ , $K=4$ , $K=8$ , $K=16$ (where $K$ denotes the number of repeated signals). $K=16$ is the most complex (and thus most challenging) environment, and $K=2$ is the simplest (and thus least challenging) environment. We evolved populations for 10^{4} generations or until an organism capable of achieving a perfect score during task evaluation (i.e., able to express the appropriate response to all $K$ signals) evolved.

For each number of repeated signals ( $K=$ 2, 4, 8, 16), we ran 100 replicate populations (each with a distinct random number seed) of four experimental conditions:

a memory-only treatment where organisms must use global memory (in combination with procedural flow-control mechanisms) to correctly respond to environmental signals.
a regulation-augmented treatment where organisms have access to both global memory and genetic regulation.
a regulation-only control where organisms can adjust signal-responses through regulation.
a control where organisms can use neither global memory nor genetic regulation (which should make the repeated signal task impossible to solve; if we see solutions, something is not working as we expected…)

Note that in this document, we’ll often refer to the treatments with the following shorthand (for space considerations when typing and on graphs):

Memory-only treatment as: ‘mem.’ or ‘memory’,
Regulation-augmented treatment as: ‘both’ (i.e., both global memory and genetic regulation)
Regulation-only control as: ‘reg.’ or ‘regulation’
Control with neither global memory access nor genetic regulation as: ‘none’ or ‘neither’

Analysis Dependencies

Load all required R libraries.

library(ggplot2)  # (Wickham, 2016)
library(tidyr)    # (Wickham and Henry, 2020)
library(dplyr)    # (Wickham et al., 2020)
library(reshape2) # (Wickham, 2007)
library(cowplot)  # (Wilke, 2019)
library(viridis)  # (Garnier, 2018)
library(igraph)   # (Csardi and Nepusz, 2006)

These analyses were conducted in the following computing environment:

print(version)

##                _                           
## platform       x86_64-apple-darwin15.6.0   
## arch           x86_64                      
## os             darwin15.6.0                
## system         x86_64, darwin15.6.0        
## status                                     
## major          3                           
## minor          6.2                         
## year           2019                        
## month          12                          
## day            12                          
## svn rev        77560                       
## language       R                           
## version.string R version 3.6.2 (2019-12-12)
## nickname       Dark and Stormy Night

Setup

Load data, initial data cleanup, configure some global settings.

data_dir <- "../data/balanced-reg-mult/alt-sig/"
max_fit_org_data_loc <- paste(data_dir, "max_fit_orgs.csv", sep="")
max_fit_org_data <- read.csv(max_fit_org_data_loc, na.strings="NONE")

# We don't need programs for these analyses + they bloat the data set, so we'll chop them off.
max_fit_org_data <- subset(max_fit_org_data, select = -c(program))

# Specify factors (not all of these matter for this set of runs).
max_fit_org_data$matchbin_thresh <- factor(max_fit_org_data$matchbin_thresh,
                                           levels=c(0, 25, 50, 75))
max_fit_org_data$NUM_SIGNAL_RESPONSES <- factor(max_fit_org_data$NUM_SIGNAL_RESPONSES,
                                                levels=c(2, 4, 8, 16, 32))
max_fit_org_data$NUM_ENV_CYCLES <- factor(max_fit_org_data$NUM_ENV_CYCLES,
                                          levels=c(2, 4, 8, 16, 32))
max_fit_org_data$TAG_LEN <- factor(max_fit_org_data$TAG_LEN,
                                   levels=c(32, 64, 128))

# Define function to summarize regulation/memory configurations.
get_con <- function(reg, mem) {
  if (reg == "0" && mem == "0") {
    return("none")
  } else if (reg == "0" && mem=="1") {
    return("memory")
  } else if (reg=="1" && mem=="0") {
    return("regulation")
  } else if (reg=="1" && mem=="1") {
    return("both")
  } else {
    return("UNKNOWN")
  }
}

# Specify experimental condition for each datum.
max_fit_org_data$condition <- mapply(get_con, 
                                     max_fit_org_data$USE_FUNC_REGULATION,
                                     max_fit_org_data$USE_GLOBAL_MEMORY)
max_fit_org_data$condition <- factor(max_fit_org_data$condition, 
                                     levels=c("regulation", "memory", "none", "both"))

# Does this program rely on a stochastic strategy?
max_fit_org_data$stochastic <- 1 - max_fit_org_data$consistent 


# Load network data
reg_network_data_loc <- paste(data_dir, "reg_graphs_summary.csv", sep="")
reg_network_data <- read.csv(reg_network_data_loc, na.strings="NA")

# Settings for statistical analyses.
alpha <- 0.05
correction_method <- "bonferroni"

# Configure our default graphing theme
theme_set(theme_cowplot())

What conditions produce solutions to the repeated signal task?

We expected populations with access to genetic regulation to be more successful on the repeated signal task than those evolved without access to genetic regulation.

We can look at (1) the number of successful replicates (i.e., replicates in which a program capable of perfectly solving the repeated signal task evolved) per condition and (2) the scores of the highest-fitness program evolved in each replciate.

Number of successful replicates by condition

Note that an organism is categorized as a ‘solution’ if it can correctly respond to each of repetition of the environment signal.

# Labels for each
label_lu <- c(
  "2" = "2-signal task", 
  "4" = "4-signal task", 
  "8" = "8-signal task", 
  "16" = "16-signal task", 
  "32" ="32-signal task"
)

# Filter data to include only replicates labeled as solutions
sol_data <- filter(max_fit_org_data, solution=="1")
# Graph the number of solutions evolved in each condition, faceted by environmental complexity
ggplot(sol_data, aes(x=condition, fill=condition)) +
  geom_bar() +
  geom_text(stat="count", 
            mapping=aes(label=..count..), 
            position=position_dodge(0.9), vjust=0) +
  scale_y_continuous(name="# evolved solutions", 
                     breaks=seq(0, replicates, 10), 
                     limits=c(0, replicates+2)) +
  scale_fill_discrete(name="Condition:",
                      limits=c("regulation", "memory", "both", "none"),
                      labels=c("Regulation-only (R)", "Memory-only (M)", "Both (B)", "Neither (N)")) +
  scale_x_discrete(name="Condition",
                   limits=c("regulation", "memory", "both", "none"),
                   labels=c("R", "M", "B", "N")) +
  facet_wrap(~ NUM_SIGNAL_RESPONSES, 
             nrow=1, 
             labeller=labeller(NUM_SIGNAL_RESPONSES=label_lu)) +
  ggtitle("Solution Counts") +
  theme(legend.position="bottom", 
        axis.title.x=element_blank()) +
  ggsave("./imgs/repeated-signal-solultion-cnts.png", 
         width=8, height=2.75)

Note that in the lowest-complexity environment ( $K=2$ ), 1 replicates using the original version of SignalGP digital organisms produce solutions. This, in addition to hand-coded programs, confirm that it is possible to solve the repeated signal task without regulation. We see many more replicates yield solutions when given access to genetic regulation. We also do not observe any solutions evolve when given access to neither genetic regulation nor global memory.

We use a Fisher’s exact test to determine if there are significant differences (p < 0.05) between the numbers of solutions found in the regulation-augmented SignalGP condition and all other conditions (within each environment complexity level). We correct for multiple comparisons using the bonferroni method.

# This code chunk is sort of a monster to have things print out all pretty-like in the knitted HTML document.

# For each environment complexity level, do a fisher's exact test and print results.
for (env in env_complexities) {
  env_data <- filter(max_fit_org_data, NUM_SIGNAL_RESPONSES==env)
  cat("#### ", env, "-signal task - statistical analysis of solution counts  \n")
  
  # Extract successes/fails for each condition.
  mem_success_cnt <- nrow(filter(env_data, solution=="1" & condition=="memory"))
  mem_fail_cnt <- nrow(filter(env_data, condition=="memory")) - mem_success_cnt
  reg_success_cnt <- nrow(filter(env_data, solution=="1" & condition=="regulation"))
  reg_fail_cnt <- nrow(filter(env_data, condition=="regulation")) - reg_success_cnt
  both_success_cnt <- nrow(filter(env_data, solution=="1" & condition=="both"))
  both_fail_cnt <- nrow(filter(env_data, condition=="both")) - both_success_cnt
  none_success_cnt <- nrow(filter(env_data, solution=="1" & condition=="none"))
  none_fail_cnt <- nrow(filter(env_data, condition=="none")) - none_success_cnt
  
  # Regulation-enabled SGP vs. Classic SGP
  mem_sgp_table <- matrix(c(both_success_cnt, 
                            mem_success_cnt, 
                            both_fail_cnt, 
                            mem_fail_cnt), 
                          nrow=2)
  rownames(mem_sgp_table) <- c("both", "mem-only")
  colnames(mem_sgp_table) <- c("success", "fail")
  mem_sgp_fishers <- fisher.test(mem_sgp_table)
  
  # Regulation-enabled SGP vs. Regulation-only SGP
  reg_sgp_table <- matrix(c(both_success_cnt, 
                            reg_success_cnt, 
                            both_fail_cnt, 
                            reg_fail_cnt), nrow=2)
  rownames(reg_sgp_table) <- c("both", "reg-only")
  colnames(reg_sgp_table) <- c("success", "fail")
  reg_sgp_fishers <- fisher.test(reg_sgp_table)
  
  # Regulation-enabled SGP vs. No reg/mem SGP
  none_sgp_table <- matrix(c(both_success_cnt, 
                             none_success_cnt, 
                             both_fail_cnt, 
                             none_fail_cnt), nrow=2)
  rownames(none_sgp_table) <- c("both", "none")
  colnames(none_sgp_table) <- c("success", "fail")
  none_sgp_fishers <- fisher.test(none_sgp_table)
  
  # Adjust for multiple comparisons
  adjusted <- p.adjust(p=c(mem_sgp_fishers$p.value, 
                           reg_sgp_fishers$p.value, 
                           none_sgp_fishers$p.value), 
                       method=correction_method)
  mem_sgp_fishers$p.adjusted <- adjusted[1]
  reg_sgp_fishers$p.adjusted <- adjusted[2]
  none_sgp_fishers$p.adjusted <- adjusted[3]
  
  # Summarize!
  no_sig_comps <- TRUE
  cat("Significant comparisons (Fisher's exact test, p < ", alpha, ",",
      correction_method, " correction for multiple comparisons):  \n\n")
  if (mem_sgp_fishers$p.adjusted < alpha) {
    cat("- Regulation-augmented SignalGP (both regulation & global memory) vs. global-memory-only SignalGP (original version of SignalGP)  \n")
    no_sig_comps <- FALSE
  }
  if (reg_sgp_fishers$p.adjusted < alpha) {
    cat("- Regulation-augmented SignalGP (both regulation & global memory) vs. regulation-only SignalGP (no global memory access)  \n")
    no_sig_comps <- FALSE
  }
  if (none_sgp_fishers$p.adjusted < alpha) {
    cat("- Regulation-augmented SignalGP (both regulation & global memory) vs. regulation- and memory-disabled SignalGP control  \n")
    no_sig_comps <- FALSE
  }
  if (no_sig_comps) {
    cat("- NONE  \n")
  }
  
  cat("\n")
  cat("**Results - Regulation-augmented SignalGP (both regulation & global memory) vs. global-memory-only SignalGP (original version of SignalGP)**  \n")
  cat("```\n")
  print(mem_sgp_table)
  print(mem_sgp_fishers)
  cat("p.adjusted = ", mem_sgp_fishers$p.adjusted, 
      " (method = ", correction_method, ")", "  \n")
  cat("```\n")
  
  cat("**Results - Regulation-augmented SignalGP (both regulation & global memory) vs. regulation-only SignalGP (no global memory access)**  \n")
  cat("```\n")
  print(reg_sgp_table)
  print(reg_sgp_fishers)
  cat("p.adjusted = ", reg_sgp_fishers$p.adjusted, 
      " (method = ", correction_method, ")", "  \n")
  cat("```\n")
  
  cat("**Results - Regulation-augmented SignalGP (both regulation & global memory) vs. regulation- and memory-disabled SignalGP**  \n")
  cat("```\n")
  print(none_sgp_table)
  print(none_sgp_fishers)
  cat("p.adjusted = ", none_sgp_fishers$p.adjusted, 
      " (method = ", correction_method,")", "  \n")
  cat("```\n")
  cat("\n")
}

2 -signal task - statistical analysis of solution counts

Significant comparisons (Fisher’s exact test, p < 0.05 , bonferroni correction for multiple comparisons):

Regulation-augmented SignalGP (both regulation & global memory) vs. global-memory-only SignalGP (original version of SignalGP)
Regulation-augmented SignalGP (both regulation & global memory) vs. regulation- and memory-disabled SignalGP control

Results - Regulation-augmented SignalGP (both regulation & global memory) vs. global-memory-only SignalGP (original version of SignalGP)

         success fail
both         100    0
mem-only       1   99

    Fisher's Exact Test for Count Data

data:  mem_sgp_table
p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 830.2776      Inf
sample estimates:
odds ratio 
       Inf 

p.adjusted =  6.692545e-57  (method =  bonferroni )

Results - Regulation-augmented SignalGP (both regulation & global memory) vs. regulation-only SignalGP (no global memory access)

         success fail
both         100    0
reg-only     100    0

    Fisher's Exact Test for Count Data

data:  reg_sgp_table
p-value = 1
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
   0 Inf
sample estimates:
odds ratio 
         0 

p.adjusted =  1  (method =  bonferroni )

Results - Regulation-augmented SignalGP (both regulation & global memory) vs. regulation- and memory-disabled SignalGP

     success fail
both     100    0
none       0  100

    Fisher's Exact Test for Count Data

data:  none_sgp_table
p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 1302.151      Inf
sample estimates:
odds ratio 
       Inf 

p.adjusted =  6.626282e-59  (method =  bonferroni )

4 -signal task - statistical analysis of solution counts

Significant comparisons (Fisher’s exact test, p < 0.05 , bonferroni correction for multiple comparisons):

Regulation-augmented SignalGP (both regulation & global memory) vs. global-memory-only SignalGP (original version of SignalGP)
Regulation-augmented SignalGP (both regulation & global memory) vs. regulation- and memory-disabled SignalGP control

Results - Regulation-augmented SignalGP (both regulation & global memory) vs. global-memory-only SignalGP (original version of SignalGP)

         success fail
both         100    0
mem-only       0  100

    Fisher's Exact Test for Count Data

data:  mem_sgp_table
p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 1302.151      Inf
sample estimates:
odds ratio 
       Inf 

p.adjusted =  6.626282e-59  (method =  bonferroni )

Results - Regulation-augmented SignalGP (both regulation & global memory) vs. regulation-only SignalGP (no global memory access)

         success fail
both         100    0
reg-only     100    0

    Fisher's Exact Test for Count Data

data:  reg_sgp_table
p-value = 1
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
   0 Inf
sample estimates:
odds ratio 
         0 

p.adjusted =  1  (method =  bonferroni )

Results - Regulation-augmented SignalGP (both regulation & global memory) vs. regulation- and memory-disabled SignalGP

     success fail
both     100    0
none       0  100

    Fisher's Exact Test for Count Data

data:  none_sgp_table
p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 1302.151      Inf
sample estimates:
odds ratio 
       Inf 

p.adjusted =  6.626282e-59  (method =  bonferroni )

8 -signal task - statistical analysis of solution counts

Significant comparisons (Fisher’s exact test, p < 0.05 , bonferroni correction for multiple comparisons):

Regulation-augmented SignalGP (both regulation & global memory) vs. global-memory-only SignalGP (original version of SignalGP)
Regulation-augmented SignalGP (both regulation & global memory) vs. regulation- and memory-disabled SignalGP control

Results - Regulation-augmented SignalGP (both regulation & global memory) vs. global-memory-only SignalGP (original version of SignalGP)

         success fail
both         100    0
mem-only       0  100

    Fisher's Exact Test for Count Data

data:  mem_sgp_table
p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 1302.151      Inf
sample estimates:
odds ratio 
       Inf 

p.adjusted =  6.626282e-59  (method =  bonferroni )

Results - Regulation-augmented SignalGP (both regulation & global memory) vs. regulation-only SignalGP (no global memory access)

         success fail
both         100    0
reg-only      99    1

    Fisher's Exact Test for Count Data

data:  reg_sgp_table
p-value = 1
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.02564066        Inf
sample estimates:
odds ratio 
       Inf 

p.adjusted =  1  (method =  bonferroni )

Results - Regulation-augmented SignalGP (both regulation & global memory) vs. regulation- and memory-disabled SignalGP

     success fail
both     100    0
none       0  100

    Fisher's Exact Test for Count Data

data:  none_sgp_table
p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 1302.151      Inf
sample estimates:
odds ratio 
       Inf 

p.adjusted =  6.626282e-59  (method =  bonferroni )

16 -signal task - statistical analysis of solution counts

Significant comparisons (Fisher’s exact test, p < 0.05 , bonferroni correction for multiple comparisons):

Regulation-augmented SignalGP (both regulation & global memory) vs. global-memory-only SignalGP (original version of SignalGP)
Regulation-augmented SignalGP (both regulation & global memory) vs. regulation- and memory-disabled SignalGP control

Results - Regulation-augmented SignalGP (both regulation & global memory) vs. global-memory-only SignalGP (original version of SignalGP)

         success fail
both          41   59
mem-only       0  100

    Fisher's Exact Test for Count Data

data:  mem_sgp_table
p-value = 5.029e-15
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 16.94046      Inf
sample estimates:
odds ratio 
       Inf 

p.adjusted =  1.508617e-14  (method =  bonferroni )

Results - Regulation-augmented SignalGP (both regulation & global memory) vs. regulation-only SignalGP (no global memory access)

         success fail
both          41   59
reg-only      46   54

    Fisher's Exact Test for Count Data

data:  reg_sgp_table
p-value = 0.5684
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.4484092 1.4831343
sample estimates:
odds ratio 
 0.8166077 

p.adjusted =  1  (method =  bonferroni )

Results - Regulation-augmented SignalGP (both regulation & global memory) vs. regulation- and memory-disabled SignalGP

     success fail
both      41   59
none       0  100

    Fisher's Exact Test for Count Data

data:  none_sgp_table
p-value = 5.029e-15
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 16.94046      Inf
sample estimates:
odds ratio 
       Inf 

p.adjusted =  1.508617e-14  (method =  bonferroni )

Takeaway

Indeed, our analyses (below), confirm our visual intuitions: conditions with access to function regulation outperform those without access to function regulation. These data imply that genetic regulation is more readily adaptable than global memory usage (as it is currently designed/implemented).

Scores by condition

Here, we visualize the raw task scores for the highest-fitness organisms from each run across all environments/conditions.

ggplot(max_fit_org_data, aes(x=condition, y=score, color=condition)) +
  geom_boxplot() +
  ylab("Score (# correct responses)") +
  scale_color_discrete(name="Condition",
                       breaks=c("regulation", "memory", "none", "both"),
                       labels=c("Regulation-only (R)", "Global-memory-only (M)", "Neither (N)", "Both (B)")) +
  scale_x_discrete(name="Condition",
                   breaks=c("regulation", "memory", "none", "both"),
                   labels=c("R", "M", "N", "B")) +
  facet_wrap(~ NUM_SIGNAL_RESPONSES, 
             scales="free_y", 
             labeller=labeller(NUM_SIGNAL_RESPONSES=label_lu)) +
 theme(legend.position="bottom", 
      axis.title.x=element_blank()) +
  ggtitle("Task Scores") +
  ggsave("./imgs/repeated-signal-scores.png", width=16,height=8)

How many generations elapse before solutions arise?

Do some conditions lead to the evolution of effective signal-response plasticity than other conditions? Here, we compare the generation at which solutions arise. Note that it is not a truly fair comparison to make between conditions where an unequal number of replicates produce solutions. Thus, these comparisons are purely exploratory, and we can make no strong claims based on them.

If we wanted to make strong claims based on the time it took for solutions to evolve, we should pick an arbitrary threshold, say 50%. Then, re-run a number of replicates for each condition for as long as it takes 50% of them to produce a solution. We could then compare the generations at which solutions evolved in the first 50% of the replicates to produce a solution.

ggplot(sol_data,
       aes(x=condition, y=update, color=condition)) +
  geom_boxplot() +
  ggtitle("Time to solution") +
  scale_color_discrete(name="Condition: ",
                       breaks=c("regulation", "memory", "none", "both"),
                       labels=c("Regulation (R)", "Global memory (M)", "Neither (N)", "Both (B)")) +
  scale_x_discrete(breaks=c("regulation", "memory", "none", "both"),
                   labels=c("R", "M", "N", "B")) +
  scale_y_continuous(name="Generation first solution evolved \n(log scale)",
                     limits=c(0, 10000), 
                     breaks=c(0, 10, 100, 1000, 10000), 
                     trans="pseudo_log") +
  facet_wrap(~ NUM_SIGNAL_RESPONSES, 
             nrow=1, 
             labeller=labeller(NUM_SIGNAL_RESPONSES=label_lu)) +
  theme(legend.position="bottom", 
        axis.title.x=element_blank()) +
  ggsave("./imgs/repeated-signal-solve-time.png",
         width=8,
         height=4)

ggplot(max_fit_org_data,
       aes(x=condition, y=update, color=condition)) +
  geom_boxplot() +
  ggtitle(paste("Time to solution \n(non-solutions assigned time of ", generations, ")", sep="")) +
  scale_color_discrete(name="Condition: ",
                       breaks=c("regulation", "memory", "none", "both"),
                       labels=c("Regulation (R)", "Global memory (M)", "Neither (N)", "Both (B)")) +
  scale_x_discrete(breaks=c("regulation", "memory", "none", "both"),
                   labels=c("R", "M", "N", "B")) +
  scale_y_continuous(name="Generation first solution evolved \n(log scale)",
                     limits=c(0, 10000), 
                     breaks=c(0, 10, 100, 1000, 10000), 
                     trans="pseudo_log") +
  facet_wrap(~ NUM_SIGNAL_RESPONSES, 
             nrow=1, 
             labeller=labeller(NUM_SIGNAL_RESPONSES=label_lu)) +
  theme(legend.position="bottom", 
        axis.text.x=element_blank(), 
        axis.title.x=element_blank()) +
  ggsave("./imgs/repeated-signal-solve-time-all.png",
         width=8,
         height=4)

Two-signal task

Comparing time to solutions evolving (only in successful replicates) in the two-signal task using a Kruskal Wallis test and, if significant, a post-hoc wilcoxon rank-sum test.

env_2_sol_data <- filter(max_fit_org_data, 
                         solution=="1" & NUM_SIGNAL_RESPONSES==2)
# First, kruskal-wallis to test for significant differences among groups
env2_kt <- kruskal.test(update ~ condition,
                        data=env_2_sol_data)
print(env2_kt)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  update by condition
## Kruskal-Wallis chi-squared = 200, df = 2, p-value < 2.2e-16

# Next, if kruskal-wallis is significant, perform a pairwise wilcoxon rank-sum test
if (env2_kt$p.value < alpha) {
  # x: response vector; g: grouping vector
  env2_wt <- pairwise.wilcox.test(x=env_2_sol_data$update, 
                                  g=env_2_sol_data$condition, 
                                  p.adjust.method=correction_method, 
                                  exact=FALSE)
  print(env2_wt)
} else {
  print("No significant difference.")
}

## 
##  Pairwise comparisons using Wilcoxon rank sum test 
## 
## data:  env_2_sol_data$update and env_2_sol_data$condition 
## 
##        regulation memory
## memory <2e-16     -     
## both   -          <2e-16
## 
## P value adjustment method: bonferroni

Solutions evolved with access to genetic regulation arise in fewer generations than thsoe evolved without access to genetic regulation.

Four-signal task

Comparing time to solutions evolving (only in successful replicates) in the four-signal task using a Kruskal Wallis test and, if significant, a post-hoc wilcoxon rank-sum test.

env_4_sol_data <- filter(max_fit_org_data, 
                         solution=="1" & NUM_SIGNAL_RESPONSES==4)
# First, kruskal-wallis to test for significant differences among groups
env4_kt <- kruskal.test(update ~ condition, 
                        data=env_4_sol_data)
print(env4_kt)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  update by condition
## Kruskal-Wallis chi-squared = 0.00065947, df = 1, p-value = 0.9795

# Next, if kruskal-wallis is significant, perform a pairwise wilcoxon rank-sum test
if (env4_kt$p.value < alpha) {
  # x: response vector; g: grouping vector
  env4_wt <- pairwise.wilcox.test(x=env_4_sol_data$update, 
                                  g=env_4_sol_data$condition, 
                                  p.adjust.method=correction_method, 
                                  exact=FALSE)
  print(env4_wt)
} else {
  print("No significant difference.")
}

## [1] "No significant difference."

There is no significant difference in time for solutions to arise in conditions with access to both genetic regulation and global memory versus access to regulation only.

Eight-signal task

Comparing time to solutions evolving (only in successful replicates) in the eight-signal task using a Kruskal Wallis test and, if significant, a post-hoc wilcoxon rank-sum test.

env_8_sol_data <- filter(max_fit_org_data,
                         solution=="1" & NUM_SIGNAL_RESPONSES==8)
# First, kruskal-wallis to test for significant differences among groups
env8_kt <- kruskal.test(update ~ condition, 
                        data=env_8_sol_data)
print(env8_kt)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  update by condition
## Kruskal-Wallis chi-squared = 0.012548, df = 1, p-value = 0.9108

# Next, if kruskal-wallis is significant, perform a pairwise wilcoxon rank-sum test
if (env8_kt$p.value < alpha) {
  # x: response vector; g: grouping vector
  env8_wt <- pairwise.wilcox.test(x=env_8_sol_data$update, 
                                  g=env_8_sol_data$condition, 
                                  p.adjust.method=correction_method, 
                                  exact=FALSE)
  print(env8_wt)
} else {
  print("No significant difference.")
}

## [1] "No significant difference."

There is no significant difference in time for solutions to arise in conditions with access to both genetic regulation and global memory versus access to regulation only.

Sixteen-signal task

Comparing time to solutions evolving (only in successful replicates) in the eight-signal task using a Kruskal Wallis test and, if significant, a post-hoc wilcoxon rank-sum test.

# First, kruskal-wallis to test for significant differences among groups
env_16_sol_data <- filter(max_fit_org_data, 
                          solution=="1" & NUM_SIGNAL_RESPONSES==16)
env16_kt <- kruskal.test(update ~ condition, 
                         data=env_16_sol_data)
print(env16_kt)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  update by condition
## Kruskal-Wallis chi-squared = 1.1662, df = 1, p-value = 0.2802

# Next, if kruskal-wallis is significant, perform a pairwise wilcoxon rank-sum test
if (env16_kt$p.value < alpha) {
  # x: response vector; g: grouping vector
  env16_wt <- pairwise.wilcox.test(x=env_16_sol_data$update, 
                                   g=env_16_sol_data$condition, 
                                   p.adjust.method=correction_method, 
                                   exact=FALSE)
  print(env16_wt)
} else {
  print("No significant difference.")
}

## [1] "No significant difference."

There is no significant difference in time for solutions to arise in conditions with access to both genetic regulation and global memory versus access to regulation only.

Teasing apart evolved digital organism strategies

Do solutions rely on genetic regulation or global memory access for signal-response plasticity?

Here, we take a closer at the strategies employed by solutions evolved across environment complexities. For each evolved solution, we independently knocked out (disabled) function regulation and global memory access and measured the fitness effects knocking each out. If a knockout resulted in a decrease in fitness, we labeled that organism as relying on that functionality (global memory or genetic regulation) for success.

The graph below gives the proportion of solutions that rely exclusively on regulation, exclusively on global memory, on both global memory and regulation, and on neither functionality.

# Do a little bit of data processing...
reg_and_mem_solutions <- filter(max_fit_org_data, 
                                solution=="1")
get_strategy <- function(use_reg, use_mem) {
  if (use_reg=="0" && use_mem=="0") {
    return("use neither")
  } else if (use_reg=="0" && use_mem=="1") {
    return("use memory")
  } else if (use_reg=="1" && use_mem=="0") {
    return("use regulation")
  } else if (use_reg=="1" && use_mem=="1") {
    return("use both")
  } else {
    return("UNKNOWN")
  }
}

# Specify experimental conditions (to make labeling easier).
reg_and_mem_solutions$strategy <- 
  mapply(get_strategy,
         reg_and_mem_solutions$relies_on_regulation,
         reg_and_mem_solutions$relies_on_global_memory)

reg_and_mem_solutions$strategy <-
  factor(reg_and_mem_solutions$strategy, 
         levels=c("use regulation", 
                  "use memory", 
                  "use neither", 
                  "use both"))

ggplot(reg_and_mem_solutions, 
       mapping=aes(x=NUM_SIGNAL_RESPONSES, fill=strategy)) +
  geom_bar(position="fill", stat="count") +
  geom_text(stat='count', 
            mapping=aes(label=..count..), 
            position=position_fill(vjust=-0.03)) +
  ylab("% of Solutions") +
  xlab("Environment Complexity") +
  scale_fill_discrete(name="Strategy:", 
                      breaks=c("use regulation", 
                               "use memory", 
                               "use neither", 
                               "use both"),
                      labels=c("Use regulation (only)", 
                               "Use global memory (only)", 
                               "Use neither", 
                               "Use both")) +
  facet_wrap(~condition) +
  theme(legend.position = "bottom") +
  ggsave("./imgs/repeated-signal-strategies.png",
         width=8,
         height=4)

We can see that in conditions where organisms have access to regulation, evolved solutions rely exclusively on regulation for signal-response plasticity. In conditions where memory is the only mechanism for solving the repeated signal task, we see that all evolved solutions (only 1) rely exclusively on global memory access for signal-response plasticity.

Are evolved programs relying on stochastic strategies?

To confirm that evolved organisms are not relying on stochastic approaches to solve the repeated signal task, we tested the most fit individual from each replicate at the end of each run three times. If an organism’s phenotype was not identical across each of the three trials, we labeled is as using a stochastic strategy.

max_fit_org_data$stochastic <- factor(max_fit_org_data$stochastic,
                                      levels=c(0, 1))
ggplot(max_fit_org_data, aes(x=condition, fill=stochastic)) +
  geom_bar() +
  ggtitle("Stochastic Strategies?") +
  ylab("# Replicates") +
  ylim(0, replicates) +
  scale_fill_discrete(name="Strategy",
                       limits=c(0, 1),
                       labels=c("Deterministic", "Stochastic")) +
  scale_x_discrete(name="Condition",
                   breaks=c("regulation", "memory", "none", "both"),
                   labels=c("Regulation", "Global\nMemory", "Neither", "Both")) +
  facet_wrap(~ NUM_SIGNAL_RESPONSES, 
             labeller=labeller(NUM_SIGNAL_RESPONSES=label_lu))

We see no evidence of evolved organisms relying on stochastic strategies to solve the repeated signal task: all organisms responded consistently across trials. Note, this is unsurprising, as we did not give programs access to instructions capable of generating random values and ensured that the version of SignalGP virtual hardware used in this work operated in a deterministic manner.

What forms of genetic regulation do evolved digital organisms rely on?

We used two approaches to tease apart forms of genetic regulation that evolved SignalGP digital organisms rely on:

We traced program execution step-by-step (including each function’s regulatory state) during evaluation on the repeated signal task and extracted regulatory interactions between executing functions as a directed graph. We draw a directed edge from function A to function B if B’s regulatory state changes while A is executing. We label each edge as up- or down-regulation.
We independently knockout up-regulation and down-regulation and record the fitness of knockout-variants. If fitness decreases when a target functionality is knocked out, we categorize the program as relying on that functionality.

The knockout data more directly indicates which forms of regulation a digital organism relies on, as the gene regulation networks may include neutral and non-adaptive regulatory interactions.

Gene regulatory network edges

Below we give the distribution of edge types found in all evolved digital organisms (from both the regulation-augmented treatment and regulation-only control) that solve the repeated signal task using genetic regulation.

# The regulatory network data is keyed by run id (or the run's random number seed)
temp <- filter(max_fit_org_data, 
               solution=="1" & (condition=="both" | condition=="regulation"))
# Make a lookup function to get each run's environment complexity level.
get_num_sig_resps <- function(seed) {
  return(filter(temp, SEED==seed)$NUM_SIGNAL_RESPONSES)
}

# Process/cleanup the network data
melted <- melt(filter(reg_network_data, 
                      run_id %in% temp$SEED), 
                variable.name = "reg_edge_type",
                value.name = "reg_edges_cnt",
                measure.vars=c("repressed_edges_cnt", "promoted_edges_cnt"))

melted$NUM_SIGNAL_RESPONSES <- mapply(get_num_sig_resps, 
                                      melted$run_id)
melted$NUM_SIGNAL_RESPONSES <- factor(melted$NUM_SIGNAL_RESPONSES)

ggplot(melted, aes(x=NUM_SIGNAL_RESPONSES, y=reg_edges_cnt, color=reg_edge_type)) +
  geom_boxplot() +
  xlab("Environmental Complexity") +
  ylab("# Edges") +
  scale_color_discrete(name="Edge type:", 
                       limits=c("repressed_edges_cnt", "promoted_edges_cnt"),
                       labels=c("Repressing edges", "Promoting edges")) +
  theme(legend.position="bottom",
        legend.text=element_text(size=9),
        legend.title=element_text(size=10),
        axis.title.x=element_text(size=12)) +
  ggsave("./imgs/repeated-signal-regulation-edges.png", width=4, height=3)

for (env in env_complexities) {
  print(paste("Environment", env))
  wt <- wilcox.test(formula=reg_edges_cnt ~ reg_edge_type, data=filter(melted, NUM_SIGNAL_RESPONSES==env), exact=FALSE, conf.int=TRUE)
  print(wt)
}

## [1] "Environment 2"
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  reg_edges_cnt by reg_edge_type
## W = 26202, p-value = 4.99e-08
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  0.9999591 1.0000792
## sample estimates:
## difference in location 
##               0.999981 
## 
## [1] "Environment 4"
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  reg_edges_cnt by reg_edge_type
## W = 31786, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  2.000072 3.000052
## sample estimates:
## difference in location 
##               2.999982 
## 
## [1] "Environment 8"
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  reg_edges_cnt by reg_edge_type
## W = 38308, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  6.000056 6.999951
## sample estimates:
## difference in location 
##               6.000049 
## 
## [1] "Environment 16"
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  reg_edges_cnt by reg_edge_type
## W = 7569, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  14.00004 14.99996
## sample estimates:
## difference in location 
##               14.99996

The difference between the number of up- and down-regulation edges is statistically significant for each environmental complexity, and the magnitude of the difference increases as the number of repeated signals increases.

Knockout experiments

How successful programs rely on:

neither up- nor down-regulation?
either up- or down-regulation interchangeably?
only on down-regulation?
only on up-regulation?

# Limit the genotypes we're looking at to just solutions from the 'both' and 'regulation' conditions.
regulation_orgs <- filter(max_fit_org_data, 
                          solution=="1" & 
                            (condition=="both" | condition=="regulation"))

Note that there are 686 total organisms represented in the graphs below.

# Data processing/clean up
get_reg_relies_on <- function(uses_down, uses_up, uses_reg) {
  if        (uses_down == "0" && uses_up == "0" && uses_reg == "0") {
    return("neither")
  } else if (uses_down == "0" && uses_up == "0" && uses_reg == "1") {
    return("either")
  } else if (uses_down == "0" && uses_up == "1") {
    return("up-regulation-only")
  } else if (uses_down == "1" && uses_up == "0") {
    return("down-regulation-only")
  } else if (uses_down == "1" && uses_up == "1") {
    return("up-and-down-regulation")
  } else {
    return("UNKNOWN")
  }
}

regulation_orgs$regulation_type_usage <- 
  mapply(get_reg_relies_on, 
         regulation_orgs$relies_on_down_reg,
         regulation_orgs$relies_on_up_reg,
         regulation_orgs$relies_on_regulation)

regulation_orgs$regulation_type_usage <-
  factor(regulation_orgs$regulation_type_usage, 
         levels=c("neither", 
                  "either",
                  "up-regulation-only", 
                  "down-regulation-only", 
                  "up-and-down-regulation"))

ggplot(regulation_orgs, aes(x=regulation_type_usage, fill=regulation_type_usage)) +
  geom_bar() +
  geom_text(stat="count", 
          aes(label=..count..), 
          position=position_dodge(0.9), vjust=0) +
  ylim(0, 2*replicates) +
  scale_x_discrete(name="Regulation Usage",
                   limits=c("neither", 
                            "either",
                            "up-regulation-only", 
                            "down-regulation-only", 
                            "up-and-down-regulation"),
                   labels=c("None", 
                            "Either",
                            "Up\n(only)", 
                            "Down\n(only)", 
                            "Both")) +
  scale_fill_discrete(name="Regulation Usage",
                      limits=c("neither", 
                               "either",
                               "up-regulation-only", 
                               "down-regulation-only", 
                               "up-and-down-regulation"),
                      labels=c("None", 
                               "Either",
                               "Up\n(only)", 
                               "Down\n(only)", 
                               "Both")) +
  facet_wrap(~NUM_SIGNAL_RESPONSES) +
  theme(legend.position="none") +
  ggtitle("Regulation usage by environment") +
  ggsave("./imgs/rst-reg-usage-by-env.png", width=8, height=6)

ggplot(regulation_orgs, aes(x=regulation_type_usage, fill=regulation_type_usage)) +
  geom_bar() +
  geom_text(stat="count",
            aes(label=..count..), 
            position=position_dodge(0.9), vjust=0) +
  ylim(0, 2*replicates*length(env_complexities)) +
  scale_x_discrete(name="Regulation Usage",
                   limits=c("neither",
                            "either",
                            "up-regulation-only", 
                            "down-regulation-only", 
                            "up-and-down-regulation"),
                   labels=c("None", 
                            "Either", 
                            "Up\n(only)", 
                            "Down\n(only)", 
                            "Both")) +
  scale_fill_discrete(name="Regulation Usage",
                      limits=c("neither", 
                               "either", 
                               "up-regulation-only", 
                               "down-regulation-only", 
                               "up-and-down-regulation"),
                      labels=c("None", 
                               "Up\n(only)", 
                               "Down\n(only)", 
                               "Both")) +
  theme(legend.position="none") +
  ggtitle("Regulation usage across all environments") +
  ggsave("./imgs/rst-reg-usage-total.png", width=8, height=6)

Overall (aggregating all environment levels), the majority of digital organisms rely exclusively on down-regulation, followed by organisms that rely on both up- and down-regulation. There were only four programs that relied exclusively on up-regulation and this was found only in the two-signal version of the repeated signal task.

Case study: visualizing regulation in an evolved digital organism

Let’s take a closer look at the behavioral/regulatory profile of a representative digital organism that solves the four-signal version of the repeated signal task.

trace_id <- 301003

Specifically, we’ll be looking at the solution evolved in run id 3.0100310^{5}.

# Extract relevant information about solution of interest.
org_info <- filter(max_fit_org_data, 
                   SEED==trace_id)
num_envs <- org_info$NUM_SIGNAL_RESPONSES
score <- org_info$score
condition <- org_info$condition
is_sol <- org_info$solution
num_modules <- org_info$num_modules

# Load trace file associated with this solution.
trace_file <- paste(data_dir, "traces/trace_update-10000_run-id-", trace_id, ".csv", sep="")
trace_data <- read.csv(trace_file, na.strings="NONE")
trace_data$similarity_score <- 1 - trace_data$match_score

# Data cleanup/summarizing
trace_data$triggered <- (trace_data$env_signal_closest_match == trace_data$module_id) & (trace_data$cpu_step == "0")
trace_data$is_running <- trace_data$is_running > 0 | trace_data$triggered | trace_data$is_cur_responding_function == "1"

# Extract which modules responded and when
response_time_steps <- levels(factor(filter(trace_data, is_cur_responding_function=="1")$time_step))
responses_by_env_update <- list()
for (t in response_time_steps) {
  env_update <- levels(factor(filter(trace_data, time_step==t)$env_cycle))
  if (env_update %in% names(responses_by_env_update)) {
    if (as.integer(t) > as.integer(responses_by_env_update[env_update])) {
      responses_by_env_update[env_update] = t
    } 
  } else {
    responses_by_env_update[env_update] = t
  }
}

# Build a list of modules that were triggered & those that responded to a signal
triggered_ids <- 
  levels(factor(filter(trace_data, triggered==TRUE)$module_id))

response_ids <- 
  levels(factor(filter(trace_data, is_cur_responding_function=="1")$module_id))

trace_data$is_ever_active <- 
  trace_data$is_ever_active=="1" | 
  trace_data$is_running | 
  trace_data$module_id %in% triggered_ids | 
  trace_data$module_id %in% response_ids

trace_data$is_cur_responding_function <- 
  trace_data$is_cur_responding_function=="1" &
  trace_data$time_step %in% responses_by_env_update

# function to categorize each regulatory state as promoted, neutral, or repressed
# remember, the regulatory states in our data file operate with tag DISTANCE in mind
# as opposed to tag similarity, so: promotion => reg < 0, repression => reg > 0
categorize_reg_state <- function(reg_state) {
  if (reg_state == 0) {
    return("neutral")
  } else if (reg_state < 0) {
    return("promoted")
  } else if (reg_state > 0) {
    return("repressed")
  } else {
    return("unknown")
  }
}
trace_data$regulator_state_simplified <- 
  mapply(categorize_reg_state, trace_data$regulator_state)

# Omit all in-active rows
# Extract only rows that correspond with modules that were active during evaluation.
active_data <- filter(trace_data, is_ever_active==TRUE)

# Do some work to have module ids appear in a nice order along axis.
active_module_ids <- levels(factor(active_data$module_id))
active_module_ids <- as.integer(active_module_ids)
module_id_map <- as.data.frame(active_module_ids)
module_id_map$order <- order(module_id_map$active_module_ids) - 1
get_module_x_pos <- function(module_id) {
  return(filter(module_id_map, active_module_ids==module_id)$order)
}
active_data$mod_id_x_pos <- mapply(get_module_x_pos, active_data$module_id)

Function regulation over time

First, let’s omit all non-active funcitons.

out_name <- paste("./imgs/case-study-trace-id-", 
                  trace_id, 
                  "-regulator-state-vertical.pdf", 
                  sep="")
reg_plot <- 
  ggplot(active_data, 
         aes(x=mod_id_x_pos, y=time_step, fill=regulator_state_simplified)) +
  scale_fill_viridis(name="Regulation:",
                     limits=c("promoted", 
                              "neutral", 
                              "repressed"),
                     labels=c("+", 
                              "\u00F8", 
                              "-"),
                     discrete=TRUE, direction=-1) +
  scale_x_discrete(name="Function ID",
                   limits=seq(0, length(active_module_ids)-1, 1),
                   labels=active_module_ids) +
  scale_y_discrete(name="Time Step",
                   limits=seq(0, 30, 5)) +
  # Background tile color
  geom_tile(color="white",
            size=0.2,
            width=1,
            height=1,
            alpha=0.75) +
  # Highlight actively running functions
  geom_tile(data=filter(active_data, is_running==TRUE | triggered==TRUE), 
            color="black", size=0.8, width=1, height=1) +
  # Environment delimiters
  geom_hline(yintercept=filter(active_data, cpu_step==0)$time_step - 0.5, 
             size=1.25, color="black") +
  # Draw points on triggered modules
  geom_point(data=filter(active_data, triggered==TRUE),
             shape=8, colour="black", fill="white", stroke=0.5, size=1.5,
             position=position_nudge(x = 0, y = 0.01)) +
  geom_point(data=filter(active_data, is_cur_responding_function==TRUE),
             shape=21, colour="black", fill="white", stroke=0.5, size=1.5,
             position=position_nudge(x = 0, y = 0.01)) +
  theme(legend.position = "top",
        legend.text = element_text(size=9),
        legend.title=element_text(size=8),
        axis.text.y = element_text(size=8),
        axis.title.y = element_text(size=8),
        axis.text.x = element_text(size=8),
        axis.title.x = element_text(size=8),
        plot.title = element_text(hjust = 0.5)) +
  ggsave(out_name, height=3.5, width=2.25)
reg_plot

For fun, we can look at the same visualization without omitting unexecuted functions.

reg_plot <- 
  ggplot(trace_data, 
         aes(x=module_id, y=time_step, fill=regulator_state_simplified)) +
  scale_fill_viridis(name="Regulation:",
                     limits=c("promoted", 
                              "neutral", 
                              "repressed"),
                     labels=c("+", 
                              "\u00F8", 
                              "-"),
                     discrete=TRUE, direction=-1) +
  xlab("Function ID") +
  scale_y_discrete(name="Time Step",
                   limits=seq(0, 30, 5)) +
  # Background tile color
  geom_tile(color="white",
            size=0.2,
            width=1,
            height=1,
            alpha=0.75) +
  # Highlight actively running functions
  geom_tile(data=filter(trace_data, is_running==TRUE | triggered==TRUE), 
            color="black", size=0.8, width=1, height=1) +
  # Environment delimiters
  geom_hline(yintercept=filter(trace_data, cpu_step==0)$time_step - 0.5, 
             size=1.25, color="black") +
  # Draw points on triggered modules
  geom_point(data=filter(trace_data, triggered==TRUE),
             shape=8, colour="black", fill="white", stroke=0.5, size=1.5,
             position=position_nudge(x = 0, y = 0.01)) +
  geom_point(data=filter(trace_data, is_cur_responding_function==TRUE),
             shape=21, colour="black", fill="white", stroke=0.5, size=1.5,
             position=position_nudge(x = 0, y = 0.01)) +
  theme(legend.position = "top",
        legend.text = element_text(size=9),
        legend.title=element_text(size=8),
        axis.text.y = element_text(size=8),
        axis.title.y = element_text(size=8),
        axis.text.x = element_text(size=8),
        axis.title.x = element_text(size=8),
        plot.title = element_text(hjust = 0.5))
reg_plot

Environmental signal tag-match score over time

Again, we’ll omit unexecuted functions.

out_name <- paste("./imgs/case-study-trace-id-", trace_id, "-similarity-score.pdf", sep="")
match_plot <- ggplot(active_data, 
                     aes(x=mod_id_x_pos, 
                         y=time_step, 
                         fill=similarity_score)) +
  scale_fill_viridis(option="plasma",
                     name="Score:  ",
                     limits=c(0, 1.0),
                     breaks=c(0, 0.25, 0.50, 0.75, 1.0),
                     labels=c("0%", "25%", "50%", "75%", "100%")) +
  scale_x_discrete(name="Function ID",
                   limits=seq(0, length(active_module_ids)-1, 1),
                   labels=active_module_ids) +
  scale_y_discrete(name="Time Step",
                   limits=seq(0, 30, 10)) +
  # Background
  geom_tile(color="white", 
            size=0.2, 
            width=1, 
            height=1) +
  # Module is-running highlights
  geom_tile(data=filter(active_data, is_running==TRUE | triggered==TRUE), 
            color="black", 
            width=1, 
            height=1, 
            size=0.8) +
  # Environment delimiters
  geom_hline(yintercept=filter(active_data, cpu_step==0)$time_step-0.5, size=1) +
  # Draw points on triggered modules
  geom_point(data=filter(active_data, triggered==TRUE),
             shape=8, colour="black", fill="white", stroke=0.5, size=1.5,
             position=position_nudge(x = 0, y = 0.01)) +
  geom_point(data=filter(active_data, is_cur_responding_function==TRUE),
             shape=21, colour="black", fill="white", stroke=0.5, size=1.5,
             position=position_nudge(x = 0, y = 0.01)) +
  theme(legend.position = "top",
        legend.text = element_text(size=8),
        axis.text.y = element_text(size=8),
        axis.text.x = element_text(size=8)) +
  guides(fill = guide_colourbar(barwidth = 10, barheight = 0.5)) +
  ggtitle("Function Match Scores") +
  ggsave(out_name, height=3, width=4)
match_plot

match_plot <- ggplot(trace_data, 
                     aes(x=module_id, 
                         y=time_step, 
                         fill=similarity_score)) +
  scale_fill_viridis(option="plasma",
                     name="Score:  ",
                     limits=c(0, 1.0),
                     breaks=c(0, 0.25, 0.50, 0.75, 1.0),
                     labels=c("0%", "25%", "50%", "75%", "100%")) +
  xlab("Function ID") +
  scale_y_discrete(name="Time Step",
                   limits=seq(0, 30, 10)) +
  # Background
  geom_tile(color="white", 
            size=0.2, 
            width=1, 
            height=1) +
  # Module is-running highlights
  geom_tile(data=filter(trace_data, is_running==TRUE | triggered==TRUE), 
            color="black", 
            width=1, 
            height=1, 
            size=0.8) +
  # Environment delimiters
  geom_hline(yintercept=filter(trace_data, cpu_step==0)$time_step-0.5, size=1) +
  # Draw points on triggered modules
  geom_point(data=filter(trace_data, triggered==TRUE),
             shape=8, colour="black", fill="white", stroke=0.5, size=1.5,
             position=position_nudge(x = 0, y = 0.01)) +
  geom_point(data=filter(trace_data, is_cur_responding_function==TRUE),
             shape=21, colour="black", fill="white", stroke=0.5, size=1.5,
             position=position_nudge(x = 0, y = 0.01)) +
  theme(legend.position = "top",
        legend.text = element_text(size=8),
        axis.text.y = element_text(size=8),
        axis.text.x = element_text(size=8)) +
  guides(fill = guide_colourbar(barwidth = 10, barheight = 0.5)) +
  ggtitle("Function Match Scores")
match_plot

Evolved regulatory network

We use the igraph package to draw this organism’s gene regulatory network.

# Networks!
graph_nodes_loc <- paste(data_dir, 
                         "igraphs/reg_graph_id-", trace_id, "_nodes.csv", sep="")
graph_edges_loc <- paste(data_dir,
                         "igraphs/reg_graph_id-", trace_id, "_edges.csv", sep="")
graph_nodes_data <- read.csv(graph_nodes_loc, na.strings="NONE")
graph_edges_data <- read.csv(graph_edges_loc, na.strings="NONE")

network <- graph_from_data_frame(d=graph_edges_data, 
                                 vertices=graph_nodes_data, 
                                 directed=TRUE)

# Setup edge styling
E(network)$color[E(network)$type == "promote"] <- "#FCE640"
E(network)$lty[E(network)$type == "promote"] <- 1
E(network)$color[E(network)$type == "repress"] <- "#441152"
E(network)$lty[E(network)$type == "repress"] <- 1

network_out_name <- paste("./imgs/case-study-id-", 
                          trace_id, 
                          "-network.svg", sep="")

draw_network <- function(net, write_out, out_name) {
  if (write_out) {
    svg(out_name, width=4,height=1.5)
    # bottom, left, top, right
    par(mar=c(0.2,0,1,0.5))
  }
  plot(net, 
     # main="Regulation Network",
     edge.arrow.size=0.4, 
     edge.arrow.width=0.75,
     edge.width=2,
     vertex.size=40,
     vertex.label.cex=0.65,
     curved=TRUE,
     vertex.color="grey99",
     vertex.label.color="black",
     vertex.label.family="sans",
     layout=layout.circle(net))
  legend(x = "bottomleft",      ## position, also takes x,y coordinates
         legend = c("Promoted", "Repressed"),
         pch = 19,              ## legend symbols see ?points
         col = c("#FCE640", "#441152"),
         bty = "n",
         border="black",
         xpd=TRUE,
         title = "Edges")
  if (write_out) {
    dev.flush()
    dev.off()
  }
}

draw_network(network, TRUE, network_out_name)

## quartz_off_screen 
##                 2

draw_network(network, FALSE, "")

We can see that this organism primarily relies on self-repression genes for signal-response plasticity.