This document is a companion to our submission to the 2020 Artificial Life conference, ‘Genetic regulation facilitates the evolution of signal-response plasticity in digital organisms’. Here, we give an overview of the directional signal diagnostic task, and we provide our data analyses for related experiments. All of our source code for statistical analyses and data visualizations is embedded in this document. Please file an issue or make a pull request on github to report any mistakes or request more explanation.

Overview

# Experimental parameters referenced in-text all in one convenient place.
time_steps <- 128
replicates <- 100
population_size <- 1000
generations <- 10000
env_complexities <- c(4)

The directional signal diagnostic task is a more challenging version of the repeated signal diagnostic. The directional signal task requires organisms to not only track that the environment changed, but also respond to how the environment changed. Specifically, we added a second environmental signal that shifts the required response backward to the ‘previous’ response. For example, if response-three is currently required then a forward-signal would indicate that response-four is required next, while a backward-signal would instead indicate that a response-two is required next. In contrast to the repeated signal task where there is one possible sequence of environmental transitions, there are \(2^K\) possible environmental sequences where \(K\) is the number of times the environment changes. We expect that the simple regulation that evolved to solve the original repeated signal task is insufficient to cope with the branching nature of possible response sequences in the directional signal task.

We intend to use this task to explore whether a more challenging task (the directional signal task) promotes the evolution of more complex regulatory networks than a simpler task (the repeated signal task).

We evolved 100 replicate populations of 1000 SignalGP digital organisms for 10^{4} generations on the directional signal task where \(K\) (the number of possible responses and the number of times the environment changes) is equal to four. Thus, 16 response sequences are possible.

This document looks only at the digital organisms evolved in the directional signal task. For comparisons of directional signal regulatory networks and repeated signal regulatory networks, see ./regulatory-network-comparison.html.

Analysis Dependencies

library(ggplot2)   # (Wickham, 2016)
library(tidyr)     # (Wickham and Henry, 2020)
library(dplyr)     # (Wickham et al., 2020)
library(reshape2)  # (Wickham, 2007)
library(cowplot)   # (Wilke, 2019)
library(viridis)   # (Garnier, 2018)
library(igraph)    # (Csardi and Nepusz, 2006)
library(patchwork) # (Pederson, 2020)

These analyses were conducted in the following computing environment:

print(version)
##                _                           
## platform       x86_64-apple-darwin15.6.0   
## arch           x86_64                      
## os             darwin15.6.0                
## system         x86_64, darwin15.6.0        
## status                                     
## major          3                           
## minor          6.2                         
## year           2019                        
## month          12                          
## day            12                          
## svn rev        77560                       
## language       R                           
## version.string R version 3.6.2 (2019-12-12)
## nickname       Dark and Stormy Night

Setup

Load data, initial data cleanup, configure some global settings.

data_dir <- "../data/balanced-reg-mult/reg-graph-comps/dir-sig/"
max_fit_orgs_loc <- paste(data_dir, "max_fit_orgs.csv", sep="")
data <- read.csv(max_fit_orgs_loc, na.strings="NONE")
data <- subset(data, select = -c(program))

# Specify factors (not all are relevant to these runs).
data$matchbin_thresh <-
  factor(data$matchbin_thresh,
         levels=c(0, 25, 50, 75))
data$NUM_ENV_UPDATES <-
  factor(data$NUM_ENV_UPDATES,
         levels=c(2, 4, 8, 16, 32))
data$NUM_ENV_STATES <-
  factor(data$NUM_ENV_STATES,
         levels=c(2, 4, 8, 16, 32))
data$TAG_LEN <-
  factor(data$TAG_LEN,
         levels=c(32, 64, 128))

# Define function to summarize regulation/memory configurations.
get_con <- function(reg, mem) {
  if (reg == "0" && mem == "0") {
    return("none")
  } else if (reg == "0" && mem=="1") {
    return("memory")
  } else if (reg=="1" && mem=="0") {
    return("regulation")
  } else if (reg=="1" && mem=="1") {
    return("both")
  } else {
    return("UNKNOWN")
  }
}

# Specify experimental condition for each datum.
data$condition <-
  mapply(get_con,
         data$USE_FUNC_REGULATION,
         data$USE_GLOBAL_MEMORY)
data$condition <-
  factor(data$condition,
         levels=c("regulation", "memory", "none", "both"))

# Settings for statistical analyses.
alpha <- 0.05
correction_method <- "bonferroni"

# Configure our default graphing theme
theme_set(theme_cowplot())

How many solutions evolve?

A solution is a digital organism capable of correctly responding to all environmental signals in all possible sequences of environmental sequences.

data$solution <- factor(data$solution, levels=c(0, 1))
ggplot(data, aes(x=solution, fill=solution)) +
  geom_bar() +
  geom_text(stat="count",
            mapping=aes(label=..count..),
            position=position_dodge(0.9), vjust=0) +
  scale_x_discrete(name="Solution evolved?",
                   breaks=c(0, 1),
                   labels=c("No", "Yes")) +
  ylim(0, 102) +
  ylab("# replicates") +
  ggtitle("Solution counts") +
  theme(legend.position="none") +
  ggsave("./imgs/directional-signal-solultion-cnts.png", width=8,height=3)

We see that all replicates (1) yielded a program capable of perfectly solving the directional signal task.

How long did it take for solutions to arise?

Not comparing this to anything, just curious.

sol_data <- filter(data, solution==1)
ggplot(sol_data, aes(y=update)) +
  geom_boxplot() +
  ggtitle("Time to solution") +
  ylab("Generation of first solution\n(log scale)") +
  scale_y_continuous(limits=c(0, 10000), breaks=c(0, 10, 100, 1000, 10000), trans="pseudo_log") +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank())

Case study: visualizing regulation in an evolved digital organism

Let’s take a closer look at the behavioral/regulatory profile of a representative digital organism that solves the four-signal version of the repeated signal task.

# Set trace/test id's we want to use as case study
trace_id <- 4020
env_seq_id <- 0
generation <- generations

# Extract some organism information
org_info <- filter(data, SEED==trace_id)
num_env_states <- org_info$NUM_ENV_STATES
num_env_updates <- org_info$NUM_ENV_UPDATES
score <- org_info$aggregate_score
condition <- org_info$condition
is_sol <- org_info$solution

trace_file <- paste(data_dir, "traces/",
                    "trace_update-", generation,
                    "_run-id-", trace_id, ".csv", sep="")
trace_data <- read.csv(trace_file, na.strings="NONE")
trace_data$left_similarity_score <- 1 - trace_data$X0_match_score
trace_data$right_similarity_score <- 1 - trace_data$X1_match_score

trace_data$triggered <- (trace_data$is_match_cur_dir=="1") &
                        (trace_data$cpu_step == "0")

trace_data$is_running <- trace_data$is_running > 0 |
                         trace_data$triggered |
                         trace_data$is_cur_responding_module == "1"

# Extract only trace information for this environment sequence.
test_data <- filter(trace_data, test_id==env_seq_id)

# correct response ids
response_time_steps <-
  levels(factor(filter(test_data, is_cur_responding_module=="1")$time_step))
responses_by_env_update <- list()
for (t in response_time_steps) {
  env_update <- levels(factor(filter(test_data, time_step==t)$env_update))
  if (env_update %in% names(responses_by_env_update)) {
    if (as.integer(t) > as.integer(responses_by_env_update[env_update])) {
      responses_by_env_update[env_update] = t
    }
  } else {
    responses_by_env_update[env_update] = t
  }
}

# Build a list of modules that were triggered
triggered_ids <-
  levels(factor(filter(test_data, triggered==TRUE)$module_id))
response_ids <-
  levels(factor(filter(test_data, is_cur_responding_module=="1")$module_id))

test_data$is_ever_active <- test_data$is_ever_active=="1" |
                              test_data$is_running |
                              test_data$module_id %in% triggered_ids |
                              test_data$module_id %in% response_ids

test_data$is_cur_responding_module <- test_data$is_cur_responding_module=="1" &
                                        test_data$time_step %in% responses_by_env_update

categorize_reg_state <- function(reg_state) {
  if (reg_state == 0) {
    return("neutral")
  } else if (reg_state < 0) {
    return("promoted")
  } else if (reg_state > 0) {
    return("repressed")
  } else {
    return("unknown")
  }
}
test_data$regulator_state_simplified <-
  mapply(categorize_reg_state,
         test_data$regulator_state)

# Filter down to only rows that correspond with modules that were active during evaluation.
test_data <- filter(test_data, is_ever_active==TRUE)

# Do some work to have module ids appear in a nice order along axis.
active_module_ids <- levels(factor(test_data$module_id))
active_module_ids <- as.integer(active_module_ids)
module_id_map <- as.data.frame(active_module_ids)
module_id_map$order <- order(module_id_map$active_module_ids) - 1
get_module_x_pos <- function(module_id) {
  return(filter(module_id_map, active_module_ids==module_id)$order)
}
test_data$mod_id_x_pos <- mapply(get_module_x_pos, test_data$module_id)

Function regulation over time

# Plot!
max_y <- 50
out_name <-
  paste("./imgs/case-study-trace-id-", trace_id, "-env-id-", env_seq_id, "-regulator-state.pdf", sep="")
reg_state_plot <-
  ggplot(test_data, aes(x=mod_id_x_pos,
                        y=time_step,
                        fill=regulator_state_simplified)) +
  scale_fill_viridis(name="State:",
                     limits=c("promoted",
                              "neutral",
                              "repressed"),
                     labels=c("Promoted",
                              "Neutral",
                              "Repressed"),
                     discrete=TRUE,
                     direction=-1) +
  scale_x_discrete(name="Function ID",
                   limits=seq(0, length(active_module_ids)-1, 1),
                   labels=active_module_ids) +
  scale_y_discrete(name="Time Step",
                     limits=seq(0, max_y, 10)) +
  # Background
  geom_tile(color="white",
            size=0.2,
            width=1,
            height=1,
            alpha=0.75) +
  # Module is-running highlights
  geom_tile(data=filter(test_data, is_running==TRUE),
            color="black",
            width=1,
            height=1,
            size=0.25) +
  # Environment delimiters
  geom_hline(yintercept=filter(test_data, cpu_step==0)$time_step - 0.5,
             size=0.5) +
  # Draw points on triggered modules
  geom_point(data=filter(test_data, triggered==TRUE),
             shape=8, colour="black", fill="white", stroke=0.25, size=0.5,
             position=position_nudge(x = 0, y = 0.01)) +
  geom_point(data=filter(test_data, is_cur_responding_module==TRUE),
             shape=21, colour="black", fill="white", stroke=0.25, size=0.75,
             position=position_nudge(x = 0, y = 0.01)) +
  theme(#legend.position = "none",
        legend.text = element_text(size=9),
        legend.title=element_text(size=10),
        axis.text.y = element_text(size=8),
        # axis.title.y = element_blank(),
        axis.title.y = element_text(size=10),
        axis.text.x = element_text(size=8),
        axis.title.x = element_text(size=10),
        plot.title = element_text(hjust = 0.5)) +
  # ggtitle("Regulation Over Time") +
  ggsave(out_name, height=4, width=2.75)
reg_state_plot

Environment signal tag-match scores over time

Forward-signal tag similarity

left_sim_plot <-
  ggplot(test_data, aes(x=mod_id_x_pos,
                        y=time_step,
                        fill=left_similarity_score)) +
  scale_fill_viridis(option="plasma",
                     name="Score:  ",
                     limits=c(0, 1.0),
                     breaks=c(0, 0.25, 0.50, 0.75, 1.0),
                     labels=c("0%", "25%", "50%", "75%", "100%")) +
  scale_x_discrete(name="Function ID",
                   limits=seq(0, length(active_module_ids)-1, 1),
                   labels=active_module_ids) +
  ylab("Time step") +
  # Background
  geom_tile(color="white",
            size=0.2,
            width=1,
            height=1) +
  # Module is-running highlights
  geom_tile(data=filter(test_data, is_running==TRUE | triggered==TRUE),
            color="black",
            width=1,
            height=1,
            size=0.8) +
  # Environment delimiters
  geom_hline(yintercept=filter(test_data, cpu_step==0)$time_step-0.5, size=1) +
  # Draw points on triggered modules
  geom_point(data=filter(test_data, triggered==TRUE),
             shape=21, colour="black", fill="white", stroke=0.33, size=0.75,
             position=position_nudge(x = 0, y = 0.01)) +
  geom_point(data=filter(test_data, is_cur_responding_module==TRUE),
             shape=21, colour="white", fill="tomato", stroke=0.33, size=0.75,
             position=position_nudge(x = 0, y = 0.01)) +
  theme(legend.position = "left",
        legend.text = element_text(size=10),
        axis.text.y = element_text(size=10),
        axis.text.x = element_text(size=10)) +
  # guides(fill = guide_colourbar(barwidth = 10, barheight = 0.5)) +
  ggtitle("Backward-signal\ntag-match score")

right_sim_plot <-
    ggplot(test_data,
           aes(x=mod_id_x_pos,
               y=time_step,
               fill=right_similarity_score)) +
      scale_fill_viridis(option="plasma",
                         name="Score:  ",
                         limits=c(0, 1.0),
                         breaks=c(0, 0.25, 0.50, 0.75, 1.0),
                         labels=c("0%", "25%", "50%", "75%", "100%")) +
      scale_x_discrete(name="Function ID",
                       limits=seq(0, length(active_module_ids)-1, 1),
                       labels=active_module_ids) +
      # Background
      geom_tile(color="white",
                size=0.2,
                width=1,
                height=1) +
      # Module is-running highlights
      geom_tile(data=filter(test_data, is_running==TRUE | triggered==TRUE),
                color="black",
                width=1,
                height=1,
                size=0.8) +
      # Environment delimiters
      geom_hline(yintercept=filter(test_data, cpu_step==0)$time_step-0.5, size=1) +
      # Draw points on triggered modules
      geom_point(data=filter(test_data, triggered==TRUE),
                 shape=21, colour="black", fill="white", stroke=0.33, size=0.75,
                 position=position_nudge(x = 0, y = 0.01)) +
      geom_point(data=filter(test_data, is_cur_responding_module==TRUE),
                 shape=21, colour="white", fill="tomato", stroke=0.33, size=0.75,
                 position=position_nudge(x = 0, y = 0.01)) +
      theme(legend.position = "none",
            legend.text = element_text(size=10),
            axis.text.y = element_blank(),
            axis.title.y = element_blank(),
            axis.text.x = element_text(size=10)) +
      # guides(fill = guide_colourbar(barwidth = 10, barheight = 0.5)) +
      ggtitle("Forward-signal\ntag-match score")

left_sim_plot | right_sim_plot

Evolved regulatory network

We use the igraph package to draw this organism’s gene regulatory network as a directed graph.

graph_nodes_loc <-
  paste(data_dir, "igraphs/reg_graph_id-", trace_id,"_env-", env_seq_id, "_nodes.csv", sep="")
graph_edges_loc <-
  paste(data_dir, "igraphs/reg_graph_id-", trace_id,"_env-", env_seq_id, "_edges.csv", sep="")
graph_nodes_data <- read.csv(graph_nodes_loc, na.strings="NONE")
graph_edges_data <- read.csv(graph_edges_loc, na.strings="NONE")

network <- graph_from_data_frame(d=graph_edges_data, vertices=graph_nodes_data, directed=TRUE)

E(network)$color[E(network)$type == "promote"] <- "#FCE640"
E(network)$lty[E(network)$type == "promote"] <- 1
E(network)$color[E(network)$type == "repress"] <- "#441152"
E(network)$lty[E(network)$type == "repress"] <- 1


# vertex_order <- as.character(c(0,  9,  10, 14, 1, 11, 5, 2))
network_out_name <- paste("./imgs/case-study-id-",
                          trace_id,
                          "-env-", env_seq_id, "-network.svg", sep="")

draw_network <- function(net, write_out, out_name, layout_order) {
  if (write_out) {
    svg(out_name, width=4,height=3)
    # bottom, left, top, right
    par(mar=c(0.2,0,1,0.5))
  }
  plot(net,
     edge.arrow.size=0.4,
     edge.arrow.width=0.75,
     edge.width=2,
     vertex.size=30,
     vertex.label.cex=1.25,
     curved=TRUE,
     vertex.color="grey99",
     vertex.label.color="black",
     vertex.label.family="sans",
     layout=layout_in_circle(net,order=layout_order))
  legend(x = "bottomleft",      ## position, also takes x,y coordinates
         legend = c("Promoted", "Repressed"),
         pch = 19,              ## legend symbols see ?points
         col = c("#FCE640", "#441152"),
         bty = "n",
         border="black",
         xpd=TRUE,
         title = "Edges")
  if (write_out) {
    dev.flush()
    dev.off()
  }
}

draw_network_paper <- function(net, write_out, out_name, layout_order) {
  if (write_out) {
    svg(out_name, width=4,height=3)
    # bottom, left, top, right
    par(mar=c(0.2,0,1,0.5))
  }
  plot(net,
     edge.arrow.size=0.4,
     edge.arrow.width=0.75,
     edge.width=2,
     vertex.size=30,
     vertex.label.cex=1.25,
     curved=TRUE,
     vertex.color="grey99",
     vertex.label.color="black",
     vertex.label.family="sans",
     layout=layout_in_circle(net,order=layout_order))
  if (write_out) {
    dev.flush()
    dev.off()
  }
}

vertex_order <- as.character(c(4, 0, 10, 6, 5, 8, 1, 2))
draw_network_paper(network, TRUE, network_out_name, vertex_order)
## quartz_off_screen 
##                 2
draw_network(network, FALSE, network_out_name, vertex_order)