This post is meant to serve as a living bestiary for nifty tools and resources that I use (in practice or aspirationally) to compose supplemental material for academic papers (or really whatever I might want supplemental material for).

Apparently, I’m just really into supplemental material; this is the second blog post on ❤️supplemental material❤️ that I’ve scraped together, and blog posts aren’t really my thing!

Most often, I see supplemental material provided as a PDF file with links to code and/or data hosted somewhere on the internet (or worse, pasted directly into the file 😱). PDFs just don’t cut it for me. The PDF is not an expressive enough medium 🎨 to hold all of the miscellaneous artifacts that should go along with a publication.

Who is this post for?

  • Mostly future me
  • Anyone looking for inspiration for building beautiful supplemental material

Disclaimer: this bestiary isn’t meant to be exhaustive! Because my primary audience is future me, I’m only adding tools/services that I regularly use. There’s lots of awesome stuff out there that you might not find here. For example, you’ll notice that I don’t include Jupyter notebooks at the moment; those are great, I just don’t use them very often because I do most of my analyses with R + R Markdown.

☎️ I’m always curious about other folks’ workflows and favorite tools for putting together manuscripts/supplemental material! Reach out, send me suggestions!

Git + GitHub

backbone versioning code data

Git repositories are the backbone of my entire workflow for every project. Git is your friendly neighborhood version control system. Use it to track (and show) the history of your project, tag versions so that older versions are easy to access, and make collaborating with other folks on your source code so much easier. Want to know exactly when (and maybe why) you added that line of code? git blame is your friend!

GitHub gives your git repository a really nice home; there are other services out there that give git repositories a home in the cloud, but nothing has pulled me away from GitHub. I won’t list all the advantages/disadvantages of these tools here; google (or duckduckgo) around and you’ll find an abundance of relevant blog posts/tech articles.

In some ways, a comprehensive and well-managed GitHub repository is the ultimate supplemental material for any publication on its own; folks can access your project’s history, file issues (e.g., to ask you questions), reproduce your work, or fork and extend your project. I try to put almost everything associated with a paper inside of a designated GitHub repository (minus large amounts of data and usually not my LaTeX source files). Anything that I don’t include (like large data files), I am sure to link to in the repository’s README. All of this allows me to use a paper’s GitHub repository as a one-stop shop for all of that paper’s supplemental material. Even better, services like Zenodo allow you to easily cite your repository by assigning it a permanent DOI (check out this guide for setting that up).

Tips

  • 🛑, I don’t recommend storing large data files on GitHub. There are better services for storing data that you can link to from your repo!
  • README files aren’t just for your root directory! Add README.md files to important directories in your repository. If someone clicks into a directory with a README file on GitHub, the contents of the readme are automatically rendered below the directory listing. This is a great way to give folks a quick and convenient guide to the directory’s contents.

Zenodo

doi

Zenodo is a nifty service that you can use to attach a DOI to a GitHub repository (and to other things). Zenodo makes it super easy to cite GitHub repositories in a paper (see this guide).

Tips

  • For bibtex users, Zenodo will spit out the citation for your repo in bibtex format!
    • BUT, you should always sanity check the reference in your compiled document. Depending on your bibliography format, it may leave important things off like the actual DOI or the url. I can often get around this by adding a note field to the bibtex entry (e.g., note={DOI: xxxx}).
    • Also, if you’re using the DOI’d repository as your supplemental material, consider updating the title field of your bibtex entry to make that obvious because the default title is just the name of the repository on GitHub.
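
The two BibTeX tweaks above (a descriptive title and a note field carrying the DOI) are easy to apply by hand, but here is a minimal sketch of scripting them. Everything specific in it is an assumption for illustration: the helper name `patch_bibtex`, the sample entry, the author, and the placeholder DOI are all made up, and the helper assumes one field per line with brace-delimited values.

```python
import re


def patch_bibtex(entry: str, title: str, doi: str) -> str:
    """Return a copy of a BibTeX entry with a new title and an added note field.

    Hypothetical helper: assumes one field per line and brace-delimited values.
    """
    # Swap the default title (the repository name) for a descriptive one.
    # A function replacement avoids escape-sequence surprises in `title`.
    entry = re.sub(
        r"(?m)^([ \t]*title[ \t]*=[ \t]*)\{.*\}(,?)[ \t]*$",
        lambda m: f"{m.group(1)}{{{title}}}{m.group(2)}",
        entry,
        count=1,
    )

    # Add a note field just before the entry's closing brace.
    body = entry.rstrip()
    if not body.endswith("}"):
        raise ValueError("entry does not end with a closing brace")
    body = body[:-1].rstrip()
    if not body.endswith(","):
        body += ","
    return f"{body}\n  note = {{DOI: {doi}}}\n}}\n"


# Made-up entry in the general shape of an exported citation.
zenodo_entry = """@software{example_repo,
  author = {Doe, Jane},
  title = {my-paper-repo},
  year = 2020,
  doi = {10.5281/zenodo.0000000}
}
"""

patched = patch_bibtex(
    zenodo_entry,
    title="Supplemental material for: My Paper Title",
    doi="10.5281/zenodo.0000000",
)
print(patched)
```

Whatever you end up with, still compile the document and check that the DOI actually shows up in the rendered reference.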

Repository badges

fun informative

Sticking badges on a repository (or any document, really) is just good fun. And useful, too! You often find badges at the top of README files on GitHub repositories. For example,

[Image: example of badges at the top of a README]

Badges can give you an at-a-glance overview of the state of a repository. Often folks will report whether or not their code is passing tests, testing coverage, et cetera. Even more directly relevant to supplemental material, you can grab a badge from Zenodo that will report your repository’s DOI and link to the associated Zenodo page. Create custom badges to direct folks to where your data are hosted or where they can download your paper.

Places to generate badges

  • https://shields.io/
    • I most commonly use this one. Lots of fantastic options to choose from, and you can generate custom badges.
  • Lots of services that you can hook into your GitHub repository (e.g., code coverage, continuous integration, etc.) will provide a badge that indicates the status of your repository on their service.

GitHub Pages

accessibility

What’s way cooler than pdf-bound supplemental material? Web-enabled supplemental material!

GitHub Pages is a minimal-effort (and free!) way to generate and host a static website directly from a GitHub repository. Generating a website for a repository is actually as easy as toggling a switch in your repository’s settings. If you do nothing but flip on the GH Pages switch and write a decent README, you’ll have a solid landing page for your paper (here’s one I did when I was first figuring out GitHub Pages).

By default, GH Pages compiles your site using Jekyll, which is really flexible, fairly easy to use, and has a huge community using it. You don’t have to use Jekyll; you can use whatever you’d like (e.g., bookdown!).

GH Pages really shines for supplemental material in combination with other tools, like R Markdown (e.g., R analyses => pretty HTML files) or nbconvert for Jupyter notebooks (e.g., Python notebook analyses => pretty HTML files). For example, instead of pointing readers to your raw source code, point them to a nicely formatted HTML file generated from your analysis code that weaves readable explanations together with code and output (e.g., stats, visualizations, etc.).

Tips

  • Separate the main branch from the pages branch (gh-pages by default).
    • You can use another service (e.g., GitHub Actions) to push/deploy your site to the gh-pages branch anytime you push to the main branch, as long as the changes pass automated testing. This way, your supplemental material will stay in lock-step with working versions of your project 😉
  • GitHub Pages is also great for personal websites (e.g., this website), lab websites, conference workshop websites, et cetera!

Open Science Framework (OSF)

code data doi

OSF lets you create repositories (which will get a citable DOI) that you can put stuff in (e.g., code, data, documentation, etc.). OSF repositories have all sorts of integrations that let you associate disparate backend components together in a sustainable/citable way.

I’ve primarily used OSF repositories as a place to dump compressed data files (instead of dealing with git/github’s large file storage system). I’ll also link my OSF repository with my project’s associated GitHub repository (and whatever other integrations make sense).

In my aspirational supplemental material, I like to include a ‘Data Availability’ section that links to/cites the OSF repository that holds my experiment data.

Tips

  • osfclient is a nifty Python library and command-line tool for uploading files to and downloading files from your OSF projects.
    • e.g., use this to have your experiment software automatically upload data when your experiment finishes, or use it to write convenient scripts that download/extract/organize your data for anyone interested in playing with it (including future you!).
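
A download script along those lines might look like the sketch below. It is a minimal sketch that assumes osfclient is installed and the project is public (so no credentials are passed). The function names, the default `data` directory, and any project ID you pass in are hypothetical.

```python
import os


def local_path(remote_path: str, dest_dir: str) -> str:
    """Map a remote OSF file path (e.g. "/data/run1.csv") into dest_dir."""
    return os.path.join(dest_dir, *remote_path.strip("/").split("/"))


def download_project(project_id: str, dest_dir: str = "data") -> list:
    """Download every file in a project's default OSF storage into dest_dir."""
    # Imported here so local_path() stays usable without osfclient installed.
    from osfclient import OSF

    osf = OSF()  # assumption: public project, so no username/token needed
    storage = osf.project(project_id).storage("osfstorage")

    saved = []
    for remote in storage.files:
        target = local_path(remote.path, dest_dir)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        # osfclient expects a file object opened in binary mode.
        with open(target, "wb") as fp:
            remote.write_to(fp)
        saved.append(target)
    return saved
```

Calling `download_project("abc12")` (with your real project ID in place of the made-up `abc12`) would mirror the project's files into `data/`, which is a handy first step for anyone who wants to rerun your analyses.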

Docker containers

reproducibility

Dockerfiles (and the images/containers they generate) give you a way of fully specifying the requisite development environment for compiling/running your computational experiments. Nüst et al. make a great case for using Docker for reproducible science in their 10 simple rules paper.

Docker Hub is a great place to host your images. Plus, Docker Hub repositories can be linked with GitHub repositories. You can stick your Dockerfile in your GitHub repo, and Docker Hub will watch for new commits and build an image from your Dockerfile.

Tips

  • Grab a badge
    • to link to your Docker Hub repo
    • to indicate build status
  • When you build your docker image locally (e.g., docker build .), the build process will drop intermediate images as you go (for each RUN directive). If something fails during the build, you can always hop into the last successful image to investigate 🔍.
    • if your build fails, you can use docker image ls -a to see all of the recently created intermediate images
    • once you pick an image to spin up, you can run it interactively docker run -it the_image_id
    • docker system prune is your friend while you debug your dockerfile locally!
  • Pinning versions
    • Great tip from Matthew: you can build it without the pins, then open the container and run apt policy packagename for the things you want to pin
  • Lots of great tips here: Ten simple rules for writing Dockerfiles for reproducible data science
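
To make the version-pinning tip concrete, here is a minimal sketch that turns the output of apt policy packagename into a `package=version` string you can paste into an `apt-get install` line in your Dockerfile. The sample output and version number below are illustrative (not captured from a real container), and `pin_from_apt_policy` is a hypothetical helper name.

```python
import re


def pin_from_apt_policy(policy_output: str) -> str:
    """Turn the output of `apt policy <package>` into a `package=version` pin."""
    # The first unindented line ending in ":" names the package.
    name = re.search(r"(?m)^(\S+):\s*$", policy_output)
    # The "Installed:" line carries the version that is currently in the image.
    installed = re.search(r"(?m)^\s*Installed:\s*(\S+)", policy_output)
    if not name or not installed or installed.group(1) == "(none)":
        raise ValueError("could not find an installed version")
    return f"{name.group(1)}={installed.group(1)}"


# Illustrative output only; run `apt policy` in your own container for real values.
sample_output = """curl:
  Installed: 7.88.1-10+deb12u5
  Candidate: 7.88.1-10+deb12u5
  Version table:
 *** 7.88.1-10+deb12u5 500
        500 http://deb.debian.org/debian bookworm/main amd64 Packages
        100 /var/lib/dpkg/status
"""

pin = pin_from_apt_policy(sample_output)
print(pin)
```

The resulting string slots into the Dockerfile as something like `RUN apt-get install -y curl=7.88.1-10+deb12u5`, so a rebuild months later pulls the same version (as long as the package archive still carries it).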

R Markdown

analysis visualization accessibility

R Markdown lets you interweave your R code, visualizations, and explanatory text all together in one document. Then, use knitr to ‘knit’ it all into a single HTML file or PDF (or both!). Output your R Markdown as an HTML page, push it to your GitHub repository, turn on GitHub Pages, and ✨! You’ve got a nicely formatted web page with all of your data analyses!

RStudio (which I totally recommend to anyone starting out with R coding) makes this workflow really easy. Just pop open your .Rmd file, and mash the Knit button.

[Image: the Knit button in RStudio]

Check out Yihui Xie’s R Markdown: The Definitive Guide to see what’s possible with R Markdown (spoiler: a whole bunch of awesome stuff is possible).

Tips

  • Add a table of contents (toc: true) to your output to make it easier to jump around your document
    • e.g., add toc options to your R Markdown file’s front matter:
      ---
      title: "My fun analyses with a table of contents"
      output:
        html_document:
          toc: true
          toc_float: true
          toc_depth: 4
      ---
      
  • If you want to go wild and interweave Python and R code, you can use the reticulate package!

Bookdown

accessibility slick

Bookdown is an R package that can bundle multiple R Markdown (and vanilla markdown) documents into a cohesive and slick ebook. My first exposure to bookdown in the wild was Claus Wilke’s fantastic Fundamentals of Data Visualization, which was written entirely in R Markdown and compiled using bookdown.

I like using bookdown to tie together all of my supplemental material into a single, nifty ebook (e.g., https://lalejini.com/Tag-based-Genetic-Regulation-for-LinearGP/supplemental/).

I highly recommend starting with Yihui Xie’s bookdown: Authoring Books and Technical Documents with R Markdown, which itself is compiled using bookdown! I also recommend playing around with Yihui Xie’s bookdown demo to get a feel for using bookdown.

Resources