FigTree and ggtree

Introduction

FigTree is one of my go-to tools for phylogenetic tree visualization, but I've found that I need more control over certain aspects of the figure. Recently, I've been using ggtree, an R package for phylogenetic tree visualization. The upside is much finer control over labels, tips, and branch colors. The main downside is that nearly everything is done by node index; that is not a problem for small trees, but it quickly becomes cumbersome for large ones.
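One way to soften the node-index problem is to look indices up from labels instead of typing them by hand. Below is a minimal sketch using ape (which is loaded anyway): in ape, tips are nodes 1..Ntip in the order of tip.label, and getMRCA() returns the node index of the most recent common ancestor of a set of tips. The three-tip tree is made up purely for illustration.

```r
library(ape)

# Toy three-tip tree, written in Newick for illustration only
tree <- read.tree(text = "((A:1,B:1):1,C:2);")

# Tip node indices: ape numbers tips 1..Ntip in the order of tree$tip.label
tip_ids <- match(c("A", "C"), tree$tip.label)

# Internal node index of the clade containing A and B
clade_node <- getMRCA(tree, c("A", "B"))

print(tip_ids)     # [1] 1 3
print(clade_node)  # [1] 5
```

The resulting indices can then be passed to any ggtree function that expects node numbers.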

Goal

FigTree provides tree annotation at three levels: nodes, clades, and tips. My goal was to import all three as annotations on the tree object in R.

Required libraries

library(tidyverse)
library(ggtree)
library(ape)
library(treeio)

The function: read_figtree()

read.beast(), from the treeio package, imports node and clade annotations but not tip annotations, so we use it as the base. FigTree stores the tip annotations separately, in the taxlabels list of the taxa block of the NEXUS file.
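To make this concrete, here is a sketch of locating the taxlabels section with a vectorized lookup rather than a line-by-line loop. The file contents are hypothetical, hand-written lines modeled on the layout this post relies on (a taxlabels line, one annotated tip per line, then a closing ";"). A real FigTree file contains more blocks, and the sketch assumes the closing ";" is present.

```r
# Hypothetical FigTree-style NEXUS contents (made up for illustration)
nexus_lines <- c(
  "#NEXUS",
  "begin taxa;",
  "\tdimensions ntax=3;",
  "\ttaxlabels",
  "\tid1[&species=\"Dog\",!color=#ff0000]",
  "\tid2[&species=\"Dog\"]",
  "\tid3[&species=\"Cat\"]",
  ";",
  "end;",
  "begin trees;",
  "\ttree tree_1 = [&R] ((id1:1,id2:1):1,id3:2);",
  "end;"
)

# Strip surrounding tabs/spaces so indented lines compare cleanly
trimmed <- trimws(nexus_lines)

# Tip annotations sit between the "taxlabels" line and the next ";"
start <- match("taxlabels", trimmed)
stop_at <- start + match(";", trimmed[(start + 1):length(trimmed)])
tip_lines <- trimmed[(start + 1):(stop_at - 1)]

print(tip_lines)
```

tip_lines then holds the three annotated tip lines, ready for string parsing.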

In the taxlabels list, each tip is written on its own line, with its annotations in brackets after the label:

TIP_LABEL1[&annotation1="value",!annotation2="value"]
TIP_LABEL2[&annotation1="value",!annotation2="value"]

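Quite frankly, I haven't done much rigorous testing to understand why some annotations begin with & and others with !. As far as I can tell, "[&" is the usual opener for a metadata comment (as in BEAST output), and FigTree prefixes its own display settings, such as !color, with !. Either way, the crux is that the annotations can be extracted from each line by plain string manipulation.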
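Here is a minimal sketch of that string manipulation for a single line, written as a hypothetical helper, parse_tip_line(), that is not part of read_figtree(). It assumes the line ends with "]" and that no value contains "," or "=".

```r
# Hypothetical helper: parse one annotated tip line of the form
#   LABEL[&key1="value",!key2="value"]
# into the tip label and a named character vector of annotations.
parse_tip_line <- function(line) {
  line <- gsub('"', "", trimws(line))           # drop whitespace and double quotes
  open <- regexpr("[", line, fixed = TRUE)      # position of the first "["
  label <- substr(line, 1, open - 1)            # text before the bracket
  body <- substr(line, open + 1, nchar(line) - 1)  # text inside the brackets
  pairs <- strsplit(strsplit(body, ",", fixed = TRUE)[[1]], "=", fixed = TRUE)
  keys <- gsub("[!&]", "", vapply(pairs, `[`, character(1), 1))  # strip & and !
  values <- vapply(pairs, `[`, character(1), 2)
  list(label = label, annotations = setNames(values, keys))
}

parsed <- parse_tip_line('\tid1[&species="Dog",!color=#ff0000]')
print(parsed$label)  # [1] "id1"
print(parsed$annotations)
```

Here parsed$annotations holds species = "Dog" and color = "#ff0000", with the & and ! prefixes removed just as read_figtree() does.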

read_figtree = function(file) {

  # Read in file as tree object
  tree = read.beast(file)

  # Empty list to fill with annotations
  # This nested list is formatted as list(annotation1 = list(value1=c(tip_label1, tip_label2)))
  # Example: 
  #     list(species=list(Dog=c("id1", "id2"), Cat=c("id3")),
  #           owner=list(Bob=c("id1"), Alice=c("id2", "id3")))

  taxa_annotations = list()

  # Read the file line by line.
  # Tip annotations start on the line after "taxlabels" and stop at the
  #   first ";" or "end;", where we break out of the loop.

  con = file(file, open="r")
  annotations_started = FALSE
  while (length(line <- readLines(con, n=1, warn=FALSE)) > 0) {
    line = trimws(line)        # Remove leading/trailing tabs and spaces
    line = gsub('"', '', line) # Remove double quotes around values

    # Check for the terminator first, so ";" is never parsed as a tip
    if (annotations_started && (line == ";" || line == "end;")) {
      break
    }

    if (tolower(line) == "taxlabels") {
      annotations_started = TRUE
    } else if (annotations_started && grepl("[", line, fixed=TRUE)) {
      line = sub(pattern="[]]$", "", line)   # Remove the closing bracket
      cols = unlist(strsplit(line, "[[]"))   # Split at "[" into tip label and annotations
      tip_label = gsub("^'|'$", "", cols[1]) # Drop single quotes FigTree puts around some labels
      annotations = unlist(strsplit(cols[2], ","))

      # Loop through annotation strings, extracting the annotation type and value
      for (annotation in annotations) {
        annotation_cols = unlist(strsplit(annotation, "="))
        if (length(annotation_cols) < 2) next # Skip entries without a value
        annotation_type = gsub("[!&]", "", annotation_cols[1])
        annotation_value = annotation_cols[2]

        # Create the inner list the first time we see this annotation type
        if (! annotation_type %in% names(taxa_annotations)) {
          taxa_annotations[[annotation_type]] = list()
        }

        # Append this tip label to the vector for (annotation_type, annotation_value)
        taxa_annotations[[annotation_type]][[annotation_value]] =
          c(taxa_annotations[[annotation_type]][[annotation_value]], tip_label)
      }
    }
  }
  close(con)

  # For each annotation type (species, owner, etc.), annotate the tree object using groupOTU().
  # We hand groupOTU() node IDs, which we find by matching the tip labels
  #   against the label column of the tree's tibble.

  tree.tibble = as_tibble(tree) # Convert the tree to a tibble (one row per node)

  for (type in names(taxa_annotations)) {
    # Initialize list in format of ( annotationValueA = c(nodeID1, nodeID2), annotationValueB = ... )
    id_annotations = list() 
    for (value in names(taxa_annotations[[type]])) {
      id_annotations[[value]] = tree.tibble$node[tree.tibble$label %in% taxa_annotations[[type]][[value]]]
    }
    tree = groupOTU(tree, id_annotations, type)
  }

  return(tree)
}