Introduction
FigTree is one of my go-to tools for phylogenetic tree visualization, but I've found that I need more control over certain aspects of the visualization. Recently, I've been using ggtree, an R package for phylogenetic tree visualization. The upsides are much more control over labels, node tips, and branch colors. The main downside is that nearly everything is done by node index, which is not a problem for small trees but can be cumbersome for large ones.
Goal
FigTree provides tree annotation at three levels: node, clade, and tip. My goal was to have all three imported as annotations in the tree object.
Required libraries
library(tidyverse)
library(ggtree)
library(ape)
library(treeio)
The function: read_figtree()
read.beast(), part of the treeio package, does import node and clade annotations but not tips, so we use this as the base. The tip annotations are stored as a separate part of the FigTree nexus file.
Each tip annotation is recorded on its own line in the following format:
TIP_LABEL1[&annotation1="value",!annotation2="value"]
TIP_LABEL2[&annotation1="value",!annotation2="value"]
Quite frankly, I haven't done much rigorous testing to understand why some annotations begin with & and others with !. The crux, however, is that the annotations can be extracted from each line by string manipulation.
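To make the string manipulation concrete, here is a minimal sketch that parses a single tip line using base R only. The helper name parse_tip_line() and the sample line are made up for illustration; they are not part of treeio or ggtree, and the sketch assumes that annotation values contain no commas or equals signs.

```r
# parse_tip_line() is a hypothetical helper used only for illustration.
# Assumption: values contain no "," or "=" characters.
parse_tip_line = function(line) {
  line = gsub('"', '', trimws(line))      # drop surrounding whitespace and quotes
  line = sub("[]]$", "", line)            # drop the closing bracket
  cols = unlist(strsplit(line, "[[]"))    # split into tip label and annotation string
  if (length(cols) < 2) {
    # Tip without any annotations
    return(list(label = cols[1], annotations = character(0)))
  }
  pairs = strsplit(unlist(strsplit(cols[2], ",")), "=")
  values = vapply(pairs, function(p) p[2], character(1))
  # Strip the leading & or ! from each annotation name
  names(values) = vapply(pairs, function(p) gsub("[!&]", "", p[1]), character(1))
  list(label = cols[1], annotations = values)
}

# Made-up example line in the format described above
parsed = parse_tip_line('id1[&species="Dog",!owner="Bob"]')
parsed$label        # tip label
parsed$annotations  # named character vector of annotation values
```

The function below follows the same steps, but collects the results for every tip into one nested list.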
read_figtree = function(file) {
  # Read in file as tree object (node and clade annotations come along)
  tree = read.beast(file)

  # Empty list to fill with tip annotations
  # This nested list is formatted as list(annotation1 = list(value1 = c(tip_label1, tip_label2)))
  # Example:
  # list(species = list(Dog = c("id1", "id2"), Cat = c("id3")),
  #      owner = list(Bob = c("id1"), Alice = c("id2", "id3")))
  taxa_annotations = list()

  # Read file line-by-line.
  # When the line is "taxlabels", tip annotations start on the next line
  # and stop at ";" or "end;", where we'll break the loop
  con = file(file, open = "r")
  annotations_started = FALSE
  while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
    line = trimws(line)          # Removes leading tabs and stray whitespace
    line = gsub('"', '', line)   # Removes quotes
    if (line == "taxlabels") {
      annotations_started = TRUE
      next
    }
    if (!annotations_started) {
      next
    }
    # Check for the end of the block before parsing, so that ";" is never treated as a tip
    if (line == ";" || line == "end;") {
      break
    }
    line = sub(pattern = "[]]$", "", line)  # Removes ending bracket
    cols = unlist(strsplit(line, "[[]"))    # Splits at start bracket into tip label and annotations
    if (length(cols) < 2) {
      next                                  # Tip has no annotations
    }
    tip_label = cols[1]
    annotations = unlist(strsplit(cols[2], ","))
    # Loop through annotation strings, extracting the annotation type and value
    for (annotation in annotations) {
      annotation_cols = unlist(strsplit(annotation, "="))
      if (length(annotation_cols) < 2) {
        next                                # Skip anything that is not a name=value pair
      }
      annotation_type = gsub("[!&]", "", annotation_cols[1])
      annotation_value = annotation_cols[2]
      if (! annotation_type %in% names(taxa_annotations)) {
        taxa_annotations[[annotation_type]] = list()
      }
      # c(NULL, tip_label) is just tip_label, so new values need no initialization
      taxa_annotations[[annotation_type]][[annotation_value]] =
        c(taxa_annotations[[annotation_type]][[annotation_value]], tip_label)
    }
  }
  close(con)

  # For each annotation type (species, owner, etc), we want to annotate the tree object using 'groupOTU()'
  # Here 'groupOTU()' is given node IDs, which we find by matching
  # the tip labels against the labels in the tree object
  tree.tibble = as_tibble(tree)  # Tree object converted to data table
  for (type in names(taxa_annotations)) {
    # Initialize list in format of ( annotationValueA = c(nodeID1, nodeID2), annotationValueB = ... )
    id_annotations = list()
    for (value in names(taxa_annotations[[type]])) {
      id_annotations[[value]] = tree.tibble$node[tree.tibble$label %in% taxa_annotations[[type]][[value]]]
    }
    tree = groupOTU(tree, id_annotations, group_name = type)
  }
  return(tree)
}
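The label-to-node matching near the end of the function can be tried on its own. The sketch below uses a plain data frame, tree_table, as a made-up stand-in for the output of as_tibble(tree), together with the species example from the code comments; a real tree table has more columns than the two shown here.

```r
# tree_table is a hypothetical stand-in for as_tibble(tree):
# tips carry labels, internal nodes (4 and 5) do not.
tree_table = data.frame(
  node  = 1:5,
  label = c("id1", "id2", "id3", NA, NA),
  stringsAsFactors = FALSE
)

# One annotation type, in the nested format used by read_figtree()
species = list(Dog = c("id1", "id2"), Cat = c("id3"))

# For each value, keep the node IDs whose label is in that value's tip labels
ids = lapply(species, function(labels) {
  tree_table$node[tree_table$label %in% labels]
})

# Alternative lookup that keeps the order of the requested labels
ordered_ids = tree_table$node[match(c("id2", "id1"), tree_table$label)]
```

Note that the %in% lookup returns node IDs in the order of the tree table, not in the order the labels were listed. That does not matter for grouping, but match() is the safer choice if the order of the labels has to be preserved.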