A genomic sequencing approach to study wood decay and copper tolerance in the brown rot fungus, Antrodia radiculosa
J D Tang, T Sonstegard, S Burgess, S V Diehl
We used Illumina paired-end short-read sequencing (76 nt reads, 300 bp insert size) to produce a de novo assembly of the genome of Antrodia radiculosa, a copper-tolerant brown rot fungus capable of aggressive wood decay. Quality analysis of the base calls in the 8.95 Gb dataset showed that most of the sequence fell in the highest quality group, with 5% in the lowest quality group. The frequency of low quality scores increased with position in the read, indicating that once a read went bad, the rest of the read was also likely to be bad. In addition, 1.4% of the called bases were ambiguous (N). These Ns occurred as homopolymer runs of up to 76 nt, i.e., the full read length. To assess how low quality scores and N homopolymers affected the assembly, reads were filtered at increasingly stringent thresholds, and each filtered dataset was assembled at a range of kmer lengths (k) using Velvet 0.7.55. The N50 of each assembly (the contig length at which contigs of that length or longer contain half of all assembled bases) was then plotted against k. Each dataset had a maximum N50, but the lower the quality of the dataset, the more dramatically N50 fluctuated with k. In addition, the larger, less stringently filtered dataset reached a greater maximum N50 despite containing low quality data: the maximum N50 of the semi-clean dataset (k = 45) was more than twice that of the very clean dataset (k = 37). Using GeneMark-ES v2, we predicted 8000 and 5700 genes in contigs ≥ 20 kb from these two maximum-N50 assemblies, respectively. Average gene length, percent GC, intron size, number of introns, exon size, and CDS length were very similar for the two assemblies, suggesting that Velvet successfully removed errors arising from the low quality scores and N homopolymers without sacrificing assembly accuracy.
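The metrics behind this analysis can be illustrated with a short sketch. The code below is not the authors' pipeline. It assumes reads are available as plain strings and that contig lengths have already been extracted from each assembly. The helper names `n_stats`, `n50` and `best_k`, and all of the toy inputs, are hypothetical and made up for illustration. `n_stats` reports the fraction of ambiguous bases and the longest N homopolymer. `n50` uses the standard definition: sort contigs longest first and return the length at which the running total reaches half the assembled bases. `best_k` picks the kmer length with the largest N50 from a kmer sweep.

```python
from itertools import groupby


def n_stats(reads):
    """Return (fraction of bases that are 'N', longest run of consecutive 'N's)."""
    total = sum(len(r) for r in reads)
    n_count = sum(r.upper().count("N") for r in reads)
    longest = 0
    for r in reads:
        # groupby splits the read into runs of identical characters
        for base, run in groupby(r.upper()):
            if base == "N":
                longest = max(longest, sum(1 for _ in run))
    return (n_count / total if total else 0.0), longest


def n50(contig_lengths):
    """Length of the contig at which the running total (longest first)
    reaches half of all assembled bases; 0 for an empty assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0


def best_k(n50_by_k):
    """Return the kmer length with the largest N50 (ties go to the smaller k)."""
    # max() keeps the first maximal item, so iterating sorted keys favors smaller k
    return max(sorted(n50_by_k), key=lambda k: n50_by_k[k])


if __name__ == "__main__":
    # Toy, made-up inputs for illustration only (not data from this study)
    reads = ["ACGTNNNNAC", "NNNNNNNNNN", "ACGTACGTAC"]
    print(n_stats(reads))
    print(n50([100, 80, 50, 30, 20, 10]))  # 80
    print(best_k({31: 900, 35: 1400, 39: 1400, 43: 700}))  # 35
```

In a real kmer sweep, one would run the assembler once per k, read the contig lengths from each output, compute `n50` for each, and pass the resulting mapping to `best_k`. Parsing the assembler output is deliberately left out, because it depends on the assembler version and file layout.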
This work demonstrates that gene prediction from short read sequencing data of fungi is technically feasible and represents a significant step towards accelerating a genome-wide understanding of how brown rot fungi decay wood and tolerate high levels of copper.
Keywords: genomic sequencing, brown rot, wood decay, copper tolerance