The program developed (spyder; available at http://people.uleth.ca/~selibl/Spyder/Spyder.html) was written in Perl and implements the Needleman–Wunsch algorithm (Needleman & Wunsch, 1970) using dynamic programming (Cormen et al., 2009). A similarity matrix (Table 1) was used to generate a scoring matrix based on the alignment of the primer to the target. The matrix, a slight modification of a generic scoring matrix for the four bases ‘A C T G’, takes into account the possibility of degenerate bases, which are often encountered
in sequence databases. Degeneracies were assigned scores (Table 1) such that the score of a degenerate base is the sum of the scores for each possible combination between the degeneracy and its corresponding bases. For example, the degenerate base ‘H’ could be either base ‘A’, ‘C’ or ‘T’; therefore, the score for ‘H’ is the sum of the scores for ‘A’, ‘C’ and ‘T’. The scoring matrix is used to assign scores for all positions in every possible alignment between the primer and the target. Each possible alignment is scored through a traceback of the scoring matrix, and the optimal alignment (i.e. that with the highest score) is selected. Commonly used 16S rRNA gene primers (Table 2) were evaluated against sequences within the RDP database (Cole et al., 2009). Primer–target
regions were selected according to their approximate annealing position relative to Escherichia coli (GenBank Accession J01695) (Fig. 1). The antisense (−) strand was selected for forward primers and the sense (+) strand
for reverse primers. Regions were selected such that they ensured the coverage of the primer-binding site while maintaining maximal coverage of the database, which was verified by retrospective analysis of the spyder output (instructions provided as Supporting Information, Appendix S1). The spyder output was analyzed manually for indels and substitutions. Those that were abundant relative to the number of available sequences for the searched region were noted, and the necessary degeneracies or modifications were made to the primers. Updated primers were then reanalyzed using the RDP Probe Match service to determine the effect on target and nontarget sequences. Modified primers were checked using oligocalc (Kibbe, 2007) to ensure no decrease in primer quality (i.e. similar GC content, no self-complementarity, hairpins, or 3′ primer–primer complementarity). The spyder program was able to successfully process over 1 000 000 sequences in a matter of minutes using a relatively modest computer (Core 2 Duo processor at 2.2 GHz with 4 GB of RAM). Conducting such an analysis on an aligned 16S rRNA gene database is not practical due to the size of the current databases, which can easily exceed 100 MB. With zero mismatches, the primers analyzed matched between 48% and 97% of the target sequences currently available.
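The scoring and alignment procedure described above can be illustrated with a minimal sketch. It is written in Python rather than Perl and is not the spyder source code. The match, mismatch and gap values are hypothetical placeholders, because the values in Table 1 are not reproduced here, and the function names pair_score and needleman_wunsch are chosen for illustration only. The sketch assumes a linear gap penalty and a fully global alignment, and it applies the summation rule for degenerate bases to both the primer and the target.

```python
# Hypothetical scores: placeholders only, NOT the values from Table 1.
MATCH, MISMATCH, GAP = 1, -1, -2

# Standard IUPAC nucleotide codes and the bases each one can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pair_score(x, y):
    """Sum the base-vs-base scores over every base that x and y can represent."""
    return sum(MATCH if a == b else MISMATCH
               for a in IUPAC[x] for b in IUPAC[y])


def needleman_wunsch(primer, target):
    """Global alignment; returns (score, aligned_primer, aligned_target)."""
    n, m = len(primer), len(target)
    # F[i][j] holds the best score for aligning primer[:i] with target[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * GAP
    for j in range(1, m + 1):
        F[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + pair_score(primer[i - 1], target[j - 1]),
                F[i - 1][j] + GAP,   # gap in the target
                F[i][j - 1] + GAP,   # gap in the primer
            )
    # Traceback from the bottom-right cell to recover one optimal alignment.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                F[i][j] == F[i - 1][j - 1]
                + pair_score(primer[i - 1], target[j - 1])):
            a.append(primer[i - 1])
            b.append(target[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + GAP:
            a.append(primer[i - 1])
            b.append("-")
            i -= 1
        else:
            a.append("-")
            b.append(target[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(a)), "".join(reversed(b))


if __name__ == "__main__":
    # A primer containing the degenerate base 'H' against a short target.
    print(needleman_wunsch("ACHT", "ACAT"))
```

Under these placeholder scores, an ‘H’ aligned to ‘A’ is scored as the sum of the ‘A’, ‘C’ and ‘T’ scores against ‘A’, which mirrors the rule stated for Table 1. Different score values would change the numbers but not the procedure.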