Separate mapping and detection of discordant read-pairs to identify. Although there have been no other reports of phage tail inversion in PacBio assemblies to date, others have noted that a ~7.5 kb “spurious contig” was produced in the assembly of the E. coli K-12 MG1655 genome. PacBio therefore offers a new approach for studying the mechanism of phage tail fibre switching and, more generally, the function of DNA invertases and other site-specific recombinases. For example, the DNA invertase gene is severely truncated in the Phi4 prophage, suggesting that the inversion observed in this study was mediated in trans by another enzyme, as has been reported previously. Notably, the Phi1 and Phi4 prophages encode near-identical 26 bp crossover sites at either end of their respective invertible segments, suggesting that the Phi1 DNA invertase may be capable of mediating inversion at the heterologous sites within the Phi4 prophage.
On a practical level, users should ensure that alternative allele contigs in PacBio assemblies are not integrated into the assembled main chromosome, as this would introduce artefactual duplications in phage regions. Rather than merging these contigs, we have annotated the EC958 chromosome to highlight the DNA invertase binding sites and invertible regions using misc_feature keys, in accordance with INSDC guidelines. We have also simplified the annotation of these regions to help avoid propagating “genome rot” in E. coli genomes; for example, alternate phage tail gene 3′ fragments that contain the Phage Tail Collar domain but lack the Phage Tail Repeat domains are often auto-annotated as “Phage tail repeat domain proteins” because of their similarity to their full-length homologs. For E. coli assemblies, it is relatively straightforward to distinguish contigs that are alternate versions of inverted loci from truly independent contigs by aligning all contigs against each other after assembly using tools such as ACT.
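The contig-classification step described above can be illustrated with a minimal sketch. It assumes error-free sequences and exact string matching (a deliberate simplification; in practice contigs are compared by alignment in a tool such as ACT), models the invertase-mediated event simply as reverse-complementing the segment between a crossover site and its inverted copy, and uses a short toy site rather than the actual 26 bp crossover sites. The helper names `revcomp`, `invert_between_sites` and `classify_contig` are hypothetical.

```python
# Minimal sketch (hypothetical helpers): model an inversion between two
# inverted-repeat crossover sites, then ask whether a contig matches the
# chromosome as assembled, the inverted locus, or neither.
# Exact matching only; real PacBio contigs need an aligner (e.g. ACT/BLAST).

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_between_sites(seq: str, site: str) -> str:
    """Reverse-complement the segment lying between `site` and revcomp(`site`).

    Assumes `site` occurs once, followed downstream by its inverted copy,
    i.e. the two crossover sites form an inverted repeat.
    """
    left = seq.find(site)
    if left == -1:
        raise ValueError("left crossover site not found")
    right = seq.find(revcomp(site), left + len(site))
    if right == -1:
        raise ValueError("inverted copy of crossover site not found")
    start, end = left + len(site), right  # invertible segment between the sites
    return seq[:start] + revcomp(seq[start:end]) + seq[end:]


def classify_contig(chromosome: str, contig: str, site: str) -> str:
    """Return 'same', 'inverted' or 'independent' for `contig` vs `chromosome`."""
    inverted_chrom = invert_between_sites(chromosome, site)
    for strand in (contig, revcomp(contig)):  # contigs may be on either strand
        if strand in chromosome:
            return "same"
        if strand in inverted_chrom:
            return "inverted"
    return "independent"


if __name__ == "__main__":
    site = "GTTCCTTATC"  # toy crossover site, NOT a real prophage sequence
    chrom = "ATGCATGC" + site + "AAACCCGGGTTTAC" + revcomp(site) + "TTGGCCAA"
    alt_contig = invert_between_sites(chrom, site)[4:-4]
    print(classify_contig(chrom, alt_contig, site))  # prints "inverted"
```

A contig classed as "inverted" would correspond to an alternate allele contig of the kind that should be annotated rather than merged into the main chromosome.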
However, care must be taken to ensure that apparent “recombination” is not caused by adapter sequences. Because raw PacBio reads have high error rates, the adapters at the ends of the SMRTbell construct are occasionally not identified and removed. Failure to remove adapter sequences can produce chimeric subreads consisting of the insert sequence in the forward orientation, followed by the adapter sequence and then the insert sequence in the reverse orientation. Such unremoved adapters occur at random positions within reads and are generally removed during read correction, but aberrant reads can still be produced.
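The chimeric subread pattern described above (insert, adapter, reverse-complemented insert) can be screened for with a short sketch. It is illustrative only: it assumes an exact adapter match, uses a toy adapter string rather than the real SMRTbell adapter sequence, and scores a read by the fraction of k-mers in the left flank that also occur in the reverse complement of the right flank. The function `is_adapter_chimera` and its parameters `k` and `min_shared` are hypothetical names and thresholds.

```python
# Minimal sketch (hypothetical helper): flag reads shaped like
# insert + adapter + revcomp(insert). Exact adapter search only; real reads
# carry sequencing errors and need an error-tolerant adapter search.
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def is_adapter_chimera(read: str, adapter: str, k: int = 8,
                       min_shared: float = 0.5) -> bool:
    """True if the sequence after the adapter looks like the inverted sequence before it."""
    pos = read.find(adapter)
    if pos == -1:
        return False  # no retained adapter found
    left = read[:pos]
    right = read[pos + len(adapter):]
    left_kmers = kmers(left, k)
    if not left_kmers:
        return False  # left flank shorter than k
    shared = left_kmers & kmers(revcomp(right), k)
    return len(shared) / len(left_kmers) >= min_shared


if __name__ == "__main__":
    adapter = "CTCTCTCTCT"  # toy stand-in, NOT the real SMRTbell adapter
    insert = "ACGTTGCAAGGCTTACCGATGCA"
    print(is_adapter_chimera(insert + adapter + revcomp(insert), adapter))  # prints True
    print(is_adapter_chimera(insert + adapter + "GATTACAGATTACAGATTACAGA", adapter))  # prints False
```

Because such a read contains a sequence followed by its own reverse complement, it can resemble an inversion breakpoint if it survives correction, so a positive hit is a reason to inspect the region before calling an inversion.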