reconstruction, different conditions were tested, including varying read lengths (1st column: 36 bases, 2nd column: 75 bases, 3rd column: 150 bases), numbers of reads (1st row: 10,000, 2nd row: 20,000, 3rd row: 50,000), and sequencing error rates (grey: 0.01%, orange: 0.05%, blue: 0.1% per base). The y-axis reports reconstruction performance as the proportion close (Qq), defined as the fraction of reconstructed haplotypes that have at most q mismatches with respect to the original clones. The genomic region considered codes for amino acids 10 to 93 of the HIV protease. doi:10.1371/journal.pone.0047046.g

The performance of local haplotype reconstruction is assessed in terms of the number and frequency of haplotypes detected and their identity to the original sequences. A perfect haplotype reconstruction algorithm would detect exactly ten sequences, identical to the original clones. On the non-PCR-amplified sample, we detected 13 haplotypes with 454/Roche, five of which are identical to five of the original clones (Table 2). With the Illumina platform, we detected ten haplotypes, nine of which each perfectly match one of the original sequences. For the PCR-amplified sample, we reconstructed 30 and 10 haplotypes using 454/Roche and Illumina, respectively. In both cases, only six perfectly match one of the original clones. Both sensitivity (fraction of true haplotypes detected) and specificity (fraction of predicted haplotypes that are correct) are higher with the Illumina platform, for both the non-amplified and the amplified sample. This trend is also reflected in the per-clone frequency estimates compared with estimates obtained by direct mapping of the reads to the ten clones, which serve as the gold standard in this comparison (see Methods, Table 3). In particular, higher coverage and lower error rate, as achieved with Illumina, allowed for accurate calling of minor variants of frequencies as low as 0.2% that were not found in the 454/Roche data.

Global haplotype reconstruction

Non-overlapping reads mapping to different positions of the genome can be assembled together to infer viral haplotypes over longer regions, provided that some conditions hold: consecutive reads must overlap significantly and host enough mutations to allow their unambiguous pairing. The first condition can be enforced by increasing coverage, whereas the second depends only on the diversity of the population to be analyzed. The presence of sequencing errors is a confounding factor that complicates inference. Thus, global haplotype reconstruction is expected to be easier with longer reads, larger diversity, and lower sequencing error rate. We assessed reconstruction performance as the proportion close (Qq), which is the fraction of reconstructed haplotypes that have at most q mismatches with respect to the original ones. The accuracy of global haplotype reconstruction was considerably higher when reads were drawn from the original clones, which have a mean distance of 7.5% (Figure 2), than from the second haplotype set of lower diversity with mean distance 1.9% (Figure 3). Low population diversity renders perfect reconstruction (Q0 = 1) impossible, and the best one can achieve is reconstructing 80% of the haplotypes without mismatch. Coverage did not affect the performance of quas.

Viral Quasispecies Reconstruction

Figure 3. Global haplotype reconstruction at low diversity. Same as Figure 2, but the mean distance between clones is 1.9%. doi:10.1371/journal.pone.0047046.g
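The evaluation measures used in this section (the proportion close Qq, sensitivity, and specificity) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that haplotypes are aligned strings of equal length, that "at most q mismatches with respect to the original clones" means the Hamming distance to the nearest original clone, and that a predicted haplotype counts as correct only if it is identical to an original one. The function names and the short toy sequences are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def proportion_close(reconstructed: Sequence[str],
                     originals: Sequence[str],
                     q: int) -> float:
    """Q_q: fraction of reconstructed haplotypes within q mismatches
    of their nearest original haplotype (assumed interpretation)."""
    if not reconstructed:
        return 0.0
    close = sum(
        1 for hap in reconstructed
        if min(hamming(hap, orig) for orig in originals) <= q
    )
    return close / len(reconstructed)


def sensitivity_specificity(reconstructed: Sequence[str],
                            originals: Sequence[str]) -> Tuple[float, float]:
    """Sensitivity: fraction of true haplotypes detected.
    Specificity (as defined in the text): fraction of predicted
    haplotypes that are correct. Only exact matches count."""
    predicted = set(reconstructed)
    truth = set(originals)
    exact = predicted & truth
    return len(exact) / len(truth), len(exact) / len(predicted)


# Toy data (hypothetical, not taken from the study).
originals = ["ACGTACGT", "ACGTTCGT", "TCGTACGA"]
reconstructed = ["ACGTACGT", "ACGTTCGA", "GGGTACGA", "TCGTACGA"]

for q in (0, 1, 2):
    print(f"Q{q} = {proportion_close(reconstructed, originals, q)}")
# Q0 = 0.5, Q1 = 0.75, Q2 = 1.0

sens, spec = sensitivity_specificity(reconstructed, originals)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
# sensitivity = 0.667, specificity = 0.500
```

In this toy case two of the four reconstructed haplotypes are exact matches, so Q0 is 0.5, and Qq rises to 1 once q is large enough to cover the most distant reconstruction. Note that Q0 coincides with the specificity as defined in the text when the reconstructed haplotypes are all distinct.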