Our previous analysis had covered rearrangements larger than about 3 Mb [12] and mapped all balanced breakpoints to gene level, but many unbalanced rearrangements had been mapped only to 1 Mb resolution. SNP6 array data allowed us to map these unbalanced breakpoints more precisely, to around 10 kb resolution, and to detect deletions of less than 3 Mb. Paired-end sequencing data identified the junctions of around 40 percent of the known rearrangements to sequence level.

Smaller-scale rearrangements, below the resolution of our previous analysis, were also apparent in the SNP6 array data: 13 small deletions, ranging from 0.26 kb to 2.3 Mb with a median size of 257 kb, were predicted. There were also 24 small duplications, ranging from 11.7 kb to 2.8 Mb with a median size of 320 kb. All of these duplications and deletions were absent in the matched normal lymphoblastoid cell line, HCC1187BL. Many of these features were likely to be small interstitial deletions or “head to tail” tandem duplications. Indeed, five of the 13 deletions and 17 of the 24 duplications were confirmed by structural variants detected by the paired-end sequencing [14]. We identified broken genes and possible gene fusions for all these additional structural changes (Tables S1–6 in File S2). (Paired-end sequencing also uncovered further apparent structural variations that were below the resolution of SNP6 segmentation [14]. These were not included in the present analysis, though we checked that they predicted no additional fusion genes.)

These structural rearrangements gave rise to at least twelve expressed fusion transcripts, confirmed by RT-PCR and Sanger sequencing: RGS22-SYCP1, CTAGE5-SIP1, PLXND1-TMCC1, SEC22B-NOTCH2, KLK5-CDH23, BC041478-EXOSC10, AGPAT5-MCPH1, SUSD1-ROD1/PTBP3, SGK1-SLC2A12, RHOJ-SYNE2, PUM1-TRERF1 and CTCF-SCUBE2, some of which have been reported previously ([12–15] and Table S4 in File S2). Of these twelve, the first four were predicted to form an in-frame fusion product.

HCC1187 Endoreduplicated during its History

The HCC1187 karyotype is hypotriploid and highly rearranged, like most breast cancers. The karyotype is highly likely to have evolved via successive chromosome loss, unbalanced translocation and endoreduplication, since this is the predominant pattern in breast tumors (Fig. 1) [17]. We therefore looked for signs of endoreduplication. The main evidence that endoreduplication had occurred was that a high proportion of the genome had been duplicated precisely once. To make this clearer, we worked out which chromosome segments derived from which parent by analysing how many copies of each genomic segment had the same alleles, using SNP array data (Fig. S1 in File S1). We were able to assign almost all chromosome segments in the karyotype to one or the other allelotype (Fig. 2). This showed that many chromosome segments were present in two copies of the same parental origin, and most of the remainder appeared to have evolved from a pair of copies (Fig. 2B). For example, all segments of chromosomes 6 and 7 are present in two copies, while there are two complete copies of chromosome 16 derived from the same parent, one of which has been split by a balanced translocation.

Inferring the Genome State before Endoreduplication

Having clear evidence that endoreduplication had occurred in HCC1187, we were able to infer the state of the genome immediately before it doubled (Fig. 3). To do this, we assumed that the simplest possible sequence of events had happened, in particular that endoreduplication.
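
The allelotype argument above can be illustrated with a minimal sketch. It is not the procedure used in this study. It assumes a hypothetical `Segment` record holding, for each genomic segment, the number of copies assigned to each parental allelotype (here called A and B), and it measures how much of the parental genome is present in exactly two copies, that is, duplicated precisely once. The helper `pre_doubling_copies` encodes one possible parsimony rule, that only losses occurred after doubling; this rule is an illustrative assumption and is not taken from the text.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Segment:
    """Genomic segment with allele-specific copy numbers (hypothetical format)."""
    chrom: str
    start: int     # segment start, bp
    end: int       # segment end, bp
    copies_a: int  # copies carrying parental allelotype A
    copies_b: int  # copies carrying parental allelotype B

    @property
    def length(self) -> int:
        return self.end - self.start


def fraction_duplicated_once(segments):
    """Fraction of parental-haplotype length present in exactly two copies.

    Each segment contributes its length twice, once per parental allelotype.
    Haplotypes that were lost entirely (0 copies) still count in the denominator.
    """
    total = 0
    doubled = 0
    for seg in segments:
        for copies in (seg.copies_a, seg.copies_b):
            total += seg.length
            if copies == 2:
                doubled += seg.length
    return doubled / total if total else 0.0


def pre_doubling_copies(copies):
    """Smallest copy number before doubling that could yield `copies`.

    Assumption (illustrative only): after doubling, copies can be lost but
    not gained, so an odd count means one copy was lost after doubling.
    """
    return ceil(copies / 2)


# Made-up example values, not data from the study.
example = [
    Segment("chr6", 0, 100, copies_a=2, copies_b=0),
    Segment("chr7", 0, 100, copies_a=2, copies_b=2),
    Segment("chr1", 0, 200, copies_a=1, copies_b=3),
]
share = fraction_duplicated_once(example)
```

A high value of `share` is the kind of signal described above: many segments carried as two copies of the same parental origin. The rule in `pre_doubling_copies` ignores gains after doubling, so a real reconstruction would have to weigh alternative event orders for each segment.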