PowerPoint Presentation
Analysis of DNA methylation: bisulfite libraries and SOLiD sequencing
Epigen – Udine June 2012
CRIBI, Università di Padova

An easy view of the bisulfite approach
genome (methylated C, CH3): TAGTACGTTGAT → read: TAGTACGTTGAT
genome (unmethylated C):    TAGTACGTTGAT → read: TAGTATGTTGAT

Three main problems
1. We need software specifically designed to align bisulfite reads.
2. Loss of sensitivity and specificity due to the reduced complexity (3 letters instead of 4) and to the increased size of the reference.
3. Need for special strategies to make the shotgun libraries.

Before bisulfite treatment
5'ATGCTGCACTGACACGTGAT3'
3'TACGACGTGACTGTGCACTA5'
After bisulfite treatment
5'ATGUTGUAUTGAUAUGTGAT3'
3'TAUGAUGTGAUTGTGUAUTA5'

Need for special strategies to make the shotgun libraries
Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, Tonti-Filippini J, Nery JR, Lee L, Ye Z, Ngo QM, Edsall L, Antosiewicz-Bourget J, Stewart R, Ruotti V, Millar AH, Thomson JA, Ren B, Ecker JR: Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 2009, 462:315-322.
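The bisulfite conversion in the slides above can be sketched in a few lines of Python. This is a minimal illustration, not the CRIBI/MeSS protocol: the function names are hypothetical, methylated cytosines are passed in as a set of 0-based positions on the strand being converted, and every other C becomes U (or T, the base it reads as after PCR). The sketch reproduces the Before/After strands and shows that, once converted, the two strands no longer pair with each other, one reason they have to be handled separately.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def bisulfite_convert(strand: str, methylated: frozenset = frozenset(), new_base: str = "U") -> str:
    """Turn every unmethylated C into U (or T, the base it becomes after PCR).

    `methylated` holds 0-based positions of 5-methylcytosines on this strand;
    those C's are protected from deamination and stay C.
    """
    return "".join(
        new_base if base == "C" and i not in methylated else base
        for i, base in enumerate(strand)
    )


def is_complementary(top: str, bottom: str) -> bool:
    """True if `bottom` (written 3'->5' under `top`) pairs base-for-base with `top`."""
    # U pairs like T, so normalise both strands before comparing.
    top_dna = top.replace("U", "T")
    bottom_dna = bottom.replace("U", "T")
    return top_dna.translate(COMPLEMENT) == bottom_dna


top = "ATGCTGCACTGACACGTGAT"          # 5'->3' strand from the slide
bottom = top.translate(COMPLEMENT)    # 3'->5' strand, same left-to-right order

top_bs = bisulfite_convert(top)
bottom_bs = bisulfite_convert(bottom)

print(top_bs)                                # ATGUTGUAUTGAUAUGTGAT
print(bottom_bs)                             # TAUGAUGTGAUTGTGUAUTA
print(is_complementary(top, bottom))         # True
print(is_complementary(top_bs, bottom_bs))   # False

# The "easy view" example: a methylated C (position 5) survives, an unmethylated one does not.
print(bisulfite_convert("TAGTACGTTGAT", frozenset({5}), "T"))  # TAGTACGTTGAT
print(bisulfite_convert("TAGTACGTTGAT", new_base="T"))          # TAGTATGTTGAT
```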
CRIBI method for bisulfite library preparation: MeSS (Methylome SOLiD Sequencing)
Lisa Marchioretto and Robin Targon
Workflow: cells → nuclei → DNA → bisulfite treatment → adaptor ligation → PCR → sequencing

Optimization of the fragmentation and bisulfite treatment

Optimization of adaptor ligation
Compared with other Bis-seq methods, MeSS requires ten times less starting genomic DNA, avoids intermediate purification steps between enzymatic reactions, and allows efficient amplification with fewer PCR cycles.

Loss of sensitivity and specificity due to the reduced complexity (3 letters instead of 4) and to the increased size of the reference
Directional cloning would halve the mapping complexity.
Before bisulfite treatment
5'ATGCTGCACTGACACGTGAT3'
3'TACGACGTGACTGTGCACTA5'
After bisulfite treatment
5'ATGUTGUAUTGAUAUGTGAT3'
3'TAUGAUGTGAUTGTGUAUTA5'
SOLiD color space maintains the full set of 4 colors after C/U conversion:
>882_4_710_F3
T12303201320002311102023132033102120101
>882_4_840_F3
T30132200013022300130131231321021133033
>882_4_1657_F3
T33213100102312210311012322012203112333
>882_5_1275_F3
T31201000021203112332021200212201223112
>882_6_553_F3
T31321031020123002032223323001301333313
...

Software specifically designed to align bisulfite reads

Exhaustive approach to bisulfite alignment
STEP 1: Virtual bisulfite conversion of the genome
Genome:               ...ATGCTGCACTGACACGTGATGTCGTA...
↓
Converted AGT genome: ...ATGTTGTATTGATATGTGATGTTGTA...
STEP 2: Virtual bisulfite conversion of every C in the reads, remembering the original
Read #1 TGTTGTATTG → TGTTGTATTG
Read #2 TGATGTCGTA → TGATGTTGTA
…
STEP 3: Alignment of the three-base sequences (converted reads against the converted genome)
STEP 4/5: If the original read had a C, check that the genome also had a C at that position and label it as methylated (CH3)
Original genome:  ...ATGCTGCACTGACACGTGATGTCGTA...
Converted genome: ...ATGTTGTATTGATATGTGATGTTGTA...
Converted read:                      TGATGTTGTA
Original read:                       TGATGTCGTA  (C retained: CH3, methylated)

PASS implementation of bisulfite alignment: simulated test set
Starting from three simulated hg19 reference genomes in which cytosines were randomly methylated on both DNA strands to obtain three cytosine methylation levels (0%, 50% and 100%), we generated six test sets of 1 million reads each (three for color-space and three for base-space data) using the dwgsim-0.1.8 program (ref.). The same procedure was used to obtain the simulated test sets of non-bisulfite-treated DNA, except that the unmodified hg19 reference genome was given as input to dwgsim-0.1.8.
Parameters used: -y 0 -z 0 -d 100 -S 2 -c 0 or 1 (for Illumina or SOLiD data) -1 50 -2 50 -C -1 -N 1000000
The per-base/color/flow error rate and the mutation rate were set to their default values (0.02 and 0.001, respectively). All simulated test sets were produced with the same seed, so they are comparable in number of reads, position and strand with respect to the human reference genome (hg19).

PASS implementation of bisulfite alignment
General strategy
1. Find seeds in base space
2. Extend alignment in color space

SOLiD chemistry: ligation probes
• Ligation probes are octamers (XXnnnzzz): the fluorescent dye interrogates the bases in the 1st and 2nd positions (XX).
• n = degenerate bases, z = universal bases.
• 4^5 = 1024 probes (256 probes per color).
• Ligation site, cleavage site and dye are spatially separated.
[Figure: color matrix, 1st base × 2nd base (A, C, G, T)]
2-base encoding is based on sequencing by ligation rather than sequencing by synthesis. It uses fluorescently labeled 8-mer probes that discriminate the two 3'-most bases (AT in the figure). For full coverage, repeated cycles of ligation are performed, using primers that anneal to different positions of the adapter sequence (see the next slides).

SOLiD 4-color ligation: ligation reaction
[Figure: a universal sequencing primer annealed to the P1 adapter on a 1 µm bead; ligase joins one of four dye-labeled probes (Y, B, G, R; XXnnnzzz) to the primer along the template sequence.]

SOLiD 4-color ligation: visualization
[Figure: the ligated Y probe reports bases 1-2.]

SOLiD ligation-based sequencing chemistry (2)
Image; cap unextended strands; cleave off the fluor.

SOLiD 4-color ligation: cleavage
[Figure: the dye-bearing end of the ligated probe is cleaved off, leaving a 5' phosphate (p5') for the next ligation.]

SOLiD 4-color ligation: ligation (2nd cycle)
[Figure: a new probe is ligated to the extended primer.]
SOLiD 4-color ligation: visualization (2nd cycle)
[Figure: the ligated probes report bases 1-2 (Y) and 6-7 (R).]

SOLiD 4-color ligation: cleavage (2nd cycle)
[Figure: cleavage of the second probe, leaving a 5' phosphate (p5').]

SOLiD 4-color ligation interrogates a pair of bases every 5 bases
[Figure: successive cycles read bases 1-2, 6-7, 11-12, 16-17 and 21-22.]

SOLiD 4-color ligation: reset
[Figure: the extended primer is removed from the template.]

SOLiD 4-color ligation (1st cycle after reset)
[Figure: a new universal sequencing primer (n-1), offset by one base, so the first ligated probe reads bases 0-1.]

SOLiD 4-color ligation (2nd round)
[Figure: with primer n-1, successive cycles read bases 0-1, 5-6, 10-11, 15-16 and 20-21.]

Sequential rounds of sequencing: multiple cycles per round
[Figure: after each reset a new universal sequencing primer is used, each shifted by one base: primer n reads 1-2, 6-7, 11-12, 16-17, 21-22; primer n-1 reads 0-1, 5-6, 10-11, 15-16, 20-21; primer n+1 reads 2-3, 7-8, 12-13, 17-18, 22-23; primer n+2 reads 3-4, 8-9, 13-14, 18-19, 23-24; primer n+3 reads 4-5, 9-10, 14-15, 19-20, 24-25.]

Agenda
01 SOLiD™ chemistry
02 Double base encoding

2 base pair encoding using 4 dyes
[Figure: color matrix, 1st base × 2nd base (A, C, G, T); e.g. a red probe carrying AT and a blue probe carrying TT at the interrogated positions.]

2 base pair encoding: reference alignment in color space
Base reference: A C G G T C G T C G T G T G C G T
[Figure: the corresponding color reference]

2 base pair encoding: reference alignment in color space
Reference:             A C G G T C G T C G T G T G C G T
Expected and observed: A C G G T C G C C G T G T G C G T
To be real, a SNP must be encoded by two color changes.

Advantages of 2 base pair encoding
Miscall
Reference: A C G G T C G T C G T G T G C G T
Observed:  A C G G T C G C T A C A C A T A C
A single color change represents a sequencing error.
[Figure: color matrix, 1st base × 2nd base]

But there is more… Only certain transitions are allowed for a real SNP
• Consider a triplet of bases (e.g. C A T): it defines 2 colors.
• There are only 3 possible changes of the middle base, hence only 3 possibilities for the 2 colors to change to.
• Any of the other 6 possible 2-color changes is not allowed and most probably represents a measurement error.
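The primer-reset scheme on the "Sequential rounds of sequencing" slide can be checked with a short sketch. It assumes five ligation cycles per round, a 5-base step between consecutive probes, and five primers whose start positions are shifted by one base (offsets 0 to 4); `interrogated_pairs` is a hypothetical helper. Under these assumptions every interior base is interrogated exactly twice, once as the first and once as the second base of a probe.

```python
from collections import Counter


def interrogated_pairs(start: int, cycles: int = 5, step: int = 5) -> list[tuple[int, int]]:
    """Base pairs read in one round: each probe reads 2 bases, the next one starts `step` bases later."""
    return [(start + step * k, start + step * k + 1) for k in range(cycles)]


# One round per primer; start offsets 0..4 (offset 1 = primer n, offset 0 = primer n-1 on the slides).
rounds = {start: interrogated_pairs(start) for start in range(5)}

print(rounds[1])  # primer n:   [(1, 2), (6, 7), (11, 12), (16, 17), (21, 22)]
print(rounds[0])  # primer n-1: [(0, 1), (5, 6), (10, 11), (15, 16), (20, 21)]

# How many probes cover each base position across all rounds.
reads_per_base = Counter(pos for pairs in rounds.values() for pair in pairs for pos in pair)
print(all(reads_per_base[p] == 2 for p in range(1, 25)))  # True
```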
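The 2-base encoding slides can be reproduced with the common XOR shortcut for the SOLiD color code: with A=0, C=1, G=2, T=3, the color of a dinucleotide is the XOR of the two base codes (colors 0 to 3, as in the csfasta reads shown earlier). The helper names are hypothetical and the sketch works on plain base strings, ignoring the primer base that starts a real read. It checks the slides' examples: the T to C SNP changes two adjacent colors; one miscalled color, decoded naively, turns ACGGTCGTCGTGTGCGT into ACGGTCGCTACACATAC; only 3 of the 9 double color changes for the triplet C A T correspond to a real SNP; and a C-free, bisulfite-converted sequence still uses all four colors.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode(seq: str) -> list[int]:
    """One color per adjacent base pair: color = code(b1) XOR code(b2)."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base: str, colors: list[int]) -> str:
    """Rebuild bases from the known first base; each base depends on every earlier color."""
    codes = [CODE[first_base]]
    for color in colors:
        codes.append(codes[-1] ^ color)
    return "".join(BASES[c] for c in codes)


def changed_colors(ref: str, other: str) -> list[int]:
    """Indices of the colors that differ between two equal-length base sequences."""
    return [i for i, (x, y) in enumerate(zip(encode(ref), encode(other))) if x != y]


def is_valid_snp(old: tuple[int, int], new: tuple[int, int]) -> bool:
    """A middle-base change b -> b' alters both colors of a triplet by the same XOR (b ^ b')."""
    d1, d2 = old[0] ^ new[0], old[1] ^ new[1]
    return d1 != 0 and d1 == d2


reference = "ACGGTCGTCGTGTGCGT"

# Real SNP (T -> C at base index 7): two adjacent colors change.
snp = "ACGGTCGCCGTGTGCGT"
print(changed_colors(reference, snp))    # [6, 7]

# Miscall: flipping a single color and decoding naively shifts every downstream base.
colors = encode(reference)
colors[6] ^= 2
print(decode("A", colors))               # ACGGTCGCTACACATAC

# Triplet C A T: of the 9 ways both colors can change, only 3 are real SNPs.
old = tuple(encode("CAT"))
both_changed = [(x, y) for x in range(4) for y in range(4) if x != old[0] and y != old[1]]
allowed = [new for new in both_changed if is_valid_snp(old, new)]
print(len(allowed), len(both_changed))   # 3 9

# A C-free (bisulfite-converted) sequence still uses all four colors.
print(sorted(set(encode("ATGTTGTATTGATATGTGATGTTGTA"))))  # [0, 1, 2, 3]
```

The XOR form makes the "allowed transitions" rule fall out directly: changing the middle base of a triplet from b to b' shifts both of its colors by the same amount, b XOR b', which is why the triplet C A T can only become CGT, CCT or CTT.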
The only allowed transitions
Triplet C A T: a real SNP in the middle base can only give CGT (the two colors are reversed), CCT or CTT (the other two colors, in both orientations). Any other transition would require the outer two bases to change.

Not allowed transitions
[Figure: color matrix, 1st base × 2nd base, and the disallowed color changes for the triplet]
1/3 allowed vs 2/3 not allowed

SOLiD Exact Call Chemistry (ECC)
ECC adds an extra round of ligations with 3-base encoding. This round is used as an accuracy control, improving the quality of the color-space sequence. It can also return a base-space sequence with good accuracy.

PASS implementation of bisulfite alignment (Davide Campagna)
General strategy
1. Find seeds in base space
2. Extend alignment in color space
3. Rescue unaligned reads using a reference with the combinations of methylation patterns
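The exhaustive approach to bisulfite alignment (STEPs 1 to 5 earlier in the talk) can be sketched as a toy aligner. Assumptions: reads come from the forward strand only (a real aligner also handles the G-to-A converted reverse strand), an exact substring search with `str.find` stands in for a real aligner, and `c_to_t` and `align_bisulfite_read` are hypothetical names. This is not PASS; it only illustrates the convert, align, then compare-originals logic on the slide's example genome.

```python
def c_to_t(seq: str) -> str:
    """STEP 1/2: virtual bisulfite conversion (every C becomes T)."""
    return seq.replace("C", "T")


def align_bisulfite_read(genome: str, read: str):
    """Return (position, {genome position: call}) or None if the read does not align."""
    converted_genome = c_to_t(genome)                 # STEP 1: AGT genome
    converted_read = c_to_t(read)                     # STEP 2: `read` is kept as the original
    pos = converted_genome.find(converted_read)       # STEP 3: three-base alignment (exact)
    if pos < 0:
        return None
    calls = {}
    for offset, read_base in enumerate(read):         # STEP 4/5: compare the originals
        genome_pos = pos + offset
        if genome[genome_pos] != "C":
            if read_base == "C":
                return None                           # a C in the read must sit on a genome C
            continue
        calls[genome_pos] = "methylated" if read_base == "C" else "unmethylated"
    return pos, calls


genome = "ATGCTGCACTGACACGTGATGTCGTA"
print(align_bisulfite_read(genome, "TGATGTCGTA"))  # (16, {22: 'methylated'})
print(align_bisulfite_read(genome, "TGTTGTATTG"))  # (1, {3: 'unmethylated', 6: 'unmethylated', 8: 'unmethylated'})
```

Because the search runs on three-letter sequences, a converted read matches more places by chance than a normal read would; that is the loss of specificity listed among the three main problems.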
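Step 3 of the PASS strategy, rescuing reads against a reference that contains the combinations of methylation patterns, can be illustrated with a brute-force sketch. A plausible motivation: in color space a read cannot simply be C-to-T converted, because every color depends on two bases. This sketch therefore enumerates every version of a reference window in which each C is either kept (methylated) or turned into T (unmethylated), encodes each version in colors with the XOR shortcut (A=0, C=1, G=2, T=3), and looks for an exact match. The helper names are hypothetical, the primer base and first color of real SOLiD reads are ignored, and this is not how PASS implements the step; the number of variants grows as 2^k for k cytosines, so it only suits short windows.

```python
from itertools import product

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(seq: str) -> list[int]:
    """SOLiD-style colors via the XOR shortcut (A=0, C=1, G=2, T=3)."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def methylation_variants(window: str):
    """Yield (variant, methylated positions) for every keep-C / C-to-T combination."""
    c_positions = [i for i, base in enumerate(window) if base == "C"]
    for pattern in product((True, False), repeat=len(c_positions)):
        bases = list(window)
        for pos, is_methylated in zip(c_positions, pattern):
            if not is_methylated:
                bases[pos] = "T"
        methylated = {pos for pos, m in zip(c_positions, pattern) if m}
        yield "".join(bases), methylated


def rescue(window: str, read_colors: list[int]):
    """Methylated positions of the first variant whose colors match the read exactly, else None."""
    for variant, methylated in methylation_variants(window):
        if encode(variant) == read_colors:
            return methylated
    return None


window = "TGATGTCGTA"  # positions 16-25 of the example genome from the slides
print(rescue(window, encode("TGATGTCGTA")))  # {6}    (C kept: methylated)
print(rescue(window, encode("TGATGTTGTA")))  # set()  (C read as T: unmethylated)
print(sum(1 for _ in methylation_variants("ATGCTGCACTGACACGTGATGTCGTA")))  # 64 = 2**6
```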