Analysis of DNA methylation:
bisulfite libraries and SOLiD sequencing
Epigen – Udine June 2012
CRIBI, Università di Padova
An easy view of the bisulfite approach
          methylated C (CH3)   unmethylated C
genome    TAGTACGTTGAT         TAGTACGTTGAT
read      TAGTACGTTGAT         TAGTATGTTGAT
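The genome/read pairs above can be reproduced with a minimal sketch of in-silico bisulfite conversion. The function name and the 0-based position convention are assumptions made for this illustration, not part of the original slides.

```python
def bisulfite_convert(seq, methylated_positions=()):
    """Return the read expected after bisulfite treatment and PCR.

    An unmethylated C is read as T; a C at a methylated position stays C.
    Positions are 0-based indices into seq (a convention of this sketch).
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


genome = "TAGTACGTTGAT"
methylated_read = bisulfite_convert(genome, methylated_positions=[5])
unmethylated_read = bisulfite_convert(genome)

print(methylated_read)    # TAGTACGTTGAT
print(unmethylated_read)  # TAGTATGTTGAT
```

The only C of the example sits at index 5, so methylating that position leaves the read identical to the genome.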
Three main problems
1. We need software specifically designed to align bisulfite reads
2. Loss of sensitivity and specificity due to the reduced complexity
(3 letters instead of 4) and to the increased size of the reference
3. Need for special strategies for making the shotgun libraries
Before
5'-ATGCTGCACTGACACGTGAT-3'
3'-TACGACGTGACTGTGCACTA-5'
After
5'-ATGUTGUAUTGAUAUGTGAT-3'
3'-TAUGAUGTGAUTGTGUAUTA-5'
Need for special strategies for making the shotgun libraries
Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, Tonti-Filippini J, Nery JR, Lee L, Ye Z, Ngo QM, Edsall L,
Antosiewicz-Bourget J, Stewart R, Ruotti V, Millar AH, Thomson JA, Ren B, Ecker JR: Human DNA methylomes at
base resolution show widespread epigenomic differences. Nature 2009, 462:315-322.
CRIBI method for bisulfite library preparation: MeSS (Methylome SOLiD Sequencing)
Lisa Marchioretto and Robin Targon
Cells → Nuclei → DNA → Bisulfite treatment → Adaptor ligation → PCR → Sequencing
Optimization of the fragmentation and bisulfite treatment
Optimization of adaptor ligation
Compared with other Bis-seq methods, MeSS requires ten times less starting
genomic DNA, avoids intermediate purification steps between enzymatic
reactions, and allows efficient amplification with fewer PCR cycles.
Loss of sensitivity and specificity due to the reduced complexity
(3 letters instead of 4) and to the increased size of the reference
Directional cloning would halve the mapping complexity
Before
5'-ATGCTGCACTGACACGTGAT-3'
3'-TACGACGTGACTGTGCACTA-5'
After
5'-ATGUTGUAUTGAUAUGTGAT-3'
3'-TAUGAUGTGAUTGTGUAUTA-5'
SOLiD color space maintains the full
set of 4 colors after C/U conversion
>882_4_710_F3
T12303201320002311102023132033102120101
>882_4_840_F3
T30132200013022300130131231321021133033
>882_4_1657_F3
T33213100102312210311012322012203112333
>882_5_1275_F3
T31201000021203112332021200212201223112
>882_6_553_F3
T31321031020123002032223323001301333313
..................
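The claim that color space keeps all 4 colors after C/U conversion can be checked with a small sketch. It assumes the standard SOLiD 2-base code, in which the color of a dinucleotide is the XOR of the 2-bit base codes (A=0, C=1, G=2, T=3); the helper name `to_colors` is hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_colors(seq):
    """Encode a base sequence as SOLiD colors (0-3), one per adjacent base pair."""
    return "".join(str(CODE[a] ^ CODE[b]) for a, b in zip(seq, seq[1:]))


before = "ATGCTGCACTGACACGTGAT"
# Fully unmethylated top strand: every C is converted and read as T.
after = before.replace("C", "T")

bases_after = sorted(set(after))
colors_after = sorted(set(to_colors(after)))

print(bases_after)   # ['A', 'G', 'T']
print(colors_after)  # ['0', '1', '2', '3']
```

The converted sequence uses only three letters, while its color-space encoding still uses all four colors.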
Software specifically designed to align bisulfite reads
Exhaustive approach to bisulfite alignment
STEP 1: Virtual bisulfite conversion of the genome
Genome                ...ATGCTGCACTGACACGTGATGTCGTA...
                      ↓
Converted AGT genome  ...ATGTTGTATTGATATGTGATGTTGTA...
STEP 2: Virtual bisulfite conversion of any C in the reads, remembering the original
Read #1  TGTTGTATTG → TGTTGTATTG
Read #2  TGATGTCGTA → TGATGTTGTA
…
STEP 3: Alignment of three-base sequences
Converted genome  ...ATGTTGTATTGATATGTGATGTTGTA...
Converted reads       TGTTGTATTG     TGATGTTGTA
STEP 4/5: If the original read had any C, check that the genome also has a C at that position and label it as Met
                                          CH3
                                           |
Original genome   ...ATGCTGCACTGACACGTGATGTCGTA...
Converted genome  ...ATGTTGTATTGATATGTGATGTTGTA...
Converted read                       TGATGTTGTA
Original read                        TGATGTCGTA
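The steps above can be sketched on the toy genome of this slide. This is only an illustration: exact string matching stands in for a real aligner, only one strand is handled, and the function names are hypothetical.

```python
def convert(seq):
    """In-silico bisulfite conversion: every C becomes T."""
    return seq.replace("C", "T")


def align_and_call(genome, read):
    """Exact-match a read on the converted genome, then call methylation.

    Returns a list of (genome_offset, methylated_genome_positions).
    """
    conv_genome, conv_read = convert(genome), convert(read)
    hits = []
    start = conv_genome.find(conv_read)
    while start != -1:
        window = genome[start:start + len(read)]
        # A C in the original read is only compatible with a genomic C;
        # a read T over a genomic C is an unmethylated cytosine.
        if all(r != "C" or g == "C" for r, g in zip(read, window)):
            met = [start + i for i, r in enumerate(read) if r == "C"]
            hits.append((start, met))
        start = conv_genome.find(conv_read, start + 1)
    return hits


genome = "ATGCTGCACTGACACGTGATGTCGTA"
hit_read1 = align_and_call(genome, "TGTTGTATTG")
hit_read2 = align_and_call(genome, "TGATGTCGTA")

print(hit_read1)  # [(1, [])]
print(hit_read2)  # [(16, [22])]
```

Read #2 keeps its C, so the genomic cytosine at 0-based position 22 is labelled as methylated; read #1 has no C and yields no methylation call.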
PASS implementation of bisulfite alignment
Simulated test set
Starting from three simulated hg19 reference genomes, whose cytosines were randomly
methylated on both DNA strands to obtain three cytosine methylation levels
(0%, 50% and 100%), we generated six test sets of 1 million reads each
(three for color-space and three for base-space data) using the dwgsim-0.1.8 (ref.)
program. The same procedure was applied to obtain the simulated test sets of DNA not
treated with bisulfite, except that the unmodified hg19 reference genome was used as
input to dwgsim-0.1.8.
Parameters used: [ -y 0 -z 0 -d 100 -S 2 -c 0 or 1 (for Illumina or SOLiD data) -1 50 -2 50 -C -1 -N 1000000 ]
The per base/color/flow error rate and the mutation rate were set to their default values
(0.02 and 0.001, respectively). All simulated test sets were produced using the same
seed, so they are comparable in number of reads, position and strand relative to the human
reference genome (hg19).
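As a rough illustration of the first step (random methylation of cytosines on both strands at a fixed level), the sketch below picks methylated positions on a toy sequence. It is not the procedure used to build the test sets: the function name, the seed handling and the toy sequence are assumptions, and dwgsim itself is not called.

```python
import random


def random_methylation(seq, level, seed=0):
    """Pick methylated cytosines at the given level (0.0 to 1.0).

    Returns (top, bottom): 0-based positions of methylated C on the top
    strand, and of G on the top strand (i.e. C on the bottom strand).
    """
    rng = random.Random(seed)  # fixed seed keeps the sets reproducible
    top = {i for i, b in enumerate(seq) if b == "C" and rng.random() < level}
    bottom = {i for i, b in enumerate(seq) if b == "G" and rng.random() < level}
    return top, bottom


toy = "ATGCTGCACTGACACGTGATGTCGTA"
for level in (0.0, 0.5, 1.0):
    top, bottom = random_methylation(toy, level)
    print(level, sorted(top), sorted(bottom))
```

At level 0.0 no position is selected and at level 1.0 every cytosine on both strands is selected; reusing the same seed gives the same positions on every run.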
PASS implementation of bisulfite alignment
General strategy
1. Find seeds in base space
2. Extend alignment in color space
SOLiD chemistry: ligation probes
• Ligation probes are octamers: A T n n n z z z
– n = degenerate bases, z = universal bases
– 4^5 = 1024 probes (256 probes per color)
• The 3' ligation site, the cleavage site and the dye are spatially separated
• The fluorescent dye interrogates the bases at the 1st + 2nd positions
[Figure: 4 × 4 color-code matrix, 1st base (A, C, G, T) by 2nd base (A, C, G, T), and probe diagram with the 3' ligation site and the cleavage site marked.]
2-base encoding is based on ligation sequencing rather than sequencing by synthesis. It takes advantage of fluorescently
labeled 8-mer probes that distinguish the two 3'-most bases (AT in the figure). To obtain full coverage, repeated
cycles of ligation are performed, using primers that anneal to different positions of the adapter sequence (see next slides).
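The probe count can be verified by enumeration. The sketch assumes the standard SOLiD color code (XOR of the 2-bit base codes A=0, C=1, G=2, T=3) and treats the three universal positions as adding no sequence variants.

```python
from collections import Counter
from itertools import product

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

# Five specified positions per probe: 2 interrogated bases + 3 degenerate (n).
probes = ["".join(p) for p in product("ACGT", repeat=5)]

# The dye color depends only on the first two (interrogated) bases.
per_color = Counter(CODE[p[0]] ^ CODE[p[1]] for p in probes)

print(len(probes))                # 1024
print(sorted(per_color.items()))  # [(0, 256), (1, 256), (2, 256), (3, 256)]
```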
SOLiD 4-color ligation
Ligation reaction
[Figure: a universal seq primer annealed to the P1 primer on a 1 µm bead, next to the template sequence; ligase joins one of four dye-labeled probes (Y-, B-, G- or R-probe, each XXnnnzzz) to the primer.]
SOLiD 4-color ligation
Visualization
[Figure: the ligated probe is imaged; color Y is recorded for template positions 1-2.]
SOLiD ligation-based sequencing chemistry (2)
Image
Cap unextended strands
Cleave-off fluor
SOLiD 4-color ligation
Cleavage
[Figure: the ligated probe after cleavage, with a free 5' phosphate (p5'); color Y recorded for positions 1-2.]
SOLiD 4-color ligation
Ligation (2nd cycle)
[Figure: ligase joins a second dye-labeled probe (Y-, B-, G- or R-probe, each XXnnnzzz) to the extended strand; color Y already recorded for positions 1-2.]
SOLiD 4-color ligation
Visualization (2nd cycle)
[Figure: colors recorded so far: Y for positions 1-2, R for positions 6-7.]
SOLiD 4-color ligation
Cleavage (2nd cycle)
[Figure: the second probe after cleavage, with a free 5' phosphate (p5'); Y recorded for positions 1-2, R for positions 6-7.]
SOLiD 4-color ligation
interrogates every 4th-5th base
[Figure: after five ligation cycles from the universal seq primer, colors (Y, R, R, B, G in the figure) are recorded for positions 1-2, 6-7, 11-12, 16-17 and 21-22.]
SOLiD 4-color ligation
Reset
[Figure: the extended primer strand is removed, leaving the adapter oligo sequence and the template sequence on the 1 µm bead.]
SOLiD 4-color ligation
(1st cycle after reset)
[Figure: universal seq primer n-1 annealed to the adapter oligo sequence; ligase joins one of four dye-labeled probes (Y-, B-, G- or R-probe, each XXnnnzzz).]
SOLiD 4-color ligation
(1st cycle after reset)
[Figure: with universal seq primer n-1, color R is recorded for positions 0-1.]
SOLiD 4-color ligation
(2nd Round)
[Figure: with universal seq primer n-1, colors (R, R, R, B, G in the figure) are recorded for positions 0-1, 5-6, 10-11, 15-16 and 20-21.]
Sequential rounds of sequencing
Multiple cycles per round
Primer (reset between rounds)      Positions interrogated
universal seq primer               1-2  6-7  11-12  16-17  21-22
universal seq primer n-1           0-1  5-6  10-11  15-16  20-21
universal seq primer n+3 (spacer)  4-5  9-10  14-15  19-20  24-25
universal seq primer n+2 (spacer)  3-4  8-9  13-14  18-19  23-24
universal seq primer n+1 (spacer)  2-3  7-8  12-13  17-18  22-23
[Figure: each primer anneals to the adapter oligo sequence on the 1 µm bead, upstream of the template sequence.]
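The position pairs of the five rounds follow a simple pattern: each ligated probe advances the strand by 5 bases, and each primer shifts the starting point by one base. The sketch below regenerates the pairs and checks how often each position is read; the offsets are taken from the slide, while the variable names are assumptions of this illustration.

```python
from collections import Counter

CYCLES = 5  # ligation cycles per round
STEP = 5    # each ligated probe advances the strand by 5 bases
# First interrogated position for each primer, as shown on the slide.
OFFSETS = {"n": 1, "n-1": 0, "n+3": 4, "n+2": 3, "n+1": 2}

rounds = {
    primer: [(first + STEP * c, first + STEP * c + 1) for c in range(CYCLES)]
    for primer, first in OFFSETS.items()
}
for primer, pairs in rounds.items():
    print(primer, " ".join(f"{a}-{b}" for a, b in pairs))

# Count how many times each position is interrogated over the five rounds.
coverage = Counter(pos for pairs in rounds.values() for pair in pairs for pos in pair)
double_read = all(coverage[p] == 2 for p in range(1, 25))
print(double_read)  # True
```

Positions 1 to 24 are each interrogated twice, by two different primers; positions 0 and 25 at the edges are read once.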
SOLiD™ Chemistry
Double Base Encoding
2 Base Pair Encoding
Using 4 Dyes
Red-probe:   A T n n n z z z
Blue-probe:  T T n n n z z z
[Figure: 4 × 4 color-code matrix, 1st base (A, C, G, T) by 2nd base (A, C, G, T).]
2 base pair encoding reference alignment in color space
Base reference   A C G G T C G T C G T G T G C G T
Color reference  [colored dots in the original slide]
2 base pair encoding reference alignment in color space
Reference bases  A C G G T C G T C G T G T G C G T
Expected and observed colors: [colored dots in the original slide]
Observed bases   A C G G T C G C C G T G T G C G T
To be real, a SNP must be encoded by two adjacent color changes
Advantages of 2 base pair encoding: miscall
Reference bases  A C G G T C G T C G T G T G C G T
Expected and observed colors: [colored dots in the original slide]
Observed bases   A C G G T C G C T A C A C A T A C
A single color change represents a sequencing error.
[Figure: 4 × 4 color-code matrix, 1st base (A, C, G, T) by 2nd base (A, C, G, T).]
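The difference between a real SNP and a miscall can be reproduced on the 17-base reference of these slides. The sketch assumes the standard SOLiD color code (XOR of the 2-bit base codes A=0, C=1, G=2, T=3); the helper names are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"


def encode(seq):
    """Base space to color space: one color per adjacent base pair."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base, colors):
    """Color space to base space, starting from a known first base."""
    seq = [first_base]
    for c in colors:
        seq.append(BASE[CODE[seq[-1]] ^ c])
    return "".join(seq)


def color_diffs(ref, obs_colors):
    """Indices of colors that differ from the reference encoding."""
    return [i for i, (r, o) in enumerate(zip(encode(ref), obs_colors)) if r != o]


reference = "ACGGTCGTCGTGTGCGT"
snp = "ACGGTCGCCGTGTGCGT"  # real T>C substitution at 0-based index 7
snp_diffs = color_diffs(reference, encode(snp))

miscall = encode(reference)
miscall[6] = 3             # a single wrong color call
miscall_diffs = color_diffs(reference, miscall)
miscall_bases = decode("A", miscall)

print(snp_diffs)      # [6, 7]
print(miscall_diffs)  # [6]
print(miscall_bases)  # ACGGTCGCTACACATAC
```

The SNP changes two adjacent colors, whereas the isolated color error changes one color and, if naively decoded, corrupts every downstream base, as in the observed sequence of the miscall slide.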
But there is more…
Only certain transitions are allowed for a real SNP
• Consider a triplet of bases, e.g. C A T: it defines 2 colors.
• There are only 3 possibilities for a change in the middle base, hence only 3 possibilities for the 2 colors to change to.
• The other 6 possible 2-color changes are not allowed and most probably represent measurement errors.
The Only Allowed Transitions
C A T → C G T   Reverse colors
C A T → C C T   Other two colors (both orientations)
C A T → C T T
Any other transition would require the outer two bases to change
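The 3-out-of-9 count can be checked by enumerating every 2-color change for the triplet C A T. The sketch assumes the standard SOLiD color code (XOR of the 2-bit base codes A=0, C=1, G=2, T=3); colors are shown as the digits 0-3 rather than dye names.

```python
from itertools import product

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def colors(triplet):
    """Return the two colors defined by a base triplet."""
    a, b, c = triplet
    return (CODE[a] ^ CODE[b], CODE[b] ^ CODE[c])


triplet = "CAT"
old = colors(triplet)

# Color pairs reachable by changing only the middle base.
allowed = {
    colors(triplet[0] + mid + triplet[2]): triplet[0] + mid + triplet[2]
    for mid in "ACGT"
    if mid != triplet[1]
}

# Every way in which both colors can differ from the original pair.
two_color_changes = [
    pair for pair in product(range(4), repeat=2)
    if pair[0] != old[0] and pair[1] != old[1]
]
not_allowed = [pair for pair in two_color_changes if pair not in allowed]

print(old)      # (1, 3)
print(allowed)  # {(0, 2): 'CCT', (3, 1): 'CGT', (2, 0): 'CTT'}
print(len(two_color_changes), len(not_allowed))  # 9 6
```

C G T gives the reversed color pair (3, 1), while C C T and C T T give the other two colors in both orientations, matching the slide.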
Not Allowed Transitions
C A T → A G T, T C T, G T T, C G C, C C A, C T G
[Figure: 4 × 4 color-code matrix, 1st base (A, C, G, T) by 2nd base (A, C, G, T).]
1/3 allowed vs 2/3 not allowed
SOLiD Exact Call Chemistry (ECC)
ECC performs an extra run of ligations with 3-base encoding. This is
used as an accuracy control, thus improving the quality of the sequence in
color space. It can also return a sequence in base space with good accuracy.
PASS implementation of bisulfite alignment
(Davide Campagna)
General strategy
1. Find seeds in base space
2. Extend alignment in color space
3. Rescue unaligned reads using a reference that contains the combinations of methylation patterns