SWIFT
finishing software
Finishing has long been the
bottleneck in the sequencing process. Finishing draft sequences with existing
software requires extensive and time-consuming human intervention. Identifying and resolving
misalignments, choosing templates, and selecting appropriate chemistries are
common tasks that require the attention of the finisher. To streamline this process, a unique
combination of software and laboratory tools has been optimized in our
laboratory to drastically reduce user intervention in finishing. Moreover, this strategy enables us to
readily finish clones drafted at other sites. Our strategy is based on three laboratory techniques that are
tied together by software we have developed. The first is a standard application
of in vitro transposon-based
sequencing. The second
is long-range PCR. The third is an
approach that we call “universal template sequencing”.
The SWIFT software is
designed to identify problem areas using input reads assembled by PHRAP and
then generate a worklist to resolve those areas. The software also interacts
with a relational database to maintain a log to track the status of finishing
projects. The software tool is
first run on a BAC clone in order to identify and tag possible regions of
misassembly (i.e., high-quality base discrepancies in repeat sequences or
misoriented plasmid read pairs).
The software generates a list of misassembled reads and can automatically
re-run PHRAP to remove high-quality discrepancies. After the assembly is sorted, the software
automatically identifies both the gaps requiring additional sequence data and
the sequence regions that are considered to be of low quality (phred quality
value below 30). The software may be run in
two modes: either custom oligos are picked (using the Primer3 software from
MIT/Whitehead) to resolve low-quality regions and fill gaps by PCR or “universal template” sequencing,
or plasmid subclones spanning the low-quality regions and gaps are picked
for subsequent transposon-based sequencing. A worklist is then generated which
can be viewed and edited on the Finishing webpage. Sequencing is carried out and the new sequences are added to
the assembly (either by a PHRAP re-assembly or, in the case of a repetitive clone, by adding a discrete file of
reads to the existing assembly). The software may be run again if any
gaps or low quality regions remain unresolved.
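The identification step described above can be illustrated with a minimal sketch. This is not the SWIFT implementation: the function names, the worklist layout, and the assumption that contigs arrive already ordered (so that each adjacent pair is separated by one gap) are all hypothetical. Only the phred 30 threshold comes from the text.

```python
from typing import Dict, List, Tuple

# From the text: bases with a phred quality value below 30 count as low quality.
PHRED_THRESHOLD = 30


def low_quality_regions(quals: List[int],
                        threshold: int = PHRED_THRESHOLD) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals where quality < threshold."""
    regions = []
    start = None
    for i, q in enumerate(quals):
        if q < threshold:
            if start is None:
                start = i          # open a new low-quality run
        elif start is not None:
            regions.append((start, i))  # close the current run
            start = None
    if start is not None:
        regions.append((start, len(quals)))  # run extends to the contig end
    return regions


def build_worklist(contigs: List[Tuple[str, List[int]]]) -> List[Dict]:
    """Build a hypothetical worklist from ordered (name, qualities) contigs.

    Assumption: contigs are already ordered and oriented, so every adjacent
    pair is separated by exactly one gap that needs additional sequence.
    """
    worklist = []
    for idx, (name, quals) in enumerate(contigs):
        for start, end in low_quality_regions(quals):
            worklist.append({"kind": "low_quality", "contig": name,
                             "start": start, "end": end})
        if idx + 1 < len(contigs):
            worklist.append({"kind": "gap", "left": name,
                             "right": contigs[idx + 1][0]})
    return worklist
```

Each worklist entry would then be resolved in one of the two modes described above, either by picking custom oligos or by picking spanning plasmid subclones; that choice is not modeled here.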
In vitro plasmid transposition is performed using the
Genome Priming System kit from New England Biolabs. To expedite this process,
arraying of plasmid clones and subsequent transposition reaction setup have
been programmed on the Biomek robotic system. The transposed plasmids are then used to set up
transformation reactions (in 96-well format) using chemically competent E.
coli on the Biomek. A total of 24-48
colonies per transposed plasmid are then picked using the QPix colony picker
and inoculated in 96-well plates for template preparation using a filter prep,
which has also been programmed to run on the Biomek. The template DNA is then sequenced
(using the two universal transposon-based primers) with Big Dye terminator
chemistry. This method enables us
to resolve multiple clustered low quality regions and to fill large gaps in one
sequencing round with a minimal number of templates. Additionally, the insertion of the transposon into the
sequencing template often helps to disrupt secondary structures that would
otherwise impede sequencing reactions.
For increased efficiency, we often pool plasmids prior to
transposition. Pooled
transposition reduces the cost of transposon-based finishing
by a factor equal to the number of plasmids pooled. However, repetitive clones may necessitate individual
plasmid transpositions to aid in assembly sorting.
Long-range PCR targets
regions of up to 20 kb. PCR primers
are chosen to flank gaps and/or low-quality regions. The targeted regions are amplified from either purified BAC
DNA or genomic DNA and the PCR products are subsequently subcloned and
sequenced with the transposon method detailed above. This technique has the advantage of obviating the need for a
plasmid subclone library when finishing clones drafted at other sites.
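One way to picture how nearby problem areas could share a single long-range PCR product is a greedy grouping of target intervals into amplicons of at most 20 kb. The sketch below is an illustration only, not the authors' method: the function name, the single shared coordinate axis, and the 200 bp flank reserved for primer placement are assumptions. Actual primer selection (for example with Primer3) is not modeled.

```python
from typing import List, Tuple


def plan_amplicons(regions: List[Tuple[int, int]],
                   max_len: int = 20_000,
                   flank: int = 200) -> List[Tuple[int, int]]:
    """Greedily group (start, end) targets into amplicons of at most max_len.

    max_len reflects the 20 kb limit given in the text; flank is a
    hypothetical margin on each side where primers would be placed.
    A single target longer than max_len still yields one oversized
    amplicon and would need separate handling.
    """
    amplicons: List[List[int]] = []
    for start, end in sorted(regions):
        # Extend the current amplicon if the target still fits within max_len.
        if amplicons and end + flank - amplicons[-1][0] <= max_len:
            amplicons[-1][1] = max(amplicons[-1][1], end + flank)
        else:
            amplicons.append([max(0, start - flank), end + flank])
    return [(a, b) for a, b in amplicons]
```

With the default settings, targets at (1000, 1500) and (5000, 5200) fall into one amplicon, while a target at (30000, 30100) starts a second one.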
“Universal Template”
sequencing: This name differentiates this process from universal primer
sequencing methods, in which one strategy or another (shearing, ExoIII deletion,
etc.) is used to bring the vector-borne universal sequencing primer adjacent to
the targeted region. In universal
template sequencing we use the same template (a BAC or pools of BACs) for all
finishing reactions. The universal
template is then paired with custom oligos. This approach has tremendous workflow advantages over universal
primer sequencing. Sequencing
directly from high molecular weight fragments has historically been
difficult. To circumvent this
problem we made several changes to typical BAC sequencing protocols. Importantly, the BAC DNA is sheared
into 0.5-1 kb fragments via nebulization.
This sheared DNA is then used as the template for all sequencing
reactions, thereby greatly reducing the reaction setup time required in the
traditional method of matching each oligo to a specified set of subclones. Custom oligos are ordered in a 96-well
format in order to expedite resuspension and reaction setup. Sequencing reactions are performed with
Big Dye terminator chemistry using cycling parameters modified for universal
template sequencing.
To further streamline the
“universal template” finishing technology we have experimented with pooling BAC
DNA preps. We tested pooling of 4,
6 and 8 BACs. We grow the BACs
separately (this allows us to normalize DNA yield of each clone) and mix the
cultures prior to DNA isolation and purification. DNA is isolated, purified and nebulized using our
normal conditions for direct BAC sequencing. Sequencing is carried out on the pools using 1x or 0.5x
the recommended amount of ABI Big Dye 3.1 mix.
A standardized, automated system such as ours should help reduce the number of highly trained technicians needed to achieve high-quality finished sequence. We believe this strategy will greatly increase the efficiency of the finishing process in terms of cost, labor, and speed of clone completion.