# README.md (find_differential_primers)
## Overview

This repository contains code for finding discriminatory primers among genomes or other biological sequences of interest.
## DEPENDENCIES:

The following dependencies have been confirmed to work for running the 'find_differential_primers.py' pipeline. Later versions (and in some cases, some earlier versions) should also work, except where noted for primer3 and EMBOSS:
- Biopython: (forthcoming) v1.63 - but NOTE you may need to clone the current Biopython repo from https://github.com/biopython/biopython (see below).
- bx-python: current Hg (Mercurial) version works as of June 19, 2013
- EMBOSS (for ePrimer3): v6.4.0, v6.6.0
- primer3: v1.1.4. NOTE: primer3 v2 does not work with EMBOSS ePrimer3.
- prodigal: v1.20
- BLAST+: v2.2.22+, v2.2.28+, v2.2.29+
There is a point of fragility in choice of EMBOSS, primer3, and Biopython versions, that centres around the following issues:
- primer3 versions newer than v1.1.4 do not work with EMBOSS. This locks us, for now, into a 2008 version of primer3. As I chose to use the EMBOSS tools, I am taking their lead on this - when EMBOSS ePrimer3 changes to v2+, so will this script.
- EMBOSS' ePrimer3 interface is not stable between version numbers. In particular, v6.6.0 changes the `-otm` flag to `-opttm`. This means that Biopython v1.63 or lower does not use the appropriate option for EMBOSS v6.6.0. If you are using an EMBOSS version older than v6.6.0 then Biopython v1.63 should be fine. If you are using EMBOSS v6.6.0+, then note that the appropriate change has been committed at the `git` repository at https://github.com/biopython/biopython (as of Dec 2013), and for now you should install Biopython from the bleeding edge source. It is anticipated that this change will appear in Biopython v1.64.
- EMBOSS v6.6.0+/Biopython cloned from GitHub repository/Primer3 v1.1.4 should work
- EMBOSS pre-v6.6.0/Biopython pre-v1.63/Primer3 v1.1.4 should work
- Primer3 v2+: will not work
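The compatibility rules above can be expressed as a small check. This is a minimal sketch and not part of the repository: the function names `parse_version` and `check_combination` are hypothetical, and it assumes plain dotted version strings such as `6.6.0` (a Biopython installed from a Git clone may report a different string, which this sketch does not handle).

```python
def parse_version(text):
    """Convert a version string such as 'v6.6.0' or '1.63' to a tuple of ints."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def check_combination(emboss, primer3, biopython):
    """Return a list of known problems for this combination (empty if none)."""
    problems = []
    # primer3 versions newer than v1.1.4 do not work with EMBOSS ePrimer3.
    if parse_version(primer3) > (1, 1, 4):
        problems.append("primer3 newer than v1.1.4 does not work with EMBOSS")
    # EMBOSS v6.6.0 renamed -otm to -opttm; Biopython v1.63 or lower
    # still passes the old option name.
    if parse_version(emboss) >= (6, 6, 0) and parse_version(biopython) <= (1, 63):
        problems.append("EMBOSS v6.6.0+ needs a Biopython newer than v1.63")
    return problems


if __name__ == "__main__":
    # Report any known problems for two example combinations.
    print(check_combination("6.4.0", "1.1.4", "1.63"))
    print(check_combination("6.6.0", "2.3.6", "1.63"))
```

An empty list means none of the issues described in this README apply; it does not guarantee that the combination has been tested.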
If you have downloaded v0.1.0 or greater, and the dependencies above are satisfied, then installation should be as simple as cloning the repository:
```
$ git clone https://github.com/widdowquinn/find_differential_primers
$ cd find_differential_primers
```
then issuing:
```
$ python setup.py install
```
(or whatever variant you wish, e.g. for a home directory-local installation) from the top directory in the repository, with root permissions, if necessary.
## BASIC USE:
- Collect all biological sequences to be distinguished into a convenient location (e.g. in the same directory; this is not essential, but it simplifies things if using a `Makefile`).
- Construct a config file similar to the example given in `O104_primers_5.conf` or `test.conf`. This will describe each sequence by name, the classes to which it belongs, and (at least) the location of the FASTA file containing the sequence (or sequences - `find_differential_primers.py` will stitch sequences with the spacer `NNNNNCATTCCATTCATTAATTAATTAATGAATGAATGNNNNN`, if necessary).
- If you need a BLAST database of negative screening examples, construct this with `makeblastdb` (part of BLAST+).
- Run the `find_differential_primers.py` script, with suitable command-line options.
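To illustrate what stitching with the spacer means for a multi-sequence FASTA file, here is a minimal sketch in plain Python. It is not the script's own implementation: the helper names `read_fasta` and `stitch` are hypothetical, and details such as how the script orders the sequences or names the stitched record are not covered.

```python
SPACER = "NNNNNCATTCCATTCATTAATTAATTAATGAATGAATGNNNNN"


def read_fasta(text):
    """Return the sequences found in FASTA-formatted text, in file order."""
    sequences = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A header line starts a new record.
            current = []
            sequences.append(current)
        elif line and current is not None:
            # Sequence lines are collected until the next header.
            current.append(line)
    return ["".join(chunks) for chunks in sequences]


def stitch(sequences, spacer=SPACER):
    """Join several sequences into one, separated by the spacer."""
    return spacer.join(sequences)


if __name__ == "__main__":
    # Two short made-up contigs, used only for illustration.
    example = ">contig1\nACGTACGT\nACGT\n>contig2\nTTTTGGGG\n"
    print(stitch(read_fasta(example)))
```

A file containing a single sequence is returned unchanged, since the spacer is only placed between sequences.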
These steps are encapsulated in the accompanying `makefile` in `samples/makefile`. This file can be modified to point to your input sequence file of interest, and run by issuing `make` at the command-line. See documentation in the `makefile` for more details.
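If you would rather script the BLAST database step in Python than through the makefile, the sketch below builds a `makeblastdb` call for a nucleotide database. It assumes Python 3 and that `makeblastdb` is on your `PATH`; the function names and the file names `negative_examples.fasta` and `negative_db` are hypothetical placeholders, and the makefile may use different options.

```python
import subprocess


def makeblastdb_command(fasta_path, db_name):
    """Build the argument list for a nucleotide BLAST database."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", "nucl", "-out", db_name]


def build_database(fasta_path, db_name):
    """Run makeblastdb, raising CalledProcessError if it fails."""
    subprocess.run(makeblastdb_command(fasta_path, db_name), check=True)


if __name__ == "__main__":
    # Print the command rather than running it, so this is safe to try
    # on a machine without BLAST+ installed.
    print(" ".join(makeblastdb_command("negative_examples.fasta", "negative_db")))
```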
Change directory to `tests`, and run the script with the `test.conf` config file using default settings:
```
$ ../find_differential_primers/find_differential_primers.py -i test.conf -v
```
This should run to completion, and produce the output indicated below:
```
$ tree differential_primer_results/
differential_primer_results/
├── Erwinia_family-specific_amplicons.fas
├── Erwinia_family-specific_primers.eprimer3
├── Eta_1_99_specific_amplicons.fas
├── Eta_1_99_specific_primers.eprimer3
├── Pba_SCRI1043_specific_amplicons.fas
├── Pba_SCRI1043_specific_primers.eprimer3
├── Pca_PC1_specific_amplicons.fas
├── Pca_PC1_specific_primers.eprimer3
├── Pca_PCC21_specific_amplicons.fas
├── Pca_PCC21_specific_primers.eprimer3
├── Pectobacterium_family-specific_amplicons.fas
├── Pectobacterium_family-specific_primers.eprimer3
├── Pwa_WPP163_specific_amplicons.fas
├── Pwa_WPP163_specific_primers.eprimer3
├── atrosepticum_family-specific_amplicons.fas
├── atrosepticum_family-specific_primers.eprimer3
├── carotovorum_family-specific_amplicons.fas
├── carotovorum_family-specific_primers.eprimer3
├── differential_primer_results-families.tab
├── differential_primer_results.tab
├── tasmaniensis_family-specific_amplicons.fas
├── tasmaniensis_family-specific_primers.eprimer3
├── universal_amplicons.fas
├── universal_primers.eprimer3
├── wasabiae_family-specific_amplicons.fas
└── wasabiae_family-specific_primers.eprimer3
0 directories, 26 files
```
```
$ wc differential_primer_results/*
51 68 2621 differential_primer_results/Erwinia_family-specific_amplicons.fas
140 452 4296 differential_primer_results/Erwinia_family-specific_primers.eprimer3
51 68 2621 differential_primer_results/Eta_1_99_specific_amplicons.fas
140 469 4681 differential_primer_results/Eta_1_99_specific_primers.eprimer3
24 32 1257 differential_primer_results/Pba_SCRI1043_specific_amplicons.fas
68 226 2315 differential_primer_results/Pba_SCRI1043_specific_primers.eprimer3
18 24 918 differential_primer_results/Pca_PC1_specific_amplicons.fas
52 172 1737 differential_primer_results/Pca_PC1_specific_primers.eprimer3
21 28 1085 differential_primer_results/Pca_PCC21_specific_amplicons.fas
60 199 2019 differential_primer_results/Pca_PCC21_specific_primers.eprimer3
42 56 2166 differential_primer_results/Pectobacterium_family-specific_amplicons.fas
116 374 3583 differential_primer_results/Pectobacterium_family-specific_primers.eprimer3
18 24 936 differential_primer_results/Pwa_WPP163_specific_amplicons.fas
52 172 1759 differential_primer_results/Pwa_WPP163_specific_primers.eprimer3
24 32 1257 differential_primer_results/atrosepticum_family-specific_amplicons.fas
68 218 2138 differential_primer_results/atrosepticum_family-specific_primers.eprimer3
24 32 1236 differential_primer_results/carotovorum_family-specific_amplicons.fas
68 218 2107 differential_primer_results/carotovorum_family-specific_primers.eprimer3
13 56 1150 differential_primer_results/differential_primer_results-families.tab
15 86 933 differential_primer_results/differential_primer_results.tab
51 68 2621 differential_primer_results/tasmaniensis_family-specific_amplicons.fas
140 452 4301 differential_primer_results/tasmaniensis_family-specific_primers.eprimer3
0 0 0 differential_primer_results/universal_amplicons.fas
4 10 135 differential_primer_results/universal_primers.eprimer3
18 24 936 differential_primer_results/wasabiae_family-specific_amplicons.fas
52 166 1626 differential_primer_results/wasabiae_family-specific_primers.eprimer3
1330 3726 50434 total
```
```
$ cat differential_primer_results/differential_primer_results.tab
# Summary information table
# Generated by find_differential_primers
# Columns in the table:
# 1) Query organism ID
# 2) Query organism families
# 3) Count of organism-unique primers
# 4) Count of universal primers
# 5) Query sequence filename
# 6) Query feature filename
# 7) Query ePrimer3 primers filename
Pba_SCRI1043 Pectobacterium,atrosepticum 8 0 sequences/NC_004547.fna sequences/NC_004547.prodigalout sequences/NC_004547.eprimer3
Pca_PC1 Pectobacterium,carotovorum 6 1 sequences/NC_012917.fna sequences/NC_012917.prodigalout sequences/NC_012917.eprimer3
Pwa_WPP163 Pectobacterium,wasabiae 6 2 sequences/NC_013421.fna sequences/NC_013421.prodigalout sequences/NC_013421.eprimer3
Pca_PCC21 Pectobacterium,carotovorum 7 0 sequences/NC_018525.fna sequences/NC_018525.prodigalout sequences/NC_018525.eprimer3
Eta_1_99 Erwinia,tasmaniensis 17 0 sequences/NC_010694.fna sequences/NC_010694.prodigalout sequences/NC_010694.eprimer3
```
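The summary table above can be read into Python for further processing. The following is a minimal sketch that follows the seven columns named in the table header. The names `Record` and `parse_summary` are hypothetical, and the sketch assumes whitespace-separated fields with no spaces inside any field, as in the example rows shown.

```python
from collections import namedtuple

# One row of differential_primer_results.tab; field names are illustrative.
Record = namedtuple(
    "Record",
    "organism families unique_primers universal_primers "
    "sequence_file feature_file primer_file",
)


def parse_summary(text):
    """Parse the summary table text into a list of Record tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' comment header.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 7:
            raise ValueError("expected 7 columns, got %d: %r" % (len(fields), line))
        records.append(
            Record(
                organism=fields[0],
                families=fields[1].split(","),  # comma-separated family names
                unique_primers=int(fields[2]),
                universal_primers=int(fields[3]),
                sequence_file=fields[4],
                feature_file=fields[5],
                primer_file=fields[6],
            )
        )
    return records
```

For example, `parse_summary(open("differential_primer_results/differential_primer_results.tab").read())` would return one `Record` per organism, which makes it straightforward to pick out organisms with few unique primers.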
```
$ cat differential_primer_results/differential_primer_results-families.tab
# Summary information table
# Generated by find_differential_primers
# Columns in the table:
# 1) Family
# 2) Count of family-specific primers
# 3) Family-specific primer file
# 4) Family-specific amplicon file
Erwinia 17 differential_primer_results/Erwinia_family-specific_primers.eprimer3 differential_primer_results/Erwinia_family-specific_amplicons.fas
carotovorum 8 differential_primer_results/carotovorum_family-specific_primers.eprimer3 differential_primer_results/carotovorum_family-specific_amplicons.fas
Pectobacterium 14 differential_primer_results/Pectobacterium_family-specific_primers.eprimer3 differential_primer_results/Pectobacterium_family-specific_amplicons.fas
wasabiae 6 differential_primer_results/wasabiae_family-specific_primers.eprimer3 differential_primer_results/wasabiae_family-specific_amplicons.fas
atrosepticum 8 differential_primer_results/atrosepticum_family-specific_primers.eprimer3 differential_primer_results/atrosepticum_family-specific_amplicons.fas
tasmaniensis 17 differential_primer_results/tasmaniensis_family-specific_primers.eprimer3 differential_primer_results/tasmaniensis_family-specific_amplicons.fas
```
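The file names in the test output follow a regular pattern, so a quick check that a test run produced everything can be sketched as below. The naming pattern is inferred from the listing above rather than taken from the script's source, and the function names `expected_files` and `missing_files` are hypothetical.

```python
import os

# Organism IDs and family names as they appear in the test output above.
ORGANISMS = ["Pba_SCRI1043", "Pca_PC1", "Pwa_WPP163", "Pca_PCC21", "Eta_1_99"]
FAMILIES = [
    "Erwinia", "Pectobacterium", "atrosepticum",
    "carotovorum", "tasmaniensis", "wasabiae",
]


def expected_files(organisms, families):
    """List the output file names implied by the pattern in the test run."""
    names = [
        "differential_primer_results.tab",
        "differential_primer_results-families.tab",
    ]
    prefixes = ["universal"]
    prefixes += ["%s_specific" % organism for organism in organisms]
    prefixes += ["%s_family-specific" % family for family in families]
    for prefix in prefixes:
        # Each prefix has one amplicon file and one primer file.
        names.append(prefix + "_amplicons.fas")
        names.append(prefix + "_primers.eprimer3")
    return sorted(names)


def missing_files(outdir, organisms, families):
    """Return the expected file names that are absent from outdir."""
    return [
        name
        for name in expected_files(organisms, families)
        if not os.path.exists(os.path.join(outdir, name))
    ]


if __name__ == "__main__":
    print(missing_files("differential_primer_results", ORGANISMS, FAMILIES))
```

With the five organisms and six families of the test data, this pattern yields the same 26 file names as the `tree` listing.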
## FURTHER INFORMATION:

Please read the comments contained within the top of each '*.py' file as well as the Supporting Information ('Methods S1' document) of doi:10.1371/journal.pone.0034498.
## CONTRIBUTORS
## CITATIONS

Please refer to the following for methodological details:
- Pritchard L et al. (2012) "Alignment-Free Design of Highly Discriminatory Diagnostic Primer Sets for Escherichia coli O104:H4 Outbreak Strains." PLoS ONE 7(4): e34498. doi:10.1371/journal.pone.0034498 - Method description and application to human bacterial pathogens, sub-serotype resolution
- Pritchard L et al. (2013) "Detection of phytopathogens of the genus Dickeya using a PCR primer prediction pipeline for draft bacterial genome sequences." Plant Pathology, 62, 587-596 doi:10.1111/j.1365-3059.2012.02678.x - Application to plant pathogens, species-level resolution