Documentation

(See also the tutorial.)

The aim of PriFi is to suggest a few primer pairs based on a DNA sequence alignment, and to give an account of the quality of the suggested primers.

PriFi lets the user either load an existing alignment file (in the .aln format) or, if the user has access to the alignment program Clustalw (by the European Bioinformatics Institute), have PriFi perform the alignment from a multiple-sequence file (in the Fasta format). Users without access to Clustalw can obtain the alignment file using the web version of Clustalw: http://www.ebi.ac.uk/clustalw/.

PriFi runs in one of two overall modes: a general-purpose mode or a so-called intron mode. In intron mode, the program expects one or more of the sequences in the alignment to contain special intron symbols. Before uploading the sequences, the user must replace each intron in at least one of the input sequences with a run of X's according to this translation code:

XXX	intron, length <= 200 bp
XXXX	intron, length 201 - 500 bp
XXXXX	intron, length 501 - 1000 bp
XXXXXX	intron, length > 1000 bp

Intron mode is specifically designed to follow the methods described in [1]. To run PriFi in general mode, where sequences do not contain intron symbols, the user must click the Configure button and set the last parameter (Introns in sequences) to "no" before uploading any data file.
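The translation code for intron lengths can be written as a small lookup. The following is a minimal Python sketch and not part of PriFi itself: the function name intron_symbol is hypothetical, and only the length thresholds are taken from the table above.

```python
def intron_symbol(length_bp: int) -> str:
    """Return the run of X's that stands in for an intron of the given length.

    Thresholds follow the translation table in this document; the function
    itself is an illustrative helper, not a PriFi API.
    """
    if length_bp <= 0:
        raise ValueError("intron length must be positive")
    if length_bp <= 200:
        return "XXX"
    if length_bp <= 500:
        return "XXXX"
    if length_bp <= 1000:
        return "XXXXX"
    return "XXXXXX"
```

For example, an intron of 350 bp would be replaced by XXXX in the input sequence.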

There are three levels of filtering in the primer pair design process. The first filter operates on the complete alignment, delimiting the regions within which primers must be found. The second filter operates on the single primer candidates found within these regions; the third operates on primer pairs. The way the filters work is explained below and is visualized in this figure, with each filter represented by a yellow triangle (you might refer to the figure when reading the explanation):

First filter

Any primer is based on some subsection of the complete alignment. Certain alignment columns may be considered unfit to form the basis of a primer. Such columns are disregarded and instead delimit the primer regions within which primers are to be located.

The primer regions are those alignment subsections that have at least the minimum primer length (default 18) and which do not contain columns which

have only one sequence represented,
are not highly conserved,
are closer than n nucleotides to the nearest intron (default 5), *
have an intron (marked by X) in at least one of the sequences. *

Less conserved regions contain many mismatch columns, so we look at those to use some of them as delimiters between conserved regions. All valid primers must have a minimum length (l) and a maximum number of mismatches (m). For each mismatch column, we check if it is possible to place a window around it of length at least l such that the part of the alignment covered by this window has at most m mismatches. If no such window can be found, the column can never be part of a valid primer, and it is masked out. After this masking procedure, the conserved regions are identified as those regions which have a length of at least l and contain no masked columns.
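The masking procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not PriFi's actual code: the alignment is reduced to one boolean per column (True for a mismatch column), the function names are hypothetical, and the other delimiting criteria (introns, single-sequence columns) are left out. Only windows of exactly length l are tested, which is sufficient because any longer window around a column contains a window of length l around that column with no more mismatches.

```python
from itertools import accumulate


def mask_columns(is_mismatch, min_len, max_mismatches):
    """Flag mismatch columns that cannot be part of any valid primer window."""
    n = len(is_mismatch)
    # prefix[k] = number of mismatch columns among the first k columns
    prefix = [0] + list(accumulate(int(b) for b in is_mismatch))
    masked = [False] * n
    for i, mismatch in enumerate(is_mismatch):
        if not mismatch:
            continue
        first_start = max(0, i - min_len + 1)
        last_start = min(i, n - min_len)
        fits = any(
            prefix[s + min_len] - prefix[s] <= max_mismatches
            for s in range(first_start, last_start + 1)
        )
        masked[i] = not fits
    return masked


def conserved_regions(masked, min_len):
    """Return half-open (start, end) runs of unmasked columns of length >= min_len."""
    regions = []
    start = None
    # A sentinel True at the end closes a run that reaches the last column.
    for i, m in enumerate(list(masked) + [True]):
        if not m and start is None:
            start = i
        elif m and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    return regions
```

With l = 4 and m = 1, the column pattern "....xxx....x...." (x marking a mismatch column) leads to only the middle column of the three-column cluster being masked, which splits the alignment into two conserved regions.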

Second filter

Within each primer region, all possible valid single primer candidates are identified and evaluated. To avoid keeping too many candidates for further consideration, the set is pruned while we still keep the best candidates.
A valid primer candidate is an alignment subsection which obeys the following rules:

It has a length within the given limits (default 18-35),
it does not have mismatches in both ends,
its melting temperature is within certain limits (default 55-77° C),
it does not have too many mismatches (default maximum is 4),
it does not combine mismatches with a low melting temperature (default minimum for a primer with mismatches is 58° C),
if it is shorter than 25 bp, it has at most 2 mismatches,
if it is shorter than 21 bp, it has at most 1 mismatch,
it does not have too high diversity in its mismatches (default maximal diversity is 4, i.e. no restriction),
it does not have too many columns with the maximally allowed diversity (default maximum is 1),
if it has a mismatch with the maximally allowed diversity, then it can have at most one other mismatch which may have diversity 2 only,
if it has four mismatches, the first and last must be a certain distance apart (default is 17 bp),
if only two sequences are represented, there can be at most 2 mismatches,
if at least 67% of the primer candidate is based on two sequences only, there may be at most 3 mismatches in total, the two-sequence part may have at most 2 mismatches, and the candidate must be located within a highly conserved region (default is 90% matches in an 80 bp window).
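As an illustration of how the length and mismatch-count rules in this list interact, here is a minimal Python sketch. It covers only those rules (overall length limits, the overall mismatch maximum, and the stricter caps for candidates shorter than 25 bp and 21 bp); melting temperature, diversity and the two-sequence rules are not modelled. The function name and parameter names are hypothetical; the default values are the ones listed above.

```python
def passes_length_and_mismatch_rules(length, mismatches,
                                     min_len=18, max_len=35,
                                     max_mismatches=4):
    """Check a candidate against the length-dependent mismatch limits only."""
    if not min_len <= length <= max_len:
        return False
    if mismatches > max_mismatches:
        return False
    if length < 21 and mismatches > 1:   # shorter than 21 bp: at most 1 mismatch
        return False
    if length < 25 and mismatches > 2:   # shorter than 25 bp: at most 2 mismatches
        return False
    return True
```

Under these defaults, a 20 bp candidate with 2 mismatches is rejected, whereas a 24 bp candidate with 2 mismatches passes this part of the filter.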

Third filter

On level three, all possible pairs of primers are ranked. We obtain a primer from a candidate alignment subsection by using the consensus sequence for that subsection, inserting ambiguity codes in mismatch columns. All primer pairs which obey the following rules are kept and scored.

Both primers must have perfect matches in the 3' end (default 2),
of the last n bases (default 8) in the 3' end of both primers there must be at least one G or C,
there must be an intron between the primers, *
if there is an intron of length more than 1000 between them, there can be at most two introns in total between them, *
at least one of the primers must have a certain minimum distance to the nearest intron (default 50), *
the primers' melting temperatures cannot be too different (default maximum difference 15° C),
the estimated resulting PCR product must have a certain length (default 450-3000 bp),
if one primer is based on two sequences only, the other must be based on at least three.
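The step of turning a candidate alignment subsection into a primer, and the first two rules in the list above, can be sketched as follows. This is a hedged Python illustration, not PriFi's implementation: the function names are hypothetical, the subsection is assumed to contain only the bases A, C, G and T (no gaps), the primer is assumed to be written 5' to 3', and "perfect matches in the 3' end (default 2)" is read as "the last 2 bases carry no ambiguity code". The ambiguity codes are the standard IUPAC nucleotide codes.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases in a column.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def consensus_primer(rows):
    """Build a consensus string, putting an ambiguity code in each mismatch column."""
    if not rows or len({len(r) for r in rows}) != 1:
        raise ValueError("rows must be non-empty and of equal length")
    columns = zip(*(r.upper() for r in rows))
    return "".join(IUPAC[frozenset(col)] for col in columns)


def three_prime_ok(primer, perfect=2, gc_window=8):
    """Check the two 3'-end rules on a primer written 5' to 3'."""
    tail_unambiguous = all(b in "ACGT" for b in primer[-perfect:])
    has_g_or_c = any(b in "GC" for b in primer[-gc_window:])
    return tail_unambiguous and has_g_or_c
```

For instance, the rows ACGTAC, ACGTAC and ATGTAC differ only in the second column (C or T), so the consensus is AYGTAC, and it passes both 3'-end checks.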

Once all invalid primers and primer pairs have been discarded, the remaining pairs are scored and the four best pairs are reported. In fact, it is not simply the four highest-ranking pairs that are reported; typically, these would all be practically the same primers, differing only by a few nucleotides at the ends. To avoid this and ensure a certain diversity among the suggested primers, the overall best-scoring pair is reported first. Then the highest-scoring pair whose primers do not overlap by more than a certain number of nucleotides (default 10) with the already reported pair is reported, and so on. Moreover, if one of the primers in a pair overlaps with two individual primers already reported, the pair is disregarded.
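The reporting strategy can be sketched as a greedy selection. The Python below is one possible reading of the description, not PriFi's actual code: the Pair type and function names are hypothetical, primer positions are assumed to be half-open (start, end) intervals in alignment coordinates, and each primer of a candidate pair is compared against every primer already reported. The additional rule about a primer overlapping two already reported primers is left out of this sketch.

```python
from typing import List, NamedTuple, Tuple


class Pair(NamedTuple):
    score: float
    forward: Tuple[int, int]   # half-open (start, end) of the forward primer
    reverse: Tuple[int, int]   # half-open (start, end) of the reverse primer


def overlap(a: Tuple[int, int], b: Tuple[int, int]) -> int:
    """Number of positions shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def select_diverse_pairs(pairs: List[Pair], max_overlap: int = 10,
                         n_report: int = 4) -> List[Pair]:
    """Greedily report the best pairs that differ enough from earlier ones."""
    reported: List[Pair] = []
    for cand in sorted(pairs, key=lambda p: p.score, reverse=True):
        too_similar = any(
            overlap(cand_primer, rep_primer) > max_overlap
            for rep in reported
            for cand_primer in (cand.forward, cand.reverse)
            for rep_primer in (rep.forward, rep.reverse)
        )
        if not too_similar:
            reported.append(cand)
        if len(reported) == n_report:
            break
    return reported
```

A second-best pair whose forward primer is shifted by only 2 bp relative to the best pair would be skipped here, and the next pair located elsewhere in the alignment would be reported instead.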

Scoring criteria

Pairs are scored according to the following criteria:

Length of primers' distances to nearest intron (default optimum is 70-150 bp, other distances are penalized), *
AT content in 3' end tail (default: penalize if a primer has more than 50% AT in the last 8 bp),
number of sequences represented in their alignment subsection (default: reward a primer at least partly based on more than 2 sequences),
melting temperatures (reward high temperatures),
GC content (penalize short primers with GC content above 75%),
a terminal G or C in the 3' end (is rewarded),
length of primers (default optimum is 25-35 bp),
ambiguities near 3' end if they have diversity more than 2 (by default, an ambiguity closer than 5 bp from the end is penalized),
high diversity ambiguities (penalty for having diversity more than 2),
'clusters' of ambiguities close to each other or to the ends (penalty if ambiguities are not spread out across the primer),
if a primer does not reside in a highly conserved window or is mostly based on only two sequences, it is penalized,
pairs where both primers have low melting temperatures (penalize!),
estimated product length (default acceptable interval 600-2200 bp, optimum 800-1800 bp),
number of ambiguities (penalize).

(All parameters and criteria marked with * above become void when working in the general mode rather than in intron mode).
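Two of the criteria above are simple enough to illustrate directly. The following Python sketch is an assumption-laden illustration: the function names and return labels are hypothetical, and since the size of the rewards and penalties is not stated here, the code only classifies a value rather than computing a score. The default intervals and the 50% threshold are the ones given in the list; ambiguity codes in the tail are not counted as A or T, and the primer is assumed to be non-empty and written 5' to 3'.

```python
def classify_product_length(length_bp, acceptable=(600, 2200), optimum=(800, 1800)):
    """Place an estimated product length in the optimum or acceptable interval."""
    if optimum[0] <= length_bp <= optimum[1]:
        return "optimum"
    if acceptable[0] <= length_bp <= acceptable[1]:
        return "acceptable"
    return "outside"


def at_rich_tail(primer, tail=8, max_at_fraction=0.5):
    """True if more than max_at_fraction of the last `tail` bases are A or T."""
    end = primer[-tail:].upper()
    at_count = sum(base in "AT" for base in end)
    return at_count / len(end) > max_at_fraction
```

For example, a product of 700 bp falls in the acceptable interval but outside the optimum, and a primer ending in ATATATGC has 6 of its last 8 bases as A or T and would therefore be flagged.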

A note on self-complementarity (quoting the PriFi paper):

Evaluation of self-complementarity is currently not supported. [..] PriFi is first and foremost an attempt to capture the, to some extent, intuitive yet successful practice of our laboratory for primer design, and here, self-complementarity is not taken into account.

Using the Oligo Calculator by Qing Cao, Warren, and Buehler (http://www.basic.nwu.edu/biotools/oligocalc.html), we found that around 10% of the primers had significant regions of self-complementarity that might in theory result in self-priming during PCR. However, all these primers have worked well in the laboratory.

Further, one of PriFi's users (Anne Chenuil from Centre d'Oceanologie de Marseille) has sent me this comment:

Dear Jakob

I just read the results of your paper and find out that you actually do not believe much in the self-complementarity criterion ....which I find interesting as it confirms my personal experience: I used to manually check thoroughly the primers and primer pairs for complementarity and this gave me lots of complications, though I observed that primers supposedly prone to these problems actually worked well (as well as primers with a 3' mismatch nucleotide C, by the way...)

If you have questions or comments, please contact:

Jakob Fredslund, assistant professor
BiRC - Bioinformatics Research Center
University of Aarhus
Høegh-Guldbergs Gade 10, Building 1090
DK-8000 Århus C
Denmark
(jakobf@birc.au.dk)

[1] Jakob Fredslund, Lene H. Madsen, Birgit K. Hougaard, Anna Marie Nielsen, David Bertioli, Niels Sandal, Jens Stougaard, Leif Schauser
A general pipeline for the development of anchor markers for comparative genomics in plants
BMC Genomics 2006, 7:207
