SEQL-NRPS

SEQL-NRPS predicts the substrate of non-ribosomal peptide synthetase (NRPS) A-domains using Sequence Learner (SEQL), a discriminative classification method for sequences. The web service was developed at the Bioinformatics Research Centre (BiRC), Aarhus University, Denmark, as part of the work described in [1].

Classify an A-domain

Paste one or more sequences in FASTA format, or upload a file with sequences in FASTA format.

Enter your email address to receive an email when your job has completed.

Instructions

Input

The web service accepts input in FASTA format, for example:

>sequence1 some description
YREVNERANQFAHWLIQGPVRVRPGALIGLYLDKSDLGVVATFGIWKSGAAYVPIDPAYPAERIRFLVGDTGLSGIVTN
RRHAERLREVLGDEHASVHVIEVEAVVAGPHPEQARENPGLALSSRDRAYVTYTSGTTGVPKGVPKYHYSVVNSITDLS
ERYDMRRPGTERVALFASYVFEPHLRQTLIALINEQTLVIVPDDVRLDPDLFPEYIERHGVTYLNATGSVLQHFDLRRC
ASLKRLLLVGEELTASGLRQLREKFAGRVVNEYAFTEAAFVTAVKEFGPGVTERRDRSIGRPLRNVKWYVLSQGLKQLP
IGAIGELYIGGCGVAPGYLNRDDLTAERFTANPFQTEEEKARGRNGRLYRTGDLARVLLNGEVEFMGRADFQLKLNGVR
VEPGEIEAQATEFPGVKKCVVV
>sequence2 some other description
YREVNERANQFAHEAVVAGPHPEQARENPGLALSSRDRAYVTYTSGTTGVPKGVPKYHYSVVNSITDLSERYDMRRPGT
ERVALFASYVFEPHLRQTLIALINEQTLVIVPDDVRLDPDLFPEYIERHGVTYLNATGSVLQHFDLRRCASLKRLLLVG
EELTASGLRQLAERFTANPFQTEEEKARGRNGRLYRTGDLARVLLNGEVEFMGRADFQLKLNGVRVEPGEIEAQATEFP
GVKKCVVV

The format consists of a header and a sequence for each entry. The header is a single line that begins with the ">" character and is divided into two parts: the id and the description. The id is the content of the header up to the first space; the description is the rest of the header. The sequence must be an A-domain amino acid sequence (nucleotide sequences are not supported).
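The header rule described above can be illustrated with a minimal Python sketch. This is not the code used by the web service; the names FastaRecord, parse_fasta and looks_like_nucleotides are hypothetical, and the nucleotide check is only a rough heuristic added for illustration. The sketch also assumes that the leading ">" is not part of the id.

```python
from typing import Iterator, List, NamedTuple


class FastaRecord(NamedTuple):
    id: str
    description: str
    sequence: str


def _make_record(header: str, chunks: List[str]) -> FastaRecord:
    # The id is everything up to the first space; the rest is the description.
    seq_id, _, description = header.partition(" ")
    return FastaRecord(seq_id, description.strip(), "".join(chunks))


def parse_fasta(text: str) -> Iterator[FastaRecord]:
    """Yield one record per ">" header; sequence lines are concatenated."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield _make_record(header, chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield _make_record(header, chunks)


def looks_like_nucleotides(sequence: str) -> bool:
    # Heuristic only (an assumption, not the service's own check):
    # a sequence made up solely of A, C, G, T, U or N is probably not a protein.
    return len(sequence) > 0 and set(sequence.upper()) <= set("ACGTUN")
```

For the example input above, the first record would have the id "sequence1" and the description "some description", with the six sequence lines joined into a single string.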

Output

The web service outputs a table with the id of each input sequence, the predicted substrate of the A-domain and the probability of the prediction given the model. The probability is colored by confidence. The confidence is computed as follows:

For a given sequence, each model assigns a probability p to the sequence. The maximum, mean, and standard deviation of these probabilities are computed. The probability is then colored according to the number of standard deviations by which the maximum exceeds the mean.
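The confidence computation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the service's implementation: the function name confidence_score is hypothetical, the population standard deviation is assumed (the text does not say whether the population or sample form is used), and no color thresholds are shown because none are given here.

```python
from statistics import mean, pstdev
from typing import Dict, Tuple


def confidence_score(probabilities: Dict[str, float]) -> Tuple[str, float, float]:
    """Return (predicted substrate, its probability, deviations above the mean).

    `probabilities` maps each substrate model to the probability it
    assigned to one input sequence.
    """
    if not probabilities:
        raise ValueError("at least one model probability is required")
    best = max(probabilities, key=probabilities.get)
    values = list(probabilities.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    # If all models agree exactly, the maximum does not stand out at all.
    deviations = 0.0 if sigma == 0 else (probabilities[best] - mu) / sigma
    return best, probabilities[best], deviations
```

A larger value means the top-scoring model stands out more clearly from the others, which corresponds to a more confident prediction.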

Finally, each input sequence is shown and annotated with the motifs that were found to positively discriminate the predicted substrate from all other substrates (shown in green). The sequence is also annotated with the motifs that discriminate against the predicted substrate.

Downloads

References

[1] Computational discovery of specificity-conferring sites in non-ribosomal peptide synthetases. Michael Knudsen, Dan Søndergaard, Claus Olesen, Ditlev Egeskov Brodersen, and Christian N. S. Pedersen. Pending publication. 2015.