SEQL-NRPS predicts the substrate of non-ribosomal peptide synthetase (NRPS) A-domains using Sequence Learner (SEQL), a discriminative classification method for sequences. The web service was developed at the Bioinformatics Research Centre (BiRC), Aarhus University, Denmark, as part of the work described in Knudsen et al. (2015).
The web service accepts input in FASTA format, for example:
>sequence1 some description
YREVNERANQFAHWLIQGPVRVRPGALIGLYLDKSDLGVVATFGIWKSGAAYVPIDPAYPAERIRFLVGDTGLSGIVTN
RRHAERLREVLGDEHASVHVIEVEAVVAGPHPEQARENPGLALSSRDRAYVTYTSGTTGVPKGVPKYHYSVVNSITDLS
ERYDMRRPGTERVALFASYVFEPHLRQTLIALINEQTLVIVPDDVRLDPDLFPEYIERHGVTYLNATGSVLQHFDLRRC
ASLKRLLLVGEELTASGLRQLREKFAGRVVNEYAFTEAAFVTAVKEFGPGVTERRDRSIGRPLRNVKWYVLSQGLKQLP
IGAIGELYIGGCGVAPGYLNRDDLTAERFTANPFQTEEEKARGRNGRLYRTGDLARVLLNGEVEFMGRADFQLKLNGVR
VEPGEIEAQATEFPGVKKCVVV
>sequence2 some other description
YREVNERANQFAHEAVVAGPHPEQARENPGLALSSRDRAYVTYTSGTTGVPKGVPKYHYSVVNSITDLSERYDMRRPGT
ERVALFASYVFEPHLRQTLIALINEQTLVIVPDDVRLDPDLFPEYIERHGVTYLNATGSVLQHFDLRRCASLKRLLLVG
EELTASGLRQLAERFTANPFQTEEEKARGRNGRLYRTGDLARVLLNGEVEFMGRADFQLKLNGVRVEPGEIEAQATEFP
GVKKCVVV
The format consists of a header and a sequence for each entry. The header is a single line that begins with the ">" character and is divided into two parts: the id and the description. The id is the contents of the header up to the first space. The description is the rest of the header. The sequence may span several lines and must be an A-domain amino acid sequence (nucleotide sequences are not supported).
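The header and sequence rules above can be illustrated with a minimal parsing sketch. This is not the web service's own code. The function names parse_fasta and split_header are hypothetical and chosen only for illustration, and the sketch assumes that sequence lines belonging to one entry are simply concatenated.

```python
def split_header(header):
    """Split a header (without the leading '>') into (id, description).

    The id is everything up to the first space; the description is the rest.
    """
    seq_id, _, description = header.partition(" ")
    return seq_id, description.strip()


def parse_fasta(text):
    """Parse FASTA text into a list of (id, description, sequence) tuples."""
    records = []
    header = None   # header of the entry currently being read
    chunks = []     # sequence lines collected for that entry
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous entry, if there was one.
            if header is not None:
                records.append(split_header(header) + ("".join(chunks),))
            header = line[1:]
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first header")
        else:
            chunks.append(line)
    if header is not None:
        records.append(split_header(header) + ("".join(chunks),))
    return records


# Shortened, made-up input used only to exercise the parser.
example = """>sequence1 some description
YREVNERANQ
FAHWLIQGPV
>sequence2 some other description
YREVNERANQFAHEAVV
"""
records = parse_fasta(example)
```

Each tuple in records holds the id, the description, and the sequence joined into a single string, which is the form in which an entry would be handed to a classifier.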
The web service outputs a table with the id of each input sequence, the predicted substrate of the A-domain and the probability of the prediction given the model. The probability is colored by confidence. The confidence is computed as follows:
For a given sequence, each model assigns a probability p to the sequence. The maximum, mean and standard deviation of these probabilities are computed. The probability is then colored according to the number of standard deviations by which the maximum lies above the mean.
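The confidence computation described above can be sketched as follows. This is an illustration under stated assumptions, not the service's implementation: the function name confidence_score is hypothetical, the population standard deviation is assumed (the text does not say whether the population or sample form is used), and the substrate labels and probabilities are made up. The mapping from the resulting value to a color is not reproduced here.

```python
from statistics import mean, pstdev


def confidence_score(probabilities):
    """Return (predicted_label, max_probability, z).

    `probabilities` maps each model's substrate label to the probability
    that model assigned to the sequence. `z` is the number of standard
    deviations by which the maximum probability lies above the mean.
    """
    if len(probabilities) < 2:
        raise ValueError("need probabilities from at least two models")
    values = list(probabilities.values())
    predicted = max(probabilities, key=probabilities.get)
    p_max = probabilities[predicted]
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # All models agree exactly, so the maximum does not stand out.
        return predicted, p_max, 0.0
    return predicted, p_max, (p_max - mu) / sigma


# Made-up probabilities for three hypothetical substrate models.
label, p, z = confidence_score({"Ala": 0.9, "Gly": 0.1, "Val": 0.2})
```

A larger z means the winning model stands out more clearly from the others, which is the quantity the coloring is based on.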
Finally, each input sequence is shown and annotated with the motifs that were found to discriminate the predicted substrate from all other substrates. Motifs that discriminate positively, that is, in favor of the predicted substrate, are shown in green. The sequence is also annotated with the motifs that discriminate against the predicted substrate.
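To make the annotation step concrete, the sketch below locates motif occurrences in a sequence so that they could be highlighted. It assumes, purely for illustration, that a motif is a plain contiguous substring; the actual motif representation used by SEQL may differ. The function name find_motif_positions and the example motifs are hypothetical.

```python
def find_motif_positions(sequence, motifs):
    """Return sorted (start, end, motif) tuples for every motif occurrence.

    Positions are 0-based and `end` is exclusive. Overlapping occurrences
    are reported, and empty motifs are skipped.
    """
    hits = []
    for motif in motifs:
        if not motif:
            continue
        start = sequence.find(motif)
        while start != -1:
            hits.append((start, start + len(motif), motif))
            # Continue one position later so overlapping hits are found too.
            start = sequence.find(motif, start + 1)
    return sorted(hits)


# Made-up sequence fragment and motifs used only to exercise the function.
hits = find_motif_positions("YTSGTTGVPKGTTG", ["TTG", "GVPK"])
```

Keeping the supporting and opposing motifs in two separate lists and calling the function once per list would give the two sets of spans to color differently.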
Michael Knudsen, Dan Søndergaard, Claus Olesen, Ditlev Egeskov Brodersen, and Christian N. S. Pedersen. Computational discovery of specificity-conferring sites in non-ribosomal peptide synthetases. Pending publication. 2015.