IMPALA at the Blocks Server

IMPALA (Integrating Matrix Profiles And Local Alignments) is a program that searches a protein query sequence against a multiple alignment database represented as a collection of PSI-BLAST checkpoint files. IMPALA has been implemented on the Blocks Server to search a blocks database, such as the Blocks Database. The Block Searcher performs a similar type of search, and both programs use position-specific scoring matrix (PSSM) representations of the Blocks Database. However, IMPALA and the Block Searcher differ in the PSSMs used, in the alignments reported, and in the calculation of statistics, and these differences can lead to somewhat different results. Therefore, any marginal similarity detected with one searching program should be confirmed using the other. We have found that both programs generally detect true positive hits but tend to report different false positives, so any hit not detected by both searching programs should be regarded with caution.
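
The cross-checking advice above can be sketched in a few lines of Python. This is only an illustration: the function name and the placeholder identifiers FAMILY_X and FAMILY_Y are hypothetical, and the lists of family identifiers are assumed to have been collected by hand from the output of each program.

```python
def cross_check(impala_hits, block_searcher_hits):
    """Group family IDs by whether both programs or only one reported them."""
    impala = set(impala_hits)
    searcher = set(block_searcher_hits)
    return {
        "both": sorted(impala & searcher),           # reported by both programs
        "impala_only": sorted(impala - searcher),     # regard with caution
        "searcher_only": sorted(searcher - impala),   # regard with caution
    }


# FAMILY_X and FAMILY_Y are placeholders, not real Blocks families.
result = cross_check(["IPB001525", "FAMILY_X"], ["IPB001525", "FAMILY_Y"])
print(result)
```

Hits that land in "impala_only" or "searcher_only" are the ones the paragraph above suggests treating with caution.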

Blocks are ungapped multiple alignments representing the most highly conserved regions of proteins. Whereas the Block Searcher scores individual blocks separately and then combines the scores for the blocks in a family, IMPALA scores the set of blocks for a family as a whole, so a hit is to the whole family, not to an individual block. For instance, the Block Searcher aligns and scores the query sequence against each of the six PSSMs representing the six blocks (IPB001525A-F) of the C5 DNA methyltransferase family (IPB001525) in Blocks, while IMPALA aligns and scores the query against a single PSSM representing the full set of blocks.
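
To make the per-block idea concrete, the following toy sketch slides one ungapped PSSM along a query and keeps the best-scoring position. It is not the code of the Block Searcher or of IMPALA: the PSSM layout (a list of columns, each mapping a residue to a score), the default score for unlisted residues, and all the numbers are assumptions made for illustration, and the step that combines block scores into a family score is left out because it is not described here.

```python
DEFAULT_SCORE = -1  # assumed score for residues not listed in a column


def best_ungapped_score(query, pssm):
    """Return (score, offset) of the best ungapped placement of pssm on query.

    Returns None if the query is shorter than the PSSM.
    """
    width = len(pssm)
    best = None
    for offset in range(len(query) - width + 1):
        window = query[offset:offset + width]
        score = sum(col.get(res, DEFAULT_SCORE) for col, res in zip(pssm, window))
        if best is None or score > best[0]:
            best = (score, offset)
    return best


# Two toy "blocks", each two columns wide, scored separately against one query.
block_a = [{"A": 5}, {"C": 5}]
block_b = [{"G": 4}, {"W": 6}]
query = "MKACDGW"
hit_a = best_ungapped_score(query, block_a)
hit_b = best_ungapped_score(query, block_b)
print(hit_a, hit_b)
```

Scoring each block on its own, as here, yields one score and one position per block; a whole-family PSSM would instead yield a single alignment covering the blocks and the regions between them.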

To make the checkpoint file PSSM representation of a family for IMPALA on the Blocks Server, PSI-BLAST is applied to the full set of sequences for the family, using the COBBLER (Consensus Biasing By Locally Embedding Residues) embedded sequence as the query and iterating until convergence. The COBBLER sequence is a single representative sequence for the family stretching from 10 residues upstream of the first block to 10 residues downstream of the last block, with consensus residues in block positions.
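
The shape of such an embedded sequence can be sketched as follows. This is a minimal illustration, not the COBBLER program itself: it assumes the representative sequence, the 0-based block start positions, and the consensus string for each block are already known, and the function name and example residues are hypothetical.

```python
FLANK = 10  # residues kept upstream of the first block and downstream of the last


def embed_consensus(sequence, blocks, flank=FLANK):
    """Embed block consensus residues in a sequence and trim the flanks.

    blocks is a list of (start, consensus) pairs with 0-based start positions;
    the blocks are assumed not to overlap and to lie within the sequence.
    """
    blocks = sorted(blocks)
    residues = list(sequence)
    for start, consensus in blocks:
        if start + len(consensus) > len(residues):
            raise ValueError("block extends past the end of the sequence")
        # Overwrite the block positions with the consensus residues.
        residues[start:start + len(consensus)] = consensus
    first_start = blocks[0][0]
    last_start, last_consensus = blocks[-1]
    last_end = last_start + len(last_consensus)
    lo = max(0, first_start - flank)
    hi = min(len(residues), last_end + flank)
    return "".join(residues[lo:hi])


# Toy sequence with two blocks; the residues are chosen only to be easy to read.
toy_sequence = "A" * 12 + "CCCC" + "G" * 5 + "TTT" + "A" * 15
embedded = embed_consensus(toy_sequence, [(12, "WWWW"), (21, "YYY")])
print(embedded)
```

In this toy case the result keeps 10 residues on each side of the blocked region, and the two blocks carry the consensus residues in place of the original ones.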

Since IMPALA scores not only the blocks but also the regions between them, its alignments may extend beyond the blocks. The resulting BLAST-like output gives scores and expected values. This differs from the Block Searcher, which provides expected values for individual blocks as well as an overall expected value for the family, along with alignments to the individual blocks. In theory, the Block Searcher should be more specific and IMPALA more sensitive. However, other differences complicate any simple comparison, including differences in the implementation of the searching algorithms and in the methods used for calculating PSSM column scores.

We thank the NCBI BLAST group, especially Alejandro Schäffer, for making IMPALA available for searching the Blocks databases.

Reference:
Alejandro A. Schäffer, Yuri I. Wolf, Chris P. Ponting, Eugene V. Koonin, L. Aravind, Stephen F. Altschul, "Software to Match a Protein Sequence Against a Collection of PSI-BLAST-Constructed Position-Specific Score Matrices", manuscript.

