
Protein sequence pattern mining with constraints.


2005-33

Considering the characteristics of biological sequence databases, which typically have a small alphabet, very long sequences and a relatively small size (several hundred sequences), we propose a new sequence mining algorithm (gIL). gIL was developed for linear sequence pattern mining and results from the combination of some of the most efficient techniques used in sequence and itemset mining. The algorithm is highly adaptable, allowing a smooth and direct introduction of various types of features into the mining process, namely the extraction of rigid and arbitrary gap patterns. Both breadth-first and depth-first traversals are possible. The experimental evaluation, on synthetic and real-life protein databases, has shown that our algorithm has superior performance to state-of-the-art algorithms. The use of constraints has also proved to be a very useful tool for specifying the patterns that are of interest to the user.
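
To make the distinction between rigid and arbitrary gap patterns concrete, the following is a minimal Python sketch. It is not the gIL algorithm and is not taken from the paper. It assumes that a rigid gap pattern fixes the number of wildcard positions between two residues, that an arbitrary gap pattern allows that number to vary within a range, and that support is the number of database sequences containing the pattern. The helper names `to_regex` and `support` and the toy sequences are hypothetical.

```python
import re

def to_regex(symbols, gaps):
    """Build a regex from residues and (min, max) wildcard counts between them.

    Hypothetical helper for illustration only, not part of gIL.
    symbols: list of single-letter residues, e.g. ["A", "C"]
    gaps:    one (min, max) pair per pair of consecutive symbols
    """
    parts = [re.escape(symbols[0])]
    for sym, (lo, hi) in zip(symbols[1:], gaps):
        parts.append(".{%d,%d}" % (lo, hi))  # lo..hi wildcard positions
        parts.append(re.escape(sym))
    return re.compile("".join(parts))

def support(regex, sequences):
    """Count the sequences that contain at least one match of the pattern."""
    return sum(1 for s in sequences if regex.search(s))

# Toy database, invented for this example.
sequences = ["MAKCDE", "AKKCDE", "ACGGGG", "MMMMMM"]

rigid = to_regex(["A", "C"], [(1, 1)])     # A, exactly one residue, then C
flexible = to_regex(["A", "C"], [(0, 2)])  # A, zero to two residues, then C

print(support(rigid, sequences))     # 1
print(support(flexible, sequences))  # 3
```

Under these assumptions, a minimum-support constraint would keep only patterns whose count reaches a user-chosen threshold; a gap constraint corresponds to restricting the `(min, max)` pairs.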

@inproceedings{DBLP:conf/pkdd/FerreiraA05,
  author    = {Pedro Gabriel Ferreira and
               Paulo J. Azevedo},
  title     = {Protein Sequence Pattern Mining with Constraints.},
  booktitle = {PKDD},
  year      = {2005},
  pages     = {96--107},
  ee        = {http://dx.doi.org/10.1007/11564126_14},
  crossref  = {DBLP:conf/pkdd/2005},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
@proceedings{DBLP:conf/pkdd/2005,
  editor    = {Al\'{\i}pio Jorge and
               Lu\'{\i}s Torgo and
               Pavel Brazdil and
               Rui Camacho and
               Jo{\~a}o Gama},
  title     = {Knowledge Discovery in Databases: PKDD 2005, 9th European
               Conference on Principles and Practice of Knowledge Discovery
               in Databases, Porto, Portugal, October 3-7, 2005, Proceedings},
  booktitle = {PKDD},
  publisher = {Springer},
  series    = {Lecture Notes in Computer Science},
  volume    = {3721},
  year      = {2005},
  isbn      = {3-540-29244-6},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

http://dx.doi.org/10.1007/11564126_14


Pedro Gabriel Ferreira, Paulo J. Azevedo, Protein Sequence Pattern Mining with Constraints, Lecture Notes in Computer Science, Volume 3721, Nov 2005, Page 96

Springer Berlin / Heidelberg

In Proceedings

Web of Science, DBLP

Bioinformatics, Databases

Paulo Jorge Azevedo