Fulltext Systems

Semestr: Winter

Range: 14+4s

Completion:

Credits: 5

Programme type: Undefined

Study form: Parttime

Course language:

Summary:

Classification of textual information systems, string searching methods, KMP algorithm, AC algorithm, finite automata (FA), BM and CW algorithms, two-way jumping automata, approximate string searching, index methods, text analysis, thesaurus, signature methods, data compression-models and coding, statistical methods, dictionary methods, spell-checkers, hypertext.

Keywords:

Course syllabus:

1. Basic notions and a classification of information systems
2. Pattern matching, models of pattern matching algorithms
3. Simulation of non-deterministic FA, dynamic programming and bit-wise parallelism
4. Pattern matching engines, KMP and AC algorithms
5. Backward string matching, BM and CW algorithms
6. Two-way finite automata with jump
7. Factor automata
8. Indexing, text analysis, thesaurus
9. Signature methods
10. Data compression, modeling and coding
11. Statistical methods of data compression
12. Dictionary methods of data compression
13. Syntactical methods of data compression
14. Spell-checking

Seminar syllabus:

1. Finite automata - recall
2. Finite automata for string matching
3. Finite automata for sequence matching
4. Simulation of finite automata, dynamic programming
5. Simulation of finite automata, bit parallelism
6. Boyer-Moore algorithm and its variants
7. Two-way automata with jump
8. Factor automata
9. Fulltext system with indexing
10. Index construction
11. Signature methods
12. Data compression, statistical methods
13. Data compression, dictionary methods
14. Models of data for data compression

Literature:

1. M. Crochemore and W. Rytter, Text Algorithms, Oxford University Press,
New York, 1994.

Examiners:

Lecturers:

Instructors: