Textual Information Systems

Semestr: Winter

Range: 2+2s


Credits: 4

Programme type: Undefined

Study form:

Course language:


Classification of textual information systems, string searching methods, KMP algorithm, AC algorithm, finite automata (FA), BM and CW algorithms, two-way jumping automata, approximate string searching, index methods, text analysis, thesaurus, signature methods, data compression-models and coding, statistical methods, dictionary methods, spell-checkers, hypertext.


stringology, pattern matching, string, sequence, datacompression, algorithms

Course syllabus:

1. Basic notions and a classification of information systems
2. Pattern matching, models of pattern matching algorithms
3. Simulation of non-deterministic FA, dynamic programming and bit-wise parallelism
4. Pattern matching engines, KMP and AC algorithms
5. Backward string matching, BM and CW algorithms
6. Two-way finite automata with jump
7. Factor automata
8. Indexing, text analysis, thesaurus
9. Signature methods
10. Data compression, modeling and coding
11. Statistical methods of data compression
12. Dictionary methods of data compression
13. Syntactical methods of data compression
14. Spell-checking

Seminar syllabus:

1. LaTeX, basic notions
2. LaTeX, mathematical typesetting
3. LaTeX, graphics
4. Finite automata for string matching
5. Finite automata for sequence matching
6. Simulation of finite automata, dynamic programming
7. Simulation of finite automata, bit-wise parallelism
8. Boyer-Moore algorithm and its variants
9. Two-way automata with jump
10. Fulltext system with indexing
11. Data compression, statistical methods
12. Data compression, dictionary methods
13. Models of data for data compression


B. Melichar, J. Holub, T. Polcar: Text Searching Algorithms. Unpublished manuscript.
B. Melichar: Textové informační systémy. Textbook, in Czech, CTU FEE, 1996.
M. Crochemore, W. Rytter: Jewels of Stringology. World Scientific, 2002.
D. Salomon: Data Compression. Springer, 2004.