Predicting protein properties with spatial statistical machine learning

Proteins are molecules vital for life. There is a vast number of different proteins, but the functions of only some of them are known. Predicting protein function is one of the most important goals of current computational biology. A protein's function and other properties follow mainly from its 3D spatial structure (at the atom or amino-acid level). In this project, funded by the Czech Science Foundation, we explore algorithms that learn models predicting protein function from such spatial data, complemented with auxiliary data (e.g. sequence, phylogeny, interaction and expression data, and evolutionarily conserved motifs). A salient achievement supported by this project is the software suite TreeLiker, which implements our machine-learning algorithms. These algorithms find important spatial motifs in protein structures and represent them in the language of first-order predicate logic with a special syntactic bias (a tree-like structure of logical clauses). Lately, we have also explored related topics, namely genomic/proteomic sequence assembly algorithms that find discriminative subsequences.

Involved: Filip Železný, Ondřej Kuželka (alumni), Andrea Szabóová (alumni), Andrea Fuksová (alumni), Radomír Černoch, Karel Jalovec, František Malinka