CS seminar series presents
Transformation of User Traffic Structure into Vectorial Form
by Tomas Komarek
Thursday, April 27 at 14:00 in 205.
The presentation is concerned with detecting infected users in computer networks using the supervised learning paradigm applied to network telemetry data.
In such a setting, there is a fundamental incompatibility between standard learning algorithms and the structure that represents a user's behavior in the network data. Standard learning algorithms require input samples in the form of vectors, whereas samples of user behavior, which are the subject of the classification task (infected/clean), inherently have a more complex structure. The structure reflects the fact that a user can communicate with an arbitrary number of end-points (i.e., servers or domains), and each such communication is described by a variable number of feature vectors extracted from the corresponding log records (e.g., proxy logs or NetFlows). Formally, the structure is a set of sets of vectors rather than the single vector expected by the learning algorithms.
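To make the "set of sets of vectors" structure concrete, the sketch below builds a toy sample for one user and collapses it into a single fixed-length vector by averaging twice (first within each end-point, then across end-points). The double averaging is only a naive baseline chosen for illustration, not the method presented in the talk; the domain names, the two-dimensional features, and the helper names `mean_vector` and `embed_user` are all hypothetical.

```python
from statistics import fmean

# Hypothetical toy sample for ONE user: each end-point maps to a
# variable-length list of feature vectors (one per log record).
# The feature values are invented for illustration only.
user_traffic = {
    "example.com": [[1.0, 2.0], [3.0, 4.0]],   # two log records
    "cdn.example.net": [[10.0, 20.0]],         # one log record
}


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [fmean(column) for column in zip(*vectors)]


def embed_user(traffic):
    """Collapse a set of sets of vectors into one fixed-length vector.

    Naive baseline (an assumption, not the speaker's approach):
    average the vectors of each end-point, then average the results.
    """
    per_endpoint = [mean_vector(vectors) for vectors in traffic.values()]
    return mean_vector(per_endpoint)


user_vector = embed_user(user_traffic)
print(user_vector)  # a single vector with the same length as one feature vector
```

Whatever the number of end-points or log records, the output has the dimensionality of a single feature vector, so it can be passed to a standard classifier. Averaging discards most of the structure, which is the kind of loss a more careful transformation would try to avoid.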
The ability to express a user's traffic as a single vector would significantly reduce labeling costs and could improve detection efficacy.