Predicting recurring concepts on data-streams by means of a meta-model and a fuzzy similarity function

Miguel Ángel Abad, João Bártolo Gomes, Ernestina Menasalvas Ruiz


Stream mining comprises a set of techniques designed to process data streams in real time in order to extract knowledge. In the particular case of classification, a stream-mining algorithm has to adapt its behavior to volatile underlying data distributions, a phenomenon known as concept drift. Concept drift may render predictive models invalid, so they have to be updated to represent the concepts currently present in the data. A specific type of concept drift, known as recurrent concept drift, occurs when the concepts represented by the data have already appeared in the past. In those cases the learning process can be avoided, or at least minimized, by applying a previously trained model. To deal with this scenario, meta-models can be used to enhance the drift detection mechanisms of data stream algorithms by representing and predicting when a change will occur. Concepts reappear in some real-world situations, such as intrusion detection systems (IDS), where the same incidents, or adaptations of them, usually reappear over time. In these environments, early prediction of drift based on better knowledge of past models can help to anticipate the change, thus improving the efficiency of the model in terms of the training instances needed. Furthermore, as a complement to meta-models, a mechanism to assess the similarity between classification models is also needed when dealing with recurrent concepts. At present, when a previously trained model is reused, only a rough comparison between concepts is usually made, applying Boolean logic. Introducing fuzzy logic comparisons between models could lead to a more efficient reuse of previously seen concepts, by applying not just equal models but also similar ones.
This work addresses the aforementioned open issues by means of the MM-PRec system, which integrates a meta-model mechanism and a fuzzy similarity function. The theoretical proposal of MM-PRec is also validated in this paper through experiments using both synthetic and real datasets.

Published in: Expert Systems with Applications

Miguel Ángel Abad, João Bártolo Gomes, Ernestina Menasalvas Ruiz (2016). "Predicting recurring concepts on data-streams by means of a meta-model and a fuzzy similarity function." Expert Systems with Applications, pp. 87-105.