Incremental On-line Learning: A Review and Comparison of State of the Art Algorithms

Reposted from: https://blog.csdn.net/qq_33880788/article/details/80496385

Translated paper: Incremental On-line Learning: A Review and Comparison of State of the Art Algorithms
Authors: Viktor Losing, Barbara Hammer, Heiko Wersing
Published in: Neurocomputing, 2018

Abstract

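Incremental and on-line learning have recently gained more attention, especially in the context of big data and learning from data streams, where the traditional assumption of complete data availability no longer holds. Although a variety of methods is available, it often remains unclear which of them suits a specific task and how they compare with each other. We analyze the key properties of eight popular incremental methods representing different algorithm classes. We evaluate them with respect to their on-line classification error as well as their behavior in the limit. Furthermore, we discuss the often neglected issue of hyperparameter optimization specifically for each method and test how robustly it can be done based on a small set of examples. Our extensive evaluation on datasets with different characteristics gives an overview of the performance in terms of accuracy, convergence speed and model complexity, facilitating the choice of the best method for a given application.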

1 Introduction

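Nowadays, most conceivable information is collected and stored in digital form, accumulating in huge daily increments. Google receives 3.5 billion search queries every day; Facebook's nearly 200 million active users share 4.5 billion pieces of content; Amazon sells about 13 million items worldwide. All kinds of customer information, raw transaction data and individual click behavior are collected to provide services such as personalized recommendations. An estimated 35% of Amazon's net sales of 107 billion US dollars is attributed to its recommendation engine. These pioneering companies show that information can be the central pillar of a multi-billion-dollar business. Even small companies are adopting this approach and now digitize every transaction they are involved in to increase their turnover.
Data is also collected by mobile devices such as phones, smartwatches and smartphones, which continuously track all kinds of user information such as call logs, GPS position, heart rate and activity. Data collection is equally ubiquitous in science: astronomical observatories, earth-sensing satellites and climate observation networks generate terabytes of data every day. At the same time, the rate at which data is produced keeps increasing rapidly: 90% of all data in the world was generated in the past two years.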
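Machine learning methods are used to mine the collected data for relevant information and/or to predict future developments through the generated models. However, classical batch machine learning methods, which access all data at once, cannot handle this sheer volume within the given time, so the amount of unprocessed data keeps growing. Moreover, they do not continuously integrate new information into already constructed models but instead regularly rebuild new models from scratch. This is not only very time consuming but also leads to potentially outdated models.
Overcoming this situation requires a paradigm shift towards sequential data processing in a streaming scenario. This not only allows information to be used as soon as it is available, so that an up-to-date model exists at any time, but also reduces the costs of data storage and maintenance.
Incremental and on-line algorithms fit naturally into this scheme, since they continuously incorporate information into their model and are traditionally designed to minimize processing time and space. Owing to their ability to process data continuously, at large scale and in real time, they have recently gained more attention, particularly in the context of big data [1].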
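Incremental algorithms are also well suited for learning beyond the production phase, enabling devices to adapt to the habits and environments of individual customers. This is particularly useful for smart home products [2,3]. Here the main challenge is not large-scale processing but continuous, efficient learning from few data. Although incremental learning could be replaced in this case by repeated batch learning in the cloud, that solution has severe drawbacks. A permanent connection to the cloud is required to provide a model at any time, which may not always be feasible. Furthermore, customers may be unwilling to share data about their daily lives for privacy reasons. Hence, learning directly on the device in an efficient way remains highly desirable.
The definitions of incremental and on-line learning in the literature are quite ambiguous. Some authors use the terms interchangeably, while others distinguish them in different ways. Additional terms such as lifelong learning or evolutionary learning are also used synonymously. We define an incremental learning algorithm as one that generates, on a given stream of training data $s_1, s_2, \dots, s_t$, a sequence of models $h_1, h_2, \dots, h_t$. In our case, $s_i$ is labeled training data $s_i = (x_i, y_i) \in \mathbb{R}^n \times \{1, \dots, C\}$ and $h_i : \mathbb{R}^n \to \{1, \dots, C\}$ is a model function that depends solely on $h_{i-1}$ and the most recent $p$ examples $s_i, \dots, s_{i-p}$, with $p$ being strictly limited. We specify on-line learning algorithms as incremental learning algorithms that are additionally bounded in model complexity and run time, and are therefore capable of endless/lifelong learning on a device with restricted resources. Incremental learning algorithms face the following challenges: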
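• The model has to adapt gradually, i.e. $h_{i+1}$ is constructed based on $h_i$ without a complete retraining.
• Previously acquired knowledge has to be preserved, without the effect of catastrophic forgetting [4].
• Only a limited number of $p$ training examples may be maintained.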
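The definition and the three challenges above can be illustrated with a minimal sketch. The class `RunningMeanClassifier` and its method names are hypothetical and are not one of the methods compared in the paper. It is a nearest-class-mean model that keeps only running statistics, so each model h_i is derived from h_{i-1} and the current example alone (p = 1), earlier knowledge stays in the statistics, and no past examples are stored.

```python
from typing import Dict, List, Sequence


class RunningMeanClassifier:
    """Toy incremental learner (illustrative assumption): nearest class mean."""

    def __init__(self) -> None:
        self.sums: Dict[int, List[float]] = {}   # per-class feature sums
        self.counts: Dict[int, int] = {}         # per-class example counts

    def partial_fit(self, x: Sequence[float], y: int) -> None:
        # h_i is built from h_{i-1} and the current example only; no retraining.
        if y not in self.sums:
            self.sums[y] = [0.0] * len(x)
            self.counts[y] = 0
        self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]
        self.counts[y] += 1

    def predict(self, x: Sequence[float]) -> int:
        # Assign x to the class whose running mean is closest
        # (squared Euclidean distance). Requires at least one seen example.
        def dist(label: int) -> float:
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))

        return min(self.sums, key=dist)


# A short labeled stream s_i = (x_i, y_i); each step yields the next model.
stream = [([0.0, 0.1], 1), ([1.0, 0.9], 2), ([0.2, 0.0], 1), ([0.9, 1.1], 2)]
model = RunningMeanClassifier()
for x_i, y_i in stream:
    model.partial_fit(x_i, y_i)
```

Memory use here depends on the number of classes and features, not on the length of the stream, which is the property the on-line setting asks for.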
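We explicitly assume the data to be labeled and do not focus on learning from unlabeled or partially labeled data streams. The supervised incremental learning setting applies to most prediction scenarios. In these, after the system has made a prediction, the true label can often be inferred with some delay. Consider, for example, the route a car driver takes at a crossing. As soon as the car has passed the crossing, the recorded data can be analyzed and labeled automatically. The supervised setting also covers tasks in which labels are provided explicitly. For example, a single user marks emails as spam for spam classification, and in human-robot interaction labels may be explicitly requested.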
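The delayed-label scenario can be sketched as a loop that predicts immediately and trains on an example only once its label has arrived. This is an illustrative assumption rather than the paper's evaluation protocol: `MajorityLearner`, `run_with_delayed_labels` and the `delay` parameter are hypothetical names, and the placeholder learner ignores the features entirely.

```python
from collections import Counter, deque


class MajorityLearner:
    """Placeholder model: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def partial_fit(self, x, y):
        self.counts[y] += 1


def run_with_delayed_labels(learner, stream, delay):
    """Predict each x at once; train on (x, y) `delay` steps later.

    Returns the fraction of wrong predictions over the whole stream.
    """
    pending = deque()   # examples whose labels have not arrived yet
    errors = 0
    n = 0
    for x, y in stream:
        errors += learner.predict(x) != y      # prediction before the label
        n += 1
        pending.append((x, y))
        if len(pending) > delay:               # oldest label is now available
            learner.partial_fit(*pending.popleft())
    return errors / n if n else 0.0


labels = [1, 1, 2, 1, 1]
stream = [([float(i)], y) for i, y in enumerate(labels)]
error_no_delay = run_with_delayed_labels(MajorityLearner(), stream, delay=0)
error_delay_1 = run_with_delayed_labels(MajorityLearner(), stream, delay=1)
```

In this toy run, a longer delay means the model stays uninformed for more steps at the start of the stream, so the error over the stream is higher.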
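An algorithm has to be chosen according to the preconditions of the given task, since no method can perform optimally in every scenario [5]. Various interesting incremental learning algorithms with different strengths and weaknesses have been published so far. However, only a few sources provide information about them, since essentially no in-depth comparative study is available that experimentally compares the most popular methods according to the most relevant criteria. An extensive literature search usually leads to the original publications of the algorithms in question, which help only to some extent for the following reasons:
• Authors naturally focus on demonstrating the merits of their method and therefore apply it in specific settings, particularly those the algorithm was designed for.
• Proposed algorithms are usually compared with one or two other methods on a few datasets, which gives only a limited picture of their overall quality.
• Even if one accepts the effort of reproducing the results, this is often impossible because of proprietary datasets or unknown hyperparameter settings.
In the end, one is left with choosing a method based on one's own experience, which usually covers only a fraction of the available algorithms, or with investing a lot of resources to try out several approaches.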
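In this paper, we fill this gap by analyzing the core properties of eight popular methods. Our study aims at a fundamental comparison of the algorithms' overall performance without restricting it to specific scenarios such as platforms with very limited resources. Performance in such specific settings can nevertheless be inferred from the general results provided here. We guide the choice of an algorithm based on basic information that is usually available in advance (e.g. the number of dimensions/samples). Our evaluation in off-line and on-line settings enables a broad comparison in terms of accuracy, convergence speed and model complexity. Experiments on diverse datasets assess the strengths and weaknesses of the respective methods and provide guidance on their applicability to specific tasks. Furthermore, we analyze the process of hyperparameter optimization (HPO) and investigate how robustly hyperparameters can be estimated based on a small set of examples.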
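Our focus lies on classification under supervised learning with incremental/on-line algorithms. We primarily evaluate on stationary datasets (i.e. we assume the stream $s_1, s_2, \dots$ to be i.i.d.). However, we briefly evaluate and discuss the methods in the context of concept drift. A recent overview of methods specifically designed to deal with non-stationary environments is given in [6].
The article is structured as follows. In Section 2 we discuss related contributions, especially those targeting the field of incremental learning. Section 3 briefly introduces the considered algorithms. Section 4 presents the evaluation framework, which consists of an analysis in off-line and on-line settings. Section 5, the main focus of our work, details the experiments performed. There we analyze the algorithms in different settings and discuss properties such as time efficiency, suitability for lifelong learning and HPO. Finally, Section 6 briefly summarizes our results and condenses them in tabular form.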

2 Related Work

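Numerous incremental and on-line algorithms have been published, often adapting existing batch methods to the incremental setting [7,8]. A great deal of theoretical work has evaluated their generalization ability and convergence speed in stationary environments [9,10], often accompanied by assumptions such as linearly separable data [11].
Although the field of incremental and on-line learning is well established and is applied particularly in the context of big data or Internet of Things technology [12], only a few publications address the field in a general way. Most of these are surveys describing available methods and some application domains [13,14].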
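Giraud-Carrier [15] gives some motivation for incremental learning and defines the notion of incrementality for learning tasks. He argues for applying incremental learning methods to incremental tasks, but also points out issues such as ordering effects and the question of trustworthiness. A survey was recently published by Gepperth and Hammer [16]. They formalize incremental learning and discuss the challenges that arise in theory as well as in practice. They also give an overview of commonly used algorithms together with corresponding real-world applications.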
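Incremental learning is treated more frequently in the setting of streaming scenarios [17,18], although most of that work specifically targets concept drift [19,20,6]. Domingos and Hulten define key properties that incremental algorithms need in order to keep up with the rapidly increasing rate of data output [21]. They stress the necessity of combining models that are strictly limited in processing time and space with theoretical performance guarantees.
Publications with a practical focus are very rare in the field of incremental learning. One of them is by Read et al. [22], who compare instance-incremental algorithms with batch-incremental methods in the context of concept drift and analyze their respective strengths and weaknesses. They conclude that instance-incremental algorithms are equally accurate but use fewer resources, and that lazy methods with a sliding window perform exceptionally well.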
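Fernández-Delgado et al. [23] conducted a large-scale study that evaluated 179 batch classifiers on 121 datasets. This quantitative study considered different implementations in various languages and toolboxes. The best results were achieved by the Random Forest [24] algorithm, closely followed by the Support Vector Machine (SVM) [25] with a Gaussian kernel.
However, such work is still sorely missing for incremental algorithms. In this paper, we pursue a more qualitative approach and, instead of a large-scale comparison, provide an in-depth evaluation of the major approaches in stationary environments. Besides accuracy, we also inspect model complexity, which allows the required resources in terms of time and space to be inferred. The consideration of rather neglected aspects such as convergence speed and HPO rounds off our analysis.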

3 Algorithms

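Our comparison covers a broad range of algorithm families. Bayesian, linear and instance-based models as well as tree ensembles and neural networks are represented. Model-dependent methods such as the Incremental Support Vector Machine are denoted by an acronym (ISVM), whereas model-independent methods such as Stochastic Gradient Descent are denoted by an acronym with an additional index (SGD_Lin) that specifies the applied model. The methods are briefly described below.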
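The Incremental Support Vector Machine (ISVM) is the most popular exact incremental version of the SVM and was introduced in [7]. Apart from the set of support vectors, it maintains a limited number of examples, the so-called "candidate vectors". These are examples that may be promoted to support vectors depending on future examples. The smaller the set of candidate vectors, the higher the probability of missing potential support vectors. If the set of candidate vectors contains all previously seen data, ISVM is a lossless algorithm, i.e. it generates the same model as the corresponding batch algorithm. Recent applications can be found in [26,27].
LASVM is an on-line approximate SVM solver proposed in [28]. In an alternating manner, it checks whether the currently processed example is a support vector and removes obsolete support vectors. For both steps it makes heavy use of sequential direction searches, as is also done in the Sequential Minimal Optimization (SMO) algorithm [29]. In contrast to ISVM, it does not maintain a set of candidate vectors but only considers the current example as a possible support vector. This yields an approximate solution but significantly reduces the training time. It was recently applied in [30,31].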
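On-line Random Forest (ORF) [32] is an incremental version of the Random Forest algorithm. A predefined number of trees grows continuously by adding splits whenever enough samples have been gathered within a leaf. Instead of computing locally optimal splits, a predefined number of random values is tested according to the scheme of Extremely Randomized Trees [33]. The split value that optimizes the Gini index the most is selected. Tree ensembles are very popular because of their high accuracy, simplicity and parallelization capability. Furthermore, they are insensitive to feature scaling and can easily be applied in practice. The method was recently applied in [34,35].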
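The split selection described for ORF can be sketched as follows. This is a simplified illustration and not the implementation from [32]: it assumes the samples gathered in a leaf are available as a list, whereas an on-line implementation would typically keep only class statistics per candidate test. The function names and the `n_tests` parameter are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_random_split(samples, n_tests, rng):
    """Test n_tests random (feature, threshold) pairs on a leaf's samples.

    samples: list of (x, y) with x a list of floats.
    Returns (gain, feature, threshold) of the candidate with the largest
    reduction in Gini impurity.
    """
    labels = [y for _, y in samples]
    parent = gini(labels)
    n_features = len(samples[0][0])
    best = None
    for _ in range(n_tests):
        f = rng.randrange(n_features)                 # random feature
        values = [x[f] for x, _ in samples]
        t = rng.uniform(min(values), max(values))     # random threshold
        left = [y for x, y in samples if x[f] <= t]
        right = [y for x, y in samples if x[f] > t]
        weighted = (len(left) * gini(left)
                    + len(right) * gini(right)) / len(samples)
        gain = parent - weighted
        if best is None or gain > best[0]:
            best = (gain, f, t)
    return best


leaf = [([0.1], 0), ([0.2], 0), ([0.8], 1), ([0.9], 1)]
gain, feature, threshold = best_random_split(leaf, 20, random.Random(0))
```

Because candidates are drawn at random instead of searched exhaustively, the cost per split is fixed by `n_tests`, at the price of a possibly suboptimal threshold.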
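Incremental Learning Vector Quantization (ILVQ) is a dynamically growing version of the static Generalized Learning Vector Quantization (GLVQ) [36] that inserts new prototypes when necessary. The insertion rate is determined by the number of misclassified samples. We use the version from [37], which introduced a prototype placement strategy that minimizes the loss on a sliding window of recent samples. Metric learning as described in [38,39] can also be used to further extend the classification abilities.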
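The idea of inserting prototypes at a rate driven by misclassifications can be sketched with a heavily simplified nearest-prototype classifier. This is not the algorithm from [37]: it omits both the GLVQ prototype adaptation and the sliding-window placement strategy, and simply places the triggering sample as the new prototype. The class name and the `insert_after_errors` parameter are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


class GrowingPrototypeClassifier:
    """Nearest-prototype model that grows after repeated misclassifications."""

    def __init__(self, insert_after_errors=3):
        self.prototypes = []                     # list of (vector, label)
        self.insert_after_errors = insert_after_errors
        self.errors = 0                          # errors since last insertion

    def predict(self, x):
        if not self.prototypes:
            return None
        _, label = min(self.prototypes, key=lambda p: sq_dist(p[0], x))
        return label

    def partial_fit(self, x, y):
        if y not in {label for _, label in self.prototypes}:
            # First prototype of a class that has not been seen before.
            self.prototypes.append((list(x), y))
            return
        if self.predict(x) != y:
            self.errors += 1
            if self.errors >= self.insert_after_errors:
                # Simplification: the misclassified sample becomes a prototype.
                self.prototypes.append((list(x), y))
                self.errors = 0


model = GrowingPrototypeClassifier(insert_after_errors=2)
for x, y in [([0.0, 0.0], 0), ([1.0, 1.0], 1),
             ([0.6, 0.6], 0), ([0.7, 0.7], 0)]:
    model.partial_fit(x, y)
```

A lower `insert_after_errors` lets the model grow faster and adapt sooner, while a higher value keeps the model smaller.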
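Learn++ (LPP_CART) [40] processes incoming samples in chunks of a predefined size. For each chunk, an ensemble of base classifiers is trained and combined through weighted majority voting into an "ensemble of ensembles". Similar to the AdaBoost [41] algorithm, each classifier is trained on a subset of the chunk's samples drawn according to a distribution that assigns a higher sampling probability to misclassified inputs. LPP is a model-independent algorithm, and its authors have successfully applied several different base classifiers such as SVM, Classification and Regression Trees [42] (CART) and the Multilayer Perceptron [43]. Like the original authors, we use the popular CART as the base classifier. Models trained chunk by chunk inherently incur an adaptation delay that depends on the chunk size. The algorithm was recently used in [44,45].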
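Two ingredients of this description, chunk-wise processing and weighted majority voting, can be sketched as below. This is not a full Learn++ implementation: training of the base classifiers and the resampling distribution are omitted, the classifiers are stand-in functions, and the AdaBoost.M1-style weight log((1 - e) / e) is an assumption made for illustration. All function names are hypothetical.

```python
import math
from collections import defaultdict


def chunks(stream, size):
    """Group a stream into full chunks; a partial last chunk is held back."""
    chunk = []
    for sample in stream:
        chunk.append(sample)
        if len(chunk) == size:
            yield chunk
            chunk = []


def vote_weight(error):
    """Assumed AdaBoost.M1-style weight for a classifier with error in (0, 0.5)."""
    return math.log((1.0 - error) / error)


def weighted_majority_vote(classifiers, x):
    """classifiers: list of (predict_fn, weight); returns the winning label."""
    scores = defaultdict(float)
    for predict, weight in classifiers:
        scores[predict(x)] += weight
    return max(scores, key=scores.get)


# One accurate stand-in classifier outvotes two weak ones that disagree.
ensemble = [
    (lambda x: 1, vote_weight(0.1)),
    (lambda x: 2, vote_weight(0.4)),
    (lambda x: 2, vote_weight(0.4)),
]
winner = weighted_majority_vote(ensemble, [0.0])
full_chunks = list(chunks(range(7), 3))
```

The `chunks` helper also shows where the adaptation delay comes from: samples that do not yet fill a chunk have no influence on the model.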

References

[1] M. Chen, S. Mao, Y. Liu, Big data: A survey, Mobile Netw. and Appl. 19 (2). doi:10.1007/s11036-013-0489-0. URL http://dx.doi.org/10.1007/s11036-013-0489-0
[2] R. Yang, M. W. Newman, Learning from a learning thermostat: Lessons for intelligent systems for the home, UbiComp ’13, ACM, 2013, pp. 93-102. doi:10.1145/2493432.2493489. URL http://doi.acm.org/10.1145/2493432.2493489
[3] B. D. Carolis, S. Ferilli, D. Redavid, Incremental learning of daily routines as workflows in a smart home environment, ACM 4 (4) (2015) 20:1-20:23. doi:10.1145/2675063. URL http://doi.acm.org/10.1145/2675063
[4] R. M. French, Catastrophic forgetting in connectionist networks, Trends in cognitive sciences 3 (4) (1999) 128-135.
[5] D. H. Wolpert, The supervised learning no-free-lunch theorems, in: Soft Computing and Industry, Springer, 2002, pp. 25-42.
[6] G. Ditzler, M. Roveri, C. Alippi, R. Polikar, Learning in nonstationary environments: A survey, Computational Intelligence Magazine 10 (4) (2015) 12-25.
[7] G. Cauwenberghs, T. Poggio, Incremental and decremental support vector machine learning, in: Proc. NIPS, 2001.
[8] N.-Y. Liang, G.-B. Huang, P. Saratchandran, N. Sundararajan, A fast and accurate online sequential learning algorithm for feedforward networks, NN 17 (6) (2006) 1411-1423. doi:10.1109/TNN.2006.880583.
[9] N. Cesa-Bianchi, G. Lugosi, Prediction, learning, and games, Cambridge university press, 2006.
[10] T. L. Watkin, A. Rau, M. Biehl, The statistical mechanics of learning a rule, Reviews of Modern Physics 65 (2) (1993) 499.
[11] A. Novikoff, On convergence proofs of perceptrons, Proc. Symp. Mathematical Theory of Automata XII (2) (1962) 615-622.
[12] L. Atzori, A. Iera, G. Morabito, The internet of things: A survey, Computer networks 54 (15) (2010) 2787-2805.
[13] R. Ade, P. Deshmukh, Methods for incremental learning: a survey, International Journal of Data Mining & Knowledge Management Process 3 (4) (2013) 119.
[14] P. Joshi, P. Kulkarni, Incremental learning: areas and methods-a survey, International Journal of Data Mining & Knowledge Management Process 2 (5) (2012) 43.
[15] C. Giraud-Carrier, A note on the utility of incremental learning, Ai Communications 13 (4) (2000) 215-223.
[16] A. Gepperth, B. Hammer, Incremental learning algorithms and applications, in: European Symposium on Artificial Neural Networks (ESANN), 2016.
[17] M. M. Gaber, A. Zaslavsky, S. Krishnaswamy, Mining data streams: a review, ACM Sigmod Record 34 (2) (2005) 18-26.
[18] C. C. Aggarwal, Data Classification: Algorithms and Applications, 1st Edition, Chapman & Hall/CRC, 2014.
[19] I. Zliobaite, Learning under concept drift: an overview, CoRR abs/1010.4784. URL http://arxiv.org/abs/1010.4784
[20] J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, A. Bouchachia, A survey on concept drift adaptation, ACM Computing Surveys (CSUR) 46 (4) (2014) 44.
[21] P. Domingos, G. Hulten, A general framework for mining massive data streams, Journal of Computational and Graphical Statistics 12 (4) (2003) 945-949.
[22] J. Read, A. Bifet, B. Pfahringer, G. Holmes, Batch-incremental versus instance-incremental learning in dynamic and evolving data, in: International Symposium on Intelligent Data Analysis, Springer, 2012, pp. 313-323.
[23] M. Fernández-Delgado, E. Cernadas, S. Barro, D. Amorim, Do we need hundreds of classifiers to solve real world classification problems, J. Mach. Learn. Res 15 (1) (2014) 3133-3181.
[24] L. Breiman, Random forests, Machine learning 45 (1) (2001) 5-32.
[25] C. Cortes, V. Vapnik, Support-vector networks, Machine learning 20 (3) (1995) 273-297.
[26] B. Biggio, I. Corona, B. Nelson, B. I. Rubinstein, D. Maiorca, G. Fumera, G. Giacinto, F. Roli, Security evaluation of support vector machines in adversarial environments, in: Support Vector Machines Applications, Springer, 2014, pp. 105-153.
[27] Y. Lu, K. Boukharouba, J. Boonaert, A. Fleury, S. Lecoeuche, Application of an incremental svm algorithm for on-line human recognition from video surveillance using texture and color features, Neurocomputing 126 (2014) 132-140.
[28] A. Bordes, S. Ertekin, J. Weston, L. Bottou, Fast kernel classifiers with online and active learning, Journal of Machine Learning Research 6 (2005) 1579-1619. URL http://leon.bottou.org/papers/bordes-ertekin-weston-bottou-2005
[29] J. Platt, Sequential minimal optimization: A fast algorithm for training support vector machines, Tech. rep. (April 1998). URL https://www.microsoft.com/en-us/research/publication/sequential-minimal-optimization-a-fast-algorithm-for-training-support-vector-machines/
[30] C.-J. Hsieh, S. Si, I. S. Dhillon, A divide-and-conquer solver for kernel support vector machines, in: ICML, 2014, pp. 566-574.
[31] Z. Cai, L. Wen, Z. Lei, N. Vasconcelos, S. Z. Li, Robust deformable and occluded object tracking with dynamic graph, IEEE Transactions on Image Processing 23 (12) (2014) 5497-5509.
[32] A. Saffari, C. Leistner, J. Santner, M. Godec, H. Bischof, On-line random forests, in: ICCV Workshops 2009 IEEE 12th International Conference on, 2009.
[33] P. Geurts, D. Ernst, L. Wehenkel, Extremely randomized trees, ML 63 (1). doi:10.1007/s10994-006-6226-1. URL http://dx.doi.org/10.1007/s10994-006-6226-1
[34] B. Lakshminarayanan, D. M. Roy, Y. W. Teh, Mondrian forests: Efficient online random forests, in: Advances in neural information processing systems, 2014, pp. 3140-3148.
[35] F. Pernici, A. Del Bimbo, Object tracking by oversampling local features, IEEE transactions on pattern analysis and machine intelligence 36 (12) (2014) 2538-2551.
[36] A. Sato, K. Yamada, Generalized learning vector quantization, in: NIPS, MIT Press, 1995.
[37] V. Losing, B. Hammer, H. Wersing, Interactive online learning for obstacle classification on a mobile robot, in: IJCNN 2015, 2015, pp. 1-8. doi:10.1109/IJCNN.2015.7280610.
[38] P. Schneider, M. Biehl, B. Hammer, Adaptive relevance matrices in learning vector quantization, Neural Comput. 21 (12) (2009) 3532-3561. doi:10.1162/neco.2009.11-08-908. URL http://dx.doi.org/10.1162/neco.2009.11-08-908
[39] K. Bunte, P. Schneider, B. Hammer, F.-M. Schleif, T. Villmann, M. Biehl, Limited rank matrix learning, discriminative dimension reduction and visualization, Neural Networks 26 (2012) 159-173.
[40] R. Polikar, L. Udpa, S. Udpa, V. Honavar, Learn++: an incremental learning algorithm for supervised neural networks, SMC 31 (4) (2001) 497-508. doi:10.1109/5326.983933.
[41] Y. Freund, R. E. Schapire, A short introduction to boosting, in: Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, Morgan Kaufmann, 1999, pp. 1401-1406.
[42] L. Breiman, J. Friedman, R. Olshen, C. Stone, Classification and Regression Trees, Wadsworth and Brooks, Monterey, CA, 1984, new edition.
[43] D. E. Rumelhart, G. E. Hinton, R. J. Williams, Learning internal representations by error propagation, Tech. rep., DTIC Document (1985).
[44] M. De-la Torre, E. Granger, P. V. Radtke, R. Sabourin, D. O. Gorodnichy, Partially-supervised learning from facial trajectories for face recognition in video surveillance, Information Fusion 24 (2015) 31-53.
[45] J. F. G. Molina, L. Zheng, M. Sertdemir, D. J. Dinter, S. Schönberg, M. Rädle, Incremental learning with svm for multimodal classification of prostatic adenocarcinoma, PloS one 9 (4) (2014) e93600.
[46] J. Tang, C. Deng, G.-B. Huang, Extreme learning machine for multilayer perceptron, IEEE transactions on neural networks and learning systems 27 (4) (2016) 809-821.
[47] J. Tang, C. Deng, G.-B. Huang, B. Zhao, Compressed-domain ship detection on spaceborne optical image using deep neural network and extreme learning machine, IEEE Transactions on Geoscience and Remote Sensing 53 (3) (2015) 1174-1185.
[48] H. Zhang, The Optimality of Naive Bayes, in: V. Barr, Z. Markov (Eds.), FLAIRS Conference, AAAI Press, 2004. URL http://www.cs.unb.ca/profs/hzhang/publications/FLAIRS04ZhangH.pdf
[49] C. Salperwyck, V. Lemaire, Learning with few examples: An empirical study on leading classifiers, in: Neural Networks (IJCNN), The 2011 International Joint Conference on, IEEE, 2011, pp. 1010-1019.
[50] V. Metsis, I. Androutsopoulos, G. Paliouras, Spam filtering with naive bayes-which naive bayes?, in: CEAS, 2006, pp. 27-28.
[51] S. Ting, W. Ip, A. H. Tsang, Is naive bayes a good classifier for document classification?, International Journal of Software Engineering and Its Applications 5 (3) (2011) 37-46.
[52] W. Lou, X. Wang, F. Chen, Y. Chen, B. Jiang, H. Zhang, Sequence based prediction of dna-binding proteins based on hybrid feature selection using random forest and gaussian naive bayes, PLoS One 9 (1) (2014) e86703.
[53] J. C. Griffis, J. B. Allendorfer, J. P. Szaflarski, Voxel-based gaussian naïve bayes classification of ischemic stroke lesions in individual t1-weighted mri scans, Journal of neuroscience methods 257 (2016) 97-108.
[54] T. Zhang, Solving large scale linear prediction problems using stochastic gradient descent algorithms, in: Proceedings of the twenty-first international conference on Machine learning, ACM, 2004, p. 116.
[55] L. Bottou, Large-scale machine learning with stochastic gradient descent, in: Proceedings of COMPSTAT’2010, Springer, 2010, pp. 177-186.
[56] P. Richtárik, M. Takáč, Parallel coordinate descent methods for big data optimization, Mathematical Programming 156 (1-2) (2016) 433-484.
[57] Z. Akata, F. Perronnin, Z. Harchaoui, C. Schmid, Good practice in large-scale learning for image classification, IEEE Transactions on Pattern Analysis and Machine Intelligence 36 (3) (2014) 507-520.
[58] M. Sapienza, F. Cuzzolin, P. H. Torr, Learning discriminative space-time action parts from weakly labelled videos, International journal of computer vision 110 (1) (2014) 30-47.
[59] S. Ertekin, L. Bottou, C. L. Giles, Nonconvex online support vector machines, IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (2) (2011) 368-381.
[60] R. Elwell, R. Polikar, Incremental learning of concept drift in nonstationary environments, IEEE Transactions on Neural Networks 22 (10) (2011) 1517-1531.
[61] G. Ditzler, R. Polikar, Incremental learning of concept drift from streaming imbalanced data, IEEE Transactions on Knowledge and Data Engineering 25 (10) (2013) 2283-2301.
[62] J. Zhao, Z. Wang, D. S. Park, Online sequential extreme learning machine with forgetting mechanism, Neurocomputing 87 (2012) 79-89.
[63] Y. Ye, S. Squartini, F. Piazza, Online sequential extreme learning machine in nonstationary environments, Neurocomputing 116 (2013) 94-101.
[64] R. Johnson, T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, in: Advances in Neural Information Processing Systems, 2013, pp. 315-323.
[65] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825-2830.
[66] M. Lichman, UCI machine learning repository (2013). URL http://archive.ics.uci.edu/ml
[67] C.-C. Chang, C.-J. Lin, LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology 2 (2011) 27:1-27:27.
[68] J. Bergstra, B. Komer, C. Eliasmith, D. Yamins, D. D. Cox, Hyperopt: a python library for model selection and hyperparameter optimization, Computational Science & Discovery 8 (1) (2015) 014008. URL http://stacks.iop.org/1749-4699/8/i=1/a=014008
[69] J. S. Bergstra, R. Bardenet, Y. Bengio, B. Kégl, Algorithms for hyperparameter optimization, in: Advances in Neural Information Processing Systems, 2011, pp. 2546-2554.
[70] H. He, S. Chen, K. Li, X. Xu, Incremental learning from stream data, Neural Networks, IEEE Transactions on 22 (12) (2011) 1901-1914.
[71] M. Grbovic, S. Vucetic, Learning vector quantization with adaptive prototype addition and removal, in: IJCNN 2009, IEEE, 2009, pp. 994-1001.
[72] R. Elwell, R. Polikar, Incremental learning in nonstationary environments with controlled forgetting, in: IJCNN 2009, IEEE, 2009, pp. 771-778.
[73] T. Downs, K. Gates, A. Masters, Exact simplification of support vector solutions, J. Mach. Learn. Res. 2 (2002) 293-297. URL http://dl.acm.org/citation.cfm?id=944790.944814
[74] J. Gama, P. Medas, G. Castillo, P. Rodrigues, Learning with drift detection, in: Advances in artificial intelligence - SBIA 2004, Springer, 2004, pp. 286-295.
[75] J. Z. Kolter, M. A. Maloof, Dynamic weighted majority: An ensemble method for drifting concepts, The Journal of Machine Learning Research 8 (2007) 2755-2790.
[76] R. Elwell, R. Polikar, Incremental learning of concept drift in nonstationary environments, IEEE Transactions on Neural Networks 22 (10) (2011) 1517-1531. doi:10.1109/TNN.2011.2160459.
[77] M. Harries, Splice-2 comparative evaluation: Electricity pricing, Tech. rep., University of New South Wales (1999).
[78] M. Baena-García, J. del Campo-Avila, R. Fidalgo, A. Bifet, R. Gavalda, R. Morales-Bueno, Early drift detection method, in: Fourth international workshop on knowledge discovery from data streams, Vol. 6, 2006, pp. 77-86.
[79] L. I. Kuncheva, C. O. Plumpton, Adaptive learning rate for online linear discriminant classifiers, in: Structural, Syntactic, and Statistical Pattern Recognition, Springer, 2008, pp. 510-519.
[80] I. Zliobaite, How good is the electricity benchmark for evaluating concept drift adaptation, CoRR abs/1301.3524.
[81] A. Bifet, G. Holmes, R. Kirkby, B. Pfahringer, Moa: Massive online analysis, The Journal of Machine Learning Research 11 (2010) 1601-1604.
[82] A. Bifet, B. Pfahringer, J. Read, G. Holmes, Efficient data stream classification via probabilistic adaptive windows, in: Proceedings of the 28th Annual ACM Symposium on Applied Computing, SAC ’13, ACM, New York, NY, USA, 2013, pp. 801-806. doi:10.1145/2480362.2480516. URL http://doi.acm.org/10.1145/2480362.2480516
[83] J. Gama, R. Rocha, P. Medas, Accurate decision trees for mining high-speed data streams, in: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2003, pp. 523-528.
[84] N. C. Oza, S. Russell, Experimental comparisons of online and batch versions of bagging and boosting, in: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2001, pp. 359-364.
[85] G. Widmer, M. Kubat, Learning in the presence of concept drift and hidden contexts, Machine learning 23 (1) (1996) 69-101.
[86] A. Bifet, R. Gavalda, Learning from time-changing data with adaptive windowing, in: SIAM International Conference on Data Mining (SDM), Vol. 7, SIAM, 2007, p. 2007.
[87] V. Losing, B. Hammer, H. Wersing, Knn classifier with self adjusting memory for heterogeneous concept drift, in: 2016 IEEE 16th International Conference on Data Mining (ICDM), 2016, pp. 291-300. doi:10.1109/ICDM.2016.0040.