There are many data mining algorithms and methodologies for a variety of fields and problems. Every data mining researcher or practitioner has to assess the performance of their own solution(s) in order to compare them with state-of-the-art approaches, and should also describe the intrinsic quality of the discovered patterns. Which methodology, which benchmarks, which measures of performance, which tools, which measures of interest, etc., should be used, and why? Everyone has to answer these questions. Assessing quality and performance is a critical issue in classical settings, and even more so in the Big Data era, where classical and current approaches are no longer sufficient. Clearly, large-scale data, complex data and streaming data bring new challenges (e.g. large-scale inference, spurious correlations, the need for perpetual validation).
The fourth workshop on Quality Issues, Measures of Interestingness and Evaluation of Data Mining Models (QIMIE'15) will focus on these questions and should be of great interest to a wide range of data miners. As a whole, QIMIE'15 intends to be a forum for a community-wide discussion of these issues and to contribute to a deep cross-fertilization among the many researchers and practitioners attending PAKDD'15. We therefore strongly encourage interested people to propose topics and main themes to be discussed at QIMIE'15.
Following QIMIE'09 (@PAKDD'09, April 2009, Bangkok, Thailand), QIMIE'11 (@PAKDD'11, May 2011, Shenzhen, China) and QIMIE'13 (@PAKDD'13, April 2013, Gold Coast, Australia), the main themes of QIMIE'15 will focus on the theory, techniques and practices that can ensure that the discovered knowledge is of high quality. The workshop will thus cover the problem of measuring the quality of patterns, the evaluation of data mining models, and the links between the discovery stage and the quality assessment stage.
QIMIE'15 is organized in association with the PAKDD'15 conference (19th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Ho Chi Minh City, Vietnam, May 19-22, 2015), a major international conference in the areas of data mining and knowledge discovery.
Major topics will include but are not limited to the following:
- objective measures of interest (for individual rules or rule bases, patterns, graphs, data streams, clusters, etc.)
- subjective measures of interest and quality based on human knowledge, quality of ontologies, actionable rules
- algorithmic properties of measures of interest, especially in the context of big data
- comparison of algorithms: issues with benchmarks, experiments and parameter tuning, including the reproducibility of data mining results, the need for new data sets that match new problems, methodologies, statistical tests, etc.
- robustness evaluation and statistical evaluation
- graphical tools such as ROC curves, cost curves and user-friendly visualization tools
- special issues: imbalanced data, very large data sets, very high-dimensional data, changing environments and dynamic data, data streams, big data, lack of training data, graph data, annotated data and semi-supervised learning, methods for performance evaluation with no ground truth data, etc.
- special issues in specialized domains: bioinformatics, security, information retrieval, sequential and time series data, social networks, geo-localized data, etc.
From the above list of key topics, which is not exhaustive, and from recent publications and related workshops questioning the usefulness of research in machine learning and data mining, one can identify four major themes (to be extended; all proposals are welcome):
- properties of objective measures of interest (for individual rules, for rule bases), which lead to the problem of how to choose, depending on the user's goal and other factors, an appropriate interestingness measure in order to filter the huge number of individual rules or to evaluate a set of rules; this theme is of great interest to the machine learning and data mining community and has been the focus of a large body of work; properties of subjective measures of interest, integration of domain knowledge, quality based on human knowledge, quality of ontologies, actionable rules. QIMIE'15 intends to continue and broaden this theme.
- algorithmic properties of interestingness measures, which lead to the problem of how to mine interesting patterns efficiently, i.e. can interestingness measures be used as early as possible in the mining process (as is done with the well-known support) to reduce both the time needed to mine databases and the number of patterns found? This question has attracted a lot of work but remains a very challenging problem. QIMIE'15 intends to develop this original theme which, to the best of our knowledge, is not treated in depth at any other event.
- challenges with new data and new problems; these challenges are generally the subject of fruitful specialized workshops (e.g. very large and very high-dimensional data, imbalanced data, etc., and related specialized domains such as bioinformatics and the life sciences in a broad sense). QIMIE'15 intends to strengthen the links between these challenges, which are generally treated separately.
- evaluation and comparison of algorithms, which lead to a debate on how an algorithm should be evaluated, on which properties (e.g. accuracy, conciseness, specificity, sensitivity, etc.), on which trade-off to accept between the different types of errors in multiple simultaneous hypothesis testing, on how to construct new evaluation measures, etc.; clearly, the debate within the machine learning and data mining community on how to evaluate new algorithms is not closed. QIMIE'15 intends to strengthen the links between these challenges and the three previous themes.
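As a purely illustrative sketch of what the first two themes refer to, the following Python snippet computes three classical objective measures (support, confidence and lift) for a single association rule. The toy transactions, the function names and the choice of measures are assumptions made for this example only; they are not prescribed by the workshop.

```python
from typing import FrozenSet, Sequence

Itemset = FrozenSet[str]


def support(transactions: Sequence[Itemset], itemset: Itemset) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions: Sequence[Itemset],
               antecedent: Itemset, consequent: Itemset) -> float:
    """Estimate of P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    s_a = support(transactions, antecedent)
    if s_a == 0.0:
        return 0.0
    return support(transactions, antecedent | consequent) / s_a


def lift(transactions: Sequence[Itemset],
         antecedent: Itemset, consequent: Itemset) -> float:
    """Confidence divided by the support of the consequent (1.0 means independence)."""
    s_c = support(transactions, consequent)
    if s_c == 0.0:
        return 0.0
    return confidence(transactions, antecedent, consequent) / s_c


# Hypothetical toy data, invented for illustration.
TOY = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

if __name__ == "__main__":
    a, c = frozenset({"bread"}), frozenset({"milk"})
    print(support(TOY, a | c), confidence(TOY, a, c), lift(TOY, a, c))
```

On this toy data the rule bread -> milk has a lift below 1 even though its confidence looks reasonable, which is one small instance of why the choice of an interestingness measure matters when filtering rules.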