QIMIE'13 Description
There are many data mining algorithms and methodologies, covering a wide range of fields and problems. Every data mining researcher/practitioner is faced with assessing the performance of his or her own solution(s) in order to compare them with state-of-the-art approaches, and with describing the intrinsic quality of the discovered patterns. Which methodology, which benchmarks, which measures of performance, which tools, which measures of interest, etc., should be used, and why? Everyone has to answer these questions, and assessing quality and performance is a critical issue.
The third Quality Issues, Measures of Interestingness and Evaluation of Data Mining Models workshop (QIMIE'13) will focus on these questions and should be of great interest to a wide range of data miners. As a whole, QIMIE'13 intends to be a forum for a community-wide discussion of these issues and to contribute to a deep cross-fertilization among the researchers and practitioners attending PAKDD'13. We therefore strongly encourage interested people to propose topics and main themes to be discussed within QIMIE'13.
Following QIMIE'09 (organized in conjunction with the 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, April 2009, Bangkok, Thailand) and QIMIE'11 (organized in conjunction with the 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining, May 2011, Shenzhen, China), the main themes of QIMIE'13 will focus on the theory, techniques and practices that can ensure the discovered knowledge is of quality. It will thus cover the problem of measuring the quality of patterns, the evaluation of data mining models, and the links between the discovery stage and the quality assessment stage.
QIMIE'13 is organized in association with the PAKDD'13 conference (17th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Gold Coast, Australia, April 14-17, 2013), a major international conference in the areas of data mining and knowledge discovery.
Major topics will include but are not limited to the following:
- objective measures of interest (for individual rules or rule bases, patterns, graphs, data streams, clusters, etc.); a short illustrative sketch follows this list
- subjective measures of interest and quality based on human knowledge, quality of ontologies, actionable rules
- algorithmic properties of measures of interest
- comparison of algorithms: issues with benchmarks, experiments and parameter tuning, including the reproducibility of data mining results, the need for new data sets that match new problems, methodologies, statistical tests, etc.
- robustness evaluation and statistical evaluation
- graphical tools such as ROC curves and cost curves
- special issues: imbalanced data, very large data sets, very high dimensional data, changing environments, lack of training data, sample selection bias, graph data, etc.
- special issues in specialized domains: bioinformatics, security, information retrieval, sequential and time series data, social networks, geolocated data, etc.
- etc.
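As a minimal illustration of the first topic above, the following sketch computes three classical objective interestingness measures (support, confidence and lift) for a single association rule. The transactions, the rule and the printed values are hypothetical and serve only as an example of the kind of measures the workshop addresses.

    # Hypothetical toy example: objective interestingness measures
    # (support, confidence, lift) for an association rule A -> B.
    transactions = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"milk", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]

    def support(itemset, transactions):
        """Fraction of transactions containing every item of the itemset."""
        return sum(itemset <= t for t in transactions) / len(transactions)

    def confidence(antecedent, consequent, transactions):
        """Estimate of P(consequent | antecedent) on the transactions."""
        return support(antecedent | consequent, transactions) / support(antecedent, transactions)

    def lift(antecedent, consequent, transactions):
        """Ratio of observed co-occurrence to what independence would predict."""
        return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

    rule_a, rule_b = {"bread"}, {"milk"}
    print(support(rule_a | rule_b, transactions))   # 0.4
    print(confidence(rule_a, rule_b, transactions)) # ~0.667
    print(lift(rule_a, rule_b, transactions))       # ~0.833 (< 1: negative association)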
From this (non-exhaustive) list of key topics, and from recent publications and related workshops questioning the usefulness of research in machine learning and data mining, one can identify five major themes (to be extended; all proposals are welcome):
- properties of objective measures of interest (for individual rules, for rule bases), which leads to the problem of how to choose, depending on the user's goal and other factors, an appropriate interestingness measure in order to filter the huge number of individual rules or to evaluate a set of rules; this theme is of great interest to the machine learning and data mining community and has attracted a lot of work. QIMIE'13 intends to continue and enlarge this theme.
- algorithmic properties of interestingness measures, which leads to the problem of how to mine interesting patterns efficiently, i.e. can we use interestingness measures as early as possible (alongside the well-known support) to reduce both the time needed to mine databases and the number of discovered patterns? This question has attracted a lot of work but is still a very challenging problem. QIMIE'13 intends to develop this original theme which, to the best of our knowledge, is not treated in depth in any other event.
- properties of subjective measures of interest, integration of domain knowledge, quality based on human knowledge, quality of ontologies, actionable rules
- challenges with new data and new problems; these challenges are generally the subject of fruitful specialized workshops (e.g. very large and very high dimensional data, imbalanced data, etc., and related specialized domains such as bioinformatics and the life sciences in a broad sense). QIMIE'13 intends to strengthen the relations between these challenges, which are generally treated separately.
- evaluation and comparison of algorithms, which leads to debates on how an algorithm should be evaluated, on which properties (e.g. accuracy, conciseness, specificity, sensitivity, etc.; a minimal illustration follows this list), on which trade-off between the different types of errors for multiple simultaneous hypothesis testing, on how to construct new evaluation measures, etc.; clearly, the debate within the machine learning and data mining community about how we evaluate new algorithms is not closed. QIMIE'13 intends to strengthen the relations between these challenges and the three previous themes.
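As a minimal illustration of the evaluation measures mentioned in the last theme, the following sketch derives accuracy, sensitivity and specificity from the counts of a binary confusion matrix; the counts are hypothetical and chosen only to make the computation concrete.

    # Hypothetical confusion-matrix counts for a binary classifier.
    tp, fn = 40, 10   # positives correctly / incorrectly classified
    tn, fp = 35, 15   # negatives correctly / incorrectly classified

    accuracy    = (tp + tn) / (tp + tn + fp + fn)   # overall rate of correct decisions: 0.75
    sensitivity = tp / (tp + fn)                    # recall on the positive class: 0.80
    specificity = tn / (tn + fp)                    # recall on the negative class: 0.70

    print(f"accuracy={accuracy:.2f}  sensitivity={sensitivity:.2f}  specificity={specificity:.2f}")

Such single-number summaries hide the trade-offs between error types, which is precisely why the choice of evaluation measure and of statistical testing procedure remains open for discussion within the workshop.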