Mining Changes of Classification by Correspondence Tracing


Ke Wang ∗, Senqiang Zhou †, Chee Ada Fu ‡, Jeffrey Xu Yu §

Abstract

We study the problem of mining changes of classification characteristics as the data changes. Available are an old classifier, representing previous knowledge about classification characteristics, and a new data set. We want to find the changes of classification characteristics in the new data. An example of such a change is “members with a large family no longer shop frequently, but they used to”. Finding this kind of change holds the key for the organization to adapt to the changed environment and stay ahead of competitors. The challenge is that it is difficult to see what has really changed by comparing the old and new classifiers, which could be very large and different. In this paper, we propose a technique to identify such changes. The idea is to trace the characteristics, in the old and new classifiers, that correspond to each other by classifying the same examples. We describe several ways to present changes so that the user can focus on a small number of important ones. We evaluate the proposed method on real life data sets.

1 Introduction

Changes can be opportunities to some people (organizations) and curses to others. A key to staying ahead in the changing world is knowing important changes and devising strategies for adapting to them. There are three steps in this process: detecting changes, identifying the causes of changes, and acting upon the causes to respond to the changes. Detecting changes in a form understandable to the user is the most important step because it alerts the user to opportunities and challenges ahead and triggers the other steps. For example, by mining changes the user may find that many members with a large family no longer shop frequently. This information could alert the organization to a potential loss of customers and trigger actions to retain such customers.

In this paper, we study the change mining problem in the context of classification [15]. Classification refers to extracting characteristics, called a classifier, from a sample of pre-classified examples; the goal is to assign classes, as accurately as possible, to other examples that follow the same class distribution as the sample examples. In the change mining problem, we have an old classifier, representing some previous knowledge about classification, and a new data set that has a changed class distribution. We want to find the changes of classification characteristics in the new data set.

For changes to be understandable to the user, two requirements are essential. First, changes must be described explicitly. Simply returning the pair of old and new classifiers does not work because it is not reasonable to expect the user to extract the changes by comparing two classifiers that are potentially large and dissimilar. For example, a decision tree classifier can easily have several dozen (if not hundreds of) rules, and a change at the top levels will make the classifier look very different. Second, the user should be told which changes are important because often more changes are found than a human user can possibly handle.

Change mining is a difficult problem. First of all, it is not clear how the change of classification should be measured. Simply counting the rules added and deleted does not work because a similar classification can be produced by dissimilar rules. Moreover, a small change in rules could account for most changes in classification accuracy. There are a few studies on this issue in the literature (see Section 2 for related work). In [11], to extract and understand changes, a new classifier is required to resemble the old classifier to some extent, i.e., to follow a similar splitting in the decision tree construction. This restriction makes it less likely to find important changes. For example, important attributes often occur at the top levels of the decision tree, and if such attributes change, the method in [11] cannot be used. In [9], the change between two classifiers is measured by the amount of work required to transform them into some common specialization. In real life, the human user hardly thinks of changes in

∗ Simon Fraser University, wangk@cs.sfu.ca. Supported in part by a research grant from the Natural Sciences and Engineering Research Council of Canada and by a research grant from Networks of Centres of Excellence/Institute for Robotics and Intelligent Systems.
† Simon Fraser University, szhoua@cs.sfu.ca.
‡ The Chinese University of Hong Kong, adafu@cs.cuhk.edu.hk. Supported by the RGC (the Hong Kong Research Grants Council) grant UGC REF.CUHK 4179/01E.
§ The Chinese University of Hong Kong, yu@se.cuhk.edu.hk. Supported in part by the Research Grants Council of Hong Kong, China (CUHK4229/01E).
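The idea stated in the abstract, pairing old and new characteristics that classify the same examples, can be illustrated with a minimal sketch. This is not the authors' algorithm: the rule format, the rule names (`O1`, `N1`, ...), the attributes `family_size` and `income`, the thresholds, and the four examples are all hypothetical and chosen only for illustration. The sketch counts, for each (old rule, new rule) pair, the new examples that both rules cover but assign to different classes.

```python
from collections import Counter

# Hypothetical rules: (name, coverage test, predicted class).
# Rules are tried in order; the first rule that covers an example classifies it.
old_rules = [
    ("O1", lambda e: e["family_size"] >= 4, "frequent"),    # large family -> frequent
    ("O2", lambda e: True, "infrequent"),                   # default rule
]
new_rules = [
    ("N1", lambda e: e["income"] >= 60, "frequent"),        # high income -> frequent
    ("N2", lambda e: True, "infrequent"),                   # default rule
]

def first_match(rules, example):
    """Return (name, class) of the first rule covering the example."""
    for name, covers, label in rules:
        if covers(example):
            return name, label
    return None, None  # no rule covers the example

def changed_pairs(old_rules, new_rules, examples):
    """Count (old rule, new rule) pairs that cover the same example
    but predict different classes for it."""
    counts = Counter()
    for e in examples:
        old_name, old_label = first_match(old_rules, e)
        new_name, new_label = first_match(new_rules, e)
        if old_label != new_label:
            counts[(old_name, new_name)] += 1
    return counts

# Made-up examples standing in for the new data set.
new_data = [
    {"family_size": 5, "income": 40},
    {"family_size": 5, "income": 45},
    {"family_size": 2, "income": 80},
    {"family_size": 1, "income": 30},
]

changes = changed_pairs(old_rules, new_rules, new_data)
for (old_name, new_name), n in changes.most_common():
    print(f"{old_name} -> {new_name}: {n} example(s) changed class")
```

Under these assumptions, the pair with the largest count (`O1` with `N2`) reads as "members with a large family used to be classified as frequent shoppers, but no longer are". Ranking pairs by count is one simple way to surface a few changes first; the paper's own ways of presenting changes may differ.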
