VISUAL DATA MINING MODELS FOR ENHANCING THE KNOWLEDGE EXTRACTION
e-business intelligence lab, www.e-bi.gr
Dr. Ioannis Kopanakis, kopanak@e-bi.gr
Assistant Professor
Head, Dept. of Commerce & Marketing, Technological Educational Institute of Crete
Scientific Director, e-Business Intelligence Lab, Center for Technological Research - Crete
Visual Senses
• The visual sense has a unique status for humans, offering a very broadband channel for information flow.
• Visual approaches to analysis and mining attempt to take advantage of our ability to perceive pattern and structure in visual form and to make sense of, or interpret, what we see.
• Visual data mining techniques have proven to be of high value in exploratory data analysis, and they also have high potential for mining large databases.
• This presentation investigates the area of visual data mining.
From Data to Information
• Having the right information at the right time is crucial for making the right decisions.
• Because of fast technological progress, the amount of information that may be of interest for making decisions increases very quickly [Kei96a].
• One reason for the ever-increasing stream of data is the automation of activities in all areas, including business, engineering, science, and government.
• But finding the valuable information hidden in the data is like searching for a needle in a haystack.
• The process of searching and analyzing large amounts of data is called “data mining”.
• Large collections of data are potential lodes of valuable information but, as in real mining, the search and extraction can be a difficult and exhausting process [Kei94a].
Data Mining
• Data mining is a knowledge discovery process of extracting previously unknown, actionable information from very large databases.
• In detail, it is the nontrivial extraction of implicit, previously unknown and potentially useful information from data.
• In other words, it is the search for relationships and global patterns that exist in large databases but are “hidden” among the vast amounts of data.
• These relationships represent valuable knowledge about the database and objects in the world [Fra92].
Data Mining II
• Data mining is the efficient and possibly unsupervised discovery of interesting, useful and previously unknown patterns in a data warehouse, which is a historical database designed to facilitate analysis and knowledge discovery [Gan96].
• Common patterns of interest include classification, associations, clustering, and sequential patterns.
• The success of the data mining process is critically dependent upon the availability of user insights and biases, even though the process may use unsupervised learning algorithms.
• In some sense, data mining is like the work of radiologists: it scans the database to identify phenomena that need a closer look, showing the regular structure of the data but also helping to find anomalies.
Data Mining Process
Data Preparation Stage
• Focuses on improving the data quality.
• Summarizes the data to facilitate the analysis and discovery process.
• Operates on either operational databases or a data warehouse.
• The quality of the data in the data warehouse is constantly monitored by data analysts.
• Because the source databases are heterogeneous and enforce non-standard data quality policies, the warehouse data is usually cleaned or standardized via data scrubbing.
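To make the data scrubbing step concrete, here is a minimal Python sketch, assuming records arrive as dictionaries from several source databases. The field names (`name`, `country`, `amount`), the `COUNTRY_CODES` mapping and the cleaning rules are all hypothetical illustrations, not part of the presentation.

```python
from typing import Optional

# Hypothetical mapping of source-specific country spellings to one standard code.
COUNTRY_CODES = {"greece": "GR", "gr": "GR", "hellas": "GR",
                 "germany": "DE", "de": "DE"}


def parse_amount(value) -> Optional[float]:
    """Convert values such as "1,250.50" or 1250.5 to float; None if unparsable."""
    if value is None:
        return None
    try:
        return float(str(value).replace(",", "").strip())
    except ValueError:
        return None


def scrub_record(record: dict) -> dict:
    """Return a cleaned copy of one source record (illustrative rules only)."""
    cleaned = {}
    # Trim whitespace and normalize the capitalization of the name.
    name = (record.get("name") or "").strip()
    cleaned["name"] = " ".join(p.capitalize() for p in name.split()) or None
    # Map free-text country values onto a standard code; unknown values become None.
    country = (record.get("country") or "").strip().lower()
    cleaned["country"] = COUNTRY_CODES.get(country)
    cleaned["amount"] = parse_amount(record.get("amount"))
    return cleaned


def scrub(records: list) -> list:
    """Clean every record and drop duplicates that appear after standardization."""
    seen = set()
    result = []
    for record in records:
        cleaned = scrub_record(record)
        key = (cleaned["name"], cleaned["country"], cleaned["amount"])
        if key not in seen:
            seen.add(key)
            result.append(cleaned)
    return result
```

Two records that differ only in spelling conventions (for example "Hellas" versus "GR") collapse into one after standardization, which is the kind of heterogeneity the slide refers to.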
Model Derivation Stage
• Focuses on choosing learning samples, testing samples and learning algorithms.
• A suitable sample set is selected, which forms the training data for the data mining algorithm.
• The data mining process in this stage is viewed as the derivation of an interesting, representative knowledge model.
• The algorithm for model derivation, together with the guidance provided by the user, will generally produce several models of the information contained in the data.
• The data mining algorithms use guidance from the analyst to decide various parameters of the derived model, such as its accuracy and prevalence, thereby controlling the computational complexity of the learning process.
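The following Python sketch illustrates one way this stage could look, assuming the model is a set of simple rules of the form (attribute == value) -> class. Reading "prevalence" as the fraction of training rows a rule covers and "accuracy" as the fraction of covered rows it classifies correctly is an assumption made for this example; the function names and thresholds are hypothetical.

```python
import random
from collections import Counter


def split_samples(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows reproducibly and split them into learning and testing samples."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def derive_rules(train, target, min_prevalence=0.1, min_accuracy=0.8):
    """Derive rules of the form (attribute == value) -> class.

    prevalence: fraction of training rows that match the condition.
    accuracy:   fraction of matching rows that carry the predicted class.
    Both thresholds stand in for the analyst's guidance.
    """
    condition_counts = Counter()
    joint_counts = Counter()
    for row in train:
        for attribute, value in row.items():
            if attribute == target:
                continue
            condition_counts[(attribute, value)] += 1
            joint_counts[(attribute, value, row[target])] += 1

    rules = []
    for (attribute, value, label), hits in joint_counts.items():
        matched = condition_counts[(attribute, value)]
        prevalence = matched / len(train)
        accuracy = hits / matched
        if prevalence >= min_prevalence and accuracy >= min_accuracy:
            rules.append({"if": (attribute, value), "then": label,
                          "prevalence": prevalence, "accuracy": accuracy})
    # Several candidate rules usually survive; list the most accurate first.
    return sorted(rules, key=lambda r: (-r["accuracy"], -r["prevalence"]))
```

Raising the thresholds prunes more candidates early, which is one simple way analyst guidance can limit the amount of work the learning step performs.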
Validation Stage
• Involves monitoring database updates and continually validating patterns learned in the past.
• Not all the knowledge models generated will have business applications.
• The validity of the knowledge models must be continuously monitored in the context of changes to the data in the warehouse.
• When the population in the warehouse shifts significantly, the previously learned models will no longer be applicable, and new models will have to be derived.
• It may also be possible to learn new models incrementally from the new data.
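A minimal Python sketch of such continued validation follows, assuming each previously learned model is a single rule stored as a dictionary with an `"if"` key holding an (attribute, value) pair and a `"then"` key holding the predicted class. This rule format, the function names and the `min_accuracy` threshold are hypothetical choices for illustration.

```python
def rule_accuracy(rule, rows, target):
    """Accuracy of one (attribute == value) -> class rule on the given rows.

    Returns None when no row matches the rule's condition.
    """
    attribute, value = rule["if"]
    matching = [row for row in rows if row.get(attribute) == value]
    if not matching:
        return None
    correct = sum(1 for row in matching if row[target] == rule["then"])
    return correct / len(matching)


def revalidate(rules, new_rows, target, min_accuracy=0.8):
    """Split previously learned rules into still-valid and stale ones.

    A rule is stale when it no longer reaches min_accuracy on the new
    warehouse data, or when the new data contains no matching rows at all.
    """
    valid, stale = [], []
    for rule in rules:
        accuracy = rule_accuracy(rule, new_rows, target)
        if accuracy is not None and accuracy >= min_accuracy:
            valid.append(rule)
        else:
            stale.append(rule)
    return valid, stale
```

Rules that end up in the stale list are the candidates for re-derivation once the warehouse population has shifted.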
Visual Data Mining
• Visual data mining can be related to all three data mining sub-modules.
• It involves the invention of visual representations that enhance the flow of information and knowledge throughout each module of the data mining process.
VDM on the Data Preparation Stage
• Enhances or carries out tasks of the pre-processing module in a visual manner.
• Supports visual manipulation of data and handles problems such as missing data fields, data transformations, data sampling and pruning.
• Visualization can be used as a summarization tool that gives the human user an overview of data sets.
• This enables the user to formulate accurate hypotheses and objectives.
• It also lets the user carefully select only the relevant and useful data to be sampled and extracted for data pre-processing.
• Thus, visual data mining in the field of data preparation could serve as a channel for the inflow of relevant domain knowledge and human decisions, which could help optimize these otherwise laborious data pre-processing activities [Law01].
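As a small illustration of visualization used as a summarization tool during data preparation, the Python sketch below draws a text bar per field showing its share of missing values. It is only a stand-in for a real graphical display; the function name, the field names and the convention that `None` or an empty string means "missing" are assumptions made for this example.

```python
def missing_value_chart(rows, fields, width=20):
    """Render one text bar per field showing the share of missing values."""
    lines = []
    for field in fields:
        # Treat None and the empty string as missing (an assumption of this sketch).
        missing = sum(1 for row in rows if row.get(field) in (None, ""))
        share = missing / len(rows) if rows else 0.0
        bar = "#" * round(share * width)
        lines.append(f"{field:<10} |{bar:<{width}}| {share:.0%}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        {"age": 30, "city": "Chania"},
        {"age": None, "city": ""},
        {"age": None, "city": "Heraklion"},
        {"age": 41, "city": "Rethymno"},
    ]
    print(missing_value_chart(sample, ["age", "city"]))
```

An overview like this lets the analyst decide at a glance which fields need imputation or should be pruned before sampling.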
VDM on the Model Derivation Stage
• Uses visualization techniques to enhance the constituent steps of this module:
– visual evaluation, monitoring and guidance of the model derivation module.
• Evaluation – validation of training samples, test samples, and learned models against the data in the database, plus the appropriateness of data and learning algorithms for specific data mining situations.
• Monitoring – tracking the progress of the data mining algorithms, evaluating the continued relevance of learned patterns in the context of database updates, etc.
• Guidance – user-initiated biasing or altering of inputs, learned patterns and other system decisions.
VDM on the Model Derivation Stage
• The goal is to get the insights needed for true understanding and comprehension of the actual model derived.
• These can only be obtained by ensuring that humans can actually visualize models, in order to help understand the “black box” functions learned with neural nets and complex rule-based classifiers [Fay01].
• Visualizing a model should allow a user to discuss and explain the logic behind the model with colleagues, customers, and other users.
• Getting buy-in on the logic or rationale is part of building users’ trust in the results.
• If users can understand what has been discovered in the context of business issues, they will trust it and put it into use.
• Unfortunately, users are often forced to trade off the accuracy of a model for understandability.
VDM on the Model Derivation Stage
• Advanced visualization techniques can greatly expand the range of models that can be understood by domain experts.
• Three components are essential for understanding a model: representation, interaction and integration.
• Representation refers to the visual form in which the model appears. A good representation displays the model in terms of visual components that are already familiar to the user.
• Interaction refers to the ability to see the model in action in real time, letting the user play with the model as if it were a machine.
• Integration refers to the ability to display relationships between the model and alternate views of the data on which it is based.
• Integration provides the user with context [The01].
VDM on the Validation Stage
• Assists the knowledge engineer in acquiring enhanced knowledge through the visualization of outcomes produced by data mining processes.
• The results of data mining algorithms, representing associations, relevancies and classifications, are in a form that is difficult for humans to understand.
• In that context, visual data mining on the validation stage could be defined as the graphical presentation of data, whether the data is base data, summary data, or mined outcomes extracted from data.
• This is a type of visual data analysis, where the analytic component is offloaded to human perception [Kei95a].