An Approach For Post Mining of Combined Patterns
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com Volume 3, Issue 1, January 2014 ISSN 2319 - 4847
1 Department of Computer Science, PG Student, Jayavantrao Sawant College of Engineering, Pune, India. 2 Department of Computer Science, Faculty, Jayavantrao Sawant College of Engineering, Pune, India.
ABSTRACT
Over the last 10-15 years, data mining practices have been rapidly adopted in various fields. The input data for these applications is not only varied but also complex, i.e. huge as well as heterogeneous in nature. Many data mining applications are restricted to mining only a specific type of data. Moreover, these applications use a single data mining technique to discover knowledge. Combined mining is a novel approach to mining complex data. It embraces three different frameworks, viz. multi-source combined mining, multi-method combined mining, and multi-feature combined mining. The outcome of the combined mining process is termed combined patterns, which are more informative than simple association rules. Domain knowledge plays a vital role in data mining applications. Of the various ways of representing domain knowledge, ontology is an effective one. The combined patterns can be converted into more practicable information if they are processed with the help of domain knowledge. The purpose of this work is to study the impact of incorporating domain knowledge in the post mining process of combined patterns. We will therefore first formalize the user's knowledge and goals using rule schemas, and then filter the extracted combined patterns using various operators.
Keywords: Combined Mining, Multi Source Combined Mining, Multi Method Combined Mining, Multi Feature Combined Mining, Ontology, Rule schema, Operators.
1. INTRODUCTION
Data mining is now widely used in many areas such as public services, telecom, the share market, shopping malls, health care and many more. Nowadays, developers are also concerned about the various types of data sources involved in applications. These data sources contain heterogeneous data, for example transactional data, XML documents, text files, etc. The transactional data sources may also hold multiple features. To handle such multi-featured and heterogeneous data sources, the mining process needs to be generalized, and the combined mining approach does this. The following scenarios illustrate the need for novel approaches to mining. First, the transactional data sources involved in data mining applications may have multiple features. Combined mining selects the more important features from all sources and incorporates them into the resultant patterns. Such patterns are known as combined patterns, as they contain multiple features from heterogeneous data sources. Second, the data to be mined may be distributed, or its volume may be so large that it is impossible to scan the whole data. Combined mining scans each data source separately and then combines the generated patterns. Third, there are many methods of data mining, for example association rule mining, classification, clustering, summarization, prediction, etc., but the outcome of a single method is often not useful from the required perspective. Combined mining makes use of multiple methods to generate patterns that reveal the real meaning of the data by combining the advantages of those methods. These scenarios form the basis for the three frameworks of combined mining: multi-feature mining, multi-source mining and multi-method mining [1]. The patterns extracted by using one or more of these frameworks on an ad hoc basis can be made more informative only if they are post-processed against the domain knowledge.
To make combined patterns more practicable, we implement schemas, rules and operators along with an ontology representation of domain knowledge. The remaining part of this paper is organized as follows. Section 3 briefly reviews multi-source combined mining. Sections 4 and 5 elaborate the multi-feature and multi-method combined mining approaches. A preamble to
2. LITERATURE SURVEY
The combined mining concept introduced in [1] calls for the use of more than one mining technique at the same time. Integrated use of classification and association rule mining was also carried out in [6], which introduced the concept of class association rules (CARs). The integration is done by mining a special subset of association rules, i.e. CARs. The classifier built this way is reported to be more accurate than that produced by a simple classification system. Many efforts have been made to work with complex, huge, and heterogeneous data. Association rules are generally extracted from a single transactional data set. In [2], a novel approach for extracting combined association rules was proposed. Combined association rules were organized as rule sets, each of which is composed of a number of single combined association rules. To achieve this, association rule mining was done in two steps: 1) rule generation and 2) definition of new interestingness measures. In rule generation, the frequent item-sets were discovered among item-set groups to improve efficiency. Then new interestingness measures were defined to discover more actionable knowledge. This project uses the concepts of combined association rules, combined rule pairs, and combined rule clusters, which are described in [3]. These concepts were proposed to mine actionable patterns from data. Combined pattern mining is also extended to complex data, i.e. multiple, heterogeneous data sets, in [1]. Impact-targeted activity data is widely seen in many areas such as insurance, customer debt, social security, and counter-terrorism. Some techniques for recognizing impact-targeted activities in imbalanced data were developed in [4]. The imbalanced data structure differentiates impact-targeted activity data from traditional data. Impact-targeted activities may lead to a specific impact of interest to the application domain.
That work introduced approaches to identify three types of frequent activity patterns: 1) impact-oriented activity patterns, 2) impact-contrasted activity patterns, and 3) sequential impact-reversed activity patterns. Extracted patterns are often useful to the miner but may not be of interest to the business. To satisfy the needs of a particular business application, these patterns need to be evaluated with various interestingness metrics so that they reveal their importance and relevance from the required perspective. An effective DDID-PD framework was developed in [5] for improving the effectiveness of patterns. Such patterns are known as actionable patterns. Interestingness metrics need to be developed by taking into account many aspects such as technical performance, business performance, domain knowledge, end user experience, and organizational and social factors. Evaluation of interestingness measures may lead to the design of an expert system, as one way of developing them is to learn from past experience. Association rule mining applications often yield a huge number of patterns, and using only the important ones is usually very difficult. Hence, post-processing has an essential role in the mining process. In [7], four post-processing tasks were described: pruning, summarizing, grouping and visualization. But these post-processing tasks focused only on the statistics of the patterns. Ontology can play various roles in an organization, viz. communication among people and organizations, inter-operability among systems, and benefits for system engineering. Ontology also assists in the process of building and maintaining knowledge-based software systems. Ontologies were introduced in [8], and since then they have been widely used, even in data mining applications, for constructing the required knowledge base. In [9], the authors presented a new solution for local association rule mining by integrating user beliefs and expectations.
Here the rule schema formalism helps the user to focus the search for interesting rules. The local mining algorithm does not extract all rules and then post-process them; instead, it searches for interesting rules in the vicinity of what the user expects or believes. A novel approach was applied in [10] for post mining of a huge set of association rules by integrating domain expert knowledge in the post-processing step. Here the quality of the filtered rules was validated by the domain expert at various points in the interactive process.
[Figure: a data set with Attribute Set 1, Attribute Set 2, Attribute Set 3 and Attribute Set 4.]
[Figure: Data set 1, Data set 2, Data set 3, ..., Data set n. Identify the next data sets for mining depending on the results of the initial mining. Merge atomic patterns into a combined pattern set.]
[Figure: Mined Pattern 1, Mined Pattern 2, Mined Pattern 3, ..., Mined Pattern n. Generate incremental pair patterns (a pattern which is an extension of another). Generate cluster patterns by grouping various pair patterns.]
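The two steps named in the figure, generating incremental pair patterns and grouping pair patterns into cluster patterns, can be illustrated with a minimal Python sketch. This is only one possible reading of those labels, not the authors' implementation: a pattern is assumed to be a pair (item set, outcome), a pattern is assumed to extend another when its item set is a strict superset, and the function names `incremental_pairs` and `cluster_pairs`, the items and the outcome labels are all hypothetical.

```python
from collections import defaultdict


def incremental_pairs(patterns):
    """Pair each pattern with every pattern that strictly extends its item set."""
    pairs = []
    for base_items, base_outcome in patterns:
        for ext_items, ext_outcome in patterns:
            if base_items < ext_items:  # proper subset: ext extends base
                pairs.append(((base_items, base_outcome), (ext_items, ext_outcome)))
    return pairs


def cluster_pairs(pairs):
    """Group pair patterns that share the same base pattern into one cluster."""
    clusters = defaultdict(list)
    for base, extension in pairs:
        clusters[base].append(extension)
    return dict(clusters)


# Hypothetical mined patterns: (items, outcome label).
patterns = [
    (frozenset({"Jeans"}), "low spend"),
    (frozenset({"Jeans", "T-Shirt"}), "high spend"),
    (frozenset({"Jeans", "Cargo"}), "low spend"),
    (frozenset({"Formal Shirts"}), "high spend"),
]

pairs = incremental_pairs(patterns)
clusters = cluster_pairs(pairs)
for base, extensions in clusters.items():
    print(sorted(base[0]), "->", base[1], "is extended by", len(extensions), "patterns")
```

In this toy input only the pattern on Jeans has extensions, so a single cluster with two members is produced. A real system would also compare interestingness measures between the base and its extensions, which is left out here.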
[Figure: example ontology with the concepts Shirts, Trousers, Formal Shirts, Casual Shirts, Full Sleeves, Half Sleeves, T-Shirt, Formal Trousers, Casual Trousers, Pleated, Flat Front, Jeans, Cargo.]
7. PROPOSED METHODOLOGY
The combined mining methodology provides three diverse frameworks, viz. multi-source combined mining, multi-feature combined mining and multi-method combined mining. The proposed work will focus on the multi-source and multi-feature combined mining frameworks. These frameworks can be applied to heterogeneous, complex data sets on an ad hoc basis. This process will discover combined patterns from the input data sets.
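The process described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions rather than the proposed implementation: the two data sets, the attribute names, the `min_support` threshold and the helper functions `mine_atomic_patterns` and `merge_patterns` are all hypothetical. Each data set is mined separately for frequent attribute-value combinations (atomic patterns) over the selected attributes, and atomic patterns from different data sets that hold for enough of the same records are merged into combined patterns.

```python
from itertools import combinations


def mine_atomic_patterns(records, attributes, min_support):
    """Return frequent attribute=value item sets found in one data set.

    records: dict mapping a record id to a dict of attribute values.
    attributes: attributes selected for this data set (attribute selection).
    Result: dict mapping each frequent item set to the ids that contain it.
    """
    n = len(records)
    cover = {}
    for rid, rec in records.items():
        items = sorted((a, rec[a]) for a in attributes if a in rec)
        for size in (1, 2):  # item sets of one or two attribute-value pairs
            for combo in combinations(items, size):
                cover.setdefault(frozenset(combo), set()).add(rid)
    return {p: ids for p, ids in cover.items() if len(ids) / n >= min_support}


def merge_patterns(patterns_a, patterns_b, n_records, min_support):
    """Merge atomic patterns of two data sets into combined patterns.

    A combined pattern is kept when the records shared by both atomic
    patterns still reach the minimum support.
    """
    combined = {}
    for pa, ids_a in patterns_a.items():
        for pb, ids_b in patterns_b.items():
            support = len(ids_a & ids_b) / n_records
            if support >= min_support:
                combined[pa | pb] = support
    return combined


# Two hypothetical data sets that share customer ids 1 to 4.
demographics = {
    1: {"age": "young", "city": "Pune"},
    2: {"age": "young", "city": "Pune"},
    3: {"age": "senior", "city": "Mumbai"},
    4: {"age": "young", "city": "Mumbai"},
}
purchases = {
    1: {"item": "Jeans", "payment": "card"},
    2: {"item": "Jeans", "payment": "cash"},
    3: {"item": "Formal Shirts", "payment": "card"},
    4: {"item": "Jeans", "payment": "card"},
}

atomic_demo = mine_atomic_patterns(demographics, ["age"], 0.5)
atomic_purch = mine_atomic_patterns(purchases, ["item"], 0.5)
combined = merge_patterns(atomic_demo, atomic_purch, 4, 0.5)
for pattern, support in combined.items():
    print(sorted(pattern), support)
```

The sketch never joins the two data sets: each source is scanned on its own and only the record ids covered by each atomic pattern are intersected at merge time, which mirrors the idea of mining sources separately and combining the generated patterns.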
[Figure 4: Process of combined mining. Data set 1, Data set 2, ..., Data set n; attribute selection; atomic patterns; merging.]
The combined patterns are more informative than simple association rules. In practice, however, they may often not be constructive for specific circumstances. The usefulness of combined patterns for a specific
[Figure 5: Post processing of combined patterns using ontology. Labels: Ontology, Rule Schema, Operators, Post processing, Knowledge specific Combined Patterns.]
[Figure 6: Role of user in mining, domain knowledge building and post mining. Labels: Data set 1, Data set 2, ..., Data set n, User, Combined Patterns, Post processing.]
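The post-processing step shown in the figures above (ontology, rule schema, operators) can be sketched in Python as follows. This is a hedged illustration, not the proposed system: the parent-child arrangement of the ontology, the root concept `Clothing`, the example patterns, and the two operators named `conforming` and `pruning` are assumptions made for the example. A rule schema is taken to be a pair of concept sets (expected antecedent, expected consequent), and a pattern matches a schema when each schema concept is an item of the pattern or an ancestor of one.

```python
# Hypothetical ontology: each concept maps to its parent (None for the root).
PARENT = {
    "Clothing": None,
    "Shirts": "Clothing",
    "Trousers": "Clothing",
    "Formal Shirts": "Shirts",
    "Casual Shirts": "Shirts",
    "T-Shirt": "Casual Shirts",
    "Formal Trousers": "Trousers",
    "Casual Trousers": "Trousers",
    "Jeans": "Casual Trousers",
    "Cargo": "Casual Trousers",
}


def generalizations(concept):
    """Return the concept together with all of its ancestors in the ontology."""
    result = set()
    while concept is not None:
        result.add(concept)
        concept = PARENT.get(concept)
    return result


def matches(items, concepts):
    """True if every schema concept is an item or an ancestor of some item."""
    covered = set()
    for item in items:
        covered |= generalizations(item)
    return concepts <= covered


def conforming(patterns, schema):
    """Keep the patterns that agree with the user's rule schema."""
    left, right = schema
    return [p for p in patterns if matches(p[0], left) and matches(p[1], right)]


def pruning(patterns, schema):
    """Drop the patterns that agree with the schema and keep the rest."""
    known = conforming(patterns, schema)
    return [p for p in patterns if p not in known]


# Hypothetical combined patterns: (antecedent items, consequent items).
patterns = [
    (frozenset({"Jeans"}), frozenset({"T-Shirt"})),
    (frozenset({"Formal Shirts"}), frozenset({"Formal Trousers"})),
    (frozenset({"Cargo"}), frozenset({"Formal Shirts"})),
]
# The user expects casual trousers to be bought together with shirts.
schema = (frozenset({"Casual Trousers"}), frozenset({"Shirts"}))

print(conforming(patterns, schema))
print(pruning(patterns, schema))
```

Because matching climbs the ontology, the user can state an expectation at a general level (Casual Trousers, Shirts) and still select patterns that mention only specific items such as Jeans or Cargo.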
8. CONCLUSION
This paper reviews the process of combined mining and introduces the idea of incorporating domain knowledge for post-processing of combined patterns. The multi-source combined mining approach handles huge and heterogeneous input data sets. The multi-feature combined mining framework helps to extract important features from a range of inputs.
REFERENCES
[1] L. Cao, Y. Zhao, and C. Zhang, "Combined Mining: Discovering Informative Knowledge in Complex Data," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 41, no. 3, June 2011, p. 699.
[2] H. Zhang, Y. Zhao, L. Cao, and C. Zhang, "Combined association rule mining," in Proc. PAKDD, 2008, pp. 1069-1074.
[3] Y. Zhao, H. Zhang, L. Cao, C. Zhang, and H. Bohlscheid, "Combined pattern mining: From learned rules to actionable knowledge," in Proc. AI, 2008, pp. 393-403.
[4] L. Cao, Y. Zhao, and C. Zhang, "Mining impact-targeted activity patterns in imbalanced data," IEEE Trans. Knowl. Data Eng., vol. 20, no. 8, pp. 1053-1066, Aug. 2008.
[5] L. Cao and C. Zhang, "Domain-Driven Data Mining: A Practical Methodology," International Journal of Data Warehousing and Mining, vol. 2, no. 4, pp. 49-65, 2006.
[6] B. Liu, W. Hsu, and Y. Ma, "Integrating classification and association rule mining," in Proc. 4th Int. Conf. Knowl. Discov. Data Mining (KDD), 1998, pp. 80-86.
[7] B. Baesens, S. Viaene, and J. Vanthienen, "Post-Processing of Association Rules," Proc. Workshop Post-Processing in Machine Learning and Data Mining: Interpretation, Visualization, Integration, and Related Topics with Sixth ACM SIGKDD, pp. 20-23, 2000.
[8] M. Uschold and M. Gruninger, "Ontologies: Principles, Methods, and Applications," Knowledge Eng. Rev., vol. 11, pp. 93-155, 1996.
[9] A. Olaru, C. Marinica, and F. Guillet, "Local Mining of Association Rules with Rule Schemas," LINA, Ecole Polytechnique de l'Universite de Nantes.
[10] C. Marinica and F. Guillet, "Knowledge-Based Interactive Post-mining of association rules using ontologies," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 6, June 2010.