Download Data Mining: Concepts And Techniques (2nd Edition)

9/5/2019


Data Mining: Concepts and Techniques (2nd Edition) by Jiawei Han and Micheline Kamber is an updated and improved version of the first edition. Han and Kamber have added coverage of new and important topics, such as mining stream data, mining social networks, and mining spatial, multimedia, and other complex data. "The text is a good textbook for courses on Data Mining and Knowledge Discovery," according to Gregory Piatetsky-Shapiro. The 2nd edition is presented as the most up-to-date treatment of the topic.

Data Mining: Concepts and Techniques (2nd edition), Jiawei Han and Micheline Kamber, Morgan Kaufmann Publishers, 2006. Bibliographic Notes for Chapter 5: Mining Frequent Patterns, Associations, and Correlations.

The new release improves on the first edition, and new chapters have been added to address recent developments in mining complex data types, including stream data, sequence data, graph-structured data, social network data, and multirelational data.

Association rule mining was first proposed by Agrawal, Imielinski, and Swami AIS93.

Building on the thorough coverage of the first edition, Han and Kamber have added state-of-the-art research results on new topics. This book is a 'must have' for all instructors, researchers, developers, and users in the area of data mining and knowledge discovery.

A variation of the Apriori algorithm using a similar pruning heuristic was developed independently by Mannila, Toivonen, and Verkamo MTV94. A joint publication combining these works later appeared in Agrawal, Mannila, Srikant, Toivonen, and Verkamo AMS+ 96. A method for generating association rules from frequent itemsets is described in Agrawal and Srikant AS94a. References for the variations of Apriori described in Section 5.2.3 include the following. The use of hash tables to improve association mining efficiency was studied by Park, Chen, and Yu PCY95a. Transaction reduction techniques are described in Agrawal and Srikant AS94b, Han and Fu HF95, and Park, Chen, and Yu PCY95a. The partitioning technique was proposed by Savasere, Omiecinski, and Navathe SON95.
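As a rough illustration of the pruning heuristic these notes refer to, the following is a minimal Python sketch of level-wise frequent itemset mining in the Apriori style. It is not taken from any of the cited papers: the function name `apriori`, the toy transactions, and the absolute support threshold are all assumptions made for this example. The key idea shown is that a size-k candidate is kept only if every one of its (k-1)-subsets is already known to be frequent.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets.

    Hypothetical helper for illustration; min_support is an absolute count.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates with any infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count the surviving candidates with one pass over the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Toy data, invented for this sketch.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
found = apriori(transactions, min_support=3)
for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On this toy data the prune step discards candidates such as {bread, diapers, beer} without counting them, because {bread, beer} is not frequent. The hashing, partitioning, and sampling variations cited above change how candidates are counted, not this basic property.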

The sampling approach is discussed in Toivonen Toi96. A dynamic itemset counting approach is given in Brin, Motwani, Ullman, and Tsur BMUT97.

An efficient incremental updating of mined association rules was proposed by Cheung, Han, Ng, and Wong CHNW96. Parallel and distributed association data mining under the Apriori framework was studied by Park, Chen, and Yu PCY95b, Agrawal and Shafer AS96, and Cheung, Han, Ng, et al. Another parallel association mining method, which explores itemset clustering using a vertical database layout, was proposed in Zaki, Parthasarathy, Ogihara, and Li ZPOL97. Other scalable frequent itemset mining methods have been proposed as alternatives to the Apriori-based approach.

FP-growth, a pattern-growth approach for mining frequent itemsets without candidate generation, was proposed by Han, Pei, and Yin HPY00 (Section 5.2.4). An exploration of hyper-structure mining of frequent patterns, called H-Mine, was proposed by Pei, Han, Lu, Nishio, Tang, and Yang PHMA+ 01. OP, a method that integrates top-down and bottom-up traversal of FP-trees in pattern-growth mining, was proposed by Liu, Pan, Wang, and Han LPWH02. An array-based implementation of prefix-tree-structure for efficient pattern growth mining was proposed by Grahne and Zhu GZ03b. ECLAT, an approach for mining frequent itemsets by exploring the vertical data format, was proposed by Zaki Zak00.

A depth-first generation of frequent itemsets was proposed by Agarwal, Aggarwal, and Prasad AAP01. The mining of frequent closed itemsets was proposed in Pasquier, Bastide, Taouil, and Lakhal PBTL99, where an Apriori-based algorithm called A-Close for such mining was presented. CLOSET, an efficient closed itemset mining algorithm based on the frequent-pattern growth method, was proposed by Pei, Han, and Mao PHM00, and further refined as CLOSET+ in Wang, Han, and Pei WHP03.

FPClose, a prefix-tree-based algorithm for mining closed itemsets using a pattern-growth approach, was proposed by Grahne and Zhu GZ03b. An extension for mining closed frequent itemsets with the vertical data format, called CHARM, was proposed by Zaki and Hsiao ZH02. Mining max-patterns was first studied by Bayardo Bay98. Another efficient method for mining maximal frequent itemsets using vertical data format, called MAFIA, was proposed by Burdick, Calimlim, and Gehrke BCG01.

AFOPT, a method that explores a right push operation on FP-trees during the mining process, was proposed by Liu, Lu, Lou, and Yu LLLY03. Pan, Cong, Tung, et al. PCT+ 03 proposed CARPENTER, a method for finding closed patterns in long biological datasets, which integrates the advantages of vertical data formats and pattern-growth methods. A FIMI (Frequent Itemset Mining Implementation) workshop dedicated to the implementation methods of frequent itemset mining was reported by Goethals and Zaki GZ03a.

Frequent itemset mining has various extensions, including sequential pattern mining (Agrawal and Srikant AS95), episodes mining (Mannila, Toivonen, and Verkamo MTV97), spatial association rule mining (Koperski and Han KH95), cyclic association rule mining (Özden, Ramaswamy, and Silberschatz ORS98), negative association rule mining (Savasere, Omiecinski, and Navathe SON98), intertransaction association rule mining (Lu, Han, and Feng LHF98), and calendric market basket analysis (Ramaswamy, Mahajan, and Silberschatz RMS98). Multilevel association mining was studied in Han and Fu HF95, and Srikant and Agrawal SA95. In Srikant and Agrawal SA95, such mining was studied in the context of generalized association rules, and an R-interest measure was proposed for removing redundant rules. A non-grid-based technique for mining quantitative association rules, which uses a measure of partial completeness, was proposed by Srikant and Agrawal SA96.

The ARCS system for mining quantitative association rules based on rule clustering was proposed by Lent, Swami, and Widom LSW97. Techniques for mining quantitative rules based on x-monotone and rectilinear regions were presented by Fukuda, Morimoto, Morishita, and Tokuyama FMMT96, and Yoda, Fukuda, Morimoto, et al. Mining multidimensional association rules using static discretization of quantitative attributes and data cubes was studied by Kamber, Han, and Chiang KHC97. Mining (distance-based) association rules over interval data was proposed by Miller and Yang MY97. Mining quantitative association rules based on a statistical theory to present only those that deviate substantially from normal data was studied by Aumann and Lindell AL99. The problem of mining interesting rules has been studied by many researchers.

The statistical independence of rules in data mining was studied by Piatetsky-Shapiro PS91. The interestingness problem of strong association rules is discussed in Chen, Han, and Yu CHY96, Brin, Motwani, and Silverstein BMS97, and Aggarwal and Yu AY99, which cover several interestingness measures including lift. An efficient method for generalizing associations to correlations is given in Brin, Motwani, and Silverstein BMS97. Other alternatives to the support-confidence framework for assessing the interestingness of association rules are proposed in Brin, Motwani, Ullman, and Tsur BMUT97 and Ahmed, El-Makky, and Taha AEMT00. A method for mining strong gradient relationships among itemsets was proposed by Imielinski, Khachiyan, and Abdulghani IKA02. Silverstein, Brin, Motwani, and Ullman SBMU98 studied the problem of mining causal structures over transaction databases.
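To make the lift measure mentioned above concrete, here is a small Python sketch using the standard definitions of support, confidence, and lift for a rule A => B. The helper names and the transaction counts are assumptions chosen for illustration; they are not drawn from the cited studies.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence divided by the consequent's support.

    A value of 1 means the two sides look independent, below 1 suggests
    negative correlation, above 1 suggests positive correlation.
    """
    return confidence(transactions, antecedent, consequent) / support(
        transactions, consequent
    )


# Illustrative counts (invented): 100 transactions in total.
transactions = (
    [frozenset({"game", "video"})] * 40
    + [frozenset({"game"})] * 20
    + [frozenset({"video"})] * 35
    + [frozenset()] * 5
)

rule_confidence = confidence(transactions, {"game"}, {"video"})  # about 0.667
rule_lift = lift(transactions, {"game"}, {"video"})              # about 0.889
print(round(rule_confidence, 3), round(rule_lift, 3))
```

In this toy data the rule game => video has fairly high confidence, yet its lift is below 1 because video alone appears in 75% of transactions. This is the kind of misleading "strong" rule that motivates looking beyond the support-confidence framework.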

Some comparative studies of different interestingness measures were done by Hilderman and Hamilton HH01 and by Tan, Kumar and Srivastava TKS02. The use of all confidence as a correlation measure for generating interesting association rules was studied by Omiecinski Omi03 and by Lee, Kim, Cai and Han LKCH03. To reduce the huge set of frequent patterns generated in data mining, recent studies have been working on mining compressed sets of frequent patterns. Mining closed patterns can be viewed as lossless compression of frequent patterns.
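The claim that closed patterns are a lossless compression of frequent patterns can be checked on a toy example. The sketch below is an assumption-laden illustration, not one of the cited algorithms: it enumerates frequent itemsets by brute force (practical only for tiny data), keeps those with no proper superset of equal support, and then recovers every frequent itemset's support from the closed ones alone. All function names and the sample data are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration; suitable for toy data only."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_support:
                result[itemset] = count
    return result


def closed_itemsets(frequent):
    """Keep itemsets that have no proper superset with the same support."""
    return {
        s: c
        for s, c in frequent.items()
        if not any(s < other and c == oc for other, oc in frequent.items())
    }


def support_from_closed(itemset, closed):
    """Support of a frequent itemset is the largest support among its closed supersets."""
    return max(c for s, c in closed.items() if itemset <= s)


# Toy data, invented for this sketch.
transactions = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ad"),
]
frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(frequent)

# Every frequent itemset's support is recoverable from the closed set.
recovered = {s: support_from_closed(s, closed) for s in frequent}
print(len(frequent), len(closed), recovered == frequent)
```

Here seven frequent itemsets shrink to three closed ones with no loss of support information. Maximal patterns would shrink the set further but would lose the supports of the subsets, which is why the text treats them as lossy.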

Lossy compression of patterns includes maximal patterns by Bayardo Bay98, top-k patterns by Wang, Han, Lu, and Tsvetkov WHLT05, and error-tolerant patterns by Yang, Fayyad, and Bradley YFB01. Afrati, Gionis, and Mannila AGM04 proposed to use K itemsets to cover a collection of frequent itemsets. Yan, Cheng, Xin, and Han proposed a profile-based approach YCXH05, and Xin, Han, Yan, and Cheng proposed a clustering-based approach XHYC05 for frequent itemset compression. The use of metarules as syntactic or semantic filters defining the form of interesting single-dimensional association rules was proposed in Klemettinen, Mannila, Ronkainen, et al. Metarule-guided mining, where the metarule consequent specifies an action (such as Bayesian clustering or plotting) to be applied to the data satisfying the metarule antecedent, was proposed in Shen, Ong, Mitbander, and Zaniolo SOMZ96. A relation-based approach to metarule-guided mining of association rules was studied in Fu and Han FH95.

Methods for constraint-based association rule mining discussed in this chapter were studied by Ng, Lakshmanan, Han, and Pang NLHP98, Lakshmanan, Ng, Han, and Pang LNHP99, and Pei, Han, and Lakshmanan PHL01. An efficient method for mining constrained correlated sets was given in Grahne, Lakshmanan, and Wang GLW00. A dual mining approach was proposed by Bucila, Gehrke, Kifer, and White BGKW03. Other ideas involving the use of templates or predicate constraints in mining have been discussed in AK93, DT93, HK91, LHC97, ST96, and SVA97. The association mining language presented in this chapter was based on an extension of the data mining query language, DMQL, proposed in Han, Fu, Wang, et al. HFW+ 96, by incorporation of the spirit of the SQL-like operator for mining single-dimensional association rules proposed by Meo, Psaila, and Ceri MPC96. MSQL, a query language for mining flexible association rules, was proposed by Imielinski and Virmani IV99.

OLE DB for Data Mining (DM), a data mining query language that includes association mining modules, was proposed by Microsoft Corporation Cor00.

Bibliography

AAP01 R. Aggarwal, and V. A tree projection algorithm for generation of frequent itemsets. Parallel and Distributed Computing, 61:350–371, 2001.

AEMT00 K. El-Makky, and Y. A note on “beyond market basket: Generalizing association rules to correlations”. SIGKDD Explorations, 1:46–48, 2000.

AGM04 F. Gionis, and H. Approximating a collection of frequent sets. 2004 ACM SIGKDD Int. Knowledge Discovery in Databases (KDD’04), pages 12–19, Seattle, WA, Aug.

Imielinski, and A. Mining association rules between sets of items in large databases. 1993 ACM-SIGMOD Int. Management of Data (SIGMOD’93), pages 207–216, Washington, DC, May 1993.

AK93 T. Opportunity explorer: Navigating large databases using knowledge discovery templates. AAAI-93 Workshop Knowledge Discovery in Databases, pages 45–51, Washington, DC, July 1993.

AL99 Y. Aumann and Y. A statistical theory for quantitative association rules. Knowledge Discovery and Data Mining (KDD’99), pages 261–270, San Diego, CA, Aug. 1999.

AMS+ 96 R. Arning, and T. The Quest data mining system. Data Mining and Knowledge Discovery (KDD’96), pages 244–249, Portland, OR, Aug.

Agrawal and R. Fast algorithm for mining association rules in large databases. In Research Report RJ 9839, IBM Almaden Research Center, San Jose, CA, June 1994.

AS94b R. Agrawal and R. Fast algorithms for mining association rules. Very Large Data Bases (VLDB’94), pages 487–499, Santiago, Chile, Sept.

Agrawal and R. Mining sequential patterns. Data Engineering (ICDE’95), pages 3–14, Taipei, Taiwan, Mar.

Agrawal and J. Parallel mining of association rules: Design, implementation, and experience. Knowledge and Data Engineering, 8:962–969, 1996.

AY99 C. Aggarwal and P. A new framework for itemset generation. 1998 ACM Symp. Principles of Database Systems (PODS’98), pages 18–24, Seattle, WA, June 1999.

Bay98 R. Efficiently mining long patterns from databases. 1998 ACM-SIGMOD Int. Management of Data (SIGMOD’98), pages 85–93, Seattle, WA, June 1998.

BCG01 D. Calimlim, and J. MAFIA: A maximal frequent itemset algorithm for transactional databases. Data Engineering (ICDE’01), pages 443–452, Heidelberg, Germany, April 2001.

BGKW03 C. Kifer, and W. DualMiner: A dual-pruning algorithm for itemsets with constraints. Data Mining and Knowledge Discovery, 7:241–272, 2003.

Motwani, and C. Beyond market basket: Generalizing association rules to correlations. 1997 ACM-SIGMOD Int. Management of Data (SIGMOD’97), pages 265–276, Tucson, AZ, May 1997.

BMUT97 S. Ullman, and S. Dynamic itemset counting and implication rules for market basket analysis. 1997 ACM-SIGMOD Int. Management of Data (SIGMOD’97), pages 255–264, Tucson, AZ, May 1997.

CHN+ 96 D. A fast distributed algorithm for mining association rules. Parallel and Distributed Information Systems, pages 31–44, Miami Beach, FL, Dec.

Maintenance of discovered association rules in large databases: An incremental updating technique. Data Engineering (ICDE’96), pages 106–114, New Orleans, LA, Feb.

Data mining: An overview from a database perspective. Knowledge and Data Engineering, 8:866–883, 1996.

Cor00 Microsoft Corporation. OLEDB for Data Mining draft specification, www.microsoft.com/data/oledb/dm, Feb.

Abstract-driven pattern discovery in databases. Knowledge and Data Engineering, 5:926–938, 1993.

FH95 Y. Meta-rule-guided mining of association rules in relational databases. Workshop Integration of Knowledge Discovery with Deductive and Object-Oriented Databases (KDOOD’95), pages 39–46, Singapore, Dec.

Morishita, and T. Data mining using two-dimensional optimized association rules: Scheme, algorithms, and visualization. 1996 ACM-SIGMOD Int. Management of Data (SIGMOD’96), pages 13–23, Montreal, Canada, June 1996.

GZ03a B. Goethals and M. An introduction to workshop on frequent itemset mining implementations. Workshop on Frequent Itemset Mining Implementations (FIMI’03), Melbourne, FL, Nov.

Grahne and J. Efficiently using prefix-trees in mining frequent itemsets. Workshop on Frequent Itemset Mining Implementations (FIMI’03), Melbourne, FL, Nov.

Discovery of multiple-level association rules from large databases. Very Large Data Bases (VLDB’95), pages 420–431, Zurich, Switzerland, Sept. 1995.

HFW+ 96 J. Stefanovic, B. DBMiner: A system for mining knowledge in large relational databases. Data Mining and Knowledge Discovery (KDD’96), pages 250–255, Portland, OR, Aug.

Hilderman and H. Knowledge Discovery and Measures of Interest. Kluwer Academic, 2001.

HK91 P. Hoschka and W. A support system for interpreting statistical data. Piatetsky-Shapiro and W. Frawley, editors, Knowledge Discovery in Databases, pages 325–346. AAAI/MIT Press, 1991.

HPY00 J. Mining frequent patterns without candidate generation. 2000 ACM-SIGMOD Int. Management of Data (SIGMOD’00), pages 1–12, Dallas, TX, May 2000.

IKA02 T. Imielinski, L. Khachiyan, and A. Cubegrades: Generalizing association rules. Data Mining and Knowledge Discovery, 6:219–258, 2002.

IV99 T. Imielinski and A. MSQL: A query language for database mining. Data Mining and Knowledge Discovery, 3:373–408, 1999.

KH95 K. Koperski and J. Discovery of spatial association rules in geographic information databases. Large Spatial Databases (SSD’95), pages 47–66, Portland, ME, Aug.

Metarule-guided mining of multi-dimensional association rules using data cubes. Knowledge Discovery and Data Mining (KDD’97), pages 207–210, Newport Beach, CA, Aug. 1997.

KMR+ 94 M. Klemettinen, H. Ronkainen, H. Toivonen, and A. Finding interesting rules from large sets of discovered association rules. Information and Knowledge Management, pages 401–408, Gaithersburg, MD, Nov.

Using general impressions to analyze discovered classification rules. Knowledge Discovery and Data Mining (KDD’97), pages 31–36, Newport Beach, CA, Aug.

Stock movement and n-dimensional inter-transaction association rules. 1998 SIGMOD Workshop Research Issues on Data Mining and Knowledge Discovery (DMKD’98), pages 12:1–12:7, Seattle, WA, June 1998.

LKCH03 Y.-K. CoMine: Efficient mining of correlated patterns. Data Mining (ICDM’03), pages 581–584, Melbourne, FL, Nov.

On computing, storing and querying frequent patterns. 2003 ACM SIGKDD Int. Knowledge Discovery and Data Mining (KDD’03), pages 607–612, Washington, DC, Aug.

Lakshmanan, R. Optimization of constrained frequent set queries with 2-variable constraints. 1999 ACM-SIGMOD Int. Management of Data (SIGMOD’99), pages 157–168, Philadelphia, PA, June 1999.

LPWH02 J. Mining frequent item sets by opportunistic projection. 2002 ACM SIGKDD Int. Knowledge Discovery in Databases (KDD’02), pages 239–248, Edmonton, Canada, July 2002.

LSW97 B. Swami, and J. Clustering association rules. Data Engineering (ICDE’97), pages 220–231, Birmingham, England, April 1997.

MPC96 R. Psaila, and S. A new SQL-like operator for mining association rules. Very Large Data Bases (VLDB’96), pages 122–133, Bombay, India, Sept.

Toivonen, and A. Efficient algorithms for discovering association rules. AAAI’94 Workshop Knowledge Discovery in Databases (KDD’94), pages 181–192, Seattle, WA, July 1994.

MTV97 H. Mannila, H. Toivonen, and A. Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery, 1:259–289, 1997.

MY97 R. Miller and Y. Association rules over interval data. 1997 ACM-SIGMOD Int. Management of Data (SIGMOD’97), pages 452–461, Tucson, AZ, May 1997.

NLHP98 R. Lakshmanan, J. Exploratory mining and pruning optimizations of constrained associations rules. 1998 ACM-SIGMOD Int. Management of Data (SIGMOD’98), pages 13–24, Seattle, WA, June 1998.

Omi03 E. Alternative interest measures for mining associations. Knowledge and Data Engineering, 15:57–69, 2003.

ORS98 B. Ramaswamy, and A. Cyclic association rules. Data Engineering (ICDE’98), pages 412–421, Orlando, FL, Feb.

Taouil, and L. Discovering frequent closed itemsets for association rules. Database Theory (ICDT’99), pages 398–416, Jerusalem, Israel, Jan. 1999.

PCT+ 03 F. CARPENTER: Finding closed patterns in long biological datasets. 2003 ACM SIGKDD Int. Knowledge Discovery and Data Mining (KDD’03), pages 637–642, Washington, DC, Aug.

An effective hash-based algorithm for mining association rules. 1995 ACM-SIGMOD Int. Management of Data (SIGMOD’95), pages 175–186, San Jose, CA, May 1995.

PCY95b J. Efficient parallel mining for association rules. Information and Knowledge Management, pages 31–36, Baltimore, MD, Nov.

Mining frequent itemsets with convertible constraints. Data Engineering (ICDE’01), pages 433–442, Heidelberg, Germany, April 2001.

PHM00 J. CLOSET: An efficient algorithm for mining frequent closed itemsets. 2000 ACM-SIGMOD Int. Workshop Data Mining and Knowledge Discovery (DMKD’00), pages 11–20, Dallas, TX, May 2000.

PHMA+ 01 J. Mortazavi-Asl, H. Dayal, and M.-C. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. Data Engineering (ICDE’01), pages 215–224, Heidelberg, Germany, April 2001.

Notes of AAAI’91 Workshop Knowledge Discovery in Databases (KDD’91). Anaheim, CA, July 1991.

RMS98 S. Ramaswamy, S. Mahajan, and A. On the discovery of interesting patterns in association rules. Very Large Data Bases (VLDB’98), pages 368–379, New York, NY, Aug.

Srikant and R. Mining generalized association rules. Very Large Data Bases (VLDB’95), pages 407–419, Zurich, Switzerland, Sept.

Srikant and R. Mining sequential patterns: Generalizations and performance improvements. Extending Database Technology (EDBT’96), pages 3–17, Avignon, France, Mar.

Mitbander, and C. Metaqueries for data mining. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, pages 375–398. AAAI/MIT Press, 1996.

SON95 A. Omiecinski, and S. An efficient algorithm for mining association rules in large databases. Very Large Data Bases (VLDB’95), pages 432–443, Zurich, Switzerland, Sept.

Omiecinski, and S. Mining for strong negative associations in a large database of customer transactions. Data Engineering (ICDE’98), pages 494–502, Orlando, FL, Feb.

Silberschatz and A. What makes patterns interesting in knowledge discovery systems. On Knowledge and Data Engineering, 8:970–974, Dec.

SVA97 R. Mining association rules with item constraints. Knowledge Discovery and Data Mining (KDD’97), pages 67–73, Newport Beach, CA, Aug.

Kumar, and J. Selecting the right interestingness measure for association patterns. 2002 ACM SIGKDD Int. Knowledge Discovery in Databases (KDD’02), pages 32–41, Edmonton, Canada, July 2002.

Toi96 H. Sampling large databases for association rules. Very Large Data Bases (VLDB’96), pages 134–145, Bombay, India, Sept.

TFP: An efficient algorithm for mining top-k frequent closed itemsets. Knowledge and Data Engineering, 17:652–664, 2005.

WHP03 J. CLOSET+: Searching for the best strategies for mining frequent closed itemsets. 2003 ACM SIGKDD Int. Knowledge Discovery and Data Mining (KDD’03), pages 236–245, Washington, DC, Aug.

Mining compressed frequent-pattern sets. Very Large Data Bases (VLDB’05), pages 709–720, Trondheim, Norway, Aug.

Summarizing itemset patterns: A profile-based approach. Submitted for publication, Feb.

Fayyad, and P. Efficient discovery of error-tolerant frequent itemsets in high dimensions. 2001 ACM SIGKDD Int. Knowledge Discovery in Databases (KDD’01), pages 194–203, San Francisco, CA, Aug. 2001.

YFM+ 97 K. Morishita, and T. Computing optimized rectilinear regions for association rules. Knowledge Discovery and Data Mining (KDD’97), pages 96–103, Newport Beach, CA, Aug.

Scalable algorithms for association mining. Knowledge and Data Engineering, 12:372–390, 2000.

ZH02 M. CHARM: An efficient algorithm for closed itemset mining. 2002 SIAM Int. Data Mining (SDM’02), pages 457–473, Arlington, VA, April 2002.

ZPOL97 M. Parthasarathy, M. Ogihara, and W. Parallel algorithm for discovery of association rules. Data Mining and Knowledge Discovery, 1:343–374, 1997.
