Datasets for Itemset, Sequence and Tree Mining
DOI: 10.5281/zenodo.3785365 | Zenodo: 3785365 | MaRDI QID: Q6708170 | FDO: Q6708170
Dataset published in the Zenodo repository.
Publication date: 4 May 2020
Copyright license: Creative Commons Attribution 4.0 International
Three datasets are included, for use with itemset, sequence and tree mining methods.

dense_db.zip: Various real itemset datasets such as chess, connect, mushroom, pumsb, T10I4D100K, T40I10D100K and so on, used in the papers on frequent, closed and maximal itemset mining. For example: Mohammed J. Zaki and Ching-Jui Hsiao. Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Transactions on Knowledge and Data Engineering, 17(4):462-478, April 2005. doi:10.1109/69.846291. Or: Karam Gouda and Mohammed J. Zaki. GenMax: an efficient algorithm for mining maximal frequent itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223-242, November 2005. doi:10.1007/s10618-005-0002-x.

plandata.zip: Planning dataset for sequence mining. It was used in the paper: Mohammed J. Zaki, Neal Lesh, and Mitsunori Ogihara. PLANMINE: predicting plan failures using sequence mining. Artificial Intelligence Review, 14(6):421-446, December 2000. Special issue on Applications of Data Mining. doi:10.1023/A:1006612804250.

cslogs.zip: The CSLOGS data was used for tree mining, e.g., in: Mohammed J. Zaki. Efficiently mining frequent trees in a forest: algorithms and applications. IEEE Transactions on Knowledge and Data Engineering, 17(8):1021-1035, August 2005. Special issue on Mining Biological Data. doi:10.1109/TKDE.2005.125.
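A minimal sketch of how an itemset file from an archive such as dense_db.zip might be read and scanned for frequent single items. This page does not document the file layout, so the sketch assumes one transaction per line with whitespace-separated item identifiers. The member name toy.dat and the helper names iter_transactions and frequent_items are hypothetical; check the actual archive contents before relying on any of this.

```python
import io
import zipfile
from collections import Counter


def iter_transactions(archive, member):
    """Yield each non-empty line of `member` as a list of item tokens.

    Assumes one transaction per line, items separated by whitespace
    (an assumption; the dataset page does not specify the format).
    """
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member) as raw:
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                items = line.split()
                if items:
                    yield items


def frequent_items(transactions, min_support):
    """Return items that occur in at least `min_support` transactions."""
    counts = Counter()
    for items in transactions:
        counts.update(set(items))  # count each item once per transaction
    return {item: n for item, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    # Build a tiny in-memory archive so the sketch runs without the real data.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("toy.dat", "1 2 3\n1 2\n2 4\n")  # hypothetical member name
    buffer.seek(0)
    print(frequent_items(iter_transactions(buffer, "toy.dat"), min_support=2))
```

To try this on the real data, pass the path of the downloaded archive and one of its member names (listed by zipfile.ZipFile(path).namelist()) in place of the in-memory buffer and toy.dat.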