Key Papers about High Utility Itemset Mining

In this blog post, I will talk about the most important algorithms forĀ high utility itemset mining. I will present a list of algorithms that is of course subjective (according to my opinion). I did not list all the papers of all authors but I have listed the key papers that I have read and found interesting, as they introduced some innovative ideas. This list is partly based on my survey of high utility itemset mining. To help you while reading these papers, you may also check my glossary of high utility itemset mining, which explain the key terms used in that research field.

Author / DatePaper titleKey idea
Yao 2004A Unified Framework for Utility-based Measures for Mining
Itemsets
– This paper defined the problem of high utility itemset mining.
Liu 2005A two-phase algorithm for fast discovery of high
utility itemsets
– The first complete algorithm for high utility itemset mining, named Two Phase.
– Two Phase is an Apriori based algorithm, and it does two phases to find high utility itemsets
– Introduced the TWU upper bound for reducing the search space.
Ahmed 2009Efficient Tree Structures for High-utility
Pattern Mining in Incremental Databases
– The first FP-Growth based algorithm for high utility itemset mining, named IHUP.
– Can discover high utility itemset incrementally.
Wu 2011 / 2013Efficient algorithms for mining high utility
itemsets from transactional databases
– Improved FP-Growth algorithms for high utility itemset mining, called UP-Growth and UP-Growth+.
– These algorithms adopt several strategies to reduce the TWU upper bound.
– The paper was published in KDD and then in TKDE, which gave a high visibility to high utility itemset mining.
Liu 2012Mining high utility itemsets without candidate generation– Presented HUI-Miner, the first algorithm for mining high utility itemset mining in one phase. This has revolutionized this field as all previous algorithms were using two phases. Speeds up of 10 to 100 times can be obtained compared to the previous best algorithms
– Introduced the remaining utility upper bound which is tighter than the TWU.
– HUI-Miner is based on Eclat
Fournier-Viger 2014FHM: Faster high-utility itemset mining
using estimated utility co-occurrence pruning
– Presented a fast vertical algorithm for high utility itemset mining, called FHM that adopts a new technique called co-occurrence pruning. This can further speed up the task by 5 to 10 times.
-FHM is based on HUI-Miner, and was shown to outperform it.
Fournier-Viger 2015 / 2016EFIM: A Highly Efficient
Algorithm for High-Utility Itemset Mining
– Another major performance improvement.
– This paper presented EFIM a novel algorithm that mines high utility itemsets almost in linear time and space.
– Introduced several new ideas for high utility mining like transaction merging and using arrays for utility and upper bound counting.
– The sub-tree utility upper bound can be tighter than the upper bounds of HUI-Miner and FHM.
– This algorihtm is inspired by HUI-Miner and LCM.
Duong 2017Efficient High Utility
Itemset Mining using Buffered Utility-Lists
– Proposed to reduce the memory usage of HUI-Miner and FHM based algorithms using the concept of buffered utility-lists.
– The modified algorithm is called ULB-Miner
Qu 2020Mining High Utility Itemsets Using Extended Chain Structure and Utility Machine– Proposed the REX algorithm, a one phase algorithm, which adopts new strategies such as k-item utility machine and a switch strategy.
Tseng 2013 /2015Efficient Algorithms for Mining Top-K High
Utility Itemsets
– Tseng proposed the task of top-k high utility itemset mining where the user directly set the number k of patterns to be found.
– In this journal version of the paper, a fast one-phase algorithm called TKO is presented, which extends HUI-Miner, and beat the TKU algorithm presented in the conference paper.
Fournier-Viger 2014Novel Concise Representations of High Utility
Itemsets using Generator Patterns
– To reduce redundancy, this paper proposed to discover a subsets of all high utility itemsets called the generators of high utility itemsets.
– An algorithm called GHUI-Miner is presented based on FHM.
– It can be argued that these itemsets are more useful in some case than all high utility itemsets.
Wu 2019Efficient algorithms for high utility
itemset mining without candidate generation
– This paper presented an algorithm called CHUI-Miner for discovering the maximal high utility itemset.
– This algorithm is based on HUI-Miner.
– Maximal itemsets are the largest one. Discovering them can greatly reduce the number of patterns shown to the user.
Fournier-Viger 2016FHM+: Faster High-Utility Itemset Mining using Length Upper-Bound Reduction– This paper makes the observation that finding very long patterns is unecessary.
-Thus an optimized algorithm called FHM+ is presented to reduce the upper-bounds and gain better performance when searching for high utility itemset using a length constraint.
– FHM+ is based on FHM
Fournier-Viger 2016PHM: Mining Periodic High-Utility
Itemsets
– This paper introduce the concept of periodic patterns in high utility itemset mining.
– The goal is not only to find patterns that have a high utility but also patterns that appear periodically over time. For example, one may find that a customer periodically purchase beer andwine every week or so.
– The PHM algorithm was presented, which is inspired by HUI-Miner and PFPM.
Wu 2015Mining Closed+ High Utility Itemsets
without Candidate Generation
– This is the first paper on closed high utility itemset mining.
– This paper introduced the CHUD algorithm, which is inspired by DCI_Closed.
Closed itemsets allows to obtain a small set of high utility itemsets that provides concise information about all high utility itemsets (a summary).
– There have been several more efficient algorithms after that such as EFIM-Closed and CLS-Miner. However, CHUD is the first one.
Fournier-Viger 2015Mining Minimal High Utility Itemsets-This paper introduced a FHM-based algorithm called MinFHM to find the high utility itemsets that are minimal (not included in larger high utility itemsets).
– This can be useful for some applications.
Hong 2009Mining High Average-Utility Itemsets– This paper has introduced the problem of high average utility itemset mining.
– There has been many algorithms on this topic afterward. The utility is divided by the length of a pattern to avoid finding patterns that are too long.
– The TPAU and PBAU algorithms were presented which are inspired by Two-Phase, Apriori and Eclat.
Truong 2018Efficient Vertical Mining of High
Average-Utility Itemsets based on Novel Upper-Bounds
– This paper introduced the concept of vertical upper bounds in high average utility itemset mining. This has provided a major performance boost.
– The dHAUIM algorithm was presented, and published in TKDE, a top data mining journal.
Yin 2012USpan: an efficient algorithm for mining high utility sequential
patterns
– This paper presented an algorithm USpan for high utility sequential pattern mining, which is a related task that aims to find high utility patterns in sequence.
– It is not the first algorithm for this problem, but it was published in KDD and is arguably the most popular. Thus, I have selected it.
Lin 2015Mining high utility itemsets in big data– The first algorithm for mining high utility itemsets using a big data framework (Hadoop).
Zida 2015Efficient Mining of High
Utility Sequential Rules
– An algorithm named HUSRM to find high utility sequential rules.
– This topic is similar to high utility sequenital patterns mining but rules are found that have a high confidence.
Cagliero 2017Discovering high utility itemsets at multiple abstraction levels– The first paper to use a taxonomy of items for multi-level high utility itemset mining.
– The ML-HUI-Miner algorithm is an extension of HUI-Miner.
Fournier-Viger 2020Mining cross-level high utility itemsets– This paper has generalized the paper of Cagliero 2017 so as to find cross-level high utility itemsets (itemsets containing items from any abstraction levels of a taxonomy).
– The proposed CLH-Miner algorithm extends FHM. A top-k version of CLH-Miner called TKC was also proposed in another paper.
Chu et al., 2009An efficient algorithm for mining high utility itemsets with negative
item values in large databases
– Most algorithm for high utility itemset mining suppose that utility must be a positive number (e.g. amount of money).
– This is the first paper to consider that the utility can also be negative (e.g. selling an item at a loss in a supermarket).
– The HUI-NIV-Mine algorithm was designed for this task. It is a two phase algorithm inspired by Two-Phase.
Goyal 2015Efficient Skyline Itemsets Mining– This paper presented the first algorithm that mine skyline high utility itemsets.
– The idea is to find patterns that are not dominated by other patterns by considering their support and utility to find a Pareto front.
– Other more efficient algorithms were proposed later.
Nouioua 2021FHUQI-Miner: Fast High Utility Quantitative Itemset
Mining
– This paper presents the current state-of-the-art algorithm for high utility quantitative itemset mining.
– In this problem, patterns contains quantities. For example, a high utility itemset may say that a customer buys 2 to 5 breads with 1 or 2 bottles of milk.
– The original problem was proposed by Yen (2007) but this is the newest algorithm, based on FHM.
Kannimuthu1 2014Discovery Of High Utility Itemsets Using Genetic Algorithm With Ranked Mutation– This is one of the first heuristic algorithm for high utility itemset mining.
– It utilizes a genetic algorithm to find an approximation of all high utility itemsets.
– After that many algorithms have used other heuristics in recent papers.
Wu 2013Mining High Utility Episodes in Complex Event Sequences– Proposed US-Span, the first algorithm for finding high utility episodes, that is subsequences of high utility in a sequence of events.
Fournier-Viger, 2019HUE-SPAN: Fast High Utility Episode Mining.– Proposed a faster algorithm based on HUI-Miner, called HUE-SPAN for mining high utility episodes.
Fournier-Viger 2019Mining Local and Peak High Utility Itemsets– Proposed to consider the time dimension to find peak high utility itemsets and local high utility itemsets, that is itemsets that have a high utility in some non predefined time intervals (e.g. some products may yield a high product during Christmas time only).
– The LHUI-Miner algorithm and PHUI-Miner algorithm are variation of the FHM algorithm.
Fournier-Viger 2020Mining correlated high-utility itemsets using various measures– This paper aims to find correlated high utility itemsets, that is itemsets that not only have a high utility (importance) but also contains items that are highly correlated. This is to avoid finding patterns that have a high utility but just appear together by chance.
– Two measures from frequent itemset mining are incorporated into high utility itemset mining called the bond and all-confidence.
– The designed algorithm FCHM-Miner is an extension of the FHM algorithm.

Survey and videos

I wrote an easy to understand survey about high utility itemset mining, if you are interested by this topic, you may find it useful. Also, you may be interested in watching my video lectures on this topic on Youtube.

Conclusion

In this blog post, I have listed some key papers about high utility itemset mining. As I said above, this list is based on my opinion. But I think it can be useful. Hope you have enjoyed this post.

Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.

This entry was posted in Big data, Data Mining, Data science, Pattern Mining, Utility Mining and tagged , , , , , , , . Bookmark the permalink.

One Response to Key Papers about High Utility Itemset Mining

  1. Pingback: An Introduction to High-Utility Itemset Mining | The Data Mining Blog

Leave a Reply

Your email address will not be published. Required fields are marked *