A Glossary of High Utility Pattern Mining

Today, I present a glossary of key terms usedin high utility itemset mining. This glossary will be useful to researchers and practitioners working on this important subtopic of pattern mining. By the way, if you are new to this research area, you may find it useful to first read the survey paper that I wrote on this topic, or to watch my video lectures on this topic on my Youtube Channel. Also, you may want to check out the list of key research papers on high utility itemset mining that I made, which can also be useful to get started on this topic or to discover some topics.

Closed high utility itemset: A high utility itemset that has no superset with the same utility. For example, if {bread, milk} and {bread, milk, cheese} have the same utility, then {bread, milk} is not closed but {bread, milk, cheese} is closed. The first algorithm for this problem is CHUD. Then, many more algorithms were developed to improve the performance.
Correlated high utility itemset: A correlated high utility itemset is a set of items that not only yield a high utility but is also highly correlated. Various measures of correlations can be used such as the bond, and all-confidence.
Cost-effective pattern mining: A variation of the high utility itemset mining task where the goal is to find itemsets that have a high utility but a low cost.
Cross-level high utility itemset: This is variant of the high utility itemset mining problem where items have categories such as “electronic products” or “food”. Then, the goal is to find itemsets that have a high utility but which may contain categories. The first algorithm for the general problem of cross-level high utility itemset is CLH-Miner. Another is TKC.
External utility: A measure of the importance of each item. It is usually given by the user or domain expert. For example, in the context of a shopping database, the external utility may represent the profit that is generated by each item (e.g. each apple yield a 1$ profit, while each orange yield a 2$ profit)
Frequent itemset mining (FIM): A data mining task that finds all sets of items that appear in at least a minimum number of transactions in a transaction database. It is a special case of high utility itemset mining.
High-utility association rule mining: A data mining task that extends HUIM by finding association rules that have a high confidence and a high utility value. An association rule is an implication of the form X -> Y, where X and Y are itemsets.
High average-utility itemset: An itemset whose average-utility is no less than a minimum average-utility threshold set by the user. The average-utility of an itemset is its utility divided by its size (number of items).
High-utility and diverse itemset mining: A data mining task that extends HUIM by incorporating a notion of diversity to find patterns that cover different categories of items. The diversity of an itemset is defined as the number of distinct categories covered by its items divided by the total number of categories.
High-utility episode mining: A data mining task that extends HUIM by finding episodes that have a high utility value. An episode is a partially ordered set of events that frequently occur together in an event sequence. An event sequence is a sequence of events, where each event has a type and a timestamp. Several algorithms have been proposed for this task such as UP-SPAN and HUE-SPAN.
High-utility graph pattern mining: A data mining task that extends HUIM by finding graph patterns that have a high utility value. A graph pattern is a subgraph that frequently appears in a graph database. A graph database is a database where each transaction is a graph, which is a set of nodes and edges. A representative algorithm for this problem is UGMINE.
High-utility itemset embedding: Some algorithms like HUI2Vec have been designed to use high utility itemsets to create embeddings of transaction databases. This is a recent topic.
High utility quantitative itemset mining: A data mining task that extends HUIM to find high utility itemsets with additional information about the quantities that are usually purchased for each itemset. This is a harder problem. There exist various algorithms for this problem such as FHUQI-Miner, HUQI-Miner and VHUQI.
High utility itemset: An itemset whose utility is no less than a minimum utility threshold set by the user. For example, if the minimum utility threshold is 10, then {bread, milk} is a high utility itemset if its utility is 10 or more
High utility itemset mining (HUIM): A data mining task where the goal is to finds sets of values (itemsets) in a transaction database that have a high importance, and where the importance is measured by a utility function. The typical application of HUIM is to find the sets of products that are purchased together and yield a high profit in transactions made by customers. But also many other applications exist. There exists many algorithms such as HAMM, REX, UP-Growth, EFIM, HUI-Miner, FHM, ULB-Miner, Two-Phase and others.
High-utility rare itemset mining: A data mining task that combines HUIM and rare itemset mining. It aims to find itemsets that have a high utility value but a low support value in a transaction database. It can be useful for finding profitable but infrequent patterns.
High-utility sequential pattern mining: A data mining task that extends HUIM by finding sequential patterns that have a high utility value. A sequential pattern is an ordered list of itemsets that frequently appear in a sequence database. A sequence database is a database where each transaction is associated with a timestamp or an order. A survey of high utility sequential pattern mining is here.
High utility sequential rule mining: A problem where the goal is to find temporal rules that have a high utility and indicate that if someone buy something, they will buy something else after. The first algorithm for mining high utility sequential rules is HUSRM.
Internal utility: A measure of the quantity of an item in a transaction (record). For instance, in the case of analyzing shopping data, the internal utility in a customer transaction is the quantities of items. Thus, a transaction may indicate, for example, that a customer has bought 4 apples, and 3 breads, and the internal utility of apple and bread in that transaction is 4 and 3, respectively.
Itemset: A set of items. For example, {bread, milk} is an itemset containing two items.
Local high utility itemset: A high utility itemset that is found in a database where transactions have timestamps. More precisely a local high utility itemset is an itemset that has a high utility in a time interval but not necessarily in the whole database. For example, some products may only yield a high profit during the summer in a store. LHUI-Miner is the first algorithm for this problem. A variation is also peak high utility itemset mining, described in the same paper.
Maximal high utility itemset: A high utility itemset that has no superset that is also a high utility itemset. For example, if {bread, milk} is a high utility itemset but {bread, milk, cheese} is not, then {bread, milk} is maximal. The first algorithm for this problem is CHUI-Miner(max).
Minimal High Utility Itemsets. A minimal high utility itemset is an itemset that is not a superset of another high utility itemsets. The first algorithm for this problem is MinFHM.
One-phase HUIM algorithm: A HUIM algorithm that can mine high utility itemsets without generating candidates. Several algorithms of this type such as HUI-Miner and FHM rely on the utility-list structure to efficiently calculate the utility of itemsets and prune the search space.
Pattern-growth algorithm: An algorithm that uses a divide-and-conquer strategy to find high utility itemsets. It recursively constructs conditional databases to explore the search space by performing a database projection operation. Some algorithms of this type are IHUP, UP-Growth, EFIM, HAMM and REX.
Periodic high utility itemset: An itemset that periodically appears over time and has a high utility. For example, a customer that buy cheese and bread every week. The first algorithm for this problem is PHM.
Pruning strategy: A technique that reduces the search space by removing unpromising items or itemsets from consideration. The goal of using pruning strategy in an algorithm is to improve performance (solve the problem more quickly or using less memory).
Quantitative database: A transaction database where each item has a quantity and an external utility value. The quantity of an item represents how many units of that item are present in a transaction. The external utility of an item represents the amount of profit yield by the sale of one unit of that item.
Stable high utility itemset: An itemset that has a utility that remains more or less stable over time. See the paper by Acquah Hackman for more details.
Support: The number of transactions that contain an itemset. For example, the support of {bread, milk} is 3 if it appears in three transactions1.
Top-K high utility itemset mining: A variant of HUIM that finds the K most profitable itemsets in a database without requiring the user to set a minimum utility threshold. There are many algorithms for this problem. The first one is TKU. Then, more efficient algorithms were proposed such as TKO.
Transaction. A transaction is a set of items bought by a customer. More generally, it can also be viewed as a database record containing some values.
Transaction database: A database containing a set of transactions made by customers.
Transaction utility (TU): The sum of the utilities of all items in a transaction. For example, the TU of a transaction where bread and milk are bought with quantities 5 and 2 respectively and profit units 0.5 and 1 respectively is (5 x 0.5) + (2 x 1) = 4.5.
Transaction-weighted utilization (TWU): An upper bound on the utility that can be used to reduce the search space for high utility itemset mining. There exists several upper bounds. The TWU is not the tightest upper bound but it is the most popular and has been used by many algorithms.
Two-phase HUIM algorithm: A HUIM algorithm that consists of two phases. In the first phase, it generates candidates by applying some pruning strategies based on some upper bounds on the utility. In the second phase, it scans the database again to calculate the exact utility of each candidate and filter out those that are not high utility. Some popular two-phase algorithms are Two-Phase, IHUP and UP-Growth.
Utility: A measure of importance or profit of an item or an itemset. It is typically calculated by multiplying the quantity and the external utility of an item in a transaction. But other utility functions could also be used.
Utility-list: A data structure that stores the utility information of an itemset in a compact way. It consists of a list of tuples, each containing the transaction identifier, the utility of the itemset in that transaction, and the remaining utility of the items after that itemset in that transaction. It is used by many algorithms such as HUI-Miner, FHM, ULB-Miner and many others.
Utility threshold: A user-defined parameter that specifies the minimum utility value for an itemset to be considered as high utility. It can be either absolute or relative. An absolute utility threshold is a fixed value, while a relative utility threshold is a percentage of the total utility of the database.

Hope that this is useful. If you think that I should add more definitions, you may suggest some in the comment section below!

If you want to try high utility itemset mining, you can also check the SPMF open-source software, which offers the most complete collection of such algorithms, and also provides datasets and other resources for carrying research on this topic.

—
Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 120 data mining algorithms.

Report about the KDD 2018 conference

SPMF’s architecture (2) The Main class and the Command Processor

Discovering Alarm Correlation Rules for Network Fault Management (video)

A Glossary of High Utility Pattern Mining

Related posts:

One Response to A Glossary of High Utility Pattern Mining

Leave a Reply Cancel reply

Archives

Categories

Recent Posts

Recent Comments

Number of visitors:

A Glossary of High Utility Pattern Mining

Related posts:

One Response to A Glossary of High Utility Pattern Mining

Leave a Reply Cancel reply

Archives

Categories

Recent Posts

Recent Comments

Tag cloud

Number of visitors: