A Glossary of Pattern Mining

Pattern mining is a popular research area in data mining that aims to discover interesting and useful patterns in data. The field has been active for over 25 years and has accumulated many technical terms. In this blog post, I provide a short glossary of key terms found in pattern mining papers.

  1. Antecedent: The left side of an association rule.
  2. Apriori Algorithm: A frequent itemset mining algorithm that searches level by level, generating candidate itemsets of size k+1 from the frequent itemsets of size k. It is one of the earliest algorithms for that task.
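  3. Association Rule: A rule of the form X → Y, where X (the antecedent) and Y (the consequent) are disjoint itemsets, indicating that transactions containing X tend to also contain Y.
  4. Association Rule Mining: A technique for discovering associations and relationships between items in a dataset.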
  5. Closed Episode: A frequent episode that has no proper super-episode with the same support.
  6. Closed Frequent Itemsets: Frequent itemsets that have no proper superset with the same support.
  7. Closed Sequential Patterns: Frequent sequential patterns that have no proper supersequence with the same support.
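  8. Consequent: The right side of an association rule.
  9. Eclat Algorithm: A frequent itemset mining algorithm that uses a vertical database representation (lists of transaction identifiers) and a depth-first search to identify frequent itemsets in a dataset.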
  10. Episode: A collection of one or more items or events that appear in a sequence.
  11. Episode Rule: A rule that expresses the dependence between two episodes, or between events.
  12. Episode Rule Mining: A process of discovering patterns of relationships between events in a sequence, which have the form of rules.
  13. Episode Mining: The process of discovering patterns that appear in a single long sequence of events with timestamps.
  14. Frequent Episode: An episode that appears in a dataset with a support no less than a given threshold.
  15. Frequent Itemset: An itemset (set of items) that appears in a dataset with a support no less than a given threshold.
  16. FP-Growth Algorithm: A frequent itemset mining algorithm that compresses the dataset into a prefix-tree (the FP-tree) and identifies frequent itemsets by pattern growth, without candidate generation.
  17. GSP Algorithm: A sequential pattern mining algorithm used to identify frequent sequential patterns in a sequence database. It is one of the first algorithms for that problem.
  18. Graph Database: A database that stores data in the form of graphs (multiple graphs).
  19. Graph Mining: The process of discovering patterns, trends, and relationships in graphs.
  20. High-Utility Itemsets: Itemsets whose utility (e.g., total profit) in a dataset is no less than a given minimum utility threshold.
  21. High-Utility Sequential Patterns: Sequences of items whose utility (e.g., total profit) in a sequence database is no less than a given minimum utility threshold.
  22. Itemset Mining: The process of discovering patterns and relationships between items in a dataset.
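  23. Itemset: An unordered collection of one or more items, such as the items purchased together in a transaction.
  24. Lift: A measure of the strength of an association rule X → Y, defined as the confidence of the rule divided by the relative support of Y. A lift greater than 1 indicates that X and Y appear together more often than expected if they were independent.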
  25. Minimum Support: A parameter used to specify the minimum number of occurrences of an itemset or pattern for it to be considered frequent.
  26. Minimum Confidence: A parameter used to specify the minimum confidence of an association rule for it to be considered valid.
  27. Maximal Episode: A frequent episode that has no proper super-episode that is also frequent.
  28. Maximal Frequent Itemsets: Frequent itemsets that have no proper superset that is also frequent.
  29. Maximum Gap: A parameter used to specify the maximum gap between two items in a sequence for it to be considered a valid pattern.
  30. Maximum Length: A parameter used to specify the maximum length of a pattern for it to be considered valid.
  31. Maximal Sequential Patterns: Frequent sequential patterns that have no proper supersequence that is also frequent.
  32. Maximum Window Size: A parameter used to specify the maximum size of a sliding window for it to be used for pattern mining.
  33. Periodicity Constraint: A constraint on the time between consecutive occurrences of an itemset or pattern (typically a maximum period) for it to be considered periodic.
  34. Periodic Itemsets: A set of itemsets that occur frequently and have a consistent period of occurrence.
  35. Periodic Pattern Mining: The process of finding patterns that are regularly appearing over time in a sequence of events. This can be done using algorithms such as PFPM.
  36. Periodic Sequential Patterns: A set of sequences of items that occur frequently and have a consistent period of occurrence.
  37. PrefixSpan Algorithm: A sequential pattern mining algorithm used to identify frequent sequential patterns in a sequence database. It is an important algorithm, but faster algorithms have since been developed, such as CM-SPAM and CM-SPADE (2014).
  38. Prefix-tree: A tree-like data structure used by algorithms such as FP-Growth to store information. The information can be transactions, itemsets or other information.
  39. Sequence Database: A collection of sequences that serves as input for tasks such as sequential pattern mining and sequential rule mining.
  40. Sequential Patterns: A set of sequences of items that occur frequently in a dataset.
  41. Sequential Pattern Mining: The process of discovering patterns and relationships between sequences of items.
  42. Sequential Rule Mining: The task of finding relationships between events or symbols in sequences that have the form of rules.
  43. Subgraph: A graph that is part of another graph.
  44. Subgraph Mining: The process of finding subgraphs that are interesting in a single graph or a graph database.
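  45. Subsequence: A sequence obtained from another sequence by removing zero or more elements without changing the order of the remaining elements.
  46. Supersequence: A sequence that contains another sequence as a subsequence, that is, all of its elements in the same order.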
  47. Support: A measure of how often an itemset appears in a dataset.
  48. Temporal Sequence Mining: A process of discovering patterns in time-stamped sequences.
  49. Time-Gap Constraint: A parameter used to limit the maximum gap between two items in a sequence for it to be considered a valid pattern.
  50. Window Constraint: A parameter used to limit the size of a sliding window used to identify sequential patterns.
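
To make a few of these terms concrete (support, frequent, closed and maximal itemsets, confidence, and lift), here is a minimal Python sketch. The toy transaction database and all function names are assumptions made for illustration, and the search is a plain brute-force enumeration rather than Apriori, Eclat, or FP-Growth.

```python
from itertools import combinations

# Hypothetical toy transaction database: each transaction is a set of items.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a"},
    {"c"},
]


def support(itemset, db):
    """Absolute support: number of transactions that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t)


def frequent_itemsets(db, minsup):
    """Brute-force enumeration: map each itemset with support >= minsup to its support."""
    items = sorted(set().union(*db))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            s = support(candidate, db)
            if s >= minsup:
                result[candidate] = s
    return result


def closed_itemsets(freq):
    """Keep frequent itemsets that have no proper superset with the same support."""
    return {
        x: s
        for x, s in freq.items()
        if not any(x < y and s == sy for y, sy in freq.items())
    }


def maximal_itemsets(freq):
    """Keep frequent itemsets that have no proper superset that is also frequent."""
    return {x: s for x, s in freq.items() if not any(x < y for y in freq)}


def confidence(x, y, db):
    """Confidence of the rule X -> Y: sup(X u Y) / sup(X)."""
    return support(x | y, db) / support(x, db)


def lift(x, y, db):
    """Lift of the rule X -> Y: |D| * sup(X u Y) / (sup(X) * sup(Y))."""
    return len(db) * support(x | y, db) / (support(x, db) * support(y, db))


freq = frequent_itemsets(transactions, minsup=2)
print("frequent:", {tuple(sorted(k)): v for k, v in freq.items()})
print("closed:  ", sorted(tuple(sorted(k)) for k in closed_itemsets(freq)))
print("maximal: ", sorted(tuple(sorted(k)) for k in maximal_itemsets(freq)))
print("conf(a -> b):", confidence(frozenset("a"), frozenset("b"), transactions))
print("lift(a -> b):", lift(frozenset("a"), frozenset("b"), transactions))
```

On this toy database with a minimum support of 2, all seven non-empty itemsets over {a, b, c} are frequent, four of them are closed ({a}, {c}, {a, b}, {a, b, c}), and only {a, b, c} is maximal. The rule a → b has a confidence of 0.75 and a lift of 1.25.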

==
Philippe Fournier-Viger is a professor and the founder of the open-source data mining software SPMF, which offers more than 250 data mining algorithms.
