Today, I am publishing a short quiz to test your knowledge of pattern mining. There are 10 questions, followed by the answers. These questions are not very hard 😉 How many can you get right?
Questions
- What is the goal of pattern mining?
- What is the difference between frequent pattern mining and sequential pattern mining?
- What is the Apriori algorithm used for?
- What is the difference between association rule mining and correlation analysis?
- What is the minimum support threshold in frequent pattern mining?
- What is the minimum confidence threshold in association rule mining?
- What is a closed pattern?
- What is the difference between itemset mining and sequence mining?
- What is the difference between vertical data format and horizontal data format in itemset mining?
- What is the difference between objective measures and subjective measures in pattern evaluation?
Answers
- The goal of pattern mining is to discover interesting, previously unknown patterns from large datasets.
- Frequent pattern mining focuses on finding frequently occurring patterns within a dataset, while sequential pattern mining focuses on finding frequently occurring patterns where the order of items matters.
- The Apriori algorithm is used for frequent itemset mining.
- Association rule mining focuses on finding relationships between items within a dataset, while correlation analysis focuses on finding relationships between variables within a dataset.
- The minimum support threshold in frequent pattern mining is the minimum number of times a pattern must occur within a dataset to be considered frequent. It can also be expressed as a percentage of the records in the dataset.
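- The minimum confidence threshold in association rule mining is the minimum confidence that a rule X → Y must have to be kept, where the confidence is the conditional probability that the consequent Y appears in a transaction given that the antecedent X appears in that transaction.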
- Closed patterns are patterns that cannot be extended by adding more items without decreasing their support. Some related concepts are generator patterns and maximal patterns.
- Itemset mining focuses on finding sets of items that frequently occur together within a dataset, while sequence mining focuses on finding subsequences that frequently appear in a set of sequences, where the order of items matters.
- In itemset mining, the horizontal data format stores data as a list of transactions, where each transaction (record) contains a list of items, while the vertical data format stores data as a list of items, where each item is associated with the list of transactions in which it appears.
- Objective measures evaluate patterns based on statistical properties of the data, while subjective measures evaluate patterns based on their usefulness or interestingness to the user.
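To make a few of these answers more concrete, here is a minimal Python sketch of support, confidence, closed itemsets, and the two data formats. The toy transaction database and all function names are my own assumptions for illustration, and the frequent itemsets are found by brute-force enumeration rather than by Apriori, so this is only suitable for tiny examples.

```python
from itertools import combinations

# Hypothetical toy database in horizontal format:
# each transaction (record) lists the items it contains.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]

def support(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)

def confidence(antecedent, consequent, db):
    """Estimate of P(consequent | antecedent): sup(A and C) / sup(A)."""
    return support(antecedent | consequent, db) / support(antecedent, db)

def frequent_itemsets(db, minsup):
    """Brute-force enumeration of all itemsets with support >= minsup.

    This is NOT Apriori: it tests every combination of items.
    """
    items = sorted(set().union(*db))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = support(frozenset(combo), db)
            if s >= minsup:
                result[frozenset(combo)] = s
    return result

def closed_itemsets(frequent):
    """Keep the itemsets that have no proper superset with the same support."""
    return {
        x: s
        for x, s in frequent.items()
        if not any(x < y and s == sy for y, sy in frequent.items())
    }

def to_vertical(db):
    """Vertical format: each item maps to the ids of the transactions containing it."""
    vertical = {}
    for tid, t in enumerate(db):
        for item in t:
            vertical.setdefault(item, set()).add(tid)
    return vertical

frequent = frequent_itemsets(transactions, minsup=2)
closed = closed_itemsets(frequent)
vertical = to_vertical(transactions)
```

With a minimum support of 2, the itemset {butter} is frequent but not closed, because {bread, butter} has the same support. In the vertical format, the support of an itemset is the size of the intersection of the transaction id sets of its items, which is why some algorithms prefer that format.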
Tell me how many questions you answered right in the comment section below!
—
Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 250 data mining algorithms.
I only got two…
That is not bad! Some of these questions are quite technical.
Pardon me. But I think the answer number 9 is not true. I believe it is the opposite. Please correct me if I am wrong.