(video) Minimal Correlated High Utility Itemsets with FCHM

This is a video presentation of the paper “Mining Correlated High-Utility Itemsets Using the Bond Measure”, which is about correlated high utility pattern mining with the FCHM algorithm.
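The bond measure mentioned above quantifies how strongly the items of an itemset are correlated. The sketch below assumes the usual definition: the number of transactions containing all items of an itemset, divided by the number of transactions containing at least one of its items, so bond values lie between 0 and 1. FCHM keeps an itemset only if its utility reaches a minimum utility threshold and its bond reaches a minimum bond threshold. This is only an illustration of the measure, not FCHM itself (which also prunes the search space); the `bond` function and the toy database are hypothetical.

```python
from typing import FrozenSet, List


def bond(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Bond of an itemset (hypothetical helper for illustration).

    Numerator: transactions containing ALL items of the itemset.
    Denominator: transactions containing AT LEAST ONE item of the itemset.
    """
    conjunctive = sum(1 for t in transactions if itemset <= t)
    disjunctive = sum(1 for t in transactions if itemset & t)
    return conjunctive / disjunctive if disjunctive else 0.0


# Toy database: each transaction is the set of items it contains.
db = [frozenset("ab"), frozenset("abc"), frozenset("ac"), frozenset("bd")]
print(bond(frozenset("ab"), db))  # 2 transactions with a and b / 4 with a or b = 0.5
```

A bond of 1 would mean that a and b always appear together; here they co-occur in only half of the transactions where either one appears.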

More information about the FCHM algorithm is provided in this research paper:

Fournier-Viger, P., Zhang, Y., Lin, J. C.-W., Dinh, T., Le, B. (2018) Mining Correlated High-Utility Itemsets Using Various Correlation Measures. Logic Journal of the IGPL, Oxford Academic, to appear

The source code of FCHM and datasets are available in the SPMF software.

I will post videos about other high utility itemset mining algorithms in the near future, so stay tuned!

==
Philippe Fournier-Viger is a professor, data mining researcher and the founder of the SPMF data mining software, which includes more than 150 algorithms for pattern mining.


Comments


  1. Hello,
    Thank you for your informative explanation. I have a question:
    how can I calculate the accuracy of a high utility pattern mining algorithm after generating the high utility patterns?

    • Hi,
      thanks for reading, and welcome to the blog!

      In high utility itemset mining, there is no concept of accuracy. A high utility itemset mining algorithm takes as input a database and a minimum utility threshold, and outputs the set of all high utility itemsets. In other words, there is only one correct solution for each high utility itemset mining task, and most high utility itemset mining algorithms always find that solution because they are “exact algorithms”. An exact algorithm is one that always finds exactly the desired result. Because of that, there is no point in measuring accuracy.
      In fact, the goal of high utility itemset mining is to discover knowledge that helps to understand the data. The user sets the minimum utility threshold and obtains the patterns. That’s all.
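      To make the “exact” idea concrete, here is a minimal brute-force sketch, assuming the usual utility model: each transaction records a purchase quantity for each item, and each item has a unit profit. The utility of an itemset is the sum of quantity times unit profit of its items, over all transactions that contain the whole itemset. The data and function names are hypothetical, and real algorithms avoid enumerating every itemset, but an exact algorithm given the same database and threshold must return the same set as this enumeration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Transaction = Dict[str, int]  # item -> purchased quantity


def utility(itemset: FrozenSet[str], db: List[Transaction], unit_profit: Dict[str, int]) -> int:
    """Sum of quantity * unit profit of the itemset's items, over transactions containing all of them."""
    total = 0
    for t in db:
        if all(i in t for i in itemset):
            total += sum(t[i] * unit_profit[i] for i in itemset)
    return total


def brute_force_huis(db: List[Transaction], unit_profit: Dict[str, int], min_util: int) -> Dict[FrozenSet[str], int]:
    """Check every possible itemset: exponentially slow, but the output is always the complete, correct set."""
    items = sorted({i for t in db for i in t})
    result: Dict[FrozenSet[str], int] = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            u = utility(itemset, db, unit_profit)
            if u >= min_util:
                result[itemset] = u
    return result


# Hypothetical toy data.
db = [{"a": 1, "b": 2}, {"a": 2, "c": 1}, {"b": 1, "c": 3}]
unit_profit = {"a": 5, "b": 2, "c": 1}
print(brute_force_huis(db, unit_profit, min_util=9))
```

      With min_util = 9, the high utility itemsets are {a} (utility 15), {a, c} (11) and {a, b} (9). Running the enumeration again always gives the same answer, so there is nothing to compare against.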

      This is different from other data mining tasks such as classification. In classification, one can use a model like a neural network to make predictions, and then calculate the percentage of correct predictions to obtain the accuracy. But in high utility itemset mining, the goal is not to make predictions.

      If you used the high utility itemsets to make predictions, such as product recommendations, then you could calculate an accuracy. It is possible to do that, but it would be a separate step AFTER high utility itemset mining, and you would have to define how these predictions are made.

      By the way, there also exist some high utility itemset mining algorithms that are “approximate” (they do not guarantee finding the correct result). This is the case, for example, for algorithms based on genetic algorithms, particle swarm optimization (PSO), etc. For those algorithms, you could calculate how many patterns are missing from their output to obtain a measure of accuracy.
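      As a minimal sketch of that idea, assuming the exact result has already been obtained with an exact algorithm on the same database and threshold, one could compare the two outputs and report the fraction of true high utility itemsets that the approximate algorithm found. The `recall` function and the example sets are hypothetical.

```python
from typing import FrozenSet, Set


def recall(exact: Set[FrozenSet[str]], approx: Set[FrozenSet[str]]) -> float:
    """Fraction of the true high utility itemsets found by the approximate algorithm."""
    if not exact:
        return 1.0  # nothing to find, so nothing is missing
    return len(exact & approx) / len(exact)


exact = {frozenset("a"), frozenset("ab"), frozenset("ac")}  # output of an exact algorithm
approx = {frozenset("a"), frozenset("ac")}                  # output of an approximate one
print(exact - approx)         # the missing itemsets
print(recall(exact, approx))  # 2 of 3 found
```

      If an approximate algorithm could also report itemsets that are not actually high utility, one would additionally count those wrong answers (precision).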

      Hope that this is clear.

      Best regards,
      Philippe

      • OK, I have another question:
        Is there a difference between a high utility pattern and a high utility association rule? Or do we just generate the rules from HUPs?

        • Hi again,

          I first need to say that “pattern” is a general word that means something that you can find in the data. A “pattern” can be a high utility itemset, a high utility sequential pattern, a high utility association rule, etc. In other words, there are many types of patterns.

          Now, if you want to know the difference between an itemset and an association rule, then yes, there is a difference. An itemset {A,B,C} means that A, B and C appear together (e.g. are purchased together by a customer). An association rule is a bit different. It has the form A -> B, which means, for example, that if you buy A then you are likely to also buy B. The strength of a rule is usually measured using special measures such as confidence and lift.
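          For illustration, here is a minimal sketch of these two measures on a hypothetical toy database, using the standard definitions: confidence(A -> B) = sup(A ∪ B) / sup(A) and lift(A -> B) = confidence(A -> B) / (sup(B) / N), where sup counts the transactions containing an itemset and N is the number of transactions. The function names are illustrative.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def support(x: Itemset, transactions: List[Itemset]) -> int:
    """Number of transactions containing every item of x."""
    return sum(1 for t in transactions if x <= t)


def confidence(a: Itemset, b: Itemset, transactions: List[Itemset]) -> float:
    """Fraction of the transactions containing A that also contain B."""
    return support(a | b, transactions) / support(a, transactions)


def lift(a: Itemset, b: Itemset, transactions: List[Itemset]) -> float:
    """Confidence divided by B's overall frequency; above 1 means A and B co-occur more than if independent."""
    return confidence(a, b, transactions) / (support(b, transactions) / len(transactions))


db = [frozenset("ab"), frozenset("abc"), frozenset("c"), frozenset("bd")]
a, b = frozenset("a"), frozenset("b")
print(confidence(a, b, db))  # 1.0: every transaction with a also contains b
print(lift(a, b, db))        # 1.0 / (3 / 4), about 1.33
```

          Note that the reverse rule b -> a has a lower confidence (2/3), because b also appears without a.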
          And generally, association rules are generated from the itemsets. Typically, an algorithm to find association rules will first find the itemsets, and then use these itemsets to generate the rules. So this is usually a two-step process (finding the itemsets, and then using the itemsets to generate the rules).
          But there are also some algorithms for association rule mining that will directly generate the rules without generating the itemsets. This is also possible but less common.
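          The usual two-step process can be sketched as follows for ordinary (frequency-based) association rules, on a hypothetical toy database. This sketch enumerates all itemsets by brute force, which is only practical for tiny data; real algorithms such as Apriori or FP-Growth prune the search space instead. The function names and thresholds are illustrative assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]


def support(x: Itemset, transactions: List[Itemset]) -> int:
    """Number of transactions containing every item of x."""
    return sum(1 for t in transactions if x <= t)


def frequent_itemsets(transactions: List[Itemset], min_sup: int) -> Dict[Itemset, int]:
    """Step 1: find all itemsets whose support is at least min_sup (brute force)."""
    items = sorted(set().union(*transactions))
    found: Dict[Itemset, int] = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            s = support(x, transactions)
            if s >= min_sup:
                found[x] = s
    return found


def generate_rules(found: Dict[Itemset, int], min_conf: float) -> List[Tuple[Itemset, Itemset, float]]:
    """Step 2: split each frequent itemset into antecedent -> consequent and keep confident rules."""
    rules = []
    for itemset, sup in found.items():
        for k in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), k):
                a = frozenset(ante)
                # Every subset of a frequent itemset is frequent, so found[a] exists.
                conf = sup / found[a]
                if conf >= min_conf:
                    rules.append((a, itemset - a, conf))
    return rules


db = [frozenset("ab"), frozenset("abc"), frozenset("c"), frozenset("bd")]
print(generate_rules(frequent_itemsets(db, min_sup=2), min_conf=0.8))
# [(frozenset({'a'}), frozenset({'b'}), 1.0)]
```

          Here the rule b -> a is rejected because its confidence is only 2/3.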

          Most studies on association rule mining are designed to find frequent association rules rather than high utility association rules. I have only seen a few papers on high utility association rules. If you also want to consider time, there is also a variation called high utility sequential rules.

          Best regards,

          • So, is high utility association rule mining a better research topic because it has fewer papers, and thus more research opportunities? Or is it wrong to see it that way?

          • Yes, you can see it that way. When choosing a topic, the most important thing is to consider whether it is useful, and then whether you can do something new. High utility association rule mining is useful, I think; no problem about that. And since there are not so many papers, you can certainly do something in that area. So yes, I think it is a good topic.
