In this blog post, I will answer a question that I received by e-mail: what is the difference between sequential pattern mining and sequence prediction? I think this is a good question, and sharing the answer may help clarify these concepts for other people.
Generally speaking, the goal of sequential pattern mining is to find patterns that appear in many sequences of symbols. For example, let's say that you have sequences of purchases made by customers in a retail store. You can then apply a sequential pattern mining algorithm to find sequential patterns, that is, sequences of purchases that are common to many customers. For example, you may find that <harrypotter1, spiderman, batman> is a sequential pattern. This pattern means that many people have bought the movie Harry Potter 1, then Spiderman, and then Batman. Finding such patterns can help you understand the data. If you are the retail store manager, you may use such patterns to make business decisions, such as offering a discount on Batman to customers who previously bought Harry Potter 1 and Spiderman.
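To make the idea of "appearing in many sequences" more concrete, here is a minimal Python sketch that counts how many customers' purchase sequences contain a given pattern. The toy purchase database and the function names (`is_subsequence`, `support`) are made up for this illustration. I also assume the usual convention that the items of a pattern must appear in the same order but not necessarily consecutively. A real sequential pattern mining algorithm does more than this: it searches for all patterns whose count reaches a minimum threshold, rather than checking a single pattern.

```python
def is_subsequence(pattern, sequence):
    """Return True if the items of `pattern` appear in `sequence`
    in the same order (other items may appear in between)."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so order is enforced.
    return all(item in remaining for item in pattern)


def support(pattern, database):
    """Count the sequences of `database` that contain `pattern`."""
    return sum(1 for sequence in database if is_subsequence(pattern, sequence))


# Hypothetical purchase sequences, one list per customer.
purchases = [
    ["harrypotter1", "spiderman", "batman"],
    ["harrypotter1", "hulk", "spiderman", "batman"],
    ["spiderman", "harrypotter1", "batman"],
    ["harrypotter1", "spiderman", "starwars", "batman"],
]

pattern = ["harrypotter1", "spiderman", "batman"]
print(support(pattern, purchases))  # 3 (the third customer bought them in another order)
```

With a minimum support of, say, 3 sequences, this pattern would be reported as a sequential pattern in this toy database.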
But there are many other uses of sequential patterns. You can also use sequential patterns to make sequence predictions. For example, if someone buys Harry Potter 1 and Spiderman, you may predict that this person will buy Batman next, based on the above sequential pattern. This can be used to make recommendations.
Another application of sequential pattern mining is finding patterns in text documents. A text document is a set of sentences, and each sentence is a sequence of words. Thus, you can apply a sequential pattern mining algorithm to find sequential patterns that reveal frequent sequences of words appearing many times in a book. This can tell you about the writing patterns used by some authors, and you can even use these patterns to try to guess who is the author of an anonymous book (if you are curious, I actually did that in a paper: http://www.philippe-fournier-viger.com/FLAIRS2016__AUTHORSHIP_ATTRIBUTION.pdf).
On the other hand, the goal of sequence prediction is to predict the next symbol of a sequence of symbols. For example, suppose that a person buys the movies Harry Potter 1, Hulk, Batman, and then Star Wars, and we want to know which movie this person will buy next. There are many ways to do sequence prediction. One way is to use sequential patterns or a variation called sequential rules. For example, we did sequence prediction using sequential rules in a paper to predict the next webpage that someone will click: http://www.philippe-fournier-viger.com/sequential_rules_prediction_2012.pdf
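As a rough illustration of prediction with rules, here is a small Python sketch. It is not the method from the paper above, only a simplified version of the idea. The rules and their confidence values are invented for the example, and I assume that a rule "matches" when its left-hand side appears in order in the sequence observed so far. The function `predict_next` is a hypothetical helper that returns the right-hand side of the matching rule with the highest confidence.

```python
def is_subsequence(pattern, sequence):
    """Return True if the items of `pattern` appear in `sequence` in order."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def predict_next(observed, rules):
    """Predict the next symbol of `observed` using the matching rule
    with the highest confidence, or return None if no rule matches.

    `rules` is a list of (antecedent, consequent, confidence) tuples.
    """
    best = None
    for antecedent, consequent, confidence in rules:
        if consequent in observed:
            continue  # do not predict something that already happened
        if is_subsequence(antecedent, observed):
            if best is None or confidence > best[1]:
                best = (consequent, confidence)
    return best[0] if best else None


# Hypothetical rules; in practice they would be mined from past sequences.
rules = [
    (("harrypotter1", "spiderman"), "batman", 0.75),
    (("hulk",), "starwars", 0.60),
    (("batman",), "superman", 0.40),
]

print(predict_next(["harrypotter1", "spiderman"], rules))  # batman
```

If no rule matches the observed sequence, the sketch returns `None`. Not being able to make a prediction in that case is one of the practical limits of purely rule-based prediction.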
But there are also many other models for sequence prediction that do not rely on sequential patterns, such as the CPT and CPT+ models (video presentation here: https://data-mining.philippe-fournier-viger.com/video-sequence-prediction-with-the-cpt-and-cpt-models/), the all-k-order Markov model, the DG model, TDAG, and LZ78.
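To give a feeling for how a prediction model can work without mining patterns first, here is a sketch of a plain first-order Markov model in Python. Please note that this is much simpler than the models named above (it is not CPT, and it is not the all-k-order Markov model): it only looks at the last symbol of the sequence and predicts the symbol that most often followed it in the training sequences. The training data and the function names are made up for the illustration.

```python
from collections import Counter, defaultdict


def train_first_order(sequences):
    """Count, for each symbol, how often each other symbol directly follows it."""
    transitions = defaultdict(Counter)
    for sequence in sequences:
        for current, following in zip(sequence, sequence[1:]):
            transitions[current][following] += 1
    return transitions


def predict_next(transitions, sequence):
    """Predict the symbol that most often followed the last symbol of
    `sequence` during training, or None if that symbol was never seen."""
    if not sequence or sequence[-1] not in transitions:
        return None
    return transitions[sequence[-1]].most_common(1)[0][0]


# Hypothetical training sequences of purchases.
training = [
    ["harrypotter1", "hulk", "batman", "starwars"],
    ["hulk", "batman", "starwars"],
    ["spiderman", "batman", "superman"],
]

model = train_first_order(training)
print(predict_next(model, ["harrypotter1", "hulk", "batman"]))  # starwars
```

Here the prediction only depends on the fact that the last purchase was Batman, which was followed by Star Wars twice and by Superman once in the training data.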
Thus, to summarize, the goal of sequential pattern mining is to find patterns. You can look for these patterns in data for multiple purposes. It can be just to understand the data and learn something about it. It can also be to use these patterns for sequence prediction, or for other tasks such as clustering, authorship attribution, etc. Thus, sequential pattern mining has many applications, and sequence prediction is one of them. Conversely, the goal of sequence prediction is to predict the next symbol of a sequence. There are many methods for sequence prediction, and using sequential patterns is one of them.
I hope that this short answer will be helpful. Here are some additional blog posts that I wrote on these topics:
- An Introduction to Sequence Prediction
- An Introduction to Sequential Pattern Mining
- An Introduction to Sequential Rule Mining
- (Video) Sequence prediction with the CPT and CPT+ Models
Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 180 data mining algorithms.