An Introduction to Sequence Prediction

In this blog post, I will give an introduction to the task of sequence prediction,  a popular data mining/machine learning task, which consist of predicting the next symbol of a sequence of symbols. This task is important as it have many real-life applications such as webpage prefetching and product recommendation.


What is a sequence?

Before defining the problem of sequence prediction, it is necessary to first explain what is a sequence. A sequence is an ordered list of symbols. For example, here are some common types of sequences:

  • A sequence of webpages visited by a user, ordered by the time of access.
  • A sequence of words or characters typed on a cellphone by a user, or in a text such as a book.
  • A sequence of products bought by a customer in a retail store
  • A sequence of proteins in bioinformatics
  • A sequence of symptoms observed on a patient at a hospital

Note that in the above definition, we consider that a sequence is a list of symbols and do not contain numeric values.  A sequence of numeric values is usually called a time-series rather than a sequence, and the task of predicting a time-series is called time-series forecasting. But this is another topic.

What is sequence prediction?

The task of sequence prediction consists of predicting the next symbol of a sequence based on the previously observed symbols. For example, if a user has visited some webpages A, B, C, in that order, one may want to predict what is the next webpage that will be visited by that user to prefetch the webpage.


An illustration of the problem of sequence prediction

There are two steps to perform sequence prediction:

  1. First,  one must train a sequence prediction model using some previously seen sequences called the training sequences.  This process is illustrated below:seq_model

    For example, one could train a sequence prediction model for webpage prediction using the sequences of webpages visited by several users.

  2. The second step is to use a trained sequence prediction model to perform prediction for new sequences (i.e. predict the next symbol of a new sequence), as illustrated below.seq_model2For example, using a prediction model trained with the sequences of webpages visited by several users, one may predict the next webpage visited by a new user.

An overview of state-of-the-art sequence prediction models

Having defined what are the main sequence prediction models, that could be used in an application?

There are actually numerous models that have been proposed by researchers such as DG, All-k-order Markov, TDAG, PPM, CPT and CPT+.  These models utilize various approaches. Some of them uses for example neural networks, pattern mining and a probabilistic approach.

How to determine if a sequence prediction model is good?

Various sequence prediction models have different advantages and limitations, and may perform more or less well on different types of data. Typically a sequence prediction model is evaluated in terms of criteria such as prediction accuracy, the memory that it uses and the execution time for training and performing predictions.

Several benchmark have been done in the literature to compare various models. For example, here is a recent benchmark performed by my team in our PAKDD 2015 paper about sequence prediction with CPT+accuracy

In this benchmark, we compare our proposed CPT+ sequence prediction model with several state-of-the-art models on various types of data.  Briefly, BMS, MSNBC, Kosarak and Fifa are sequences of webpages. SIGN is a sign-language dataset. Bible word and Bible char are datasets of sequences of Words and characters. As it can be seen, for this type of data at least, CPT+ greatly outperform other models.  There are several reasons. One of them is that several models such as DG assume the Markovian hypothesis that the next symbol only depends on the previous symbols. Another reason is that the CPT+ model use an efficient indexing approach to consider all the relevant data for each prediction (see details in the paper).

Where can I get open-source implementations of sequence prediction models?

Some open-source and Java implementation of the seven discussed sequence prediction models (DG, AKOM, TDAG, LZ78, PPM, CPT, CPT+) can be found in the SPMF open-source data mining library, which includes the implementations from the IPredict project.

There are extensive documentation about how to use these models on the SPMF website. Here I will provide a quick example that shows how the CPT model can be easily applied with just a few lines of code.

               // Phase 1:  Training
                // Load a file containing the training sequences into memory
		SequenceDatabase trainingSet = new SequenceDatabase();
		trainingSet.loadFileSPMFFormat("training_sequences.txt", Integer.MAX_VALUE, 0, Integer.MAX_VALUE);

		// Train the prediction model
                String optionalParameters = "splitLength:6 recursiveDividerMin:1 recursiveDividerMax:5";
		CPTPredictor predictionModel = new CPTPredictor("CPT", optionalParameters);

		// Phase 2: Sequence prediction
                // We will predict the next symbol after the sequence <1,4>
		Sequence sequence = new Sequence(0);
		sequence.addItem(new Item(1));
		sequence.addItem(new Item(4));
		Sequence thePrediction = predictionModel.Predict(sequence);
		System.out.println("For the sequence <(1),(4)>, the prediction for the next symbol is: +" + thePrediction);

Thus, without going into the details of each prediction models, it can be seen that it is very easy to train a sequence prediction model and use it to perform predictions.


This blog post has introduced the task of sequence prediction, which has many applications. Furthermore, implementations of open-source sequence prediction models have been presented.

Philippe Fournier-Viger is a full professor  and the founder of the open-source data mining software SPMF, offering more than 110 data mining algorithms. If you like this blog, you can tweet about it and/or subscribe to my twitter account @philfv to get notified about new posts.

Related posts:

This entry was posted in Big data, Data Mining, Data science, Research and tagged , , , . Bookmark the permalink.

4 Responses to An Introduction to Sequence Prediction

  1. Great! Thanks a lot

  2. kar says:

    Great article!
    Quick question..
    Is sequential pattern mining a prerequisite for sequence prediction? A first step for sequence prediction could be to identify the various possible sequences, so can we use the sequence pattern mining for that?

    • Hello,

      Thanks. Yes you could use sequential patterns or sequential rules. Some people have done that. For example, in my ADMA paper from 2012, I have compared different types of sequential rules for sequence prediction:

      Fournier-Viger, P. Gueniche, T., Tseng, V.S. (2012). Using Partially-Ordered Sequential Rules to Generate More Accurate Sequence Prediction. Proc. 8th Intern. Conference on Advanced Data Mining and Applications (ADMA 2012), Springer LNAI 7713, pp. 431-442.

      However, a problem with this idea of using patterns for predictions, is that you need to set the minimum support. If you set this parameter to 20% for example, then you will miss all the rare cases that have a support below 20 %, and thus the prediction for these rare cases may not be good. But rare cases are also important if you want your model to have a high prediction accuracy. In other words, using patterns for predictions can be done, but it will work best for frequent cases and may not work well for rare cases because no pattern will be available. In some case, if you don’t have patterns, you cannot even make a prediction.

      For this reason, in my work after that, my student T. Gueniche have stopped using patterns and have designed CPT and CPT+. The idea of these models is that we do not need patterns because if we use patterns we loose information. Instead, we keep all the sequences and use some special structure to quickly find all the relevant sequences for making a given predictions. Thus, this approach can work both for frequent and rare cases. CPT and CPT+ were published in these papers:

      Gueniche, T., Fournier-Viger, P., Raman, R., Tseng, V. S. (2015). CPT+: Decreasing the time/space complexity of the Compact Prediction Tree. Proc. 19th Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD 2015), Springer, LNAI9078, pp. 625-636.

      Gueniche, T., Fournier-Viger, P., Tseng, V. S. (2013). Compact Prediction Tree: A Lossless Model for Accurate Sequence Prediction. Proc. 9th Intern. Conference on Advanced Data Mining and Applications (ADMA 2013) Part II, Springer LNAI 8347, pp. 177-188.

      Using CPT and CPT+ we obtain better prediction results than using sequential patterns/rules.

      So in my opinion, it is better to use an approach without patterns to get the best predictions.

  3. Pingback: Making Predictions with Sequences | A bunch of data

Leave a Reply

Your email address will not be published. Required fields are marked *