I have added two new webpages offering tools to count the frequency of each word and of each n-gram (sequence of consecutive words) in a text document. These webpages can be found here:
The Word Frequency Counter
First, let me show you the Word Frequency Counter so that you know what it can do.
The user interface is simple and easy to use, as shown below:
To use the tool, we must first select a file to upload and then set some options, such as whether to ignore case and punctuation. We can also choose an output format, such as the CSV format, the tab-separated format, or the SPMF format. Then we click the “calculate frequency” button to run the program.
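To illustrate what such a counter does, here is a minimal Python sketch. It is not the tool's actual code. It assumes that words are separated by whitespace and that "ignore punctuation" means treating ASCII punctuation characters as word separators. The function name count_words, its parameters, and the alphabetical tie-breaking are hypothetical choices made for this example.

```python
import string
from collections import Counter

# Translation table mapping every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def count_words(text, ignore_case=True, ignore_punctuation=True):
    """Return (word, frequency) pairs sorted by descending frequency.

    Hypothetical sketch: words are split on whitespace, and ties are
    broken alphabetically (the real tool's tie order may differ).
    """
    if ignore_case:
        text = text.lower()
    if ignore_punctuation:
        # Treat punctuation as a separator so "milk." and "milk" match.
        text = text.translate(_PUNCT_TO_SPACE)
    counts = Counter(text.split())
    # Highest frequency first; equal frequencies in alphabetical order.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


print(count_words("Buy milk. Buy eggs! buy bread"))
# [('buy', 3), ('bread', 1), ('eggs', 1), ('milk', 1)]
```

Replacing punctuation with spaces (rather than deleting it) keeps hyphenated words such as "open-source" from being glued into one token, at the cost of splitting contractions like "don't" into two words.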
For instance, let’s say that I try the tool on a small text file from my computer called TODO.txt. The results look like this:
We can see the words with their frequencies, sorted in descending order of frequency. In this case, I chose the default output format, but other output formats are offered:
If I choose CSV format, the results look like this:
If I choose the Tab-separated format, the results look like this:
And if I choose the SPMF format, the results look like this:
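For readers who want to produce similar files themselves, the CSV and tab-separated outputs can be approximated with Python's standard csv module. This is only a sketch: it assumes one row per word, with the word in the first column and its frequency in the second, and no header row. The tool's exact layout may differ, and the SPMF format is not reproduced here. The function name format_frequencies is hypothetical.

```python
import csv
import io


def format_frequencies(pairs, delimiter):
    """Render (item, frequency) pairs as delimited text, one pair per row.

    Assumed layout: item in column 1, frequency in column 2, no header.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(pairs)
    return buffer.getvalue()


pairs = [("buy", 3), ("bread", 1)]
print(format_frequencies(pairs, ","), end="")   # CSV: buy,3 / bread,1
print(format_frequencies(pairs, "\t"), end="")  # tab-separated
```

Using csv.writer instead of joining strings by hand means that a word containing the delimiter (possible when punctuation is not ignored) is quoted automatically, so the file can still be parsed correctly.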
The Ngram Frequency Counter
Now let me show you the Ngram Frequency Counter. The user interface is very similar, except that there is an additional parameter: the n-gram size. For example, if we set the n-gram size to 3, the tool will count the frequency of every sequence of 3 consecutive words in the text document.
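The idea of counting sequences of n consecutive words can be sketched in a few lines of Python. This is an illustration, not the tool's implementation: it assumes the input is a list of words that has already been lowercased and stripped of punctuation if desired, and that n-grams may span sentence boundaries (the text is treated as one flat sequence of words). The function name count_ngrams is hypothetical.

```python
from collections import Counter


def count_ngrams(words, n):
    """Count every run of n consecutive words.

    Returns (ngram, frequency) pairs sorted by descending frequency,
    with ties broken alphabetically.
    """
    if n < 1:
        raise ValueError("n-gram size must be at least 1")
    # Zipping n shifted copies of the list yields each window of n words;
    # a text of W words therefore has W - n + 1 windows (none if W < n).
    windows = zip(*(words[i:] for i in range(n)))
    counts = Counter(" ".join(window) for window in windows)
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


words = "to be or not to be".split()
print(count_ngrams(words, 2))
# [('to be', 2), ('be or', 1), ('not to', 1), ('or not', 1)]
```

With n = 1 this reduces to plain word counting, and increasing n generally produces more distinct n-grams with lower frequencies, since longer word sequences repeat less often.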
Here is an example with the same text file and an n-gram size of 3:
Now, let’s say that I change the n-gram size to 4. Here is the result:
As with the Word Frequency Counter, we can also change the output format.
Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 250 data mining algorithms.