Finding a Data Scientist Unicorn or Building a Data Science Team?

In recent years, many blog posts have been trending on the social Web about what a “data scientist“ is, as this term has become very popular. As there is much hype around this term, some people have even jokingly said that a “data scientist is a statistician who lives in San Francisco“.

In this blog post, I will talk about this recent discussion about what a data scientist is, which has led some people to claim that there are easy signs for recognizing a bad data scientist or a fake data scientist. In particular, I will discuss the blog post “10 signs of a bad data scientist” and explain why this discussion is going the wrong way.

10 signs of a bad data scientist
In that blog post, the authors claim that a data scientist must be good at math/statistics, good at coding, good at business, and know most of the popular tools, from Spark, Scala, Python and SAS to Matlab.

What is wrong with this? The obvious problem is that it is not really necessary to know all these technologies to analyze data. For example, a person may never have to use Spark to analyze data, and will rarely use all these technologies in the same environment. But more importantly, this blog post seems to imply that a single person should replace a team of three persons: (1) a coder with data mining experience, (2) a statistician, and (3) someone good at business. Do we really need one person who can replace these three persons? The problem with this idea is that a person will always be stronger on one of these three dimensions and weaker on the two others. A person who possesses skills in all three dimensions, and is also excellent in all three, is quite rare. Hence, I call such a person the data scientist unicorn: a person so skilled that he or she can replace a whole team.

The data scientist unicorn


In my opinion, instead of thinking about finding that unicorn, the discussion should rather be about building a good data science team, consisting of three appropriate persons who are respectively good at statistics, computer science, and business, and who also have enough background/experience to discuss with the other team members. Thus, perhaps we should move the discussion from what is a good data scientist to what is a good data science team.

A data science team


An example

I will now discuss my own case as an example to illustrate the point that I am trying to make. I am a researcher in data mining. I have a background in computer science and I have worked for 8 years on designing highly efficient data mining algorithms to analyze data. I am very good at this (I am even the founder of the popular Java SPMF data mining library). But I am less good at statistics, and I have almost no knowledge about business.

But this is fine, because I am an expert at what I am doing in one of these three dimensions, and I can still collaborate with a statistician or someone good at business when I need to. I should not replace a statistician. And it would be wrong to ask a statistician to do my job of designing highly efficient algorithms, as it requires many years of programming experience and excellent knowledge of algorithmics.

A risk with the vision of the “data scientist unicorn” who is good at everything is that such a person may not be an expert at any of those things.

Perhaps a solution for training good data scientists is the new “data science” degrees that aim at teaching a little bit of everything. I will not say whether these degrees are good or not, as I have not looked at these programs. But there is always the risk of training people who can do everything but are not expert at anything. Thus, why not instead try to build a strong data science team?

==

Philippe Fournier-Viger is a full professor  and also the founder of the open-source data mining software SPMF, offering more than 110 data mining algorithms.

If you like this blog, you can tweet about it and/or subscribe to my twitter account @philfv to get notified about new posts.

Posted in Big data, Data Mining, Data science | Leave a comment

Brief report about the ACM SAC 2016 conference

Last week, I attended the ACM SAC 2016 conference (31st ACM Symposium on Applied Computing). It was held in Pisa, Italy, from the 4th to the 8th of April 2016. In this blog post, I will briefly discuss this conference.

ACM SAC 2016


About the SAC conference

This is not the first time that I have attended the SAC conference. I had papers published at this conference in 2011, 2013, 2015, and now 2016. It is a conference where it is possible to submit papers to multiple tracks, where each track has its own topic. The topics are quite varied, ranging from programming languages to intelligent virtual learning environments to data mining. Personally, I am particularly interested in the data mining and data stream tracks, which are somewhat competitive and publish some high-quality papers. Last year, for example, the acceptance rate of the data mining track for full papers was about 20%.

One may think that a drawback of this conference is that it is not as specialized as a conference on data mining, for example. However, it is also possible to see this as an advantage, as it allows one to talk and exchange ideas with people from other fields. Also, an interesting aspect of this conference is that it is friendly toward research with applications (hence the name “Symposium on Applied Computing”).

Also, this conference is a good place for meeting European researchers, as it was held in Europe. But it was quite international, with a great number of people coming from other continents. I also had good discussions with some well-known researchers.

Finally, another good thing about this conference is that it is published by the ACM, and the proceedings are indexed by DBLP and other important indexes in computer science.


SAC 2016 – Opening

About the organization

The conference was well-organized in general. There is always something to complain about, but overall the organizers did a very good job, as did the student volunteers.

The location was also quite good. Italy had good weather and the people were friendly. I even had the occasion to take a few pictures with the Leaning Tower of Pisa.

The data mining track

I attended most of the talks from the data mining track. The main topics were: clustering, scalable algorithms for big data, graph sampling, associative classifiers, classification, itemset mining, and visualization.

My colleagues and I presented two papers, respectively on (1) a new algorithm for mining recent weighted frequent itemsets, and (2) a new efficient algorithm for mining closed high-utility itemsets named EFIM-Closed. For the latter paper, I received a nomination for the best poster award.

Nomination for the best poster award for the EFIM-Closed algorithm at SAC 2016


SAC 2017 conference

It was announced that the SAC 2017 conference will be held in Marrakech, Morocco, March 27-31, 2017. It may be very interesting to attend this conference next year, for the reasons above, but also for the location, as big international conferences are not often held in Morocco.

ACM SAC 2017 conference


That is all for today (it was just intended to be a short blog post)!

==

Philippe Fournier-Viger is a full professor  and also the founder of the open-source data mining software SPMF, offering more than 110 data mining algorithms.

If you like this blog, you can tweet about it and/or subscribe to my twitter account @philfv to get notified about new posts.

Posted in Big data, Conference, Data Mining, Research, Uncategorized | 3 Comments

News about SPMF

Some quick news about the SPMF project.

  • First, this month I have made a few updates to SPMF. Two new algorithms have been added: USpan (for high-utility sequential pattern mining) and FCHM (for correlated high-utility itemset mining). Moreover, I have fixed a few bugs and added a new window for visualizing the patterns found. I hope you will enjoy this new version of SPMF.
  • Second, this week I read a nice blog post from a user of SPMF that I would like to share with you. The title is “Sequential pattern mining, the initial attempts”, from the Gigantic Data blog by Ryan Panos. You may want to read it.

==

Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 110 data mining algorithms.

Posted in spmf | Tagged , , , , | Leave a comment

SPMF data mining library 0.98: new pattern visualization window

This blog post is to let you know that I have just published a new version of the SPMF open-source Java data mining library (0.98), which offers a new window for visualizing the patterns found by data mining algorithms. This window should be more convenient than the text editor for visualizing results, as it offers sorting and filtering functions. Here is a picture:

This window is specifically designed for visualizing patterns found by pattern mining algorithms. But it also works with some other algorithms offered in SPMF, such as clustering algorithms.

How can this new window be accessed? When using the graphical interface of SPMF, select the checkbox “Open output file using SPMF viewer“. The new window shows the patterns found by an algorithm in a table, and it lets the user apply filters to select patterns, or sort the patterns in ascending or descending order using various measures such as support and confidence (depending on the algorithm) by clicking on the column headers.

This window for visualizing patterns should work with most algorithms offered in SPMF. If you find some bugs related to this new window, you may let me know. Also, if you have some ideas for improvements, or want to participate in improving the user interface of SPMF, you may let me know.

Hope you will like it! Also, thanks again to all users of SPMF for your support.

==

Philippe Fournier-Viger is a professor and also the founder of the open-source data mining software SPMF, offering more than 100 data mining algorithms. Follow me on Twitter: @philfv

Posted in Data Mining, General, open-source, Research, spmf | Tagged , , , , , | Leave a comment

The ADMA 2015 conference has been cancelled

Recently, I submitted a few papers to the ADMA 2015 conference (11th conference on Advanced Data Mining and Applications), which was supposed to be held at Fudan University, China, in January 2016 (despite being named 2015).

The website of ADMA 2015 conference  ( http://adma2015.fudan.edu.cn ) has been online since the end of October.


The ADMA 2015 conference website

I had submitted two data mining papers to the conference using their EasyChair website, and some colleagues of mine also submitted papers:

ADMA2015 easychair


According to the website, the notification deadline of the ADMA 2015 conference was supposed to be the 1st of October. I sent an e-mail to the organizers on the 24th of September to ask about the deadline, and they replied that there would be at least a two-week extension.


Email from the ADMA 2015 conference organizers

Then, time passed, and I did not receive any notification about the fate of my papers.

I thus sent e-mails to the organizers (on the 28th of October, 10th of November, and 17th of November) to ask when we would get the notification for our papers. But the organizers of the ADMA 2015 conference did not answer any e-mails from me or my colleagues. I tried the official e-mail address of the conference, 11thadma@fudan.edu.cn, and also the e-mail of the organizer directly. But no answer.

It is now the 30th of November, and it seems that the website of the ADMA 2015 conference has been taken down.

ADMA 2015 website on November 30th 2015

I thus have to conclude that the conference has most likely been cancelled. But why? And why not answer the e-mails, or let us know?

It is a pity because I actually enjoyed the previous ADMA conferences.

If I receive further news about what is happening, I will update the blog post. Hopefully, we will know soon what is happening with the papers that have been submitted.

Update in January 2016: It is clear that the conference has been cancelled, although the organizers never bothered to inform the authors or answer their e-mails about the status of the conference. This is really unprofessional.

Update in 2017: After the failed ADMA 2015 conference, the ADMA conference came back in 2016 with an edition in Australia. That conference was not organized by the organizers of the failed ADMA 2015 conference, and to my knowledge it was a success. So I am looking forward to ADMA 2017. It is also interesting that the website of the failed ADMA 2015 conference is suddenly back online: http://ndbc2011.fudan.edu.cn/

Posted in Big data, Conference, Data Mining | 5 Comments

Interview with the SPMF library founder

Today, I will just write a short blog post to let you know that I was recently interviewed on Rahaman’s blog. The interview covers various topics, such as (1) why I created the SPMF data mining library, (2) why I chose to work in academia instead of in industry, (3) what skills are required to be a successful researcher, and (4) how to improve writing skills for research. You can read the interview here:

http://rahablog.com/index.php/2015/10/10/interview-with-dr-philippe-fournier-viger/


==

Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 100 data mining algorithms. Follow me on Twitter: @philfv


Posted in Big data, Data Mining, Data science, Research, spmf | Tagged , , , | 2 Comments

200,000 visitors on the SPMF website!

Today, I will just write a short blog post to mention that the SPMF open-source data mining library has recently passed the milestone of 200,000 visitors. This is thanks to the support of all users of SPMF, and the contributors who have provided source code, reported bugs, and more. Here is a chart showing the growing popularity of SPMF.


Thanks for your support!

Philippe Fournier-Viger, Ph.D.
Founder of the SPMF data mining library

Posted in Data Mining, Data science, open-source, Research, spmf | Tagged , , , , , | Leave a comment

Top 5 Data Mining Books for Computer Scientists

I have often been asked what are some good books for learning data mining. In this blog post, I will answer this question by discussing some of the top books for learning data mining and data science from a computer science perspective.

These books are especially recommended for those who are interested in learning how to design data mining algorithms and who want to understand the main algorithms as well as some more advanced topics.

  1. “Introduction to data mining” by Tan, Steinbach & Kumar (2006)

This book is a very good introduction to data mining that I have enjoyed reading. It discusses all the main topics of data mining: clustering, classification, pattern mining and outlier detection. Moreover, it contains two very good chapters on clustering by Tan & Kumar, who are specialists in this domain. What I like about this book is that the chapters explain the techniques with enough detail to give a good understanding of the techniques and their drawbacks, unlike some other books that do not go into details. Some free sample chapters of the book can be found here. Before buying this book, note that a 3rd edition has been announced to be released soon, although it has been delayed for more than a year.

2. Data Mining: Concepts and Techniques, Third Edition by Han, Kamber & Pei (2013)


This is another great book that I like. I have also used it for teaching data mining. Like the previous book, it covers all the main topics that a good data mining course should cover. However, this book is more like an encyclopedia: it covers a lot of topics and gives a very broad view of the field, but does not cover each topic in much detail. It is also designed for a computer science audience. Besides, it is written by some top data mining researchers (Han & Pei).

3. Data Mining and Analysis: Fundamental Concepts and Algorithms by Zaki & Meira (2014)


This is another great data mining book, written by a leading researcher (Zaki) in the field of data mining. It also targets computer scientists. This book covers all the main topics of data mining, but also has some chapters on advanced topics such as graph mining, which are very interesting. A version of the book for personal use only is offered freely here. The algorithms in this book are very detailed, and it is possible to implement them by reading the book. In general, a few algorithms are presented in each chapter. They are not always the best algorithms, but are often the most popular (the classical algorithms).

4.  Data Mining: The Textbook  by Aggarwal (2015)


This is probably one of the top data mining books for computer scientists that I have read recently. It also covers the basic topics of data mining, as well as some advanced topics. Moreover, it is very up to date, being a very recent book. It is written by a top data mining researcher (C. Aggarwal). It also covers many recent and advanced topics such as time series, graph mining and social network mining, which are not covered in several other books.

5. “The Elements of Statistical Learning” by Friedman et al. (2009)

This is a quite popular book that is a little bit more focused on statistics. It covers many data mining techniques such as neural networks, association rule mining, SVM, regression, and clustering, among other topics. What is interesting about this book is that it is a top book used in many university courses like the others, and it can be downloaded for free here.

Conclusion

In this blog post, I have discussed some of the top books for learning data mining algorithms for computer scientists. I have tried to discuss general books that give a good foundation for learning data mining and that can also be interesting for advanced topics. However, note that if one is interested in specific topics such as recommender systems and text mining, there also exist some specialized books that cover only these topics in detail, which may also be interesting.

==

That is all I wanted to write for now. If you like this blog, you can tweet about it and/or subscribe to my twitter account @philfv to get notified about new posts.

Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 52 data mining algorithms.

Posted in Data Mining, Research | 3 Comments

How to design memory-efficient data mining algorithms in Java?

A while ago, I wrote a blog post about How to measure the memory usage of algorithms in Java. Today, I will discuss the topic of optimizing the memory usage of algorithms written in Java, to design memory-efficient data mining algorithms. The reason for writing about this topic is that memory consumption is often a challenge when implementing data mining algorithms in Java.


Brief overview of Java memory management

I will first give a brief overview of the Java memory management model. In Java, unlike in many other languages such as C++, the user generally does not have the power to finely control how the memory is managed. The user can allocate memory by creating objects or variables. However, there is no simple way to control when the memory will be freed. In Java, the Garbage Collector (GC) is the process responsible for automatically freeing the memory. Using a GC has the advantage of making programming in Java easier, and of avoiding memory leaks and other memory-related problems. However, using a GC makes the memory usage much less predictable. The reason is that there is absolutely no guarantee about when the GC will free the memory used by a Java program. The GC periodically checks references to objects, and when an object is not referenced anymore, it may be freed. There is a common myth that calling System.gc() will make the garbage collector immediately free the memory. However, this is not the case: System.gc() is only a hint to the JVM.

So how to optimize memory usage in Java?

There are many ways to optimize memory usage. I will discuss a few principles for optimizing memory usage in Java below, and I will then provide a detailed example with some Java code.

1) Using memory-efficient data structures. A first principle for optimizing memory usage is to use memory-efficient data structures when possible. For example, one may consider using an array of integers instead of an ArrayList, because an ArrayList introduces some memory overhead. Another example: one may use int instead of Integer. But there is sometimes a trade-off between memory usage and execution time. Thus, one should not just think about memory when choosing a data structure, but also about execution time.
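To make this principle concrete, here is a minimal runnable sketch (the class name DataStructureDemo is just for illustration, not part of SPMF). It stores one million integers in both an int[] and an ArrayList<Integer>: the array stores 4 bytes per element, while the list stores a reference to a boxed Integer object, which typically costs around 16 bytes plus the reference itself.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: the same one million integers stored in a
// primitive int[] (about 4 MB of data) and in an ArrayList<Integer>
// (each element is a boxed Integer object plus a reference to it,
// so roughly 4-5 times more memory on a typical JVM).
public class DataStructureDemo {
    public static void main(String[] args) {
        int n = 1_000_000;

        int[] array = new int[n];                 // compact primitive storage
        List<Integer> list = new ArrayList<>(n);  // boxed storage
        for (int i = 0; i < n; i++) {
            array[i] = i;
            list.add(i);  // autoboxing creates an Integer object
        }

        // Both structures hold exactly the same values;
        // only the memory layout differs.
        long arraySum = 0, listSum = 0;
        for (int i = 0; i < n; i++) {
            arraySum += array[i];
            listSum += list.get(i);
        }
        System.out.println(arraySum == listSum); // prints "true"
    }
}
```

The exact savings depend on the JVM, but the primitive array is also friendlier to the CPU cache, which often helps execution time as well.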

2) Reducing object creation. An important thing to know in Java is that garbage collection is VERY costly. In particular, if a Java program reaches the memory limit of the JVM, the program may suddenly become very slow because of the GC (see my previous blog post explaining this issue). Thus, a very good strategy for optimizing Java algorithms is to design them such that variables/objects are reused as much as possible (when it makes sense), rather than creating new ones. If fewer variables/objects are created, then less memory will be used, and the GC will have less work to do, which may also improve speed. For example, imagine a for loop that repeatedly creates objects. It is sometimes possible to declare a single variable/object outside the for loop and reuse it. Again, whether this should be done or not depends on the overall picture. In general, one should focus on optimizations that are meaningful, and avoid micro-optimizations. Moreover, performing optimizations should ideally not decrease the code readability or maintainability.

A variable or object that is reused can be called a buffer object.

A detailed example.

I will now present a detailed example showing how the two above principles can be used to improve the memory efficiency of some code. The example is abstract and can be applied to most depth-first search pattern mining algorithms. The solution that I will present was applied to algorithm implementations of the SPMF data mining library, written in Java.

Consider the following code. A List of Integer is first created. Then a recursive method is called. This recursive method copies the list, adds an element to the copy, and then recursively calls itself. This method is not memory-efficient, since every time it is called it creates a new List object. This can be quite slow because of object creation time. But moreover, the GC will have to free all these objects, which will also slow down the program.

static public void main(String[] args) {
    List<Integer> list = new ArrayList<Integer>();
    recursiveMethod(list);
}

static public void recursiveMethod(List<Integer> list) {
    // make a copy of the list
    List<Integer> copyOfList = new ArrayList<Integer>();
    copyOfList.addAll(list);

    // Add some new integer to the list
    int integer = ... ;
    // ...
    copyOfList.add(integer);

    // .... the method is called recursively in a for loop
    recursiveMethod(copyOfList);
}

Now, let’s look at a better solution, below. First, instead of using a List of Integer, an array of integers is used. This is already better in terms of memory, since it is a simpler data structure. Second, the array of integers is used as a buffer object: instead of repeatedly creating List objects, the same buffer is always reused. The buffer is initialized with a large enough size (for example, 500 entries). This version of the code is much faster because (1) it is not necessary to always create new objects, (2) the list is not copied anymore, and (3) only a single item is written for each recursive call!

static public void main(String[] args) {
    int[] buffer = new int[500];
    int currentSize = 0;
    recursiveMethod(buffer, currentSize);
}

static public void recursiveMethod(int[] buffer, int bufferSize) {
    // Add some new integer to the list
    int integer = ... ;
    buffer[bufferSize++] = integer;

    // .... the method is called recursively in a for loop
    recursiveMethod(buffer, bufferSize);
}

The above solution is extensively used in algorithm implementations of the SPMF data mining library. In some cases, it allowed reducing memory usage by half.
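To show the buffer technique end to end, here is a small self-contained sketch (the class and method names are mine, not from SPMF). It enumerates all non-empty subsets of a tiny set of items depth-first, reusing a single int[] buffer across every recursive call instead of copying a list at each call; since the search is depth-first, each call only needs to overwrite one cell of the shared buffer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of the shared-buffer pattern: enumerate the
// non-empty subsets of {1, 2, 3} depth-first without copying a
// list at each recursive call.
public class BufferDemo {
    static List<String> results = new ArrayList<>();

    public static void main(String[] args) {
        int[] buffer = new int[16]; // large enough for the deepest recursion
        recurse(new int[]{1, 2, 3}, 0, buffer, 0);
        System.out.println(results.size() + " subsets found"); // prints "7 subsets found"
    }

    // buffer[0..bufferSize-1] holds the current pattern; each loop
    // iteration overwrites one cell, records the pattern, and recurses
    // on the remaining items (the pattern itself is never copied).
    static void recurse(int[] items, int start, int[] buffer, int bufferSize) {
        for (int i = start; i < items.length; i++) {
            buffer[bufferSize] = items[i];
            results.add(patternToString(buffer, bufferSize + 1));
            recurse(items, i + 1, buffer, bufferSize + 1);
        }
    }

    static String patternToString(int[] buffer, int size) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < size; i++) {
            if (i > 0) sb.append(' ');
            sb.append(buffer[i]);
        }
        return sb.toString();
    }
}
```

Note that because bufferSize is passed by value, sibling iterations of the for loop simply overwrite each other's cells; this is exactly why the trick is safe for depth-first exploration but not for algorithms that must keep several partial patterns alive at once.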

Conclusion

In this blog post, I have discussed the problem of designing memory-efficient algorithms in Java. I have presented a few principles, and then a detailed example of how to optimize data mining algorithms in Java. Hope you have enjoyed this post. In future blog posts, I will discuss more examples of memory optimizations, if there is enough interest in this topic!

==

That is all I wanted to write for now. If you like this blog, you can tweet about it and/or subscribe to my twitter account @philfv to get notified about new posts.

Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 52 data mining algorithms.

Posted in Uncategorized | Leave a comment

The SPMF data mining library: a brief history and what’s next?

In this blog post, I will talk about SPMF, the well-known open-source library of data mining algorithms implemented in Java, of which I am the founder. I will give a brief overview of its history, discuss some lessons learned from the development of this library, and then give a glimpse of what is next for its development.

A brief history of SPMF

The first version of this library was designed at the end of 2008 as a term project for a data mining course during my Ph.D. at the University of Quebec in Montreal. At that time, I had implemented about five algorithms, such as Apriori and AprioriClose. The code was not so great, and there was no website. It was just an unnamed project. 😉

Then, in 2009, I started to work on implementing and developing new sequential pattern mining algorithms for my Ph.D. project, and to add them to the same project. I added several algorithms such as PrefixSpan and BIDE. I then launched the SPMF website during the summer of 2009, and chose the name SPMF for the project. At that time, the website had little information. It just provided a few instructions about how to download the library and use it.

Over the years, I have added many more algorithms to the library. There are now more than 90 algorithms offered in SPMF. I have implemented many of them in my spare time, some for my research and some just for my personal satisfaction. Several contributors have also provided source code of algorithms for the library, and have reported bugs and made suggestions, which has greatly helped the project. In the last few years, I have also added a graphical user interface and a command-line interface to SPMF.


The SPMF graphical user interface

The source code of SPMF has been much improved over the years. Originally, there was a lot of duplicated code in the project. In 2012-2013, I made a major refactoring of the source code that took about one month. I removed as much duplicated code as possible. As a result, the number of source code files in the project was reduced by 25%, and the number of lines of code was reduced by 20%. Moreover, I added about 10,000 lines of comments during this refactoring. In the last two years, I have also added several optimizations to the source code of SPMF, because some code written in the early years was not really optimized, as I did not yet have enough experience implementing data mining algorithms.

Since then, SPMF has become quite popular. It is an especially important library in the field of pattern mining (discovering patterns in databases). The number of visitors to the website recently reached 190,000. Moreover, SPMF was cited or used in about 190 research papers in the last few years, which is awesome. Here is a brief overview of the number of visitors to the website:

The lessons learned

From the SPMF project, I have learned a few general lessons about providing an open-source project.

  • It is important to provide high-quality documentation of how to use the library. If there is no appropriate documentation on the website, then users will always ask questions about how to do this or that, and the developers will spend a lot of time answering these questions. Users will also be less likely to use the library if it is too complicated to use. On the contrary, if good documentation is provided, then most users will find answers in it. Thus, the developers will spend less time answering the same questions, and users are more likely to use the software. Over the years, I have updated the website so that it provides information for the most common questions. I have also added a developer’s guide, documentation of how to use each algorithm, etc., to try to make the software as easy to use as possible.
  • The code should follow standard conventions and be well-documented. To make an open-source project easily reusable and understandable by other users, the code should contain a good amount of comments, be well-structured, and follow commonly used conventions. For example, in Java, there are standard conventions for writing code and documenting it with Javadoc. In SPMF, I have tried to follow these conventions as much as possible. As a result, several users have told me that the code of SPMF is very easy to understand. It is important to write good code. I understand that many programmers may not like documenting their code, but it is important to do so, as it makes the code much more understandable for users.
  • It is important to choose an appropriate license for an open-source project. I originally chose the Creative Commons license for SPMF in 2009. But I then noticed that it was rarely used for licensing software. I thus read about several licenses and chose the more commonly used GPL, which I prefer.
  • Listen to the users. It is important to listen to what users need in terms of features. This gives a good indication of what should be included in the software in the next releases. If many users request a specific feature, it is probably very important to provide it.

What’s next?

So what is next for SPMF?  I intend to continue developing this library for at least several years 😉

I have currently implemented several new algorithms that have not yet been released, such as FOSHU, d2Hup, USpan, TS-Houn, HUP-Miner, GHUI-Miner, and HUG-Miner, mainly for high-utility pattern mining. My students have also implemented several others for sequence prediction and pattern mining, such as CPT+, CPT, DG, TDAG, AKOM, LZ78, EFIM, and HUSRM. All these algorithms should be released soon in SPMF. I think that several of them may be released in a new major release in September or October. Thus, SPMF should reach the milestone of 100 algorithms before the end of 2015.

Other improvements that I would like to add in the future include handling more file types as input. For example, it would be great to add a module for converting text files to sequences for sequential pattern mining. Another idea is to add visualization capabilities. Currently, the results of most algorithms offered in SPMF are presented to the user as text files. It would be great to add some visualization modules. Yet another idea is to add modules for automatically running experiments to compare algorithms. This would be especially useful for data mining researchers who wish to compare the performance of data mining algorithms.

For the future, I also hope that more collaborators will contribute source code to the project. Several researchers have used SPMF in their projects, but not many have given back source code. It would be great if more users could provide source code when proposing new algorithms. This would greatly help the project. If more students or professors would like to contribute, they would also be very welcome.

Also, another important way to help the project is to cite SPMF in your papers if you have used it in your research. It should preferably be cited as follows:

Fournier-Viger, P., Gomariz, A., Gueniche, T., Soltani, A., Wu, C.-W., Tseng, V. S. (2014). SPMF: a Java Open-Source Pattern Mining Library. Journal of Machine Learning Research (JMLR), 15: 3389-3393.

Lastly, I would like to say thank you to everyone who has supported the SPMF library over the years, either by contributing code, reporting bugs, using the software, or citing it. This is great!

This is all for today. I just wanted to discuss the current state of SPMF and what is next. I hope that you enjoyed reading this blog post. If you want to get notified of future blog posts, you may follow my Twitter account @philfv.

Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 80 data mining algorithms.

Posted in Data Mining, open-source, Programming, Research, spmf | Tagged , , , | Leave a comment