The Data Blog

Brief report about the SMARTDSC 2022 conference

Posted on 2022-07-20 by Philippe Fournier-Viger

This week, besides IEA AIE 2022, I am also participating to the SMARTDSC 2022 conference (5th international conference on Smart Technologies in Data Science and Communication) as general co-chair and keynote speaker. I will give a brief report about this conference in this post.

What is SMART-DSC?

SMARTDSC is a conference organized by the KL (deemed to be) University in India in collaboration with several international researchers. This is the fifth edition of the conference. The conference focuses on data science, communication and smart technologies and the quality is good. This year, over 150 papers have been received and less than 20% have been accepted for oral presentation, which makes this conference competitive. The proceedings are also published by Springer, which ensures indexing and a good visibility for papers.

The accepted papers are from oven ten different states in India and also from 5 other countries. There is also an excellent line-up of eight keynote speakers for the conference from various countries including Turkey, Egypt, China, France, and Malaysia.

The first keynote talk was by Shumaila Javaid affliated to Shanghai Research Institute for Intelligent Autonomous Systems in China. The talk was about medical sensors and their integration for pervasive healthcare. This was a quite interesting topic has it has the real-life applications that may change lives.

Then, I gave a keynote talk about the automatic discovery of interesting patterns in data. Here are a few slides of my talk where I introduced various topics.

There was then several paper presentations followed by other keynote talks. I will try to add more details about these presentations later, in this blog post. The SMART DSC conference is held for three days. I am attending the conference at different moments during these three days as I have to participate to two conferences at the same time (SMARTDSC 2022 and IEA AIE 2022).

On overall, SMARTDSC 2022 is an interesting conference. It is especially great for participants in India for the convenience of travelling but it is also international with several participants, and keynote speakers from abroad. I am happy to participate to it.

—
Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.

Posted in Conference, Data Mining, Data science | Tagged communication, conference, data mining, data science, india, smart technologies, smartdsc, smartdsc2022 | Leave a comment

Brief report about the IEA AIE 2022 conference

Posted on 2022-07-19 by Philippe Fournier-Viger

This week, I am attending the 35th International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems conference (IEA AIE 2022). I will give a brief report about the conference

Program

This year, the conference received 127 paper submissions, from which 65 have been accepted as full papers, and 14 as short papers. All the papers have been reviewed by at least 3 members from the program committee.

Opening ceremony

The IEA AIE 2022 conference was held in Japan in hybrid mode. I think the majority of attendants were online but there was still many people attending in person. On the first day, there was the opening ceremony. The conference was introduced including the program and other aspects.

Paper presentations

There was several paper presentations, covering many different topics such as: industrial applications, health informatics, optimization, video and image processing, natural language processing, agent and group-based systems, pattern recognition, security.

Here is screenshots from some presentations, that I have attended.

This is a paper about air pollution, which use image processing combined with a periodic pattern mining algorithm to obtain good detection:

Below is a paper from my collaborators about parallel high utility itemset mining based on Spark. In that paper some good results are obtained where a parallel version of EFIM and d2HUP provides some good speed-up (up to 20 times) over the sequential versions of those algorithms for mining high utility itemsets.

There was also an interesting paper about weighted sequential pattern mining in uncertain data:

There was also many other papers that I have listened too. I will not report on all of them.

Keynote talks

At IEA AIE 2022, there was two keynote talks. The first keynote was by Prof. Tao Wu from Shanghai University of Medicine & Health Sciences about health informatics.

The second keynote talk was by Prof. Sebastian Ventura from University of Cordoba, Spain about Improving Predictive Maintenance with Advanced Machine Learning. He talked about how to build models and systems to prevent failure from happening in industrial systems by doing maintenance in advance (predictive maintenance – PdM). He explained that various techniques can be used such as for outlier detection and classification. Prof. Ventura told that he is doing a project for the maintenance of military vehicles. Following the talk, there was a good discussion with conference participants. Prof. Ventura explained that building simple models is good but it is not necessarily the most important. A complex model can be acceptable if it is explainable. In fact, he said that it is more important to have explainable models because in real-applications, models often need to be verified by domain experts. Here are a few slides from that talk about the introduction:

Here are some slides about potential data mining techniques that can be used:

And here are some techniques that have been used in the specific project for predictive maintenance of vehicles:

Here are some challenges and open problems and the conclusion from the talk:

If you are interested by this topic, you may also check the survey paper published recently by Prof. Ventura:

A. Esteban, A Zafra & S. Ventura. Data Mining in Predictive Maintenance Systems. WIREs DMKD. https://doi.org/10.1002/widm.1471

Conclusion

That’s all for this. You can see some of my reports about previous editions of that conference here: IEA AIE 2016, IEA AIE 2018, IEA AIE 2019 , IEA AIE 2020, and IEA AIE 2021.

Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.

Posted in artificial intelligence, Conference, Machine Learning | Tagged applications, applied artificial intelligence, artificial intelligence, conference, iea aie, ieaaie2022, machine learning | 4 Comments

How to draw subgraphs side-by-side in Latex? (with TIKZ)

Posted on 2022-07-09 by Philippe Fournier-Viger

Today, I will give an example of how to draw a figure containing three subgraphs that appear side by side in Latex using the TIKZ library, and where each subgraph has a caption. This can be useful when writing research papers, where we want to discuss different types of subgraphs.

The result will be like this:

And here is the Latex code:

\documentclass{article} 
\usepackage{caption}
\usepackage{subcaption}
\usepackage{tikz}
\usetikzlibrary{automata,arrows,positioning,calc}

\begin{document}


\begin{figure}
  \begin{subfigure}[b]{0.30\textwidth}
    \centering
      \begin{tikzpicture}[> = stealth,  shorten > = 1pt,   auto,   node distance = 1.5cm]
        \node[state] (v) {$A$};
        \node[state] (w) [right of=v] {$B$};
        \node[state] (t) [right of=w] {$C$};

		\path[->] (v)  edge node {0} (w);
		\path[->] (w) edge   node  {1}(t);
      \end{tikzpicture}
    \caption{Subgraph 1}
  \end{subfigure}
  \begin{subfigure}[b]{.30\textwidth}
    \centering
      \begin{tikzpicture}[> = stealth,  shorten > = 1pt,   auto,   node distance = 1.5cm]
        \node[state] (x) {$E$};
        \node[state] (y) [below of=x] {$F$};
        \node[state] (n) [right of=x] {$G$};
        \node[state] (z) [below of=y] {$H$};


		\path[->] (x)  edge node {0} (y);
		\path[->] (x)  edge node {0} (n);
		\path[->] (n)  edge node {0} (z);
		\path[->] (y) edge   node  {1}(z);
      \end{tikzpicture}
    \caption{Subgraph 2}
  \end{subfigure}
  \begin{subfigure}[b]{.30\textwidth}
    \centering
      \begin{tikzpicture}[> = stealth,  shorten > = 1pt,   auto,   node distance = 1.7cm]
        \node[state] (g) {$A$};
        \node[state] (h) [above of=g] {$B$};
        \node[state] (e) [above right= 0.3 cm and 0.3 cm of g] {$C$};
		\path[->] (g)  edge node {} (e);
		\path[->] (g)  edge node {} (h);
		\path[->] (h) edge   node  {}(e);
      \end{tikzpicture}
    \caption{Subgraph 3}
  \end{subfigure}
\caption{Three subgraphs}
\end{figure}

In this code, I use the automata package of TIKZ, which is great for drawing graphs. You could also use other packages and tweak the above example.

Hope this is useful.

—
Philippe Fournier-Viger is a distinguished professor working in China and founder of the SPMF open source data mining software.

Posted in Latex | Tagged figure, graph, latex, subgraph, tikz | 1 Comment

(videos) Introduction to sequential rule mining + the CMRules algorithm

Posted on 2022-07-07 by Philippe Fournier-Viger

I have made two new videos to explain interesting topics about pattern mining. The first video is an introduction to sequential rule mining, while the second video explains in more details how the CMRules algorithm for sequential rule mining works!

You can watch the videos here:

An Introduction to Sequential Rule Mining (pdf / ppt / video – 33 min )
The CMRules algorithm (pdf / ppt / video – 32 min)

And you can also find them on my Youtube Channel.

If you want to try these algorithms, you can check the SPMF open-source software, which offers fast implementations of these algorithms.

Hope you will enjoy the videos. I will make more videos about pattern mining soon. By the way, you can also check my website about The Pattern Mining course. It gives videos and slides for a free online course on pattern mining. It explains all the main topics about pattern mining and is good for students who are starting to do research in this area. But this course is in beta version, which means that I am still updating it. More videos and content will be added over time.

—
Philippe Fournier-Viger is a distinguished professor working in China and founder of the SPMF open source data mining software.

Posted in Data Mining, Pattern Mining, Video | Tagged cmrules, data mining, pattern mining, sequential rule mining, video | 2 Comments

(video) Periodic Pattern Mining

Posted on 2022-07-06 by Philippe Fournier-Viger

Hi all, this is to let you know that I have made another video to explain some interesting pattern mining topics. This time, I will talk about periodic pattern mining.

You can watch the video here: (pdf / ppt / video – 34 min)

This video is part of the free online course on pattern mining.

Hope that it is interesting!

—
Philippe Fournier-Viger is a distinguished professor working in China and founder of the SPMF open source data mining software.

Posted in Pattern Mining | Tagged data mining, data science, pattern mining, periodic itemset, periodic pattern, periodic pattern mining, pfpm, video | Leave a comment

Drawing the Powerset of a Set using Latex and TIKZ (Hasse Diagram)

Posted on 2022-06-19 by Philippe Fournier-Viger

Today, I will show how to draw the powerset of a set using Latex and TIKZ, to produce some nice figures that can be used in paper written using Latex. The code that is shown below is adapted from code from StackOverflow.

First, we will draw the powerset of the set {a,b} as a Hasse diagram:

\documentclass{article}
\usepackage{tikz-cd}
\begin{document}

\begin{tikzpicture}
    \matrix (A) [matrix of nodes, row sep=1.2cm]
    { 
        & $\{a,b\}$ \\  
	$\{a\}$ &  & $\{b\}$\\
        & $\emptyset$ \\
    };
    \draw (A-1-2)--(A-2-1);
    \draw (A-1-2)--(A-2-3);
    \draw (A-2-1)--(A-3-2);
    \draw (A-2-3)--(A-3-2);
\end{tikzpicture}
\end{document}

Then, I will show how to draw the powerset of the set {a,b,c}:

\documentclass{article}
\usepackage{tikz-cd}
\begin{document}

\begin{tikzpicture}
    \matrix (A) [matrix of nodes, row sep=1.2cm]
    {
        $\{a,b\}$ & $\{a,c\}$ & $\{b,c\}$ \\
        $\{a\}$ & $\{b\}$ & $\{c\}$ \\
        & $\emptyset$ \\
    };
    \path (A-1-1)--(A-1-2) node[above=1.2cm] (link) {$\{a,b,c\}$};
    
    \foreach \i in {1,...,3}
    \draw (link.south) -- (A-1-\i.north);
    \foreach \i/\j in {1/2, 3/2, 2/1, 1/1, 3/3, 2/3}
    \draw (A-1-\i.south)--(A-2-\j.north);
    \foreach \i/\j in {1/2, 2/2, 3/2}
    \draw (A-2-\i.south)--(A-3-\j.north);
\end{tikzpicture}

\end{document}

Then, I will show how to draw the powerset of the set {a,b,c,d}:

\begin{tikzpicture}
    \matrix (A) [matrix of nodes, row sep=1.5cm]
    { 
       ~ &  ~ & $\{a,b,c,d\}$ \\  

	~ &  $\{a,b,c\}$ & $\{a,b,d\}$  & $\{a,c,d\}$ & $\{b,c,d\}$\\
	$\{a,b\}$ & $\{a,c\}$  & $\{a,d\}$ & $\{b,c\}$ & $\{b,d\}$ & $\{c,d\}$\\
	~ &  $\{a\}$ & $\{b\}$  & $\{c\}$ & $\{d\}$\\
      ~ & ~ &  $\emptyset$ \\
    };
    \draw (A-1-3.south)--(A-2-2.north);
    \draw (A-1-3.south)--(A-2-3.north);
    \draw (A-1-3.south)--(A-2-4.north);
    \draw (A-1-3.south)--(A-2-5.north);

    \draw (A-2-2.south)--(A-3-1.north);
    \draw (A-2-2.south)--(A-3-2.north);
    \draw (A-2-2.south)--(A-3-4.north);

    \draw (A-2-3.south)--(A-3-1.north);
    \draw (A-2-3.south)--(A-3-3.north);
    \draw (A-2-3.south)--(A-3-5.north);

    \draw (A-2-4.south)--(A-3-2.north);
    \draw (A-2-4.south)--(A-3-3.north);
    \draw (A-2-4.south)--(A-3-6.north);

    \draw (A-2-5.south)--(A-3-4.north);
    \draw (A-2-5.south)--(A-3-5.north);
    \draw (A-2-5.south)--(A-3-6.north);

    \draw (A-3-1.south)--(A-4-2.north);
    \draw (A-3-1.south)--(A-4-3.north);

    \draw (A-3-2.south)--(A-4-2.north);
    \draw (A-3-2.south)--(A-4-4.north);

    \draw (A-3-3.south)--(A-4-2.north);
    \draw (A-3-3.south)--(A-4-5.north);

    \draw (A-3-4.south)--(A-4-3.north);
    \draw (A-3-4.south)--(A-4-4.north);

    \draw (A-3-5.south)--(A-4-3.north);
    \draw (A-3-5.south)--(A-4-5.north);

    \draw (A-3-6.south)--(A-4-4.north);
    \draw (A-3-6.south)--(A-4-5.north);

    \draw (A-4-2.south)--(A-5-3.north);
    \draw (A-4-3.south)--(A-5-3.north);
    \draw (A-4-4.south)--(A-5-3.north);
    \draw (A-4-5.south)--(A-5-3.north);
\end{tikzpicture}

Here is another diagram that I have done but this time using the TIKZ automata package instead:

Here is the code:

\documentclass{article} 
\usepackage{caption}
\usepackage{subcaption}
\usepackage{tikz}  
\usetikzlibrary{automata,arrows,positioning,calc}

\begin{document}


\begin{figure}\centering
\resizebox{\columnwidth}{!}{
       \begin{tikzpicture}[> = stealth,  shorten > = 1pt,   auto,   node distance = 3cm,state/.style={circle, draw, minimum size=1.8cm}]

%%% LEVEL 1
        \node[state] (x) {$\emptyset$};
%%% LEVEL 2
        \node[state] [below left of=x] (B) {$\{ B \}$};
        \node[state][below right of=x]  (C) {$\{ C \}$};
        \node[state] [left  of=B] (A) {$\{ A \}$};
        \node[state][right of=C]  (D) {$\{ D \}$};
%%%% LEVEL 3
        \node[state][below of=B]  (AD) {$\{ A,D \}$};
        \node[state][below of=C]  (BC) {$\{ B,C \}$};
        \node[state][below of=D]  (BD) {$\{ B,D \}$};
        \node[state][below of=A]  (AC) {$\{ A,C \}$};
        \node[state]  [left  of=AC] (AB){$\{ A,B \}$};
        \node[state]  [right of=BD] (CD){$\{ C,D \}$};

%%  LEVEL   4
        \node[state][below of=AD]  (ABD) {$\{ A,B,D \}$};
        \node[state][below of=BC]  (ACD) {$\{ A,C,D \}$};
        \node[state][left of=ABD]  (ABC) {$\{ A,B,C \}$};
        \node[state][right of=ACD]  (BCD) {$\{ B,C,D \}$};

%%%% LEVEL 5
\node[state][below right of=ABD]  (ABCD) {$\{ A,B,C,D \}$};


%% LEVEL 1 to LEVEL 2
		\path[->] (x)  edge node {} (A);
		\path[->] (x)  edge node {} (B);
		\path[->] (x)  edge node {} (C);
		\path[->] (x)  edge node {} (D);

%% LEVEL 2 to LEVEL 3
		\path[->] (A)  edge node {} (AB);
		\path[->] (A)  edge node {} (AC);
		\path[->] (A)  edge node {} (AD);
		\path[->] (B)  edge node {} (AB);
		\path[->] (B)  edge node {} (BC);
		\path[->] (B)  edge node {} (BD);
		\path[->] (C)  edge node {} (AC);%
		\path[->] (C)  edge node {} (BC);
		\path[->] (C)  edge node {} (CD);
		\path[->] (D)  edge node {} (AD);
		\path[->] (D)  edge node {} (BD);
		\path[->] (D)  edge node {} (CD);
%%% LEVEL 3 TO 4

		\path[->] (AB)  edge node {} (ABC);
		\path[->] (AB)  edge node {} (ABD);
		\path[->] (AC)  edge node {} (ABC);
		\path[->] (AC)  edge node {} (ACD);
		\path[->] (AD)  edge node {} (ABD);
		\path[->] (AD)  edge node {} (ACD);
		\path[->] (BC)  edge node {} (ABC);
		\path[->] (BC)  edge node {} (BCD);
		\path[->] (BD)  edge node {} (ABD);
		\path[->] (BD)  edge node {} (BCD);
		\path[->] (CD)  edge node {} (ACD);
		\path[->] (CD)  edge node {} (BCD);
%%%% LEVEL 4 to 5
		\path[->] (ABC)  edge node {} (ABCD);
		\path[->] (ABD)  edge node {} (ABCD);
		\path[->] (ACD)  edge node {} (ABCD);
		\path[->] (BCD)  edge node {} (ABCD);
%%%% Dashed lines
\draw[dashed] (-10,-1) -- (9,-1); 
\draw[dashed] (-10,-4) -- (9,-4); 
\draw[dashed] (-10,-6) -- (9,-6); 
\draw[dashed] (-10,-9) -- (9,-9); 
\node[] at (-10,-1.5) {level 1};
\node[] at (-10,-4.5) {level 2};
\node[] at (-10,-6.5) {level 3};
\node[] at (-10,-9.5) {level 4};
      \end{tikzpicture}
}
\caption{The caption}
\end{figure}

\end{document}

That is all for today. Hope this will be useful.

Note that I had also written a blog post about how to draw the powerset of a set using Java and GraphViz instead of using Latex.

Posted in Latex | Tagged hasse diagram, latex, powerset, tikz | Leave a comment

A New Tool for Running Performance Comparison of Algorithms in SPMF 2.54

Posted on 2022-06-14 by Philippe Fournier-Viger

Today, I am happy to announce a cool new feature in SPMF 2.54, which is a tool to automatically run performance experiments to compare several algorithms when a parameter is varied. This is a useful feature to compare algorithms when writing a research paper. The new tool to do performance experiments let you choose oneor more algorithms, indicate default parameter values and that a parameter must be varied. A time-out can be specified to avoid running algorithms for too long. Besides, each algorithm execution is done in a separate Java virtual machine to ensure that the comparison is always fair (e.g. memory will not accumulate from previous algorithm executions). When running the tool, the results can also be easily exported to Excel and Latex to generate charts for research papers. This tool can save a lot of time for performing experimental comparisons of algorithms!

Briefly the main way to use the new tool is to run the graphical user interface of SPMF and then select the tool from the list of algorithms:

This will open-up a new window, where we can configure the parameters for running the experiment:

For example, without going into details, here we choose five algorithms called Apriori, Eclat, FPGrowth_itemsets, FPClose and FPMax. We also select an input file called Chess.text on which we will run the algorithms and we select a directory called EXPERIMENTS to save all the results that will be generated to files. We also say that the algorithms have one parameter which will be varied (We indicate that a parameter is varied using a special code ##), and the parameter will be varied from 0.95 to 0.80. We say that the time limit for each execution will be 10 seconds, and we select the option that we want not only to compare time and memory usage but also the number of patterns (lines in the output) and save all results as Latex figures, as well.

Then we click “Run the experiments” to run the experiments. We will get summarized results as shown below for execution time, memory and number of patterns:

TIME (S)
parameter 0,95 0,94 0.93. 0,92 0,91 0,9 0,89 0,88 0,87 0,86 0,85 0,84 0,83 0,82 0,81 0,8
Apriori 0,39 0,4 0,23 0,48 0,52 0,59 0,67 0,75 0,93 1,25 1,52 1,94 2,33 2,48 2,87 3,43
Eclat 0,53 0,61 0,32 0,58 0,63 0,71 0,75 0,81 0,88 0,86 0,92 1,07 1,11 1,27 1,41 1,47
FPGrowth_itemsets 0,77 0,59 0,39 0,55 0,57 0,53 0,6 0,65 0,63 0,54 0,55 0,62 0,6 0,6 0,62 0,69
FPClose 0,61 0,57 0,41 0,67 0,63 0,66 0,65 0,54 0,62 0,59 0,72 0,64 0,63 0,72 0,69 0,73
FPMax 0,67 0,51 0,33 0,5 0,59 0,56 0,58 0,59 0,56 0,58 0,59 0,59 0,55 0,72 0,68 0,58

MEMORY (MB)
parameter 0,95 0,94 0.93. 0,92 0,91 0,9 0,89 0,88 0,87 0,86 0,85 0,84 0,83 0,82 0,81 0,8
Apriori 18,88 18,88 18,88 18,88 18,88 19,32 19,76 19,76 19,76 20 20,44 21,32 21,32 22 3,34 3,36
Eclat 19,92 27,93 27,93 26,76 11,96 38,9 72,43 110,26 154,03 94,25 141,23 91,27 176,8 71,23 15,29 172,98
FPGrowth_itemsets 6,75 7,43 7,43 7,43 7,44 7,44 8,11 8,1 8,78 9,44 10,1 10,77 11,44 12,78 14,12 15,45
FPClose 6,76 7,43 7,43 7,43 7,43 7,43 8,1 8,77 8,76 9,43 10,1 11,43 12,11 12,1 13,45 15,45
FPMax 6,76 7,44 7,44 7,42 7,43 7,43 7,44 7,43 8,1 8,1 8,77 9,44 9,43 9,44 10,11 10,78

OUTPUT_SIZE (LINES)
parameter 0,95 0,94 0.93. 0,92 0,91 0,9 0,89 0,88 0,87 0,86 0,85 0,84 0,83 0,82 0,81 0,8
Apriori 77 139 -1 305 421 622 883 1195 1553 1987 2669 3484 4243 5312 6656 8227
Eclat 77 139 -1 305 421 622 883 1195 1553 1987 2669 3484 4243 5312 6656 8227
FPGrowth_itemsets 77 139 -1 305 421 622 883 1195 1553 1987 2669 3484 4243 5312 6656 8227
FPClose 73 124 -1 269 362 498 689 922 1183 1457 1885 2400 2883 3487 4216 5083
FPMax 11 18 -1 26 30 34 51 75 69 89 119 145 133 163 221 226

This was generated in just a few seconds. If we would have run these experiments by hand, it would have took us a lot more time. Now, after we have these results, since they are tab-separated we can directly import them into Excel to generate charts:

Besides, since we selected the option to generate PGFPlots figures for Latex, SPMF will have generated latex documents as output, that we can directly compile to use in Latex documents:

As you can see these figures are quite beautiful.

This is just a brief overview of this new feature to do performance experiments. There are more explanations about how it works in the documentation of SPMF, here:
Example: Run Experiments To Compare The Performance of One or More Algorithms on a Dataset (One Parameter Is Varied) (SPMF – Java)

I will continue improving this tool to generate experiments in the next few weeks. In particular, I am now working on an option to generate scalability experiments as well, which should be released soon. Also, I will modify the tool to make it easier to compare algorithms from SPMF with algorithms that are not in SPMF.

—
Philippe Fournier-Viger is a distinguished professor working in China and founder of the SPMF open source data mining software.

Posted in open-source, Pattern Mining, spmf | Tagged data mining, experiments, java, pattern mining, software, spmf | Leave a comment

Abnormal Google Scholar Profiles

Posted on 2022-06-06 by Philippe Fournier-Viger

Today, I will talk about signs that show that something is wrong or suspicious with some researchers based on their Google Scholar profiles. I will talk about this because I have recently received many CVs and while browsing Google Scholar, and I found some very suspicious profiles.

Normal Google scholar profiles

First, lets talk about what is a normal Google Scholar profile. Generally, a normal Google scholar profile shows a slow increase in the number of citations over time, and sometimes it will stagnates or decrease towards the end of the career of a researcher. Here are some examples of normal profiles:

And sometimes, the citations are increasing a bit more quickly due to some great contributions, but still the curve is quite smooth:

Abnormal Google scholar profiles (case 1)

Now let’s talk about some abnormal Google Scholar profiles. Let’s look at a first example:

What is wrong? Well, here we have a researcher that received 500 citations in 2017, and then about 20 citations in 2018, before going back to 400 citations per year in 2021. I don’t see any logical explanation that would explain a big drop like this other than citation manipulation. Even if we assume that the person did not publish any papers for two years, papers should still continue to be cited. So it seems to indicate that there is some manipulation of the citations that has been done.

Abnormal Google scholar profiles (case 2)

A third example is even more suspicious. Someone who had 1406 citations in 2017 suddenly dropped to 84 citations per year in 2018. I cannot see any reason for this other than some manipulations of citations.

Abnormal Google scholar profiles (case 3)

And here is another suspicious Google Scholar profiles.

Again, why would citations drop like this and never go up again? There is no logical reason other than citation manipulation.

Conclusion

I wrote this blog post to show that the Google Scholar Profiles can reveal many interesting things. Some profiles clearly show something abnormal. I will not write any names. That is not the point of this blog post. My goal is just to show that suspicious behavior can be easily detected in Google Scholar.

For young researcher, my advice is to always be honest, focus on the quality of papers rather than quantity, and do not try to cheat. To become a good researcher, it is important to build a good reputation and this start by doing quality work, being honest and following the rules.

Hope that this has been interesting. If you have any comments, please leave them in the comment section below.

Posted in Academia, Research | Tagged academia, google scholar, Research, suspicious behavior | Leave a comment

How to draw an FP-Tree in Latex? (using TIKZ)

Posted on 2022-06-05 by Philippe Fournier-Viger

In this blog post, I will show how to draw a beautiful FP-Tree data structure in a Latex document. The FP-Tree is a tree-like structure that was proposed in the FP-Growth algorithm for itemset mining, and is also used in many other pattern mining algorithms. I will show how to draw an FP-tree in Latex using the TIKZ library. An FP-tree consists of a table and a tree that are linked using some pointers (dashed arrows). The result is like this:

I took me a while to obtain this result, so here is the code:

\documentclass[a4paper, 12pt]{article} 

\usepackage[utf8]{inputenc}
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{tikz}
\usetikzlibrary{positioning,matrix}

\begin{document}

\begin{figure}
    \centering
\begin{tikzpicture}
%%%%% THE TREE
\begin{scope}[->,font=\small,draw,circle, every node/.style={fill=white!10,shape=circle,draw},
  edge from parent/.style={black,thick,draw},
  level 1/.style={sibling distance=2.5cm},
  level 2/.style={sibling distance=2.5cm}]
  \node (TREE) {root}
    child {node {$a$:3}
      child {node {$b$:1} }  
      child {node {$c$:2}
        child {node{$d$:1}}}} 
    child {node {$c$:2}
      child {node {$d$:2}}
    };
   % Node links within the tree
  \draw[->, dashed] (TREE-1-2) -- (TREE-2);
  \draw[->, dashed] (TREE-1-2-1) -- (TREE-2-1);
\end{scope}
%%%%% THE TABLE
  \begin{scope}[xshift=-5cm,yshift=-2cm,every tree node/.style={shape=rectangle,draw}]
\matrix (TABLE) [matrix of nodes, 
     row sep=-\pgflinewidth, column sep=-\pgflinewidth, 
    nodes={draw, text height=5mm,
    align=center, minimum width=15mm, inner sep=0mm, minimum height=7mm}]{
Item & Link\\ 
$a$ & ~\\
$b$ & ~ \\
$c$ & ~\\
$d$ & ~\\
};
\end{scope}
%%%% LINKS FROM THE TABLE TO THE TREE
\draw[->,dashed] (TABLE-4-2.center) to  (TREE-1-2.west);
\draw[->,dashed] (TABLE-5-2.center) to  (TREE-1-2-1.west);
\end{tikzpicture}
    \caption{An Example of FP-tree}
    \label{fig:my_label}
\end{figure}
\end{document}
}

Hope that this will be useful! If you like it or if you want to suggest some improvement, please let me know in the comment section below or by e-mail.

—
Philippe Fournier-Viger is a distinguished professor working in China and founder of the SPMF open source data mining software.

Posted in Latex, Pattern Mining | Tagged diagram, fp-growth, fp-tree, fpgrowth, fptree, itemset, itemset mining, latex, tikz, tree | 1 Comment

UDML 2022 workshop on utility mining and learning is back at ICDM

Posted on 2022-06-04 by Philippe Fournier-Viger

I am glad to announce that the UDML workshop (5th Workshop on Utility-Driven Mining and Learning) will be back for the IEEE ICDM 2022 conference this year.

The main focus of this workshop is about the concept of utility or importance in data mining and machine learning. The workshop is suitable for papers on pattern mining (e.g. high utility itemsets, frequent pattern mining, sequential pattern mining and other topics) but also for machine learning papers where these is a concept of utility or importance. In fact, the scope of the workshop can be quite broad, so please do not hesitate to submit your papers.

All the accepted papers are published by IEEE in the ICDM Workshop proceedings, which are indexed by EI, DBLP etc. The dates are as follows:

Paper submission deadline: 2nd September 2022
Paper notifications: 23rd September 2022
Workshop date: 1st December, 2022

This is the website of the workshop: http://www.philippe-fournier-viger.com/utility_mining_workshop_2022/

In the last few years, the workshop has been held at KDD 2018, ICDM 2019, ICDM 2020, and ICDM 2021.

Looking forward to your submissions!

Posted in cfp | Tagged cfp, icdm, ieee icdm, pattern mining, udml, utility mining, workshop | Leave a comment

Brief report about the SMARTDSC 2022 conference

Brief report about the IEA AIE 2022 conference

How to draw subgraphs side-by-side in Latex? (with TIKZ)

(videos) Introduction to sequential rule mining + the CMRules algorithm

(video) Periodic Pattern Mining

Drawing the Powerset of a Set using Latex and TIKZ (Hasse Diagram)

A New Tool for Running Performance Comparison of Algorithms in SPMF 2.54

Abnormal Google Scholar Profiles

Normal Google scholar profiles

Abnormal Google scholar profiles (case 1)

Abnormal Google scholar profiles (case 2)

Abnormal Google scholar profiles (case 3)

How to draw an FP-Tree in Latex? (using TIKZ)

UDML 2022 workshop on utility mining and learning is back at ICDM

Archives

Categories

Recent Posts

Recent Comments

Number of visitors:

Related posts:

Related posts:

Related posts:

Related posts:

Related posts:

Related posts:

Related posts:

Normal Google scholar profiles

Abnormal Google scholar profiles (case 1)

Abnormal Google scholar profiles (case 2)

Abnormal Google scholar profiles (case 3)

Related posts:

Related posts:

Related posts:

Archives

Categories

Recent Posts

Recent Comments

Tag cloud

Number of visitors: