Some interesting statistics about SPMF

While preparing the next version (2.60) of the SPMF data mining software, which is written in Java, I generated some statistics about the project directly from the metadata provided by SPMF. Here they are:

The number of algorithms implemented per person (based on metadata)
Note: these counts are generated automatically from the metadata of each algorithm in SPMF, using the class DescriptionOfAlgorithm. Some author names are spelled in multiple ways, and the metadata may contain some errors. The full list of contributors of SPMF is displayed on the SPMF website.
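To give an idea of how such a tally can be computed, here is a minimal sketch that counts algorithms per author string and sorts the result by decreasing count. It is only an illustration: the class `AuthorTally`, the method `countByAuthor` and the sample values are hypothetical, and the code does not use the real DescriptionOfAlgorithm class, whose methods are not shown in this post. It simply assumes that one author string is available per algorithm.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AuthorTally {

    // Counts how many algorithms carry each exact author string
    // and returns the entries sorted by decreasing count.
    // Input: one author string per algorithm (assumed to come from the metadata).
    static List<Map.Entry<String, Integer>> countByAuthor(List<String> authorFields) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String authors : authorFields) {
            // Add 1 to the current count, or start at 1 for a new author string.
            counts.merge(authors, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        // The sort is stable, so author strings with equal counts
        // keep the order in which they were first seen.
        sorted.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up author strings, for illustration only.
        List<String> sample = List.of("Author One", "Author Two", "Author One");
        for (Map.Entry<String, Integer> entry : countByAuthor(sample)) {
            System.out.println(entry.getKey() + " (" + entry.getValue() + ")");
        }
    }
}
```

Because the counting is done on the exact string, two spellings of the same name end up as two separate entries, which is why some contributors appear more than once in the list.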

  • Philippe Fournier-Viger (206)
  • Yang Peng (12)
  • Antonio Gomariz Penalver (9)
  • Jayakrushna Sahoo (6)
  • Jerry Chun-Wei Lin (5)
  • Lu Yang (5)
  • Chen YangMing (5)
  • Wei Song et al. (5)
  • Yangming Chen (5)
  • Wei Song (4)
  • Ting Li (4)
  • Azadeh Soltani (4)
  • Peng Yang and Philippe Fournier-Viger (4)
  • Nader Aryabarzan (4)
  • Vincent M. Nofong modified from Philippe Fournier-Viger (3)
  • Cheng-Wei Wu et al. (3)
  • Zhihong Deng (3)
  • Prashant Barhate (3)
  • Chaomin Huang et al. (3)
  • Jiaxuan Li (3)
  • Zhitian Li (3)
  • Antonio Gomariz Penalver & Philippe Fournier-Viger (3)
  • Yimin Zhang (2)
  • Chaomin Huang (2)
  • Nouioua et al. (2)
  • Ting Li et al. (2)
  • Philippe Fournier-Viger and Yuechun Li (2)
  • Song et al. (2)
  • Fournier-Viger et al. (2)
  • Saqib Nawaz et al. (2)
  • Chao Cheng and Philippe Fournier-Viger (2)
  • Zevin Shaul et al. (2)
  • Alan Souza (2)
  • Rathore et al. (2)
  • Bay Vo et al. (2)
  • Junya Li (2)
  • Ryan Benton and Blake Johns (2)
  • Siddharth Dawar et al. (2)
  • Yanjun Yang (2)
  • Siddhart Dawar et al. (2)
  • Huang et al. (1)
  • M. (1)
  • C.W. Wu et al. (1)
  • Philippe Fournier-Viger and Cheng-Wei Wu (1)
  • Sacha Servan-Schreiber (1)
  • Dhaval Patel (1)
  • jnfrancis (1)
  • Cheng-Wei. et al. (1)
  • Ganghuan He and Philippe Fournier-Viger (1)
  • Siddharth Dawar (1)
  • Improvements by Nouioua et al. (1)
  • Philippe Fournier-Viger and Chao Cheng (1)
  • Yang Peng et al. (1)
  • Salvemini E (1)
  • Java conversion by Xiang Li and Philippe Fournier-Viger (1)
  • Alex Peng et al. (1)
  • Hoang Thanh Lam (1)
  • Souleymane Zida (1)
  • F. (1)
  • Shifeng Ren (1)
  • Lanotte (1)
  • github: limuhangk (1)
  • Youxi Wu et al. (1)
  • Hazem El-Raffiee (1)
  • Jiakai Nan (1)
  • Ahmed El-Serafy (1)
  • Souleymane Zida and Philippe Fournier-Viger (1)
  • Feremans et al. (1)
  • Han J. (1)
  • Shi-Feng Ren (1)
  • Fumarola F (1)
  • Vikram Goyal (1)
  • P. F. (1)
  • Petijean et al. (1)
  • Srinivas Paturu (1)
  • Malerba D (1)
  • & Malerba (1)
  • Ashish Sureka (1)
  • Fumarola (1)
  • Ying Wang and Peng Yang and Philippe Fournier-Viger (1)
  • Sabarish Raghu (1)
  • Wu et al. (1)
  • D. (1)
  • Srikumar Krishnamoorty (1)
  • Siddharth Dawar et al (1)
  • Ceci (1)
  • Wu (1)
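
As the list shows, the same person can appear under several entries, for example "Siddharth Dawar et al." and "Siddharth Dawar et al", or alone and together with co-authors. One possible way to clean this up before counting is sketched below. It is only a sketch: the class `AuthorNameCleaner` and the method `splitAuthors` are hypothetical, and the splitting rules (co-authors joined by " and " or "&", an optional trailing "et al.") are assumptions inferred from the entries above, not rules defined by SPMF.

```java
import java.util.ArrayList;
import java.util.List;

public class AuthorNameCleaner {

    // Splits a raw author field into individual names.
    // Assumptions: co-authors are separated by " and " or "&",
    // and a name may be followed by "et al." or "et al".
    static List<String> splitAuthors(String rawField) {
        List<String> names = new ArrayList<>();
        for (String part : rawField.split("\\s+and\\s+|\\s*&\\s*")) {
            // Remove a trailing "et al." or "et al", then surrounding spaces.
            String name = part.replaceAll("\\s+et al\\.?$", "").trim();
            // Skip empty pieces, e.g. when the field starts with "&".
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(splitAuthors("Ying Wang and Peng Yang and Philippe Fournier-Viger"));
        System.out.println(splitAuthors("Siddharth Dawar et al."));
        System.out.println(splitAuthors("Siddharth Dawar et al"));
    }
}
```

Such rules would not fix real spelling differences (for example "Siddhart" versus "Siddharth") or entries such as "M." and "D." that appear to be fragments of a longer name, so some manual correction of the metadata would still be needed.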

The number of algorithms per category

  • HIGH-UTILITY PATTERN MINING (83)
  • FREQUENT ITEMSET MINING (54)
  • SEQUENTIAL PATTERN MINING (48)
  • TOOLS – DATA VIEWERS (22)
  • TIME SERIES MINING (16)
  • ASSOCIATION RULE MINING (16)
  • TOOLS – DATA TRANSFORMATION (15)
  • PERIODIC PATTERN MINING (13)
  • EPISODE MINING (10)
  • EPISODE RULE MINING (10)
  • CLUSTERING (10)
  • SEQUENTIAL RULE MINING (10)
  • GRAPH PATTERN MINING (6)
  • TOOLS – DATA GENERATORS (5)
  • TOOLS – STATS CALCULATORS (4)
  • TOOLS – SPMF GUI (4)
  • TOOLS – RUN EXPERIMENTS (1)
  • PRIVACY-PRESERVING DATA MINING (1)

The number of algorithms per type

  • DATA_MINING (259)
  • DATA_PROCESSOR (30)
  • DATA_VIEWER (25)
  • DATA_GENERATOR (5)
  • OTHER_GUI_TOOL (4)
  • DATA_STATS_CALCULATOR (4)
  • EXPERIMENT_TOOL (1)

The number of algorithms for each input data type

  • Transaction database (194)
  • Simple transaction database (80)
  • Transaction database with utility values (77)
  • Sequence database (73)
  • Simple sequence database (48)
  • Transaction database with timestamps (17)
  • Time series database (16)
  • Sequence database with timestamps (9)
  • Database of double vectors (8)
  • Labeled graph database (6)
  • Graph database (6)
  • Transaction database with utility values and time (5)
  • Multi-dimensional sequence database with timestamps (4)
  • Text file (4)
  • Multi-dimensional sequence database (4)
  • Time interval sequence database (3)
  • Sequence database with utility values (3)
  • Transaction database with utility values and taxonomy (3)
  • Transaction database with shelf-time periods and utility values (3)
  • Transaction database with utility values (HUQI) (3)
  • Sequence database with cost and binary utility (3)
  • Simple time interval sequence database (3)
  • Frequent closed itemsets (3)
  • Sequence database with cost and numeric utility (2)
  • Transaction database with utility values skymine format (2)
  • Transaction database with profit information (2)
  • Uncertain transaction database (2)
  • ARFF file (2)
  • Transaction database with utility and cost values (2)
  • Sequence database with strings (2)
  • Dynamic Attributed Graph (2)
  • Simple sequence database with strings (2)
  • Sequence database with utility and probability values (2)
  • Cost sequence database (2)
  • Sequential patterns (1)
  • Set of text documents (1)
  • Sequence database in non SPMF format (1)
  • Clusters (1)
  • Sequence (1)
  • Single sequence (1)
  • Transaction database with utility values (MEMU) (1)
  • Transaction database in non SPMF format (1)

The number of algorithms for each output data type

  • High-utility patterns (91)
  • High-utility itemsets (60)
  • Frequent patterns (56)
  • Sequential patterns (51)
  • Frequent itemsets (37)
  • Frequent sequential patterns (30)
  • Database of instances (22)
  • Association rules (16)
  • Episodes (15)
  • Time series database (14)
  • Periodic patterns (13)
  • Transaction database (12)
  • Periodic frequent patterns (12)
  • Sequential rules (11)
  • Episode rules (10)
  • Frequent closed itemsets (9)
  • Frequent closed sequential patterns (8)
  • Simple transaction database (8)
  • Sequence database (8)
  • Closed itemsets (8)
  • Top-k High-utility itemsets (7)
  • Closed high-utility itemsets (7)
  • Simple sequence database (6)
  • Frequent Sequential patterns (6)
  • Clusters (6)
  • Closed patterns (6)
  • Frequent episodes (6)
  • Frequent sequential rules (5)
  • Rare itemsets (5)
  • Rare patterns (5)
  • High average-utility itemsets (5)
  • Frequent episode rules (5)
  • Skyline patterns (4)
  • Subgraphs (4)
  • Generator patterns (4)
  • High-Utility episodes (4)
  • Generator itemsets (4)
  • Cost-efficient Sequential patterns (3)
  • Skyline High-utility itemsets (3)
  • Frequent sequential generators (3)
  • Frequent subgraphs (3)
  • Frequent itemsets with multiple thresholds (3)
  • Local Periodic frequent itemsets (3)
  • Correlated patterns (3)
  • Quantitative high utility itemsets (3)
  • Maximal patterns (2)
  • Cross-Level High-utility itemsets (2)
  • Multi-dimensional frequent closed sequential patterns (2)
  • Maximal itemsets (2)
  • High-utility probability sequential patterns (2)
  • Frequent maximal sequential patterns (2)
  • Density-based clusters (2)
  • Periodic frequent itemsets common to multiple sequences (2)
  • On-shelf high-utility itemsets (2)
  • Multi-dimensional frequent closed sequential patterns with timestamps (2)
  • Top-k frequent sequential rules (2)
  • Frequent maximal itemsets (2)
  • Frequent closed and generator itemsets (2)
  • Closed association rules (2)
  • Frequent time interval sequential patterns (2)
  • Perfectly rare itemsets (2)
  • Sequence Database with timestamps (2)
  • Top-k frequent sequential patterns (2)
  • Top-k High-Utility episodes (2)
  • Transaction database with utility values (2)
  • Closed and generator patterns (2)
  • Periodic high-utility itemsets (2)
  • Minimal rare itemsets (2)
  • Trend patterns (2)
  • Correlated High-utility itemsets (2)
  • Generator high-utility itemsets (2)
  • Association rules with lift and multiple support thresholds (2)
  • Rare correlated itemsets common to multiple sequences (1)
  • Productive Periodic frequent itemsets (1)
  • Peak high-utility itemsets (1)
  • Indirect association rules (1)
  • Top-k non-redundant association rules (1)
  • Top-k class association rules (1)
  • Database of double vectors (1)
  • Uncertain frequent itemsets (1)
  • Non-redundant Periodic frequent itemsets (1)
  • Transaction database with utility values and time (1)
  • Top-k Stable Periodic frequent itemsets (1)
  • Rare correlated itemsets (1)
  • Frequent sequential rules with strings (1)
  • Top-k frequent episodes (1)
  • Minimal itemsets (1)
  • High-utility association rules (1)
  • Uncertain patterns (1)
  • Ordered frequent sequential rules (1)
  • Top-k association rules (1)
  • High-utility itemsets with length constraints (1)
  • High-utility generator itemsets (1)
  • Multi-dimensional frequent sequential patterns with timestamps (1)
  • Local high-utility itemsets (1)
  • High-utility sequential rules (1)
  • Periodic frequent itemsets (1)
  • Density-based cluster ordering of points (1)
  • Frequent sequential patterns with occurrences (1)
  • Frequent sequential patterns with timestamps (1)
  • Frequent closed sequential patterns with timestamps (1)
  • Minimal high-utility itemsets (1)
  • Significant Trend Sequences (1)
  • Frequent fuzzy itemsets (1)
  • Periodic rare patterns (1)
  • Top-k Frequent subgraphs (1)
  • Minimal patterns (1)
  • Stable Periodic frequent itemsets (1)
  • Maximal high-utility itemsets (1)
  • Self-Sufficient Itemsets (1)
  • Text clusters (1)
  • Multi-dimensional frequent sequential patterns (1)
  • Top-k frequent non-redundant sequential rules (1)
  • Cost transaction database (1)
  • Hierarchical clusters (1)
  • Top-k frequent sequential patterns with leverage (1)
  • Compressing sequential patterns (1)
  • Minimal non-redundant association rules (1)
  • Sporadic association rules (1)
  • High-utility rules (1)
  • Locally trending high-utility itemsets (1)
  • Skyline Frequent High-utility itemsets (1)
  • High-utility sequential patterns (1)
  • Progressive Frequent Sequential patterns (1)
  • Erasable itemsets (1)
  • Attribute Evolution Rules (1)
  • Generators of high-utility itemsets (1)
  • Multiple Frequent fuzzy itemsets (1)
  • Correlated itemsets (1)
  • Multi-Level High-utility itemsets (1)
  • Association rules with lift (1)
  • Top-k sequential patterns with quantile-based cohesion (1)
  • Erasable patterns (1)
  • Frequent generator itemsets (1)
  • Frequent high-utility itemsets (1)

Conclusion

I hope that you find this interesting 🙂 If you have any comments, please leave them below.
