Subgraph mining datasets

In this post, I will provide links to standard benchmark datasets that can be used for frequent subgraph mining. Moreover, I will provide a set of small graph datasets that can be used for debugging subgraph mining algorithms.

The format of graph datasets

A graph dataset is a text file that contains one or more graphs. Each graph is described by a few lines of text in the following format (used by the gSpan algorithm):

t # N This is the first line of a graph. It indicates that this is the N-th graph in the file

v M L This line defines the M-th vertex of the current graph, which has a label L

e P Q L This line defines an edge, which connects the P-th vertex with the Q-th vertex. This edge has the label L
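To make the format concrete, here is a minimal Python sketch of a reader for it. The names `Graph` and `parse_graphs` are my own and are not part of SPMF or gSpan. The sketch assumes that fields are separated by whitespace, and it keeps labels as strings so that it does not matter whether a file uses numeric or textual labels.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """One graph of a dataset (hypothetical helper class, not from SPMF)."""
    graph_id: int
    vertices: dict = field(default_factory=dict)  # vertex id -> label
    edges: list = field(default_factory=list)     # (vertex id, vertex id, label)


def parse_graphs(text):
    """Parse the 't # N' / 'v M L' / 'e P Q L' lines into Graph objects."""
    graphs = []
    current = None
    for raw_line in text.splitlines():
        parts = raw_line.split()
        if not parts:
            continue  # skip blank lines
        kind = parts[0]
        if kind == "t":
            # "t # N": start of the N-th graph
            current = Graph(graph_id=int(parts[2]))
            graphs.append(current)
        elif kind in ("v", "e"):
            if current is None:
                raise ValueError("vertex or edge line found before any 't' line")
            if kind == "v":
                # "v M L": vertex M has label L
                current.vertices[int(parts[1])] = parts[2]
            else:
                # "e P Q L": edge between vertices P and Q with label L
                current.edges.append((int(parts[1]), int(parts[2]), parts[3]))
    return graphs


# A made-up two-vertex graph, used only to illustrate the format
sample = "t # 0\nv 0 10\nv 1 11\ne 0 1 21\n"
graphs = parse_graphs(sample)
print(graphs[0].vertices, graphs[0].edges)
```

Such a reader is handy for checking that a dataset file is well formed before giving it to a mining algorithm.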

Small datasets for debugging

Here are some small datasets that can be used for debugging frequent subgraph mining algorithms. Each dataset contains one or two graphs, which is enough for some small debugging tasks.

1) single_graph1.txt

Content of the file:

Visual representation:

(L10) ---L21--- (L11) ---- L21 ---- (L12)
  0              1                   2
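As an illustration, the drawing above can be written out in the text format described earlier with a few lines of Python. This is only a sketch: the function `format_graph` is a hypothetical helper, and I assume that the labels drawn as L10, L11, L12 and L21 are stored as the integers 10, 11, 12 and 21 and that the graph has the identifier 0. The output is a reconstruction from the drawing and may not match the original file exactly.

```python
def format_graph(graph_id, vertex_labels, edges):
    """Write one graph in the 't # N' / 'v M L' / 'e P Q L' format.

    vertex_labels: list where position i holds the label of vertex i
    edges: list of (vertex id, vertex id, label) tuples
    """
    lines = [f"t # {graph_id}"]
    for vertex_id, label in enumerate(vertex_labels):
        lines.append(f"v {vertex_id} {label}")
    for source, target, label in edges:
        lines.append(f"e {source} {target} {label}")
    return "\n".join(lines) + "\n"


# Assumed encoding of the drawing: three vertices in a chain, two edges labeled 21
text = format_graph(0, [10, 11, 12], [(0, 1, 21), (1, 2, 21)])
print(text, end="")
```

The same helper can be used to write other small test graphs by hand when debugging an algorithm.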

2) single_graph2.txt

Content of the file:

Visual representation:

(L10) --- L21 --- (L11) --- L21 ---- (L10)
  0                 1                  2
                    |
                    |
                   L21
                    |
                    |
                  (L10)3

3) single_graph3.txt

Content of the file:

Visual representation:

 (L10)---- (L11) ---- (L10)
    0        1          2

4) single_graph4.txt

Content of the file:

Visual representation:

    (L10) ------- L20 ------ (L11)
      |                    /   |
      |                 /      |
      |              /         |
      L21          /           |
      |         L23           L22
      |        /               |
      |      /                 |
      |    /                   |
      |  /                     |
    (L10) ------ L20 -------- (L11)

5) single_graph5.txt

Content of the file:

Visual representation:

(10) -- 20 -- (11) -- 20 -- (10) -- 20 -- (11)
  0             2             1             3

6) One_graph.txt

Content of the file:

Visual representation:

Large datasets for subgraph mining

Moreover, about 15 large graph datasets that are used in frequent subgraph mining are available at this webpage:

SPMF Public Datasets (webpage)

Want to try frequent subgraph mining?

If you want to try frequent subgraph mining algorithms, fast open-source Java implementations of gSpan, cgSpan, and TKG (for top-k frequent subgraph mining) are available in the SPMF data mining library.

Conclusion

In this blog post, I have shared some helpful datasets. If you want to know more about subgraph mining, you may read my short introduction to subgraph mining.

—
Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 145 data mining algorithms.
