
2.0.0

@rusty1s rusty1s released this 13 Sep 07:48
· 2653 commits to master since this release

PyG 2.0 🎉 🎉 🎉

PyG (PyTorch Geometric) has been moved from my own personal account rusty1s to its own organization account pyg-team to emphasize the ongoing collaboration between TU Dortmund University, Stanford University and many great external contributors. With this, we are releasing PyG 2.0, a new major release that brings sophisticated heterogeneous graph support, GraphGym integration and many other exciting features to PyG.

If you encounter any bugs in this new release, please do not hesitate to create an issue.

Heterogeneous Graph Support

We finally provide full heterogeneous graph support in PyG 2.0. See here for the accompanying tutorial.

Highlights

  • Heterogeneous Graph Storage: Heterogeneous graphs can now be stored in their own dedicated data.HeteroData class (thanks to @yaoyaowd):

    import torch
    from torch_geometric.data import HeteroData
    
    data = HeteroData()
    
    # Create two node types "paper" and "author" holding a single feature matrix:
    data['paper'].x = torch.randn(num_papers, num_paper_features)
    data['author'].x = torch.randn(num_authors, num_authors_features)
    
    # Create an edge type ("paper", "written_by", "author") holding its graph connectivity:
    data['paper', 'written_by', 'author'].edge_index = ...  # [2, num_edges]

    data.HeteroData behaves similarly to a regular homogeneous data.Data object:

    print(data['paper'].num_nodes)
    print(data['paper', 'written_by', 'author'].num_edges)
    data = data.to('cuda')
  • Heterogeneous Mini-Batch Loading: Heterogeneous graphs can be converted to mini-batches via loader.DataLoader (for datasets of many small graphs) or loader.NeighborLoader (for a single giant graph). Both loaders now handle homogeneous as well as heterogeneous graphs:

    from torch_geometric.loader import DataLoader
    
    loader = DataLoader(heterogeneous_graph_dataset, batch_size=32, shuffle=True)
    
    from torch_geometric.loader import NeighborLoader
    
    loader = NeighborLoader(heterogeneous_graph, num_neighbors=[30, 30], batch_size=128,
                            input_nodes=('paper', heterogeneous_graph['paper'].train_mask), shuffle=True)
  • Heterogeneous Graph Neural Networks: Heterogeneous GNNs can now easily be created from homogeneous ones via nn.to_hetero and nn.to_hetero_with_bases. These functions take an existing GNN model and duplicate its message functions to account for different node and edge types:

    import torch
    from torch_geometric.nn import SAGEConv, to_hetero
    
    class GNN(torch.nn.Module):
        def __init__(self, hidden_channels, out_channels):
            super().__init__()
            self.conv1 = SAGEConv((-1, -1), hidden_channels)
            self.conv2 = SAGEConv((-1, -1), out_channels)
    
        def forward(self, x, edge_index):
            x = self.conv1(x, edge_index).relu()
            x = self.conv2(x, edge_index)
            return x
    
    model = GNN(hidden_channels=64, out_channels=dataset.num_classes)
    model = to_hetero(model, data.metadata(), aggr='sum')
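To make the idea of "duplicating message functions per edge type" more concrete, here is a minimal conceptual sketch in plain Python. It is not how nn.to_hetero is implemented, and it uses no PyG or PyTorch calls. The names `mean_message_passing`, `hetero_layer` and `weights`, the scalar node features, and the toy graph are all assumptions made for illustration. Each edge type gets its own copy of a simple mean-aggregation layer (with its own weight), and results that arrive at the same destination node type are summed, mirroring aggr='sum'.

```python
def mean_message_passing(x_src, edge_index, num_dst, weight):
    """Stand-in for one homogeneous layer: average the source features
    arriving at each destination node, then scale by a scalar weight."""
    sums = [0.0] * num_dst
    counts = [0] * num_dst
    for src, dst in zip(*edge_index):
        sums[dst] += x_src[src]
        counts[dst] += 1
    # Nodes without incoming edges receive 0.0.
    return [weight * s / c if c else 0.0 for s, c in zip(sums, counts)]


def hetero_layer(x_dict, edge_index_dict, weights):
    """Apply one copy of the layer per edge type and sum the outputs
    per destination node type (hypothetical helper, not a PyG function)."""
    out = {}
    for edge_type, edge_index in edge_index_dict.items():
        src_type, _, dst_type = edge_type
        msg = mean_message_passing(
            x_dict[src_type], edge_index, len(x_dict[dst_type]), weights[edge_type]
        )
        if dst_type in out:
            out[dst_type] = [a + b for a, b in zip(out[dst_type], msg)]
        else:
            out[dst_type] = msg
    return out


# Toy graph: 3 papers and 2 authors, one scalar feature per node.
x_dict = {'paper': [1.0, 2.0, 3.0], 'author': [10.0, 20.0]}
edge_index_dict = {
    ('paper', 'written_by', 'author'): ([0, 1, 2], [0, 0, 1]),
    ('author', 'writes', 'paper'): ([0, 0, 1], [0, 1, 2]),
    ('paper', 'cites', 'paper'): ([0, 1], [1, 2]),
}
# One independent weight per edge type, i.e. one "duplicate" of the layer each.
weights = {
    ('paper', 'written_by', 'author'): 1.0,
    ('author', 'writes', 'paper'): 1.0,
    ('paper', 'cites', 'paper'): 0.5,
}

out = hetero_layer(x_dict, edge_index_dict, weights)
print(out['paper'])   # [10.0, 10.5, 21.0]
print(out['author'])  # [1.5, 3.0]
```

Note how 'paper' nodes receive messages from two edge types ("writes" and "cites"), each processed with its own weight before being summed, while 'author' nodes only receive messages via "written_by".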

Additional Features

Managing Experiments with GraphGym

GraphGym is now officially supported in PyG 2.0 via torch_geometric.graphgym. See here for the accompanying tutorial. Overall, GraphGym is a platform for designing and evaluating Graph Neural Networks from configuration files via a highly modularized pipeline (thanks to @JiaxuanYou):

  1. GraphGym is the perfect place to start learning about standardized GNN implementation and evaluation
  2. GraphGym provides a simple interface to try out thousands of GNN architectures in parallel to find the best design for your specific task
  3. GraphGym lets you easily run hyper-parameter searches and visualize which design choices work better

Breaking Changes

  • The datasets.AMiner dataset now returns a data.HeteroData object. See here for our updated MetaPath2Vec example on AMiner.
  • transforms.AddTrainValTestMask has been replaced by transforms.RandomNodeSplit.
  • Since the storage layout of data.Data significantly changed in order to support heterogeneous graphs, already processed datasets need to be re-processed by deleting the root/processed folder.
  • data.Data.__cat_dim__ and data.Data.__inc__ now expect additional input arguments:
    def __cat_dim__(self, key, value, *args, **kwargs):
        pass
      
    def __inc__(self, key, value, *args, **kwargs):
        pass
    If you have overridden __cat_dim__ or __inc__ in a customized data.Data object, please make sure to apply the above signature changes.
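The following plain-Python sketch illustrates why an override with the old two-argument signature can break once callers pass additional arguments. It does not import PyG: `BaseData`, `collate_info`, `OldStyle` and `NewStyle` are hypothetical stand-ins, the default rules are simplified, and the extra `store` argument is an assumption about what a caller might pass along.

```python
class BaseData:
    """Hypothetical stand-in for a data container (not PyG's data.Data)."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)

    def __cat_dim__(self, key, value, *args, **kwargs):
        # Simplified default: index-like attributes concatenate along the last dim.
        return -1 if 'index' in key else 0

    def __inc__(self, key, value, *args, **kwargs):
        # Simplified default: index-like attributes are offset by the node count.
        return self.num_nodes if 'index' in key else 0


def collate_info(data, key):
    """Hypothetical collate step that passes one extra positional argument."""
    value = getattr(data, key)
    store = data.__dict__  # assumed extra argument, for illustration only
    return data.__cat_dim__(key, value, store), data.__inc__(key, value, store)


class OldStyle(BaseData):
    # Old signature: cannot accept the additional argument.
    def __cat_dim__(self, key, value):
        return 1 if key == 'foo' else super().__cat_dim__(key, value)


class NewStyle(BaseData):
    # New signature: accepts and forwards any additional arguments.
    def __cat_dim__(self, key, value, *args, **kwargs):
        if key == 'foo':
            return 1
        return super().__cat_dim__(key, value, *args, **kwargs)


new = NewStyle(num_nodes=3, edge_index=[[0, 1], [1, 2]], foo=[[1, 2]])
print(collate_info(new, 'foo'))         # (1, 0)
print(collate_info(new, 'edge_index'))  # (-1, 3)

try:
    collate_info(OldStyle(num_nodes=3, foo=[[1, 2]]), 'foo')
    old_failed = False
except TypeError:
    # The old two-argument override rejects the extra positional argument.
    old_failed = True
print(old_failed)  # True
```

Forwarding `*args, **kwargs` to `super()` keeps a custom override compatible even if the set of additional arguments changes again.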

Deprecations

Additional Features

Minor Changes

Bugfixes