Using the ogbl-biokg with DeepSNAP #35

Open
sophiakrix opened this issue Dec 9, 2021 · 4 comments

@sophiakrix

Hi there!

I was trying to use the ogbl-biokg graph with DeepSNAP, more precisely as input to link_prediction.py for heterogeneous graphs. Since DeepSNAP requires a networkx or PyTorch Geometric object, I tried to convert the ogbl-biokg graph into a PyTorch Geometric object and then transform it into a HeteroGraph, as you point out in the tutorial here.

However, the conversion raised an error because the graph has no 'edge_index' attribute:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/.conda/envs/env_ogbl_gpu/lib/python3.8/site-packages/torch_geometric/data/storage.py in __getattr__(self, key)
     47         try:
---> 48             return self[key]
     49         except KeyError:

~/.conda/envs/env_ogbl_gpu/lib/python3.8/site-packages/torch_geometric/data/storage.py in __getitem__(self, key)
     67     def __getitem__(self, key: str) -> Any:
---> 68         return self._mapping[key]
     69 

KeyError: 'edge_index'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_42299/3068006995.py in <module>
----> 1 graph = Graph.pyg_to_graph(ogbl_biokg_dataset[0])

~/.conda/envs/env_ogbl_gpu/lib/python3.8/site-packages/deepsnap/graph.py in pyg_to_graph(data, verbose, fixed_split, tensor_backend, netlib)
   1991             if netlib is not None:
   1992                 deepsnap._netlib = netlib
-> 1993             if data.is_directed():
   1994                 G = deepsnap._netlib.DiGraph()
   1995             else:

~/.conda/envs/env_ogbl_gpu/lib/python3.8/site-packages/torch_geometric/data/data.py in is_directed(self)
    184     def is_directed(self) -> bool:
    185         r"""Returns :obj:`True` if graph edges are directed."""
--> 186         return not self.is_undirected()
    187 
    188     def clone(self):

~/.conda/envs/env_ogbl_gpu/lib/python3.8/site-packages/torch_geometric/data/data.py in is_undirected(self)
    180     def is_undirected(self) -> bool:
    181         r"""Returns :obj:`True` if graph edges are undirected."""
--> 182         return all([store.is_undirected() for store in self.edge_stores])
    183 
    184     def is_directed(self) -> bool:

~/.conda/envs/env_ogbl_gpu/lib/python3.8/site-packages/torch_geometric/data/data.py in <listcomp>(.0)
    180     def is_undirected(self) -> bool:
    181         r"""Returns :obj:`True` if graph edges are undirected."""
--> 182         return all([store.is_undirected() for store in self.edge_stores])
    183 
    184     def is_directed(self) -> bool:

~/.conda/envs/env_ogbl_gpu/lib/python3.8/site-packages/torch_geometric/data/storage.py in is_undirected(self)
    395             return value.is_symmetric()
    396 
--> 397         edge_index = self.edge_index
    398         edge_attr = self.edge_attr if 'edge_attr' in self else None
    399         return is_undirected(edge_index, edge_attr, num_nodes=self.size(0))

~/.conda/envs/env_ogbl_gpu/lib/python3.8/site-packages/torch_geometric/data/storage.py in __getattr__(self, key)
     48             return self[key]
     49         except KeyError:
---> 50             raise AttributeError(
     51                 f"'{self.__class__.__name__}' object has no attribute '{key}'")
     52 

AttributeError: 'GlobalStorage' object has no attribute 'edge_index'

How can I convert the ogbl-biokg graph into an object that can be used with deepSNAP?

I would very much appreciate any help!

@anniekmyatt

anniekmyatt commented Dec 12, 2021

Hello! I'm not a maintainer of the deepsnap package but I would be happy to try and help. Would you mind posting the code that you used, that resulted in the error message, so it is easier to spot what went wrong?

I wonder whether you created a heterogeneous graph object with v2 of pytorch-geometric, which might not yet be supported by deepsnap. In a pytorch-geometric HeteroData graph object, the edges of the different types are stored in a dictionary, so the object has no edge_index entry (but rather an edge_index_dict), and that might be what the error message is complaining about.
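
One quick way to test this hypothesis is to check which of the two attributes the object actually carries before handing it to deepsnap. The following is a minimal sketch: `describe_edge_storage` is a hypothetical helper (not part of deepsnap or pytorch-geometric), and `SimpleNamespace` objects stand in for real graph objects so the snippet runs without downloading anything.

```python
from types import SimpleNamespace


def describe_edge_storage(data):
    """Report how a graph-like object stores its edges.

    Hypothetical helper: it only looks for the attribute names
    `edge_index` and `edge_index_dict` discussed in this thread.
    """
    if getattr(data, "edge_index", None) is not None:
        return "homogeneous"
    if getattr(data, "edge_index_dict", None) is not None:
        return "heterogeneous"
    return "unknown"


# Stand-ins for real graph objects (assumption: toy data only).
homo = SimpleNamespace(edge_index=[[0, 1], [1, 2]])
hetero = SimpleNamespace(
    edge_index_dict={("drug", "drug-disease", "disease"): [[0], [0]]}
)

print(describe_edge_storage(homo))
print(describe_edge_storage(hetero))
```

If the real object reports a per-type dictionary, a converter that expects a single edge_index would be expected to fail in the way shown in the traceback.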

A question for the developers of deepsnap: are you planning to update deepsnap to be compatible with HeteroData objects from pytorch-geometric >=v2? Can you share something about the roadmap for deepsnap in general?

@sophiakrix
Author

Hi @anniekmyatt !
Thanks for chipping in on this. There are only a few lines of code I used for this:

from ogb.linkproppred import PygLinkPropPredDataset

ogbl_biokg_dataset = PygLinkPropPredDataset(name = "ogbl-biokg")
graph = Graph.pyg_to_graph(ogbl_biokg_dataset[0])

@anniekmyatt

I just ran this and I'm getting the same error (I added the line from deepsnap.graph import Graph though).

This error occurs because the ogbl_biokg_dataset has an edge_index_dict rather than a single edge_index attribute. OGB uses this edge_index_dict dictionary to specify the edges for the different edge types. If you are keen to use deepsnap rather than pytorch-geometric directly, it seems you need to create the hetero graph object manually, like here. However, the ogbl_biokg_dataset consists only of triplets and has no node features, so you'll have to create some appropriate (or placeholder) features for the node_feature input. To create the deepsnap heterograph object, your code would look something like this:

from deepsnap.hetero_graph import HeteroGraph
from ogb.linkproppred import PygLinkPropPredDataset

dataset = PygLinkPropPredDataset(name="ogbl-biokg")
graph = dataset[0]
hetero_graph = HeteroGraph(
    node_feature=node_feature,  # placeholder: define this dictionary yourself (see below)
    edge_index=graph.edge_index_dict,  # note: a dictionary with an edge index for each edge type
    directed=True,
)

About the node features: this should be a dictionary keyed by node type (e.g. disease, drug, ...) whose values are torch tensors of shape (number_of_nodes, number_of_features_per_node).
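
As a minimal sketch of such a placeholder dictionary, assuming you know the number of nodes per type: `make_placeholder_features` and the toy node counts are hypothetical names and values chosen for illustration. Constant all-ones features carry no information, so they only serve to satisfy the expected shape.

```python
import torch


def make_placeholder_features(num_nodes_per_type, num_features=1):
    """Return {node_type: tensor of shape (num_nodes, num_features)}."""
    return {
        node_type: torch.ones(num_nodes, num_features)
        for node_type, num_nodes in num_nodes_per_type.items()
    }


# Toy node counts (assumption); for a real dataset, use the actual counts.
node_feature = make_placeholder_features({"disease": 3, "drug": 5}, num_features=2)

for node_type, features in node_feature.items():
    print(node_type, tuple(features.shape))
```

For real experiments you would probably replace the constant tensors with learned embeddings or domain-specific features.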

I am curious which deepsnap functionality specifically you would like to use? For an RDF graph like this (without node features), wouldn't a package like DGL-KE be more helpful as it has lots of embedding functionality that doesn't rely on message passing of node features?

@sophiakrix
Author

sophiakrix commented Dec 21, 2021

Hi @anniekmyatt !

Thanks for your reply. I tried to create a deepsnap HeteroGraph object from scratch here for the ogbl-biokg graph, following the deepsnap tutorial for heterogeneous graphs.

One important step is to relabel the nodes of ogbl-biokg: its node labels start at 0 for every node type, but networkx requires consecutive node labels.
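
The relabeling idea in isolation, as a minimal sketch on toy triples: each (node type, per-type ID) pair receives the next unused integer the first time it appears. `relabel_nodes` and the toy data are hypothetical and only illustrate the idea; the full code for the actual dataset follows.

```python
def relabel_nodes(triples):
    """Map (node_type, local_id) pairs to consecutive global IDs.

    `triples` is an iterable of
    (head_type, head_id, relation, tail_type, tail_id).
    IDs are assigned in order of first appearance.
    """
    global_id = {}
    edges = []
    for head_type, head_id, relation, tail_type, tail_id in triples:
        # setdefault keeps an existing ID and only assigns a new one if unseen
        head = global_id.setdefault((head_type, head_id), len(global_id))
        tail = global_id.setdefault((tail_type, tail_id), len(global_id))
        edges.append((head, tail, relation))
    return global_id, edges


# Toy triples (assumption): per-type IDs restart at 0, as in ogbl-biokg.
toy_triples = [
    ("drug", 0, "treats", "disease", 0),
    ("drug", 1, "treats", "disease", 0),
    ("disease", 0, "linked", "protein", 0),
]
mapping, edges = relabel_nodes(toy_triples)
print(mapping)
print(edges)
```
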

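import torch
import numpy as np
import networkx as nx
from collections import defaultdict
from tqdm import tqdm  # import the function, not the module, so that tqdm(...) is callable
from ogb.linkproppred import PygLinkPropPredDataset
from deepsnap.hetero_graph import HeteroGraph


ogbl_biokg_dataset = PygLinkPropPredDataset(name="ogbl-biokg")
# edge_split is used below; OGB provides the train/valid/test triples via get_edge_split()
edge_split = ogbl_biokg_dataset.get_edge_split()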

# =====================
# Relabel nodes
# =====================

# convert to arrays for speed
edge_split_array = dict()
for dataset in ['train', 'valid', 'test']:
    edge_split_array[dataset] = dict()
    for key in edge_split[dataset]:
        if not isinstance(edge_split[dataset][key], list):
            # tensors: convert to numpy
            edge_split_array[dataset][key] = edge_split[dataset][key].numpy()
        else:
            # plain Python lists (e.g. node type names)
            edge_split_array[dataset][key] = np.array(edge_split[dataset][key])

# new node label
current_node_label = 0
# track nodes that have been seen
seen = set()

new_label_mapping = defaultdict(dict)
new_label_mapping_inv = defaultdict(dict)

for dataset in ['train', 'valid', 'test']:
    for i in tqdm(range(len(edge_split_array[dataset]['head']))):

        tmp_head_node = (edge_split_array[dataset]['head'][i], edge_split_array[dataset]['head_type'][i])
        tmp_tail_node = (edge_split_array[dataset]['tail'][i], edge_split_array[dataset]['tail_type'][i])

        if tmp_head_node not in seen:

            seen.add(tmp_head_node)
            new_label_mapping[current_node_label]['original_node_label'] = int(edge_split_array[dataset]['head'][i])
            new_label_mapping[current_node_label]['node_type'] = edge_split_array[dataset]['head_type'][i]
            new_label_mapping_inv[tmp_head_node] = current_node_label
            current_node_label += 1

        if tmp_tail_node not in seen:

            seen.add(tmp_tail_node)
            new_label_mapping[current_node_label]['original_node_label'] = int(edge_split_array[dataset]['tail'][i])
            new_label_mapping[current_node_label]['node_type'] = edge_split_array[dataset]['tail_type'][i]
            new_label_mapping_inv[tmp_tail_node] = current_node_label
            current_node_label += 1


# =====================
# Create HeteroGraph
# =====================
G = nx.DiGraph()

for dataset in ['train', 'valid', 'test']:
    for i in tqdm(range(len(edge_split_array[dataset]['head']))):
        
        # head node
        head_node_id = edge_split_array[dataset]['head'][i].item()
        head_node_type = edge_split_array[dataset]['head_type'][i]
        new_head_node_id = new_label_mapping_inv[(head_node_id, head_node_type)]

        # tail node
        tail_node_id = edge_split_array[dataset]['tail'][i].item()
        tail_node_type = edge_split_array[dataset]['tail_type'][i]
        new_tail_node_id = new_label_mapping_inv[(tail_node_id, tail_node_type)]

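        # edge type
        edge_type_id = edge_split_array[dataset]['relation'][i].item()
        # NOTE: edge_index_to_type_mapping (relation ID -> relation name) is not defined in this
        # snippet and must be built beforehand; edge_type_label is not used further below.
        edge_type_label = edge_index_to_type_mapping[edge_type_id]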

        G.add_node(new_head_node_id, node_type=head_node_type, node_label=head_node_type)
        G.add_node(new_tail_node_id, node_type=tail_node_type, node_label=tail_node_type)
        G.add_edge(new_head_node_id, new_tail_node_id, edge_type=str(edge_type_id))
        

When I run this code, it creates a networkx graph, as shown in the tutorial. I can also convert it into a HeteroGraph object from deepsnap with this:

# Transform to a heterograph object that is recognised by deepSNAP
hetero = HeteroGraph(G)

But the resulting object does not have an edges attribute:

>>> hetero
HeteroGraph(G=[], edge_feature=[], edge_index=[], edge_label_index=[], edge_to_graph_mapping=[], edge_to_tensor_mapping=[3540567], edge_type=[], node_feature=[], node_label_index=[], node_to_graph_mapping=[], node_to_tensor_mapping=[93773], node_type=[])

>>> hetero.edges()
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: 'HeteroGraph' object has no attribute 'edges'

I am wondering why this is, since I followed the authors' tutorial. Do you have any idea? It would be great if any of the authors could comment on this: @farzaank @JiaxuanYou @RexYing @jmilldotdev

P.S. The reason why I would like to use deepsnap is exactly that it can use node features, which I would add for another graph later on.
