
load torch tensors in OGBDatasets #107

Closed
CarloLucibello opened this issue Mar 27, 2022 · 7 comments · Fixed by #172

@CarloLucibello
Member

Some of the features of the OGBDataset are downloaded as torch tensors stored in the ".pt" format. They are currently ignored, but we could load them using Pickle.jl (e.g. see this comment).

@Dsantra92
Collaborator

Can I work on this?

@CarloLucibello
Member Author

Sure. I don't remember which specific dataset this was needed for, though.

@CarloLucibello
Member Author

This problem can be seen in OGBDataset("ogbl-collab")

@Dsantra92
Collaborator

Been inactive due to Uni. exams, will start working on it today.

@yuehhua
Collaborator

yuehhua commented May 21, 2022

Some of these problems have already been solved here, including loading the ".pt" format with Pickle.jl, and were discussed with @chengchingwen: https://github.com/yuehhua/GraphMLDatasets.jl/blob/65d6a2bb02d31569a64b47004a0c4b192739a066/src/preprocess.jl#L391
Hope this code helps.

@Dsantra92
Collaborator

Dsantra92 commented Jun 13, 2022

Split tensors appear for edge-level tasks in OGB datasets. Dataset loading for LinkPropPred datasets differs from GraphPropPred and NodePropPred, so we might need to change the OGBDataset API.
Here are some approaches:

  1. Specify the split when loading the dataset
data = OGBDataset(name, split; dir)

But this has one obvious problem: loading any single split, e.g. train, would still involve computing the other two (val and test), because of how intertwined the stored data is.

  2. Return the train, test and validation splits for each dataset
train_data, test_data, valid_data = OGBDataset(name; dir)

This can be ambiguous for datasets without splits and does not exactly match the other dataset APIs.

  3. Compute the split from the loaded dataset
data = OGBDataset(name; dir)
train_split = split(data, :train) # this may be a weird way to do it
# maybe something like
train_split = data[:train]

The representation of link tasks in OGBDataset will differ from that of node or graph tasks.
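
To make the third approach concrete, here is a minimal sketch in plain Julia. `LinkDataset`, its fields, and the toy edge data are hypothetical stand-ins used for illustration only; they are not the actual `OGBDataset` type or its internal layout.

```julia
# Hypothetical container; NOT the actual MLDatasets.jl OGBDataset type.
struct LinkDataset
    edges::Matrix{Int}                    # 2 × E matrix of (source, target) pairs
    split_idx::Dict{Symbol,Vector{Int}}   # split name => column indices into `edges`
end

# Approach 3: derive a split from an already loaded dataset via indexing.
function Base.getindex(d::LinkDataset, s::Symbol)
    haskey(d.split_idx, s) || throw(KeyError(s))
    return d.edges[:, d.split_idx[s]]
end

# Toy data, made up for the example.
edges = [1 2 3 4 5;
         2 3 4 5 1]
data = LinkDataset(edges, Dict(:train => [1, 2, 3], :val => [4], :test => [5]))

train_split = data[:train]   # 2 × 3 matrix holding the training edges
```

One point in favour of `data[:train]` over `split(data, :train)` is that `Base.split` already exists for strings, so indexing avoids overloading a name with an unrelated meaning.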

@Dsantra92
Collaborator

Also, the API for splits should be consistent across data sources, e.g. Cora and OGBDataset currently give access to training masks through different APIs.
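
As a rough illustration of what a consistent accessor could look like, the sketch below normalizes two storage styles into one index vector. It assumes one source keeps a boolean mask and another keeps explicit indices; that assumption, and the helper name `split_indices`, are hypothetical and do not come from MLDatasets.jl.

```julia
# Hypothetical helper; neither the name nor the signature comes from MLDatasets.jl.
# Both methods return a vector of the positions that belong to the split.
split_indices(mask::AbstractVector{Bool}) = findall(mask)       # mask-style storage
split_indices(idx::AbstractVector{<:Integer}) = collect(idx)    # index-style storage

# Toy inputs, made up for the example.
train_mask = [true, false, true, false]
train_idx = [1, 3]

same_split = split_indices(train_mask) == split_indices(train_idx)
```

With a helper like this, downstream code could ask for a split in the same way regardless of how the underlying dataset stores it.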

@Dsantra92 Dsantra92 linked a pull request Sep 3, 2022 that will close this issue
