Main memory overloading when training using DICE-embeddings library #1
Comments
A solution to point 1 is to generate the indices of unique entities and relations beforehand and convert the dataset into an index-transformed dataset (sketched below).
Initially, individual tests were run on different portions of the dataset, storing the index-transformed data in a pickle file. This approach works for smaller datasets of up to 2 million triples but fails beyond that.
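For reference, a minimal sketch of such an index transformation, assuming the triples are available as (subject, predicate, object) string tuples; the function and variable names are illustrative and not taken from the dice-embeddings codebase:

```python
def index_transform(triples):
    """Map entities and relations to integer indices and rewrite the triples.

    `triples` is assumed to be an iterable of (subject, predicate, object)
    string tuples; all names here are illustrative only.
    """
    entity_to_idx, relation_to_idx = {}, {}
    indexed_triples = []
    for s, p, o in triples:
        # Assign a fresh index the first time an entity / relation is seen.
        s_idx = entity_to_idx.setdefault(s, len(entity_to_idx))
        o_idx = entity_to_idx.setdefault(o, len(entity_to_idx))
        p_idx = relation_to_idx.setdefault(p, len(relation_to_idx))
        indexed_triples.append((s_idx, p_idx, o_idx))
    return entity_to_idx, relation_to_idx, indexed_triples


if __name__ == "__main__":
    triples = [
        ("http://example.org/Alice", "http://example.org/knows", "http://example.org/Bob"),
        ("http://example.org/Bob", "http://example.org/knows", "http://example.org/Alice"),
    ]
    e2i, r2i, idx_triples = index_transform(triples)
    print(idx_triples)  # [(0, 0, 1), (1, 0, 0)]
```

The index-transformed triples can then be written out once, so training only needs the compact integer representation instead of the raw strings.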
Alternative solution: Issue 2 comment
Another proposal is to use the mmappickle library, which is designed for "unstructured" parallel access with a strong emphasis on adding new data. #4 Issues: the indexing is done directly to a memory-mapped file in the form of dictionaries using …
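A minimal sketch of what indexing into a memory-mapped file could look like, assuming mmappickle's documented mmapdict interface; the file path and dictionary contents below are purely illustrative:

```python
from mmappickle import mmapdict

# Open (or create) a memory-mapped pickle file; the path is illustrative.
index = mmapdict('/scratch/indexes/entity_index.mmdpickle')

# Store the entity/relation index dictionaries in the mapped file
# instead of keeping them in RAM. Keys and values are illustrative.
index['entity_to_idx'] = {'http://example.org/Alice': 0,
                          'http://example.org/Bob': 1}
index['relation_to_idx'] = {'http://example.org/knows': 0}

# Values are read back from the mapped file on access.
print(index['entity_to_idx']['http://example.org/Bob'])  # 1
```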
This process of writing to a memory-mapped file on the Parallel File System of the Noctua clusters is very slow, because Lustre handles memory-mapped files poorly. #5
Another solution is to use the DGX partition nodes, which have […]. After running the training test on 1 chunk (10 million triples) using the dice-embeddings library, we get the following file sizes:
The estimated size of files for the full dataset (…)
A workaround is to create indexed …
Update: Issue #9 adds a workaround for training embedding models on individual graphs by splitting the dataset based on domain. The domain of a triple is defined as the authority base URL in its namespace. The dataset is split into separate files by domain name, and the models are then trained on these smaller graphs (see the sketch below).
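A minimal sketch of the domain-based split, assuming the domain is taken as the authority (netloc) of the subject URI; the function names, output directory, and file format here are illustrative, not the actual implementation from Issue #9:

```python
import os
from collections import defaultdict
from urllib.parse import urlparse


def split_by_domain(triples):
    """Group triples by the authority (netloc) of their subject URI.

    Using the subject's authority as the triple's domain is an assumption
    made for illustration.
    """
    buckets = defaultdict(list)
    for s, p, o in triples:
        domain = urlparse(s).netloc or "unknown"
        buckets[domain].append((s, p, o))
    return buckets


def write_domain_files(triples, out_dir="domain_splits"):
    """Write one N-Triples-style file per domain, so each small graph
    can be used to train a separate embedding model."""
    os.makedirs(out_dir, exist_ok=True)
    for domain, rows in split_by_domain(triples).items():
        path = os.path.join(out_dir, f"{domain.replace(':', '_')}.nt")
        with open(path, "w", encoding="utf-8") as f:
            for s, p, o in rows:
                f.write(f"<{s}> <{p}> <{o}> .\n")
```

Each per-domain file can then be passed to the dice-embeddings training pipeline independently, keeping the vocabulary of each run small enough to fit in main memory.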
The RAM is getting overloaded because the unique entities and relations are held in main memory on the GPU nodes of Noctua 1 (180 GB usable main memory) and Noctua 2 (470 GB usable main memory). This leads to an Out of Memory (OOM) error from SLURM.