Test on memory map approach using `mmappickle.mmapdict` on `/dev/shm` instead of PFS on Clusters #5

sshivam95 · 2024-06-11T09:14:00Z

The memory-map approach takes a very long time to process the index dictionary in the memory-mapped file. It took $3$ days to process $41,602$ entities out of $5,037,674$ in a chunk of 10 million triples.

sshivam95 · 2024-06-11T09:18:50Z

#4 runs on the parallel file system (PFS) of the Noctua clusters, which uses Lustre. After a discussion with them, it turns out that Lustre handles memory-mapped files very poorly. Storing the memory-mapped files in the `/dev/shm` folder should therefore do the trick.
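To illustrate the idea of placing a memory-mapped file under `/dev/shm`, here is a minimal sketch using only Python's standard-library `mmap` module. It does not use `mmappickle.mmapdict`, and the fixed-size integer layout, the function names, and the fallback to the system temp directory are assumptions made for illustration only.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout: one signed 64-bit integer per entity slot.
RECORD = struct.Struct("<q")


def shm_dir() -> str:
    """Return /dev/shm if it exists and is writable, else the system temp dir."""
    candidate = "/dev/shm"
    if os.path.isdir(candidate) and os.access(candidate, os.W_OK):
        return candidate
    return tempfile.gettempdir()


def write_index(path: str, values: list[int]) -> None:
    """Write a non-empty list of integers into a fixed-size memory-mapped file."""
    size = RECORD.size * len(values)
    with open(path, "wb") as f:
        f.truncate(size)  # reserve the full size before mapping
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), size) as mm:
        for i, value in enumerate(values):
            RECORD.pack_into(mm, i * RECORD.size, value)
        mm.flush()


def read_slot(path: str, i: int) -> int:
    """Read the integer stored in slot i through a read-only mapping."""
    with open(path, "rb") as f, mmap.mmap(
        f.fileno(), 0, access=mmap.ACCESS_READ
    ) as mm:
        return RECORD.unpack_from(mm, i * RECORD.size)[0]


def demo() -> int:
    """Round-trip three values through a mapped file and return the middle one."""
    fd, path = tempfile.mkstemp(dir=shm_dir(), suffix=".bin")
    os.close(fd)
    try:
        write_index(path, [10, 20, 30])
        return read_slot(path, 1)
    finally:
        os.remove(path)
```

Because `/dev/shm` is backed by RAM on typical Linux systems, files stored there count against main memory and do not survive a reboot, so the index would still need to be copied to persistent storage afterwards.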

sshivam95 · 2024-06-11T09:20:59Z

Update: writing to the memory-mapped pickle dictionary in `/dev/shm` is much faster than on Lustre, but still very slow overall. It took $1$ day to process $146,893$ entities out of $5,037,674$.
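For a rough sense of scale, the figures reported in this thread can be turned into per-day rates and projected completion times for one chunk. This is only a back-of-the-envelope sketch: it assumes the processing rate stays constant over the whole chunk, which the measurements above do not confirm.

```python
TOTAL_ENTITIES = 5_037_674  # entities in one chunk of 10 million triples

# Figures reported in this issue.
lustre_rate = 41_602 / 3   # entities per day on Lustre (3 days measured)
shm_rate = 146_893 / 1     # entities per day on /dev/shm (1 day measured)

# Assumption: the rate stays constant for the rest of the chunk.
speedup = shm_rate / lustre_rate
days_lustre = TOTAL_ENTITIES / lustre_rate
days_shm = TOTAL_ENTITIES / shm_rate

print(f"speedup: {speedup:.1f}x")
print(f"projected days on Lustre: {days_lustre:.0f}")
print(f"projected days on /dev/shm: {days_shm:.0f}")
```

Under that assumption, `/dev/shm` is roughly 10.6 times faster, but a single chunk would still take about 34 days, compared with about 363 days on Lustre.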

sshivam95 · 2024-06-11T09:39:46Z

Alternative solution: create a B+ tree implementation in C++.

sshivam95 · 2024-06-12T15:44:20Z

Update: this approach might not be needed if we use the domain-specific datasets under issue #9.

sshivam95 closed this as completed Jun 11, 2024

sshivam95 mentioned this issue Jun 11, 2024

Main memory overloading when training using DICE-embeddings library #1

Open

This was referenced Jun 11, 2024

Create indexed file for dice-embedding training #7

Closed

Incremental saving approach #2

Closed

sshivam95 mentioned this issue Jun 11, 2024

Use dictionary to create indexing and save after a memory threshold #8

Closed
