
Create base training model using a chunksize #9

Open
sshivam95 opened this issue Jun 11, 2024 · 6 comments

Comments

@sshivam95
Collaborator

sshivam95 commented Jun 11, 2024

The original approach of using the skiprows parameter together with nrows in pandas.read_csv was a bad idea.

Pandas implements skiprows in a bafflingly memory-intensive way. With skiprows=12_000_000_000, it is essentially doing skiprows = set(list(range(skiprows))), i.e. building a giant list and a set, each containing 12 billion row indices!
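
A minimal sketch of the abandoned call pattern (the file path, separator, and chunk size are assumptions; the dump is read here as a whitespace-separated file):

```python
import pandas as pd

# Hypothetical call pattern this issue is about: per the behaviour described
# above, pandas expands the integer skiprows into a list and a set of row
# indices, so skipping 12 billion rows allocates billions of Python ints
# before a single row is read.
chunk = pd.read_csv(
    "dump.nt",                 # placeholder path to the large N-Triples dump
    sep=" ",
    header=None,
    skiprows=12_000_000_000,   # rows already consumed in earlier chunks
    nrows=10_000_000,          # size of the next chunk (assumed value)
)
```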

@sshivam95
Collaborator Author

Use the iterator=True option in pandas.read_csv.
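
A minimal sketch of this suggestion, assuming the dump can be tokenized as a whitespace-separated file; the path, column names, and chunk size are placeholders:

```python
import pandas as pd

# Read the dump incrementally instead of skipping already-seen rows on every call.
reader = pd.read_csv(
    "dump.nt",               # placeholder path
    sep=" ",
    header=None,
    names=["subject", "predicate", "object", "dot"],
    iterator=True,
)

while True:
    try:
        chunk = reader.get_chunk(10_000_000)  # next 10M rows (assumed size)
    except StopIteration:
        break
    # train_step(chunk)  # hypothetical per-chunk training call
reader.close()
```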

@sshivam95
Collaborator Author

Won't do. Instead, create chunk files once and use these files to train the models (a rough sketch follows below).
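
A rough sketch of materializing the chunk files once, so training later streams over the files instead of re-reading the original dump; the paths and chunk size are assumptions, not the actual pipeline:

```python
import pandas as pd

# Write fixed-size chunk files to disk in a single pass over the dump.
with pd.read_csv("dump.nt", sep=" ", header=None, chunksize=50_000_000) as reader:
    for i, chunk in enumerate(reader):
        chunk.to_csv(f"chunks/part_{i:05d}.nt", sep=" ", header=False, index=False)
```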

@sshivam95
Collaborator Author

Update: Partition the dataset by domain (the namespace in XML or the authority part of the base URL). Basically, the subject and object of a triple should share the same domain. If a subject is connected to a blank node, the blank node is assigned to the subject's domain.
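
A minimal sketch of this partitioning rule, assuming triples are already tokenized into subject/predicate/object strings; the function names are hypothetical:

```python
from urllib.parse import urlparse

def authority(term):
    """Authority (host) part of an IRI, or None for blank nodes and literals."""
    if term.startswith("<") and term.endswith(">"):
        return urlparse(term[1:-1]).netloc
    return None

def partition_key(subject, obj):
    """Domain a triple belongs to: subject and object must share a domain,
    and a blank-node object stays in its subject's domain."""
    s_dom, o_dom = authority(subject), authority(obj)
    if o_dom is None or s_dom == o_dom:
        return s_dom
    return None  # cross-domain triple, to be handled separately
```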

@sshivam95
Collaborator Author

Update: For blank nodes connected to other blank nodes, we have to take care of the Concise Bounded Description (CBD).
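
A rough sketch of following blank-node chains when computing the CBD; the in-memory subject index is an assumption and would need to be replaced by a streaming pass for the large dumps:

```python
def cbd(node, triples_by_subject, seen=None):
    """Concise Bounded Description: all triples of `node`, plus the
    descriptions of any blank nodes appearing as objects, recursively."""
    seen = set() if seen is None else seen
    if node in seen:
        return []
    seen.add(node)
    description = []
    for predicate, obj in triples_by_subject.get(node, []):
        description.append((node, predicate, obj))
        if obj.startswith("_:"):  # blank-node object -> pull in its CBD too
            description.extend(cbd(obj, triples_by_subject, seen))
    return description
```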

sshivam95 added a commit that referenced this issue Jun 18, 2024
- This includes the datasets which exceed main memory as a whole, e.g. rdfa, hcard, microdata, jsonld

- A solution to Issue #9
@sshivam95
Collaborator Author

sshivam95 commented Jun 18, 2024

Update: Domain-specific datasets have been created (materialized) for the following formats:

  • adr_dataset
  • hcalendar_dataset
  • hlisting_dataset
  • hresume_dataset
  • rdfa_dataset
  • xfn_dataset
  • geo_dataset
  • hcard_dataset
  • hrecipe_dataset
  • hreview_dataset
  • species_dataset
  • jsonld_dataset
  • microdata_dataset

Next step: link them with Wikidata using LIMES, then clean them by removing literals and materializing the blank nodes.
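
The LIMES linking itself happens outside the code, but the cleaning step could look roughly like this rdflib sketch (paths are placeholders, and it assumes a per-domain file fits in memory after partitioning):

```python
from rdflib import Graph, Literal

g = Graph().parse("hcard_dataset.nt", format="nt")   # placeholder input

# Materialize blank nodes by Skolemizing them into IRIs.
g = g.skolemize()

# Remove all triples whose object is a literal.
for s, p, o in list(g):
    if isinstance(o, Literal):
        g.remove((s, p, o))

g.serialize("hcard_dataset_clean.nt", format="nt")   # placeholder output
```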

@sshivam95
Collaborator Author

sshivam95 commented Jun 27, 2024

Update: Discussed with Sherif how to minimize the number of KGs by merging smaller KGs into bigger ones when a subject already exists in the bigger one. A threshold needs to be found for the following materialized datasets (see the sketch after this list):

  • adr_dataset
  • hcalendar_dataset
  • hlisting_dataset
  • hresume_dataset
  • rdfa_dataset
  • xfn_dataset
  • geo_dataset
  • hcard_dataset
  • hrecipe_dataset
  • hreview_dataset
  • species_dataset
  • jsonld_dataset
  • microdata_dataset
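
A rough sketch of the merge rule under discussion; the overlap metric and the 0.5 value are placeholders, since finding the threshold is exactly the open question here:

```python
def should_merge(small_kg_subjects, big_kg_subjects, threshold=0.5):
    """Fold a small KG into a big one when enough of its subjects
    already occur in the big KG (threshold value still to be decided)."""
    if not small_kg_subjects:
        return False
    overlap = len(small_kg_subjects & big_kg_subjects) / len(small_kg_subjects)
    return overlap >= threshold
```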
