First steps on migrating to a new version of mongo. #3651
base: master
Conversation
We're going to write a helper binary for this in Go, because:

- The currently maintained driver package supports mongo versions back to 2.6 (what we're using), which is not true for most other languages.
- We already have Go in our toolchain for boringssl's test suite.
- The build is unlikely to break due to bitrot re: Go's toolchain.
- The generated binary is static, so if all else fails we can just bundle the executable, though I don't anticipate that.
- I will be much more productive than in something else.
I'm getting a permissions error trying to just run this from my local dev directory; need to figure out what's going on.
We're successfully listing the collections.
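For reference, a minimal sketch of that collection-listing step using the currently maintained Go driver (go.mongodb.org/mongo-driver). The URI and database name here are assumptions for a local Sandstorm dev instance, not necessarily what this PR's binary uses:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Assumed URI; Sandstorm's bundled mongo may listen elsewhere.
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://127.0.0.1:27017"))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)

	// "meteor" is the database name Meteor apps use by default; adjust as needed.
	names, err := client.Database("meteor").ListCollectionNames(ctx, bson.D{})
	if err != nil {
		log.Fatal(err)
	}
	for _, name := range names {
		fmt.Println(name)
	}
}
```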
Could be related: listing collections of MongoDB and exporting to JSON.
@xet7 I am hoping this particular project may also yield a way to upgrade meteor-spk-built applications to the newest Mongo version.
I also thought about whether it would make any sense to convert raw MongoDB database files directly to another format, without starting any MongoDB server, and whether there would be similar ways to handle the mongodump file format. But I presume it may not be so useful. https://www.percona.com/blog/2021/05/18/wiredtiger-file-forensics-part-1-building-wt/

It would be nice if all converting of Sandstorm MongoDB databases were scheduled to happen at night, so that it would not disturb daytime use of apps. These conversions also need checks that there is enough free disk space to convert.
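A minimal sketch of such a free-space check on Linux (the path is an assumption; a real pre-flight check would compare the result against the measured database size):

```go
package main

import (
	"fmt"
	"log"
	"syscall"
)

func main() {
	// Assumed path of Sandstorm's mongo data directory.
	const dir = "/opt/sandstorm/var/mongo"

	var st syscall.Statfs_t
	if err := syscall.Statfs(dir, &st); err != nil {
		log.Fatal(err)
	}
	// Bytes available to unprivileged processes on that filesystem.
	free := uint64(st.Bavail) * uint64(st.Bsize)
	fmt.Printf("%d bytes free under %s\n", free, dir)
}
```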
mongodump (or mongorestore) of a 400 GB MongoDB database takes about 4 hours. Some Snap and Docker WeKan users have databases of that size.
FWIW, the dumping half of this is already written. I'd be wary of dumping it to json though, since mongo's native format is bson, which supports a couple extra data types, like timestamps and binary blobs -- so dumping to json loses information.

I suspect for sandstorm itself it won't be too slow; the database isn't that huge, since most data is stored in grains' storage. Though I'd be curious to know how big the database on alpha is (@kentonv?)

Note on my local dev instance the on-disk use of /opt/sandstorm/var/mongo was around ~128MiB, and the exported data was less than 512KiB, so I assume it's doing some pre-allocation or something.
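The PR's actual dump code isn't shown here, but as a sketch of the general technique: mongodump's .bson files are just each document's raw BSON bytes concatenated, and the Go driver exposes exactly those bytes on its cursor, so a dump loop can avoid decoding entirely. The function name below is hypothetical:

```go
package dump

import (
	"context"
	"io"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
)

// DumpCollection writes every document in coll to w as raw BSON,
// back to back -- the same layout mongodump uses for .bson files.
// BSON documents are self-delimiting (the first 4 bytes are the
// document length), so no separator is needed between them.
func DumpCollection(ctx context.Context, coll *mongo.Collection, w io.Writer) error {
	cur, err := coll.Find(ctx, bson.D{})
	if err != nil {
		return err
	}
	defer cur.Close(ctx)
	for cur.Next(ctx) {
		// cur.Current holds the raw BSON bytes of the current document;
		// writing them verbatim preserves types like BinData and
		// Timestamp that a plain-json export would flatten.
		if _, err := w.Write(cur.Current); err != nil {
			return err
		}
	}
	return cur.Err()
}
```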
Really? Binary blobs are exported in base64-encoded format, like GridFS attachments. Is there more info about this somewhere?
Quoting Lauri Ojansivu (2022-08-02 13:26:21):

> > I'd be wary of dumping it to json though, since mongo's native format is bson, which supports a couple extra data types, like timestamps and binary blobs -- so dumping to json loses information.
>
> Really? Binary blobs are exported in base64 encoded format, like GridFS attachments. Is somewhere more info about this?

When you read it back in, how do you tell if it's supposed to be a string or a base64-encoded binary?
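This ambiguity is exactly what MongoDB's Extended JSON format exists to solve: the Go driver can emit type-tagged JSON instead of bare base64. A small illustrative sketch, not part of this PR:

```go
package main

import (
	"fmt"
	"log"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/bson/primitive"
)

func main() {
	doc := bson.D{{Key: "blob", Value: primitive.Binary{Data: []byte{0xde, 0xad}}}}

	// Canonical Extended JSON tags the value as binary:
	//   {"blob":{"$binary":{"base64":"3q0=","subType":"00"}}}
	// A plain-json export would instead emit {"blob":"3q0="}, which is
	// indistinguishable from an ordinary string on the way back in.
	out, err := bson.MarshalExtJSON(doc, true, false)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(out))
}
```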
That bash script exports each collection/table to a separate json file. By opening each file in a text editor, I can see the json structure, whether it's nested, etc. For attachments, each part of an attachment has an ID and a base64 string. If a file is divided into many parts, other info in the json shows the filename, size, md5, part IDs, etc. A more useful way would be to save attachments to binary files and use those unique file IDs as filenames. It's not so useful to use real filenames, because there are many attachments with the same names, special characters are urlencoded, etc. Another way would be to name attachments by their sha256 or other hash, and that way do deduplication and save disk space. I have also thought about encrypting file data, etc., but I have not coded it yet. I also have not yet coded scripts to convert the JSON to other formats.
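A minimal sketch of that hash-based naming idea (the base64 input is a made-up attachment part, not WeKan's actual export format):

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"fmt"
	"log"
)

func main() {
	// Hypothetical base64 payload of one attachment part.
	const part = "aGVsbG8gd29ybGQ="

	data, err := base64.StdEncoding.DecodeString(part)
	if err != nil {
		log.Fatal(err)
	}

	// Content-addressed filename: identical parts hash to the same
	// name, so writing each name once deduplicates automatically.
	sum := sha256.Sum256(data)
	fmt.Println(hex.EncodeToString(sum[:]) + ".bin")
}
```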
You can certainly come up with ways of doing this that are safe for a particular database. But the way this PR does it, we just export as bson instead of json; then we don't have to worry about what the contents of the database actually are.
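For the restore direction, reading a concatenated-BSON stream back is equally mechanical, since each document carries its own length prefix. A sketch under the same assumptions as the dump example above (the function name is hypothetical, not from this PR):

```go
package restore

import (
	"encoding/binary"
	"io"

	"go.mongodb.org/mongo-driver/bson"
)

// readDoc reads one BSON document from a stream of back-to-back
// documents, as produced by the dump sketch above or by mongodump.
func readDoc(r io.Reader) (bson.Raw, error) {
	// The first 4 bytes of every BSON document are its total length,
	// little-endian, including these 4 bytes themselves.
	var lenBuf [4]byte
	if _, err := io.ReadFull(r, lenBuf[:]); err != nil {
		return nil, err // io.EOF here means a clean end of stream
	}
	n := binary.LittleEndian.Uint32(lenBuf[:])
	doc := make([]byte, n)
	copy(doc, lenBuf[:])
	if _, err := io.ReadFull(r, doc[4:]); err != nil {
		return nil, err
	}
	return bson.Raw(doc), nil
}
```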
Also make the flag an integer instead of a string.
Right now all this does is bundle up a hello-world go binary with Sandstorm's build system. Marking as a draft.