
RUST-1488 Populate inserted_ids when an insert_many fails #761

Closed
wants to merge 5 commits

Conversation

@kmahar (Contributor) commented Oct 6, 2022

fixes #748

The fix ended up being a little more complicated than I initially expected in order to get it to work properly for multi-batch bulk writes, but it was still relatively straightforward; most of the diff here is just adding tests. @isabelatkinson we should consider adding automated tests around this type of thing as part of the new bulk spec

@@ -1196,7 +1202,7 @@ where

let mut cumulative_failure: Option<BulkWriteFailure> = None;
let mut error_labels: HashSet<String> = Default::default();
let mut cumulative_result: Option<InsertManyResult> = None;
let mut cumulative_inserted_ids = HashMap::new();
@kmahar (Contributor, Author):

I had initially added logic to set the inserted_ids on the cumulative_failure for failed batches, but I realized that would not correctly handle cases with a mix of successful and unsuccessful batches -- for example, if the first batch succeeds and the second batch fails, we would put the inserted_ids from batch 1 on the cumulative_result but the inserted_ids from batch 2 on the cumulative_failure, and then only return the failure.
Given this, it seemed simplest to switch to tracking the inserted_ids in one place and then setting them on the InsertManyResult / BulkWriteFailure at the end.
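A minimal sketch of that shape (execute_batch and batch_offset are hypothetical stand-ins for the driver's actual internals, not this PR's literal diff):

let mut cumulative_failure: Option<BulkWriteFailure> = None;
let mut cumulative_inserted_ids: HashMap<usize, Bson> = HashMap::new();

for (batch_index, batch) in batches.into_iter().enumerate() {
    match execute_batch(batch).await {
        Ok(result) => {
            // Shift batch-local indices so they refer to positions in the
            // original insert_many call, then record them in the shared map.
            for (index, id) in result.inserted_ids {
                cumulative_inserted_ids.insert(batch_offset(batch_index) + index, id);
            }
        }
        Err(failure) => {
            // Merge write errors into cumulative_failure as before; inserted_ids
            // are NOT tracked here, since successful batches also produce them.
        }
    }
}

// Attach the single shared map to whichever outcome is returned.
match cumulative_failure {
    Some(mut failure) => {
        failure.inserted_ids = cumulative_inserted_ids;
        Err(failure.into())
    }
    None => Ok(InsertManyResult { inserted_ids: cumulative_inserted_ids }),
}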


// an ordered, 2-batch bulk write where a document in the first batch generates a write error:
// nothing should be inserted
// note: these numbers were chosen because maxWriteBatchSize is 100,000
@kmahar (Contributor, Author):

@isabelatkinson for the purpose of trying to write spec tests around batching behavior for bulk writes, we might consider requiring that drivers have some way to internally override maxWriteBatchSize. Otherwise we'd need massive YAML/JSON files to fit in enough data to generate multiple batches (the limit is either 100k writes or 16 MB, whichever is smaller).
Alternatively, drivers can write prose tests like this one, but I think it would be much easier if we could artificially use bulk batches with a smaller size.
Alternatively/additionally, we could consider asking the server team to make the parameter configurable so we could temporarily decrease it for the bulk tests.
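For illustration only, the kind of test-only hook being suggested might look like this (set_max_write_batch_size_override is hypothetical; no such driver or server option exists today):

// Hypothetical test-only override; today the only knob is the server's
// fixed maxWriteBatchSize of 100,000.
#[cfg(test)]
client.set_max_write_batch_size_override(Some(4));

With a batch size of 4, ten documents would exercise the multi-batch code paths without needing 100,001 documents per test.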

@kmahar (Contributor, Author) commented Oct 7, 2022

The evg failures don't seem related.
Also, I suppose we should probably backport this to the 2.3 branch and tag 2.3.2.

@kmahar kmahar marked this pull request as ready for review October 7, 2022 22:01
@kmahar kmahar requested review from a team and abr-egn and removed request for a team October 7, 2022 22:01
@abr-egn (Contributor) left a comment:
LGTM!


// an unordered, 2-batch bulk write where the second-to-last document in the second batch
// generates a write error: everything besides that document should be inserted
let res = run_inserted_ids_test(&client, &docs, false);
@abr-egn (Contributor):
Can we add one more case with a 2-batch write that has a write error in both batches?
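A hedged sketch of that case, reusing this PR's run_inserted_ids_test helper and the duplicate-_id trick from the other tests (exact indices are illustrative):

use mongodb::bson::{doc, Document};

// 100,002 documents split into two batches of 100,000 and 2.
let mut docs: Vec<Document> = (0..100_002).map(|i| doc! { "x": i }).collect();
docs[0] = doc! { "_id": 1 };
docs[1] = doc! { "_id": 1 };       // duplicate key -> write error in batch 1
docs[100_000] = doc! { "_id": 2 };
docs[100_001] = doc! { "_id": 2 }; // duplicate key -> write error in batch 2

// Unordered, so everything except the two duplicates should be inserted.
let res = run_inserted_ids_test(&client, &docs, false);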

)),
None => Ok(cumulative_result.unwrap_or_else(InsertManyResult::new)),
Some(mut failure) => {
failure.inserted_ids = cumulative_inserted_ids;
@abr-egn (Contributor):

This field is actually currently pub(crate), and the only reason the user was able to observe it was the derived Debug implementation. I did some digging and found that RUST-260 was filed to eventually expose this field properly and stop discarding the intermediate results; I believe the field was originally introduced for counting attempted writes rather than for actually tracking inserted ids (per #377 (comment)).

Given that we've done the work of populating this field, I think we probably ought to do RUST-260 also and mark the field as pub.

Regarding the Jira logistics of this, I wonder if it makes sense to rename RUST-1488 to cover the bug of the Debug implementation leaking the unset, pub(crate) inserted_ids map, and instead have this PR be titled RUST-260. Wdyt?
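Concretely, the RUST-260 change being floated is just a visibility promotion; a sketch (the surrounding fields shown are approximate):

#[derive(Clone, Debug)]
#[non_exhaustive]
pub struct BulkWriteFailure {
    pub write_errors: Option<Vec<BulkWriteError>>,
    pub write_concern_error: Option<WriteConcernError>,
    // Previously pub(crate): only observable via the derived Debug impl.
    pub inserted_ids: HashMap<usize, Bson>,
}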

@@ -1235,6 +1236,10 @@ where

@abr-egn (Contributor):

It looks like, in the event that one of the batches fails with a non-BulkWriteError, we just return early without continuing. I think this is probably inconsistent with the docstring for unordered writes ("If false, when a write fails, continue with the remaining writes, if any."), but beyond that, it also means we discard our cumulative record of what has been inserted so far. I wonder if we would instead want to add a Vec<Error> to BulkWriteFailure to account for the other errors that might occur during a multi-batch write, in order to preserve the inserted_ids map. We have something similar in Swift, though I think it's only a single error rather than an array of them.

This may get kind of tricky when deciding which errors we want to continue on from and which constitute a "terminal" error. For example, I imagine an auth error would not be something we'd want to keep attempting inserts after, but a network error would be (e.g. so that we could get a different mongos for the next batch). Maybe it's worth deferring to another ticket? Or maybe we keep the existing behavior?
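One possible shape for the continue-instead-of-return-early idea, as a match-arm fragment inside the per-batch loop (hypothetical; the driver does not do this today, and other_errors would be a new field):

// On a non-BulkWriteError during an unordered multi-batch write:
Err(error) if !ordered => {
    // Hypothetical: requires adding something like
    // `other_errors: Vec<Error>` to BulkWriteFailure.
    other_errors.push(error);
    // Keep going so cumulative_inserted_ids survives; after a network
    // error, the next batch may select a different mongos.
    continue;
}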


@qm3ster commented Jun 28, 2024

Being as ignorant and disorganized as I am, I have reopened this in #1157 and attempted an implementation in #1158.
I am currently using that from my private fork, where BulkWriteFailure.inserted_ids is pub.
If anyone would like to look over the code and give it a general go-ahead, I'd be happy to write (steal from this PR) some tests.

But if this is a whole process (I thought it might not be, since it's not a breaking change), I believe it would be a lot more ergonomic to start breaking application "errors" out of the general error. Examples would range from

Result<InsertManyResult { ids: Vec<Result<Id, Err>> }, mongodb::Error>

to even

Result<Result<InsertOneResult, InsertOneFailure>, mongodb::Error>

But that sounds like 4.0.
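A sketch of what that split could look like (all names here are hypothetical; none of these types exist in the driver):

use mongodb::bson::Bson;

// Per-document failures move into the success value; only transport-
// or server-level problems remain in the outer mongodb::error::Error.
struct WriteFailure {
    code: i32,
    message: String,
}

struct InsertManyOutcome {
    // One entry per input document, in order.
    ids: Vec<Result<Bson, WriteFailure>>,
}

type InsertManyReturn = Result<InsertManyOutcome, mongodb::error::Error>;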

@isabelatkinson (Contributor) commented:

@qm3ster yeah, that would be quite a major overhaul of our error API. 3.0 is very fresh, so breaking changes will likely not happen again for a while.

Also worth noting that the new bulk write API does report inserted IDs in its error type; BulkWriteError contains a partial result which, if verbose_results was configured on the call to bulk_write, will record an InsertResult for any successful inserts that happened before the error occurred, e.g.:

let write_models = <InsertOneModel for each insert document>;
let bulk_write_result = client.bulk_write(write_models).verbose_results().await;
let insert_results = match bulk_write_result {
    Ok(result) => Some(result.insert_results),
    Err(error) => {
        if let ErrorKind::BulkWrite(bulk_write_error) = *error.kind {
            match bulk_write_error.partial_result {
                Some(PartialBulkWriteResult::Verbose(result)) => Some(result.insert_results),
                _ => None,
            }
        } else {
            None
        }
    }
}; // returns Option<HashMap<usize, InsertOneResult>>

(The major caveat here is that bulk write will not be available for use in production until MongoDB 8.0 is released, but once that happens this will be a way to inspect inserted IDs upon error without requiring changes to the driver.)

Linked issue: RUST-1488 BulkWriteFailure does not contain inserted_ids when insert_many fails