-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parse error falls back to Invalid #1312
Conversation
rust/sbp/src/de.rs
Outdated
)?) | ||
let sender_id = self.sender_id(); | ||
let mut payload = self.payload(); | ||
let msg = Sbp::from_parts(self.msg_type(), sender_id, payload).or_else(|_| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would mask malformed messages though? Like if the message type is a known message but the payload is too short. Then you'd have an Unknown
message with a message type that corresponds to a known message which is a little odd. But maybe that's what we want, I guess it's a trade off between being lenient and surfacing errors better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea... think we're just treating Unknown as the Frame
at this point to serialize as json :/
although that makes sense for the purpose of the Unknown message i guess.
maybe a better compromise would to just add a flag like strict
to address dropping payload errors?
or maybe a 'from payload' like 'from fields' for json
issue with the raw decoder or persisting broken data is where do we actually trim the unnecessary data - which thought everything would propagate up to sbp converters and they would drop those - or maybe we should just never drop them 😱
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the Decoder
should only output valid SBP (as in CRC is good and the payload is correct), if a user wants access to potentially invalid messages they can go with the Framer
I think. So for example sbp2json
should probably be doing something like
for frame in Framer::new(rdr) {
if let Ok(msg) = frame.to_sbp() {
// valid crc/payload, possibly an Unknown message
json_encoder.write(&msg)?;
} else {
// invalid crc or payload, could also be an Unknown message
// need some json representation of a frame
json_encoder.write(&frame)?;
}
}
rust/sbp2json/src/lib.rs
Outdated
@@ -78,7 +79,13 @@ where | |||
W: Write, | |||
F: Formatter + Clone, | |||
{ | |||
let source = maybe_fatal_errors(sbp::iter_messages(input), fatal_errors); | |||
let source = maybe_fatal_errors(sbp::iter_frames(input), fatal_errors).map(|frame| { | |||
frame.to_sbp().unwrap_or(Sbp::Unknown(Unknown { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably implement Serialize/Deserialize for Frame instead of using Unknown for the case when the frame is invalid so we can distinguish between an Unknown message type and in invalid message. It should look the same (as in serialize/deserialize to a struct with msg_type, sender_id and payload fields) but then you can tell whether or not its an issue with an outdated version of libsbp or a malformed/corrupted message.
Also you'll need to handle the other direction for this is json2sbp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
although I guess we are using an untagged enum so you wouldn't be able to tell the difference between a Frame and an Unknown hmmm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmmm, is unknown actually used for anything tho
could just use unknown for the types of msg that we dont known about
this would make it so we dont need to do anything to json2sbp since the msgs they produce would just be kinda the payload except wrapped with unknown 🤔
unless we want exact raw corrupt packet without wrapping anything then might be a bit more complicated than this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@notoriaga i.e. would it be ok to simply wrap around as an unknown message (since these msgs are indeed 'unknown') or do we want some special form of deserialization in JSON for corrupt messages in addition to the sbp?
i.e.
{
"msg_type": "...",
"payload": "...",
"sender_id": "...."
},
{
"corrupt_data": "..."
}
or simply just handling corrupt_data inside Unknown packets like
{
"msg_type": "0",
"sender_id": "0",
"payload": "the corrupt msg"
}
which would then address going from json2sbp - but would serialize as an unknown msg from there onwards which means
- it would be handled as a legit unknown message (not sure when unknowns are used)
- it would be serialized as a real message, (no modification needed on the sbp2json side - downside is it would be serialized as unknown)
^ tradeoffs without introducing an additional way to serialize corrupt data as JSON format but it would make some sense since these messages are labelled as unknown
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also add a msg_name field to the json representation of messages like with rtcm2json so then we'd have
{
"msg_type": 123,
"msg_name": UNKNOWN,
"sender_id": 4,
"payload": [...]
}
{
"msg_type": 321,
"msg_name": FRAME,
"sender_id": 4,
"payload": [...]
}
could also be nice to have for jq stuff, the message names are bit easier to remember than the ids
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
think the message names are alread yhtere for sbp2json, making this field necessary would mean json2sbp only uses this field to check specific case where msg_name == "FRAME"
🤔 not sure if theres any better way or...
^^ this also seems much more complicated / invovled tho #1317 WIP
hmmmm....
Changes I pushed make the JSON output "look" better for random garbage data:
But tests are having issues and the inverse
|
Converted to draft to remove from review list until we can make more progress on this. |
@silverjam I'm picking this up now, but i don't know what you mean by inverse
|
I think I just meant that |
The minimum set of changes to get the parse unknown falllback tests to pass, but I'm not 100% sure it solves the desired problem
… into adrian/unknown-fallback
Looks like we might need to bump the compiler version and MSRV that we're using or remove these let-else blocks:
|
I am very close to getting this within spec, it just seems like a CRC is added somewhere, e.g this is what {"msg_name":"INVALID","payload":"VQoCw5UibB1EFL/0ELSA4EJAzKTjmU2eXsCYqanZWkCiAAwBDwl23lUO","crc":3669} I'm not sure where that extra crc is coming from. I also removed a few other places where panics could happen by replacing |
Seeing some unexpected results when just using garbage data, these might be "expected" given the current parsing behavior, but not what I initially expected would happen (given the intent of this PR). If I test with the
However, if I start by creating a bit of garbage data:
Contents of the corrupt blob:
Then attempt to decode that I don't get an invalid message:
Putting this data at the start and end yields similar results:
Similar results for the middle:
If I use a corrupt blob that happens to start with an SBP preamble byte I get better results. But only in certain cases. To create the bad blob, you can run
This should yield:
As the only chunk of data we don't get anything back:
But in the start and middle test cases we do get invalid messages back:
We don't get anything if the bad data is at the end though:
Ultimately I think we want these test case to pass, but we might be getting into the weeds of how the SBP parser currently works, so we'll probably want to fix these issues with a new PR. The above blobs are attached here: corrupt_blobs.zip |
There is something else, where data aren't round tripping from sbp -> json -> sbp properly, even when it seems to create invalid messages. This PR is at the point now where it needs unit tests on this behavior, but it is so large I am going to do a stacked PR to it, because it is getting too unwieldy. I am hopeful that with unit tests these bugs will be easy to fix. |
As long as we can still round trip valid data I'm not sure I want to invest more time on this right now, once we get the new tests in let's think about moving on. We've made a lot of good progress but this is starting to be a bigger time sink than I expected. |
This PR adds a unit test showing that parsing errors fallsback to an invalid message. Seeing this unit test makes it clear why some of the questions @silverjam was having about invoking this on certain inputs causes it to pass and others cause it to fail. Additionally the round-tripping on arbitrary bytes is not possible yet because both the iter_frame & iter_messages will only return a message if there is something that starts with the prelude byte, and it returns the bytes that exist between that prelude and HEADER_LEN, whatever the length of the payload is on byte 5 + the CRC_LEN. This means that if there are extra bytes sent in the stream for aliasing for instance, the framer ignores those bytes, and they are never even tried to be parsed. @silverjam gave two examples that do not throw an error as the unknown fallback is currently implemented. One of those has been added as a unit test here. But the main takeaway is the invalid-fallback as currently written only works on messages that are properly framed but perhaps wrongly labeled or with a CRC error.
rust/sbp/src/json/ser.rs
Outdated
@@ -19,7 +20,7 @@ const BASE64_BUFLEN: usize = BUFLEN * 4; | |||
pub fn to_writer<W, M>(mut writer: W, msg: &M) -> Result<(), JsonError> | |||
where | |||
W: io::Write, | |||
M: SbpMessage + Serialize, | |||
M: SbpMessage + Serialize + WireFormat + Clone, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think WireFormat was introduced in all of these methods because it was removed form SbpMessage? Are we sure that all of these methods really need WireFormat?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The call to crate::ser::to_buffer
on line 239 requires WireFormat
, which then has to be propagated up to pretty much all of the functions in json::ser
. I took out WireFormat
from the SbpMessage
trait in an abandoned refactoring, so if you prefer i can re-add it.
Finally regarding benchmarks, it seems like the rust version is too fast and I'll need to bump some of the expected values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took out WireFormat from the SbpMessage trait in an abandoned refactoring, so if you prefer i can re-add it.
Let's re-add it if it's no longer needed
@silverjam it passes now! 🎉 |
@adrian-kong FYI, after this PR is merged, the next release cut of libsbp should be |
I'm just waiting to merge to let other small changes in if we want to cut two releases. Otherwise I think it is g2g |
Two releases is a good idea, let's wait for Adrian to cut |
@pcrumley 4.17.0 was cut I think we're good to merge this |
Thanks for letting me push the button |
Just noticed this, this isn't true right? I think I saw some unit test updates. |
No, It has unit test from bytes to In terms of unit test coverage, I think it is OK but there should be one that shows an invalid JSON message is parsed properly. I will do that today. |
see this pr which adds more test coverage: #1352 |
# Description @swift-nav/devinfra This adds an integration test (with some unit tests inside of it) which test the round tripping of an invalid message from invalid sbp -> json with msg_name "INVALID" -> invalid sbp. Covers a gap in test coverage from an earlier PR: #1312 Please note, there are no non-test code changes here, so it passed as is.
Description
This PR makes it so that if at any point in the parsing of sbp messages the parsing fails, it has the option of falling back to an InvalidMessage which can then be written to the output. The way this PR acheives this goal is through some major breaking changes. The idea is that when errors are thrown throughout the process, we store all that local information in the constructed error. That way, choosing how to write the output can become as simple as matching on the thrown error and then basically writing
SbpMessage::from(error)
. This enables us to keep the previous behavior of skipping errors or returning on errors simply by refusing to handle the errors.sbp2json
,json2sbp
&json2json
now have a new optional arg--error_handler
which allows the following optionsskip
,return
orToInvalid
. that skip and return are the equivalent of the false and true passed to fatal-errors arg which as been removed.Care was taken to remove some panics in the parsing of SBP which could cause some types of invalid messages to be lost.
Finally this PR does not do things like try to recover from being passed in malformed json or given an I/O error. In case of those error, if the
ToInvalid
option is passed, the code will panic.Some issues with this PR --
Sbp::Invalid
add unit tests for recovering from parsing errors of sbp #1347)JIRA Reference
https://swift-nav.atlassian.net/browse/DEVINFRA-1220