-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CDAP-16739] Added hard limit for Avro data file, proper error messag… #406
base: develop
Are you sure you want to change the base?
Conversation
…e redirecting user to use file source connector with format avro. Fixed parse-as-json to highlight error as well as handling integers correctly now
String msg = e.getCause() != null ? e.getCause().getMessage() : e.getMessage(); | ||
// In case there are more rows being handled, we attempt to surface the Json that is | ||
// causing an issue to make it easier for users to identify the problem quickly. | ||
if (rows.size() > 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't rows always size 1? Otherwise throwing an ErrorRowException would incorrectly mark every Row as an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not always. If we parsing AvroFile. Parse-as-avro-file will generate records > 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but the RecipeExecutor will break that into multiple lists of size 1. Either this is always size 1, or the entire ErrorRowException contract is wrong.
wrangler-core/src/main/java/io/cdap/directives/parser/ParseAvroFile.java
Outdated
Show resolved
Hide resolved
wrangler-core/src/main/java/io/cdap/directives/parser/ParseAvroFile.java
Outdated
Show resolved
Hide resolved
wrangler-core/src/main/java/io/cdap/directives/parser/ParseAvroFile.java
Outdated
Show resolved
Hide resolved
@@ -52,6 +52,40 @@ public void testParseAsAvroFile() throws Exception { | |||
Assert.assertEquals(1495194308245L, results.get(1688).getValue("timestamp")); | |||
} | |||
|
|||
@Test | |||
public void testAvroWithJsonFieldParsingWithParseJsonIgnoreError() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these new tests are testing two directives. More specifically, these are not testing any change in behavior in the avro directive, they are just testing the changes in json directive.
Unit tests should be testing isolated components when possible. It would be better to enhance the parse json test. For example, if we remove the parse avro file directive, we could easily remove test coverage of the parse json directive accidentally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a problem customer had. We have other tests for catching parse-as-json problems. This was when a json in avro file have. Using redacted user data as test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and this is a unit test, meaning it should be testing very specific parts of the code. With the ignore-error changed introduced, it should be testing that the json directive returns empty results when given bad data and ignore-error is true.
rows.add(new Row("body", data)); | ||
|
||
List<Row> results = TestingRig.execute(directives, rows); | ||
Assert.assertEquals(6693, results.size()); // total 6670, 7 records are bad. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these types of asserts are not specific enough. For example, how can we be sure the right rows are filtered? It would be better to just send a single row in as input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using customer test data to make sure we don't break it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't make sure of anything besides the size
@albertshau PTAL |
@albertshau PTAL |
…e redirecting user to use file source connector with format avro. Fixed parse-as-json to highlight error as well as handling integers correctly now