CSV: Default values in missing cells (column present) #355

Open
hosswald opened this issue Nov 21, 2022 · 0 comments

When trying to parse a CSV with missing cells (in present columns) into a Kotlin data class whose fields have default values, I found (FasterXML/jackson-module-kotlin#605) that I needed to combine CsvParser.Feature.EMPTY_STRING_AS_NULL and KotlinFeature.NullIsSameAsDefault to get the defaults applied.
Looking through the test cases, it looks to me as if the CSV handling of missing columns could be improved.
Quoting myself from the above-mentioned ticket:

the usage of default parameter should be the default when parsing a CSV and encountering an empty cell, regardless of whether or not EMPTY_STRING_AS_NULL is used, in my opinion.

That is because CSV has no explicit nulls (unlike JSON).
If I'm not mistaken, a field that is missing from a JSON document is deserialized so that the default value of the respective field is kept. So far so good.
However, in CSV there is a difference between missing columns (defaults are used) and missing cells in present columns (where I have to mix feature flags from CsvParser and the Kotlin module to get the defaults).
I'm not sure about the situation for Java/POJOs. Looking at the existing tests,

MissingColumnsTest::testDefaultMissingHandling() handles missing columns the way I would like missing cells (in a present column) to be handled, but NullReadTest::testEmptyStringAsNull330() ensures empty cells are handled as null even if there is a default value.

As an infrequent user, I find the different handling of missing/present cells/columns confusing. In my opinion, all three of the following assertions should succeed, or there should be a single flag to make it so (something like EMPTY_AND_MISSING_AS_DEFAULT):

    static class PojoWithDefault {
        public Integer id;
        public String value = "default";
    }

    // MAPPER is assumed to be the test class's shared CsvMapper instance.
    public void testDefault() throws Exception {
        CsvSchema headerSchema = CsvSchema.emptySchema().withHeader();

        // Case 1: the "value" column is absent from the header.
        PojoWithDefault missingColumn = MAPPER
                .readerFor(PojoWithDefault.class)
                .with(headerSchema)
                .<PojoWithDefault>readValues("id\n"
                                             + "1").next();
        Assert.assertEquals("default", missingColumn.value); // succeeds

        // Case 2: the column is present, but the row ends before its cell.
        PojoWithDefault missingCellInPresentColumnWithoutComma = MAPPER
                .readerFor(PojoWithDefault.class)
                .with(headerSchema)
                .<PojoWithDefault>readValues("id,value\n"
                                    + "1").next();
        Assert.assertEquals("default", missingCellInPresentColumnWithoutComma.value); // succeeds

        // Case 3: the column is present and the cell exists, but is empty.
        PojoWithDefault missingCellInPresentColumnWithComma = MAPPER
                .readerFor(PojoWithDefault.class)
                .with(headerSchema)
                .<PojoWithDefault>readValues("id,value\n"
                                    + "1,").next();
        Assert.assertEquals("default", missingCellInPresentColumnWithComma.value); // fails
    }
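
To make the requested behaviour concrete, here is a minimal, library-free sketch of the semantics I have in mind. It is not how Jackson is implemented: the class name CsvDefaultsSketch and the helper readRow are hypothetical, and the parsing is a naive split on commas that ignores quoting. It only shows a missing column, a row that ends early, and an empty cell all falling back to the same default.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CsvDefaultsSketch {

    /**
     * Hypothetical helper: maps one data row onto the header names.
     * Missing columns, missing cells and empty cells all keep the default.
     * Assumption: no quoted fields, so a plain split on ',' is enough.
     */
    static Map<String, String> readRow(String headerLine, String dataLine,
                                       Map<String, String> defaults) {
        String[] header = headerLine.split(",", -1);
        String[] cells = dataLine.split(",", -1); // -1 keeps trailing empty cells

        // Start from the defaults; this already covers columns absent from the header.
        Map<String, String> row = new LinkedHashMap<>(defaults);

        for (int i = 0; i < header.length; i++) {
            boolean missingCell = i >= cells.length;                // row ended early
            boolean emptyCell = !missingCell && cells[i].isEmpty(); // e.g. "1,"
            if (missingCell || emptyCell) {
                continue; // leave the default (if any) untouched
            }
            row.put(header[i], cells[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = Map.of("value", "default");
        System.out.println(readRow("id", "1", defaults));        // missing column
        System.out.println(readRow("id,value", "1", defaults));  // missing cell, no comma
        System.out.println(readRow("id,value", "1,", defaults)); // empty cell, with comma
        System.out.println(readRow("id,value", "1,x", defaults)); // explicit value
    }
}
```

In this sketch the first three calls all leave "value" at "default", and only the last one overrides it. That is the uniform treatment I would expect from a single flag.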