Description
ParquetViewer appears to parse list/array columns incorrectly. If a column is a list/array type and some rows hold an empty list (i.e., 0 elements) or null in that column, ParquetViewer shows the data mixed up across rows. Examples:
In all examples below, assume the following schema:
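using System.Collections.Generic;
using Parquet.Serialization; // namespace of ParquetSerializer in Parquet.NET

internal class TestRow
{
    public string Column1 { get; set; }
    public List<double> Column2 { get; set; }

    public TestRow(string column1, List<double> column2)
    {
        Column1 = column1;
        Column2 = column2;
    }
}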
Example 1: This has no nulls or empty values and works as expected:
List<TestRow> data1 = new List<TestRow>
{
    new TestRow("Row 1", new List<double> { 1, 2, 3, 4, 5 }),
    new TestRow("Row 2", new List<double> { 6, 7, 8, 9, 10 }),
    new TestRow("Row 3", new List<double> { 11, 12, 13, 14, 15 })
};
ParquetSerializer.SerializeAsync(data1, @"sample1.parquet").Wait();
Example 2: This has an empty list in row 1 and results in scrambled data in rows 1-3:
List<TestRow> data2 = new List<TestRow>
{
    new TestRow("Row 1", new List<double>()),
    new TestRow("Row 2", new List<double> { 6, 7, 8, 9, 10 }),
    new TestRow("Row 3", new List<double> { 11, 12, 13, 14, 15 })
};
ParquetSerializer.SerializeAsync(data2, @"sample2.parquet").Wait();
Example 3: This has an empty list in row 2 and results in scrambled data in rows 2-3:
List<TestRow> data3 = new List<TestRow>
{
    new TestRow("Row 1", new List<double> { 1, 2, 3, 4, 5 }),
    new TestRow("Row 2", new List<double>()),
    new TestRow("Row 3", new List<double> { 11, 12, 13, 14, 15 })
};
ParquetSerializer.SerializeAsync(data3, @"sample3.parquet").Wait();
Parquet Viewer Version
2.10.1.1
Where was the parquet file created?
Parquet.NET
Sample files
sample_parquets.zip