Fix DataFrame incorrectly parsing CSV when renameDuplicatedColumns is true #7242

Open
wants to merge 1 commit into base: main
3 changes: 2 additions & 1 deletion src/Microsoft.Data.Analysis/DataFrame.IO.cs
@@ -388,7 +388,8 @@ private static DataFrame ReadCsvLinesIntoDataFrame(WrappedStreamReaderOrStringRe
     // First pass: schema and number of rows.
     while ((fields = parser.ReadFields()) != null)
     {
-        if (renameDuplicatedColumns)
+        //Only first row contains column names
+        if (renameDuplicatedColumns && rowline == 0)
         {
             var names = new Dictionary<string, int>();
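
For context, here is a minimal standalone sketch of why the `rowline == 0` guard matters. The body of the renaming loop is not visible in this hunk, so the `.N` suffix scheme and the names `DuplicateRenameSketch`, `RenameDuplicates`, and `Process` are assumptions made for illustration, not the library's code. The sketch shows that renaming on every row rewrites repeated data values (such as two empty fields), while renaming only on the first row leaves the data untouched.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateRenameSketch
{
    // Hypothetical helper: the 2nd, 3rd, ... occurrence of a value
    // gets ".1", ".2", ... appended (assumed scheme, not taken from the PR).
    public static void RenameDuplicates(string[] fields)
    {
        var seen = new Dictionary<string, int>();
        for (int i = 0; i < fields.Length; i++)
        {
            if (seen.TryGetValue(fields[i], out int count))
            {
                // Update the counter under the original key before renaming the field.
                seen[fields[i]] = count + 1;
                fields[i] = fields[i] + "." + count;
            }
            else
            {
                seen.Add(fields[i], 1);
            }
        }
    }

    // headerOnly == false mimics the behaviour before the change (every row is renamed);
    // headerOnly == true mimics the guarded behaviour (only row 0 is renamed).
    public static string[][] Process(string[][] rows, bool headerOnly)
    {
        var result = new string[rows.Length][];
        for (int rowIndex = 0; rowIndex < rows.Length; rowIndex++)
        {
            // Copy each row so the caller's input is not modified.
            var fields = (string[])rows[rowIndex].Clone();
            if (!headerOnly || rowIndex == 0)
            {
                RenameDuplicates(fields);
            }
            result[rowIndex] = fields;
        }
        return result;
    }
}
```

With the rows from the test in this PR, `Process(rows, headerOnly: false)` turns the second empty field of the last data row into `.1`, whereas `Process(rows, headerOnly: true)` leaves it as an empty string. Under the stated assumption, a value like `.1` in a column that otherwise holds dates would plausibly interfere with column type guessing.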

19 changes: 19 additions & 0 deletions test/Microsoft.Data.Analysis.Tests/DataFrame.IOTests.cs
@@ -119,6 +119,25 @@ private static Stream GetStream(string streamData)
         return new MemoryStream(Encoding.Default.GetBytes(streamData));
     }

+    [Fact]
+    public void TestReadCsvWithHeaderCultureInfoAndColumnTypeAutoGuess()
+    {
+        //see https://github.com/dotnet/machinelearning/issues/7240
+
+        CultureInfo.CurrentCulture = CultureInfo.InvariantCulture; // or en-US
+
+        string csv =
+            @"""Col1"",""Col2"",""Col3"",""Col4""
+""v1.1"",""5/7/2017"",""v3.1"",""v4.1""
+"""","""",""v3.2"",""v4.2""
+";
+
+        var dataFrame = DataFrame.LoadCsvFromString(csv, separator: ',', header: true,
+            dataTypes: null, // guess the column types
+            renameDuplicatedColumns: true, // try to rename the duplicated columns, if any
+            cultureInfo: CultureInfo.InvariantCulture);
+    }
+
     [Theory]
     [InlineData(false)]
     [InlineData(true)]