Allow `qsv diff` to show only fields that differ #2000

mfripp · 2024-07-26T23:30:23Z

Is your feature request related to a problem? Please describe.
In CSV files with many columns, it can be difficult and unreliable to find the particular fields that differ between dropped and added rows. Doing so currently requires carefully scanning across the output in a grid-oriented CSV viewer.

Describe the solution you'd like
One possible solution would be to add a --drop-identical-fields flag (or something similar), which will cause identical fields between a "-" and "+" row to be replaced with either empty values or a flag like "(same)". Then, before outputting the results, any columns that don't have any changes (i.e., the column is entirely full of empty fields or "(same)" markers) will be dropped. So the output file will only contain the key columns and any data columns that actually have differences, and even in those, it will only show values when there are differences. This will make it easy to see exactly what data is different between the two files.
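To make the two steps concrete, here is a minimal sketch in plain Rust (standard library only). It is not qsv's or csv-diff's code: the helper names `mask_equal_fields`, `columns_to_keep`, and `project` are hypothetical, rows are assumed to be already parsed into `Vec<String>`, and the marker is passed in so it can be either an empty string or something like "(same)".

```rust
/// Hypothetical helper: for one "-"/"+" pair, replace every non-key field
/// that is identical on both sides with `marker`.
fn mask_equal_fields(
    left: &[String],
    right: &[String],
    key_cols: &[usize],
    marker: &str,
) -> (Vec<String>, Vec<String>) {
    let mut l = left.to_vec();
    let mut r = right.to_vec();
    for i in 0..l.len().min(r.len()) {
        if !key_cols.contains(&i) && l[i] == r[i] {
            l[i] = marker.to_string();
            r[i] = marker.to_string();
        }
    }
    (l, r)
}

/// Hypothetical helper: indices of the columns worth keeping, i.e. the key
/// columns plus every column holding at least one value other than `marker`.
fn columns_to_keep(rows: &[Vec<String>], key_cols: &[usize], marker: &str) -> Vec<usize> {
    let width = rows.iter().map(|r| r.len()).max().unwrap_or(0);
    (0..width)
        .filter(|&c| {
            key_cols.contains(&c)
                || rows
                    .iter()
                    .any(|r| r.get(c).map_or(false, |v| v.as_str() != marker))
        })
        .collect()
}

/// Hypothetical helper: keep only the selected columns of one row.
fn project(row: &[String], keep: &[usize]) -> Vec<String> {
    keep.iter()
        .map(|&c| row.get(c).cloned().unwrap_or_default())
        .collect()
}
```

Note that the second step needs to see every output row before the first one can be written, so it cannot be done in a purely streaming fashion.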

Describe alternatives you've considered
One alternative is to open the result in a spreadsheet and add flags to indicate where differences occur, but this is cumbersome. Currently I just scan visually across pairs of rows, but this is also cumbersome and error prone.

Another option might be to output a sort of "patch" format, with one row per different field. This could be a table where the first n fields are the index values, the next field is called "column" and gets the name of the field that differed, the next field is called "left_value" and has the value of this field from the left file, and the final field is called "right_value" and has the value from the right file. That might be clearer (no risk of conflict with existing empty fields or fields that already say "(same)"), but I'm not sure it's better.
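As a rough illustration of that "patch" layout, the sketch below turns one modified row pair into one record per differing field. It is only an assumption of how this could look: `PatchRow` and `patch_rows` are hypothetical names, the rows are assumed to be as long as the header, and the key values are taken from the left row.

```rust
/// One record of the hypothetical "patch" output:
/// key values, then the column name, then the left and right values.
#[derive(Debug, PartialEq)]
struct PatchRow {
    key: Vec<String>,
    column: String,
    left_value: String,
    right_value: String,
}

/// Hypothetical helper: emit one `PatchRow` per non-key field that differs.
/// Assumes `left` and `right` have at least as many fields as `headers`.
fn patch_rows(
    headers: &[&str],
    left: &[&str],
    right: &[&str],
    key_cols: &[usize],
) -> Vec<PatchRow> {
    let key: Vec<String> = key_cols.iter().map(|&k| left[k].to_string()).collect();
    headers
        .iter()
        .enumerate()
        .filter(|(i, _)| !key_cols.contains(i) && left[*i] != right[*i])
        .map(|(i, name)| PatchRow {
            key: key.clone(),
            column: (*name).to_string(),
            left_value: left[i].to_string(),
            right_value: right[i].to_string(),
        })
        .collect()
}
```

For a pair such as `1,foo,bar` and `1,foo,baz` keyed on the first column, this produces a single record for `col3`, which avoids any clash with fields that are legitimately empty.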

Another option that might be better would be to use color to highlight the columns that are actually different, at least when output is sent to a TTY. This would be similar to the display in the GNU version of the diff command, VS Code's diff view, Apple's FileMerge viewer or vim -d file1 file2.
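A minimal sketch of the color idea, assuming ANSI escape codes and the standard library's `IsTerminal` trait (Rust 1.70+). The function names `render_pair` and `stdout_wants_color` are hypothetical, and CSV quoting is ignored for brevity.

```rust
use std::io::{self, IsTerminal};

const RED: &str = "\x1b[31m";
const GREEN: &str = "\x1b[32m";
const RESET: &str = "\x1b[0m";

/// Hypothetical helper: render a "-"/"+" pair as two lines, wrapping only
/// the fields that differ in color codes when `color` is true.
fn render_pair(left: &[&str], right: &[&str], color: bool) -> (String, String) {
    let paint = |value: &str, code: &str, changed: bool| -> String {
        if color && changed {
            format!("{}{}{}", code, value, RESET)
        } else {
            value.to_string()
        }
    };
    let mut l = Vec::new();
    let mut r = Vec::new();
    for (&a, &b) in left.iter().zip(right.iter()) {
        let changed = a != b;
        l.push(paint(a, RED, changed));
        r.push(paint(b, GREEN, changed));
    }
    (format!("-,{}", l.join(",")), format!("+,{}", r.join(",")))
}

/// Only colorize when stdout is an interactive terminal, so that output
/// redirected to a file or pipe stays plain CSV.
fn stdout_wants_color() -> bool {
    io::stdout().is_terminal()
}
```

A caller would pass `stdout_wants_color()` as the `color` argument, so piping the result into another tool yields unmodified values.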

Additional context
(none)

jqnatividad · 2024-07-27T02:53:55Z

Thanks for the well-thought-out feature request, @mfripp!

Copying in @janriemer - csv-diff's maintainer...

janriemer · 2024-07-27T10:14:20Z

Thank you, @mfripp, for the detailed description and thoughts on this feature (and @jqnatividad for making me aware of it)!
I really like the possible solution you've described, and I feel this should have the highest priority among the next features for the diff command.

@jqnatividad Can you please assign this issue to me? Thank you.

The possibility of getting the fields that are different is actually already in the implementation of diff - it is just not used yet (waiting on a feature request like yours 😉):

qsv/src/cmd/diff.rs

Lines 245 to 251 in 08cfda6

    DiffByteRecord::Modify {
        delete,
        add,
        // TODO: this should be used in the future to highlight the column where differences
        // occur
        field_indices: _field_indices,
    } => {

So it shouldn't be too difficult to implement your idea (famous last words?). 🙂
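As a rough sketch of the idea (not the actual implementation, and not using csv-diff's real types): given the positions of the fields that differ, similar in spirit to `field_indices` above, everything else except the key fields could be blanked out. The function `blank_unchanged` and its parameters are hypothetical, and a record is modeled here simply as a `Vec<Vec<u8>>`.

```rust
/// Hypothetical helper: keep a field only if it differs between the two
/// rows (its index is in `field_indices`) or if it is a key field;
/// replace every other field with an empty byte vector.
fn blank_unchanged(
    record: &[Vec<u8>],
    field_indices: &[usize],
    key_indices: &[usize],
) -> Vec<Vec<u8>> {
    record
        .iter()
        .enumerate()
        .map(|(i, field)| {
            if field_indices.contains(&i) || key_indices.contains(&i) {
                field.clone()
            } else {
                Vec::new()
            }
        })
        .collect()
}
```

Applied to both the deleted and the added record of a modified pair, this leaves only the key and the changed fields populated.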

Unfortunately, I've been a bit busy lately, so I don't have the time right now. 😢
However, I should have more time in mid/late August, so I can start implementing a prototype then. 🤞

With regard to your alternative solutions

Regarding patch: there is already an open issue in csv-diff itself (the lib that powers diff) describing the need for a git-diff-style format. Is this roughly what you have in mind?
Also, we might want to take inspiration from the CLI tool csvdiff and its different output formats.

This implements a new flag for the command `diff`. When activated, it drops the values of fields that are equal within a row of type `Modified` and replaces them with the empty string (an empty byte slice, to be precise). For now, the value for replacing equal values is not configurable, but should be trivial to add in the future. Note that key field values are _not_ dropped and always appear in the output.

Example:

csv_left.csv

    col1,col2,col3
    1,foo,bar

csv_right.csv

    col1,col2,col3
    1,foo,baz

    qsv diff --drop-equal-fields csv_left.csv csv_right.csv

Output:

    diffresult,col1,col2,col3
    -,1,,bar
    +,1,,baz

See jqnatividad#2000

janriemer · 2024-09-08T14:49:30Z

Hey @jqnatividad @mfripp 👋

Here is the current status of the feature requests in this issue:

🎉 Add a flag for dropping equal values (diff: add flag --drop-equal-fields #2114)
⏳ Do not output columns that have no differing field values
- this will require a change in csv-diff itself, because it is too costly (performance-wise) to implement directly in the diff command

For the other feature requests, it is probably best to create separate issues, so that we don't lose track of them.

mfripp · 2024-09-08T14:55:42Z

Thanks, this is great to see!

jqnatividad · 2024-09-08T15:50:51Z

Just merged #2114 ... just in time for qsv 0.134.0! Thanks @janriemer !

jqnatividad added the enhancement label Jul 27, 2024

jqnatividad assigned janriemer Jul 27, 2024

janriemer mentioned this issue Sep 8, 2024

diff: add flag --drop-equal-fields #2114

Merged
