Allow `qsv diff` to show only fields that differ #2000
Comments
Thanks for the well thought-out feature request, @mfripp! Copying in @janriemer, csv-diff's maintainer...
Thank you, @mfripp, for the detailed description and thoughts on this feature (and @jqnatividad for making me aware of it)!
@jqnatividad Can you please assign this issue to me? Thank you.
The possibility of getting the fields that are different is actually already in the implementation of Lines 245 to 251 in 08cfda6
So it shouldn't be too difficult to implement your idea (famous last words?). 🙂 Unfortunately, I've been a bit busy lately, so I haven't had the time yet. 😢 With regard to your alternative solutions
This implements a new flag for the command `diff`. When activated, it drops the values of fields that are equal within a row of type `Modified` and replaces them with the empty string (an empty byte slice, to be precise). For now, the value for replacing equal values is not configurable, but that should be trivial to add in the future. Note that key field values are _not_ dropped and always appear in the output.

Example:

csv_left.csv

    col1,col2,col3
    1,foo,bar

csv_right.csv

    col1,col2,col3
    1,foo,baz

Command:

    qsv diff --drop-equal-fields csv_left.csv csv_right.csv

Output:

    diffresult,col1,col2,col3
    -,1,,bar
    +,1,,baz

See jqnatividad#2000
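
To make the behavior described in the commit message concrete, here is a minimal Python sketch of the same idea. It is only an illustration and not the actual qsv or csv-diff code (which is written in Rust); the function name `drop_equal_fields` and its parameters are hypothetical.

```python
def drop_equal_fields(left, right, key_indices, placeholder=""):
    """Blank out non-key fields that are equal in a Modified row pair.

    Hypothetical helper for illustration only; not part of qsv.
    `left` and `right` are the two versions of the same row,
    `key_indices` is the set of key column positions to always keep.
    """
    if len(left) != len(right):
        raise ValueError("rows must have the same number of fields")
    out_left, out_right = [], []
    for i, (lval, rval) in enumerate(zip(left, right)):
        if i not in key_indices and lval == rval:
            # Equal, non-key field: replace the value on both sides.
            out_left.append(placeholder)
            out_right.append(placeholder)
        else:
            # Key field or differing field: keep the original values.
            out_left.append(lval)
            out_right.append(rval)
    return out_left, out_right


# The example from the commit message, with col1 as the key column.
left, right = drop_equal_fields(
    ["1", "foo", "bar"], ["1", "foo", "baz"], key_indices={0}
)
print(",".join(["-"] + left))   # -,1,,bar
print(",".join(["+"] + right))  # +,1,,baz
```

The `placeholder` argument reflects the remark that the replacement value could be made configurable later; in this sketch it simply defaults to the empty string.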
Hey @jqnatividad @mfripp 👋 here is the current status of the feature requests in this issue
For the other feature requests it is probably best to create separate issues for them, so that we don't lose the overview.
Thanks, this is great to see!
Just merged #2114 ... just in time for qsv 0.134.0! Thanks @janriemer!
Is your feature request related to a problem? Please describe.
In csv files with many columns, it can be difficult and unreliable to find the particular fields that differ between dropped and added rows. This requires carefully scanning across the output, using a grid-oriented csv viewer.
Describe the solution you'd like
One possible solution would be to add a `--drop-identical-fields` flag (or something similar), which will cause identical fields between a "-" and "+" row to be replaced with either empty values or a flag like "(same)". Then, before outputting the results, any columns that don't have any changes (i.e., the column is entirely full of empty fields or "(same)" markers) will be dropped. So the output file will only contain the key columns and any data columns that actually have differences, and even in those, it will only show values when there are differences. This will make it easy to see exactly what data is different between the two files.
Describe alternatives you've considered
One alternative is to open the result in a spreadsheet and add flags to indicate where differences occur, but this is cumbersome. Currently I just scan visually across pairs of rows, but this is also cumbersome and error prone.
Another option might be to output a sort of "patch" format, with one row per different field. This could be a table where the first n fields are the index values, the next field is called "column" and gets the name of the field that differed, the next field is called "left_value" and has the value of this field from the left file, and the final field is called "right_value" and has the value from the right file. That might be clearer (no risk of conflict with existing empty fields or fields that already say "(same)"), but I'm not sure it's better.
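
As a rough sketch of what this "patch" layout could look like, the following Python turns one modified row pair into one output row per differing field. The helper name `to_patch_rows` is hypothetical, and only the `column`, `left_value`, and `right_value` column names come from the description above; nothing here reflects an existing qsv option.

```python
import csv
import io


def to_patch_rows(header, left, right, key_indices):
    """Return one [key..., column, left_value, right_value] row per differing field.

    Hypothetical helper for illustration only; `key_indices` is an ordered
    list of the key column positions.
    """
    key_values = [left[i] for i in key_indices]
    rows = []
    for i, name in enumerate(header):
        if i in key_indices:
            continue  # key columns identify the row, they are not compared
        if left[i] != right[i]:
            rows.append(key_values + [name, left[i], right[i]])
    return rows


header = ["col1", "col2", "col3"]
key_indices = [0]
rows = to_patch_rows(header, ["1", "foo", "bar"], ["1", "foo", "baz"], key_indices)

# Write the patch table as CSV: key columns first, then the three fixed columns.
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow([header[i] for i in key_indices] + ["column", "left_value", "right_value"])
writer.writerows(rows)
print(buf.getvalue(), end="")
```

Because each differing field gets its own row, this layout avoids any clash with fields that are legitimately empty or that already contain "(same)".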
Another option that might be better would be to use color to highlight the columns that are actually different, at least when output is sent to a TTY. This would be similar to the display in the GNU version of the `diff` command, VS Code's diff view, Apple's FileMerge viewer or `vim -d file1 file2`.
Additional context
(none)