Avoiding lossy type conversions during import #132
Replies: 5 comments 1 reply
-
@mathemancer I think the easiest solution here is to not suggest a number type if there are values that start with 0.
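Here is a minimal sketch of that rule. It assumes values arrive as raw strings and uses a plain regex as a stand-in for the importer's real numeric parser. Both function names are hypothetical:

```python
import re

# Assumption: this pattern stands in for whatever numeric parser the importer uses.
# It accepts an optional sign, digits, and an optional fractional part.
_NUMERIC = re.compile(r"[+-]?\d+(\.\d+)?")


def has_significant_leading_zero(value: str) -> bool:
    """True if casting `value` to a number would drop a leading zero."""
    digits = value.lstrip("+-")
    # "0" and "0.5" keep their leading zero when printed back; "012" does not.
    return len(digits) > 1 and digits[0] == "0" and digits[1] != "."


def should_suggest_numeric(values: list[str]) -> bool:
    """Suggest a number type only if every value is numeric and none has a leading zero."""
    return all(
        _NUMERIC.fullmatch(v) and not has_significant_leading_zero(v)
        for v in values
    )
```

For UPC-style values such as `["012839291012", "012875939012"]` this returns `False`, while `["12", "0", "0.5", "-3"]` still gets a numeric suggestion. The rule only covers leading zeros, so every other lossy case would need a rule of its own.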
-
I'm going to give a quick definition to make things (hopefully) clearer and more precise. By a "lossy" type conversion, we mean any type conversion whose map f, taking a value of one type to a value of a different type, is not injective. This means we can lose information, because more than one value of the original type can be mapped to the same value of the target type. An example is both

The problem with solving these piecemeal is that we'll keep adding pieces, and the overall system will become difficult to maintain. I'd rather have a type conversion system that guarantees we can't lose data on import. It should also let the user experiment freely with changing types without worrying that they'll somehow lose data during the conversion.

Another lossy type conversion is string to time interval. We'll map multiple strings to the same interval: the strings '14 days' and '2 weeks' both map to the same time interval. The user may be fine with this. However, they may have written those strings differently for a reason, and we can't recover the original strings from the resulting interval.

We've also discussed lossy conversions from string to

Finally, I think the proper solution to this will be usable both at import and for

Readily apparent solutions:
Of these options, the least bad one (to my eye) is (1). I'm not really a big fan of it, though.
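To make the definition concrete: a conversion f is non-injective on a column exactly when two distinct source values end up with the same converted value. The sketch below groups a column's values by their converted value to find such collisions. `find_collisions` and `parse_interval` are hypothetical helpers. `parse_interval` only understands the two interval formats from the example above:

```python
from collections import defaultdict
from datetime import timedelta
from typing import Callable, Hashable, Iterable


def find_collisions(values: Iterable[str], convert: Callable[[str], Hashable]) -> dict:
    """Group distinct source values by converted value; keep groups with 2+ sources.

    A non-empty result means `convert` is not injective on this column's data,
    i.e. converting these values would lose information.
    """
    groups = defaultdict(set)
    for v in values:
        groups[convert(v)].add(v)
    return {target: sorted(srcs) for target, srcs in groups.items() if len(srcs) > 1}


def parse_interval(text: str) -> timedelta:
    """Hypothetical string -> interval cast; only handles '<n> days' and '<n> weeks'."""
    amount, unit = text.split()
    return timedelta(**{unit: int(amount)})
```

`find_collisions(["14 days", "2 weeks", "3 days"], parse_interval)` gives `{timedelta(days=14): ['14 days', '2 weeks']}`, and `find_collisions(["012", "12", "7"], int)` gives `{12: ['012', '12']}`. This only tests injectivity on the values actually present. A column of UPCs with no colliding values returns an empty result, even though the leading zeros still can't be recovered from the numbers alone.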
-
As for avoiding lossy conversions, I can't think of a way to do it that isn't piecemeal and doesn't require noticing when any given conversion could lose information.
-
Option 5:
This would mean more data storage than (1) over time, but would be even faster than (3) or (4). It's conceptually very simple and gives a quick undo. The main irritation would be reflecting the DB, since we'd need to somehow mark the columns as "do not show".
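The text of Option 5 itself is missing here. However, the mention of marking columns as "do not show" suggests keeping the original column around in hidden form. Assuming that reading, here is a toy in-memory sketch. `Table`, `change_type`, `undo_type_change`, and the `__original_` prefix are all hypothetical and are not Mathesar's actual models:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Table:
    """Toy in-memory table: column name -> list of cell values (hypothetical)."""
    columns: dict[str, list] = field(default_factory=dict)
    hidden: set[str] = field(default_factory=set)  # columns marked "do not show"

    def visible_columns(self) -> list[str]:
        return [name for name in self.columns if name not in self.hidden]

    def change_type(self, name: str, convert: Callable) -> None:
        """Convert a column, keeping the untouched original as a hidden column."""
        backup = f"__original_{name}"  # hypothetical naming scheme for the hidden copy
        if backup not in self.columns:  # keep the very first original across changes
            self.columns[backup] = list(self.columns[name])
            self.hidden.add(backup)
        # Always convert from the original, so repeated changes can't compound losses.
        self.columns[name] = [convert(v) for v in self.columns[backup]]

    def undo_type_change(self, name: str) -> None:
        """Restore the original values and drop the hidden copy."""
        backup = f"__original_{name}"
        self.columns[name] = self.columns.pop(backup)
        self.hidden.discard(backup)
```

With `t = Table(columns={"upc": ["012839291012", "012875939012"]})`, calling `t.change_type("upc", int)` leaves only `upc` visible. Calling `t.undo_type_change("upc")` then restores the original strings with their leading zeros. As written, rows added after the change have no original value, and undo would drop them.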
-
All of these ideas so far fail if new data is added to the table using the new type. Given that there isn't a proper reverse map for a given type conversion (e.g., string to numeric), what should be done if the user wants to change back to string? Should zeroes be prepended?
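One way to see why there is no proper reverse map: string to integer sends '012', '0012', and '12' all to 12. Turning 12 back into a string therefore needs a padding width that the number itself doesn't carry. This hypothetical sketch checks whether any single zero-padding width recovers a whole column:

```python
def round_trip(values: list[str], width: int) -> list[str]:
    """string -> integer -> zero-padded string (a hypothetical reverse map)."""
    return [str(int(v)).zfill(width) for v in values]


def single_width_restores(values: list[str]) -> bool:
    """Is there one padding width that gives back every original string?"""
    max_width = max(len(v) for v in values)
    return any(round_trip(values, w) == values for w in range(max_width + 1))
```

`single_width_restores(["012839291012", "012875939012"])` is `True`, because width 12 works for fixed-width UPCs, but only if that width was recorded somewhere. `single_width_restores(["012", "12"])` is `False`, because no amount of prepending zeros gives back both originals.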
-
Branching off from our discussions on Matrix.
I came across a scenario today that is relevant to our discussions on type inference.
There is a column called upc, which holds barcode information. This is a string column.
It has several values that start with 0, such as 012839291012 and 012875939012.
When we infer types on this column, we would most likely detect it as numeric, which changes the values to 12839291012 and 12875939012 (removing the leading 0).
Now, even if the user changes the type back to string, the leading 0 is lost, producing invalid data.
Are we handling such scenarios currently? If not, this is something we need to take into account.
We don't need to worry about this when it's a user action (undo would handle it). But during import, we need to ensure there is no lossy conversion.
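The following sketch shows one possible import-time guard for this case. It assumes that type inference tries candidate types in order and accepts a type only if every value survives a round trip (string to candidate type and back) unchanged. Python's `int`/`float` and `str`/`repr` stand in for the database's casts. The function names and candidate list are hypothetical:

```python
from typing import Callable


def lossless_under(values: list[str], cast: Callable[[str], object],
                   render: Callable[[object], str]) -> bool:
    """True if every value survives string -> cast -> render unchanged."""
    try:
        return all(render(cast(v)) == v for v in values)
    except ValueError:
        return False  # some value doesn't parse as the candidate type at all


def infer_column_type(values: list[str]) -> str:
    """Pick the first candidate type whose conversion loses nothing."""
    candidates = [
        ("integer", int, str),
        ("float", float, repr),
    ]
    for name, cast, render in candidates:
        if lossless_under(values, cast, render):
            return name
    return "text"  # fall back to keeping the strings exactly as imported
```

With this guard, `infer_column_type(["012839291012", "012875939012"])` returns `"text"`, because `str(int("012839291012"))` is `"12839291012"`. By contrast, `["12", "7", "-3"]` still comes out as `"integer"`. The check is only as good as the render function: it treats any difference in spelling, such as `"1.50"` versus `"1.5"`, as a loss.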