The current mechanism for changing anything in a datashape seems to be constructing a new `dshape` string with the needed changes and creating a new datashape from it. Most of the attributes of important datashape objects like `DataShape` and `Record` are defined as read-only properties. One could try to modify the `_parameters` attribute directly, but that's a pretty ugly hack, and `_parameters` is often a tuple of tuples, which is immutable anyway.
With fairly large datashapes where one needs to alter a single name or type in a record, it would be much easier to have a method (or some other way) to change the name/type in place rather than having to construct an entirely new `dshape` string to build a new datashape.
This feels like it should be doable, but I'm not sure if allowing Record names/types to be altered has some potential downsides from an architectural perspective that outweigh the benefits.
I think that most things assume that dshapes are immutable. Maybe we could implement something like `namedtuple._replace`, which returns a copy with the changed values.
Also note that I don't use string formatting to change fields. For example, with a record, you can say:
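To illustrate the copy-with-changes idea, here is a minimal sketch rather than anything taken from datashape itself. It assumes a record's fields are available as a sequence of `(name, type)` pairs, and it uses plain strings in place of real datashape types so that it runs with only the standard library. The helper name `replace_field` is hypothetical.

```python
from collections import namedtuple

# namedtuple._replace returns a new tuple and leaves the original untouched.
Point = namedtuple("Point", ["x", "y"])
p = Point(1, 2)
q = p._replace(x=10)


def replace_field(fields, name, new_name=None, new_type=None):
    """Return a new tuple of (name, type) pairs with one field changed.

    Hypothetical helper, not part of datashape. `fields` is never
    mutated and field order is preserved.
    """
    if name not in [n for n, _ in fields]:
        raise KeyError(name)
    return tuple(
        (new_name or n, new_type or t) if n == name else (n, t)
        for n, t in fields
    )


# Plain strings stand in for datashape types in this sketch.
fields = (("id", "int64"), ("select", "string"), ("amount", "float64"))
renamed = replace_field(fields, "select", new_name="select_")
retyped = replace_field(fields, "amount", new_type="?float64")
```

With datashape installed, the resulting pairs could presumably be handed back to the `Record` constructor to build the new record, assuming it accepts a list of `(name, type)` pairs. The point of the design is that nothing is edited in place, so code that assumes dshapes are immutable keeps working.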
Actually, the main place I've needed this so far is in changing the name of a field, rather than its type. I'm doing this to avoid reserved words as column names in Teradata. It's a very manual process right now and actually leading to some bugs as I don't think we always construct a new dshape appropriately (obviously this is on my crappy code and not datashape, but just pointing out how having to handle this manually can lead to bugs).
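A rough sketch of how the renaming step could be automated instead of done by hand, again assuming fields are available as `(name, type)` pairs with strings standing in for datashape types. The `RESERVED` set is a tiny illustrative sample, not Teradata's actual reserved-word list, and `safe_name` and `rename_reserved` are hypothetical helpers.

```python
# Illustrative sample only; the real reserved-word list is much longer
# and should be taken from the database vendor's documentation.
RESERVED = frozenset({"select", "date", "user"})


def safe_name(name, taken, reserved=RESERVED, suffix="_"):
    """Append `suffix` until `name` is neither reserved nor already used."""
    candidate = name
    while candidate.lower() in reserved or candidate in taken:
        candidate += suffix
    return candidate


def rename_reserved(fields, reserved=RESERVED):
    """Return new (name, type) pairs plus an old-name -> new-name mapping."""
    original = {n for n, _ in fields}
    out, mapping = [], {}
    for name, typ in fields:
        # Avoid colliding with other original names or names already emitted.
        taken = (original - {name}) | {n for n, _ in out}
        new = safe_name(name, taken, reserved)
        if new != name:
            mapping[name] = new
        out.append((new, typ))
    return tuple(out), mapping


fields = (("id", "int64"), ("select", "string"), ("Date", "date"))
new_fields, mapping = rename_reserved(fields)
```

Returning the mapping alongside the new fields makes it possible to rename the corresponding columns in the data consistently, which is where the manual approach tends to go wrong.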
I think the `typehints` argument would work well for fields where you want to change the type and you know it ahead of time, but it wouldn't work for my name-changing use case. It also wouldn't work well if you want to run a `discover`, eyeball the datashape, and then change something: that is possible, but it means recreating the datashape, which seems inefficient. Being able to modify a datashape in place, or at least without rescanning the source data, would be ideal.