Should datashape names/types be mutable/settable? #180

dan-coates · 2015-09-15T21:26:38Z

The current mechanism for changing anything in a datashape seems to be building a new dshape string with the needed changes and creating a new datashape from it. Most of the attributes of important datashape objects like DataShape and Record are defined as read-only properties. One could try to modify the _parameters attribute directly, but that's a pretty ugly hack, and _parameters is often a tuple of tuples, which is immutable anyway.

With fairly large datashapes where one needs to alter one name or type in a record, it would be much easier to have a method or other way to change the name/type in place rather than having to construct an entirely new dshape string to build a new datashape.

This feels like it should be doable, but I'm not sure if allowing Record names/types to be altered has some potential downsides from an architectural perspective that outweigh the benefits.


llllllllll · 2015-09-15T21:35:21Z

I think that most things assume that dshapes are immutable. Maybe we could implement something like namedtuple._replace, which returns a copy with the changed values.

Also, note that I don't use string formatting to change fields. For example, with a record, you can say:

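from collections import OrderedDict
from datashape import Record, int32

od = OrderedDict(some_record.fields)  # copy the (name, type) pairs
od[some_field] = int32  # swap in the new type for one field
Record(od)  # build a new Record; some_record is left untouched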
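A `_replace`-style helper along these lines could look roughly like the sketch below. It works on a plain tuple of (name, type) pairs rather than on a real Record, so the function name `replace_field_type` and the string type names are illustrative assumptions, not part of datashape.

```python
def replace_field_type(fields, name, new_type):
    """Return a new tuple of (name, type) pairs with one field's type swapped.

    `fields` is assumed to be an iterable of (name, type) pairs; the input
    is never modified.
    """
    fields = tuple(fields)
    if name not in dict(fields):
        raise KeyError(name)
    return tuple((n, new_type if n == name else t) for n, t in fields)


# Stand-in data: type names are plain strings here, not datashape types.
example_fields = (("id", "int64"), ("amount", "float64"))
updated_fields = replace_field_type(example_fields, "amount", "int32")
```

With datashape itself, the assumption would be that the result can be passed back to Record, for example `Record(replace_field_type(some_record.fields, some_field, int32))`.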

cpcloud · 2015-10-05T14:40:07Z

Immutability is very important for datashape because we depend on it being hashable in blaze.

I agree that a top-level swap or replace(dshape, old_sub_dshape, new_sub_dshape) function would be useful.

OTOH @octophat would an argument like typehints={'field_name': 'int64'} cover the use case you're thinking of?
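To make the replace idea concrete, here is a minimal sketch that uses nested tuples as a stand-in for a datashape tree. This is an assumption for illustration only; a real version would have to walk datashape's own objects. It returns a new structure, so the original stays usable as a dictionary key.

```python
def replace(shape, old, new):
    """Return a copy of `shape` with every sub-part equal to `old` swapped for `new`.

    `shape` is a nested tuple standing in for a datashape; this sketch does
    not distinguish field names from types, which a real version should.
    """
    if shape == old:
        return new
    if isinstance(shape, tuple):
        return tuple(replace(part, old, new) for part in shape)
    return shape


nested_shape = (("id", "int64"), ("payload", (("x", "int64"), ("y", "float64"))))
swapped_shape = replace(nested_shape, "int64", "int32")

# Both values are immutable tuples, so both remain hashable.
shape_cache = {nested_shape: "original", swapped_shape: "swapped"}
```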

dan-coates · 2015-10-06T14:30:05Z

Actually, the main place I've needed this so far is in changing the name of a field, rather than its type. I'm doing this to avoid reserved words as column names in Teradata. It's a very manual process right now and actually leading to some bugs as I don't think we always construct a new dshape appropriately (obviously this is on my crappy code and not datashape, but just pointing out how having to handle this manually can lead to bugs).

I think the typehints argument would work well for fields where you want to change the type and you know it ahead of time, but it wouldn't work for the name changing use case I have and also wouldn't work great if you want to do a discover, eyeball the datashape, then change something (it could work for that but would involve recreating the datashape which seems inefficient). Being able to modify a datashape in place or without rescanning the source data would be ideal.
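For the renaming case, something like the following sketch is roughly what I have in mind. It operates on a plain tuple of (name, type) pairs, so nothing is rescanned; `rename_fields` is a hypothetical helper, and `date` and `user` are used as the offending names purely for illustration.

```python
def rename_fields(fields, renames):
    """Return a new tuple of (name, type) pairs with names mapped through `renames`.

    Field order and types are preserved. Names missing from `renames` are
    kept as-is. Raises ValueError if two fields would end up sharing a name.
    """
    fields = tuple(fields)
    new_names = [renames.get(name, name) for name, _ in fields]
    if len(set(new_names)) != len(new_names):
        raise ValueError("renaming would produce duplicate field names")
    return tuple((new, typ) for new, (_, typ) in zip(new_names, fields))


# Stand-in data: type names are plain strings here, not datashape types.
source_fields = (("date", "date"), ("user", "string"), ("amount", "float64"))
renamed_fields = rename_fields(source_fields, {"date": "date_", "user": "user_"})
```

The duplicate check is there because a silent name collision is exactly the kind of bug that doing this by hand tends to produce.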

cpcloud assigned cpcloud and unassigned cpcloud Oct 5, 2015

cpcloud added this to the 0.4.8 milestone Oct 5, 2015

cpcloud added the enhancement label Oct 5, 2015
