Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqlalchemy todict intbitset conversion fails on infinite sets #16

Open
dset0x opened this issue Sep 30, 2015 · 12 comments
Open

sqlalchemy todict intbitset conversion fails on infinite sets #16

dset0x opened this issue Sep 30, 2015 · 12 comments

Comments

@dset0x
Copy link
Contributor

dset0x commented Sep 30, 2015

Given an infinite intbitset,

intbitset(trailing_bits=True)

and this SQLAlchemy type,

class IntBitSetType(types.TypeDecorator):

    impl = types.BLOB

    def process_bind_param(self, value, dialect):
        if value is None:
            return None
        return intbitset(value).fastdump()

    def process_result_value(self, value, dialect):
        if value is None:
            return None
        return intbitset(value)

calling dict() on an instance that contains it, raises this exception:

...
File "/home/gs-sis/.virtualenvs/invenio/src/invenio-checker/invenio_checker/views/admin.py", line 201, in get_tasks_data
 rule_d = dict(rule)
   File "/home/gs-sis/.virtualenvs/invenio/src/invenio/invenio/ext/sqlalchemy/utils.py", line 116, in todict
     value = value.tolist()
   File "intbitset.pyx", line 770, in intbitset.intbitset.tolist (intbitset/intbitset.c:12711)
   File "intbitset.pyx", line 775, in intbitset.intbitset.tolist (intbitset/intbitset.c:12641)
OverflowError: It's impossible to retrieve a list of an infinite set

intbitset behaves reasonably here, but this conversion was not asked for. Why are these conversions forced in todict?

@jirikuncar jirikuncar added this to the v0.2.2 milestone Sep 30, 2015
@jirikuncar
Copy link
Contributor

intbitset(trailing_bits=True) is an infinite structure that can't be converted to list. IMHO this is r_wontfix issue.

@dset0x
Copy link
Contributor Author

dset0x commented Sep 30, 2015

The question here is: why we are converting in todict? All I requested in my code was a dictionary of my entry.

@jirikuncar
Copy link
Contributor

@dset0x you have answered yourself - because you are requesting a dict.

@jirikuncar
Copy link
Contributor

@dset0x
Copy link
Contributor Author

dset0x commented Sep 30, 2015

@dset0x you have answered yourself - because you are requesting a dict.

A dictionary can hold this object just like any other:

In [75942]: {'infinity': intbitset(trailing_bits=True)}
Out[75942]: {'infinity': intbitset([0], trailing_bits=True)}

see https://github.com/inveniosoftware/invenio-ext/blob/master/invenio_ext/sqlalchemy/utils.py#L133-L135

Iterables should too.

@jirikuncar
Copy link
Contributor

intbitset has it's own handling because we were not able to create a JSON out of it. Maybe it's time to use better JSON encoder/decorer that can handle it. See also https://github.com/inveniosoftware/invenio-ext/blob/master/invenio_ext/sqlalchemy/utils.py#L115-L116

@jirikuncar
Copy link
Contributor

There is still a question: Why do you need to store an intbitset with trailings_bit=True in database?

This structure does not have any Python equivalent except of an infinite generator.

@dset0x
Copy link
Contributor Author

dset0x commented Sep 30, 2015

There is still a question: Why do you need to store an intbitset with trailings_bit=True in database?

I wish to express infinity, therefore I store a dump of an expression of that (in the same format that I store non-infinity), instead of another magic value. This keeps my TypeDecorator above clean. I only afterwards interpret what infinity means for my logic. I do not necessarily iterate over it.

intbitset has it's own handling because we were not able to create a JSON out of it.

I don't understand what JSON has to do with a method that is called when __builtin__.dict is called. I do not wish to serialize my data. Perhaps this was added assuming that the only reason to call __builtin__.dict was to serialize?

@jirikuncar
Copy link
Contributor

(a) use the model directly
(b) serialize only fields you really need
(c) fix the serialization and check all dependent code

@dset0x
Copy link
Contributor Author

dset0x commented Sep 30, 2015

(a) use the model directly

I wanted a dictionary of the entire object, like SQLAlchemy normally hands over.

(b) serialize only fields you really need

Again, I am not serializing anything (But yes, I want this field in my dictionary).

(c) fix the serialization and check all dependent code

Since there is no reason to do these conversions, then this is the solution that will prevent future pain. I suspect however, that removing this code will break edge cases in all kinds of hard-to-find code, especially since __iter__ was overriden as well.

The use-cases are not obvious to me even in inveniosoftware/invenio@1ee2c0c, where this code seems to originate.

Perhaps someone can come up with a better plan to revert this.

kaplun referenced this issue in inveniosoftware/invenio Sep 30, 2015
* WebMessage: First module ported to Flask/Jinja2/WTForms/SQLAlchemy.
  Introduced tests for database independency using Flask-Test and
  SQLAlchemy. It includes example of dynamic menu loading latest
  messages and showing unread message counter.

* WebSearch: Initial implementation of new search UI using facets and
  search box autocompletion for several fields. The implementation is
  still using old Invenio search API and record formating. New AJAX tabs
  were introduced for record pages. They are using new javascript
  history API and there is a correct fallback link when the javascript
  is turned off.

* WebComment: Creates demo comments tree and review overview for record
  tabs.

* WebAccess: Reimplements some SQL queries in SQLAlchemy.

* WebStyle: Creates unified general page header and footer. There is
  also new page look and feel using Twitter-Bootstrap user interface.

* WebSession: Implements simple login method using Flask.
@kaplun
Copy link
Contributor

kaplun commented Oct 1, 2015

@jirikuncar indeed re-reading all this issue and the referred original code, seems like that the .todict() method was indeed trying to do too much. Looks like was more a .tojsondict() i.e. returning something that is safely serializable to JSON. However as @dset0x points out, the .todict() method is called indirectly also in contexts where JSON serialization is not needed, and where a real instance of an intbitset is perfectly OK.

So in your opinion can .todict() be amended to not be too smart and simply return a dictionary with values not converted/serialized?

@jirikuncar
Copy link
Contributor

We can amend it as anyway the intbitset will not be necessary for core packages anymore.

@jirikuncar jirikuncar modified the milestones: v0.2.2, v0.3.0, v0.4.0 Oct 1, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants