Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a frozen dictionary type. #22166

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

matts1
Copy link
Contributor

@matts1 matts1 commented Apr 29, 2024

This type is hashable, and thus can be used in a depset.

I'm trying to solve the following use case, which is very common when creating tree-like structures.

child = FooInfo(
    something = {"foo": "bar"},
    children = depset([])
)
parent = FooInfo(
    something = {},
    children = depset([child])
)

This type is hashable, and thus can be used in a depset.
@matts1
Copy link
Contributor Author

matts1 commented Apr 29, 2024

Note: I haven't added any test cases yet, both because I first want approval that it's the right thing to do before going to the effort of making test cases, and because I don't know where the right place is to put my test cases.

@matts1 matts1 marked this pull request as ready for review April 29, 2024 09:58
@github-actions github-actions bot added the awaiting-review PR is awaiting review from an assigned reviewer label Apr 29, 2024
@iancha1992 iancha1992 added the team-Starlark-Interpreter Issues involving the Starlark interpreter used by Bazel label Apr 29, 2024
@tetromino
Copy link
Contributor

For your use case, can you instead use struct?

child = FooInfo(
    something = struct(foo = "bar"),
    children = depset([])
)
...

@matts1
Copy link
Contributor Author

matts1 commented May 1, 2024

Struct does indeed work for this specific use case. However:

  • Struct does not work when the keys are not strings
  • it makes it really awkward to use (especially until --incompatible_struct_has_no_methods is removed):
    • Iteration through all key-value pairs requires structs.to_dict
    • I'm unsure if struct iteration order is preserved - I would suspect not. Dict has a well-defined iteration order (insertion order)
    • Iteration through all keys is very awkward ([k for k in dir(struct) if k != "to_json" and k != "to_proto"])
    • It's very confusing for a user of such a type if they need to use getattr(d, key) instead of just d[key]. You have to explain to them the mental hoops they have to jump through, because it's very unintuitive for a user. This is generally ok if the type is hidden away in an abstraction that only you need to know about, but makes it very hard when you need to make a public API.

@tetromino
Copy link
Contributor

Interesting point. Then I guess the design dimensions here are:

  • do we need the frozenset() Starlark builtin method? (You provide a very good argument for yes)
  • is "frozenset" a good name for it? (I would say yes, given prior art: https://pypi.org/project/frozendict)
  • should this be a Bazel-only thing, or do we propose it as a language spec change? (I think if we agree to do this, it should be in the language spec)
  • should the value returned by frozenset() be deeply or shallowly frozen? (The name suggests deeply frozen, which may be confusing; shallowly frozen is much cheaper to construct, but much more expensive to check for hashability.)
  • should it be behind an incompatible flag?
  • should the type name or stringification of the object returned by frozenset() be distinguishable from dict in Starlark code?
  • in java, should it be implemented as a new subclass of Dict or Map, or a special way of constructing Dict?

@matts1
Copy link
Contributor Author

matts1 commented May 1, 2024

I'm a little confused by your use of the word frozenset here, while I used the word frozendict. I'm going to just assume that every time you wrote frozenset, you're referring to my frozendict type, since bazel doesn't have a set type.

Interesting point. Then I guess the design dimensions here are:

  • do we need the frozenset() Starlark builtin method? (You provide a very good argument for yes)
  • is "frozenset" a good name for it? (I would say yes, given prior art: https://pypi.org/project/frozendict)

I think frozendict is the right name for it. If you did intentionally write frozenset, then I think that just adds confusion, because a set is its own seperate type.

  • should this be a Bazel-only thing, or do we propose it as a language spec change? (I think if we agree to do this, it should be in the language spec)

I certainly wouldn't be opposed to a language spec change. I filed bazelbuild/starlark#272 a while back.

  • should the value returned by frozenset() be deeply or shallowly frozen? (The name suggests deeply frozen, which may be confusing; shallowly frozen is much cheaper to construct, but much more expensive to check for hashability.)

I'd personally be a fan of deeply frozen (moving the checkHashable check to the constructor), I just thought that might be contentious.

  • should it be behind an incompatible flag?

I don't think so. The only difference here is that we add a new builtin method, and that shouldn't be incompatible with anything. If I write frozendict = {"a": "b"}, that just works fine.

  • should the type name or stringification of the object returned by frozenset() be distinguishable from dict in Starlark code?

I'd personally go with no. I think that for the most part, a user doesn't really need to distinguish between a dict and a frozendict. This is, practically speaking, equivalent to what we currently have for a dict constructed by a different rule.

  • in java, should it be implemented as a new subclass of Dict or Map, or a special way of constructing Dict?

I originally added a field to the regular Dict type initially_frozen of type bool. I don't have a strong preference, but I figured that this would add extra memory consumption to every dict, while this current method would only change the vtables.

@tetromino
Copy link
Contributor

I'm a little confused by your use of the word frozenset here, while I used the word frozendict. I'm going to just assume that every time you wrote frozenset, you're referring to my frozendict type, since bazel doesn't have a set type.

Yes, typo, sorry! I was considering this feature in the context of a native set type which users have also been asking for for a while, and whether we'd need a frozenset in addition to a mutable set - and mistyped.

@comius
Copy link
Contributor

comius commented May 3, 2024

Hey @matts1,

do you care to explain more what you're doing, to avoid AB problem?

I've seen tree-like structures in the rules before and they were often an anti-pattern. That's because the targets already form a dag and often it's possible to use that without copying that structure in the providers. Once you copy the structure into providers, it's really painful to process since Starlark doesn't have recursion and it's hard to visit them. aspects can for example do much better job visiting dags. But there are some cases that are not well covered and complex providers seem necessary atm.

@matts1
Copy link
Contributor Author

matts1 commented May 4, 2024

I ran into this problem for rules_directory while trying to generate a tree for a directory within a single rule.

I had discussions in the starlark team chat last week. The tldr is that it seemed like this isn't especially high priority, and there would be better ways to do it (by resolving the todo to use checkhashable instead of isimmutable and fixing an existing bug with cyclic hashing crashing bazel)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting-review PR is awaiting review from an assigned reviewer team-Starlark-Interpreter Issues involving the Starlark interpreter used by Bazel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants