Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added map-like and filter-like to sets. #1217

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Added map-like and filter-like to sets. #1217

wants to merge 4 commits into from

Conversation

ess476
Copy link

@ess476 ess476 commented Oct 4, 2017

This adds the features requested in #1179. It also fixes a spelling mistake in the in the sets testing code.

@jpolitz
Copy link
Member

jpolitz commented Nov 3, 2017

Thanks! These look like they'd work, but we have a few comments:

  1. Even though @shriram suggested these names, I don't really get them. What are they -like? I understand the motivation that there's semantic confusion with the properties of map (e.g. length might change) but I'm not sure I understand this new name. I'd prefer map plus documentation to this name.

  2. There should be a separate implementation for tree-sets and for list-sets. The current implementation is a bit of the worst of both worlds. It's quadratic on list-sets because of the re-insertion, where it could be linear just by mapping over the underlying elements and re-constructing a list-set directly. The tree-set implementation isn't asymptotically bad with this strategy, but if it were implemented with a fold over the elements (using, say fold-preorder), it would save many allocations and intermediate steps.

  3. Ideally, these should be implemented as methods first on the individual variants, and then the functions should just call those methods. This avoids having functionality duplicated in those two places, and gives the correct dynamic dispatch to the variants' methods.

@shriram
Copy link
Member

shriram commented Nov 3, 2017 via email

@ess476
Copy link
Author

ess476 commented Nov 4, 2017

Thanks for the feedback!

I'll try to implement the changes you suggested.

A quick question, with regard to point 2, were you saying that for list-sets map or filter (or both) can be linear?

If you're referring to map, since sets need to ensure that duplicates are never added how could any duplicates be eliminated in linear time?

Thanks,

-Ethan

@sorawee
Copy link
Contributor

sorawee commented Nov 4, 2017

Hmm. If we want to be efficient, for filter in list-set, I would write:

method filter(self, f) -> Set:
  list-set(self.to-list().filter(f))
end

.to-list() does not create a new list. It just returns an underlying list in the list-set back. .filter for list is more efficient than .fold in set since it's implement in JS. Then we use list-set which can be constructed in constant time. We can use list-set right away because of the precondition that .to-list() has unique elements, and .filter preserves that property.

For map in list-set, yes, the worst case is quadratic. Consider an input like this:

N = 100000
S = list-to-list-set(range(0, N))
fun f(x):
  if (x < (N / 2)) or (x == N - 1):
    0
  else:
    x
  end
end

T = S.map(f)

We would then have:

0, 1, 2, 3, ..., N/2 - 1, N/2, N/2 + 1, ..., N - 1, N
=maps=>
0, 0, 0, 0, ..., 0,         N/2, N/2 + 1, ..., N - 1, 0

There are N/2 of 0, and to remove duplicate 0, it needs to walk N/2 elements.

(Even though we use fold to flip the order of traversal or something like that, it is possible to construct a similar worst case).

@ess476
Copy link
Author

ess476 commented Nov 4, 2017

That makes a lot of sense. I guess I just didn't think about the speed of a filter implemented by fold vs the JS version.

Thanks for the help!

@ess476
Copy link
Author

ess476 commented Nov 4, 2017

Also if the JS implementations provided a significant speed up, I assume the original implementation of list-set map:

method map-like(self, f) -> Set:
  list-to-list-set(self.to-list().map(f))
end

would be faster than:

method map-like(self, f) -> Set:
  self.fold(lam(acc, x): acc.add(f(x)) end, list-set(empty))
end

@sorawee
Copy link
Contributor

sorawee commented Nov 4, 2017

The reason why my suggestion for filter is objectively OK is that both ways allocate new memory cells in the same amount, so we ought to choose the one that is more efficient.

The issue with

method map-like(self, f) -> Set:
  list-to-list-set(self.to-list().map(f))
end

is that it allocates n new memory cells from .map, and possibly another n new cells from list-to-list-set. While

method map-like(self, f) -> Set:
  self.fold(lam(acc, x): acc.add(f(x)) end, list-set(empty))
end

would allocate at most only n new memory cells from .add in .fold.

That said, I didn't mean to say which one is better, as I don't know if it's worth getting rid of intermediate structures or not, compared to using raw JS functions. Probably worth asking @jpolitz.

@schanzer
Copy link

@ess476 I'm assuming this is no longer on your radar?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants