Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/flatten asktell dict values #1409

Merged

Conversation

jlnav
Copy link
Member

@jlnav jlnav commented Aug 19, 2024

  • Packing kwargs into gen_specs["user"] now occurs in LibensembleGenerator parent class
  • multi-dim values from numpy flattened into e.g. x0, x1 when converted to list-of-dicts. These get packed back up into (x, int, (2,)) in list_dicts_to_np.

@jlnav jlnav marked this pull request as draft August 19, 2024 16:07
@jlnav jlnav marked this pull request as ready for review August 19, 2024 18:46
Copy link

codecov bot commented Aug 19, 2024

Codecov Report

Attention: Patch coverage is 87.75510% with 48 lines in your changes missing coverage. Please review.

Project coverage is 77.17%. Comparing base (833a9b0) to head (4541d8a).
Report is 276 commits behind head on experimental/jlnav_plus_shuds_asktell.

Files with missing lines Patch % Lines
libensemble/gen_classes/surmise.py 29.26% 29 Missing ⚠️
libensemble/gen_classes/aposmm.py 82.50% 3 Missing and 4 partials ⚠️
libensemble/sim_funcs/borehole_kills.py 0.00% 5 Missing ⚠️
libensemble/utils/runners.py 95.29% 2 Missing and 2 partials ⚠️
libensemble/gen_funcs/persistent_gen_wrapper.py 95.65% 0 Missing and 1 partial ⚠️
libensemble/libE.py 0.00% 0 Missing and 1 partial ⚠️
libensemble/utils/validators.py 75.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@                            Coverage Diff                             @@
##           experimental/jlnav_plus_shuds_asktell    #1409       +/-   ##
==========================================================================
- Coverage                                  90.56%   77.17%   -13.39%     
==========================================================================
  Files                                         88       94        +6     
  Lines                                       8055     8425      +370     
  Branches                                    1442     1503       +61     
==========================================================================
- Hits                                        7295     6502      -793     
- Misses                                       570     1698     +1128     
- Partials                                     190      225       +35     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jlnav jlnav requested a review from shuds13 August 19, 2024 19:15
Copy link
Member

@shuds13 shuds13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can discuss the theory, but the implementation is partial, and would need to be made consistent.

jlnav and others added 8 commits August 20, 2024 13:44
bump pydantic versions, really trying to resolve warnings
Various fixes to run the proxystore test on extra-ci
…evelop/supercharge/redis-github-action-1.8.0

Bump supercharge/redis-github-action from 1.7.0 to 1.8.0
…ffix, instead of just checking if last char is digit. better decide output numpy array type for strings
Bumps [globus-compute-sdk](https://github.com/globus/globus-compute) from 2.26.0 to 2.27.0.
- [Release notes](https://github.com/globus/globus-compute/releases)
- [Changelog](https://github.com/globus/globus-compute/blob/main/docs/changelog.rst)
- [Commits](globus/globus-compute@2.26.0...2.27.0)

---
updated-dependencies:
- dependency-name: globus-compute-sdk
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
@jlnav jlnav requested a review from shuds13 August 22, 2024 14:44
…us-compute-sdk-2.27.0

Bump globus-compute-sdk from 2.26.0 to 2.27.0
dependabot bot and others added 5 commits August 22, 2024 19:20
Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.23.6 to 1.23.7.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.23.6...v1.23.7)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
…evelop/crate-ci/typos-1.23.7

Bump crate-ci/typos from 1.23.6 to 1.23.7
Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.23.7 to 1.24.1.
- [Release notes](https://github.com/crate-ci/typos/releases)
- [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md)
- [Commits](crate-ci/typos@v1.23.7...v1.24.1)

---
updated-dependencies:
- dependency-name: crate-ci/typos
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
@shuds13
Copy link
Member

shuds13 commented Aug 23, 2024

This is the test data I'm using. I suggest it for unit tests because its awkward.

dtype = [('a', 'i4'), ('x', 'f4', (3,)), ('y', 'f4', (1,)), ('z', 'f4', (12,)), ('greeting', 'U10'), ('co2', 'f8')]
H = np.zeros(2, dtype=dtype)
H[0] = (1, [1.1, 2.2, 3.3],[10.1],[1,2,3,4,5,6,7,8,9,10,11,12], 'hello','1.23')
H[1] = (2, [4.4, 5.5, 6.6],[11.1],[51,52,53,54,55,56,57,58,59,60,61,62], 'goodbye','2.23')

list_dicts = np_to_list_dicts(H)
npp = list_dicts_to_np(list_dicts)

for field in H.dtype.names:
    print(f"Comparing {field}: {H[field]} {npp[field]}")

    if isinstance(H[field], np.ndarray):
        assert np.array_equal(H[field], npp[field]), f"Mismatch found in field {field}"

    elif isinstance(H[field], str) and isinstance(npp[field], str):
        assert H[field] == npp[field], f"Mismatch found in field {field}"

    elif np.isscalar(H[field]) and np.isscalar(npp[field]):
        assert np.isclose(H[field], npp[field]), f"Mismatch found in field {field}"

    else:
        raise TypeError(f"Unhandled or mismatched types in field {field}: {type(H[field])} vs {type(npp[field])}")

Get failures still.

The failure for y I think is because the information that it was a 1D array is lost when convert to dictionary. So this type of thing is why may need to reference original data for back/forth conversion. I'm not sure this can be done as standalone functions.

The string also fails, maybe you also lose the size 'U10' when convert to dictionary, so when it comes back, it is taking length of first string entry - but they are different lengths ('hello', 'goodbye')

…evelop/crate-ci/typos-1.24.1

Bump crate-ci/typos from 1.23.7 to 1.24.1
@jlnav
Copy link
Member Author

jlnav commented Aug 26, 2024

Going to implement your test data into the unit test.

Yeah, I'll see what can be done about (1,) as a size; its awkward because its not strictly necessary although it appears a lot in our examples. It's an additional way to specify a 1D element.

For the string, yes; currently taking the length of a string in the first entry; but going through the whole dict and finding the largest string also feels wrong. When I've placed strings into the History into the past I try to make sure they're all the same length, or very close. Maybe I could do instead: output_type = "U" + str(int(len(entry) * 1.5))

But yeah; without the original knowledge around, we'll lose the original string datatype.

@jlnav
Copy link
Member Author

jlnav commented Aug 26, 2024

Starting to lean towards a target_dtype parameter for list_dicts_to_np. It could be helpful, but there's also the dilemma where we're not using these conversion functions on generators that specify a dtype in the first place, our generators...

And as we've discussed it would be best not to force users of external gens to specify dtypes...?

… for np_to_list_dicts to unpack length-1 arrays/lists, into scalars
…cts_to_np passes through input as-is if its not a list (already numpy, no conversion necessary. _to_array did this previously)
…cribing the names of the fields, decides what fields are passed in, but their "actual datatypes" come from the sim / sim_specs
self.gen_specs["user"] = kwargs
if not persis_info:
self.persis_info = add_unique_random_streams({}, 4, seed=4321)[1]
self.persis_info["nworkers"] = 4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whats going on here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The kwargs line is similar to #1307 (comment)

for persis_info, if libE ask/tell generators maintain the H, persis_info, gen_specs, libE_info, **kwargs signature, my thinking is users may prefer not needing to populate/maintain persis_info.

Its the circumstance where

aposmm = APOSMM(
    ub=1,
    lb=0,
)

is probably preferred to:

aposmm = APOSMM(
    None,
    {},
    {
        "user": {"ub": 1, "lb": 0},
    },
    {}
)

self._get_user_params(self.gen_specs["user"])
self.gen_specs["out"] = [("x", float, (self.n,))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this. We are trying to represent an external gen here. Most external gens are not going to have this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good point; I don't think gen_specs["out"] is necessary for this specific class

self._get_user_params(self.gen_specs["user"])
self.gen_specs["out"] = [("x", float, (self.n,))]
Copy link
Member

@shuds13 shuds13 Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. This would overwrite any gen_specs['out'] supplied, but I can see that declaring the input/output types should probably come with the gen. Although its possible it may support more than one type or size of float or dimensionality.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, declaring input/output types is often implied by libE's interface to be something you can do; and something we should wholeheartedly support. but for UniformSample, for example, does it create anything other than xs ?

I was thinking that it seems it doesn't matter if you decide to uniformly sample thetas instead:

gen_specs["out"] = ["theta", float, (3,)]

the fields are often nonetheless hardcoded like:

H_o["x"] = ...

Maybe that's just because its a very simple gen. But this is also true for our other gens; you really can't customize gen_specs["out"] except for sizes.

So might as well hardcode the gen's fields, I guess, and make sure its clear to a user that its really the size thats customizable.

Though I guess if fields really were configurable, we'd have a gen that resembles:

for field in gen_specs["out"]:
   if field == "theta":
      H["theta"] = do_experiment_with_theta()
   else:
      H["x"] = do_without_theta()

@@ -104,7 +117,7 @@ def ask(self, num_points: Optional[int] = 0) -> List[dict]:

def tell(self, results: List[dict]) -> None:
"""Send the results of evaluations to the generator."""
self.tell_numpy(list_dicts_to_np(results))
self.tell_numpy(list_dicts_to_np(results)) # OH, we need the union of sim_specs.out and gen_specs.out
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If using externally may not even have sim_specs, or gen_specs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. I think we briefly discussed on Tuesday that we have to do the conversion anyway; and thankfully libE can handle receiving numpy arrays with dtypes it previously was unaware of

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the external interface, I would think each gen must need to say name/type for dictionaries entries (in VOCS). So user knows what they must supply. But they only allow scalars and something like sampling can take varying dimensions, what would the interface be. x0, x1, x2 ... Xn

…kers, use gen_specs.get("out") so if it isnt provided, the dtype discovery process commences
@jlnav jlnav merged commit 345aea3 into experimental/jlnav_plus_shuds_asktell Sep 12, 2024
14 of 16 checks passed
@jlnav jlnav deleted the feature/flatten_asktell_dict_values branch September 12, 2024 19:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants