-
Notifications
You must be signed in to change notification settings - Fork 865
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce core dependencies, split in optional dependencies #2265
Conversation
Performance benchmarks:
|
Yes, I am strongly in favor of a simple high-level API for the core functionality. Why should a novice user bother with the internal module structure of MESA?
Again, yes. The main issue is what is part of the default set of things that users might use for 80% of the models?
As far as I know, during coding dependencies are not checked so while running an import error will automatically be raised.
I'll try to look at this asap.
This PR is getting a bit big in terms of affected files. Might it be possible to split it into two: update toml, update docs? |
Thanks for the review! On point 2 and 3, the biggest issue arises from the updated __all__ = [
"Model",
"Agent",
"time",
"space",
- "DataCollector",
- "batch_run",
- "experimental",
] DataCollector, batch_run and experimental are removed because they contain dependencies that are note installed by default anymore. Not everyone might need a DataCollector, batch_run or experimental feature, so that's the idea. The thing is, ideally you don't import those three if they are not used. As far as I have found out, you have two options:
|
I quickly checked the toml file. It seems sensible not to always install, e.g., networks or Solara. I am less sure about pandas. In my view this is such a standard default library, why not always include this. TQDM is very lightweight as far as I know, so why not always also install this? My personal view on this, in general, is that this fine sliceing of dependencies is a bit odd given Python's overall batteries included philosophy. In particular, if dependencies are stable and readily available (so not requiring conda forge or so), I rarely see the point of not simply installing everything. |
I think there is a middle way feasible here. Don't underestimate how heavy pandas is though. One example is that they currently don't have Python 3.13 wheels (months after the betas, and two weeks before stable release), and building (in CI) takes a full 4 minutes. But I agree there not that many scenarios you don't need it. Including |
I'm not sure of the performance impact, but one could import |
I think we can do this quite elegantly: class DataCollector:
....
@cached_property
def _pd(self):
"""Lazy-loaded pandas module using cached_property."""
try:
import pandas as pd
return pd
except ImportError as e:
raise ImportError(
"Pandas is required for this operation. Please install it using 'pip install pandas'."
) from e
def get_model_vars_dataframe(self):
...
return self._pd.DataFrame(self.model_vars) Also, in the datacollector, Pandas is generally only used on methods that are ran after the model is done, like However, I'm also still considering making |
I was stating |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@EwoutH This is awesome!
I had one question on the import and thinking through latest and stable readthedocs pages. Not sure the best answer, except may be calling explicitly that there is a difference.
My thought on the discussion is and to @quaquel's comments of "batteries included" can you set it up so all
is installed by default (e.g. if no specification install all or rec) this is good for new users but for more advanced users they can specify specific dependencies they want. (e.g. which sub categories they want)
Thanks for reviewing! I answered your comment on the docs. I will try to update the introduction tutorial, but we probably have to rewrite large parts of it. I did a small update in #2315, but removing the schedulers requires completely rewriting the Agent activation parts. |
Ha just saw #2315 and went right into the code so didn't see everyone else comments, as such I updated my comment. Thanks for all this work, this is such great progress! |
Think about what to do with that
Okay, I vastly simplified this PR, by keeping Pandas and tqdm core dependencies. Now only the pyproject.toml and Readme are updated. Please review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry one last question about cookiecutter. But yes this looks fine and clear now.
Its indeed much better now, although pandas is really a mixed bag. Previously I would have said everyone who uses mesa should probably also use pandas anyway, but nowadays with polars, ibis and others this is much less true. And, as far as I remember, we only use pandas in one place, to convert the data collected data into a dataframe. I don't think that is really worth the dependency. But lets maybe do that in a separate PR, to move this one forward. |
I am inclined, as part of the broader overhoal of data collection, to no longer include any to_dataframe style helper method and leave this completely to the user. For example, as long as we store agent level data in a dict of dicts ( |
Fully agree
Also agree I'm merging, follow-up PRs can be made as everyone sees fit. |
pip install -U --pre mesa[network,viz] | ||
|
||
# This is equivalent to our recommended dependencies: | ||
pip install -U --pre mesa[rec] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hold on. This will trip people over. Visualization is what a lot of people coming to Mesa want to use it for. The default needs to include visualization. I can do pip install solara
and have the common functionality working out of the box. Why do I have to do pip install mesa[rec]
specifically for just Mesa?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing a minimal install of Mesa is less common than doing an install of Mesa for visualization in a classroom setting. Can the split be mesa-minimal
/mesa-core
and mesa
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have multiple complex models that don't touch Solara, the batch_run or the DataCollector at all.
Also the initial error message when you can't install Solara is quite clear.
Also it's literally in our front-page Readme.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You also typically want to avoid installing stuff like solara on an HPC. As long as it is well documented, which it is, I am fine with the current structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have similar situations (initially wanted to embed Mesa inside other simulation frameworks), but why make it more inconvenient to students and first time learners, who have to read the fine print of the README.md?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can put it in a few other places, like the visualization tutorial, and potentially also add a helpful warning on import somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I also didn't saw this properly before, but I side with @rht on this one. Viz should be in core imho. Mesa should be "batteries included". For hpc or similar environments you can always install with "--no-deps". And in the future it might be more convinient to have a "mesa-core" package with minimal dependencies. But for now it should be included
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know you where the person who opened #626 ;)
Let’s discuss at the next dev meeting.
This PR introduces a more flexible dependency management system for Mesa, allowing users to choose which additional components they need while maintaining core functionality.
Motivation
This change allows for more tailored installations, reducing unnecessary dependencies and potential conflicts, while ensuring core functionality remains intact.
Key Changes
pyproject.toml
to define new optional dependency groups:[network]
: Network-related dependencies[viz]
: Visualization dependencies[rec]
: Recommended dependencies (includes network and viz)[all]
: All dependencies, including developer dependenciesREADME.md
with new installation instructions for optional dependencies.Installation Options
Users can now customize their Mesa installation:
Impact
Points to Review
[network]
,[viz]
,[rec]
,[all]
) appropriate and sufficient?Closes #626