Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert ecoscope.base classes to dataframe accessors #258

Closed
wants to merge 30 commits into from
Closed

Conversation

atmorling
Copy link
Contributor

@atmorling atmorling commented Aug 26, 2024

Opening this as a draft to get eyes on:

  • Relocations and Trajectory are now dataframe accessors
  • Removed EcoDataFrame / SpeedDataFrame
  • _straighttrack_properties is now StraighttrackMixin and available to both Relocations and Trajectory
  • Removed custom cachedproperty in favour of built-in cached_property
  • Move analysis.proximity into Trajectory
  • Removes upper bound version pin on geopandas and numpy

Tasks remaining:

  • Update docstrings
  • Update notebooks
  • Re-add add_speedmap logic in a more generalised way
  • Fix Github CI - this is failing due to resolving geopandas<0.14.2, unsure why as yet

@atmorling
Copy link
Contributor Author

atmorling commented Aug 26, 2024

The use of relocations/trajectories used to look like:

trajectory = ecoscope.base.Trajectory.from_relocations(movebank_relocations)

With accessors this is now:

trajectory = movebank_relocations.trajectories.from_relocations()

The main change I want to advertise/socialise is that this PR makes a choice to have all accessor methods that used to return a Relocations/Trajectory instance now return a gpd.GeoDataFrame
The reason for this is IMO, this makes the using the accessors in conjunction with regular pandas/geopandas methods feel more consistent.

For example, by returning a GDF, getting turn angle looks like :

#all steps return a gdf
trajectory = movebank_relocations.trajectories.from_relocations()
trajectory = trajectory.loc[trajectory.groupby_col == "Habiba"]
trajectory["heading"] = [0, 90, 120, 60, 300]
turn_angle = trajectory.trajectories.get_turn_angle()

vs returning self (an instance of either Relocations or Trajectory):

trajectory = movebank_relocations.trajectories.from_relocations() 
# no longer a gdf
trajectory = trajectory.gdf.loc[trajectory.gdf.groupby_col == "Habiba"] 
trajectory.gdf["heading"] = [0, 90, 120, 60, 300]
turn_angle = trajectory.get_turn_angle()

@atmorling atmorling requested review from Yun-Wu, walljcg and cisaacstern and removed request for Yun-Wu and walljcg August 26, 2024 17:20
@atmorling atmorling marked this pull request as ready for review August 27, 2024 12:01
Copy link
Contributor

@cisaacstern cisaacstern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just getting a few comments out to start. More to follow!

trajectory_gdf: Trajectory,
trajectory_gdf: gpd.GeoDataFrame,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love that this is now possible.

Do we want to be more opinionated about typing here, perhaps via pandera schemas?

Certainly the majority of gpd.GeoDataFrames will not work if passed here.

Not necessarily a question to be solved by this PR, but something to consider as a follow-on perhaps.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way of putting this, if this can only operate on Trajectories, than it should be a method (i.e. accessor) on a Trajectory. (Which is the accessors paradigm, is a GeoDataFrame that adheres to a validated schema.)

Copy link
Contributor

@cisaacstern cisaacstern Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

traj = ...  # but, how do we get this? 🤷 

traj.ecoscope.calculate_etd()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plain_gdf = ...

traj = plain_gdf.ecoscope.EcoTrajectory()
turn_angle = traj.ecotrajectory.get_turn_angle()
etd = traj.ecotrajectory.calculate_etd()

🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Typing for alex...)

plain_gdf.EcoscopeBase.init()
inited_gdf.relocations.thing

Copy link
Contributor

@cisaacstern cisaacstern Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from typing import TypeVar

import pandera as pa
import pandas as pd

S = TypeVar("S")


class RelocationsSchema(pa.Schema): ...

class TrajectorySchema(pa.Schema): ...

class GDFWithSchema(Generic[S]):  ...


@pd.api.extensions.register_dataframe_accessor("ecoscope")
class ecoscope():

    def __init__(self, pandas_obj):
        assert isinstance(obj, gpd.GeoDataFrame)
        self._gdf = pandas_obj

    def Relocations(..., parser=...) -> GDFWithSchema[RelocationsSchema]: 
        # possibly coerce to schema here

        assert pa.validate(self._gdf, RelocationsSchema)
        return self._gdf

    def is_trajectory():
        return pa.validate(self._gdf, TrajectorySchema)

    def Trajectory(..., parser=...) -> GDFWithSchema[TrajectorySchema]:
        relocs = self.Relocations(parser=parser)
        ...
        
@pd.api.extensions.register_dataframe_accessor("trajectory")
class trajectory():

    def __init__(self, pandas_obj):
        pa.validate(self._gdf, TrajectorySchema)
        self._gdf = pandas_obj

    def get_turn_angle(...) ...:

    def calculate_etd(...) ...:

@pd.api.extensions.register_dataframe_accessor("is_trajectory")
class is_trajectory():
    def __call__(self) -> bool:
        return pa.validate(self._gdf, TrajectorySchema)

Copy link
Contributor

@cisaacstern cisaacstern Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plain_gdf = ...

reloc = plain_gdf.ecoscope.Relocations()
traj = reloc.ecoscope.Trajectory()

# ~ OR ~

traj = plain_gdf.ecoscope.Trajectory(parser=...)


traj.trajectory.get_turn_angle()
traj.trajectory.calculate_etd()

...

if not plain_gdf.is_trajectory(): ... # can accessors implement __call__()?

# ~ OR ~

if not plain_gdf.ecoscope.is_trajectory(): ...

plain_gdf.trajectory.get_turn_angle()  # -> validation error

Comment on lines +29 to +36
cache-environment: true
cache-downloads: true
init-shell: bash

- name: Install pip dependencies and our package
shell: bash -leo pipefail {0}
run: |
python -m pip install ".[all]"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice that's a much more intelligible separation between base environment and package under test.

@atmorling
Copy link
Contributor Author

Just to summarise the conversation with @cisaacstern, let's hold off on merging this for now and have a deeper discussion about how the core API is structured.

The main things to hash out are

  • With accessors, how do we validate that when we call random_gdf.trajectory.get_turn_angle(), that random_gdf conforms to our notion of a "Trajectory"
  • Can roll analysis module methods that require Relocation/Trajectory "objects" into accessors as well?

@atmorling
Copy link
Contributor Author

I'm going to close this issue in favour of the discussed rework that includes much broader API changes than described here. See https://github.com/wildlife-dynamics/compose/issues/27 for more info.

I've split some of the smaller changes in this PR that don't strictly relate to the core change out into their own PRs, rather than deferring them until the broader API change is complete.

@atmorling atmorling closed this Oct 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants