Output metadata access and consolidation #6913

abkfenris · 2022-03-03T19:24:11Z

abkfenris
Mar 3, 2022

Right now there is two different sources of metadata for outputs, in the output definition and what is yielded from within the body of an op or asset.

Currently you cannot access either set of metadata from a downstream op, and you ~~can~~ cannot access metadata yielded from within an op in an IOManager. The second is specifically making it hard to use IOManagers for my use case (and others from what I've seen in the Slack, issues, and discussions).

With metadata otherwise being a first class citizen in Dagster this seems like an important bit to standardize.

Current state of output metadata in 0.14.1

A test of how things work in 0.14.1, I created an op that has both an output definition metadata, and that logs metadata, and an IOManager that logs the metadata that it receives. From within the definition, the metadata values are A, and those emitted within the op contain B:

class TestIOManager(IOManager):
    def handle_output(self, context, obj):
        context.log.warning(f"IO metadata: {context.metadata}")

    def load_input(self, context):
        pass


@io_manager
def test_io_manager(init_context):
    return TestIOManager()

@op(
    out=Out(
        metadata={"Out definition": "A", "Overlapping": "A"}, io_manager_key="test_io"
    )
)
def test_op(context):
    context.add_output_metadata({"Add_output_metadata": "B", "Overlapping": "B"})
    return "abc"


@job(resource_defs={"test_io": test_io_manager})
def test_job():
    test_op()

Currently the IOManager has no way of seeing what metadata about an output is emitted from within an op, however the step output only shows the metadata that is emitted from within the op. Downstream ops don't get access to either set of metadata. Additionally the two sources of metadata can lead to confusion.

Some impacts of the current state of output metadata

Right now if you want to pass a specific path into an IOManager, you can pass a fixed one in via the output definition metadata.

However if you have a dynamically generated path (say you are archiving data), you either need to create an IOManager that can figure out a path based off of a partition, or you need to pass custom classes around containing a path.

T = TypeVar("T")

@dataclass
class PathObj(Generic[T]):
    path: Path
    obj: T

@op
def dataframe_op() -> PathObj[pd.DataFrame]:
    ...
    return PathObj("/somewhere/special.csv", df)

class PandasIOManager(IOManager):
    def handle_output(self, context, obj: PathObj[pd.DataFrame]):
        obj.obj.to_csv(obj.path)

This promotes everyone to come up with their own solution to this problem which makes it harder to share and reuse IOManagers. If there was a way to pass a path generated by an op to the IOManager it would help solve these.

Personally, for my usage, much of my data is being archived to be served by external services. Either I can subclass IOManager for each output to generate the correct path and formatting, or I could use a few more general purpose IOManagers and pass paths in.

Some ideas

I've got a few ideas on how to manage these issues, but these definitely aren't exhaustive.

Merging output metadata

If metadata is both defined on an output definition and logged from within an op, the metadata should be merged. I propose that the logged metadata takes precedence, as the op has more information about the data/context when it yields the metadata, than what is in the definition.

IOManager specific metadata/configuration

Part of the issue seems to be that while there are both two sources of output metadata, there are also two different types of output metadata. There is metadata that describes the output itself, then there is metadata that is used by an IOManager.

I propose that there is separate IOManager metadata, or configuration that can both be defined in the output definition and yielded from an op. This IO metadata wouldn't be part of the 'public' metadata records that are either shown as part of the STEP_OUTPUT events, or asset records.

IOManagers would be able to get access to public metadata via context.metadata and then for the IOManager specific metadata via context.io_metadata or similar in both the input and output contexts.

For instance, when swapping from local storage to cloud storage, you probably don't want to include a specific path in the op generated metadata, but you may want to give the IOManager / filesystem a hint as to where the file should go, and then get the absolute path and/or url as part of the regular metadata. Standardizing on a name for the key would help here, see #6763

`context.upstream_metadata` or `context.upstream_metadata_for_input()`

Within an op, there are times that you need access to metadata from an output. Previously the question was 'which metadata', but if the metadata gets merged, then this is an easier question to answer.

If it's cheap enough to provide all the metadata before running an op, adding a property context.upstream_metadata with a dict of input names to dicts of metadata. Otherwise a method that returns the metadata for a given input name.

Similar properties or methods could give ops access to their own input and output metadata.

`context.set_output_asset_key()`

AssetKey also are limited by this separation between definition and logged metadata. Similar to how there are dynamic paths that some data should be persisted to, some AssetKeys may be dynamic also. Currently you can connect an asset key to an output is if it is in the output definition, or if the IOManager sets it, but not in a op.

Some related conversations/issues

sryza · 2022-03-10T20:08:05Z

sryza
Mar 10, 2022

Hey @abkfenris - thanks for this really thoughtful writeup. Just to clarify and make sure I'm parsing it correctly, when you say "you can access metadata yielded from within an op in an IOManager", should "can" be "cannot"?

1 reply

abkfenris Mar 10, 2022
Author

Oops, correct.

sryza · 2022-03-14T03:34:55Z

sryza
Mar 14, 2022

@abkfenris all the issues you're bringing up here make a lot of sense.

The most "controversial" question here, in my mind, is whether runtime metadata yielded for an output should be available to the downstream load_input and op contexts. The risk is that we'd be offering two ways of doing the same thing. Users who want to pass runtime data to a downstream op would need to ask themselves whether they should pass that data using Outputs or using output metadata. If we want to go this direction, I think we'd need clear guidelines for when to use one or the other. Do you have thoughts on what those guidelines should be?

Commenting on your proposals:

context.upstream_metadata or context.upstream_metadata_for_input()

This depends somewhat on the output to the above question, but, at the very least, I agree that downstream ops should have access to the metadata on the upstream output definition. E.e. with something like your suggestion context.upstream_metadata_for_input(). Or maybe context.upstream_output_def_for_input(...).metadata.

Merging output metadata

I think there's value in being able to distinguish between "definition-level" metadata and "runtime" metadata, especially in debugging. My hunch is that we should keep these separate in our internal data model, though we could provide a utility that returns a merged version.

context.set_output_asset_key()

Setting an asset key dynamically on an output makes sense to me. I would lean towards return Output(asset_key=...), because it doesn't involve mutating state. Thoughts?

IOManager specific metadata/configuration

Do you have examples in mind of metadata that would be io manager-specific vs. not io manager-specific?

1 reply

abkfenris Mar 14, 2022
Author

A use case

I think for some of this it may help to have a more concrete use case to reason about. I can speak to my own, but I've seen quite a few folks asking questions that seem to have their own use cases where metadata is useful.

Much of my work involves managing the data workflows for NERACOOS. Like much of the geosciences we're dealing with N-dimensional data as NetCDF (and sometimes Zarr) and connecting it up to various types of services that we run such as ERDDAP and THREDDS (which are directed to use by NOAA). Most (but not all) of time in Python we can use xarray (usually imported as xr) to handle NetCDF. We also have various tools like compliance checkers for the formatting of data that is getting sent to NOAA. So we've got a mix of systems that really like to have specifically formatted files placed in narrowly defined locations mixing with Python native workflows.

NetCDF, Zarr, and many of the other N-dimensional data formats are also interesting in the fact that they have their own rich metadata models, but we can largely ignore the specifics of their metadata model for now (at some point I want to dig into TableSchema and the fancier metadata display model for working with N-dimensional data).

The controversial question: does having metadata accessible downstream add confusion?

I'm going to ask you a question back: is it Dagster's role to enforce best practices, or to enable workflows and progressively steer users towards best or at least better practices?

If it's the first, then yes, users should be making wrapper objects that bundle their key metadata alongside of the data and to pass those around.

Most of what I've seen, makes me think that Dagster leans towards the second. You can start using Dagster by slapping an @op on an existing function. Later you can start filling your op out with types, context access, yielding events, but those are not requirements to use it.

Metadata could also be progressively typed on both inputs and outputs, so something like:

@op(out=Out(xr.Dataset, metadata={'path': Expected(PathMetadataValue)})
def give_me_a_dataset() -> xr.Dataset:
   ...

@op(ins={'ds': In(xr.Dataset, upstream_metadata={'path': PathMetadataValue})
def something_where_the_path_matters(ds: xr.Dataset):
    ...

Expected() meaning that the metadata value isn't known now, but by the time the op returns, it will include the path as part of the metadata, either via Output() or context.add_output_metadata.

I'll ponder further on specific guidance, but I wanted to keep this discussion going before I run away into the woods for a week.

This is where my experience with NetCDF's data and metadata model helps guide my opinion.

Lets say my data is 32, 0, 273.15.

Some metadata gives the values meaning: Fahrenheit, Celsius, Kelvin. The metadata also can define where those values are stored /path/to/the_freezing_point_of_water.nc.

If I'm adding 10 to those values, then I don't need to know the metadata.

However if I need to know how many kilojoules of energy it takes to raise the temperature 1º, then the units metadata is necessary, where as if I need to orchestrate an external service to upload that file, I need to know what the path is.

Metadata, like the units are key to understanding the data, and in a reasonable world of data formats, should live alongside of the data, however this is not always the world that we live in. The path that the data actually lives at in the system is not key to understanding the data, but it may be needed to work with other tools.

Within my graph, I may want to run a NetCDF dataset through the compliance checker.

Despite the compliance checker being a native Python library, it cannot handle xarray dataset objects getting passed directly into it (it's something I'm slowly nudging them upstream about too).

If that dataset has already been persisted by an IOManager, then within my compliance op, I could get the path from the metadata and give that to the library. When I get those compliance results back, if there is an AssetKey associated with the metadata, then I could yield an AssetObservation with those results and track those over time.

At the same time, I may have further ops that can work directly on the xarray dataset object and don't care about the path involved.

Proposals

context.upstream_metadata or context.upstream_metadata_for_input()

This depends somewhat on the output to the above question, but, at the very least, I agree that downstream ops should have access to the metadata on the upstream output definition. E.e. with something like your suggestion context.upstream_metadata_for_input(). Or maybe context.upstream_output_def_for_input(...).metadata.

Ya, I think the answers to above and some of the thoughts below will help steer this one.

Merging output metadata

I think there's value in being able to distinguish between "definition-level" metadata and "runtime" metadata, especially in debugging. My hunch is that we should keep these separate in our internal data model, though we could provide a utility that returns a merged version.

I can't speak to your internal usage much, but as a user I would lean towards a having a single 'just give me the metadata that was attached to this output' method. Data usually accumulates more and more metadata, rather than having the metadata get replaced wholesale.

OpExecutionContext.add_output_metadata might need to be changed to add or replace metadata, where as there may need to be a OpExecutionContext.set_output_metadata for replacement of a whole metadata dictionary (as well as allowing to prune keys).

context.set_output_asset_key()

Setting an asset key dynamically on an output makes sense to me. I would lean towards return Output(asset_key=...), because it doesn't involve mutating state. Thoughts?

I was going off of OpExecutionContext.add_output_metadata and the motion that seems to be happening towards needing to use Output() less due to ease typing errors, but I think it should be possible both ways.

IOManager specific metadata/configuration

Do you have examples in mind of metadata that would be io manager-specific vs. not io manager-specific?

Ya, I'm not sure I captured it that well when trying to describe it generally above, I'll try a more specific example.

For NERACOOS there are a handful of weather and wave models that I end up working with among other types of NetCDF data. Some of that data it just fleeting, some of it needs to be persisted in specific locations so that downstream services have access to it.

One of the wave models has a really cranky workflow upstream of us. It's supposed to run daily, but sometimes several days get missed, other times there are multiple runs in a single day, and sometimes the forecast runs out different number of hours.

I have a sensor that can see when the source changes, but without opening the file (a few hundred megs) I don't know what I'm getting until I have it. Once I have the file, I can compute a path that will work for the ERDDAP and THREDDS services that we use to give the public access to the data.

Right now, within my op, I persist the dataset, and I return both the xr.Dataset and a path. Since the path however describes the dataset, having them connected within the system would reduce the likelihood to say read the wrong path variable in later, not that I would ever do such a thing...

Since IOManagers are designed to encapsulate the transfer of data between ops, I would like to be able to create a NetCDFIOManager to deal this this data.

So with my cranky wave forecast data, today I might find out that I'm actually receiving yesterday's forecast, and generate a desired path cranky_wave_forecast/2022-03-13_waves.nc. Developing locally the NetCDFIOManager might persist it to /home/abkfenris/neracoos/datasets/cranky_wave_forecast/2022-03-13_waves.nc, however in production it might be s3://neracoos-data/cranky_wave_forecast/2022-03-13_waves.nc`.

So desired_path: cranky_wave_forecast/2022-03-13_waves.nc would be IO Manager specific metadata, in that it's helping control the io manager. IO Manager config is probably a better way to think about it than metadata since it's controlling the operation of the IO Manager.

path: /home/abkfenris/neracoos/datasets/cranky_wave_forecast/2022-03-13_waves.nc or path: s3://neracoos-data/cranky_wave_forecast/2022-03-13_waves.nc would be output specific metadata, as it's the realized path, even though it's been generated by the IO Manager.

If a desired path isn't given to the IO Manager metadata, then fall back to using OutputContext.get_run_scoped_output_identifier() to generate a path and include the path in the metadata for downstream ops.

I think directing an IO manager to a specific path is probably the most common instance of the pattern, hence #6763 , but instead of making a special case for desired paths, it would be better to have fuller access to metadata, and have a pseudo/community standard way of communicating a desired path to an IO Manager.

Hopefully that helps make some of my thinking clearer, but I can understand if I just further muddied the waters.

geoHeil · 2022-05-18T20:28:25Z

geoHeil
May 18, 2022

Great writeup! In general, I agree with all the points brought up by @abkfenris - and totally feel the same pain in my use cases with regards to metadata.

0 replies

geoHeil · 2022-05-19T14:54:16Z

geoHeil
May 19, 2022

When creating a class like

from typing import Tuple, TypeVar, Generic
from dataclasses import dataclass

T = TypeVar("T")

@dataclass
class DynamicallyPartitionedMetadata(Generic[T]):
    metadata: str
    obj: T
    latest_data_update : str

I observe this failure:

TypeError: Subscripted generics cannot be used with class and instance checks

when executing:

if isinstance(obj, DynamicallyPartitionedMetadata[pd.DataFrame]):

in an IO manager to check the output type (and handle it dynamically. Is there a way to fix it?

1 reply

sryza Jun 18, 2022

I believe the error you're seeing is an aspect of how Python typing works, that's out of Dagster's control. However, I believe you can accomplish what you're trying to accomplish by replacing your if statement with this:

if isinstance(obj, DynamicallyPartitionedMetadata) and isinstance(obj.obj, pd.DataFrame):

or this:

output_type = context.dagster_type.typing_type
if isinstance(obj, DynamicallyPartitionedMetadata) and output_type.__args__[0] == pd.DataFrame:

https://stackoverflow.com/questions/48572831/how-to-access-the-type-arguments-of-typing-generic

geoHeil · 2022-05-19T15:04:08Z

geoHeil
May 19, 2022

For a sensor, dagster has the notion of state and i.e. in the case of a backfill, a specific run can easily be started from dagit.
How would this be possible in the case of dynamic partitions to be able to submit a specific partition ID directly from dagit as well?

2 replies

geoHeil May 19, 2022

Though a backfill will only work for a stateful source - not a stateless one.
Nonetheless - even for development purposes it would be nice if I do not always resort to stopping dagit, deleting the sqlit file and restarting it ;)

geoHeil May 19, 2022

Furthermore, when thinking about schema migration i.e. having a historized asset (ingestion step 1) and a cleaned one (assuming a bug was detected in the cleaning step and all dynamic partitions which are stored need to be re-triggered. How could this work well?

geoHeil · 2022-05-19T16:15:46Z

geoHeil
May 19, 2022

I have two assets

Ingestion (with dynamic partitions
Cleaned. Should be triggered when a new dynamic partition arrives for the Ingestion asset. Both assets are executed in the same job/graph/DAG

I am using the DynamicallyPartitionedMetadata mentioned above.
And an asset definition of:


@asset
def cleaned_asset(
    context, raw_asset: DynamicallyPartitionedMetadata[pd.DataFrame]
) -> DynamicallyPartitionedMetadata[pd.DataFrame]

this fails with the following exception:

dagster.core.errors.DagsterInvalidDefinitionError: Invalid type: dagster_type must be an instance of DagsterType or a Python type: got DynamicallyPartitionedMetadata[pandas.core.frame.DataFrame]

File "/datasets/__init__.py", line 8, in <module>
    socrata_assets_prod = AssetGroup.from_package_module(
  File "/lib/python3.9/site-packages/dagster/core/asset_defs/asset_group.py", line 390, in from_package_module
    return AssetGroup.from_modules(
  File "/lib/python3.9/site-packages/dagster/core/asset_defs/asset_group.py", line 459, in from_modules
    for module in modules:
  File "/lib/python3.9/site-packages/dagster/core/asset_defs/asset_group.py", line 723, in _find_modules_in_package
    submodule = import_module(f"{package_module.__name__}.{modname}")
  File "/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/datasets/socrata/assets/buffalony__ru4s-wz29.py", line 176, in <module>
    def ru4s_wz29_normalized_asset(
  File "/lib/python3.9/site-packages/dagster/core/asset_defs/decorators.py", line 305, in inner
    op = _Op(
  File "/lib/python3.9/site-packages/dagster/core/definitions/decorators/op_decorator.py", line 116, in __call__
    resolved_input_defs = resolve_checked_solid_fn_inputs(
  File "/lib/python3.9/site-packages/dagster/core/definitions/decorators/solid_decorator.py", line 394, in resolve_checked_solid_fn_inputs
    input_def.combine_with_inferred(
  File "/lib/python3.9/site-packages/dagster/core/definitions/input.py", line 255, in combine_with_inferred
    dagster_type = _checked_inferred_type(inferred, decorator_name=decorator_name)
  File "/lib/python3.9/site-packages/dagster/core/definitions/input.py", line 281, in _checked_inferred_type
    raise DagsterInvalidDefinitionError(

When instead trying to manually supply the type information:

from dagster import PythonObjectDagsterType
ntype = PythonObjectDagsterType(python_type=DynamicallyPartitionedMetadata[pd.DataFrame])

@asset
def ru4s_wz29_normalized_asset(
    context, ru4s_wz29_raw_asset: ntype
) -> ntype:

the error message switches to:

Param "python_type" must be a class. Got DynamicallyPartitionedMetadata[pandas.core.frame.DataFrame] of type <class 'typing._GenericAlias'>.

The type validation works for a non-generic type like:

@dataclass
class DynamicallyPartitionedMetadataPandas():
    metadata: str
    obj: pd.DataFrame
    latest_data_update: str

How could dagster potentially support generic types?

1 reply

geoHeil May 20, 2022

Given an InputContext - how can I access the ASSET_MATERIALIZAtION events? Usually, it is possible to use context.instance.get_event_records but this does not work when I need to retrieve the latest_partition (in the IO manager) when trying to read the asset.

geoHeil · 2022-06-06T13:49:31Z

geoHeil
Jun 6, 2022

It will also be important to enable overwriting the "latest update" state field - perhaps from some configuration value.

0 replies

geoHeil · 2022-06-18T04:52:50Z

geoHeil
Jun 18, 2022

Has there been any progress on this with https://docs.dagster.io/concepts/ops-jobs-graphs/dynamic-graphs#returning-dynamic-outputs from dagster 0.15.0 @sryza ?

0 replies

andreqaugusto · 2022-11-18T02:46:16Z

andreqaugusto
Nov 18, 2022

+1 and would love to access metadata from the yielded Output.

My use case (today) is the following: I have to build a pipeline that does a query in a database, parse the data and save it to a Google Sheet (for our business users). The catch is that I have to save the data as multiple sheets, each one with a partition of the data itself.

Borrowing from the OP, my use case would be something like this:

class PandastoSheetsIOManager(IOManager):
    def handle_output(self, context, obj: pd.DataFrame):
      # logic to save the data using gspread
      sheets = ...
      sheets.save(file_name=...)

    def load_input(self, context):
        pass


@io_manager
def pandas_sheets_io_manager(init_context):
    return PandastoSheetsIOManager()

@op(out=DynamicOut(io_manager_key="sheets_io"))
def sheet_op(context):
    # do a query in database and return it as a Pandas df
    df = context.resources.query(...) 
    # transform the data as needed
    df = ...
    # select the unique users that we are going to separate into different sheets
    users = df['Users'].unique()
    # output the pieces of each dataframe 
    for user in users:
        yield DynamicOutput(
            df[df["Users"] == user],
            mapping_key=user,
            metadata={"file_name": user},
        )


@job(resource_defs={"sheets_io": pandas_sheets_io_manager})
def sheets_job():
    sheet_op()

In this case, technically I can obtain the information in the IOManager about the user that is provided by the op metadata by simple doing something similar as what I did in the op itself:

class PandastoSheetsIOManager(IOManager):
    def handle_output(self, context, obj: pd.DataFrame):
      # since the obj DataFrame has a single user in it per construction, we can obtain it
      user = obj['User'].unique()[0]
      # logic to save the data using gspread
      sheet = ...
      sheet.save(file_name=user)

However, I see the solution above as an anti-pattern. The IOManager should handle the logic between dagster and the storage solution without coupling any OP-logic in it. In the above solution I am essentially forbidding any reutilization of that IOManager (what if I had another job to export Sheets of another entity, like customers?).

The ideal would be that the Output metadata to be available to the IOManager. In that scenario, we can leverage the metadata to make informed choices of where to store the data:

class PandastoSheetsIOManager(IOManager):
    def handle_output(self, context, obj: pd.DataFrame):
      # imagine that the output metadata is available as the output_metadata property
      file_name = context.output_metadata.get("sheet_name", "sheet")  # type: ignore
      # now we can use the metadata to save the sheet. if it was not provided, we have a default (see above)
      sheet = ...
      sheet.save(file_name=file_name)

This feature (accessing the Output metadata) was so "obvious" to me that I was really surprised when I found out that dagster wasn't doing that already (since the framework gives so much power and flexibility with its contexts and data). Really hope that this feature gets implemented in the near-future. Meanwhile, I will just do the same thing as what I wrote in the second-to-last code-block and hope nobody asks me to do more Sheets reports :)

1 reply

andreqaugusto Nov 18, 2022

Relevant info: my question in Slack that brought me here in first place: https://dagster.slack.com/archives/C01U954MEER/p1668717043490779

ei-grad · 2023-04-11T19:07:23Z

ei-grad
Apr 11, 2023

Someone had to do it.

# license: WTFPL

class MetadataIOManager(IOManager):

    def load_input(self, context: InputContext) -> RawMetadataValue:
        e = context.instance.event_log_storage.get_event_records(EventRecordsFilter(
            event_type=DagsterEventType.ASSET_MATERIALIZATION,
            asset_key=context.asset_key,
            asset_partitions=[context.partition_key],
        ), limit=1)
        if len(e) == 0:
            raise Exception('Asset materialization event not found.')
        context.log.info("Using materialization event from run %s",
                         e[0].event_log_entry.run_id)
        d = e[0].event_log_entry.dagster_event.event_specific_data
        return d.materialization.metadata['value'].value

    def handle_output(self, context: OutputContext, obj: RawMetadataValue) -> None:
        context.add_output_metadata({'value': obj})


@io_manager(
    config_schema={},
    description="IO manager that stores and retrieves values from asset metadata.",
)
def metadata_io_manager(init_context: InitResourceContext):
    return MetadataIOManager()

1 reply

smallstepman Sep 17, 2024

the mental gymnastics...

saved me a bunch of time, thanks!

would be nice if dagster team included sth similar as part of InputContext API

pbower · 2024-04-24T03:28:24Z

pbower
Apr 24, 2024

Hi there,

Wondering if anyone is still considering this? It seems very strange that I can't override my IO Manager path for example across a partitioned asset.

There are various use cases where this is a key requirement.

Thanks,
Pete

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Output metadata access and consolidation #6913

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 11 comments 8 replies

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

Select a reply

Output metadata access and consolidation #6913

Current state of output metadata in 0.14.1

Some impacts of the current state of output metadata

Some ideas

Merging output metadata

IOManager specific metadata/configuration

context.upstream_metadata or context.upstream_metadata_for_input()

context.set_output_asset_key()

Some related conversations/issues

Replies: 11 comments · 8 replies

abkfenris Mar 10, 2022 Author

abkfenris Mar 14, 2022 Author

A use case

The controversial question: does having metadata accessible downstream add confusion?

Proposals

`context.upstream_metadata` or `context.upstream_metadata_for_input()`

`context.set_output_asset_key()`

Replies: 11 comments 8 replies

abkfenris Mar 10, 2022
Author

abkfenris Mar 14, 2022
Author