Review use of FunctionNode #121
Comments
What percent of these are only being used in the "fitting" part of a pipeline and not the "prediction" part?
And are these all RDD to RDD?
Just randomly chiming in - the aggregation pattern is everywhere in every query processing engine (and you're totally right, it's also in decade-old databases!), so I guess there's a reason.
So after taking a closer look, it seems to me like the cases we're using FunctionNode right now fall under either:
Some questions I have about the Aggregators are:
In several pipelines, we use `FunctionNode` to handle cases where, for example, an `Estimator[A,B]` doesn't return a `Transformer[A,B]` but instead returns a `Transformer[C,D]`, or where there is no good meaning for a single-item transformation.

Currently, FunctionNode feels like a "catch-all" because the Transformer/Estimator APIs don't sufficiently cover some of the data transformation operations we need to support.

One example of this is `NGramsCounts`, which takes a `Seq[Seq[T]]` and returns a model of type `NGrams[T] => Int`.

Other examples include `Windower` and `Sampler`, which are used in the RandomPatchCifar pipeline. These nodes are different in that they do not operate on single items and are thus not transformers, but act as something like an `Aggregator`, if we were going to draw a database analogy.
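To make the mismatch concrete, here is a rough sketch of the shapes involved. It is not the project's actual code: the traits are simplified stand-ins, `Seq` replaces `RDD` so it runs without Spark, `Seq[T]` stands in for `NGrams[T]`, and `EveryKth` is a made-up node that only mimics the "no single-item meaning" property of `Sampler`.

```scala
// Hypothetical, simplified stand-ins for the abstractions discussed above.
// Seq is used in place of RDD so the sketch runs without Spark.
object FunctionNodeSketch {
  // A Transformer has a meaningful single-item form.
  trait Transformer[A, B] {
    def apply(in: A): B
    def batch(in: Seq[A]): Seq[B] = in.map(x => apply(x))
  }

  // An Estimator[A, B] is tied to returning a Transformer[A, B];
  // it cannot express "fit on A, return a Transformer[C, D]".
  trait Estimator[A, B] {
    def fit(data: Seq[A]): Transformer[A, B]
  }

  // The catch-all: any function from one type to another.
  trait FunctionNode[A, B] {
    def apply(in: A): B
  }

  // Consumes a whole corpus and returns a lookup model.
  // Seq[T] stands in for NGrams[T] here (assumption).
  class NGramsCounts[T](n: Int)
      extends FunctionNode[Seq[Seq[T]], Seq[T] => Int] {
    def apply(docs: Seq[Seq[T]]): Seq[T] => Int = {
      val counts: Map[Seq[T], Int] =
        docs
          .flatMap(_.sliding(n).filter(_.size == n)) // drop short tail windows
          .groupBy(identity)
          .map { case (gram, occurrences) => gram -> occurrences.size }
      gram => counts.getOrElse(gram, 0)
    }
  }

  // Made-up sampler-like node: collection in, smaller collection out.
  // There is no sensible per-item version of this operation.
  class EveryKth[A](k: Int) extends FunctionNode[Seq[A], Seq[A]] {
    def apply(in: Seq[A]): Seq[A] =
      in.zipWithIndex.collect { case (x, i) if i % k == 0 => x }
  }

  def main(args: Array[String]): Unit = {
    val corpus = Seq(Seq("a", "b", "a", "b"), Seq("b", "a"))
    val bigramCount = new NGramsCounts[String](2).apply(corpus)
    println(bigramCount(Seq("a", "b"))) // 2
    println(new EveryKth[Int](2).apply(Seq(10, 11, 12, 13, 14)))
  }
}
```

Neither node fits `Transformer` (no per-item meaning) or `Estimator` (output type is not a `Transformer[A,B]` over the input item type), which is why both end up as `FunctionNode` in this sketch.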