Multiple Aggregates #254

preethiraghavan1 · 2022-03-25T00:57:44Z

Description

Please include a summary of the change, the motivation, and any additional context that will help others understand your PR. If it closes one or more open issues, please tag them as described here.

Affected Dependencies

How has this been tested?

Unit test

Checklist

[x] I have followed the Contribution Guidelines and Code of Conduct
[x] I have commented my code following the OpenMined Styleguide
[x] I have labeled this PR with the relevant Type labels
[x] My changes are covered by tests

Merging from upstream.

dvadym

Thanks, the approach is great! I left some comments.

dvadym · 2022-03-28T14:10:59Z

pipeline_dp/private_beam.py

+
+    def expand(self, pcol: pvalue.PCollection):
+        columns = {
+            self.col_name[i]: pcol | "agg " + str(i) >> self._getTransform(


Adding numbers is the right way to avoid duplicate labels!

Nit: f"Aggregation{i}"

A comment on adding numbers to label names: BeamBackend implements this with the UniqueLabelGenerator class. Here the case is simple enough that I think the current approach of adding numbers, rather than using UniqueLabelGenerator, makes sense.
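To make the two labeling options concrete, here is a minimal pure-Python sketch. It does not use Beam, and `LabelCounter` is a hypothetical illustration of the general idea, not the actual UniqueLabelGenerator from BeamBackend; `numbered_labels` mirrors the f"Aggregation{i}" suggestion.

```python
class LabelCounter:
    """Hypothetical helper (not pipeline_dp's UniqueLabelGenerator).

    Returns a label that has not been handed out before by appending
    a numeric suffix when the requested label is already taken.
    """

    def __init__(self):
        self._used = set()

    def unique(self, label: str) -> str:
        candidate, suffix = label, 1
        while candidate in self._used:
            suffix += 1
            candidate = f"{label}_{suffix}"
        self._used.add(candidate)
        return candidate


def numbered_labels(n: int) -> list:
    """The simpler approach from the PR: one label per aggregation index."""
    return [f"Aggregation{i}" for i in range(n)]


if __name__ == "__main__":
    print(numbered_labels(3))
    gen = LabelCounter()
    print([gen.unique("agg") for _ in range(3)])
```

When the labels are produced in a single loop, as in expand, the index-based version is enough; a stateful generator is only needed when labels come from several unrelated call sites.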

dvadym · 2022-03-29T14:29:49Z

pipeline_dp/private_beam.py

+class Aggregate(PrivatePTransform):
+    """Transform class for performing multiple aggregations on a PrivatePCollection."""
+
+    def __init__(self, label=None):


I've thought about this in more detail: in most use cases the aggregations will share the same parameters (and sharing the same parameters will also help to optimize the performance and utility of queries). Could you please:

1. Add an argument params of type AggregateParams.

2. Add an argument partition_extractor_fn.

These arguments will be used in each aggregation.

dvadym · 2022-03-29T14:46:34Z

pipeline_dp/private_beam.py

+            col_name: name of the column for the resulting aggregate value.
+            agg_type: type of pipeline_dp.Metrics identifying the aggregate
+            to calculate."""
+        return _Aggregate([args], col_name=[col_name], agg_type=[agg_type])


It looks like we can have just one class, Aggregate, without _Aggregate. Namely:

aggregate_value saves the information about the aggregation in a member variable and returns self.

expand works as in _Aggregate.

The advantage is that it will be simpler, with no need to create multiple instances of _Aggregate. WDYT?
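A rough sketch of the single-class shape suggested above, in plain Python. Everything here is an assumption made for illustration: `AggregateSketch` and `_AggregationSpec` are hypothetical names, `expand` takes a list of dicts instead of a PCollection, and the aggregation is a plain non-private sum or count with no DP noise.

```python
from typing import Callable, List, NamedTuple


class _AggregationSpec(NamedTuple):
    """One requested aggregation (hypothetical record type)."""
    col_name: str
    agg_type: str
    value_extractor: Callable


class AggregateSketch:
    """Hypothetical stand-in for a single Aggregate class."""

    def __init__(self, label=None):
        self.label = label
        self._specs: List[_AggregationSpec] = []

    def aggregate_value(self, value_extractor, col_name: str, agg_type: str):
        # Record the request in a member variable and return self,
        # so calls can be chained on one instance.
        self._specs.append(
            _AggregationSpec(col_name, agg_type, value_extractor))
        return self

    def expand(self, rows):
        # Non-private placeholders for the real DP aggregations.
        reducers = {"SUM": sum, "COUNT": len}
        result = {}
        for spec in self._specs:
            values = [spec.value_extractor(row) for row in rows]
            result[spec.col_name] = reducers[spec.agg_type](values)
        return result


if __name__ == "__main__":
    rows = [{"price": 2, "qty": 1}, {"price": 3, "qty": 4}]
    agg = (AggregateSketch()
           .aggregate_value(lambda r: r["price"], "total_price", "SUM")
           .aggregate_value(lambda r: r["qty"], "n_rows", "COUNT"))
    print(agg.expand(rows))
```

Because every aggregate_value call appends to the same instance, expand sees all requested aggregations at once, which is what would let them be computed together later.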

dvadym · 2022-03-29T16:22:47Z

pipeline_dp/private_beam.py

+_agg_named_tuple_cache = {}
+
+
+def _get_or_create_named_tuple(type_name: str,


Nice, this is the correct approach for generating dynamic tuples!
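For readers who have not seen the pattern, a minimal sketch of a cached dynamic named tuple follows. The diff above only shows the start of the PR's function, so the signature and body here are assumptions; `get_or_create_named_tuple` and `_named_tuple_cache` are hypothetical names rather than the PR's code.

```python
from collections import namedtuple
from typing import Sequence

# Cache keyed by (type name, field names), so the same combination of
# output columns always maps to the same class object.
_named_tuple_cache = {}


def get_or_create_named_tuple(type_name: str, field_names: Sequence[str]):
    """Returns a namedtuple class, creating it only on first request."""
    key = (type_name, tuple(field_names))
    if key not in _named_tuple_cache:
        _named_tuple_cache[key] = namedtuple(type_name, field_names)
    return _named_tuple_cache[key]


if __name__ == "__main__":
    Row = get_or_create_named_tuple("AggregatesTuple", ("pid", "total"))
    row = Row("partition_1", 3.5)
    print(row.pid, row.total)
```

Calling namedtuple directly for every output element would create a new class each time; the cache avoids that and keeps the type of all rows with the same columns identical.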

dvadym · 2022-03-29T16:29:02Z

pipeline_dp/private_beam.py

+    def __init__(self, label=None):
+        super().__init__(return_anonymized=True, label=label)
+
+    def aggregate_value(self, *args, col_name: str,


Since all shared parameters will be provided in the constructor, only a few parameters are needed here, such as value_extractor.

dvadym · 2022-03-29T16:34:39Z

pipeline_dp/private_beam.py

+            'AggregatesTuple', tuple(["pid"] + [k for k in x[1]]),
+            tuple([x[0]] + [x[1][k][0] for k in x[1]])))
+
+    def _getTransform(self, agg_type: pipeline_dp.Metrics, *args):


I'd suggest using DPEngine.aggregate instead of the PrivatePTransforms Mean, Sum and Count. The benefit is that in the future we can optimize performance and utility by computing multiple aggregations with DPEngine.aggregate.

dvadym · 2022-04-11T16:58:47Z

The Aggregate class can aggregate by only one partition key, but over multiple values.

For each value aggregation we run DPEngine.aggregate, which needs AggregateParams:

class AggregateParams:
    metrics: Iterable[Metrics]
    max_partitions_contributed: int
    max_contributions_per_partition: int
    budget_weight: float = 1
    min_value: float = None
    max_value: float = None
    public_partitions: Any = None
    noise_kind: NoiseKind = NoiseKind.LAPLACE
    custom_combiners: Iterable['CustomCombiner'] = None

Some of these parameters are common to all aggregated values (i.e. they need to be specified in the Aggregate constructor), and some are specific to a value (i.e. they belong in the aggregate_value arguments).

1. def __init__(self, partition_extractor, common_params: CommonAggregateParams, public_partitions=None)
2. def aggregate_value(self, value_extractor, value_params: ValueParams, output_col: str)

class CommonAggregateParams:  # name AggregateParams is already taken, maybe other name?
  max_partitions_contributed: int
  max_contributions_per_partition: int
  noise_kind: NoiseKind = NoiseKind.LAPLACE

class ValueParams:
    budget_weight: float = 1
    min_value: float = None
    max_value: float = None
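One way the split could work in practice is sketched below, under stated assumptions: the two classes are written as dataclasses with the fields proposed above, `NoiseKind` is a local stand-in enum rather than the pipeline_dp one, and `merge_params` is a hypothetical helper that combines the shared and per-value settings into the field set a full AggregateParams would need (public_partitions is left out because the proposal passes it to the constructor).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NoiseKind(Enum):
    """Stand-in for pipeline_dp.NoiseKind (assumption for this sketch)."""
    LAPLACE = "laplace"
    GAUSSIAN = "gaussian"


@dataclass
class CommonAggregateParams:
    """Parameters shared by every value aggregated in one Aggregate."""
    max_partitions_contributed: int
    max_contributions_per_partition: int
    noise_kind: NoiseKind = NoiseKind.LAPLACE


@dataclass
class ValueParams:
    """Parameters specific to one aggregated value."""
    budget_weight: float = 1
    min_value: Optional[float] = None
    max_value: Optional[float] = None


def merge_params(common: CommonAggregateParams, value: ValueParams,
                 metrics) -> dict:
    """Hypothetical helper: builds the per-value parameter set."""
    merged = {"metrics": list(metrics)}
    merged.update(vars(common))  # shared settings from the constructor
    merged.update(vars(value))   # settings from one aggregate_value call
    return merged


if __name__ == "__main__":
    common = CommonAggregateParams(max_partitions_contributed=2,
                                   max_contributions_per_partition=3)
    price = ValueParams(min_value=0, max_value=10)
    print(merge_params(common, price, ["SUM"]))
```

The merged dictionary has one entry per field, so the real implementation could pass it on as keyword arguments when constructing the parameters for each DPEngine.aggregate call.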

Preethi Raghavan and others added 8 commits March 15, 2022 20:48

Initial commit for multiple aggregates

2f951b0

returning named tuple from aggregates

bb8eb7b

tests multiple aggregates

f6bf7be

renaming the cache and tuples

f8d5d06

added summ aggregate

7fbc14e

comments and argument type

4e3f4e2

Merge branch 'main' into agg

f21c1d6

Merging from upstream.

fix lint changes

0ab8ac3

preethiraghavan1 requested a review from dvadym March 25, 2022 22:00

dvadym reviewed Mar 29, 2022
