Bounded sum with unknown n #7

silviacasac · 2021-07-21T11:00:18Z

First pull request for the Bounded sum with unknown n transformation, which corresponds to this GitHub permalink.

Files:

BoundedSumKnownN.tex -- the LaTeX document with the proof.
BoundedSumKnownN.pdf -- the PDF generated from the TeX document.

This proof document makes use of the updated versions of the "List of definitions used in the proofs" and "List of definitions used in the pseudocode" documents, both of which will be re-uploaded to their respective Git branches later today (July 21).

bounded_sum_unknown_n/BoundedSumUnknownN.tex

bounded_sum_unknown_n/thing

Shoeboxam · 2021-08-04T17:15:42Z

bounded_sum_unknown_n/BoundedSumUnknownN.tex

+\[
+   = \Big|\sum_i u_i \cdot h_v(u_i) - \sum_i u_i \cdot h_w(u_i)\Big| = \Big|\sum_i u_i \cdot (h_v(u_i) - h_w(u_i))\Big|.
+\]
+Note that we have the inequality $|\function(v) - \function(w)| \leq |\sum_i v_i - \sum_i w_i|$ above (instead of an equality) due to the definition of \texttt{saturating\_add}. The equality case holds whenever $\sum_i v_i \in \texttt{[get\_min\_value(T), get\_max\_value(T)]}$ and $\sum_i w_i \in \texttt{[get\_min\_value(T),}$  $\texttt{get\_max\_value(T)]}$. In any of the possible cases where $\sum_i v_i > \texttt{get\_max\_value(T)}$ or $\sum_i v_i < \texttt{get\_min\_value(T)}$ and $\sum_i w_i > \texttt{get\_max\_value(T)}$ or $\sum_i w_i < \texttt{get\_min\_value(T)}$, the difference $|\sum_i v_i - \sum_i w_i|$ will always upper bound the value $|\function(v) - \function(w)|$, and hence it is sufficient to carry our proof by only considering the quantity $|\sum_i v_i - \sum_i w_i|$.


My internal thinking when I chose the saturating add was that the saturating add is essentially computing clamp(sum + v, T::MIN, T::MAX) in a loop, which is equivalent to sum + clamp(v, T::MIN + sum, T::MAX - sum). In this calculation the magnitude of v can only be reduced, so the global sensitivity is unaffected.

Since the clamping bounds are dependent on the previous computation, the unknown n sum is sensitive to ordering. Perhaps you could attack it by issuing multiple queries with different sort directions or shuffling, and measure the (potentially very large) differences between the queries.

We might need a different input metric, one that cares about order like hamming but edits the dataset with additions/removals? The float sum in the known n sum and mean are also sensitive to ordering.

silviacasac · 2021-08-05T21:32:20Z

Remaining things to check:

Whether it is OK that the castings to not appear in the proof.
(Maybe) Use absolute scalar clamping to make the proof more concise?
Important. Floating point issues both in the addition of the elements. We are working on it.

SalilVadhan · 2021-08-06T17:55:11Z

bounded_sum_unknown_n/BoundedSumUnknownN.tex

+
+\begin{proof}
+    \textbf{(Appropriate output domain).} In the case of \texttt{MakeBoundedSum}, this corresponds to showing that for every vector $v$ in \texttt{VectorDomain(IntervalDomain(L, U))}, where \texttt{L} and \texttt{U} have type \texttt{T}, the element \texttt{function(v)} belongs to \texttt{AllDomain(T)}.
+    The type signature of \texttt{function} as defined in line \ref{line:fn} automatically enforces that \texttt{function(v)} has type \texttt{T}. Since the Rust code successfully compiles, by the type signature the appropriate output domain property must hold. Otherwise, the code will raise an exception for incorrect input type. It is also necessary to check that \texttt{function(v)} is contained within the interval \texttt{[get\_min\_value(T), get\_max\_value(T)]}. This is enforced by the use of the function \texttt{saturating\_add} in line~\ref{line:saturate}, as described in \href{https://www.overleaf.com/project/60d215bf90b337ac02200a99}{``List of definitions used in the pseudocode"}.


Why is it necessary to check that function(v) is in the interval? Doesn't that hold by the definition of the type T? Or is it a separate requirement of AllDomain(T)?

SalilVadhan · 2021-08-06T17:57:29Z

bounded_sum_unknown_n/BoundedSumUnknownN.tex

+    If the sum of all the vector elements in \texttt{data} is greater than \texttt{get\_max\_value(T)}, then \texttt{result} will be equal to \texttt{get\_max\_value(T)}. If the sum of all the vector elements in \texttt{data} is less than \texttt{get\_min\_value(T)}, then \texttt{result} will be equal to \texttt{get\_min\_value(T)}. Otherwise, \texttt{result} will be equal to the sum of all the vector elements in \texttt{data}, and it will be contained within the interval \texttt{[get\_min\_value(T), get\_max\_value(T)]}. Therefore, \texttt{function(v)} is guaranteed to be in \texttt{output\_domain} in all cases.
+
+    \smallskip
+    \textbf{(Domain-metric compatibility).} For \texttt{MakeBoundedSum}, this corresponds to showing that \texttt{VectorDomain(IntervalDomain (L, U))} is compatible with symmetric distance, and that \texttt{AllDomain(T)} is compatible with absolute distance. Both follow directly from the definition of symmetric distance and absolute distance, as stated in \href{https://www.overleaf.com/project/60d215bf90b337ac02200a99}{``List of definitions used in the pseudocode"}, along with the \textit{appropriate output domain property} shown above, which ensures that \texttt{output\_domain} is indeed \texttt{AllDomain(T)}.


absolute distance is not a metric itself; the metric is AbsoluteDistance(T). Presumably the latter is not defined for all types T, but requires some traits for T - what traits are those, and are they satisfied by the preconditions?

SalilVadhan · 2021-08-06T18:00:54Z

bounded_sum_unknown_n/BoundedSumUnknownN.tex

+    \smallskip
+    \textbf{(Stability guarantee).} Throughout the stability guarantee proof, we can assume that $\function(v)$ and $\function(w)$ are in the correct output domain, by the appropriate output domain property shown above. 
+
+    Since by assumption $\Relation(\din, \dout) = \True$, by the \texttt{MakeBoundedSum} stability relation (as defined in line~\ref{line:rel} in the pseudocode), we have that $\dout \geq \din \cdot \max{(|\texttt{U}|, |\texttt{L}|)}$. Moreover, $v, w$ are assumed to be $\din$-close. By the definition of the symmetric difference metric, this is equivalent to stating that $d_{Sym}(v, w) = |\MultiSet(v) \Delta \MultiSet(w)| \leq \din$.


Don't leave out the inf_cast here! Also, what about any rounding or overflow that might happen in the multiplication?

SalilVadhan · 2021-08-06T18:02:20Z

bounded_sum_unknown_n/BoundedSumUnknownN.tex

+    \]
+    as we want to see. 
+
+Let $u$ denote the vector formed by all the elements of $v$ and $w$ \textit{without multiplicities} (i.e., $u$ contains exactly once each of the elements in $\MultiSet(v) \cup \MultiSet(w)$, in any order). Let $u_i$ denote the $i$-th element of $u$, and similarly for $v$ and $w$, and let $m$ denote the length of $u$.  


Rather than defining the vector u, can't we instead sum over all elements of type T?

SalilVadhan · 2021-08-06T18:05:02Z

bounded_sum_unknown_n/BoundedSumUnknownN.tex

+\[
+   = \Big|\sum_i u_i \cdot h_v(u_i) - \sum_i u_i \cdot h_w(u_i)\Big| = \Big|\sum_i u_i \cdot (h_v(u_i) - h_w(u_i))\Big|.
+\]
+We remark that we have the inequality $|\function(v) - \function(w)| \leq |\sum_i v_i - \sum_i w_i|$ above (instead of an equality) due to the definition of \texttt{saturating\_add}. The equality case holds whenever $\sum_i v_i \in \texttt{[get\_min\_value(T), get\_max\_value(T)]}$ and $\sum_i w_i \in \texttt{[get\_min\_value(T),}$  $\texttt{get\_max\_value(T)]}$. In any of the possible cases where $\sum_i v_i > \texttt{get\_max\_value(T)}$ or $\sum_i v_i < \texttt{get\_min\_value(T)}$ and $\sum_i w_i > \texttt{get\_max\_value(T)}$ or $\sum_i w_i < \texttt{get\_min\_value(T)}$, the difference $|\sum_i v_i - \sum_i w_i|$ will always upper bound the value $|\function(v) - \function(w)|$, and hence it is sufficient to carry our proof by only considering the quantity $|\sum_i v_i - \sum_i w_i|$.


It seems that the saturation happens at every step of addition, not only at the end after computing the entire sum. Consider what hapens when adding a bunch of 1's (say max_value+5) and then adding bunch of -1's (equal to max_value-min_value). Applying saturated_add at every step will result in min_value, but applying it only at the end will give min_value+5.

similarly, if rounding is necessary (e.g. for floating point numbers), every step will be rounded, not just at the end.

SalilVadhan · 2021-08-06T18:06:49Z

bounded_sum_unknown_n/BoundedSumUnknownN.tex

+as we wanted to show.
+\end{proof}
+
+\silvia{Flag: need to account for rounding errors in the stability relation given the non-closure of float addition, as discussed on the week of August 2nd. We are figuring this out.}


need to state the assumptions you need on the type T to avoid this issue so that we can abstract them as a trait

SalilVadhan · 2021-08-06T18:44:25Z

bounded_sum_unknown_n/BoundedSumUnknownN.tex

+\[
+    d_{Abs}(\function(v), \function(w)) \leq \Big|\sum_i u_i \cdot (h_v(u_i) - h_w(u_i))\Big| \leq |u_i| \cdot \sum_i |h_v(u_i) - h_w(u_i)|.
+\]
+By the appropriate output domain property $u_i \in \texttt{[L, U]} \, \forall i$ it follows that $|u_i| \leq \max{(|\texttt{U}|, |\texttt{L}|)}$ for all $i$. Hence,


This is not by the appropriate output domain property; it is by the hypothesis that v,w are in the input_domain.

SalilVadhan · 2021-08-06T18:45:35Z

bounded_sum_unknown_n/BoundedSumUnknownN.tex

+    d_{Sym}(v, w) = \sum_z \Big|h_v(z) - h_w(z)\Big| = \sum_i \Big|h_v(u_i) - h_w(u_i)\Big|;
+\]
+\[
+    d_{Abs}(\function(v), \function(w)) = |\function(v) - \function(w)| \leq \Big|\sum_i v_i - \sum_i w_i\Big| = 


spurious equal sign at the end. suggest using eqnarray* to format your sequences of equations/inequalities more nicely.

SalilVadhan · 2021-08-06T18:45:53Z

bounded_sum_unknown_n/BoundedSumUnknownN.tex

+\]
+By the appropriate output domain property $u_i \in \texttt{[L, U]} \, \forall i$ it follows that $|u_i| \leq \max{(|\texttt{U}|, |\texttt{L}|)}$ for all $i$. Hence,
+\[
+    d_{Abs}(\function(v), \function(w)) \leq |u_i| \cdot \sum_i |h_v(u_i) - h_w(u_i)| \leq 


spurious \leq at the end of the line

silviacasac added 2 commits July 21, 2021 12:49

Create thing

086be23

Add files via upload

ac3e001

silviacasac requested review from pdurbin, raprasad, ecowan, Shoeboxam, andrewvyrros, SalilVadhan, gracetian6 and cwagaman July 21, 2021 11:00

silviacasac self-assigned this Jul 21, 2021

silviacasac added 2 commits July 27, 2021 00:42

Add files via upload

81684e8

typos

da189ee

Shoeboxam reviewed Aug 4, 2021

View reviewed changes

bounded_sum_unknown_n/BoundedSumUnknownN.tex Outdated Show resolved Hide resolved

Shoeboxam reviewed Aug 4, 2021

View reviewed changes

bounded_sum_unknown_n/thing Outdated Show resolved Hide resolved

Shoeboxam reviewed Aug 4, 2021

View reviewed changes

silviacasac added 3 commits August 5, 2021 13:21

Mike's corrections

8ea39bc

Delete thing

e5ba347

Corrections

25ae70a

silviacasac added 2 commits August 6, 2021 12:50

Add files via upload

59568e8

relation

c979263

SalilVadhan reviewed Aug 6, 2021

View reviewed changes

Shoeboxam mentioned this pull request Sep 23, 2021

Bounded Sum Transformation (Unknown N) -- Proof opendp/opendp#329

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bounded sum with unknown n #7

Bounded sum with unknown n #7

silviacasac commented Jul 21, 2021

Shoeboxam Aug 4, 2021 •

edited

Loading

Shoeboxam Aug 6, 2021 •

edited

Loading

silviacasac commented Aug 5, 2021 •

edited

Loading

SalilVadhan Aug 6, 2021

SalilVadhan Aug 6, 2021

SalilVadhan Aug 6, 2021

SalilVadhan Aug 6, 2021

SalilVadhan Aug 6, 2021

SalilVadhan Aug 6, 2021

SalilVadhan Aug 6, 2021

SalilVadhan Aug 6, 2021

SalilVadhan Aug 6, 2021

SalilVadhan Aug 6, 2021

Bounded sum with unknown n #7

Are you sure you want to change the base?

Bounded sum with unknown n #7

Conversation

silviacasac commented Jul 21, 2021

Shoeboxam Aug 4, 2021 • edited Loading

Choose a reason for hiding this comment

Shoeboxam Aug 6, 2021 • edited Loading

Choose a reason for hiding this comment

silviacasac commented Aug 5, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Shoeboxam Aug 4, 2021 •

edited

Loading

Shoeboxam Aug 6, 2021 •

edited

Loading

silviacasac commented Aug 5, 2021 •

edited

Loading