Paper notes: Stochastic gradient descent with differentially private updates


A few questions to note

  • The sample size required for a target utility level increases with the privacy constraint.
  • Optimization methods for large data sets must also be scalable.
  • SGD algorithms satisfy asymptotic guarantees


Introduction

  • Summary of the main contribution:
    \quad In this paper we derive differentially private versions of single-point SGD and mini-batch SGD, and evaluate them on real and synthetic data sets.

  • Why SGD is widely used:
    \quad Stochastic gradient descent (SGD) algorithms are simple and satisfy the same asymptotic guarantees as more computationally intensive learning methods.

  • The consequence of relying only on asymptotic guarantees:
    \quad to obtain reasonable performance on finite data sets practitioners must take care in setting parameters such as the learning rate (step size) for the updates.

  • Remedy for the above:
    Grouping updates into "mini-batches" alleviates some of this sensitivity and improves the performance of SGD. This improves the robustness of the updates at a moderate additional computational cost, but also introduces the batch size as a free parameter.


Preliminaries

  • Optimization objective:
    Solve a regularized convex optimization problem:
    $$w^* = \operatorname*{argmin}_{w \in \mathbb{R}^d} \; \frac{\lambda}{2} \Vert w \Vert^2 + \frac{1}{n} \sum_{i=1}^n \ell(w, x_i, y_i)$$
    where $w$ is the normal vector to the hyperplane separator and $\ell$ is a convex loss function.
    If $\ell$ is the logistic loss, $\ell(w,x,y) = \log(1 + e^{-y w^T x})$, this yields logistic regression.
    If $\ell$ is the hinge loss, $\ell(w,x,y) = \max(0,\, 1 - y w^T x)$, this yields an SVM.
    (A minimal code sketch of this objective follows below.)
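For concreteness, here is a minimal NumPy sketch of this objective with the logistic loss; the function names are my own, not from the paper.

```python
# Minimal NumPy sketch (illustrative names, not from the paper): the regularized
# objective (lambda/2)*||w||^2 + (1/n) * sum_i l(w, x_i, y_i) with the logistic loss.
import numpy as np

def logistic_loss(w, x, y):
    # l(w, x, y) = log(1 + exp(-y * w^T x))
    return np.log1p(np.exp(-y * w.dot(x)))

def regularized_objective(w, X, Y, lam):
    n = X.shape[0]
    data_term = np.mean([logistic_loss(w, X[i], Y[i]) for i in range(n)])
    return 0.5 * lam * w.dot(w) + data_term
```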

  • Optimization algorithm:
    SGD with mini-batch updates:
    $$w_{t+1} = w_t - \eta_t \Big( \lambda w_t + \frac{1}{b} \sum_{(x_i,y_i) \in B_t} \nabla \ell(w_t, x_i, y_i) \Big)$$
    where $\eta_t$ is the learning rate and the update at step $t$ is based on a small subset $B_t$ of examples of size $b$. (A one-step code sketch follows below.)
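A minimal NumPy sketch of a single such update step with the logistic loss; the helper names are my own, not from the paper.

```python
# Minimal NumPy sketch of one mini-batch SGD step with the logistic loss;
# logistic_grad and minibatch_sgd_step are illustrative names, not from the paper.
import numpy as np

def logistic_grad(w, x, y):
    # gradient of log(1 + exp(-y * w^T x)) with respect to w
    return -y * x / (1.0 + np.exp(y * w.dot(x)))

def minibatch_sgd_step(w, batch_X, batch_Y, lam, eta):
    # w_{t+1} = w_t - eta * (lambda * w_t + (1/b) * sum over the batch of grad l)
    b = batch_X.shape[0]
    grad_sum = sum(logistic_grad(w, batch_X[i], batch_Y[i]) for i in range(b))
    return w - eta * (lam * w + grad_sum / b)
```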



SGD with Differential Privacy

  • Differentially private mini-batch SGD:
    A differentially private version of the mini-batch update:
    $$w_{t+1} = w_t - \eta_t \Big( \lambda w_t + \frac{1}{b} \sum_{(x_i,y_i) \in B_t} \nabla \ell(w_t, x_i, y_i) + \frac{1}{b} Z_t \Big)$$
    where $Z_t$ is a random noise vector in $\mathbb{R}^d$ drawn independently from the density $\rho(z) \propto e^{-(\alpha/2) \Vert z \Vert}$. (A sampling and update sketch follows below.)
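A minimal NumPy sketch of this noisy update, assuming the logistic loss. The noise $Z_t$ is sampled by combining a uniformly random direction with a Gamma-distributed norm, which matches the density $\rho(z) \propto e^{-(\alpha/2)\Vert z \Vert}$; the function names are my own, not from the paper.

```python
# Minimal NumPy sketch of the noisy update (illustrative names, not from the paper).
# Z_t is drawn from rho(z) ~ exp(-(alpha/2)*||z||) by picking a uniform direction
# and a norm r ~ Gamma(shape=d, scale=2/alpha), since the radial density of rho
# is proportional to r^(d-1) * exp(-(alpha/2) * r).
import numpy as np

def logistic_grad(w, x, y):
    # same helper as in the previous sketch
    return -y * x / (1.0 + np.exp(y * w.dot(x)))

def sample_noise(d, alpha, rng):
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    r = rng.gamma(shape=d, scale=2.0 / alpha)
    return r * direction

def private_minibatch_sgd_step(w, batch_X, batch_Y, lam, eta, alpha, rng):
    # w_{t+1} = w_t - eta * (lambda*w_t + (1/b)*sum grad l + (1/b)*Z_t)
    b = batch_X.shape[0]
    grad_sum = sum(logistic_grad(w, batch_X[i], batch_Y[i]) for i in range(b))
    Z = sample_noise(w.shape[0], alpha, rng)
    return w - eta * (lam * w + (grad_sum + Z) / b)
```

Note that $Z_t$ is scaled by $1/b$, so for a fixed $\alpha$ a larger batch injects proportionally less noise per update, which is what drives the batch-size observations in the experiments below.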

  • Conditions under which this mini-batch update is $\alpha$-differentially private:
    Theorem. If the initialization point $w_0$ is chosen independently of the sensitive data, the batches $B_t$ are disjoint, and $\Vert \nabla \ell(w,x,y) \Vert \leq 1$ for all $w$ and all $(x_i, y_i)$, then SGD with these mini-batch updates is $\alpha$-differentially private.
    (A preprocessing sketch that enforces the gradient bound for the logistic loss follows below.)
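One simple way to meet the gradient-norm condition for the logistic loss is to rescale every example into the unit ball, since $\Vert \nabla \ell(w,x,y) \Vert \le \Vert x \Vert$. The sketch below is my own illustration of that preprocessing step, not a step prescribed by the paper.

```python
# A sketch of one way to satisfy the gradient bound for the logistic loss:
# since ||grad l(w, x, y)|| = ||x|| / (1 + exp(y * w^T x)) <= ||x||, rescaling
# every example into the unit ball is sufficient. My own illustration, not a
# step prescribed by the paper.
import numpy as np

def scale_to_unit_ball(X):
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, 1.0)
```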



Experiments

  • Experimental observations:
    With batch size 1, DP-SGD has much higher variance than ordinary SGD, but increasing the batch size reduces the variance considerably.

  • Lesson drawn from this:
    \quad In terms of objective value, guaranteeing differential privacy can come for “free” using SGD with moderate batch size.

  • In fact the effect of batch size is non-monotonic: performance first improves, then degrades.
    Increasing the batch size improved the performance of private SGD, but only up to a point: much larger batch sizes actually degrade performance. (A toy batch-size sweep sketch follows below.)
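To see this effect qualitatively, one could sweep the batch size with the sketches above on synthetic data; the harness below is purely illustrative and does not reproduce the paper's experimental setup.

```python
# A toy batch-size sweep on synthetic data, assuming the sketches above
# (scale_to_unit_ball, private_minibatch_sgd_step, regularized_objective)
# are in scope; the data, step-size schedule, and batch sizes are all
# illustrative and not the paper's experimental setup.
import numpy as np

rng = np.random.default_rng(0)
n, d, lam, alpha = 5000, 10, 1e-3, 1.0
X = scale_to_unit_ball(rng.normal(size=(n, d)))
w_true = rng.normal(size=d)
Y = np.where(X.dot(w_true) + 0.1 * rng.normal(size=n) > 0, 1.0, -1.0)

for b in [1, 10, 50, 200]:
    w = np.zeros(d)
    # one pass over the data in disjoint batches, as the theorem requires
    for t, start in enumerate(range(0, n - b + 1, b)):
        eta = 1.0 / np.sqrt(t + 1)  # simple decaying step size (illustrative)
        w = private_minibatch_sgd_step(w, X[start:start + b], Y[start:start + b],
                                       lam, eta, alpha, rng)
    print(b, regularized_objective(w, X, Y, lam))
```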


A few additional takeaways

  • The data dimension $d$ and the privacy parameter affect how much data is needed:
    Differentially private learning algorithms often have a sample complexity that scales linearly with the data dimension $d$ and inversely with the privacy risk $\alpha$. Thus a moderate reduction in $\alpha$ or increase in $d$ may require more data.