Recurrent Neural Network Regularization

arXiv

Dropout

  1. The main idea is to apply the dropout operator only to the non-recurrent connections: $$\left(\begin{array}{c} i \\ f \\ o \\ g \end{array}\right) = \left(\begin{array}{c} \text{sigmoid} \\ \text{sigmoid} \\ \text{sigmoid} \\ \tanh \end{array}\right) T_{2n,4n} \left(\begin{array}{c} \text{Dropout}(h_t^{l-1}) \\ h_{t-1}^{l} \end{array}\right)$$

Dropout is applied only to the non-recurrent connections, where $l$ indexes the hidden layer and $t$ the time step; for the first layer, dropout is applied to the input. See the sketch after this list.

  1. The optimal dropout probability:
    1. Machine translation: 0.2.
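
A minimal PyTorch sketch of this scheme (the class and argument names are illustrative, not from the paper): dropout is applied to $\text{Dropout}(h_t^{l-1})$, the input flowing up from the layer below, while the recurrent state $h_{t-1}^l$ enters the LSTM cell untouched.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DropoutStackedLSTM(nn.Module):
    """Stacked LSTM with dropout on the non-recurrent (layer-to-layer)
    connections only; the recurrent state h_{t-1}^l is never dropped."""

    def __init__(self, input_size, hidden_size, num_layers, dropout=0.2):
        super().__init__()
        self.dropout = dropout
        self.cells = nn.ModuleList(
            nn.LSTMCell(input_size if l == 0 else hidden_size, hidden_size)
            for l in range(num_layers)
        )

    def forward(self, x):
        # x: (seq_len, batch, input_size)
        batch = x.size(1)
        hidden = [
            (x.new_zeros(batch, cell.hidden_size),
             x.new_zeros(batch, cell.hidden_size))
            for cell in self.cells
        ]
        outputs = []
        for t in range(x.size(0)):
            inp = x[t]
            for l, cell in enumerate(self.cells):
                # Dropout(h_t^{l-1}): drop only the input coming from the
                # layer below (or the raw input when l == 0).
                inp = F.dropout(inp, p=self.dropout, training=self.training)
                h, c = cell(inp, hidden[l])  # h_{t-1}^l is passed unchanged
                hidden[l] = (h, c)
                inp = h
            outputs.append(inp)
        return torch.stack(outputs)  # (seq_len, batch, hidden_size)
```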