Basics
This section covers some fundamental concepts of deep learning!
Basic Concepts
Where deep learning fits
Logistic regression (LR) cannot extract features on its own, especially from raw inputs like pixels, so deep learning adds representation layers to do the feature extraction.
Instead of hand-engineering features (e.g., a detector for a cat's eyes), we can let a neural network automatically learn useful representations of the data, as in the sketch below.
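As a minimal sketch of this contrast (assuming TensorFlow/Keras and 28x28 grayscale images, both made-up choices for illustration):

```python
import tensorflow as tf

# Logistic regression: one dense layer straight from raw pixels to classes;
# there is nowhere for learned features to live.
logreg = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# A small neural network: the hidden layer is a representation layer that
# learns useful features of the pixels automatically.
mlp = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),   # learned representation
    tf.keras.layers.Dense(10, activation="softmax"),
])
```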
Model Training
Epochs: one epoch means one full pass over all the examples in the training set.
The longer you train your model, the more tightly it will fit the training data (and the more likely it is to overfit, hurting performance on the test data).
To find the right number of epochs, monitor the loss on the validation data while training, and stop when the validation loss begins increasing (rule of thumb; see Early Stopping below).
Batches: each training step uses only one batch. There is a speed-accuracy trade-off: the larger the batch, the more accurate each gradient update, but the slower it is to compute.
Batch Gradient Descent (full data): there is only one batch (the batch size is the length of the training set).
Mini-batch Gradient Descent: nowadays often mistakenly referred to as Batch Gradient Descent.
The leftover examples at the end can either be dropped or kept as a smaller final batch; in the example above, where the data splits into four batches, **one epoch performs four updates**.
Stochastic Gradient Descent: batch size = 1. See the counting sketch below.
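To make the counting concrete, a small sketch (the dataset size of 100 and the batch size of 25 are made-up numbers):

```python
import math

n_examples = 100   # hypothetical training-set size
batch_size = 25    # hypothetical mini-batch size

# Keep the leftover examples as a smaller final batch:
updates_per_epoch = math.ceil(n_examples / batch_size)  # 4 updates per epoch

# The two extremes:
batch_gd = math.ceil(n_examples / n_examples)  # batch GD: 1 update per epoch
sgd = math.ceil(n_examples / 1)                # stochastic: 100 updates per epoch

print(updates_per_epoch, batch_gd, sgd)  # 4 1 100
```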
Batch size:The number of examples used per gradient update.
A batch size of 1 is stochastic gradient descent. Many updates per epoch, but each is inaccurate.
A batch size of len(training_set) is batch gradient descent. One accurate update per epoch, but slow to compute.
Intermediate sizes (the default in Keras is 32) are called mini-batch gradient descent.
In general, you will not need to change this parameter unless your examples are very small (e.g., rows of structured data), in which case you can use a larger batch size; see the sketch below.
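A minimal end-to-end sketch of these two knobs in Keras (the toy data shapes and all hyperparameter values here are made up for illustration):

```python
import numpy as np
import tensorflow as tf

# Toy stand-ins for a real dataset.
x_train = np.random.rand(100, 20).astype("float32")
y_train = np.random.randint(0, 2, size=(100,))
x_val = np.random.rand(20, 20).astype("float32")
y_val = np.random.randint(0, 2, size=(20,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# batch_size=32 is mini-batch gradient descent (Keras's default);
# with 100 examples that is ceil(100 / 32) = 4 updates per epoch.
model.fit(x_train, y_train, batch_size=32, epochs=5,
          validation_data=(x_val, y_val))
```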
Early Stopping
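Keras can automate the rule of thumb from the Model Training section (stop once the validation loss starts increasing) via its built-in EarlyStopping callback. A minimal sketch reusing the toy model and data from the previous snippet (the patience value is a made-up choice):

```python
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",          # watch the loss on the validation data
    patience=3,                  # allow 3 epochs without improvement first
    restore_best_weights=True,   # roll back to the best epoch's weights
)

model.fit(x_train, y_train, batch_size=32, epochs=100,
          validation_data=(x_val, y_val), callbacks=[early_stop])
```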
Dropout
Definition: Dropout is a regularization technique that deals with overfitting and improves generalization.
What dropping means: a dropped node is not used in that round of training, and none of its connections are updated.
How it is applied: each hidden layer is assigned a dropout probability, and units are dropped at random according to it.
Prevents co-adaptation of activation units
Each hidden unit learns its features independently: the features each neuron learns are robust, not dependent on the presence or absence of particular units in the previous layers!
This way, the hidden units end up learning different features.
Probabilistically drop input features or activation units in hidden layers
Layer-dependent dropout probability (~0.2 for input, ~0.5 for hidden layers)
Why the input rate is small: you do not want a high dropout value on the input layer, because most of the input features would be gone during training, and we want to keep them! A small amount of input dropout is still acceptable.
Handling dropout at test time: no units are dropped, so every layer is fully present; instead, activations (or outgoing weights) are scaled by the keep probability so their expected values match training. Modern frameworks usually use the equivalent inverted dropout, which scales during training and leaves test time untouched. See the sketch below.
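A minimal NumPy sketch of the scheme above (the layer size, rate, and values are made up; frameworks such as Keras's Dropout layer do this for you, usually via the inverted variant):

```python
import numpy as np

rng = np.random.default_rng(0)
p_drop = 0.5                    # dropout probability for a hidden layer
h = rng.standard_normal(8)      # toy activations of one hidden layer

# Training: randomly zero out units; a dropped unit's connections
# receive no updates in this round.
mask = rng.random(h.shape) >= p_drop
h_train = h * mask

# Test: keep every unit, but scale by the keep probability (1 - p_drop)
# so expected activations match what the next layer saw during training.
h_test = h * (1.0 - p_drop)

# Inverted dropout (the modern default): scale up at training time instead,
# so test time needs no change at all.
h_train_inv = h * mask / (1.0 - p_drop)
h_test_inv = h
```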