Backpropagation
- multiply the upstream (global) gradient by each node's local gradient to obtain all gradients via the chain rule
- patterns in gradient flow (a scalar sketch follows this list)
- add gate
- gradient distributor
- passes the upstream gradient unchanged to both inputs
- max gate
- gradient router
- routes the upstream gradient to the input with the larger value; the other input receives zero
- mul gate
- gradient switcher
- each input receives the upstream gradient multiplied by the other input's value (the inputs' values are swapped)
- branches
- when a value fans out to multiple branches, the gradients flowing back along those branches add
- \[{\partial f \over \partial x} = \sum_i {\partial f \over \partial q_i} {\partial q_i \over \partial x}\]
- vectorized operations
- Q. for a 4096-d input vector passed through an elementwise operation with a 4096-d output, what is the size of the Jacobian matrix?
- A. 4096 x 4096
- entire minibatch (e.g. 100) of examples at one time: (409,600 x 409,600) matrix
- in practice, because the inputs and outputs are vectors, the local gradient is a Jacobian matrix; but it never has to be formed explicitly, since for elementwise operations it is diagonal
- vectorized example (a NumPy sketch follows this list)
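A minimal scalar sketch of these patterns, on a toy expression I made up (f = (x + y) * max(y, z)), showing how each gate multiplies the upstream gradient by its local gradient:

```python
# Toy computation: f = (x + y) * max(y, z)
x, y, z = 3.0, -1.0, 2.0

# forward pass
a = x + y          # add gate
b = max(y, z)      # max gate
f = a * b          # mul gate

# backward pass: start with df/df = 1.0 (the upstream gradient)
df = 1.0

# mul gate: swap the inputs' values and scale by the upstream gradient
da = b * df        # df/da = b
db = a * df        # df/db = a

# max gate: route the gradient to the larger input, zero to the other
dy_from_max = db if y > z else 0.0
dz = db if z > y else 0.0

# add gate: distribute the upstream gradient unchanged to both inputs
dx = da
dy_from_add = da

# branches: y feeds both the add gate and the max gate, so its gradients add
dy = dy_from_add + dy_from_max

print(dx, dy, dz)  # df/dx = 2.0, df/dy = 2.0, df/dz = 2.0
```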
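For the vectorized case, here is a small NumPy sketch, assuming the lecture's running example f(x, W) = ||W·x||² with intermediate q = W·x; the dimensions are arbitrary choices of mine, and the backward pass never forms the full Jacobian explicitly:

```python
import numpy as np

np.random.seed(0)
W = np.random.randn(5, 10)   # illustrative dimensions, not from the lecture
x = np.random.randn(10)

# forward pass: f(x, W) = ||W x||^2
q = W.dot(x)                 # intermediate vector, shape (5,)
f = np.sum(q ** 2)           # scalar output

# backward pass, without forming the Jacobian explicitly
dq = 2.0 * q                 # df/dq
dW = np.outer(dq, x)         # df/dW = dq x^T, same shape as W
dx = W.T.dot(dq)             # df/dx = W^T dq, same shape as x

# quick numerical check of one component of df/dx
h = 1e-6
i = 3
x_h = x.copy(); x_h[i] += h
num = (np.sum(W.dot(x_h) ** 2) - f) / h
print(dx[i], num)            # the two values should agree closely
```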
Neural Networks
- without the brain stuff
- before
- Linear score function \(f = Wx\)
- now
- 2-layer Neural Network \(f = W_2 \max(0, W_1 x)\)
- 3-layer Neural Network \(f = W_3 \max(0, W_2 \max(0, W_1 x))\) (see the sketch after this list)
- with the brain stuff
- differences from real biological neurons (be careful with the brain analogy)
- many different types
- dendrites can perform complex non-linear computations
- synapses are not a single weight but a complex non-linear dynamical system
- rate code may not be adequate
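A minimal NumPy sketch of the 2-layer forward pass \(f = W_2 \max(0, W_1 x)\); the layer sizes and the small random initialization are placeholders I chose, not values from the course:

```python
import numpy as np

np.random.seed(0)
D, H, C = 3072, 100, 10          # input dim, hidden size, number of classes (illustrative)
W1 = 0.01 * np.random.randn(H, D)
W2 = 0.01 * np.random.randn(C, H)

x = np.random.randn(D)           # a single input example

# 2-layer network: a ReLU non-linearity between the two linear layers
h = np.maximum(0, W1.dot(x))     # hidden activations, shape (H,)
scores = W2.dot(h)               # class scores, shape (C,)
print(scores.shape)              # (10,)

# a 3-layer network would stack one more linear layer after another ReLU
```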
I wrote this post after taking CS231n (Spring 2017), offered by Stanford University.
If you have questions, feel free to leave a reply on this post.