Forward propagation & Back propagation

Forward propagation

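Mild version

Assume there is only one training sample.

At each layer

1. we have to compute w^T x + b, so

length of w = size of the input

length of b = size of the output (one bias per output unit)

number of w^T rows stacked vertically = size of the output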
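The shape rules for a single sample can be checked with a minimal NumPy sketch. The sizes `n_x = 3`, `n_out = 2` and all numeric values are made up for illustration; they are not from the original notes.

```python
import numpy as np

n_x, n_out = 3, 2                        # illustrative sizes (assumption)

x = np.array([[1.0], [2.0], [3.0]])      # input, shape (n_x, 1)
w1 = np.array([[0.1], [0.2], [0.3]])     # weights of unit 1, shape (n_x, 1)
w2 = np.array([[-0.1], [0.0], [0.1]])    # weights of unit 2, shape (n_x, 1)
b = np.array([[0.5], [-0.5]])            # one bias per output unit, shape (n_out, 1)

# Stack the transposed weight vectors vertically: one row per output unit.
W = np.vstack([w1.T, w2.T])              # shape (n_out, n_x)

z = W @ x + b                            # shape (n_out, 1)
print(W.shape, z.shape)
```

Each row of `W` is one unit's `w^T`, so the number of rows equals the output size and the number of columns equals the input size.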

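Spicy version

Assume there are m training samples.

Moving right $\rightarrow$ the sample index increases

Moving down $\rightarrow$ the unit (node) index increases

The figure shows the network for a single sample, but

imagine the nodes continuing back into the screen, hidden behind the front ones; the deeper a node sits, the higher its sample index.

Important

X (= A^[0]), Z, and A are all easy to picture as chips plugged in with column 0 facing outward.

Thinking of (row 1 of W) x + (row 1 of b) = (row 1 of z) makes the shape of W easy as well.

The weights for each output node form a (1, n_x) row vector!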
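The column-per-sample and row-per-unit layout can be sketched in NumPy as follows. The sizes and the random values are assumptions for illustration only.

```python
import numpy as np

n_x, n_out, m = 3, 2, 4                  # illustrative sizes (assumption)
rng = np.random.default_rng(0)

X = rng.standard_normal((n_x, m))        # column j is sample j
W = rng.standard_normal((n_out, n_x))    # row i holds the weights of unit i
b = rng.standard_normal((n_out, 1))      # broadcast across the m columns

Z = W @ X + b                            # shape (n_out, m)

# Column j of Z equals the single-sample result for sample j.
j = 2
z_j = W @ X[:, [j]] + b                  # shape (n_out, 1)
assert np.allclose(Z[:, [j]], z_j)

# Row i of Z comes from row i of W and entry i of b.
i = 1
assert np.allclose(Z[i, :], W[i, :] @ X + b[i, 0])
```

Moving right along `Z` changes the sample, moving down changes the unit, and `W` itself does not depend on `m`.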

Back propagation

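Caution

Why the formulas above are hard to derive directly with matrix calculus:

If z is a column vector, $\frac{\partial L}{\partial z}$ should be a row vector, yet dz (= $\frac{\partial L}{\partial z}$) is written as a column vector.

$\rightarrow$ Either prove everything element-wise, or read expressions like dz as $\left(\frac{\partial L}{\partial z}\right)^T$.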

Example

$\begin{aligned}
\cfrac{\partial L}{\partial W} &= \begin{bmatrix} \cfrac{\partial L}{\partial W_{ji}} \end{bmatrix} = \begin{bmatrix} \cfrac{\partial L}{\partial z_j}\ \cfrac{\partial z_j}{\partial W_{ji}} \end{bmatrix} = \begin{bmatrix} \cfrac{\partial L}{\partial z_j}\ a_i \end{bmatrix} = a\ \cfrac{\partial L}{\partial z} \\[1em]
\therefore\ dW &= \left(\cfrac{\partial L}{\partial W}\right)^T = \left(\cfrac{\partial L}{\partial z}\right)^T a^T = dz\ a^T
\end{aligned}$
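A quick numerical check of dW = dz a^T, as a sketch only. The loss L = 0.5 * ||z||^2 is a hypothetical choice made here so that dz = z; the sizes and random values are likewise assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 3, 2                        # illustrative sizes (assumption)
a = rng.standard_normal((n_in, 1))
W = rng.standard_normal((n_out, n_in))
b = rng.standard_normal((n_out, 1))

def loss(W_mat):
    # Hypothetical loss: L = 0.5 * ||z||^2 with z = W a + b.
    z_val = W_mat @ a + b
    return 0.5 * float(np.sum(z_val ** 2))

z = W @ a + b
dz = z                                    # column vector (dL/dz)^T, shape (n_out, 1)
dW = dz @ a.T                             # shape (n_out, n_in), same as W

# Central-difference estimate of each dL/dW_ji.
eps = 1e-6
dW_num = np.zeros_like(W)
for j in range(n_out):
    for i in range(n_in):
        E = np.zeros_like(W)
        E[j, i] = eps
        dW_num[j, i] = (loss(W + E) - loss(W - E)) / (2 * eps)

max_err = float(np.max(np.abs(dW - dW_num)))
print(dW.shape, max_err)
```

Storing `dW` with the same shape as `W` is what makes the update `W -= lr * dW` line up entry by entry, which is why the column-vector convention for dz is convenient despite the transpose.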

$\begin{aligned}
\cfrac{\partial L}{\partial z^{[\ell-1]}} &= \cfrac{\partial L}{\partial z^{[\ell]}}\ \cfrac{\partial z^{[\ell]}}{\partial a^{[\ell-1]}}\ \cfrac{\partial a^{[\ell-1]}}{\partial z^{[\ell-1]}} \\
&= \cfrac{\partial L}{\partial z^{[\ell]}}\ \begin{bmatrix} \cfrac{\partial z^{[\ell]}_i}{\partial a^{[\ell-1]}_j} \end{bmatrix}\ \begin{bmatrix} \cfrac{\partial a^{[\ell-1]}_i}{\partial z^{[\ell-1]}_j} \end{bmatrix} \\
&= \cfrac{\partial L}{\partial z^{[\ell]}}\ \begin{bmatrix} W^{[\ell]}_{ij} \end{bmatrix}\ diag((g^{[\ell-1]})'(z^{[\ell-1]})) \\
&= \left(\cfrac{\partial L}{\partial z^{[\ell]}}\ W^{[\ell]}\right) * \left((g^{[\ell-1]})'(z^{[\ell-1]})\right)^T \\[1em]
\therefore\ dz^{[\ell-1]} &= \left(\cfrac{\partial L}{\partial z^{[\ell-1]}}\right)^T = \left(W^{[\ell]T}\ dz^{[\ell]}\right) * (g^{[\ell-1]})'(z^{[\ell-1]})
\end{aligned}$
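The same kind of check works for the dz recursion. This is a sketch under assumptions: g is taken to be tanh and the loss is again the hypothetical L = 0.5 * ||z^[l]||^2; neither choice comes from the original notes.

```python
import numpy as np

rng = np.random.default_rng(1)
n_prev, n_cur = 3, 2                         # illustrative sizes (assumption)
z_prev = rng.standard_normal((n_prev, 1))    # z^{[l-1]}
W = rng.standard_normal((n_cur, n_prev))     # W^{[l]}
b = rng.standard_normal((n_cur, 1))          # b^{[l]}

def loss(z_in):
    # Hypothetical setup: a^{[l-1]} = tanh(z^{[l-1]}), L = 0.5 * ||z^{[l]}||^2.
    z_out = W @ np.tanh(z_in) + b
    return 0.5 * float(np.sum(z_out ** 2))

a_prev = np.tanh(z_prev)
z = W @ a_prev + b
dz = z                                       # dz^{[l]}, shape (n_cur, 1)
g_prime = 1.0 - a_prev ** 2                  # tanh'(z) = 1 - tanh(z)^2
dz_prev = (W.T @ dz) * g_prime               # shape (n_prev, 1)

# Row-vector form from the derivation: (dL/dz) W diag(g'), then transpose.
row = dz.T @ W @ np.diag(g_prime.ravel())    # shape (1, n_prev)
assert np.allclose(row.T, dz_prev)

# Central-difference estimate of dL/dz^{[l-1]}.
eps = 1e-6
dz_prev_num = np.zeros_like(z_prev)
for k in range(n_prev):
    e = np.zeros_like(z_prev)
    e[k, 0] = eps
    dz_prev_num[k, 0] = (loss(z_prev + e) - loss(z_prev - e)) / (2 * eps)

max_err = float(np.max(np.abs(dz_prev - dz_prev_num)))
print(dz_prev.shape, max_err)
```

Multiplying by `diag(g')` on the right of a row vector is the same as an element-wise product, which is where the `*` in the final formula comes from.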

Read this carefully:

https://explained.ai/matrix-calculus/#sec6.1

The matrix calculus you need for deep learning

