Gradient Through Concatenation

Concatenation of vectors is a common operation in deep neural networks. How do we compute the derivatives of its output in a computational graph?

We can write the operation as

$$z = x|y$$

where $|$ is the concatenation operator. We are interested in computing ${\partial z}/{\partial x}$ and ${\partial z}/{\partial y}$.

Assuming $x\in \mathbb{R}^m$ and $y\in \mathbb{R}^n$, we can rewrite the concat operation as

$$z = \begin{bmatrix}I_m \\ 0\end{bmatrix}x+\begin{bmatrix}0 \\ I_n\end{bmatrix}y$$

where $I_k$ is the $k \times k$ identity matrix and the zero blocks have sizes $n \times m$ and $m \times n$, respectively.
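For concreteness, with $m = 2$ and $n = 1$ this reads

$$z = \begin{bmatrix}1 & 0\\0 & 1\\0 & 0\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix} + \begin{bmatrix}0\\0\\1\end{bmatrix}\begin{bmatrix}y_1\end{bmatrix} = \begin{bmatrix}x_1\\x_2\\y_1\end{bmatrix}$$

Differentiating with respect to $x$ and $y$ then gives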

$$\begin{aligned} \frac{\partial z}{\partial x} &= \begin{bmatrix}I_m \\ 0\end{bmatrix} & \frac{\partial z}{\partial y} &= \begin{bmatrix}0 \\ I_n\end{bmatrix} \end{aligned}$$
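Both Jacobians are tall matrices, of shapes $(m+n) \times m$ and $(m+n) \times n$. During backpropagation the chain rule multiplies the upstream gradient by their transposes, $\partial L/\partial x = (\partial z/\partial x)^\top \, \partial L/\partial z$, which simply slices out the first $m$ (respectively last $n$) components of $\partial L/\partial z$. Below is a minimal NumPy sanity check of this; the helper names `concat_forward` and `concat_backward` are hypothetical, not taken from any framework.

```python
import numpy as np

def concat_forward(x, y):
    """Forward pass: z = x | y."""
    return np.concatenate([x, y])

def concat_backward(grad_z, m, n):
    """Backward pass of concat: the vector-Jacobian products with
    [I_m; 0] and [0; I_n] reduce to splitting the upstream gradient."""
    return grad_z[:m], grad_z[m:m + n]

x = np.array([1.0, 2.0])           # m = 2
y = np.array([3.0, 4.0, 5.0])      # n = 3
m, n = len(x), len(y)

z = concat_forward(x, y)           # z = [1, 2, 3, 4, 5]
grad_z = np.arange(10.0, 15.0)     # some upstream gradient dL/dz
grad_x, grad_y = concat_backward(grad_z, m, n)

# Compare against the explicit (m+n) x m and (m+n) x n Jacobians
J_x = np.vstack([np.eye(m), np.zeros((n, m))])   # dz/dx
J_y = np.vstack([np.zeros((m, n)), np.eye(n)])   # dz/dy
assert np.allclose(grad_x, J_x.T @ grad_z)
assert np.allclose(grad_y, J_y.T @ grad_z)
```

This is also why autodiff frameworks typically implement the backward pass of concatenation as a split of the incoming gradient rather than materializing these sparse Jacobians.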

tags: #computational-graph