Mathematical Derivation of Matrix Derivative Formulas (Matrix Derivatives: Basics)
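I. Real-valued scalar functions of a vector variable
1. Four rules
1.1 Derivative of a constant
As in single-variable calculus, the derivative of a constant is zero; here the result is the zero vector: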
\frac{\partial c}{ \partial \pmb{x}}=\pmb{0}_{n \times 1} \\\\ \tag{1}
where $c$ is a constant and $\pmb{x}=[x_1,x_2,\cdots,x_n]^T$ is an $n \times 1$ vector.
Proof:
\begin{align} \frac{\partial{c}}{\partial{\pmb{x}}} &= \begin{bmatrix} \frac{\partial{c}}{\partial{x_1}} \\ \frac{\partial{c}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{c}}{\partial{x_n}} \end{bmatrix} \\\\ &= \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0\end{bmatrix} \\\\ &=\pmb{0}_{n \times 1}\end{align} \\\\ \tag{2}
Q.E.D.
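1.2 Linearity rule
Same as the linearity rule for single-variable functions: the derivative of a sum is the sum of the derivatives, and constant factors can be pulled out: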
\frac{\partial{[c_1f(\pmb{x})+c_2g(\pmb{x})]}}{\partial{\pmb{x}}} = c_1\frac{\partial f(\pmb{x})}{\partial{\pmb{x}}} + c_2\frac{\partial g(\pmb{x})}{\partial{\pmb{x}}} \\\\ \tag{3}
where $c_1$ and $c_2$ are constants.
Proof:
\begin{align} \frac{\partial{[c_1f(\pmb{x})+c_2g(\pmb{x})]}}{\partial{\pmb{x}}} &= \begin{bmatrix} \frac{\partial{(c_1f+c_2g)}}{\partial{x_1}} \\ \frac{\partial{(c_1f+c_2g)}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{(c_1f+c_2g)}}{\partial{x_n}} \end{bmatrix} \\\\ &= \begin{bmatrix} c_1\frac{\partial{f}}{\partial{x_1}}+c_2\frac{\partial{g}}{\partial{x_1}} \\ c_1\frac{\partial{f}}{\partial{x_2}}+c_2\frac{\partial{g}}{\partial{x_2}} \\ \vdots \\ c_1\frac{\partial{f}}{\partial{x_n}}+c_2\frac{\partial{g}}{\partial{x_n}} \end{bmatrix} \\\\ &=c_1\begin{bmatrix} \frac{\partial{f}}{\partial{x_1}} \\ \frac{\partial{f}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{f}}{\partial{x_n}} \end{bmatrix} + c_2\begin{bmatrix} \frac{\partial{g}}{\partial{x_1}} \\ \frac{\partial{g}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{g}}{\partial{x_n}} \end{bmatrix} \\\\ &=c_1\frac{\partial f(\pmb{x})}{\partial{\pmb{x}}} + c_2\frac{\partial g(\pmb{x})}{\partial{\pmb{x}}} \end{align} \\\\ \tag{4}
Q.E.D.
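1.3 Product rule
Same as the product rule for single-variable functions: (derivative of the first factor) times (the second factor), plus (the first factor) times (derivative of the second factor):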
\frac{\partial{[f(\pmb{x})g(\pmb{x})]}}{\partial{\pmb{x}}} = \frac{\partial f(\pmb{x})}{\partial{\pmb{x}}}g(\pmb{x}) +f(\pmb{x})\frac{\partial g(\pmb{x})}{\partial{\pmb{x}}} \\\\ \tag{5}
Proof:
\begin{align} \frac{\partial{[f(\pmb{x})g(\pmb{x})]}}{\partial{\pmb{x}}} &= \begin{bmatrix} \frac{\partial{(fg)}}{\partial{x_1}} \\ \frac{\partial{(fg)}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{(fg)}}{\partial{x_n}} \end{bmatrix} \\\\ &= \begin{bmatrix} \frac{\partial{f}}{\partial{x_1}}g+f\frac{\partial{g}}{\partial{x_1}} \\ \frac{\partial{f}}{\partial{x_2}}g+f\frac{\partial{g}}{\partial{x_2}}\\ \vdots \\ \frac{\partial{f}}{\partial{x_n}}g+f\frac{\partial{g}}{\partial{x_n}} \end{bmatrix} \\\\ &=\begin{bmatrix} \frac{\partial{f}}{\partial{x_1}} \\ \frac{\partial{f}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{f}}{\partial{x_n}} \end{bmatrix}g + f\begin{bmatrix} \frac{\partial{g}}{\partial{x_1}} \\ \frac{\partial{g}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{g}}{\partial{x_n}} \end{bmatrix} \\\\ &=\frac{\partial f(\pmb{x})}{\partial{\pmb{x}}}g(\pmb{x}) +f(\pmb{x})\frac{\partial g(\pmb{x})}{\partial{\pmb{x}}} \end{align} \\\\ \tag{6}
Q.E.D.
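A quick numerical spot-check can make the product rule (5) more tangible. The sketch below is plain Python (no NumPy assumed) and uses central finite differences as the reference derivative; the helper `num_grad`, the sample functions `f`, `g`, and the point `x0` are hypothetical choices for illustration only. Following the denominator layout of this article, each gradient is returned as a list with the same length as $\pmb{x}$.

```python
# Minimal sketch: numerically spot-check the product rule (5).
# num_grad, f, g and x0 are hypothetical, chosen only for illustration.


def num_grad(func, x, h=1e-6):
    """Central-difference gradient of a scalar function; same length as x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((func(xp) - func(xm)) / (2 * h))
    return grad


def f(x):
    return x[0] ** 2 + 3.0 * x[1]


def g(x):
    return x[0] * x[1] + 1.0


x0 = [1.5, -0.5]

# Left-hand side of (5): differentiate the product f * g directly.
lhs = num_grad(lambda x: f(x) * g(x), x0)

# Right-hand side of (5): (df/dx) g + f (dg/dx), component by component.
df, dg = num_grad(f, x0), num_grad(g, x0)
rhs = [dfi * g(x0) + f(x0) * dgi for dfi, dgi in zip(df, dg)]

max_err = max(abs(p - q) for p, q in zip(lhs, rhs))
print(max_err)  # close to 0: only finite-difference noise remains
```

The same `num_grad` helper works for any smooth scalar function of a list of floats.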
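1.4 Quotient rule
Same as the quotient rule for single-variable functions: (derivative of the numerator times the denominator, minus the numerator times the derivative of the denominator), divided by the square of the denominator: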
\frac{\partial{\left[\frac{f(\pmb{x})}{g(\pmb{x})}\right]}}{\partial{\pmb{x}}} = \frac{1}{g^2(\pmb{x})}\left[ \frac{\partial f(\pmb{x})}{\partial{\pmb{x}}}g(\pmb{x}) -f(\pmb{x})\frac{\partial g(\pmb{x})}{\partial{\pmb{x}}} \right] \\\\ \tag{7}
where $g(\pmb{x}) \neq 0$.
Proof:
\begin{align} \frac{\partial{\left[\frac{f(\pmb{x})}{g(\pmb{x})}\right]}}{\partial{\pmb{x}}} &= \begin{bmatrix} \frac{\partial{(\frac{f}{g})}}{\partial{x_1}} \\ \frac{\partial{(\frac{f}{g})}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{(\frac{f}{g})}}{\partial{x_n}} \end{bmatrix} \\\\ &= \begin{bmatrix} \frac{1}{g^2}\left( \frac{\partial f}{\partial{x_1}}g -f\frac{\partial g}{\partial{x_1}} \right) \\ \frac{1}{g^2}\left( \frac{\partial f}{\partial{x_2}}g -f\frac{\partial g}{\partial{x_2}} \right)\\ \vdots \\ \frac{1}{g^2}\left( \frac{\partial f}{\partial{x_n}}g -f\frac{\partial g}{\partial{x_n}} \right) \end{bmatrix} \\\\ &= \frac{1}{g^2}\left( \begin{bmatrix} \frac{\partial{f}}{\partial{x_1}} \\ \frac{\partial{f}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{f}}{\partial{x_n}} \end{bmatrix}g - f\begin{bmatrix} \frac{\partial{g}}{\partial{x_1}} \\ \frac{\partial{g}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{g}}{\partial{x_n}} \end{bmatrix} \right) \\\\ &=\frac{1}{g^2(\pmb{x})}\left[ \frac{\partial f(\pmb{x})}{\partial{\pmb{x}}}g(\pmb{x}) -f(\pmb{x})\frac{\partial g(\pmb{x})}{\partial{\pmb{x}}} \right] \end{align} \\\\ \tag{8}
Q.E.D.
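The quotient rule (7) can be checked numerically in the same spirit. This is a minimal pure-Python sketch under stated assumptions: `num_grad` is a hypothetical central-difference helper, and `f`, `g`, `x0` are arbitrary illustrative choices, with $g(\pmb{x}) = 1 + x_1^2 + x_2^2$ picked so that $g \neq 0$ everywhere.

```python
# Minimal sketch: numerically spot-check the quotient rule (7).
# num_grad, f, g and x0 are hypothetical, chosen only for illustration.


def num_grad(func, x, h=1e-6):
    """Central-difference gradient of a scalar function; same length as x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((func(xp) - func(xm)) / (2 * h))
    return grad


def f(x):
    return x[0] * x[1]


def g(x):
    return 1.0 + x[0] ** 2 + x[1] ** 2  # strictly positive, so g != 0


x0 = [0.8, -1.3]

# Left-hand side of (7): differentiate f / g directly.
lhs = num_grad(lambda x: f(x) / g(x), x0)

# Right-hand side of (7): [(df/dx) g - f (dg/dx)] / g^2.
df, dg = num_grad(f, x0), num_grad(g, x0)
fx, gx = f(x0), g(x0)
rhs = [(dfi * gx - fx * dgi) / gx ** 2 for dfi, dgi in zip(df, dg)]

max_err = max(abs(p - q) for p, q in zip(lhs, rhs))
print(max_err)  # close to 0: only finite-difference noise remains
```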
2. Some formulas
2.1
\frac{\partial( \pmb{x}^T \pmb{a})}{\partial{\pmb{x}}} = \frac{\partial( \pmb{a}^T\pmb{x})}{\partial{\pmb{x}}} = \pmb{a} \\\\ \tag{9}
where $\pmb{a}=[a_1,a_2,\cdots,a_n]^T$ is a constant vector.
Proof:
\begin{align} \frac{\partial( \pmb{x}^T \pmb{a})}{\partial{\pmb{x}}} &= \frac{\partial( \pmb{a}^T\pmb{x})}{\partial{\pmb{x}}} \\\\ &= \frac{\partial( a_1x_1+a_2x_2+\cdots+a_nx_n)}{\partial{\pmb{x}}} \\\\ &= \begin{bmatrix} \frac{\partial( a_1x_1+a_2x_2+\cdots+a_nx_n)}{\partial{x_1}} \\ \frac{\partial( a_1x_1+a_2x_2+\cdots+a_nx_n)}{\partial{x_2}} \\ \vdots \\ \frac{\partial( a_1x_1+a_2x_2+\cdots+a_nx_n)}{\partial{x_n}} \end{bmatrix} \\\\ &= \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} \\\\ &= \pmb{a} \end{align} \\\\ \tag{10}
Q.E.D.
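As a sanity check of (9), one can compare a finite-difference gradient of $\pmb{a}^T\pmb{x}$ (and of $\pmb{x}^T\pmb{a}$) with $\pmb{a}$ itself. This is an illustrative pure-Python sketch; `num_grad`, `dot`, and the numbers in `a` and `x0` are hypothetical.

```python
# Minimal sketch: the gradient of a^T x (= x^T a) should equal a, per (9).
# num_grad, dot, a and x0 are hypothetical, chosen only for illustration.


def num_grad(func, x, h=1e-6):
    """Central-difference gradient of a scalar function; same length as x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((func(xp) - func(xm)) / (2 * h))
    return grad


def dot(u, v):
    """Inner product u^T v of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


a = [2.0, -1.0, 0.5]   # constant vector
x0 = [0.3, 1.7, -2.2]  # point at which the gradient is taken

grad_aTx = num_grad(lambda x: dot(a, x), x0)  # derivative of a^T x
grad_xTa = num_grad(lambda x: dot(x, a), x0)  # derivative of x^T a (same scalar)

max_err = max(abs(gi - ai) for gi, ai in zip(grad_aTx, a))
print(grad_aTx)  # approximately [2.0, -1.0, 0.5]
```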
2.2
\frac{\partial( \pmb{x}^T \pmb{x})}{\partial{\pmb{x}}} = 2\pmb{x} \\\\ \tag{11}
Proof:
\begin{align} \frac{\partial( \pmb{x}^T \pmb{x})}{\partial{\pmb{x}}} &= \frac{\partial( x_1^2+x_2^2+\cdots+x_n^2)}{\partial{\pmb{x}}} \\\\ &= \begin{bmatrix} \frac{\partial( x_1^2+x_2^2+\cdots+x_n^2)}{\partial{x_1}} \\ \frac{\partial( x_1^2+x_2^2+\cdots+x_n^2)}{\partial{x_2}} \\ \vdots \\ \frac{\partial( x_1^2+x_2^2+\cdots+x_n^2)}{\partial{x_n}} \end{bmatrix} \\\\ &= \begin{bmatrix} 2x_1 \\ 2x_2 \\ \vdots \\ 2x_n \end{bmatrix} \\\\ &= 2\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \\\\ &= 2\pmb{x} \end{align} \\\\ \tag{12}
Q.E.D.
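Formula (11) can be checked the same way: a central-difference gradient of $\pmb{x}^T\pmb{x}$ should agree with $2\pmb{x}$. A minimal pure-Python sketch; `num_grad`, `dot`, and `x0` are hypothetical names and values used only for illustration.

```python
# Minimal sketch: the gradient of x^T x should equal 2x, per (11).
# num_grad, dot and x0 are hypothetical, chosen only for illustration.


def num_grad(func, x, h=1e-6):
    """Central-difference gradient of a scalar function; same length as x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((func(xp) - func(xm)) / (2 * h))
    return grad


def dot(u, v):
    """Inner product u^T v of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


x0 = [0.3, 1.7, -2.2]

numeric = num_grad(lambda x: dot(x, x), x0)  # gradient of x^T x
analytic = [2.0 * xi for xi in x0]           # 2x

max_err = max(abs(p - q) for p, q in zip(numeric, analytic))
print(max_err)  # close to 0
```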
2.3
\frac{\partial( \pmb{x}^T \pmb{A}\pmb{x})}{\partial{\pmb{x}}} = \pmb{A}\pmb{x}+\pmb{A}^T \pmb{x} \\\\ \tag{13}
where $\pmb{A}=(a_{ij})_{n \times n}$ is a constant matrix.
Proof:
\begin{align} \frac{\partial( \pmb{x}^T \pmb{A}\pmb{x})}{\partial{\pmb{x}}} &= \frac{\partial(a_{11}x_1x_1+a_{12}x_1x_2+\cdots+a_{1n}x_1x_n \\ +a_{21}x_2x_1+a_{22}x_2x_2+\cdots+a_{2n}x_2x_n \\ + \cdots \\ +a_{n1}x_nx_1+a_{n2}x_nx_2+\cdots+a_{nn}x_nx_n)}{\partial{\pmb{x}}} \\\\ &= \begin{bmatrix} \frac{\partial(a_{11}x_1x_1+a_{12}x_1x_2+\cdots+a_{1n}x_1x_n \\ +a_{21}x_2x_1+a_{22}x_2x_2+\cdots+a_{2n}x_2x_n \\ + \cdots \\ +a_{n1}x_nx_1+a_{n2}x_nx_2+\cdots+a_{nn}x_nx_n)}{\partial{x_1}} \\ \frac{\partial(a_{11}x_1x_1+a_{12}x_1x_2+\cdots+a_{1n}x_1x_n \\ +a_{21}x_2x_1+a_{22}x_2x_2+\cdots+a_{2n}x_2x_n \\ + \cdots \\ +a_{n1}x_nx_1+a_{n2}x_nx_2+\cdots+a_{nn}x_nx_n)}{\partial{x_2}} \\ \vdots \\ \frac{\partial(a_{11}x_1x_1+a_{12}x_1x_2+\cdots+a_{1n}x_1x_n \\ +a_{21}x_2x_1+a_{22}x_2x_2+\cdots+a_{2n}x_2x_n \\ + \cdots \\ +a_{n1}x_nx_1+a_{n2}x_nx_2+\cdots+a_{nn}x_nx_n)}{\partial{x_n}} \end{bmatrix} \\\\ &= \begin{bmatrix} (a_{11}x_1+a_{12}x_2+\cdots+a_{1n}x_n)+(a_{11}x_1+a_{21}x_2+\cdots+a_{n1}x_n) \\ (a_{21}x_1+a_{22}x_2+\cdots+a_{2n}x_n)+(a_{12}x_1+a_{22}x_2+\cdots+a_{n2}x_n) \\ \vdots \\ (a_{n1}x_1+a_{n2}x_2+\cdots+a_{nn}x_n)+(a_{1n}x_1+a_{2n}x_2+\cdots+a_{nn}x_n) \end{bmatrix} \\\\ &= \begin{bmatrix} a_{11}x_1+a_{12}x_2+\cdots+a_{1n}x_n \\ a_{21}x_1+a_{22}x_2+\cdots+a_{2n}x_n \\ \vdots \\ a_{n1}x_1+a_{n2}x_2+\cdots+a_{nn}x_n \end{bmatrix} +\begin{bmatrix} a_{11}x_1+a_{21}x_2+\cdots+a_{n1}x_n \\ a_{12}x_1+a_{22}x_2+\cdots+a_{n2}x_n \\ \vdots \\ a_{1n}x_1+a_{2n}x_2+\cdots+a_{nn}x_n \end{bmatrix} \\\\ &= \begin{bmatrix} a_{11}&a_{12}&\cdots&a_{1n}\\ a_{21}&a_{22}&\cdots&a_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{n1}&a_{n2}&\cdots&a_{nn} \end{bmatrix}\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix} +\begin{bmatrix} a_{11}&a_{21}&\cdots&a_{n1}\\ a_{12}&a_{22}&\cdots&a_{n2}\\ \vdots&\vdots&\ddots&\vdots\\ a_{1n}&a_{2n}&\cdots&a_{nn} \end{bmatrix}\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix} \\\\ &= \pmb{A}\pmb{x}+\pmb{A}^T \pmb{x} \end{align} \\\\ \tag{14}
Q.E.D.
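Formula (13) is easiest to misremember when $\pmb{A}$ is not symmetric, so the sketch below deliberately uses a non-symmetric $3 \times 3$ matrix and compares a finite-difference gradient of $\pmb{x}^T\pmb{A}\pmb{x}$ with $\pmb{A}\pmb{x}+\pmb{A}^T\pmb{x}$. It is plain Python with matrices stored as lists of rows; `num_grad`, `dot`, `matvec`, `transpose`, and all numbers are hypothetical illustrations.

```python
# Minimal sketch: check (13), d(x^T A x)/dx = A x + A^T x, for a non-symmetric A.
# num_grad, dot, matvec, transpose, A and x0 are hypothetical illustrations.


def num_grad(func, x, h=1e-6):
    """Central-difference gradient of a scalar function; same length as x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((func(xp) - func(xm)) / (2 * h))
    return grad


def dot(u, v):
    """Inner product u^T v of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(M, v):
    """Matrix-vector product M v, with M stored as a list of rows."""
    return [dot(row, v) for row in M]


def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


A = [[1.0, 2.0, 0.0],
     [-1.0, 3.0, 4.0],
     [0.5, 0.0, -2.0]]  # deliberately non-symmetric
x0 = [0.7, -1.2, 2.0]

numeric = num_grad(lambda x: dot(x, matvec(A, x)), x0)  # gradient of x^T A x
analytic = [p + q for p, q in zip(matvec(A, x0), matvec(transpose(A), x0))]

max_err = max(abs(p - q) for p, q in zip(numeric, analytic))
print(max_err)  # close to 0
```

Replacing `analytic` with $2\pmb{A}\pmb{x}$ would not match here, since this $\pmb{A}$ differs from $\pmb{A}^T$; both terms of (13) are needed.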
2.4
\frac{\partial( \pmb{a}^T\pmb{x}\pmb{x}^T\pmb{b})}{\partial{\pmb{x}}} = \pmb{a}\pmb{b}^T\pmb{x}+\pmb{b}\pmb{a}^T\pmb{x} \\\\ \tag{15}
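where $\pmb{a}$ and $\pmb{b}$ are $n \times 1$ constant vectors.
Proof:
Since $\pmb{a}^T\pmb{x}$ is a scalar, $\pmb{a}^T\pmb{x}=\pmb{x}^T\pmb{a}$; likewise $\pmb{x}^T\pmb{b}=\pmb{b}^T\pmb{x}$. Therefore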
\frac{\partial( \pmb{a}^T\pmb{x}\pmb{x}^T\pmb{b})}{\partial{\pmb{x}}} = \frac{\partial( \pmb{x}^T\pmb{a}\pmb{b}^T\pmb{x})}{\partial{\pmb{x}}} \\\\ \tag{16}
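Moreover, $\pmb{a}\pmb{b}^T$ is an $n \times n$ constant matrix, so applying (13) with $\pmb{A}=\pmb{a}\pmb{b}^T$ (whose transpose is $\pmb{b}\pmb{a}^T$) gives: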
\frac{\partial( \pmb{a}^T\pmb{x}\pmb{x}^T\pmb{b})}{\partial{\pmb{x}}} = \frac{\partial( \pmb{x}^T\pmb{a}\pmb{b}^T\pmb{x})}{\partial{\pmb{x}}}=\pmb{a}\pmb{b}^T\pmb{x}+\pmb{b}\pmb{a}^T\pmb{x} \\\\ \tag{17}
Q.E.D.
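Formula (15) can also be verified numerically. Writing $\pmb{a}\pmb{b}^T\pmb{x}+\pmb{b}\pmb{a}^T\pmb{x}$ as $\pmb{a}(\pmb{b}^T\pmb{x})+\pmb{b}(\pmb{a}^T\pmb{x})$ avoids forming any matrix. This is a minimal pure-Python sketch; `num_grad`, `dot`, and the vectors `a`, `b`, `x0` are hypothetical.

```python
# Minimal sketch: check (15), d(a^T x x^T b)/dx = a b^T x + b a^T x.
# num_grad, dot, a, b and x0 are hypothetical, chosen only for illustration.


def num_grad(func, x, h=1e-6):
    """Central-difference gradient of a scalar function; same length as x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((func(xp) - func(xm)) / (2 * h))
    return grad


def dot(u, v):
    """Inner product u^T v of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


a = [1.0, -0.5, 2.0]
b = [0.3, 1.5, -1.0]
x0 = [0.4, -1.1, 0.9]

# f(x) = a^T x x^T b = (a^T x)(x^T b), a product of two scalars.
numeric = num_grad(lambda x: dot(a, x) * dot(x, b), x0)

# (15): a b^T x + b a^T x = a (b^T x) + b (a^T x).
bTx, aTx = dot(b, x0), dot(a, x0)
analytic = [ai * bTx + bi * aTx for ai, bi in zip(a, b)]

max_err = max(abs(p - q) for p, q in zip(numeric, analytic))
print(max_err)  # close to 0
```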
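II. Real-valued scalar functions of a matrix variable
1. Four rules
1.1 Derivative of a constant
As in single-variable calculus, the derivative of a constant is zero; here the result is the zero matrix: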
\frac{\partial c}{ \partial \pmb{X}}=\pmb{0}_{m \times n} \\\\ \tag{18}
where $c$ is a constant and $\pmb{X}=(x_{ij})_{m \times n}$ is an $m \times n$ matrix.
Proof:
\begin{align} \frac{\partial{c}}{\partial{\pmb{X}}} &= \begin{bmatrix} \frac{\partial{c}}{\partial{x_{11}}}&\frac{\partial{c}}{\partial{x_{12}}}&\cdots&\frac{\partial{c}}{\partial{x_{1n}}} \\ \frac{\partial{c}}{\partial{x_{21}}}&\frac{\partial{c}}{\partial{x_{22}}}&\cdots&\frac{\partial{c}}{\partial{x_{2n}}} \\ \vdots &\vdots & \vdots & \vdots\\ \frac{\partial{c}}{\partial{x_{m1}}}&\frac{\partial{c}}{\partial{x_{m2}}}&\cdots&\frac{\partial{c}}{\partial{x_{mn}}} \end{bmatrix}_{m \times n} \\\\ &= \begin{bmatrix} 0&0&\cdots&0 \\ 0&0&\cdots&0 \\ \vdots &\vdots & \vdots & \vdots\\ 0&0&\cdots&0 \end{bmatrix}_{m \times n} \\\\ &=\pmb{0}_{m \times n}\end{align} \\\\ \tag{19}
Q.E.D.
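1.2 Linearity rule
Same as the linearity rule for single-variable functions: the derivative of a sum is the sum of the derivatives, and constant factors can be pulled out: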
\frac{\partial{[c_1f(\pmb{X})+c_2g(\pmb{X})]}}{\partial{\pmb{X}}} = c_1\frac{\partial f(\pmb{X})}{\partial{\pmb{X}}} + c_2\frac{\partial g(\pmb{X})}{\partial{\pmb{X}}} \\\\ \tag{20}
where $c_1$ and $c_2$ are constants.
Proof:
\begin{align} \frac{\partial{[c_1f(\pmb{X})+c_2g(\pmb{X})]}}{\partial{\pmb{X}}} &= \begin{bmatrix} \frac{\partial{(c_1f+c_2g)}}{\partial{x_{11}}} & \frac{\partial{(c_1f+c_2g)}}{\partial{x_{12}}} &\cdots &\frac{\partial{(c_1f+c_2g)}}{\partial{x_{1n}}} \\ \frac{\partial{(c_1f+c_2g)}}{\partial{x_{21}}} & \frac{\partial{(c_1f+c_2g)}}{\partial{x_{22}}} &\cdots &\frac{\partial{(c_1f+c_2g)}}{\partial{x_{2n}}} \\ \vdots & \vdots& \vdots& \vdots \\ \frac{\partial{(c_1f+c_2g)}}{\partial{x_{m1}}} & \frac{\partial{(c_1f+c_2g)}}{\partial{x_{m2}}} &\cdots &\frac{\partial{(c_1f+c_2g)}}{\partial{x_{mn}}} \end{bmatrix} \\\\ &= \begin{bmatrix} c_1\frac{\partial{f}}{\partial{x_{11}}}+c_2\frac{\partial{g}}{\partial{x_{11}}}&c_1\frac{\partial{f}}{\partial{x_{12}}}+c_2\frac{\partial{g}}{\partial{x_{12}}}&\cdots&c_1\frac{\partial{f}}{\partial{x_{1n}}}+c_2\frac{\partial{g}}{\partial{x_{1n}}} \\ c_1\frac{\partial{f}}{\partial{x_{21}}}+c_2\frac{\partial{g}}{\partial{x_{21}}}&c_1\frac{\partial{f}}{\partial{x_{22}}}+c_2\frac{\partial{g}}{\partial{x_{22}}}&\cdots&c_1\frac{\partial{f}}{\partial{x_{2n}}}+c_2\frac{\partial{g}}{\partial{x_{2n}}} \\ \vdots & \vdots& \vdots& \vdots \\ c_1\frac{\partial{f}}{\partial{x_{m1}}}+c_2\frac{\partial{g}}{\partial{x_{m1}}}&c_1\frac{\partial{f}}{\partial{x_{m2}}}+c_2\frac{\partial{g}}{\partial{x_{m2}}}&\cdots&c_1\frac{\partial{f}}{\partial{x_{mn}}}+c_2\frac{\partial{g}}{\partial{x_{mn}}} \end{bmatrix} \\\\ &=c_1 \begin{bmatrix} \frac{\partial{f}}{\partial{x_{11}}}&\frac{\partial{f}}{\partial{x_{12}}}&\cdots&\frac{\partial{f}}{\partial{x_{1n}}} \\ \frac{\partial{f}}{\partial{x_{21}}}&\frac{\partial{f}}{\partial{x_{22}}}&\cdots&\frac{\partial{f}}{\partial{x_{2n}}} \\ \vdots &\vdots & \vdots & \vdots\\ \frac{\partial{f}}{\partial{x_{m1}}}&\frac{\partial{f}}{\partial{x_{m2}}}&\cdots&\frac{\partial{f}}{\partial{x_{mn}}} \end{bmatrix} + 
c_2\begin{bmatrix}\frac{\partial{g}}{\partial{x_{11}}}&\frac{\partial{g}}{\partial{x_{12}}}&\cdots&\frac{\partial{g}}{\partial{x_{1n}}} \\ \frac{\partial{g}}{\partial{x_{21}}}&\frac{\partial{g}}{\partial{x_{22}}}&\cdots&\frac{\partial{g}}{\partial{x_{2n}}} \\ \vdots &\vdots & \vdots & \vdots\\ \frac{\partial{g}}{\partial{x_{m1}}}&\frac{\partial{g}}{\partial{x_{m2}}}&\cdots&\frac{\partial{g}}{\partial{x_{mn}}} \end{bmatrix}\\\\ &=c_1\frac{\partial f(\pmb{X})}{\partial{\pmb{X}}} + c_2\frac{\partial g(\pmb{X})}{\partial{\pmb{X}}} \end{align} \\\\ \tag{21}
Q.E.D.
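1.3 Product rule
Same as the product rule for single-variable functions: (derivative of the first factor) times (the second factor), plus (the first factor) times (derivative of the second factor):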
\frac{\partial{[f(\pmb{X})g(\pmb{X})]}}{\partial{\pmb{X}}} = \frac{\partial f(\pmb{X})}{\partial{\pmb{X}}}g(\pmb{X}) +f(\pmb{X})\frac{\partial g(\pmb{X})}{\partial{\pmb{X}}} \\\\ \tag{22}
Proof:
\begin{align} \frac{\partial{[f(\pmb{X})g(\pmb{X})]}}{\partial{\pmb{X}}} &= \begin{bmatrix} \frac{\partial{(fg)}}{\partial{x_{11}}} & \frac{\partial{(fg)}}{\partial{x_{12}}} & \cdots & \frac{\partial{(fg)}}{\partial{x_{1n}}} \\ \frac{\partial{(fg)}}{\partial{x_{21}}} & \frac{\partial{(fg)}}{\partial{x_{22}}} & \cdots & \frac{\partial{(fg)}}{\partial{x_{2n}}} \\ \vdots & \vdots & \vdots & \vdots \\ \frac{\partial{(fg)}}{\partial{x_{m1}}} & \frac{\partial{(fg)}}{\partial{x_{m2}}} & \cdots & \frac{\partial{(fg)}}{\partial{x_{mn}}} \end{bmatrix} \\\\ &= \begin{bmatrix} \frac{\partial{f}}{\partial{x_{11}}}g+f\frac{\partial{g}}{\partial{x_{11}}} & \frac{\partial{f}}{\partial{x_{12}}}g+f\frac{\partial{g}}{\partial{x_{12}}} & \cdots & \frac{\partial{f}}{\partial{x_{1n}}}g+f\frac{\partial{g}}{\partial{x_{1n}}} \\ \frac{\partial{f}}{\partial{x_{21}}}g+f\frac{\partial{g}}{\partial{x_{21}}} & \frac{\partial{f}}{\partial{x_{22}}}g+f\frac{\partial{g}}{\partial{x_{22}}} & \cdots & \frac{\partial{f}}{\partial{x_{2n}}}g+f\frac{\partial{g}}{\partial{x_{2n}}}\\ \vdots & \vdots & \vdots & \vdots \\ \frac{\partial{f}}{\partial{x_{m1}}}g+f\frac{\partial{g}}{\partial{x_{m1}}} & \frac{\partial{f}}{\partial{x_{m2}}}g+f\frac{\partial{g}}{\partial{x_{m2}}} & \cdots & \frac{\partial{f}}{\partial{x_{mn}}}g+f\frac{\partial{g}}{\partial{x_{mn}}} \end{bmatrix} \\\\ &=\begin{bmatrix} \frac{\partial{f}}{\partial{x_{11}}}&\frac{\partial{f}}{\partial{x_{12}}}&\cdots&\frac{\partial{f}}{\partial{x_{1n}}} \\ \frac{\partial{f}}{\partial{x_{21}}}&\frac{\partial{f}}{\partial{x_{22}}}&\cdots&\frac{\partial{f}}{\partial{x_{2n}}} \\ \vdots &\vdots & \vdots & \vdots\\ \frac{\partial{f}}{\partial{x_{m1}}}&\frac{\partial{f}}{\partial{x_{m2}}}&\cdots&\frac{\partial{f}}{\partial{x_{mn}}} \end{bmatrix}g + f\begin{bmatrix}\frac{\partial{g}}{\partial{x_{11}}}&\frac{\partial{g}}{\partial{x_{12}}}&\cdots&\frac{\partial{g}}{\partial{x_{1n}}} \\ 
\frac{\partial{g}}{\partial{x_{21}}}&\frac{\partial{g}}{\partial{x_{22}}}&\cdots&\frac{\partial{g}}{\partial{x_{2n}}} \\ \vdots &\vdots & \vdots & \vdots\\ \frac{\partial{g}}{\partial{x_{m1}}}&\frac{\partial{g}}{\partial{x_{m2}}}&\cdots&\frac{\partial{g}}{\partial{x_{mn}}} \end{bmatrix} \\\\ &=\frac{\partial f(\pmb{X})}{\partial{\pmb{X}}}g(\pmb{X}) +f(\pmb{X})\frac{\partial g(\pmb{X})}{\partial{\pmb{X}}} \end{align} \\\\ \tag{23}
Q.E.D.
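For a matrix variable, the gradient in this article's layout is an $m \times n$ array of partial derivatives, so a finite-difference check perturbs one entry $x_{ij}$ at a time. The sketch below checks the product rule (22) this way. It is plain Python with matrices stored as lists of rows; `num_grad_mat`, `max_abs_diff`, `f`, `g`, and `X0` are hypothetical names and values.

```python
# Minimal sketch: numerically spot-check the matrix product rule (22).
# num_grad_mat, max_abs_diff, f, g and X0 are hypothetical illustrations.


def num_grad_mat(func, X, h=1e-6):
    """Central-difference gradient of a scalar function of an m x n matrix.

    Entry (i, j) approximates df/dx_ij, so the result has the same
    m x n shape as X (the layout used in this article).
    """
    m, n = len(X), len(X[0])
    G = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (func(Xp) - func(Xm)) / (2 * h)
    return G


def max_abs_diff(P, Q):
    """Largest absolute entry-wise difference between two same-shape matrices."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))


def f(X):
    return sum(v * v for row in X for v in row)  # sum of squared entries


def g(X):
    return X[0][0] + 2.0 * X[1][2]


X0 = [[1.0, -0.5, 2.0],
      [0.3, 1.2, -1.0]]  # a 2 x 3 test point

lhs = num_grad_mat(lambda X: f(X) * g(X), X0)  # d(fg)/dX computed directly
dF, dG = num_grad_mat(f, X0), num_grad_mat(g, X0)
fx, gx = f(X0), g(X0)
rhs = [[dF[i][j] * gx + fx * dG[i][j] for j in range(3)] for i in range(2)]

max_err = max_abs_diff(lhs, rhs)
print(max_err)  # close to 0
```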
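1.4 Quotient rule
Same as the quotient rule for single-variable functions: (derivative of the numerator times the denominator, minus the numerator times the derivative of the denominator), divided by the square of the denominator: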
\frac{\partial{\left[\frac{f(\pmb{X})}{g(\pmb{X})}\right]}}{\partial{\pmb{X}}} = \frac{1}{g^2(\pmb{X})}\left[ \frac{\partial f(\pmb{X})}{\partial{\pmb{X}}}g(\pmb{X}) -f(\pmb{X})\frac{\partial g(\pmb{X})}{\partial{\pmb{X}}} \right] \\\\ \tag{24}
where $g(\pmb{X}) \neq 0$.
Proof:
\begin{align} \frac{\partial{\left[\frac{f(\pmb{X})}{g(\pmb{X})}\right]}}{\partial{\pmb{X}}} &= \begin{bmatrix} \frac{\partial{(\frac{f}{g})}}{\partial{x_{11}}} & \frac{\partial{(\frac{f}{g})}}{\partial{x_{12}}}&\cdots&\frac{\partial{(\frac{f}{g})}}{\partial{x_{1n}}} \\ \frac{\partial{(\frac{f}{g})}}{\partial{x_{21}}} & \frac{\partial{(\frac{f}{g})}}{\partial{x_{22}}}&\cdots&\frac{\partial{(\frac{f}{g})}}{\partial{x_{2n}}} \\ \vdots & \vdots & \vdots & \vdots \\ \frac{\partial{(\frac{f}{g})}}{\partial{x_{m1}}} & \frac{\partial{(\frac{f}{g})}}{\partial{x_{m2}}}&\cdots&\frac{\partial{(\frac{f}{g})}}{\partial{x_{mn}}} \end{bmatrix} \\\\ &= \begin{bmatrix} \frac{1}{g^2}\left( \frac{\partial f}{\partial{x_{11}}}g -f\frac{\partial g}{\partial{x_{11}}} \right) & \frac{1}{g^2}\left( \frac{\partial f}{\partial{x_{12}}}g -f\frac{\partial g}{\partial{x_{12}}} \right) & \cdots & \frac{1}{g^2}\left( \frac{\partial f}{\partial{x_{1n}}}g -f\frac{\partial g}{\partial{x_{1n}}} \right) \\ \frac{1}{g^2}\left( \frac{\partial f}{\partial{x_{21}}}g -f\frac{\partial g}{\partial{x_{21}}} \right) & \frac{1}{g^2}\left( \frac{\partial f}{\partial{x_{22}}}g -f\frac{\partial g}{\partial{x_{22}}} \right) & \cdots & \frac{1}{g^2}\left( \frac{\partial f}{\partial{x_{2n}}}g -f\frac{\partial g}{\partial{x_{2n}}} \right)\\ \vdots & \vdots & \vdots & \vdots \\ \frac{1}{g^2}\left( \frac{\partial f}{\partial{x_{m1}}}g -f\frac{\partial g}{\partial{x_{m1}}} \right) & \frac{1}{g^2}\left( \frac{\partial f}{\partial{x_{m2}}}g -f\frac{\partial g}{\partial{x_{m2}}} \right) & \cdots & \frac{1}{g^2}\left( \frac{\partial f}{\partial{x_{mn}}}g -f\frac{\partial g}{\partial{x_{mn}}} \right) \end{bmatrix} \\\\ &= \frac{1}{g^2}\left( \begin{bmatrix} \frac{\partial{f}}{\partial{x_{11}}}&\frac{\partial{f}}{\partial{x_{12}}}&\cdots&\frac{\partial{f}}{\partial{x_{1n}}} \\ \frac{\partial{f}}{\partial{x_{21}}}&\frac{\partial{f}}{\partial{x_{22}}}&\cdots&\frac{\partial{f}}{\partial{x_{2n}}} \\ \vdots &\vdots & \vdots & 
\vdots\\ \frac{\partial{f}}{\partial{x_{m1}}}&\frac{\partial{f}}{\partial{x_{m2}}}&\cdots&\frac{\partial{f}}{\partial{x_{mn}}} \end{bmatrix}g - f \begin{bmatrix}\frac{\partial{g}}{\partial{x_{11}}}&\frac{\partial{g}}{\partial{x_{12}}}&\cdots&\frac{\partial{g}}{\partial{x_{1n}}} \\ \frac{\partial{g}}{\partial{x_{21}}}&\frac{\partial{g}}{\partial{x_{22}}}&\cdots&\frac{\partial{g}}{\partial{x_{2n}}} \\ \vdots &\vdots & \vdots & \vdots\\ \frac{\partial{g}}{\partial{x_{m1}}}&\frac{\partial{g}}{\partial{x_{m2}}}&\cdots&\frac{\partial{g}}{\partial{x_{mn}}} \end{bmatrix} \right) \\\\ &= \frac{1}{g^2(\pmb{X})}\left[ \frac{\partial f(\pmb{X})}{\partial{\pmb{X}}}g(\pmb{X}) -f(\pmb{X})\frac{\partial g(\pmb{X})}{\partial{\pmb{X}}} \right] \end{align} \\\\ \tag{25}
Q.E.D.
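The matrix quotient rule (24) admits the same entry-by-entry numerical check. A minimal pure-Python sketch, assuming matrices stored as lists of rows; `num_grad_mat`, `max_abs_diff`, `f`, `g`, and `X0` are hypothetical, and $g(\pmb{X}) = 1 + \sum_{i,j} x_{ij}^2$ is chosen so that it never vanishes.

```python
# Minimal sketch: numerically spot-check the matrix quotient rule (24).
# num_grad_mat, max_abs_diff, f, g and X0 are hypothetical illustrations.


def num_grad_mat(func, X, h=1e-6):
    """Central-difference gradient of a scalar function of an m x n matrix."""
    m, n = len(X), len(X[0])
    G = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (func(Xp) - func(Xm)) / (2 * h)
    return G


def max_abs_diff(P, Q):
    """Largest absolute entry-wise difference between two same-shape matrices."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))


def f(X):
    return X[0][1] * X[1][0]


def g(X):
    return 1.0 + sum(v * v for row in X for v in row)  # always >= 1


X0 = [[1.0, -0.5, 2.0],
      [0.3, 1.2, -1.0]]  # a 2 x 3 test point

lhs = num_grad_mat(lambda X: f(X) / g(X), X0)  # d(f/g)/dX computed directly
dF, dG = num_grad_mat(f, X0), num_grad_mat(g, X0)
fx, gx = f(X0), g(X0)
rhs = [[(dF[i][j] * gx - fx * dG[i][j]) / gx ** 2 for j in range(3)]
       for i in range(2)]

max_err = max_abs_diff(lhs, rhs)
print(max_err)  # close to 0
```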
2. Some formulas
2.1
\frac{\partial( \pmb{a}^T\pmb{X}\pmb{b})}{\partial{\pmb{X}}} = \pmb{a}\pmb{b}^T \\\\ \tag{26}
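where $\pmb{a}=[a_1,a_2,\cdots,a_m]^T$ and $\pmb{b}=[b_1,b_2,\cdots,b_n]^T$ are constant vectors and $\pmb{X}$ is an $m \times n$ matrix.
Proof: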
\begin{align} \frac{\partial( \pmb{a}^T\pmb{X}\pmb{b})}{\partial{\pmb{X}}} &= \frac{\partial(a_1b_1x_{11}+a_1b_2x_{12}+\cdots+a_1b_nx_{1n} \\ +a_2b_1x_{21}+a_2b_2x_{22}+\cdots+a_2b_nx_{2n}\\ +\cdots \\ +a_mb_1x_{m1}+a_mb_2x_{m2}+\cdots+a_mb_nx_{mn})}{\partial{\pmb{X}}} \\\\ &= \begin{bmatrix} \frac{\partial(a_1b_1x_{11}+a_1b_2x_{12}+\cdots+a_1b_nx_{1n} \\ +a_2b_1x_{21}+a_2b_2x_{22}+\cdots+a_2b_nx_{2n}\\ +\cdots \\ +a_mb_1x_{m1}+a_mb_2x_{m2}+\cdots+a_mb_nx_{mn})}{\partial{x_{11}}} & \frac{\partial(a_1b_1x_{11}+a_1b_2x_{12}+\cdots+a_1b_nx_{1n} \\ +a_2b_1x_{21}+a_2b_2x_{22}+\cdots+a_2b_nx_{2n}\\ +\cdots \\ +a_mb_1x_{m1}+a_mb_2x_{m2}+\cdots+a_mb_nx_{mn})}{\partial{x_{12}}} & \cdots & \frac{\partial(a_1b_1x_{11}+a_1b_2x_{12}+\cdots+a_1b_nx_{1n} \\ +a_2b_1x_{21}+a_2b_2x_{22}+\cdots+a_2b_nx_{2n}\\ +\cdots \\ +a_mb_1x_{m1}+a_mb_2x_{m2}+\cdots+a_mb_nx_{mn})}{\partial{x_{1n}}} \\ \frac{\partial(a_1b_1x_{11}+a_1b_2x_{12}+\cdots+a_1b_nx_{1n} \\ +a_2b_1x_{21}+a_2b_2x_{22}+\cdots+a_2b_nx_{2n}\\ +\cdots \\ +a_mb_1x_{m1}+a_mb_2x_{m2}+\cdots+a_mb_nx_{mn})}{\partial{x_{21}}} & \frac{\partial(a_1b_1x_{11}+a_1b_2x_{12}+\cdots+a_1b_nx_{1n} \\ +a_2b_1x_{21}+a_2b_2x_{22}+\cdots+a_2b_nx_{2n}\\ +\cdots \\ +a_mb_1x_{m1}+a_mb_2x_{m2}+\cdots+a_mb_nx_{mn})}{\partial{x_{22}}} & \cdots & \frac{\partial(a_1b_1x_{11}+a_1b_2x_{12}+\cdots+a_1b_nx_{1n} \\ +a_2b_1x_{21}+a_2b_2x_{22}+\cdots+a_2b_nx_{2n}\\ +\cdots \\ +a_mb_1x_{m1}+a_mb_2x_{m2}+\cdots+a_mb_nx_{mn})}{\partial{x_{2n}}} \\ \vdots & \vdots & \vdots & \vdots \\ \frac{\partial(a_1b_1x_{11}+a_1b_2x_{12}+\cdots+a_1b_nx_{1n} \\ +a_2b_1x_{21}+a_2b_2x_{22}+\cdots+a_2b_nx_{2n}\\ +\cdots \\ +a_mb_1x_{m1}+a_mb_2x_{m2}+\cdots+a_mb_nx_{mn})}{\partial{x_{m1}}} & \frac{\partial(a_1b_1x_{11}+a_1b_2x_{12}+\cdots+a_1b_nx_{1n} \\ +a_2b_1x_{21}+a_2b_2x_{22}+\cdots+a_2b_nx_{2n}\\ +\cdots \\ +a_mb_1x_{m1}+a_mb_2x_{m2}+\cdots+a_mb_nx_{mn})}{\partial{x_{m2}}} & \cdots & \frac{\partial(a_1b_1x_{11}+a_1b_2x_{12}+\cdots+a_1b_nx_{1n} \\ 
+a_2b_1x_{21}+a_2b_2x_{22}+\cdots+a_2b_nx_{2n}\\ +\cdots \\ +a_mb_1x_{m1}+a_mb_2x_{m2}+\cdots+a_mb_nx_{mn})}{\partial{x_{mn}}} \end{bmatrix}_{m \times n} \\\\ &= \begin{bmatrix} a_1b_1 & a_1b_2 & \cdots & a_1b_n \\ a_2b_1 & a_2b_2 & \cdots & a_2b_n \\ \vdots & \vdots & \vdots & \vdots \\ a_mb_1 & a_mb_2 & \cdots & a_mb_n \end{bmatrix}_{m \times n} \\\\ &= \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix} [b_1,b_2,\cdots,b_n] \\\\ &= \pmb{a}\pmb{b}^T \end{align} \\\\ \tag{27}
Q.E.D.
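Formula (26) says that the gradient of the bilinear form $\pmb{a}^T\pmb{X}\pmb{b}$ is the outer product $\pmb{a}\pmb{b}^T$, whose $(i,j)$ entry is $a_ib_j$. The pure-Python sketch below compares this with a finite-difference gradient for a $2 \times 3$ matrix; `num_grad_mat`, `max_abs_diff`, `bilinear`, and all numbers are hypothetical.

```python
# Minimal sketch: check (26), d(a^T X b)/dX = a b^T, for m = 2, n = 3.
# num_grad_mat, max_abs_diff, bilinear, a, b and X0 are hypothetical.


def num_grad_mat(func, X, h=1e-6):
    """Central-difference gradient of a scalar function of an m x n matrix."""
    m, n = len(X), len(X[0])
    G = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (func(Xp) - func(Xm)) / (2 * h)
    return G


def max_abs_diff(P, Q):
    """Largest absolute entry-wise difference between two same-shape matrices."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))


def bilinear(a, X, b):
    """a^T X b for a of length m, X of shape m x n, b of length n."""
    return sum(a[i] * X[i][j] * b[j]
               for i in range(len(a)) for j in range(len(b)))


a = [1.0, -2.0]        # m = 2
b = [0.5, 3.0, -1.0]   # n = 3
X0 = [[0.2, -1.0, 1.5],
      [2.0, 0.7, -0.4]]

numeric = num_grad_mat(lambda X: bilinear(a, X, b), X0)
analytic = [[ai * bj for bj in b] for ai in a]  # outer product a b^T

max_err = max_abs_diff(numeric, analytic)
print(max_err)  # close to 0
```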
2.2
\frac{\partial( \pmb{a}^T\pmb{X}^T\pmb{b})}{\partial{\pmb{X}}} = \pmb{b}\pmb{a}^T \\\\ \tag{28}
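where $\pmb{a}$ is an $n \times 1$ constant vector, $\pmb{b}$ is an $m \times 1$ constant vector, and $\pmb{X}$ is an $m \times n$ matrix.
Proof:
Since the transpose of a scalar is the scalar itself, we have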
\frac{\partial(\pmb{a}^T\pmb{X}^T\pmb{b})}{\partial\pmb{X}}=\frac{\partial(\pmb{a}^T\pmb{X}^T\pmb{b})^T}{\partial\pmb{X}}=\frac{\partial(\pmb{b}^T\pmb{X}\pmb{a})}{\partial\pmb{X}} \\\\ \tag{29}
Applying (26) with the roles of $\pmb{a}$ and $\pmb{b}$ exchanged then gives:
\frac{\partial(\pmb{a}^T\pmb{X}^T\pmb{b})}{\partial\pmb{X}}=\frac{\partial(\pmb{b}^T\pmb{X}\pmb{a})}{\partial\pmb{X}} = \pmb{b}\pmb{a}^T \\\\ \tag{30}
Q.E.D.
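Formula (28) can be checked by building $\pmb{X}^T$ explicitly, so the code mirrors $\pmb{a}^T\pmb{X}^T\pmb{b}$ literally rather than relying on the scalar-transpose argument. The sketch is plain Python; `num_grad_mat`, `max_abs_diff`, `dot`, `matvec`, `transpose`, and all numbers are hypothetical. Note the shapes: $\pmb{a}$ has length $n=3$ and $\pmb{b}$ has length $m=2$.

```python
# Minimal sketch: check (28), d(a^T X^T b)/dX = b a^T, for an m x n = 2 x 3 X.
# All helper names and numbers are hypothetical illustrations.


def num_grad_mat(func, X, h=1e-6):
    """Central-difference gradient of a scalar function of an m x n matrix."""
    m, n = len(X), len(X[0])
    G = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (func(Xp) - func(Xm)) / (2 * h)
    return G


def max_abs_diff(P, Q):
    """Largest absolute entry-wise difference between two same-shape matrices."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(M, v):
    """Matrix-vector product M v, with M stored as a list of rows."""
    return [dot(row, v) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


a = [0.5, 3.0, -1.0]  # n = 3: a^T multiplies X^T (n x m) from the left
b = [1.0, -2.0]       # m = 2
X0 = [[0.2, -1.0, 1.5],
      [2.0, 0.7, -0.4]]


def f(X):
    return dot(a, matvec(transpose(X), b))  # a^T X^T b


numeric = num_grad_mat(f, X0)
analytic = [[bi * aj for aj in a] for bi in b]  # b a^T, an m x n matrix

max_err = max_abs_diff(numeric, analytic)
print(max_err)  # close to 0
```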
2.3
\frac{\partial( \pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})}{\partial{\pmb{X}}} = \pmb{a}\pmb{b}^T\pmb{X}+\pmb{b}\pmb{a}^T\pmb{X} \\\\ \tag{31}
where $\pmb{a}$ and $\pmb{b}$ are $m \times 1$ constant vectors and $\pmb{X}$ is an $m \times n$ matrix.
Proof:
\begin{align} \frac{\partial( \pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})}{\partial{\pmb{X}}} &= \frac{\partial( [(a_1b_1)(x_{11}x_{11}+x_{12}x_{12}+\cdots+x_{1n}x_{1n})]+[(a_1b_2)(x_{11}x_{21}+x_{12}x_{22}+\cdots+x_{1n}x_{2n})]+\cdots+[(a_1b_m)(x_{11}x_{m1}+x_{12}x_{m2}+\cdots+x_{1n}x_{mn})] \\ +[(a_2b_1)(x_{21}x_{11}+x_{22}x_{12}+\cdots+x_{2n}x_{1n})]+[(a_2b_2)(x_{21}x_{21}+x_{22}x_{22}+\cdots+x_{2n}x_{2n})]+\cdots+[(a_2b_m)(x_{21}x_{m1}+x_{22}x_{m2}+\cdots+x_{2n}x_{mn})] \\ +\cdots \\ +[(a_mb_1)(x_{m1}x_{11}+x_{m2}x_{12}+\cdots+x_{mn}x_{1n})]+[(a_mb_2)(x_{m1}x_{21}+x_{m2}x_{22}+\cdots+x_{mn}x_{2n})]+\cdots+[(a_mb_m)(x_{m1}x_{m1}+x_{m2}x_{m2}+\cdots+x_{mn}x_{mn})] )}{\partial{\pmb{X}}} \\\\ &= \begin{bmatrix} \frac{\partial( [(a_1b_1)(x_{11}x_{11}+x_{12}x_{12}+\cdots+x_{1n}x_{1n})]+[(a_1b_2)(x_{11}x_{21}+x_{12}x_{22}+\cdots+x_{1n}x_{2n})]+\cdots+[(a_1b_m)(x_{11}x_{m1}+x_{12}x_{m2}+\cdots+x_{1n}x_{mn})] \\ +[(a_2b_1)(x_{21}x_{11}+x_{22}x_{12}+\cdots+x_{2n}x_{1n})]+[(a_2b_2)(x_{21}x_{21}+x_{22}x_{22}+\cdots+x_{2n}x_{2n})]+\cdots+[(a_2b_m)(x_{21}x_{m1}+x_{22}x_{m2}+\cdots+x_{2n}x_{mn})] \\ +\cdots \\ +[(a_mb_1)(x_{m1}x_{11}+x_{m2}x_{12}+\cdots+x_{mn}x_{1n})]+[(a_mb_2)(x_{m1}x_{21}+x_{m2}x_{22}+\cdots+x_{mn}x_{2n})]+\cdots+[(a_mb_m)(x_{m1}x_{m1}+x_{m2}x_{m2}+\cdots+x_{mn}x_{mn})] )}{\partial{x_{11}}} & \frac{\partial( [(a_1b_1)(x_{11}x_{11}+x_{12}x_{12}+\cdots+x_{1n}x_{1n})]+[(a_1b_2)(x_{11}x_{21}+x_{12}x_{22}+\cdots+x_{1n}x_{2n})]+\cdots+[(a_1b_m)(x_{11}x_{m1}+x_{12}x_{m2}+\cdots+x_{1n}x_{mn})] \\ +[(a_2b_1)(x_{21}x_{11}+x_{22}x_{12}+\cdots+x_{2n}x_{1n})]+[(a_2b_2)(x_{21}x_{21}+x_{22}x_{22}+\cdots+x_{2n}x_{2n})]+\cdots+[(a_2b_m)(x_{21}x_{m1}+x_{22}x_{m2}+\cdots+x_{2n}x_{mn})] \\ +\cdots \\ +[(a_mb_1)(x_{m1}x_{11}+x_{m2}x_{12}+\cdots+x_{mn}x_{1n})]+[(a_mb_2)(x_{m1}x_{21}+x_{m2}x_{22}+\cdots+x_{mn}x_{2n})]+\cdots+[(a_mb_m)(x_{m1}x_{m1}+x_{m2}x_{m2}+\cdots+x_{mn}x_{mn})] )}{\partial{x_{12}}} & \cdots & \frac{\partial( 
[(a_1b_1)(x_{11}x_{11}+x_{12}x_{12}+\cdots+x_{1n}x_{1n})]+[(a_1b_2)(x_{11}x_{21}+x_{12}x_{22}+\cdots+x_{1n}x_{2n})]+\cdots+[(a_1b_m)(x_{11}x_{m1}+x_{12}x_{m2}+\cdots+x_{1n}x_{mn})] \\ +[(a_2b_1)(x_{21}x_{11}+x_{22}x_{12}+\cdots+x_{2n}x_{1n})]+[(a_2b_2)(x_{21}x_{21}+x_{22}x_{22}+\cdots+x_{2n}x_{2n})]+\cdots+[(a_2b_m)(x_{21}x_{m1}+x_{22}x_{m2}+\cdots+x_{2n}x_{mn})] \\ +\cdots \\ +[(a_mb_1)(x_{m1}x_{11}+x_{m2}x_{12}+\cdots+x_{mn}x_{1n})]+[(a_mb_2)(x_{m1}x_{21}+x_{m2}x_{22}+\cdots+x_{mn}x_{2n})]+\cdots+[(a_mb_m)(x_{m1}x_{m1}+x_{m2}x_{m2}+\cdots+x_{mn}x_{mn})] )}{\partial{x_{1n}}} \\ \frac{\partial( [(a_1b_1)(x_{11}x_{11}+x_{12}x_{12}+\cdots+x_{1n}x_{1n})]+[(a_1b_2)(x_{11}x_{21}+x_{12}x_{22}+\cdots+x_{1n}x_{2n})]+\cdots+[(a_1b_m)(x_{11}x_{m1}+x_{12}x_{m2}+\cdots+x_{1n}x_{mn})] \\ +[(a_2b_1)(x_{21}x_{11}+x_{22}x_{12}+\cdots+x_{2n}x_{1n})]+[(a_2b_2)(x_{21}x_{21}+x_{22}x_{22}+\cdots+x_{2n}x_{2n})]+\cdots+[(a_2b_m)(x_{21}x_{m1}+x_{22}x_{m2}+\cdots+x_{2n}x_{mn})] \\ +\cdots \\ +[(a_mb_1)(x_{m1}x_{11}+x_{m2}x_{12}+\cdots+x_{mn}x_{1n})]+[(a_mb_2)(x_{m1}x_{21}+x_{m2}x_{22}+\cdots+x_{mn}x_{2n})]+\cdots+[(a_mb_m)(x_{m1}x_{m1}+x_{m2}x_{m2}+\cdots+x_{mn}x_{mn})] )}{\partial{x_{21}}} & \frac{\partial( [(a_1b_1)(x_{11}x_{11}+x_{12}x_{12}+\cdots+x_{1n}x_{1n})]+[(a_1b_2)(x_{11}x_{21}+x_{12}x_{22}+\cdots+x_{1n}x_{2n})]+\cdots+[(a_1b_m)(x_{11}x_{m1}+x_{12}x_{m2}+\cdots+x_{1n}x_{mn})] \\ +[(a_2b_1)(x_{21}x_{11}+x_{22}x_{12}+\cdots+x_{2n}x_{1n})]+[(a_2b_2)(x_{21}x_{21}+x_{22}x_{22}+\cdots+x_{2n}x_{2n})]+\cdots+[(a_2b_m)(x_{21}x_{m1}+x_{22}x_{m2}+\cdots+x_{2n}x_{mn})] \\ +\cdots \\ +[(a_mb_1)(x_{m1}x_{11}+x_{m2}x_{12}+\cdots+x_{mn}x_{1n})]+[(a_mb_2)(x_{m1}x_{21}+x_{m2}x_{22}+\cdots+x_{mn}x_{2n})]+\cdots+[(a_mb_m)(x_{m1}x_{m1}+x_{m2}x_{m2}+\cdots+x_{mn}x_{mn})] )}{\partial{x_{22}}} & \cdots &\frac{\partial( [(a_1b_1)(x_{11}x_{11}+x_{12}x_{12}+\cdots+x_{1n}x_{1n})]+[(a_1b_2)(x_{11}x_{21}+x_{12}x_{22}+\cdots+x_{1n}x_{2n})]+\cdots+[(a_1b_m)(x_{11}x_{m1}+x_{12}x_{m2}+\cdots+x_{1n}x_{mn})] \\ 
+[(a_2b_1)(x_{21}x_{11}+x_{22}x_{12}+\cdots+x_{2n}x_{1n})]+[(a_2b_2)(x_{21}x_{21}+x_{22}x_{22}+\cdots+x_{2n}x_{2n})]+\cdots+[(a_2b_m)(x_{21}x_{m1}+x_{22}x_{m2}+\cdots+x_{2n}x_{mn})] \\ +\cdots \\ +[(a_mb_1)(x_{m1}x_{11}+x_{m2}x_{12}+\cdots+x_{mn}x_{1n})]+[(a_mb_2)(x_{m1}x_{21}+x_{m2}x_{22}+\cdots+x_{mn}x_{2n})]+\cdots+[(a_mb_m)(x_{m1}x_{m1}+x_{m2}x_{m2}+\cdots+x_{mn}x_{mn})] )}{\partial{x_{2n}}} \\ \vdots & \vdots & \vdots & \vdots \\ \frac{\partial( [(a_1b_1)(x_{11}x_{11}+x_{12}x_{12}+\cdots+x_{1n}x_{1n})]+[(a_1b_2)(x_{11}x_{21}+x_{12}x_{22}+\cdots+x_{1n}x_{2n})]+\cdots+[(a_1b_m)(x_{11}x_{m1}+x_{12}x_{m2}+\cdots+x_{1n}x_{mn})] \\ +[(a_2b_1)(x_{21}x_{11}+x_{22}x_{12}+\cdots+x_{2n}x_{1n})]+[(a_2b_2)(x_{21}x_{21}+x_{22}x_{22}+\cdots+x_{2n}x_{2n})]+\cdots+[(a_2b_m)(x_{21}x_{m1}+x_{22}x_{m2}+\cdots+x_{2n}x_{mn})] \\ +\cdots \\ +[(a_mb_1)(x_{m1}x_{11}+x_{m2}x_{12}+\cdots+x_{mn}x_{1n})]+[(a_mb_2)(x_{m1}x_{21}+x_{m2}x_{22}+\cdots+x_{mn}x_{2n})]+\cdots+[(a_mb_m)(x_{m1}x_{m1}+x_{m2}x_{m2}+\cdots+x_{mn}x_{mn})] )}{\partial{x_{m1}}} & \frac{\partial( [(a_1b_1)(x_{11}x_{11}+x_{12}x_{12}+\cdots+x_{1n}x_{1n})]+[(a_1b_2)(x_{11}x_{21}+x_{12}x_{22}+\cdots+x_{1n}x_{2n})]+\cdots+[(a_1b_m)(x_{11}x_{m1}+x_{12}x_{m2}+\cdots+x_{1n}x_{mn})] \\ +[(a_2b_1)(x_{21}x_{11}+x_{22}x_{12}+\cdots+x_{2n}x_{1n})]+[(a_2b_2)(x_{21}x_{21}+x_{22}x_{22}+\cdots+x_{2n}x_{2n})]+\cdots+[(a_2b_m)(x_{21}x_{m1}+x_{22}x_{m2}+\cdots+x_{2n}x_{mn})] \\ +\cdots \\ +[(a_mb_1)(x_{m1}x_{11}+x_{m2}x_{12}+\cdots+x_{mn}x_{1n})]+[(a_mb_2)(x_{m1}x_{21}+x_{m2}x_{22}+\cdots+x_{mn}x_{2n})]+\cdots+[(a_mb_m)(x_{m1}x_{m1}+x_{m2}x_{m2}+\cdots+x_{mn}x_{mn})] )}{\partial{x_{m2}}} & \cdots & \frac{\partial( [(a_1b_1)(x_{11}x_{11}+x_{12}x_{12}+\cdots+x_{1n}x_{1n})]+[(a_1b_2)(x_{11}x_{21}+x_{12}x_{22}+\cdots+x_{1n}x_{2n})]+\cdots+[(a_1b_m)(x_{11}x_{m1}+x_{12}x_{m2}+\cdots+x_{1n}x_{mn})] \\ 
+[(a_2b_1)(x_{21}x_{11}+x_{22}x_{12}+\cdots+x_{2n}x_{1n})]+[(a_2b_2)(x_{21}x_{21}+x_{22}x_{22}+\cdots+x_{2n}x_{2n})]+\cdots+[(a_2b_m)(x_{21}x_{m1}+x_{22}x_{m2}+\cdots+x_{2n}x_{mn})] \\ +\cdots \\ +[(a_mb_1)(x_{m1}x_{11}+x_{m2}x_{12}+\cdots+x_{mn}x_{1n})]+[(a_mb_2)(x_{m1}x_{21}+x_{m2}x_{22}+\cdots+x_{mn}x_{2n})]+\cdots+[(a_mb_m)(x_{m1}x_{m1}+x_{m2}x_{m2}+\cdots+x_{mn}x_{mn})] )}{\partial{x_{mn}}} \\ \end{bmatrix}_{m \times n} \\\\ &= \begin{bmatrix} (a_1b_1x_{11}+a_1b_2x_{21}+\cdots+a_1b_mx_{m1})+(b_1a_1x_{11}+b_1a_2x_{21}+\cdots+b_1a_mx_{m1}) & (a_1b_1x_{12}+a_1b_2x_{22}+\cdots+a_1b_mx_{m2})+(b_1a_1x_{12}+b_1a_2x_{22}+\cdots+b_1a_mx_{m2}) & \cdots & (a_1b_1x_{1n}+a_1b_2x_{2n}+\cdots+a_1b_mx_{mn})+(b_1a_1x_{1n}+b_1a_2x_{2n}+\cdots+b_1a_mx_{mn}) \\ (a_2b_1x_{11}+a_2b_2x_{21}+\cdots+a_2b_mx_{m1})+(b_2a_1x_{11}+b_2a_2x_{21}+\cdots+b_2a_mx_{m1}) & (a_2b_1x_{12}+a_2b_2x_{22}+\cdots+a_2b_mx_{m2})+(b_2a_1x_{12}+b_2a_2x_{22}+\cdots+b_2a_mx_{m2}) & \cdots & (a_2b_1x_{1n}+a_2b_2x_{2n}+\cdots+a_2b_mx_{mn})+(b_2a_1x_{1n}+b_2a_2x_{2n}+\cdots+b_2a_mx_{mn}) \\ \vdots & \vdots & \vdots & \vdots \\ (a_mb_1x_{11}+a_mb_2x_{21}+\cdots+a_mb_mx_{m1})+(b_ma_1x_{11}+b_ma_2x_{21}+\cdots+b_ma_mx_{m1}) & (a_mb_1x_{12}+a_mb_2x_{22}+\cdots+a_mb_mx_{m2})+(b_ma_1x_{12}+b_ma_2x_{22}+\cdots+b_ma_mx_{m2}) & \cdots & (a_mb_1x_{1n}+a_mb_2x_{2n}+\cdots+a_mb_mx_{mn})+(b_ma_1x_{1n}+b_ma_2x_{2n}+\cdots+b_ma_mx_{mn}) \end{bmatrix} \\\\ &= \begin{bmatrix} a_1b_1x_{11}+a_1b_2x_{21}+\cdots+a_1b_mx_{m1} & a_1b_1x_{12}+a_1b_2x_{22}+\cdots+a_1b_mx_{m2} & \cdots & a_1b_1x_{1n}+a_1b_2x_{2n}+\cdots+a_1b_mx_{mn} \\ a_2b_1x_{11}+a_2b_2x_{21}+\cdots+a_2b_mx_{m1} & a_2b_1x_{12}+a_2b_2x_{22}+\cdots+a_2b_mx_{m2} & \cdots & a_2b_1x_{1n}+a_2b_2x_{2n}+\cdots+a_2b_mx_{mn} \\ \vdots & \vdots & \vdots & \vdots \\ a_mb_1x_{11}+a_mb_2x_{21}+\cdots+a_mb_mx_{m1} & a_mb_1x_{12}+a_mb_2x_{22}+\cdots+a_mb_mx_{m2} & \cdots & a_mb_1x_{1n}+a_mb_2x_{2n}+\cdots+a_mb_mx_{mn} \end{bmatrix} + \begin{bmatrix} 
b_1a_1x_{11}+b_1a_2x_{21}+\cdots+b_1a_mx_{m1} & b_1a_1x_{12}+b_1a_2x_{22}+\cdots+b_1a_mx_{m2} & \cdots & b_1a_1x_{1n}+b_1a_2x_{2n}+\cdots+b_1a_mx_{mn} \\ b_2a_1x_{11}+b_2a_2x_{21}+\cdots+b_2a_mx_{m1} & b_2a_1x_{12}+b_2a_2x_{22}+\cdots+b_2a_mx_{m2} & \cdots & b_2a_1x_{1n}+b_2a_2x_{2n}+\cdots+b_2a_mx_{mn} \\ \vdots & \vdots & \vdots & \vdots \\ b_ma_1x_{11}+b_ma_2x_{21}+\cdots+b_ma_mx_{m1} & b_ma_1x_{12}+b_ma_2x_{22}+\cdots+b_ma_mx_{m2} & \cdots & b_ma_1x_{1n}+b_ma_2x_{2n}+\cdots+b_ma_mx_{mn} \end{bmatrix} \\\\ &= \begin{bmatrix} a_1b_1 & a_1b_2 & \cdots & a_1b_m \\ a_2b_1 & a_2b_2 & \cdots & a_2b_m \\ \vdots & \vdots & \vdots & \vdots \\ a_mb_1 & a_mb_2 & \cdots & a_mb_m \end{bmatrix} \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \vdots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix} + \begin{bmatrix} b_1a_1 & b_1a_2 & \cdots & b_1a_m \\ b_2a_1 & b_2a_2 & \cdots & b_2a_m \\ \vdots & \vdots & \vdots & \vdots \\ b_ma_1 & b_ma_2 & \cdots & b_ma_m \end{bmatrix} \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \vdots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix} \\\\ &= \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix} [b_1, b_2, \cdots, b_m] \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \vdots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} [a_1, a_2, \cdots, a_m] \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \vdots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix} \\\\ &= \pmb{a}\pmb{b}^T\pmb{X}+\pmb{b}\pmb{a}^T\pmb{X} \end{align} \\\\ \tag{32}
Q.E.D.
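Because the element-wise proof of (31) above is long and easy to get wrong, a numerical check is a useful complement. The pure-Python sketch below compares a finite-difference gradient of $\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b}$ with $\pmb{a}\pmb{b}^T\pmb{X}+\pmb{b}\pmb{a}^T\pmb{X}$; the helpers `num_grad_mat`, `max_abs_diff`, `dot`, `matvec`, `transpose`, `outer`, `matmul` and all numbers are hypothetical.

```python
# Minimal sketch: check (31), d(a^T X X^T b)/dX = a b^T X + b a^T X.
# All helper names and numbers are hypothetical illustrations.


def num_grad_mat(func, X, h=1e-6):
    """Central-difference gradient of a scalar function of an m x n matrix."""
    m, n = len(X), len(X[0])
    G = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (func(Xp) - func(Xm)) / (2 * h)
    return G


def max_abs_diff(P, Q):
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def matmul(A, B):
    """Matrix product A B for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


a = [1.0, -2.0]  # m = 2
b = [0.5, 1.5]   # m = 2
X0 = [[0.2, -1.0, 1.5],
      [2.0, 0.7, -0.4]]  # m x n = 2 x 3


def f(X):
    return dot(a, matvec(X, matvec(transpose(X), b)))  # a^T X X^T b


numeric = num_grad_mat(f, X0)
P = matmul(outer(a, b), X0)  # a b^T X
Q = matmul(outer(b, a), X0)  # b a^T X
analytic = [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

max_err = max_abs_diff(numeric, analytic)
print(max_err)  # close to 0
```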
2.4
\frac{\partial( \pmb{a}^T\pmb{X}^T\pmb{X}\pmb{b})}{\partial{\pmb{X}}} = \pmb{X}\pmb{b}\pmb{a}^T+\pmb{X}\pmb{a}\pmb{b}^T \\\\ \tag{33}
where $\pmb{a}$ and $\pmb{b}$ are $n \times 1$ constant vectors and $\pmb{X}$ is an $m \times n$ matrix.
Proof:
First, recall Eq. (9) of the companion article on the essence of matrix derivatives (the "Essence" article), which takes the derivative with respect to $\pmb{X}^T$:
\begin{align*} \text{D}_{\pmb{X}}f(\pmb{X})&= \frac{\partial f(\pmb{X})}{\partial \pmb{X}^T_{m\times n}} \\\\ &= \begin{bmatrix} \frac{\partial f}{\partial x_{11}}&\frac{\partial f}{\partial x_{21}}&\cdots&\frac{\partial f}{\partial x_{m1}} \\ \frac{\partial f}{\partial x_{12}}&\frac{\partial f}{\partial x_{22}}& \cdots & \frac{\partial f}{\partial x_{m2}}\\ \vdots&\vdots&\vdots&\vdots\\ \frac{\partial f} {\partial x_{1n}}&\frac{\partial f}{\partial x_{2n}}&\cdots&\frac{\partial f}{\partial x_{mn}} \end{bmatrix}_{n\times m} \end{align*} \\\\ \tag{Essence-9}
Next, recall Eq. (11) of the Essence article, which takes the derivative with respect to $\pmb{X}$ itself:
\begin{align*} \nabla_{\pmb{X}}f(\pmb{X})&= \frac{\partial f(\pmb{X})}{\partial \pmb{X}_{m\times n}} \\\\ &= \begin{bmatrix} \frac{\partial f}{\partial x_{11}}&\frac{\partial f}{\partial x_{12}}&\cdots&\frac{\partial f}{\partial x_{1n}} \\ \frac{\partial f}{\partial x_{21}}&\frac{\partial f}{\partial x_{22}}& \cdots & \frac{\partial f}{\partial x_{2n}}\\ \vdots&\vdots&\vdots&\vdots\\ \frac{\partial f} {\partial x_{m1}}&\frac{\partial f}{\partial x_{m2}}&\cdots&\frac{\partial f}{\partial x_{mn}} \end{bmatrix}_{m\times n} \end{align*} \\\\ \tag{Essence-11}
As summarized in Section III.2.5.1 of the Essence article, these two results are transposes of each other:
\frac{\partial f(\pmb{X})}{\partial{\pmb{X}}^T_{m\times n}} = \left(\frac{\partial f(\pmb{X})}{\partial{\pmb{X}_{m\times n}}}\right)^T \\\\ \tag{34}
Therefore, writing the matrix variable in the denominator of (31) as its transpose gives:
\begin{align} \frac{\partial( \pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})}{\partial{\pmb{X}}^T} &= \left(\frac{\partial( \pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})}{\partial{\pmb{X}}}\right)^T \\\\ &= (\pmb{a}\pmb{b}^T\pmb{X}+\pmb{b}\pmb{a}^T\pmb{X})^T \\\\ &= \pmb{X}^T\pmb{b}\pmb{a}^T+\pmb{X}^T\pmb{a}\pmb{b}^T \end{align} \\\\ \tag{35}
For (33), we rewrite the left-hand side in the following form:
\frac{\partial( \pmb{a}^T\pmb{X}^T\pmb{X}\pmb{b})}{\partial{\pmb{X}}} =\frac{\partial( \pmb{a}^T(\pmb{X}^T)(\pmb{X}^T)^T\pmb{b})}{\partial{(\pmb{X}}^T)^T} \\\\ \tag{36}
Then, applying (35) to (36) with $\pmb{X}^T$ in the role of $\pmb{X}$, we get:
\begin{align} \frac{\partial( \pmb{a}^T\pmb{X}^T\pmb{X}\pmb{b})}{\partial{\pmb{X}}} &=\frac{\partial( \pmb{a}^T(\pmb{X}^T)(\pmb{X}^T)^T\pmb{b})}{\partial{(\pmb{X}}^T)^T} \\\\ &= (\pmb{X}^T)^T\pmb{b}\pmb{a}^T+(\pmb{X}^T)^T\pmb{a}\pmb{b}^T \\\\ &= \pmb{X}\pmb{b}\pmb{a}^T+\pmb{X}\pmb{a}\pmb{b}^T \end{align} \\\\ \tag{37}
Q.E.D.
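Since the proof of (33) goes through the transpose identity rather than a direct expansion, a direct numerical check is a reassuring complement. Note that $\pmb{X}\pmb{b}\pmb{a}^T$ is simply the outer product of the vector $\pmb{X}\pmb{b}$ with $\pmb{a}$. The sketch is plain Python with hypothetical helpers (`num_grad_mat`, `max_abs_diff`, `dot`, `matvec`, `transpose`) and arbitrary numbers; here $\pmb{a}$ and $\pmb{b}$ have length $n=3$.

```python
# Minimal sketch: check (33), d(a^T X^T X b)/dX = X b a^T + X a b^T.
# All helper names and numbers are hypothetical illustrations.


def num_grad_mat(func, X, h=1e-6):
    """Central-difference gradient of a scalar function of an m x n matrix."""
    m, n = len(X), len(X[0])
    G = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (func(Xp) - func(Xm)) / (2 * h)
    return G


def max_abs_diff(P, Q):
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


a = [0.5, 3.0, -1.0]  # n = 3
b = [1.0, 0.2, -0.7]  # n = 3
X0 = [[0.2, -1.0, 1.5],
      [2.0, 0.7, -0.4]]  # m x n = 2 x 3


def f(X):
    return dot(a, matvec(transpose(X), matvec(X, b)))  # a^T X^T X b


numeric = num_grad_mat(f, X0)
Xb, Xa = matvec(X0, b), matvec(X0, a)
# X b a^T + X a b^T, written entry-wise as outer products.
analytic = [[Xb[i] * a[j] + Xa[i] * b[j] for j in range(3)] for i in range(2)]

max_err = max_abs_diff(numeric, analytic)
print(max_err)  # close to 0
```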
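III. Conclusion
That brings this article to an end. You may well feel, as I do, that deriving the last few formulas directly from the definition is tedious and error-prone.
In the next article, therefore, we will introduce more advanced techniques for differentiating real-valued scalar functions of vector and matrix variables: the matrix trace and the first-order real matrix differential, which can greatly simplify these derivations.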
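Other articles in the matrix derivatives series:
Derivatives with respect to symmetric matrices, illustrated by maximum likelihood estimation of the multivariate normal distribution (Matrix Derivatives: Supplement) - by Iterator, Zhihu
Mathematical Derivation of Matrix Derivative Formulas (Matrix Derivatives: Advanced) - by Iterator, Zhihu
The Essence of Matrix Derivatives and of the Numerator and Denominator Layouts (Matrix Derivatives: Essence) - by Iterator, Zhihu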