Mathematical derivation of matrix derivative formulas (Matrix Differentiation: Basics Part)

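I. Real-valued scalar functions of a vector variable

1. Four rules

1.1 Derivative of a constant

Same as the constant rule for single-variable functions: the result is the zero vector. Throughout this part, $\pmb{x}=(x_1,x_2,\cdots,x_n)^T$ and the derivative is arranged as an $n \times 1$ column vector.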

\frac{\partial c}{ \partial \pmb{x}}=\pmb{0}_{n \times 1} \\\\ \tag{1}

where $c$ is a constant.

Proof:

\begin{align} \frac{\partial{c}}{\partial{\pmb{x}}} &= \begin{bmatrix} \frac{\partial{c}}{\partial{x_1}} \\ \frac{\partial{c}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{c}}{\partial{x_n}} \end{bmatrix} \\\\ &= \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0\end{bmatrix} \\\\ &=\pmb{0}_{n \times 1}\end{align} \\\\ \tag{2}

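Q.E.D.

1.2 Linearity rule

Same as the linearity rule for single-variable functions: the derivative of a sum is the sum of the derivatives, and constant factors can be pulled out.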

\frac{\partial{[c_1f(\pmb{x})+c_2g(\pmb{x})]}}{\partial{\pmb{x}}} = c_1\frac{\partial f(\pmb{x})}{\partial{\pmb{x}}} + c_2\frac{\partial g(\pmb{x})}{\partial{\pmb{x}}} \\\\ \tag{3}

where $c_1,c_2$ are constants.

Proof:

\begin{align} \frac{\partial{[c_1f(\pmb{x})+c_2g(\pmb{x})]}}{\partial{\pmb{x}}} &= \begin{bmatrix} \frac{\partial{(c_1f+c_2g)}}{\partial{x_1}} \\ \frac{\partial{(c_1f+c_2g)}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{(c_1f+c_2g)}}{\partial{x_n}} \end{bmatrix} \\\\ &= \begin{bmatrix} c_1\frac{\partial{f}}{\partial{x_1}}+c_2\frac{\partial{g}}{\partial{x_1}} \\ c_1\frac{\partial{f}}{\partial{x_2}}+c_2\frac{\partial{g}}{\partial{x_2}} \\ \vdots \\ c_1\frac{\partial{f}}{\partial{x_n}}+c_2\frac{\partial{g}}{\partial{x_n}} \end{bmatrix} \\\\ &=c_1\begin{bmatrix} \frac{\partial{f}}{\partial{x_1}} \\ \frac{\partial{f}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{f}}{\partial{x_n}} \end{bmatrix} + c_2\begin{bmatrix} \frac{\partial{g}}{\partial{x_1}} \\ \frac{\partial{g}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{g}}{\partial{x_n}} \end{bmatrix} \\\\ &=c_1\frac{\partial f(\pmb{x})}{\partial{\pmb{x}}} + c_2\frac{\partial g(\pmb{x})}{\partial{\pmb{x}}} \end{align} \\\\ \tag{4}

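Q.E.D.

1.3 Product rule

Same as the product rule for single-variable functions: differentiate the first factor and keep the second, then keep the first factor and differentiate the second.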

\frac{\partial{[f(\pmb{x})g(\pmb{x})]}}{\partial{\pmb{x}}} = \frac{\partial f(\pmb{x})}{\partial{\pmb{x}}}g(\pmb{x}) +f(\pmb{x})\frac{\partial g(\pmb{x})}{\partial{\pmb{x}}} \\\\ \tag{5}

Proof:

\begin{align} \frac{\partial{[f(\pmb{x})g(\pmb{x})]}}{\partial{\pmb{x}}} &= \begin{bmatrix} \frac{\partial{(fg)}}{\partial{x_1}} \\ \frac{\partial{(fg)}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{(fg)}}{\partial{x_n}} \end{bmatrix} \\\\ &= \begin{bmatrix} \frac{\partial{f}}{\partial{x_1}}g+f\frac{\partial{g}}{\partial{x_1}} \\ \frac{\partial{f}}{\partial{x_2}}g+f\frac{\partial{g}}{\partial{x_2}}\\ \vdots \\ \frac{\partial{f}}{\partial{x_n}}g+f\frac{\partial{g}}{\partial{x_n}} \end{bmatrix} \\\\ &=\begin{bmatrix} \frac{\partial{f}}{\partial{x_1}} \\ \frac{\partial{f}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{f}}{\partial{x_n}} \end{bmatrix}g + f\begin{bmatrix} \frac{\partial{g}}{\partial{x_1}} \\ \frac{\partial{g}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{g}}{\partial{x_n}} \end{bmatrix} \\\\ &=\frac{\partial f(\pmb{x})}{\partial{\pmb{x}}}g(\pmb{x}) +f(\pmb{x})\frac{\partial g(\pmb{x})}{\partial{\pmb{x}}} \end{align} \\\\ \tag{6}

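Q.E.D.

1.4 Quotient rule

Same as the quotient rule for single-variable functions: (the derivative of the numerator times the denominator, minus the numerator times the derivative of the denominator), divided by the square of the denominator: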

\frac{\partial{\left[\frac{f(\pmb{x})}{g(\pmb{x})}\right]}}{\partial{\pmb{x}}} = \frac{1}{g^2(\pmb{x})}\left[ \frac{\partial f(\pmb{x})}{\partial{\pmb{x}}}g(\pmb{x}) -f(\pmb{x})\frac{\partial g(\pmb{x})}{\partial{\pmb{x}}} \right] \\\\ \tag{7}

where $g(\pmb{x})\neq0$.

Proof:

\begin{align} \frac{\partial{\left[\frac{f(\pmb{x})}{g(\pmb{x})}\right]}}{\partial{\pmb{x}}} &= \begin{bmatrix} \frac{\partial{(\frac{f}{g})}}{\partial{x_1}} \\ \frac{\partial{(\frac{f}{g})}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{(\frac{f}{g})}}{\partial{x_n}} \end{bmatrix} \\\\ &= \begin{bmatrix} \frac{1}{g^2}\left( \frac{\partial f}{\partial{x_1}}g -f\frac{\partial g}{\partial{x_1}} \right) \\ \frac{1}{g^2}\left( \frac{\partial f}{\partial{x_2}}g -f\frac{\partial g}{\partial{x_2}} \right)\\ \vdots \\ \frac{1}{g^2}\left( \frac{\partial f}{\partial{x_n}}g -f\frac{\partial g}{\partial{x_n}} \right) \end{bmatrix} \\\\ &= \frac{1}{g^2}\left( \begin{bmatrix} \frac{\partial{f}}{\partial{x_1}} \\ \frac{\partial{f}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{f}}{\partial{x_n}} \end{bmatrix}g - f\begin{bmatrix} \frac{\partial{g}}{\partial{x_1}} \\ \frac{\partial{g}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{g}}{\partial{x_n}} \end{bmatrix} \right) \\\\ &=\frac{1}{g^2(\pmb{x})}\left[ \frac{\partial f(\pmb{x})}{\partial{\pmb{x}}}g(\pmb{x}) -f(\pmb{x})\frac{\partial g(\pmb{x})}{\partial{\pmb{x}}} \right] \end{align} \\\\ \tag{8}

Q.E.D.
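The product and quotient rules above can be spot-checked numerically. The sketch below is not part of the original derivation: it assumes NumPy, a hypothetical helper `numerical_gradient` based on central differences, and two illustrative functions $f(\pmb{x})=\pmb{a}^T\pmb{x}$ and $g(\pmb{x})=\pmb{x}^T\pmb{x}+1$, chosen so that $g(\pmb{x})\neq0$. It compares the right-hand sides of (5) and (7) with direct numerical gradients of $fg$ and $f/g$.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at x.
    Hypothetical helper for illustration; the result has the same shape as x."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        step = np.zeros_like(x)
        step[idx] = eps
        grad[idx] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
n = 4
a = rng.standard_normal(n)
x = rng.standard_normal(n)

f = lambda v: a @ v          # illustrative f(x) = a^T x
g = lambda v: v @ v + 1.0    # illustrative g(x) = x^T x + 1, never zero

grad_f = numerical_gradient(f, x)
grad_g = numerical_gradient(g, x)

# Right-hand sides of the product rule (5) and the quotient rule (7).
product_rule = grad_f * g(x) + f(x) * grad_g
quotient_rule = (grad_f * g(x) - f(x) * grad_g) / g(x) ** 2

product_ok = np.allclose(numerical_gradient(lambda v: f(v) * g(v), x), product_rule, atol=1e-6)
quotient_ok = np.allclose(numerical_gradient(lambda v: f(v) / g(v), x), quotient_rule, atol=1e-6)
print(product_ok, quotient_ok)
```

A check like this does not replace the proof; it only guards against sign or ordering slips when the rules are applied by hand.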

2. Several formulas

2.1

\frac{\partial( \pmb{x}^T \pmb{a})}{\partial{\pmb{x}}} = \frac{\partial( \pmb{a}^T\pmb{x})}{\partial{\pmb{x}}} = \pmb{a} \\\\ \tag{9}

where $\pmb{a}$ is a constant vector, $\pmb{a}=(a_1,a_2,\cdots,a_n)^T$.

Proof:

\begin{align} \frac{\partial( \pmb{x}^T \pmb{a})}{\partial{\pmb{x}}} &= \frac{\partial( \pmb{a}^T\pmb{x})}{\partial{\pmb{x}}} \\\\ &= \frac{\partial( a_1x_1+a_2x_2+\cdots+a_nx_n)}{\partial{\pmb{x}}} \\\\ &= \begin{bmatrix} \frac{\partial( a_1x_1+a_2x_2+\cdots+a_nx_n)}{\partial{x_1}} \\ \frac{\partial( a_1x_1+a_2x_2+\cdots+a_nx_n)}{\partial{x_2}} \\ \vdots \\ \frac{\partial( a_1x_1+a_2x_2+\cdots+a_nx_n)}{\partial{x_n}} \end{bmatrix} \\\\ &= \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} \\\\ &= \pmb{a} \end{align} \\\\ \tag{10}

Q.E.D.

2.2

\frac{\partial( \pmb{x}^T \pmb{x})}{\partial{\pmb{x}}} = 2\pmb{x} \\\\ \tag{11}

Proof:

\begin{align} \frac{\partial( \pmb{x}^T \pmb{x})}{\partial{\pmb{x}}} &= \frac{\partial( x_1^2+x_2^2+\cdots+x_n^2)}{\partial{\pmb{x}}} \\\\ &= \begin{bmatrix} \frac{\partial( x_1^2+x_2^2+\cdots+x_n^2)}{\partial{x_1}} \\ \frac{\partial( x_1^2+x_2^2+\cdots+x_n^2)}{\partial{x_2}} \\ \vdots \\ \frac{\partial( x_1^2+x_2^2+\cdots+x_n^2)}{\partial{x_n}} \end{bmatrix} \\\\ &= \begin{bmatrix} 2x_1 \\ 2x_2 \\ \vdots \\ 2x_n \end{bmatrix} \\\\ &= 2\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \\\\ &= 2\pmb{x} \end{align} \\\\ \tag{12}

Q.E.D.
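Formulas (9) and (11) are easy to confirm numerically. The following is a minimal sketch, assuming NumPy and a hypothetical central-difference helper `numerical_gradient` that is not part of the original text; the vectors are random test data.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at x.
    Hypothetical helper for illustration; the result has the same shape as x."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        step = np.zeros_like(x)
        step[idx] = eps
        grad[idx] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
n = 5
a = rng.standard_normal(n)
x = rng.standard_normal(n)

# (9): d(a^T x)/dx = a
linear_ok = np.allclose(numerical_gradient(lambda v: a @ v, x), a, atol=1e-6)

# (11): d(x^T x)/dx = 2x
square_ok = np.allclose(numerical_gradient(lambda v: v @ v, x), 2 * x, atol=1e-6)

print(linear_ok, square_ok)
```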

2.3

\frac{\partial( \pmb{x}^T \pmb{A}\pmb{x})}{\partial{\pmb{x}}} = \pmb{A}\pmb{x}+\pmb{A}^T \pmb{x} \\\\ \tag{13}

where $\pmb{A}_{n \times n}$ is a constant matrix, $\pmb{A}_{n \times n}=(a_{ij})_{i=1,j=1}^{n,n}$.

Proof:

\begin{align} \frac{\partial( \pmb{x}^T \pmb{A}\pmb{x})}{\partial{\pmb{x}}} &= \frac{\partial(a_{11}x_1x_1+a_{12}x_1x_2+\cdots+a_{1n}x_1x_n \\ +a_{21}x_2x_1+a_{22}x_2x_2+\cdots+a_{2n}x_2x_n \\ + \cdots \\ +a_{n1}x_nx_1+a_{n2}x_nx_2+\cdots+a_{nn}x_nx_n)}{\partial{\pmb{x}}} \\\\ &= \begin{bmatrix} \frac{\partial(a_{11}x_1x_1+a_{12}x_1x_2+\cdots+a_{1n}x_1x_n \\ +a_{21}x_2x_1+a_{22}x_2x_2+\cdots+a_{2n}x_2x_n \\ + \cdots \\ +a_{n1}x_nx_1+a_{n2}x_nx_2+\cdots+a_{nn}x_nx_n)}{\partial{x_1}} \\ \frac{\partial(a_{11}x_1x_1+a_{12}x_1x_2+\cdots+a_{1n}x_1x_n \\ +a_{21}x_2x_1+a_{22}x_2x_2+\cdots+a_{2n}x_2x_n \\ + \cdots \\ +a_{n1}x_nx_1+a_{n2}x_nx_2+\cdots+a_{nn}x_nx_n)}{\partial{x_2}} \\ \vdots \\ \frac{\partial(a_{11}x_1x_1+a_{12}x_1x_2+\cdots+a_{1n}x_1x_n \\ +a_{21}x_2x_1+a_{22}x_2x_2+\cdots+a_{2n}x_2x_n \\ + \cdots \\ +a_{n1}x_nx_1+a_{n2}x_nx_2+\cdots+a_{nn}x_nx_n)}{\partial{x_n}} \end{bmatrix} \\\\ &= \begin{bmatrix} (a_{11}x_1+a_{12}x_2+\cdots+a_{1n}x_n)+(a_{11}x_1+a_{21}x_2+\cdots+a_{n1}x_n) \\ (a_{21}x_1+a_{22}x_2+\cdots+a_{2n}x_n)+(a_{12}x_1+a_{22}x_2+\cdots+a_{n2}x_n) \\ \vdots \\ (a_{n1}x_1+a_{n2}x_2+\cdots+a_{nn}x_n)+(a_{1n}x_1+a_{2n}x_2+\cdots+a_{nn}x_n) \end{bmatrix} \\\\ &= \begin{bmatrix} a_{11}x_1+a_{12}x_2+\cdots+a_{1n}x_n \\ a_{21}x_1+a_{22}x_2+\cdots+a_{2n}x_n \\ \vdots \\ a_{n1}x_1+a_{n2}x_2+\cdots+a_{nn}x_n \end{bmatrix} +\begin{bmatrix} a_{11}x_1+a_{21}x_2+\cdots+a_{n1}x_n \\ a_{12}x_1+a_{22}x_2+\cdots+a_{n2}x_n \\ \vdots \\ a_{1n}x_1+a_{2n}x_2+\cdots+a_{nn}x_n \end{bmatrix} \\\\ &= \begin{bmatrix} a_{11}&a_{12}&\cdots&a_{1n}\\ a_{21}&a_{22}&\cdots&a_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{n1}&a_{n2}&\cdots&a_{nn} \end{bmatrix}\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix} +\begin{bmatrix} a_{11}&a_{21}&\cdots&a_{n1}\\ a_{12}&a_{22}&\cdots&a_{n2}\\ \vdots&\vdots&\ddots&\vdots\\ a_{1n}&a_{2n}&\cdots&a_{nn} \end{bmatrix}\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix} \\\\ &= \pmb{A}\pmb{x}+\pmb{A}^T \pmb{x} \end{align} \\\\ \tag{14}

Q.E.D.
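A common slip is to write the derivative of the quadratic form as $2\pmb{A}\pmb{x}$, which is only valid when $\pmb{A}$ is symmetric. The sketch below, assuming NumPy and a hypothetical helper `numerical_gradient` (neither appears in the original text), checks (13) on a deliberately non-symmetric random matrix.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at x.
    Hypothetical helper for illustration; the result has the same shape as x."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        step = np.zeros_like(x)
        step[idx] = eps
        grad[idx] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))   # random, hence non-symmetric in general
x = rng.standard_normal(n)

numeric = numerical_gradient(lambda v: v @ A @ v, x)
formula = A @ x + A.T @ x          # right-hand side of (13)

quadratic_ok = np.allclose(numeric, formula, atol=1e-6)
# The shortcut 2Ax does not match unless A equals its transpose.
differs_from_2Ax = not np.allclose(formula, 2 * A @ x)
print(quadratic_ok, differs_from_2Ax)
```

When $\pmb{A}=\pmb{A}^T$, the right-hand side of (13) reduces to $2\pmb{A}\pmb{x}$, and formula (11) is the special case $\pmb{A}=\pmb{I}$.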

2.4

\frac{\partial( \pmb{a}^T\pmb{x}\pmb{x}^T\pmb{b})}{\partial{\pmb{x}}} = \pmb{a}\pmb{b}^T\pmb{x}+\pmb{b}\pmb{a}^T\pmb{x} \\\\ \tag{15}

where $\pmb{a},\pmb{b}$ are constant vectors, $\pmb{a}=(a_1,a_2,\cdots,a_n)^T,\pmb{b}=(b_1,b_2,\cdots,b_n)^T$.

Proof:

Since $\pmb{a}^T\pmb{x}=\pmb{x}^T\pmb{a}$ and $\pmb{x}^T\pmb{b}=\pmb{b}^T\pmb{x}$ (each side is a scalar), we have

\frac{\partial( \pmb{a}^T\pmb{x}\pmb{x}^T\pmb{b})}{\partial{\pmb{x}}} = \frac{\partial( \pmb{x}^T\pmb{a}\pmb{b}^T\pmb{x})}{\partial{\pmb{x}}} \\\\ \tag{16}

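Moreover, $\pmb{a}\pmb{b}^T$ is an $n \times n$ constant matrix, so applying (13) with $\pmb{A}=\pmb{a}\pmb{b}^T$ (and hence $\pmb{A}^T=\pmb{b}\pmb{a}^T$) gives: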

\frac{\partial( \pmb{a}^T\pmb{x}\pmb{x}^T\pmb{b})}{\partial{\pmb{x}}} = \frac{\partial( \pmb{x}^T\pmb{a}\pmb{b}^T\pmb{x})}{\partial{\pmb{x}}}=\pmb{a}\pmb{b}^T\pmb{x}+\pmb{b}\pmb{a}^T\pmb{x} \\\\ \tag{17}

Q.E.D.
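The reduction of (15) to the quadratic-form rule can also be checked numerically. This is a small sketch under the same assumptions as any finite-difference check: NumPy, random test vectors, and a hypothetical helper `numerical_gradient` that does not come from the original text.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at x.
    Hypothetical helper for illustration; the result has the same shape as x."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        step = np.zeros_like(x)
        step[idx] = eps
        grad[idx] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(3)
n = 4
a = rng.standard_normal(n)
b = rng.standard_normal(n)
x = rng.standard_normal(n)

# a^T x x^T b is the product of the two scalars a^T x and x^T b.
numeric = numerical_gradient(lambda v: (a @ v) * (v @ b), x)

# Right-hand side of (15); np.outer(a, b) is the matrix a b^T.
formula = np.outer(a, b) @ x + np.outer(b, a) @ x

rank_one_ok = np.allclose(numeric, formula, atol=1e-6)
print(rank_one_ok)
```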

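II. Real-valued scalar functions of a matrix variable

1. Four rules

1.1 Derivative of a constant

Same as the constant rule for single-variable functions: the result is the zero matrix.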

\frac{\partial c}{ \partial \pmb{X}}=\pmb{0}_{m \times n} \\\\ \tag{18}

where $c$ is a constant.

Proof:

\begin{align} \frac{\partial{c}}{\partial{\pmb{X}}} &= \begin{bmatrix} \frac{\partial{c}}{\partial{x_{11}}}&\frac{\partial{c}}{\partial{x_{12}}}&\cdots&\frac{\partial{c}}{\partial{x_{1n}}} \\ \frac{\partial{c}}{\partial{x_{21}}}&\frac{\partial{c}}{\partial{x_{22}}}&\cdots&\frac{\partial{c}}{\partial{x_{2n}}} \\ \vdots &\vdots & \vdots & \vdots\\ \frac{\partial{c}}{\partial{x_{m1}}}&\frac{\partial{c}}{\partial{x_{m2}}}&\cdots&\frac{\partial{c}}{\partial{x_{mn}}} \end{bmatrix}_{m \times n} \\\\ &= \begin{bmatrix} 0&0&\cdots&0 \\ 0&0&\cdots&0 \\ \vdots &\vdots & \vdots & \vdots\\ 0&0&\cdots&0 \end{bmatrix}_{m \times n} \\\\ &=\pmb{0}_{m \times n}\end{align} \\\\ \tag{19}

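Q.E.D.

1.2 Linearity rule

Same as the linearity rule for single-variable functions: the derivative of a sum is the sum of the derivatives, and constant factors can be pulled out.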

\frac{\partial{[c_1f(\pmb{X})+c_2g(\pmb{X})]}}{\partial{\pmb{X}}} = c_1\frac{\partial f(\pmb{X})}{\partial{\pmb{X}}} + c_2\frac{\partial g(\pmb{X})}{\partial{\pmb{X}}} \\\\ \tag{20}

where $c_1,c_2$ are constants.

Proof:

\begin{align} \frac{\partial{[c_1f(\pmb{X})+c_2g(\pmb{X})]}}{\partial{\pmb{X}}} &= \begin{bmatrix} \frac{\partial{(c_1f+c_2g)}}{\partial{x_{11}}} & \frac{\partial{(c_1f+c_2g)}}{\partial{x_{12}}} &\cdots &\frac{\partial{(c_1f+c_2g)}}{\partial{x_{1n}}} \\ \frac{\partial{(c_1f+c_2g)}}{\partial{x_{21}}} & \frac{\partial{(c_1f+c_2g)}}{\partial{x_{22}}} &\cdots &\frac{\partial{(c_1f+c_2g)}}{\partial{x_{2n}}} \\ \vdots & \vdots& \vdots& \vdots \\ \frac{\partial{(c_1f+c_2g)}}{\partial{x_{m1}}} & \frac{\partial{(c_1f+c_2g)}}{\partial{x_{m2}}} &\cdots &\frac{\partial{(c_1f+c_2g)}}{\partial{x_{mn}}} \end{bmatrix} \\\\ &= \begin{bmatrix} c_1\frac{\partial{f}}{\partial{x_{11}}}+c_2\frac{\partial{g}}{\partial{x_{11}}}&c_1\frac{\partial{f}}{\partial{x_{12}}}+c_2\frac{\partial{g}}{\partial{x_{12}}}&\cdots&c_1\frac{\partial{f}}{\partial{x_{1n}}}+c_2\frac{\partial{g}}{\partial{x_{1n}}} \\ c_1\frac{\partial{f}}{\partial{x_{21}}}+c_2\frac{\partial{g}}{\partial{x_{21}}}&c_1\frac{\partial{f}}{\partial{x_{22}}}+c_2\frac{\partial{g}}{\partial{x_{22}}}&\cdots&c_1\frac{\partial{f}}{\partial{x_{2n}}}+c_2\frac{\partial{g}}{\partial{x_{2n}}} \\ \vdots & \vdots& \vdots& \vdots \\ c_1\frac{\partial{f}}{\partial{x_{m1}}}+c_2\frac{\partial{g}}{\partial{x_{m1}}}&c_1\frac{\partial{f}}{\partial{x_{m2}}}+c_2\frac{\partial{g}}{\partial{x_{m2}}}&\cdots&c_1\frac{\partial{f}}{\partial{x_{mn}}}+c_2\frac{\partial{g}}{\partial{x_{mn}}} \end{bmatrix} \\\\ &=c_1 \begin{bmatrix} \frac{\partial{f}}{\partial{x_{11}}}&\frac{\partial{f}}{\partial{x_{12}}}&\cdots&\frac{\partial{f}}{\partial{x_{1n}}} \\ \frac{\partial{f}}{\partial{x_{21}}}&\frac{\partial{f}}{\partial{x_{22}}}&\cdots&\frac{\partial{f}}{\partial{x_{2n}}} \\ \vdots &\vdots & \vdots & \vdots\\ \frac{\partial{f}}{\partial{x_{m1}}}&\frac{\partial{f}}{\partial{x_{m2}}}&\cdots&\frac{\partial{f}}{\partial{x_{mn}}} \end{bmatrix} + 
c_2\begin{bmatrix}\frac{\partial{g}}{\partial{x_{11}}}&\frac{\partial{g}}{\partial{x_{12}}}&\cdots&\frac{\partial{g}}{\partial{x_{1n}}} \\ \frac{\partial{g}}{\partial{x_{21}}}&\frac{\partial{g}}{\partial{x_{22}}}&\cdots&\frac{\partial{g}}{\partial{x_{2n}}} \\ \vdots &\vdots & \vdots & \vdots\\ \frac{\partial{g}}{\partial{x_{m1}}}&\frac{\partial{g}}{\partial{x_{m2}}}&\cdots&\frac{\partial{g}}{\partial{x_{mn}}} \end{bmatrix}\\\\ &=c_1\frac{\partial f(\pmb{X})}{\partial{\pmb{X}}} + c_2\frac{\partial g(\pmb{X})}{\partial{\pmb{X}}} \end{align} \\\\ \tag{21}

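Q.E.D.

1.3 Product rule

Same as the product rule for single-variable functions: differentiate the first factor and keep the second, then keep the first factor and differentiate the second.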

\frac{\partial{[f(\pmb{X})g(\pmb{X})]}}{\partial{\pmb{X}}} = \frac{\partial f(\pmb{X})}{\partial{\pmb{X}}}g(\pmb{X}) +f(\pmb{X})\frac{\partial g(\pmb{X})}{\partial{\pmb{X}}} \\\\ \tag{22}

Proof:

\begin{align} \frac{\partial{[f(\pmb{X})g(\pmb{X})]}}{\partial{\pmb{X}}} &= \begin{bmatrix} \frac{\partial{(fg)}}{\partial{x_{11}}} & \frac{\partial{(fg)}}{\partial{x_{12}}} & \cdots & \frac{\partial{(fg)}}{\partial{x_{1n}}} \\ \frac{\partial{(fg)}}{\partial{x_{21}}} & \frac{\partial{(fg)}}{\partial{x_{22}}} & \cdots & \frac{\partial{(fg)}}{\partial{x_{2n}}} \\ \vdots & \vdots & \vdots & \vdots \\ \frac{\partial{(fg)}}{\partial{x_{m1}}} & \frac{\partial{(fg)}}{\partial{x_{m2}}} & \cdots & \frac{\partial{(fg)}}{\partial{x_{mn}}} \end{bmatrix} \\\\ &= \begin{bmatrix} \frac{\partial{f}}{\partial{x_{11}}}g+f\frac{\partial{g}}{\partial{x_{11}}} & \frac{\partial{f}}{\partial{x_{12}}}g+f\frac{\partial{g}}{\partial{x_{12}}} & \cdots & \frac{\partial{f}}{\partial{x_{1n}}}g+f\frac{\partial{g}}{\partial{x_{1n}}} \\ \frac{\partial{f}}{\partial{x_{21}}}g+f\frac{\partial{g}}{\partial{x_{21}}} & \frac{\partial{f}}{\partial{x_{22}}}g+f\frac{\partial{g}}{\partial{x_{22}}} & \cdots & \frac{\partial{f}}{\partial{x_{2n}}}g+f\frac{\partial{g}}{\partial{x_{2n}}}\\ \vdots & \vdots & \vdots & \vdots \\ \frac{\partial{f}}{\partial{x_{m1}}}g+f\frac{\partial{g}}{\partial{x_{m1}}} & \frac{\partial{f}}{\partial{x_{m2}}}g+f\frac{\partial{g}}{\partial{x_{m2}}} & \cdots & \frac{\partial{f}}{\partial{x_{mn}}}g+f\frac{\partial{g}}{\partial{x_{mn}}} \end{bmatrix} \\\\ &=\begin{bmatrix} \frac{\partial{f}}{\partial{x_{11}}}&\frac{\partial{f}}{\partial{x_{12}}}&\cdots&\frac{\partial{f}}{\partial{x_{1n}}} \\ \frac{\partial{f}}{\partial{x_{21}}}&\frac{\partial{f}}{\partial{x_{22}}}&\cdots&\frac{\partial{f}}{\partial{x_{2n}}} \\ \vdots &\vdots & \vdots & \vdots\\ \frac{\partial{f}}{\partial{x_{m1}}}&\frac{\partial{f}}{\partial{x_{m2}}}&\cdots&\frac{\partial{f}}{\partial{x_{mn}}} \end{bmatrix}g + f\begin{bmatrix}\frac{\partial{g}}{\partial{x_{11}}}&\frac{\partial{g}}{\partial{x_{12}}}&\cdots&\frac{\partial{g}}{\partial{x_{1n}}} \\ 
\frac{\partial{g}}{\partial{x_{21}}}&\frac{\partial{g}}{\partial{x_{22}}}&\cdots&\frac{\partial{g}}{\partial{x_{2n}}} \\ \vdots &\vdots & \vdots & \vdots\\ \frac{\partial{g}}{\partial{x_{m1}}}&\frac{\partial{g}}{\partial{x_{m2}}}&\cdots&\frac{\partial{g}}{\partial{x_{mn}}} \end{bmatrix} \\\\ &=\frac{\partial f(\pmb{X})}{\partial{\pmb{X}}}g(\pmb{X}) +f(\pmb{X})\frac{\partial g(\pmb{X})}{\partial{\pmb{X}}} \end{align} \\\\ \tag{23}

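Q.E.D.

1.4 Quotient rule

Same as the quotient rule for single-variable functions: (the derivative of the numerator times the denominator, minus the numerator times the derivative of the denominator), divided by the square of the denominator: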

\frac{\partial{\left[\frac{f(\pmb{X})}{g(\pmb{X})}\right]}}{\partial{\pmb{X}}} = \frac{1}{g^2(\pmb{X})}\left[ \frac{\partial f(\pmb{X})}{\partial{\pmb{X}}}g(\pmb{X}) -f(\pmb{X})\frac{\partial g(\pmb{X})}{\partial{\pmb{X}}} \right] \\\\ \tag{24}

where $g(\pmb{X})\neq0$.

Proof:

\begin{align} \frac{\partial{\left[\frac{f(\pmb{X})}{g(\pmb{X})}\right]}}{\partial{\pmb{X}}} &= \begin{bmatrix} \frac{\partial{(\frac{f}{g})}}{\partial{x_{11}}} & \frac{\partial{(\frac{f}{g})}}{\partial{x_{12}}}&\cdots&\frac{\partial{(\frac{f}{g})}}{\partial{x_{1n}}} \\ \frac{\partial{(\frac{f}{g})}}{\partial{x_{21}}} & \frac{\partial{(\frac{f}{g})}}{\partial{x_{22}}}&\cdots&\frac{\partial{(\frac{f}{g})}}{\partial{x_{2n}}} \\ \vdots & \vdots & \vdots & \vdots \\ \frac{\partial{(\frac{f}{g})}}{\partial{x_{m1}}} & \frac{\partial{(\frac{f}{g})}}{\partial{x_{m2}}}&\cdots&\frac{\partial{(\frac{f}{g})}}{\partial{x_{mn}}} \end{bmatrix} \\\\ &= \begin{bmatrix} \frac{1}{g^2}\left( \frac{\partial f}{\partial{x_{11}}}g -f\frac{\partial g}{\partial{x_{11}}} \right) & \frac{1}{g^2}\left( \frac{\partial f}{\partial{x_{12}}}g -f\frac{\partial g}{\partial{x_{12}}} \right) & \cdots & \frac{1}{g^2}\left( \frac{\partial f}{\partial{x_{1n}}}g -f\frac{\partial g}{\partial{x_{1n}}} \right) \\ \frac{1}{g^2}\left( \frac{\partial f}{\partial{x_{21}}}g -f\frac{\partial g}{\partial{x_{21}}} \right) & \frac{1}{g^2}\left( \frac{\partial f}{\partial{x_{22}}}g -f\frac{\partial g}{\partial{x_{22}}} \right) & \cdots & \frac{1}{g^2}\left( \frac{\partial f}{\partial{x_{2n}}}g -f\frac{\partial g}{\partial{x_{2n}}} \right)\\ \vdots & \vdots & \vdots & \vdots \\ \frac{1}{g^2}\left( \frac{\partial f}{\partial{x_{m1}}}g -f\frac{\partial g}{\partial{x_{m1}}} \right) & \frac{1}{g^2}\left( \frac{\partial f}{\partial{x_{m2}}}g -f\frac{\partial g}{\partial{x_{m2}}} \right) & \cdots & \frac{1}{g^2}\left( \frac{\partial f}{\partial{x_{mn}}}g -f\frac{\partial g}{\partial{x_{mn}}} \right) \end{bmatrix} \\\\ &= \frac{1}{g^2}\left( \begin{bmatrix} \frac{\partial{f}}{\partial{x_{11}}}&\frac{\partial{f}}{\partial{x_{12}}}&\cdots&\frac{\partial{f}}{\partial{x_{1n}}} \\ \frac{\partial{f}}{\partial{x_{21}}}&\frac{\partial{f}}{\partial{x_{22}}}&\cdots&\frac{\partial{f}}{\partial{x_{2n}}} \\ \vdots &\vdots & \vdots & 
\vdots\\ \frac{\partial{f}}{\partial{x_{m1}}}&\frac{\partial{f}}{\partial{x_{m2}}}&\cdots&\frac{\partial{f}}{\partial{x_{mn}}} \end{bmatrix}g - f \begin{bmatrix}\frac{\partial{g}}{\partial{x_{11}}}&\frac{\partial{g}}{\partial{x_{12}}}&\cdots&\frac{\partial{g}}{\partial{x_{1n}}} \\ \frac{\partial{g}}{\partial{x_{21}}}&\frac{\partial{g}}{\partial{x_{22}}}&\cdots&\frac{\partial{g}}{\partial{x_{2n}}} \\ \vdots &\vdots & \vdots & \vdots\\ \frac{\partial{g}}{\partial{x_{m1}}}&\frac{\partial{g}}{\partial{x_{m2}}}&\cdots&\frac{\partial{g}}{\partial{x_{mn}}} \end{bmatrix} \right) \\\\ &= \frac{1}{g^2(\pmb{X})}\left[ \frac{\partial f(\pmb{X})}{\partial{\pmb{X}}}g(\pmb{X}) -f(\pmb{X})\frac{\partial g(\pmb{X})}{\partial{\pmb{X}}} \right] \end{align} \\\\ \tag{25}

Q.E.D.
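The matrix-variable rules work entry by entry, so the same kind of numerical check applies with the gradient stored as an $m \times n$ array. The sketch below assumes NumPy, a hypothetical helper `numerical_gradient`, and two illustrative functions that are not from the original text: $f(\pmb{X})=\sum_{i,j}x_{ij}$ and $g(\pmb{X})=\sum_{i,j}x_{ij}^2+1$.

```python
import numpy as np

def numerical_gradient(f, X, eps=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at X.
    Hypothetical helper for illustration; the result has the same shape as X."""
    grad = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        grad[idx] = (f(X + step) - f(X - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(4)
m, n = 3, 2
X = rng.standard_normal((m, n))

f = lambda M: np.sum(M)              # illustrative f: sum of all entries
g = lambda M: np.sum(M ** 2) + 1.0   # illustrative g: always positive

grad_f = numerical_gradient(f, X)
grad_g = numerical_gradient(g, X)

# Right-hand sides of the product rule (22) and the quotient rule (24).
product_rule = grad_f * g(X) + f(X) * grad_g
quotient_rule = (grad_f * g(X) - f(X) * grad_g) / g(X) ** 2

matrix_product_ok = np.allclose(numerical_gradient(lambda M: f(M) * g(M), X), product_rule, atol=1e-6)
matrix_quotient_ok = np.allclose(numerical_gradient(lambda M: f(M) / g(M), X), quotient_rule, atol=1e-6)
print(matrix_product_ok, matrix_quotient_ok)
```

Because $f$ and $g$ are scalars, the products in (22) and (24) are scalar multiples of matrices, which is why plain elementwise `*` is the right operation here.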

2. Several formulas

2.1

\frac{\partial( \pmb{a}^T\pmb{X}\pmb{b})}{\partial{\pmb{X}}} = \pmb{a}\pmb{b}^T \\\\ \tag{26}

where $\pmb{a}_{m \times 1},\pmb{b}_{n \times 1}$ are constant vectors, $\pmb{a}=(a_1,a_2,\cdots,a_m)^T,\pmb{b}=(b_1,b_2,\cdots,b_n)^T$.

Proof:

\begin{align} \frac{\partial( \pmb{a}^T\pmb{X}\pmb{b})}{\partial{\pmb{X}}} &= \frac{\partial(a_1b_1x_{11}+a_1b_2x_{12}+\cdots+a_1b_nx_{1n} \\ +a_2b_1x_{21}+a_2b_2x_{22}+\cdots+a_2b_nx_{2n}\\ +\cdots \\ +a_mb_1x_{m1}+a_mb_2x_{m2}+\cdots+a_mb_nx_{mn})}{\partial{\pmb{X}}} \\\\ &= \begin{bmatrix} \frac{\partial(a_1b_1x_{11}+a_1b_2x_{12}+\cdots+a_1b_nx_{1n} \\ +a_2b_1x_{21}+a_2b_2x_{22}+\cdots+a_2b_nx_{2n}\\ +\cdots \\ +a_mb_1x_{m1}+a_mb_2x_{m2}+\cdots+a_mb_nx_{mn})}{\partial{x_{11}}} & \frac{\partial(a_1b_1x_{11}+a_1b_2x_{12}+\cdots+a_1b_nx_{1n} \\ +a_2b_1x_{21}+a_2b_2x_{22}+\cdots+a_2b_nx_{2n}\\ +\cdots \\ +a_mb_1x_{m1}+a_mb_2x_{m2}+\cdots+a_mb_nx_{mn})}{\partial{x_{12}}} & \cdots & \frac{\partial(a_1b_1x_{11}+a_1b_2x_{12}+\cdots+a_1b_nx_{1n} \\ +a_2b_1x_{21}+a_2b_2x_{22}+\cdots+a_2b_nx_{2n}\\ +\cdots \\ +a_mb_1x_{m1}+a_mb_2x_{m2}+\cdots+a_mb_nx_{mn})}{\partial{x_{1n}}} \\ \frac{\partial(a_1b_1x_{11}+a_1b_2x_{12}+\cdots+a_1b_nx_{1n} \\ +a_2b_1x_{21}+a_2b_2x_{22}+\cdots+a_2b_nx_{2n}\\ +\cdots \\ +a_mb_1x_{m1}+a_mb_2x_{m2}+\cdots+a_mb_nx_{mn})}{\partial{x_{21}}} & \frac{\partial(a_1b_1x_{11}+a_1b_2x_{12}+\cdots+a_1b_nx_{1n} \\ +a_2b_1x_{21}+a_2b_2x_{22}+\cdots+a_2b_nx_{2n}\\ +\cdots \\ +a_mb_1x_{m1}+a_mb_2x_{m2}+\cdots+a_mb_nx_{mn})}{\partial{x_{22}}} & \cdots & \frac{\partial(a_1b_1x_{11}+a_1b_2x_{12}+\cdots+a_1b_nx_{1n} \\ +a_2b_1x_{21}+a_2b_2x_{22}+\cdots+a_2b_nx_{2n}\\ +\cdots \\ +a_mb_1x_{m1}+a_mb_2x_{m2}+\cdots+a_mb_nx_{mn})}{\partial{x_{2n}}} \\ \vdots & \vdots & \vdots & \vdots \\ \frac{\partial(a_1b_1x_{11}+a_1b_2x_{12}+\cdots+a_1b_nx_{1n} \\ +a_2b_1x_{21}+a_2b_2x_{22}+\cdots+a_2b_nx_{2n}\\ +\cdots \\ +a_mb_1x_{m1}+a_mb_2x_{m2}+\cdots+a_mb_nx_{mn})}{\partial{x_{m1}}} & \frac{\partial(a_1b_1x_{11}+a_1b_2x_{12}+\cdots+a_1b_nx_{1n} \\ +a_2b_1x_{21}+a_2b_2x_{22}+\cdots+a_2b_nx_{2n}\\ +\cdots \\ +a_mb_1x_{m1}+a_mb_2x_{m2}+\cdots+a_mb_nx_{mn})}{\partial{x_{m2}}} & \cdots & \frac{\partial(a_1b_1x_{11}+a_1b_2x_{12}+\cdots+a_1b_nx_{1n} \\ 
+a_2b_1x_{21}+a_2b_2x_{22}+\cdots+a_2b_nx_{2n}\\ +\cdots \\ +a_mb_1x_{m1}+a_mb_2x_{m2}+\cdots+a_mb_nx_{mn})}{\partial{x_{mn}}} \end{bmatrix}_{m \times n} \\\\ &= \begin{bmatrix} a_1b_1 & a_1b_2 & \cdots & a_1b_n \\ a_2b_1 & a_2b_2 & \cdots & a_2b_n \\ \vdots & \vdots & \vdots & \vdots \\ a_mb_1 & a_mb_2 & \cdots & a_mb_n \end{bmatrix}_{m \times n} \\\\ &= \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix} [b_1,b_2,\cdots,b_n] \\\\ &= \pmb{a}\pmb{b}^T \end{align} \\\\ \tag{27}

Q.E.D.
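Formula (26) says that the gradient of the bilinear form is the outer product $\pmb{a}\pmb{b}^T$, independent of $\pmb{X}$. A minimal numerical sketch, assuming NumPy and a hypothetical helper `numerical_gradient` (not part of the original text):

```python
import numpy as np

def numerical_gradient(f, X, eps=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at X.
    Hypothetical helper for illustration; the result has the same shape as X."""
    grad = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        grad[idx] = (f(X + step) - f(X - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(5)
m, n = 3, 4
a = rng.standard_normal(m)
b = rng.standard_normal(n)
X = rng.standard_normal((m, n))

numeric = numerical_gradient(lambda M: a @ M @ b, X)
formula = np.outer(a, b)   # a b^T, an m x n matrix

bilinear_ok = np.allclose(numeric, formula, atol=1e-6)
print(bilinear_ok)
```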

2.2

\frac{\partial( \pmb{a}^T\pmb{X}^T\pmb{b})}{\partial{\pmb{X}}} = \pmb{b}\pmb{a}^T \\\\ \tag{28}

where $\pmb{a}_{n \times 1},\pmb{b}_{m \times 1}$ are constant vectors, $\pmb{a}=(a_1,a_2,\cdots,a_n)^T,\pmb{b}=(b_1,b_2,\cdots,b_m)^T$.

Proof:

Since the transpose of a scalar is the scalar itself, we have

\frac{\partial(\pmb{a}^T\pmb{X}^T\pmb{b})}{\partial\pmb{X}}=\frac{\partial(\pmb{a}^T\pmb{X}^T\pmb{b})^T}{\partial\pmb{X}}=\frac{\partial(\pmb{b}^T\pmb{X}\pmb{a})}{\partial\pmb{X}} \\\\ \tag{29}

By (26), with $\pmb{b}$ and $\pmb{a}$ taking the roles of $\pmb{a}$ and $\pmb{b}$ respectively:

\frac{\partial(\pmb{a}^T\pmb{X}^T\pmb{b})}{\partial\pmb{X}}=\frac{\partial(\pmb{b}^T\pmb{X}\pmb{a})}{\partial\pmb{X}} = \pmb{b}\pmb{a}^T \\\\ \tag{30}

Q.E.D.
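The only difference between (26) and (28) is which vector has length $m$ and which has length $n$, so shapes are worth checking explicitly. The sketch below assumes NumPy and a hypothetical helper `numerical_gradient`; neither is part of the original text.

```python
import numpy as np

def numerical_gradient(f, X, eps=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at X.
    Hypothetical helper for illustration; the result has the same shape as X."""
    grad = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        grad[idx] = (f(X + step) - f(X - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(6)
m, n = 3, 4
a = rng.standard_normal(n)   # a is n x 1 in (28)
b = rng.standard_normal(m)   # b is m x 1 in (28)
X = rng.standard_normal((m, n))

numeric = numerical_gradient(lambda M: a @ M.T @ b, X)
formula = np.outer(b, a)     # b a^T, an m x n matrix like X

transposed_bilinear_ok = np.allclose(numeric, formula, atol=1e-6)
print(transposed_bilinear_ok, formula.shape == X.shape)
```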

2.3

\frac{\partial( \pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})}{\partial{\pmb{X}}} = \pmb{a}\pmb{b}^T\pmb{X}+\pmb{b}\pmb{a}^T\pmb{X} \\\\ \tag{31}

where $\pmb{a}_{m \times 1},\pmb{b}_{m \times 1}$ are constant vectors, $\pmb{a}=(a_1,a_2,\cdots,a_m)^T,\pmb{b}=(b_1,b_2,\cdots,b_m)^T$.

Proof:

\begin{align} \frac{\partial( \pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})}{\partial{\pmb{X}}} &= \frac{\partial( [(a_1b_1)(x_{11}x_{11}+x_{12}x_{12}+\cdots+x_{1n}x_{1n})]+[(a_1b_2)(x_{11}x_{21}+x_{12}x_{22}+\cdots+x_{1n}x_{2n})]+\cdots+[(a_1b_m)(x_{11}x_{m1}+x_{12}x_{m2}+\cdots+x_{1n}x_{mn})] \\ +[(a_2b_1)(x_{21}x_{11}+x_{22}x_{12}+\cdots+x_{2n}x_{1n})]+[(a_2b_2)(x_{21}x_{21}+x_{22}x_{22}+\cdots+x_{2n}x_{2n})]+\cdots+[(a_2b_m)(x_{21}x_{m1}+x_{22}x_{m2}+\cdots+x_{2n}x_{mn})] \\ +\cdots \\ +[(a_mb_1)(x_{m1}x_{11}+x_{m2}x_{12}+\cdots+x_{mn}x_{1n})]+[(a_mb_2)(x_{m1}x_{21}+x_{m2}x_{22}+\cdots+x_{mn}x_{2n})]+\cdots+[(a_mb_m)(x_{m1}x_{m1}+x_{m2}x_{m2}+\cdots+x_{mn}x_{mn})] )}{\partial{\pmb{X}}} \\\\ &= \begin{bmatrix} \frac{\partial( [(a_1b_1)(x_{11}x_{11}+x_{12}x_{12}+\cdots+x_{1n}x_{1n})]+[(a_1b_2)(x_{11}x_{21}+x_{12}x_{22}+\cdots+x_{1n}x_{2n})]+\cdots+[(a_1b_m)(x_{11}x_{m1}+x_{12}x_{m2}+\cdots+x_{1n}x_{mn})] \\ +[(a_2b_1)(x_{21}x_{11}+x_{22}x_{12}+\cdots+x_{2n}x_{1n})]+[(a_2b_2)(x_{21}x_{21}+x_{22}x_{22}+\cdots+x_{2n}x_{2n})]+\cdots+[(a_2b_m)(x_{21}x_{m1}+x_{22}x_{m2}+\cdots+x_{2n}x_{mn})] \\ +\cdots \\ +[(a_mb_1)(x_{m1}x_{11}+x_{m2}x_{12}+\cdots+x_{mn}x_{1n})]+[(a_mb_2)(x_{m1}x_{21}+x_{m2}x_{22}+\cdots+x_{mn}x_{2n})]+\cdots+[(a_mb_m)(x_{m1}x_{m1}+x_{m2}x_{m2}+\cdots+x_{mn}x_{mn})] )}{\partial{x_{11}}} & \frac{\partial( [(a_1b_1)(x_{11}x_{11}+x_{12}x_{12}+\cdots+x_{1n}x_{1n})]+[(a_1b_2)(x_{11}x_{21}+x_{12}x_{22}+\cdots+x_{1n}x_{2n})]+\cdots+[(a_1b_m)(x_{11}x_{m1}+x_{12}x_{m2}+\cdots+x_{1n}x_{mn})] \\ +[(a_2b_1)(x_{21}x_{11}+x_{22}x_{12}+\cdots+x_{2n}x_{1n})]+[(a_2b_2)(x_{21}x_{21}+x_{22}x_{22}+\cdots+x_{2n}x_{2n})]+\cdots+[(a_2b_m)(x_{21}x_{m1}+x_{22}x_{m2}+\cdots+x_{2n}x_{mn})] \\ +\cdots \\ +[(a_mb_1)(x_{m1}x_{11}+x_{m2}x_{12}+\cdots+x_{mn}x_{1n})]+[(a_mb_2)(x_{m1}x_{21}+x_{m2}x_{22}+\cdots+x_{mn}x_{2n})]+\cdots+[(a_mb_m)(x_{m1}x_{m1}+x_{m2}x_{m2}+\cdots+x_{mn}x_{mn})] )}{\partial{x_{12}}} & \cdots & \frac{\partial( 
[(a_1b_1)(x_{11}x_{11}+x_{12}x_{12}+\cdots+x_{1n}x_{1n})]+[(a_1b_2)(x_{11}x_{21}+x_{12}x_{22}+\cdots+x_{1n}x_{2n})]+\cdots+[(a_1b_m)(x_{11}x_{m1}+x_{12}x_{m2}+\cdots+x_{1n}x_{mn})] \\ +[(a_2b_1)(x_{21}x_{11}+x_{22}x_{12}+\cdots+x_{2n}x_{1n})]+[(a_2b_2)(x_{21}x_{21}+x_{22}x_{22}+\cdots+x_{2n}x_{2n})]+\cdots+[(a_2b_m)(x_{21}x_{m1}+x_{22}x_{m2}+\cdots+x_{2n}x_{mn})] \\ +\cdots \\ +[(a_mb_1)(x_{m1}x_{11}+x_{m2}x_{12}+\cdots+x_{mn}x_{1n})]+[(a_mb_2)(x_{m1}x_{21}+x_{m2}x_{22}+\cdots+x_{mn}x_{2n})]+\cdots+[(a_mb_m)(x_{m1}x_{m1}+x_{m2}x_{m2}+\cdots+x_{mn}x_{mn})] )}{\partial{x_{1n}}} \\ \frac{\partial( [(a_1b_1)(x_{11}x_{11}+x_{12}x_{12}+\cdots+x_{1n}x_{1n})]+[(a_1b_2)(x_{11}x_{21}+x_{12}x_{22}+\cdots+x_{1n}x_{2n})]+\cdots+[(a_1b_m)(x_{11}x_{m1}+x_{12}x_{m2}+\cdots+x_{1n}x_{mn})] \\ +[(a_2b_1)(x_{21}x_{11}+x_{22}x_{12}+\cdots+x_{2n}x_{1n})]+[(a_2b_2)(x_{21}x_{21}+x_{22}x_{22}+\cdots+x_{2n}x_{2n})]+\cdots+[(a_2b_m)(x_{21}x_{m1}+x_{22}x_{m2}+\cdots+x_{2n}x_{mn})] \\ +\cdots \\ +[(a_mb_1)(x_{m1}x_{11}+x_{m2}x_{12}+\cdots+x_{mn}x_{1n})]+[(a_mb_2)(x_{m1}x_{21}+x_{m2}x_{22}+\cdots+x_{mn}x_{2n})]+\cdots+[(a_mb_m)(x_{m1}x_{m1}+x_{m2}x_{m2}+\cdots+x_{mn}x_{mn})] )}{\partial{x_{21}}} & \frac{\partial( [(a_1b_1)(x_{11}x_{11}+x_{12}x_{12}+\cdots+x_{1n}x_{1n})]+[(a_1b_2)(x_{11}x_{21}+x_{12}x_{22}+\cdots+x_{1n}x_{2n})]+\cdots+[(a_1b_m)(x_{11}x_{m1}+x_{12}x_{m2}+\cdots+x_{1n}x_{mn})] \\ +[(a_2b_1)(x_{21}x_{11}+x_{22}x_{12}+\cdots+x_{2n}x_{1n})]+[(a_2b_2)(x_{21}x_{21}+x_{22}x_{22}+\cdots+x_{2n}x_{2n})]+\cdots+[(a_2b_m)(x_{21}x_{m1}+x_{22}x_{m2}+\cdots+x_{2n}x_{mn})] \\ +\cdots \\ +[(a_mb_1)(x_{m1}x_{11}+x_{m2}x_{12}+\cdots+x_{mn}x_{1n})]+[(a_mb_2)(x_{m1}x_{21}+x_{m2}x_{22}+\cdots+x_{mn}x_{2n})]+\cdots+[(a_mb_m)(x_{m1}x_{m1}+x_{m2}x_{m2}+\cdots+x_{mn}x_{mn})] )}{\partial{x_{22}}} & \cdots &\frac{\partial( [(a_1b_1)(x_{11}x_{11}+x_{12}x_{12}+\cdots+x_{1n}x_{1n})]+[(a_1b_2)(x_{11}x_{21}+x_{12}x_{22}+\cdots+x_{1n}x_{2n})]+\cdots+[(a_1b_m)(x_{11}x_{m1}+x_{12}x_{m2}+\cdots+x_{1n}x_{mn})] \\ 
+[(a_2b_1)(x_{21}x_{11}+x_{22}x_{12}+\cdots+x_{2n}x_{1n})]+[(a_2b_2)(x_{21}x_{21}+x_{22}x_{22}+\cdots+x_{2n}x_{2n})]+\cdots+[(a_2b_m)(x_{21}x_{m1}+x_{22}x_{m2}+\cdots+x_{2n}x_{mn})] \\ +\cdots \\ +[(a_mb_1)(x_{m1}x_{11}+x_{m2}x_{12}+\cdots+x_{mn}x_{1n})]+[(a_mb_2)(x_{m1}x_{21}+x_{m2}x_{22}+\cdots+x_{mn}x_{2n})]+\cdots+[(a_mb_m)(x_{m1}x_{m1}+x_{m2}x_{m2}+\cdots+x_{mn}x_{mn})] )}{\partial{x_{2n}}} \\ \vdots & \vdots & \vdots & \vdots \\ \frac{\partial( [(a_1b_1)(x_{11}x_{11}+x_{12}x_{12}+\cdots+x_{1n}x_{1n})]+[(a_1b_2)(x_{11}x_{21}+x_{12}x_{22}+\cdots+x_{1n}x_{2n})]+\cdots+[(a_1b_m)(x_{11}x_{m1}+x_{12}x_{m2}+\cdots+x_{1n}x_{mn})] \\ +[(a_2b_1)(x_{21}x_{11}+x_{22}x_{12}+\cdots+x_{2n}x_{1n})]+[(a_2b_2)(x_{21}x_{21}+x_{22}x_{22}+\cdots+x_{2n}x_{2n})]+\cdots+[(a_2b_m)(x_{21}x_{m1}+x_{22}x_{m2}+\cdots+x_{2n}x_{mn})] \\ +\cdots \\ +[(a_mb_1)(x_{m1}x_{11}+x_{m2}x_{12}+\cdots+x_{mn}x_{1n})]+[(a_mb_2)(x_{m1}x_{21}+x_{m2}x_{22}+\cdots+x_{mn}x_{2n})]+\cdots+[(a_mb_m)(x_{m1}x_{m1}+x_{m2}x_{m2}+\cdots+x_{mn}x_{mn})] )}{\partial{x_{m1}}} & \frac{\partial( [(a_1b_1)(x_{11}x_{11}+x_{12}x_{12}+\cdots+x_{1n}x_{1n})]+[(a_1b_2)(x_{11}x_{21}+x_{12}x_{22}+\cdots+x_{1n}x_{2n})]+\cdots+[(a_1b_m)(x_{11}x_{m1}+x_{12}x_{m2}+\cdots+x_{1n}x_{mn})] \\ +[(a_2b_1)(x_{21}x_{11}+x_{22}x_{12}+\cdots+x_{2n}x_{1n})]+[(a_2b_2)(x_{21}x_{21}+x_{22}x_{22}+\cdots+x_{2n}x_{2n})]+\cdots+[(a_2b_m)(x_{21}x_{m1}+x_{22}x_{m2}+\cdots+x_{2n}x_{mn})] \\ +\cdots \\ +[(a_mb_1)(x_{m1}x_{11}+x_{m2}x_{12}+\cdots+x_{mn}x_{1n})]+[(a_mb_2)(x_{m1}x_{21}+x_{m2}x_{22}+\cdots+x_{mn}x_{2n})]+\cdots+[(a_mb_m)(x_{m1}x_{m1}+x_{m2}x_{m2}+\cdots+x_{mn}x_{mn})] )}{\partial{x_{m2}}} & \cdots & \frac{\partial( [(a_1b_1)(x_{11}x_{11}+x_{12}x_{12}+\cdots+x_{1n}x_{1n})]+[(a_1b_2)(x_{11}x_{21}+x_{12}x_{22}+\cdots+x_{1n}x_{2n})]+\cdots+[(a_1b_m)(x_{11}x_{m1}+x_{12}x_{m2}+\cdots+x_{1n}x_{mn})] \\ 
+[(a_2b_1)(x_{21}x_{11}+x_{22}x_{12}+\cdots+x_{2n}x_{1n})]+[(a_2b_2)(x_{21}x_{21}+x_{22}x_{22}+\cdots+x_{2n}x_{2n})]+\cdots+[(a_2b_m)(x_{21}x_{m1}+x_{22}x_{m2}+\cdots+x_{2n}x_{mn})] \\ +\cdots \\ +[(a_mb_1)(x_{m1}x_{11}+x_{m2}x_{12}+\cdots+x_{mn}x_{1n})]+[(a_mb_2)(x_{m1}x_{21}+x_{m2}x_{22}+\cdots+x_{mn}x_{2n})]+\cdots+[(a_mb_m)(x_{m1}x_{m1}+x_{m2}x_{m2}+\cdots+x_{mn}x_{mn})] )}{\partial{x_{mn}}} \\ \end{bmatrix}_{m \times n} \\\\ &= \begin{bmatrix} (a_1b_1x_{11}+a_1b_2x_{21}+\cdots+a_1b_mx_{m1})+(b_1a_1x_{11}+b_1a_2x_{21}+\cdots+b_1a_mx_{m1}) & (a_1b_1x_{12}+a_1b_2x_{22}+\cdots+a_1b_mx_{m2})+(b_1a_1x_{12}+b_1a_2x_{22}+\cdots+b_1a_mx_{m2}) & \cdots & (a_1b_1x_{1n}+a_1b_2x_{2n}+\cdots+a_1b_mx_{mn})+(b_1a_1x_{1n}+b_1a_2x_{2n}+\cdots+b_1a_mx_{mn}) \\ (a_2b_1x_{11}+a_2b_2x_{21}+\cdots+a_2b_mx_{m1})+(b_2a_1x_{11}+b_2a_2x_{21}+\cdots+b_2a_mx_{m1}) & (a_2b_1x_{12}+a_2b_2x_{22}+\cdots+a_2b_mx_{m2})+(b_2a_1x_{12}+b_2a_2x_{22}+\cdots+b_2a_mx_{m2}) & \cdots & (a_2b_1x_{1n}+a_2b_2x_{2n}+\cdots+a_2b_mx_{mn})+(b_2a_1x_{1n}+b_2a_2x_{2n}+\cdots+b_2a_mx_{mn}) \\ \vdots & \vdots & \vdots & \vdots \\ (a_mb_1x_{11}+a_mb_2x_{21}+\cdots+a_mb_mx_{m1})+(b_ma_1x_{11}+b_ma_2x_{21}+\cdots+b_ma_mx_{m1}) & (a_mb_1x_{12}+a_mb_2x_{22}+\cdots+a_mb_mx_{m2})+(b_ma_1x_{12}+b_ma_2x_{22}+\cdots+b_ma_mx_{m2}) & \cdots & (a_mb_1x_{1n}+a_mb_2x_{2n}+\cdots+a_mb_mx_{mn})+(b_ma_1x_{1n}+b_ma_2x_{2n}+\cdots+b_ma_mx_{mn}) \end{bmatrix} \\\\ &= \begin{bmatrix} a_1b_1x_{11}+a_1b_2x_{21}+\cdots+a_1b_mx_{m1} & a_1b_1x_{12}+a_1b_2x_{22}+\cdots+a_1b_mx_{m2} & \cdots & a_1b_1x_{1n}+a_1b_2x_{2n}+\cdots+a_1b_mx_{mn} \\ a_2b_1x_{11}+a_2b_2x_{21}+\cdots+a_2b_mx_{m1} & a_2b_1x_{12}+a_2b_2x_{22}+\cdots+a_2b_mx_{m2} & \cdots & a_2b_1x_{1n}+a_2b_2x_{2n}+\cdots+a_2b_mx_{mn} \\ \vdots & \vdots & \vdots & \vdots \\ a_mb_1x_{11}+a_mb_2x_{21}+\cdots+a_mb_mx_{m1} & a_mb_1x_{12}+a_mb_2x_{22}+\cdots+a_mb_mx_{m2} & \cdots & a_mb_1x_{1n}+a_mb_2x_{2n}+\cdots+a_mb_mx_{mn} \end{bmatrix} + \begin{bmatrix} 
b_1a_1x_{11}+b_1a_2x_{21}+\cdots+b_1a_mx_{m1} & b_1a_1x_{12}+b_1a_2x_{22}+\cdots+b_1a_mx_{m2} & \cdots & b_1a_1x_{1n}+b_1a_2x_{2n}+\cdots+b_1a_mx_{mn} \\ b_2a_1x_{11}+b_2a_2x_{21}+\cdots+b_2a_mx_{m1} & b_2a_1x_{12}+b_2a_2x_{22}+\cdots+b_2a_mx_{m2} & \cdots & b_2a_1x_{1n}+b_2a_2x_{2n}+\cdots+b_2a_mx_{mn} \\ \vdots & \vdots & \vdots & \vdots \\ b_ma_1x_{11}+b_ma_2x_{21}+\cdots+b_ma_mx_{m1} & b_ma_1x_{12}+b_ma_2x_{22}+\cdots+b_ma_mx_{m2} & \cdots & b_ma_1x_{1n}+b_ma_2x_{2n}+\cdots+b_ma_mx_{mn} \end{bmatrix} \\\\ &= \begin{bmatrix} a_1b_1 & a_1b_2 & \cdots & a_1b_m \\ a_2b_1 & a_2b_2 & \cdots & a_2b_m \\ \vdots & \vdots & \vdots & \vdots \\ a_mb_1 & a_mb_2 & \cdots & a_mb_m \end{bmatrix} \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \vdots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix} + \begin{bmatrix} b_1a_1 & b_1a_2 & \cdots & b_1a_m \\ b_2a_1 & b_2a_2 & \cdots & b_2a_m \\ \vdots & \vdots & \vdots & \vdots \\ b_ma_1 & b_ma_2 & \cdots & b_ma_m \end{bmatrix} \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \vdots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix} \\\\ &= \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix} [b_1, b_2, \cdots, b_m] \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \vdots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} [a_1, a_2, \cdots, a_m] \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \vdots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix} \\\\ &= \pmb{a}\pmb{b}^T\pmb{X}+\pmb{b}\pmb{a}^T\pmb{X} \end{align} \\\\ \tag{32}

Q.E.D.
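Because the element-wise proof of (31) is long, a numerical check is a cheap safeguard against index mistakes. This is only a sketch, assuming NumPy, random test data, and a hypothetical helper `numerical_gradient` that is not part of the original text.

```python
import numpy as np

def numerical_gradient(f, X, eps=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at X.
    Hypothetical helper for illustration; the result has the same shape as X."""
    grad = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        grad[idx] = (f(X + step) - f(X - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(7)
m, n = 3, 4
a = rng.standard_normal(m)
b = rng.standard_normal(m)
X = rng.standard_normal((m, n))

numeric = numerical_gradient(lambda M: a @ M @ M.T @ b, X)

# Right-hand side of (31): a b^T X + b a^T X
formula = np.outer(a, b) @ X + np.outer(b, a) @ X

xxt_ok = np.allclose(numeric, formula, atol=1e-6)
print(xxt_ok)
```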

2.4

\frac{\partial( \pmb{a}^T\pmb{X}^T\pmb{X}\pmb{b})}{\partial{\pmb{X}}} = \pmb{X}\pmb{b}\pmb{a}^T+\pmb{X}\pmb{a}\pmb{b}^T \\\\ \tag{33}

where $\pmb{a}_{n \times 1},\pmb{b}_{n \times 1}$ are constant vectors, $\pmb{a}=(a_1,a_2,\cdots,a_n)^T,\pmb{b}=(b_1,b_2,\cdots,b_n)^T$.

Proof:

First, recall equation (Essence-9) from the Essence Part:

\begin{align*} \text{D}_{\pmb{X}}f(\pmb{X})&= \frac{\partial f(\pmb{X})}{\partial \pmb{X}^T_{m\times n}} \\\\ &= \left[ \matrix{ \frac{\partial f}{\partial x_{11}}&\frac{\partial f}{\partial x_{21}}&\cdots&\frac{\partial f}{\partial x_{m1}} \\ \frac{\partial f}{\partial x_{12}}&\frac{\partial f}{\partial x_{22}}& \cdots & \frac{\partial f}{\partial x_{m2}}\\ \vdots&\vdots&\vdots&\vdots\\ \frac{\partial f} {\partial x_{1n}}&\frac{\partial f}{\partial x_{2n}}&\cdots&\frac{\partial f}{\partial x_{mn}} } \right]_{n\times m} \end{align*} \\\\ \tag{Essence-9}

Next, recall equation (Essence-11) from the Essence Part:

\begin{align*} \nabla_{\pmb{X}}f(\pmb{X})&= \frac{\partial f(\pmb{X})}{\partial \pmb{X}_{m\times n}} \\\\ &= \left[ \matrix{ \frac{\partial f}{\partial x_{11}}&\frac{\partial f}{\partial x_{12}}&\cdots&\frac{\partial f}{\partial x_{1n}} \\ \frac{\partial f}{\partial x_{21}}&\frac{\partial f}{\partial x_{22}}& \cdots & \frac{\partial f}{\partial x_{2n}}\\ \vdots&\vdots&\vdots&\vdots\\ \frac{\partial f} {\partial x_{m1}}&\frac{\partial f}{\partial x_{m2}}&\cdots&\frac{\partial f}{\partial x_{mn}} } \right]_{m\times n} \end{align*} \\\\ \tag{Essence-11}

As summarized in Section III.2.5.1 of the Essence Part, these two results are transposes of each other, that is:

\frac{\partial f(\pmb{X})}{\partial{\pmb{X}}^T_{m\times n}} = \left(\frac{\partial f(\pmb{X})}{\partial{\pmb{X}_{m\times n}}}\right)^T \\\\ \tag{34}

So, writing the matrix variable in the denominator of (31) as its transpose, we get:

\begin{align} \frac{\partial( \pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})}{\partial{\pmb{X}}^T} &= \left(\frac{\partial( \pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})}{\partial{\pmb{X}}}\right)^T \\\\ &= (\pmb{a}\pmb{b}^T\pmb{X}+\pmb{b}\pmb{a}^T\pmb{X})^T \\\\ &= \pmb{X}^T\pmb{b}\pmb{a}^T+\pmb{X}^T\pmb{a}\pmb{b}^T \end{align} \\\\ \tag{35}

We rewrite (33) in the following form:

\frac{\partial( \pmb{a}^T\pmb{X}^T\pmb{X}\pmb{b})}{\partial{\pmb{X}}} =\frac{\partial( \pmb{a}^T(\pmb{X}^T)(\pmb{X}^T)^T\pmb{b})}{\partial{(\pmb{X}}^T)^T} \\\\ \tag{36}

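Then we apply (35) to (36). Formula (35) holds for any matrix variable, so here $\pmb{X}^T$ takes the role that $\pmb{X}$ plays in (35):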

\begin{align} \frac{\partial( \pmb{a}^T\pmb{X}^T\pmb{X}\pmb{b})}{\partial{\pmb{X}}} &=\frac{\partial( \pmb{a}^T(\pmb{X}^T)(\pmb{X}^T)^T\pmb{b})}{\partial{(\pmb{X}}^T)^T} \\\\ &= (\pmb{X}^T)^T\pmb{b}\pmb{a}^T+(\pmb{X}^T)^T\pmb{a}\pmb{b}^T \\\\ &= \pmb{X}\pmb{b}\pmb{a}^T+\pmb{X}\pmb{a}\pmb{b}^T \end{align} \\\\ \tag{37}

Q.E.D.
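Since the proof of (33) goes through a transposition argument rather than a direct expansion, it is reassuring to confirm the result independently. The sketch below assumes NumPy and a hypothetical helper `numerical_gradient`, neither of which appears in the original text, and compares a finite-difference gradient with the right-hand side of (33).

```python
import numpy as np

def numerical_gradient(f, X, eps=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at X.
    Hypothetical helper for illustration; the result has the same shape as X."""
    grad = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        grad[idx] = (f(X + step) - f(X - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(8)
m, n = 3, 4
a = rng.standard_normal(n)
b = rng.standard_normal(n)
X = rng.standard_normal((m, n))

numeric = numerical_gradient(lambda M: a @ M.T @ M @ b, X)

# Right-hand side of (33): X b a^T + X a b^T
formula = X @ np.outer(b, a) + X @ np.outer(a, b)

xtx_ok = np.allclose(numeric, formula, atol=1e-6)
print(xtx_ok)
```

Note the contrast with (31): there the rank-one matrices multiply $\pmb{X}$ from the left, here from the right, which is what the shapes ($m \times m$ versus $n \times n$) require.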

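III. Conclusion

That concludes this article. As you have probably noticed, deriving the last few formulas directly from the definition is tedious and error-prone.

So, in the next article, we will introduce more advanced techniques for differentiating real-valued scalar functions of a vector variable and of a matrix variable: the trace of a matrix $\mathrm{tr}(\pmb{A})$ and the first-order real matrix differential $\mathrm{d}\pmb{X}$, which greatly simplify the derivations.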

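Other articles in the matrix differentiation series:

Differentiation with respect to symmetric matrices, using maximum likelihood estimation of the multivariate normal distribution as an example (Matrix Differentiation: Supplementary Part) - article by Iterator - Zhihu

Mathematical derivation of matrix derivative formulas (Matrix Differentiation: Advanced Part) - article by Iterator - Zhihu

The essence of matrix differentiation and of the numerator and denominator layouts (Matrix Differentiation: Essence Part) - article by Iterator - Zhihu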

References

  1. 张贤达 (Zhang Xianda), 《矩阵分析与应用(第二版)》 (Matrix Analysis and Applications, 2nd ed.), p. 147
