Softplus

Main article: Rectifier (neural networks).

In mathematics and machine learning, the softplus function is the real-valued function

\operatorname {softplus} (x)=\ln(1+\mathrm {e} ^{x}).

It is a smooth (indeed analytic) approximation of the ramp function, which is known in machine learning as the rectifier or ReLU (rectified linear unit).

The names softplus[1][2] and SmoothReLU[3] are used in machine learning. The name "softplus" (2000), by analogy with the earlier name softmax (1989), presumably comes from the fact that the function is a smooth (soft) approximation of the positive part of $x$, which is sometimes denoted with a superscript plus, $x^{+}:=\max(0,x)$.

Properties and alternative forms

The softplus function is strictly positive. For large negative values of $x$, $\operatorname {softplus} (x)\approx \ln 1=0$, and for large positive values of $x$, $\ln(1+\mathrm {e} ^{x})\sim x$. It therefore approximates the ramp function from above.
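Evaluating $\ln(1+\mathrm {e} ^{x})$ literally overflows in floating point for large positive $x$. The following is a minimal sketch, not a reference implementation, that relies on the identity $\operatorname {softplus} (x)=\max(0,x)+\ln(1+\mathrm {e} ^{-|x|})$, which also makes the positive gap above the ramp explicit. The function name `softplus` and the sample points are illustrative choices.

```python
import math

def softplus(x: float) -> float:
    """Softplus ln(1 + e^x), evaluated without overflow."""
    # max(x, 0) is the ramp; log1p(exp(-|x|)) is the gap above it,
    # which never exceeds ln 2 and vanishes as |x| grows.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

for x in (-30.0, -1.0, 0.0, 1.0, 30.0, 1000.0):
    ramp = max(x, 0.0)
    print(f"x={x:8.1f}  softplus={softplus(x):.6g}  ramp={ramp:.6g}")
```

The argument of the exponential is never positive here, so `math.exp` can underflow to zero but cannot raise an overflow error.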

This function can be approximated by:

\ln \left(1+\mathrm {e} ^{x}\right)\approx {\begin{cases}\ln 2,&x=0,\\[6pt]{\dfrac {x}{1-\mathrm {e} ^{-x/\ln 2}}},&x\neq 0\end{cases}}

By the change of variables $x=y\ln(2)$, this is equivalent to

\log _{2}(1+2^{y})\approx {\begin{cases}1,&y=0,\\[6pt]{\dfrac {y}{1-\mathrm {e} ^{-y}}},&y\neq 0.\end{cases}}
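To get a feel for the quality of this approximation, the sketch below compares it with softplus on a grid. The helper names `softplus` and `softplus_approx`, the interval $[-10,10]$ and the grid step are assumptions made for illustration only.

```python
import math

LN2 = math.log(2.0)

def softplus(x: float) -> float:
    """Softplus ln(1 + e^x), evaluated without overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def softplus_approx(x: float) -> float:
    """Approximation x / (1 - exp(-x / ln 2)), with value ln 2 at x = 0."""
    if x == 0.0:
        return LN2  # limiting value at the removable singularity
    # 1 - exp(t) == -expm1(t); expm1 stays accurate when x is near zero.
    return x / -math.expm1(-x / LN2)

# Largest absolute error on a uniform grid over [-10, 10].
grid = [i / 100.0 for i in range(-1000, 1001)]
max_err = max(abs(softplus(x) - softplus_approx(x)) for x in grid)
print(f"max |softplus - approx| on [-10, 10]: {max_err:.4f}")
```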

A sharpness parameter $k$ may be added:

\operatorname {softplus} _{k}(x)={\frac {\ln(1+\mathrm {e} ^{kx})}{k}},\qquad \qquad \operatorname {softplus} _{k}'(x)={\frac {\mathrm {e} ^{kx}}{1+\mathrm {e} ^{kx}}}={\frac {1}{1+\mathrm {e} ^{-kx}}}.

The function $\operatorname {softplus} _{k}$ approximates the ramp function more closely as the parameter $k$ takes larger positive values.
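The effect of the sharpness parameter can be illustrated numerically. This is a sketch under the assumption $k>0$; the name `softplus_k` and the chosen values of $k$ are illustrative. The gap to the ramp is largest at $x=0$, where it equals $\ln(2)/k$.

```python
import math

def softplus_k(x: float, k: float) -> float:
    """Softplus with sharpness k > 0: ln(1 + e^{k x}) / k."""
    if k <= 0:
        raise ValueError("k must be positive")
    z = k * x
    # Same overflow-free form as for plain softplus, applied to k * x.
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / k

for k in (1.0, 10.0, 100.0):
    # At x = 0 the ramp is 0, so the printed value is the gap itself.
    print(f"k={k:6.1f}  softplus_k(0) = {softplus_k(0.0, k):.6f}")
```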

Related functions

The derivative of the softplus function is the standard logistic function:

\operatorname {softplus} '(x)={\frac {\mathrm {e} ^{x}}{1+\mathrm {e} ^{x}}}={\frac {1}{1+\mathrm {e} ^{-x}}}

which is known to be a smooth approximation of the Heaviside step function.
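The derivative identity can be checked numerically with a central finite difference. The sketch below is only a sanity check, not a proof; the helper names and the step size `h` are assumptions chosen for illustration.

```python
import math

def softplus(x: float) -> float:
    """Softplus ln(1 + e^x), evaluated without overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def logistic(x: float) -> float:
    """Standard logistic function 1 / (1 + e^{-x}), safe for any x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def central_difference(f, x: float, h: float = 1e-5) -> float:
    """Second-order finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    numeric = central_difference(softplus, x)
    print(f"x={x:5.1f}  finite difference={numeric:.6f}  logistic={logistic(x):.6f}")
```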

LogSumExp

Main article: LogSumExp.

A multivariable generalization of the softplus function is the LogSumExp function with the first argument set to zero:

\operatorname {LSE_{0}} ^{+}(x_{1},\dots ,x_{n}):=\operatorname {LSE} (0,x_{1},\dots ,x_{n})=\ln \left(1+\sum _{k=1}^{n}\exp(x_{k})\right).

The LogSumExp function is defined by

\operatorname {LSE} (x_{1},\dots ,x_{n})=\ln \left(\sum _{k=1}^{n}\exp(x_{k})\right),

and its gradient is the softmax; the softmax with the first argument set to zero is the multivariable generalization of the logistic function. Both LogSumExp and softmax are also used in machine learning.
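The relationships above can be sketched in a few lines of plain Python. The function names `logsumexp`, `softmax` and `lse0_plus` are illustrative, and shifting by the maximum is a standard stabilization trick assumed here rather than something stated in the text.

```python
import math

def logsumexp(xs):
    """ln(sum_k exp(x_k)), shifted by max(xs) to avoid overflow."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def softmax(xs):
    """Gradient of logsumexp: exp(x_k) / sum_j exp(x_j)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def lse0_plus(xs):
    """LogSumExp with an extra first argument fixed to zero."""
    return logsumexp([0.0, *xs])

x = 1.5
# With a single argument, LSE_0^+ reduces to softplus(x) = ln(1 + e^x).
print(lse0_plus([x]), math.log1p(math.exp(x)))
# The softmax of (0, x) has the logistic function of x as second component.
print(softmax([0.0, x])[1], 1.0 / (1.0 + math.exp(-x)))
```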

Convex conjugate

The convex conjugate (more precisely, the Legendre transform) of the softplus function is the negative binary entropy (with base $e$). This follows from the definition of the Legendre transform, under which the derivatives of a function and of its transform are inverse functions of each other: the derivative of softplus is the logistic function, whose inverse is the logit, which is the derivative of the negative binary entropy.
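The computation behind this statement can be written out explicitly. Writing $f=\operatorname {softplus}$ and $f^{*}$ for its conjugate (notation introduced here only for illustration), for $0<p<1$ the supremum is attained where $f'(x)=p$, that is, at $x=\ln {\frac {p}{1-p}}$, where $\ln(1+\mathrm {e} ^{x})=-\ln(1-p)$:

f^{*}(p)=\sup _{x}\left(px-\ln(1+\mathrm {e} ^{x})\right)=p\ln {\frac {p}{1-p}}+\ln(1-p)=p\ln p+(1-p)\ln(1-p).

The right-hand side is the negative of the binary entropy $-p\ln p-(1-p)\ln(1-p)$.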

Softplus can be interpreted as logistic loss (as a positive number), so, by duality, minimizing logistic loss corresponds to maximizing entropy. This justifies the principle of maximum entropy as loss minimization.
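One common way to make this reading concrete, assuming labels $y\in \{-1,+1\}$ and a real-valued score $z$ (conventions that are not stated above), is to write the logistic loss as $\operatorname {softplus} (-yz)$. The sketch below checks numerically that this agrees with the negative log-likelihood of the label under the prediction $p=1/(1+\mathrm {e} ^{-z})$; all function names are illustrative.

```python
import math

def softplus(x: float) -> float:
    """Softplus ln(1 + e^x), evaluated without overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def logistic(x: float) -> float:
    """Standard logistic function 1 / (1 + e^{-x})."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def logistic_loss(y: int, z: float) -> float:
    """Logistic loss for a label y in {-1, +1} and a score z."""
    return softplus(-y * z)

def cross_entropy(y: int, z: float) -> float:
    """Negative log-likelihood of y under the prediction p = logistic(z)."""
    p = logistic(z)
    return -math.log(p) if y == 1 else -math.log(1.0 - p)

for y, z in ((1, 2.0), (-1, 2.0), (1, -0.5), (-1, -0.5)):
    print(f"y={y:+d} z={z:+.1f}  softplus form={logistic_loss(y, z):.6f}  "
          f"cross-entropy={cross_entropy(y, z):.6f}")
```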

References

This article is partially or entirely derived from the English Wikipedia article titled "Softplus" (see the list of authors).

[1] Charles Dugas, Yoshua Bengio, François Bélisle, Claude Nadeau and René Garcia, "Incorporating second-order functional knowledge for better option pricing", Proceedings of the 13th International Conference on Neural Information Processing Systems (NIPS'00), MIT Press, 2000, pp. 451–457:
"Since the sigmoid h has a positive first derivative, its primitive, which we call softplus, is convex."
[2] Xavier Glorot, Antoine Bordes and Yoshua Bengio, "Deep Sparse Rectifier Neural Networks", Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings, 14 June 2011, pp. 315–323:
"Rectifier and softplus activation functions. The second one is a smooth version of the first."
[3] Cihang Xie, Mingxing Tan, Boqing Gong, Alan Yuille and Quoc V. Le, "Smooth Adversarial Training".

