Regularized learning algorithm
Several problems can arise when training a machine learning algorithm.
Underfitting problem
The polynomial degree of the features is too low to fit the target training set: the model cannot even perform well on the training set, let alone on future data.
Overfitting problem
The polynomial degree of the features is too high: the model fits the training set very well but fails to predict future data.
Regularization can ameliorate or reduce the overfitting problem.
Notation
- $m$ is the number of training examples.
- $n$ is the number of features.
- $\lambda$ is the regularization (penalty) parameter that shrinks the effect of high-degree polynomial features: the larger $\lambda$ is, the smaller their effect, but the learning algorithm underfits if $\lambda$ is too high.
Regularized linear regression
Cost function
$$
J(\theta)=\frac{1}{2m}\left[ \sum^{m}_{i=1} (h_{\theta}(x^{(i)})-y^{(i)})^2 + \lambda \sum^{n}_{j=1} \theta^2_j \right]
$$
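As a minimal NumPy sketch (the names `linear_cost` and `lam`, and the convention that `X` is an $m \times (n+1)$ design matrix whose first column is all ones, are my assumptions, not from the notes), this cost can be computed as:

```python
import numpy as np

def linear_cost(theta, X, y, lam):
    """Regularized linear-regression cost J(theta).

    X : (m, n+1) design matrix, first column all ones (x_0 = 1)
    y : (m,) target vector
    lam : regularization parameter lambda; theta_0 is not penalized.
    """
    m = len(y)
    errors = X @ theta - y                  # h_theta(x^(i)) - y^(i) for every i
    penalty = lam * np.sum(theta[1:] ** 2)  # lambda * sum_j theta_j^2, j >= 1
    return (errors @ errors + penalty) / (2 * m)
```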
Gradient descent
$$
\begin{aligned}
\theta_0 &:=\theta_0-\alpha \frac{1}{m}\sum^{m}_{i=1}(h_{\theta}(x^{(i)})-y^{(i)})x^{(i)}_0 \\
\theta_j &:=\theta_j-\alpha \left[\frac{1}{m}\sum^{m}_{i=1}(h_{\theta}(x^{(i)})-y^{(i)})x^{(i)}_j + \frac{\lambda}{m}\theta_j \right] \quad (j=1,2,\dots,n)
\end{aligned}
$$
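A sketch of one update step under the same assumptions as above; computing the full gradient vector first and only then updating `theta` applies both rules (for $j=0$ and $j \ge 1$) simultaneously, as gradient descent requires:

```python
import numpy as np

def linear_gd_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent update for linear regression."""
    m = len(y)
    errors = X @ theta - y              # residuals h_theta(x^(i)) - y^(i)
    grad = (X.T @ errors) / m           # (1/m) * sum_i errors_i * x^(i)_j, all j
    grad[1:] += (lam / m) * theta[1:]   # add (lambda/m) * theta_j for j >= 1 only
    return theta - alpha * grad
```

Repeating `linear_gd_step` for a fixed number of iterations, or until $J(\theta)$ stops decreasing, yields the fitted parameters.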
Regularized logistic regression
Cost function
$$
J(\theta)=-\left[ \frac{1}{m} \sum^{m}_{i=1} y^{(i)} \log h_\theta(x^{(i)})+(1-y^{(i)}) \log(1-h_\theta(x^{(i)})) \right] + \frac{\lambda}{2m}\sum^{n}_{j=1} \theta^2_j
$$
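A matching sketch under the same assumed design-matrix convention; the hypothesis is the sigmoid $h_\theta$ given at the end of this section:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, lam):
    """Regularized logistic-regression cost; y holds 0/1 labels."""
    m = len(y)
    h = sigmoid(X @ theta)  # h_theta(x^(i)) for every example
    cross_entropy = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    penalty = (lam / (2 * m)) * np.sum(theta[1:] ** 2)  # theta_0 excluded
    return cross_entropy + penalty
```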
Gradient descent
$$
\begin{aligned}
\theta_0 &:=\theta_0-\alpha \frac{1}{m}\sum^{m}_{i=1}(h_{\theta}(x^{(i)})-y^{(i)})x^{(i)}_0 \\
\theta_j &:=\theta_j-\alpha \left[\frac{1}{m}\sum^{m}_{i=1}(h_{\theta}(x^{(i)})-y^{(i)})x^{(i)}_j + \frac{\lambda}{m}\theta_j \right] \quad (j=1,2,\dots,n)
\end{aligned}
$$

where the hypothesis is the sigmoid function:

$$
h_{\theta}(x)=\frac{1}{1+e^{-\theta^{T}x}}
$$
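One update step, again as a hedged sketch: it is textually identical to the linear-regression step, because only the hypothesis $h_\theta$ changes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gd_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent update for logistic regression."""
    m = len(y)
    errors = sigmoid(X @ theta) - y     # h_theta(x^(i)) - y^(i), now with sigmoid h
    grad = (X.T @ errors) / m
    grad[1:] += (lam / m) * theta[1:]   # theta_0 stays unregularized
    return theta - alpha * grad
```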