- $X$ is an $m \times n$ matrix, with $m$ rows and $n$ columns; it represents the training set.
- $\theta$ is a $1 \times n$ vector of hypothesis parameters.
- $y$ is an $m \times 1$ vector of the true values of the training set.
- $\alpha$ is the learning rate, which controls how fast gradient descent learns (descends).
- $S(X_j)$ denotes the standard deviation of the $j$-th feature of the training set.
1. Hypothesis
Define the hypothesis that models a pattern.
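For linear regression, a common form of the hypothesis (assuming the $1 \times n$ parameter vector $\theta$ defined above and a feature column vector $x$) is:

```latex
h_\theta(x) = \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n = \theta x
```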
2. Cost
Calculate the cost of a single training point.
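With the hypothesis above, the cost of one training point $(x^{(i)}, y^{(i)})$ is usually taken as the squared error:

```latex
\mathrm{Cost}^{(i)} = \frac{1}{2}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2
```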
3. Cost function
Derive the cost function by iterating over the whole training set.
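Summing the single-point cost over all $m$ training points (in the notation above) gives the full cost function:

```latex
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2
```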
4. Get optimized parameters
Learn from the training set to obtain the optimized parameters for the proposed algorithm.
Gradient descent: more complicated to implement, but suitable for any scenario.
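A minimal NumPy sketch of batch gradient descent, assuming the $m \times n$ matrix $X$, the target vector $y$, and the learning rate $\alpha$ defined above (the function and variable names here are illustrative, not from the original notes):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, iters=1000):
    """Batch gradient descent for linear regression.

    X: (m, n) training matrix, y: (m,) target vector.
    alpha is the learning rate; iters is the iteration count.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        error = X @ theta - y       # (m,) residuals h(x) - y
        grad = (X.T @ error) / m    # gradient of J(theta)
        theta -= alpha * grad       # simultaneous parameter update
    return theta
```

Note that every parameter is updated simultaneously from the same residual vector, which is the standard form of the update rule.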
Normal equation: convenient, but performance degrades once $m$ grows larger than about 100,000. It also cannot handle a non-invertible $X^T X$.
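A minimal sketch of the normal equation in NumPy; using the pseudo-inverse `np.linalg.pinv` instead of a plain inverse sidesteps (rather than truly solves) the non-invertibility issue:

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form solution theta = (X^T X)^+ X^T y."""
    # pinv returns the pseudo-inverse, so a singular X^T X
    # does not raise an error.
    return np.linalg.pinv(X.T @ X) @ X.T @ y
```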
Use feature scaling to normalize the training set; it makes gradient descent converge much faster.
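One common form of feature scaling is standardization: subtract each feature's mean and divide by its standard deviation $S(X_j)$. A sketch (names are illustrative):

```python
import numpy as np

def feature_scale(X):
    """Standardize each column of X: (x - mean) / std."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant features
    return (X - mu) / sigma, mu, sigma
```

The returned `mu` and `sigma` should be kept so that new inputs can be scaled identically before prediction.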