## Premise

1. $X$ is an $m \times n$ matrix ($m$ rows, $n$ columns) that represents the training set: one row per training example, one column per feature.
2. $\theta$ is a $1 \times n$ vector of hypothesis parameters.
3. $y$ is an $m \times 1$ vector of the true values (labels) of the training set.
4. $\alpha$ is the learning rate, which sets how large each learning (descent) step is.

# 1. Hypothesis

Define the hypothesis, that is, the model's prediction for a given input.
Because the output of a classification problem must range from 0 to 1, we pass the linear combination of the inputs through the sigmoid function.
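
As a sketch of this step: the standard sigmoid is $g(z) = \frac{1}{1 + e^{-z}}$, and with the dimensions from the premise the hypothesis for the whole training set can be written as $h_\theta(X) = g(X\theta^T)$. The snippet below evaluates it for a single example using plain Python lists. The names `sigmoid` and `hypothesis` and the sample values are illustrative assumptions, not part of the original notes.

```python
import math


def sigmoid(z):
    """Map any real number z into the interval (0, 1)."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def hypothesis(theta, x):
    """h_theta(x) = g(theta . x) for one training example x with n features."""
    z = sum(t * v for t, v in zip(theta, x))
    return sigmoid(z)


theta = [0.0, 1.0, -1.0]  # illustrative parameter values only
x = [1.0, 2.0, 0.5]       # one row of X; the first entry acts as an intercept term
print(hypothesis(theta, x))
```

The output can be read as the estimated probability that the example belongs to class 1.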

# 2. Cost

Calculate the cost for a single training point.
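
Assuming the usual log-loss (cross-entropy) cost for logistic regression, the cost of one example can be written as $\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$, which reduces to $-\log(h_\theta(x))$ when $y = 1$ and to $-\log(1 - h_\theta(x))$ when $y = 0$. A minimal sketch follows; the function name `cost_single` and the clipping constant are assumptions made for illustration.

```python
import math


def cost_single(h, y):
    """Log-loss for one example: h is the predicted probability, y is 0 or 1."""
    eps = 1e-15  # assumed clipping constant so math.log never sees 0
    h = min(max(h, eps), 1.0 - eps)
    return -y * math.log(h) - (1 - y) * math.log(1.0 - h)


print(cost_single(0.9, 1))  # confident and correct: small cost
print(cost_single(0.9, 0))  # confident and wrong: large cost
```

The cost grows without bound as the prediction moves toward the wrong label, which is what penalizes confident mistakes.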

# 3. Cost function

Define the cost function by aggregating the single-point cost over the whole training set.
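
Under the same log-loss assumption, the cost function is commonly taken as the average over all $m$ training examples: $J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)})$. The sketch below computes it with plain Python lists; `cost_function` and the tiny data set are hypothetical and only serve to make the example runnable.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def cost_function(theta, X, y):
    """Average log-loss J(theta) over all m rows of X."""
    m = len(X)
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, x_i)))
        h = min(max(h, 1e-15), 1.0 - 1e-15)  # keep math.log finite
        total += -y_i * math.log(h) - (1 - y_i) * math.log(1.0 - h)
    return total / m


# Made-up data: the first column is an intercept term, the second is one feature.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 1]
print(cost_function([0.0, 0.0], X, y))
```

With all parameters at zero every prediction is 0.5, so the printed value equals $\log 2$; a useful sanity check before training.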

# 4. Get optimized parameter

Learn from the training set to find the optimized parameters for the proposed algorithm.
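
One common way to do this, and the one suggested by the learning rate $\alpha$ in the premise, is batch gradient descent. Assuming the log-loss cost, the update in the notation above is $\theta := \theta - \frac{\alpha}{m} (g(X\theta^T) - y)^T X$, repeated until the parameters settle. The sketch below is one possible implementation with plain Python lists; the function names, the default `alpha` and `iterations`, and the data set are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(theta, x):
    """Predicted probability of class 1 for one example x."""
    return sigmoid(sum(t * v for t, v in zip(theta, x)))


def gradient_descent(X, y, alpha=0.1, iterations=5000):
    """Batch gradient descent on the average log-loss, starting from theta = 0."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        # errors[i] = h_theta(x_i) - y_i
        errors = [predict(theta, x_i) - y_i for x_i, y_i in zip(X, y)]
        # gradient[j] = (1/m) * sum_i errors[i] * X[i][j]
        gradient = [
            sum(e * x_i[j] for e, x_i in zip(errors, X)) / m for j in range(n)
        ]
        # Update every parameter simultaneously.
        theta = [t - alpha * g for t, g in zip(theta, gradient)]
    return theta


# Made-up, slightly overlapping data: intercept column plus one feature.
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0],
     [1.0, 2.5], [1.0, 3.0], [1.0, 3.5], [1.0, 4.0]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

theta = gradient_descent(X, y)
print(theta)
print(predict(theta, [1.0, 0.5]), predict(theta, [1.0, 4.0]))
```

A larger $\alpha$ takes bigger steps and may overshoot, while a smaller one converges more slowly; the values used here are only a starting point for experimentation.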