I finished the Machine Learning course by Andrew Ng in November 2018 and earned the certificate on Coursera. The course lasts 11 weeks and covers topics such as linear regression, logistic regression, neural networks, SVMs, etc.
The full lecture notes will be posted in a few days, since the originals were handwritten in my notebook. In compliance with the Coursera Honor Code, I won’t provide solutions to quizzes or assignments on this blog.
Week 1 Introduction
Lecture 1: Introduction
Grew out of work in A.I.
New capability for computers.
- Database mining: large datasets from the growth of automation/web (web click data)
- Applications that can’t be programmed by hand (Natural Language Processing, Computer Vision)
- Self-customizing programs (Amazon/Netflix recommendations)
- Understanding human learning (brain or real A.I.)
Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed. (Arthur Samuel, 1959)
Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. (Tom Mitchell, 1998)
Other: Reinforcement Learning, Recommender Systems
Supervised Learning:
The program is given a data set.
We know what the correct output looks like.
We are familiar with the relationship between input and output.
- Regression (example: predicting house prices from square footage)
- Classification (example: predicting whether a tumor is malignant or benign)
Unsupervised Learning:
Little or no idea what our results should look like.
We derive structure from data where we don’t necessarily know the effect of the variables.
There is no feedback based on the prediction results.
Lecture 2: Linear regression with one variable
Here is the notation for a data set:
m: number of training examples
x: ‘input’ variable / features
y: ‘output’ variable / target variable
$(x, y)$: one training example
$(x^{(i)}, y^{(i)})$: the $i$th training example
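As a concrete illustration of this notation (the house-size and price numbers below are made up for the example):

```python
# Hypothetical training set: house sizes (sq ft) -> prices (in $1000s).
x = [2104, 1416, 1534, 852]   # input variable / feature
y = [460, 232, 315, 178]      # output / target variable

m = len(x)                    # m: number of training examples

# (x^(i), y^(i)) is the i-th training example. The course uses 1-indexing,
# so the 1st example here is (2104, 460), stored at Python index 0.
first_example = (x[0], y[0])
print(m, first_example)
```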
To describe the learning problem more formally, we give a data set to the program and let it learn a function $h: X \to Y$ as a predictor of the value $y$; this function is called the hypothesis function.
(written $h(x)$ for short)
Linear regression with one variable, or univariate linear regression, uses the hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$.
Its squared-error cost function is $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$
(the factor $\frac{1}{2}$ is more convenient for the computation of gradient descent).
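The cost function above can be sketched directly in code; the toy data set below (a perfect line $y = 2x$) is invented for the example:

```python
# Squared-error cost J(theta0, theta1) for univariate linear regression:
# J = 1/(2m) * sum over i of (h(x_i) - y_i)^2, where h(x) = theta0 + theta1*x.
def cost(theta0, theta1, xs, ys):
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

xs, ys = [1, 2, 3], [2, 4, 6]      # toy data lying exactly on y = 2x

# The parameters (0, 2) fit the data perfectly, so the cost is zero.
perfect = cost(0.0, 2.0, xs, ys)   # 0.0
# Bad parameters give a large cost: (4 + 16 + 36) / (2*3) ≈ 9.33.
bad = cost(0.0, 0.0, xs, ys)
```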
$J(\theta_0, \theta_1)$ with 2 parameters is usually represented as a contour plot (contour figure).
The Gradient Descent Algorithm is an efficient method to find a minimum of the cost function $J$: repeat $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$ until convergence.
The 2 parameters should be updated simultaneously.
In this formula, $\alpha$ is the ‘learning rate’.
For a cost function with 1 parameter, the update is $\theta_1 := \theta_1 - \alpha \frac{d}{d\theta_1} J(\theta_1)$.
The derivative is the slope of $J(\theta_1)$; by subtracting it, $\theta_1$ moves closer to the minimum point.
For the learning rate $\alpha$: if $\alpha$ is too small, gradient descent can be slow; if $\alpha$ is too large, it may fail to converge or even diverge.
The rate $\alpha$ can stay fixed, since the derivative term gradually shrinks as the slope of $J(\theta_1)$ flattens near the minimum, so the steps naturally get smaller.
Gradient Descent for Linear Regression
Plugging the linear-regression cost into the update rule gives (repeated until convergence, with simultaneous updates):
$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})$
$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x^{(i)}$
This algorithm is called ‘Batch Gradient Descent’, or BGD. ‘Batch’ means that each step of gradient descent uses all the training examples. For linear regression the cost function is convex (bowl-shaped), so there are no local optima, and the algorithm always converges to the global minimum.
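A minimal sketch of batch gradient descent for univariate linear regression; the learning rate, iteration count, and toy data (a noiseless line $y = 2x + 1$) are choices made for this example:

```python
# Batch gradient descent for h(x) = theta0 + theta1*x.
# alpha is the learning rate; each step uses ALL m examples ("batch").
def gradient_descent(xs, ys, alpha=0.1, iterations=1000):
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Prediction errors h(x_i) - y_i over the whole training set.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients are computed before either
        # parameter changes.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# Fit noiseless data from y = 2x + 1; should recover theta ≈ (1, 2).
xs, ys = [0, 1, 2, 3], [1, 3, 5, 7]
t0, t1 = gradient_descent(xs, ys)
```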
Lecture 3: Linear Algebra Review
A matrix is a rectangular array of numbers.
The dimension of a matrix is the number of its rows by the number of its columns; for example, a matrix $A$ with 2 rows and 4 columns is a 2×4 matrix.
$A_{ij}$ is the ‘i, j’ entry: the entry in the $i$th row and $j$th column.
We write $\mathbb{R}^{2 \times 4}$ to represent the set of 2×4 matrices.
A vector is an n×1 matrix.
$y_i$ is the $i$th element of the vector.
In this course, we use 1-indexed vectors.
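One practical note on the 1-indexing convention: most programming languages index from 0, so the course's $y_i$ maps to index $i - 1$ in code. A tiny illustration (the values are made up):

```python
# The course writes vectors 1-indexed (y_1 ... y_n), while Python lists
# are 0-indexed, so the course's y_i corresponds to y[i - 1] in code.
y = [460, 232, 315]   # a 3 x 1 vector

y_1 = y[0]            # course notation y_1 -> Python index 0
y_3 = y[2]            # course notation y_3 -> Python index 2
```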
Addition and subtraction are element-wise, so you simply add or subtract each corresponding element.
Scalar multiplication of a matrix is also element-wise: every entry is multiplied by the scalar.
Matrix-vector multiplication: an m×n matrix multiplied by an n×1 vector results in an m×1 vector.
The number of columns of the matrix must equal the number of rows of the vector.
Matrix-matrix multiplication: divide the second matrix into column vectors and multiply each one by the first matrix.
An m×n matrix multiplied by an n×o matrix results in an m×o matrix.
The number of columns of the first must equal the number of rows of the second.
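The dimension rule above can be sketched with a hand-rolled multiplication (in practice Octave's `*` or NumPy's `matmul` would do this; the matrices below are arbitrary examples):

```python
# Matrix multiplication by the definition: C[i][j] = sum_k A[i][k] * B[k][j].
# An (m x n) matrix times an (n x o) matrix yields an (m x o) matrix.
def matmul(A, B):
    n = len(B)  # rows of B
    assert all(len(row) == n for row in A), "columns of A must equal rows of B"
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2], [3, 4], [5, 6]]   # 3 x 2
B = [[1, 0, 2], [0, 1, 3]]     # 2 x 3
C = matmul(A, B)               # (3 x 2) * (2 x 3) -> 3 x 3
```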
- Not commutative: $A \times B \neq B \times A$ in general.
- Identity matrix: $I$ (or $I_{n \times n}$), which satisfies $A \cdot I = I \cdot A = A$.
- For example: $I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.
Matrix Inverse and Transpose
The inverse of a matrix $A$ is denoted $A^{-1}$ and satisfies $A A^{-1} = A^{-1} A = I$ (the identity matrix).
A non-square matrix has no inverse (and neither do singular square matrices). Use Octave to compute inverses.
Transposing a matrix turns its rows into columns: the transpose of $A$, denoted $A^T$, satisfies $(A^T)_{ij} = A_{ji}$, so an m×n matrix becomes an n×m matrix.
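Both operations can be sketched in a few lines; the 2×2 inverse uses the closed-form adjugate formula, and the matrices are arbitrary examples (a library routine like Octave's `inv` would be used in practice):

```python
# Transpose: B = A^T means B[i][j] = A[j][i]; an m x n matrix becomes n x m.
def transpose(A):
    return [[A[i][j] for i in range(len(A))] for j in range(len(A[0]))]

# Inverse of a 2 x 2 matrix [[a, b], [c, d]] via the closed-form formula
# (1/det) * [[d, -b], [-c, a]]; assumes det = ad - bc is nonzero.
def inverse_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 2, 3], [4, 5, 6]]   # 2 x 3
At = transpose(A)            # 3 x 2

M = [[4, 7], [2, 6]]         # det = 24 - 14 = 10
Minv = inverse_2x2(M)        # multiplying M by Minv gives the identity
```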