Week 1 of Neural Networks and Deep Learning
Introduction to Deep Learning
These are my notes on week #1 ("Introduction to Deep Learning") of the Neural Networks and Deep Learning Coursera MOOC.
Overview
Video #1 runs through an overview of all 5 courses in the specialisation.
What is a Neural Network?
Video #2 asks What Is A Neural Network? For example, samples of house size (x) and house price (y) can be fitted with a straight line using linear regression. Since the house price can never be negative, the curve ends up looking like _/. This simple graph is a neural network: given an input size, an output price can be predicted. This shape appears frequently in NN literature, and is known as a ReLU (Rectified Linear Unit).
Input house size can be expanded to many more features like ZIP code, number of bedrooms, etc - “home features”.
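To make the _/ shape concrete, here's a minimal sketch (my own, not course code) of a single ReLU "neuron" mapping home features to a price; the feature values, weights and bias are made up purely for illustration.

```python
import numpy as np

def relu(z):
    """Rectified Linear Unit: max(0, z) -- the _/ shape from the video."""
    return np.maximum(0, z)

# Hypothetical home features and made-up weights, for illustration only.
features = np.array([2100.0, 3.0, 2.0])       # size (sq ft), bedrooms, bathrooms
weights  = np.array([180.0, 5000.0, 3000.0])  # invented weights
bias     = -20000.0

price = relu(weights @ features + bias)       # never negative, thanks to the ReLU
print(price)
```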
Supervised learning
Video #3 talks about Supervised Learning. It discusses different applications, and which type of NN is best suited, e.g.
| Input (x) | Output (y) | Application | Type | Data |
|---|---|---|---|---|
| Home features | Price | Real estate | Standard NN | Structured (e.g. CSV) |
| Ad, user info | Click? (0/1) | Online ads | Standard NN | Structured |
| Image | Classification | Photo tagging | CNN | Unstructured |
| Audio | Text transcript | Speech recog | RNN (because it’s 1-dimensional time-series) | Unstructured |
| English | Chinese | Machine translation | RNN (also sequence data) | Unstructured |
| Image, Radar | Position of other cars | Autonomous driving | Custom/Hybrid | Unstructured |
Why is Deep Learning taking off?
Video #4 asks why deep learning is taking off. (Well, I’ve personally been fascinated with neural nets since the late nineties, but here we go…) Answer: data sets are getting larger, and NNs keep improving with data scale where traditional algorithms plateau. Ng briefly touches on why the sigmoid is falling out of favour and being replaced by ReLU: the sigmoid has a near-zero gradient in the two outermost regions (the tails of the curve), which slows down training, because if you implement gradient descent and the gradient is ~0, the parameters change very slowly. With the ReLU, the gradient is 1 for all positive values. (The fact that the left region has a 0 slope and how it impacts training will probably become more apparent in later videos, but I suspect some manner of pruning/drop-out will come into play here.)
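A quick numpy sketch (my own illustration, not from the course) of that gradient argument: the sigmoid's derivative is essentially zero in both tails, while the ReLU's derivative is exactly 1 for any positive input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # nearly 0 in both tails

def relu_grad(z):
    return (z > 0).astype(float)  # 1 for positive z, 0 otherwise

zs = np.array([-10.0, -1.0, 1.0, 10.0])
print("sigmoid'(z):", sigmoid_grad(zs))  # roughly [4.5e-05  0.20  0.20  4.5e-05]
print("relu'(z):   ", relu_grad(zs))     # [0. 0. 1. 1.]
```

So a unit stuck in a sigmoid tail barely updates under gradient descent, whereas a ReLU unit with a positive input keeps receiving full-sized gradients.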
About the course
Video #5 talks about the curriculum again.
Resources
Video #6 mentions course resources:
- forum
- yup, that’s it
Heroes of Deep Learning
Video #7, the last and optional video, is an interview with Geoffrey Hinton, which I can wholeheartedly recommend. Interestingly, around the 12m30s mark Hinton notes that he was working on variational methods, and it just so happens that people in statistics were working on the same problem, but they didn’t know about each other at the time. (Which goes to show that we can all do with better communication, and/or opening our eyes more often.) Around the 23m mark Hinton touches on Capsule Networks, which is something I’m also excited about.
Advice:
- read the literature, but not too much of it!
- notice what everyone does wrong, and do it right
Around the 36m mark, Hinton says the thing which is on everyone’s lips right now: we’re not programming computers anymore, we’re showing computers.