Juan Uys

Week 1 of Neural Networks and Deep Learning

2017-11-21

Introduction to Deep Learning

These are my notes on week #1 of the Neural Networks and Deep Learning Coursera MOOC.

Overview

Video #1 runs through an overview of all 5 courses in the specialisation.

What is a Neural Network?

Video #2 asks What Is A Neural Network? For example, samples of house size (x) and house price (y) can be fitted with a straight line using linear regression. Since the house price can never be negative, the curve ends up looking like _/. This simple function is already a tiny neural network: given an input size, it predicts an output price. The shape appears frequently in NN literature and is known as a ReLU (Rectified Linear Unit).
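
As a rough illustration (mine, not from the course), here is that idea in a few lines of Python; the relu and predict_price names and the weight/bias values are made up for the example:

    def relu(z):
        # ReLU: 0 for negative inputs, identity for positive inputs -- the _/ shape.
        return max(0.0, z)

    def predict_price(size_sqm, w=2000.0, b=-50000.0):
        # A single "neuron": a linear function of size, clipped at zero by ReLU
        # so the predicted price can never go negative.
        return relu(w * size_sqm + b)

    print(predict_price(10))   # tiny house: linear part is negative, clipped to 0
    print(predict_price(120))  # larger house: positive predicted price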

The single input (house size) can be expanded to many more features - ZIP code, number of bedrooms, and so on - collectively the "home features".
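
Sketching that expansion (again my own example, with made-up features and weights), the single size input becomes a feature vector and each feature gets its own weight:

    # Hypothetical feature vector: [size_sqm, bedrooms, zip_code_index]
    features = [120.0, 3.0, 7.0]
    weights  = [1500.0, 20000.0, 500.0]   # made-up weights
    bias     = -40000.0

    # Weighted sum of the features, then the same ReLU clipping as before.
    z = sum(w * x for w, x in zip(weights, features)) + bias
    price = max(0.0, z)
    print(price)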

Supervised learning

Video #3 talks about Supervised Learning. It discusses different applications and which type of NN best suits each, e.g.

Input (x)     | Output (y)             | Application         | Type                                          | Data
Home features | Price                  | Real estate         | Standard NN                                   | Structured (e.g. CSV)
Ad, user info | Click? (0/1)           | Online ads          | Standard NN                                   | Structured
Image         | Classification         | Photo tagging       | CNN                                           | Unstructured
Audio         | Text transcript        | Speech recog        | RNN (because it's 1-dimensional time-series)  | Unstructured
English       | Chinese                | Machine translation | RNN (also sequence data)                      | Unstructured
Image, Radar  | Position of other cars | Autonomous driving  | Custom/Hybrid                                 | Unstructured

Why is Deep Learning taking off?

Video #4 asks why deep learning is taking off. (Well, I've personally been fascinated with neural nets since the late nineties, but here we go…) Answer: data sets are getting larger, and NNs keep getting better as the data grows, where older methods plateau. Ng briefly touches on why the sigmoid is falling out of favour and being replaced by ReLU: the sigmoid has a near-zero gradient in the two outermost regions (the tails of the curve), which slows down training, because if you implement gradient descent and the gradient is close to 0, the parameters change very slowly. With the ReLU, the gradient is 1 for all positive values. (The fact that the left region has a 0 slope, and how that impacts training, will probably become more apparent in later videos, but I suspect some manner of pruning/drop-out will come into play here.)
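
A quick numerical check of that gradient argument (my own sketch, not course code), comparing the sigmoid's derivative with the ReLU's:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sigmoid_grad(z):
        # Derivative of the sigmoid: s * (1 - s), which vanishes in the tails.
        s = sigmoid(z)
        return s * (1.0 - s)

    def relu_grad(z):
        # Gradient is 1 for positive z, 0 otherwise (taking 0 at exactly z = 0).
        return (z > 0).astype(float)

    zs = np.array([-10.0, -2.0, 0.5, 2.0, 10.0])
    print(sigmoid_grad(zs))  # ~0 in the tails, so gradient descent barely moves
    print(relu_grad(zs))     # constant 1 for every positive z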

About the course

Video #5 talks about the curriculum again.

Resources

Video #6 mentions the course resources.

Heroes of Deep Learning

Video #7, the last and optional video, is an interview with Geoffrey Hinton, which I can wholeheartedly recommend. Interestingly, around the 12m30s mark Hinton notes that he was working on variational methods, and it just so happens that people in statistics were working on the same problem, but neither group knew about the other at the time. (Which goes to show that we can all do with better communication, and/or opening our eyes more often.) Around the 23m mark Hinton touches on Capsule Networks, which is something I'm also excited about.

Advice:

  • read the literature, but not too much of it!
  • notice what everyone does wrong, and do it right

Around the 36m mark, Hinton says the thing which is on everyone’s lips right now: we’re not programming computers anymore, we’re showing computers.
