Lecture 25: Stochastic Gradient Descent

Name: Lecture 25: Stochastic Gradient Descent
Uploaded: 2019-05-16T16:29:41Z
Duration: 53 min 3 s

Description

Professor Suvrit Sra gives this guest lecture on stochastic gradient descent (SGD), which computes each step from a randomly selected minibatch of the data rather than from the full data set. SGD is still the primary method for training large-scale machine learning systems.

Summary

Full gradient descent uses all the data in each step.
The stochastic method uses a randomly chosen minibatch of the data (often just 1 sample!).
Each step is much cheaper, and the descent makes good progress at the start.
Later the iterates bounce around near the minimum: time to stop!
This method is the favorite for optimizing the weights in deep learning.
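
The contrast in the summary can be sketched on a small example. The code below is an illustration only and is not taken from the lecture: the one-parameter least-squares problem, the synthetic data, and the names make_data, full_gradient, minibatch_gradient and sgd are all assumptions chosen for brevity, as are the step size and step count.

```python
import random

def make_data(n=200, true_w=3.0, noise=0.5, seed=1):
    # Hypothetical synthetic data: y = true_w * x + Gaussian noise.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, true_w * x + rng.gauss(0.0, noise)))
    return data

def minibatch_gradient(w, batch):
    # Gradient of the mean squared error (w*x - y)^2 averaged over the batch.
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def full_gradient(w, data):
    # Full gradient descent would use this: every sample, every step.
    return minibatch_gradient(w, data)

def sgd(data, steps=2000, lr=0.01, batch_size=1, seed=0):
    # Each step draws a random minibatch (here 1 sample by default)
    # and moves against the gradient computed on that minibatch only.
    rng = random.Random(seed)
    w = 0.0
    history = []
    for _ in range(steps):
        batch = rng.sample(data, batch_size)
        w -= lr * minibatch_gradient(w, batch)
        history.append(w)
    return w, history

data = make_data()
w_sgd, history = sgd(data)

# Exact least-squares minimizer for this one-parameter problem, for comparison.
w_star = sum(x * y for x, y in data) / sum(x * x for x, _ in data)
print("SGD estimate:", w_sgd, " least-squares solution:", w_star)
```

With a constant step size, the values in history approach w_star quickly and then keep fluctuating around it rather than settling, which is the "bounce around" behavior noted in the summary. Each SGD step here touches one sample, while each call to full_gradient touches all 200.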

Related section in textbook: VI.5

Instructor: Prof. Suvrit Sra