One-dimensional system arising in stochastic gradient descent

Abstract: We consider SDEs of the form $dX_t = \frac{|f(X_t)|}{t^{\gamma}}\,dt + \frac{1}{t^{\gamma}}\,dB_t$, where $f(x)$ behaves comparably to $|x|^{k}$ in a neighborhood of the origin, for $k \in [1, \infty)$. We show that there exists a threshold value $\tilde{\gamma}$ for $\gamma$, depending on $k$, such that when $\gamma \in (1/2, \tilde{\gamma})$ then $\mathbb{P}(X_n \rightarrow 0) = 0$, and for the rest of the permissible values $\mathbb{P}(X_n \rightarrow 0) > 0$. These results extend to discrete processes that satisfy $X_{n+1} - X_n = \frac{f(X_n)}{n^{\gamma}} + \frac{Y_n}{n^{\gamma}}$. Here, $Y_{n+1}$ are martingale differences that are a.s. bounded. This result shows that for a function $F$ whose second derivative at degenerate saddle points is of polynomial order, it is always possible to escape saddle points via the iteration $X_{n+1} - X_n = \frac{F'(X_n)}{n^{\gamma}} + \frac{Y_n}{n^{\gamma}}$ for a suitable choice of $\gamma$.
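The discrete iteration in the abstract can be explored numerically. The following is a minimal sketch under stated assumptions, not the authors' method: it takes $f(x) = |x|^{k}$ exactly (the abstract only requires comparable behavior near the origin), uses independent $\pm 1$ coin flips as one simple instance of bounded martingale differences, and stops a path once it leaves a neighborhood of the origin. The function name `simulate` and the parameters `x0`, `n_start`, `radius`, and `seed` are hypothetical and chosen only for illustration.

```python
import random


def simulate(k, gamma, x0=-0.5, n_start=100, n_steps=10_000, radius=1.0, seed=0):
    """Iterate X_{n+1} = X_n + (f(X_n) + Y_n) / n**gamma with f(x) = |x|**k.

    Assumptions (not taken from the source): Y_n is a fair +/-1 coin flip,
    the index n starts at n_start, and the path is stopped as soon as it
    leaves the interval [-radius, radius].
    """
    rng = random.Random(seed)          # private generator, reproducible
    x = x0
    path = [x]
    for n in range(n_start, n_start + n_steps):
        y = rng.choice((-1.0, 1.0))    # bounded, mean-zero noise term
        x = x + (abs(x) ** k + y) / n ** gamma
        path.append(x)
        if abs(x) > radius:            # left the neighborhood of the origin
            break
    return path


if __name__ == "__main__":
    # Count how many of a few sample paths leave the neighborhood.
    for gamma in (0.6, 0.9):
        exits = sum(
            abs(simulate(k=2.0, gamma=gamma, seed=s)[-1]) > 1.0
            for s in range(20)
        )
        print(f"gamma={gamma}: {exits} of 20 paths left the neighborhood")
```

Counting exits over a finite horizon is only a heuristic: it cannot establish whether $\mathbb{P}(X_n \rightarrow 0)$ is zero or positive, and it says nothing about the value of $\tilde{\gamma}$.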
