Gaussian Sketching yields a J-L Lemma in RKHS


arXiv: 1908.05818
MaRDI QID: Q6323745
FDO: Q6323745


Authors: Samory Kpotufe, Bharath K. Sriperumbudur


Publication date: 15 August 2019

Abstract: The main contribution of the paper is to show that Gaussian sketching of a kernel Gram matrix yields an operator whose counterpart in an RKHS $\mathcal{H}$ is a random projection operator, in the spirit of the Johnson-Lindenstrauss (J-L) lemma. To be precise, given a random matrix $Z$ with i.i.d. Gaussian entries, we show that a sketch corresponds to a particular random operator in the (infinite-dimensional) Hilbert space $\mathcal{H}$ that maps functions $f \in \mathcal{H}$ to a low-dimensional space $\mathbb{R}^d$, while preserving a weighted RKHS inner-product of the form $\langle f, g \rangle_{\Sigma} \doteq \langle f, \Sigma^3 g \rangle_{\mathcal{H}}$, where $\Sigma$ is the covariance operator induced by the data distribution. In particular, under similar assumptions as in kernel PCA (KPCA) or kernel $k$-means (K-$k$-means), well-separated subsets of feature space $\{K(\cdot, x) : x \in \mathcal{X}\}$ remain well-separated after such an operation, which suggests similar benefits as in KPCA and/or K-$k$-means, albeit at the much cheaper cost of a random projection. In particular, our convergence rates suggest that, given a large dataset $\{X_i\}_{i=1}^N$ of size $N$, we can build the Gram matrix on a much smaller subsample of size $n \ll N$, so that the sketch is very cheap to obtain and subsequently apply as a projection operator on the original data $\{X_i\}_{i=1}^N$. We verify these insights empirically on synthetic data and on real-world clustering applications.
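The following minimal Python sketch illustrates the procedure described in the abstract: form the kernel Gram matrix on a small subsample of size $n \ll N$, sketch it with an i.i.d. Gaussian matrix $Z$, and apply the result as a projection of all $N$ points. The RBF kernel, function names, the $1/\sqrt{d}$ scaling of the Gaussian entries, and all parameter values are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise RBF kernel: K[i, j] = exp(-gamma * ||A_i - B_j||^2).
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def gaussian_sketch_projection(X, n=100, d=20, gamma=1.0, seed=0):
    """Sketch the Gram matrix of an n-point subsample (n << N) with a Gaussian
    matrix Z, and apply the resulting operator as a d-dimensional projection
    of all N data points (illustrative sketch, not the authors' code)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    idx = rng.choice(N, size=min(n, N), replace=False)
    Xs = X[idx]                                                     # subsample defining the Gram matrix
    Z = rng.normal(scale=1.0 / np.sqrt(d), size=(d, Xs.shape[0]))   # i.i.d. Gaussian sketch matrix
    K_cross = rbf_kernel(Xs, X, gamma=gamma)                        # n x N cross-kernel K(subsample, data)
    return (Z @ K_cross).T                                          # N x d embedding of the original data

# Usage: project two synthetic 2-D clusters into d = 20 dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(500, 2)) + c for c in ([0.0, 0.0], [5.0, 5.0])])
emb = gaussian_sketch_projection(X, n=100, d=20, gamma=0.5)
print(emb.shape)  # (1000, 20)
```

The resulting embedding could then be fed to a standard clustering routine such as $k$-means, in the spirit of the clustering experiments mentioned in the abstract.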