Neural nets with a Newton conjugate gradient method on multiple GPUs

Publication: 6135465

DOI: 10.1007/978-3-031-30442-2_11 · arXiv: 2208.02017 · MaRDI QID: Q6135465


Authors: Severin Reiz, Tobias Neckel, Hans-Joachim Bungartz


Publication date: 25 August 2023

Published in: Parallel Processing and Applied Mathematics

Abstract: Training deep neural networks consumes an increasing share of computational resources in many compute centers. Often, a brute-force approach is used to obtain hyperparameter values. Our goal is (1) to improve on this by enabling second-order optimization methods with fewer hyperparameters for large-scale neural networks and (2) to survey the performance of optimizers on specific tasks in order to suggest to users the best one for their problem. We introduce a novel second-order optimization method that requires only the action of the Hessian on a vector, avoiding the enormous cost of explicitly setting up the Hessian for large-scale networks. We compare the proposed second-order method with two state-of-the-art optimizers on five representative neural network problems, including regression as well as very deep networks from computer vision and variational autoencoders. For the largest setup, we efficiently parallelized the optimizers with Horovod and applied them to an 8-GPU NVIDIA P100 (DGX-1) machine.
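
The core idea described in the abstract, a Newton-type step computed with conjugate gradients from Hessian-vector products so the Hessian is never assembled, can be illustrated with a short sketch. The following is not the authors' implementation: PyTorch, the toy linear model, the damping value, the iteration count, and all function names are assumptions made purely for illustration.

import torch
import torch.nn.functional as F

def newton_cg_direction(loss, params, damping=1e-3, cg_iters=10, tol=1e-6):
    """Approximately solve (H + damping*I) d = -g with conjugate gradients,
    where H is the Hessian of `loss` w.r.t. `params` and g is the gradient."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])

    def hvp(vec):
        # Hessian-vector product (Pearlmutter's trick):
        # differentiate the scalar (g . vec) once more w.r.t. the parameters.
        grad_dot_v = torch.dot(flat_grad, vec)
        hv = torch.autograd.grad(grad_dot_v, params, retain_graph=True)
        return torch.cat([h.reshape(-1) for h in hv]) + damping * vec

    g = flat_grad.detach()
    d = torch.zeros_like(g)
    r = -g.clone()            # residual for the initial guess d = 0
    p = r.clone()
    rs_old = torch.dot(r, r)
    for _ in range(cg_iters):
        Ap = hvp(p)
        alpha = rs_old / torch.dot(p, Ap)
        d = d + alpha * p
        r = r - alpha * Ap
        rs_new = torch.dot(r, r)
        if rs_new.sqrt() < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return d

# Toy usage on a small regression problem (placeholder data and model).
model = torch.nn.Linear(4, 1)
x, y = torch.randn(32, 4), torch.randn(32, 1)
loss = F.mse_loss(model(x), y)
params = [p for p in model.parameters() if p.requires_grad]
direction = newton_cg_direction(loss, params)
print(direction.shape)  # flattened Newton step over all parameters

Each conjugate-gradient iteration needs only one Hessian-vector product, which costs roughly two backward passes, so the memory footprint stays close to that of first-order training. The multi-GPU parallelization mentioned in the abstract would, in a data-parallel setting such as Horovod, additionally average gradients and Hessian-vector products across workers.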


Full work available at URL: https://arxiv.org/abs/2208.02017








Cited In (1)





