Stochastic Gradient MCMC for Large-Scale Gaussian Process Spatial Modeling
Authors:
(1) Mohamed A. Abba, Department of Statistics, North Carolina State University;
(2) Brian J. Reich, Department of Statistics, North Carolina State University;
(3) Reetam Majumder, Southeast Climate Adaptation Science Center, North Carolina State University;
(4) Brandon Feng, Department of Statistics, North Carolina State University.
Table of Links
Abstract and 1 Introduction
1.1 Methods to handle large spatial datasets
1.2 Review of stochastic gradient methods
2 Matern Gaussian Process Model and its Approximations
2.1 The Vecchia approximation
3 The SG-MCMC Algorithm and 3.1 SG Langevin Dynamics
3.2 Derivation of gradients and Fisher information for SGRLD
4 Simulation Study and 4.1 Data generation
4.2 Competing methods and metrics
4.3 Results
5 Analysis of Global Ocean Temperature Data
6 Discussion, Acknowledgements, and References
Appendix A.1: Computational Details
Appendix A.2: Additional Results
Abstract
Gaussian processes (GPs) are commonly used for prediction and inference in spatial data analyses. However, since estimation and prediction tasks have cubic time and quadratic memory complexity in the number of locations, GPs are difficult to scale to large spatial datasets. The Vecchia approximation induces sparsity in the dependence structure and is one of several methods proposed to scale GP inference. Our work adds to the substantial research in this area by developing a stochastic gradient Markov chain Monte Carlo (SGMCMC) framework for efficient computation in GPs. At each step, the algorithm subsamples a minibatch of locations and subsequently updates process parameters through a Vecchia-approximated GP likelihood. Since the Vecchia-approximated GP has a time complexity that is linear in the number of locations, this results in scalable estimation in GPs. Through simulation studies, we demonstrate that SGMCMC is competitive with state-of-the-art scalable GP algorithms in terms of computational time and parameter estimation. An application of our method is also provided using the Argo dataset of ocean temperature measurements.
1 Introduction
Gaussian process (GP) modeling is a powerful statistical and machine learning tool used to tackle a variety of tasks including regression, classification, and optimization. Within spatial statistics, in particular, GPs have become the primary tool for inference (Gelfand and Schliep, 2016). In spatial regression and classification problems, the response variable is assumed to have a spatially correlated structure. GPs model this spatial dependence by specifying a form for the correlation between any two points in the spatial domain. In this paper, we focus on the regression setting under the Matern correlation with large amounts of data. Formally, GPs place a prior on the spatial process using a parameterized correlation function, which allows us to estimate the parameters a posteriori given the observed data.
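To make the idea of a parameterized correlation function concrete, the following is a minimal sketch of the Matern correlation for the three half-integer smoothness values that have simple closed forms. The function names, the argument names `range_` and `smoothness`, and the particular scaling of distance by the range are assumptions made for illustration; Matern parameterizations differ across references, and this one need not match the one used later in the paper.

```python
import numpy as np


def matern_correlation(d, range_=1.0, smoothness=0.5):
    """Matern correlation at distance d (illustrative parameterization).

    Only smoothness values 0.5, 1.5 and 2.5 are handled, because they
    have closed forms that avoid Bessel functions.
    """
    d = np.asarray(d, dtype=float)
    if smoothness == 0.5:
        # Exponential correlation
        return np.exp(-d / range_)
    if smoothness == 1.5:
        s = np.sqrt(3.0) * d / range_
        return (1.0 + s) * np.exp(-s)
    if smoothness == 2.5:
        s = np.sqrt(5.0) * d / range_
        return (1.0 + s + s**2 / 3.0) * np.exp(-s)
    raise ValueError("this sketch supports smoothness 0.5, 1.5 or 2.5 only")


def correlation_matrix(coords, range_=1.0, smoothness=0.5):
    """Pairwise correlation matrix for an (n, dim) array of locations."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))
    return matern_correlation(dist, range_, smoothness)
```

Calling `correlation_matrix(coords, range_=0.3, smoothness=1.5)` on an array of two-dimensional locations returns a symmetric matrix with ones on the diagonal, whose entries decay with the distance between the corresponding pair of locations.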
One of the main advantages of GPs is their ability to provide predictions at unobserved locations along with uncertainty quantification. Spatial interpolation, commonly known as Kriging (Woodard, 2000), provides optimal predictions at unobserved sites based on the correlation between a given location and its observed neighbors (Cressie, 1988). However, handling large datasets with GPs poses computational challenges because evaluating the joint likelihood requires cubic time and quadratic memory in the number of locations. This prohibitive cost results mainly from forming the covariance matrix and computing its inverse. Several methods have been proposed to address this issue and make GPs more scalable for large datasets. In this work, we combine stochastic gradient (SG) methods with the Vecchia approximation (Vecchia, 1988) to develop an efficient algorithm for scalable Bayesian inference in massive spatial data settings. In the following section, we review some of the main methods used to scale GPs (see Heaton et al., 2019, for a full survey) and briefly discuss applications of SG methods in correlated and dependent data settings.
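The cost contrast described above can be illustrated with a short sketch. It compares the exact zero-mean GP log-likelihood, which factorizes the full covariance matrix, with a Vecchia-style approximation that conditions each observation on at most `m` earlier neighbors. Everything here is an assumption for illustration rather than the authors' implementation: the exponential covariance, the function names, the `nugget` term, the use of the given order of the data, and the brute-force neighbor search.

```python
import numpy as np


def exp_cov(a, b, sigma2=1.0, range_=1.0):
    """Exponential covariance between two sets of locations (assumed form)."""
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
    return sigma2 * np.exp(-d / range_)


def exact_loglik(y, coords, sigma2=1.0, range_=1.0, nugget=1e-8):
    """Exact log-likelihood: O(n^3) time and O(n^2) memory."""
    n = len(y)
    K = exp_cov(coords, coords, sigma2, range_) + nugget * np.eye(n)
    L = np.linalg.cholesky(K)  # the cubic-cost step
    z = np.linalg.solve(L, y)
    return -0.5 * z @ z - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)


def vecchia_loglik(y, coords, m, sigma2=1.0, range_=1.0, nugget=1e-8):
    """Vecchia-style log-likelihood: sum of n univariate conditional terms.

    Each term conditions on at most m nearest earlier points, so the
    linear algebra per term costs O(m^3). The neighbor search below is
    brute force and is only meant to keep the sketch short.
    """
    total = 0.0
    for i in range(len(y)):
        mean, var = 0.0, sigma2 + nugget
        if i > 0 and m > 0:
            d = np.linalg.norm(coords[:i] - coords[i], axis=1)
            nb = np.argsort(d)[:m]  # indices of nearest earlier points
            K_nn = exp_cov(coords[nb], coords[nb], sigma2, range_)
            K_nn = K_nn + nugget * np.eye(len(nb))
            k_in = exp_cov(coords[i:i + 1], coords[nb], sigma2, range_)[0]
            w = np.linalg.solve(K_nn, k_in)
            mean = w @ y[nb]
            var = sigma2 + nugget - w @ k_in
        total += -0.5 * np.log(2 * np.pi * var) - 0.5 * (y[i] - mean) ** 2 / var
    return total
```

When `m` is set to `n - 1`, every observation conditions on all earlier ones and the sum reproduces the exact log-likelihood up to rounding error, which gives a simple sanity check. Smaller values of `m` trade accuracy for cost, and because the approximation is a sum of per-location terms, it lends itself to the minibatch subsampling described in the abstract.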