Gaussian Process Regression
Prediction
Given \(N\) training inputs \(\mathbf{X}\in\mathbb{R}^{N\times{D}}\) and training targets \(\mathbf{y}\in\mathbb{R}^{N}\), the posterior predictive distribution has a closed-form expression:
where \(\mathbf{k}_{\star}\) is the vector of kernel values between all the training inputs \(\mathbf{X}\) and the test input \(\mathbf{x}^{\star}\), \(\mathbf{K}_{y}\) is a shorthand of \(\mathbf{K}_{\mathbf{x}}+\sigma^{2}\mathbf{I}\), \(\mathbf{K}_{\mathbf{x}}\) is the covariance matrix given by the kernel function evaluated at each pair of training inputs, and \(k_{\star\star}\triangleq\mathtt{k}(\mathbf{x}^{\star},\mathbf{x}^{\star})\).
Learning
Optimizing the hyperparameters -- a process known as model selection -- is a common practice to obtain a better prediction. Model selection is typically implemented by maximizing the model evidence (better known as log marginal likelihood)