PREDICTING MILK TRAITS FROM SPECTRAL DATA USING BAYESIAN PROBABILISTIC PARTIAL LEAST SQUARES REGRESSION

Document type:
Article
Authors:
Urbas, Szymon; Lovera, Pierre; Daly, Robert; O'Riordan, Alan; Berry, Donagh; Gormley, Isobel Claire
Affiliations:
University College Dublin; University College Cork; Teagasc
Journal:
ANNALS OF APPLIED STATISTICS
ISSN/ISBN:
1932-6157
DOI:
10.1214/24-AOAS1947
Publication date:
2024
Pages:
3486-3506
Keywords:
midinfrared spectroscopy; individual milk; model; classification; selection; proteins; cows; tool
Abstract:
High-dimensional spectral data, routinely generated in dairy production, are used to predict a range of traits in milk products. Partial least squares (PLS) regression is ubiquitously used for these prediction tasks. However, PLS regression is not typically viewed as arising from a probabilistic model, and parameter uncertainty is rarely quantified. Additionally, PLS regression does not easily lend itself to model-based modifications, coherent prediction intervals are not readily available, and the process of choosing the latent-space dimension, Q, can be subjective and sensitive to data size. We introduce a Bayesian latent-variable model that emulates the desirable properties of PLS regression while accounting for parameter uncertainty in prediction. A nonparametric shrinkage prior removes the need to choose Q. The flexibility of the proposed Bayesian partial least squares (BPLS) regression framework is exemplified by considering sparsity modifications and allowing for multivariate response prediction. The BPLS regression framework is used in two motivating settings: (1) multivariate trait prediction from mid-infrared spectral analyses of milk samples and (2) milk pH prediction from surface-enhanced Raman spectral data. The prediction performance of BPLS regression at least matches that of PLS regression. Additionally, the provision of correctly calibrated prediction intervals objectively provides richer, more informative inference for stakeholders in dairy production.
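The abstract's key point is that a probabilistic latent-variable model yields prediction intervals directly and can propagate parameter uncertainty into them. A minimal sketch of that idea follows. It assumes a simple Gaussian latent-variable model with a fixed Q and isotropic noise: x = W z + e_x and y = C z + e_y, where z ~ N(0, I_Q). This is not the authors' BPLS model, which additionally uses a nonparametric shrinkage prior (so Q need not be chosen) and sparsity modifications. The functions `latent_posterior`, `predict_y` and `predictive_interval` are hypothetical names. The "posterior draws" in the usage example are jittered parameter copies standing in for MCMC output, not results from the paper.

```python
import numpy as np


def latent_posterior(x, W, sigma2_x):
    """Gaussian posterior of the latent scores z given one spectrum x.

    Assumed (illustrative) model: x = W z + e_x, z ~ N(0, I_Q),
    e_x ~ N(0, sigma2_x * I_p), with x already mean-centred.
    Only a Q x Q system is solved, so many wavelengths p stay cheap.
    """
    Q = W.shape[1]
    M = W.T @ W + sigma2_x * np.eye(Q)
    mean_z = np.linalg.solve(M, W.T @ x)      # E[z | x]
    cov_z = sigma2_x * np.linalg.inv(M)       # Cov(z | x)
    return mean_z, cov_z


def predict_y(x, W, C, sigma2_x, sigma2_y):
    """Conditional distribution of the traits y given x for ONE parameter set.

    Assumed response model: y = C z + e_y, e_y ~ N(0, sigma2_y * I_r).
    """
    mean_z, cov_z = latent_posterior(x, W, sigma2_x)
    mean_y = C @ mean_z
    cov_y = C @ cov_z @ C.T + sigma2_y * np.eye(C.shape[0])
    return mean_y, 0.5 * (cov_y + cov_y.T)    # symmetrise against round-off


def predictive_interval(x, draws, level=0.95, n_per_draw=200, seed=0):
    """Monte Carlo predictive mean and equal-tailed interval for y given x.

    `draws` is a list of (W, C, sigma2_x, sigma2_y) tuples standing in for
    posterior samples; pooling y-samples across draws spreads parameter
    uncertainty into the interval.
    """
    rng = np.random.default_rng(seed)
    samples = []
    for W, C, s2x, s2y in draws:
        m, S = predict_y(x, W, C, s2x, s2y)
        samples.append(rng.multivariate_normal(m, S, size=n_per_draw))
    samples = np.vstack(samples)              # (n_draws * n_per_draw, r)
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(samples, [alpha, 1.0 - alpha], axis=0)
    return samples.mean(axis=0), lower, upper


if __name__ == "__main__":
    # Hypothetical setting: p = 50 wavelengths, Q = 3 latent factors, r = 2 traits.
    rng = np.random.default_rng(1)
    p, Q, r = 50, 3, 2
    W0 = rng.normal(size=(p, Q))
    C0 = rng.normal(size=(r, Q))
    # Jittered copies of (W0, C0): a stand-in for MCMC output, NOT real posterior draws.
    draws = [
        (W0 + 0.05 * rng.normal(size=(p, Q)), C0 + 0.05 * rng.normal(size=(r, Q)), 0.5, 0.2)
        for _ in range(20)
    ]
    x_new = W0 @ rng.normal(size=Q) + rng.normal(scale=np.sqrt(0.5), size=p)
    mean, lower, upper = predictive_interval(x_new, draws)
    print("predictive mean:", mean)
    print("95% interval:", lower, upper)
```

With a single parameter set, the interval reflects only the residual and latent-score noise. Pooling samples over many draws widens it to also reflect uncertainty in W and C, which is the property a plug-in PLS prediction lacks.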
Source URL: