The Treatment Of Missing Measurements In PCA And PLS Models

Nelson, Philip R.

Please use this identifier to cite or link to this item: http://hdl.handle.net/11375/6164

Title:	The Treatment Of Missing Measurements In PCA And PLS Models
Authors:	Nelson, Philip R.
Advisor:	Taylor, P. A.; MacGregor, J. F.
Department:	Chemical Engineering
Keywords:	Chemical Engineering
Publication Date:	May-2002
Abstract:	This thesis investigates the building and application of principal components analysis (PCA) and projection to latent structures (PLS) models when some objects in the data set have missing measurements. Score calculation is treated first, followed by model application to prediction and monitoring. Model building is explored in the final part of the thesis. The first problem treated in this work is that of estimating scores from an existing PCA or PLS model when new observation vectors are incomplete. Several methods for estimating scores from data with missing measurements are presented and analysed, including a novel method involving data replacement by the conditional mean. Expressions are developed for the error in the scores calculated by each method, and the factors that lead to error are drawn from these expressions. The error analysis is illustrated using simulated data sets designed to highlight problem situations. A larger industrial data set and a simulated process data set are also used to compare the approaches. In general, all the methods perform reasonably well with moderate amounts of missing data (up to 20% of the measurements). However, in extreme cases where critical combinations of measurements are missing, the novel method is generally superior to the other approaches. Uncertainty intervals arising from missing measurements are then developed for the squared prediction error (SPE), Hotelling's T², PLS predictions and their contributions. These uncertainty intervals provide performance measures and diagnostics when measurements are missing in process monitoring and prediction. The uncertainty intervals derived agree well with the values calculated from the complete objects and give valuable information about the true level of knowledge about the process. The uncertainty in the contributions is used to correctly diagnose which missing measurement in a set gives the greatest reduction in uncertainty when it is recovered.
Insight gained in the analysis of score estimation and model application is applied to model building with the nonlinear iterative partial least squares (NIPALS), maximum likelihood principal components analysis (MLPCA), expectation maximisation (EM) and iterative replacement algorithms. Challenges unique to model building and factors that lead to error in individual steps of each model building algorithm are examined. Recommendations are made to improve the quality of the models obtained, and a procedure is proposed to screen data for objects and variables with missing measurements that would have an adverse effect on a model built using missing measurements.
URI:	http://hdl.handle.net/11375/6164
Identifier:	opendissertations/1495 2198 1266044
Appears in Collections:	Open Access Dissertations and Theses

Files in This Item:

File	Size	Format
fulltext.pdf Open Access	6.31 MB	Adobe PDF
