Estimation methods

Methods for estimating the scores of new samples that may have missing data, using an existing PCA model.

###Methods based on least-squares regression###

projection_to_model_plane()
trimmed_score_regression()
known_data_regression()

Detailed descriptions of the algorithms can be found in, e.g.:

Francisco Arteaga, Alberto Ferrer. Dealing with missing data in MSPC: several methods, different interpretations, some examples. Journal of Chemometrics, 2002.

Philip R.C. Nelson, Paul A. Taylor, John F. MacGregor. Missing data methods in PCA and PLS: Score calculations with incomplete observations. Chemometrics and Intelligent Laboratory Systems, 1996.

###Methods based on individual PCAs with intersecting data and subsequent merging###

ind_pca_merge()

For descriptions of the method as applied in population genetics, see e.g.:

Wang et al. Comparing spatial maps of human population-genetic variation using Procrustes analysis. Stat Appl Genet Mol Biol, 2010.

Skoglund et al. Origins and genetic legacy of Neolithic farmers and hunter-gatherers in Europe. Science, 2012.

mpca.estimate.projection_to_model_plane(pca_model, new_data, missing_val=-1.0)

Estimate scores for the given samples (which may have missing data) based on an existing PCA model, using the Projection to the Model Plane (PMP) method.

Parameters:

pca_model (pcamodel) – The PCA model to use for estimating scores.
pca_data (array, shape (n_pca_samples x n_features)) – The data that was used to define the PCA model. Assumed normalized, and that samples and features are in the same order as pca_model.scores and pca_model.loadings.
new_data (array, shape (n_new_samples x n_features)) – Samples to estimate scores for, based on the PCA model. Assumed normalized, and that the features are the same as those of pca_data, in the same order.
missing_val – Value used to represent missing data.

Returns:

scores_est : array, shape (n_new_samples x 2) Estimated scores of the new data.
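
As a rough illustration of the PMP idea described by Nelson et al., the scores of a sample can be obtained by least-squares regression of its observed values on the corresponding rows of the loadings. The sketch below is not the mpca implementation: the function name `pmp_scores` and the `(n_features, n_components)` layout of `loadings` are assumptions made for the example.

```python
import numpy as np

def pmp_scores(loadings, sample, missing_val=-1.0):
    """Least-squares scores of one sample, using only its observed features.

    loadings: (n_features, n_components) array, a hypothetical stand-in for
    the loadings of a fitted PCA model.
    sample: (n_features,) normalized data, with missing entries marked by
    missing_val.
    """
    observed = sample != missing_val      # mask of observed features
    P_obs = loadings[observed, :]         # loadings rows for observed features
    # Solve min_t || x_obs - P_obs t ||^2
    t, *_ = np.linalg.lstsq(P_obs, sample[observed], rcond=None)
    return t

rng = np.random.default_rng(0)
# Toy model: orthonormal loadings for 6 features and 2 components
loadings, _ = np.linalg.qr(rng.normal(size=(6, 2)))
true_scores = np.array([1.5, -0.7])
sample = loadings @ true_scores           # lies exactly in the model plane
sample_missing = sample.copy()
sample_missing[[1, 4]] = -1.0             # mark two features as missing
est = pmp_scores(loadings, sample_missing)
```

Because the toy sample lies exactly in the model plane, the scores are recovered from the four observed features alone. For real data the estimate is only approximate, and it degrades when the observed rows of the loadings are close to collinear.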

mpca.estimate.trimmed_score_regression(pca_model, pca_data, new_data, missing_val=-1.0)

Estimate scores for the given samples (which may have missing data) based on an existing PCA model, using the Trimmed Score Regression (TSR) method.

Also returns the trimmed scores, i.e. the scores estimated with the Trimmed Score (TRI) method.

Parameters:

pca_model (pcamodel) – The PCA model to use for estimating scores.
pca_data (array, shape (n_pca_samples x n_features)) – The data that was used to define the PCA model. Assumed normalized, and that samples and features are in the same order as pca_model.scores and pca_model.loadings.
new_data (array, shape (n_new_samples x n_features)) – Samples to estimate scores for, based on the PCA model. Assumed normalized, and that the features are the same as those of pca_data, in the same order.
missing_val – Value used to represent missing data.

Returns:

scores_est_tsr : array, shape (n_new_samples x 2) Estimated scores of the new data using TSR.
scores_est_tri : array, shape (n_new_samples x 2) Estimated scores of the new data using TRI.
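
The relationship between the two returned quantities can be sketched as follows, using the TSR formulation of Arteaga and Ferrer: the trimmed scores are the projection of the observed values onto the observed rows of the loadings, and TSR regresses the full scores on those trimmed scores. This is a sketch under stated assumptions, not the mpca code: `tsr_scores` is a hypothetical name, and the PCA is recomputed here from the covariance of `pca_data` instead of being taken from a `pca_model` object.

```python
import numpy as np

def tsr_scores(pca_data, sample, n_components=2, missing_val=-1.0):
    """Return (TSR scores, TRI scores) for one sample with missing entries.

    pca_data: (n_samples, n_features) centered training data.
    sample: (n_features,) centered data, missing entries marked by missing_val.
    """
    observed = sample != missing_val
    S = np.cov(pca_data, rowvar=False)               # feature covariance
    eigval, eigvec = np.linalg.eigh(S)
    top = np.argsort(eigval)[::-1][:n_components]    # leading components
    theta = np.diag(eigval[top])                     # score covariance
    P_obs = eigvec[:, top][observed, :]              # observed loading rows
    S_obs = S[np.ix_(observed, observed)]            # observed covariance
    tri = P_obs.T @ sample[observed]                 # trimmed scores (TRI)
    # Regress the full scores on the trimmed scores (TSR)
    tsr = theta @ P_obs.T @ P_obs @ np.linalg.solve(P_obs.T @ S_obs @ P_obs, tri)
    return tsr, tri

rng = np.random.default_rng(1)
data = rng.normal(size=(50, 5))
data -= data.mean(axis=0)
new = rng.normal(size=5)
tsr_full, tri_full = tsr_scores(data, new)           # no missing values
new_missing = new.copy()
new_missing[3] = -1.0                                # mark one feature missing
tsr_miss, tri_miss = tsr_scores(data, new_missing)
```

With no missing values the two estimates coincide, since the regression reduces to the identity. With missing values they generally differ, because TSR corrects the trimmed scores using the covariance structure of the training data.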

mpca.estimate.known_data_regression(pca_model, pca_data, new_data, missing_val=-1.0, ridge=False)

Estimate scores for the given samples (which may have missing data) based on an existing PCA model, using the Known Data Regression (KDR) method.

Parameters:

pca_model (pcamodel) – The PCA model to use for estimating scores.
pca_data (array, shape (n_pca_samples x n_features)) – The data that was used to define the PCA model. Assumed normalized, and that samples and features are in the same order as pca_model.scores and pca_model.loadings.
new_data (array, shape (n_new_samples x n_features)) – Samples to estimate scores for, based on the PCA model. Assumed normalized, and that the features are the same as those of pca_data, in the same order.
missing_val – Value used to represent missing data.

Returns:

scores_est : array, shape (n_new_samples x 2) Estimated scores of the new data.
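
One way to picture KDR is as a regression of the model scores on the observed features of the training data, with the fitted coefficients then applied to the observed values of the new sample. The sketch below assumes this formulation and plain least squares; it does not model the `ridge` option, and `kdr_scores` and its argument layout are hypothetical names chosen for the example.

```python
import numpy as np

def kdr_scores(pca_data, scores, sample, missing_val=-1.0):
    """Estimate scores of one sample by regressing model scores on known data.

    pca_data: (n_samples, n_features) centered training data.
    scores: (n_samples, n_components) scores of the training data.
    sample: (n_features,) centered data, missing entries marked by missing_val.
    """
    observed = sample != missing_val
    X_obs = pca_data[:, observed]                    # training data, observed features
    # Coefficients mapping the observed features to the model scores
    B, *_ = np.linalg.lstsq(X_obs, scores, rcond=None)
    return sample[observed] @ B

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 5))
X -= X.mean(axis=0)
# Toy 2-component PCA via SVD
U, s, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt[:2].T                                         # loadings, (n_features, 2)
T = X @ P                                            # training scores
new = rng.normal(size=5)
est_full = kdr_scores(X, T, new)                     # no missing values
new_missing = new.copy()
new_missing[2] = -1.0                                # mark one feature missing
est_missing = kdr_scores(X, T, new_missing)
```

With complete data the regression coefficients equal the loadings, so the estimate matches the ordinary projection `new @ P`. The regression needs more training samples than observed features to be well determined.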

mpca.estimate.ind_pca_merge(pca_model, pca_data, new_data, missing_val=-1.0, merge='procrustes')

Estimate scores for the given new samples (which may have missing data) for an existing PCA model by performing individual PCAs on the values of the original PCA data that overlap with the observed values of the new samples, and then merging them by transforming the scores of each individual PCA to those of the full PCA.

Parameters:

pca_model (pcamodel) – The PCA model to use for estimating scores.
pca_data (array, shape (n_pca_samples x n_features)) – The data to use to define the PCA model. Assumed NOT normalized.
new_data (array, shape (n_new_samples x n_features)) – Samples to estimate scores for, based on the PCA model. Assumed NOT normalized, and that the features are the same as those of pca_data, in the same order.
missing_val – Value used to represent missing data.
merge (str) – How to merge the individual PCA with the full PCA:
procrustes: merge using a Procrustes transformation
lsq: merge using a general affine transformation

Returns:

scores_est : array, shape (n_new_samples x 2) Estimated scores of the new data.
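
The individual-PCA-and-merge procedure can be sketched for a single sample as below. Several choices here are assumptions, not confirmed details of mpca: the individual PCA is fit on the reference data only and the new sample is projected onto it, normalization is reduced to mean-centering, only the Procrustes merge is shown, and all function names are hypothetical.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Return (scores, loadings, mean) of a mean-centered PCA computed via SVD."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    P = Vt[:n_components].T
    return (X - mean) @ P, P, mean

def procrustes_map(A, B):
    """Shift, scale and orthogonal transform that brings A as close to B as possible."""
    mu_a, mu_b = A.mean(axis=0), B.mean(axis=0)
    A0, B0 = A - mu_a, B - mu_b
    U, s, Vt = np.linalg.svd(A0.T @ B0)
    R = U @ Vt                                # optimal rotation/reflection
    scale = s.sum() / (A0 ** 2).sum()         # optimal isotropic scaling
    return lambda Y: scale * (Y - mu_a) @ R + mu_b

def merge_one_sample(pca_data, sample, missing_val=-1.0):
    """Estimate full-PCA scores of one sample via an individual PCA."""
    observed = sample != missing_val
    full_scores, _, _ = pca_scores(pca_data)
    # Individual PCA on the features observed in this sample
    ind_scores, ind_P, ind_mean = pca_scores(pca_data[:, observed])
    sample_ind = (sample[observed] - ind_mean) @ ind_P
    # Map individual-PCA coordinates onto the full-PCA coordinates
    transform = procrustes_map(ind_scores, full_scores)
    return transform(sample_ind[None, :])[0]

rng = np.random.default_rng(3)
ref = rng.normal(size=(30, 6)) + 3.0          # unnormalized reference data
new = rng.normal(size=6) + 3.0
est_complete = merge_one_sample(ref, new)     # no missing values
new_missing = new.copy()
new_missing[[0, 5]] = -1.0                    # mark two features as missing
est_missing = merge_one_sample(ref, new_missing)
```

When no values are missing, the individual PCA coincides with the full PCA, the fitted transformation is the identity, and the estimate equals the direct projection of the sample. The reference samples act as anchor points for the transformation, so its quality depends on how well the individual PCA preserves their relative positions.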