mpca - methods to handle missing data in PCA
mpca contains implementations of various methods to solve the following general problem:
Given a PCA model fitted on a training set X and a new sample z in which some variables are missing:
estimate scores t’ for z using the same PCA model such that the difference t’ - t is minimized,
where t denotes the true scores of z, i.e. the scores the PCA model would yield if every variable of z were observed.
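As an illustration of the problem statement, here is a minimal NumPy sketch of one well-known approach: a least-squares fit of the observed variables onto the corresponding rows of the loading matrix. It does not use the mpca API. The function names `fit_pca`, `true_scores` and `estimate_scores` are hypothetical and chosen only for this example, and missing values are assumed to be encoded as NaN.

```python
import numpy as np


def fit_pca(X, n_components):
    """Fit a PCA model on the training set X (samples x variables).

    Hypothetical helper for this sketch, not part of mpca.
    Returns the variable means and the loading matrix P (variables x components).
    """
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes of the centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T


def true_scores(z, mean, P):
    """Scores t of a fully observed sample z."""
    return (z - mean) @ P


def estimate_scores(z, mean, P):
    """Estimate scores t' for a sample z whose missing variables are NaN.

    Solves min_t || (z_obs - mean_obs) - P_obs @ t ||^2 using only the
    observed variables. This is one possible estimator, shown as an
    assumption, not necessarily the method mpca applies.
    """
    observed = ~np.isnan(z)
    t_hat, *_ = np.linalg.lstsq(P[observed], z[observed] - mean[observed], rcond=None)
    return t_hat


# Usage: compare the estimate t' with the true scores t on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
mean, P = fit_pca(X, n_components=3)

z = rng.normal(size=8)
t = true_scores(z, mean, P)

z_missing = z.copy()
z_missing[[2, 5]] = np.nan  # pretend two variables were not measured
t_prime = estimate_scores(z_missing, mean, P)

print("norm of t' - t:", np.linalg.norm(t_prime - t))
```

When no variable is missing, this estimator reduces to the ordinary projection, because the columns of the loading matrix are orthonormal. With missing variables the estimate generally differs from t, and the size of that difference is the quantity the methods in mpca aim to keep small.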
The methods are implemented to be general, but mpca also contains utilities for handling PCA of genotype data. See the GitHub page for code examples of different use-cases.