Principal Component Regression (PCR)

The nipalsPCR class carries out principal component regression. It analyses two data arrays and finds common systematic variance between the two arrays. See below for a description of the methods in nipalsPCR as well as some examples of how to use it.

class hoggorm.pcr.nipalsPCR(arrX, arrY, numComp=None, Xstand=False, Ystand=False, cvType=None)

This class carries out Principal Component Regression for two arrays using the NIPALS algorithm.

Parameters:
  • arrX (numpy array) – This is X in the PCR model. Number and order of objects (rows) must match those of arrY.
  • arrY (numpy array) – This is Y in the PCR model. Number and order of objects (rows) must match those of arrX.
  • numComp (int, optional) – An integer that defines how many components are to be computed. If not provided, the maximum possible number of components is used.
  • Xstand (boolean, optional) –

    Defines whether variables in arrX are to be standardised/scaled or centered.

    False : columns of arrX are mean centred (default)
    Xstand = False
    True : columns of arrX are mean centred and divided by their own standard deviation
    Xstand = True
  • Ystand (boolean, optional) –

    Defines whether variables in arrY are to be standardised/scaled or centered.

    False : columns of arrY are mean centred (default)
    Ystand = False
    True : columns of arrY are mean centred and divided by their own standard deviation
    Ystand = True
  • cvType (list, optional) –

    The list defines cross validation settings when computing the PCR model. Note that if cvType is not provided, cross validation is not performed and cross validation results are therefore not available. Choose the cross validation type from the following:

    loo : leave one out / a.k.a. full cross validation
    cvType = ["loo"]
    KFold : leave out one fold or segment
    cvType = ["KFold", numFolds]

    numFolds: int

    Number of folds or segments

    lolo : leave one label out
    cvType = ["lolo", labelsList]

    labelsList: list

    Sequence of labels. Must be the same length as the number of rows in arrX and arrY. Leaves out objects with the same label.
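
As a rough illustration of the two pre-processing options controlled by Xstand and Ystand, the sketch below centres and standardises a small array with plain NumPy. The helper name `preprocess` and the use of the sample standard deviation (`ddof=1`) are assumptions made for this example; they are not taken from the hoggorm source.

```python
import numpy as np

def preprocess(arr, stand=False):
    """Mean centre the columns of arr; if stand is True, also scale to unit std."""
    arr = np.asarray(arr, dtype=float)
    centred = arr - arr.mean(axis=0)
    if stand:
        # Assumption: sample standard deviation (ddof=1)
        centred = centred / arr.std(axis=0, ddof=1)
    return centred

X = np.array([[1.0, 10.0],
              [2.0, 30.0],
              [3.0, 50.0]])

X_centred = preprocess(X)                    # comparable to Xstand=False
X_standardised = preprocess(X, stand=True)   # comparable to Xstand=True
```

Standardising gives every variable the same weight in the model, which is usually what you want when the columns are measured in different units.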

Returns:

A class that contains the PCR model and computational results

Return type:

class

Examples

First import the hoggorm package.

>>> import hoggorm as ho

Import your data into numpy arrays.

>>> import numpy as np
>>> np.shape(my_X_data)
(14, 292)
>>> np.shape(my_Y_data)
(14, 5)

Examples of how to compute a PCR model using different settings for the input parameters.

>>> model = ho.nipalsPCR(arrX=my_X_data, arrY=my_Y_data, numComp=5)
>>> model = ho.nipalsPCR(arrX=my_X_data, arrY=my_Y_data)
>>> model = ho.nipalsPCR(arrX=my_X_data, arrY=my_Y_data, numComp=3, Ystand=True)
>>> model = ho.nipalsPCR(arrX=my_X_data, arrY=my_Y_data, Xstand=False, Ystand=True)
>>> model = ho.nipalsPCR(arrX=my_X_data, arrY=my_Y_data, cvType=["loo"])
>>> model = ho.nipalsPCR(arrX=my_X_data, arrY=my_Y_data, cvType=["KFold", 7])
>>> model = ho.nipalsPCR(arrX=my_X_data, arrY=my_Y_data, cvType=["lolo", [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7]])

Examples of how to extract results from the PCR model.

>>> X_scores = model.X_scores()
>>> X_loadings = model.X_loadings()
>>> Y_loadings = model.Y_loadings()
>>> X_cumulativeCalibratedExplainedVariance_allVariables = model.X_cumCalExplVar_indVar()
>>> Y_cumulativeCalibratedExplainedVariance_total = model.Y_cumCalExplVar()

X_MSECV()

Returns an array holding MSECV across all variables in X acquired through cross validation after each computed component. First row is MSECV for zero components, second row for component 1, third row for component 2, etc.

X_MSECV_indVar()

Returns an array holding MSECV for each variable in X acquired through cross validation. First row is MSECV for zero components, second row for component 1, etc.

X_MSEE()

Returns an array holding MSEE across all variables in X acquired through calibration after each computed component. First row is MSEE for zero components, second row for component 1, third row for component 2, etc.

X_MSEE_indVar()

Returns an array holding MSEE for each variable in array X acquired through calibration after each computed component. First row holds MSEE for zero components, second row for component 1, third row for component 2, etc.

X_PRESSCV()

Returns an array holding PRESSCV across all variables in X acquired through cross validation after each computed component. First row is PRESSCV for zero components, second row for component 1, third row for component 2, etc.

X_PRESSCV_indVar()

Returns array holding PRESSCV for each individual variable in X acquired through cross validation after each computed component. First row is PRESSCV for zero components, second row for component 1, third row for component 2, etc.

X_PRESSE()

Returns array holding PRESSE across all variables in X acquired through calibration after each computed component. First row is PRESSE for zero components, second row for component 1, third row for component 2, etc.

X_PRESSE_indVar()

Returns array holding PRESSE for each individual variable in X acquired through calibration after each computed component. First row is PRESSE for zero components, second row for component 1, third row for component 2, etc.

X_RMSECV()

Returns an array holding RMSECV across all variables in X acquired through cross validation after each computed component. First row is RMSECV for zero components, second row for component 1, third row for component 2, etc.

X_RMSECV_indVar()

Returns an array holding RMSECV for each variable in X acquired through cross validation after each computed component. First row is RMSECV for zero components, second row for component 1, third row for component 2, etc.

X_RMSEE()

Returns an array holding RMSEE across all variables in X acquired through calibration after each computed component. First row is RMSEE for zero components, second row for component 1, third row for component 2, etc.

X_RMSEE_indVar()

Returns an array holding RMSEE for each variable in array X acquired through calibration after each component. First row holds RMSEE for zero components, second row for component 1, third row for component 2, etc.
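
The X error measures listed above are closely related. The following sketch uses the conventional definitions: PRESS is the sum of squared residuals per variable, MSE is PRESS divided by the number of objects, and RMSE is the square root of MSE. It assumes that these definitions carry over to the class, averages over variables to obtain the total MSE (also an assumption), and uses made-up residuals.

```python
import numpy as np

# Hypothetical residuals (objects x variables) after some number of components
residuals = np.array([[ 1.0, -2.0],
                      [ 0.0,  2.0],
                      [-1.0,  0.0],
                      [ 2.0, -2.0]])
n_objects, n_variables = residuals.shape

# One value per variable
press_ind_var = np.sum(residuals ** 2, axis=0)
mse_ind_var = press_ind_var / n_objects
rmse_ind_var = np.sqrt(mse_ind_var)

# Totals across all variables (assumed: mean over variables for the MSE)
press_total = press_ind_var.sum()
mse_total = mse_ind_var.mean()
rmse_total = np.sqrt(mse_total)
```

The same relationships apply to the cross validated measures (PRESSCV, MSECV, RMSECV), except that the residuals then come from objects that were left out when the model was fitted.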

X_calExplVar()

Returns a list holding the calibrated explained variance for each component. First number in list is for component 1, second number for component 2, etc.

X_corrLoadings()

Returns array holding correlation loadings of array X. First column holds correlation loadings for component 1, second column holds correlation loadings for component 2, etc.
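
Correlation loadings are commonly defined as the correlation between each variable and each column of scores. The sketch below computes them that way for random data, with scores obtained through SVD rather than NIPALS. It illustrates the concept only and is not the class's own code.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
Xc = X - X.mean(axis=0)

# Scores of the first two principal components via SVD (T = U * S)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
T = U[:, :2] * S[:2]

# Rows represent variables, columns represent components
corr_loadings = np.array([[np.corrcoef(Xc[:, j], T[:, a])[0, 1]
                           for a in range(T.shape[1])]
                          for j in range(Xc.shape[1])])
```

Because the score columns are mutually orthogonal, the squared correlation loadings of a variable sum to the fraction of its variance explained by the chosen components, which is why these values are usually plotted inside a unit circle.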

X_cumCalExplVar()

Returns a list holding the cumulative calibrated explained variance for array X after each component.

X_cumCalExplVar_indVar()

Returns an array holding the cumulative calibrated explained variance for each variable in X after each component. First row represents zero components, second row represents one component, third row represents two components, etc. Columns represent variables.

X_cumValExplVar()

Returns a list holding the cumulative validated explained variance for array X after each component. First number represents zero components, second number represents component 1, etc.

X_cumValExplVar_indVar()

Returns an array holding the cumulative validated explained variance for each variable in X after each component. First row represents zero components, second row represents component 1, third row for component 2, etc. Columns represent variables.
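
Cumulative explained variance can be derived from residual sums of squares: the value after a given number of components is the share of the zero-component PRESS that has been removed. The sketch below works under that assumption, expresses the result in percent (also an assumption) and uses made-up PRESS values.

```python
import numpy as np

# Hypothetical total PRESS after 0, 1, 2 and 3 components
press = np.array([200.0, 80.0, 30.0, 20.0])

# Cumulative explained variance in percent; first entry is for zero components
cum_expl_var = (1.0 - press / press[0]) * 100.0

# Explained variance contributed by each component (component 1, 2, 3)
expl_var = np.diff(cum_expl_var)
```

With cross validated PRESS values the cumulative curve is not guaranteed to increase, and a drop after adding a component is a common sign of overfitting.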

X_loadings()

Returns array holding loadings of array X. Rows represent variables and columns represent components. First column holds loadings for component 1, second column holds loadings for component 2, etc.

X_means()

Returns array holding column means of array X.

X_predCal()

Returns a dictionary holding the predicted arrays Xhat from calibration after each computed component. Dictionary key represents order of component.

X_predVal()

Returns dictionary holding arrays of predicted Xhat after each component from validation. Dictionary key represents order of component.

X_residuals()

Returns a dictionary holding the residual arrays for array X after each computed component. Dictionary key represents order of component.

X_scores()

Returns array holding scores of array X. First column holds scores for component 1, second column holds scores for component 2, etc.
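
For readers unfamiliar with NIPALS, the sketch below shows one common formulation of the algorithm for extracting scores and loadings from a centred array, one component at a time. The function name `nipals_pca`, the starting vector, the tolerance and the iteration cap are choices made for this illustration and may differ from what the class does internally.

```python
import numpy as np

def nipals_pca(X, num_comp, tol=1e-10, max_iter=1000):
    """Return scores T and loadings P of column-centred X, computed with NIPALS."""
    E = X - X.mean(axis=0)                  # residual array, starts as centred X
    T = np.zeros((X.shape[0], num_comp))
    P = np.zeros((X.shape[1], num_comp))
    for a in range(num_comp):
        # Start from the residual column with the largest variance
        t = E[:, np.argmax(E.var(axis=0))].copy()
        for _ in range(max_iter):
            p = E.T @ t / (t @ t)           # regress columns of E on t
            p /= np.linalg.norm(p)          # normalise the loading vector
            t_new = E @ p                   # update the scores
            converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        T[:, a] = t
        P[:, a] = p
        E = E - np.outer(t, p)              # deflate before the next component
    return T, P

rng = np.random.default_rng(1)
X = rng.normal(size=(14, 6))
T, P = nipals_pca(X, num_comp=3)
```

Each column of `T` holds the scores for one component and each column of `P` the matching loadings, which mirrors the layout described for X_scores() and X_loadings().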

X_scores_predict(Xnew, numComp=None)

Returns array of X scores from new X data using the existing model. Rows represent objects and columns represent components.
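
Conceptually, scores for new objects are obtained by applying the training pre-processing to the new data and projecting it onto the loadings. A minimal sketch under that assumption, for mean centred data and with loadings taken from an SVD instead of NIPALS (all variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
X_train = rng.normal(size=(10, 5))
X_new = rng.normal(size=(3, 5))

# Loadings for two components from the centred training data
x_means = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - x_means, full_matrices=False)
P = Vt.T[:, :2]

# New objects are centred with the training means, then projected
T_new = (X_new - x_means) @ P
```

Note that the new data is centred with the means of the training data, not with its own column means. If the model was fitted on standardised data, the training standard deviations would have to be applied in the same way.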

X_valExplVar()

Returns a list holding the validated explained variance for X after each component. First number in list is for component 1, second number for component 2, third number for component 3, etc.

Y_MSECV()

Returns an array holding MSECV across all variables in Y acquired through cross validation after each computed component. First row is MSECV for zero components, second row for component 1, third row for component 2, etc.

Y_MSECV_indVar()

Returns an array holding MSECV of each variable in array Y acquired through cross validation after each computed component. First row is MSECV for zero components, second row for component 1, third row for component 2, etc.

Y_MSEE()

Returns an array holding MSEE across all variables in Y acquired through calibration after each computed component. First row is MSEE for zero components, second row for component 1, third row for component 2, etc.

Y_MSEE_indVar()

Returns an array holding MSEE for each variable in array Y acquired through calibration after each computed component. First row holds MSEE for zero components, second row for component 1, third row for component 2, etc.

Y_PRESSCV()

Returns an array holding PRESSCV across all variables in Y acquired through cross validation after each computed component. First row is PRESSCV for zero components, second row for component 1, third row for component 2, etc.

Y_PRESSCV_indVar()

Returns an array holding PRESSCV of each variable in array Y acquired through cross validation after each computed component. First row is PRESSCV for zero components, second row for component 1, third row for component 2, etc.

Y_PRESSE()

Returns array holding PRESSE across all variables in Y acquired through calibration after each computed component. First row is PRESSE for zero components, second row for component 1, third row for component 2, etc.

Y_PRESSE_indVar()

Returns array holding PRESSE for each individual variable in Y acquired through calibration after each component. First row is PRESSE for zero components, second row for component 1, third row for component 2, etc.

Y_RMSECV()

Returns an array holding RMSECV across all variables in Y acquired through cross validation after each computed component. First row is RMSECV for zero components, second row for component 1, third row for component 2, etc.

Y_RMSECV_indVar()

Returns an array holding RMSECV for each variable in array Y acquired through cross validation after each computed component. First row is RMSECV for zero components, second row for component 1, third row for component 2, etc.

Y_RMSEE()

Returns an array holding RMSEE across all variables in Y acquired through calibration after each computed component. First row is RMSEE for zero components, second row for component 1, third row for component 2, etc.

Y_RMSEE_indVar()

Returns an array holding RMSEE for each variable in array Y acquired through calibration after each component. First row holds RMSEE for zero components, second row for component 1, third row for component 2, etc.

Y_calExplVar()

Returns a list holding the calibrated explained variance for each component. First number in list is for component 1, second number for component 2, etc.

Y_corrLoadings()

Returns array holding correlation loadings of array Y. First column holds correlation loadings for component 1, second column holds correlation loadings for component 2, etc.

Y_cumCalExplVar()

Returns a list holding the cumulative calibrated explained variance for array Y after each component. First number represents zero components, second number represents component 1, etc.

Y_cumCalExplVar_indVar()

Returns an array holding the cumulative calibrated explained variance for each variable in Y after each component. First row represents zero components, second row represents one component, third row represents two components, etc. Columns represent variables.

Y_cumValExplVar()

Returns a list holding the cumulative validated explained variance for array Y after each component. First number represents zero components, second number represents component 1, etc.

Y_cumValExplVar_indVar()

Returns an array holding the cumulative validated explained variance for each variable in Y after each component. First row represents zero components, second row represents component 1, third row for component 2, etc. Columns represent variables.

Y_loadings()

Returns an array holding loadings C of array Y. Rows represent variables and columns represent components. First column for component 1, second column for component 2, etc.

Y_means()

Returns array holding means of columns in array Y.

Y_predCal()

Returns dictionary holding arrays of predicted Yhat after each component from calibration. Dictionary key represents order of components.

Y_predVal()

Returns dictionary holding arrays of predicted Yhat after each component from validation. Dictionary key represents order of component.

Y_predict(Xnew, numComp=1)

Returns predicted Yhat from new measurements X, using the chosen number of components.

Y_residuals()

Returns a dictionary holding residuals F of array Y after each component. Dictionary key represents order of component.

Y_valExplVar()

Returns a list holding the validated explained variance for Y after each component. First number in list is for component 1, second number for component 2, third number for component 3, etc.

__init__(arrX, arrY, numComp=None, Xstand=False, Ystand=False, cvType=None)

On initialisation, checks how arrX and arrY are to be pre-processed (parameters Xstand and Ystand are either True or False), then checks whether the number of components chosen by the user is valid.

corrLoadingsEllipses()

Returns coordinates for the ellipses that represent 50% and 100% explained variance in the correlation loadings plot.
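
In a correlation loadings plot the outer ellipse (100% explained variance) is the unit circle and the inner one (50%) has radius sqrt(0.5), because the squared correlation loadings of a variable, summed over the two plotted components, give its explained fraction. The sketch below generates such coordinates. The function name, the dictionary keys and the number of points are assumptions for this example and do not describe the class's actual return format.

```python
import math

def ellipse_coordinates(num_points=200):
    """Return x/y coordinates of the 50% and 100% explained variance circles."""
    angles = [2.0 * math.pi * i / num_points for i in range(num_points + 1)]
    r50 = math.sqrt(0.5)    # radius at which r**2 equals 0.5
    return {
        "x50": [r50 * math.cos(a) for a in angles],
        "y50": [r50 * math.sin(a) for a in angles],
        "x100": [math.cos(a) for a in angles],
        "y100": [math.sin(a) for a in angles],
    }

coords = ellipse_coordinates()
```

Variables that fall between the two circles have between 50% and 100% of their variance explained by the two components shown.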

cvTrainAndTestData()

Returns a list consisting of dictionaries holding training and test sets.
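
To make the three cvType options more concrete, the sketch below builds training and test index sets for "loo", "KFold" and "lolo" in plain Python. All function names and the dictionary keys are hypothetical, and the contiguous fold assignment for KFold is an assumption; the class may assign objects to folds differently and returns data rather than indices.

```python
def loo_segments(num_objects):
    """Leave one out: every object forms its own test set."""
    return [[i] for i in range(num_objects)]

def kfold_segments(num_objects, num_folds):
    """Split object indices into num_folds contiguous test sets of near-equal size."""
    base, extra = divmod(num_objects, num_folds)
    segments, start = [], 0
    for fold in range(num_folds):
        size = base + (1 if fold < extra else 0)
        segments.append(list(range(start, start + size)))
        start += size
    return segments

def lolo_segments(labels):
    """Leave one label out: objects sharing a label are left out together."""
    segments = {}
    for index, label in enumerate(labels):
        segments.setdefault(label, []).append(index)
    return list(segments.values())

def train_test_pairs(num_objects, test_segments):
    """Pair each test segment with the remaining objects as training set."""
    pairs = []
    for test in test_segments:
        held_out = set(test)
        train = [i for i in range(num_objects) if i not in held_out]
        pairs.append({"train": train, "test": test})
    return pairs

labels = [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7]
pairs = train_test_pairs(14, lolo_segments(labels))
```

With the labels above, objects 0 and 7 share label 1 and are therefore left out together in the first segment. This is useful when rows are replicates of the same sample and must not be split between training and test sets.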

modelSettings()

Returns a dictionary holding the settings under which NIPALS PCR was run.

regressionCoefficients(numComp=1)

Returns regression coefficients from the fitted model using all available samples and a chosen number of components.
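
In principal component regression, the coefficients that map centred X to centred Y can be written as B = P (T'T)^-1 T'Y, where T and P are the X scores and X loadings for the chosen number of components. The sketch below applies this textbook formula with SVD-based scores on synthetic data and then uses B to predict Y for new objects. Variable names are illustrative, and the orientation of the array returned by the class may differ.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(14, 6))
Y = X[:, :2] @ np.array([[1.0, 0.5], [-2.0, 1.5]]) + 0.1 * rng.normal(size=(14, 2))

x_means, y_means = X.mean(axis=0), Y.mean(axis=0)
Xc, Yc = X - x_means, Y - y_means

num_comp = 3
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
T = U[:, :num_comp] * S[:num_comp]        # X scores
P = Vt.T[:, :num_comp]                    # X loadings

# Regress centred Y on the scores, then map back to the X variables
Q = np.linalg.solve(T.T @ T, T.T @ Yc)    # (components x Y variables)
B = P @ Q                                 # (X variables x Y variables)

# Prediction for new objects: centre with training means, add Y means back
X_new = rng.normal(size=(4, 6))
Y_hat = (X_new - x_means) @ B + y_means
```

Choosing fewer components than the rank of X discards the directions with the smallest variance, which is what distinguishes PCR from ordinary least squares on the original variables.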