Partial Least Squares Regression (PLSR)

PLSR1

class hoggorm.plsr1.nipalsPLS1(arrX, vecy, numComp=3, Xstand=False, Ystand=False, cvType=['loo'])

This class carries out partial least squares regression (PLSR) for two arrays using the NIPALS algorithm. The y array is univariate, which is why PLS1 is applied.

Parameters:
  • arrX (numpy array) – This is X in the PLS1 model. Number and order of objects (rows) must match those of vecy.
  • vecy (numpy array) – This is y in the PLS1 model. Number and order of objects (rows) must match those of arrX.
  • numComp (int, optional) – An integer that defines how many components are to be computed. If not provided, the maximum possible number of components is used.
  • Xstand (boolean, optional) –

    Defines whether variables in arrX are to be standardised/scaled or centered.

    False : columns of arrX are mean centred (default)
    Xstand = False
    True : columns of arrX are mean centred and divided by their own standard deviation
    Xstand = True
  • Ystand (boolean, optional) –

    Defines whether vecy is to be standardised/scaled or centered.

    False : vecy is to be mean centred (default)
    Ystand = False
    True : vecy is to be mean centred and divided by its own standard deviation
    Ystand = True
  • cvType (list, optional) –

    The list defines cross validation settings when computing the PLS1 model. Note that if cvType is not provided, cross validation will not be performed and as such cross validation results will not be available. Choose cross validation type from the following:

    loo : leave one out / a.k.a. full cross validation (default)
    cvType = ["loo"]
    KFold : leave out one fold or segment
    cvType = ["KFold", numFolds]

    numFolds: int

    Number of folds or segments

    lolo : leave one label out
    cvType = ["lolo", labelsList]

    labelsList: list

    Sequence of labels. Must be the same length as the number of rows in arrX and vecy. Leaves out objects with the same label.
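
As a rough illustration of what the Xstand and Ystand options describe, the following numpy sketch mean-centres the columns of an array and optionally divides each column by its own standard deviation. The helper name preprocess and the use of the sample standard deviation (ddof=1) are assumptions made for this sketch; it is not taken from the hoggorm source.

```python
import numpy as np


def preprocess(arr, standardise=False):
    """Mean-centre the columns of arr; optionally scale each column to unit standard deviation."""
    arr = np.asarray(arr, dtype=float)
    centred = arr - arr.mean(axis=0)
    if standardise:
        # Assumption of this sketch: sample standard deviation (ddof=1)
        centred = centred / arr.std(axis=0, ddof=1)
    return centred


X = np.array([[1.0, 10.0], [2.0, 30.0], [3.0, 50.0], [6.0, 70.0]])
X_centred = preprocess(X)                          # behaviour described for Xstand=False
X_standardised = preprocess(X, standardise=True)   # behaviour described for Xstand=True
```

Standardising gives every variable the same weight in the model, which is usually preferable when the columns are measured in different units.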

Returns:

A class that contains the PLS1 model and computational results

Return type:

class

EXAMPLES

First import the hoggorm package, together with numpy, which is used below to inspect the data.

>>> import hoggorm as ho
>>> import numpy as np

Import your data into a numpy array.

>>> np.shape(my_X_data)
(14, 292)
>>> np.shape(my_y_data)
(14, 1)

Examples of how to compute a PLS1 model using different settings for the input parameters.

>>> model = ho.nipalsPLS1(arrX=my_X_data, vecy=my_y_data, numComp=5)
>>> model = ho.nipalsPLS1(arrX=my_X_data, vecy=my_y_data)
>>> model = ho.nipalsPLS1(arrX=my_X_data, vecy=my_y_data, numComp=3, Ystand=True)
>>> model = ho.nipalsPLS1(arrX=my_X_data, vecy=my_y_data, Xstand=False, Ystand=True)
>>> model = ho.nipalsPLS1(arrX=my_X_data, vecy=my_y_data, cvType=["loo"])
>>> model = ho.nipalsPLS1(arrX=my_X_data, vecy=my_y_data, cvType=["KFold", 7])
>>> model = ho.nipalsPLS1(arrX=my_X_data, vecy=my_y_data, cvType=["lolo", [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7]])

Examples of how to extract results from the PLS1 model.

>>> X_scores = model.X_scores()
>>> X_loadings = model.X_loadings()
>>> y_loadings = model.Y_loadings()
>>> X_cumulativeCalibratedExplainedVariance_allVariables = model.X_cumCalExplVar_indVar()
>>> Y_cumulativeValidatedExplainedVariance_total = model.Y_cumValExplVar()

X_MSECV()

Returns an array holding MSECV across all variables in X acquired through cross validation after each computed component. First row is MSECV for zero components, second row for component 1, third row for component 2, etc.

X_MSECV_indVar()

Returns an array holding MSECV for each variable in X acquired through cross validation. First row is MSECV for zero components, second row for component 1, etc.

X_MSEE()

Returns an array holding MSEE across all variables in X acquired through calibration after each computed component. First row is MSEE for zero components, second row for component 1, third row for component 2, etc.

X_MSEE_indVar()

Returns an array holding MSEE for each variable in array X acquired through calibration after each computed component. First row holds MSEE for zero components, second row for component 1, third row for component 2, etc.

X_PRESSCV()

Returns an array holding PRESSCV across all variables in X acquired through cross validation after each computed component. First row is PRESSCV for zero components, second row for component 1, third row for component 2, etc.

X_PRESSCV_indVar()

Returns array holding PRESSCV for each individual variable in X acquired through cross validation after each computed component. First row is PRESSCV for zero components, second row for component 1, third row for component 2, etc.

X_PRESSE()

Returns array holding PRESSE across all variables in X acquired through calibration after each computed component. First row is PRESSE for zero components, second row for component 1, third row for component 2, etc.

X_PRESSE_indVar()

Returns array holding PRESSE for each individual variable in X acquired through calibration after each computed component. First row is PRESSE for zero components, second row for component 1, third row for component 2, etc.

X_RMSECV()

Returns an array holding RMSECV across all variables in X acquired through cross validation after each computed component. First row is RMSECV for zero components, second row for component 1, third row for component 2, etc.

X_RMSECV_indVar()

Returns an array holding RMSECV for each variable in X acquired through cross validation after each computed component. First row is RMSECV for zero components, second row for component 1, third row for component 2, etc.

X_RMSEE()

Returns an array holding RMSEE across all variables in X acquired through calibration after each computed component. First row is RMSEE for zero components, second row for component 1, third row for component 2, etc.

X_RMSEE_indVar()

Returns an array holding RMSEE for each variable in array X acquired through calibration after each component. First row holds RMSEE for zero components, second row for component 1, third row for component 2, etc.
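
The PRESS, MSE and RMSE quantities listed above are closely related. The sketch below uses the conventional definitions: PRESS is the sum of squared residuals per variable, MSE is PRESS divided by the number of objects, and RMSE is the square root of MSE. The exact normalisation used inside hoggorm is not stated in this section, so treat the averaging over variables for the "across all variables" versions as an assumption. The arrays in X_hats are made-up reconstructions used only to produce residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))          # 6 objects, 3 variables
Xc = X - X.mean(axis=0)              # mean-centred data

# Hypothetical reconstructions of Xc after 0, 1 and 2 components.
# With zero components the prediction is the column mean, i.e. zero after centring.
X_hats = [np.zeros_like(Xc), 0.5 * Xc, 0.9 * Xc]

# One row per number of components, one column per variable
press_ind_var = np.array([((Xc - X_hat) ** 2).sum(axis=0) for X_hat in X_hats])
mse_ind_var = press_ind_var / Xc.shape[0]
rmse_ind_var = np.sqrt(mse_ind_var)

# "Across all variables": PRESS summed over variables, MSE averaged over variables
press_total = press_ind_var.sum(axis=1)
mse_total = mse_ind_var.mean(axis=1)
rmse_total = np.sqrt(mse_total)
```

The same relations apply whether the residuals come from calibration (the ...E methods) or from cross validation (the ...CV methods); only the source of the predictions differs.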

X_calExplVar()

Returns a list holding the calibrated explained variance for each component. First number in list is for component 1, second number for component 2, etc.

X_corrLoadings()

Returns array holding correlation loadings of array X. First column holds correlation loadings for component 1, second column holds correlation loadings for component 2, etc.
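
Correlation loadings are commonly defined as the correlation between each original variable and each score vector. The sketch below computes them with plain numpy under that definition. The function name correlation_loadings is hypothetical, and the scores are stand-ins obtained from an SVD purely for illustration, not scores from a PLS model.

```python
import numpy as np


def correlation_loadings(X, T):
    """Correlation between every column of X (variables) and every column of T (scores)."""
    Xc = X - X.mean(axis=0)
    Tc = T - T.mean(axis=0)
    cross = Xc.T @ Tc
    norms = np.outer(np.linalg.norm(Xc, axis=0), np.linalg.norm(Tc, axis=0))
    return cross / norms             # shape: (variables, components)


rng = np.random.default_rng(1)
X = rng.normal(size=(10, 4))

# Stand-in scores: first two principal-component scores of the centred data
U, s, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
T = U[:, :2] * s[:2]

corr_load = correlation_loadings(X, T)
```

Each row then gives the position of one variable in a correlation loadings plot of component 1 against component 2.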

X_cumCalExplVar()

Returns a list holding the cumulative calibrated explained variance for array X after each component.

X_cumCalExplVar_indVar()

Returns an array holding the cumulative calibrated explained variance for each variable in X after each component. First row represents zero components, second row represents one component, third row represents two components, etc. Columns represent variables.

X_cumValExplVar()

Returns a list holding the cumulative validated explained variance for array X after each component. First number represents zero components, second number represents component 1, etc.

X_cumValExplVar_indVar()

Returns an array holding the cumulative validated explained variance for each variable in X after each component. First row represents zero components, second row represents component 1, third row for component 2, etc. Columns represent variables.
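
The explained variance methods differ in indexing: the cumulative versions start at zero components, while the per-component versions start at component 1. The sketch below shows one common way to derive both from a sequence of MSE values, assuming explained variance is expressed in percent relative to the MSE at zero components. The MSE values are invented for the example, and the formula is a conventional definition rather than something confirmed by this section.

```python
import numpy as np

# Hypothetical MSE values after 0, 1, 2 and 3 components
mse = np.array([4.0, 2.0, 1.0, 0.8])

# Cumulative explained variance in percent; the entry for zero components is 0
cum_expl_var = (mse[0] - mse) / mse[0] * 100

# Explained variance contributed by each individual component (component 1 first)
expl_var = np.diff(cum_expl_var)
```

Using MSE from calibration gives calibrated explained variance; using MSE from cross validation gives validated explained variance, which can decrease when a component overfits.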

X_loadingWeights()

Returns an array holding the loading weights of array X.

X_loadings()

Returns array holding loadings of array X. Rows represent variables and columns represent components. First column holds loadings for component 1, second column holds loadings for component 2, etc.

X_means()

Returns array holding the column means of X.

X_predCal()

Returns a dictionary holding the predicted arrays Xhat from calibration after each computed component. Dictionary key represents order of component.

X_predVal()

Returns dictionary holding arrays of predicted Xhat after each component from validation. Dictionary key represents order of component.

X_residuals()

Returns a dictionary holding the residual arrays for array X after each computed component. Dictionary key represents order of component.

X_scores()

Returns array holding scores of array X. First column holds scores for component 1, second column holds scores for component 2, etc.

X_scores_predict(Xnew, numComp=None)

Returns array of X scores from new X data using the existing model. Rows represent objects and columns represent components.

X_valExplVar()

Returns a list holding the validated explained variance for X after each component. First number in list is for component 1, second number for component 2, third number for component 3, etc.

Y_MSECV()

Returns an array holding MSECV of vector y acquired through cross validation after each computed component. First row is MSECV for zero components, second row for component 1, third row for component 2, etc.

Y_MSEE()

Returns an array holding MSEE of vector y acquired through calibration after each component. First row holds MSEE for zero components, second row for component 1, third row for component 2, etc.

Y_PRESSCV()

Returns an array holding PRESSCV for vector y acquired through cross validation after each computed component. First row is PRESSCV for zero components, second row for component 1, third row for component 2, etc.

Y_PRESSE()

Returns an array holding PRESSE for vector y acquired through calibration after each computed component. First row is PRESSE for zero components, second row for component 1, third row for component 2, etc.

Y_RMSECV()

Returns an array holding RMSECV for vector y acquired through cross validation after each computed component. First row is RMSECV for zero components, second row for component 1, third row for component 2, etc.

Y_RMSEE()

Returns an array holding RMSEE of vector y acquired through calibration after each computed component. First row is RMSEE for zero components, second row for component 1, third row for component 2, etc.

Y_calExplVar()

Returns list holding calibrated explained variance for each component in vector y.

Y_corrLoadings()

Returns an array holding correlation loadings of vector y. Columns represent components. First column for component 1, second column for component 2, etc.

Y_cumCalExplVar()

Returns a list holding the cumulative calibrated explained variance for vector y after each component. First number represents zero components, second number one component, etc.

Y_cumValExplVar()

Returns list holding cumulative validated explained variance in vector y.

Y_loadings()

Returns an array holding loadings of vector y. Columns represent components. First column for component 1, second column for component 2, etc.

Y_means()

Returns an array holding the mean of vector y.

Y_predCal()

Returns dictionary holding arrays of predicted yhat after each component from calibration. Dictionary key represents order of components.

Y_predVal()

Returns dictionary holding arrays of predicted yhat after each component from validation. Dictionary key represents order of component.

Y_predict(Xnew, numComp=1)

Returns predicted yhat from new measurements X.

Y_residuals()

Returns list of arrays holding residuals of vector y after each component.

Y_scores()

Returns scores of array Y (NOT IMPLEMENTED)

Y_valExplVar()

Returns list holding validated explained variance for each component in vector y.

__init__(arrX, vecy, numComp=3, Xstand=False, Ystand=False, cvType=['loo'])

On initialisation, checks how X and y are to be pre-processed (which mode is used), then checks whether the number of components chosen by the user is valid, and finally runs the NIPALS PLS1 algorithm.
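
For orientation, the following is a minimal textbook sketch of the NIPALS PLS1 algorithm on mean-centred data, including the usual formula for regression coefficients, W (P'W)^-1 q. It is not the hoggorm implementation; the function name pls1_nipals, its return values and the synthetic data are assumptions made for this illustration.

```python
import numpy as np


def pls1_nipals(X, y, num_comp):
    """Textbook NIPALS PLS1 for X (n x p) and a one-dimensional y (length n)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean            # centred working copies
    W, P, T, q = [], [], [], []
    for _ in range(num_comp):
        w = E.T @ f                          # loading weights: direction of max covariance with y
        w /= np.linalg.norm(w)
        t = E @ w                            # X scores
        tt = t @ t
        p = E.T @ t / tt                     # X loadings
        c = f @ t / tt                       # y loading (scalar)
        E = E - np.outer(t, p)               # deflate X
        f = f - c * t                        # deflate y
        W.append(w); P.append(p); T.append(t); q.append(c)
    W, P, T, q = np.array(W).T, np.array(P).T, np.array(T).T, np.array(q)
    coefs = W @ np.linalg.solve(P.T @ W, q)  # regression coefficients for centred X
    return T, P, W, q, coefs, x_mean, y_mean


rng = np.random.default_rng(2)
X = rng.normal(size=(20, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.01 * rng.normal(size=20)

T, P, W, q, coefs, x_mean, y_mean = pls1_nipals(X, y, num_comp=5)
y_hat = (X - x_mean) @ coefs + y_mean        # prediction for the calibration objects
```

Because y is univariate, each component is obtained in a single pass without inner iterations. With as many components as X has independent columns, the coefficients coincide with ordinary least squares; fewer components give a regularised solution.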

corrLoadingsEllipses()

Returns coordinates of ellipses that represent 50% and 100% explained variance in the correlation loadings plot.

cvTrainAndTestData()

Returns a list consisting of dictionaries holding training and test sets.

modelSettings()

Returns a dictionary holding settings under which PLS1 was run.

regressionCoefficients(numComp=1)

Returns regression coefficients from the fitted model using all available samples and a chosen number of components.

PLSR2

class hoggorm.plsr2.nipalsPLS2(arrX, arrY, numComp=None, Xstand=False, Ystand=False, cvType=None)

This class carries out partial least squares regression (PLSR) for two arrays using the NIPALS algorithm. The Y array is multivariate, which is why PLS2 is applied.

Parameters:
  • arrX (numpy array) – This is X in the PLS2 model. Number and order of objects (rows) must match those of arrY.
  • arrY (numpy array) – This is Y in the PLS2 model. Number and order of objects (rows) must match those of arrX.
  • numComp (int, optional) – An integer that defines how many components are to be computed. If not provided, the maximum possible number of components is used.
  • Xstand (boolean, optional) –

    Defines whether variables in arrX are to be standardised/scaled or centered.

    False : columns of arrX are mean centred (default)
    Xstand = False
    True : columns of arrX are mean centred and divided by their own standard deviation
    Xstand = True
  • Ystand (boolean, optional) –

    Defines whether variables in arrY are to be standardised/scaled or centered.

    False : columns of arrY are mean centred (default)
    Ystand = False
    True : columns of arrY are mean centred and divided by their own standard deviation
    Ystand = True
  • cvType (list, optional) –

    The list defines cross validation settings when computing the PLS2 model. Note that if cvType is not provided, cross validation will not be performed and as such cross validation results will not be available. Choose cross validation type from the following:

    loo : leave one out / a.k.a. full cross validation (default)
    cvType = ["loo"]
    KFold : leave out one fold or segment
    cvType = ["KFold", numFolds]

    numFolds: int

    Number of folds or segments

    lolo : leave one label out
    cvType = ["lolo", labelsList]

    labelsList: list

    Sequence of labels. Must be the same length as the number of rows in arrX and arrY. Leaves out objects with the same label.
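
To make the three cvType settings concrete, the sketch below builds the held-out (test) index sets that each setting describes. The function cv_test_indices is hypothetical, and the choice of contiguous folds for KFold is an assumption of this sketch; the library may assign objects to folds differently.

```python
import numpy as np


def cv_test_indices(cv_type, n_objects):
    """Return a list of index arrays, one held-out set per cross validation segment."""
    kind = cv_type[0]
    indices = np.arange(n_objects)
    if kind == "loo":
        # Every object is left out exactly once, on its own
        return [np.array([i]) for i in indices]
    if kind == "KFold":
        # Assumption: contiguous folds of (nearly) equal size
        return np.array_split(indices, cv_type[1])
    if kind == "lolo":
        labels = np.asarray(cv_type[1])
        if labels.shape[0] != n_objects:
            raise ValueError("labelsList must have one entry per row")
        # All objects sharing a label are left out together
        return [indices[labels == lab] for lab in np.unique(labels)]
    raise ValueError(f"unknown cross validation type: {kind}")


loo_sets = cv_test_indices(["loo"], 14)
kfold_sets = cv_test_indices(["KFold", 7], 14)
lolo_sets = cv_test_indices(["lolo", [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7]], 14)
```

Leave one label out is the natural choice when rows are replicates of the same sample, since leaving out a single replicate would let the remaining replicates leak information into the training set.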

Returns:

A class that contains the PLS2 model and computational results

Return type:

class

EXAMPLES

First import the hoggorm package, together with numpy, which is used below to inspect the data.

>>> import hoggorm as ho
>>> import numpy as np

Import your data into a numpy array.

>>> np.shape(my_X_data)
(14, 292)
>>> np.shape(my_Y_data)
(14, 5)

Examples of how to compute a PLS2 model using different settings for the input parameters.

>>> model = ho.nipalsPLS2(arrX=my_X_data, arrY=my_Y_data, numComp=5)
>>> model = ho.nipalsPLS2(arrX=my_X_data, arrY=my_Y_data)
>>> model = ho.nipalsPLS2(arrX=my_X_data, arrY=my_Y_data, numComp=3, Ystand=True)
>>> model = ho.nipalsPLS2(arrX=my_X_data, arrY=my_Y_data, Xstand=False, Ystand=True)
>>> model = ho.nipalsPLS2(arrX=my_X_data, arrY=my_Y_data, cvType=["loo"])
>>> model = ho.nipalsPLS2(arrX=my_X_data, arrY=my_Y_data, cvType=["KFold", 7])
>>> model = ho.nipalsPLS2(arrX=my_X_data, arrY=my_Y_data, cvType=["lolo", [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7]])

Examples of how to extract results from the PLS2 model.

>>> X_scores = model.X_scores()
>>> X_loadings = model.X_loadings()
>>> Y_loadings = model.Y_loadings()
>>> X_cumulativeCalibratedExplainedVariance_allVariables = model.X_cumCalExplVar_indVar()
>>> Y_cumulativeValidatedExplainedVariance_total = model.Y_cumValExplVar()

X_MSECV()

Returns an array holding MSECV across all variables in X acquired through cross validation after each computed component. First row is MSECV for zero components, second row for component 1, third row for component 2, etc.

X_MSECV_indVar()

Returns an array holding MSECV for each variable in X acquired through cross validation. First row is MSECV for zero components, second row for component 1, etc.

X_MSEE()

Returns an array holding MSEE across all variables in X acquired through calibration after each computed component. First row is MSEE for zero components, second row for component 1, third row for component 2, etc.

X_MSEE_indVar()

Returns an array holding MSEE for each variable in array X acquired through calibration after each computed component. First row holds MSEE for zero components, second row for component 1, third row for component 2, etc.

X_PRESSCV()

Returns an array holding PRESSCV across all variables in X acquired through cross validation after each computed component. First row is PRESSCV for zero components, second row for component 1, third row for component 2, etc.

X_PRESSCV_indVar()

Returns array holding PRESSCV for each individual variable in X acquired through cross validation after each computed component. First row is PRESSCV for zero components, second row for component 1, third row for component 2, etc.

X_PRESSE()

Returns array holding PRESSE across all variables in X acquired through calibration after each computed component. First row is PRESSE for zero components, second row for component 1, third row for component 2, etc.

X_PRESSE_indVar()

Returns array holding PRESSE for each individual variable in X acquired through calibration after each computed component. First row is PRESSE for zero components, second row for component 1, third row for component 2, etc.

X_RMSECV()

Returns an array holding RMSECV across all variables in X acquired through cross validation after each computed component. First row is RMSECV for zero components, second row for component 1, third row for component 2, etc.

X_RMSECV_indVar()

Returns an array holding RMSECV for each variable in X acquired through cross validation after each computed component. First row is RMSECV for zero components, second row for component 1, third row for component 2, etc.

X_RMSEE()

Returns an array holding RMSEE across all variables in X acquired through calibration after each computed component. First row is RMSEE for zero components, second row for component 1, third row for component 2, etc.

X_RMSEE_indVar()

Returns an array holding RMSEE for each variable in array X acquired through calibration after each component. First row holds RMSEE for zero components, second row for component 1, third row for component 2, etc.

X_calExplVar()

Returns a list holding the calibrated explained variance for each component. First number in list is for component 1, second number for component 2, etc.

X_corrLoadings()

Returns array holding correlation loadings of array X. First column holds correlation loadings for component 1, second column holds correlation loadings for component 2, etc.

X_cumCalExplVar()

Returns a list holding the cumulative calibrated explained variance for array X after each component.

X_cumCalExplVar_indVar()

Returns an array holding the cumulative calibrated explained variance for each variable in X after each component. First row represents zero components, second row represents one component, third row represents two components, etc. Columns represent variables.

X_cumValExplVar()

Returns a list holding the cumulative validated explained variance for array X after each component. First number represents zero components, second number represents component 1, etc.

X_cumValExplVar_indVar()

Returns an array holding the cumulative validated explained variance for each variable in X after each component. First row represents zero components, second row represents component 1, third row for component 2, etc. Columns represent variables.

X_loadingWeights()

Returns an array holding the loading weights of array X.

X_loadings()

Returns array holding loadings of array X. Rows represent variables and columns represent components. First column holds loadings for component 1, second column holds loadings for component 2, etc.

X_means()

Returns a vector holding the column means of X.

X_predCal()

Returns a dictionary holding the predicted arrays Xhat from calibration after each computed component. Dictionary key represents order of component.

X_predVal()

Returns dictionary holding arrays of predicted Xhat after each component from validation. Dictionary key represents order of component.

X_residuals()

Returns a dictionary holding the residual arrays for array X after each computed component. Dictionary key represents order of component.

X_scores()

Returns array holding scores of array X. First column holds scores for component 1, second column holds scores for component 2, etc.

X_scores_predict(Xnew, numComp=None)

Returns array of X scores from new X data using the existing model. Rows represent objects and columns represent components.

X_valExplVar()

Returns a list holding the validated explained variance for X after each component. First number in list is for component 1, second number for component 2, third number for component 3, etc.

Y_MSECV()

Returns an array holding MSECV across all variables in Y acquired through cross validation after each computed component. First row is MSECV for zero components, second row for component 1, third row for component 2, etc.

Y_MSECV_indVar()

Returns an array holding MSECV of each variable in array Y acquired through cross validation after each computed component. First row is MSECV for zero components, second row for component 1, third row for component 2, etc.

Y_MSEE()

Returns an array holding MSEE across all variables in Y acquired through calibration after each computed component. First row is MSEE for zero components, second row for component 1, third row for component 2, etc.

Y_MSEE_indVar()

Returns an array holding MSEE for each variable in array Y acquired through calibration after each computed component. First row holds MSEE for zero components, second row for component 1, third row for component 2, etc.

Y_PRESSCV()

Returns an array holding PRESSCV across all variables in Y acquired through cross validation after each computed component. First row is PRESSCV for zero components, second row for component 1, third row for component 2, etc.

Y_PRESSCV_indVar()

Returns an array holding PRESSCV of each variable in array Y acquired through cross validation after each computed component. First row is PRESSCV for zero components, second row for component 1, third row for component 2, etc.

Y_PRESSE()

Returns array holding PRESSE across all variables in Y acquired through calibration after each computed component. First row is PRESSE for zero components, second row for component 1, third row for component 2, etc.

Y_PRESSE_indVar()

Returns array holding PRESSE for each individual variable in Y acquired through calibration after each component. First row is PRESSE for zero components, second row for component 1, third row for component 2, etc.

Y_RMSECV()

Returns an array holding RMSECV across all variables in Y acquired through cross validation after each computed component. First row is RMSECV for zero components, second row for component 1, third row for component 2, etc.

Y_RMSECV_indVar()

Returns an array holding RMSECV for each variable in array Y acquired through cross validation after each computed component. First row is RMSECV for zero components, second row for component 1, third row for component 2, etc.

Y_RMSEE()

Returns an array holding RMSEE across all variables in Y acquired through calibration after each computed component. First row is RMSEE for zero components, second row for component 1, third row for component 2, etc.

Y_RMSEE_indVar()

Returns an array holding RMSEE for each variable in array Y acquired through calibration after each component. First row holds RMSEE for zero components, second row for component 1, third row for component 2, etc.

Y_calExplVar()

Returns a list holding the calibrated explained variance for each component. First number in list is for component 1, second number for component 2, etc.

Y_corrLoadings()

Returns array holding correlation loadings of array Y. First column holds correlation loadings for component 1, second column holds correlation loadings for component 2, etc.

Y_cumCalExplVar()

Returns a list holding the cumulative calibrated explained variance for array Y after each component. First number represents zero components, second number represents component 1, etc.

Y_cumCalExplVar_indVar()

Returns an array holding the cumulative calibrated explained variance for each variable in Y after each component. First row represents zero components, second row represents one component, third row represents two components, etc. Columns represent variables.

Y_cumValExplVar()

Returns a list holding the cumulative validated explained variance for array Y after each component. First number represents zero components, second number represents component 1, etc.

Y_cumValExplVar_indVar()

Returns an array holding the cumulative validated explained variance for each variable in Y after each component. First row represents zero components, second row represents component 1, third row for component 2, etc. Columns represent variables.

Y_loadings()

Returns an array holding loadings C of array Y. Rows represent variables and columns represent components. First column for component 1, second column for component 2, etc.

Y_means()

Returns a vector holding the column means of array Y.

Y_predCal()

Returns dictionary holding arrays of predicted Yhat after each component from calibration. Dictionary key represents order of components.

Y_predVal()

Returns dictionary holding arrays of predicted Yhat after each component from validation. Dictionary key represents order of component.

Y_predict(Xnew, numComp=1)

Returns predicted Yhat from new measurements X.

Y_residuals()

Returns a dictionary holding residuals F of array Y after each component. Dictionary key represents order of component.

Y_scores()

Returns an array holding scores of array Y. Rows represent objects and columns represent components. First column holds scores for component 1, second column holds scores for component 2, etc.

Y_valExplVar()

Returns a list holding the validated explained variance for Y after each component. First number in list is for component 1, second number for component 2, third number for component 3, etc.

__init__(arrX, arrY, numComp=None, Xstand=False, Ystand=False, cvType=None)

On initialisation, checks whether the number of components chosen by the user is given and smaller than the maximum number of components possible. Then checks how X and Y are to be pre-processed (whether 'Xstand' and 'Ystand' are used), and finally runs the NIPALS PLS2 algorithm.
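
For comparison with PLS1, a minimal textbook sketch of NIPALS PLS2 on mean-centred data is given below. With a multivariate Y each component requires an inner iteration between X scores t and Y scores u until t stabilises. This is not the hoggorm implementation; the function name pls2_nipals, the starting vector, the convergence criterion and the synthetic data are all assumptions of this sketch.

```python
import numpy as np


def pls2_nipals(X, Y, num_comp, tol=1e-10, max_iter=500):
    """Textbook NIPALS PLS2 for X (n x p) and Y (n x m); returns T, U, W, P, C."""
    E = X - X.mean(axis=0)
    F = Y - Y.mean(axis=0)
    T, U, W, P, C = [], [], [], [], []
    for _comp in range(num_comp):
        # Assumption: start from the Y column with the largest remaining variance
        u = F[:, np.argmax(F.var(axis=0))].copy()
        t_old = np.zeros(E.shape[0])
        for _it in range(max_iter):
            w = E.T @ u
            w /= np.linalg.norm(w)           # X loading weights
            t = E @ w                        # X scores
            c = F.T @ t / (t @ t)            # Y loadings
            u = F @ c / (c @ c)              # Y scores
            if np.linalg.norm(t - t_old) < tol * np.linalg.norm(t):
                break
            t_old = t
        p = E.T @ t / (t @ t)                # X loadings
        E = E - np.outer(t, p)               # deflate X
        F = F - np.outer(t, c)               # deflate Y
        T.append(t); U.append(u); W.append(w); P.append(p); C.append(c)
    return tuple(np.array(m).T for m in (T, U, W, P, C))


rng = np.random.default_rng(3)
X = rng.normal(size=(20, 4))
Y = X @ rng.normal(size=(4, 2)) + 0.01 * rng.normal(size=(20, 2))

T, U, W, P, C = pls2_nipals(X, Y, num_comp=4)
coefs = W @ np.linalg.solve(P.T @ W, C.T)    # regression coefficients, shape (4, 2)
Y_hat = (X - X.mean(axis=0)) @ coefs + Y.mean(axis=0)
```

The X scores in T are mutually orthogonal by construction, because each deflation step removes the part of X already described by the current component.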

corrLoadingsEllipses()

Returns the coordinates of ellipses that represent 50% and 100% explained variance in the correlation loadings plot.
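
In a correlation loadings plot of two components with orthogonal scores, the squared distance of a variable from the origin equals the fraction of its variance explained by those two components. The 100% ellipse is therefore the unit circle and the 50% ellipse is a circle of radius sqrt(0.5). The sketch below generates such coordinates with numpy; the variable names and the number of points are assumptions, and the layout of the object returned by corrLoadingsEllipses() is not described here.

```python
import numpy as np

angles = np.linspace(0, 2 * np.pi, 100)

# Outer circle: 100% of a variable's variance explained by the two plotted components
x100, y100 = np.cos(angles), np.sin(angles)

# Inner circle: 50% explained variance, radius sqrt(0.5)
radius50 = np.sqrt(0.5)
x50, y50 = radius50 * np.cos(angles), radius50 * np.sin(angles)
```

Variables falling between the two circles have between 50% and 100% of their variance described by the two components shown.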

cvTrainAndTestData()

Returns a list consisting of dictionaries holding training and test sets.

modelSettings()

Returns a dictionary holding settings under which PLS2 was run.

regressionCoefficients(numComp=1)

Returns regression coefficients from the fitted model using all available samples and a chosen number of components.

scoresRegressionCoeffs()

Returns a one dimensional array holding regression coefficients between scores of array X and Y.