Utility classes and functions

There are a number of functions and classes that might be useful for working with data outside the Hoggorm package. They are provided here for convenience.

Functions in hoggorm.statTools module

The hoggorm.statTools module provides some functions that can be useful when working with multivariate data sets.

hoggorm.statTools.center(arr, axis=0)

This function centers an array column-wise or row-wise.

Parameters:
  • arr (numpy array) – A numpy array containing the data
  • axis (int) – 0 centers column-wise (default), 1 centers row-wise
Returns:Mean centered data.
Return type:numpy array
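
To illustrate what column-wise and row-wise centering amount to, here is a minimal NumPy sketch. It is not hoggorm's own code: the helper center_sketch and the sample array are hypothetical and only mirror the behaviour described above.

```python
import numpy as np

def center_sketch(arr, axis=0):
    """Subtract the mean along `axis` (0: column means, 1: row means)."""
    arr = np.asarray(arr, dtype=float)
    # keepdims=True lets the means broadcast back against arr for either axis
    return arr - arr.mean(axis=axis, keepdims=True)

data = np.array([[1.0, 2.0],
                 [3.0, 6.0]])
col_centered = center_sketch(data, axis=0)  # every column now has mean 0
row_centered = center_sketch(data, axis=1)  # every row now has mean 0
```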

Examples

>>> import hoggorm as ho
>>> # Column centering of array
>>> centData = ho.center(data, axis=0)
>>> # Row centering of array
>>> centData = ho.center(data, axis=1)

hoggorm.statTools.matrixRank(arr, tol=1e-08)

Computes the rank of an array/matrix, i.e. the number of linearly independent variables. This is not the same as the former numpy.rank() (replaced by numpy.ndim()), which only returns the number of ways (2-way, 3-way, etc.) an array/matrix has.

Parameters:
  • arr (numpy array) – A numpy array containing the data
  • tol (float) – Tolerance used when determining the rank (default 1e-08)
Returns:Rank of matrix.
Return type:scalar
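
To make the distinction between rank and number of ways concrete, the following NumPy sketch compares the two on a small array. It assumes, without confirmation from the hoggorm source, that tol is a threshold below which singular values count as zero; rank_sketch is a hypothetical helper.

```python
import numpy as np

def rank_sketch(arr, tol=1e-08):
    """Count singular values larger than tol (assumed meaning of tol)."""
    arr = np.asarray(arr, dtype=float)
    singular_values = np.linalg.svd(arr, compute_uv=False)
    return int(np.sum(singular_values > tol))

# The fourth column is the sum of the first three, so it adds no new information
my_data = np.array([[1.0, 0.0, 0.0, 1.0],
                    [0.0, 1.0, 0.0, 1.0],
                    [0.0, 0.0, 1.0, 1.0]])

n_ways = my_data.ndim        # number of ways: a 2-way array
rank = rank_sketch(my_data)  # number of linearly independent variables
```

Here n_ways describes only the shape of the array, while rank reflects how many of its four variables are linearly independent.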

Examples

>>> import hoggorm as ho
>>>
>>> # Get the rank of the data
>>> ho.matrixRank(myData)
8

hoggorm.statTools.ortho(arr1, arr2)

This function orthogonalises arr1 with respect to arr2 and returns the orthogonalised array, arr1_orth.

Parameters:
  • arr1 (numpy array) – A numpy array containing some data
  • arr2 (numpy array) – A numpy array containing some data
Returns:

A numpy array holding the orthogonalised arr1.

Return type:

numpy array

Examples

some examples
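
As a hedged illustration of what orthogonalisation can look like, the sketch below removes from arr1 everything that lies in the column space of arr2, using plain NumPy. This projection-based definition is an assumption about the intended meaning, not a transcription of hoggorm's implementation; ortho_sketch and the sample arrays are hypothetical.

```python
import numpy as np

def ortho_sketch(arr1, arr2):
    """Return the part of arr1 that is orthogonal to the columns of arr2."""
    arr1 = np.asarray(arr1, dtype=float)
    arr2 = np.asarray(arr2, dtype=float)
    # Least-squares coefficients of arr1 regressed on the columns of arr2
    coeffs, *_ = np.linalg.lstsq(arr2, arr1, rcond=None)
    # Subtract the part of arr1 that arr2 can explain
    return arr1 - arr2 @ coeffs

arr1 = np.array([[1.0, 2.0],
                 [2.0, 1.0],
                 [3.0, 5.0],
                 [4.0, 3.0]])
arr2 = np.ones((4, 1))  # a single constant column

arr1_orth = ortho_sketch(arr1, arr2)
# Every column of arr1_orth is now orthogonal to arr2
check = arr2.T @ arr1_orth
```

With a constant column as arr2, this projection reduces to column-wise mean centering of arr1, which gives a quick sanity check.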

hoggorm.statTools.standardise(arr, mode=0)

This function standardises the input array either column-wise (mode = 0) or row-wise (mode = 1).

Parameters:
  • arr (numpy array) – A numpy array containing the data
  • mode (int) – An integer indicating whether standardisation should happen column-wise (0) or row-wise (1).
Returns:

Standardised data.

Return type:

numpy array

Examples

>>> import hoggorm as ho
>>> # Standardise array column-wise
>>> standData = ho.standardise(data, mode=0)
>>> # Standardise array row-wise
>>> standData = ho.standardise(data, mode=1)
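
For readers who want to see the arithmetic, here is a minimal NumPy sketch of standardisation. It assumes that standardising means subtracting the mean and dividing by the sample standard deviation (ddof=1); the package may use a different convention. standardise_sketch and the sample array are hypothetical.

```python
import numpy as np

def standardise_sketch(arr, mode=0):
    """Scale to zero mean and unit spread along columns (0) or rows (1)."""
    arr = np.asarray(arr, dtype=float)
    # mode doubles as the NumPy axis: 0 works down columns, 1 across rows
    mean = arr.mean(axis=mode, keepdims=True)
    # ddof=1 (sample standard deviation) is an assumption, see the lead-in
    std = arr.std(axis=mode, ddof=1, keepdims=True)
    return (arr - mean) / std

data = np.array([[1.0, 10.0],
                 [2.0, 30.0],
                 [3.0, 20.0]])
col_std = standardise_sketch(data, mode=0)
row_std = standardise_sketch(data, mode=1)
```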

Cross validation classes in hoggorm.cross_val module

The Hoggorm classes PCA, PLSR and PCR rely on a number of classes from the hoggorm.cross_val module when computing their models.

The cross validation classes in this module are used inside the multivariate statistical methods and may be called upon using the cvType input parameter for these methods. They are not intended to be used outside the multivariate statistical methods, even though it is possible. They are shown here to illustrate how the different cross validation options work.

The code in this module is based on the cross_val.py module from scikit-learn 0.4. It is adapted to work with hoggorm.

Authors:

Alexandre Gramfort <alexandre.gramfort@inria.fr>

Gael Varoquaux <gael.varoquaux@normalesup.org>

License: BSD Style.

class hoggorm.cross_val.KFold(n, k)

K-Folds cross validation iterator: Provides train/test indexes to split data into train and test sets

__init__(n, k)

K-Folds cross validation iterator: Provides train/test indexes to split data into train and test sets

Parameters:
  • n (int) – Total number of elements
  • k (int) – number of folds

Examples

>>> import hoggorm as ho
>>> from hoggorm import cross_val
>>> X = [[1, 2], [3, 4], [1, 2], [3, 4]]
>>> y = [1, 2, 3, 4]
>>> kf = ho.KFold(4, k=2)
>>> for train_index, test_index in kf:
...    print("TRAIN:", train_index, "TEST:", test_index)
...    X_train, X_test, y_train, y_test = cross_val.split(train_index, test_index, X, y)
TRAIN: [False False  True  True] TEST: [ True  True False False]
TRAIN: [ True  True False False] TEST: [False False  True  True]

Notes

All folds have size trunc(n/k) except the last one, which holds the remaining elements.
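
The fold sizes described in this note can be sketched with a small generator that yields boolean masks of the same kind as in the example output. This is an illustrative reimplementation under that description, not the module's actual code; kfold_masks is a hypothetical name.

```python
import numpy as np

def kfold_masks(n, k):
    """Yield (train_mask, test_mask) pairs of boolean arrays of length n."""
    fold_size = n // k  # trunc(n/k)
    for i in range(k):
        test_mask = np.zeros(n, dtype=bool)
        start = i * fold_size
        # The last fold runs to the end and so absorbs any remainder
        stop = (i + 1) * fold_size if i < k - 1 else n
        test_mask[start:stop] = True
        yield ~test_mask, test_mask

# With n=10 and k=3 the test folds hold 3, 3 and 4 elements
fold_sizes = [int(test.sum()) for _, test in kfold_masks(10, 3)]
```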

class hoggorm.cross_val.LeaveOneLabelOut(labels)

Leave-One-Label-Out cross validation iterator: Provides train/test indexes to split data into train and test sets

__init__(labels)

Leave-One-Label-Out cross validation iterator: Provides train/test indexes to split data into train and test sets

Parameters:labels (list) – List of labels
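
The idea behind this iterator can be sketched in a few lines: each distinct label takes one turn as the test set while all other samples form the training set. The generator below is a hypothetical illustration of that idea, not the class's implementation, and the ordering of the splits (sorted labels) is an assumption.

```python
import numpy as np

def leave_one_label_out_masks(labels):
    """Yield (train_mask, test_mask) with one split per distinct label."""
    labels = np.asarray(labels)
    # np.unique returns the distinct labels in sorted order
    for label in np.unique(labels):
        test_mask = labels == label
        yield ~test_mask, test_mask

splits = list(leave_one_label_out_masks([1, 1, 2, 2]))
n_splits = len(splits)  # one split per distinct label
```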

Examples

>>> import hoggorm as ho
>>> from hoggorm import cross_val
>>> X = [[1, 2], [3, 4], [5, 6], [7, 8]]
>>> y = [1, 2, 1, 2]
>>> labels = [1, 1, 2, 2]
>>> lolo = ho.LeaveOneLabelOut(labels)
>>> for train_index, test_index in lolo:
...    print("TRAIN:", train_index, "TEST:", test_index)
...    X_train, X_test, y_train, y_test = cross_val.split(train_index, test_index, X, y)
...    print(X_train, X_test, y_train, y_test)
TRAIN: [False False  True  True] TEST: [ True  True False False]
[[5 6]
 [7 8]] [[1 2]
 [3 4]] [1 2] [1 2]
TRAIN: [ True  True False False] TEST: [False False  True  True]
[[1 2]
 [3 4]] [[5 6]
 [7 8]] [1 2] [1 2]

class hoggorm.cross_val.LeaveOneOut(n)

Leave-One-Out cross validation iterator: Provides train/test indexes to split data into train and test sets

__init__(n)

Leave-One-Out cross validation iterator: Provides train/test indexes to split data into train and test sets

Parameters:n (int) – Total number of elements

Examples

>>> import hoggorm as ho
>>> from hoggorm import cross_val
>>> X = [[1, 2], [3, 4]]
>>> y = [1, 2]
>>> loo = ho.LeaveOneOut(2)
>>> for train_index, test_index in loo:
...    print("TRAIN:", train_index, "TEST:", test_index)
...    X_train, X_test, y_train, y_test = cross_val.split(train_index, test_index, X, y)
...    print(X_train, X_test, y_train, y_test)
TRAIN: [False  True] TEST: [ True False]
[[3 4]] [[1 2]] [2] [1]
TRAIN: [ True False] TEST: [False  True]
[[1 2]] [[3 4]] [1] [2]

class hoggorm.cross_val.LeavePOut(n, p)

Leave-P-Out cross validation iterator: Provides train/test indexes to split data into train and test sets

__init__(n, p)

Leave-P-Out cross validation iterator: Provides train/test indexes to split data into train and test sets

Parameters:
  • n (int) – Total number of elements
  • p (int) – Size of the test sets
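
Leave-P-Out enumerates every way of choosing p of the n elements as the test set, so the number of splits grows as the binomial coefficient C(n, p). The sketch below illustrates this with itertools.combinations; it is a hypothetical reimplementation for illustration, and leave_p_out_masks is not part of hoggorm.

```python
from itertools import combinations

import numpy as np

def leave_p_out_masks(n, p):
    """Yield (train_mask, test_mask) for every p-element test set."""
    for test_idx in combinations(range(n), p):
        test_mask = np.zeros(n, dtype=bool)
        test_mask[list(test_idx)] = True
        yield ~test_mask, test_mask

# Four elements with test sets of size two give C(4, 2) = 6 splits
n_splits = sum(1 for _ in leave_p_out_masks(4, 2))
```

Setting p=1 in this sketch gives one split per element, which corresponds to leave-one-out.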

Examples

>>> import hoggorm as ho
>>> from hoggorm import cross_val
>>> X = [[1, 2], [3, 4], [5, 6], [7, 8]]
>>> y = [1, 2, 3, 4]
>>> lpo = ho.LeavePOut(4, 2)
>>> for train_index, test_index in lpo:
...    print("TRAIN:", train_index, "TEST:", test_index)
...    X_train, X_test, y_train, y_test = cross_val.split(train_index, test_index, X, y)
TRAIN: [False False  True  True] TEST: [ True  True False False]
TRAIN: [False  True False  True] TEST: [ True False  True False]
TRAIN: [False  True  True False] TEST: [ True False False  True]
TRAIN: [ True False False  True] TEST: [False  True  True False]
TRAIN: [ True False  True False] TEST: [False  True False  True]
TRAIN: [ True  True False False] TEST: [False False  True  True]

hoggorm.cross_val.split(train_indexes, test_indexes, *args)

For each arg, return train and test subsets defined by the indexes provided in train_indexes and test_indexes.
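
A rough picture of this behaviour, assuming the indexes are boolean masks like those printed in the examples above: each argument is converted to a NumPy array and indexed twice, once per mask. split_sketch is a hypothetical stand-in written for illustration, not the function's actual source.

```python
import numpy as np

def split_sketch(train_mask, test_mask, *args):
    """Return [a_train, a_test, b_train, b_test, ...] for the given arrays."""
    result = []
    for arg in args:
        arr = np.asarray(arg)
        # A boolean mask selects entries along the first axis (rows)
        result.append(arr[train_mask])
        result.append(arr[test_mask])
    return result

X = [[1, 2], [3, 4], [5, 6], [7, 8]]
y = [1, 2, 3, 4]
train_mask = np.array([False, False, True, True])
test_mask = ~train_mask

X_train, X_test, y_train, y_test = split_sketch(train_mask, test_mask, X, y)
```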