Quickstart

hoggorm is a Python package for explorative multivariate statistics. It contains:

PCA (principal component analysis)
PCR (principal component regression)
PLSR (partial least squares regression)
- PLSR1 for univariate responses
- PLSR2 for multivariate responses
matrix correlation coefficients RV and RV2.

Unlike scikit-learn, an excellent Python machine learning package that focuses on classification and prediction, hoggorm aims at understanding and interpreting the variance in the data. hoggorm also contains tools for prediction.

Note

Results computed with the hoggorm package can be visualised using plotting functions implemented in the complementary hoggormplot package.

Requirements

Make sure that Python 3.5 or higher is installed. A convenient way to install Python and many useful packages for scientific computing is to use the Anaconda distribution.

numpy >= 1.11.3

Installation and upgrades

Installation

Install hoggorm from the command line from PyPI, the Python Package Index.

pip install hoggorm

Upgrading

To upgrade hoggorm from a previously installed older version, execute the following from the command line:

pip install --upgrade hoggorm

If you need more information on how to install Python packages using pip, please see the pip documentation.
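
After installing, it can be useful to confirm that the requirements listed above are met. The following is a minimal sketch, not part of hoggorm: the helper names `installed_version`, `version_tuple` and `meets_minimum` are hypothetical, and it assumes Python 3.8 or newer because it relies on the standard-library `importlib.metadata` module.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_tuple(text):
    """Turn '1.11.3' into (1, 11, 3); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(found, minimum):
    """True if version string `found` is present and at least `minimum`."""
    return found is not None and version_tuple(found) >= version_tuple(minimum)


if __name__ == "__main__":
    print("Python >= 3.5:", sys.version_info >= (3, 5))
    print("numpy >= 1.11.3:", meets_minimum(installed_version("numpy"), "1.11.3"))
    print("hoggorm version:", installed_version("hoggorm"))
```

If the last line prints `None`, hoggorm is not installed in the interpreter that ran the script, which usually means `pip` installed it into a different environment.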

Documentation

Documentation at Read the Docs
Jupyter notebooks with examples of how to use hoggorm
- for PCA
  - PCA on cancer data on men in OECD countries
  - PCA on NIR spectroscopy data measured on gasoline
  - PCA on sensory data measured on cheese
- for PCR
  - PCR on sensory and fluorescence spectroscopy data measured on cheese
- for PLSR1 for univariate response (one response variable)
  - PLSR1 on NIR spectroscopy and octane data measured on gasoline
- for PLSR2 for multivariate response (multiple response variables)
  - PLSR2 on sensory and fluorescence spectroscopy data measured on cheese
- for matrix correlation coefficients RV and RV2
  - RV and RV2 coefficient on sensory and fluorescence spectroscopy data measured on cheese
- for the SMI (similarity of matrix index)
  - SMI on sensory data and fluorescence data measured on cheese
  - SMI on pseudo-random numbers

More examples in Jupyter notebooks are provided at the hoggormExamples GitHub repository.

Example

# Import hoggorm
>>> import hoggorm as ho

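# Consumer liking data of 5 consumers stored in a numpy array
>>> import numpy as np
>>> my_data = np.array([[2, 4, 2, 7, 6],
...                     [4, 7, 4, 3, 6],
...                     [3, 3, 2, 5, 2],
...                     [5, 9, 6, 4, 4],
...                     [1, 2, 1, 3, 4]])
>>> print(my_data)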
[[2 4 2 7 6]
 [4 7 4 3 6]
 [3 3 2 5 2]
 [5 9 6 4 4]
 [1 2 1 3 4]]

# Compute PCA model with
# - 3 components
# - standardised/scaled variables (features or columns)
# - Leave-one-out (LOO) cross validation
>>> model = ho.nipalsPCA(arrX=my_data, numComp=3, Xstand=True, cvType=["loo"])

# Extract results from PCA model
# Get PCA scores
>>> scores = model.X_scores()
>>> print(scores)
[[-0.97535198 -1.71827581  0.43672952]
 [ 1.28340424 -0.24453505 -0.98250731]
 [-0.9127492   0.97132275  1.04708189]
 [ 2.34954599  0.30633998  0.43178679]
 [-1.74484905  0.68514813 -0.93309089]]

# Get PCA loadings
>>> loadings = model.X_loadings()
>>> print(loadings)
[[ 0.55080115  0.10025801  0.25045298]
 [ 0.57184198 -0.11712858  0.00316316]
 [ 0.57141459  0.00568809  0.10503941]
 [-0.1682551  -0.61149788  0.77153937]
 [ 0.12161589 -0.77605877 -0.57528864]]

# Get cumulative calibrated explained variance for each variable
>>> cumCalExplVar_allVariables = model.X_cumCalExplVar_indVar()
>>> print(cumCalExplVar_allVariables)
[[ 0.          0.          0.          0.          0.        ]
 [90.98654597 98.07234952 97.92497156  8.48956314  4.43690992]
 [92.12195756 99.62227118 97.92862256 50.73769558 72.47502242]
 [97.31181824 99.62309922 98.84150821 99.98958248 99.85786661]]

# Get cumulative validated explained variance for all variables
>>> cumValExplVar_total = model.X_cumValExplVar()
>>> print(cumValExplVar_total)
[0.0, 35.43333631454735, 32.12929746015379, 71.32495809880507]
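
The model call in the example above uses two settings that are easy to gloss over: `Xstand=True` (standardised columns) and `cvType=["loo"]` (leave-one-out cross validation). The sketch below illustrates both ideas in plain Python on the same data. It is not hoggorm's implementation: the function names `standardise_columns` and `leave_one_out_splits` are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption.

```python
from statistics import mean, stdev

# Same values as my_data in the example above, as plain Python lists.
my_data = [
    [2, 4, 2, 7, 6],
    [4, 7, 4, 3, 6],
    [3, 3, 2, 5, 2],
    [5, 9, 6, 4, 4],
    [1, 2, 1, 3, 4],
]


def standardise_columns(rows):
    """Centre each column and scale it to unit standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Assumption: sample standard deviation (n - 1 denominator).
    stds = [stdev(col) for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


def leave_one_out_splits(n_rows):
    """Yield (training_rows, held_out_row) index pairs, one per row."""
    for held_out in range(n_rows):
        training = [i for i in range(n_rows) if i != held_out]
        yield training, held_out


scaled = standardise_columns(my_data)
splits = list(leave_one_out_splits(len(my_data)))
print(len(splits), "splits; first:", splits[0])
```

With five rows there are five splits, each fitting on four rows and predicting the remaining one. Validated explained variance is based on those held-out predictions, which is why it can be lower than the calibrated value and need not increase with every added component.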

hoggorm repository on GitHub

The source code is available at the hoggorm GitHub repository.

Testing

The correctness of the results provided by PCA, PCR and PLSR may be checked using the tests provided in the tests folder.

After cloning the repository to your disk, navigate to the tests folder at the command line. The code below shows an example of how to run the test for PCA.

python test_pca.py

After testing is finished, pytest should report that none of the tests failed.