Hybrid Mapping Techniques#

PCovR#

class skmatter.decomposition.PCovR(mixing=0.5, n_components=None, svd_solver='auto', tol=1e-12, space='auto', regressor=None, iterated_power='auto', random_state=None, whiten=False)[source]#

Bases: RegressorMixin, MultiOutputMixin, _BasePCov

Principal Covariates Regression (PCovR).

As described in [deJong1992], PCovR determines a latent-space projection \(\mathbf{T}\) which minimizes a combined loss in supervised and unsupervised tasks.

This projection is determined by the eigendecomposition of a modified gram matrix \(\mathbf{\tilde{K}}\)

\[\mathbf{\tilde{K}} = \alpha \mathbf{X} \mathbf{X}^T + (1 - \alpha) \mathbf{\hat{Y}}\mathbf{\hat{Y}}^T\]

where \(\alpha\) is a mixing parameter and \(\mathbf{X}\) and \(\mathbf{\hat{Y}}\) are matrices of shapes \((n_{samples}, n_{features})\) and \((n_{samples}, n_{properties})\), respectively, which contain the input and approximated targets. When \(n_{features} < n_{samples}\), the projection can be computed more efficiently from the eigendecomposition of a modified covariance matrix \(\mathbf{\tilde{C}}\)

\[\mathbf{\tilde{C}} = \alpha \mathbf{X}^T \mathbf{X} + (1 - \alpha) \left(\left(\mathbf{X}^T \mathbf{X}\right)^{-\frac{1}{2}} \mathbf{X}^T \mathbf{\hat{Y}}\mathbf{\hat{Y}}^T \mathbf{X} \left(\mathbf{X}^T \mathbf{X}\right)^{-\frac{1}{2}}\right)\]
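
As a worked illustration of the gram-matrix form above, here is a minimal NumPy sketch (with a hypothetical toy \(\mathbf{X}\) and \(\mathbf{\hat{Y}}\)) that assembles \(\mathbf{\tilde{K}}\) for a given mixing value; at \(\alpha = 1\) it reduces to the PCA gram matrix \(\mathbf{X}\mathbf{X}^T\), while at \(\alpha = 0\) only the approximated targets contribute.

>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((5, 3))  # toy input, shape (n_samples, n_features)
>>> Yhat = rng.standard_normal((5, 2))  # toy approximated targets, shape (n_samples, n_properties)
>>> alpha = 0.5  # mixing parameter
>>> K_tilde = alpha * X @ X.T + (1 - alpha) * Yhat @ Yhat.T
>>> evals, evecs = np.linalg.eigh(K_tilde)  # eigendecomposition defining the projection T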

For all PCovR methods, it is strongly suggested that \(\mathbf{X}\) and \(\mathbf{Y}\) are centered and scaled to unit variance, otherwise the results will change drastically near \(\alpha \to 0\) and \(\alpha \to 1\). This can be done with the companion preprocessing classes, where

>>> from skmatter.preprocessing import StandardFlexibleScaler as SFS
>>> import numpy as np
>>>
>>> # Set column_wise to True when the columns are relative to one another,
>>> # False otherwise.
>>> scaler = SFS(column_wise=True)
>>>
>>> A = np.array([[1, 2], [2, 1]])  # replace with your matrix
>>> scaler.fit(A)
StandardFlexibleScaler(column_wise=True)
>>> A = scaler.transform(A)
Parameters:
  • mixing (float, default=0.5) – mixing parameter, as described in PCovR as \({\alpha}\), here named to avoid confusion with regularization parameter alpha

  • n_components (int, float or str, default=None) –

    Number of components to keep. If n_components is not set, all components are kept:

    n_components == min(n_samples, n_features)
    

  • svd_solver ({'auto', 'full', 'arpack', 'randomized'}, default='auto') –

    If auto :

    The solver is selected by a default policy based on X.shape and n_components: if the input data is larger than 500x500 and the number of components to extract is lower than 80% of the smallest dimension of the data, then the more efficient ‘randomized’ method is enabled. Otherwise the exact full SVD is computed and optionally truncated afterwards.

    If full :

    run exact full SVD calling the standard LAPACK solver via scipy.linalg.svd and select the components by postprocessing

    If arpack :

    run SVD truncated to n_components calling ARPACK solver via scipy.sparse.linalg.svds. It requires strictly 0 < n_components < min(X.shape)

    If randomized :

    run randomized SVD by the method of Halko et al.

  • tol (float, default=1e-12) – Tolerance for singular values computed by svd_solver == ‘arpack’. Must be of range [0.0, infinity).

  • space ({'feature', 'sample', 'auto'}, default='auto') – whether to compute the PCovR in sample or feature space. With 'auto', the sample space is used when \({n_{samples} < n_{features}}\) and the feature space when \({n_{features} < n_{samples}}\).

  • regressor ({Ridge, RidgeCV, LinearRegression, precomputed}, default=None) – regressor for computing the approximated \({\mathbf{\hat{Y}}}\). The regressor should be one of sklearn.linear_model.Ridge, sklearn.linear_model.RidgeCV, or sklearn.linear_model.LinearRegression. If a pre-fitted regressor is provided, it is used to compute \({\mathbf{\hat{Y}}}\) (see the sketch after this parameter list). Note that any pre-fitting of the regressor will be lost if PCovR is within a composite estimator that enforces cloning, e.g., sklearn.compose.TransformedTargetRegressor or sklearn.pipeline.Pipeline with model caching. In such cases, the regressor will be re-fitted on the same training data as the composite estimator. If precomputed, we assume that the y passed to the fit function is the regressed form of the targets \({\mathbf{\hat{Y}}}\). If None, sklearn.linear_model.Ridge(alpha=1e-6, fit_intercept=False, tol=1e-12) is used as the regressor.

  • iterated_power (int or 'auto', default='auto') – Number of iterations for the power method computed by svd_solver == ‘randomized’. Must be of range [0, infinity).

  • random_state (int, numpy.random.RandomState instance or None, default=None) – Used when the ‘arpack’ or ‘randomized’ solvers are used. Pass an int for reproducible results across multiple function calls.

  • whiten (boolean, deprecated)
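
As a concrete illustration of the regressor parameter above, a minimal sketch of passing a pre-fitted regressor on hypothetical toy data; the Ridge settings mirror the stated default.

>>> import numpy as np
>>> from sklearn.linear_model import Ridge
>>> from skmatter.decomposition import PCovR
>>> X = np.array([[-1.0, 1.0, -3.0, 1.0], [1.0, -2.0, 1.0, 2.0], [-2.0, 0.0, -2.0, -2.0], [1.0, 0.0, 2.0, -1.0]])
>>> Y = np.array([[0.0, -5.0], [-1.0, 1.0], [1.0, -5.0], [-3.0, 2.0]])
>>> regressor = Ridge(alpha=1e-6, fit_intercept=False, tol=1e-12).fit(X, Y)  # pre-fitted regressor
>>> pcovr = PCovR(mixing=0.5, n_components=2, regressor=regressor).fit(X, Y)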

mixing#

mixing parameter, as described in PCovR as \({\alpha}\)

Type:

float, default=0.5

tol#

Tolerance for singular values computed by svd_solver == ‘arpack’. Must be of range [0.0, infinity).

Type:

float, default=1e-12

space#

whether to compute the PCovR in sample or feature space. With 'auto', the sample space is used when \({n_{samples} < n_{features}}\) and the feature space when \({n_{features} < n_{samples}}\).

Type:

{‘feature’, ‘sample’, ‘auto’}, default=’auto’

n_components_#

The estimated number of components, which equals the parameter n_components, or the lesser value of n_features and n_samples if n_components is None.

Type:

int

pxt_#

the projector, or weights, from the input space \(\mathbf{X}\) to the latent-space projection \(\mathbf{T}\)

Type:

numpy.ndarray of size \(({n_{features}, n_{components}})\)

pxy_#

the projector, or weights, from the input space \(\mathbf{X}\) to the properties \(\mathbf{Y}\)

Type:

numpy.ndarray of size \(({n_{features}, n_{properties}})\)

pty_#

the projector, or weights, from the latent-space projection \(\mathbf{T}\) to the properties \(\mathbf{Y}\)

Type:

numpy.ndarray of size \(({n_{components}, n_{properties}})\)

explained_variance_#

The amount of variance explained by each of the selected components. Equal to n_components largest eigenvalues of the PCovR-modified covariance matrix of \(\mathbf{X}\).

Type:

numpy.ndarray of shape (n_components,)

singular_values_#

The singular values corresponding to each of the selected components.

Type:

numpy.ndarray of shape (n_components,)

Examples

>>> import numpy as np
>>> from skmatter.decomposition import PCovR
>>> X = np.array([[-1, 1, -3, 1], [1, -2, 1, 2], [-2, 0, -2, -2], [1, 0, 2, -1]])
>>> Y = np.array([[0, -5], [-1, 1], [1, -5], [-3, 2]])
>>> pcovr = PCovR(mixing=0.1, n_components=2)
>>> pcovr.fit(X, Y)
PCovR(mixing=0.1, n_components=2)
>>> pcovr.transform(X)
array([[ 3.2630561 ,  0.06663787],
       [-2.69395511, -0.41582771],
       [ 3.48683147, -0.83164387],
       [-4.05593245,  1.18083371]])
>>> pcovr.predict(X)
array([[ 0.01371776, -5.00945512],
       [-1.02805338,  1.06736871],
       [ 0.98166504, -4.98307078],
       [-2.9963189 ,  1.98238856]])
fit(X, Y, W=None)[source]#

Fit the model with X and Y. Depending on the dimensions of X, calls either _fit_feature_space or _fit_sample_space

Parameters:
  • X (numpy.ndarray, shape (n_samples, n_features)) –

    Training data, where n_samples is the number of samples and n_features is the number of features.

    It is suggested that \(\mathbf{X}\) be centered by its column means and scaled. If features are related, the matrix should be scaled to have unit variance, otherwise \(\mathbf{X}\) should be scaled so that each feature has a variance of 1 / n_features.

  • Y (numpy.ndarray, shape (n_samples, n_properties)) –

    Training data, where n_samples is the number of samples and n_properties is the number of properties

    It is suggested that \(\mathbf{Y}\) be centered by its column means and scaled. If features are related, the matrix should be scaled to have unit variance, otherwise \(\mathbf{Y}\) should be scaled so that each feature has a variance of 1 / n_features.

    If the passed regressor = precomputed, it is assumed that Y is the regressed form of the properties, \({\mathbf{\hat{Y}}}\).

  • W (numpy.ndarray, shape (n_features, n_properties)) – Regression weights, optional when regressor = precomputed (see the sketch below this list). If not passed, it is assumed that W = np.linalg.lstsq(X, Y, self.tol)[0]
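
A sketch of the precomputed path noted above, under the assumption stated for W: \(\mathbf{\hat{Y}}\) and the weights are computed outside the class and then passed to fit (hypothetical toy data).

>>> import numpy as np
>>> from skmatter.decomposition import PCovR
>>> X = np.array([[-1.0, 1.0, -3.0, 1.0], [1.0, -2.0, 1.0, 2.0], [-2.0, 0.0, -2.0, -2.0], [1.0, 0.0, 2.0, -1.0]])
>>> Y = np.array([[0.0, -5.0], [-1.0, 1.0], [1.0, -5.0], [-3.0, 2.0]])
>>> W = np.linalg.lstsq(X, Y, rcond=None)[0]  # regression weights, shape (n_features, n_properties)
>>> Yhat = X @ W  # regressed form of the targets
>>> pcovr = PCovR(mixing=0.5, n_components=2, regressor="precomputed").fit(X, Yhat, W=W)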

_fit_feature_space(X, Y, Yhat)[source]#

In feature-space PCovR, the projectors are determined by:

\[\mathbf{\tilde{C}} = \alpha \mathbf{X}^T \mathbf{X} + (1 - \alpha) \left(\left(\mathbf{X}^T \mathbf{X}\right)^{-\frac{1}{2}} \mathbf{X}^T \mathbf{\hat{Y}}\mathbf{\hat{Y}}^T \mathbf{X} \left(\mathbf{X}^T \mathbf{X}\right)^{-\frac{1}{2}}\right)\]

where

\[\mathbf{P}_{XT} = (\mathbf{X}^T \mathbf{X})^{-\frac{1}{2}} \mathbf{U}_\mathbf{\tilde{C}}^T \mathbf{\Lambda}_\mathbf{\tilde{C}}^{\frac{1}{2}}\]
\[\mathbf{P}_{TX} = \mathbf{\Lambda}_\mathbf{\tilde{C}}^{-\frac{1}{2}} \mathbf{U}_\mathbf{\tilde{C}}^T (\mathbf{X}^T \mathbf{X})^{\frac{1}{2}}\]
\[\mathbf{P}_{TY} = \mathbf{\Lambda}_\mathbf{\tilde{C}}^{-\frac{1}{2}} \mathbf{U}_\mathbf{\tilde{C}}^T (\mathbf{X}^T \mathbf{X})^{-\frac{1}{2}} \mathbf{X}^T \mathbf{Y}\]
_fit_sample_space(X, Y, Yhat, W)[source]#

In sample-space PCovR, the projectors are determined by:

\[\mathbf{\tilde{K}} = \alpha \mathbf{X} \mathbf{X}^T + (1 - \alpha) \mathbf{\hat{Y}}\mathbf{\hat{Y}}^T\]

where

\[\mathbf{P}_{XT} = \left(\alpha \mathbf{X}^T + (1 - \alpha) \mathbf{W} \mathbf{\hat{Y}}^T\right) \mathbf{U}_\mathbf{\tilde{K}} \mathbf{\Lambda}_\mathbf{\tilde{K}}^{-\frac{1}{2}}\]
\[\mathbf{P}_{TX} = \mathbf{\Lambda}_\mathbf{\tilde{K}}^{-\frac{1}{2}} \mathbf{U}_\mathbf{\tilde{K}}^T \mathbf{X}\]
\[\mathbf{P}_{TY} = \mathbf{\Lambda}_\mathbf{\tilde{K}}^{-\frac{1}{2}} \mathbf{U}_\mathbf{\tilde{K}}^T \mathbf{Y}\]
transform(X=None)[source]#

Apply dimensionality reduction to X.

X is projected on the first principal components as determined by the modified PCovR distances.

Parameters:

X (numpy.ndarray, shape (n_samples, n_features)) – New data, where n_samples is the number of samples and n_features is the number of features.

predict(X=None, T=None)[source]#

Predicts the property values using regression on X or T.

inverse_transform(T)[source]#

Transform data back to its original space.

\[\mathbf{\hat{X}} = \mathbf{T} \mathbf{P}_{TX} = \mathbf{X} \mathbf{P}_{XT} \mathbf{P}_{TX}\]
Parameters:

T (ndarray, shape (n_samples, n_components)) – Projected data, where n_samples is the number of samples and n_components is the number of components.

Returns:

X_original (numpy.ndarray, shape (n_samples, n_features))

score(X, y, T=None)[source]#

Return the (negative) total reconstruction error for X and Y, defined as:

\[\ell_{X} = \frac{\lVert \mathbf{X} - \mathbf{T}\mathbf{P}_{TX} \rVert ^ 2} {\lVert \mathbf{X}\rVert ^ 2}\]

and

\[\ell_{Y} = \frac{\lVert \mathbf{Y} - \mathbf{T}\mathbf{P}_{TY} \rVert ^ 2} {\lVert \mathbf{Y}\rVert ^ 2}\]

The negative loss \(-\ell = -(\ell_{X} + \ell_{Y})\) is returned for easier use in sklearn pipelines, e.g., a grid search, where methods named ‘score’ are meant to be maximized.

Parameters:
  • X (numpy.ndarray of shape (n_samples, n_features)) – The data.

  • Y (numpy.ndarray of shape (n_samples, n_properties)) – The target.

Returns:

loss (float) – Negative sum of the loss in reconstructing X from the latent-space projection T and the loss in predicting Y from the latent-space projection T
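
Because the returned value is a negative loss, it can be maximized directly by scikit-learn model-selection utilities. A hedged sketch with hypothetical random data and a hypothetical parameter grid (real data should be centered and scaled as recommended above):

>>> import numpy as np
>>> from sklearn.model_selection import GridSearchCV
>>> from skmatter.decomposition import PCovR
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((20, 5))  # toy predictors
>>> Y = rng.standard_normal((20, 2))  # toy targets
>>> search = GridSearchCV(PCovR(n_components=2), param_grid={"mixing": [0.1, 0.5, 0.9]}, cv=2)
>>> search = search.fit(X, Y)  # keeps the mixing value with the largest (least negative) score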

PCovC#

class skmatter.decomposition.PCovC(mixing=0.5, n_components=None, svd_solver='auto', tol=1e-12, space='auto', classifier=None, iterated_power='auto', random_state=None, whiten=False)[source]#

Bases: LinearClassifierMixin, _BasePCov

Principal Covariates Classification (PCovC).

As described in [Jorgensen2025], PCovC determines a latent-space projection \(\mathbf{T}\) which minimizes a combined loss in supervised and unsupervised tasks.

This projection is determined by the eigendecomposition of a modified gram matrix \(\mathbf{\tilde{K}}\)

\[\mathbf{\tilde{K}} = \alpha \mathbf{X} \mathbf{X}^T + (1 - \alpha) \mathbf{Z}\mathbf{Z}^T\]

where \(\alpha\) is a mixing parameter, \(\mathbf{X}\) is an input matrix of shape \((n_{samples}, n_{features})\), and \(\mathbf{Z}\) is a matrix of class confidence scores of shape \((n_{samples}, n_{classes})\). When \(n_{features} < n_{samples}\), the projection can be computed more efficiently from the eigendecomposition of a modified covariance matrix \(\mathbf{\tilde{C}}\)

\[\mathbf{\tilde{C}} = \alpha \mathbf{X}^T \mathbf{X} + (1 - \alpha) \left(\left(\mathbf{X}^T \mathbf{X}\right)^{-\frac{1}{2}} \mathbf{X}^T \mathbf{Z}\mathbf{Z}^T \mathbf{X} \left(\mathbf{X}^T \mathbf{X}\right)^{-\frac{1}{2}}\right)\]

For all PCovC methods, it is strongly suggested that \(\mathbf{X}\) is centered and scaled to unit variance, otherwise the results will change drastically near \(\alpha \to 0\) and \(\alpha \to 1\). This can be done with the companion preprocessing classes, where

>>> from skmatter.preprocessing import StandardFlexibleScaler as SFS
>>> import numpy as np
>>>
>>> # Set column_wise to True when the columns are relative to one another,
>>> # False otherwise.
>>> scaler = SFS(column_wise=True)
>>>
>>> A = np.array([[1, 2], [2, 1]])  # replace with your matrix
>>> scaler.fit(A)
StandardFlexibleScaler(column_wise=True)
>>> A = scaler.transform(A)
Parameters:
  • mixing (float, default=0.5) – mixing parameter, as described in PCovC as \({\alpha}\), here named to avoid confusion with regularization parameter alpha

  • n_components (int, float or str, default=None) –

    Number of components to keep. If n_components is not set, all components are kept:

    n_components == min(n_samples, n_features)
    

  • svd_solver ({'auto', 'full', 'arpack', 'randomized'}, default='auto') –

    If auto :

    The solver is selected by a default policy based on X.shape and n_components: if the input data is larger than 500x500 and the number of components to extract is lower than 80% of the smallest dimension of the data, then the more efficient ‘randomized’ method is enabled. Otherwise the exact full SVD is computed and optionally truncated afterwards.

    If full :

    run exact full SVD calling the standard LAPACK solver via scipy.linalg.svd and select the components by postprocessing

    If arpack :

    run SVD truncated to n_components calling ARPACK solver via scipy.sparse.linalg.svds. It requires strictly 0 < n_components < min(X.shape)

    If randomized :

    run randomized SVD by the method of Halko et al.

  • tol (float, default=1e-12) – Tolerance for singular values computed by svd_solver == ‘arpack’. Must be of range [0.0, infinity).

  • space ({'feature', 'sample', 'auto'}, default='auto') – whether to compute the PCovC in sample or feature space. With 'auto', the sample space is used when \({n_{samples} < n_{features}}\) and the feature space when \({n_{features} < n_{samples}}\).

  • classifier (estimator object or precomputed, default=None) –

    classifier for computing \({\mathbf{Z}}\). The classifier should be one of the following:

    • sklearn.linear_model.LogisticRegression()

    • sklearn.linear_model.LogisticRegressionCV()

    • sklearn.svm.LinearSVC()

    • sklearn.discriminant_analysis.LinearDiscriminantAnalysis()

    • sklearn.linear_model.RidgeClassifier()

    • sklearn.linear_model.RidgeClassifierCV()

    • sklearn.linear_model.Perceptron()

    If a pre-fitted classifier is provided, it is used to compute \({\mathbf{Z}}\) (see the sketch after this parameter list). Note that any pre-fitting of the classifier will be lost if PCovC is within a composite estimator that enforces cloning, e.g., sklearn.pipeline.Pipeline with model caching. In such cases, the classifier will be re-fitted on the same training data as the composite estimator. If None, sklearn.linear_model.LogisticRegression() is used as the classifier.

  • iterated_power (int or 'auto', default='auto') – Number of iterations for the power method computed by svd_solver == ‘randomized’. Must be of range [0, infinity).

  • random_state (int, RandomState instance or None, default=None) – Used when the ‘arpack’ or ‘randomized’ solvers are used. Pass an int for reproducible results across multiple function calls.

  • whiten (boolean, deprecated)
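
As a concrete illustration of the classifier parameter above, a minimal sketch of passing a pre-fitted linear classifier on hypothetical toy data.

>>> import numpy as np
>>> from sklearn.linear_model import LogisticRegression
>>> from skmatter.decomposition import PCovC
>>> X = np.array([[-1.0, 0.0, -2.0, 3.0], [3.0, -2.0, 0.0, 1.0], [-3.0, 0.0, -1.0, -1.0], [1.0, 3.0, 0.0, -2.0]])
>>> y = np.array([0, 1, 2, 0])
>>> clf = LogisticRegression().fit(X, y)  # pre-fitted linear classifier
>>> pcovc = PCovC(mixing=0.5, n_components=2, classifier=clf).fit(X, y)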

mixing#

mixing parameter, as described in PCovC as \({\alpha}\)

Type:

float, default=0.5

tol#

Tolerance for singular values computed by svd_solver == ‘arpack’. Must be of range [0.0, infinity).

Type:

float, default=1e-12

space#

whether to compute the PCovC in sample or feature space. With 'auto', the sample space is used when \({n_{samples} < n_{features}}\) and the feature space when \({n_{features} < n_{samples}}\).

Type:

{‘feature’, ‘sample’, ‘auto’}, default=’auto’

n_components_#

The estimated number of components, which equals the parameter n_components, or the lesser value of n_features and n_samples if n_components is None.

Type:

int

classifier#

The linear classifier passed for fitting.

Type:

estimator object

z_classifier_#

The linear classifier fit between \(\mathbf{X}\) and \(\mathbf{Y}\).

Type:

estimator object

classifier_#

The linear classifier fit between \(\mathbf{T}\) and \(\mathbf{Y}\).

Type:

estimator object

pxt_#

the projector, or weights, from the input space \(\mathbf{X}\) to the latent-space projection \(\mathbf{T}\)

Type:

ndarray of size \(({n_{features}, n_{components}})\)

pxz_#

the projector, or weights, from the input space \(\mathbf{X}\) to the class confidence scores \(\mathbf{Z}\)

Type:

ndarray of size \(({n_{features}, })\) or \(({n_{features}, n_{classes}})\)

ptz_#

the projector, or weights, from the latent-space projection \(\mathbf{T}\) to the class confidence scores \(\mathbf{Z}\)

Type:

ndarray of size \(({n_{components}, })\) or \(({n_{components}, n_{classes}})\)

explained_variance_#

The amount of variance explained by each of the selected components. Equal to n_components largest eigenvalues of the PCovC-modified covariance matrix of \(\mathbf{X}\).

Type:

numpy.ndarray of shape (n_components,)

singular_values_#

The singular values corresponding to each of the selected components.

Type:

numpy.ndarray of shape (n_components,)

Examples

>>> import numpy as np
>>> from skmatter.decomposition import PCovC
>>> from sklearn.preprocessing import StandardScaler
>>> X = np.array([[-1, 0, -2, 3], [3, -2, 0, 1], [-3, 0, -1, -1], [1, 3, 0, -2]])
>>> X = StandardScaler().fit_transform(X)
>>> Y = np.array([0, 1, 2, 0])
>>> pcovc = PCovC(mixing=0.1, n_components=2)
>>> pcovc.fit(X, Y)
PCovC(mixing=0.1, n_components=2)
>>> pcovc.transform(X)
array([[-0.4794854 , -0.46228114],
       [ 1.9416966 ,  0.2532831 ],
       [-1.08744947,  0.89117784],
       [-0.37476173, -0.6821798 ]])
>>> pcovc.predict(X)
array([0, 1, 2, 0])
fit(X, Y, W=None)[source]#

Fit the model with X and Y.

Note that W is taken from the coefficients of a linear classifier fit between X and Y to compute Z:

\[\mathbf{Z} = \mathbf{X} \mathbf{W}\]

We then call either _fit_feature_space or _fit_sample_space, using Z as our approximation of Y. Finally, we refit a classifier on T and Y to obtain \(\mathbf{P}_{TZ}\).

Parameters:
  • X (numpy.ndarray, shape (n_samples, n_features)) –

    Training data, where n_samples is the number of samples and n_features is the number of features.

    It is suggested that \(\mathbf{X}\) be centered by its column means and scaled. If features are related, the matrix should be scaled to have unit variance, otherwise \(\mathbf{X}\) should be scaled so that each feature has a variance of 1 / n_features.

  • Y (numpy.ndarray, shape (n_samples,)) – Training data, where n_samples is the number of samples.

  • W (numpy.ndarray, shape (n_features, n_classes)) – Classification weights, optional when classifier = precomputed. If not passed, it is assumed that the weights will be taken from a linear classifier fit between \(\mathbf{X}\) and \(\mathbf{Y}\)
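
To make the relation \(\mathbf{Z} = \mathbf{X} \mathbf{W}\) concrete, a small illustration using plain scikit-learn on hypothetical toy data: the decision function of a fitted linear classifier is the linear map whose coefficients supply \(\mathbf{W}\). The intercept term included here is a detail of scikit-learn's parametrization, not of the equation above.

>>> import numpy as np
>>> from sklearn.linear_model import LogisticRegression
>>> X = np.vstack([np.eye(4), -np.eye(4), np.ones((2, 4))])  # toy input, 10 samples and 4 features
>>> y = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0])  # three classes
>>> clf = LogisticRegression().fit(X, y)
>>> W = clf.coef_.T  # shape (n_features, n_classes)
>>> Z = X @ W + clf.intercept_  # class confidence scores, equal to clf.decision_function(X)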

_fit_feature_space(X, Y, Z)[source]#

In feature-space PCovC, the projectors are determined by:

\[\mathbf{\tilde{C}} = \alpha \mathbf{X}^T \mathbf{X} + (1 - \alpha) \left(\left(\mathbf{X}^T \mathbf{X}\right)^{-\frac{1}{2}} \mathbf{X}^T \mathbf{Z}\mathbf{Z}^T \mathbf{X} \left(\mathbf{X}^T \mathbf{X}\right)^{-\frac{1}{2}}\right)\]

where

\[\mathbf{P}_{XT} = (\mathbf{X}^T \mathbf{X})^{-\frac{1}{2}} \mathbf{U}_\mathbf{\tilde{C}}^T \mathbf{\Lambda}_\mathbf{\tilde{C}}^{\frac{1}{2}}\]
\[\mathbf{P}_{TX} = \mathbf{\Lambda}_\mathbf{\tilde{C}}^{-\frac{1}{2}} \mathbf{U}_\mathbf{\tilde{C}}^T (\mathbf{X}^T \mathbf{X})^{\frac{1}{2}}\]
_fit_sample_space(X, Y, Z, W)[source]#

In sample-space PCovC, the projectors are determined by:

\[\mathbf{\tilde{K}} = \alpha \mathbf{X} \mathbf{X}^T + (1 - \alpha) \mathbf{Z}\mathbf{Z}^T\]

where

\[\mathbf{P}_{XT} = \left(\alpha \mathbf{X}^T + (1 - \alpha) \mathbf{W} \mathbf{Z}^T\right) \mathbf{U}_\mathbf{\tilde{K}} \mathbf{\Lambda}_\mathbf{\tilde{K}}^{-\frac{1}{2}}\]
\[\mathbf{P}_{TX} = \mathbf{\Lambda}_\mathbf{\tilde{K}}^{-\frac{1}{2}} \mathbf{U}_\mathbf{\tilde{K}}^T \mathbf{X}\]
transform(X=None)[source]#

Apply dimensionality reduction to X.

X is projected on the first principal components as determined by the modified PCovC distances.

Parameters:

X (numpy.ndarray, shape (n_samples, n_features)) – New data, where n_samples is the number of samples and n_features is the number of features.

predict(X=None, T=None)[source]#

Predicts the property labels using classification on X or T.

inverse_transform(T)[source]#

Transform data back to its original space.

\[\mathbf{\hat{X}} = \mathbf{T} \mathbf{P}_{TX} = \mathbf{X} \mathbf{P}_{XT} \mathbf{P}_{TX}\]
Parameters:

T (ndarray, shape (n_samples, n_components)) – Projected data, where n_samples is the number of samples and n_components is the number of components.

Returns:

X_original (numpy.ndarray, shape (n_samples, n_features))

decision_function(X=None, T=None)[source]#

Predicts confidence scores from X or T.

\[\mathbf{Z} = \mathbf{T} \mathbf{P}_{TZ} = \mathbf{X} \mathbf{P}_{XT} \mathbf{P}_{TZ} = \mathbf{X} \mathbf{P}_{XZ}\]
Parameters:
  • X (ndarray, shape(n_samples, n_features)) – Original data for which we want to get confidence scores, where n_samples is the number of samples and n_features is the number of features.

  • T (ndarray, shape (n_samples, n_components)) – Projected data for which we want to get confidence scores, where n_samples is the number of samples and n_components is the number of components.

Returns:

Z (numpy.ndarray, shape (n_samples,) or (n_samples, n_classes)) – Confidence scores. For binary classification, has shape (n_samples,), for multiclass classification, has shape (n_samples, n_classes)
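
A hedged sketch of the identity above: confidence scores computed from the original features and from the latent projection should agree to numerical precision (hypothetical toy data, rebuilt here so the snippet is self-contained).

>>> import numpy as np
>>> from skmatter.decomposition import PCovC
>>> X = np.array([[-1.0, 0.0, -2.0, 3.0], [3.0, -2.0, 0.0, 1.0], [-3.0, 0.0, -1.0, -1.0], [1.0, 3.0, 0.0, -2.0]])
>>> y = np.array([0, 1, 2, 0])
>>> pcovc = PCovC(mixing=0.1, n_components=2).fit(X, y)
>>> Z_from_X = pcovc.decision_function(X=X)
>>> Z_from_T = pcovc.decision_function(T=pcovc.transform(X))
>>> agree = np.allclose(Z_from_X, Z_from_T)  # expected True per the identity above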

score(X, y, sample_weight=None)#

Return accuracy on provided data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Test samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True labels for X.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns:

score (float) – Mean accuracy of self.predict(X) w.r.t. y.

Kernel PCovR#

class skmatter.decomposition.KernelPCovR(mixing=0.5, n_components=None, svd_solver='auto', regressor=None, kernel='linear', gamma=None, degree=3, coef0=1, kernel_params=None, center=False, fit_inverse_transform=False, tol=1e-12, n_jobs=None, iterated_power='auto', random_state=None)[source]#

Bases: _BaseKPCov

Kernel Principal Covariates Regression (KPCovR).

As described in [Helfrecht2020], KPCovR determines a latent-space projection \(\mathbf{T}\) which minimizes a combined loss in supervised and unsupervised tasks in the reproducing kernel Hilbert space (RKHS).

This projection is determined by the eigendecomposition of a modified gram matrix \(\mathbf{\tilde{K}}\)

\[\mathbf{\tilde{K}} = \alpha \mathbf{K} + (1 - \alpha) \mathbf{\hat{Y}}\mathbf{\hat{Y}}^T\]

where \(\alpha\) is a mixing parameter, \(\mathbf{K}\) is the input kernel of shape \((n_{samples}, n_{samples})\) and \(\mathbf{\hat{Y}}\) is the target matrix of shape \((n_{samples}, n_{properties})\).

Parameters:
  • mixing (float, default=0.5) – mixing parameter, as described in PCovR as \({\alpha}\)

  • n_components (int, float or str, default=None) –

    Number of components to keep. If n_components is not set, all components are kept:

    n_components == n_samples
    

  • svd_solver ({'auto', 'full', 'arpack', 'randomized'}, default='auto') –

    If auto :

    The solver is selected by a default policy based on X.shape and n_components: if the input data is larger than 500x500 and the number of components to extract is lower than 80% of the smallest dimension of the data, then the more efficient ‘randomized’ method is enabled. Otherwise the exact full SVD is computed and optionally truncated afterwards.

    If full :

    run exact full SVD calling the standard LAPACK solver via scipy.linalg.svd and select the components by postprocessing

    If arpack :

    run SVD truncated to n_components calling ARPACK solver via scipy.sparse.linalg.svds. It requires strictly 0 < n_components < min(X.shape)

    If randomized :

    run randomized SVD by the method of Halko et al.

  • regressor ({instance of sklearn.kernel_ridge.KernelRidge, precomputed, None}, default=None) –

    The regressor to use for computing the property predictions \(\hat{\mathbf{Y}}\). A pre-fitted regressor may be provided. If the regressor is not None, its kernel parameters (kernel, gamma, degree, coef0, and kernel_params) must be identical to those passed directly to KernelPCovR.

    If precomputed, we assume that the y passed to the fit function is the regressed form of the targets \({\mathbf{\hat{Y}}}\).

  • kernel ({'linear', 'poly', 'rbf', 'sigmoid', 'cosine', 'precomputed'} or callable, default='linear') – Kernel.

  • gamma (float, default=None) – Kernel coefficient for rbf, poly and sigmoid kernels. Ignored by other kernels.

  • degree (int, default=3) – Degree for poly kernels. Ignored by other kernels.

  • coef0 (float, default=1) – Independent term in poly and sigmoid kernels. Ignored by other kernels.

  • kernel_params (mapping of str to any, default=None) – Parameters (keyword arguments) and values for kernel passed as callable object. Ignored by other kernels.

  • center (bool, default=False) – Whether to center any computed kernels

  • fit_inverse_transform (bool, default=False) – Learn the inverse transform for non-precomputed kernels. (i.e. learn to find the pre-image of a point)

  • tol (float, default=1e-12) – Tolerance for singular values computed by svd_solver == ‘arpack’ and for matrix inversions. Must be of range [0.0, infinity).

  • n_jobs (int, default=None) – The number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

  • iterated_power (int or 'auto', default='auto') – Number of iterations for the power method computed by svd_solver == ‘randomized’. Must be of range [0, infinity).

  • random_state (int, numpy.random.RandomState instance or None, default=None) – Used when the ‘arpack’ or ‘randomized’ solvers are used. Pass an int for reproducible results across multiple function calls.

pt__#

pseudo-inverse of the latent-space projection, which can be used to construct projectors from the latent space

Type:

numpy.ndarray of size \(({n_{components}, n_{components}})\)

pkt_#

the projector, or weights, from the input kernel \(\mathbf{K}\) to the latent-space projection \(\mathbf{T}\)

Type:

numpy.ndarray of size \(({n_{samples}, n_{components}})\)

pky_#

the projector, or weights, from the input kernel \(\mathbf{K}\) to the properties \(\mathbf{Y}\)

Type:

numpy.ndarray of size \(({n_{samples}, n_{properties}})\)

pty_#

the projector, or weights, from the latent-space projection \(\mathbf{T}\) to the properties \(\mathbf{Y}\)

Type:

numpy.ndarray of size \(({n_{components}, n_{properties}})\)

ptx_#

the projector, or weights, from the latent-space projection \(\mathbf{T}\) to the feature matrix \(\mathbf{X}\)

Type:

numpy.ndarray of size \(({n_{components}, n_{features}})\)

X_fit_#

The data used to fit the model. This attribute is used to build kernels from new data.

Type:

numpy.ndarray of shape (n_samples, n_features)

Examples

>>> import numpy as np
>>> from skmatter.decomposition import KernelPCovR
>>> from skmatter.preprocessing import StandardFlexibleScaler as SFS
>>> from sklearn.kernel_ridge import KernelRidge
>>> X = np.array([[-1, 1, -3, 1], [1, -2, 1, 2], [-2, 0, -2, -2], [1, 0, 2, -1]])
>>> X = SFS().fit_transform(X)
>>> Y = np.array([[0, -5], [-1, 1], [1, -5], [-3, 2]])
>>> Y = SFS(column_wise=True).fit_transform(Y)
>>> kpcovr = KernelPCovR(
...     mixing=0.1,
...     n_components=2,
...     regressor=KernelRidge(kernel="rbf", gamma=1),
...     kernel="rbf",
...     gamma=1,
... )
>>> kpcovr.fit(X, Y)
KernelPCovR(gamma=1, kernel='rbf', mixing=0.1, n_components=2,
            regressor=KernelRidge(gamma=1, kernel='rbf'))
>>> kpcovr.transform(X)
array([[-0.61261285, -0.18937908],
       [ 0.45242098,  0.25453465],
       [-0.77871824,  0.04847559],
       [ 0.91186937, -0.21211816]])
>>> kpcovr.predict(X)
array([[ 0.5100212 , -0.99488463],
       [-0.18992219,  0.82064368],
       [ 1.11923584, -1.04798016],
       [-1.5635827 ,  1.11078662]])
>>> round(kpcovr.score(X, Y), 5)
np.float64(-0.52039)
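
As a variation on the example above (reusing its imports, X, and Y), a sketch of passing a pre-fitted KernelRidge regressor; per the regressor description, its kernel settings must match those given to KernelPCovR.

>>> regressor = KernelRidge(kernel="rbf", gamma=1).fit(X, Y)  # kernel settings match KernelPCovR below
>>> kpcovr_prefit = KernelPCovR(
...     mixing=0.1, n_components=2, regressor=regressor, kernel="rbf", gamma=1
... ).fit(X, Y)
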
fit(X, Y, W=None)[source]#

Fit the model with X and Y.

Parameters:
  • X (numpy.ndarray, shape (n_samples, n_features)) –

    Training data, where n_samples is the number of samples and n_features is the number of features.

    It is suggested that \(\mathbf{X}\) be centered by its column means and scaled. If features are related, the matrix should be scaled to have unit variance, otherwise \(\mathbf{X}\) should be scaled so that each feature has a variance of 1 / n_features.

  • Y (numpy.ndarray, shape (n_samples, n_properties)) –

    Training data, where n_samples is the number of samples and n_properties is the number of properties

    It is suggested that \(\mathbf{Y}\) be centered by its column means and scaled. If features are related, the matrix should be scaled to have unit variance, otherwise \(\mathbf{Y}\) should be scaled so that each feature has a variance of 1 / n_features.

  • W (numpy.ndarray, shape (n_samples, n_properties)) – Regression weights, optional when regressor = precomputed. If not passed, it is assumed that W = np.linalg.lstsq(K, Y, self.tol)[0]

Returns:

self (object) – Returns the instance itself.

transform(X)[source]#

Apply dimensionality reduction to X.

X is projected on the first principal components as determined by the modified Kernel PCovR distances.

Parameters:

X (numpy.ndarray, shape (n_samples, n_features)) – New data, where n_samples is the number of samples and n_features is the number of features.

predict(X=None)[source]#

Predicts the property values.

inverse_transform(T)[source]#

Transform input data back to its original space.

\[\mathbf{\hat{X}} = \mathbf{T} \mathbf{P}_{TX} = \mathbf{K} \mathbf{P}_{KT} \mathbf{P}_{TX}\]

Similar to KPCA, the original features are not always recoverable, as the projection is computed from the kernel features, not the original features, and the mapping between the original and kernel features is not one-to-one.

Parameters:

T (numpy.ndarray, shape (n_samples, n_components)) – Projected data, where n_samples is the number of samples and n_components is the number of components.

Returns:

X_original (numpy.ndarray, shape (n_samples, n_features))

score(X, y)[source]#

Computes the (negative) loss values for KernelPCovR on the given predictor and response variables.

The loss in \(\mathbf{K}\), as explained in [Helfrecht2020], does not correspond to a traditional Gram loss \(\mathbf{K} - \mathbf{TT}^T\). Indicating the kernel between sets A and B as \(\mathbf{K}_{AB}\), the projection of set A as \(\mathbf{T}_A\), and with N and V as the train and validation/test sets, one obtains

\[\ell=\frac{\operatorname{Tr}\left[\mathbf{K}_{VV} - 2 \mathbf{K}_{VN} \mathbf{T}_N (\mathbf{T}_N^T \mathbf{T}_N)^{-1} \mathbf{T}_V^T +\mathbf{T}_V(\mathbf{T}_N^T \mathbf{T}_N)^{-1} \mathbf{T}_N^T \mathbf{K}_{NN} \mathbf{T}_N (\mathbf{T}_N^T \mathbf{T}_N)^{-1} \mathbf{T}_V^T\right]}{\operatorname{Tr}(\mathbf{K}_{VV})}\]

The negative loss is returned for easier use in sklearn pipelines, e.g., a grid search, where methods named ‘score’ are meant to be maximized.

Parameters:
  • X (numpy.ndarray, shape (n_samples, n_features)) – The data.

  • y (numpy.ndarray, shape (n_samples, n_properties)) – The target.

Returns:

L (float) – Negative sum of the KPCA and KRR losses, with the KPCA loss determined by the reconstruction of the kernel

Kernel PCovC#

class skmatter.decomposition.KernelPCovC(mixing=0.5, n_components=None, svd_solver='auto', classifier=None, kernel='linear', gamma=None, degree=3, coef0=1, kernel_params=None, center=False, fit_inverse_transform=False, tol=1e-12, n_jobs=None, iterated_power='auto', random_state=None)[source]#

Bases: LinearClassifierMixin, _BaseKPCov

Kernel Principal Covariates Classification (KPCovC).

KPCovC is a modification of the Principal Covariates Classification proposed in [Jorgensen2025]. It determines a latent-space projection \(\mathbf{T}\) which minimizes a combined loss in supervised and unsupervised tasks in the reproducing kernel Hilbert space (RKHS).

This projection is determined by the eigendecomposition of a modified gram matrix \(\mathbf{\tilde{K}}\)

\[\mathbf{\tilde{K}} = \alpha \mathbf{K} + (1 - \alpha) \mathbf{Z}\mathbf{Z}^T\]

where \(\alpha\) is a mixing parameter, \(\mathbf{K}\) is the input kernel of shape \((n_{samples}, n_{samples})\) and \(\mathbf{Z}\) is a matrix of class confidence scores of shape \((n_{samples}, n_{classes})\)

Parameters:
  • mixing (float, default=0.5) – mixing parameter, as described in PCovC as \({\alpha}\)

  • n_components (int, float or str, default=None) –

    Number of components to keep. If n_components is not set, all components are kept:

    n_components == n_samples
    

  • svd_solver ({'auto', 'full', 'arpack', 'randomized'}, default='auto') –

    If auto :

    The solver is selected by a default policy based on X.shape and n_components: if the input data is larger than 500x500 and the number of components to extract is lower than 80% of the smallest dimension of the data, then the more efficient ‘randomized’ method is enabled. Otherwise the exact full SVD is computed and optionally truncated afterwards.

    If full :

    run exact full SVD calling the standard LAPACK solver via scipy.linalg.svd and select the components by postprocessing

    If arpack :

    run SVD truncated to n_components calling ARPACK solver via scipy.sparse.linalg.svds. It requires strictly 0 < n_components < min(X.shape)

    If randomized :

    run randomized SVD by the method of Halko et al.

  • classifier (estimator object or precomputed, default=None) –

    classifier for computing \({\mathbf{Z}}\). The classifier should be one of the following:

    • sklearn.linear_model.LogisticRegression()

    • sklearn.linear_model.LogisticRegressionCV()

    • sklearn.svm.LinearSVC()

    • sklearn.discriminant_analysis.LinearDiscriminantAnalysis()

    • sklearn.linear_model.RidgeClassifier()

    • sklearn.linear_model.RidgeClassifierCV()

    • sklearn.linear_model.Perceptron()

    If a pre-fitted classifier is provided, it is used to compute \({\mathbf{Z}}\). If None, sklearn.linear_model.LogisticRegression() is used as the classifier.

  • kernel ({"linear", "poly", "rbf", "sigmoid", "precomputed"} or callable, default="linear") – Kernel.

  • gamma ({'scale', 'auto'} or float, default=None) – Kernel coefficient for rbf, poly and sigmoid kernels. Ignored by other kernels.

  • degree (int, default=3) – Degree for poly kernels. Ignored by other kernels.

  • coef0 (float, default=1) – Independent term in poly and sigmoid kernels. Ignored by other kernels.

  • kernel_params (mapping of str to any, default=None) – Parameters (keyword arguments) and values for kernel passed as callable object. Ignored by other kernels.

  • center (bool, default=False) – Whether to center any computed kernels

  • fit_inverse_transform (bool, default=False) – Learn the inverse transform for non-precomputed kernels. (i.e. learn to find the pre-image of a point)

  • tol (float, default=1e-12) – Tolerance for singular values computed by svd_solver == ‘arpack’ and for matrix inversions. Must be of range [0.0, infinity).

  • n_jobs (int, default=None) – The number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

  • iterated_power (int or 'auto', default='auto') – Number of iterations for the power method computed by svd_solver == ‘randomized’. Must be of range [0, infinity).

  • random_state (int, numpy.random.RandomState instance or None, default=None) – Used when the ‘arpack’ or ‘randomized’ solvers are used. Pass an int for reproducible results across multiple function calls.

classifier#

The linear classifier passed for fitting. If pre-fitted, it is assumed to be fit on a precomputed kernel \(\mathbf{K}\) and \(\mathbf{Y}\).

Type:

estimator object

z_classifier_#

The linear classifier fit between the computed kernel \(\mathbf{K}\) and \(\mathbf{Y}\).

Type:

estimator object

classifier_#

The linear classifier fit between \(\mathbf{T}\) and \(\mathbf{Y}\).

Type:

estimator object

pt__#

pseudo-inverse of the latent-space projection, which can be used to construct projectors from the latent space

Type:

numpy.ndarray of size \(({n_{components}, n_{components}})\)

pkt_#

the projector, or weights, from the input kernel \(\mathbf{K}\) to the latent-space projection \(\mathbf{T}\)

Type:

numpy.ndarray of size \(({n_{samples}, n_{components}})\)

pkz_#

the projector, or weights, from the input kernel \(\mathbf{K}\) to the class confidence scores \(\mathbf{Z}\)

Type:

numpy.ndarray of size \(({n_{samples}, })\) or \(({n_{samples}, n_{classes}})\)

ptz_#

the projector, or weights, from the latent-space projection \(\mathbf{T}\) to the class confidence scores \(\mathbf{Z}\)

Type:

numpy.ndarray of size \(({n_{components}, })\) or \(({n_{components}, n_{classes}})\)

ptx_#

the projector, or weights, from the latent-space projection \(\mathbf{T}\) to the feature matrix \(\mathbf{X}\)

Type:

numpy.ndarray of size \(({n_{components}, n_{features}})\)

X_fit_#

The data used to fit the model. This attribute is used to build kernels from new data.

Type:

numpy.ndarray of shape (n_samples, n_features)

Examples

>>> import numpy as np
>>> from skmatter.decomposition import KernelPCovC
>>> from sklearn.preprocessing import StandardScaler
>>> X = np.array([[-2, 3, -1, 0], [2, 0, -3, 1], [3, 0, -1, 3], [2, -2, 1, 0]])
>>> X = StandardScaler().fit_transform(X)
>>> Y = np.array([[2], [0], [1], [2]])
>>> kpcovc = KernelPCovC(
...     mixing=0.1,
...     n_components=2,
...     kernel="rbf",
...     gamma=1,
... )
>>> kpcovc.fit(X, Y)
KernelPCovC(gamma=1, kernel='rbf', mixing=0.1, n_components=2)
>>> kpcovc.transform(X)
array([[-4.45970689e-01,  8.95327566e-06],
       [ 4.52745933e-01,  5.54810948e-01],
       [ 4.52881359e-01, -5.54708315e-01],
       [-4.45921092e-01, -7.32157649e-05]])
>>> kpcovc.predict(X)
array([2, 0, 1, 2])
>>> kpcovc.score(X, Y)
1.0
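
Per the classifier attribute above, a pre-fitted classifier is assumed to have been fit on the precomputed kernel and \(\mathbf{Y}\); here is a sketch of that usage under this assumption, reusing X and the imports from the example above (the labels are flattened to a 1-D vector for the classifier).

>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.metrics.pairwise import rbf_kernel
>>> K = rbf_kernel(X, gamma=1)  # same kernel settings as KernelPCovC below
>>> clf = LogisticRegression().fit(K, Y.ravel())  # pre-fitted on the precomputed kernel
>>> kpcovc_prefit = KernelPCovC(
...     mixing=0.1, n_components=2, kernel="rbf", gamma=1, classifier=clf
... ).fit(X, Y.ravel())
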
fit(X, Y, W=None)[source]#

Fit the model with X and Y.

A computed kernel K is derived from X, and W is taken from the coefficients of a linear classifier fit between K and Y to compute Z:

\[\mathbf{Z} = \mathbf{K} \mathbf{W}\]

We then call either _fit_feature_space or _fit_sample_space, using Z as our approximation of Y. Finally, we refit a classifier on T and Y to obtain \(\mathbf{P}_{TZ}\).

Parameters:
  • X (numpy.ndarray, shape (n_samples, n_features)) –

    Training data, where n_samples is the number of samples and n_features is the number of features.

    It is suggested that \(\mathbf{X}\) be centered by its column means and scaled. If features are related, the matrix should be scaled to have unit variance, otherwise \(\mathbf{X}\) should be scaled so that each feature has a variance of 1 / n_features.

  • Y (numpy.ndarray, shape (n_samples,)) – Training data, where n_samples is the number of samples.

  • W (numpy.ndarray, shape (n_features, n_classes)) – Classification weights, optional when classifier = precomputed. If not passed, it is assumed that the weights will be taken from a linear classifier fit between K and Y.

Returns:

self (object) – Returns the instance itself.

transform(X)[source]#

Apply dimensionality reduction to X.

X is projected on the first principal components as determined by the modified Kernel PCovC distances.

Parameters:

X (numpy.ndarray, shape (n_samples, n_features)) – New data, where n_samples is the number of samples and n_features is the number of features.

predict(X=None, T=None)[source]#

Predicts the property labels using classification on X or T.

inverse_transform(T)[source]#

Transform input data back to its original space.

\[\mathbf{\hat{X}} = \mathbf{T} \mathbf{P}_{TX} = \mathbf{K} \mathbf{P}_{KT} \mathbf{P}_{TX}\]

Similar to KPCA, the original features are not always recoverable, as the projection is computed from the kernel features, not the original features, and the mapping between the original and kernel features is not one-to-one.

Parameters:

T (numpy.ndarray, shape (n_samples, n_components)) – Projected data, where n_samples is the number of samples and n_components is the number of components.

Returns:

X_original (numpy.ndarray, shape (n_samples, n_features))

decision_function(X=None, T=None)[source]#

Predicts confidence scores from X or T.

\[\mathbf{Z} = \mathbf{T} \mathbf{P}_{TZ} = \mathbf{K} \mathbf{P}_{KT} \mathbf{P}_{TZ} = \mathbf{K} \mathbf{P}_{KZ}\]
Parameters:
  • X (ndarray, shape(n_samples, n_features)) – Original data for which we want to get confidence scores, where n_samples is the number of samples and n_features is the number of features.

  • T (ndarray, shape (n_samples, n_components)) – Projected data for which we want to get confidence scores, where n_samples is the number of samples and n_components is the number of components.

Returns:

Z (numpy.ndarray, shape (n_samples,) or (n_samples, n_classes)) – Confidence scores. For binary classification, has shape (n_samples,), for multiclass classification, has shape (n_samples, n_classes)

score(X, y, sample_weight=None)#

Return accuracy on provided data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Test samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True labels for X.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns:

score (float) – Mean accuracy of self.predict(X) w.r.t. y.