.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/regression/Ridge2FoldCVRegularization.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_examples_regression_Ridge2FoldCVRegularization.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_regression_Ridge2FoldCVRegularization.py:


Ridge2FoldCV for data with low effective rank
=============================================

In this notebook we explain in more detail how
:class:`skmatter.linear_model.Ridge2FoldCV` speeds up the cross-validation that
optimizes the regularization parameter ``alpha``, and compare it with the
existing solution in scikit-learn, :class:`sklearn.linear_model.RidgeCV`.
:class:`skmatter.linear_model.Ridge2FoldCV` was designed to predict feature
matrices efficiently, but it can also be useful for predicting single targets.

.. GENERATED FROM PYTHON SOURCE LINES 15-28

.. code-block:: Python

    import time

    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import RidgeCV
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import KFold, train_test_split

    from skmatter.linear_model import Ridge2FoldCV

.. GENERATED FROM PYTHON SOURCE LINES 29-33

.. code-block:: Python

    SEED = 12616
    N_REPEAT_MICRO_BENCH = 5

.. GENERATED FROM PYTHON SOURCE LINES 34-87

Numerical instabilities of sklearn leave-one-out CV
---------------------------------------------------

In linear regression, the cost of computing the weight matrix is dominated by
the inversion of the covariance matrix. Regularized regression is more costly
still, because the regularization parameter is optimized in a cross-validation
(CV) scheme and the inverse has to be recomputed for each candidate parameter.
scikit-learn offers an efficient leave-one-out CV (LOO CV) for its ridge
regression that avoids these repeated computations [loocv]_. Because we needed
an efficient ridge regression for the reconstruction measures in
:py:mod:`skmatter.metrics`, we implemented
:class:`skmatter.linear_model.Ridge2FoldCV`, an efficient 2-fold CV ridge
regression that computes a singular value decomposition (SVD) once and reuses
it for all regularization parameters :math:`\lambda`.

Assume we have the standard regression problem of optimizing the weight matrix
in

.. math::

    \begin{align}
        \|\mathbf{X}\mathbf{W} - \mathbf{Y}\|
    \end{align}

Here :math:`\mathbf{Y}` can also be a matrix, as in multi-target learning. In
2-fold cross-validation we predict the targets of fold 2 using fold 1 to
estimate the weight matrix, and vice versa. Using the SVD, the estimation on
fold 1 looks like this:

.. math::

    \begin{align}
        &\mathbf{X}_1 = \mathbf{U}_1\mathbf{S}_1\mathbf{V}_1^T,
            \qquad\qquad\qquad\quad
            \textrm{feature matrix }\mathbf{X}\textrm{ for fold 1} \\
        &\mathbf{W}_1(\lambda) = \mathbf{V}_1
            \tilde{\mathbf{S}}_1(\lambda)^{-1} \mathbf{U}_1^T \mathbf{Y}_1,
            \qquad \textrm{weight matrix fitted on fold 1} \\
        &\tilde{\mathbf{Y}}_2 = \mathbf{X}_2 \mathbf{W}_1,
            \qquad\qquad\qquad\qquad
            \textrm{prediction of }\mathbf{Y}\textrm{ for fold 2}
    \end{align}

The efficient 2-fold scheme in ``Ridge2FoldCV`` reuses, for each fold, the
matrices

.. math::

    \begin{align}
        &\mathbf{A}_1 = \mathbf{X}_2 \mathbf{V}_1, \quad
         \mathbf{B}_1 = \mathbf{U}_1^T \mathbf{Y}_1,
    \end{align}

so neither the SVD nor these products have to be recomputed for each
regularization parameter. The computational complexity after the initial SVD is
thereby reduced to that of matrix multiplications.

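To make the reuse of the SVD more concrete, here is a minimal NumPy sketch of
the scheme above on random data. It is *not* the implementation used by
``Ridge2FoldCV``, and the Tikhonov filter :math:`s/(s^2+\lambda)` is only one
possible choice for :math:`\tilde{\mathbf{S}}_1(\lambda)^{-1}`:

.. code-block:: Python

    import numpy as np

    rng = np.random.default_rng(0)
    X1, Y1 = rng.normal(size=(50, 20)), rng.normal(size=(50, 3))  # fold 1
    X2, Y2 = rng.normal(size=(50, 20)), rng.normal(size=(50, 3))  # fold 2

    # SVD of the training fold, computed only once
    U1, S1, V1T = np.linalg.svd(X1, full_matrices=False)

    # matrices that are reused for every regularization parameter
    A1 = X2 @ V1T.T  # A_1 = X_2 V_1
    B1 = U1.T @ Y1   # B_1 = U_1^T Y_1

    for lam in np.geomspace(1e-10, 1e-1, 10):
        # filtered inverse singular values for this regularization parameter
        s_inv = S1 / (S1**2 + lam)
        # prediction of the fold 2 targets: only cheap matrix products per lambda
        Y2_pred = A1 @ (s_inv[:, None] * B1)
        print(f"lambda={lam:.0e}  validation MSE={np.mean((Y2 - Y2_pred) ** 2):.3f}")

Per value of :math:`\lambda` only the rescaling of the singular values and two
matrix products remain, which is what makes the search over a grid of
``alphas`` cheap.
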
.. GENERATED FROM PYTHON SOURCE LINES 90-91

We first create an artificial dataset.

.. GENERATED FROM PYTHON SOURCE LINES 91-100

.. code-block:: Python

    X, y = make_regression(
        n_samples=1000,
        n_features=400,
        random_state=SEED,
    )

.. GENERATED FROM PYTHON SOURCE LINES 101-128

.. code-block:: Python

    # regularization parameters
    alphas = np.geomspace(1e-12, 1e-1, 12)

    # 2 folds for train and validation split
    cv = KFold(n_splits=2, shuffle=True, random_state=SEED)

    skmatter_ridge_2foldcv_cutoff = Ridge2FoldCV(
        alphas=alphas, regularization_method="cutoff", cv=cv
    )

    skmatter_ridge_2foldcv_tikhonov = Ridge2FoldCV(
        alphas=alphas, regularization_method="tikhonov", cv=cv
    )

    sklearn_ridge_2foldcv_tikhonov = RidgeCV(
        alphas=alphas,
        cv=cv,
        fit_intercept=False,  # remove the influence of learning the bias
    )

    sklearn_ridge_loocv_tikhonov = RidgeCV(
        alphas=alphas,
        cv=None,
        fit_intercept=False,  # remove the influence of learning the bias
    )

.. GENERATED FROM PYTHON SOURCE LINES 129-130

Now we run some simple benchmarks.

.. GENERATED FROM PYTHON SOURCE LINES 130-167

.. code-block:: Python

    def micro_bench(ridge):
        """A small benchmark function."""
        global N_REPEAT_MICRO_BENCH, X, y
        timings = []
        train_mse = []
        test_mse = []
        for _ in range(N_REPEAT_MICRO_BENCH):
            X_train, X_test, y_train, y_test = train_test_split(
                X, y, train_size=0.5, random_state=SEED
            )
            start = time.time()
            ridge.fit(X_train, y_train)
            end = time.time()
            timings.append(end - start)
            train_mse.append(mean_squared_error(y_train, ridge.predict(X_train)))
            test_mse.append(mean_squared_error(y_test, ridge.predict(X_test)))

        print(f" Time: {np.mean(timings)}s")
        print(f" Train MSE: {np.mean(train_mse)}")
        print(f" Test MSE: {np.mean(test_mse)}")


    print("skmatter 2-fold CV cutoff")
    micro_bench(skmatter_ridge_2foldcv_cutoff)
    print()
    print("skmatter 2-fold CV tikhonov")
    micro_bench(skmatter_ridge_2foldcv_tikhonov)
    print()
    print("sklearn 2-fold CV tikhonov")
    micro_bench(sklearn_ridge_2foldcv_tikhonov)
    print()
    print("sklearn leave-one-out CV")
    micro_bench(sklearn_ridge_loocv_tikhonov)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    skmatter 2-fold CV cutoff
     Time: 0.08593254089355469s
     Train MSE: 1.4179323700682082e-25
     Test MSE: 3.274563400398066e-25

    skmatter 2-fold CV tikhonov
     Time: 0.08517122268676758s
     Train MSE: 0.006720047338465212
     Test MSE: 0.22910125130751807

    sklearn 2-fold CV tikhonov
     Time: 0.2497239112854004s
     Train MSE: 0.006720047338441486
     Test MSE: 0.22910125130669332

    sklearn leave-one-out CV
     Time: 0.0592799186706543s
     Train MSE: 62762.521984744046
     Test MSE: 39274.657904825996

.. GENERATED FROM PYTHON SOURCE LINES 168-171

We can see that leave-one-out CV is completely off. Let us manually check each
regularization parameter individually and compare the result with the stored
mean squared errors (MSE).

.. GENERATED FROM PYTHON SOURCE LINES 171-223

.. code-block:: Python

    results = {}
    results["sklearn 2-fold CV Tikhonov"] = {"MSE train": [], "MSE test": []}
    results["sklearn LOO CV Tikhonov"] = {"MSE train": [], "MSE test": []}

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.5, random_state=SEED
    )


    def get_train_test_error(estimator):
        """The train and test error based on the estimator."""
        global X_train, y_train, X_test, y_test
        estimator = estimator.fit(X_train, y_train)
        return (
            mean_squared_error(y_train, estimator.predict(X_train)),
            mean_squared_error(y_test, estimator.predict(X_test)),
        )


    for i in range(len(alphas)):
        print(f"Computing step={i} using alpha={alphas[i]}")

        train_error, test_error = get_train_test_error(RidgeCV(alphas=[alphas[i]], cv=2))
        results["sklearn 2-fold CV Tikhonov"]["MSE train"].append(train_error)
        results["sklearn 2-fold CV Tikhonov"]["MSE test"].append(test_error)

        train_error, test_error = get_train_test_error(RidgeCV(alphas=[alphas[i]], cv=None))
        results["sklearn LOO CV Tikhonov"]["MSE train"].append(train_error)
        results["sklearn LOO CV Tikhonov"]["MSE test"].append(test_error)

    # returns an array of errors, one error per sample and alpha:
    # ndarray of shape (n_samples, n_alphas)
    loocv_cv_train_error = (
        RidgeCV(
            alphas=alphas,
            cv=None,
            store_cv_results=True,
            scoring=None,  # uses the mean squared error by default
            fit_intercept=False,
        )
        .fit(X_train, y_train)
        .cv_results_
    )
    results["sklearn LOO CV Tikhonov"]["MSE validation"] = np.mean(
        loocv_cv_train_error, axis=0
    ).tolist()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Computing step=0 using alpha=1e-12
    Computing step=1 using alpha=1e-11
    Computing step=2 using alpha=1e-10
    Computing step=3 using alpha=1e-09
    Computing step=4 using alpha=1e-08
    Computing step=5 using alpha=1e-07
    Computing step=6 using alpha=1e-06
    Computing step=7 using alpha=1e-05
    Computing step=8 using alpha=0.0001
    Computing step=9 using alpha=0.001
    Computing step=10 using alpha=0.01
    Computing step=11 using alpha=0.1

.. GENERATED FROM PYTHON SOURCE LINES 224-263

.. code-block:: Python

    # We plot all the results.
    plt.figure(figsize=(12, 8))

    for i, items in enumerate(results.items()):
        method_name, errors = items
        plt.loglog(
            alphas,
            errors["MSE test"],
            label=f"{method_name} MSE test",
            color=f"C{i}",
            lw=3,
            alpha=0.9,
        )
        plt.loglog(
            alphas,
            errors["MSE train"],
            label=f"{method_name} MSE train",
            color=f"C{i}",
            lw=4,
            alpha=0.9,
            linestyle="--",
        )
        if "MSE validation" in errors.keys():
            plt.loglog(
                alphas,
                errors["MSE validation"],
                label=f"{method_name} MSE validation",
                color=f"C{i}",
                linestyle="dotted",
                lw=5,
            )

    plt.ylim(1e-16, 1)
    plt.xlabel("alphas (regularization parameter)")
    plt.ylabel("MSE")
    plt.legend()
    plt.show()

.. image-sg:: /examples/regression/images/sphx_glr_Ridge2FoldCVRegularization_001.png
   :alt: Ridge2FoldCVRegularization
   :srcset: /examples/regression/images/sphx_glr_Ridge2FoldCVRegularization_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 264-268

We can see that leave-one-out CV estimates the error wrongly for low alpha
values. This seems to be a numerical instability of the method. If we had
limited our alphas to values of 1e-5 and above, LOO CV would have reached
accuracies similar to the 2-fold methods, as the short check sketched below
illustrates.

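The following is a minimal sketch of such a check; it is not part of the
generated example and simply reuses ``micro_bench`` and the dataset from above
with the alpha grid restricted to the numerically stable regime:

.. code-block:: Python

    # keep only the regularization parameters of roughly 1e-5 and above
    stable_alphas = alphas[alphas > 5e-6]

    sklearn_ridge_loocv_stable = RidgeCV(
        alphas=stable_alphas,
        cv=None,
        fit_intercept=False,
    )

    print("sklearn leave-one-out CV, alphas >= 1e-5")
    micro_bench(sklearn_ridge_loocv_stable)

This restricted grid keeps the LOO CV solver in the regime where, in the plot
above, its validation error agreed with the 2-fold results.
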
.. GENERATED FROM PYTHON SOURCE LINES 270-274

**Important:** this is not a fully encompassing comparison that covers the
parameter space sufficiently. We only want to note that for data with a large
number of features and a low effective rank, the ridge solvers in skmatter can
be numerically more stable while running at a comparable speed.

.. GENERATED FROM PYTHON SOURCE LINES 276-286

Cutoff and Tikhonov regularization
----------------------------------

When using a hard threshold as regularization
(``regularization_method="cutoff"``), the singular values below :math:`\lambda`
are cut off. The size of the matrices :math:`\mathbf{A}_1` and
:math:`\mathbf{B}_1` can then be reduced, resulting in further savings in
computation time. This performance advantage of ``cutoff`` over ``tikhonov``
becomes visible when we predict multiple targets and use a regularization range
that cuts off many singular values. For that we create a feature matrix with a
low effective rank and use the prediction of a shuffled version of
:math:`\mathbf{X}` as the regression task.

.. GENERATED FROM PYTHON SOURCE LINES 286-303

.. code-block:: Python

    X, y = make_regression(
        n_samples=1000,
        n_features=400,
        n_informative=400,
        effective_rank=5,  # decreasing the effective rank
        tail_strength=1e-9,
        random_state=SEED,
    )
    idx = np.arange(X.shape[1])
    np.random.seed(SEED)
    np.random.shuffle(idx)
    y = X.copy()[:, idx]

    singular_values = np.linalg.svd(X, full_matrices=False)[1]

.. GENERATED FROM PYTHON SOURCE LINES 304-312

.. code-block:: Python

    plt.loglog(singular_values)
    plt.title("Singular values of our feature matrix X")
    plt.axhline(1e-8, color="gray")
    plt.xlabel("index feature")
    plt.ylabel("singular value")
    plt.show()

.. image-sg:: /examples/regression/images/sphx_glr_Ridge2FoldCVRegularization_002.png
   :alt: Singular values of our feature matrix X
   :srcset: /examples/regression/images/sphx_glr_Ridge2FoldCVRegularization_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 313-316

We can see that a regularization value of 1e-8 cuts off a lot of singular
values. This is crucial for the computational speed-up of the ``cutoff``
regularization method.

.. GENERATED FROM PYTHON SOURCE LINES 318-353

.. code-block:: Python

    # we use a denser range of regularization parameters to make
    # the speed-up more visible
    alphas = np.geomspace(1e-8, 1e-1, 20)

    cv = KFold(n_splits=2, shuffle=True, random_state=SEED)

    skmatter_ridge_2foldcv_cutoff = Ridge2FoldCV(
        alphas=alphas,
        regularization_method="cutoff",
        cv=cv,
    )

    skmatter_ridge_2foldcv_tikhonov = Ridge2FoldCV(
        alphas=alphas,
        regularization_method="tikhonov",
        cv=cv,
    )

    sklearn_ridge_loocv_tikhonov = RidgeCV(
        alphas=alphas,
        cv=None,
        fit_intercept=False,  # remove the influence of learning the bias
    )

    print("skmatter 2-fold CV cutoff")
    micro_bench(skmatter_ridge_2foldcv_cutoff)
    print()
    print("skmatter 2-fold CV tikhonov")
    micro_bench(skmatter_ridge_2foldcv_tikhonov)
    print()
    print("sklearn LOO CV tikhonov")
    micro_bench(sklearn_ridge_loocv_tikhonov)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    skmatter 2-fold CV cutoff
     Time: 0.09898738861083985s
     Train MSE: 8.199211435485744e-23
     Test MSE: 8.313680615089296e-23

    skmatter 2-fold CV tikhonov
     Time: 0.1453549385070801s
     Train MSE: 2.1125597863649234e-14
     Test MSE: 2.5492748573047085e-14

    sklearn LOO CV tikhonov
     Time: 0.2146282196044922s
     Train MSE: 2.1125614894762703e-14
     Test MSE: 2.549280153596718e-14

.. GENERATED FROM PYTHON SOURCE LINES 354-358

We also want to note that these benchmarks vary considerably from run to run,
and more robust benchmarking methods would be appropriate here. However, we
keep the computations in these examples as light as possible.

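To make the difference between the two regularization schemes more concrete,
here is a minimal sketch contrasting the two filters on the singular values
computed above. It is only an illustration, assuming the standard ridge filter
:math:`s/(s^2+\lambda)` for ``tikhonov`` and a plain inversion of the retained
singular values for ``cutoff``; it is not the code path used by
``Ridge2FoldCV``:

.. code-block:: Python

    lam = 1e-8

    # tikhonov: every singular value is kept, but damped by the filter
    tikhonov_filter = singular_values / (singular_values**2 + lam)

    # cutoff: singular values below lambda are discarded entirely, so the
    # matrices A_1 and B_1 shrink accordingly
    kept = singular_values > lam
    cutoff_filter = 1.0 / singular_values[kept]

    print(f"tikhonov keeps {len(tikhonov_filter)} singular values")
    print(f"cutoff keeps {kept.sum()} of {len(singular_values)} singular values")

The fewer singular values survive the threshold, the smaller the matrices that
have to be multiplied for each regularization parameter, which is where the
speed-up of ``cutoff`` in the benchmark above comes from.
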
.. GENERATED FROM PYTHON SOURCE LINES 360-364

References
----------

.. [loocv] Rifkin, "Regularized Least Squares."
   https://www.mit.edu/~9.520/spring07/Classes/rlsslides.pdf


.. _sphx_glr_download_examples_regression_Ridge2FoldCVRegularization.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: Ridge2FoldCVRegularization.ipynb <Ridge2FoldCVRegularization.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: Ridge2FoldCVRegularization.py <Ridge2FoldCVRegularization.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: Ridge2FoldCVRegularization.zip <Ridge2FoldCVRegularization.zip>`

.. only:: html

  .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_