.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/pcovr/PCovR_Scaling.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_examples_pcovr_PCovR_Scaling.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_pcovr_PCovR_Scaling.py:

The Importance of Data Scaling in PCovR / KernelPCovR
======================================================

.. GENERATED FROM PYTHON SOURCE LINES 9-18

.. code-block:: Python

    import numpy as np
    from matplotlib import pyplot as plt
    from sklearn.datasets import load_diabetes
    from sklearn.preprocessing import StandardScaler

    from skmatter.decomposition import PCovR

.. GENERATED FROM PYTHON SOURCE LINES 19-23

In PCovR and KernelPCovR, we combine multiple aspects of the dataset, primarily the
features and targets. As such, the results depend largely on the relative
contribution of each aspect to the mixed model.

.. GENERATED FROM PYTHON SOURCE LINES 24-27

.. code-block:: Python

    X, y = load_diabetes(return_X_y=True)

.. GENERATED FROM PYTHON SOURCE LINES 28-30

We take the diabetes dataset from sklearn. In their raw form, the magnitudes of the
features and targets are:

.. GENERATED FROM PYTHON SOURCE LINES 31-37

.. code-block:: Python

    print(
        "Norm of the features: %0.2f \nNorm of the targets: %0.2f"
        % (np.linalg.norm(X), np.linalg.norm(y))
    )

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Norm of the features: 3.16
    Norm of the targets: 3584.82

.. GENERATED FROM PYTHON SOURCE LINES 38-40

For the diabetes dataset, we can use the ``StandardScaler`` class from sklearn, as
the features and targets are independent of one another.

.. GENERATED FROM PYTHON SOURCE LINES 41-48

.. code-block:: Python

    x_scaler = StandardScaler()
    y_scaler = StandardScaler()

    X_scaled = x_scaler.fit_transform(X)
    y_scaled = y_scaler.fit_transform(y.reshape(-1, 1))

.. GENERATED FROM PYTHON SOURCE LINES 49-51

Looking at the results at ``mixing=0.5``, we see an especially large difference in
the latent-space projections.

.. GENERATED FROM PYTHON SOURCE LINES 52-90

.. code-block:: Python

    pcovr_unscaled = PCovR(mixing=0.5, n_components=4).fit(X, y)
    T_unscaled = pcovr_unscaled.transform(X)
    Yp_unscaled = pcovr_unscaled.predict(X)

    pcovr_scaled = PCovR(mixing=0.5, n_components=4).fit(X_scaled, y_scaled)
    T_scaled = pcovr_scaled.transform(X_scaled)
    Yp_scaled = y_scaler.inverse_transform(pcovr_scaled.predict(X_scaled))

    fig, ((ax1_T, ax2_T), (ax1_Y, ax2_Y)) = plt.subplots(2, 2, figsize=(8, 10))

    ax1_T.scatter(T_unscaled[:, 0], T_unscaled[:, 1], c=y, cmap="plasma", ec="k")
    ax1_T.set_xlabel("PCov1")
    ax1_T.set_ylabel("PCov2")
    ax1_T.set_title("Latent Projection\nWithout Scaling")

    ax2_T.scatter(T_scaled[:, 0], T_scaled[:, 1], c=y, cmap="plasma", ec="k")
    ax2_T.set_xlabel("PCov1")
    ax2_T.set_ylabel("PCov2")
    ax2_T.set_title("Latent Projection\nWith Scaling")

    ax1_Y.scatter(Yp_unscaled, y, c=np.abs(y - Yp_unscaled), cmap="bone_r", ec="k")
    ax1_Y.plot(ax1_Y.get_xlim(), ax1_Y.get_xlim(), "r--")
    ax1_Y.set_xlabel("Predicted Y, unscaled")
    ax1_Y.set_ylabel("True Y, unscaled")
    ax1_Y.set_title("Regression\nWithout Scaling")

    ax2_Y.scatter(
        Yp_scaled, y, c=np.abs(y.ravel() - Yp_scaled.ravel()), cmap="bone_r", ec="k"
    )
    ax2_Y.plot(ax2_Y.get_xlim(), ax2_Y.get_xlim(), "r--")
    ax2_Y.set_xlabel("Predicted Y, unscaled")
    ax2_Y.set_ylabel("True Y, unscaled")
    ax2_Y.set_title("Regression\nWith Scaling")

    fig.subplots_adjust(hspace=0.5, wspace=0.3)

.. image-sg:: /examples/pcovr/images/sphx_glr_PCovR_Scaling_001.png
   :alt: Latent Projection Without Scaling, Latent Projection With Scaling, Regression Without Scaling, Regression With Scaling
   :srcset: /examples/pcovr/images/sphx_glr_PCovR_Scaling_001.png
   :class: sphx-glr-single-img
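As a quick, optional check (not part of the generated example), the difference between
the two regression panels above could also be quantified numerically, for instance with
:func:`sklearn.metrics.r2_score`:

.. code-block:: Python

    from sklearn.metrics import r2_score

    # Compare the predictions of the unscaled and scaled models against the true
    # targets; the exact values depend on the settings chosen above.
    print("R^2 without scaling: %0.2f" % r2_score(y, Yp_unscaled))
    print("R^2 with scaling: %0.2f" % r2_score(y, Yp_scaled.ravel()))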
.. GENERATED FROM PYTHON SOURCE LINES 91-100

We also see that, when the datasets are unscaled, the total loss (the loss in
reconstructing the original dataset plus the regression loss) does not vary with
``mixing`` as one would expect. Typically, the regression loss should *gradually*
increase with ``mixing`` (and vice versa for the loss in reconstructing the original
features). When the inputs are not scaled, however, the losses only change drastically
at ``mixing = 0`` or ``mixing = 1``, depending on which component dominates the model.
Here, because the features dominate the model, this jump occurs as ``mixing`` goes to
0. With the scaled inputs, there is still a jump between ``mixing = 0`` and
``mixing > 0`` due to the change in matrix rank.

.. GENERATED FROM PYTHON SOURCE LINES 101-146

.. code-block:: Python

    mixings = np.linspace(0, 1, 21)

    losses_unscaled = np.zeros((2, len(mixings)))
    losses_scaled = np.zeros((2, len(mixings)))

    nc = 4

    for mi, mixing in enumerate(mixings):
        pcovr_unscaled = PCovR(mixing=mixing, n_components=nc).fit(X, y)
        t_unscaled = pcovr_unscaled.transform(X)
        yp_unscaled = pcovr_unscaled.predict(T=t_unscaled)
        xr_unscaled = pcovr_unscaled.inverse_transform(t_unscaled)

        losses_unscaled[:, mi] = (
            np.linalg.norm(xr_unscaled - X) ** 2.0 / np.linalg.norm(X) ** 2,
            np.linalg.norm(yp_unscaled - y) ** 2.0 / np.linalg.norm(y) ** 2,
        )

        pcovr_scaled = PCovR(mixing=mixing, n_components=nc).fit(X_scaled, y_scaled)
        t_scaled = pcovr_scaled.transform(X_scaled)
        yp_scaled = pcovr_scaled.predict(T=t_scaled)
        xr_scaled = pcovr_scaled.inverse_transform(t_scaled)

        losses_scaled[:, mi] = (
            np.linalg.norm(xr_scaled - X_scaled) ** 2.0 / np.linalg.norm(X_scaled) ** 2,
            np.linalg.norm(yp_scaled - y_scaled) ** 2.0 / np.linalg.norm(y_scaled) ** 2,
        )


    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4), sharey=True, sharex=True)

    ax1.plot(mixings, losses_unscaled[0], marker="o", label=r"$\ell_{X}$")
    ax1.plot(mixings, losses_unscaled[1], marker="o", label=r"$\ell_{Y}$")
    ax1.plot(mixings, np.sum(losses_unscaled, axis=0), marker="o", label=r"$\ell$")
    ax1.legend(fontsize=12)
    ax1.set_title("With Inputs Unscaled")
    ax1.set_xlabel(r"Mixing parameter $\alpha$")
    ax1.set_ylabel(r"Loss $\ell$")

    ax2.plot(mixings, losses_scaled[0], marker="o", label=r"$\ell_{X}$")
    ax2.plot(mixings, losses_scaled[1], marker="o", label=r"$\ell_{Y}$")
    ax2.plot(mixings, np.sum(losses_scaled, axis=0), marker="o", label=r"$\ell$")
    ax2.legend(fontsize=12)
    ax2.set_title("With Inputs Scaled")
    ax2.set_xlabel(r"Mixing parameter $\alpha$")
    ax2.set_ylabel(r"Loss $\ell$")

    fig.show()

.. image-sg:: /examples/pcovr/images/sphx_glr_PCovR_Scaling_002.png
   :alt: With Inputs Unscaled, With Inputs Scaled
   :srcset: /examples/pcovr/images/sphx_glr_PCovR_Scaling_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 147-150

**Note**: When the relative magnitude of the features or targets is important, such as
in :func:`skmatter.datasets.load_csd_1000r`, one should use the
:class:`skmatter.preprocessing.StandardFlexibleScaler`.
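A minimal sketch of how such a scaler might be applied (not part of the generated
example; the ``column_wise=False`` option is assumed here, following the skmatter
documentation, to scale the whole feature matrix by a single factor):

.. code-block:: Python

    from skmatter.preprocessing import StandardFlexibleScaler

    # Scale all feature columns by one global factor, preserving their relative
    # magnitudes, instead of standardizing each column independently as
    # StandardScaler does.
    flexible_scaler = StandardFlexibleScaler(column_wise=False)
    X_flexibly_scaled = flexible_scaler.fit_transform(X)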
.. _sphx_glr_download_examples_pcovr_PCovR_Scaling.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: PCovR_Scaling.ipynb <PCovR_Scaling.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: PCovR_Scaling.py <PCovR_Scaling.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: PCovR_Scaling.zip <PCovR_Scaling.zip>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_