.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/selection/FeatureSelection.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_examples_selection_FeatureSelection.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_selection_FeatureSelection.py:

PCovR-Inspired Feature Selection
================================

.. GENERATED FROM PYTHON SOURCE LINES 10-23

.. code-block:: Python

    import numpy as np
    from matplotlib import cm
    from matplotlib import pyplot as plt
    from sklearn.linear_model import RidgeCV
    from sklearn.preprocessing import StandardScaler

    from skmatter.datasets import load_csd_1000r
    from skmatter.feature_selection import CUR, FPS, PCovCUR, PCovFPS
    from skmatter.preprocessing import StandardFlexibleScaler

    cmap = cm.brg

.. GENERATED FROM PYTHON SOURCE LINES 24-25

For this example, we will use the provided CSD dataset, which has 100 features
to select from.

.. GENERATED FROM PYTHON SOURCE LINES 26-32

.. code-block:: Python

    X, y = load_csd_1000r(return_X_y=True)
    X = StandardFlexibleScaler(column_wise=False).fit_transform(X)
    y = StandardScaler().fit_transform(y.reshape(X.shape[0], -1))

.. GENERATED FROM PYTHON SOURCE LINES 34-38

.. code-block:: Python

    n = X.shape[-1] // 2
    lr = RidgeCV(cv=2, alphas=np.logspace(-10, 1), fit_intercept=False)

.. GENERATED FROM PYTHON SOURCE LINES 39-44

Feature Selection with CUR + PCovR
----------------------------------

First, let's demonstrate CUR feature selection, and show how the features
chosen with mixing parameters of 0.0, 0.5, and 1.0 perform.

.. GENERATED FROM PYTHON SOURCE LINES 45-70
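Each comparison below evaluates a selection the same way: the ridge model is
refit on progressively larger prefixes of the selected features and scored by
:math:`R^2` on the full feature matrix. As a minimal, self-contained sketch of
that learning-curve computation (using synthetic data and a hypothetical helper
name ``score_curve``; not part of skmatter):

.. code-block:: Python

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import RidgeCV


    def score_curve(X, y, idx, model):
        """R^2 of `model` refit on the first 1, 2, ..., len(idx) selected features."""
        return [
            model.fit(X[:, idx[: ni + 1]], y).score(X[:, idx[: ni + 1]], y)
            for ni in range(len(idx))
        ]


    # Synthetic stand-in for the CSD data used in this example.
    X, y = make_regression(n_samples=100, n_features=20, random_state=0)
    model = RidgeCV(alphas=np.logspace(-10, 1), fit_intercept=False)

    # Any feature ordering works; here, simply the first ten columns.
    scores = score_curve(X, y, np.arange(10), model)

Adding features can only maintain or improve the fit of the full model, so the
curve generally rises toward 1; how quickly it rises is what distinguishes the
selectors.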
.. code-block:: Python

    for m in np.arange(0, 1.01, 0.5, dtype=np.float32):
        if m < 1.0:
            idx = PCovCUR(mixing=m, n_to_select=n).fit(X, y).selected_idx_
        else:
            idx = CUR(n_to_select=n).fit(X, y).selected_idx_

        plt.loglog(
            range(1, n + 1),
            np.array(
                [
                    lr.fit(X[:, idx[: ni + 1]], y).score(X[:, idx[: ni + 1]], y)
                    for ni in range(n)
                ]
            ),
            label=m,
            c=cmap(m),
            marker="o",
        )

    plt.xlabel("Number of Features Selected")
    plt.ylabel(r"$R^2$")
    plt.legend(title="Mixing \nParameter")
    plt.show()

.. image-sg:: /examples/selection/images/sphx_glr_FeatureSelection_001.png
   :alt: FeatureSelection
   :srcset: /examples/selection/images/sphx_glr_FeatureSelection_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 71-78

Non-iterative feature selection with CUR + PCovR
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Computing a non-iterative CUR is more efficient, although it can result in
poorer performance for larger datasets. You can also use a greater number of
eigenvectors to compute the feature importance by varying ``k``; for optimal
results, ``k`` should not exceed the number of targets.

.. GENERATED FROM PYTHON SOURCE LINES 79-113

.. code-block:: Python

    m = 0.0

    idx = PCovCUR(mixing=m, n_to_select=n).fit(X, y).selected_idx_
    idx_non_it = (
        PCovCUR(mixing=m, recompute_every=0, n_to_select=n).fit(X, y).selected_idx_
    )

    plt.loglog(
        range(1, n + 1),
        np.array(
            [
                lr.fit(X[:, idx[: ni + 1]], y).score(X[:, idx[: ni + 1]], y)
                for ni in range(n)
            ]
        ),
        label="Iterative",
        marker="o",
    )
    plt.loglog(
        range(1, n + 1),
        np.array(
            [
                lr.fit(X[:, idx_non_it[: ni + 1]], y).score(X[:, idx_non_it[: ni + 1]], y)
                for ni in range(n)
            ]
        ),
        label="Non-Iterative",
        marker="s",
    )
    plt.xlabel("Number of Features Selected")
    plt.ylabel(r"$R^2$")
    plt.legend()
    plt.show()

.. image-sg:: /examples/selection/images/sphx_glr_FeatureSelection_002.png
   :alt: FeatureSelection
   :srcset: /examples/selection/images/sphx_glr_FeatureSelection_002.png
   :class: sphx-glr-single-img
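Conceptually, a non-iterative CUR selection ranks every feature once by its
leverage score, computed from the squared components of the top right singular
vectors of :math:`\mathbf{X}`, instead of recomputing importances after each
pick. As a rough, NumPy-only sketch of that idea (this is an illustration, not
skmatter's implementation; the name ``leverage_select`` is ours):

.. code-block:: Python

    import numpy as np


    def leverage_select(X, n_select, k=1):
        """Rank features by leverage scores from the top-k right singular
        vectors of X, and return the n_select highest-scoring columns."""
        # Rows of Vt are right singular vectors; the leverage of feature j
        # is the squared norm of column j across the top-k rows.
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        pi = (Vt[:k] ** 2).sum(axis=0)
        return np.argsort(pi)[::-1][:n_select]


    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 20))
    # Make column 5 dominate the variance, so it should rank first.
    X[:, 5] *= 10.0

    idx = leverage_select(X, n_select=4, k=1)

Because the ranking is fixed up front, later selections cannot account for the
variance already captured by earlier ones, which is one source of the
performance gap seen in the iterative-versus-non-iterative comparison above.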
.. GENERATED FROM PYTHON SOURCE LINES 114-118

Feature Selection with FPS + PCovR
----------------------------------

Next, let's look at FPS. We'll initialize it with the first index selected by
CUR at m = 0, which is feature 46.

.. GENERATED FROM PYTHON SOURCE LINES 119-144

.. code-block:: Python

    for m in np.arange(0, 1.01, 0.5, dtype=np.float32):
        if m < 1.0:
            idx = PCovFPS(mixing=m, n_to_select=n, initialize=46).fit(X, y).selected_idx_
        else:
            idx = FPS(n_to_select=n, initialize=46).fit(X, y).selected_idx_

        plt.loglog(
            range(1, n + 1),
            np.array(
                [
                    lr.fit(X[:, idx[: ni + 1]], y).score(X[:, idx[: ni + 1]], y)
                    for ni in range(n)
                ]
            ),
            label=m,
            c=cmap(m),
            marker="o",
        )

    plt.xlabel("Number of Features Selected")
    plt.ylabel(r"$R^2$")
    plt.legend(title="Mixing \nParameter")
    plt.show()

.. image-sg:: /examples/selection/images/sphx_glr_FeatureSelection_003.png
   :alt: FeatureSelection
   :srcset: /examples/selection/images/sphx_glr_FeatureSelection_003.png
   :class: sphx-glr-single-img

.. _sphx_glr_download_examples_selection_FeatureSelection.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: FeatureSelection.ipynb <FeatureSelection.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: FeatureSelection.py <FeatureSelection.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: FeatureSelection.zip <FeatureSelection.zip>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_