.. _selection-api:

Feature and Sample Selection
============================

.. automodule:: skmatter._selection

.. _CUR-api:

CUR
---

CUR decomposition begins by approximating a matrix :math:`{\mathbf{X}}` using a subset
of its columns and rows:

.. math::
    \mathbf{\hat{X}} \approx \mathbf{X}_\mathbf{c} \left(\mathbf{X}_\mathbf{c}^-
    \mathbf{X} \mathbf{X}_\mathbf{r}^-\right) \mathbf{X}_\mathbf{r}.
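
As a rough numerical illustration of the approximation above, the NumPy sketch below
rebuilds a matrix from hand-picked columns and rows. The helper ``cur_approximation`` is
hypothetical (it is not part of skmatter), and the superscript minus sign is assumed to
denote the Moore-Penrose pseudoinverse.

.. code-block:: python

    import numpy as np


    def cur_approximation(X, col_idx, row_idx):
        """Approximate X from a subset of its columns and rows."""
        X_c = X[:, col_idx]  # selected columns
        X_r = X[row_idx, :]  # selected rows
        # Middle factor X_c^- X X_r^-, taking ^- to be the pseudoinverse.
        middle = np.linalg.pinv(X_c) @ X @ np.linalg.pinv(X_r)
        return X_c @ middle @ X_r


    rng = np.random.default_rng(0)
    # A rank-2 matrix: two independent columns and rows suffice to rebuild it.
    X = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 4))
    X_hat = cur_approximation(X, col_idx=[0, 1], row_idx=[0, 1])
    print(np.allclose(X, X_hat))

For a matrix of higher rank, the same call returns only an approximation, and its
quality depends on which columns and rows are chosen.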

These subsets of rows and columns, denoted :math:`\mathbf{X}_\mathbf{r}` and
:math:`\mathbf{X}_\mathbf{c}`, respectively, can be determined by iteratively maximizing
a leverage score :math:`\pi`, which represents the relative importance of each column or
row. From here on, we use "CUR" as shorthand for the selection methods derived from the
CUR decomposition. In each iteration of CUR, we select the column or row that maximizes
:math:`\pi` and orthogonalize the remaining columns or rows with respect to it. These
steps are repeated until a sufficient number of features or samples has been selected.
Although this iterative approach is comparatively time-consuming, it is deterministic
and reduces the number of features needed to approximate :math:`\mathbf{X}` more
efficiently than selecting all features in a single iteration based on their relative
:math:`\pi` importance.
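
The select-then-orthogonalize loop can be sketched in plain NumPy for the feature
selection case. This is a minimal illustration, not the skmatter implementation: the
function name ``cur_select_features`` and the parameter ``k`` (the number of singular
vectors entering the score) are assumptions made for this example.

.. code-block:: python

    import numpy as np


    def cur_select_features(X, n_to_select, k=1):
        """Greedy CUR-style feature selection (illustrative only)."""
        X_work = np.array(X, dtype=float)  # copy, so the input is left untouched
        selected = []
        for _ in range(n_to_select):
            # Leverage score: squared entries of the top-k right singular vectors.
            _, _, Vt = np.linalg.svd(X_work, full_matrices=False)
            pi = np.sum(Vt[:k] ** 2, axis=0)
            if selected:
                pi[selected] = -np.inf  # never pick the same column twice
            j = int(np.argmax(pi))
            selected.append(j)
            # Orthogonalize: project the selected column out of every column.
            x_j = X_work[:, j].copy()
            norm_sq = x_j @ x_j
            if norm_sq > 0:
                X_work -= np.outer(x_j, x_j @ X_work) / norm_sq
        return selected


    # Column 1 is a scaled copy of column 0; column 2 is orthogonal to both.
    X = np.array([[1.0, 0.9, 0.0], [1.0, 0.9, 0.0], [0.0, 0.0, 0.5]])
    print(cur_select_features(X, n_to_select=2))  # [0, 2]

Because column 1 carries no information beyond column 0, it vanishes after the first
orthogonalization, and the second pick falls on column 2. A single-pass ranking by
:math:`\pi` would instead have picked the redundant columns 0 and 1.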

The feature and sample selection versions of CUR differ only in the computation of
:math:`\pi`. In sample selection, :math:`\pi` is computed using the left singular
vectors, whereas in feature selection it is computed using the right singular vectors.
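
The difference between the two scores can be made concrete with a short NumPy sketch.
The helper ``leverage_scores`` and its parameter ``k`` are hypothetical names used only
for this illustration.

.. code-block:: python

    import numpy as np


    def leverage_scores(X, k=1):
        """Return (pi_samples, pi_features) from the top-k singular vectors."""
        U, _, Vt = np.linalg.svd(X, full_matrices=False)
        # Left singular vectors give one score per row (sample).
        pi_samples = np.sum(U[:, :k] ** 2, axis=1)
        # Right singular vectors give one score per column (feature).
        pi_features = np.sum(Vt[:k, :] ** 2, axis=0)
        return pi_samples, pi_features


    X = np.array(
        [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.5], [0.0, 0.0, 0.0]]
    )
    pi_samples, pi_features = leverage_scores(X)
    print(pi_samples.shape, pi_features.shape)  # (4,) (3,)

Transposing the input swaps the two roles: the sample scores of ``X.T`` are the feature
scores of ``X``.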

.. autoclass:: skmatter.feature_selection.CUR
   :members:
   :private-members: _compute_pi
   :undoc-members:
   :inherited-members:

.. autoclass:: skmatter.sample_selection.CUR
   :members:
   :private-members: _compute_pi
   :undoc-members:
   :inherited-members:

.. _PCov-CUR-api:

PCov-CUR
--------

PCov-CUR extends CUR by using augmented right or left singular vectors inspired by
Principal Covariates Regression, as demonstrated in [Cersonsky2021]_. These methods
employ the modified kernel and covariance matrices introduced in :ref:`PCovR-api` and
available via the Utility Classes.

Again, the feature and sample selection versions of PCov-CUR differ only in the
computation of :math:`\pi`.
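
For orientation, the sketch below writes out one common form of the modified Gram and
covariance matrices used in Principal Covariates Regression, in plain NumPy. It is an
assumption-laden illustration rather than the skmatter utility code: the function names
and the ``mixing`` argument are chosen for this example, any scaling of
:math:`\mathbf{X}` and :math:`\mathbf{Y}` is omitted, and :math:`\mathbf{X}` is assumed
to have full column rank.

.. code-block:: python

    import numpy as np


    def modified_gram(X, Y, mixing):
        """mixing * X X^T + (1 - mixing) * Yhat Yhat^T (sample space)."""
        Y_hat = X @ np.linalg.pinv(X) @ Y  # least-squares prediction of Y from X
        return mixing * X @ X.T + (1.0 - mixing) * Y_hat @ Y_hat.T


    def modified_covariance(X, Y, mixing):
        """mixing * C + (1 - mixing) * C^-1/2 X^T Yhat Yhat^T X C^-1/2."""
        C = X.T @ X
        Y_hat = X @ np.linalg.pinv(X) @ Y
        # Inverse square root of C from its eigendecomposition (full rank assumed).
        evals, evecs = np.linalg.eigh(C)
        C_isqrt = evecs @ np.diag(evals**-0.5) @ evecs.T
        B = C_isqrt @ X.T @ Y_hat
        return mixing * C + (1.0 - mixing) * B @ B.T


    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 3))
    Y = X @ np.array([[1.0], [-2.0], [0.5]]) + 0.1 * rng.normal(size=(6, 1))
    K = modified_gram(X, Y, mixing=0.5)
    C = modified_covariance(X, Y, mixing=0.5)
    # Compare the three largest eigenvalues of K with the eigenvalues of C.
    print(np.allclose(np.linalg.eigvalsh(K)[-3:], np.linalg.eigvalsh(C)))

With ``mixing=1`` both functions reduce to the ordinary :math:`\mathbf{X}\mathbf{X}^T`
and :math:`\mathbf{X}^T\mathbf{X}`. In this picture, the leading eigenvectors of the
modified matrices would stand in for the singular vectors used by plain CUR.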

.. autoclass:: skmatter.feature_selection.PCovCUR
   :members:
   :private-members: _compute_pi
   :undoc-members:
   :inherited-members:

.. autoclass:: skmatter.sample_selection.PCovCUR
   :members:
   :private-members: _compute_pi
   :undoc-members:
   :inherited-members:

.. _FPS-api:

Farthest Point Sampling (FPS)
-----------------------------

Farthest Point Sampling is a common selection technique intended to exploit the
diversity of the input space.

In FPS, the first point is selected at random or by a separate metric. Each subsequent
selection is made to maximize the Hausdorff distance, i.e., the minimum distance between
a point and all previous selections. The Euclidean distance is commonly used, but other
distance metrics may be employed.
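
The selection rule described above can be sketched in a few lines of NumPy. This is a
minimal illustration with the Euclidean distance, not the skmatter implementation; the
function name ``farthest_point_sampling`` and the ``initial`` argument are assumptions
made for this example.

.. code-block:: python

    import numpy as np


    def farthest_point_sampling(X, n_to_select, initial=0):
        """Select rows of X by farthest point sampling (illustrative only)."""
        X = np.asarray(X, dtype=float)
        selected = [initial]
        # Distance from every point to its nearest selected point so far.
        min_dist = np.linalg.norm(X - X[initial], axis=1)
        while len(selected) < n_to_select:
            idx = int(np.argmax(min_dist))  # the point farthest from the selection
            selected.append(idx)
            new_dist = np.linalg.norm(X - X[idx], axis=1)
            min_dist = np.minimum(min_dist, new_dist)
        return selected


    X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [5.0, 0.0]])
    print(farthest_point_sampling(X, n_to_select=3))  # [0, 2, 3]

Keeping a running minimum distance means each iteration only needs the distances to the
newest selection, not to all previous ones. A different metric can be swapped in by
replacing the two ``np.linalg.norm`` calls.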

Similar to CUR, the feature and sample selection versions of FPS differ only in the way
distance is computed (feature selection does so column-wise, sample selection does so
row-wise), and are built on the same base class.

These selectors can be instantiated using :py:class:`skmatter.feature_selection.FPS` and
:py:class:`skmatter.sample_selection.FPS`.

.. autoclass:: skmatter.feature_selection.FPS
   :members:
   :undoc-members:
   :inherited-members:

.. autoclass:: skmatter.sample_selection.FPS
   :members:
   :undoc-members:
   :inherited-members:

.. _PCov-FPS-api:

PCov-FPS
--------

PCov-FPS extends FPS much as PCov-CUR extends CUR. Instead of using the Euclidean
distance solely in the space of :math:`\mathbf{X}`, we use a combined distance in terms
of :math:`\mathbf{X}` and :math:`\mathbf{y}`.
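
One way to picture such a combined distance is as a weighted sum of squared Euclidean
distances in the two spaces. The sketch below assumes that form, with a ``mixing``
weight modeled on the PCovR mixing parameter. The function name and the exact weighting
are assumptions for illustration and may differ from what skmatter computes internally.

.. code-block:: python

    import numpy as np


    def combined_sq_distances(X, y, mixing):
        """Pairwise squared distances mixing X-space and y-space contributions."""
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float).reshape(len(X), -1)
        # Squared Euclidean distances between all pairs of samples.
        d_X = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        d_y = np.sum((y[:, None, :] - y[None, :, :]) ** 2, axis=-1)
        return mixing * d_X + (1.0 - mixing) * d_y


    X = np.array([[0.0], [1.0], [2.0]])
    y = np.array([0.0, 10.0, 0.0])
    D = combined_sq_distances(X, y, mixing=0.5)
    print(D[0, 1], D[0, 2])  # 50.5 2.0

Samples 0 and 2 are the farthest apart in :math:`\mathbf{X}` but share the same target,
so the combined distance ranks sample 1 as the one most distant from sample 0.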

.. autoclass:: skmatter.feature_selection.PCovFPS
   :members:
   :undoc-members:
   :inherited-members:

.. autoclass:: skmatter.sample_selection.PCovFPS
   :members:
   :undoc-members:
   :inherited-members:

.. _Voronoi-FPS-api:

Voronoi FPS
-----------

.. autoclass:: skmatter.sample_selection.VoronoiFPS
   :members:
   :undoc-members:
   :inherited-members:


When *Not* to Use Voronoi FPS
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In many cases, this algorithm may not improve efficiency. For example, for simple
metrics (such as the Euclidean distance), Voronoi FPS will likely not accelerate, and
may even slow down, computations when compared to FPS. The sweet spot for Voronoi FPS is
when the number of samples to select is already large enough to divide the space into
Voronoi polyhedra, but not yet comparable to the total number of samples. Beyond that
point, the cost of bookkeeping significantly degrades its speed compared to FPS.

.. _DCH-api:

Directional Convex Hull (DCH)
-----------------------------

.. autoclass:: skmatter.sample_selection.DirectionalConvexHull
   :members:
   :undoc-members:
   :inherited-members: