What’s next

Where the combss package is headed

The current combss packages cover linear, logistic, and multinomial regression. The underlying Boolean-relaxation idea applies far more broadly — wherever the inner problem is tractable, the same homotopy Frank-Wolfe machinery does best-subset selection on top of it. Several extensions are already published; the next milestones are to ship them in the package. Below are seven directions where the combss package is headed.

More GLM families

The three families already in the package — Gaussian, binomial, multinomial — are the most commonly used GLMs, but the underlying recipe fits any GLM whose inner ridge problem can be solved efficiently. Natural next additions:

Poisson — count data (event counts, sequencing reads)
Negative binomial — over-dispersed counts
Gamma — positive continuous outcomes (insurance losses, reaction times)
etc…..

The plan is to expose a generic combss.general(...) entry point so that any glmnet-style family becomes a COMBSS-supported family without per-family glue code.

Group selection (group COMBSS)

In many real problems the predictors come in natural groups rather than as independent atoms — SNPs within a gene, basis functions of a smooth term, dummy-encoded levels of a categorical variable, sensors on the same device. The right question is then “pick the best $k$ groups,” not the best $k$ individual coefficients.

The Boolean relaxation transfers almost without change: replace the per-feature auxiliary $\boldsymbol{t} \in [0, 1]^p$ with a per-group auxiliary $\boldsymbol{t} \in [0, 1]^J$ (one $t_j$ per group). The inner ridge becomes a group-ridge, and the envelope gradient becomes the squared $\ell_2$ norm of the group block:

\[ [\nabla f_\delta(\boldsymbol{t})]_j \;=\; -\,\frac{2\delta\,\|\hat{\boldsymbol{\beta}}_{G_j}(\boldsymbol{t})\|_2^2}{t_j}. \]

Group COMBSS for linear regression was developed by Mathur, Moka, Liquet, Botev (WSC 2024). The next release will integrate it into the package as a groups = ... argument alongside the GLM families.

Cox proportional hazards (survival)

Survival analysis is the natural next destination: time-to-event with right-censoring, ubiquitous in clinical research. The Cox model differs from a standard GLM in two ways:

The baseline hazard $h_0(t)$ is left unspecified — there is no intercept to fit.
The inner objective is the partial likelihood, not a sum of independent log-likelihoods.

Both fit the COMBSS template. After the same reparameterisation, the inner problem is a weighted ridge-penalised Cox regression, solvable by glmnet(family = "cox"). The envelope gradient has the same closed form as in the GLM case, and the homotopy schedule transfers unchanged. A first implementation for the package is in development.

Sparse portfolio selection (QP)

Picking the best $k$ assets to build a minimum-variance portfolio is, structurally, a sparse quadratic program — and the same continuous relaxation applies. The objective is $\boldsymbol{\beta}^\top \Sigma \boldsymbol{\beta}$ with the budget constraint $\mathbf{1}^\top \boldsymbol{\beta} = 1$, and the inner solve has a closed form.

The companion paper (Moka, Quiroz, Asimit, Muller, arXiv 2025) derives the concavity threshold in closed form via the eigenvalues of $\Sigma$ and shows the homotopy outperforms standard Big-$M$ MIO at fixed compute budget on real S&P 500 data. A portfolio-specific package is under construction.

Offline change-point detection (CPD)

Change-point detection asks where a sequence abruptly switches its generative parameters — a sensor stream’s mean jumps, a genomic copy-number profile shifts, an econometric regime breaks. The classical normal-mean multiple-change-point problem has a clean reformulation as a best-subset selection problem in linear regression: candidate change-point locations become the features, and selecting a small subset is equivalent to identifying a handful of change points.

Once the problem is in that form, COMBSS plugs in directly — pick $k$ to control how many change points you want, run the homotopy, and return their locations and fitted segment means. Reimann, Moka, Sofronov (WSC 2024) develop this idea and demonstrate competitive accuracy with classical change-point methods at substantially lower wall-clock cost. A CPD-tailored wrapper around the package is planned.

Column subset selection (CSSP) and Nyström approximation

The COMBSS idea is not limited to supervised regression. Column subset selection asks for the best $k$ columns of a data matrix $\mathbf{X} \in \mathbb{R}^{m \times n}$ whose span best approximates the column space of $\mathbf{X}$:

\[ \arg\min_{\boldsymbol{s} \in \{0, 1\}^n}\; \|\mathbf{X} - \mathbf{P}_{\boldsymbol{s}} \mathbf{X}\|_F^2 \quad \text{subject to } \|\boldsymbol{s}\|_0 \le k, \]

where $\mathbf{P}_{\boldsymbol{s}}$ is the projection onto the columns selected by $\boldsymbol{s}$. Mathur, Moka, Botev (2023) replace the binary $\boldsymbol{s}$ with a continuous $\boldsymbol{t} \in [0, 1]^n$ via a smooth projection function $\widetilde{\mathbf{P}}(\boldsymbol{t})$ and minimise the penalised loss by stochastic gradient descent — the gradient needs only matrix–vector products with $\mathbf{X}$, so it scales to large matrices.

The same machinery extends to the Nyström approximation for kernel matrices $\mathbf{K} \in \mathbb{R}^{n \times n}$, where the “columns” are landmark points and the gradient uses MVMs with $\mathbf{K}$. Package support would be an unsupervised mode added to combss alongside the regression families.

Sparse PCA and sparse PLS

Principal components analysis (PCA) and partial least squares (PLS) build new variables as linear combinations of all original predictors. In high dimensions this is interpretable only if those combinations are sparse — each component made from a small subset of features.

Liquet, Moka, Muller (2024) cast sparse PCA and sparse PLS as a best subset solution path problem: for each subset size $k = 1, 2, \ldots$, find the $k$ variables that best build the components. The continuous-optimisation machinery developed for COMBSS in regression transfers to this dimension-reduction setting, producing a full path of sparse component models rather than a single regularisation-driven solution. Adding this as a combss.pca(...) / combss.pls(...) interface is a natural next direction.

← Previous: Comparisons

Next: References →

--- title: "What's next" subtitle: "Where the `combss` package is headed" --- The current `combss` packages cover **linear, logistic, and multinomial** regression. The underlying Boolean-relaxation idea applies far more broadly — wherever the inner problem is tractable, the same homotopy Frank-Wolfe machinery does best-subset selection on top of it. Several extensions are already published; the next milestones are to ship them in the package. Below are seven directions where the `combss` package is headed. ## More GLM families The three families already in the package — Gaussian, binomial, multinomial — are the most commonly used GLMs, but the underlying recipe fits *any* GLM whose inner ridge problem can be solved efficiently. Natural next additions: - **Poisson** — count data (event counts, sequencing reads) - **Negative binomial** — over-dispersed counts - **Gamma** — positive continuous outcomes (insurance losses, reaction times) - **etc.....** The plan is to expose a generic `combss.general(...)` entry point so that any `glmnet`-style family becomes a COMBSS-supported family without per-family glue code. ## Group selection (group COMBSS) In many real problems the predictors come in **natural groups** rather than as independent atoms — SNPs within a gene, basis functions of a smooth term, dummy-encoded levels of a categorical variable, sensors on the same device. The right question is then *"pick the best $k$ groups,"* not the best $k$ individual coefficients. The Boolean relaxation transfers almost without change: replace the per-feature auxiliary $\boldsymbol{t} \in [0, 1]^p$ with a per-group auxiliary $\boldsymbol{t} \in [0, 1]^J$ (one $t_j$ per group). The inner ridge becomes a *group-ridge*, and the envelope gradient becomes the squared $\ell_2$ norm of the group block: $$ [\nabla f_\delta(\boldsymbol{t})]_j \;=\; -\,\frac{2\delta\,\|\hat{\boldsymbol{\beta}}_{G_j}(\boldsymbol{t})\|_2^2}{t_j}. $$ Group COMBSS for linear regression was developed by Mathur, Moka, Liquet, Botev (WSC 2024). The next release will integrate it into the package as a `groups = ...` argument alongside the GLM families. ## Cox proportional hazards (survival) Survival analysis is the natural next destination: time-to-event with right-censoring, ubiquitous in clinical research. The Cox model differs from a standard GLM in two ways: - The baseline hazard $h_0(t)$ is left unspecified — there is no intercept to fit. - The inner objective is the *partial likelihood*, not a sum of independent log-likelihoods. Both fit the COMBSS template. After the same reparameterisation, the inner problem is a **weighted ridge-penalised Cox regression**, solvable by `glmnet(family = "cox")`. The envelope gradient has the same closed form as in the GLM case, and the homotopy schedule transfers unchanged. A first implementation for the package is in development. ## Sparse portfolio selection (QP) Picking the best $k$ assets to build a minimum-variance portfolio is, structurally, a sparse quadratic program — and the same continuous relaxation applies. The objective is $\boldsymbol{\beta}^\top \Sigma \boldsymbol{\beta}$ with the budget constraint $\mathbf{1}^\top \boldsymbol{\beta} = 1$, and the inner solve has a closed form. The companion paper (Moka, Quiroz, Asimit, Muller, arXiv 2025) derives the concavity threshold in closed form via the eigenvalues of $\Sigma$ and shows the homotopy outperforms standard Big-$M$ MIO at fixed compute budget on real S&P 500 data. A portfolio-specific package is under construction. ## Offline change-point detection (CPD) Change-point detection asks where a sequence abruptly switches its generative parameters — a sensor stream's mean jumps, a genomic copy-number profile shifts, an econometric regime breaks. The classical normal-mean multiple-change-point problem has a clean reformulation as a best-subset selection problem in linear regression: candidate change-point *locations* become the features, and selecting a small subset is equivalent to identifying a handful of change points. Once the problem is in that form, COMBSS plugs in directly — pick $k$ to control how many change points you want, run the homotopy, and return their locations and fitted segment means. Reimann, Moka, Sofronov (WSC 2024) develop this idea and demonstrate competitive accuracy with classical change-point methods at substantially lower wall-clock cost. A CPD-tailored wrapper around the package is planned. ## Column subset selection (CSSP) and Nyström approximation The COMBSS idea is not limited to supervised regression. **Column subset selection** asks for the best $k$ columns of a data matrix $\mathbf{X} \in \mathbb{R}^{m \times n}$ whose span best approximates the column space of $\mathbf{X}$: $$ \arg\min_{\boldsymbol{s} \in \{0, 1\}^n}\; \|\mathbf{X} - \mathbf{P}_{\boldsymbol{s}} \mathbf{X}\|_F^2 \quad \text{subject to } \|\boldsymbol{s}\|_0 \le k, $$ where $\mathbf{P}_{\boldsymbol{s}}$ is the projection onto the columns selected by $\boldsymbol{s}$. Mathur, Moka, Botev (2023) replace the binary $\boldsymbol{s}$ with a continuous $\boldsymbol{t} \in [0, 1]^n$ via a smooth projection function $\widetilde{\mathbf{P}}(\boldsymbol{t})$ and minimise the penalised loss by stochastic gradient descent — the gradient needs only matrix–vector products with $\mathbf{X}$, so it scales to large matrices. The same machinery extends to the **Nyström approximation** for kernel matrices $\mathbf{K} \in \mathbb{R}^{n \times n}$, where the "columns" are landmark points and the gradient uses MVMs with $\mathbf{K}$. Package support would be an unsupervised mode added to `combss` alongside the regression families. ## Sparse PCA and sparse PLS Principal components analysis (PCA) and partial least squares (PLS) build new variables as linear combinations of *all* original predictors. In high dimensions this is interpretable only if those combinations are *sparse* — each component made from a small subset of features. Liquet, Moka, Muller (2024) cast sparse PCA and sparse PLS as a **best subset solution path** problem: for each subset size $k = 1, 2, \ldots$, find the $k$ variables that best build the components. The continuous-optimisation machinery developed for COMBSS in regression transfers to this dimension-reduction setting, producing a full path of sparse component models rather than a single regularisation-driven solution. Adding this as a `combss.pca(...)` / `combss.pls(...)` interface is a natural next direction. ::: {.page-nav} [← Previous: Comparisons](demos/05-comparisons.qmd) [Next: References →](references.qmd) :::