COMBSS (our method)

Where it sits between lasso and MIO

Where COMBSS fits

Three properties we want a method to have:

  1. Continuous / smooth optimisation — tractable, gradient-based, scalable.
  2. Targets the \(\ell_0\)-constrained problem — faithful to the original sparse selection task.
  3. Returns exactly \(k\) features — gives the user direct control over model size.
Returns exactly k features Continuous / smooth optimisation Targets ℓ₀- constrained problem Lasso MIO COMBSS
  • Lasso sits only in the smooth optimisation circle — convex and scalable, but doesn’t target \(\ell_0\) and doesn’t directly control \(k\).
  • MIO sits in targets \(\ell_0\) and returns \(k\) — exact but not a smooth optimisation, and not scalable.
  • COMBSS sits in all three.

The bridge: side by side

Lasso MIO COMBSS
Variable type \(\boldsymbol{\beta} \in \mathbb{R}^p\) \(\boldsymbol{s} \in \{0,1\}^p\) \(\boldsymbol{t} \in [0,1]^p\)
Sparsity by \(\ell_1\) penalty hard \(\ell_0\) + Big-\(M\) continuous relaxation on the support indicator
Selection size indexed by \(\lambda\) \(k\) via constraint \(k\) via polytope
Scales to \(p \gg n\) yes barely yes
Biased coefficients yes no no

In one sentence

COMBSS relaxes the discrete support indicator \(\boldsymbol{s} \in \{0,1\}^p\) to a continuous \(\boldsymbol{t} \in [0,1]^p\) on the cardinality polytope, fits a ridge GLM in the relaxed space, and runs a Frank-Wolfe homotopy that drives \(\boldsymbol{t}\) to a vertex of the polytope — recovering an integer subset of size exactly \(k\).

Two head-to-head questions

The simplest way to see what COMBSS buys you is to put it next to lasso on the same dataset and ask two questions.

Setup: \(n = 300\) (\(200\) train + \(100\) validation), \(p = 30\), \(\sigma = 0.5\), true coefficients \(\boldsymbol{\beta} = (3,\,2,\,1.5,\,1,\,0.5,\,0,\,\ldots,\,0)\), predictors (i.e., entries of the dataset) i.i.d. standard normal.

1. At a prescribed \(k = 5\), how biased are the coefficients?

Coefficients at \(k=5\): True (grey), Lasso at the path-\(\lambda\) that gives 5 features and minimises training MSE (coral), and COMBSS (teal, OLS-refit on its \(k=5\) subset). Squared \(\ell_2\) error to the truth across all 30 dimensions: lasso 0.075, COMBSS 0.009 — ~8x smaller.

COMBSS’s recovered coefficients match the truth to within sampling noise. Lasso’s are all visibly shrunken, even though both methods are looking at exactly five features.

2. For the same validation loss, how many features does each method need?

Left: validation MSE as a function of the number of selected features (full range). Right: zoom into the region \(k \ge 4\) where the two methods plateau. The dashed line marks lasso’s best-ever validation MSE. COMBSS at \(k = 5\) already sits below that line — lasso cannot match COMBSS’s \(k=5\) performance at any model size.

At \(k = 5\), COMBSS reaches a validation MSE that lasso cannot match at any model size — even with 22 features, lasso’s MSE stays above COMBSS’s value at \(k = 5\).

Preview of the rest of the talk

  • Methodology — how the relaxation works, the Frank-Wolfe step, the curvature homotopy.
  • Demos — five runnable demos showing the method in action.