
A Working Set Algorithm
for Support Vector Regression Problems

Saida Azib and Belkacem Brahmi
(Date: August 08, 2025; accepted: November 22, 2025; published online: December 15, 2025.)
Abstract.

Support vector regression (SVR) is widely applied to nonlinear regression tasks but remains computationally demanding due to its dual quadratic formulation and structural constraints. In this paper, we propose a working set algorithm, termed WSA–SVR, that extends the primal simplex method recently developed for SVM classification to the regression framework. WSA–SVR iteratively generates a sequence of basic feasible solutions converging to the optimal solution. Its key feature lies in preserving primal feasibility while employing efficient rank-one updates, thereby avoiding null-space and reduced-Hessian projections. This guarantees both numerical stability and scalability. Extensive experiments on benchmark datasets demonstrate that WSA–SVR converges more rapidly than state-of-the-art solvers while maintaining competitive or superior predictive accuracy.

Key words and phrases:
Support vector regression, convex quadratic programming, simplex method, working basis, kernel methods.
2005 Mathematics Subject Classification:
90C20, 90C25, 65K05, 68T05, 90C90
LaMOS Research Unit, University of Bejaia, 06000 Bejaia, Algeria, e-mail: saida.azib@univ-bejaia.dz
Department of Operations Research, University of Bejaia, 06000 Bejaia, Algeria, e-mail: belkacem.brahmi@univ-bejaia.dz

1. Introduction

Support Vector Regression (SVR) is a powerful machine learning algorithm widely used for regression tasks, offering a principled framework for estimating real-valued functions [26, 21]. Beyond its theoretical foundations, SVR has been successfully applied in a wide range of domains, such as financial forecasting [24], bioinformatics and medical imaging [22, 28], meteorology [6], and more recently, environmental and hydrological modeling [15]. This shows that the method is versatile and reliable, even when dealing with complex, noisy, or nonlinear data.

The classical SVR problem is formulated by introducing an ϵ-insensitive region, called the ϵ-tube, symmetrically surrounding the regression function. Points outside this tube are penalized, while those inside incur no penalty [1, 9, 21]. Training an SVR model consists of solving a convex optimization problem that balances model complexity with predictive accuracy [15, 28, 12, 23]. This objective seeks to minimize model complexity while controlling errors larger than ϵ, thus ensuring robust generalization even in the presence of noise or nonlinearities.

However, solving the primal problem directly is often computationally challenging. The dual formulation, expressed as a convex quadratic program (QP) with an equality constraint and box constraints on pairs of dual variables, is better suited to kernel methods but introduces specific algorithmic difficulties due to the ϵ-insensitive loss and coupling constraints [17, 20, 29].

Several approaches have been developed to tackle this optimization framework. Exact QP solvers ensure convergence but scale poorly to large datasets [16]. Decomposition methods, especially sequential minimal optimization (SMO) and its variants as implemented in LIBSVM [5], are widely adopted for large-scale problems [18, 5]. Active-set strategies, initially studied for SVM classification, have motivated extensions to SVR, offering efficient rank-one updates and improved numerical stability [27, 14]. For large-scale regression problems, primal-stochastic techniques such as coordinate descent and stochastic gradient descent, combined with kernel approximations (Nyström, Random Fourier Features), enable scalability [11, 19]. Furthermore, variants such as least squares SVR (LS-SVR), smooth ϵ-SVR, linear programming formulations, and online-based algorithms have been developed to extend SVR to noisy and streaming data [22, 12, 11, 13].

In this context, we propose a novel approach, the Working Set Algorithm for Support Vector Regression (WSA–SVR). While inspired by the active-set framework recently introduced for SVM classification [4] and support methods [10, 3] for convex quadratic programming, our method provides a specific extension to the regression setting. The method explicitly addresses the structural constraints of the dual formulation of an ϵ–SVR, and generates a sequence of feasible basic solutions converging to an optimal solution. Each iteration of WSA–SVR involves four steps: identifying the most violating variable, computing a descent direction, selecting a step size that decreases the objective while maintaining feasibility, and changing the working basis. A key feature of our approach is that it directly handles the dual variables in pairs as originally formulated, guarantees the non-singularity of the basis matrix throughout the iterations, and avoids the costly computation of the reduced Hessian. Consequently, the proposed WSA–SVR remains simple to implement and easy to understand.

The proposed method is empirically evaluated on several regression datasets drawn from the UCI repository [8], using both linear and RBF kernels. Numerical experiments are encouraging and indicate that WSA-SVR converges faster than conventional solvers such as SMO and LIBSVM, while achieving comparable or superior predictive accuracy. A theoretical study of convergence and complexity is also conducted to validate the robustness of the algorithm.

The remainder of this paper is organized as follows. Section 2 reviews the SVR framework, where the primal formulation and its associated dual problem are presented. Section 3 presents the suggested WSA–SVR algorithm, and Section 4 details the iterative update scheme. Convergence guarantees and complexity computation of WSA–SVR are discussed in Section 5. Experimental results and comparisons with standard solvers (SMO-MAT and LIBSVM) are reported in Section 6, and Section 7 concludes the paper.

2. Background

Let $\mathcal{D}=\{(x_i,y_i)\}_{i=1}^{n}$ be a training set, where $x_i \in \mathbb{R}^m$ are input features and $y_i \in \mathbb{R}$ are scalar outputs. Here, $n$ and $m$ represent respectively the number of examples and features. The primary objective of an SVR model [26, 22] is to approximate a nonlinear function $h:\mathbb{R}^m \to \mathbb{R}$ that best fits the training data within an $\epsilon$-insensitive margin while remaining as flat as possible. Its form is as follows:

$h(x)=w^\top \Phi(x)+b,$

where $w \in \mathcal{H}$ is the weight vector, the scalar $b$ is called the bias, and the map $\Phi:\mathbb{R}^m \to \mathcal{H}$ is a feature mapping into a reproducing kernel Hilbert space $\mathcal{H}$. In practice, the kernel function $K(x_i,x_j)=\Phi(x_i)^\top\Phi(x_j)$ allows the implicit computation of dot products in $\mathcal{H}$, assuming that $K$ satisfies Mercer's condition [22, 21]. The symbol $(\cdot)^\top$ denotes the transposition of vectors or matrices.

To tolerate small deviations, the SVR uses an $\epsilon$-insensitive loss function. Slack variables $\xi_i^+$ and $\xi_i^-$ are introduced to handle violations beyond the margin $\epsilon$. The primal formulation of an SVR problem is the following convex optimization problem:

(1) $\min_{w,b,\xi^+,\xi^-} \ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}(\xi_i^+ + \xi_i^-)$
s.t. $y_i - w^\top\Phi(x_i) - b \le \epsilon + \xi_i^+, \quad i=1,2,\ldots,n,$
$w^\top\Phi(x_i) + b - y_i \le \epsilon + \xi_i^-, \quad i=1,2,\ldots,n,$
$\xi_i^+,\ \xi_i^- \ge 0, \quad i=1,2,\ldots,n.$

Here, $C>0$ denotes the regularization parameter that governs the trade-off between model flatness and error tolerance. Let $\alpha^+$ and $\alpha^-$ be the $n$-dimensional vectors of Lagrange multipliers associated with the pair of general constraints in the primal problem. By introducing these multipliers, the dual formulation can be expressed as:

(2) $\min_{\alpha^+,\alpha^-} \ \frac{1}{2}\sum_{i,j=1}^{n}(\alpha_i^+-\alpha_i^-)(\alpha_j^+-\alpha_j^-)K(x_i,x_j) + \epsilon\sum_{i=1}^{n}(\alpha_i^+ + \alpha_i^-) - \sum_{i=1}^{n}y_i(\alpha_i^+-\alpha_i^-)$
s.t. $\sum_{i=1}^{n}(\alpha_i^+-\alpha_i^-)=0,$
$0 \le \alpha_i^+,\ \alpha_i^- \le C, \quad i=1,2,\ldots,n.$

For numerical implementation, we define:

$\alpha=\begin{bmatrix}\alpha^+\\ \alpha^-\end{bmatrix}\in\mathbb{R}^{2n},\qquad H=\begin{bmatrix}Q & -Q\\ -Q & Q\end{bmatrix},\qquad p=\begin{bmatrix}\epsilon\mathbf{1}-y\\ \epsilon\mathbf{1}+y\end{bmatrix},\qquad a=\begin{bmatrix}\mathbf{1}\\ -\mathbf{1}\end{bmatrix},$

where 𝟏 is a vector of ones of an appropriate dimension. Accordingly, the dual problem can be expressed in matrix form as:

(3) $\min_{\alpha\in\mathbb{R}^{2n}} \ f(\alpha)=\frac{1}{2}\alpha^\top H\alpha + p^\top\alpha$
s.t. $a^\top\alpha=0,$
$0 \le \alpha \le C\mathbf{1},$

where $Q_{ij}=K(x_i,x_j)$ defines the Gram matrix $Q$ associated with the kernel function. Being a combination of positive semi-definite kernel matrices, $H$ is itself symmetric and positive semi-definite, which ensures the convexity of the dual problem (3).
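To make the construction of problem (3) concrete, the following short NumPy fragment assembles $H$, $p$, and $a$ from a precomputed Gram matrix. It is an illustrative sketch written for this exposition, not the authors' implementation; the function name build_dual_qp is ours.

```python
import numpy as np

def build_dual_qp(K, y, eps):
    """Assemble the data of the dual QP (3) from a precomputed Gram matrix K
    (n x n), targets y (n,), and the tube width eps. Illustrative sketch only;
    variable names follow the paper's notation."""
    n = K.shape[0]
    # H = [[Q, -Q], [-Q, Q]] with Q = K
    H = np.block([[K, -K], [-K, K]])
    # p = [eps*1 - y ; eps*1 + y]
    p = np.concatenate([eps * np.ones(n) - y, eps * np.ones(n) + y])
    # a = [1 ; -1], encoding the equality constraint a' alpha = 0
    a = np.concatenate([np.ones(n), -np.ones(n)])
    return H, p, a
```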

As in classical SVM classification, the optimal solution α of an SVR model is typically sparse, meaning that the estimated regression function is determined by only a small subset of training examples, known as support vectors. This set is formally defined as

$\mathcal{S}=\{\, i \mid 0<\alpha_i\le C \,\}.$

The final regression function can be expressed as follows:

$h(x)=\sum_{i\in\mathcal{S}}\alpha_i K(x_i,x)+b.$
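For illustration, this expansion can be evaluated by mapping the stacked vector $\alpha\in\mathbb{R}^{2n}$ back to the paired coefficients $\alpha_i^+-\alpha_i^-$. The sketch below is ours (the names svr_predict and kernel are hypothetical) and assumes kernel(A, B) returns the Gram matrix between the rows of A and B.

```python
import numpy as np

def svr_predict(alpha, b, X_train, X_test, kernel):
    """Evaluate h(x) = sum_i (alpha_i^+ - alpha_i^-) K(x_i, x) + b from the
    stacked dual vector alpha = [alpha^+; alpha^-] in R^{2n}. Sketch only."""
    n = X_train.shape[0]
    beta = alpha[:n] - alpha[n:]        # combined coefficients alpha^+ - alpha^-
    Ktest = kernel(X_test, X_train)     # (n_test, n) kernel evaluations
    return Ktest @ beta + b
```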

Since the dual problem (3) is convex, the Karush–Kuhn–Tucker (KKT) conditions [16] are both necessary and sufficient for optimality. In particular, a feasible point $\alpha$ is optimal if and only if it satisfies the following KKT conditions:

(4) $\begin{cases} H\alpha + p + b\,a - \mu + \nu = 0,\\ a^\top\alpha = 0,\\ 0 \le \alpha \le C\mathbf{1},\\ \mu_i\alpha_i = 0,\ \ \nu_i(\alpha_i - C) = 0, & i=1,2,\ldots,2n,\\ \mu_i \ge 0,\ \ \nu_i \ge 0, & i=1,2,\ldots,2n, \end{cases}$

where μ and ν are the Lagrange multipliers associated with the lower and upper box constraints, respectively. This yields:

  • If $\alpha_i=0$, then $\mu_i \ge 0,\ \nu_i=0$;

  • If $\alpha_i=C$, then $\mu_i=0,\ \nu_i \ge 0$;

  • If $0<\alpha_i<C$, then $\mu_i=\nu_i=0$.

3. Principle of the new method

Unlike conventional strategies based on decomposition or interior point methods, we propose a direct Working Set Algorithm for solving the dual SVR quadratic program (3). The proposed method, termed WSA–SVR, is inspired by the primal simplex framework recently introduced in [4], and has been significantly adapted to address the specific structure and constraints of support vector regression. Our formulation avoids the use of null-space projections and reduced Hessians.

The WSA–SVR algorithm constructs a sequence of support feasible solutions (SFS), each defined by a working basis $J_B$ for which the associated Hessian submatrix $H_B$ is nonsingular. This sequence monotonically reduces the objective function value, with each update respecting both the box and equality constraints. The main process includes:

  • Identifying the most violating non-basic variable according to its reduced cost.

  • Computing a feasible descent direction $d$ by enforcing the stationarity (KKT) conditions on the basic variables.

  • Finding the smallest stepsize, θ, that guarantees descent and maintains feasibility.

  • Updating the working set JB accordingly to keep the basis nonsingular.

Notations and Problem Structure

Let $J=\{1,2,\ldots,2n\}$ denote the full index set for the decision variable vector $\alpha$. We decompose $J$ into two disjoint subsets: the basic index set $J_B$ and the non-basic index set $J_N=J\setminus J_B$, with $J_B\cap J_N=\emptyset$. Accordingly, we partition the vector $\alpha$ and other relevant vectors and matrices as follows:

$\alpha=\begin{bmatrix}\alpha_B\\ \alpha_N\end{bmatrix},\qquad y=\begin{bmatrix}y_B\\ y_N\end{bmatrix},\qquad p=\begin{bmatrix}p_B\\ p_N\end{bmatrix},\qquad a=\begin{bmatrix}a_B\\ a_N\end{bmatrix},\qquad H=\begin{bmatrix}H_B & H_{BN}\\ H_{NB} & H_N\end{bmatrix},$

where:

  • $\alpha_B=(\alpha_j,\ j\in J_B)$ and $\alpha_N=(\alpha_j,\ j\in J_N)$ denote, respectively, the basic and non-basic variables;

  • $y_B=(y_j,\ j\in J_B)$ and $y_N=(y_j,\ j\in J_N)$ are the basic and non-basic components of the responses, respectively;

  • $p_B=(p_j,\ j\in J_B)$ and $p_N=(p_j,\ j\in J_N)$ denote the corresponding partitions of the linear term $p$ in the quadratic program;

  • $a_B=(a_j,\ j\in J_B)$ and $a_N=(a_j,\ j\in J_N)$ denote the corresponding partitions of the constraint vector $a$;

  • $H_B=H(J_B,J_B)$ is the principal submatrix of $H$ associated with the basic variables;

  • $H_{BN}=H(J_B,J_N)$ and $H_{NB}=H_{BN}^\top$;

  • $H_N=H(J_N,J_N)$ is the submatrix for the non-basic variables.

We begin by formalizing the structural components that underpin the WSA–SVR framework.

Definition 1 (Working basis [4]).

A nonempty subset of indices $J_B\subseteq J$ is called a working basis (support) for problem (3) if the associated basic matrix $H_B=H(J_B,J_B)$ is nonsingular. Its complement is the non-basic set, denoted $J_N=J\setminus J_B$.

Definition 2 (Support feasible solution (SFS)).

A pair $\{\alpha,J_B\}$ is called a support feasible solution (SFS) if the vector $\alpha$ is feasible for the dual problem (3) and $J_B$ forms a working basis. Furthermore, it is said to be non-degenerate if all the basic variables lie strictly within the bounds:

$0<\alpha_j<C,\quad \forall j\in J_B.$
Definition 3 (Reduced costs vector).

Given an SFS $\{\alpha,J_B\}$ for the QP (3), the reduced costs vector $\Delta\in\mathbb{R}^{2n}$ is defined as

$\Delta = H\alpha + p + b\,a = (\Delta_B,\Delta_N),$

where the bias $b$ is chosen such that the basic component satisfies $\Delta_B=0$, i.e.,

$H_B\alpha_B + H_{BN}\alpha_N + p_B + b\,a_B = 0.$
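A minimal NumPy sketch of Definition 3 reads as follows. The helper name reduced_costs is ours; $b$ is read off one basic equation, which is consistent across all basic indices for a genuine SFS.

```python
import numpy as np

def reduced_costs(H, p, a, alpha, JB):
    """Compute the bias b and the reduced-costs vector Delta = H alpha + p + b a
    so that Delta_B = 0 (Definition 3). Illustrative sketch: b is recovered from
    one basic equation, the remaining basic components vanishing by construction
    of the SFS."""
    g = H @ alpha + p       # gradient part, H alpha + p
    j = JB[0]               # any basic index determines b
    b = -g[j] / a[j]        # enforce Delta_j = 0
    Delta = g + b * a
    return b, Delta
```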

The optimality conditions (4) are reformulated with respect to the support set JB and can be established in a manner analogous to the arguments in [10, 3, 4].

Theorem 4 (Optimality conditions).

Let {α,JB} be an SFS for the QP (3). Then, the following relations:

(5) $\begin{cases}\Delta_j \ge 0, & \text{if } \alpha_j=0,\\ \Delta_j \le 0, & \text{if } \alpha_j=C,\\ \Delta_j = 0, & \text{if } 0<\alpha_j<C,\end{cases}\qquad \forall j\in J_N,$

are sufficient for the optimality of the point α and are also necessary if the SFS {α,JB} is nondegenerate.

Note that the key difference between the optimality conditions stated above and the standard KKT conditions (4) is that our conditions apply only to the non-basic variables, whereas the KKT conditions must hold for all variables. This constitutes a significant simplification of the optimality verification process.

4. Iteration Scheme

The aim of the proposed working set method, named WSA–SVR, is to solve the dual formulation of an $\epsilon$-SVR problem. WSA–SVR generates a sequence of support feasible solutions that converges globally to an optimal solution of the QP (3). Like any iterative optimization method, the WSA-SVR algorithm needs an initial feasible solution to start the resolution process. In our case, this solution is obtained simply by choosing the initial working basis $J_B=\{1\}$ and the feasible point $\alpha=0$. The corresponding bias and reduced costs vector are then given by the following explicit formulas:

(6) $\begin{cases}\Delta_B=\Delta_1=0 \;\Longleftrightarrow\; p_1 + b\,a_1 = 0 \;\Longleftrightarrow\; b = -p_1 = -\epsilon + y_1;\\ \Delta_N = p_N + b\,a_N.\end{cases}$
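In code, this initialization takes only a couple of lines; the sketch below is ours (the name initialize_wsa_svr is hypothetical) and uses 0-based indices, whereas the paper indexes variables from 1.

```python
import numpy as np

def initialize_wsa_svr(p, a):
    """Initial SFS as in (6): alpha = 0, J_B = {first index}, Delta = p + b a
    with b chosen to zero the basic component. Illustrative sketch."""
    alpha = np.zeros_like(p)
    JB = [0]
    b = -p[0] / a[0]          # = -p_1 since a_1 = 1, i.e. b = y_1 - eps
    Delta = p + b * a         # Delta_B = Delta[0] = 0 by construction
    return alpha, JB, b, Delta
```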

An iteration of the WSA-SVR algorithm consists in computing a new solution $\bar{\alpha}$ from the current iterate $\alpha$ using the update formula:

(7) $\bar{\alpha}=\alpha+\theta d,\qquad \theta>0,$

where $d\in\mathbb{R}^{2n}$ is the search direction and $\theta$ is the stepsize. The associated bias and reduced costs vector are updated as follows:

(8) $\begin{cases}\bar{b}=b+\theta r;\\ \bar{\Delta}=H\bar{\alpha}+p+\bar{b}\,a=\Delta+\theta(Hd+r\,a)=\Delta+\theta t,\end{cases}$

where $r\in\mathbb{R}$, and the vector $t=(t_B,t_N)$ is defined by:

(9) $\begin{cases}t_B=0,\\ t_N=H_{NB}d_B+H_{Nj_0}d_{j_0}+r\,a_N.\end{cases}$

The first step of the WSA–SVR algorithm is to test the optimality of the current SFS $\{\alpha,J_B\}$. If the optimality relations (5) of Theorem 4 are satisfied, then $\alpha$ is an optimal solution of the problem and the algorithm terminates. Otherwise, the algorithm selects the non-basic index that exhibits the largest violation:

(10) $j_0=\arg\max_{j\in J_N^{NO}}|\Delta_j|,$

where $J_N^{NO}\subseteq J_N$ denotes the subset of non-basic indices for which the optimality conditions (5) are not satisfied.

Selecting $j_0\in J_N^{NO}$ thus fulfills two roles: it detects the most critical violation of the KKT conditions and specifies the entering variable used to construct the descent direction.
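The optimality test and selection rule (10) can be sketched as follows. The function select_entering_index and the tolerance tol are our illustrative choices, not part of the paper; the violation measure equals $|\Delta_j|$ exactly when condition (5) is violated at index $j$.

```python
import numpy as np

def select_entering_index(alpha, Delta, JN, C, tol=1e-8):
    """Check the optimality conditions (5) on the non-basic set and return the
    most violating index j0 as in rule (10), or None if the current SFS is
    optimal up to the tolerance."""
    worst, j0 = 0.0, None
    for j in JN:
        if alpha[j] <= tol:                 # at the lower bound: need Delta_j >= 0
            violation = max(0.0, -Delta[j])
        elif alpha[j] >= C - tol:           # at the upper bound: need Delta_j <= 0
            violation = max(0.0, Delta[j])
        else:                               # strictly inside: need Delta_j = 0
            violation = abs(Delta[j])
        if violation > worst + tol:
            worst, j0 = violation, j
    return j0
```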

4.1. Search direction

Let $j_0\in J_N^{NO}$ be the non-basic index determined by rule (10). The search direction $d=(d_B,d_N)$ is then constructed as follows.

(11) $d_j=\begin{cases}-\operatorname{sign}(\Delta_{j_0}), & \text{if } j=j_0,\\ 0, & \text{otherwise},\end{cases}\qquad j\in J_N.$

This defines the non-basic component dN.

The basic component $d_B\in\mathbb{R}^{|J_B|}$ is then computed to satisfy the feasibility and stationarity conditions. Its explicit expression is given by:

(12) $d_B=-H_B^{-1}\left(H_{Bj_0}d_{j_0}+r\,a_B\right),$

where the scalar $r$ is computed to enforce the equality constraint $a^\top d=0$. Its value is:

(13) $r=\dfrac{d_{j_0}\left(a_{j_0}-a_B^\top H_B^{-1}H_{Bj_0}\right)}{a_B^\top H_B^{-1}a_B}.$
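Equations (11)–(13) translate directly into a few linear solves with $H_B$. The sketch below is ours (the name search_direction is hypothetical) and uses a dense solve for clarity; Section 5 discusses the cheaper rank-one updates.

```python
import numpy as np

def search_direction(H, a, Delta, JB, j0):
    """Compute d_{j0}, d_B and the scalar r of Eqs. (11)-(13). Illustrative
    sketch using dense solves with the basis matrix H_B."""
    HB = H[np.ix_(JB, JB)]
    HBj0 = H[JB, j0]
    aB = a[JB]
    dj0 = -np.sign(Delta[j0])                 # entering direction, Eq. (11)
    u = np.linalg.solve(HB, HBj0)             # H_B^{-1} H_{B j0}
    v = np.linalg.solve(HB, aB)               # H_B^{-1} a_B
    r = dj0 * (a[j0] - aB @ u) / (aB @ v)     # Eq. (13)
    dB = -(dj0 * u + r * v)                   # Eq. (12)
    return dj0, dB, r
```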

4.2. Stepsize selection

The stepsize $\theta$ is chosen to ensure that the updated solution $\bar{\alpha}=\alpha+\theta d$ remains feasible and that the value of the objective function $f$ decreases at $\bar{\alpha}$. Thus, it is defined as follows:

(14) $\theta=\min\{\theta_{j_0},\ \theta_{j_1},\ \theta_f\},$

where each term corresponds, respectively, to:
- the maximal feasible step allowed by the non-basic variable $\alpha_{j_0}$,
- the maximal permissible step before any basic variable violates its bounds,
- the stepsize that minimizes the objective function along the search direction $d$.

To ensure feasibility, the new solution must satisfy the following box constraints:

$\begin{cases}0\le \alpha_{j_0}+\theta d_{j_0}\le C, & j_0\in J_N^{NO},\\ 0\le \alpha_j+\theta d_j\le C, & j\in J_B,\end{cases}$

where $j_0$ denotes a non-basic index and $J_B$ is the set of basic indices.

Calculate $\theta_{j_0}$ and $\theta_{j_1}$:

The maximum feasible step for the non-basic index $j_0$ is computed as:

(15) $\theta_{j_0}=\begin{cases}\alpha_{j_0}, & \text{if } d_{j_0}=-1,\\ C-\alpha_{j_0}, & \text{if } d_{j_0}=1.\end{cases}$

For the basic index $j_1\in J_B$, the corresponding step is:

(16) $\theta_{j_1}=\min_{j\in J_B}\theta_j,\quad \text{where } \theta_j=\begin{cases}\dfrac{C-\alpha_j}{d_j}, & d_j>0,\\[4pt] \dfrac{-\alpha_j}{d_j}, & d_j<0,\\[4pt] +\infty, & d_j=0,\end{cases}\qquad j\in J_B.$

Calculate $\theta_f$:

The step size $\theta_f$ is computed to achieve the maximal reduction in the objective function $f$ when moving along the direction $d$. Using the exact second-order Taylor expansion of $f$ at the point $\alpha$, we obtain:

$g(\theta) = f(\alpha+\theta d)$
$= f(\alpha)+\theta\, d^\top\nabla f(\alpha)+\frac{1}{2}\theta^2 d^\top\nabla^2 f(\alpha)\,d$
$= f(\alpha)+\theta\, d^\top(H\alpha+p)+\frac{1}{2}\theta^2 d^\top H d.$

By substituting the expressions $\nabla f(\alpha)=\Delta-b\,a$ and $Hd=t-r\,a$, and using the feasibility condition $a^\top d=0$, the expression simplifies to:

(17) $g(\theta)=f(\alpha)+\theta\, d^\top\Delta+\frac{1}{2}\theta^2 d^\top t.$

The optimal value $\theta_f$ is obtained by solving:

$\frac{\partial g}{\partial\theta}=d^\top\Delta+\theta\, d^\top t=0.$

Since $\Delta_B=t_B=0$, only the non-basic component $j_0$ contributes, so that

$\frac{\partial g}{\partial\theta}=d_{j_0}\Delta_{j_0}+\theta\, d_{j_0}t_{j_0}=0,$

which gives the final expression:

(18) $\theta_f=\begin{cases}-\dfrac{\Delta_{j_0}}{t_{j_0}}, & \text{if } \Delta_{j_0}t_{j_0}<0,\\[4pt] +\infty, & \text{otherwise}.\end{cases}$
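The three candidate steps of (14)–(18) can be computed as in the sketch below. The function step_size and the returned case label are our illustrative conventions; the paper does not specify how ties between the candidates are broken.

```python
import numpy as np

def step_size(alpha, Delta, t, dB, dj0, JB, j0, C):
    """Compute theta = min{theta_j0, theta_j1, theta_f}, Eqs. (14)-(18),
    together with the case attained (used for the basis update of Sec. 4.3)."""
    # (15): maximal feasible step for the entering variable alpha_{j0}
    theta_j0 = alpha[j0] if dj0 == -1 else C - alpha[j0]

    # (16): maximal feasible step before a basic variable hits a bound
    theta_j1, j1 = np.inf, None
    for idx, j in enumerate(JB):
        if dB[idx] > 0:
            theta_j = (C - alpha[j]) / dB[idx]
        elif dB[idx] < 0:
            theta_j = -alpha[j] / dB[idx]
        else:
            continue
        if theta_j < theta_j1:
            theta_j1, j1 = theta_j, j

    # (18): minimizer of g(theta) along the direction, Eq. (17)
    theta_f = -Delta[j0] / t[j0] if Delta[j0] * t[j0] < 0 else np.inf

    theta = min(theta_j0, theta_j1, theta_f)
    case = 'j0' if theta == theta_j0 else ('j1' if theta == theta_j1 else 'f')
    return theta, case, j1
```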

4.3. Update the working-basis

After determining the new feasible solution $\bar{\alpha}$, the support set $\bar{J}_B$ is updated according to the active constraint that has become binding.

  • Case $\theta=\theta_{j_0}$: The non-basic variable $\alpha_{j_0}$ reaches a bound:

    $\bar{\alpha}_{j_0}=\alpha_{j_0}+\theta_{j_0}d_{j_0}=\begin{cases}0, & \text{if } d_{j_0}=-1,\\ C, & \text{if } d_{j_0}=1.\end{cases}$

    No change is made to the basis:

    $\bar{J}_B \leftarrow J_B,\qquad \bar{J}_N \leftarrow J_N.$
  • Case $\theta=\theta_{j_1}$: A basic variable reaches a bound:

    $\bar{\alpha}_{j_1}=\alpha_{j_1}+\theta_{j_1}d_{j_1}=\begin{cases}0, & \text{if } d_{j_1}<0,\\ C, & \text{if } d_{j_1}>0.\end{cases}$

    Two subcases arise:

    • If $|J_B|>1$, remove $j_1$ from the basis:

      $\bar{J}_B \leftarrow J_B\setminus\{j_1\},\qquad \bar{J}_N \leftarrow J_N\cup\{j_1\}.$
    • If $|J_B|=1$, replace it with $j_0$:

      $\bar{J}_B \leftarrow \{j_0\},\qquad \bar{J}_N \leftarrow J\setminus\{j_0\}.$
  • Case $\theta=\theta_f$: The non-basic index $j_0$ becomes basic:

    $\bar{J}_B \leftarrow J_B\cup\{j_0\},\qquad \bar{J}_N \leftarrow J_N\setminus\{j_0\}.$

After this step, the WSA-SVR algorithm proceeds to the next iteration until convergence to the optimal solution.
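The three cases above amount to a few set operations on $J_B$ and $J_N$; the sketch below mirrors them with plain Python lists. The function name update_working_basis is ours.

```python
def update_working_basis(JB, JN, case, j0, j1):
    """Update the working basis after the step, following the cases of
    Section 4.3. Illustrative sketch; JB and JN are lists of indices."""
    if case == 'j0':
        pass                          # the entering variable hit a bound: no change
    elif case == 'j1':
        if len(JB) > 1:
            JB.remove(j1)             # drop the blocking basic variable
            JN.append(j1)
        else:
            JB[:] = [j0]              # keep the basis nonempty: swap j1 for j0
            JN.remove(j0)
            JN.append(j1)
    else:                             # case == 'f': j0 enters the basis
        JB.append(j0)
        JN.remove(j0)
    return JB, JN
```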

4.4. Proposed WSA-SVR algorithm

The main steps of the proposed WSA–SVR algorithm are summarized in the following pseudocode.

Input: Training set $\{(x_i,y_i)\}_{i=1}^{n}$, kernel $K(\cdot,\cdot)$, parameters $C,\epsilon,\gamma$
Output: Dual variables $\alpha$ and bias $b$.
Step 0: Initialization
Start with the first SFS $\{\alpha,J_B\}$, such that $J_B=\{1\}$ and $\alpha=0$.
Compute the bias $b$ and the reduced costs vector $\Delta=(\Delta_B,\Delta_N)$ using (6).
Step 1: Check the optimality of the SFS $\{\alpha,J_B\}$
if optimality conditions (5) are satisfied then
   Stop: $\alpha$ is an optimal solution to problem (3).
else
   Select the entering index $j_0$ by rule (10).
end if
Step 2: Search direction computation
Compute $d=(d_B,d_N)$ using (11) and (12), and $r$ by (13).
Compute the reduced-costs direction $t=(t_B,t_N)$ using (9).
Step 3: Stepsize computation
Set $\theta=\min\{\theta_{j_0},\theta_{j_1},\theta_f\}$, where the stepsizes are calculated by (15)–(18).
Compute:
$\alpha\leftarrow\alpha+\theta d,\qquad \Delta\leftarrow\Delta+\theta t,\qquad b\leftarrow b+\theta r.$
Step 4: Working basis update
if $\theta=\theta_{j_1}$, $j_1\in J_B$ then
   if $|J_B|>1$ then
      $J_B\leftarrow J_B\setminus\{j_1\}$, $J_N\leftarrow J_N\cup\{j_1\}$.
   else
      $J_B\leftarrow\{j_0\}$, $J_N\leftarrow J\setminus\{j_0\}$.
   end if
else if $\theta=\theta_f$ then
   $J_B\leftarrow J_B\cup\{j_0\}$, $J_N\leftarrow J_N\setminus\{j_0\}$.
end if
Go to Step 1.
Algorithm 1. WSA–SVR: Working Set Algorithm for the $\epsilon$-SVR problem
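For completeness, a compact driver for Algorithm 1 might look as follows. It is an illustrative sketch, not the authors' MATLAB code, and it assumes the helper fragments given in the previous sections (build_dual_qp, initialize_wsa_svr, select_entering_index, search_direction, step_size, update_working_basis, all hypothetical names of ours) are in scope.

```python
import numpy as np

def wsa_svr_train(K, y, C, eps, max_iter=100000):
    """Illustrative driver for Algorithm 1 (WSA-SVR), assuming the helper
    sketches from Sections 2-4 are available in scope."""
    H, p, a = build_dual_qp(K, y, eps)
    alpha, JB, b, Delta = initialize_wsa_svr(p, a)            # Step 0, Eq. (6)
    JN = [j for j in range(len(p)) if j not in JB]

    for _ in range(max_iter):
        j0 = select_entering_index(alpha, Delta, JN, C)        # Step 1, rule (10)
        if j0 is None:
            break                                              # conditions (5) hold
        dj0, dB, r = search_direction(H, a, Delta, JB, j0)     # Step 2, Eqs. (11)-(13)
        d = np.zeros_like(alpha)
        d[JB], d[j0] = dB, dj0
        t = H @ d + r * a                                      # Eq. (9); t_B = 0
        theta, case, j1 = step_size(alpha, Delta, t, dB, dj0, JB, j0, C)  # Step 3
        alpha += theta * d                                     # Eq. (7)
        Delta += theta * t                                     # Eq. (8)
        b += theta * r
        JB, JN = update_working_basis(JB, JN, case, j0, j1)    # Step 4
    return alpha, b
```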

5. Convergence Analysis and Computational Complexity

Below, we present a proof establishing the finite convergence of WSA–SVR and analyze its computational complexity. The argument follows the same logic as the analysis given for primal simplex-type schemes (see, in particular,  [4] and  [10]).
We begin with the following proposition, which demonstrates that the direction $d\in\mathbb{R}^{2n}$ is a feasible descent direction for the dual formulation (3).

Proposition 5.

At an iteration $k$ of the WSA–SVR method, let $d^k=(d_B^k,d_N^k)$ be the search direction, computed by relations (11) and (12), respectively, and let $\theta^k$ be the step size computed by formula (14). Then, the following statements hold:

  1) The $2n$-vector $d^k$ is a feasible descent direction at $\alpha^k$.

  2) For any other solution $\alpha^{k+1}=\alpha^k+\tau d^k$, with $\tau\in[0,\theta^k]$, we have:

    (19) $f(\alpha^{k+1})\le f(\alpha^k).$
Proof.

a) For the first part, it is clear that $d^k$ is a feasible direction. Indeed, $d^k$ is constructed such that $\alpha^k$ and $\alpha^{k+1}$ remain feasible, i.e., $a^\top\alpha^k=0$ and $a^\top\alpha^{k+1}=0$, which implies $a^\top d^k=0$. Moreover, by construction of the optimal step size $\theta^k$ via Eq. (14), $\alpha^{k+1}$ satisfies the box constraints in (3). Hence, $d^k$ is a feasible direction.

Using relations (11)–(12) and the definition of the reduced costs vector $\Delta^k$ and its direction $t^k$ at step $k$, we obtain

$\nabla f(\alpha^k)^\top d^k=(H\alpha^k+p)^\top d^k=(\Delta^k-b^k a)^\top d^k=(\Delta^k)^\top d^k=(\Delta_B^k)^\top d_B^k+(\Delta_N^k)^\top d_N^k$
$=d_{j_0}^k\Delta_{j_0}^k=-|\Delta_{j_0}^k|<0.$

This strict inequality holds since, by the choice of the entering index in the working basis, we have $\Delta_{j_0}^k\neq 0$, $j_0\in J_N^{NO,k}$.

b) For the second part, using Taylor's expansion at $\alpha^k$, we have

$f(\alpha^{k+1})=f(\alpha^k+\tau d^k)=f(\alpha^k)+\tau\nabla f(\alpha^k)^\top d^k+\frac{1}{2}\tau^2 (d^k)^\top H d^k.$

Since $d^k$ is a feasible descent direction, then for any $\tau\in[0,\theta^k]$, with $\theta^k$ computed by (14), it follows that

$f(\alpha^{k+1})\le f(\alpha^k).$

Moreover, the strict inequality holds when the SFS {αk,JBk} is non-degenerate for the QP (3). ∎

Now, we establish the finite termination of the proposed WSA–SVR algorithm in the following theorem.

Theorem 6.

The proposed WSA–SVR algorithm generates a sequence of support feasible solutions to problem (3) and terminates in a finite number of iterations at an optimal solution.

Proof.

The dual SVR problem (3) is bounded below, so the sequence of feasible iterates $\{\alpha^k\}_{k\ge 0}$ generated by WSA–SVR remains bounded. Each update has the form

$\alpha^{k+1}=\alpha^k+\theta^k d^k,$

where $d^k$ is a feasible search direction and $\theta^k\ge 0$ is the maximal step length defined previously.

By Proposition 5, if the current basic feasible solution is non-degenerate, then $\theta^k>0$ and the objective function decreases strictly:

$f(\alpha^{k+1})<f(\alpha^k).$

In the degenerate case, $\theta^k=0$, so the objective value does not decrease, but the working basis is updated (see Section 4.3). To avoid cycling, one may invoke a standard anti-cycling rule such as Bland's rule (see [10, 16]) or rely on simplex-type convergence arguments as in [4], which guarantee progress in the sequence of working sets.

Since the number of possible working bases is finite, the algorithm cannot generate an infinite sequence of distinct supports without either decreasing the objective function or revisiting a previous working basis. Consequently, WSA–SVR produces a sequence of iterates that is strictly decreasing in the objective value, except possibly for finitely many degenerate steps. Termination therefore occurs in a finite number of iterations, precisely when no entering variable can be found, i.e., when the KKT conditions of (3) are satisfied and an optimal solution is reached. ∎

In this part, we analyze the computational complexity of our WSA–SVR algorithm. Let $n$ denote the number of samples and $q=|J_B^k|$ the size of the working basis at a typical iteration $k$. The dominant per-iteration cost arises from computing the search directions $d_B^k$ and $t_N^k$, defined in (12) and (9), respectively.

Computing $d_B^k$ requires solving a linear system with the basis matrix $H_B^k=H(J_B^k,J_B^k)\in\mathbb{R}^{q\times q}$. This is typically performed using the Cholesky factorization, which incurs a cost of $\mathcal{O}(q^3)$ if recomputed from scratch. However, when employing rank-one updates of the Cholesky factors, this cost can be reduced to $\mathcal{O}(q^2)$, as shown in [4]. Step-size computation and primal–dual updates involve comparatively negligible costs of $\mathcal{O}(q)$ and $\mathcal{O}(2n)$, respectively. In contrast, evaluating the non-basic reduced costs component $t_N^k$ requires matrix–vector multiplications with a complexity of $\mathcal{O}((2n-q)q)$.

Hence, the overall per-iteration complexity is bounded by

$\mathcal{O}\left(q^2+(2n-q)q\right).$

This demonstrates that the WSA–SVR algorithm achieves polynomial-time complexity with respect to the size of the training set, in addition to ensuring finite convergence.
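As an illustration of how the $\mathcal{O}(q^2)$ update can be realized when the basis grows by one index, the bordering update below extends a Cholesky factor without refactorizing. It is a sketch of one standard variant, not necessarily the exact update used in [4], and it assumes the bordered matrix remains positive definite.

```python
import numpy as np
from scipy.linalg import solve_triangular

def extend_cholesky(L, HB_j0, h00):
    """Grow the lower-triangular Cholesky factor L of H_B when the entering
    index j0 is appended to the working basis, in O(q^2) instead of O(q^3).
    HB_j0 = H(J_B, j0) is the new off-diagonal column, h00 = H(j0, j0)."""
    w = solve_triangular(L, HB_j0, lower=True)   # solve L w = H(J_B, j0)
    delta = np.sqrt(h00 - w @ w)                 # new diagonal entry
    q = L.shape[0]
    L_new = np.zeros((q + 1, q + 1))
    L_new[:q, :q] = L
    L_new[q, :q] = w
    L_new[q, q] = delta
    return L_new
```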

6. Experimental Results

In this section, the proposed WSA-SVR method is empirically evaluated and compared against two widely used SVR solvers: LIBSVM and SMO-MAT. LIBSVM [5] is a standard library for support vector machines, extensively optimized for both classification and regression tasks; it is coded in C++ and widely regarded as the reference implementation for SVR. SMO-MAT refers to MATLAB's built-in SVR solver, implemented through the fitrsvm function, which relies on a variant of Platt's SMO algorithm. The fitrsvm interface offers efficient routines for both linear and nonlinear kernels and is fully integrated within MATLAB's machine learning toolbox. The experiments were conducted with both RBF and linear kernels on five benchmark datasets: Mpg, Housing, Mg, Space_ga, and Abalone.

All experiments were run on a personal computer, without GPU acceleration. The implementation of WSA-SVR, along with LIBSVM (via the MATLAB interface) and SMO-MAT, was carried out in MATLAB R2025a.

Prior to training, all datasets were standardized to zero mean and unit variance. This preprocessing step ensures that each feature contributes equally to the training process, thereby improving both convergence and numerical stability of the SVR algorithms.

To ensure a fair comparison, a five-fold cross-validation technique was used to select the hyper-parameters for each dataset. For models using the RBF kernel, a grid search was conducted over $C\in\{2^{-3},\ldots,2^{15}\}$ and $\gamma\in\{2^{-15},\ldots,2^{3}\}$. For the linear kernel, $C$ was selected from the range $\{2^{-4},\ldots,2^{10}\}$.

The RBF (Gaussian) kernel is defined as:

(20) $K(\mathbf{x}_i,\mathbf{x}_j)=\exp\left(-\gamma\|\mathbf{x}_i-\mathbf{x}_j\|^2\right),$

where γ>0 is the kernel width parameter. The linear kernel corresponds to the inner product:

(21) $K(\mathbf{x}_i,\mathbf{x}_j)=\mathbf{x}_i^\top\mathbf{x}_j.$
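Both kernels are straightforward to vectorize; the following NumPy fragment (function names are ours) computes the Gram matrices of (20) and (21) between two sets of samples.

```python
import numpy as np

def rbf_kernel(X, Z, gamma):
    """Gram matrix of the RBF kernel (20): K_ij = exp(-gamma * ||x_i - z_j||^2)."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :] - 2 * X @ Z.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def linear_kernel(X, Z):
    """Gram matrix of the linear kernel (21): K_ij = x_i' z_j."""
    return X @ Z.T
```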

The optimal hyper-parameters obtained for all datasets through five-fold cross-validation are reported in Table 1 for both kernels. These configurations were subsequently used for training and evaluation across all SVR methods to guarantee a consistent and unbiased comparison between solvers.

Dataset    Inputs  Samples  RBF Kernel (C / γ)  Linear Kernel (C)
Mpg           7      392      64 / 0.125              16
Housing      13      506      64 / 0.0625              4
Mg            6     1385       1 / 0.5                 4
Space_ga      6     3107     128 / 0.125              16
Abalone       8     4177      16 / 0.0625             64
Table 1. Description of the datasets, their characteristics, and the corresponding optimal hyper-parameters for both kernel types.

The performance of the models was evaluated using several criteria: the number of iterations until convergence, training time (in seconds), the number of support vectors (NSV), and two regression accuracy metrics: the mean squared error (MSE) and the coefficient of determination R2.

The MSE is defined as:

(22) $\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2,$

where $y_i$ is the true output, $\hat{y}_i$ the predicted output, and $n$ the number of samples.

The coefficient of determination R2 is given by:

(23) $R^2=1-\dfrac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2},$

where y¯ is the mean of the observed data.

(24) $\bar{y}=\frac{1}{n}\sum_{i=1}^{n}y_i.$
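The two metrics translate directly into code; the short fragment below (the names mse and r2_score are ours) implements (22)–(24).

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error, Eq. (22)."""
    return np.mean((y - y_hat) ** 2)

def r2_score(y, y_hat):
    """Coefficient of determination, Eqs. (23)-(24)."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot
```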

Table 2 and Table 3 present numerical results that show the effectiveness and robustness of the proposed WSA–SVR algorithm across diverse benchmark datasets and kernel types. WSA–SVR shows good convergence behavior and attains predictive accuracy equal to, and sometimes greater than, that of state-of-the-art solvers including SMO-MAT and LIBSVM.

Dataset Method Iterations Time(s) NSV MSE R2
Mpg WSA-SVR 586 0.0966 375 4.4902 0.9261
SMO-MAT 3305 0.0229 377 4.4956 0.9260
LIBSVM 6120 0.0378 375 4.4903 0.9263
Housing WSA-SVR 887 0.2241 480 4.7924 0.9432
SMO-MAT 4287 0.0384 486 4.7909 0.9432
LIBSVM 7328 0.0658 480 4.7924 0.9445
Mg WSA-SVR 841 0.3932 474 0.0118 0.7796
SMO-MAT 1960 0.0445 493 0.0118 0.7795
LIBSVM 2258 0.3581 479 0.0118 0.7806
Space-ga WSA-SVR 1529 3.3463 831 0.0089 0.7877
SMO-MAT 4897 0.2308 871 0.0089 0.7866
LIBSVM 4388 1.7257 842 0.0089 0.7881
Abalone WSA-SVR 3351 3.0348 3914 4.4024 0.5864
SMO-MAT 2391 1.1571 3899 4.4024 0.5863
LIBSVM 3319 3.2790 3917 4.4024 0.5965
Table 2. Performance comparison between WSA-SVR, LIBSVM, and SMO-MAT using the RBF kernel for ϵ=0.1.
Dataset Method Iterations Time(s) NSV MSE R2
Mpg WSA-SVR 1238 0.2642 389 11.5610 0.8097
ϵ=0.01 SMO-MAT 4146 0.3912 392 11.4183 0.8121
LIBSVM 20920 0.0388 389 11.5606 0.8130
Housing WSA-SVR 1136 0.1040 506 24.6955 0.7075
ϵ=0.01 SMO-MAT 2495 0.1718 506 24.6734 0.7077
LIBSVM 10449 0.0224 506 24.6852 0.7174
Mg WSA-SVR 9907 1.2648 1371 0.0219 0.5732
ϵ=0.001 SMO-MAT 54324 1.5300 1385 0.0218 0.5740
LIBSVM 235970 0.4469 1374 0.0219 0.5767
Space-ga WSA-SVR 15235 4.2888 3084 0.0164 0.5818
ϵ=0.001 SMO-MAT 719847 47.2932 3107 0.0558 0.5820
LIBSVM 2604864 9.6216 3094 0.0164 0.5823
Abalone WSA-SVR 34214 13.2915 4155 5.0608 0.5149
ϵ=0.01 SMO-MAT 276643 17.1626 4177 5.2831 0.4917
LIBSVM 2785316 7.8118 4158 5.0485 0.5320
Table 3. Performance comparison of WSA-SVR, SMO-MAT, and LIBSVM using the linear kernel.

In terms of convergence, WSA–SVR generally required fewer iterations than its competitors to compute the optimal solution. For example, on the Space-ga dataset with an RBF kernel and $\epsilon=0.1$, WSA–SVR converged in 1529 iterations (see Table 2), whereas SMO-MAT and LIBSVM needed 4897 and 4388 iterations, respectively. Similar trends were observed with the linear kernel, where WSA–SVR converged consistently and stably. For instance, on the Mg dataset with the linear kernel and $\epsilon=0.01$, WSA–SVR achieved similar prediction accuracy to the other methods while requiring fewer iterations (see Table 3).

The principal reason why WSA-SVR typically converges in fewer iterations than SMO-type methods lies in the nature of the subproblems solved at each step. In the case of an SVR problem, SMO updates two pairs of dual variables per iteration, which amounts to solving analytically a small QP of dimension at most four, without the need for an external solver. In contrast, our approach solves at each iteration a QP whose dimension is equal to the size of the working basis $J_B$.

Concerning execution time, WSA–SVR demonstrates clear efficiency gains on most datasets when using the linear kernel, consistently outperforming or matching SMO-MAT and LIBSVM. This advantage is due to its strong convergence properties combined with the relatively low computational cost of kernel operations in the linear case. In contrast, for the RBF kernel, WSA–SVR’s runtime is sometimes higher despite requiring fewer iterations. For example, on the Space-ga dataset, WSA–SVR’s runtime reached 3.3463 s, compared to 0.2308 s for SMO-MAT and 1.7257 s for LIBSVM. This can be explained not only by the higher per–iteration cost when the size of the working-basis is large, but also by the way the methods are implemented: WSA–SVR is currently coded in Matlab (a high-level language), while SMO-MAT and LIBSVM are implemented in optimized C++ (a low-level language).

Concerning predictive performance, WSA–SVR achieved results comparable to those of the two SMO-based solvers in all experiments. This can be explained by the fact that the compared algorithms are equally accurate and yield a similar number of support vectors (NSV). For instance, on the Mg dataset with $\epsilon=0.1$ and an RBF kernel, WSA–SVR obtained an $R^2$ score of 0.7796 and an MSE of 0.0118 (see Table 2), which are nearly identical to LIBSVM's values of $R^2=0.7806$ and MSE = 0.0118. These findings indicate that WSA–SVR provides a favorable balance between accuracy and computational efficiency for both RBF and linear kernels.

Another significant benefit is the compact models that WSA–SVR can generate. In many regression applications, the algorithm frequently selected a comparable or slightly smaller number of support vectors than LIBSVM and SMO-MAT, resulting in models that are simpler to interpret and more practical to deploy.

7. Conclusion

In this paper, we presented WSA–SVR, a new algorithm for solving the dual form of the $\epsilon$-SVR problem. Our method extends the primal simplex approach [4] to SVR by taking into account the specific structure of the problem, especially the paired variables and constraints. The WSA–SVR algorithm proceeds iteratively by generating a sequence of feasible solutions that converges to an optimal solution. Unlike existing methods, it does not depend on reduced Hessian matrices and can directly handle positive semidefinite kernels. We also proved that the algorithm converges in a finite number of steps and analyzed its computational complexity.

Numerical experiments on several benchmark datasets demonstrated both the effectiveness and robustness of the proposed algorithm. From a predictive perspective, WSA–SVR achieves accuracy comparable to state-of-the-art solvers, such as SMO-based methods [18, 9] and LIBSVM [5]. From an optimization viewpoint, WSA–SVR typically converges in fewer iterations than SMO-type algorithms. However, as highlighted in Section 5, the per-iteration computational cost depends strongly on the kernel structure: for dense RBF kernels, solving linear systems with a large basis matrix HB increases runtime, whereas for linear kernels, where matrix–vector operations are inexpensive and the basis size is bounded by the feature dimension, WSA–SVR exhibits superior performance in both convergence speed and CPU time.

Overall, WSA–SVR achieves an effective trade-off between predictive accuracy, convergence guarantees, and computational efficiency, offering a principled alternative to existing SMO-type algorithms. Future work will focus on developing optimized implementations (e.g., leveraging sparse linear algebra and low-rank kernel approximations) and extending the approach to large-scale learning tasks and other kernel-based models.

Acknowledgements.

We would like to thank the editor and reviewers for their valuable and constructive comments, which helped us to improve this paper.

References