Yet Another Math Programming Consultant

Monday, April 15, 2024

LP in statistics: The Dantzig Selector

Lots of statistical procedures are based on an underlying optimization problem. Least squares regression and maximum likelihood estimation are two obvious examples. In a few cases, linear programming is used. Some examples are:

Least absolute deviation (LAD) regression [1]
Chebyshev regression [2]
Quantile regression [3]

Here is another regression example that uses linear programming.

We want to estimate a sparse vector \(\color{darkred}\beta\) from the linear model \(\color{darkblue}y=\color{darkblue}X\color{darkred}\beta+\color{darkred}e\) where the number of observations \(n\) (rows in \(\color{darkblue}X\)) is (much) smaller than the number of coefficients \(p\) to estimate (columns in \(\color{darkblue}X\)) [4]: \(p \gg n\). This is an alternative to the well-known Lasso method [5].
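
For reference, the Dantzig Selector is usually stated in the literature as the problem below. The tuning parameter \(\color{darkblue}\delta\) and the split \(\color{darkred}\beta=\color{darkred}\beta^+-\color{darkred}\beta^-\) are not part of this excerpt; they are introduced here only to sketch how the problem becomes an LP.

\[\begin{align} \min\> & \|\color{darkred}\beta\|_1 \\ & \|\color{darkblue}X^T(\color{darkblue}y-\color{darkblue}X\color{darkred}\beta)\|_\infty \le \color{darkblue}\delta \end{align}\]

Writing the 1-norm with nonnegative parts and the \(\infty\)-norm as a pair of inequalities per column \(k\) gives a linear program:

\[\begin{align} \min\> & \sum_j \left(\color{darkred}\beta^+_j + \color{darkred}\beta^-_j\right) \\ & -\color{darkblue}\delta \le \sum_i \color{darkblue}X_{i,k}\Bigl(\color{darkblue}y_i - \sum_j \color{darkblue}X_{i,j}\bigl(\color{darkred}\beta^+_j-\color{darkred}\beta^-_j\bigr)\Bigr) \le \color{darkblue}\delta && \forall k \\ & \color{darkred}\beta^+_j, \color{darkred}\beta^-_j \ge 0 \end{align}\]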

Instead of integers use binaries

In [1], a small model is proposed:

High-Level Model
\[\begin{align} \min\> & \sum_i \| \color{darkblue}a_i\cdot \color{darkred}x_i\| \\ & \max_i \|\color{darkred}x_i\| = 1 \\ & \color{darkred}x_i \in \{-1,0,1\} \end{align}\]

Can we formulate this as a straight MIP?
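
One possible answer, taking the high-level model exactly as written above and reading \(\|\cdot\|\) as an absolute value: replace each integer variable by two binary variables. The names \(\color{darkred}p_i\) and \(\color{darkred}n_i\) are my own notation for this sketch and are not taken from [1].

\[\begin{align} \min\> & \sum_i |\color{darkblue}a_i| \cdot (\color{darkred}p_i + \color{darkred}n_i) \\ & \color{darkred}x_i = \color{darkred}p_i - \color{darkred}n_i \\ & \color{darkred}p_i + \color{darkred}n_i \le 1 \\ & \sum_i (\color{darkred}p_i + \color{darkred}n_i) \ge 1 \\ & \color{darkred}p_i, \color{darkred}n_i \in \{0,1\} \end{align}\]

Because \(\color{darkred}p_i + \color{darkred}n_i \le 1\), we have \(|\color{darkred}x_i| = \color{darkred}p_i + \color{darkred}n_i\), so the condition \(\max_i |\color{darkred}x_i| = 1\) reduces to requiring at least one nonzero \(\color{darkred}x_i\).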

Water

Fascinating map with annual water throughput.
This is related to water availability for irrigation. An important topic.
The Rio Grande is not so grand here.
It cannot have been completely trivial to produce this map.
See:
Peter Gleick and Matthew Heberger, American Rivers: A Graphic, https://pacinst.org/american-rivers-a-graphic/

Saturday, February 10, 2024

Math vs Programming

A programmer writes about this blog:

(It is old, but I just came across this).

In my previous post, I just argued the other way around. To make sure: I don't hate programmers.

BTW, in quite a few programming languages for loops are very slow, and need to be replaced by something like sum(). Examples: Python, R, SQL.
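
A minimal Python illustration of the two styles, using made-up data. Both compute the same total; the snippet makes no timing claims, it only shows what "replace the loop by something like sum()" looks like.

```python
# Made-up data for illustration: the integers 1..1000.
values = list(range(1, 1001))

# Style 1: explicit for loop that accumulates a running total.
total_loop = 0
for v in values:
    total_loop += v

# Style 2: a single call to the built-in aggregate.
total_builtin = sum(values)

print(total_loop, total_builtin)
```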

Thursday, February 8, 2024

Small non-convex MINLP: Pyomo vs GAMS

In [1], the following Pyomo model (Python fragment) is presented:

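# Assumed setup, not part of the fragment in [1]
from pyomo.environ import (ConcreteModel, Var, Objective, Constraint,
                           NonNegativeIntegers, minimize, value)

model = ConcreteModel()

model.x = Var(name="Number of batches", domain=NonNegativeIntegers, initialize=10)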
model.a = Var(name="Batch Size", domain=NonNegativeIntegers, bounds=(5,20))

# Objective function
def total_production(model):
    return model.x * model.a
model.total_production = Objective(rule=total_production, sense=minimize)

# Constraints
# Minimum production of the two output products
def first_material_constraint_rule(model):
    return sum(0.2 * model.a * i for i in range(1, value(model.x)+1)) >= 70
model.first_material_constraint = Constraint(rule=first_material_constraint_rule)

def second_material_constraint_rule(model):
    return sum(0.8 * model.a * i for i in range(1, value(model.x)+1)) >= 90
model.second_material_constraint = Constraint(rule=second_material_constraint_rule)

# At least one production run
def min_production_rule(model):
    return model.x >= 1
model.min_production = Constraint(rule=min_production_rule)
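
A remark on the two material constraints: a Pyomo constraint rule runs when the constraint is constructed, so `value(model.x)` returns the initial value 10 at that point and the `range` does not depend on the decision variable. If the sum is meant literally as \(\sum_{i=1}^{x} i\), it has the closed form \(x(x+1)/2\). The sketch below is not from [1]. It assumes this literal reading and simply enumerates all combinations in plain Python; the cap of 40 batches is an assumption that is safe here because the batch size is at least 5, so the objective is at least \(5x\).

```python
def solve_by_enumeration(max_batches=40):
    """Brute-force the small model under the closed-form reading of the sums.

    Returns (objective, x, a) for the best feasible combination, or None.
    """
    best = None
    for x in range(1, max_batches + 1):       # at least one production run
        tri = x * (x + 1) // 2                # sum of i for i = 1..x
        for a in range(5, 21):                # batch size bounds (5, 20)
            # 0.2*a*tri >= 70 and 0.8*a*tri >= 90, scaled by 10 to stay in integers
            if 2 * a * tri >= 700 and 8 * a * tri >= 900:
                candidate = (x * a, x, a)
                if best is None or candidate < best:
                    best = candidate
    return best


print(solve_by_enumeration())
```

Under these assumptions the enumeration returns 12 batches of size 5 with objective value 60.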

One nonzero in set of free variables

In [1] the following question is posed:

I have free variables \(\color{darkred}x_i\). How can I impose the constraint that at least one of the variables is nonzero: \(\color{darkred}x_i\ne 0\).
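
A common way to model this, sketched here under two assumptions that are not in the question: the variables can be bounded as \(-\color{darkblue}M \le \color{darkred}x_i \le \color{darkblue}M\), and "nonzero" is interpreted as \(|\color{darkred}x_i| \ge \color{darkblue}\varepsilon\) for a small tolerance \(\color{darkblue}\varepsilon > 0\). The binary variables \(\color{darkred}\delta^+_i\) and \(\color{darkred}\delta^-_i\) are my own notation.

\[\begin{align} & \color{darkred}x_i \ge \color{darkblue}\varepsilon \cdot \color{darkred}\delta^+_i - \color{darkblue}M (1-\color{darkred}\delta^+_i) \\ & \color{darkred}x_i \le -\color{darkblue}\varepsilon \cdot \color{darkred}\delta^-_i + \color{darkblue}M (1-\color{darkred}\delta^-_i) \\ & \sum_i (\color{darkred}\delta^+_i + \color{darkred}\delta^-_i) \ge 1 \\ & \color{darkred}\delta^+_i, \color{darkred}\delta^-_i \in \{0,1\} \end{align}\]

If \(\color{darkred}\delta^+_i=1\) then \(\color{darkred}x_i \ge \color{darkblue}\varepsilon\), and if \(\color{darkred}\delta^-_i=1\) then \(\color{darkred}x_i \le -\color{darkblue}\varepsilon\). When a binary variable is zero, the corresponding inequality reduces to the bound on \(\color{darkred}x_i\).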

Informs Test of Time Award for CONOPT paper

The Test of Time Award for papers published in the INFORMS Journal on Computing in the years 1993–1997 is awarded to
CONOPT: A Large-Scale GRG Code
Arne Stolbjerg Drud
ORSA Journal on Computing 6(2):207–216, 1994

As Arne notes in [1], he is helped a bit by the fact that CONOPT users may want to cite a published paper (and because there is no newer successor paper). Still, this is quite an achievement.

GAMS listing file: missing Unicode support

Newer versions of GAMS allow UTF-8 encoded strings as labels. That is very welcome, as these labels may come from data sources that just use Unicode characters. However, when printing to the listing file, we miss proper Unicode support. At first, I thought, "OK, just a few misaligned tables. No big deal." Here is a constructed example showing this may be a bit more problematic.

String Art

In [1], a greyscale picture is approximated by strings (lines) between points around the image. Here, I will try something similar with a formal optimization model.

Grouping items: a difficult combinatorial problem

In [1], a simple problem is described:

We have \(n\) items (or orders) with a certain width.
We need to combine these items in groups (called patterns) with rather tight limits on the total width. The total width of a pattern (the sum of the widths of the items assigned to this pattern) must be between 335 and 340.
As a result, we may not be able to assign all items. The remaining items cannot be formed into valid patterns.
The objective is to try to place as many items as possible into patterns.
An indication of the size of the problem: \(n \approx 500\).
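
The description above can be written as a small assignment model. This is only a sketch and not necessarily the formulation used in [1]: \(\color{darkblue}w_i\) is the width of item \(i\), \(\color{darkblue}L=335\) and \(\color{darkblue}U=340\) are the pattern limits, and the binary variables \(\color{darkred}x_{i,j}\) (item \(i\) is assigned to pattern \(j\)) and \(\color{darkred}y_j\) (pattern \(j\) is used) are my own notation. The number of candidate patterns \(j\) has to be chosen in advance.

\[\begin{align} \max\> & \sum_{i,j} \color{darkred}x_{i,j} \\ & \sum_j \color{darkred}x_{i,j} \le 1 && \forall i \\ & \color{darkblue}L \cdot \color{darkred}y_j \le \sum_i \color{darkblue}w_i \cdot \color{darkred}x_{i,j} \le \color{darkblue}U \cdot \color{darkred}y_j && \forall j \\ & \color{darkred}x_{i,j}, \color{darkred}y_j \in \{0,1\} \end{align}\]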

Data

Instead of immediately working on a full-blown \(n=500\) problem, I generated a random data set with a very manageable \(n=50\) items. The widths were drawn from a discrete uniform distribution between 30 and 300. The data looks like:

----     15 PARAMETER w  item widths

order1   76.000,    order2  258.000,    order3  179.000,    order4  111.000,    order5  109.000,    order6   90.000
order7  124.000,    order8  262.000,    order9   48.000,    order10 165.000,    order11 300.000,    order12 186.000
order13 298.000,    order14 236.000,    order15  65.000,    order16 203.000,    order17  73.000,    order18  97.000
order19 211.000,    order20 147.000,    order21 127.000,    order22 125.000,    order23  65.000,    order24  70.000
order25 189.000,    order26 255.000,    order27  92.000,    order28 210.000,    order29 240.000,    order30 112.000
order31  59.000,    order32 166.000,    order33  73.000,    order34 266.000,    order35 101.000,    order36 107.000
order37 190.000,    order38 225.000,    order39 200.000,    order40 155.000,    order41 142.000,    order42  61.000
order43 115.000,    order44  42.000,    order45 121.000,    order46  79.000,    order47 204.000,    order48 181.000
order49 238.000,    order50 110.000

I stick to the pattern limits \(\color{darkblue}L=335\) and \(\color{darkblue}U=340\).

We need some estimate of the number of patterns to use. We could just guess. But a better approach is the following. An upper bound for the number of patterns can be established quite easily: \[{\mathit{maxj}} = \left\lfloor \frac{\sum_i \color{darkblue}w_i}{\color{darkblue}L}\right\rfloor\] For our data set this number is:

----     29 PARAMETER maxj                 =       22.000  max number of patterns we can fill

This means we can safely use this number as the number of bins (patterns).
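
The bound holds because every pattern that is used has a total width of at least \(\color{darkblue}L\), so \(j\) filled patterns need a total width of at least \(j \cdot \color{darkblue}L\). A small Python check of the arithmetic, using the 50 widths from the listing above (the variable names are my own):

```python
# Item widths order1..order50, copied from the data listing.
widths = [
    76, 258, 179, 111, 109, 90,
    124, 262, 48, 165, 300, 186,
    298, 236, 65, 203, 73, 97,
    211, 147, 127, 125, 65, 70,
    189, 255, 92, 210, 240, 112,
    59, 166, 73, 266, 101, 107,
    190, 225, 200, 155, 142, 61,
    115, 42, 121, 79, 204, 181,
    238, 110,
]

L = 335  # lower limit on the total width of a pattern

total_width = sum(widths)
maxj = total_width // L  # floor(sum of widths / L)

print(total_width, maxj)
```

The total width is 7558, and 7558/335 is about 22.56, which rounds down to the 22 reported by GAMS.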

Monday, April 15, 2024

LP in statistics: The Dantzig Selector

Friday, April 12, 2024

Instead of integers use binaries

Thursday, March 28, 2024

Water

Saturday, February 10, 2024

Math vs Programming

Thursday, February 8, 2024

Small non-convex MINLP: Pyomo vs GAMS

Tuesday, January 30, 2024

One nonzero in set of free variables

Tuesday, January 16, 2024

Informs Test of Time Award for CONOPT paper

CONOPT: A Large-Scale GRG Code

Monday, January 8, 2024

GAMS listing file: missing Unicode support

Thursday, January 4, 2024

String Art

Sunday, November 19, 2023

Grouping items: a difficult combinatorial problem

Data