How many Alpine packages can you install at once?
It’s annoying to try and run a command only to discover it hasn’t been installed. It’s even more annoying to build an entire container and miss some crucial dependency your app needs. Wouldn’t it be more convenient if your machine already had all of the software you could ever need installed ahead of time?
Thankfully, Linux distributions include a large selection of software in their repositories. However, we can’t just install everything, as a package may conflict with another, meaning they can’t both be installed. Additionally, we have to ensure all of the dependencies of each package are also installed, and that they don’t introduce any conflicts.
This all seems rather complicated, and indeed this problem is NP-Hard. Package managers include a constraint solver to determine what packages should be installed to bring the system to the desired state. We can also use a constraint solver, but instead ask it to maximise the number of installed packages subject to the dependency and conflict constraints.
I’ve chosen to target Alpine Linux for this
experiment, as the format of its repositories is easy to parse, and I enjoy
using it. As Alpine is commonly used as a basis for containers, this also
suggests the terrifying possibility of using this as a base image—very much
the opposite of FROM scratch
.
Parsing APK Repositories
The first step is to load a description of the package repository. Alpine has
two primary repositories: main
and community
. For each of these, we can
download an APKINDEX
file that gives us information about every package in the
repository. The community
repository expects to be used alongside main
, so
we’ll also need to consider the two together.
The format of APKINDEX
is straightforward. Packages are separated by blank
lines, and each line begins with a single character code describing how to
interpret the value on this line. For this experiment, we need to parse the
following properties for each package:
P:
, the package name;V:
, the package version;D:
, the dependencies of the package;p:
, what the package provides; andi:
, what other packages should cause this package to also be installed.
I haven’t explained the last two yet, and the concept of conflicts seems
conspicuously absent. Saying a package conflicts with another is the same as
saying it depends on the other package not being installed, which APK can
represent as !another_package
.
A package may provide any number of names, optionally with corresponding versions. If the package is installed, APK can then treat it as equivalent to these provided packages being installed. Packages can provide a name without a version: this is called a virtual package. Multiple packages providing the same virtual package can be installed at once, but only one package can provide a non-virtual package at a time. This system allows packages to depend on any of a number of semantically equivalent packages being installed.
Finally, a package can ask to be installed if a number of other packages are also installed. This is useful when another package augments an existing one, or connects two packages’ functionality somehow. I didn’t initially include this field, but from testing it seems that APK will always follow it.
Creating Constraints
With the APK repositories parsed, we can turn our attention to formulating a set of constraints to enforce the relationships between the packages. Additionally, we need an objective function: a goal for the constraint solver to maximise, which in our case is the total of number of packages installed.
To understand how APK interprets packages, I found the APKBUILD reference and Apk spec pages very helpful. The most complicated part, and the part I skimped on the most, was implementing version comparisons. APK supports requiring that a dependency is an exact version, or an exact version ignoring the final part of the version describing the package rather than the software it contains. Additionally, it can constrain dependencies based on inequalities: any of less-than-or-equal-to, less-than, greater-than, or greater-than-or-equal-to. Package versions can be in any number of formats, and from what I can see APK has a dedicated algorithm to parse and compare them, including support for things like alpha and beta versions. I took as simple an approach as I could get away with: converting all sets of consecutive digits in the version string to integers, then comparing these with the earliest set of digits having most significance.
For this experiment, I used PuLP, which is a Mixed Integer Linear Programming (MILP) solver. Our constraints take the form of inequalities between a linear combination of variables and an integer. Additionally, we constrain our variables to be integers themselves. Specifically, we can constrain variables to be either 0 or 1, representing a Boolean value, such as whether a package is installed. I chose PuLP as it has a nice Python API and I’ve had good success with it in the past on problems that z3 has struggled with.
We create a binary variable for each package in the repository. I’ll denote
these like ,
which represents whether to install the package firefox
. Note that there is a
separate variable for each version of the package in the repository, but I omit
that here for simplicity. The objective function we are trying to maximise is
simply the sum of all of these variables, which is the total number of packages
installed.
Conflicts
Now we need to express the package constraints such that PuLP can understand
them. The simplest constraints are those for packages that conflict. I find it
easiest to draw the boolean truth table we’re looking for, such as this for two
conflicting packages A
and B
:
Correct? | ||
---|---|---|
0 | 0 | 1 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 0 |
The only disallowed case is if both packages are installed. It is clear the boolean operation we’re looking for is NAND, but we can also express this with the constraint . We make packages conflict in a number of cases:
- with packages of the same name (different versions of a package can’t be installed at the same time);
- with packages of the same name as something this package provides;
- with packages also providing something this package provides, so long as the name provided isn’t a virtual package; and
- with packages this package explicitly conflict with (via a
!another_package
dependency).
Dependencies
Next, we can tackle dependencies. The truth table for a package A
that depends
on package B
is as follows:
Correct? | ||
---|---|---|
0 | 0 | 1 |
0 | 1 | 1 |
1 | 0 | 0 |
1 | 1 | 1 |
Here the boolean operation is that of logical implication:
.
We can formulate this as a constraint
.
This constraint also generalises nicely to when package A
depends on any of a
set of packages :
.
We can only subtract one after the sum, so as long as at least one of the
dependencies is installed, the total cannot be negative.
Install If
Finally, we must handle the i:
field. I was unable to come up with a simple
constraint to express that a package must be installed if all of the packages
listed in the field are satisfied by at least one package, so I turned to a
classic constraint embedding trick: auxiliary variables. We break the constraint
into multiple simpler constraints, with new variables representing intermediate
steps. These variables are auxiliary because we don’t directly care about their
value; they don’t correspond to whether a particular package should be
installed.
Specifically, for each entry in the i:
field, we introduce a variable
representing whether that entry has been satisfied by a package. In essence, we
make each package that could fulfil it depend on this variable. However, we must
add an additional constraint. With our package dependency example above, with
A
depending on B
, it is valid for B
to be installed and A
to not be.
Similarly, it would be possible for this auxiliary variable to be true if none
of its providers are installed. We need to ensure that this auxiliary variable
is zero if no package is installed that fulfils it. We do this by making it
depend on at least one of the packages that fulfils it, just as in the example
with A
and above.
Additionally, we introduce an auxiliary variable representing whether every
entry in the i:
field has been satisfied. We make this variable depend on all
the variables we introduced above. As before, we need an extra constraint to
make sure it is true if all of the sub-variables are true. Representing the
sub-variables as the set and the final
variable as I achieved this with the
constraint
.
If every variable in is true,
must be true to bring the total low enough. To
complete these constraints, we make this final variable depend on the package
being installed.
This was by far the most complicated set of constraints to write and debug. Either there is a simpler way of doing this, or they illustrate that a different solver or an automated way of creating constraints from boolean logic would be more convenient.
Solving
After setting up all of these constraints, we turn our problem over to the
solver and wait. On my laptop, solving them takes around two seconds for main
alone, and two minutes for main
and community
. The choice of solver and
constraints can make a huge difference about how long a problem takes to solve,
so it may be possible to make large improvements to the performance.
One interesting thing I stumbled across was the existence of packages in the
repository that were impossible to install. For example, in the APKINDEX
file
I downloaded, the package aconf-mod-network
depends on network
, but no such
package is provided. Indeed, APK is unable to install this package. This issue
has been fixed on
edge.
Testing the World Files
After the solver finishes, we can emit a world file, describing the exact version of every package we want to install. If we have set up our constraints right to mirror APK, no packages that aren’t listed in this world file should need to be installed.
We can copy our world file to /etc/apk/world
and run apk fix --simulate
to
test it out. This checks that the world isn’t broken without having to download
all of the packages—saving both bandwidth and disk space.
It took a number of iterations before APK was happy with my world files. The
majority of the nuances around virtual packages and providers were not present
in my initial tests, which were also missing the entire concept of the i:
field. APK gives an explanation of which packages conflict, so I would read
their entries in the APKINDEX
file to understand why that happened before
adding the relevant constraints.
Conclusion
Overall, I was able to get APK to commit to the following package counts in Alpine 3.20, compared to the total number of available packages:
main
: 5441/5525 (98.5%)main
+community
: 23644/24165 (97.8%)
I’ve posted my (terrible, ramshackle) code and the final world files I came up
with to this
gist. As
packages update in Alpine, these world files will drift out of date and become
uninstallable, so you may have to regenerate them yourself from the latest
APKINDEX
files.
Replicating all of the behaviour of APK is non-trivial, especially if, like me, you avoid looking at the source code and guess what it’s doing based on the conflicts you’re seeing. It’s absolutely possible I’ve been too harsh with my constraints or bungled some entirely, and that you can install more packages. If you’re able to improve upon these results, or do a similar thing for another operating system, I’d love to hear about it! I only humbly request that you don’t cause unnecessary burden on freely provided infrastructure.