How many Alpine packages can you install at once?

It’s annoying to try and run a command only to discover it hasn’t been installed. It’s even more annoying to build an entire container and miss some crucial dependency your app needs. Wouldn’t it be more convenient if your machine already had all of the software you could ever need installed ahead of time?

Thankfully, Linux distributions include a large selection of software in their repositories. However, we can’t just install everything, as a package may conflict with another, meaning they can’t both be installed. Additionally, we have to ensure all of the dependencies of each package are also installed, and that they don’t introduce any conflicts.

This all seems rather complicated, and indeed this problem is NP-Hard. Package managers include a constraint solver to determine what packages should be installed to bring the system to the desired state. We can also use a constraint solver, but instead ask it to maximise the number of installed packages subject to the dependency and conflict constraints.

I’ve chosen to target Alpine Linux for this experiment, as the format of its repositories is easy to parse, and I enjoy using it. As Alpine is commonly used as a basis for containers, this also suggests the terrifying possibility of using this as a base image—very much the opposite of FROM scratch.

Parsing APK Repositories

The first step is to load a description of the package repository. Alpine has two primary repositories: main and community. For each of these, we can download an APKINDEX file that gives us information about every package in the repository. The community repository expects to be used alongside main, so we’ll also need to consider the two together.

The format of APKINDEX is straightforward. Packages are separated by blank lines, and each line begins with a single character code describing how to interpret the value on this line. For this experiment, we need to parse the following properties for each package:

I haven’t explained the last two yet, and the concept of conflicts seems conspicuously absent. Saying a package conflicts with another is the same as saying it depends on the other package not being installed, which APK can represent as !another_package.

A package may provide any number of names, optionally with corresponding versions. If the package is installed, APK can then treat it as equivalent to these provided packages being installed. Packages can provide a name without a version: this is called a virtual package. Multiple packages providing the same virtual package can be installed at once, but only one package can provide a non-virtual package at a time. This system allows packages to depend on any of a number of semantically equivalent packages being installed.

Finally, a package can ask to be installed if a number of other packages are also installed. This is useful when another package augments an existing one, or connects two packages’ functionality somehow. I didn’t initially include this field, but from testing it seems that APK will always follow it.

Creating Constraints

With the APK repositories parsed, we can turn our attention to formulating a set of constraints to enforce the relationships between the packages. Additionally, we need an objective function: a goal for the constraint solver to maximise, which in our case is the total of number of packages installed.

To understand how APK interprets packages, I found the APKBUILD reference and Apk spec pages very helpful. The most complicated part, and the part I skimped on the most, was implementing version comparisons. APK supports requiring that a dependency is an exact version, or an exact version ignoring the final part of the version describing the package rather than the software it contains. Additionally, it can constrain dependencies based on inequalities: any of less-than-or-equal-to, less-than, greater-than, or greater-than-or-equal-to. Package versions can be in any number of formats, and from what I can see APK has a dedicated algorithm to parse and compare them, including support for things like alpha and beta versions. I took as simple an approach as I could get away with: converting all sets of consecutive digits in the version string to integers, then comparing these with the earliest set of digits having most significance.

For this experiment, I used PuLP, which is a Mixed Integer Linear Programming (MILP) solver. Our constraints take the form of inequalities between a linear combination of variables and an integer. Additionally, we constrain our variables to be integers themselves. Specifically, we can constrain variables to be either 0 or 1, representing a Boolean value, such as whether a package is installed. I chose PuLP as it has a nice Python API and I’ve had good success with it in the past on problems that z3 has struggled with.

We create a binary variable for each package in the repository. I’ll denote these like xfirefox, which represents whether to install the package firefox. Note that there is a separate variable for each version of the package in the repository, but I omit that here for simplicity. The objective function we are trying to maximise is simply the sum of all of these variables, which is the total number of packages installed.

Conflicts

Now we need to express the package constraints such that PuLP can understand them. The simplest constraints are those for packages that conflict. I find it easiest to draw the boolean truth table we’re looking for, such as this for two conflicting packages A and B:

xAxBCorrect?
001
011
101
110

The only disallowed case is if both packages are installed. It is clear the boolean operation we’re looking for is NAND, but we can also express this with the constraint xA+xB1. We make packages conflict in a number of cases:

Dependencies

Next, we can tackle dependencies. The truth table for a package A that depends on package B is as follows:

xAxBCorrect?
001
011
100
111

Here the boolean operation is that of logical implication: xAxB. We can formulate this as a constraint xB-xA0. This constraint also generalises nicely to when package A depends on any of a set of packages 𝐗: a𝐗a-xA0. We can only subtract one after the sum, so as long as at least one of the dependencies is installed, the total cannot be negative.

Install If

Finally, we must handle the i: field. I was unable to come up with a simple constraint to express that a package must be installed if all of the packages listed in the field are satisfied by at least one package, so I turned to a classic constraint embedding trick: auxiliary variables. We break the constraint into multiple simpler constraints, with new variables representing intermediate steps. These variables are auxiliary because we don’t directly care about their value; they don’t correspond to whether a particular package should be installed.

Specifically, for each entry in the i: field, we introduce a variable representing whether that entry has been satisfied by a package. In essence, we make each package that could fulfil it depend on this variable. However, we must add an additional constraint. With our package dependency example above, with A depending on B, it is valid for B to be installed and A to not be. Similarly, it would be possible for this auxiliary variable to be true if none of its providers are installed. We need to ensure that this auxiliary variable is zero if no package is installed that fulfils it. We do this by making it depend on at least one of the packages that fulfils it, just as in the example with A and 𝐗 above.

Additionally, we introduce an auxiliary variable representing whether every entry in the i: field has been satisfied. We make this variable depend on all the variables we introduced above. As before, we need an extra constraint to make sure it is true if all of the sub-variables are true. Representing the sub-variables as the set 𝐗 and the final variable as y I achieved this with the constraint a𝐗a-y|𝐱|1. If every variable in 𝐗 is true, y must be true to bring the total low enough. To complete these constraints, we make this final variable depend on the package being installed.

This was by far the most complicated set of constraints to write and debug. Either there is a simpler way of doing this, or they illustrate that a different solver or an automated way of creating constraints from boolean logic would be more convenient.

Solving

After setting up all of these constraints, we turn our problem over to the solver and wait. On my laptop, solving them takes around two seconds for main alone, and two minutes for main and community. The choice of solver and constraints can make a huge difference about how long a problem takes to solve, so it may be possible to make large improvements to the performance.

One interesting thing I stumbled across was the existence of packages in the repository that were impossible to install. For example, in the APKINDEX file I downloaded, the package aconf-mod-network depends on network, but no such package is provided. Indeed, APK is unable to install this package. This issue has been fixed on edge.

Testing the World Files

After the solver finishes, we can emit a world file, describing the exact version of every package we want to install. If we have set up our constraints right to mirror APK, no packages that aren’t listed in this world file should need to be installed.

We can copy our world file to /etc/apk/world and run apk fix --simulate to test it out. This checks that the world isn’t broken without having to download all of the packages—saving both bandwidth and disk space.

It took a number of iterations before APK was happy with my world files. The majority of the nuances around virtual packages and providers were not present in my initial tests, which were also missing the entire concept of the i: field. APK gives an explanation of which packages conflict, so I would read their entries in the APKINDEX file to understand why that happened before adding the relevant constraints.

Conclusion

Overall, I was able to get APK to commit to the following package counts in Alpine 3.20, compared to the total number of available packages:

I’ve posted my (terrible, ramshackle) code and the final world files I came up with to this gist. As packages update in Alpine, these world files will drift out of date and become uninstallable, so you may have to regenerate them yourself from the latest APKINDEX files.

Replicating all of the behaviour of APK is non-trivial, especially if, like me, you avoid looking at the source code and guess what it’s doing based on the conflicts you’re seeing. It’s absolutely possible I’ve been too harsh with my constraints or bungled some entirely, and that you can install more packages. If you’re able to improve upon these results, or do a similar thing for another operating system, I’d love to hear about it! I only humbly request that you don’t cause unnecessary burden on freely provided infrastructure.