External Dependencies

This chapter is based on a PR for pypi2nix. Go there if you want to learn about the initial implementation of external dependency detection.

Goal

The goal is to have a system that helps the user deal with external dependencies. Currently the user has to know by heart (or find out by trial and error) that e.g. lxml needs libxml2 and libxslt. We want to automate that for the user, at least for the most common packages.

The idea is a system that takes your initial requirements and returns a list of the necessary external dependencies. This will have synergy with implementing automatic setup requirement detection.

Mechanism

Every pypi2nix invocation has an associated set of requirements. This is usually the set of requirements that the user has for their project. To find out which external dependencies are necessary to build the requested packages we need to solve two problems:

  1. Find out all the dependencies of the packages requested by the user without building them.
  2. For each of these dependencies, find out whether it requires external dependencies, and which ones, without building any of the packages.

We have to know all required external dependencies in advance, since restarting the whole build just because a dependency was detected is unacceptable. For some users with slower or older hardware even a single build might take more than 10 minutes for larger package sets. If the build had to restart several times, it would render pypi2nix unusable for those users.

That means that we have to have a place where dependency information can be collected and used by pypi2nix. The information about the dependencies is basically a directed acyclic graph since this implementation will not support circular dependencies (for now).
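
The lookup described above can be sketched as a plain graph traversal. This is only an illustration, assuming the collected information is available as two mappings; the names `python_deps`, `external_deps`, `collect_external_dependencies` and the package `myapp` are hypothetical and not part of pypi2nix. The sample data restates the lxml example from the Goal section.

```python
from typing import Dict, List, Set

# Hypothetical data: Python-level edges and external (non-Python) requirements.
python_deps: Dict[str, List[str]] = {
    "myapp": ["lxml"],
    "lxml": [],
}
external_deps: Dict[str, List[str]] = {
    "lxml": ["libxml2", "libxslt"],
}


def collect_external_dependencies(
    requirements: List[str],
    python_deps: Dict[str, List[str]],
    external_deps: Dict[str, List[str]],
) -> Set[str]:
    """Walk the graph from the requirements and gather external dependencies."""
    seen: Set[str] = set()
    result: Set[str] = set()
    stack = list(requirements)
    while stack:
        package = stack.pop()
        if package in seen:
            continue  # already visited; also guards against accidental cycles
        seen.add(package)
        result.update(external_deps.get(package, []))
        stack.extend(python_deps.get(package, []))
    return result


found = collect_external_dependencies(["myapp"], python_deps, external_deps)
print(sorted(found))  # ['libxml2', 'libxslt']
```

Nothing is built during the traversal: both mappings are looked up only, which is the property the two problems above ask for.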

For collecting information about external dependencies we will rely on users reporting such external dependencies for now. Developing a tool to detect external dependencies from build output is out of scope of this PR.

The dependency graph for Python dependencies can be generated by pypi2nix automatically. This PR will include a mechanism to generate dependency graphs in the right data format for ingestion by pypi2nix.

Infrastructure

We need a way to distribute a dependency tree to users. This will make the detection mechanism much more useful since it frees users from maintaining a fresh set of dependencies. We want to implement a mechanism similar to the one used for pypi2nix-overrides. This means that we have a semi-central git repository that contains all the detected dependencies. Since git is content addressable we can ensure reproducible builds. Our security model with that approach is Trust on First Use.
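
As a rough illustration of Trust on First Use, the sketch below records the revision seen on the first fetch and rejects any different revision later. It is a generic example of the idea, not pypi2nix code; `pin_revision`, the `pins` mapping and the example URL are hypothetical.

```python
from typing import Dict


def pin_revision(pins: Dict[str, str], repo_url: str, fetched_rev: str) -> str:
    """Record the revision on first use; reject a different one afterwards."""
    # setdefault stores fetched_rev only if repo_url has no pin yet.
    pinned = pins.setdefault(repo_url, fetched_rev)
    if pinned != fetched_rev:
        raise ValueError(f"{repo_url}: expected {pinned}, got {fetched_rev}")
    return pinned


pins: Dict[str, str] = {}
url = "https://example.invalid/dependencies.git"  # placeholder URL
pin_revision(pins, url, "a" * 40)  # first use: the revision is trusted and stored
pin_revision(pins, url, "a" * 40)  # same revision again: accepted
```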

Data format

In order to minimize the necessary labor we have to automate the generation of dependencies as much as possible. This means that we need a data format that allows seamless merging of generated and curated dependency trees. We should also use a data format that is easy to edit for humans and machines alike. A suitable candidate would be YAML. This would allow us to provide JSON schemas for the data format to allow for effective reuse of the data. A concern might be that the volume of the data makes a single data file too large to download. If we run into traffic or performance problems in the future we might consider implementing a web API. Already using JSON schemas would make that transition easy, as we could leverage OpenAPI 3.0 to make the data format and the API accessible to many people.
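
To make the merging requirement concrete, here is a minimal sketch that combines a generated and a curated tree. It assumes a hypothetical layout in which each package maps to lists of dependency names under the keys `python` and `external`; this layout, the function `merge_dependency_data` and the package `myapp` are assumptions for illustration, not the format chosen by the PR. The data is shown as Python dictionaries, the shape such a YAML document would have once loaded.

```python
from typing import Dict, List

# Hypothetical shape: package name -> field name -> list of dependency names.
DependencyData = Dict[str, Dict[str, List[str]]]


def merge_dependency_data(
    generated: DependencyData, curated: DependencyData
) -> DependencyData:
    """Union the dependency lists of both sources, package by package."""
    merged: DependencyData = {}
    for source in (generated, curated):
        for package, fields in source.items():
            target = merged.setdefault(package, {})
            for field, values in fields.items():
                combined = set(target.get(field, [])) | set(values)
                # Sorted output keeps the merged file stable under version control.
                target[field] = sorted(combined)
    return merged


generated: DependencyData = {
    "myapp": {"python": ["lxml"]},
    "lxml": {"python": []},
}
curated: DependencyData = {
    "lxml": {"external": ["libxml2", "libxslt"]},
}
merged = merge_dependency_data(generated, curated)
print(merged["lxml"])  # {'python': [], 'external': ['libxml2', 'libxslt']}
```

Because the merge is a per-field set union, it does not matter in which order generated and curated files are combined, and neither input is modified.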