External Dependencies
This chapter is based on a PR for pypi2nix. Go there if you want to learn about the initial implementation of external dependency detection.
Goal
The goal is to have a system that helps the user deal with external
dependencies. Currently the user has to know by heart (or find out by
trial and error) that e.g. lxml needs libxml2 and libxslt. We want to
automate that for the user, at least for the most common packages.
The idea is to have a system that takes your initial requirements and returns a list of the necessary external dependencies. This will have synergy with implementing automatic setup requirement detection.
Mechanism
Every pypi2nix invocation has a set of requirements associated with it.
This is usually the set of requirements that the user has for their
project. To find out what kind of external dependencies are necessary to
build the requested packages we need to solve two problems:
- Find all dependencies of the packages requested by the user without building them.
- For each of these dependencies, find out whether external dependencies are required, and which ones, without building all the packages.
We have to know all required external dependencies in advance since
restarting the whole build just because a dependency was detected is
unacceptable. For some users with slower or older hardware even one
single build might take more than 10 minutes for larger package sets. If
the build had to restart several times it would render pypi2nix
unusable to those users.
That means that we have to have a place where dependency information can be collected and used by pypi2nix. The information about the dependencies is basically a directed acyclic graph since this implementation will not support circular dependencies (for now).
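As a rough illustration of how the two kinds of information could be combined, the following sketch walks a dependency graph and collects the external dependencies of every reachable package. It is not pypi2nix code: the function name `collect_external_dependencies`, the two dictionaries, and the package `myproject` are assumptions made for this example, and the graph is represented as plain Python dictionaries.

```python
from typing import Dict, List, Set

# Hypothetical python dependency graph: each package maps to the
# packages it depends on. The entries are illustrative only.
PYTHON_DEPENDENCIES: Dict[str, List[str]] = {
    "myproject": ["lxml", "requests"],
    "requests": ["urllib3"],
    "lxml": [],
    "urllib3": [],
}

# Hypothetical user-reported external dependencies per python package.
EXTERNAL_DEPENDENCIES: Dict[str, List[str]] = {
    "lxml": ["libxml2", "libxslt"],
}


def collect_external_dependencies(
    requirements: List[str],
    python_dependencies: Dict[str, List[str]],
    external_dependencies: Dict[str, List[str]],
) -> Set[str]:
    """Return all external dependencies reachable from the requirements."""
    visited: Set[str] = set()
    externals: Set[str] = set()
    stack = list(requirements)
    while stack:
        package = stack.pop()
        if package in visited:
            # Already handled; this also keeps the loop finite if the
            # data accidentally contains a cycle.
            continue
        visited.add(package)
        externals.update(external_dependencies.get(package, []))
        stack.extend(python_dependencies.get(package, []))
    return externals


if __name__ == "__main__":
    result = collect_external_dependencies(
        ["myproject"], PYTHON_DEPENDENCIES, EXTERNAL_DEPENDENCIES
    )
    print(sorted(result))  # prints ['libxml2', 'libxslt']
```

Because the traversal only reads precomputed data, no package has to be built to answer the question, which is the property the mechanism needs.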
For collecting information about external dependencies we will rely on users reporting such external dependencies for now. Developing a tool to detect external dependencies from build output is out of scope of this PR.
The dependency graph for python dependencies can be generated by
pypi2nix automatically. This PR will include a mechanism to generate
dependency graphs in the right data format for ingestion by pypi2nix.
Infrastructure
We need a way to distribute a dependency tree to users. This will make the detection mechanism much more useful since it frees users from maintaining a fresh set of dependencies themselves. We want to implement a mechanism similar to the one used for pypi2nix-overrides. This means that we have a semi-central git repository that contains all the detected dependencies. Since git is content addressable we can ensure reproducible builds. Our security model with that approach is Trust on First Use.
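The Trust on First Use idea can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `verify_trust_on_first_use` and the pin file layout are hypothetical, and a SHA-256 digest over the downloaded bytes stands in for the git object hash that would pin the content in practice.

```python
import hashlib
import json
from pathlib import Path


def verify_trust_on_first_use(data: bytes, pin_file: Path) -> bool:
    """Pin the digest of `data` on first use, verify it on later uses.

    Returns True if the data is trusted, i.e. it is either seen for the
    first time or matches the previously pinned digest.
    """
    digest = hashlib.sha256(data).hexdigest()
    if not pin_file.exists():
        # First use: record the digest and trust the content.
        pin_file.write_text(json.dumps({"sha256": digest}))
        return True
    # Later uses: the content must match what was pinned.
    pinned = json.loads(pin_file.read_text())["sha256"]
    return pinned == digest
```

Once a digest is pinned, every later fetch either yields exactly the same dependency data or is rejected, which is what makes the resulting builds reproducible.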
Data format
In order to minimize necessary labor we have to automate the generation of dependencies as much as possible. This means that we need a data format that allows seamless merging of generated and curated dependency trees. We should also use a data format that is easy to edit by humans and machines alike. A suitable candidate would be the yaml format. This would allow us to provide json schemas for the data format to allow for effective reuse of the data. A concern might be that the volume of the data makes a compact data file too large to download. If in the future we run into traffic or performance problems we might consider implementing a web API. Already using json schemas would make that transition easy as we could leverage OpenAPI 3.0 to make the data format and the API accessible to many people.
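To make the merging requirement more concrete, here is a minimal sketch of how a generated and a curated dependency tree might be combined. The layout is an assumption for this example, not the actual data format: each package maps to lists under the hypothetical keys `python` and `external`, and plain Python dictionaries stand in for data that would be loaded from yaml files.

```python
from typing import Dict, List

# Hypothetical layout: package name -> dependency kind -> list of names.
DependencyTree = Dict[str, Dict[str, List[str]]]


def merge_dependency_trees(
    generated: DependencyTree, curated: DependencyTree
) -> DependencyTree:
    """Merge two dependency trees by taking the union of all entries."""
    merged: DependencyTree = {}
    for tree in (generated, curated):
        for package, entry in tree.items():
            target = merged.setdefault(package, {})
            for kind, names in entry.items():
                combined = set(target.get(kind, [])) | set(names)
                # Sorting keeps the output stable, which keeps diffs in
                # a git repository small.
                target[kind] = sorted(combined)
    return merged


if __name__ == "__main__":
    generated = {
        "myproject": {"python": ["lxml"]},
        "lxml": {"python": []},
    }
    curated = {
        "lxml": {"external": ["libxml2", "libxslt"]},
    }
    print(merge_dependency_trees(generated, curated)["lxml"])
    # prints {'python': [], 'external': ['libxml2', 'libxslt']}
```

Taking the union means neither source can silently delete information from the other; whether curated entries should instead be able to override generated ones is a design decision this sketch leaves open.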