Preparing a consistent Python environment
-
Building QEMU is a complex task, split across several programs. the
configure
script finds the host and cross compilers that are needed to build emulators and firmware; Meson prepares the build environment for the emulators; finally, Make and Ninja actually perform the build, and in some cases they run tests as well.In addition to compiling C code, many build steps run tools and scripts which are mostly written in the Python language. These include processing the emulator configuration, code generators for tracepoints and QAPI, extensions for the Sphinx documentation tool, and the Avocado testing framework. The Meson build system itself is written in Python, too.
Some of these tools are run through the
python3
executable, while others are invoked directly assphinx-build
ormeson
, and this can create inconsistencies. For example, QEMU’sconfigure
script checks for a minimum version of Python and rejects too-old interpreters. However, what would happen if code run by Sphinx used a different version?This situation has been largely hypothetical until recently; QEMU’s Python code is already tested with a wide range of versions of the interpreter, and it would not be a huge issue if Sphinx used a different version of Python as long as both of them were supported. This will change in version 8.1 of QEMU, which will bump the minimum supported version of Python from 3.6 to 3.8. While all the distros that QEMU supports have a recent-enough interpreter, the default on RHEL8 and SLES15 is still version 3.6, and that is what all binaries in
/usr/bin
use unconditionally.As of QEMU 8.0, even if
configure
is told to use/usr/bin/python3.8
for the build, QEMU’s custom Sphinx extensions would still run under Python 3.6. configure does separately check that Sphinx is executing with a new enough Python version, but it would be nice if there were a more generic way to prepare a consistent Python environment.This post will explain how QEMU 8.1 will ensure that a single interpreter is used for the whole of the build process. Getting there will require some familiarity with Python packaging, so let’s start with virtual environments.
Virtual environments
It is surprisingly hard to find what Python interpreter a given script will use. You can try to parse the first line of the script, which will be something like
#! /usr/bin/python3
, but there is no guarantee of success. For example, on some version of Homebrew/usr/bin/meson
will be a wrapper script like:#!/bin/bash PYTHONPATH="/usr/local/Cellar/meson/0.55.0/lib/python3.8/site-packages" \ exec "/usr/local/Cellar/meson/0.55.0/libexec/bin/meson" "$@"
The file with the Python shebang line will be hidden somewhere in
/usr/local/Cellar
. Therefore, performing some kind of check on the files in/usr/bin
is ruled out. QEMU needs to set up a consistent environment on its own.If a user who is building QEMU wanted to do so, the simplest way would be to use Python virtual environments. A virtual environment takes an existing Python installation but gives it a local set of Python packages. It also has its own
bin
directory; place it at the beginning of yourPATH
and you will be able to control the Python interpreter for scripts that begin with#! /usr/bin/env python3
.Furthermore, when packages are installed into the virtual environment with
pip
, they always refer to the Python interpreter that was used to create the environment. Virtual environments mostly solve the consistency problem at the cost of an extrapip install
step to put QEMU’s build dependencies into the environment.Unfortunately, this extra step has a substantial downside. Even though the virtual environment can optionally refer to the base installation’s installed packages,
pip
will always install packages from scratch into the virtual environment. For all Linux distributions except RHEL8 and SLES15 this is unnecessary, and users would be happy to build QEMU using the versions of Meson and Sphinx included in the distribution.Even worse,
pip install
will access the Python package index (PyPI) over the Internet, which is often impossible on build machines that are sealed from the outside world. Automated installation of PyPI dependencies may actually be a welcome feature, but it must also remain strictly optional.In other words, the ideal solution would use a non-isolated virtual environment, to be able to use system packages provided by Linux distributions; but it would also ensure that scripts (
sphinx-build
,meson
,avocado
) are placed intobin
just likepip install
does.Distribution packages
When it comes to packages, Python surely makes an effort to be confusing. The fundamental unit for importing code into a Python program is called a package; for example
os
andsys
are two examples of a package. However, a program or library that is distributed on PyPI consists of many such “import packages”: that’s because whilepip
is usually said to be a “package installer” for Python, more precisely it installs “distribution packages”.To add to the confusion, the term “distribution package” is often shortened to either “package” or “distribution”. And finally, the metadata of the distribution package remains available even after installation, so “distributions” include things that are already installed (and are not being distributed anywhere).
All this matters because distribution metadata will be the key to building the perfect virtual environment. If you look at the content of
bin/meson
in a virtual environment, after installing the package withpip
, this is what you find:#!/home/pbonzini/my-venv/bin/python3 # -*- coding: utf-8 -*- import re import sys from mesonbuild.mesonmain import main if __name__ == '__main__': sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0]) sys.exit(main())
This looks a lot like automatically generated code, and in fact it is; the only parts that vary are the
from mesonbuild.mesonmain import main
import, and the invocation of themain()
function on the last line.pip
creates this invocation script based on thesetup.cfg
file in Meson’s source code, more specifically based on the following stanza:[options.entry_points] console_scripts = meson = mesonbuild.mesonmain:main
Similar declarations exist in Sphinx, Avocado and so on, and accessing their content is easy via
importlib.metadata
(available in Python 3.8+):$ python3 >>> from importlib.metadata import distribution >>> distribution('meson').entry_points [EntryPoint(name='meson', value='mesonbuild.mesonmain:main', group='console_scripts')]
importlib
looks up the metadata in the running Python interpreter’s search path; if Meson is installed under another interpreter’ssite-packages
directory, it will not be found:$ python3.8 >>> from importlib.metadata import distribution >>> distribution('meson').entry_points Traceback (most recent call last): ... importlib.metadata.PackageNotFoundError: meson
So finally we have a plan!
configure
can build a non-isolated virtual environment, useimportlib
to check that the required packages exist in the base installation, and create scripts inbin
that point to the right Python interpreter. Then, it can optionally usepip install
to install the missing packages.While this process includes a certain amount of specialized logic, Python provides a customizable
venv
module to create virtual environments. The custom steps can be performed by subclassingvenv.EnvBuilder
.This will provide the same experience as QEMU 8.0, except that there will be no need for the
--meson
and--sphinx-build
options to theconfigure
script. The path to the Python interpreter is enough to set up all Python programs used during the build.There is only one thing left to fix…
Nesting virtual environments
Remember how we started with a user that creates her own virtual environment before building QEMU? Well, this would not work anymore, because virtual environments cannot be nested. As soon as
configure
creates its own virtual environment, the packages installed by the user are not available anymore.Fortunately, the “appearance” of a nested virtual environment is easy to emulate. Detecting whether
python3
runs in a virtual environment is as easy as checkingsys.prefix != sys.base_prefix
; if it is, we need to retrieve the parent virtual environmentssite-packages
directory:>>> import sysconfig >>> sysconfig.get_path('purelib') '/home/pbonzini/my-venv/lib/python3.11/site-packages'
and write it to a
.pth
file in thelib
directory of the new virtual environment. The following demo shows how a distribution package in the parent virtual environment will be available in the child as well:A small detail is that
configure
’s new virtual environment should mirror the isolation setting of the parent. An isolated venv can be detected becausesys.base_prefix in site.PREFIXES
is false.Conclusion
Right now, QEMU only makes a minimal attempt at ensuring consistency of the Python environment; Meson is always run using the interpreter that was passed to the configure script with
--python
or$PYTHON
, but that’s it. Once the above technique will be implemented in QEMU 8.1, there will be no difference in the build experience, but configuration will be easier and a wider set of invalid build environments will be detected. We will merge these checks before dropping support for Python 3.6, so that users on older enterprise distributions will have a smooth transition.
/2023/03/24/python/
© Lightnetics 2024