QEMU News

QEMU version 9.0.0 released
• sorangutan

We’d like to announce the availability of the QEMU 9.0.0 release. This release contains 2700+ commits from 220 authors.

You can grab the tarball from our download page. The full list of changes is available in the changelog.

Highlights include:
- block: virtio-blk now supports multiqueue where different queues of a single disk can be processed by different I/O threads
- gdbstub: various improvements such as catching syscalls in user-mode, support for fork-follow modes, and support for siginfo:read
- memory: preallocation of memory backends can now be handled concurrently using multiple threads in some cases
- migration: support for “mapped-ram” capability allowing for more efficient VM snapshots, improved support for zero-page detection, and checkpoint-restart support for VFIO
- ARM: architectural feature support for ECV (Enhanced Counter Virtualization), NV (Nested Virtualization), and NV2 (Enhanced Nested Virtualization)
- ARM: board support for B-L475E-IOT01A IoT node, mps3-an536 (MPS3 dev board + AN536 firmware), and raspi4b (Raspberry Pi 4 Model B)
- ARM: additional IO/disk/USB/SPI/ethernet controller and timer support for Freescale i.MX6, Allwinner R40, Banana Pi, npcm7xx, and virt boards
- HPPA: numerous bug fixes and SeaBIOS-hppa firmware updated to version 16
- LoongArch: KVM acceleration support, including LSX/LASX vector extensions
- RISC-V: ISA/extension support for Zacas, amocas, RVA22 profiles, Zaamo, Zalrsc, Ztso, and more
- RISC-V: SMBIOS support for RISC-V virt machine, ACPI support for SRAT, SLIT, AIA, PLIC and updated RHCT table support, and numerous fixes
- s390x: Emulation support for CVDG, CVB, CVBY and CVBG instructions, and fixes for LAE (Load Address Extended) emulation
- and lots more…
Thank you to everybody who contributed to this release, whether that was by writing code, reporting bugs, improving documentation, testing, or providing the project with CI resources. We couldn’t do these without you!

/2024/04/23/qemu-9-0-0/
QEMU version 8.2.0 released
• sorangutan

We’d like to announce the availability of the QEMU 8.2.0 release. This release contains 3200+ commits from 238 authors.

You can grab the tarball from our download page. The full list of changes is available in the changelog.

Highlights include:
- New virtio-sound device emulation
- New virtio-gpu rutabaga device emulation used by Android emulator
- New hv-balloon for dynamic memory protocol device for Hyper-V guests
- New Universal Flash Storage device emulation
- Network Block Device (NBD) 64-bit offsets for improved performance
- dump-guest-memory now supports the standard kdump format
- ARM: Xilinx Versal board now models the CFU/CFI, and the TRNG device
- ARM: CPU emulation support for cortex-a710 and neoverse-n2
- ARM: architectural feature support for PACQARMA3, EPAC, Pauth2, FPAC, FPACCOMBINE, TIDCP1, MOPS, HBC, and HPMN0
- HPPA: CPU emulation support for 64-bit PA-RISC 2.0
- HPPA: machine emulation support for C3700, including Astro memory controller and four Elroy PCI bridges
- LoongArch: ISA support for LASX extension and PRELDX instruction
- LoongArch: CPU emulation support for la132
- RISC-V: ISA/extension support for AIA virtualization support via KVM, and vector cryptographic instructions
- RISC-V: Numerous extension/instruction cleanups, fixes, and reworks
- s390x: support for vfio-ap passthrough of crypto adapter for protected virtualization guests
- Tricore: support for TC37x CPU which implements ISA v1.6.2
- Tricore: support for CRCN, FTOU, FTOHP, and HPTOF instructions
- x86: Xen support for PV console and network devices
- and lots more…
Thank you to everybody who contributed to this release, whether that was by writing code, reporting bugs, improving documentation, testing, or providing the project with CI resources. We couldn’t do these without you!

/2023/12/20/qemu-8-2-0/
QEMU version 8.1.0 released
• sorangutan

We’d like to announce the availability of the QEMU 8.1.0 release. This release contains 2900+ commits from 250 authors.

You can grab the tarball from our download page. The full list of changes is available in the changelog.

Highlights include:
- VFIO: improved live migration support, no longer an experimental feature
- GTK GUI now supports multi-touch events
- ARM, PowerPC, and RISC-V can now use AES acceleration on host processor
- PCIe: new QMP commands to inject CXL General Media events, DRAM events and Memory Module events
- ARM: KVM VMs on a host which supports MTE (the Memory Tagging Extension) can now use MTE in the guest
- ARM: emulation support for bpim2u (Banana Pi BPI-M2 Ultra) board and neoverse-v1 (Cortex Neoverse-V1) CPU
- ARM: new architectural feature support for: FEAT_PAN3 (SCTLR_ELx.EPAN), FEAT_LSE2 (Large System Extensions v2), and experimental support for FEAT_RME (Realm Management Extensions)
- Hexagon: new instruction support for v68/v73 scalar, and v68/v69 HVX
- Hexagon: gdbstub support for HVX
- MIPS: emulation support for Ingenic XBurstR1/XBurstR2 CPUs, and MXU instructions
- PowerPC: TCG SMT support, allowing pseries and powernv to run with up to 8 threads per core
- PowerPC: emulation support for Power9 DD2.2 CPU model, and perf sampling support for POWER CPUs
- RISC-V: ISA extension support for BF16/Zfa, and disassembly support for Zcm/Zinx/XVentanaCondOps/Xthead
- RISC-V: CPU emulation support for Veyron V1
- RISC-V: numerous KVM/emulation fixes and enhancements
- s390: instruction emulation fixes for LDER, LCBB, LOCFHR, MXDB, MXDBR, EPSW, MDEB, MDEBR, MVCRL, LRA, CKSM, CLM, ICM, MC, STIDP, EXECUTE, and CLGEBR(A)
- SPARC: updated target/sparc to use tcg_gen_lookup_and_goto_ptr() for improved performance
- Tricore: emulation support for TC37x CPU that supports ISA v1.6.2 instructions
- Tricore: instruction emulation of POPCNT.W, LHA, CRC32L.W, CRC32.B, SHUFFLE, SYSCALL, and DISABLE
- x86: CPU model support for GraniteRapids
- and lots more…
Thank you to everybody who contributed to this release, whether that was by writing code, reporting bugs, improving documentation, testing, or providing the project with CI resources. We couldn’t do these without you!

/2023/08/22/qemu-8-1-0/
QEMU version 8.0.0 released
• sorangutan

We’d like to announce the availability of the QEMU 8.0.0 release. This release contains 2800+ commits from 238 authors.

You can grab the tarball from our download page. The full list of changes is available in the changelog.

Highlights include:
- ARM: emulation support for FEAT_EVT, FEAT_FGT, and AArch32 ARMv8-R
- ARM: CPU emulation for Cortex-A55 and Cortex-R52, and new Olimex STM32 H405 machine type
- ARM: gdbstub support for M-profile system registers
- HPPA: fid (Floating-Point Identify) instruction support and 32-bit emulation improvements
- RISC-V: additional ISA and Extension support for smstateen, native debug icount trigger, cache-related PMU events in virtual mode, Zawrs/Svadu/T-Head/Zicond extensions, and ACPI support
- RISC-V: updated machine support for OpenTitan, PolarFire, and OpenSBI
- RISC-V: wide ranges of fixes covering PMP propagation for TLB, mret exceptions, uncompressed instructions, and other emulation/virtualization improvements
- s390x: improved zPCI passthrough device handling
- s390x: support for asynchronous teardown of memory of secure KVM guests during reboot
- x86: support for Xen guests under KVM with Linux v5.12+
- x86: new SapphireRapids CPU model
- x86: TCG support for FSRM, FZRM, FSRS, and FSRC CPUID flags
- virtio-mem: support for using preallocation in conjunction with live migration
- VFIO: experimental migration support updated to v2 VFIO migration protocol
- qemu-nbd: improved efficiency over TCP and when using TLS
- and lots more…
Thank you to everybody who contributed to this release, whether that was by writing code, reporting bugs, improving documentation, testing, or providing the project with CI resources. We couldn’t do these without you!

/2023/04/20/qemu-8-0-0/
Preparing a consistent Python environment
• sorangutan

Building QEMU is a complex task, split across several programs. The configure script finds the host and cross compilers that are needed to build emulators and firmware; Meson prepares the build environment for the emulators; finally, Make and Ninja actually perform the build, and in some cases they run tests as well.

In addition to compiling C code, many build steps run tools and scripts which are mostly written in the Python language. These include processing the emulator configuration, code generators for tracepoints and QAPI, extensions for the Sphinx documentation tool, and the Avocado testing framework. The Meson build system itself is written in Python, too.

Some of these tools are run through the python3 executable, while others are invoked directly as sphinx-build or meson, and this can create inconsistencies. For example, QEMU’s configure script checks for a minimum version of Python and rejects too-old interpreters. However, what would happen if code run by Sphinx used a different version?

This situation has been largely hypothetical until recently; QEMU’s Python code is already tested with a wide range of versions of the interpreter, and it would not be a huge issue if Sphinx used a different version of Python as long as both of them were supported. This will change in version 8.1 of QEMU, which will bump the minimum supported version of Python from 3.6 to 3.8. While all the distros that QEMU supports have a recent-enough interpreter, the default on RHEL8 and SLES15 is still version 3.6, and that is what all binaries in /usr/bin use unconditionally.

As of QEMU 8.0, even if configure is told to use /usr/bin/python3.8 for the build, QEMU’s custom Sphinx extensions would still run under Python 3.6. configure does separately check that Sphinx is executing with a new enough Python version, but it would be nice if there were a more generic way to prepare a consistent Python environment.
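
As a rough illustration of the kind of minimum-version check discussed above, here is a minimal sketch. It is not QEMU's actual configure logic; the function name is made up, and the 3.8 default is an assumption taken from the version mentioned in this post.

```python
import sys


def python_is_new_enough(version_info=None, minimum=(3, 8)):
    """Return True if the given (or the running) interpreter is new enough.

    The default minimum of 3.8 is illustrative only.
    """
    if version_info is None:
        version_info = sys.version_info
    # Only (major, minor) matter here; tuples compare element by element.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # Shows which interpreter is actually running this script.
    print(sys.executable, python_is_new_enough())
```

The check only says something about the interpreter that runs it, which is exactly why a tool started through a different interpreter can slip past it.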

This post will explain how QEMU 8.1 will ensure that a single interpreter is used for the whole of the build process. Getting there will require some familiarity with Python packaging, so let’s start with virtual environments.

Virtual environments

It is surprisingly hard to find what Python interpreter a given script will use. You can try to parse the first line of the script, which will be something like #! /usr/bin/python3, but there is no guarantee of success. For example, on some versions of Homebrew /usr/bin/meson will be a wrapper script like:
```
#!/bin/bash
PYTHONPATH="/usr/local/Cellar/meson/0.55.0/lib/python3.8/site-packages" \
  exec "/usr/local/Cellar/meson/0.55.0/libexec/bin/meson" "$@"
```
The file with the Python shebang line will be hidden somewhere in /usr/local/Cellar. Therefore, performing some kind of check on the files in /usr/bin is ruled out. QEMU needs to set up a consistent environment on its own.
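
To make the limitation concrete, here is a minimal sketch of the "parse the first line" approach. The helper name is hypothetical, and the parsing is deliberately simple (it does not handle options passed to env).

```python
import os


def shebang_interpreter(path):
    """Return the interpreter named on a script's "#!" line, or None.

    None does not prove the file is not a Python program: it may be a
    binary, and a shell result may hide a Python program behind a wrapper.
    """
    with open(path, "rb") as f:
        first = f.readline(256)
    if not first.startswith(b"#!"):
        return None
    parts = first[2:].decode("utf-8", "replace").split()
    if not parts:
        return None
    # "#!/usr/bin/env python3" names the interpreter as env's argument.
    if os.path.basename(parts[0]) == "env" and len(parts) > 1:
        return parts[1]
    return parts[0]
```

For a wrapper like the one above this returns /bin/bash, which says nothing about the Python interpreter that eventually runs Meson.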

If a user who is building QEMU wanted to do so, the simplest way would be to use Python virtual environments. A virtual environment takes an existing Python installation but gives it a local set of Python packages. It also has its own bin directory; place it at the beginning of your PATH and you will be able to control the Python interpreter for scripts that begin with #! /usr/bin/env python3.

Furthermore, when packages are installed into the virtual environment with pip, they always refer to the Python interpreter that was used to create the environment. Virtual environments mostly solve the consistency problem at the cost of an extra pip install step to put QEMU’s build dependencies into the environment.

Unfortunately, this extra step has a substantial downside. Even though the virtual environment can optionally refer to the base installation’s installed packages, pip will always install packages from scratch into the virtual environment. For all Linux distributions except RHEL8 and SLES15 this is unnecessary, and users would be happy to build QEMU using the versions of Meson and Sphinx included in the distribution.

Even worse, pip install will access the Python package index (PyPI) over the Internet, which is often impossible on build machines that are sealed from the outside world. Automated installation of PyPI dependencies may actually be a welcome feature, but it must also remain strictly optional.

In other words, the ideal solution would use a non-isolated virtual environment, to be able to use system packages provided by Linux distributions; but it would also ensure that scripts (sphinx-build, meson, avocado) are placed into bin just like pip install does.

Distribution packages

When it comes to packages, Python surely makes an effort to be confusing. The fundamental unit for importing code into a Python program is called a package; os and sys are two examples. However, a program or library that is distributed on PyPI may consist of many such “import packages”: while pip is usually said to be a “package installer” for Python, more precisely it installs “distribution packages”.

To add to the confusion, the term “distribution package” is often shortened to either “package” or “distribution”. And finally, the metadata of the distribution package remains available even after installation, so “distributions” include things that are already installed (and are not being distributed anywhere).

All this matters because distribution metadata will be the key to building the perfect virtual environment. If you look at the content of bin/meson in a virtual environment, after installing the package with pip, this is what you find:
```
#!/home/pbonzini/my-venv/bin/python3
# -*- coding: utf-8 -*-
import re
import sys
from mesonbuild.mesonmain import main
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(main())
```
This looks a lot like automatically generated code, and in fact it is; the only parts that vary are the from mesonbuild.mesonmain import main import, and the invocation of the main() function on the last line. pip creates this invocation script based on the setup.cfg file in Meson’s source code, more specifically based on the following stanza:
```
[options.entry_points]
console_scripts =
  meson = mesonbuild.mesonmain:main
```
Similar declarations exist in Sphinx, Avocado and so on, and accessing their content is easy via importlib.metadata (available in Python 3.8+):
```
$ python3
>>> from importlib.metadata import distribution
>>> distribution('meson').entry_points
[EntryPoint(name='meson', value='mesonbuild.mesonmain:main', group='console_scripts')]
```
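
Given an entry point value such as the one printed above, a launcher similar to the generated script can be produced with a small template. This is only a sketch: the function and template names are made up, and it handles only the plain "module:function" form.

```python
LAUNCHER_TEMPLATE = """\
#!{python}
# -*- coding: utf-8 -*-
import re
import sys
from {module} import {func}
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\\.pyw|\\.exe)?$', '', sys.argv[0])
    sys.exit({func}())
"""


def launcher_text(python, entry_point_value):
    """Build launcher source for an entry point like "pkg.mod:main".

    `python` is the interpreter path to place on the shebang line.
    """
    module, _, func = entry_point_value.partition(":")
    if not module or not func.isidentifier():
        raise ValueError(f"unsupported entry point: {entry_point_value!r}")
    return LAUNCHER_TEMPLATE.format(python=python, module=module, func=func)


if __name__ == "__main__":
    print(launcher_text("/home/pbonzini/my-venv/bin/python3",
                        "mesonbuild.mesonmain:main"))
```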
importlib looks up the metadata in the running Python interpreter’s search path; if Meson is installed under another interpreter’s site-packages directory, it will not be found:
```
$ python3.8
>>> from importlib.metadata import distribution
>>> distribution('meson').entry_points
Traceback (most recent call last):
...
importlib.metadata.PackageNotFoundError: meson
```
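
A script can turn that exception into a simple presence check. The following sketch uses only importlib.metadata; the helper name is an assumption made for illustration.

```python
from importlib.metadata import PackageNotFoundError, distribution


def console_scripts(dist_name):
    """Map script names to "module:function" for an installed distribution.

    Returns None when the distribution is not visible to the interpreter
    running this code.
    """
    try:
        dist = distribution(dist_name)
    except PackageNotFoundError:
        return None
    return {ep.name: ep.value
            for ep in dist.entry_points
            if ep.group == "console_scripts"}


if __name__ == "__main__":
    print(console_scripts("meson"))
```

Because the lookup is tied to the running interpreter, the result answers the question that matters here: whether this particular Python can import the tool.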
So finally we have a plan! configure can build a non-isolated virtual environment, use importlib to check that the required packages exist in the base installation, and create scripts in bin that point to the right Python interpreter. Then, it can optionally use pip install to install the missing packages.

While this process includes a certain amount of specialized logic, Python provides a customizable venv module to create virtual environments. The custom steps can be performed by subclassing venv.EnvBuilder.

This will provide the same experience as QEMU 8.0, except that there will be no need for the --meson and --sphinx-build options to the configure script. The path to the Python interpreter is enough to set up all Python programs used during the build.
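
The plan above could be sketched with a venv.EnvBuilder subclass along the following lines. This is not QEMU's implementation: the class name, the `required` parameter and the simplified launcher are assumptions made for illustration, and the optional pip install step is left out.

```python
import os
import venv
from importlib.metadata import PackageNotFoundError, distribution


class ToolEnvBuilder(venv.EnvBuilder):
    """Hypothetical builder: non-isolated venv plus launchers for system tools."""

    def __init__(self, required=("meson",), **kwargs):
        # system_site_packages=True is what makes the environment non-isolated.
        kwargs.setdefault("system_site_packages", True)
        kwargs.setdefault("with_pip", False)
        super().__init__(**kwargs)
        self.required = tuple(required)
        self.missing = []  # distributions not found in the base installation

    def post_setup(self, context):
        # Hook called by EnvBuilder.create() once the environment exists.
        for name in self.required:
            try:
                entry_points = distribution(name).entry_points
            except PackageNotFoundError:
                self.missing.append(name)  # candidate for an optional pip install
                continue
            for ep in entry_points:
                if ep.group == "console_scripts":
                    self._write_launcher(context, ep.name, ep.value)

    @staticmethod
    def _write_launcher(context, script_name, value):
        # Simplified launcher; assumes the plain "module:function" form.
        module, _, func = value.partition(":")
        path = os.path.join(context.bin_path, script_name)
        with open(path, "w", encoding="utf-8") as f:
            f.write(f"#!{context.env_exe}\n"
                    "import sys\n"
                    f"from {module} import {func}\n"
                    f"sys.exit({func}())\n")
        os.chmod(path, 0o755)
```

Running something like ToolEnvBuilder(required=("meson",)).create("pyvenv") with the chosen interpreter would then leave a launcher in the environment's bin directory if Meson is visible to that interpreter, and record it in `missing` otherwise.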

There is only one thing left to fix…

Nesting virtual environments

Remember how we started with a user that creates her own virtual environment before building QEMU? Well, this would not work anymore, because virtual environments cannot be nested. As soon as configure creates its own virtual environment, the packages installed by the user are not available anymore.

Fortunately, the “appearance” of a nested virtual environment is easy to emulate. Detecting whether python3 runs in a virtual environment is as easy as checking sys.prefix != sys.base_prefix; if it is, we need to retrieve the parent virtual environment’s site-packages directory:
```
>>> import sysconfig
>>> sysconfig.get_path('purelib')
'/home/pbonzini/my-venv/lib/python3.11/site-packages'
```
and write it to a .pth file in the lib directory of the new virtual environment. A distribution package installed in the parent virtual environment will then be available in the child as well.

A small detail is that configure’s new virtual environment should mirror the isolation setting of the parent. An isolated venv can be detected because sys.base_prefix in site.PREFIXES is false.
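
The checks described in this section fit in a few lines. The sketch below uses hypothetical helper names and a made-up file name, nested.pth; it assumes the .pth file is placed in the child environment's site-packages directory, where the site module looks for such files at startup.

```python
import os
import site
import sys
import sysconfig


def in_virtual_environment():
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


def is_isolated():
    """True for a venv that does not see the base installation's packages."""
    return in_virtual_environment() and sys.base_prefix not in site.PREFIXES


def write_parent_pth(child_site_packages, filename="nested.pth"):
    """Expose the running interpreter's site-packages to a child venv.

    Returns the path of the .pth file that was written.
    """
    parent = sysconfig.get_path("purelib")
    pth = os.path.join(child_site_packages, filename)
    with open(pth, "w", encoding="utf-8") as f:
        f.write(parent + "\n")
    return pth
```

The helper would be run by the parent environment's interpreter, passing the site-packages directory of the newly created child environment.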

Conclusion

Right now, QEMU only makes a minimal attempt at ensuring consistency of the Python environment; Meson is always run using the interpreter that was passed to the configure script with --python or $PYTHON, but that’s it. Once the above technique is implemented in QEMU 8.1, there will be no difference in the build experience, but configuration will be easier and a wider set of invalid build environments will be detected. We will merge these checks before dropping support for Python 3.6, so that users on older enterprise distributions will have a smooth transition.

/2023/03/24/python/
KVM Forum 2023: Call for presentations
• sorangutan

KVM Forum is an annual event that presents a rare opportunity for KVM and QEMU developers and users to discuss the state of Linux virtualization technology and plan for the challenges ahead. Sessions include updates on the state of the KVM virtualization stack, planning for the future, and many opportunities for attendees to collaborate.

This year’s event will be held in Brno, Czech Republic on June 14-15, 2023. It will be in-person only and will be held right before the DevConf.CZ open source community conference.

June 14 will be at least partly dedicated to a hackathon or “day of BoFs”. This will provide time for people to get together and discuss strategic decisions, as well as other topics that are best solved within smaller groups.

Call for presentations

We encourage you to submit presentations via the KVM Forum CfP page. Suggested topics include:
- Scalability and Optimization
- Hardening and security
- Confidential computing
- Testing
- KVM and the Linux Kernel:
  
  New Features and Ports
  
  Device Passthrough: VFIO, mdev, vDPA
  
  Network Virtualization
  
  Virtio and vhost
- Virtual Machine Monitors and Management:
  
  VMM Implementation: APIs, Live Migration, Performance Tuning, etc.
  
  Multi-process VMMs: vhost-user, vfio-user, QEMU Storage Daemon
  
  QEMU without KVM: Hypervisor.framework and other hypervisors
  
  Managing KVM: Libvirt, KubeVirt, Kata Containers
- Emulation:
  
  New Devices, Boards and Architectures
  
  CPU Emulation and Binary Translation
The deadline for submitting presentations is April 2, 2023 - 11:59 PM PDT. Accepted speakers will be notified on April 17, 2023.

Attending KVM Forum

Admission to KVM Forum and DevConf.CZ is free. However, registration is required and the number of attendees is limited by the space available at the venue.

The DevConf.CZ program will feature technical talks on a variety of topics, including cloud and virtualization infrastructure—so make sure to register for DevConf.CZ as well if you would like to attend.

Both conferences are committed to fostering an open and welcoming environment for everybody. Participants are expected to abide by the DevConf.CZ code of conduct and media policy.

/2023/03/08/kvm-forum-cfp/
Announcing QEMU Google Summer of Code and Outreachy 2023 internships
• sorangutan

QEMU is participating in Google Summer of Code and Outreachy again this year! Google Summer of Code and Outreachy are open source internship programs that offer paid remote work opportunities for contributing to open source. Internships generally run May through August, so if you have time and want to experience open source development, read on to find out how you can apply.

Each intern is paired with one or more mentors, experienced QEMU contributors who support them during the internship. Code developed by the intern is submitted through the same open source development process that all QEMU contributions follow. This gives interns experience with contributing to open source software. Some interns then choose to pursue a career in open source software after completing their internship.

Find out if you are eligible

Information on who can apply is here for Google Summer of Code and here for Outreachy. Note that Outreachy initial applications ended on February 6th so only those who have been accepted into Outreachy can apply for QEMU Outreachy internships.

Select a project idea

Look through the list of QEMU project ideas and see if there is something you are interested in working on. Once you have found a project idea you want to apply for, email the mentor for that project idea to ask any questions you may have and discuss the idea further.

Submit your application

You can apply for Google Summer of Code from March 20th to April 4th and apply for Outreachy from March 6th to April 3rd.

Good luck with your applications!

If you have questions about applying for QEMU GSoC or Outreachy, please email Stefan Hajnoczi or ask on the #qemu-gsoc IRC channel.

/2023/02/23/gsoc-outreachy-2023/
QEMU version 7.2.0 released
• sorangutan

We’d like to announce the availability of the QEMU 7.2.0 release. This release contains 1800+ commits from 205 authors.

You can grab the tarball from our download page. The full list of changes is available in the Wiki.

Highlights include:
- ARM: emulation support for the following CPU features: Enhanced Translation Synchronization, PMU Extensions v3.5, Guest Translation Granule size, Hardware management of access flag/dirty bit state, and Preventing EL0 access to halves of address maps
- ARM: emulation support for Cortex-A35 CPUs
- LoongArch: support for fw_cfg DMA functionality, memory hotplug, and TPM device emulation
- OpenRISC: support for multi-threaded TCG, stability improvements, and new ‘virt’ machine type for CI/device testing
- RISC-V: ‘virt’ machine support for booting S-mode firmware from pflash, and general device tree improvements
- s390x: support for Message-Security-Assist Extension 5 (RNG via PRNO instruction), SHA-512 via KIMD/KLMD instructions, and enhanced zPCI interpretation support for KVM guests
- x86: TCG performance improvements, including SSE
- x86: TCG support for AVX, AVX2, F16C, FMA3, and VAES instructions
- x86: KVM support for “notify vmexit” mechanism to prevent processor bugs from hanging whole system
- LUKS block device headers are validated more strictly, and creating LUKS images is now supported on macOS
- Memory backends now support NUMA-awareness when preallocating memory
- and lots more…
Thank you to everyone involved!

/2022/12/14/qemu-7-2-0/
Introduction to Zoned Storage Emulation
• sorangutan

This summer I worked on adding Zoned Block Device (ZBD) support to virtio-blk as part of the Outreachy internship program. QEMU hasn’t directly supported ZBDs before so this article explains how they work and why QEMU needed to be extended.

Zoned block devices

Zoned block devices (ZBDs) are divided into regions called zones that can only be written sequentially. Allowing only sequential writes reduces SSD write amplification by eliminating the need for a Flash Translation Layer, and can lead to higher throughput and increased capacity. Zoned storage concepts, which provide a new storage software stack, are standardized as ZBC (SCSI standard), ZAC (ATA standard), and ZNS (NVMe). The virtio protocol for block devices (virtio-blk) should therefore also be aware of ZBDs instead of treating them as regular block devices, and it should be able to pass such devices through to the guest. An overview of the necessary work is as follows:
1. Virtio protocol: extend virtio-blk protocol with main zoned storage concept, Dmitry Fomichev
2. Linux: implement the virtio specification extensions, Dmitry Fomichev
3. QEMU: add zoned storage APIs to the block layer, Sam Li
4. QEMU: implement zoned storage support in virtio-blk emulation, Sam Li
Once the QEMU and Linux patches have been merged it will be possible to expose a virtio-blk ZBD to the guest like this:
```
-blockdev node-name=drive0,driver=zoned_host_device,filename=/path/to/zbd,cache.direct=on \
-device virtio-blk-pci,drive=drive0 \
```
We can then perform zoned block commands on that device in the guest OS:
```
# blkzone report /dev/vda
start: 0x000000000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
start: 0x000020000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
start: 0x000040000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
start: 0x000060000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
start: 0x000080000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
start: 0x0000a0000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
start: 0x0000c0000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
start: 0x0000e0000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
start: 0x000100000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000120000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000140000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000160000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
```
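
Output in this format is easy to post-process. The sketch below parses one report line into a dictionary; the regular expression is written against the sample lines above only and is an assumption about the format, not a specification of blkzone's output.

```python
import re

# Matches lines shaped like the sample report above.
ZONE_RE = re.compile(
    r"start:\s*(?P<start>0x[0-9a-fA-F]+),\s*len\s+(?P<len>0x[0-9a-fA-F]+),\s*"
    r"cap\s+(?P<cap>0x[0-9a-fA-F]+),\s*wptr\s+(?P<wptr>0x[0-9a-fA-F]+)"
    r".*?zcond:\s*\d+\((?P<cond>\w+)\)\s*\[type:\s*\d+\((?P<type>\w+)\)\]"
)


def parse_zone(line):
    """Return a dict describing one zone, or None if the line does not match."""
    m = ZONE_RE.search(line)
    if m is None:
        return None
    zone = {key: int(m.group(key), 16) for key in ("start", "len", "cap", "wptr")}
    zone["cond"] = m.group("cond")
    zone["type"] = m.group("type")
    return zone


def sequential_zones(lines):
    """Keep only the zones that must be written sequentially."""
    zones = (parse_zone(line) for line in lines)
    return [z for z in zones if z and z["type"] == "SEQ_WRITE_REQUIRED"]
```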
Zoned emulation

Currently, QEMU can support zoned devices through virtio-scsi or PCI device passthrough, but these approaches need to specify the type of device they are talking to. Storage controller emulation, by contrast, uses block layer APIs instead of accessing disk images directly. Extending virtio-blk emulation avoids code duplication and simplifies support by hiding the device types behind a unified zoned storage interface, which also simplifies VM deployment for different types of zoned devices. Virtio-blk can also be implemented in hardware; if those devices are to follow the zoned storage model, the virtio-blk specification needs to support zoned storage natively. With such support, individual NVMe namespaces, or anything that is a zoned Linux block device, can be exposed to the guest without passing through a full device.

For zoned storage emulation, the zoned storage APIs support three zoned models (conventional, host-managed, host-aware), four zone management commands (Report Zone, Open Zone, Close Zone, Finish Zone), and Zone Append. The QEMU block layer has a BlockDriverState graph that propagates device information inside the block layer. The file-posix driver is the lowest level within the graph where the zoned storage APIs reside.

After receiving the block driver states, the virtio-blk emulation recognizes zoned devices and sends the zoned feature bit to the guest. The guest can then see the host's zoned device. When the guest executes zoned operations, the virtio-blk driver issues corresponding requests that are captured by the virtio-blk device inside QEMU. The virtio-blk device then sends the requests to the file-posix driver, which performs the zoned operations using Linux ioctls.

Unlike zone management operations, zone append has no Linux user API for issuing requests to zoned devices from user space. With write pointer emulation, which tracks the location of the write pointer of each zone, the QEMU block layer can perform append writes by modifying regular writes. Write pointer locks guarantee the execution of requests. If a request fails, the write pointer location must not be updated; it is only updated when the request finishes successfully.
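
The idea of emulating append writes with a tracked write pointer can be illustrated with a toy model. This is a sketch only and not QEMU's block layer code: the class, the byte-based offsets and the `write_at` callback are assumptions made for the example.

```python
import threading


class EmulatedZone:
    """Toy model of one sequential zone; offsets are in bytes."""

    def __init__(self, start, capacity):
        self.start = start
        self.capacity = capacity
        self.wp = start                # write pointer: next writable offset
        self.lock = threading.Lock()   # serializes appends to this zone

    def append(self, data, write_at):
        """Emulate a zone append using a regular positioned write.

        `write_at(offset, data)` is a hypothetical callback standing in for
        the real write and is expected to raise on failure.
        Returns the offset at which the data was placed.
        """
        with self.lock:
            offset = self.wp
            if offset + len(data) > self.start + self.capacity:
                raise ValueError("append would cross the zone boundary")
            write_at(offset, data)     # if this raises, wp stays untouched
            self.wp = offset + len(data)
            return offset
```

The caller never chooses the offset; it learns it from the return value, and a failed write leaves the write pointer where it was.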

Problems can always be solved with the right mind and the right tools. A good way to avoid pitfalls in programs is a test-driven approach. In the beginning, users such as the qemu-io command utility can invoke the new block layer APIs. Moving towards the guest, existing tools like blktests, zonefs-tools, and fio are introduced for broader testing. Depending on the size of the zoned device, some tests may take a long time to finish. Tracing is also a good tool for spotting bugs: QEMU tracing tools and blktrace monitor block layer I/O, providing detailed information for analysis.

Starting the journey with open source

As a student interested in computer science, I am enthusiastic about making real applications and fortunate to have found this opportunity this summer. I had a wonderful experience with QEMU, where I got the chance to work with experienced engineers and meet peers who share the same interests. It is a good starting point for me to continue exploring storage systems and open source projects.

Public communication, reaching out to people, and admitting failures used to be hard for me. Those feelings faded away as I put more effort into this project over time. For people who may have the same trouble, it might be useful to focus on the tasks ahead of you instead of worrying about the consequences of rejection from others.

Finally, I would like to thank Stefan Hajnoczi, Damien Le Moal, Dmitry Fomichev, and Hannes Reinecke for mentoring me. They guided me through this project with patience and expertise whenever I hit obstacles in design or implementation, and introduced me to a fun and vibrant open source world. I also thank the QEMU community and Outreachy for organizing this program.

Conclusion

This project is currently waiting for the virtio specification extension and the Linux driver support patches to be accepted. New comments on the up-to-date patch series for zoned device support are welcome.

The next step for zoned storage emulation in QEMU is to enable full zoned emulation through virtio-blk. By adding support on top of a regular file, it will allow developers to access a zoned device environment without real zoned storage hardware. Furthermore, virtio-scsi may need full emulation support to complete the zoned storage picture in QEMU. QEMU NVMe ZNS emulation can also use the new block layer APIs to attach real zoned storage if the emulation is used in production in the future.

/2022/11/17/zoned-emulation/
QEMU version 7.1.0 released
• sorangutan

We’d like to announce the availability of the QEMU 7.1.0 release. This release contains 2800+ commits from 238 authors.

You can grab the tarball from our download page. The full list of changes is available in the Wiki.

Highlights include:
- Live migration: support for zero-copy-send on Linux
- QMP: new options for exporting NBD images with dirty bitmaps via ‘block-export-add’ command
- QMP: new ‘query-stats’ and ‘query-stats-schema’ commands for retrieving statistics from various QEMU subsystems
- QEMU guest agent: improved Solaris support, new commands ‘guest-get-diskstats’/‘guest-get-cpustats’, ‘guest-get-disks’ now reports NVMe SMART information, and ‘guest-get-fsinfo’ now reports NVMe bus-type
- ARM: emulation support for new machine types: Aspeed AST1030 SoC, Qualcomm, and fby35 (AST2600 / AST1030)
- ARM: emulation support for Cortex-A76 and Neoverse-N1 CPUs
- ARM: emulation support for Scalable Matrix Extensions, cache speculation control, RAS, and many other CPU extensions
- ARM: ‘virt’ board now supports emulation of GICv4.0
- HPPA: new SeaBIOS v6 firmware with support for PS/2 keyboard in boot menu when running with GTK UI, improved serial port emulation, and additional STI text fonts
- LoongArch: initial support for LoongArch64 architecture, Loongson 3A5000 multiprocessor SoC, and the Loongson 7A1000 host bridge
- MIPS: Nios2 board (-machine 10m50-ghrd) now supports Vectored Interrupt Controller, shadow register sets, and improved exception handling
- OpenRISC: ‘or1k-sim’ machine now supports 4 16550A UART serial devices instead of 1
- RISC-V: new ISA extensions with support for privileged spec version 1.12.0, software access to MIP SEIP, Sdtrig extension, vector extension improvements, native debug, PMU improvements, and many other features and miscellaneous fixes/improvements
- RISC-V: ‘virt’ board now supports TPM
- RISC-V: ‘OpenTitan’ board now supports Ibex SPI
- s390x: emulation support for s390x Vector-Enhancements Facility 2
- s390x: s390-ccw BIOS now supports booting from drives with non-512 sector sizes
- x86: virtualization support for architectural LBRs
- Xtensa: support for lx106 core and cache testing opcodes
- and lots more…
Thank you to everyone involved!

/2022/08/30/qemu-7-1-0/
QEMU version 7.0.0 released
• sorangutan

We’d like to announce the availability of the QEMU 7.0.0 release. This release contains 2500+ commits from 225 authors.

You can grab the tarball from our download page. The full list of changes is available in the Wiki.

Highlights include:
- ACPI: support for logging guest events via ACPI ERST interface
- virtiofs: improved security label support
- block: improved flexibility for fleecing backups, including support for non-qcow2 images
- ARM: ‘virt’ board support for virtio-mem-pci, specifying guest CPU topology, and enabling PAuth when using KVM/hvf
- ARM: ‘xlnx-versal-virt’ board support for PMC SLCR and emulating the OSPI flash memory controller
- ARM: ‘xlnx-zynqmp’ now models the CRF and APU control
- HPPA: support for up to 16 vCPUs, improved graphics driver for HP-UX VDE/CDE environments, setting SCSI boot order, and a number of other new features
- OpenRISC: ‘sim’ board support for up to 4 cores, loading an external initrd image, and automatically generating a device tree for the boot kernel
- PowerPC: ‘pseries’ emulation support for running guests as a nested KVM hypervisor, and new support for spapr-nvdimm device
- PowerPC: ‘powernv’ emulation improvements for XIVE and PHB 3/4, and new support for XIVE2 and PHB5
- RISC-V: support for KVM
- RISC-V: support for ratified 1.0 Vector extension, as well as Zve64f, Zve32f, Zfhmin, Zfh, zfinx, zdinx, and zhinx{min} extensions.
- RISC-V: ‘spike’ machine support for OpenSBI binary loading
- RISC-V: ‘virt’ machine support for 32 cores, and AIA support.
- s390x: support for “Miscellaneous-Instruction-Extensions Facility 3” (a z15 extension)
- x86: Support for Intel AMX
- and lots more…
Thank you to everyone involved!

/2022/04/19/qemu-7-0-0/
Apply for a QEMU Google Summer of Code internship
• sorangutan


We have great news to share: QEMU has been accepted as a Google Summer of Code 2022 organization! Google Summer of Code is an open source internship program offering paid remote work opportunities for contributing to open source. The internship runs from June 13th to September 12th.

Now is the chance to get involved in QEMU development! The QEMU community has put together a list of project ideas here.

Google has dropped the requirement that you need to be enrolled in a higher education course. We’re excited to work with a wider range of contributors this year! For details on the new eligibility requirements, see here.

You can submit your application from April 4th to 19th.

GSoC interns work together with their mentors, experienced QEMU contributors who support their interns in their projects. Code developed during the internship is submitted through the same open source development process that all QEMU contributions follow. This gives interns experience with contributing to open source software. Some interns then choose to pursue a career in open source software after completing their internship.

If you have questions about applying for QEMU GSoC, please email Stefan Hajnoczi or ask on the #qemu-gsoc IRC channel.

/2022/03/07/gsoc-2022/
QEMU welcomes Outreachy internship applicants
• sorangutan


QEMU is offering open source internships in Outreachy’s May-August 2022 round. You can submit your application until February 25th 2022 if you want to contribute to QEMU in a remote work internship this summer.

Outreachy internships are extended to people who are subject to systemic bias and underrepresentation in the technical industry where they are living. For details on applying, please see the Outreachy website. If you are not eligible, don’t worry, QEMU is also applying to participate in Google Summer of Code again and we hope to share news about additional internships later this year.

Outreachy interns work together with their mentors, experienced QEMU contributors who support their interns in their projects. Code developed during the internship is submitted via the same open source development process that all QEMU code follows. This gives interns experience with contributing to open source software. Some interns then choose to pursue a career in open source software after completing their internship.

Now is the chance to get involved in QEMU development!

If you have questions about applying for QEMU Outreachy, please email Stefan Hajnoczi or ask on the #qemu-gsoc IRC channel.

/2022/02/15/outreach-2022/
QEMU version 6.2.0 released
• sorangutan

We’d like to announce the availability of the QEMU 6.2.0 release. This release contains 2300+ commits from 189 authors.

You can grab the tarball from our download page. The full list of changes is available in the Wiki.

Highlights include:
- virtio-mem: guest memory dumps are now fully supported, along with pre-copy/post-copy migration and background guest snapshots
- QMP: support for new DEVICE_UNPLUG_GUEST_ERROR to detect guest-reported hotplug failures
- TCG: improvements to TCG plugin argument syntax, and multi-core support for cache plugin
- 68k: improved support for Apple’s NuBus, including ability to load declaration ROMs, and slot IRQ support
- ARM: macOS hosts with Apple Silicon CPUs now support ‘hvf’ accelerator for AArch64 guests
- ARM: emulation support for Fujitsu A64FX processor model
- ARM: emulation support for kudo-mbc machine type
- ARM: M-profile MVE extension is now supported for Cortex-M55
- ARM: ‘virt’ machine now supports an emulated ITS (Interrupt Translation Service) and supports more than 123 CPUs in emulation mode
- ARM: xlnx-zcu102 and xlnx-versal-virt machines now support BBRAM and eFUSE devices
- PowerPC: improved POWER10 support for the ‘powernv’ machine type
- PowerPC: initial support for POWER10 DD2.0 CPU model
- PowerPC: support for FORM2 PAPR NUMA descriptions for ‘pseries’ machine type
- RISC-V: support for Zb[abcs] instruction set extensions
- RISC-V: support for vhost-user and numa mem options across all boards
- RISC-V: SiFive PWM support
- x86: support for new Snowridge-v4 CPU model
- x86: guest support for Intel SGX
- x86: AMD SEV guests now support measurement of kernel binary when doing direct kernel boot (not using a bootloader)
- and lots more…
Thank you to everyone involved!

/2021/12/14/qemu-6-2-0/
QEMU version 6.1.0 released
• sorangutan

We’d like to announce the availability of the QEMU 6.1.0 release. This release contains 3000+ commits from 221 authors.

You can grab the tarball from our download page. The full list of changes is available in the Wiki.

Highlights include:
- block: support for changing block node options after creation via ‘blockdev-reopen’ QMP command
- Crypto: more performant backend recommendations and improved documentation
- I2C: emulation support for I2C muxes (pca9546, pca9548) and PMBus
- TCG Plugins: now enabled by default, with new execlog and cache modelling plugins.
- ARM: new board support for Aspeed (rainier-bmc, quanta-q7l1), npcm7xx (quanta-gbs-bmc), and Cortex-M3 (stm32vldiscovery) based machines
- ARM: Aspeed support of Hash and Crypto Engine
- ARM: emulation support for SVE2 (including bfloat16), integer matrix multiply accumulate operations, TLB invalidate in Outer Shareable domain, TLB range invalidate, and more.
- PowerPC: pseries: support for detecting hotplug failures in newer guests
- PowerPC: pseries: increased maximum CPU count
- PowerPC: pseries: emulation support for some POWER10 prefixed instructions
- PowerPC: new board support for Genesi/bPlan Pegasos II (pegasos2)
- RISC-V: updates to OpenTitan platform support, including OpenTitan timer
- RISC-V: support for virtio-vga
- RISC-V: documentation improvements and general code cleanups/fixes
- s390: emulation support for the vector-enhancements facility
- s390: support for gen16 CPU models
- x86: new Intel CPU model versions with support for XSAVES instruction
- x86: added ACPI based PCI hotplug support for Q35 machine (now the default)
- x86: improvements to emulation of AMD virtualization extensions
- and lots more…
Thank you to everyone involved!

/2021/08/24/qemu-6-1-0/
Exporting block devices as raw image files with FUSE
• sorangutan

Sometimes, there is a VM disk image whose contents you want to manipulate without booting the VM. For raw images, that process is usually fairly simple, because most Linux systems ship with tools for the job, e.g.:
- dd to just copy data to and from given offsets,
- parted to manipulate the partition table,
- kpartx to present all partitions as block devices,
- mount to access filesystems’ contents.
Sadly, but naturally, such tools only work for raw images, and not for images e.g. in QEMU’s qcow2 format. To access such an image’s content, the format has to be translated to create a raw image, for example by:
- Exporting the image file with qemu-nbd -c as an NBD block device file,
- Converting between image formats using qemu-img convert,
- Accessing the image from a guest, where it appears as a normal block device.
Unfortunately, none of these methods is perfect: qemu-nbd -c generally requires root rights, converting to a temporary raw copy requires additional disk space and the conversion process takes time, and accessing the image from a guest is just quite cumbersome in general (and also specifically something that we set out to avoid in the first sentence of this blog post).

As of QEMU 6.0, there is another method, namely FUSE block exports. Conceptually, these are rather similar to using qemu-nbd -c, but they do not require root rights.

Note: FUSE block exports are a feature that can be enabled or disabled during the build process with --enable-fuse or --disable-fuse, respectively; omitting either configure option will enable the feature if and only if libfuse3 is present. It is possible that the QEMU build you are using does not have FUSE block export support, because it was not compiled in.

FUSE (Filesystem in Userspace) is a technology to let userspace processes provide filesystem drivers. For example, sshfs is a program that allows mounting remote directories from a machine accessible via SSH.

QEMU can use FUSE to make a virtual block device appear as a normal file on the host, so that tools like kpartx can interact with it regardless of the image format.

Background information

File mounts

A perhaps little-known fact is that, on Linux, filesystems do not need to have a root directory; they only need to have a root node. A filesystem that only provides a single regular file is perfectly valid.

Conceptually, every filesystem is a tree, and mounting works by replacing one subtree of the global VFS tree by the mounted filesystem’s tree. Normally, a filesystem’s root node is a directory, like in the following example:

Fig. 1: Mounting a regular filesystem with a directory as its root node

Here, the directory /foo and its content (the files /foo/a and /foo/b) are shadowed by the new filesystem (showing /foo/x and /foo/y).

Note that a filesystem’s root node generally has no name. After mounting, the filesystem’s root directory’s name is determined by the original name of the mount point.

Because a tree does not need to have multiple nodes but may consist of just a single leaf, a filesystem with a file for its root node works just as well, though:

Fig. 2: Mounting a filesystem with a regular (unnamed) file as its root node

Here, FS B only consists of a single node, a regular file with no name. (As above, a filesystem’s root node is generally unnamed.) Consequently, the mount point for it must also be a regular file (/foo/a in our example), and just like before, the content of /foo/a is shadowed, and when opening it, one will instead see the contents of FS B’s unnamed root node.

QEMU block exports

QEMU allows exporting block nodes via various protocols (as of 6.0: NBD, vhost-user, FUSE). A block node is an element of QEMU’s block graph (see e.g. Managing the New Block Layer, a talk given at KVM Forum 2017), which can for example be attached to guest devices. Here is a very simple example:

Fig. 3: A simple block graph for attaching a qcow2 image to a virtio-blk guest device

This is the simplest example for a block graph that connects a virtio-blk guest device to a qcow2 image file. The file block driver, instanced in the form of a block node named prot-node, accesses the actual file and provides the node above it access to the raw content. This node above, named fmt-node, is handled by the qcow2 block driver, which is capable of interpreting the qcow2 format. Parents of this node will therefore see the actual content of the virtual disk that is represented by the qcow2 image. There is only one parent here, which is the virtio-blk guest device, which will thus see the virtual disk.

The command line to achieve the above could look something like this:
```
$ qemu-system-x86_64 \
    -blockdev node-name=prot-node,driver=file,filename=$image_path \
    -blockdev node-name=fmt-node,driver=qcow2,file=prot-node \
    -device virtio-blk,drive=fmt-node
```
Besides attaching guest devices to block nodes, you can also export them for users outside of QEMU, for example via NBD. If you have a QMP channel open for the QEMU instance above, you could do this:
```
{
    "execute": "nbd-server-start",
    "arguments": {
        "addr": {
            "type": "inet",
            "data": {
                "host": "localhost",
                "port": "10809"
            }
        }
    }
}
{
    "execute": "block-export-add",
    "arguments": {
        "type": "nbd",
        "id": "fmt-node-export",
        "node-name": "fmt-node",
        "name": "guest-disk"
    }
}
```
This opens an NBD server on localhost:10809, which exports fmt-node (under the NBD export name guest-disk). The block graph looks as follows:

Fig. 4: Block graph extended by an NBD server

NBD clients connecting to this server will see the raw disk as seen by the guest – we have exported the guest disk:
```
$ qemu-img info nbd://localhost/guest-disk
image: nbd://localhost:10809/guest-disk
file format: raw
virtual size: 20 GiB (21474836480 bytes)
disk size: unavailable
```
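The two QMP commands above can also be generated from a script. The following is a minimal sketch that only reuses the command and argument names shown in the example; the helper name `nbd_export_commands` is made up for illustration, and actually sending the commands over a QMP socket (including the initial capabilities negotiation) is left out.

```python
import json


def nbd_export_commands(node_name, export_name, host="localhost", port="10809"):
    """Build the two QMP commands from the example above as Python dicts.

    Hypothetical helper: only the command and argument names are taken
    from the example; nothing is sent to QEMU here.
    """
    start_server = {
        "execute": "nbd-server-start",
        "arguments": {
            "addr": {
                "type": "inet",
                "data": {"host": host, "port": port},
            }
        },
    }
    add_export = {
        "execute": "block-export-add",
        "arguments": {
            "type": "nbd",
            "id": f"{node_name}-export",
            "node-name": node_name,
            "name": export_name,
        },
    }
    return [start_server, add_export]


if __name__ == "__main__":
    # Print one JSON document per line, ready to be written to a QMP channel.
    for command in nbd_export_commands("fmt-node", "guest-disk"):
        print(json.dumps(command))
```

Keeping the commands as plain dicts until the last moment makes it easy to inspect or modify them before serialising.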
QEMU storage daemon

If you are not running a guest, and so do not need guest devices, but all you want is to use the QEMU block layer (for example to interpret the qcow2 format) and export nodes from the block graph, then you can use the more lightweight QEMU storage daemon instead of a full-blown QEMU process:
```
$ qemu-storage-daemon \
    --blockdev node-name=prot-node,driver=file,filename=$image_path \
    --blockdev node-name=fmt-node,driver=qcow2,file=prot-node \
    --nbd-server addr.type=inet,addr.host=localhost,addr.port=10809 \
    --export type=nbd,id=fmt-node-export,node-name=fmt-node,name=guest-disk
```
Which creates the following block graph:

Fig. 5: Exporting a qcow2 image over NBD

FUSE block exports

Besides NBD exports, QEMU also supports vhost-user and FUSE exports. FUSE block exports make QEMU become a FUSE driver that provides a filesystem that consists of only a single node, namely a regular file that has the raw contents of the exported block node. QEMU will automatically mount this filesystem on a given existing regular file (which acts as the mount point, as described in the “File mounts” section).

Thus, FUSE exports can be used like this:
```
$ touch mount-point

$ qemu-storage-daemon \
  --blockdev node-name=prot-node,driver=file,filename=$image_path \
  --blockdev node-name=fmt-node,driver=qcow2,file=prot-node \
  --export type=fuse,id=fmt-node-export,node-name=fmt-node,mountpoint=mount-point
```
The mount point now appears as the raw VM disk that is stored in the qcow2 image:
```
$ qemu-img info mount-point
image: mount-point
file format: raw
virtual size: 20 GiB (21474836480 bytes)
disk size: 196 KiB
```
And mount tells us that this is indeed its own filesystem:
```
$ mount | grep mount-point
/dev/fuse on /tmp/mount-point type fuse (rw,nosuid,nodev,relatime,user_id=1000,
group_id=100,default_permissions,allow_other,max_read=67108864)
```
The block graph looks like this:

Fig. 6: Exporting a qcow2 image over FUSE

Closing the storage daemon (e.g. with Ctrl-C) automatically unmounts the export, turning the mount point back into an empty normal file:
```
$ mount | grep -c mount-point
0

$ qemu-img info mount-point
image: mount-point
file format: raw
virtual size: 0 B (0 bytes)
disk size: 0 B
```
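When the export is set up from a script, the storage daemon command line shown earlier in this section can be assembled as an argument list. This is only a sketch: `fuse_export_argv` is a hypothetical helper, it reuses exactly the options from the command above, and it does not escape special characters (such as commas) in file names.

```python
import shlex


def fuse_export_argv(image_path, mountpoint, image_format="qcow2"):
    """Return the qemu-storage-daemon argument list for a FUSE export.

    Hypothetical helper: it only reuses the options from the command
    shown above and does not start the daemon.
    """
    return [
        "qemu-storage-daemon",
        "--blockdev",
        f"node-name=prot-node,driver=file,filename={image_path}",
        "--blockdev",
        f"node-name=fmt-node,driver={image_format},file=prot-node",
        "--export",
        f"type=fuse,id=fmt-node-export,node-name=fmt-node,mountpoint={mountpoint}",
    ]


if __name__ == "__main__":
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(fuse_export_argv("foo.qcow2", "mount-point")))
```

The resulting list could be handed to `subprocess.Popen`; remember that the mount point must already exist as a regular file.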
Mounting an image on itself

So far, we have seen what FUSE exports are, how they work, and how they can be used. Now let’s add an interesting twist.

What happens to the old tree under a mount point?

Mounting a filesystem only shadows the mount point’s original content, it does not remove it. The original content can no longer be looked up via its (absolute) path, but it is still there, much like a file that has been unlinked but is still open in some process. Here is an example:

First, create some file in some directory, and have some process keep it open:
```
$ mkdir foo

$ echo 'Is anyone there?' > foo/bar

$ irb
irb(main):001:0> f = File.open('foo/bar', 'r+')
=> #<File:foo/bar>
irb(main):002:0> ^Z
[1]  + 35494 suspended  irb
```
Next, mount something on the directory:
```
$ sudo mount -t tmpfs tmpfs foo
```
The file cannot be found anymore (because foo’s content is shadowed by the mounted filesystem), but the process that kept it open can still read from it, and write to it:
```
$ ls foo

$ cat foo/bar
cat: foo/bar: No such file or directory

$ fg
f.read
irb(main):002:0> f.read
=> "Is anyone there?\n"
irb(main):003:0> f.puts('Hello from the shadows!')
=> nil
irb(main):004:0> exit

$ ls foo

$ cat foo/bar
cat: foo/bar: No such file or directory
```
Unmounting the filesystem lets us see our file again, with its updated content:
```
$ sudo umount foo

$ ls foo
bar

$ cat foo/bar
Is anyone there?
Hello from the shadows!
```
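The comparison with an unlinked but still open file can be tried without root rights. The sketch below is an analogy rather than a mount: it removes a file's name while keeping the file open, and shows that reads and writes through the open handle still work. It assumes a POSIX system, and unlike unmounting, the unlink cannot be undone.

```python
import os
import tempfile


def read_after_unlink():
    """Show that an open file stays usable after its name is removed."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "bar")
        with open(path, "w") as f:
            f.write("Is anyone there?\n")

        # Keep the file open, then remove its name from the directory.
        with open(path, "r+") as f:
            os.unlink(path)
            still_visible = os.path.exists(path)

            # The inode is still reachable through the open file object.
            original = f.read()
            f.seek(0, os.SEEK_END)
            f.write("Hello from the shadows!\n")
            f.seek(0)
            updated = f.read()

    return still_visible, original, updated


if __name__ == "__main__":
    print(read_after_unlink())
```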
Letting a FUSE export shadow its image file

The same principle applies to file mounts: The original inode is shadowed (along with its content), but it is still there for any process that opened it before the mount occurred. Because QEMU (or the storage daemon) opens the image file before mounting the FUSE export, you can therefore specify an image’s path as the mount point for its corresponding export:
```
$ qemu-img create -f qcow2 foo.qcow2 20G
Formatting 'foo.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off
 compression_type=zlib size=21474836480 lazy_refcounts=off refcount_bits=16

$ qemu-img info foo.qcow2
image: foo.qcow2
file format: qcow2
virtual size: 20 GiB (21474836480 bytes)
disk size: 196 KiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
    extended l2: false

$ qemu-storage-daemon --blockdev \
   node-name=node0,driver=qcow2,file.driver=file,file.filename=foo.qcow2 \
   --export type=fuse,id=node0-export,node-name=node0,mountpoint=foo.qcow2 &
[1] 40843

$ qemu-img info foo.qcow2
image: foo.qcow2
file format: raw
virtual size: 20 GiB (21474836480 bytes)
disk size: 196 KiB

$ kill %1
[1]  + 40843 done       qemu-storage-daemon --blockdev  --export
```
In graph form, that looks like this:

Fig. 7: Exporting a qcow2 image via FUSE on its own path

QEMU (or the storage daemon in this case) keeps the original (qcow2) file open, and so it keeps access to it, even after the mount. However, any other process that opens the image by name (i.e. open("foo.qcow2")) will open the raw disk image exported by QEMU. Therefore, it looks like the qcow2 image is in raw format now.

qemu-fuse-disk-export.py

Because the QEMU storage daemon command line tends to become kind of long, I’ve written a script to facilitate the process: qemu-fuse-disk-export.py (direct download link). This script automatically detects the image format, and its --daemonize option allows safe use in scripts, where it is important that the process blocks until the export is fully set up.

Using qemu-fuse-disk-export.py, the above example looks like this:
```
$ qemu-img info foo.qcow2 | grep 'file format'
file format: qcow2

$ qemu-fuse-disk-export.py foo.qcow2 &
[1] 13339
All exports set up, ^C to revert

$ qemu-img info foo.qcow2 | grep 'file format'
file format: raw

$ kill -SIGINT %1
[1]  + 13339 done       qemu-fuse-disk-export.py foo.qcow2

$ qemu-img info foo.qcow2 | grep 'file format'
file format: qcow2
```
Or, with --daemonize/-d:
```
$ qemu-img info foo.qcow2 | grep 'file format'
file format: qcow2

$ qemu-fuse-disk-export.py -dp qfde.pid foo.qcow2

$ qemu-img info foo.qcow2 | grep 'file format'
file format: raw

$ kill -SIGINT $(cat qfde.pid)

$ qemu-img info foo.qcow2 | grep 'file format'
file format: qcow2
```
Bringing it all together

Now we know how to make disk images in any format understood by QEMU appear as raw images. We can thus run any application on them that works with such raw disk images:
```
$ qemu-fuse-disk-export.py \
    -dp qfde.pid \
    Arch-Linux-x86_64-basic-20210711.28787.qcow2

$ parted Arch-Linux-x86_64-basic-20210711.28787.qcow2 p
WARNING: You are not superuser.  Watch out for permissions.
Model:  (file)
Disk /tmp/Arch-Linux-x86_64-basic-20210711.28787.qcow2: 42.9GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  2097kB  1049kB                     bios_grub
 2      2097kB  42.9GB  42.9GB  btrfs

$ sudo kpartx -av Arch-Linux-x86_64-basic-20210711.28787.qcow2
add map loop0p1 (254:0): 0 2048 linear 7:0 2048
add map loop0p2 (254:1): 0 83881951 linear 7:0 4096

$ sudo mount /dev/mapper/loop0p2 /mnt/tmp

$ ls /mnt/tmp
bin   boot  dev  etc  home  lib  lib64  mnt  opt  proc  root  run  sbin  srv
swap  sys   tmp  usr  var

$ echo 'Hello, qcow2 image!' > /mnt/tmp/home/arch/hello

$ sudo umount /mnt/tmp

$ sudo kpartx -d Arch-Linux-x86_64-basic-20210711.28787.qcow2
loop deleted : /dev/loop0

$ kill -SIGINT $(cat qfde.pid)
```
After launching the image, we see in the guest:
```
[arch@archlinux ~] cat hello
Hello, qcow2 image!
```
A note on allow_other

In the example presented in the above section, we access the exported image with a different user than the one who exported it (to be specific, we export it as a normal user, and then access it as root). This does not work prior to QEMU 6.1:
```
$ qemu-fuse-disk-export.py -dp qfde.pid foo.qcow2

$ sudo stat foo.qcow2
stat: cannot statx 'foo.qcow2': Permission denied
```
QEMU 6.1 has introduced support for FUSE’s allow_other mount option. Without that option, only the user who exported the image has access to it. By default, if the system allows for non-root users to add allow_other to FUSE mount options, QEMU will add it, and otherwise omit it. It does so by simply attempting to mount the export with allow_other first, and if that fails, it will try again without. (You can also force the behavior with the allow_other=(on|off|auto) export parameter.)

Non-root users can pass allow_other if and only if /etc/fuse.conf contains the user_allow_other option.
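Whether a non-root export will get allow_other can therefore be predicted by looking at /etc/fuse.conf. The following is a simplified sketch with a hypothetical helper name; it assumes that blank lines and lines starting with `#` can be ignored and that the option stands alone on its line.

```python
def allows_user_allow_other(fuse_conf_text):
    """Return True if the given fuse.conf content enables user_allow_other.

    Simplified parser for illustration: blank lines and lines starting
    with '#' are ignored, and the option must appear alone on its line.
    """
    for line in fuse_conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if stripped == "user_allow_other":
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/etc/fuse.conf") as f:
            print(allows_user_allow_other(f.read()))
    except OSError:
        # A missing or unreadable file is treated as "option not set".
        print(False)
```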

Conclusion

As shown in this blog post, FUSE block exports are a relatively simple way to access images in any format understood by QEMU as if they were raw images. Any tool that can manipulate raw disk images can thus manipulate images in any format, simply by having the QEMU storage daemon provide a translation layer. By mounting the FUSE export on the original image path, this translation layer will effectively be invisible, and the original image will look like it is in raw format, so it can directly be accessed by those tools.

The current main disadvantage of FUSE exports is that they offer relatively poor performance. That should be fine as long as your use case is just light manipulation of some VM images, like manually modifying some files on them. However, we have not yet really tried to optimize performance, so if more serious use cases appear that require better performance, we can try.

/2021/08/22/fuse-blkexport/
Cache Modelling TCG Plugin
• sorangutan

Caches are a key mechanism that enables modern CPUs to keep running at full speed by avoiding the need to fetch data and instructions from the comparatively slow system memory. As a result, understanding cache behaviour is a key part of performance optimisation.

TCG plugins provide a means to instrument generated code for both user-mode and full system emulation. This includes the ability to intercept every memory access and instruction execution. This post introduces a new TCG plugin that simulates configurable, separate L1 instruction and data caches.

While different microarchitectures often have different approaches at the very low level, the core concepts of caching are universal. As QEMU is not a microarchitectural emulator we model an ideal caching system with a few simple parameters. By doing so, we can adequately simulate the behaviour of L1 private (per-core) caches.

Overview

The plugin simulates how user-configured L1 caches would behave when given a working set defined by a program in user mode, or a system-wide working set. It then logs performance statistics along with the top N most cache-thrashing instructions.

Configurability

The plugin is configurable in terms of:
- icache size parameters: icachesize, iblksize, iassoc. All of which take a numeric value
- dcache size parameters: dcachesize, dblksize, dassoc. All of which take a numeric value
- Eviction policy: evict=lru|rand|fifo
- How many top-most thrashing instructions to log: limit=TOP_N
- How many core caches to keep track of: cores=N_CORES
Multicore caching

Multicore caching is achieved by having independent L1 caches for each available core.

In full-system emulation, the number of available vCPUs is known to the plugin at plugin installation time, so separate caches are maintained for those.

In user-space emulation, the index of the vCPU initiating a memory access increases monotonically and is limited only by how many threads the kernel allows to be created. The approach used is to allocate a static number of caches and fit all memory accesses into those cores. This approximation is sufficiently similar to real systems, since having more threads than cores results in those threads being interleaved on the available cores, so they might thrash each other anyway.
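One simple way to fit an unbounded vCPU index into a fixed number of modelled cores is a modulo mapping. The sketch below assumes that mapping for illustration; the function name is hypothetical and the plugin's actual choice may differ.

```python
def cache_index(vcpu_index, n_cores):
    """Pick the modelled core whose caches serve a given vCPU.

    Assumption for illustration: excess vCPU indices wrap around with a
    modulo; the plugin's actual mapping may differ.
    """
    if n_cores <= 0:
        raise ValueError("n_cores must be positive")
    return vcpu_index % n_cores


if __name__ == "__main__":
    # With 4 modelled cores, thread indices 0..9 are folded onto cores 0..3.
    print([cache_index(i, 4) for i in range(10)])
```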

Design and implementation

General structure

A generic cache data structure, Cache, is used to model either an icache or dcache. For each known core, the plugin maintains an icache and a dcache. On a memory access coming from a core, the corresponding cache is interrogated.

Each cache has a number of cache sets that are used to store the actual cached locations alongside metadata that backs eviction algorithms. The structure of a cache with n sets and m blocks per set is summarized in the following figure:
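To make the set structure concrete, here is a minimal sketch of how an address can be split into a tag, a set index and a block offset for a set-associative cache. It assumes power-of-two sizes and uses the conventional textbook layout; the function names are hypothetical and the plugin's internal representation may differ.

```python
def cache_geometry(cachesize, blksize, assoc):
    """Return (number of sets, block offset bits, set index bits)."""
    num_sets = cachesize // (blksize * assoc)
    # Both values are assumed to be powers of two so that bit masks work.
    if blksize & (blksize - 1) or num_sets & (num_sets - 1) or num_sets == 0:
        raise ValueError("blksize and the number of sets must be powers of two")
    return num_sets, blksize.bit_length() - 1, num_sets.bit_length() - 1


def split_address(addr, cachesize, blksize, assoc):
    """Split an address into (tag, set index, block offset)."""
    num_sets, offset_bits, set_bits = cache_geometry(cachesize, blksize, assoc)
    offset = addr & (blksize - 1)
    set_index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + set_bits)
    return tag, set_index, offset


if __name__ == "__main__":
    # An 8192-byte, 4-way cache with 64-byte blocks.
    print(cache_geometry(8192, 64, 4))
    print(split_address(0x1224, 8192, 64, 4))
```

All blocks that share a set index compete for the same m slots, which is why the access pattern matters as much as the total cache size.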

Eviction algorithms

The plugin supports three eviction algorithms:
- Random eviction
- Least recently used (LRU)
- FIFO eviction
Random eviction

On a cache miss that requires eviction, a randomly chosen block is evicted to make room for the newly-fetched block.

Using random eviction effectively requires no metadata for each set.

Least recently used (LRU)

For each set, a generation number is maintained that is incremented on each memory access. The current generation number is assigned to the block currently being accessed. On a cache miss, the block with the lowest generation number is evicted.

FIFO eviction

A FIFO queue instance is maintained for each set. On a cache miss, the evicted block is the first-in block, and the newly-fetched block is enqueued as the last-in block.
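The three policies can be compared on a single set with a few lines of code. This is a sketch of the ideas described above, not the plugin's implementation: the class and attribute names are made up, and only the policy names `lru`, `fifo` and `rand` are taken from the plugin's `evict` option.

```python
import random
from collections import deque


class CacheSet:
    """One cache set holding up to `assoc` block tags."""

    def __init__(self, assoc, policy="lru", rng=None):
        if policy not in ("lru", "fifo", "rand"):
            raise ValueError("policy must be 'lru', 'fifo' or 'rand'")
        self.assoc = assoc
        self.policy = policy
        self.rng = rng or random.Random()
        self.blocks = []      # tags currently cached in this set
        self.generation = 0   # LRU: bumped on every access to the set
        self.last_used = {}   # LRU: tag -> generation of its last access
        self.queue = deque()  # FIFO: tags in insertion order

    def access(self, tag):
        """Access `tag`; return True on a hit and False on a miss."""
        self.generation += 1
        hit = tag in self.blocks
        if not hit:
            if len(self.blocks) == self.assoc:
                self._evict()
            self.blocks.append(tag)
            self.queue.append(tag)
        self.last_used[tag] = self.generation
        return hit

    def _evict(self):
        if self.policy == "lru":
            victim = min(self.blocks, key=self.last_used.__getitem__)
        elif self.policy == "fifo":
            victim = self.queue[0]
        else:
            victim = self.rng.choice(self.blocks)
        self.blocks.remove(victim)
        self.queue.remove(victim)
        del self.last_used[victim]


if __name__ == "__main__":
    for policy in ("lru", "fifo"):
        cache_set = CacheSet(2, policy)
        print(policy, [cache_set.access(t) for t in (1, 2, 1, 3, 1, 2)])
```

With two blocks per set, the access sequence 1, 2, 1, 3, 1, 2 separates the policies: LRU keeps block 1 because it was touched recently, while FIFO evicts it because it entered first.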

Usage

We now demonstrate a simple use of the plugin by running a program that does matrix multiplication, and show how the plugin helps identify code that thrashes the cache.

A program, test_mm, uses the following function to carry out matrix multiplication:
```
void mm(int n, int m1[n][n], int m2[n][n], int res[n][n])
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            int sum = 0;
            for (int k = 0; k < n; k++) {
                int op1 = m1[i][k];
                int op2 = m2[k][j];
                sum += op1 * op2;
            }
            res[i][j] = sum;
        }
    }
}
```
We run mm_test inside QEMU using the following command:
```
./qemu-x86_64 $(QEMU_ARGS) \
  -plugin ./contrib/plugins/libcache.so,dcachesize=8192,dassoc=4,dblksize=64,\
      icachesize=8192,iassoc=4,iblksize=64 \
  -d plugin \
  -D matmul.log \
  ./mm_test
```
The preceding command will run QEMU and attach the plugin with the following configuration:
- dcache: cache size = 8 KB, associativity = 4, block size = 64 B.
- icache: cache size = 8 KB, associativity = 4, block size = 64 B.
- Default eviction policy is LRU (used for both caches).
- Default number of cores is 1.
The following data is logged in matmul.log:
```
core #, data accesses, data misses, dmiss rate, insn accesses, insn misses, imiss rate
0       4908419        274545          5.5933%  8002457        1005            0.0126%

address, data misses, instruction
0x4000001244 (mm), 262138, movl (%rdi, %rsi, 4), %esi
0x400000121c (mm), 5258, movl (%rdi, %rsi, 4), %esi
0x4000001286 (mm), 4096, movl %edi, (%r8, %rsi, 4)
0x400000199c (main), 257, movl %edx, (%rax, %rcx, 4)

...
```
We can observe two things from the logs:
- The most cache-thrashing instructions belong to a symbol called mm, which happens to be the matrix multiplication function.
- Some array-indexing instructions are generating the greatest share of data misses.
test_mm does a number of operations other than matrix multiplication. However, using the plugin data, we can narrow our investigation to mm, which generates about 98% of all data misses.

Now we need to find out why the instruction at address 0x4000001224 is thrashing the cache. Looking at the disassembly of the program, using objdump -Sl test_mm:
```
/path/to/test_mm.c:11 (discriminator 3)
                int op2 = m2[k][j];  <- The line of code we're interested in
    1202:   8b 75 c0               mov    -0x40(%rbp),%esi
    1205:   48 63 fe               movslq %esi,%rdi
    1208:   48 63 f2               movslq %edx,%rsi
    120b:   48 0f af f7            imul   %rdi,%rsi
    120f:   48 8d 3c b5 00 00 00   lea    0x0(,%rsi,4),%rdi
    1216:   00
    1217:   48 8b 75 a8            mov    -0x58(%rbp),%rsi
    121b:   48 01 f7               add    %rsi,%rdi
    121e:   8b 75 c8               mov    -0x38(%rbp),%esi
    1221:   48 63 f6               movslq %esi,%rsi
    1224:   8b 34 b7               mov    (%rdi,%rsi,4),%esi
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^
    1227:   89 75 d4               mov    %esi,-0x2c(%rbp)
```
It can be seen that the most problematic instruction is the one that loads m2[k][j]. This happens because we’re traversing m2 column-wise. If the matrix m2 is larger than the data cache, we end up fetching blocks from which we use only one integer, and each block is evicted before we touch it again.

A simple solution to this problem is to transpose the second matrix and access it in a row-wise order.
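The source of test_mm.c is not shown here, so the following C sketch is only an assumed reconstruction of the idea: a naive product whose inner loop walks m2 column-wise, a transpose helper, and a product that reads the transposed matrix row-wise. The function names mm_colwise and mm_rowwise, the flat row-major layout, and the signatures are illustrative choices; only the names mm, tran, m1 and m2 come from the logs and text above.

```c
#include <assert.h>
#include <stddef.h>

/* Naive product: the inner loop reads m2[k][j], so consecutive
 * iterations are n ints apart in memory (column-wise traversal). */
static void mm_colwise(size_t n, const int *m1, const int *m2, int *out)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            int sum = 0;
            for (size_t k = 0; k < n; k++) {
                sum += m1[i * n + k] * m2[k * n + j];
            }
            out[i * n + j] = sum;
        }
    }
}

/* Out-of-place transpose: dst[j][i] = src[i][j]. */
static void tran(size_t n, const int *src, int *dst)
{
    assert(src != dst); /* in-place transposition is not supported here */
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            dst[j * n + i] = src[i * n + j];
        }
    }
}

/* Same product, but m2t holds the transposed m2, so the inner loop
 * reads m2t[j][k] and touches consecutive ints (row-wise traversal). */
static void mm_rowwise(size_t n, const int *m1, const int *m2t, int *out)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            int sum = 0;
            for (size_t k = 0; k < n; k++) {
                sum += m1[i * n + k] * m2t[j * n + k];
            }
            out[i * n + j] = sum;
        }
    }
}
```

Both variants compute the same result. The row-wise one pays for a single transposition pass up front and in return uses every integer of each fetched block in the inner loop.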

After editing the program to transpose m2 before calling mm and running it inside QEMU with the plugin attached, using the same configuration as before, the following data is logged in matmul.log:
```
core #, data accesses, data misses, dmiss rate, insn accesses, insn misses, imiss rate
0       4998994        24235           0.4848%  8191937        1009            0.0123%

address, data misses, instruction
0x4000001244 (mm), 16447, movl (%rdi, %rsi, 4), %esi
0x4000001359 (tran), 3994, movl (%rcx, %rdx, 4), %ecx
0x4000001aa7 (main), 257, movl %edx, (%rax, %rcx, 4)
0x4000001a72 (main), 257, movl %ecx, (%rax, %rdx, 4)

...
```
It can be seen that a small number of misses is generated at transposition time in tran. The rest of the matrix multiplication is carried out using the same procedure, but multiplying m1[i][k] by m2[j][k]. So m2 is traversed row-wise and hence uses cache space much more efficiently.

Multi-core caching

The plugin accepts a cores=N_CORES argument that sets the number of cores the plugin keeps track of. Memory accesses generated by excess threads are served through the available core caches. The model is an approximation and comes closest to idealized behaviour when the number of threads generated by the program is less than the number of cores available; otherwise, inter-thread thrashing will invariably occur.
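One way to picture how excess threads could share the available core caches is a modulo mapping from thread index to cache index. The sketch below is an assumption made for illustration: the function name is hypothetical, and the text above does not confirm that the mapping is a modulo.

```c
#include <assert.h>

/* Hypothetical mapping from a thread (vCPU) index to the index of the
 * per-core cache that serves its accesses. Assumed, not confirmed. */
static unsigned cache_for_thread(unsigned thread_index, unsigned cores)
{
    assert(cores > 0);
    return thread_index % cores;
}
```

Under this assumption and with cores=4, threads 0 and 4 would both be served by cache 0 and could evict each other's blocks, which is the inter-thread thrashing mentioned above.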

An example usage of the plugin using the cores argument to use 4 per-core caches against a multithreaded program:
```
./qemu-x86_64 $(QEMU_ARGS) \
    -plugin ./contrib/plugins/libcache.so,cores=4 \
    -d plugin \
    -D logfile \
    ./threaded_prog
```
This reports the following:
```
core #, data accesses, data misses, dmiss rate, insn accesses, insn misses, imiss rate
0       76739          4195          5.4666%  242616         1555            0.6409%
1       29029          932           3.2106%  70939          988             1.3927%
2       6218           285           4.5835%  15702          382             2.4328%
3       6608           297           4.4946%  16342          384             2.3498%
sum     118594         5709          4.8139%  345599         3309            0.9575%

...
```
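The rate columns and the sum row follow from the per-core counters: a miss rate is misses divided by accesses, expressed as a percentage, and the sum row adds up the per-core accesses and misses before taking the rate. For the data columns above this gives 5709 / 118594 ≈ 4.81%. A minimal C sketch of that arithmetic, with hypothetical type and function names that are not taken from the plugin:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-core counters for one cache (data or instruction). */
struct core_stats {
    uint64_t accesses;
    uint64_t misses;
};

/* Miss rate in percent; defined as 0 when there were no accesses. */
static double miss_rate_pct(uint64_t misses, uint64_t accesses)
{
    return accesses ? 100.0 * (double)misses / (double)accesses : 0.0;
}

/* Aggregate the counters of n cores into one "sum" row. */
static struct core_stats sum_stats(const struct core_stats *cores, size_t n)
{
    assert(cores != NULL || n == 0);
    struct core_stats total = {0, 0};
    for (size_t i = 0; i < n; i++) {
        total.accesses += cores[i].accesses;
        total.misses += cores[i].misses;
    }
    return total;
}
```

Note that the rate in the sum row is computed from the summed counters, so it is weighted by how many accesses each core made and is not the plain average of the per-core rates.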
Conclusion

By emulating simple configurations of icache and dcache, we can gain insights into how a working set utilizes cache memory. Simplicity is sought and the L1 cache is emphasized, since under-utilizing it can severely degrade overall system performance.

This plugin is made as part of my GSoC participation for the year 2021 under the mentorship of Alex Bennée.

List of posted patches related to the plugin:
The first series (plugins: New TCG plugin for cache modelling), along with the bug-fix patches, is already merged into the QEMU main tree. The remaining patches are merged into the plugins/next tree and await merging into the main tree, since we’re in a release cycle at the time of posting.

/2021/08/19/tcg-cache-modelling-plugin/
QEMU version 6.0.0 released
• sorangutan

We’d like to announce the availability of the QEMU 6.0.0 release. This release contains 3300+ commits from 268 authors.

You can grab the tarball from our download page. The full list of changes is available in the Wiki.

Highlights include:
- 68k: new ‘virt’ machine type based on virtio devices
- ARM: support for ARMv8.1-M ‘Helium’ architecture and Cortex-M55 CPU
- ARM: support for ARMv8.4 TTST, SEL2, and DIT extensions
- ARM: ARMv8.5 MemTag extension now available for both system and usermode emulation
- ARM: support for new mps3-an524, mps3-an547 board models
- ARM: additional device emulation support for xlnx-zynqmp, xlnx-versal, sbsa-ref, npcm7xx, and sabrelite board models
- Hexagon: new emulation support for Qualcomm hexagon DSP units
- MIPS: new Loongson-3 ‘virt’ machine type
- PowerPC: external BMC support for powernv machine type
- PowerPC: pseries machines now report memory unplug failures to management tools, as well as retrying unsuccessful CPU unplug requests
- RISC-V: Microchip PolarFire board now supports QSPI NOR flash
- Tricore: support for new TriBoard board model emulating Infineon TC27x SoC
- x86: AMD SEV-ES support for running guests with secured CPU register state
- x86: TCG emulation support for protection keys (PKS)
- ACPI: support for assigning NICs to known names in guest OS independently of PCI slot placement
- NVMe: new emulation support for v1.4 spec with many new features, experimental support for Zoned Namespaces, multipath I/O, and End-to-End Data Protection.
- virtiofs: performance improvements with new USE_KILLPRIV_V2 guest feature
- VNC: virtio-vga support for scaling resolution based on client window size
- QMP: backup jobs now support multiple asynchronous requests in parallel
- and lots more…
Thank you to everyone involved!

/2021/04/30/qemu-6-0-0/
Google Summer of Code 2021 is on!
• sorangutan

QEMU has been accepted into Google Summer of Code 2021 and we look forward to mentoring talented students from around the world as they make open source contributions this summer. GSoC is a remote work open source internship program where students work on a project for an open source organization like QEMU.

Check out the project ideas page where there are 10 projects that eligible students can apply for. This year we have C, Rust, and Python projects in various areas related to emulation and virtualization.

If you are a student who is interested in doing an internship this summer, head over to QEMU’s GSoC organization page where you can read about how to apply and learn more about Google Summer of Code in general.

The GSoC 2021 timeline is:
- Student application period - March 29 - April 13
- Student projects announced - May 17
- Community bonding period - May 17 - June 7
- Coding - June 7 - August 16
We look forward to meeting you and answering questions on the #qemu-gsoc IRC channel on irc.oftc.net!

/2021/03/10/gsoc-and-outreachy-2021-timelines/
QEMU is applying to Google Summer of Code and Outreachy 2021
• sorangutan


QEMU is applying to Google Summer of Code 2021 and is participating in Outreachy May-August 2021. Both of these open source internship programs offer remote work opportunities for new developers wishing to get involved in our community.

Interns work with mentors who support them in their project. The code developed during the project is submitted via the same open source development process that all QEMU code follows. This gives interns experience with contributing to open source software.

QEMU’s mentors are experienced contributors who enjoy working with talented individuals who are getting started in open source. You can find a list of project ideas that mentors are proposing here.

Outreachy

Initial applications are open until February 22nd at 16:00 UTC. Outreachy’s goal is to increase diversity in open source and is open to anyone who faces under-representation, systemic bias, or discrimination in the technology industry of their country.

You can learn more about Outreachy May-August and how to apply at the Outreachy website.

Google Summer of Code

Google Summer of Code (GSOC) is a 10-week internship for students. Applications are open from March 29th to April 13th. You can find the details of how to apply at the Google Summer of Code website.

Google will announce accepted organizations on March 9th. QEMU is applying and we hope to mentor GSoC interns again this year!

Please review the eligibility criteria for GSoC before applying.

/2021/02/17/gsoc-and-outreachy-2021/
