<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Akihiro Suda on Medium]]></title>
        <description><![CDATA[Stories by Akihiro Suda on Medium]]></description>
        <link>https://medium.com/@AkihiroSuda?source=rss-814b1fd299ce------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/0*8V-MwAEKzwAfS9tk</url>
            <title>Stories by Akihiro Suda on Medium</title>
            <link>https://medium.com/@AkihiroSuda?source=rss-814b1fd299ce------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Mon, 15 Jun 2026 23:25:09 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@AkihiroSuda/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Improvements to Rootless mode in Docker v29.5]]></title>
            <link>https://medium.com/nttlabs/improvements-to-rootless-mode-in-docker-v29-5-4e0347464ad0?source=rss-814b1fd299ce------2</link>
            <guid isPermaLink="false">https://medium.com/p/4e0347464ad0</guid>
            <category><![CDATA[docker]]></category>
            <category><![CDATA[rootless]]></category>
            <category><![CDATA[user-mode]]></category>
            <dc:creator><![CDATA[Akihiro Suda]]></dc:creator>
            <pubDate>Fri, 15 May 2026 20:31:33 GMT</pubDate>
            <atom:updated>2026-05-15T20:31:33.073Z</atom:updated>
            <content:encoded><![CDATA[<p>Rootless mode, which enables running Docker daemon without root privileges, has been significantly improved in <a href="https://docs.docker.com/engine/release-notes/29/#2950">Docker v29.5</a> (May 15, 2026):</p><ul><li>Faster image pulling and pushing</li><li>Support for docker run --net=host</li><li>Support for localhost registries</li><li>Source IP propagation without the legacy slirp4netns dependency</li></ul><h3>What is Rootless mode?</h3><p>Rootless mode means running the entire Docker daemon (not just containers) as a non-root user, for protecting the host from potential Docker vulnerabilities and misconfigurations. Even if an attacker escapes from a container, they can access only the files and processes available to the non-root daemon user.</p><p>Rootless mode itself is not new; it was originally implemented in 2018 and has been merged into Docker since <a href="https://docs.docker.com/engine/release-notes/19.03/#19030">v19.03</a> (2019).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/638/0*u-0uZGTVLLQ8EGcj" /><figcaption><a href="https://www.slideshare.net/Docker/dcsf19-hardening-docker-daemon-with-rootless-mode/3">https://www.slideshare.net/Docker/dcsf19-hardening-docker-daemon-with-rootless-mode/3</a></figcaption></figure><p><a href="https://medium.com/nttlabs/rootless-docker-12decb900fb9">RootlessモードでDockerをより安全にする [DockerCon発表レポート]</a></p><h4>Getting started</h4><p>To get started with Rootless Docker, install the docker-ce-rootless-extras package and run dockerd-rootless-setuptool.sh install as a non-root user.</p><pre># See https://docs.docker.com/engine/install for how to configure apt<br>sudo apt install docker-ce-rootless-extras<br><br>dockerd-rootless-setuptool.sh install</pre><p>See also:</p><ul><li><a href="https://docs.docker.com/engine/security/rootless/">https://docs.docker.com/engine/security/rootless/</a></li><li><a 
href="https://rootlesscontaine.rs/">https://rootlesscontaine.rs/</a></li></ul><h3>Improvements in Docker v29.5</h3><p>Rootless Docker had been notorious for its limitations in networking, as the entire daemon was encapsulated in a network namespace (NetNS) associated with a user-mode TCP/IP stack such as <a href="https://github.com/rootless-containers/slirp4netns">slirp4netns</a>.</p><p>Such limitations included:</p><ul><li>Poor throughput of image pulling and pushing (typically less than 10 Gbps)</li><li>Lack of support for docker run --net=host</li><li>Lack of support for localhost registries (docker pull localhost:PORT/IMAGE)</li></ul><p><strong>These limitations have been resolved in Docker v29.5</strong>, by moving the daemon out of the NetNS associated with the user-mode TCP/IP stack.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*dPqFlHLmvHcFuRhzPPQ2WA.png" /><figcaption>NetNS for User-mode TCP/IP is now detached from dockerd</figcaption></figure><p>Notably, support for docker run --net=host should be highly useful, as it allows containers to bypass the overhead of the user-mode TCP/IP stack entirely. Note that --net=host still carries the security risk of exposing abstract UNIX sockets to the containers; however, the impact is less likely to be catastrophic in rootless mode. This concern can be further alleviated by also specifying --user=SUBUSER.</p><p><a href="https://medium.com/nttlabs/dont-use-host-network-namespace-f548aeeef575">[CVE-2020-15257] Don’t use --net=host. Don’t use spec.hostNetwork.</a></p><h4>Elimination of slirp4netns dependency</h4><p>This release also replaces <a href="https://github.com/rootless-containers/slirp4netns">slirp4netns</a> with <a href="https://github.com/containers/gvisor-tap-vsock">gvisor-tap-vsock</a> in the default setup, as slirp4netns is based on very old and potentially unsafe C code <a href="https://en.wikipedia.org/wiki/Slirp">dating back to the 1990s</a>. 
In contrast, gvisor-tap-vsock is written in pure Go and is expected to have fewer potential vulnerabilities, although it is still not completely free from <a href="https://github.com/containers/gvisor-tap-vsock/blob/v0.8.8/vendor/gvisor.dev/gvisor/pkg/tcpip/checksum/checksum_unsafe.go">unsafe code</a>.</p><p>In prior releases of Rootless Docker, users often specified the environment variable DOCKERD_ROOTLESS_ROOTLESSKIT_PORT_DRIVER=slirp4netns to switch the port driver of <a href="https://github.com/rootless-containers/rootlesskit">RootlessKit</a> from builtin to slirp4netns, in order to enable source IP propagation in port forwarding (docker run -p). Otherwise, the source IP information in TCP packets was always rewritten to the IP of Docker’s bridge interface (typically 172.17.0.1).</p><p>In Docker v29.5, users no longer need to specify DOCKERD_ROOTLESS_ROOTLESSKIT_PORT_DRIVER; however, they have to disable userland-proxy to enable source IP propagation:</p><pre>mkdir -p ~/.config/docker<br>echo &#39;{&quot;userland-proxy&quot;: false}&#39; &gt;~/.config/docker/daemon.json<br>systemctl --user restart docker</pre><p>The userland-proxy is planned to be disabled by default in a future release of Docker:</p><p><a href="https://github.com/moby/moby/issues/14856">Disable Userland proxy by default · Issue #14856 · moby/moby</a></p><p>Also, depending on the host configuration, users may need to load the br_netfilter kernel module:</p><pre>sudo tee /etc/modules-load.d/docker.conf &lt;&lt;EOF &gt;/dev/null<br>br_netfilter<br>EOF<br>sudo systemctl restart systemd-modules-load.service</pre><h3>NTT is hiring!</h3><p>We at <a href="https://www.rd.ntt/e/">NTT</a> are looking for engineers who work in Open Source communities in the fields of containers, etc. 
Visit &lt;<a href="https://www.rd.ntt/e/sic/recruit/">https://www.rd.ntt/e/sic/recruit/</a>&gt; to see how to join us.</p><p>We at <a href="https://www.rd.ntt/">NTT</a> are looking for people to work with us in open source communities in fields such as containers. Please see our recruitment page (in Japanese): &lt;<a href="https://www.rd.ntt/sic/recruit/">https://www.rd.ntt/sic/recruit/</a>&gt;</p><hr><p><a href="https://medium.com/nttlabs/improvements-to-rootless-mode-in-docker-v29-5-4e0347464ad0">Improvements to Rootless mode in Docker v29.5</a> was originally published in <a href="https://medium.com/nttlabs">nttlabs</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Ubuntu 26.04 can install APT packages from GitHub Container Registry]]></title>
            <link>https://medium.com/nttlabs/ubuntu-26-04-can-install-apt-packages-from-github-container-registry-532412990318?source=rss-814b1fd299ce------2</link>
            <guid isPermaLink="false">https://medium.com/p/532412990318</guid>
            <category><![CDATA[ubuntu]]></category>
            <category><![CDATA[apt-get]]></category>
            <category><![CDATA[github-container-registry]]></category>
            <category><![CDATA[open-container-initiative]]></category>
            <dc:creator><![CDATA[Akihiro Suda]]></dc:creator>
            <pubDate>Thu, 23 Apr 2026 12:58:50 GMT</pubDate>
            <atom:updated>2026-04-23T12:58:50.307Z</atom:updated>
            <content:encoded><![CDATA[<p>With the release of version 26.04 planned today, Ubuntu now supports installing APT packages hosted on OCI-compliant container image registries such as GitHub Container Registry (ghcr.io), via the <a href="https://github.com/AkihiroSuda/apt-transport-oci">apt-transport-oci </a>plugin I wrote <a href="https://x.com/_AkihiroSuda_/status/1410126837280215044">5 years ago</a>.</p><p>This means third-party package maintainers no longer need to maintain their own web servers for hosting apt packages. They still have to maintain their container image registry, but it is already offered by GitHub for free.</p><blockquote><strong><em>Note</em></strong><em>: &quot;OCI&quot; in this article refers to the “</em><a href="https://opencontainers.org/"><em>Open Container Initiative</em></a><em>”, not to “Oracle Cloud Infrastructure”.</em></blockquote><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hZpaub1V49liToJlFJ69ow.png" /></figure><h3>Example (for package consumers)</h3><p>The following example installs the hello-apt-transport-oci package from the <a href="https://ghcr.io/akihirosuda/apt-transport-oci-examples:latest">oci://ghcr.io/akihirosuda/apt-transport-oci-examples:latest</a> image, which is built from the GitHub repository <a href="https://github.com/AkihiroSuda/apt-transport-oci-examples">https://github.com/AkihiroSuda/apt-transport-oci-examples</a> .</p><p>First, install the <a href="https://github.com/AkihiroSuda/apt-transport-oci">apt-transport-oci </a>plugin:</p><pre>sudo apt install apt-transport-oci</pre><p>Then create /etc/apt/sources.list.d/oci.sources with the following content:</p><pre>Types: deb<br>URIs: oci://ghcr.io/akihirosuda/apt-transport-oci-examples:latest<br>Suites: stable<br>Components: main<br>Signed-By: /etc/apt/keyrings/apt-transport-oci-examples.gpg</pre><p>Download the GPG key:</p><pre>curl -fsSL 
https://raw.githubusercontent.com/AkihiroSuda/apt-transport-oci-examples/refs/heads/master/apt-transport-oci-examples.gpg \<br>  | sudo gpg --dearmor -o /etc/apt/keyrings/apt-transport-oci-examples.gpg</pre><p>Confirm the signature:</p><pre>$ <strong>gpg --show-keys --with-fingerprint /etc/apt/keyrings/apt-transport-oci-examples.gpg </strong><br>pub   ed25519 2026-04-21 [SC] [expires: 2029-04-20]<br>      E26B 12C8 C96A 4E4B CDF5  517A 3EB0 4A34 581C DAF6<br>uid                      Akihiro Suda, on behalf of apt-transport-oci-examples &lt;akihiro.suda.cz@hco.ntt.co.jp&gt;<br>sub   cv25519 2026-04-21 [E]</pre><p>Update the apt cache and install the hello-apt-transport-oci package:</p><pre>sudo apt update<br>sudo apt install hello-apt-transport-oci</pre><p>Confirm it works:</p><pre>$ <strong>hello-apt-transport-oci</strong><br>Hello, apt-transport-oci</pre><h3>Example (for package maintainers)</h3><h4>Packaging dpkg</h4><p>A dpkg file can be created using the traditional <a href="https://man7.org/linux/man-pages/man1/dpkg-deb.1.html">dpkg-deb</a> command with the file tree to be packaged and the DEBIAN/control metadata file as follows:</p><pre>$ <strong>tree hello-apt-transport-oci/</strong><br>hello-apt-transport-oci/<br>├── DEBIAN<br>│   └── control<br>└── usr<br>    └── bin<br>        └── hello-apt-transport-oci<br><br>4 directories, 2 files<br><br>$ <strong>cat hello-apt-transport-oci/DEBIAN/control </strong><br>Package: hello-apt-transport-oci<br>Version: 0.1<br>Architecture: all<br>Maintainer: example@example.com<br>Description: hello apt-transport-oci<br><br><strong>$</strong> <strong>dpkg-deb --build --root-owner-group hello-apt-transport-oci hello-apt-transport-oci_0.1_all.deb</strong><br>dpkg-deb: building package &#39;hello-apt-transport-oci&#39; in &#39;hello-apt-transport-oci_0.1_all.deb&#39;.</pre><p>There are also several package building tools. 
Notably, <a href="https://project-dalec.github.io/dalec/">Dalec</a> is useful for Docker users, as it is implemented as a custom syntax for Dockerfiles: <a href="https://project-dalec.github.io/dalec/quickstart">https://project-dalec.github.io/dalec/quickstart</a></p><h4>Creating an APT tree</h4><p><a href="https://www.aptly.info/">aptly</a> is convenient for creating the APT repository tree:</p><pre>sudo apt install aptly</pre><pre>aptly repo create hello-apt-transport-oci<br>aptly repo add hello-apt-transport-oci hello-apt-transport-oci_0.1_all.deb<br>aptly publish repo -distribution=stable -architectures=all,amd64,arm64 hello-apt-transport-oci</pre><p>The repository data will be locally published on ~/.aptly/public :</p><pre>$ <strong>tree ~/.aptly/public</strong><br>/home/USER/.aptly/public<br>├── dists<br>│   └── stable<br>│       ├── Contents-all.gz<br>│       ├── Contents-amd64.gz<br>│       ├── Contents-arm64.gz<br>│       ├── InRelease<br>│       ├── main<br>│       │   ├── binary-all<br>│       │   │   ├── Packages<br>│       │   │   ├── Packages.bz2<br>│       │   │   ├── Packages.gz<br>│       │   │   └── Release<br>│       │   ├── binary-amd64<br>│       │   │   ├── Packages<br>│       │   │   ├── Packages.bz2<br>│       │   │   ├── Packages.gz<br>│       │   │   └── Release<br>│       │   ├── binary-arm64<br>│       │   │   ├── Packages<br>│       │   │   ├── Packages.bz2<br>│       │   │   ├── Packages.gz<br>│       │   │   └── Release<br>│       │   ├── Contents-all.gz<br>│       │   ├── Contents-amd64.gz<br>│       │   └── Contents-arm64.gz<br>│       ├── Release<br>│       └── Release.gpg<br>└── pool<br>    └── main<br>        └── h<br>            └── hello-apt-transport-oci<br>                └── hello-apt-transport-oci_0.1_all.deb<br><br>11 directories, 22 files</pre><h4>Pushing to a registry</h4><p>Use <a href="https://oras.land">ORAS</a> (OCI Registry As Storage) to push an APT tree to a registry:</p><pre>sudo apt install oras</pre><pre>cd 
~/.aptly/public<br>find . -type f -printf &quot;%p:application/octet-stream\n&quot; \<br>  | xargs oras push ghcr.io/USERNAME/hello-apt-transport:latest</pre><p>See <a href="https://github.com/AkihiroSuda/apt-transport-oci-examples">https://github.com/AkihiroSuda/apt-transport-oci-examples</a> for further details.</p><h3>FAQ: Do I need to use containers?</h3><p>No. The topic is about using the distribution protocol that has been used for containers, not about using containers themselves.</p><h3>FAQ: Why use container image registries?</h3><p>Because GitHub Container Registry is free in both storage size and bandwidth:</p><blockquote><strong>Billing for container image storage</strong>: Container image storage and bandwidth for the Container registry is currently free.</blockquote><blockquote><a href="https://docs.github.com/en/billing/concepts/product-billing/github-packages#free-use-of-github-packages"><em>https://docs.github.com/en/billing/concepts/product-billing/github-packages#free-use-of-github-packages</em></a></blockquote><p>It still has a few limitations, but those limitations are very loose and almost negligible:</p><blockquote>・The Container registry has a 10 GB size limit for each layer.<br>・The Container registry has a 10 minute timeout limit for uploads.</blockquote><blockquote><a href="https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry#troubleshooting"><em>https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry#troubleshooting</em></a></blockquote><p>For these benefits, Homebrew has also been using GitHub Container Registry since <a href="https://brew.sh/2021/04/12/homebrew-3.1.0/">2021</a>.</p><h4>Why not use GitHub Packages?</h4><p>Because <a href="https://docs.github.com/en/packages/learn-github-packages/introduction-to-github-packages#support-for-package-registries">GitHub Packages</a> does not support APT. 
The service has been focusing on language package managers such as NPM and Maven so far.</p><h4>Why not use GitHub Pages?</h4><p>Because <a href="https://docs.github.com/en/pages/getting-started-with-github-pages/github-pages-limits">GitHub Pages</a> is intended to be used for Web pages, not for serving arbitrary HTTP(S) content such as APT packages. For that reason, it comes with relatively tight usage limits:</p><blockquote>・Published GitHub Pages sites may be no larger than 1 GB.<br>[...]<br>・GitHub Pages sites have a <em>soft</em> bandwidth limit of 100 GB per month.</blockquote><blockquote><a href="https://docs.github.com/en/pages/getting-started-with-github-pages/github-pages-limits#usage-limits"><em>https://docs.github.com/en/pages/getting-started-with-github-pages/github-pages-limits#usage-limits</em></a></blockquote><h3>FAQ: What about DNF?</h3><p>I didn&#39;t support DNF when I wrote the apt-transport-oci plugin in 2021, because DNF at that time didn&#39;t appear to have a plugin system that is as flexible as APT.</p><p>However, the situation has changed with the release of DNF5 (Fedora 41). Luiz Carvalho at Red Hat recently implemented an experimental DNF5 plugin that enables OCI transport in the same way as apt-transport-oci: <a href="https://github.com/lcarva/libdnf5-oci-plugin">https://github.com/lcarva/libdnf5-oci-plugin</a> . I hope that libdnf5-oci-plugin will eventually be included in Fedora, RHEL, and similar distributions.</p><h3>NTT is hiring!</h3><p>We at <a href="https://www.rd.ntt/e/">NTT</a> are looking for engineers who work in Open Source communities in the fields of containers, etc. 
Visit &lt;<a href="https://www.rd.ntt/e/sic/recruit/">https://www.rd.ntt/e/sic/recruit/</a>&gt; to see how to join us.</p><p>We at <a href="https://www.rd.ntt/">NTT</a> are looking for people to work with us in open source communities in fields such as containers. Please see our recruitment page (in Japanese): &lt;<a href="https://www.rd.ntt/sic/recruit/">https://www.rd.ntt/sic/recruit/</a>&gt;</p><hr><p><a href="https://medium.com/nttlabs/ubuntu-26-04-can-install-apt-packages-from-github-container-registry-532412990318">Ubuntu 26.04 can install APT packages from GitHub Container Registry</a> was originally published in <a href="https://medium.com/nttlabs">nttlabs</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[gomodjail: library sandboxing for Go modules]]></title>
            <link>https://medium.com/nttlabs/gomodjail-library-sandboxing-for-go-modules-451b22d02700?source=rss-814b1fd299ce------2</link>
            <guid isPermaLink="false">https://medium.com/p/451b22d02700</guid>
            <category><![CDATA[golang]]></category>
            <category><![CDATA[supply-chain-security]]></category>
            <category><![CDATA[sandbox]]></category>
            <category><![CDATA[fosdem]]></category>
            <dc:creator><![CDATA[Akihiro Suda]]></dc:creator>
            <pubDate>Tue, 27 Jan 2026 00:38:02 GMT</pubDate>
            <atom:updated>2026-01-27T00:38:02.776Z</atom:updated>
            <content:encoded><![CDATA[<p>This article introduces <a href="https://github.com/AkihiroSuda/gomodjail/tree/master"><strong>gomodjail</strong></a>, an experimental tool that “jails” Go modules by applying syscall restrictions using seccomp and symbol tables, in order to mitigate potential supply chain attacks and other vulnerabilities.</p><p>In other words, <strong>gomodjail provides a “container” engine for Go modules</strong> but with finer granularity than Docker containers, FreeBSD jails, etc.</p><p><strong>gomodjail focuses on simplicity</strong>; a security policy for gomodjail can be applied just by adding a // gomodjail:confined comment to the go.mod file of the target program, and running it with the gomodjail run command.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/963/1*AnLkuGkP62u6RehAWyhWKA.png" /><figcaption><strong>gomodjail focuses on simplicity</strong></figcaption></figure><h3>Background: open source is under attack</h3><p>Software is practically never written from scratch; it is always assembled from an enormous number of library dependencies, often including open-source ones. This supply chain is under attack:</p><ul><li><a href="https://research.swtch.com/xz-timeline"><strong>xz/liblzma backdoor incident</strong></a> (2024): A backdoor was injected into xz/liblzma by its maintainer (not by the original author), who had been making harmless contributions to the project for more than two years. <strong>This incident proved that even maintainers of widely adopted libraries cannot be blindly trusted.</strong></li><li><a href="https://socket.dev/blog/wget-to-wipeout-malicious-go-modules-fetch-destructive-payload"><strong>Massive campaign of fake Go modules</strong></a> (circa 2025-): In spring 2025, hundreds of fake Go modules were published on GitHub. The repositories impersonated genuine ones but contained malicious code. 
In some cases, the number of GitHub stars even exceeded that of the genuine repositories.</li><li><a href="https://arxiv.org/html/2501.19012v1"><strong>Slopsquatting</strong></a> (circa 2024-): AI coding agents may hallucinate and inject malicious dependencies with plausible package names. Even when an LLM itself doesn’t hallucinate, it can still be deceived by fake sites on the Internet. The <a href="https://x.com/longer_n/status/2014335971505123760">chat session</a> below shows Microsoft Copilot being deceived into suggesting downloading a malicious copy of 7-Zip from a fake “official” site.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*47q48CW7ZKhzzezWRR0Udw.png" /><figcaption><strong>Microsoft Copilot being deceived into suggesting downloading 7-Zip from a fake “official” site.</strong> The chat session is from <a href="https://x.com/longer_n/status/2014335971505123760">https://x.com/longer_n/status/2014335971505123760</a></figcaption></figure><h3>Library sandboxing</h3><p><strong>Library sandboxing means confining the capabilities of a library so that it cannot perform specific operations, such as reading/writing arbitrary files and executing arbitrary shell commands.</strong> This is similar to containers such as Docker and FreeBSD jails, but it differs from containers in that the confinement applies at the granularity of a library, not an OS process.</p><p>For example, <a href="https://blog.mozilla.org/attack-and-defense/2021/12/06/webassembly-and-back-again-fine-grained-sandboxing-in-firefox-95/">Firefox</a> adopted <a href="https://rlbox.dev">RLBox</a> in 2021 to wrap C library calls in a WebAssembly sandbox. 
However, library sandboxing hasn’t seen wide adoption, perhaps due to its complexity: sandboxing a library typically takes days, not minutes.</p><blockquote>“On average, sandboxing a library takes only a few days”<br> — <a href="https://www.usenix.org/system/files/sec20_slides_narayan.pdf">https://www.usenix.org/system/files/sec20_slides_narayan.pdf</a></blockquote><h3>Introducing gomodjail</h3><p><a href="https://github.com/AkihiroSuda/gomodjail"><strong>gomodjail</strong></a><strong> is a library sandboxing tool for Go, focusing on simplicity.</strong></p><p>Take a look at <a href="https://github.com/AkihiroSuda/gomodjail/blob/v0.3.0/examples/victim/main.go">examples/victim/main.go</a> :</p><pre>package main<br><br>import (<br> &quot;fmt&quot;<br><br> p &quot;github.com/AkihiroSuda/gomodjail/examples/poisoned&quot;<br>)<br><br>func main() {<br> const x, y = 42, 43<br> fmt.Printf(&quot;%d + %d = %d\n&quot;, x, y, p.Add(x, y))<br>}</pre><p>The code is expected to just print 42 + 43 = 85 without any side effects. However, the <a href="https://github.com/AkihiroSuda/gomodjail/blob/v0.3.0/examples/poisoned/poisoned.go#L9-L25">Add(x, y)</a> function here is poisoned to execute a “malicious” command:</p><pre><strong>$</strong> go build<br><strong>$</strong> ./victim<br>*** ARBITRARY SHELL CODE EXECUTION ***<br><br>This &#39;vi&#39; command was executed by the &#39;github.com/AkihiroSuda/gomodjail/examples/poisoned&#39; module.<br><br>This example is harmless, of course, but suppose that this was a malicious code.<br><br>Type &#39;:q!&#39; to leave this screen.</pre><p>gomodjail can confine this poisoned module so that it cannot execute such commands. 
It can be applied in just the following two steps:</p><ul><li><strong>Step 1</strong>: Make sure that <a href="https://github.com/AkihiroSuda/gomodjail/blob/v0.3.0/examples/victim/go.mod">go.mod</a> has the comment directive // gomodjail:confined:</li></ul><pre>require github.com/AkihiroSuda/gomodjail/examples/poisoned v0.0.0-00010101000000-000000000000 // gomodjail:confined</pre><ul><li><strong>Step 2</strong>: Run the program with the gomodjail run command:</li></ul><pre><strong>$</strong> gomodjail run --go-mod=go.mod -- ./victim<br>level=WARN msg=***Blocked*** syscall=pidfd_open module=github.com/AkihiroSuda/gomodjail/examples/poisoned</pre><h4>How it works</h4><p>gomodjail hooks dangerous syscalls such as open() and execve() using seccomp on Linux, or DYLD_INSERT_LIBRARIES on macOS. When a hooked syscall is executed, gomodjail unwinds the call stack to identify the Go module that invoked the syscall. If the module belongs to a blocklist, gomodjail blocks the syscall and injects EPERM as errno.</p><p>This run-time approach comes with several caveats; notably, it is not applicable to modules that import unsafe, reflect, C, etc., since such modules may alter the call stack. 
A future version of gomodjail may incorporate a compilation-time approach to mitigate these caveats.</p><h3>Meet me at FOSDEM 2026 for further details</h3><p>I’ll talk about gomodjail at <a href="https://fosdem.org/2026/">FOSDEM</a>:</p><ul><li><strong>Title</strong>: <a href="https://fosdem.org/2026/schedule/event/37NC8K-gomodjail/">“gomodjail: library sandboxing for Go modules”</a></li><li><strong>DevRoom</strong>: Go (UB5.132, <a href="https://fosdem.org/2026/schedule/buildings/#u">Building U</a>)</li><li><strong>Date</strong>: February 1, 2026 (Sunday)</li><li><strong>Time:</strong> 12:00–12:30</li></ul><p>Feel free to visit my session for further details about the project.</p><p>Besides that, I’ll also have a session titled <a href="https://fosdem.org/2026/schedule/event/RGCTDY-lima/">“<strong>Lima v2.0: expanding the focus to hardening AI</strong>”</a> on <strong>Saturday (Jan 31, 15:30–16:00)</strong>. Lima has been an <a href="https://github.com/lima-vm/lima/blob/v2.0.3/go.mod#L1">adopter</a> of gomodjail, although gomodjail is not going to be the main topic in the session on Saturday.</p><h3>NTT is hiring!</h3><p>We at <a href="https://www.rd.ntt/e/">NTT</a> are looking for engineers who work in Open Source communities in the fields of software supply chain security, sandboxing, etc. 
Visit &lt;<a href="https://www.rd.ntt/e/sic/recruit/">https://www.rd.ntt/e/sic/recruit/</a>&gt; to see how to join us.</p><p>We at <a href="https://www.rd.ntt/">NTT</a> are looking for people to work with us in open source communities in fields such as software supply chain security and sandboxing. Please see our recruitment page (in Japanese): &lt;<a href="https://www.rd.ntt/sic/recruit/">https://www.rd.ntt/sic/recruit/</a>&gt;</p><hr><p><a href="https://medium.com/nttlabs/gomodjail-library-sandboxing-for-go-modules-451b22d02700">gomodjail: library sandboxing for Go modules</a> was originally published in <a href="https://medium.com/nttlabs">nttlabs</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Alcoholless: A Lightweight Security Sandbox for macOS Programs (Homebrew, AI Agents, etc.)]]></title>
            <link>https://medium.com/nttlabs/alcoholless-lightweight-security-sandbox-for-macos-ccf0d1927301?source=rss-814b1fd299ce------2</link>
            <guid isPermaLink="false">https://medium.com/p/ccf0d1927301</guid>
            <category><![CDATA[google-gemini]]></category>
            <category><![CDATA[sandbox]]></category>
            <category><![CDATA[alcoholless]]></category>
            <category><![CDATA[macos]]></category>
            <category><![CDATA[homebrew]]></category>
            <dc:creator><![CDATA[Akihiro Suda]]></dc:creator>
            <pubDate>Tue, 22 Jul 2025 07:15:14 GMT</pubDate>
            <atom:updated>2025-07-22T07:15:14.767Z</atom:updated>
            <content:encoded><![CDATA[<p>This article introduces <a href="https://github.com/AkihiroSuda/alcless"><strong><em>Alcoholless</em></strong></a><strong>: a lightweight security sandbox for macOS programs</strong>. While Alcoholless was originally made to secure Homebrew, it can be used for almost any CLI program on macOS. <strong>Notably, Alcoholless is useful for allowing an AI agent to run shell commands with less risk of breaking the host operating system.</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/814/1*XxuHqJCqyot6SmaT-1aMYg.png" /><figcaption>AI may hallucinate, or may be deceived by a web search result, into running a malicious command</figcaption></figure><h3>Software supply chain under attack</h3><p>Homebrew is the most popular open source package manager on macOS, with more than 7700 package formulae. The large number of packages is both a strength and a weakness; while having many packages is certainly convenient, it raises doubts about whether the entire source code for all of them has been comprehensively reviewed.</p><p>For example, the notorious xz/liblzma backdoor incident (<a href="https://tukaani.org/xz-backdoor/">CVE-2024-3094</a>) has shown that even well-known packages can be compromised, although, by chance, Homebrew was not affected in this case.</p><p>This kind of supply chain attack may actually happen with any package manager; this month (July 2025) saw <a href="https://socket.dev/blog/npm-phishing-campaign-leads-to-prettier-tooling-packages-compromise">a very sophisticated phishing campaign that compromised several packages on npm</a>.</p><h3>AI agents may make mistakes</h3><p>For better or worse, it is becoming common practice to allow AI agents to run arbitrary shell commands. 
This practice is, of course, extremely dangerous; an AI agent may hallucinate, or may be deceived by a web search result, into <a href="https://www.trendmicro.com/vinfo/ae/security/news/cybercrime-and-digital-threats/slopsquatting-when-ai-agents-hallucinate-malicious-packages">installing malware with plausible package names</a>:</p><pre>pip install &lt;PLAUSIBLE_PACKAGE_NAME&gt;</pre><p>To alleviate the risk of running such arbitrary commands, AI agents such as <a href="https://github.com/openai/codex/blob/rust-v0.8.0/codex-cli/src/utils/agent/sandbox/macos-seatbelt.ts#L80-L150">OpenAI Codex CLI</a> and <a href="https://github.com/google-gemini/gemini-cli/blob/v0.1.13/packages/cli/src/utils/sandbox-macos-restrictive-open.sb">Google Gemini CLI</a> utilize Apple’s sandbox-exec command on macOS.</p><pre>sandbox-exec -f PROFILE COMMAND [ARGS]</pre><p>sandbox-exec supports limiting file access with a profile like:</p><pre>(<strong>allow</strong> file-read*)<br>(<strong>deny</strong> file-write*)<br>(<strong>allow</strong> file-write* (<strong>literal</strong> &quot;/dev/null&quot;))</pre><p>However, sandbox-exec seems to have been deprecated since circa 2016.</p><pre>$ <strong>man sandbox-exec</strong><br>[…]<br>DESCRIPTION<br> The sandbox-exec command is DEPRECATED.<br> Developers who wish to sandbox an app should instead adopt<br> the App Sandbox feature described in the App Sandbox Design Guide.</pre><p>In the manual page, Apple recommends using “<a href="https://developer.apple.com/documentation/security/app-sandbox">App Sandbox</a>” instead; however, App Sandbox doesn’t actually provide a direct replacement for the sandbox-exec command.</p><h3>Introducing Alcoholless</h3><p><a href="https://github.com/AkihiroSuda/alcless">Alcoholless</a> provides a simple CLI to run shell commands with reduced security risk:</p><pre>cd ~/SOME_DIRECTORY<br>alcless brew install xz<br>alcless xz SOME_FILE</pre><p>In the example above, xz runs as a separate user with access to a copy of the 
current directory. Changed files are synced back to the current directory when the command exits.</p><h4>How it works</h4><p><strong>Alcoholless just utilizes 1990s’ commands</strong> (su, sudo, rsync) and the macOS equivalent of useradd to implement container-like environments, without extending the XNU kernel to support Linux-style container syscalls. A fun fact is that both su and sudo have to be utilized because sudo can’t fully switch the user context on macOS by itself, <a href="https://github.com/AkihiroSuda/alcless/blob/v0.1.1/README.md#why-wrap-su-inside-sudo">due to Mach’s quirks that are not recognized by POSIX</a>.</p><p>Alcoholless could harden security even further if it utilized Apple’s <a href="https://developer.apple.com/documentation/virtualization">Virtualization.framework</a>; however, it doesn’t, as <a href="https://github.com/AkihiroSuda/alcless/blob/v0.1.1/README.md#why-not-use-vm">the framework apparently does not provide a way to automate the initialization steps of a macOS VM</a> (accept EULA, skip enabling iCloud, set up SSH, etc.).</p><p>This barrier is annoying, but avoiding virtualization also comes with several bonuses:</p><ul><li>No performance overhead</li><li>Minimal disk consumption</li><li>Direct access to the host hardware (GPU, etc.)</li><li>Works fine on GitHub Actions (no nested virtualization)</li></ul><h3>Getting started</h3><p><a href="https://github.com/AkihiroSuda/alcless">Alcoholless</a> can be installed from the source code as follows:</p><pre>brew install go<br>git clone https://github.com/AkihiroSuda/alcless.git<br>cd alcless<br>git checkout v0.1.1<br>make<br>sudo make install</pre><p>Alternatively, you can also download binary packages from &lt;<a href="https://github.com/AkihiroSuda/alcless/releases">https://github.com/AkihiroSuda/alcless/releases</a>&gt;.</p><p>For the first run, the alclessctl create default command has to be executed. 
You’ll be asked to type the password to create the new user account alcless_${USER}_default:</p><pre>$ <strong>alclessctl create default</strong><br>7:41PM INF Creating an instance instance=default instUser=alcless_user_default<br>⚠️  The following commands will be executed:<br>sudo sysadminctl -addUser alcless_user_default -password -<br>sudo chmod go-rx /Users/alcless_user_default<br>sudo sh -c &#39;echo &#39;&quot;&#39;&quot;&#39;suda ALL=(root) NOPASSWD: /usr/bin/su - alcless_user_default -c *&#39;&quot;&#39;&quot;&#39; &gt;&#39;&quot;&#39;&quot;&#39;/etc/sudoers.d/alcless_user_default&#39;&quot;&#39;&quot;&#39;&#39;<br>❓ Press return to continue, or Ctrl-C to abort<br><strong>[RETURN]</strong><br>CONTINUE<br>7:42PM INF Running command cmd=&quot;sudo sysadminctl -addUser alcless_user_default -password -&quot;<br>2025-07-21 19:42:06.758 sysadminctl[37537:5738895] ----------------------------<br>2025-07-21 19:42:06.758 sysadminctl[37537:5738895] No clear text password or interactive option was specified (adduser, change/reset password will not allow user to use FDE) !<br>2025-07-21 19:42:06.758 sysadminctl[37537:5738895] ----------------------------<br>User password: <strong>[PASSWORD]</strong><br>[...]</pre><p>You may also have to verify that your home directory has restrictive permissions:</p><pre>$ <strong>ls -ld ~</strong><br>drwxr-x---+ 55 user staff  1760  7 22 05:19 /Users/user/<br><br>$ <strong>chmod 700 ~</strong><br><br>$ <strong>ls -ld ~</strong><br>drwx------+ 55 user staff  1760  7 22 05:19 /Users/user/</pre><h4>Basic usage</h4><p>After alclessctl create default completes, you have to cd to the directory that is to be synced to the alcless_${USER}_default environment:</p><pre>mkdir -p ~/tmp<br>cd ~/tmp<br># Create some content in the current project directory<br>echo foo &gt;foo</pre><p>Then you can install and run a Homebrew package such as xz:</p><pre>$ <strong>alcless brew install xz</strong><br>[...]<br><br>$ <strong>alcless xz 
foo</strong><br>7:44PM INF ➡️Syncing the files src=/Users/user/tmp/ dst=default:/Users/alcless_user_default/Users/user/tmp<br>7:44PM INF ⬅️Syncing the files back (dry run) src=default:/Users/alcless_user_default/Users/user/tmp/ dst=/Users/user/tmp<br>*deleting foo<br>.d..t.... ./<br>&gt;f+++++++ foo.xz<br>7:44PM INF ⬅️Syncing the files back src=default:/Users/alcless_user_default/Users/user/tmp/ dst=/Users/user/tmp<br>⚠️  The following commands will be executed:<br>rsync -rai --delete -e &#39;/usr/local/bin/alclessctl shell --workdir=/ --plain&#39; default:/Users/alcless_user_default/Users/user/tmp/ /Users/user/tmp<br>❓ Press return to continue, or Ctrl-C to abort<br><strong>[RETURN]</strong><br>CONTINUE<br>*deleting foo<br>.d..t.... ./<br>&gt;f+++++++ foo.xz</pre><p>alcless syncs the current directory /Users/${USER}/tmp to /Users/alcless_${USER}_default/Users/${USER}/tmp , executes the specified command using the user credential of alcless_${USER}_default, and syncs back the directory with a confirmation prompt.</p><h4>Usage with Gemini</h4><p><a href="https://github.com/google-gemini/gemini-cli">Gemini CLI</a> can be installed in Alcoholless as follows:</p><pre>alcless brew install gemini-cli</pre><p>This command may take around 10 minutes due to recompilation of several dependency packages such as node. 
This recompilation is needed as the Homebrew prefix (/Users/alcless_${USER}_default/homebrew) differs from the standard installation of Homebrew (/opt/homebrew).</p><p>Then add your <a href="https://aistudio.google.com/app/apikey">GEMINI_API_KEY</a> to .zshenv inside Alcoholless:</p><pre>alcless sh -c &#39;vi ~/.zshenv&#39;</pre><p>Now you can run Gemini CLI and let it run arbitrary shell commands inside Alcoholless:</p><pre>$ <strong>alcless gemini</strong><br><br>╭─────────────────────────────────────────────────────────────╮<br>│  &gt; <strong>Install a Python package that shows the current weather  </strong>│<br>╰─────────────────────────────────────────────────────────────╯<br><br> ╭──────────────────────────────────────────────────────────────────────────────────────────────────────╮<br> │ ✔  GoogleSearch Searching the web for: &quot;python package current weather&quot;                              │<br> │                                                                                                      │<br> │    Search results for &quot;python package current weather&quot; returned.                                     │<br> ╰──────────────────────────────────────────────────────────────────────────────────────────────────────╯<br>✦ I will install the python-weather package, which allows you to get the current weather for a location.<br> ╭─────────────────────────────────────────────────────────────────────────────────────────────────────────╮<br> │ ?  Shell pip install python-weather (Install the `python-weather` package using pip.) ←                 │<br> │                                                                                                         │<br> │   pip install python-weather                                                                            │<br> │                                                                                                         │<br> │ Allow execution?                                                     
                                   │<br> │                                                                                                         │<br> │ ● 1. Yes, allow once                                                                                    │<br> │   2. Yes, allow always &quot;pip ...&quot;                                                                        │<br> │   3. No (esc)                                                                                           │<br> │                                                                                                         │<br> ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯</pre><p>It should be noted that Alcoholless is not a panacea; <strong>it is still highly recommended to review AI-generated commands before execution.</strong><br>While the AI is unlikely able to steal or falsify a file outside the working directory (unless macOS has a bug around the user isolation), it might be still able to exploit other attacks such as cryptomining or denial-of-service.</p><h3>NTT is hiring!</h3><p>We at <a href="https://www.rd.ntt/e/">NTT</a> are looking for engineers who work in Open Source communities in the fields of software supply chain security, AI sandboxing, etc. 
Visit &lt;<a href="https://www.rd.ntt/e/sic/recruit/">https://www.rd.ntt/e/sic/recruit/</a>&gt; to see how to join us.</p><p>We at <a href="https://www.rd.ntt/">NTT</a> are looking for colleagues to work with us in open source communities in areas such as software supply chain security and AI sandboxing. Please see our Japanese-language recruitment page: &lt;<a href="https://www.rd.ntt/sic/recruit/">https://www.rd.ntt/sic/recruit/</a>&gt;</p><hr><p><a href="https://medium.com/nttlabs/alcoholless-lightweight-security-sandbox-for-macos-ccf0d1927301">Alcoholless: A Lightweight Security Sandbox for macOS Programs (Homebrew, AI Agents, etc.)</a> was originally published in <a href="https://medium.com/nttlabs">nttlabs</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[containerd v2.1, nerdctl v2.1, and Lima v1.1]]></title>
            <link>https://medium.com/nttlabs/containerd-v2-1-nerdctl-v2-1-and-lima-v1-1-74400ee87c2c?source=rss-814b1fd299ce------2</link>
            <guid isPermaLink="false">https://medium.com/p/74400ee87c2c</guid>
            <category><![CDATA[lima-vm]]></category>
            <category><![CDATA[containerd]]></category>
            <category><![CDATA[nerdctl]]></category>
            <dc:creator><![CDATA[Akihiro Suda]]></dc:creator>
            <pubDate>Thu, 12 Jun 2025 10:59:02 GMT</pubDate>
            <atom:updated>2025-06-12T11:29:47.377Z</atom:updated>
            <content:encoded><![CDATA[<p>This post highlights the updates in <a href="https://github.com/containerd/containerd/releases/tag/v2.1.0">containerd</a> v2.1, <a href="https://github.com/containerd/nerdctl/releases/tag/v2.1.0">nerdctl</a> (contaiNERD CTL) v2.1, and <a href="https://github.com/lima-vm/lima/releases/tag/v1.1.0">Lima</a> v1.1, all released last month.</p><p>See also my previous post on containerd v2.0, nerdctl v2.0, and Lima v1.0 (Nov 2024):</p><p><a href="https://medium.com/nttlabs/containerd-v2-0-nerdctl-v2-0-lima-v1-0-93026b5839f8">containerd v2.0, nerdctl v2.0, and Lima v1.0</a></p><h3>containerd v2.1</h3><p><a href="https://github.com/containerd/containerd">containerd</a> is the industry’s standard container runtime used by Docker and several Kubernetes-based products such as Azure Kubernetes Service (AKS), Amazon Elastic Kubernetes Service (EKS), and Google Kubernetes Engine (GKE).</p><p><a href="https://medium.com/nttlabs/the-internals-and-the-latest-trends-of-container-runtimes-2023-22aa111d7a93">The internals and the latest trends of container runtimes (2023)</a></p><p><a href="https://github.com/containerd/containerd/releases/tag/v2.1.0">containerd v2.1 </a>introduces several improvements, particularly in filesystem support:</p><ul><li><a href="https://github.com/containerd/containerd/pull/10705"><strong>Support for EROFS (Enhanced Read-Only File System)</strong></a>. <a href="https://github.com/containerd/containerd/pull/10705#issuecomment-2579222912">Efficient for images with many layers.</a> See <a href="https://github.com/containerd/containerd/blob/release/2.1/docs/snapshotters/erofs.md">here</a> for the usage.</li><li><a href="https://github.com/containerd/containerd/pull/10579"><strong>Mounting a container image as a Kubernetes volume</strong></a>. Allows separating the application code image from the data images (e.g., AI models). 
See <a href="https://kubernetes.io/blog/2024/08/16/kubernetes-1-31-image-volume-source/">here</a> for the usage.</li><li><a href="https://github.com/containerd/containerd/pull/11131"><strong>Writable cgroupfs (/sys/fs/cgroup) without root privileges</strong></a>. Enables containers to control computation resources (CPU time, memory limits, etc.) for their own descendant processes. See <a href="https://docs.gitlab.com/administration/gitaly/kubernetes/#enable-cgroup_writable-field-in-containerd">here</a> for the usage.</li></ul><p>Aside from the filesystem enhancements, this release also <a href="https://github.com/containerd/containerd/pull/10722"><strong>improves the support for UserNS-Remap mode</strong></a> by allowing non-contiguous UID mapping ranges (e.g., uidmap=0:666:1000,1000:6666:64536).</p><h3>nerdctl v2.1</h3><p><a href="https://github.com/containerd/nerdctl">nerdctl</a> (<em>contaiNERD CTL</em>) is a Docker-like command line interface tool for containerd.</p><p><a href="https://github.com/containerd/nerdctl/releases/tag/v2.1.0">nerdctl v2.1</a> adds support for <a href="https://github.com/containerd/nerdctl/pull/3941"><strong>UserNS-Remap mode</strong></a>, which balances security and performance between <a href="https://rootlesscontaine.rs">Rootless</a> and Rootful modes:</p><ul><li><strong>Rootless</strong>: executes everything as a non-root user. Network performance is limited by default (but can be accelerated via the experimental <a href="https://github.com/rootless-containers/bypass4netns">bypass4netns</a>).</li><li><strong>UserNS-Remap</strong>: executes containers as a non-root user, but containerd itself still runs as root.</li><li><strong>Rootful</strong>: executes everything as the root user.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Z3IYCRpudiHqGDNUI4lhsg.png" /><figcaption>Rootless vs. UserNS-Remap vs. 
Rootful</figcaption></figure><p>UserNS-Remap mode has been supported in the containerd daemon for a long time; however, nerdctl did not support it until now.</p><p>nerdctl v2.1 also brings <a href="https://github.com/containerd/nerdctl/pull/4012">experimental support</a> for <a href="https://github.com/AkihiroSuda/gomodjail"><strong>gomodjail: Jail for Go Modules</strong></a>. gomodjail imposes syscall restrictions on a specific set of Go modules so as to mitigate potential vulnerabilities and supply chain attacks (<a href="https://github.com/AkihiroSuda/gomodjail/blob/v0.1.2/README.md#caveats">some caveats apply</a>).</p><p>In the go.mod snippet below, most of the dependency modules (e.g., github.com/Masterminds/semver/v3) are confined so that they cannot execute commands or open new files.</p><pre>//gomodjail:confined<br>module github.com/containerd/nerdctl/v2<br><br>require (<br>    github.com/Masterminds/semver/v3 v3.3.1<br>    ...<br>    golang.org/x/sys v0.31.0 //gomodjail:unconfined<br>    ...<br>)</pre><p>gomodjail is enabled in the nerdctl.gomodjail binary included in <a href="https://github.com/containerd/nerdctl/releases/tag/v2.1.2">the nerdctl-full distribution</a>. Usage is identical to nerdctl:</p><pre>nerdctl.gomodjail run hello-world</pre><h3>Lima v1.1</h3><p><a href="https://lima-vm.io/">Lima</a> is a command line utility for creating Linux virtual machines. Lima was originally made with an opinionated focus on running containerd and nerdctl on desktop operating systems; however, its scope has since been extended to support non-container workloads as well. 
In that sense, Lima is more comparable to WSL2 and Vagrant than to Docker Desktop and <a href="https://github.com/apple/container">Apple’s Containerization</a> (to appear in macOS 26).</p><p><a href="https://github.com/lima-vm/lima/releases/tag/v1.1.0">Lima v1.1</a> adds support for <strong>inheritance and composition of template files</strong>. With the new syntax (base), a template with a custom provision command can be written as follows:</p><pre>base:<br>- template://_images/ubuntu-lts<br>- template://_default/mounts<br><br>provision:<br>- mode: system<br>  script: |<br>    #!/bin/bash<br>    set -eux<br>    apt-get install -y build-essential</pre><p>In previous versions, you had to duplicate the entire content of the ubuntu-lts template in your own template.</p><p>Other notable updates in Lima v1.1 include:</p><ul><li><a href="https://lima-vm.io/docs/config/port/">New port forwarder implementation by default</a>. It is faster and supports both TCP and UDP.</li><li>Support for DragonFly BSD hosts</li><li>Support for S390X and PPC64LE guests</li><li>The lima package is now split into lima and lima-additional-guestagents. 
The latter is needed only for running a guest with a non-native architecture (e.g., Intel on ARM).</li></ul><h3>Visit the containerd maintainers at KubeCon Japan</h3><p>Some containerd maintainers, including myself, will be presenting at <a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-japan/">KubeCon Japan 2025</a>:</p><h4><strong>Tuesday, June 17, 2025, 16:30–17:00 JST</strong></h4><p><a href="https://kccncjpn2025.sched.com/event/1x6zq/containerd-project-update-and-deep-dive-akihiro-suda-kohei-tokunaga-ntt-kirtana-ashok-microsoft-akhil-mohan-vmware-by-broadcom"><strong>containerd: Project Update and Deep Dive</strong> — Akihiro Suda &amp; Kohei Tokunaga (NTT), Kirtana Ashok (Microsoft), Akhil Mohan (VMware by Broadcom)</a></p><h3>NTT is hiring!</h3><p>We at <a href="https://www.rd.ntt/e/">NTT</a> are looking for engineers who work in Open Source communities in the fields of containers, etc. Visit &lt;<a href="https://www.rd.ntt/e/sic/recruit/">https://www.rd.ntt/e/sic/recruit/</a>&gt; to see how to join us.</p><p>We at <a href="https://www.rd.ntt/">NTT</a> are looking for colleagues to work with us in open source communities in areas such as containers. Please see our Japanese-language recruitment page: &lt;<a href="https://www.rd.ntt/sic/recruit/">https://www.rd.ntt/sic/recruit/</a>&gt;</p><hr><p><a href="https://medium.com/nttlabs/containerd-v2-1-nerdctl-v2-1-and-lima-v1-1-74400ee87c2c">containerd v2.1, nerdctl v2.1, and Lima v1.1</a> was originally published in <a href="https://medium.com/nttlabs">nttlabs</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Why You Should Contribute to Open Source Software]]></title>
            <link>https://medium.com/nttlabs/why-you-should-contribute-to-open-source-software-06064db030a0?source=rss-814b1fd299ce------2</link>
            <guid isPermaLink="false">https://medium.com/p/06064db030a0</guid>
            <category><![CDATA[オープンソース]]></category>
            <dc:creator><![CDATA[Akihiro Suda]]></dc:creator>
            <pubDate>Sun, 30 Mar 2025 23:31:37 GMT</pubDate>
            <atom:updated>2025-03-30T23:31:37.952Z</atom:updated>
            <content:encoded><![CDATA[<p>NTTの須田です。2024年9月に開催された <a href="https://wakate.org/2024/08/13/57th-general/">第57回 情報科学若手の会</a> にて、「なぜオープンソースソフトウェアにコントリビュートすべきなのか」と題して招待講演させていただきました。講演内容をブログとして再編成しました。</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*XjCqY7CPYTV4PCd3nDxNyA.jpeg" /><figcaption><a href="https://github.com/AkihiroSuda/AkihiroSuda/raw/refs/heads/master/slides/2024/20240914%20%5Bwakate.org%20(Japanese)%5D%20Why%20you%20should%20contribute%20to%20OSS.pdf">講演資料 (PDF)</a></figcaption></figure><h3>なぜOSSにコントリビュートすべきなのか</h3><p>結論から言うと、主に<strong>OSSの持続可能性</strong>のためです。</p><p>OSSは「タダ飯」(free lunch) であるかの如く、対価を支払うことなく消費されがちです。ミートアップなどで提供される実際の「タダ飯」🍕🍣とは異なり、遠慮なく好きなだけ食べても他の人の迷惑にはなりませんが、この「タダ飯」を提供する側のことを誰かが気にかけていないと次の問題が生じます:</p><ul><li>「タダ飯」が出てこなくなる (OSSの開発が停滞する)</li><li>毒入りの「タダ飯」が出てくる (OSSにマルウェアが混入する)</li></ul><p>前者はましな方で、後者が特に事業や社会にとっての脅威となります。後述しますが、xz・liblzmaのように広く普及しているOSSにさえ、バックドアが混入する事件が実際に起こっています。</p><p>結局、<strong>OSSは「タダ飯」ではありません</strong> (<a href="https://www.asahi.com/english/weekly/memo/memo061219.html">No such thing as a free lunch</a>)。持続的に安全に利用するには、対価、すなわちコントリビューションが不可欠です。</p><h3>OSSとは</h3><p>まず、そもそもOSSとは何かについて見ていきます。誤解されがちですが、OSSとは「無料で使えるソフトウェア」(freeware) のことではありません。</p><p>非営利公益法人 Open Source Initiative は<a href="https://opensource.org/osd">OSSの定義</a> (Open Source Definition, v1.9) として10個の条件を挙げています。主に次の内容が含まれます:</p><ul><li>再配布の自由</li><li>ソースコード形式での配布</li><li>派生ソフトウェアの作成・配布の自由</li><li>利用目的制限の禁止</li></ul><p>無料で使えるソフトウェアであっても、ソースコードが公開されていなかったり、利用目的に制限があったりするソフトウェアはOSSではありません。</p><h4>OSS と Free Software</h4><p>OSSと非常によく似た概念として、Richard M. 
Stallman 氏らが唱える <a href="https://www.fsf.org/about/what-is-free-software">Free Software</a> があります。Free Software もまた、「無料で使えるソフトウェア」(freeware) のことではありません。Free Software は日本語では「自由(な)ソフトウェア」と訳されます。なお、日本語の「フリーソフト」はFree Softwareのことではなく、単なる無料ソフトを指すことが多いようです。</p><p>OSS と Free Software は実践上はほとんど区別がつかないこともありますが、異なる動機に基づいています。Free Software が「<a href="https://www.gnu.org/philosophy/open-source-misses-the-point.en.html">自由と正義のための運動</a>」(movement for freedom and justice) であると位置付けられているのに対し、OSS では理念よりも実利が重視されています。</p><p>両者に思想的な違いはありますが、まとめて FOSS (Free and Open Source Software) と呼ばれたり、「自由」(libre) の側面を強調して <a href="https://www.gnu.org/philosophy/floss-and-foss.en.html">FLOSS (Free/Libre and Open Source Software) </a>と呼ばれたりもします。</p><h4>主要なOSSライセンス</h4><p>一口にOSSといっても、そのライセンスには様々なものがあります。ライセンスが異なるソフトウェアは、混ぜて良いこともありますし、混ぜてはいけないこともあります。例えば、Apache License v2.0 を採用するソフトウェアに MIT License のコードを混ぜても問題ありませんが、GPL のコードを混ぜてしまうとApache License v2.0 での配布を継続できなくなります。近年では、GitHub Copilot などのコーディング支援AI の普及により、意図せず他ライセンスのコードが混ぜられることも懸念されつつあります。</p><p>また、利用目的に制限を加えるなど、OSSに似て非なるライセンスも多数存在し、OSSとの混同が問題視されています。</p><p>本記事ではOSSへのコントリビュートを推奨していますが、<strong>ライセンスについて最低限の知識を得るまでは、OSSにコントリビュートしてはいけません</strong>。ライセンスを無視してコードを切り貼りされると、ソフトウェアの配布を継続できなくなることもあるため、むしろ迷惑になります。</p><p>以下では主要なOSSライセンスをいくつか紹介しますが、ここでは特徴的な点についてのみ触れます。<strong>利用にあたっては、必ずライセンス原文をご参照ください。</strong></p><ul><li><a href="https://spdx.org/licenses/MIT.html"><strong>MIT License</strong></a>: 制約が緩いことで知られます。プロプライエタリな派生物の作成も許可しています。<a href="https://github.com/rails/rails/blob/v8.0.2/MIT-LICENSE">Ruby on Rails</a> や <a href="https://github.com/nodejs/node/blob/v23.10.0/LICENSE">Node.js</a> など多数のプロジェクトで採用されており、GitHub上で最もよく使われているライセンス (<a href="https://github.blog/open-source/open-source-license-usage-on-github-com/">44.69%、2021年</a>) とも言われています。類似ライセンスに<a href="https://spdx.org/licenses/X11.html">X11 License</a>、<a href="https://spdx.org/licenses/BSD-2-Clause.html">BSD License (2-Clause)</a>、<a href="https://spdx.org/licenses/ISC.html">ISC License</a> 
などがあります。</li><li><a href="https://spdx.org/licenses/Apache-2.0.html"><strong>Apache License v2.0</strong></a>: MIT License に似ていますが、特許ライセンスを明示的に付与している点で異なっています。ユーザが特許訴訟を起こすと、ユーザ自身に付与された特許ライセンスは終了します。<a href="https://github.com/apache/httpd/blob/2.4.63/LICENSE">Apache HTTP Server</a>、<a href="https://github.com/moby/moby/blob/v28.0.4/LICENSE">Docker Engine (Moby)</a>、<a href="https://github.com/kubernetes/kubernetes/blob/v1.32.3/LICENSE">Kubernetes</a> など多数のプロジェクトで採用されています。</li><li><a href="https://spdx.org/licenses/GPL-2.0-only.html"><strong>GPL (GNU General Public License) v2</strong></a>: バイナリを受け取ったユーザへのソースコード再配布を義務付けている点が特徴です。<a href="https://github.com/torvalds/linux/blob/v6.14/COPYING">Linux</a> や <a href="https://github.com/git/git/blob/v2.49.0/COPYING">git</a> などで採用されています。</li><li><a href="https://spdx.org/licenses/GPL-3.0-only.html"><strong>GPL v3</strong></a>: GPL v2の後継ライセンスです。改変したソフトウェアをインストーすることの妨害 (“<a href="https://www.gnu.org/philosophy/tivoization.en.html">Tivoization</a>”) の禁止などの条項を追加しています。<a href="https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/main.cc;hb=releases/gcc-14.2.0#l8">GCC</a> や <a href="https://git.savannah.gnu.org/cgit/emacs.git/tree/COPYING?h=emacs-30.1">Emacs</a> など多くのGNUプロジェクトはGPL v2 から v3 に移行しましたが、Linux は v3 への移行を予定していません。<a href="https://lkml.org/lkml/2007/6/13/289">”Tivoization” を禁止する意図がない</a>ためです。</li><li><a href="https://spdx.org/licenses/AGPL-3.0-only.html"><strong>AGPL (Affero GPL) v3</strong></a>: バイナリを受け取っていないSaaSユーザに対してもソースコードを再配布することを義務付けています。<a href="https://github.com/mastodon/mastodon/blob/v4.3.6/LICENSE">Mastodon</a> や <a href="https://github.com/ONLYOFFICE/DocumentServer/blob/v8.3.2/LICENSE.txt">OnlyOffice</a> などで採用されています。</li><li><a href="https://spdx.org/licenses/LGPL-2.1-only.html"><strong>LGPL (Lesser GPL) v2.1</strong></a><strong>、</strong><a href="https://spdx.org/licenses/LGPL-3.0-only.html"><strong>v3</strong></a>: GPLと異なり、LGPLを採用しているライブラリは動的ライブラリとして呼び出す場合にはライセンスが「感染」しません。<a 
href="https://sourceware.org/git/?p=glibc.git;a=blob;f=COPYING.LIB;hb=glibc-2.41.9000">glibc</a> や <a href="https://gitlab.gnome.org/GNOME/glib/-/blob/2.84.0/COPYING?ref_type=tags">glib</a> などで採用されています。</li></ul><h4>OSSライセンスに似た非OSSライセンス</h4><p>OSSライセンス と混同されがちな、非OSSライセンスも紹介します。</p><ul><li><a href="https://spdx.org/licenses/BUSL-1.1.html"><strong>BUSL (Business Source License) v1.1</strong></a><strong> </strong>: OSSライセンスと異なり、商用利用を禁じています。ただし、一定期間後に別のライセンス(大抵はOSSライセンス)が適用されます。元々は <a href="https://mariadb.com/bsl-faq-mariadb/">MariaDB MaxScale</a> のために作られたライセンスで、<a href="https://github.com/hashicorp/terraform/pull/33661/commits/b145fbcaadf0fa7d0e7040eac641d9aef2a26433">Terraform</a> (2023年8月以降) や <a href="https://github.com/hashicorp/vagrant/pull/13248/commits/731e9cb6c48df71d8c3f8d2b929dd35875250d2d">Vagrant</a> (同) でも採用されています。</li><li><a href="https://spdx.org/licenses/SSPL-1.0.html"><strong>SSPL (Server Side Public License) v1.0</strong></a>: AGPL v3 に類似しますが、当該のソフトウェアのみならず、提供するサービス全体のソースコード開示を要求しています。元々は<a href="https://github.com/mongodb/mongo/commit/5851c894963cb2d675f2c0628e2dc782e23e65a9">MongoDB</a> (2018年10月以降) のために作られたライセンスで、<a href="https://github.com/elastic/elasticsearch/commit/a92a647b9f17d1bddf5c707490a19482c273eda3">Elasticsearch</a> (2021年2月以降) や <a href="https://github.com/redis/redis/pull/13157/commits/18721a442e635fbcbf37d9368edfcbc04c688fa4">Redis</a> (2024年3月以降)でも採用されています。Elasticsearch や Redis では他のライセンスも併用しています。</li><li><a href="https://www.llama.com/llama3/license/"><strong>Llama 3 Community License</strong></a>: ユーザ数や使用目的に制限を課しています。Llama 3 で採用されています。</li></ul><p>これらの非OSSライセンスを採用するプロプライエタリソフトウェアも、OSS を自称していたり、あるいはマスコミなどによってOSSと混同されていたりすることがあるため注意が必要です。</p><h4>OSS略史</h4><p>OSSとは何かを理解するためには、OSSが定義されるまでの歴史を把握する必要があります。</p><p>OSSの歴史がいつ始まったかの問いに答えるのは容易ではありませんが、OSSの文脈での “open source” との表記は1998年まで確認できません。さらに古い用例 <a href="https://groups.google.com/g/no.linux/c/1UZo-3iv0tM">“Caldera Announces Open-Source Code Model for DOS”</a> 
(1996)も見受けられますが、1998年以降の用法とは意味が異なっています。</p><p>ただし、OSSに似通ったソフトウェア配布形態は、20世紀半ばの電子計算機黎明期から既に見られます。そもそも、<a href="https://digital-law-online.info/lpdi1.0/treatise17.html">1960年代半ば(米国の場合)までソフトウェアの著作権自体が確立していませんでした</a>。それ以降のソフトウェアはプロプライエタリな著作物としての配布が進みました。</p><ul><li><strong>1974年</strong>: UNIX のAT&amp;T (米国電信電話会社)社外への提供が始まりました。UNIXは後に広く普及したOSですが、この時点での提供先はごく少数に限られたようです。当時のUNIXは無料ではありましたが、<a href="https://www.nokia.com/bell-labs/about/dennis-m-ritchie/licenses.html">その利用は自由ではなく、学術・教育目的に限られていました</a>。翌年には有償化されました。UNIXはやがて、AT&amp;T系の System V と、カリフォルニア大学バークリー校系の Berkeley Software Distribution (BSD) とに分かれましたが、BSDもAT&amp;T や関連会社の著作物を含んでいました。</li><li><strong>1983年</strong>: Richard M. Stallman 氏が<a href="https://www.gnu.org/gnu/initial-announcement.en.html"> “<em>Free Unix!</em>” </a>を標語とし、<a href="https://www.gnu.org/">GNU</a> (Gnu’s Not Unix) プロジェクトを創設しました。カーネルを含む完全なUNIX互換OSの開発を目指しましたが、実際には bash などのユーザ空間のみが普及し、今日でも広く使われています。カーネルの開発にも1985年頃には着手していたものの、一度失敗しています。1990年頃にMach 3 マイクロカーネルをベースとして開発が始まった<a href="https://www.gnu.org/software/hurd/history.html">GNU Hurd</a> は、普及には至っていないものの、現在でも開発が続いています。</li></ul><blockquote>“I consider that the golden rule requires that if I like a program I must share it with other people who like it.”<br><em> —</em><a href="https://www.gnu.org/gnu/initial-announcement.en.html"><em> Richard M. 
Stallman (September 27, 1983)</em></a></blockquote><ul><li><strong>1988年</strong>: 初のfreeなBSD (FreeBSDではない) を目指した<a href="https://diswww.mit.edu/MENELAUS.MIT.EDU/zbugs/140">4.3BSD Net/1</a> がリリースされました。AT&amp;T 関連の著作物を除去したことにより、自由な再配布が可能になったとされていましたが、除去が不十分であったため1992年には訴訟に至りました。4.4BSD Lite (1994) にて、改めて自由な再配布が可能になりました。Net/1 の系譜は、Net/2 (1991) から 386BSD (1992) を経て、NetBSD (1993) や FreeBSD (1993) に連なります。NetBSDやFreeBSDは、後に4.4BSD Lite のコードを元にして書き直されました。</li><li><strong>1991年</strong>: <a href="https://www.kernel.org/pub/linux/kernel/Historic/">Linux v0.01</a> がリリースされました。<a href="https://cdn.kernel.org/pub/linux/kernel/Historic/old-versions/RELNOTES-0.01">当初は有償再配布を厳格に禁じており、「無料」ではあっても「自由」なソフトウェアではありませんでした</a>。Linux v0.12 (1992) にてGPL v2 を採用し、晴れて「自由」ソフトウェアとなりました。Linuxは初のfreeなUNIX互換OSというわけではありませんが、GNUは技術的に、BSDは法的に難航している間に、漁夫の利を得て勢力を築きました。なお、この時点での Linux は Linus Torvalds 氏の趣味として開発されており、大規模なOSとなることは想定されていませんでした。</li></ul><blockquote>“just a hobby, won’t be big and professional like gnu”<br><em> — </em><a href="https://lwn.net/2001/0823/a/lt-announcement.php3"><em>Linus Torvalds (August 25, 1991)</em></a></blockquote><ul><li><strong>1997年</strong>: Eric S. 
Raymond 氏が講演「<a href="https://cruel.org/freeware/cathedral.html">伽藍とバザール</a>」(<a href="http://www.catb.org/%7Eesr/writings/cathedral-bazaar/">The Cathedral and Bazaar</a>) にて、閉鎖的な「伽藍」型 (GNU Emacs等) と、開放的な「バザール」型 (Linux 等) の開発モデルとを比較し、後者の優位性を指摘しました。なお、混同されがちですが、「伽藍」がプロプライエタリなソフトウェアを、「バザール」がOSSを意味するわけでは<strong>ありません</strong>。</li><li><strong>1998年</strong>: 当時広く使われていたWebブラウザ Netscape Communicator の開発元である Netscape 社が、<a href="https://web.archive.org/web/19980127155653/http:/home.netscape.com/newsref/pr/newsrelease558.html">次期製品のソースコードを公開すると発表</a>しました。この時に公開されたソースコードの大半は一旦破棄されましたが、<a href="http://www.andrewturnbull.net/mozilla/history.html">紆余曲折を経て</a>今日の Mozilla Firefox に繋がっています。Netscapeのソースコード公開は、前述の「伽藍とバザール」の<a href="http://www.catb.org/esr/writings/homesteading/cathedral-bazaar/ar01s13.html">影響を受けたもの</a>とされています。この時点では ”<em>free source distribution with a license which allows source code modification and redistribution</em>” との文言が使われており、未だ “open source” とは呼ばれていませんでしたが、まもなく Christine Peterson 氏らにより<a href="https://opensource.org/history"> “open source”</a> の表現が提案されました。”free software “の表現は既に長らく存在していましたが、「自由」ではなく「無料」ソフトウェアのように解釈されがちなことが<a href="https://opensource.com/article/18/2/coining-term-open-source-software">問題視</a>されたようです。</li><li><strong>1999年</strong>: OSS向けホスティングサービスとして <a href="https://sourceforge.net">SourceForge</a> が始まりました。以後、OSSコミュニティが活発化しました。この頃には、Linux 等の主要OSSの商用導入も進みました。2000年には、後の <a href="https://docs.fedoraproject.org/en-US/quick-docs/fedora-and-red-hat-enterprise-linux/index.html#_history_of_red_hat_enterprise_linux_and_fedora">Red Hat Enterprise Linux</a> や <a href="https://web.archive.org/web/20010605154144/http://www.suse.de/en/produkte/susesoft/s390/S390release.html">SUSE Linux Enterprise Server</a> に繋がる製品が発売されました。</li><li><strong>2007年</strong>: Linus Torvalds 氏を雇用する非営利団体 Open Source Development Labs (OSDL) が Free Standards Group (FSG) と合併し、<a href="https://www.linuxfoundation.org">Linux Foundation</a> となりました。Linux Foundation は Linux に限らず、<a 
href="https://www.linuxfoundation.org/projects">極めて多数の OSS プロジェクト</a>の開発を推進しています。</li><li><strong>2008年</strong>: <a href="https://github.com">GitHub</a> のサービスが開始し、OSSコミュニティの一層の活発化が進みました。</li><li><strong>2000年代末</strong>: OSSをかつては癌 (cancer) とも呼んでいた Microsoft 社が、<a href="https://www.networkworld.com/article/735210/windows-microsoft-we-love-open-source.html">OSS への敵対を中止</a>し、むしろ積極的に OSS に貢献するようになりました。他の大企業からも OSS への貢献が進むようになりました。</li><li><strong>2023年</strong>: OSSの定義を管理する非営利公益法人 Open Source Initiative が、Open Source AI の定義の策定を開始しました。策定された定義は <a href="https://opensource.org/ai/open-source-ai-definition">Open Source AI Definition (OSAID) v1.0</a> (2024) として公開されました。機械学習モデルの「ソースコード」とも言える(が、似て非なる)訓練データについては、<a href="https://hackmd.io/@opensourceinitiative/osaid-faq#Why-do-you-allow-the-exclusion-of-some-training-data">法的観点から非公開を許容</a>しています。</li></ul><h3>なぜOSSにコントリビュートすべきなのか</h3><p>OSSの概要や歴史を踏まえた上で、表題の問いについて考えてみます。</p><h4>OSSへの依存は不可避</h4><p>Synopsys社は、2023年の時点で <a href="https://www.synopsys.com/software-integrity/resources/analyst-reports/open-source-security-risk-analysis.html">96% の商用コードはOSSを含んでいる</a>と報告しています (2024 Open Source Security and Risk Analysis Report)。また、OSSを直接使っている認識がない場合でも、開発ツールやOSなどのことを考えると、<strong>誰しもが少なくとも間接的にはOSSに依存している</strong>といえます。身近な例では、<a href="https://github.com/apple-oss-distributions">iOS</a> や <a href="https://source.android.com/">Android </a>を搭載したスマートフォンには多数のOSSが含まれているので、ソフトウェアエンジニア以外の方でも知らないうちにOSSに依存しています。</p><p>OSSへの依存が不可避である今日では、<strong>OSSの停滞や脆弱性がビジネスや社会の脅威に直結します</strong>。特に、OSSプロジェクトが乗っ取られて悪意のあるコードを仕込まれたりすると多大な損害が発生する可能性があります。</p><p><strong>こうした脅威は、企業、学校、団体、個人がOSSに自ら積極的に関与することで抑えることができます。</strong>ここでの関与とはソフトウェアのコーディングに限った話ではなく、むしろプロジェクトのマネジメントにも関わる話です。とはいえ、コーディングで貢献しなければマネジメントに携われないことが多いので、まずはコーディングでの貢献が重要となります。</p><h4>OSSは誰が開発・維持しているのか</h4><p>OSS の開発・維持は個人の趣味とみなされがちですが、それは必ずしも正しい認識ではありません。1991年時点でのLinuxなど、趣味 (“Just for Fun”) で開発されたOSSも多数存在するのは事実ですが、今日の主要なOSSには企業や団体の業務で開発されているものも多数存在します。Linux を開発した Linus Torvalds 氏の場合は、2003年より 前述のOSDL (現 
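Linux Foundation).</p><p>According to a survey by Tidelift (The 2023 Tidelift state of the open source maintainer report), <a href="https://tidelift.com/open-source-maintainer-survey-2023">13% of OSS maintainers earn most of their income, and 23% earn part of their income, from OSS work</a>. Taken together, <strong>36% of maintainers earn income from OSS work</strong>. A “maintainer” is a developer with administrative privileges over a project; some projects call them “committers”. Among developers who are not maintainers, the share earning income from OSS work is presumably lower. Limited to large and active OSS projects, the share would seem to be higher (my personal impression).</p><p>Even companies that depend heavily on OSS sometimes do not allow their employees to work on OSS as part of their job, but treating OSS as a personal hobby and <strong>exploiting people’s sense of fulfillment (“yarigai sakushu”) is not sustainable</strong>.</p><h4>Fallacy of composition</h4><p>In a market economy, for-profit companies are expected to choose economically rational behavior. However, what counts as rational behavior is not self-evident. Pursuing rationality at the micro level can turn out to be irrational at the macro level. This is called the fallacy of composition, a term attributed to the economist <a href="https://kotobank.jp/word/さみゆえるそん-3153128">Paul Samuelson</a>.</p><p>Applied to OSS: OSS is free to use, and someone else develops and maintains it anyway, so not contributing looks rational at the micro level. However, if everyone makes this “rational” choice, nobody contributes to OSS anymore, so it is not rational after all. Contributing to OSS yourself and sharing its value with others is what is actually rational.</p><h4>Is OSS a gift economy?</h4><p>Eric S. 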
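Raymond, to promote understanding of OSS culture, <a href="http://catb.org/~esr/writings/homesteading/homesteading/ar01s06.html">classified</a> the ways of gaining social status into three types in his essay <a href="https://cruel.org/freeware/noosphere.pdf">“Homesteading the Noosphere”</a> (1998):</p><ol><li><strong>Command hierarchy</strong>: relies on military power and coercion.</li><li><strong>Exchange economy</strong>: relies on control over the things to be used or traded. The free market economy is a typical example.</li><li><strong>Gift culture</strong>: relies on what one gives away. A typical example is the <a href="https://kotobank.jp/word/ぽとらつち-3170650">potlatch</a> of the indigenous peoples of North America, also described as competitive gift-giving: a custom of gaining power by repeatedly giving food and furs until the recipient can no longer reciprocate. Although Raymond’s essay does not mention it, a related work is <a href="https://kotobank.jp/word/%E8%B4%88%E4%B8%8E%E8%AB%96-1355992">The Gift</a> (Essai sur le don, 1925) by the sociologist <a href="https://kotobank.jp/word/もーす-1601627">Marcel Mauss</a>. The Japanese translation, “<a href="https://dl.ndl.go.jp/pid/1902440/1/75">太平洋民族の原始經濟 : 古制社會に於ける交換の形式と理由</a>” (1943), can be read free of charge in the National Diet Library Digital Collections (registration required). Mauss points out that gifts are not free but come with an obligation to reciprocate, sometimes excessively, and that those who cannot fulfill this obligation lose their status.</li></ol><p>Raymond placed OSS in the third category, gift culture, and pointed out that it <strong>creates a situation in which reputation among one’s peers is the only available measure of competitive success</strong>. However, it seems <strong>hard to say that reputation is the sole motivation in today’s OSS</strong>. At the individual level, salary can be the primary motivation for those who work on OSS as their job. Even for hobbyists, self-improvement can be a motivation. At the company level, motivations such as the following are conceivable:</p><ul><li>Ensuring security by keeping the project maintained</li><li>Creating new technology in cooperation with external developers</li><li>Reducing the burden of maintaining an in-house fork</li></ul><p>Of course, reputation can also be a major motivation, but it can be interpreted as an economic benefit rather than as a desire for approval or a matter of sentiment. At the individual level the economic benefit can be a promotion or a new job; at the company level it can be sales or recruiting.</p><p>In the end, it seems that <strong>the classical gift model cannot fully explain the dynamics of OSS communities</strong>. In particular, users who never took part in development do not lose any status within the developer community even if they neglect to “reciprocate” the “gift”. Or rather, they cannot lose a status they never built. OSS activity can be interpreted as rational behavior under a free market economy without applying the gift model at all. This can be seen as a rare case in which <a href="https://kotobank.jp/word/公共財-61761">pure public goods</a> are supplied by non-public sectors 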
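efficiently and sustainably.</p><h3>What happens if we keep free-riding on OSS?</h3><p>Even if the classical gift model does not fully fit OSS, the conclusion that free-riding is undesirable does not change. If free-riding continues, OSS communities stagnate, new features stop being added, and bugs stop being fixed. This in itself, however, is not a big problem for users, because they have options such as switching to other software or forking and maintaining it themselves. <strong>The real problem is a “</strong><a href="https://dictionary.cambridge.org/dictionary/german-english/gift?q=Gift"><strong><em>Gift</em></strong></a><strong>” from a malicious developer.</strong> “Gift” means “present” in English but “poison” in German. The meaning apparently shifted from something given, to a dose administered, to poison.</p><h4>The xz takeover incident</h4><p>One example of such a “Gift” is the <a href="https://tukaani.org/xz-backdoor/">xz/liblzma takeover incident</a> that came to light in March 2024. xz, a compression and decompression tool, and its library liblzma are components included by default in most Linux distributions, and they tended to be taken for granted as trustworthy. Indeed, the original maintainer had no malicious intent, but a maintainer calling themselves “Jia Tan” (probably a pseudonym), who joined the development partway through, planted a backdoor that allowed unauthorized SSH connections.</p><p>Behind the successful takeover of xz/liblzma by “Jia Tan” was the fact that the development community had stagnated even though xz/liblzma was widely used. <strong>In this stagnant community, “Jia Tan” </strong><a href="https://research.swtch.com/xz-timeline"><strong>spent two years building trust</strong></a><strong> through useful and (seemingly) harmless contributions, and was granted maintainer privileges despite having an unverified identity.</strong> Because far too much time was invested for this to be an individual’s prank, it is suspected to have been an organized operation.</p><p>Fortunately, in the case of xz/liblzma, the backdoor was discovered before it was packaged into major distributions. However, the xz/liblzma case may be just <strong>the tip of the iceberg</strong>. Other OSS may also have been taken over by malicious individuals or organizations.</p><h4>Other incidents</h4><p>Here are a few somewhat similar incidents.</p><ul><li><strong>January 2022</strong>: The JavaScript libraries colors.js and faker.js were <a href="https://www.sonatype.com/blog/npm-libraries-colors-and-faker-sabotaged-in-protest-by-their-maintainer-what-to-do-now">intentionally sabotaged</a> by their own developer, who added code that printed gibberish strings and ASCII art in an infinite loop. This affected the <a href="https://github.com/aws/aws-cdk/issues/18323">AWS Cloud Development Kit</a>, among others. Earlier, in 2020, the developer had posted the following comment:</li></ul><blockquote>“Respectfully, I am no longer going to support Fortune 500s<br>( and other smaller sized companies ) with my free work.”<br> —<a href="https://web.archive.org/web/20210704022108/https://github.com/Marak/faker.js/issues/1046"><em> 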
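faker.js developer (November 9, 2020)</em></a></blockquote><ul><li><strong>March 2022</strong>: node-ipc, an IPC library for Node.js, was <a href="https://orca.security/resources/blog/cve-2022-23812-protestware-malicious-code-node-ipc-npm-package/">intentionally sabotaged</a> by its own developer, who added code that destroyed files when run in Russia or Belarus.</li><li><strong>February 2024</strong>: The domain and GitHub account of polyfill.io, a JavaScript library that smooths over browser differences, were sold, and <a href="https://blog.qualys.com/vulnerabilities-threat-research/2024/06/28/polyfill-io-supply-chain-attack">code that redirects to malicious sites</a> was added.</li></ul><p>These incidents might have been prevented if community contributions and mutual oversight had been active.</p><h3>What to contribute</h3><p>Let us first consider what to contribute from the viewpoint of community sustainability.</p><h4>Sustainability of the community</h4><p>From the viewpoint of community sustainability, it is important to take on the work that other developers do not want to do: for example, fixing bugs, testing, refactoring, updating documentation, and answering questions. The saddest part of the xz incident is that we now have to doubt the good faith of people who volunteer for this unglamorous work.</p><p>Supporting and monitoring other developers is also important. Support includes reviewing pull requests and taking over pull requests that were abandoned midway. Things to monitor include suspicious commits, hijacked accounts, and false names or affiliations. Monitoring needs to be mutual and <strong>friendly</strong>.</p><h4>Sustainability of your own or your company’s activity</h4><p>The items listed under “Sustainability of the community” are dull. <strong>If you or your company cannot stay motivated and keep going, you can no longer help sustain the community either</strong>, so it is also important to contribute things that keep you motivated.</p><p>For example, proposing a large new feature can keep you active (or, put less kindly, tied down) for several years or more while you maintain that feature.</p><p>You may build an elaborate feature as self-improvement or as a hobby to stay motivated, but you also need to consider whether other developers will be able to maintain it. If you do not want to make that accommodation, consider forking or starting a new project instead of getting the feature merged into an existing project.</p><p>If you do not know what you want to do, you can start with something simple such as fixing typos, but doing nothing but typo fixes risks being regarded as a troll.</p><h4>Writing code is not the only contribution</h4><p>Writing code is not the only form of contribution; <strong>contributions on the management side are actually the most important</strong>. The key is how to prompt other companies and other people to act. One example is identifying the parts that need coding or testing and recruiting developers for them. It is not easy to prompt action from people at other companies, to whom you cannot give work orders, but if you can pull a project together across companies, development can proceed efficiently.</p><p>It is also important to dig up pull requests and vulnerability reports that have been left unreviewed, and to assign appropriate reviewers or make the decision yourself. For a pull request you do not intend to merge, rejecting it is kinder than leaving it open: with an explicit rejection, the developer can move on to the next step without waiting for a review. Handling vulnerability reports can drag on, because it may be doubtful whether the vulnerability even exists, or because it is hard to decide when to disclose it while guarding against a zero-day 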
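attack. Disclosure may also require negotiation with major users.</p><p>For this kind of management, personal connections and negotiating skills often matter more than software skills. Even so, <strong>someone who is all talk and writes no code does not gain a voice, so writing code remains the foundation</strong>.</p><p>Funding foundations and similar bodies is, of course, also an important contribution.</p><h4>Evaluating contributors</h4><p>Let us consider how contributors should be evaluated, from the community’s viewpoint and from a company’s viewpoint.</p><p>Evaluation within the community mainly concerns the selection of maintainers. A contributor who added a large new feature may be entrusted with the long-term maintenance of that module. Contributors who can handle bug management, release management, and coordination with other projects are more likely to be entrusted with managing the project as a whole.</p><p>Next, let us consider how a company should evaluate employees who work on OSS. It is ideal when OSS work is directly tied to sales of the company’s products, but using that as the only metric risks harming <strong>the sustainability of the community</strong>. From the viewpoint of community sustainability, it is desirable to incorporate evaluation within the community into the in-house evaluation as well. Many companies measure commit counts or lines of code to evaluate work output quantitatively, but <strong>an excessive focus on quantitative metrics encourages nuisance behavior</strong>, which is harmful. Some organizations submit large numbers of trivial pull requests, such as typo fixes, just to meet numerical targets.</p><h3>My own OSS contributions at NTT</h3><p>The NTT Group has actively contributed to many OSS projects, including Linux, PostgreSQL, OpenStack, and Hadoop.</p><p>I belong to the <a href="https://www.rd.ntt/sic/oss/">Software Innovation Center of Nippon Telegraph and Telephone Corporation</a> and serve as a maintainer of major container-related OSS such as <a href="https://github.com/moby/moby">Docker/Moby</a> (2016-), <a href="https://github.com/moby/buildkit">BuildKit</a> (2017-), <a href="https://containerd.io">containerd</a> (2017-), <a href="https://github.com/opencontainers/runc">runc</a> (2020-), and <a href="https://github.com/opencontainers/runtime-spec">OCI Runtime Spec</a> (2022-). At the end of 2015, I ran into a filesystem-related problem in Docker, which led me to submit pull requests and make contributions such as <a href="https://github.com/AkihiroSuda/issues-docker">organizing issues</a>; as a result, I came to be entrusted with the role of maintainer.</p><p>As a major contribution in terms of features, I implemented a technology called <a href="https://rootlesscontaine.rs/">Rootless containers</a>, which strengthens security by running the container runtime without root privileges, in containerd, BuildKit, Docker, and Kubernetes (2018-). <a href="https://github.com/rootless-containers/slirp4netns">slirp4netns</a>, which I developed as a module for running container networking without root privileges, has been adopted by the Red Hat-led, Docker-compatible OSS project <a href="https://podman.io/">Podman</a> 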
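as well. <strong>Looking back, my motivation rose greatly when my proposal was adopted by Docker, but I feel that the gain was somewhat offset by how long merging and releasing took.</strong></p><p>Building on this experience, in 2020 I started developing <a href="https://github.com/containerd/nerdctl">nerdctl</a> (contaiNERD CTL) as a Docker-compatible project based on containerd. I launched a new compatible project because releases of Docker as OSS (Moby) had stalled at the time, which made it hard to bring in the security and performance improvements being made on the containerd side. Docker as OSS (Moby) has since regained its vitality.</p><p><a href="https://medium.com/nttlabs/nerdctl-359311b32d0e">nerdctl: Docker-compatible CLI for contaiNERD</a></p><p>I also developed <a href="https://lima-vm.io/">Lima</a> as a tool for easily launching a Linux virtual machine that ships with nerdctl. nerdctl and Lima have been incorporated into products such as SUSE’s <a href="https://rancherdesktop.io/">Rancher Desktop</a> and AWS’s <a href="https://runfinch.com/">Finch</a>, and are widely used.</p><p><a href="https://medium.com/nttlabs/lima-is-now-a-cncf-project-a7affde4f03c">Lima is now a CNCF project 🎉</a></p><p>Since around 2023, I have been pursuing several efforts in parallel to improve the supply chain security of OSS. As one of them, I have been promoting the adoption of <a href="https://github.com/docker-library/official-images/issues/16044">Reproducible Builds</a>, a technique that makes it easier to detect source code tampering, for containers; however, <a href="https://github.com/moby/buildkit/blob/master/docs/build-repro.md">although the tooling has been implemented</a>, negotiations over adoption by Docker Hub are proving difficult. This has reminded me once again that in OSS work <strong>negotiating skills matter more than implementation skills</strong>.</p><h3>The future of OSS</h3><p>Finally, here is my outlook on the future of OSS.</p><h4>Influence from the LLM world</h4><p>For better or worse, OSS is increasingly exposed to influence from the LLM world. There is a concern that code generated by LLM-based assistants such as GitHub Copilot may contain other parties’ copyrighted work, but from a productivity standpoint it is not realistic to ban the use of LLMs. Even if they were banned, contributors would use LLMs anyway, so a ban would not be effective. The challenge is how to <strong>dispel licensing concerns while maintaining and improving productivity</strong>.</p><p>In addition, <a href="https://opensource.org/blog/metas-llama-license-is-still-not-open-source">the culture of calling even LLMs that restrict the purpose of use “open source”</a> may spread to software other than LLMs. More than ever, it is important not to trust something merely because it calls itself “open source” or is reported to be “open source”, and to check the original text of the license.</p><h4>Declining anonymity</h4><p>Many OSS developers prefer not to disclose their name or affiliation, but such anonymous developers may find some activities becoming difficult. In particular, since the March 2024 xz/liblzma 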
takeover incident, it has been getting harder to appoint a person of unknown identity as a maintainer, however capable they are.</p><p>One way for new contributors to earn trust is <strong>attending conferences in person and interacting with other developers</strong>, for example at <a href="https://events.linuxfoundation.org">Open Source Summit</a> or <a href="https://fosdem.org/">FOSDEM</a> (Free and Open source Software Developers’ European Meeting). However, how to include individual contributors who cannot charge travel expenses to an employer remains a challenge. Even contributors who work on OSS as their job may find it hard to request a business trip when they are not speaking.</p><h4>Quantifying projects</h4><p>This article has mentioned the sustainability of OSS projects many times, but quantifying it is not easy. A rather grim metric called the “<a href="https://www.forbes.com/councils/forbestechcouncil/2024/08/28/survive-the-bus-factor-strategies-for-protecting-your-codebase/">Bus factor</a>” (“<a href="https://github.com/HelgeCPH/truckfactor">Truckfactor</a>”) has been proposed, indicating how many developers could be hit by a bus with the project still able to continue, but it lacks rigor and does not seem to be widely used. Another metric seems to be needed.</p><p>Building mathematical models of the growth, decline, and revival of projects would also be useful. It would be good to be able to grow or revive projects in a reproducible way.</p><h3>Summary</h3><p>This has been a long article, but these are the three points I wanted to convey:</p><ul><li>Why should you contribute to OSS?<br>→ <strong>For sustainability</strong> (though not only for that)</li><li>Neglected OSS can be taken over by malicious developers</li><li>Free-riding looks rational but is not</li></ul><p>I hope this article helps, even a little, to further energize OSS communities.</p><p>We at <a href="https://www.rd.ntt/">NTT</a> are looking for colleagues to work with us in various open source communities. Please visit <a href="https://www.rd.ntt/sic/recruit/">our recruitment page</a>.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=06064db030a0" width="1" height="1" alt=""><hr><p><a href="https://medium.com/nttlabs/why-you-should-contribute-to-open-source-software-06064db030a0">なぜオープンソースソフトウェアにコントリビュートすべきなのか</a> was originally published in <a href="https://medium.com/nttlabs">nttlabs</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[containerd v2.0, nerdctl v2.0, and Lima v1.0]]></title>
            <link>https://medium.com/nttlabs/containerd-v2-0-nerdctl-v2-0-lima-v1-0-93026b5839f8?source=rss-814b1fd299ce------2</link>
            <guid isPermaLink="false">https://medium.com/p/93026b5839f8</guid>
            <category><![CDATA[kubecon]]></category>
            <category><![CDATA[lima-vm]]></category>
            <category><![CDATA[containerd]]></category>
            <category><![CDATA[nerdctl]]></category>
            <dc:creator><![CDATA[Akihiro Suda]]></dc:creator>
            <pubDate>Wed, 06 Nov 2024 06:13:39 GMT</pubDate>
            <atom:updated>2024-11-06T06:13:39.525Z</atom:updated>
            <content:encoded><![CDATA[<p>Ahead of <a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-north-america/">KubeCon North America 2024</a> (November 12–15), this week saw the releases of <a href="https://github.com/containerd/containerd">containerd</a> v2.0, <a href="https://github.com/containerd/nerdctl">nerdctl</a> (<em>contaiNERD CTL</em>) v2.0, and <a href="https://lima-vm.io">Lima</a> v1.0 🎉.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*8K5MaEmsIwDDlRcNtXn08w.png" /></figure><h3>containerd v2.0</h3><p><a href="https://github.com/containerd/containerd">containerd</a> is the industry’s standard container runtime used by Docker and several Kubernetes-based products such as Azure Kubernetes Service (AKS), Amazon Elastic Kubernetes Service (EKS), and Google Kubernetes Engine (GKE).</p><p><a href="https://medium.com/nttlabs/the-internals-and-the-latest-trends-of-container-runtimes-2023-22aa111d7a93">The internals and the latest trends of container runtimes (2023)</a></p><p>containerd was originally written by Docker, Inc. in <a href="http://web.archive.org/web/20151217223538/https://containerd.tools/">2015</a> to provide a minimalistic daemon to manage the lifecycles of containers, under the hood of the Docker daemon.</p><p>containerd was <a href="https://www.cncf.io/announcements/2017/03/29/containerd-joins-cloud-native-computing-foundation/">transferred</a> to the <a href="https://cncf.io">Cloud Native Computing Foundation</a> (CNCF) and reached its <a href="https://github.com/containerd/containerd/releases/tag/v1.0.0">v1.0</a> in 2017, with its scope expanded to support non-Docker use cases. The built-in support for Kubernetes was merged in <a href="https://github.com/containerd/containerd/releases/tag/v1.1.0">v1.1</a> (2018).</p><p>containerd v2.0 focuses on removing legacy features that have been deprecated over the past nine years. 
These breaking changes are why the major version number was bumped from v1 to v2.</p><h4>Removed features</h4><ul><li><a href="https://github.com/containerd/containerd/pull/8262">The old containerd-shim and containerd-shim-runc-v1</a>, in favor of containerd-shim-runc-v2. The old shims lacked support for modern features such as cgroup v2, and were inefficient at supporting Kubernetes pods. Those old shims had been deprecated since containerd v1.4 (2020).</li><li><a href="https://github.com/containerd/containerd/pull/8263">The support for AUFS</a>, in favor of OverlayFS, which has been merged into the upstream Linux kernel. The support for AUFS had been deprecated since containerd v1.5 (2021).</li><li><a href="https://github.com/containerd/containerd/pull/8276">The support for the Kubernetes CRI v1alpha2 API</a>, in favor of CRI v1. Kubernetes already dropped support for CRI v1alpha2 in <a href="https://github.com/kubernetes/kubernetes/blob/v1.26.0/CHANGELOG/CHANGELOG-1.26.md?plain=1#L482">Kubernetes v1.26</a> (2022).</li><li><a href="https://github.com/containerd/containerd/pull/9765">The support for &quot;Docker Schema 1&quot; images is now disabled</a>, in preparation for its removal in containerd v2.1. Schema 1 has been effectively deprecated since circa 2017 in favor of Schema 2, introduced in Docker v1.10 (2016), but some image registries did not support Schema 2 until around 2020. Docker already disabled pushing Schema 1 images in <a href="https://github.com/moby/moby/pull/41295">Docker v20.10</a> (2020), so almost all images built in the last few years should be in Schema 2 format or in the format of its successor, <a href="https://github.com/opencontainers/image-spec">OCI Image Spec</a> v1. 
(&quot;OCI&quot; here refers to &quot;Open Container Initiative&quot;, not to &quot;Oracle Cloud Infrastructure&quot;.)</li></ul><p>Users of containerd v1.6.27+/v1.7.12+ can check whether they are using any of the removed features by running the ctr deprecations list command.</p><h4>New features</h4><ul><li><a href="https://kubernetes.io/docs/concepts/workloads/pods/user-namespaces/">User Namespaces for Kubernetes</a>, so as to map the user IDs in pods to different user IDs on the host. In particular, this feature allows mapping the root user in the pod to an unprivileged user on the host.</li><li><a href="https://kubernetes.io/docs/concepts/storage/volumes/#read-only-mounts">Recursive Read-only Mounts for Kubernetes</a>, so as to prevent submounts from accidentally remaining writable. See also my previous blog post at kubernetes.io: &lt;<a href="https://kubernetes.io/blog/2024/04/23/recursive-read-only-mounts/">https://kubernetes.io/blog/2024/04/23/recursive-read-only-mounts/</a>&gt;.</li><li><a href="https://github.com/containerd/containerd/blob/v2.0.0/docs/image-verification.md">Image verifier plugins</a>, so as to enforce cryptographic signing, malware scanning, etc.</li></ul><h4>Other notable changes</h4><ul><li><a href="https://github.com/containerd/containerd/issues/4131">Sandboxed CRI</a> is now enabled by default, for efficient handling of pods</li><li><a href="https://github.com/containerd/nri">NRI</a> (Node Resource Interface) is now enabled by default, for plugging vendor-specific logic into runtimes</li><li><a href="https://github.com/cncf-tags/container-device-interface">CDI</a> (Container Device Interface) is now enabled by default, for enhanced support for <a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4009-add-cdi-devices-to-device-plugin-api">Kubernetes Device Plugins</a></li><li><a href="https://github.com/containerd/containerd/blob/v2.0.0/docs/cri/config.md">/etc/containerd/config.toml</a> now expects the version=3 header. 
The previous config versions are still supported.</li><li>The Go package <a href="https://pkg.go.dev/github.com/containerd/containerd">github.com/containerd/containerd</a> has been renamed to <a href="https://pkg.go.dev/github.com/containerd/containerd/v2/client">github.com/containerd/containerd/v2/client</a>.</li></ul><p>See also:</p><ul><li><a href="https://github.com/containerd/containerd/blob/v2.0.0/docs/containerd-2.0.md">https://github.com/containerd/containerd/blob/v2.0.0/docs/containerd-2.0.md</a></li><li><a href="https://github.com/containerd/containerd/releases/tag/v2.0.0">https://github.com/containerd/containerd/releases/tag/v2.0.0</a></li></ul><h3>nerdctl v2.0</h3><p><a href="https://github.com/containerd/nerdctl">nerdctl</a> (<em>contaiNERD CTL</em>) is a Docker-like command line interface tool for containerd.</p><p>nerdctl was originally written by me in 2020 to facilitate experimental features such as <a href="https://github.com/containerd/nerdctl/blob/master/docs/stargz.md">eStargz</a> that were not supported in Docker at that time. nerdctl became a subproject of containerd in <a href="https://github.com/containerd/project/issues/69">2021</a>, and reached its v1.0 in 2022.</p><p><a href="https://medium.com/nttlabs/nerdctl-v1-0-fb6bf8e1b0b">Released nerdctl v1.0</a></p><p>nerdctl v2.0 enables <a href="https://github.com/containerd/nerdctl/pull/2723">detach-netns</a> for Rootless mode by default, which brings:</p><ul><li>Faster and more stable nerdctl pull, nerdctl push, and nerdctl build</li><li>Proper support for nerdctl pull 127.0.0.1:.../...</li><li>Proper support for nerdctl run --net=host</li></ul><p>The detach-netns mode may sound similar to <a href="https://github.com/rootless-containers/bypass4netns">bypass4netns</a>, which utilizes SECCOMP_IOCTL_NOTIF_ADDFD to accelerate socket syscalls in rootless containers. 
While bypass4netns accelerates containers, detach-netns accelerates the runtime layers that are responsible for pulling and pushing images, by leaving them in the host network namespace. Containers are executed in the &quot;detached&quot; network namespace so that they can obtain IP addresses used for container-to-container communications.</p><p>Other major changes in nerdctl v2.0 include the addition of <a href="https://github.com/containerd/nerdctl/pull/2785">nerdctl run --systemd</a> for running systemd in containers. Also, stability was significantly improved in this release, thanks to lots of refactoring and testing by the GitHub user <a href="https://github.com/containerd/nerdctl/issues?q=is%3Apr%20author%3Aapostasie%20">@apostasie</a>.</p><p>See also the release notes: <a href="https://github.com/containerd/nerdctl/releases/tag/v2.0.0">https://github.com/containerd/nerdctl/releases/tag/v2.0.0</a></p><h3>Lima v1.0</h3><p><a href="https://lima-vm.io/">Lima</a> is a command line utility to run <a href="https://github.com/containerd/containerd">containerd</a> and <a href="https://github.com/containerd/nerdctl">nerdctl</a> on desktop operating systems such as macOS, by running a Linux virtual machine with automatic filesystem sharing and port forwarding. Lima is often compared with WSL2, the former Docker Machine, and Vagrant.</p><pre>brew install lima<br>limactl start<br>lima nerdctl run -p 80:80 nginx</pre><p>Lima was also originally written by me, in 2021, and joined the CNCF in 2022. 
Lima has been adopted by several well-known third-party projects such as <a href="https://github.com/abiosoft/colima">Colima</a>, <a href="https://rancherdesktop.io">Rancher Desktop</a>, and <a href="https://aws.amazon.com/blogs/opensource/introducing-finch-an-open-source-client-for-container-development/">AWS’s Finch</a>.<br><a href="https://github.com/lima-vm/lima/discussions/2390#discussioncomment-9732082">Lima is also used by several organizations including NTT Communications.</a></p><p><a href="https://medium.com/nttlabs/lima-is-now-a-cncf-project-a7affde4f03c">Lima is now a CNCF project 🎉</a></p><p>Lima finally reached v1.0 today, with support from 110+ contributors and 15,000+ stargazers over the past 3+ years.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*9mRBHjBY13l3-DIoOcUQ3w.png" /><figcaption><a href="https://star-history.com/#lima-vm/lima">https://star-history.com/#lima-vm/lima</a></figcaption></figure><p>This release introduces several breaking changes, such as switching the default machine driver on macOS from QEMU to <a href="https://developer.apple.com/documentation/virtualization">Virtualization.framework</a> (VZ) for better filesystem performance.</p><p>The limactl CLI is designed to print hints when the user hits those breaking changes. 
For example, limactl create template://experimental/vz now fails with a hint that suggests using limactl create --vm-type=vz template://default instead.</p><p>Other notable changes include added support for <a href="https://github.com/lima-vm/lima/pull/2530">nested virtualization</a> and <a href="https://github.com/lima-vm/lima/pull/2411">UDP port forwarding</a>, and the new <a href="https://github.com/lima-vm/lima/pull/2710">limactl tunnel</a> command (SOCKS proxy).</p><p>See also the release notes: <a href="https://github.com/lima-vm/lima/releases/tag/v1.0.0">https://github.com/lima-vm/lima/releases/tag/v1.0.0</a></p><h3>Visit the maintainers at KubeCon</h3><p>Some of the maintainers of the projects, including myself, will show up at <a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-north-america/">KubeCon North America 2024</a>:</p><h4>Wednesday, November 13</h4><ul><li><strong>15:15–20:00</strong>: <a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-north-america/program/project-engagement/#project-kiosk-directory">Project Kiosk: containerd</a></li></ul><h4>Friday, November 15</h4><ul><li><strong>10:30–14:30</strong>: <a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-north-america/program/project-engagement/#project-kiosk-directory">Project Kiosk: Lima</a></li><li><strong>11:55–12:30</strong>: <a href="https://sched.co/1i7qL">What Containerd 2.0 Means for You — Samuel Karp, Google</a></li><li><strong>14:55–15:30</strong>: <a href="https://sched.co/1hoyS">What’s Going on in the Containerd Neighborhood? 
— Phil Estes, AWS; Samuel Karp, Google; Akihiro Suda (myself), NTT; Michael Brown, IBM; Kirtana Ashok, Microsoft</a></li></ul><p>The full schedule of the conference can be found at &lt;<a href="https://kccncna2024.sched.com/">https://kccncna2024.sched.com/</a>&gt;.</p><h3>NTT is hiring!</h3><p>We at <a href="https://www.rd.ntt/e/">NTT</a> are looking for engineers who work in Open Source communities in the fields of containers, etc. Visit &lt;<a href="https://www.rd.ntt/e/sic/recruit/">https://www.rd.ntt/e/sic/recruit/</a>&gt; to see how to join us.</p><p>私たち<a href="https://www.rd.ntt/">NTT</a>は、コンテナなどの領域でのオープンソースコミュニティで共に活動する仲間を募集しています。ぜひ弊社採用情報ページをご覧ください: &lt;<a href="https://www.rd.ntt/sic/recruit/">https://www.rd.ntt/sic/recruit/</a>&gt;</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=93026b5839f8" width="1" height="1" alt=""><hr><p><a href="https://medium.com/nttlabs/containerd-v2-0-nerdctl-v2-0-lima-v1-0-93026b5839f8">containerd v2.0, nerdctl v2.0, and Lima v1.0</a> was originally published in <a href="https://medium.com/nttlabs">nttlabs</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Accelerating Llama on Lima, with WASI-NN RPC]]></title>
            <link>https://medium.com/nttlabs/accelerating-llama-on-lima-with-wasi-nn-rpc-06b84bcbbe5c?source=rss-814b1fd299ce------2</link>
            <guid isPermaLink="false">https://medium.com/p/06b84bcbbe5c</guid>
            <category><![CDATA[lima-vm]]></category>
            <category><![CDATA[llama-2]]></category>
            <category><![CDATA[llm]]></category>
            <category><![CDATA[wasi]]></category>
            <category><![CDATA[wasm]]></category>
            <dc:creator><![CDATA[Akihiro Suda]]></dc:creator>
            <pubDate>Wed, 19 Jun 2024 23:38:14 GMT</pubDate>
            <atom:updated>2024-06-19T23:38:14.968Z</atom:updated>
            <content:encoded><![CDATA[<p><a href="https://github.com/WasmEdge/WasmEdge/releases/tag/0.14.0">WasmEdge v0.14</a> was released last month, with our <a href="https://github.com/WasmEdge/WasmEdge/pull/3128">contribution</a> for exposing <a href="https://github.com/WebAssembly/wasi-nn">WASI-NN</a> (WebAssembly System Interface API for Neural Networks) over gRPC.</p><p>The WASI-NN RPC is useful for accelerating LLM workloads (e.g., <a href="https://llama.meta.com/">Llama</a>) on virtual machines (e.g., <a href="https://lima-vm.io/">Lima</a>) that do not support virtualizing GPUs.</p><p><strong>On Apple M2 Pro, Llama 2 can run 22.3 times faster. (0.66 tokens/s → 14.73 tokens/s)</strong></p><p><strong><em>Note</em></strong><em>: “Lima” in this context refers to &lt;</em><a href="https://lima-vm.io/"><em>https://lima-vm.io</em></a><em>&gt; (VM), not to &lt;</em><a href="https://gitlab.freedesktop.org/lima"><em>https://gitlab.freedesktop.org/lima</em></a><em>&gt; (Mali GPU driver).</em></p><h3>Problem: GPUs are inaccessible from VMs</h3><p>Lima is a tool that creates a Linux virtual machine with a simple command line interface. Lima was originally made for running <a href="https://github.com/containerd/containerd">containerd</a> including <a href="https://github.com/containerd/nerdctl">nerdctl</a> (contaiNERD CTL) on macOS. However, Lima has gained popularity for other use cases as well.</p><p><a href="https://medium.com/nttlabs/lima-is-now-a-cncf-project-a7affde4f03c">Lima is now a CNCF project 🎉</a></p><p>For macOS hosts, Lima supports two backends: <a href="https://www.qemu.org/">QEMU</a> and <a href="https://developer.apple.com/documentation/virtualization?language=objc">Virtualization.framework</a>. 
The lack of GPU support in these backends has been a major obstacle for users who want to efficiently run AI workloads such as Llama inside Lima.</p><h3>Solution: WASI-NN as the high-level RPC for neural networks on GPUs</h3><p>Implementing GPU passthrough in these VM backends is not a straightforward task. Instead, we chose to implement an RPC subsystem that delegates neural network computations to a host process (WASI-NN RPC Server) with direct access to the host GPUs.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/1*otwBHsfS2LqZZkzRSt5GWA.png" /></figure><p>The <a href="https://github.com/WasmEdge/WasmEdge/blob/0.14.0/lib/wasi_nn_rpc/wasi_ephemeral_nn.proto">RPC</a> is built on top of gRPC and maps directly to the <a href="https://github.com/WebAssembly/wasi-nn/blob/06c30c0e12e39e674b6503614352dcf6dc0c96e0/wasi-nn.witx">WITX specification</a> of the WASI-NN API.</p><pre>// gRPC<br>message SetInputRequest {<br>  uint32 resource_handle = 1;<br>  uint32 index = 2;<br>  Tensor tensor = 3;<br>}<br><br>message ComputeRequest {<br>  uint32 resource_handle = 1;<br>}<br><br>message GetOutputRequest {<br>  uint32 resource_handle = 1;<br>  uint32 index = 2;<br>}<br><br>message GetOutputResult {<br>  bytes data = 1;<br>}<br><br>service GraphExecutionContextResource {<br>  rpc SetInput(SetInputRequest) returns (google.protobuf.Empty) {};<br>  rpc Compute(ComputeRequest) returns (google.protobuf.Empty) {};<br>  rpc GetOutput(GetOutputRequest) returns (GetOutputResult) {};<br>}</pre><pre>;; WITX<br>(@interface func (export &quot;set_input&quot;)<br>  (param $context $graph_execution_context)<br>  (param $index u32)<br>  (param $tensor $tensor)<br>  (result $error (expected (error $nn_errno)))<br>)<br><br>(@interface func (export &quot;compute&quot;)<br>  (param $context $graph_execution_context)<br>  (result $error (expected (error $nn_errno)))<br>)<br><br>(@interface func (export &quot;get_output&quot;)<br>  (param $context 
$graph_execution_context)<br>  (param $index u32)<br>  (param $out_buffer (@witx pointer u8))<br>  (param $out_buffer_max_size $buffer_size)<br>  (result $error (expected $buffer_size (error $nn_errno)))<br>)</pre><p>The RPC client is implemented in <a href="https://wasmedge.org/">WasmEdge</a>. The RPC itself is agnostic to WASM and can be implemented by non-WASM applications too.</p><h4><em>Why does WASM matter here?</em></h4><p>Actually, it really doesn’t. WASM appears here simply because:</p><ul><li>the WASI-NN API provides a fairly simple abstraction for neural networks</li><li>the WasmEdge implementation of WASI-NN already covers several backends such as PyTorch and GGML, with support for <a href="https://developer.apple.com/metal/">Apple Metal</a></li></ul><p>Alternatively, <a href="https://dawn.googlesource.com/dawn/+/f290d2d265ca7f386e743f4061076d28d5d897ef/docs/dawn/overview.md#dawn-wire">Dawn Wire</a> (RPC for WebGPU) could be adopted instead of WASM and WASI-NN, but it would incur a higher implementation cost due to the difference in abstraction levels.</p><h3>Demo: 22 times faster</h3><h4>Launching Lima</h4><p>An instance of a Lima virtual machine can be created as follows:</p><pre># Host (macOS)<br>brew install lima<br>limactl start --vm-type=vz<br>lima</pre><p>At the time of writing, the brew command installs Lima v0.22 with Ubuntu 24.04 as the default VM template.</p><p>The <a href="https://lima-vm.io/docs/config/vmtype/">--vm-type=vz</a> flag in the limactl start command specifies Virtualization.framework (vz) as the VM driver. 
This flag is optional, but recommended for better performance and stability.</p><h4>Installing WasmEdge onto the Lima guest</h4><p>After running the lima command to open a shell for the VM, run the following commands to install WasmEdge inside the guest:</p><pre># Guest (Linux)<br>sudo apt-get install -y cmake libgrpc++-dev liblld-dev libopenblas-dev libopenblas64-dev llvm ninja-build pkg-config protobuf-compiler-grpc<br><br>git clone https://github.com/WasmEdge/WasmEdge.git <br>cd WasmEdge<br>git checkout 0.14.0<br><br>cmake -S. -B ./build -GNinja \<br>  -DCMAKE_BUILD_TYPE=Release \<br>  -DWASMEDGE_PLUGIN_WASI_NN_BACKEND=GGML \<br>  -DWASMEDGE_PLUGIN_WASI_NN_GGML_LLAMA_BLAS=ON \<br>  -DWASMEDGE_BUILD_WASI_NN_RPC=ON<br>cmake --build ./build<br>sudo cmake --install ./build</pre><h4>Running Llama on Lima, without the acceleration</h4><p>Inside the Lima VM, Llama can be executed with WasmEdge as follows:</p><pre># Guest (Linux)<br>curl -OSL https://github.com/second-state/WasmEdge-WASINN-examples/raw/da18b35c3c911a40a5d2784947ce78610ce51daf/wasmedge-ggml/nnrpc/wasmedge-ggml-nnrpc.wasm<br>curl -OSL https://huggingface.co/wasmedge/llama2/resolve/23de599453ce999ab1dc650bd01f6298af38eb18/llama-2-7b-chat-q5_k_m.gguf<br><br>wasmedge \<br>  --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf \<br>  --env enable_log=true \<br>  wasmedge-ggml-nnrpc.wasm default</pre><p>The license and acceptable use policy for the llama-2-7b-chat-q5_k_m.gguf file can be found at &lt;<a href="https://huggingface.co/wasmedge/llama2/tree/23de599">https://huggingface.co/wasmedge/llama2/tree/23de599</a>&gt;.<br><em>Llama</em> was chosen to be executed inside <em>Lima</em> as a pun; it is possible to use other GGUF-formatted models as well.</p><p>In the terminal, you can chat with the model, but it is quite slow (<strong>0.66 tokens per second on Apple M2 Pro</strong>) due to the lack of access to the host GPUs:</p><pre><strong>USER:                                                             
                                                                         <br>What is the capital city of Peru?</strong><br>[...]<br>eval time =   13535.83 ms /     9 runs   ( 1503.98 ms per token,     0.66 tokens per second)<br>[...]<br><strong>ASSISTANT:<br>The capital city of Peru is Lima.&lt;/s&gt;</strong></pre><p>It may even appear to hang, as the model’s output is not printed until text generation is complete. This issue is being addressed in &lt;<a href="https://github.com/WasmEdge/WasmEdge/pull/3386">https://github.com/WasmEdge/WasmEdge/pull/3386</a>&gt; by implementing the WASI-NN Streaming Extension.</p><h4>Installing WASI-NN RPC server onto the macOS host</h4><p>The next step is to install WasmEdge along with the WASI-NN RPC server onto the macOS host, so that the guest can delegate the LLM inference computations to the host with the access to the GPUs.</p><pre># Host (macOS)<br>brew install cmake grpc llvm@16 ninja pkg-config<br><br>git clone https://github.com/WasmEdge/WasmEdge.git <br>cd WasmEdge<br>git checkout 0.14.0<br><br>export LLVM_DIR=&quot;${HOMEBREW_PREFIX}/opt/llvm@16/lib/cmake&quot;<br>export CC=&quot;${HOMEBREW_PREFIX}/opt/llvm@16/bin/clang&quot;<br>export CXX=&quot;${HOMEBREW_PREFIX}/opt/llvm@16/bin/clang++&quot;<br>cmake -S. -B ./build -GNinja \<br>  -DCMAKE_BUILD_TYPE=Release \<br>  -DWASMEDGE_PLUGIN_WASI_NN_BACKEND=GGML \<br>  -DWASMEDGE_PLUGIN_WASI_NN_GGML_LLAMA_METAL=ON \<br>  -DWASMEDGE_PLUGIN_WASI_NN_GGML_LLAMA_BLAS=OFF \<br>  -DWASMEDGE_BUILD_WASI_NN_RPC=ON<br>cmake --build ./build<br>sudo cmake --install ./build</pre><p>The WASI-NN RPC server listens on a UNIX domain socket on the host. 
The socket can be forwarded to the guest with ssh -R &lt;GUESTPATH&gt;:&lt;HOSTPATH&gt;:</p><pre># Host (macOS)<br>curl -OSL https://huggingface.co/wasmedge/llama2/resolve/23de599453ce999ab1dc650bd01f6298af38eb18/llama-2-7b-chat-q5_k_m.gguf<br><br>wasi_nn_rpcserver \<br>  --nn-rpc-uri unix://$HOME/nn.sock \<br>  --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf<br><br>ssh -F $HOME/.lima/default/ssh.config -R /home/${USER}.linux/nn.sock:$HOME/nn.sock lima-default</pre><h4>Running Llama on Lima, with the acceleration</h4><p>WasmEdge running inside the Lima instance can now connect to the WASI-NN RPC server socket with the --nn-rpc-uri flag:</p><pre># Guest (Linux)<br>wasmedge \<br>  --nn-rpc-uri unix://$HOME/nn.sock \<br>  --env enable_log=true \<br>  wasmedge-ggml-nnrpc.wasm default</pre><pre><strong># Before</strong><br>eval time =   13535.83 ms /     9 runs   ( 1503.98 ms per token,     0.66 tokens per second)<br><br><strong># After</strong><br>eval time =     611.14 ms /     9 runs   (   67.90 ms per token,    14.73 tokens per second)</pre><p><strong>On Apple M2 Pro, the performance improves from 0.66 tokens per second to 14.73 tokens per second (22.3 times faster).</strong></p><h3>Future: wRPC</h3><p>In the future, WASI-NN RPC may be replaced by <a href="https://github.com/bytecodealliance/wrpc">wRPC</a>. wRPC is a fairly new Bytecode Alliance project that aims to define the standard for the distributed communication model of WASM components. wRPC could potentially be useful for exposing other host resources, such as biometric authenticators, to Lima as well.</p><h3>NTT is hiring!</h3><p>We at <a href="https://www.rd.ntt/e/">NTT</a> are looking for engineers who work in Open Source communities in the fields of containers, WASM, LLM, etc. 
Visit &lt;<a href="https://www.rd.ntt/e/sic/recruit/">https://www.rd.ntt/e/sic/recruit/</a>&gt; to see how to join us.</p><p>We at <a href="https://www.rd.ntt/">NTT</a> are looking for colleagues to work with us in open source communities in areas such as containers, WASM, and LLM. Please see our recruitment page: &lt;<a href="https://www.rd.ntt/sic/recruit/">https://www.rd.ntt/sic/recruit/</a>&gt;</p><hr><p><a href="https://medium.com/nttlabs/accelerating-llama-on-lima-with-wasi-nn-rpc-06b84bcbbe5c">Accelerating Llama on Lima, with WASI-NN RPC</a> was originally published in <a href="https://medium.com/nttlabs">nttlabs</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[[DockerCon 2023] Reproducible builds with BuildKit for software supply chain security]]></title>
            <link>https://medium.com/nttlabs/dockercon-2023-reproducible-builds-with-buildkit-for-software-supply-chain-security-0e5aedd1aaa7?source=rss-814b1fd299ce------2</link>
            <guid isPermaLink="false">https://medium.com/p/0e5aedd1aaa7</guid>
            <category><![CDATA[buildkit]]></category>
            <category><![CDATA[dockercon]]></category>
            <category><![CDATA[reproducible-builds]]></category>
            <dc:creator><![CDATA[Akihiro Suda]]></dc:creator>
            <pubDate>Mon, 23 Oct 2023 14:13:41 GMT</pubDate>
            <atom:updated>2023-10-23T14:13:41.946Z</atom:updated>
            <content:encoded><![CDATA[<p>This is a recap of my talk “<a href="https://github.com/AkihiroSuda/AkihiroSuda/blob/master/slides/2023/20231005%20%5BDockerCon%5D%20Reproducible%20builds%20with%20BuildKit%20for%20software%20supply%20chain%20security.pdf">Reproducible builds with BuildKit for software supply chain security</a>” at <a href="https://dockercon.com/">DockerCon</a> (October 5th, 2023).</p><figure><a href="https://github.com/AkihiroSuda/AkihiroSuda/blob/master/slides/2023/20231005%20%5BDockerCon%5D%20Reproducible%20builds%20with%20BuildKit%20for%20software%20supply%20chain%20security.pdf"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Kebnyc89D5qMZGccE2M3Gg.png" /></a><figcaption><a href="https://github.com/AkihiroSuda/AkihiroSuda/blob/master/slides/2023/20231005%20%5BDockerCon%5D%20Reproducible%20builds%20with%20BuildKit%20for%20software%20supply%20chain%20security.pdf">Slide 1</a></figcaption></figure><p>This talk was similar to <a href="https://medium.com/nttlabs/bit-for-bit-reproducible-builds-with-dockerfile-7cc2b9faed9f">my previous talk at FOSDEM in February</a>, but the toolchain has been simplified since then.</p><h3>Background</h3><p>Security assessment of third-party Docker images has been a long-standing challenge, due to the lack of verifiability in the software supply chain.</p><p>Images maintained by a reputable organization or individual are often considered trustworthy. However, it is hard to rule out the possibility that the maintainers have silently injected malicious code that is not present in the source repo. Also, even if they have no malicious intent, their images can still be compromised through an accidental leak of registry credentials.</p><p>Reproducible builds reduce this concern. Reproducible builds are a technique for ensuring that a bit-for-bit identical image can be reproduced from its source code, by anybody, at any time. 
When multiple actors can attest to an image’s reproducibility, it signifies that the image contains no code of secret origin.</p><figure><a href="https://github.com/AkihiroSuda/AkihiroSuda/blob/master/slides/2023/20231005%20%5BDockerCon%5D%20Reproducible%20builds%20with%20BuildKit%20for%20software%20supply%20chain%20security.pdf"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*tLueTNLoMskGDK3_yZTHCA.png" /></a><figcaption><a href="https://github.com/AkihiroSuda/AkihiroSuda/blob/master/slides/2023/20231005%20%5BDockerCon%5D%20Reproducible%20builds%20with%20BuildKit%20for%20software%20supply%20chain%20security.pdf">Slide 3</a></figcaption></figure><h3>Are Docker Hub images actually reproducible?</h3><p>Most of them are not. You can run docker build <a href="https://github.com/docker-library/">https://github.com/docker-library/</a>... to rebuild an image on Docker Hub yourself, and use my <a href="https://github.com/reproducible-containers/diffoci">diffoci</a> (<em>diff for </em><a href="https://opencontainers.org/"><em>Open Container Initiative</em></a><em> images</em>) tool &lt;<a href="https://github.com/reproducible-containers/diffoci">https://github.com/reproducible-containers/diffoci</a>&gt; to see why they are not reproducible:</p><pre>docker pull golang:1.21.1-alpine@sha256:96634e55b363cb93d39f78fb18aa64abc7f96d372c176660d7b8b6118939d97b<br><br># DOCKER_BUILDKIT=0 with Docker 20.10.23 corresponds to the current Docker Hub image (Will change in the future)<br>export DOCKER_BUILDKIT=0<br>docker build -t my-golang &quot;https://github.com/docker-library/golang.git#585c8c1e705a7a458455f0629922a4f90628ce08:1.21/alpine3.18&quot;<br><br>go install github.com/reproducible-containers/diffoci/cmd/diffoci@latest<br><br>diffoci diff docker://golang:1.21.1-alpine docker://my-golang</pre><p>The diffoci result for golang:1.21.1-alpine contains more than 14,000 lines of diffs, but most of them are just differences in timestamps:</p><pre><strong>$ diffoci 
diff docker://golang:1.21.1-alpine docker://my-golang</strong><br>TYPE     NAME                                                   INPUT-0                         INPUT-1<br>Desc     application/vnd.docker.distribution.manifest.v2+json   b25862...                       3c4eca0...<br>...<br>File     etc/ssl/certs/3e45d192.0                               2023-08-09 03:36:47 +0000 UTC   2023-09-21 08:35:31 +0000 UTC<br>...<br>(More than 14,000 lines)<br>...<br>File     go/                                                    2023-09-06 18:31:40 +0000 UTC   2023-09-21 08:35:45 +0000 UTC</pre><p>The --semantic flag can be used to ignore such “boring” differences:</p><pre><strong>$ diffoci --semantic diff docker://golang:1.21.1-alpine docker://my-golang</strong><br>TYPE     NAME                      INPUT-0                                                                        INPUT-1<br>Layer    ctx:/layers-1/layer       length mismatch (457 vs 454)                                                   <br>Layer    ctx:/layers-1/layer       name &quot;usr/local/share/ca-certificates/.wh..wh..opq&quot; only appears in input 0    <br>Layer    ctx:/layers-1/layer       name &quot;etc/ca-certificates/.wh..wh..opq&quot; only appears in input 0                <br>Layer    ctx:/layers-1/layer       name &quot;usr/share/ca-certificates/.wh..wh..opq&quot; only appears in input 0          <br>File     lib/apk/db/scripts.tar    eef110e...                                                                     e9bfe18...<br>Layer    ctx:/layers-2/layer       length mismatch (13939 vs 13938)                                               <br>Layer    ctx:/layers-2/layer       name &quot;usr/local/go/.wh..wh..opq&quot; only appears in input 0                       <br>File     lib/apk/db/scripts.tar    60e22bb...                                                                     
67f2648...<br>Layer    ctx:/layers-3/layer       length mismatch (4 vs 3)                                                       <br>Layer    ctx:/layers-3/layer       name &quot;go/.wh..wh..opq&quot; only appears in input 0 </pre><p>The remaining differences are:</p><ul><li>.wh..wh..opq (AUFS whiteouts) are missing in the local build due to the filesystem difference</li><li>lib/apk/db/scripts.tar differs due to the timestamp information inside itself (the --semantic flag isn’t yet clever enough to ignore timestamps inside nested tar archives)</li></ul><h3>How to make images reproducible</h3><h4>Timestamps</h4><p>Timestamps are one of the obvious challenges in achieving reproducibility. Docker/OCI (Open Container Initiative) images have timestamps in:</p><ol><li>the created property in the <a href="https://github.com/opencontainers/image-spec/blob/v1.0.2/config.md">OCI Image Config</a> (shown in docker image ls)</li><li>the history property in the <a href="https://github.com/opencontainers/image-spec/blob/v1.0.2/config.md">OCI Image Config</a> (shown in docker image history)</li><li>the org.opencontainers.image.created annotation in the <a href="https://github.com/opencontainers/image-spec/blob/v1.0.2/image-index.md">OCI Image Index</a></li><li>the timestamps of the files in the <a href="https://github.com/opencontainers/image-spec/blob/v1.0.2/layer.md">image layers</a></li></ol><p>BuildKit v0.11 added support for rewriting the timestamps for 1, 2, and 3 to reduce non-reproducibility.<br>This feature was extended in <a href="https://github.com/moby/buildkit/blob/v0.13.0-beta1/docs/build-repro.md#source_date_epoch">BuildKit v0.13 (beta)</a> to cover 4 as well.</p><pre># Configure buildx to use BuildKit v0.13 beta1<br>docker buildx create --use --driver-opt image=moby/buildkit:v0.13.0-beta1<br><br># Rewrite the timestamps in the image to the timestamp of the latest git commit<br>docker buildx build --build-arg SOURCE_DATE_EPOCH=$(git log -1 --pretty=%ct) \<br>  
--output type=image,name=example.com/image,push=true,rewrite-timestamp=true .</pre><p><a href="https://reproducible-builds.org/docs/source-date-epoch/">SOURCE_DATE_EPOCH</a> (uint64; seconds since 1970-01-01 00:00:00 UTC) here is an environment variable standardized by &lt;<a href="https://reproducible-builds.org/">https://reproducible-builds.org/</a>&gt;. This environment variable is also recognized by gcc, clang, cmake, etc., to make application binaries reproducible too. See &lt;<a href="https://reproducible-builds.org/docs/source-date-epoch/">https://reproducible-builds.org/docs/source-date-epoch/</a>&gt; for the details.</p><h4>Pinning packages</h4><p>The base image for a Dockerfile can be pinned with tags like FROM debian:bookworm-20230904-slim. However, this is not enough for reproducing apt-get results, as apt-get installs the packages from the latest repos, not from the snapshot on 2023-09-04.</p><p>To install packages from a past snapshot, you have to configure the package manager to use a past snapshot explicitly. 
For Debian, /etc/apt/sources.list can be configured to use snapshot.debian.org/archive/debian/20230904T000000Z as follows:</p><pre><strong>FROM</strong> debian:bookworm-20230904-slim<br><strong>ENV</strong> DEBIAN_FRONTEND=noninteractive<br><strong>RUN</strong> rm -rf /etc/apt/sources.list* &amp;&amp; \<br>  echo &#39;deb [check-valid-until=no] http://snapshot.debian.org/archive/debian/20230904T000000Z bookworm main&#39; \<br>  &gt;/etc/apt/sources.list &amp;&amp; \<br>  echo &#39;deb [check-valid-until=no] http://snapshot.debian.org/archive/debian-security/20230904T000000Z bookworm-security main&#39; \<br>  &gt;&gt;/etc/apt/sources.list &amp;&amp; \<br>  echo &#39;deb [check-valid-until=no] http://snapshot.debian.org/archive/debian/20230904T000000Z bookworm-updates main&#39; \<br>  &gt;&gt;/etc/apt/sources.list &amp;&amp; \<br>  apt-get update &amp;&amp; \<br>  apt-get install -y gcc</pre><p>I wrote a script &lt;<a href="https://github.com/reproducible-containers/repro-sources-list.sh">https://github.com/reproducible-containers/repro-sources-list.sh</a>&gt; to simplify setting up /etc/apt/sources.list and enabling the cache for /var/cache/apt:</p><pre><strong>FROM</strong> debian:bookworm-20230904-slim<br><strong>ADD</strong> --chmod=0755 \<br>  https://raw.githubusercontent.com/reproducible-containers/repro-sources-list.sh/v0.1.0/repro-sources-list.sh \<br>  /usr/local/bin/repro-sources-list.sh<br><strong>ENV</strong> DEBIAN_FRONTEND=noninteractive<br><strong>RUN</strong> --mount=type=cache,target=/var/cache/apt \<br>  repro-sources-list.sh &amp;&amp; \<br>  apt-get update &amp;&amp; \<br>  apt-get install -y gcc</pre><p>Caching /var/cache/apt is optional, but highly recommended, as the snapshot server isn’t as fast as regular apt-get servers. 
The cache for /var/cache/apt can be saved on GitHub Actions using &lt;<a href="https://github.com/reproducible-containers/buildkit-cache-dance">https://github.com/reproducible-containers/buildkit-cache-dance</a>&gt;:</p><pre><strong>steps</strong>:<br>  - <strong>uses</strong>: actions/cache@v3<br>    <strong>with</strong>:<br>      <strong>path</strong>: var-cache-apt<br>      <strong>key</strong>: var-cache-apt-${{ hashFiles(&#39;Dockerfile&#39;) }}<br>  - <strong>uses</strong>: <a href="https://github.com/reproducible-containers/buildkit-cache-dance">reproducible-containers/buildkit-cache-dance</a>@v2.1.2<br>    <strong>with</strong>:<br>      <strong>cache-source</strong>: var-cache-apt<br>      <strong>cache-target</strong>: /var/cache/apt</pre><p>The techniques above work for <a href="https://github.com/reproducible-containers/repro-sources-list.sh/blob/v0.1.0/Dockerfile.ubuntu-2204">Ubuntu</a> (snapshot.ubuntu.com) and <a href="https://github.com/reproducible-containers/repro-sources-list.sh/blob/v0.1.0/Dockerfile.archlinux">ArchLinux</a> (archive.archlinux.org) too.</p><p>However, this is still challenging for Alpine Linux, Rocky Linux, AlmaLinux, etc., as they do not have snapshot servers. A workaround for these distros is to preserve /etc/apk/cache, /var/cache/dnf, etc. by yourself: &lt;<a href="https://github.com/reproducible-containers/repro-pkg-cache">https://github.com/reproducible-containers/repro-pkg-cache</a>&gt;.<br>In the long term, BuildKit frontends may have a built-in feature to help with this: &lt;<a href="https://github.com/moby/buildkit/issues/4259">https://github.com/moby/buildkit/issues/4259</a>&gt;.</p><h3>Future work</h3><p>After the general availability of BuildKit v0.13, I’ll submit PRs to make well-known images reproducible.</p><p>We also need a “single-click” platform for attesting reproducibility and sharing the result. 
This will probably need help from registry service providers.</p><h3>NTT is hiring!</h3><p>We at <a href="https://www.rd.ntt/e/">NTT</a> are looking for engineers who work in Open Source communities like Docker/Moby, BuildKit, and their relevant projects. Visit &lt;<a href="https://www.rd.ntt/e/sic/recruit/">https://www.rd.ntt/e/sic/recruit/</a>&gt; to see how to join us.</p><p>We at <a href="https://www.rd.ntt/">NTT</a> are looking for colleagues to work with us in open source communities such as Docker/Moby and BuildKit. Please see our recruitment page: &lt;<a href="https://www.rd.ntt/sic/recruit/">https://www.rd.ntt/sic/recruit/</a>&gt;</p><h3>Links</h3><p><strong>Tools and examples</strong>: &lt;<a href="https://github.com/reproducible-containers">https://github.com/reproducible-containers</a>&gt;</p><ul><li><a href="https://github.com/reproducible-containers/diffoci"><strong>diffoci</strong></a>: diff for OCI images, to analyze non-reproducible builds</li><li><a href="https://github.com/reproducible-containers/repro-sources-list.sh"><strong>repro-sources-list.sh</strong></a>: reproducibility helper for Debian, Ubuntu, etc.</li><li><a href="https://github.com/reproducible-containers/repro-pkg-cache"><strong>repro-pkg-cache</strong></a>: reproducibility helper for Alpine, Alma, Rocky, etc.</li><li><a href="https://github.com/reproducible-containers/buildkit-cache-dance"><strong>buildkit-cache-dance</strong></a>: apt-get cache for GitHub Actions</li></ul><p><strong>BuildKit docs</strong>: &lt;<a href="https://github.com/moby/buildkit/blob/master/docs/build-repro.md">https://github.com/moby/buildkit/blob/master/docs/build-repro.md</a>&gt;</p><hr><p><a href="https://medium.com/nttlabs/dockercon-2023-reproducible-builds-with-buildkit-for-software-supply-chain-security-0e5aedd1aaa7">[DockerCon 2023] Reproducible builds with BuildKit for software supply chain security</a> was originally published 
in <a href="https://medium.com/nttlabs">nttlabs</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[The internals and the latest trends of container runtimes (2023)]]></title>
            <link>https://medium.com/nttlabs/the-internals-and-the-latest-trends-of-container-runtimes-2023-22aa111d7a93?source=rss-814b1fd299ce------2</link>
            <guid isPermaLink="false">https://medium.com/p/22aa111d7a93</guid>
            <category><![CDATA[containers]]></category>
            <dc:creator><![CDATA[Akihiro Suda]]></dc:creator>
            <pubDate>Wed, 21 Jun 2023 20:38:01 GMT</pubDate>
            <atom:updated>2023-06-21T20:46:40.541Z</atom:updated>
            <content:encoded><![CDATA[<p>Last week I had an opportunity to give an <a href="http://www.cce.i.kyoto-u.ac.jp/danwa23.html">online lecture</a> about containers to students at Kyoto University.</p><p>The slide deck can be found <a href="https://github.com/AkihiroSuda/AkihiroSuda/raw/5d9f0b1cd9b8c37cb1951768a3bebdb08a3a469e/slides/2023/20230615%20%5BKyoto%20University%5D%20The%20internals%20and%20the%20latest%20trends%20of%20container%20runtimes.pdf">here</a> (PDF):</p><figure><a href="https://github.com/AkihiroSuda/AkihiroSuda/raw/5d9f0b1cd9b8c37cb1951768a3bebdb08a3a469e/slides/2023/20230615%20%5BKyoto%20University%5D%20The%20internals%20and%20the%20latest%20trends%20of%20container%20runtimes.pdf"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*lehrs0x7-Oqhc5lkpOoNHw.png" /></a></figure><p><strong>Contents</strong>:</p><ol><li>Introduction to containers</li><li>Internals of container runtimes</li><li>Latest trends in container runtimes</li></ol><h3>1. Introduction to containers</h3><h4>What are containers?</h4><p>Containers are a set of various lightweight methods to isolate filesystems, CPU resources, memory resources, system permissions, etc. Containers are similar to virtual machines in many senses, but they are more efficient and often less secure than virtual machines. (<strong>Slide 5</strong>)</p><figure><a href="https://github.com/AkihiroSuda/AkihiroSuda/raw/5d9f0b1cd9b8c37cb1951768a3bebdb08a3a469e/slides/2023/20230615%20%5BKyoto%20University%5D%20The%20internals%20and%20the%20latest%20trends%20of%20container%20runtimes.pdf"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*OVsYlSmH_L15vparu4VrWg.png" /></a></figure><p>An interesting thing is that there is still no strict definition of “<em>containers</em>”. 
Even virtual machines can be called &quot;<em>containers</em>&quot; when they provide container-like interfaces, e.g., when they implement the <a href="https://specs.opencontainers.org/">OCI (Open Container Initiative) specs</a>. Such &quot;<em>non-container</em>&quot; containers are discussed later in <strong>Section 3</strong>.</p><h4>Docker</h4><p><a href="https://www.docker.com/">Docker</a> is the most popular container engine. Docker natively supports Linux containers and Windows containers, but Windows containers are out of the scope of this talk.</p><p>A typical command line to start a Docker container is as follows:</p><pre>docker run -p 8080:80 -v .:/usr/share/nginx/html nginx:1.25</pre><p>After executing this command, the content of `<strong>index.html</strong>` in the current directory will be visible at <strong>http://&lt;the host’s IP&gt;:8080/</strong>.</p><p>The `<strong>-p 8080:80</strong>` part in the command line forwards TCP port 8080 of the host to port 80 of the container.</p><p>The `<strong>-v .:/usr/share/nginx/html</strong>` part mounts the current directory on the host onto `<strong>/usr/share/nginx/html</strong>` in the container.</p><p>The `<strong>nginx:1.25</strong>` part specifies the <a href="https://hub.docker.com/_/nginx">official nginx image</a> on <a href="https://hub.docker.com/">Docker Hub</a>. Docker images are somewhat similar to virtual machine images. However, they usually do not contain additional daemons such as systemd and sshd.</p><p>You can find the official images for other applications on <a href="https://hub.docker.com/search">Docker Hub</a> too. 
You can also build your own images, using a language called Dockerfile:</p><pre>FROM debian:12<br>RUN  apt-get update &amp;&amp; apt-get install -y openjdk-17-jre<br>COPY myapp.jar /myapp.jar<br>CMD  [&quot;java&quot;, &quot;-jar&quot;, &quot;/myapp.jar&quot;]</pre><p>An image can be built with the `<a href="https://docs.docker.com/engine/reference/commandline/build/"><strong>docker build</strong></a>` command, and can be pushed to Docker Hub or other registry services with the `<a href="https://docs.docker.com/engine/reference/commandline/push/"><strong>docker push</strong></a>` command.</p><h4>Kubernetes</h4><p><a href="https://kubernetes.io/">Kubernetes</a> clusterizes multiple container hosts such as (but not limited to) Docker hosts to provide load balancing and fault-tolerance (<strong>Slide 10</strong>).</p><figure><a href="https://github.com/AkihiroSuda/AkihiroSuda/raw/5d9f0b1cd9b8c37cb1951768a3bebdb08a3a469e/slides/2023/20230615%20%5BKyoto%20University%5D%20The%20internals%20and%20the%20latest%20trends%20of%20container%20runtimes.pdf"><img alt="" src="https://cdn-images-1.medium.com/max/720/1*An2qZhoR6_OGaz7Z4mG8Zg.png" /></a></figure><p>It is noteworthy that Kubernetes is also an abstraction framework for interacting with objects such as <a href="https://kubernetes.io/docs/concepts/workloads/pods/">Pods</a> (groups of containers that are always co-scheduled on the same host), <a href="https://kubernetes.io/docs/concepts/services-networking/service/">Services</a> (entities for network connectivity), and <a href="https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/">any other kind of object</a>, but this is beyond the scope of this talk.</p><h4>Docker vs pre-Docker containers</h4><p>While containers didn&#39;t get much attention until the release of Docker in 2013, Docker wasn’t the first container platform:</p><ul><li><strong>1999</strong>: <a 
href="https://svnweb.freebsd.org/base?view=revision&amp;revision=46155">FreeBSD Jail</a></li><li><strong>2000</strong>: <a href="https://lkml.iu.edu/hypermail/linux/kernel/0008.2/0042.html">Virtual Environment system for Linux</a> (precursor to Virtuozzo and OpenVZ)</li><li><strong>2001</strong>: <a href="https://www.cs.helsinki.fi/linux/linux-kernel/2001-40/1065.html">Linux Vserver</a></li><li><strong>2002</strong>: <a href="https://wiki.openvz.org/History">Virtuozzo</a></li><li><strong>2004</strong>: <a href="https://lkml.iu.edu/hypermail/linux/kernel/0409.1/0994.html">BSD Jail for Linux</a></li><li><strong>2004</strong>: <a href="https://web.archive.org/web/20041116174148/http://www.sun.com/smi/Press/sunflash/2004-11/sunflash.20041115.2.html">Solaris Containers</a> (Apparently, the term &quot;container&quot; was coined around this time)</li><li><strong>2005</strong>: <a href="https://wiki.openvz.org/History">OpenVZ</a></li><li><strong>2008</strong>: <a href="https://github.com/lxc/lxc/tree/5e97c3fcce787a5bc0f8ceef43aa3e05195b480a">LXC</a></li><li><strong>2013</strong>: <a href="https://www.youtube.com/watch?v=9xciauwbsuo">Docker</a></li></ul><p>It is widely considered that FreeBSD Jail (circa 1999) is the first practical container implementation for Unix-like operating systems, although the term &quot;container&quot; hadn&#39;t been coined at that time.</p><p>Since then, several implementations have appeared for Linux too. However, pre-Docker containers were fundamentally different from Docker containers; they focused on mimicking an entire machine with System V init, sshd, syslogd, etc., inside it. It was also common to put a Web server, an application server, a database server, and everything else into a single container.</p><p>Docker changed the paradigm. In the case of Docker, a container usually contains only a single service (<strong>Slide 14</strong>) so that containers can be stateless and immutable. 
This design significantly reduces maintenance costs, as containers are now disposable: when something needs to be updated, you can just remove the container and recreate it from the latest image. You no longer need to install sshd and other utilities inside the container either, as you never need shell access to it. This simplifies load-balancing and fault-tolerance for multi-host clusters too.</p><figure><a href="https://github.com/AkihiroSuda/AkihiroSuda/raw/5d9f0b1cd9b8c37cb1951768a3bebdb08a3a469e/slides/2023/20230615%20%5BKyoto%20University%5D%20The%20internals%20and%20the%20latest%20trends%20of%20container%20runtimes.pdf"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Xzpc72NV3fxpZfDrUfCHCA.png" /></a></figure><h3>2. Internals of container runtimes</h3><p>This section assumes using Docker v24 with its default configuration, but most parts are applicable to non-Docker containers too.</p><h4>Docker under the hood</h4><p>Docker consists of the client program (the `<strong>docker</strong>` CLI) and the daemon program (`<strong>dockerd</strong>`). The `<strong>docker</strong>` CLI connects to the `<strong>dockerd</strong>` daemon via a Unix socket (`<strong>/var/run/docker.sock</strong>`) to create containers.</p><p>However, the `<strong>dockerd</strong>` daemon doesn&#39;t create containers by itself. It delegates control to the `<a href="https://containerd.io/"><strong>containerd</strong></a>` (/<em>container-dee</em>/) daemon to create containers (<strong>Slide 17</strong>). But containerd doesn&#39;t create containers either; it further delegates control to the `<a href="https://github.com/opencontainers/runc"><strong>runc</strong></a>` (/<em>run-see</em>/) runtime, which composes multiple Linux kernel features such as Namespaces, Cgroups, and Capabilities to implement the concept of &quot;<em>containers</em>&quot;. 
There is no &quot;<em>container</em>&quot; object in the Linux kernel.</p><figure><a href="https://github.com/AkihiroSuda/AkihiroSuda/raw/5d9f0b1cd9b8c37cb1951768a3bebdb08a3a469e/slides/2023/20230615%20%5BKyoto%20University%5D%20The%20internals%20and%20the%20latest%20trends%20of%20container%20runtimes.pdf"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*RWzcHdOheUfu_cdEwCRRmQ.png" /></a></figure><h4>Namespaces</h4><p><a href="https://man7.org/linux/man-pages/man7/namespaces.7.html">Namespaces</a> isolate resources from the host and from other containers.</p><p>The most well-known namespaces are <a href="https://man7.org/linux/man-pages/man7/mount_namespaces.7.html">mount namespaces</a> (<strong>Slide 19</strong>). Mount namespaces isolate the filesystem view so that a container can change the rootfs to `<strong>/var/lib/docker/.../&lt;container&#39;s rootfs&gt;</strong>` using the `<a href="https://man7.org/linux/man-pages/man2/pivot_root.2.html"><strong>pivot_root(2)</strong></a>` syscall. This syscall is similar to the traditional `<a href="https://man7.org/linux/man-pages/man2/chroot.2.html"><strong>chroot(2)</strong></a>` but <a href="https://tbhaxor.com/pivot-root-vs-chroot-for-containers/">more secure</a>.</p><p>The container&#39;s rootfs has a structure very similar to the host&#39;s, but it has several restrictions on `<strong>/proc</strong>`, `<strong>/sys</strong>`, and `<strong>/dev</strong>`. 
e.g.,</p><ul><li>The `<strong>/proc/sys</strong>` directory is remounted as a read-only bind mount to prohibit sysctl.</li><li>The `<strong>/proc/kcore</strong>` file (RAM) is masked by mounting `<strong>/dev/null</strong>` over it.</li><li>The `<strong>/sys/firmware</strong>` directory (firmware data) is masked by mounting an empty read-only tmpfs over it.</li><li>Accesses to the `<strong>/dev</strong>` directories are restricted by Cgroups (discussed later).</li></ul><figure><a href="https://github.com/AkihiroSuda/AkihiroSuda/raw/5d9f0b1cd9b8c37cb1951768a3bebdb08a3a469e/slides/2023/20230615%20%5BKyoto%20University%5D%20The%20internals%20and%20the%20latest%20trends%20of%20container%20runtimes.pdf"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*T-hPJqFAR6UIZ-yDHETOMQ.png" /></a></figure><p><a href="https://man7.org/linux/man-pages/man7/network_namespaces.7.html">Network namespaces</a> (<strong>Slide 21</strong>) allow assigning dedicated IP addresses to containers so that they can talk to each other by IP.</p><figure><a href="https://github.com/AkihiroSuda/AkihiroSuda/raw/5d9f0b1cd9b8c37cb1951768a3bebdb08a3a469e/slides/2023/20230615%20%5BKyoto%20University%5D%20The%20internals%20and%20the%20latest%20trends%20of%20container%20runtimes.pdf"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*fDuES0pJVmlZ-JLDNM1gSw.png" /></a></figure><p><a href="https://man7.org/linux/man-pages/man7/pid_namespaces.7.html">PID namespaces</a> (<strong>Slide 23</strong>) isolate process trees so that a container can&#39;t control processes outside it.</p><figure><a href="https://github.com/AkihiroSuda/AkihiroSuda/raw/5d9f0b1cd9b8c37cb1951768a3bebdb08a3a469e/slides/2023/20230615%20%5BKyoto%20University%5D%20The%20internals%20and%20the%20latest%20trends%20of%20container%20runtimes.pdf"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ZZdXqUyVmpRb1ZK9yk8OBQ.png" /></a></figure><p><a href="https://man7.org/linux/man-pages/man7/user_namespaces.7.html">User 
namespaces</a> (<strong>Slide 24; </strong>not to be confused with &quot;<a href="https://en.wikipedia.org/wiki/User_space_and_kernel_space">user spaces</a>&quot;) isolate the root privilege by mapping a non-root user on the host to the pseudo &quot;root&quot; in a container. The pseudo root can behave like the root in the container to run `<strong>apt-get</strong>`, `<strong>dnf</strong>`, etc., but it doesn&#39;t have privileged access to resources outside the container.</p><p>User namespaces significantly mitigate potential container breakout attacks, but they are still <a href="https://docs.docker.com/engine/security/userns-remap/">not used by default in Docker</a>.</p><figure><a href="https://github.com/AkihiroSuda/AkihiroSuda/raw/5d9f0b1cd9b8c37cb1951768a3bebdb08a3a469e/slides/2023/20230615%20%5BKyoto%20University%5D%20The%20internals%20and%20the%20latest%20trends%20of%20container%20runtimes.pdf"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*yUzEpHCWi-vw1suk5ncFaw.png" /></a></figure><p>Other namespaces:</p><ul><li><a href="https://man7.org/linux/man-pages/man7/ipc_namespaces.7.html"><strong>IPC namespaces</strong></a>: Isolate System V inter-process communication objects, etc.</li><li><a href="https://man7.org/linux/man-pages/man7/uts_namespaces.7.html"><strong>UTS namespaces</strong></a>: Isolate the hostname. &quot;UTS&quot; (Unix Time Sharing system) seems to be a misnomer for this namespace.</li><li><a href="https://man7.org/linux/man-pages/man7/cgroup_namespaces.7.html"><strong>(Optional) Cgroup namespaces</strong></a>: Isolate the `<strong>/sys/fs/cgroup</strong>` hierarchy.</li><li><a href="https://man7.org/linux/man-pages/man7/time_namespaces.7.html"><strong>(Optional) Time namespaces</strong></a>: Isolate clocks. 
<a href="https://github.com/opencontainers/runtime-spec/pull/1151">Not used by most containers yet.</a></li></ul><h4>Cgroups</h4><p><a href="https://man7.org/linux/man-pages/man7/cgroups.7.html">Cgroups</a> (control groups) impose resource quotas on CPU usage, memory usage, block I/O, and the number of processes in a container.</p><p>Cgroups also control access to device nodes. <a href="https://github.com/opencontainers/runtime-spec/blob/v1.0.2/config-linux.md#default-devices">The default configuration of Docker</a> allows unrestricted access to `<strong>/dev/null</strong>`, `<strong>/dev/zero</strong>`, `<strong>/dev/urandom</strong>`, etc., and disallows access to `<strong>/dev/sda</strong>` (disk devices), `<strong>/dev/mem</strong>` (memory), etc.</p><h4>Capabilities</h4><p>On Linux, the root privilege is represented by a 64-bit <a href="https://man7.org/linux/man-pages/man7/capabilities.7.html">capability</a> flag set. 41 of these bits are <a href="https://github.com/torvalds/linux/blob/v6.3/include/uapi/linux/capability.h#L420">in use</a> today.</p><p>The default configuration of Docker drops system-wide administration capabilities such as `<strong>CAP_SYS_ADMIN</strong>`.</p><p><a href="https://github.com/moby/moby/blob/v24.0.2/oci/caps/defaults.go">The retained capabilities</a> include:</p><ul><li>`<strong>CAP_CHOWN</strong>`: for running `<strong>chown</strong>` inside containers.</li><li>`<strong>CAP_NET_BIND_SERVICE</strong>`: for binding TCP and UDP ports below 1024 inside containers.</li><li>`<strong>CAP_NET_RAW</strong>`: for running <a href="https://github.com/moby/moby/issues/41886#issuecomment-1590736893">legacy `<strong>ping</strong>` implementations</a> that need to craft raw packets. This capability is quite dangerous, as it allows <a href="https://blog.aquasec.com/dns-spoofing-kubernetes-clusters">ARP spoofing and DNS spoofing</a> in the container&#39;s network. 
A future version of Docker may <a href="https://github.com/moby/moby/issues/41886">disable it by default</a>.</li></ul><h4>(Optional) Seccomp</h4><p><a href="https://man7.org/linux/man-pages/man2/seccomp.2.html">Seccomp</a> (Secure computing) allows specifying an explicit allowlist (or a denylist) of syscalls. The default configuration of Docker allows about <a href="https://github.com/moby/moby/blob/v24.0.2/profiles/seccomp/default.json">350 syscalls</a>.</p><p>Seccomp is used for <a href="https://en.wikipedia.org/wiki/Defense_in_depth_(computing)"><em>defense in depth</em></a>; it is not a hard requirement for containers. For the sake of backward compatibility, Kubernetes still does not use seccomp by default, and <a href="https://github.com/kubernetes/enhancements/issues/2413#issuecomment-1581231097">it is unlikely to change the default configuration in the foreseeable future</a>. Users can still opt in to seccomp via `<a href="https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/#kubelet-config-k8s-io-v1beta1-KubeletConfiguration"><strong>KubeletConfiguration</strong></a>`.</p><h4>(Optional) AppArmor <em>XOR</em> SELinux</h4><p><a href="https://apparmor.net/">AppArmor</a> and <a href="https://github.com/SELinuxProject">SELinux</a> (Security Enhanced Linux) are <a href="https://www.kernel.org/doc/html/v6.3/admin-guide/LSM/index.html">LSMs</a> (Linux Security Modules) that provide further fine-grained configuration knobs.</p><p>These are mutually exclusive; one is chosen by host OS distributors (not by container image distributors):</p><ul><li><strong>AppArmor</strong>: chosen by Debian, Ubuntu, SUSE, etc.</li><li><strong>SELinux</strong>: chosen by Fedora, Red Hat Enterprise Linux, and similar host OS distributions.</li></ul><p>Docker&#39;s <a href="https://github.com/moby/moby/blob/v24.0.2/profiles/apparmor/template.go">default</a> AppArmor profile largely just overlaps with its default configuration for capabilities, mount masks, 
etc., for the sake of defense-in-depth. Users may add custom settings for further security.</p><p>But the story is different for SELinux. To run containers in the `<a href="https://docs.docker.com/engine/reference/commandline/dockerd/"><strong>selinux-enabled</strong></a>` mode, you have to append the `<strong>:z</strong>` (lowercase) or `<strong>:Z</strong>` (uppercase) option to a bind mount, or run complex `<strong>chcon</strong>` commands yourself to avoid permission errors.</p><p>The `<strong>:z</strong>` (lowercase) option is used for Type Enforcement (<strong>Slide 32</strong>). Type Enforcement protects host files from containers by assigning &quot;types&quot; to processes and files. A process running with the `<strong>container_t</strong>` type can read files with the `<strong>container_share_t</strong>` type, and read/write files with the `<strong>container_file_t</strong>` type, but it can&#39;t access files with other types.</p><figure><a href="https://github.com/AkihiroSuda/AkihiroSuda/raw/5d9f0b1cd9b8c37cb1951768a3bebdb08a3a469e/slides/2023/20230615%20%5BKyoto%20University%5D%20The%20internals%20and%20the%20latest%20trends%20of%20container%20runtimes.pdf"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*KoTwjHe3dUEYQRzfOl_Q2A.png" /></a></figure><p>The `<strong>:Z</strong>` (uppercase) option is used for Multi-category Security (<strong>Slide 33</strong>). Multi-category Security protects a container from another container by assigning category numbers to processes and files. 
e.g., A process with Category 42 can&#39;t access files labeled with Category 43.</p><figure><a href="https://github.com/AkihiroSuda/AkihiroSuda/raw/5d9f0b1cd9b8c37cb1951768a3bebdb08a3a469e/slides/2023/20230615%20%5BKyoto%20University%5D%20The%20internals%20and%20the%20latest%20trends%20of%20container%20runtimes.pdf"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*bQoe2Cca_wWLXrYBlj_t1w.png" /></a></figure><h4>What about Docker for Mac/Win?</h4><p><a href="https://www.docker.com/products/docker-desktop/">Docker Desktop</a> products support running Linux containers on Mac and Windows, but they are just running a Linux virtual machine under the hood to run containers on it. The containers are not directly running on macOS and Windows.</p><h3>3. Latest trends in container runtimes</h3><h4>Alternatives to Docker (as Kubernetes runtimes)</h4><p>The first version of Kubernetes (2014) was solely made for Docker (<strong>Slide 37</strong>). Kubernetes <a href="https://kubernetes.io/blog/2016/07/kubernetes-1-3-bridging-cloud-native-and-enterprise-workloads/">v1.3</a> (2016) added an interim support for an alternative container runtime called rkt, but rkt was retired in <a href="https://www.cncf.io/blog/2019/08/16/cncf-archives-the-rkt-project/">2019</a>. The effort for supporting alternative container runtimes yielded the Container Runtime Interface (CRI) API in Kubernetes<a href="https://github.com/kubernetes/kubernetes/blob/v1.5.0/docs/devel/container-runtime-interface.md"> v1.5</a> (2016). 
After the debut of CRI, the industry converged on two alternative runtimes: <a href="https://containerd.io/">containerd</a> (/<em>container-dee</em>/) and <a href="https://cri-o.io/">CRI-O</a> (/<em>cry-oh</em>/, /<em>cree-oh</em>/, or /<em>see-er-eye-oh</em>/).</p><figure><a href="https://github.com/AkihiroSuda/AkihiroSuda/raw/5d9f0b1cd9b8c37cb1951768a3bebdb08a3a469e/slides/2023/20230615%20%5BKyoto%20University%5D%20The%20internals%20and%20the%20latest%20trends%20of%20container%20runtimes.pdf"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*04Edb0wEnXci2c5ye9F8dA.png" /></a></figure><p>Kubernetes still had built-in support for Docker (<strong>Slide 38</strong>), but it was finally removed in Kubernetes <a href="https://kubernetes.io/blog/2022/03/31/ready-for-dockershim-removal/">v1.24</a> (2022). Docker still continues to work for Kubernetes as a third-party runtime (via the `<a href="https://github.com/Mirantis/cri-dockerd"><strong>cri-dockerd</strong></a>` shim), but Docker is now seeing less adoption in Kubernetes.</p><figure><a href="https://github.com/AkihiroSuda/AkihiroSuda/raw/5d9f0b1cd9b8c37cb1951768a3bebdb08a3a469e/slides/2023/20230615%20%5BKyoto%20University%5D%20The%20internals%20and%20the%20latest%20trends%20of%20container%20runtimes.pdf"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ttq05nTH21UT577xW-FIRg.png" /></a></figure><p>The big names in the industry have already switched away from Docker to containerd, or to CRI-O:</p><ul><li><strong>Adopters of containerd</strong>: <a href="https://docs.aws.amazon.com/eks/latest/userguide/dockershim-deprecation.html">Amazon Elastic Kubernetes Service (EKS)</a>, <a href="https://learn.microsoft.com/en-us/azure/aks/cluster-configuration#container-runtime-configuration">Azure Kubernetes Service (AKS)</a>, <a href="https://cloud.google.com/kubernetes-engine/docs/how-to/migrate-containerd">Google Kubernetes Engine (GKE)</a>, <a 
href="https://docs.k3s.io/advanced#configuring-containerd">k3s</a>, ... (many)</li><li><strong>Adopters of CRI-O</strong>: <a href="https://docs.openshift.com/container-platform/4.13/architecture/architecture.html#architecture-custom-os_architecture">Red Hat OpenShift</a>, <a href="https://docs.oracle.com/en-us/iaas/Content/ContEng/Concepts/contengaboutk8sversions.htm">Oracle Container Engine for Kubernetes (OKE)</a>, ...</li></ul><p>containerd focuses on extensibility and supports non-Kubernetes workloads as well as Kubernetes workloads. In contrast, CRI-O focuses on simplicity and solely supports Kubernetes.</p><h4>Alternatives to Docker (as CLI)</h4><p>While Kubernetes has become the standard for multi-node production clusters, users still want Docker-like CLI for building and testing containers locally on their laptops. Docker basically satisfies this demand, but runtime developers in the community wanted to build their own &quot;lab&quot; CLIs to incubate new features ahead of Docker and Kubernetes, as it was often hard to propose new features to Docker and Kubernetes, for several technical/technological reasons.</p><p><a href="https://podman.io/">Podman</a> (formerly called kpod in <a href="https://github.com/cri-o/cri-o/commit/0d0b70a475b9846798710ffd7cdd8f4a462a4404">2016</a>) is a Docker-compatible standalone container engine created by Red Hat and others. Its main difference from Docker is that it does not have the daemon process by default. Also, Podman is unique in the sense that it provides first-class support for managing Pods (groups of containers that share the same network namespace and often data volumes on the same host for efficient communication) as well as containers. 
However, most users seem to just use Podman for non-pod containers.</p><p><a href="https://github.com/containerd/nerdctl"><strong>nerd</strong>ctl</a> (/<em>nerd-see-tee-el</em>/, founded by me in 2020) is a Docker-compatible CLI for contai<strong>nerd</strong> (/<em>container-dee</em>/). nerdctl was originally made for experimenting with new features such as lazy-pulling (discussed later), but it is also useful for debugging Kubernetes nodes that are running containerd.</p><p>See also my blog article &quot;<a href="https://medium.com/nttlabs/nerdctl-v1-0-fb6bf8e1b0b"><em>Released nerdctl v1.0</em></a>&quot; (October 2022) for further information:</p><p><a href="https://medium.com/nttlabs/nerdctl-v1-0-fb6bf8e1b0b">Released nerdctl v1.0</a></p><h4>Running containers on Mac</h4><p><a href="https://www.docker.com/products/docker-desktop/">Docker Desktop</a> products for Mac and Windows are proprietary. Windows users can just run the Linux version of Docker (Apache License 2.0, no GUI) in WSL2, but there had been no equivalent for Mac users.</p><p><a href="https://lima-vm.io/">Lima</a> (/<em>lee-mah</em>/, also founded by me, in 2021) is a command-line tool to create a WSL2-like environment on macOS for running containers. 
Lima uses nerdctl by default, but it supports Docker and Podman too.</p><p>See also my blog article &quot;<a href="https://medium.com/nttlabs/lima-is-now-a-cncf-project-a7affde4f03c"><em>Lima is now a CNCF project</em></a>&quot; (October 2022).</p><p><a href="https://medium.com/nttlabs/lima-is-now-a-cncf-project-a7affde4f03c">Lima is now a CNCF project 🎉</a></p><p>Lima is also adopted by third party projects such as <a href="https://github.com/abiosoft/colima">colima</a> (2021), <a href="https://rancherdesktop.io/">Rancher Desktop</a> (2021), and <a href="https://github.com/runfinch/finch">Finch</a> (2022).</p><p>Podman community released <a href="https://docs.podman.io/en/latest/markdown/podman-machine.1.html">Podman Machine</a> (command line tool, 2021) and <a href="https://podman-desktop.io/">Podman Desktop</a> (GUI, 2022) as an alternative for Docker Desktop. Podman Desktop supports Lima too, optionally.</p><h4>Docker being refactored</h4><p>containerd mainly provides two subsystems: the runtime subsystem and the image subsystem. However, the latter one is not used by Docker. This is problematic because Docker&#39;s own legacy image subsystem is far behind containerd&#39;s modern image subsystem (and it caused me to launch the nerdctl project):</p><ul><li>No support for <a href="https://github.com/containerd/stargz-snapshotter">lazy-pulling</a> (on-demand image pulling)</li><li><a href="https://github.com/moby/moby/issues/44582">Limited support for multi-platform images</a> (e.g., AMD64/ARM64 dual-platform images)</li><li><a href="https://github.com/moby/moby/issues/25779">Limited compliance of OCI Image Spec</a></li></ul><p>This long-standing problem is finally being resolved. 
Docker v24 (2023) added an experimental support for using containerd&#39;s image subsystem with an <a href="https://github.com/moby/moby/blob/v24.0.2/daemon/daemon.go#L801">undocumented option</a> (subject to change) in `<strong>/etc/docker/daemon.json</strong>`:</p><pre>{&quot;features&quot;:{&quot;containerd-snapshotter&quot;: true}}</pre><p>A future version of Docker (2024? 2025?) is likely to use containerd&#39;s image subsystem by default.</p><h4>Lazy-pulling</h4><p>Most files in container images are never used:</p><blockquote><strong>“pulling packages accounts for 76% of container start time, but only 6.4% of that data is read”</strong><em><br></em>From “<a href="https://www.usenix.org/conference/fast16/technical-sessions/presentation/harter">Slacker: Fast Distribution with Lazy Docker Containers</a>” (Harter, et al., FAST 2016)</blockquote><p>&quot;Lazy-pulling&quot; is a technique to reduce container startup time by pulling partial image contents on demand. This is not possible with <a href="https://github.com/opencontainers/image-spec/blob/v1.0.2/layer.md">OCI-standard tar.gz images</a>, as they do not support `<strong>seek()</strong>` operations. 
Several alternative formats are being proposed to support lazy-pulling:</p><ul><li><a href="https://github.com/containerd/stargz-snapshotter"><strong>eStargz</strong></a><strong> (2019)</strong>: Optimizes gzip granularity for <strong>seek()</strong>-ability; Forward compatible with OCI v1 tar.gz.</li><li><a href="https://github.com/awslabs/soci-snapshotter"><strong>SOCI</strong></a><strong> (2022)</strong>: Captures a checkpoint of tar.gz decoder state; Forward compatible with OCI v1 tar.gz.</li><li><a href="https://github.com/containerd/nydus-snapshotter"><strong>Nydus</strong></a><strong> (2022)</strong>: An alternate image format;<br>Not compatible with OCI v1 tar.gz.</li><li><a href="https://github.com/containerd/overlaybd"><strong>OverlayBD</strong></a><strong> (2021)</strong>: Block devices as container images; Not compatible with OCI v1 tar.gz.</li></ul><p><strong>Slide 51</strong> shows a benchmark result of eStargz. Lazy-pulling (+additional optimizations) can reduce the container startup time to 1/9.</p><figure><a href="https://github.com/AkihiroSuda/AkihiroSuda/raw/5d9f0b1cd9b8c37cb1951768a3bebdb08a3a469e/slides/2023/20230615%20%5BKyoto%20University%5D%20The%20internals%20and%20the%20latest%20trends%20of%20container%20runtimes.pdf"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*MoC4Bvx7V4t6gtRD9UbGkg.png" /></a></figure><p>See also articles from my colleague <a href="https://medium.com/@ktokunaga.mail">Kohei Tokunaga</a>:</p><ul><li><a href="https://medium.com/nttlabs/lazy-pulling-estargz-ef35812d73de">Speeding Up Pulling Container Images on a Variety of Tools with eStargz</a></li><li><a href="https://medium.com/nttlabs/nerdctl-ipfs-975569520e3d">P2P Container Image Distribution on IPFS With Containerd</a></li></ul><h4>Expanding adoption of User namespaces</h4><p>User namespaces are still rarely used in the Docker and Kubernetes ecosystem, although Docker has been supporting it since <a 
href="https://github.com/moby/moby/pull/12648">v1.9</a> (2015).</p><p>One of the reasons is the complexity and overhead of “chowning” the container rootfs for a pseudo root. Linux kernel <a href="https://kernelnewbies.org/Linux_5.12#ID_mapping_in_mounts">v5.12</a> (2021) added “idmapped mounts” to eliminate the need for chowning. This is planned to be supported in <a href="https://github.com/opencontainers/runc/pull/3717">runc v1.2</a>.</p><p>After the release of runc v1.2, user namespaces are expected to become more popular in Docker and in Kubernetes, which added <a href="https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/127-user-namespaces/README.md">preliminary support</a> for user namespaces in <a href="https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.25.md">v1.25</a> (2022). For compatibility&#39;s sake, it is unlikely that Kubernetes will ever enable user namespaces by default. However, Docker may still enable user namespaces by default <a href="https://github.com/moby/moby/pull/38795">in the future</a>. Nothing is decided yet, though.</p><h4>Rootless containers</h4><p><a href="https://rootlesscontaine.rs/">Rootless containers</a> is a technique to put container runtimes, as well as containers, in a user namespace that is created by a non-root user, to mitigate potential vulnerabilities of runtimes.</p><p>Even if a container runtime has a bug that allows an attacker to escape from a container, the attacker can&#39;t gain privileged access to other users&#39; files, the kernel, firmware, or devices.</p><p>Here is a brief history of rootless containers:</p><ul><li><strong>2014</strong>: <a href="https://stgraber.org/2014/01/17/lxc-1-0-unprivileged-containers/">LXC v1.0</a> introduced support for rootless containers. At that time, rootless containers were called &quot;unprivileged containers&quot;. 
LXC&#39;s unprivileged containers are slightly different from modern rootless containers, as they require a <a href="https://man7.org/linux/man-pages/man1/lxc-user-nic.1.html">SETUID binary</a> for <a href="https://man7.org/linux/man-pages/man5/lxc-usernet.5.html">bringing up networks</a>.</li><li><strong>2017</strong>: runc <a href="https://github.com/opencontainers/runc/releases/tag/v1.0.0-rc4">v1.0-rc4</a> gained initial support for rootless containers.</li><li><strong>2018</strong>: Work began on supporting rootless containers in <a href="https://twitter.com/_AkihiroSuda_/status/953231819008180224">containerd</a>, <a href="https://twitter.com/_AkihiroSuda_/status/955698849560997888">BuildKit</a> (backend of `<strong>docker build</strong>`), <a href="https://github.com/AkihiroSuda/docker/commit/588a4e91fc8cb99af040dcde795ba6722a162127">Docker</a>, <a href="https://github.com/containers/podman/commit/19f5a504ffb1470991f331db412be456e41caab5">Podman</a>, etc. <a href="https://github.com/rootless-containers/slirp4netns">slirp4netns</a> (<strong>Slide 56</strong>) was created (by me) to allow SETUID-less networking by translating Ethernet packets to unprivileged socket syscalls.</li><li><strong>2019</strong>: Docker <a href="https://docs.docker.com/engine/release-notes/19.03/#19030">v19.03</a> was released with experimental support for rootless containers. 
Podman <a href="https://github.com/containers/podman/releases/tag/v1.1.0">v1.1</a> was also released with the same feature that year, slightly ahead of Docker v19.03.</li><li><strong>2020</strong>: Docker <a href="https://medium.com/nttlabs/docker-20-10-59cc4bd59d37">v20.10</a> was released with general availability of rootless containers.</li></ul><figure><a href="https://github.com/AkihiroSuda/AkihiroSuda/raw/5d9f0b1cd9b8c37cb1951768a3bebdb08a3a469e/slides/2023/20230615%20%5BKyoto%20University%5D%20The%20internals%20and%20the%20latest%20trends%20of%20container%20runtimes.pdf"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*41peAl7SSpEZqQGpkRmmUw.png" /></a></figure><p>From 2020 to 2022, we also worked on <a href="https://github.com/rootless-containers/bypass4netns">bypass4netns</a> (<strong>Slide 57</strong>) to eliminate the overhead of slirp4netns, by hooking socket file descriptors inside a container and reconstructing them outside the container. The achieved throughput is even higher than that of &quot;rootful&quot; containers.</p><figure><a href="https://github.com/AkihiroSuda/AkihiroSuda/raw/5d9f0b1cd9b8c37cb1951768a3bebdb08a3a469e/slides/2023/20230615%20%5BKyoto%20University%5D%20The%20internals%20and%20the%20latest%20trends%20of%20container%20runtimes.pdf"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*h5_2ZjoGixRrOfZdOu8ysA.png" /></a></figure><p>Rootless containers have successfully gained popularity, but they have also drawn criticism. In particular, it is controversial whether non-root users should be allowed to create the user namespaces that are required for running rootless containers. I&#39;d answer yes for container users, because rootless containers are at least much safer than running everything as the root. However, I&#39;d rather answer no for those who don&#39;t use containers, because user namespaces can also be attack surfaces. 
e.g., <a href="https://www.tarlogic.com/blog/cve-2023-32233-vulnerability/"><strong>CVE-2023-32233</strong>: &quot;<em>Privilege escalation in Linux Kernel due to a Netfilter nf_tables vulnerability</em>&quot;</a>.</p><p>The community has already been seeking remedies for this dilemma. Ubuntu (since 13.10) and Debian provide a sysctl knob `<strong>kernel.unprivileged_userns_clone=&lt;bool&gt;</strong>` to specify whether to allow or disallow creating unprivileged user namespaces. However, their <a href="https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/jammy/commit/kernel/user_namespace.c?id=342276469714b5a307745d1a3b9bdc146c804e4e">patch</a> is not merged in the upstream Linux kernel.</p><p>Instead, the upstream kernel introduced a new LSM (Linux Security Module) hook `<strong>userns_create</strong>` in Linux <a href="https://github.com/torvalds/linux/commit/7cd4c5c2101cb092db00f61f69d24380cf7a0ee8">v6.1</a> (2022) so that an LSM can dynamically decide whether to allow or disallow creating a user namespace. This hook is callable from <a href="https://docs.kernel.org/bpf/prog_lsm.html">eBPF (`<strong>bpf_program__attach_lsm()</strong>`)</a>, so it is expected that there will be a fine-grained and non-distribution-specific knob that does not depend on AppArmor or SELinux. However, userspace utilities for eBPF + LSM are not yet mature enough to provide a good user experience for this.</p><h4>More LSMs</h4><p>The <a href="https://landlock.io/">Landlock</a> LSM was merged into Linux <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=17ae69aba89dbfa2139b7f8024b757ab3cc42f59">v5.13</a> (2021). Landlock is similar to AppArmor in the sense that it restricts file access by path (`<strong>LANDLOCK_ACCESS_FS_EXECUTE</strong>`, `<strong>LANDLOCK_ACCESS_FS_READ_FILE</strong>`, etc.), but Landlock does not require the root privilege for setting up a new profile. 
Landlock is also very similar to OpenBSD&#39;s `<a href="https://man.openbsd.org/pledge.2"><strong>pledge(2)</strong></a>`.</p><p>Landlock is still <a href="https://github.com/opencontainers/runtime-spec/pull/1111">not supported by the OCI Runtime Spec</a>, but I guess it can be included in the OCI Runtime Spec v1.2.</p><h4>Kata Containers</h4><p>As I mentioned in <strong>Section 1</strong>, &quot;containers&quot; is not a well-defined term. Anything can be called &quot;containers&quot; when it provides good compatibility with the existing container ecosystem.</p><p><a href="https://katacontainers.io/">Kata Containers</a> (2017) are one such kind of &quot;containers&quot; that are not actually containers in the narrower sense. Kata Containers are actually virtual machines, but with support for the OCI Runtime Spec. Kata Containers are much more secure than runc containers; however, they have performance drawbacks, and they do not work well on typical non-bare-metal IaaS instances that do not support nested virtualization.</p><p>Kata Containers works as a containerd runtime plugin, and receives the same images and runtime configurations as runc containers. Its user experience is almost indistinguishable from that of runc containers.</p><h4>gVisor</h4><p><a href="https://gvisor.dev/">gVisor</a> (2018) is yet another exotic container runtime. gVisor traps syscalls and executes them in a Linux-compatible user-mode kernel to mitigate attacks. gVisor currently has <a href="https://gvisor.dev/docs/architecture_guide/platforms/">three modes</a> for trapping syscalls:</p><ul><li><strong>KVM mode</strong>: rarely used, but the best option for bare-metal hosts</li><li><strong>ptrace mode</strong>: the most common option but slow</li><li><strong>SIGSYS trap mode</strong> (since 2023): expected to replace ptrace mode eventually</li></ul><p>gVisor has been used in several of Google&#39;s products, including Google Cloud Run. 
However, Google Cloud Run switched away from gVisor to microVMs in 2023:</p><blockquote><strong>“This means that software that previously didn’t run in Cloud Run due to unimplemented system call issues can now run in Cloud Run’s second-generation execution environment.”</strong><br>From <a href="https://cloud.google.com/blog/products/serverless/cloud-run-jobs-and-second-generation-execution-environment-ga/?hl=en">https://cloud.google.com/blog/products/serverless/cloud-run-jobs-and-second-generation-execution-environment-ga/?hl=en</a></blockquote><p>This implies that gVisor&#39;s performance and compatibility issues are not negligible for their business.</p><h4>WebAssembly</h4><p><a href="https://webassembly.org/">WebAssembly</a> (WASM) is a platform-independent byte code format that was originally designed for Web browsers in <a href="https://blog.mozilla.org/luke/2015/06/17/webassembly/">2015</a>. WebAssembly is somewhat similar to Java applets (1995), but it puts more focus on portability and security. One interesting aspect of WebAssembly is that it splits the code address space from the data address space; there are no instructions like `<strong>JMP &lt;<em>immediate</em>&gt;</strong>` and `<strong>JMP *&lt;<em>reg</em>&gt;</strong>`. It only supports <a href="https://webassembly.github.io/spec/core/syntax/instructions.html#control-instructions">jumping to labels that are resolved at compile time</a>. This design reduces arbitrary code execution bugs, although it also sacrifices the feasibility of JIT-compiling other byte code formats into WebAssembly.</p><p>WebAssembly is also in the spotlight as a potential alternative to containers. 
For running WebAssembly outside of browsers, <a href="https://wasi.dev/">WASI</a> (WebAssembly System Interface) was proposed in 2019 to provide low-level APIs (e.g., <a href="https://github.com/WebAssembly/WASI/blob/main/legacy/preview1/docs.md">`<strong>fd_read()</strong>`, `<strong>fd_write()</strong>`, `<strong>sock_recv()</strong>`, `<strong>sock_send()</strong>`</a>) that can be used for implementing POSIX-like layers on top of them. containerd added the &quot;<a href="https://github.com/containerd/runwasi">runWASI</a>&quot; plugin in 2022 to treat WASI workloads as containers.</p><p>In 2023, <a href="https://wasix.org/docs/api-reference">WASIX</a> was proposed to extend WASI to provide more convenient (and somewhat controversial) functions:</p><ul><li><strong>Threads</strong>: `<a href="https://wasix.org/docs/api-reference/wasix/thread_spawn"><strong>thread_spawn()</strong></a>`, `<a href="https://wasix.org/docs/api-reference/wasix/thread_join"><strong>thread_join()</strong></a>`, ...</li><li><strong>Processes</strong>: `<a href="https://wasix.org/docs/api-reference/wasix/proc_fork"><strong>proc_fork()</strong></a>`, `<a href="https://wasix.org/docs/api-reference/wasix/proc_exec"><strong>proc_exec()</strong></a>`, ...</li><li><strong>Sockets</strong>: `<a href="https://wasix.org/docs/api-reference/wasix/sock_listen"><strong>sock_listen()</strong></a>`, `<a href="https://wasix.org/docs/api-reference/wasix/sock_connect"><strong>sock_connect()</strong></a>`, ...</li></ul><p>Eventually, these movements may replace a huge portion (though not 100%) of containers. Solomon Hykes, the founder of Docker, says that &quot;<em>If WASM+WASI existed in 2008, we wouldn’t have needed to created Docker</em>&quot;:</p><blockquote>If WASM+WASI existed in 2008, we wouldn&#39;t have needed to created Docker. That&#39;s how important it is. Webassembly on the server is the future of computing. A standardized system interface was the missing link. Let&#39;s hope WASI is up to the task! https://t.co/wnXQg4kwa4<br>From Solomon Hykes / @shykes@hachyderm.io on Twitter</blockquote><h3>Recap</h3><ul><li>Containers are more efficient, but often less secure, than virtual machines. Lots of security technologies are being introduced to harden containers. (User namespaces, Rootless containers, Linux security modules, ...)</li><li>Alternatives to Docker are emerging (containerd, CRI-O, Podman, nerdctl, Finch, ...), but Docker isn’t fading out.</li><li>“Non-container” containers are trending too.<br>(<strong>Kata</strong>: VM-based, <strong>gVisor</strong>: user mode kernel, <strong>runWASI</strong>: WebAssembly, ...)</li></ul><p><strong>Slide 71</strong> shows the landscape of the well-known runtimes.</p><figure><a href="https://github.com/AkihiroSuda/AkihiroSuda/raw/5d9f0b1cd9b8c37cb1951768a3bebdb08a3a469e/slides/2023/20230615%20%5BKyoto%20University%5D%20The%20internals%20and%20the%20latest%20trends%20of%20container%20runtimes.pdf"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*_x0ujgxNUyzBIco_J6O-mw.png" /></a></figure><p>See also the rest of the <a href="https://github.com/AkihiroSuda/AkihiroSuda/raw/5d9f0b1cd9b8c37cb1951768a3bebdb08a3a469e/slides/2023/20230615%20%5BKyoto%20University%5D%20The%20internals%20and%20the%20latest%20trends%20of%20container%20runtimes.pdf">slides</a> for further topics that could not be covered in the talk.</p><h3>NTT is hiring!</h3><p>We at <a href="https://www.rd.ntt/e/">NTT</a> have been proudly leading the trends of containers and other open source software. 
Visit <a href="https://www.rd.ntt/e/sic/recruit/">https://www.rd.ntt/e/sic/recruit/</a> to see how to join us.</p><p>We at <a href="https://www.rd.ntt/">NTT</a> take pride in leading the trends of containers and other open source software. Please see our recruiting page (in Japanese): <a href="https://www.rd.ntt/sic/recruit/">https://www.rd.ntt/sic/recruit/</a></p><hr><p><a href="https://medium.com/nttlabs/the-internals-and-the-latest-trends-of-container-runtimes-2023-22aa111d7a93">The internals and the latest trends of container runtimes (2023)</a> was originally published in <a href="https://medium.com/nttlabs">nttlabs</a> on Medium.</p>]]></content:encoded>
        </item>
    </channel>
</rss>