Stories by Vincent Wang on Medium

The State of MAVLink in 2021

Vincent Wang — Sun, 26 Dec 2021 08:13:22 GMT

It’s about the end of the year, a perfect time for a ranting post on the state of certain open-source projects.

MAVLink has been the standard protocol for drone communication and control for a long time now. It has been updated and extended significantly over more than a decade, and is currently used by both the PX4 and Ardupilot open-source autopilot projects, among others. I want to make it clear that this post is not an attack on MAVLink; overall, it does its job well.

That being said, this post wouldn’t be here if there weren’t major issues surrounding the project (at least in my view). This post summarizes the current state of the protocol, along with the pain points and issues I have run into while using it.

Disclaimer: the contents of this article are based entirely on my own understanding of and experience with MAVLink, and come with no guarantee of accuracy. With that out of the way, here is the actual content.

How the MAVLink Protocol Works

In order to understand some of the issues with MAVLink, we should first describe how the MAVLink protocol works.

A standard MAVLink v2 data frame.

MAVLink is a very simple publish-subscribe protocol with a fixed standard message set (which can be customized, but only if every device on the network understands the customized message set).

A MAVLink network is made up of systems, which represent physical systems such as drones, ground stations, etc. Each device within a system is identified by a component ID, which has standard values describing the type of the component.

Each device will give itself a system and component ID, and publish heartbeats on the network (as in most protocols) to let other devices know it’s there. This heartbeat contains the type of the device (IMPORTANT FOR LATER!), autopilot info if applicable, and some other information.

Devices can publish messages (e.g. for telemetry). Messages have fixed message IDs, and other devices can listen for a certain message ID to subscribe to the message. Other devices can also read the system and component ID from the message to figure out where the message came from.
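
To make the frame layout and the heartbeat concrete, here is a minimal sketch that packs an unsigned MAVLink 2 frame by hand with only the Python standard library. The header layout, the CRC-16/MCRF4XX checksum and the per-message CRC_EXTRA seed (50 for HEARTBEAT) follow my reading of the MAVLink 2 serialization docs; message signing is left out and the helper names are my own. In practice you would let pymavlink or the generated C headers do this.

```python
import struct

MAVLINK_V2_STX = 0xFD       # start byte of a MAVLink 2 frame
HEARTBEAT_MSG_ID = 0
HEARTBEAT_CRC_EXTRA = 50    # per-message seed derived from the message definition
MAV_TYPE_GCS = 6
MAV_AUTOPILOT_INVALID = 8


def crc_accumulate(byte: int, crc: int) -> int:
    """Fold one byte into a CRC-16/MCRF4XX (X.25-style) checksum."""
    tmp = (byte ^ (crc & 0xFF)) & 0xFF
    tmp = (tmp ^ (tmp << 4)) & 0xFF
    return ((crc >> 8) ^ (tmp << 8) ^ (tmp << 3) ^ (tmp >> 4)) & 0xFFFF


def crc16(data: bytes, crc: int = 0xFFFF) -> int:
    for b in data:
        crc = crc_accumulate(b, crc)
    return crc


def pack_v2_frame(seq: int, sysid: int, compid: int, msgid: int,
                  payload: bytes, crc_extra: int) -> bytes:
    """Wrap an already-serialized payload in an unsigned MAVLink 2 frame."""
    # MAVLink 2 drops trailing zero bytes from the payload (keeping at least one).
    payload = payload.rstrip(b"\x00") or payload[:1]
    # len, incompat_flags, compat_flags (both 0: no signing), seq, sysid, compid
    header = bytes([len(payload), 0, 0, seq & 0xFF, sysid, compid])
    header += msgid.to_bytes(3, "little")  # 24-bit message ID
    # The checksum covers everything after the start byte, then the CRC_EXTRA seed.
    crc = crc_accumulate(crc_extra, crc16(header + payload))
    return bytes([MAVLINK_V2_STX]) + header + payload + struct.pack("<H", crc)


def heartbeat_payload(mav_type: int, autopilot: int, base_mode: int = 0,
                      custom_mode: int = 0, system_status: int = 0) -> bytes:
    # Fields go on the wire sorted by size, so custom_mode (uint32) comes first;
    # the final byte is mavlink_version, which is always 3.
    return struct.pack("<IBBBBB", custom_mode, mav_type, autopilot,
                       base_mode, system_status, 3)


if __name__ == "__main__":
    # A ground station announcing itself as (sysid 255, compid 190).
    frame = pack_v2_frame(0, 255, 190, HEARTBEAT_MSG_ID,
                          heartbeat_payload(MAV_TYPE_GCS, MAV_AUTOPILOT_INVALID),
                          HEARTBEAT_CRC_EXTRA)
    print(frame.hex())
```

Every receiver reads the sender's system and component IDs straight out of bytes 5 and 6 of the frame, which is how "where did this come from" works without any connection-level state.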

Services

Although MAVLink is a publish-subscribe protocol, it is capable of emulating the traditional client-server (one server, one caller) architecture. The most common example of this is the Command Protocol, which is used to execute commands such as arming, takeoff, etc.

The protocol flow for the Command Protocol. GCS means ground station, drone means drone.
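
Here is a minimal, transport-agnostic sketch of the client side of that flow: send a COMMAND_LONG, wait for a COMMAND_ACK, and retry on timeout, incrementing the confirmation field on each retransmission as the COMMAND_LONG definition describes. The `send` and `wait_for_ack` callables are hypothetical hooks standing in for whatever library you use; MAV_CMD_COMPONENT_ARM_DISARM (400) and MAV_RESULT_ACCEPTED (0) come from the common message set.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAV_CMD_COMPONENT_ARM_DISARM = 400
MAV_RESULT_ACCEPTED = 0


@dataclass
class CommandLong:
    target_system: int
    target_component: int
    command: int
    params: tuple = (0.0,) * 7
    confirmation: int = 0  # 0 on first send, incremented on each retry


def send_command(cmd: CommandLong,
                 send: Callable[[CommandLong], None],
                 wait_for_ack: Callable[[int, float], Optional[int]],
                 retries: int = 3, timeout: float = 1.0) -> Optional[int]:
    """Send a command and return its MAV_RESULT, or None if never acknowledged.

    `wait_for_ack(command_id, timeout)` is a hypothetical hook returning the
    result field of a matching COMMAND_ACK, or None on timeout.
    """
    for attempt in range(retries + 1):
        cmd.confirmation = attempt
        send(cmd)
        result = wait_for_ack(cmd.command, timeout)
        if result is not None:
            return result
    return None


if __name__ == "__main__":
    # Fake link that drops the first attempt and accepts the retry.
    replies = iter([None, MAV_RESULT_ACCEPTED])
    arm = CommandLong(1, 1, MAV_CMD_COMPONENT_ARM_DISARM, (1.0,) + (0.0,) * 6)
    result = send_command(arm, send=lambda c: None,
                          wait_for_ack=lambda cmd_id, t: next(replies))
    print(result)  # prints 0 (MAV_RESULT_ACCEPTED) after one retry
```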

There is a whole collection of microservices supported by MAVLink, which involve exchanging a series of messages to support services such as writing parameters to a vehicle.

An Addendum on Routing

Routing is perhaps one of the more complicated parts of MAVLink, and given that even the standard protocol libraries don’t seem to implement it properly, it deserves a section here.

All devices on a MAVLink system have a system ID and a component ID. The system ID identifies a complete system, while the component ID identifies a component on the system; so for example, a drone’s autopilot might be (sysid: 1, compid: 1), while that drone’s gimbal might be (sysid: 1, compid: 154).

Realistically, the only commonly used compids are 1 (autopilot), 190 (ground control), and rarely 154 (gimbal).

One important issue to note is that one connection != one system. Multiple drones/systems can exist over one UDP/serial/whatever connection. Strangely, all of the existing common libraries assume that there is only one drone per connection, something that will be ranted about below.
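
The forwarding rules themselves are simple to state. Below is a simplified sketch of a router, assuming each link is just a hashable identifier: it learns which link each (sysid, compid) was heard on, forwards targeted messages only to the link where the target lives, and sends broadcasts (target 0) to every other link. This is my simplification of the routing section of the MAVLink docs, not a drop-in implementation; note that one link can hold any number of systems.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


class MavRouter:
    """Simplified MAVLink-style router that learns where each node lives."""

    def __init__(self, links: Iterable[Hashable]):
        # link -> set of (sysid, compid) pairs heard on that link
        self.seen: Dict[Hashable, Set[Tuple[int, int]]] = {
            link: set() for link in links
        }

    def learn(self, link: Hashable, sysid: int, compid: int) -> None:
        self.seen[link].add((sysid, compid))

    def forward_targets(self, src_link: Hashable, src_sysid: int, src_compid: int,
                        target_sysid: int = 0, target_compid: int = 0) -> List:
        """Links (other than the source) a message should be sent out on.

        target_sysid 0 means broadcast; target_compid 0 means any component
        of target_sysid.
        """
        self.learn(src_link, src_sysid, src_compid)
        out = []
        for link, nodes in self.seen.items():
            if link == src_link:
                continue
            if target_sysid == 0:
                out.append(link)
            elif target_compid == 0:
                if any(sysid == target_sysid for sysid, _ in nodes):
                    out.append(link)
            elif (target_sysid, target_compid) in nodes:
                out.append(link)
        return out
```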

Speaking of rants:

The Ranting Section

Now that you have some background on the protocol, let’s talk about what’s wrong with it and the library ecosystem surrounding it.

Just Routing, like, the whole thing

It is very rare to find a system that uses two IDs to identify devices instead of one. That being said, this is excusable: we want to know which system a device is part of, and encoding that into every message is a reliable way of doing so.

That’s a small nitpick, however. The more pressing issue is that none of the largest current MAVLink libraries implement the routing protocol properly. Technically, the raw C header library does, but it literally only sends and receives messages, so we can’t count it. Here’s a rundown:

Pymavlink, the simple reference Python library, assigns a master.target_system and master.target_component based on the first heartbeat it reads. That being said, you can also manually feed in your own system and component IDs so it technically works, but like the C header library it’s very low level.
DroneKit still has not implemented it in its master branch, and the relevant issue seems to have been stale since last year.
MAVSDK technically has multi-drone support in C++ but, like DroneKit, it is one-system-one-connection. Its bindings require a (simple) workaround for multi-drone support — set the mavsdk_server port to different ports, and it’ll start separate servers, allowing multi-drone control; but again, it is still one system per connection. Its heavily abstracted design removes the concept of system and component IDs for the user for the most part.
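
What I would like from these libraries is something closer to this: instead of latching onto the first heartbeat, keep a registry of every (sysid, compid) heard on the connection and let the caller choose targets. A minimal sketch; the `Heartbeat` tuple, the class name and the three-second timeout are illustrative assumptions, and no real library API is implied. It also filters by the MAV_TYPE carried in the heartbeat rather than by component ID.

```python
import time
from typing import Dict, List, NamedTuple, Optional, Tuple


class Heartbeat(NamedTuple):
    sysid: int
    compid: int
    mav_type: int   # MAV_TYPE from the heartbeat payload
    autopilot: int  # MAV_AUTOPILOT from the heartbeat payload


class SystemRegistry:
    """Tracks every node heard on one connection instead of just the first."""

    def __init__(self, timeout: float = 3.0):
        self.timeout = timeout
        self.nodes: Dict[Tuple[int, int], Tuple[Heartbeat, float]] = {}

    def on_heartbeat(self, hb: Heartbeat, now: Optional[float] = None) -> None:
        stamp = time.monotonic() if now is None else now
        self.nodes[(hb.sysid, hb.compid)] = (hb, stamp)

    def alive(self, now: Optional[float] = None) -> List[Heartbeat]:
        now = time.monotonic() if now is None else now
        return [hb for hb, t in self.nodes.values() if now - t <= self.timeout]

    def systems(self, now: Optional[float] = None) -> List[int]:
        return sorted({hb.sysid for hb in self.alive(now)})

    def by_type(self, mav_type: int, now: Optional[float] = None) -> List[Heartbeat]:
        # Select by MAV_TYPE, not by guessing from the component ID.
        return [hb for hb in self.alive(now) if hb.mav_type == mav_type]
```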

MAVLink also hard-codes meanings into component IDs, such as 154 for gimbals. Technically, this makes it easier to deconflict IDs, but it comes with a number of disadvantages:

We’re assigning semantic meaning to the IDs, a role that’s already fulfilled by MAV_TYPE and a decision that limits the flexibility.
Having two sources of component type information causes confusion. Some libraries may check the component ID instead of the MAV_TYPE, which is incorrect.
Multiple devices of the same type will share the same default component ID and require deconfliction anyway.

Node deconfliction is a somewhat complex topic, but there are a number of easy ways in which MAVLink could implement it; a crude method is to simply listen for other nodes with the same ID for a period, and reassign.
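
The crude listen-and-reassign approach fits in a few lines. This sketch assumes you have already collected the (sysid, compid) pairs heard during a listen window, for example from heartbeats; the function name and the lowest-free-ID fallback order are hypothetical choices.

```python
from typing import Iterable, Optional, Set, Tuple


def pick_component_id(sysid: int, preferred: int,
                      heard: Iterable[Tuple[int, int]],
                      candidates: Iterable[int] = range(1, 256)) -> Optional[int]:
    """Keep `preferred` if no one else in `sysid` uses it, else take a free ID.

    `heard` holds the (sysid, compid) pairs observed while listening.
    Component ID 0 means "all components", so it is never handed out.
    """
    taken: Set[int] = {c for s, c in heard if s == sysid}
    if preferred not in taken:
        return preferred
    for cid in candidates:
        if cid != 0 and cid not in taken:
            return cid
    return None  # every candidate is already in use
```

A gimbal that boots, hears another (1, 154) on the network, and calls `pick_component_id(1, 154, heard)` would move to the lowest free ID instead of silently colliding.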

Basically, the TL;DR of routing is that libraries don’t implement it properly, and it is not super flexible. The protocol itself is acceptable, but library support is meh.

Inconsistency

MAVLink is a standard. That means industry standard open-source autopilots should implement the standard consistently, but we’ve seen that they often don’t.

For instance, take the simple MAV_CMD_NAV_TAKEOFF command:

That altitude parameter is completely ignored by PX4, which instead only pays attention to the MIS_TAKEOFF_ALT parameter. Ardupilot implements the altitude parameter.

Where is this documented? Nowhere, except if you dig through the MAVSDK implementation for takeoff.
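
For reference, this is roughly what a takeoff request looks like on the wire: a COMMAND_LONG payload carrying MAV_CMD_NAV_TAKEOFF (22) with the altitude in param7. The field order and IDs follow the common message set as I understand it, and the helper is hypothetical. Given the behavior described above, PX4 would take the altitude from MIS_TAKEOFF_ALT instead, so the same bytes can produce different results depending on the autopilot.

```python
import math
import struct

MAV_CMD_NAV_TAKEOFF = 22


def takeoff_command_long(target_system: int, target_component: int,
                         altitude_m: float, yaw_deg: float = math.nan) -> bytes:
    """Serialize a COMMAND_LONG payload carrying MAV_CMD_NAV_TAKEOFF.

    param4 = yaw (NaN leaves heading behaviour to the autopilot),
    param7 = altitude. Lat/lon (param5/6) are left at 0 here; how they are
    interpreted varies by autopilot, so check your firmware.
    """
    params = (0.0, 0.0, 0.0, yaw_deg, 0.0, 0.0, altitude_m)
    # Wire order: param1..param7 (float32), command (uint16),
    # target_system, target_component, confirmation (uint8 each).
    return struct.pack("<7fHBBB", *params, MAV_CMD_NAV_TAKEOFF,
                       target_system, target_component, 0)
```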

Ardupilot and PX4 also implement different sets of modes, even though they are functionally the same thing. This is the standard DO_SET_MODE command:

Simple? NO.

That first parameter is a standard enum of modes:

Simple vehicle modes.

PX4 and Ardupilot promptly ignore this first field entirely and instead implement custom modes that do the same things (Mission, Guided/Offboard, etc.). While custom modes are useful, the community would benefit from a standard set of vehicle modes instead of completely separate modes that largely have the same features.
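
To show what this costs in practice, here is a sketch of the translation table a ground station ends up carrying to put either autopilot into "the same" mode with MAV_CMD_DO_SET_MODE (176), setting MAV_MODE_FLAG_CUSTOM_MODE_ENABLED in param1. The numbers come from my reading of PX4's custom mode definitions (main mode in param2, sub mode in param3, and packed into bits 16 to 31 of HEARTBEAT.custom_mode) and of ArduCopter's mode enum; treat them as assumptions and verify them against the firmware you target.

```python
from typing import Tuple

MAV_CMD_DO_SET_MODE = 176
MAV_MODE_FLAG_CUSTOM_MODE_ENABLED = 1

# Generic mode name -> (param2, param3) for DO_SET_MODE, per autopilot.
# PX4 (as I read it): param2 = custom main mode, param3 = custom sub mode.
# ArduCopter: param2 = flight mode number, param3 unused.
MODE_TABLE = {
    "px4": {
        "mission": (4, 4),  # main mode AUTO, sub mode MISSION
        "guided": (6, 0),   # main mode OFFBOARD
    },
    "ardupilot": {
        "mission": (3, 0),  # AUTO
        "guided": (4, 0),   # GUIDED
    },
}


def set_mode_params(autopilot: str, mode: str) -> Tuple[float, ...]:
    """The seven COMMAND_LONG params for MAV_CMD_DO_SET_MODE."""
    main, sub = MODE_TABLE[autopilot][mode]
    return (float(MAV_MODE_FLAG_CUSTOM_MODE_ENABLED), float(main), float(sub),
            0.0, 0.0, 0.0, 0.0)


def px4_heartbeat_mode(custom_mode: int) -> Tuple[int, int]:
    """Decode the (main, sub) pair PX4 packs into HEARTBEAT.custom_mode."""
    return (custom_mode >> 16) & 0xFF, (custom_mode >> 24) & 0xFF
```

Note that ArduCopter's GUIDED and PX4's AUTO main mode are both 4, so a ground station that forgets which autopilot it is talking to will happily send a valid-looking but wrong mode.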

Microservice Protocols

MAVLink has a large collection of microservices that implement certain functionality, such as sending mission descriptions, writing parameters, etc.

There are issues with some of the microservice protocols. Most of the issues boil down to three points:

Overcomplicated
Scope creep
Not implemented consistently or properly

Camera Protocol

The MAVLink Camera Protocol is a protocol for generally exposing camera streams and image servers. It’s…okay, in that it fulfills the job of exposing the stream/image server, but it contains a lot of redundant information. Take, for example, the CAMERA_INFORMATION message:

The CAMERA_INFORMATION message has a lot of (redundant) fields.

This doesn’t seem terrible, except all of that information is also encoded in the camera definition file. Then, the VIDEO_STREAM_INFORMATION has similar data:

More fields.

except that resolution, bitrate, etc. are usually available from the RTSP/RTP/whatever stream itself.

The camera protocol is honestly not that bad of an offender; it is just rather complex: it requires a companion computer to function correctly, and preferably a dedicated radio (MAVLink FTP is very, very slow). There’s a reference implementation, but it is archived.

My major gripe with this protocol is that it depends on an external HTTP/FTP server + companion computer for the definition and stream anyway. MAVLink is clearly not the optimal protocol for this application; it would likely be better to standardize an HTTP-based API for requesting camera information, and use MAVLink only to point ground control software at the server.
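
As a rough illustration of that split, here is a sketch of a companion computer serving camera details over plain HTTP as JSON, so that MAVLink would only need to carry the URL (CAMERA_INFORMATION already has a cam_definition_uri field pointing at a definition file). The `/camera` endpoint, the stream URI and every JSON field are hypothetical; nothing like this is standardized.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical camera description; a real one would be read from the hardware.
CAMERA_INFO = {
    "vendor": "ExampleCam",
    "model": "X1",
    "streams": [{"name": "main", "uri": "rtsp://192.168.1.10:8554/main"}],
}


class CameraHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/camera":
            self.send_error(404)
            return
        body = json.dumps(CAMERA_INFO).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass


def start_server(port: int = 0) -> HTTPServer:
    """Serve CAMERA_INFO from a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), CameraHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_camera_info(url: str) -> dict:
    """What a ground station would do once MAVLink hands it `url`."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)
```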

Gimbal Protocol (v1 and v2)

The Gimbal Protocol was updated with a v2 a while back to address issues with the v1 protocol, mainly its ambiguous message set and performance issues. V2 made major changes, the most important of which is an explicit separation between gimbal device and gimbal manager:

An example where the autopilot is the gimbal manager.

An example where the gimbal is its own manager. The gimbal device doesn’t exist here because it is the same thing as the gimbal manager.

An example where a companion computer manages the gimbal device.

The separation is a good idea; v1 struggled because both the autopilot and the ground station would often try to control the gimbal directly, which caused conflicts. The gimbal manager aims to solve this; its primary job is to implement higher-level functionality and deconflict control.

It deconflicts control by assigning a primary and a secondary controller, each identified by a system and component ID. How the two are mixed is not defined, which is a minor issue in itself, but it also means the behavior of primary and secondary control is not guaranteed to be consistent across platforms, which could lead to portability issues (although this is solvable).
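
Because the mixing is undefined, every gimbal manager picks its own policy. Here is one possible policy, sketched only to show the kind of rule a spec could pin down: the primary controller always wins, and the secondary is obeyed only once the primary has been silent for a timeout. The class, the timeout and the rule itself are my own assumptions, not something the gimbal protocol specifies.

```python
from typing import Optional, Tuple

Node = Tuple[int, int]  # (sysid, compid)


class GimbalArbiter:
    """Hypothetical primary/secondary arbitration for a gimbal manager."""

    def __init__(self, primary: Optional[Node], secondary: Optional[Node],
                 primary_timeout: float = 1.0):
        self.primary = primary
        self.secondary = secondary
        self.primary_timeout = primary_timeout
        self._last_primary = float("-inf")  # time of last primary setpoint

    def accept(self, sender: Node, now: float) -> bool:
        """Decide whether a setpoint from `sender` should be applied."""
        if sender == self.primary:
            self._last_primary = now
            return True
        if sender == self.secondary:
            # The secondary only gets a say while the primary is quiet.
            return now - self._last_primary > self.primary_timeout
        return False  # sender is not in control at all
```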

There are also a large number of manager commands, some of which will not be supported by all systems. While GIMBAL_MANAGER_INFORMATION contains some facilities for reporting capabilities, I think more fine-grained capability reporting (e.g. can track ROI but not WPNEXT) would be useful.

That’s a lot of messages.

There is also a somewhat confusing gimbal manager to device relationship. The protocol calls for one gimbal manager to one device; messages such as GIMBAL_MANAGER_INFORMATION use a gimbal device ID directly rather than calling it a manager ID, so the wording is confusing.

Additionally, gimbal managers have no component ID. They are instead attached to another device (autopilot, gimbal device etc.) but which device they are attached to is generally unclear.

It would be much clearer if gimbal managers were considered their own devices; it would simplify addressing gimbal managers (no need for a separate gimbal device ID in the control message!) and clear up confusion.

Better yet, remove the gimbal device from the MAVLink network. It seems to me that moving the gimbal device out of the main MAVLink network and isolating it behind the gimbal manager would greatly reduce confusion; the gimbal manager would then be the gimbal’s sole representation on the main network. It also opens up fun possibilities such as chaining gimbal managers: a lower-level gimbal manager on a physical gimbal that can only control angle could be chained into an autopilot or companion-computer gimbal manager, letting less advanced hardware take advantage of the autopilot’s sensors and systems.
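
To make the chaining idea concrete, here is a sketch of a low-level manager that only understands angles, wrapped by a higher-level manager (say, on the autopilot) that turns a region of interest into angles using the vehicle's position. This illustrates my proposal rather than anything in the current protocol; the class names and the local north-east-down (NED) coordinate convention are assumptions, and the angles are earth-frame, ignoring vehicle attitude.

```python
import math
from typing import Tuple

Ned = Tuple[float, float, float]  # (north, east, down) in metres


class AngleOnlyGimbalManager:
    """Stand-in for a simple gimbal that can only hold a pitch/yaw angle."""

    def __init__(self):
        self.pitch_deg = 0.0
        self.yaw_deg = 0.0

    def set_angles(self, pitch_deg: float, yaw_deg: float) -> None:
        self.pitch_deg, self.yaw_deg = pitch_deg, yaw_deg


class RoiGimbalManager:
    """Higher-level manager adding ROI tracking on top of an angle-only one."""

    def __init__(self, lower: AngleOnlyGimbalManager):
        self.lower = lower

    def point_at(self, vehicle: Ned, roi: Ned) -> None:
        north = roi[0] - vehicle[0]
        east = roi[1] - vehicle[1]
        down = roi[2] - vehicle[2]
        yaw = math.degrees(math.atan2(east, north))  # 0 = north, 90 = east
        # Negative pitch means looking down (target below the vehicle).
        pitch = math.degrees(math.atan2(-down, math.hypot(north, east)))
        self.lower.set_angles(pitch, yaw)
```

The angle-only manager never needs to know what an ROI is; only the manager with access to the vehicle's position does.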

Conclusion

MAVLink in 2021 is a fairly usable protocol that suffers from some incorrectly implemented libraries and feature creep. The core protocol does a great job of controlling drones; auxiliary protocols such as the gimbal and camera protocols may benefit from moving some functionality outside of MAVLink. Multi-drone control and correct routing are issues that libraries sorely need to address.

Here’s to hoping that 2022 will bring further improvements to the protocol and the ecosystem surrounding it.

Setting up an Ubuntu chroot on your Linux distro

Vincent Wang — Fri, 06 Nov 2020 06:41:33 GMT

Setting up an Ubuntu chroot on your Linux distro with schroot

Sometimes, you just need Ubuntu.

In my daily life, I usually use Arch Linux as my daily driver (barring GNUless November). It works amazingly, the AUR is a blessing from Harambe, etc. etc., all the stuff you’ve probably already heard from Arch users. Unfortunately, not everything works on Arch. For example, I occasionally need to mess with ROS (Robot Operating System), and the community-maintained AUR distribution always seems to have dependency issues, so I wanted to install the official Ubuntu distribution while still being able to use it cleanly within my Arch system.

This article assumes you’re using an amd64/x86_64 system.

Why not a container?

Yes, containers are nice and pretty epic, but for my use case, I thought it was too much of a hassle — containers are too isolated in this case for my tastes. So for this specific purpose, I wanted to run something a bit lighter and less isolated than a container; chroots are perfect for that.

Enter: schroot+debootstrap

schroot and debootstrap are two tools that make managing Debian-based chroots incredibly simple. schroot handles setting up the chroot (it can even mount your home directory, so all your configs are there!), while debootstrap lets you bootstrap your chroot with a Debian derivative in one command.

First, you’ll need to install the two tools. Since I use Arch:

pacman -S schroot debootstrap

Next, set up the chroot definition. Edit /etc/schroot/chroot.d/.conf (you can name the chroot whatever you want) and put in this content:

[]
description=
directory=/srv/chroot/
root-users=
type=directory
users=

This will set up a new chroot at /srv/chroot/. We can actually make as many chroots as we want; just put another config file in the directory.
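
If you end up creating several chroots, the definition file is easy to template. A small sketch, assuming the same keys as above and assuming the chroot directory is named after the chroot itself; the function names and the example values ("focal", "alice") are placeholders, not anything schroot provides.

```python
from pathlib import Path


def schroot_definition(name: str, user: str, description: str = "") -> str:
    """Render a type=directory schroot definition with the keys shown above."""
    lines = [
        f"[{name}]",
        f"description={description or name}",
        f"directory=/srv/chroot/{name}",  # assumption: directory matches the name
        f"root-users={user}",
        "type=directory",
        f"users={user}",
    ]
    return "\n".join(lines) + "\n"


def write_definition(name: str, user: str,
                     conf_dir: Path = Path("/etc/schroot/chroot.d")) -> Path:
    """Write the definition to <conf_dir>/<name>.conf (needs root for /etc)."""
    path = conf_dir / f"{name}.conf"
    path.write_text(schroot_definition(name, user))
    return path


if __name__ == "__main__":
    print(schroot_definition("focal", "alice", "Ubuntu 20.04"))
```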

Next, make the chroot directory:

sudo mkdir -p /srv/chroot/

And bootstrap it using debootstrap.

sudo debootstrap --arch=amd64 focal /srv/chroot// http://archive.ubuntu.com/ubuntu/

This command will actually bootstrap from any Debian derivative, not just Ubuntu. You just need to change the mirror (http://archive.ubuntu.com/ubuntu/) and distribution version (focal). This specific command bootstraps Ubuntu 20.04 (focal).

That’s more or less it! You now have a working Ubuntu chroot. You can run schroot -c to enter the chroot. However, we still need to set some stuff up before it works properly.

Post Setup

After the chroot is set up, we need to do a few more things:

Fix network config (do only once)
Add Ubuntu sandbox users and groups on host system (so apt works) (do only once)
Enable repos
Give your user sudo

Fix the network config by commenting out networks in /etc/schroot/default/nssdatabases. This tells schroot not to copy the networks database from the host system; if you leave it enabled on Arch, it breaks the network.

Add the sandbox users and groups (run this on the Arch host, not inside the chroot):

sudo useradd -u 124 _apt
sudo useradd -u 939 geoclue
sudo useradd -u 694 man
sudo groupadd crontab
sudo groupadd messagebus

(For the astute, we’re specifying UIDs under 1000 so they don’t show up on your login screen. They’re random numbers.)

Next, actually enter your chroot as root with sudo schroot -c , and add your user to sudo with

usermod -a -G sudo

Next, exit and re-enter your chroot as your normal user (schroot -c , this time without sudo), and add the universe, multiverse and restricted repos (you don’t have to do this, but you probably want to in order to get most of Ubuntu’s packages):

sudo apt update
sudo apt install software-properties-common
sudo add-apt-repository universe
sudo add-apt-repository multiverse
sudo add-apt-repository restricted

You probably want to add focal-security as well, so the chroot receives security updates. Run this command:

echo "deb http://security.ubuntu.com/ubuntu focal-security main" >> /etc/apt/sources.list

as root (sudo -s).

That should be it! Now your Ubuntu chroot will basically work exactly like an actual Ubuntu installation, and you can install all the Ubuntu programs you need without ever leaving your distro of choice. Hope this guide helped!

Please Stop Hijacking Our Scroll Wheels

Vincent Wang — Sat, 18 Jul 2020 05:49:30 GMT

Web developers are no strangers to hijacking regular browser functions in order to do something they consider “cooler”. Overall, this is probably a plus; without overriding regular browser functions, modern website functionality we take for granted today (for example, Single Page Applications) wouldn’t be possible. However, this ability has also been turned towards more insidious applications — namely, hijacking the scroll wheel to do “cool” animations and other actions.

What am I talking about? Just take a look at Apple’s product pages. Try scrolling. The function of the scroll wheel bounces around all over the place. Is it playing an animation? Scrolling the actual page? No, wait — now it’s changing the background. Oh, wait, it’s scrolling again. The scroll wheel just won’t stay put. The end result? The page is extremely hard to navigate — look at how much you have to scroll to get to the bottom of the page:

Just look at the size of that scrollbar!

Those who have followed web design for a while might know that this problem is not exactly new — the problem has existed for years, with multiple other voices loudly complaining about the usability nightmare it poses.

But if web designers have known about the problem for years, then why does it still exist?

It mainly comes down to web designers over-designing in order to make a website look edgy and cool. The product page above is a classic example of this; in an effort to replicate the slick in-your-face editing shots of the Apple commercials, the product page takes the scroll wheel and throws its regular function out of the window, using it to play through a bunch of in-your-face animations and changes.

Apple, of course, is not the only offender in this design trend; countless websites use some form of scroll-jacking, ranging from simple full-page scrolling to using the scroll wheel to play through all sorts of fancy animations and effects. However, just because a lot of websites use scroll-jacking does not make it a good practice.

Seriously, don’t do it.

Need some reasons? Let’s discuss a few:

Intuitiveness

This is probably the most important issue in regard to scroll hijacking — and, to be honest, UX design in general.

When we use a function, we expect it to behave the way it normally does. In the case of the scroll wheel, that’s scrolling. If the page doesn’t scroll when the user scrolls, it creates a gap between what the user expects and what the page does; this leads to an unintuitive interface that feels cumbersome and confusing to navigate.

Different scroll hijacking methods violate this rule to different degrees; for example, most simple full-page scroll websites are relatively unobtrusive, simply making the site scroll screen-by-screen instead of the regular behavior. Therefore, if you really really want some full-page scrolling, it’s OK to implement, as long as it doesn’t disrupt the user experience too much. Which brings me to my second point…

Consistency

Some websites let you scroll naturally at first, but then decide halfway down the page that they want to take control of your scroll wheel.

Please don’t do this.

If you’re going to change the scroll wheel functionality, change it in a consistent manner — don’t have a section that scrolls naturally and then suddenly transitions into full-page scrolling, or vice versa. If you’re going to scroll-jack, do it all the way. A sudden change in the function of the scroll wheel is jarring and distracting.

Navigability

Is that a word?

Doesn’t matter. The point is that being able to navigate up and down a page is essential. Scroll hijacking interferes with this (although, again, some methods affect UX less than others). Full-page scrolling, for example, limits the rate at which you can scroll to a constant animation speed, making it frustrating to get to the bottom of the page. Scrolling tied to animation a la Apple, on the other hand, leads to unreasonably long scrollbars, forcing you into a 100-meter scroll race to reach the bottom. Either way, changing the scroll functionality inevitably causes some site navigation issues, ranging from mildly annoying to downright unusable.

Summary

TL;DR: The best option is to just leave the default scrolling in place. You probably don’t need scroll hijacking.

If you’re really, really sure that you do want to change the scroll functionality of your site, consider these few tips:

Be consistent about it. If you’re changing the scroll functionality, make sure it does one thing only; don’t transition between natural scrolling and full-page scrolling.
Keep your site navigation in mind. Try to control the size of your scrollbar, and if you’re using full-page scrolling, don’t set the animation speed to an unbearably slow pace.
Avoid tying scrolling to animation. Not only is this unintuitive, but it also doesn’t even look that good — for instance, on a desktop, some scroll wheels only scroll a chunk at a time, which means that the animation will play, pause, then play in fits and starts as the user scrolls down the page. A better solution may be to implement a sort of full-page scrolling for animations — instead of tying the entire animation to the scroll wheel (which leads to the animation starting and stopping), the animation simply plays when the user scrolls past a certain threshold, in order to transition between two screens. (Note: we’re talking here about animations tied to scroll positions a la Apple website; animating elements on scroll-in is fine.)

Thanks for reading!

Bibify: Building an open-source citation service

Vincent Wang — Tue, 19 May 2020 04:16:36 GMT

bibify

DISCLAIMER: This article is shameless self promo. The links for the frontend and backend source code are here: https://gitlab.com/bibify. If you want to read about how we did it, go on.

We’ve all had to do it: a teacher assigns a research project and says that it needs a full bibliography, formatted in MLA or APA or whatever. Now, most of us don’t bother to remember how to cite a website in MLA (because how often do you actually need to do that in life?), so we usually turn to a citation generator.

These things usually suck.

Most of these citation generators are slow and laden with ads. The ones that aren’t usually don’t work properly. We put up with most of them because the alternative would be to do it by hand, and doing it by hand is painful. So what if we built a free and open-source citation generator?

Step 1 — the frontend

Our citation generator needs a frontend. Easy job. Grab some React, your favorite components library, and slap them together. Bing bang bong, add some axios for making HTTP requests, a querystring library, and you’re done!

The source for the frontend is here: https://gitlab.com/bibify/bibify.

Step 2 — the backend

The backend side is a bit harder. In order to match current citation generators in features, our app needs to be able to:

Generate accurate citations for all common citation styles
Get book and website info to auto-cite (because entering data by hand is lame)

Let’s break down these problems.

Fetching Website Info

Fetching website info is pretty easy. Grab a metadata scraping library and point it at the website you want to get the info of. That’s about it!
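
For the curious, a bare-bones version of "point a metadata scraper at the page" looks something like this with Python's standard library. It prefers the Open Graph og:title tag, falls back to the page's title element, and also picks up og:site_name and a meta author tag; which tags to read, and in what order, is my assumption rather than what bibify's scraper actually does.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collects <meta> name/property tags and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            if key and "content" in attrs:
                self.meta.setdefault(key.lower(), attrs["content"])
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scrape_metadata(html: str) -> dict:
    """Return raw strings; turning them into CSL-JSON is a separate step."""
    parser = MetadataParser()
    parser.feed(html)
    m = parser.meta
    return {
        "title": m.get("og:title") or parser.title.strip(),
        "site_name": m.get("og:site_name"),
        "author": m.get("author"),
    }
```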

Fetching Book Info

Fetching accurate book information for free is a bit harder than fetching websites; the book database that most people use is locked behind a subscription fee. However, the Google Books API is free, and there’s no limit! (Although they do request that you stay under 10,000 requests per day as a courtesy limit.) All we need to do is grab a good wrapper library, give it a search query, and we’re good to go.
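
Here is a sketch of the two halves of that: building a query URL for the public Google Books volumes endpoint, and mapping one search result's volumeInfo into a CSL-JSON book item. The volumeInfo fields used (title, authors, publisher, publishedDate) are ones I know the API returns; the name-splitting rule is deliberately naive and mirrors the two-word heuristic mentioned in the Future section of this post. No network call is made here.

```python
from urllib.parse import urlencode

VOLUMES_URL = "https://www.googleapis.com/books/v1/volumes"


def search_url(query: str, max_results: int = 5) -> str:
    return f"{VOLUMES_URL}?{urlencode({'q': query, 'maxResults': max_results})}"


def split_name(name: str) -> dict:
    parts = name.split()
    if len(parts) == 2:
        return {"given": parts[0], "family": parts[1]}
    # Anything else is passed through verbatim as a CSL "literal" name.
    return {"literal": name}


def volume_to_csl(item: dict) -> dict:
    """Map one Google Books search result to a CSL-JSON 'book' item."""
    info = item.get("volumeInfo", {})
    csl = {
        "id": item.get("id", "unknown"),
        "type": "book",
        "title": info.get("title", ""),
        "author": [split_name(a) for a in info.get("authors", [])],
    }
    if "publisher" in info:
        csl["publisher"] = info["publisher"]
    year = info.get("publishedDate", "")[:4]
    if year.isdigit():
        csl["issued"] = {"date-parts": [[int(year)]]}
    return csl
```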

Generating accurate citations

Generating accurate citations for every common citation style is difficult, especially because doing it yourself would mean reading through every citation style guide’s rules on how to cite every media type. Luckily, we don’t have to! The CSL (Citation Style Language) project already contains 9000+ citation styles that we can use. Combine this with the citeproc-js processor, which takes these citation styles and spits out a citation, and we’re in business!

Except it’s not that simple.

Being able to access 9000 styles through citeproc-js is great, but getting it to work is a bit of a slog. In order to use the citeproc-js engine, you need to write your own sys object which provides the functions retrieveItem() and retrieveLocale(); while it isn’t really that hard to write these functions yourself, it is still a good amount of boring boilerplate. So, to solve this, we s̶t̶e̶a̶l̶ borrow this nice wrapper (as well as this helper script). Now, instead of writing our own sys object, we can just let sys = citeprocnode.simpleSys(). Much easier, isn’t it?

Once we have our sys object, it’s a simple matter of giving it an item and calling makeBibliography:

Just load in the CSL style file, the locale, and the bibliography item (formatted in CSL-JSON), and it spits out a bibliography!

Side Note: CSL-JSON

In order to generate a bibliography, you need to load your info into citeproc-js in CSL-JSON format. It varies based on type, but it more or less looks something like this:

{
  'id': 'random-id-aihwgew',
  'type': 'book', // any CSL item type, e.g. 'book', 'webpage', 'article-journal'
  'title': 'A Book about Something',
  'publisher': 'Random Publisher Inc.',
  // other type-specific fields, e.g. 'issued': { 'date-parts': [[2020]] }
  'author': [ // a list of CSL name objects
    { 'family': 'Last', 'given': 'First' }
  ]
}

The full docs for CSL-JSON are here.

Pitfalls

CSL has two categories of styles: independent and dependent. Basically, the dependent styles are different names for an independent style; for example, Harvard Educational Review links to APA. Unfortunately, that’s the only information available in the dependent style:

An example of a dependent CSL style. Note that only the parent style and some metadata are available.

This means that trying to load this style directly into citeproc-js won’t work, because citeproc-js is expecting a full independent CSL style:

An example of an independent style. Note how there’s a lot more information on how to actually create the citation.

So, we need to grab the linked independent style from the dependent style and load that instead:

Here we use xpath to grab the path of the linked independent CSL style and load that instead.
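
The same lookup in Python, as a sketch using xml.etree.ElementTree instead of a full xpath library: find the link element with rel="independent-parent" inside info, in the CSL namespace (http://purl.org/net/xbiblio/csl), and turn its URL into a local file name. The file-name mapping assumes the parent styles sit in one directory named after the last URL segment, as in the official styles repository.

```python
import xml.etree.ElementTree as ET
from typing import Optional

CSL_NS = {"csl": "http://purl.org/net/xbiblio/csl"}


def independent_parent(style_xml: str) -> Optional[str]:
    """Return the parent style URL of a dependent CSL style, or None."""
    root = ET.fromstring(style_xml)
    link = root.find("csl:info/csl:link[@rel='independent-parent']", CSL_NS)
    return link.get("href") if link is not None else None


def resolve_style_file(style_xml: str) -> Optional[str]:
    """Map e.g. http://www.zotero.org/styles/apa to 'apa.csl'."""
    href = independent_parent(style_xml)
    if href is None:
        return None  # already an independent style; load it directly
    return href.rstrip("/").rsplit("/", 1)[-1] + ".csl"
```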

Now we’re in business!

That’s the main CSL issue we need to get out of the way. Here are some other minor inconveniences:

This is more of a side note and less of a pitfall, but it turns out that the “Harvard” style that most popular citation services provide doesn’t really exist in CSL. There’s a (deprecated) harvard1.csl reference Harvard style, which links to the Harvard Cite Them Right style. However, most universities actually have their own Harvard variant.
MLA 7 and 8 are both shortened to “MLA”, so when you display the short titles side by side (as bibify does), both of them show up as “MLA”. This is easily fixed by editing the short title (the title-short element) in each CSL file (modern-language-association.csl and modern-language-association-7th-edition.csl).

The Future

While bibify currently has feature parity with other popular citation generators, there are still some things in the works:

Bibify currently scrapes websites for metadata in real time. While this approach generally works, it also means that slow websites will take longer to cite, and websites that are down won’t be able to be cited at all. To solve this, we’re planning on maintaining a cache of cited websites, as well as integrating with The Wayback Machine. (NOTE: As of update 2020.05.18, Bibify now caches website fetch results with superagent-cache. While this speeds things up, websites that are down still can’t be cited.)
Autocitation only currently works with books and websites. We can improve on this by adding autocitation for other media types.
Like most citation generators, Bibify struggles to handle autociting author names with more than two words (e.g. “Bartolome de las Casas”). Currently, Bibify simply treats author names with more than two words as a literal, meaning that the name is put in as is; this is generally not compliant with most citation styles.
The frontend UI can definitely be improved.

Come and contribute!

This open-source citation generator project lives at https://gitlab.com/bibify. Come on over and contribute! File an issue or two, maybe fix some bugs or add some new features. New additions are always welcome.

Creating a D-Bus Service with dbus-python and Polkit Authentication

Vincent Wang — Sat, 09 Nov 2019 10:20:24 GMT

D-Bus is the standard for inter-process communication for Linux desktop applications. Both Qt and GLib have high-level abstractions for D-Bus communication, and many of the desktop services we rely on export D-Bus protocols. However, D-Bus has its shortcomings — namely a lack of documentation. Let’s explore how to write our own D-Bus Service in Python and connect it to Freedesktop.org’s PolicyKit API to provide user authentication.

What is D-Bus?

D-Bus is a standard IPC/RPC protocol introduced by Freedesktop.org as a way of unifying the messy landscape of inter-process communication on Linux desktops under one standard. In other words, it’s a way for programs to communicate with each other.

How is D-Bus organized?

D-Bus is organized into objects. These objects can be published on one of two “buses”: the system bus, a single bus shared by the whole system, or the session bus, of which each user login session has its own. Objects play a double role as both an RPC object (you can call methods on the object) and as a publish/subscribe interface (you can subscribe to signals on the object). Each object defines interfaces, which describe and organize what each object can do.

Objects published on a bus are identified by a well-known bus name (usually written in reverse-DNS format, e.g. “org.freedesktop.NetworkManager”) and an object path (e.g. “/org/freedesktop/NetworkManager”). If you’re confused about the difference between an object path and a bus name, this probably explains it better than I can. In this article, we’ll explore how to export our own object onto a bus under an object path of our choosing.
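
Both identifiers follow fairly strict syntax rules in the D-Bus specification, and a malformed one is an easy way to get a confusing error at startup. Below is a sketch of checks for well-known bus names and object paths, written from my reading of the spec's naming rules; it is for illustration only, and dbus-python performs its own validation.

```python
import re

# Bus name elements: [A-Za-z0-9_-], not starting with a digit.
_NAME_ELEMENT = re.compile(r"[A-Za-z_-][A-Za-z0-9_-]*")
# Object path elements: [A-Za-z0-9_], at least one character.
_PATH_ELEMENT = re.compile(r"[A-Za-z0-9_]+")


def is_valid_well_known_name(name: str) -> bool:
    """Check a well-known bus name such as com.example.HelloWorld."""
    if not 0 < len(name) <= 255 or name.startswith(":"):
        return False  # empty, too long, or a unique name like :1.42
    parts = name.split(".")
    return len(parts) >= 2 and all(_NAME_ELEMENT.fullmatch(p) for p in parts)


def is_valid_object_path(path: str) -> bool:
    """Check an object path such as /org/freedesktop/NetworkManager."""
    if path == "/":
        return True
    if not path.startswith("/") or path.endswith("/"):
        return False
    return all(_PATH_ELEMENT.fullmatch(p) for p in path[1:].split("/"))
```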

Why use D-Bus?

While D-Bus has its shortcomings, it still remains the standard for Linux desktops today. All of Freedesktop.org’s APIs (that is to say, maybe half of the average Linux desktop in general) are published through D-Bus. D-Bus is well suited to the needs of an IPC system for the Linux desktop; it can handle publish/subscribe interfaces as well as methods, allowing for robust architectures.

Okay, can we build the thing?

Yes.

We’ll be using dbus-python to build our service. This means it’s primarily targeted towards GLib/GTK apps. If you’re looking to integrate your Qt app with D-Bus, there’s great first-party docs on that.

Step 1 is to install the dbus-python bindings. Most package managers should have a package for it; it might already be installed by something else. If your package manager’s dbus-python version is out of date, you can always pip install it.

While you’re at it, install D-Feet for easy GUI D-Bus debugging.

Oh, and you’ll also want to grab PyGObject so the service has a mainloop, otherwise it won’t listen continuously.

Now that we have the dbus-python bindings, let’s start by creating a simple service:

service.py

import dbus
import dbus.service
import dbus.mainloop.glib

from gi.repository import GLib

class HelloWorld(dbus.service.Object):
    def __init__(self, conn=None, object_path=None, bus_name=None):
        dbus.service.Object.__init__(self, conn, object_path, bus_name)

if __name__ == "__main__":
    dbus.mainloop.glib.DBusGMainLoop(set_as_default=True)
    bus = dbus.SessionBus()
    name = dbus.service.BusName("com.example.HelloWorld", bus)
    helloworld = HelloWorld(bus, "/HelloWorld")
    mainloop = GLib.MainLoop()
    mainloop.run()

Let’s break it down line by line:

1–5: imports

7: We declare a subclass of dbus.service.Object. This will define the object we export onto the bus.

8: We define an __init__ method. Its signature doesn’t have to look like this as long as the superclass constructor on line 9 gets the proper arguments, but I think this is probably the cleanest way.

9: We call the superclass constructor, passing in self, conn (the bus connection), object_path (the object path we want to use, as a str), and bus_name (the bus name we want to export under).

11: It’s an if __name__ == "__main__" block. Python devs should probably know what this is.

12: We make GLib’s main loop the default for dbus-python. This is what lets the service receive and dispatch incoming requests once the loop is running.

13: We get a connection to the session bus. (If you want the system bus instead, just replace SessionBus with SystemBus; the code stays the same, although the system bus needs some extra configuration, covered below.)

14: We create a BusName, which claims the name “com.example.HelloWorld” on the session bus.

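15: We create the actual service object, passing in the bus connection and the object path (“/HelloWorld”). The bus name doesn’t need to be passed here, since the BusName from line 14 has already claimed it for this connection.

16–17: We create a GLib main loop and run it. This keeps the process alive and handling requests until it’s stopped.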

That is basically all you need to create a D-Bus service. Open up D-Feet and you should see something like this:

As you can see, our object shows up under /HelloWorld in com.example.HelloWorld, but it doesn’t have any content in it (besides the default Introspectable interface). Let’s fix that by adding a method to our class:

@dbus.service.method(dbus_interface="com.example.HelloWorldInterface", in_signature="s", out_signature="s", sender_keyword="sender", connection_keyword="conn")
def SayHello(self, name, sender=None, conn=None):
    return "Hello " + name

It’s technically only 3 lines, but there’s a lot to break down here. Let’s go line by line again:

1: the @dbus.service.method decorator tells D-Bus that this is a callable method that can be called on the object. This decorator takes several named arguments:

dbus_interface is the interface name we publish the method under. For methods, interfaces are merely a way of grouping functionality.
in_signature is a string of type codes representing the datatypes of the parameters passed in; for a comprehensive guide, see the official D-Bus docs. Each piece of data is passed in as an argument to the method; in this example, we take in 1 string argument (s), name.
out_signature is the same as in_signature, but it describes the datatype of the return value instead of the parameters.
sender_keyword and connection_keyword are optional. sender_keyword="sender" means the unique bus name of the caller (something like ":1.42") is passed in the sender argument, and connection_keyword="conn" means the connection the call arrived on is passed in conn.
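Since a signature is just a run of complete types, one per argument, a small splitter shows how many arguments a given signature implies. This is a simplified sketch (it doesn't enforce every rule, such as dict entries needing a basic-type key), and split_signature is a hypothetical name.

```python
# Basic type codes from the D-Bus specification ("h" is a Unix file descriptor)
BASIC_TYPES = set("ybnqiuxtdsogh")


def _end_of_type(sig: str, i: int) -> int:
    """Return the index just past the complete type starting at sig[i]."""
    if i >= len(sig):
        raise ValueError("signature ends in the middle of a type")
    code = sig[i]
    if code in BASIC_TYPES or code == "v":  # basic type or variant
        return i + 1
    if code == "a":  # array: "a" followed by exactly one complete element type
        return _end_of_type(sig, i + 1)
    if code in "({":  # struct "(...)" or dict entry "{kv}"
        close = ")" if code == "(" else "}"
        j = i + 1
        while j < len(sig) and sig[j] != close:
            j = _end_of_type(sig, j)
        if j >= len(sig):
            raise ValueError(f"unclosed {code!r} at index {i}")
        return j + 1
    raise ValueError(f"unknown type code {code!r} at index {i}")


def split_signature(sig: str) -> list:
    """Split a signature into complete types; each one is a single argument."""
    types, i = [], 0
    while i < len(sig):
        end = _end_of_type(sig, i)
        types.append(sig[i:end])
        i = end
    return types
```

For SayHello, split_signature("s") returns ['s'], i.e. one string argument. Polkit's CheckAuthorization method, which comes up later, takes "(sa{sv})sa{ss}us", and split_signature returns ['(sa{sv})', 's', 'a{ss}', 'u', 's'] for it: five complete types, so five arguments.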

2: Here we actually declare the function. Note that the types have to match the in_signature declared in the decorator. (P.S. In most D-Bus services, methods are in PascalCase; that’s just how it is, don’t ask me why.)

3: We return a result. Note that the type has to match the out_signature declared in the decorator.

Let’s see the result:

The SayHello method shows up under com.example.HelloWorldInterface as expected and takes the input and output expected. Great!
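To call the method from code instead of D-Feet, a client only needs the bus name, the object path, and the interface name. This is a minimal sketch using dbus-python’s proxy API; say_hello is a hypothetical helper, and it assumes service.py is already running on the session bus.

```python
def say_hello(name: str) -> str:
    """Call SayHello on the example service and return the greeting."""
    import dbus  # dbus-python; imported lazily so this module loads without it

    bus = dbus.SessionBus()
    # Bus name and object path exported by service.py
    proxy = bus.get_object("com.example.HelloWorld", "/HelloWorld")
    # Select the interface the method was published under
    hello = dbus.Interface(proxy, dbus_interface="com.example.HelloWorldInterface")
    # The reply is a dbus.String; convert it to a plain str
    return str(hello.SayHello(name))
```

With service.py running, say_hello("world") should return "Hello world".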

Now, let’s spice things up a bit.

Polkit Authentication

For the uninitiated, PolicyKit (Polkit for short) is the authorization system used by most of the Linux desktop today. Opened your software center and got prompted for a password? That’s polkit. Tried running GParted and you need root privileges? Polkit. Polkit automatically manages showing those nice authorization popups, so it’s less work for the dev and more ease of use for the user.

How do I use Polkit?

This is where the docs start to fall short. There’s not a lot of actual documentation on integrating your own app with Polkit; you could hunt through the docs all day to find out how it works, but TL;DR: it’s a D-Bus API (and a C library, but of course we want to do Python, so D-Bus it is).

First, you need to define a polkit action. Write this file to /usr/share/polkit-1/actions:

com.example.HelloWorld.policy

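<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE policyconfig PUBLIC
 "-//freedesktop//DTD PolicyKit Policy Configuration 1.0//EN"
 "http://www.freedesktop.org/standards/PolicyKit/1.0/policyconfig.dtd">
<policyconfig>
  <vendor>Example</vendor>
  <vendor_url>https://example.com/example</vendor_url>

  <action id="com.example.HelloWorld.auth">
    <description>Authorization</description>
    <message>Authentication is needed to perform this action.</message>
    <defaults>
      <allow_any>auth_admin</allow_any>
      <allow_inactive>auth_admin</allow_inactive>
      <allow_active>auth_admin</allow_active>
    </defaults>
  </action>
</policyconfig>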

This XML configuration basically defines a polkit privilege called com.example.HelloWorld.auth which requires the user to enter admin credentials to gain authorization.
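Before copying the file into place, a quick parse can catch typos in the action ID or the defaults. This is a minimal sketch using Python’s standard xml.etree module; read_policy_defaults is a hypothetical helper that only looks at action ids and their defaults. Once the file is installed, running pkaction --verbose --action-id com.example.HelloWorld.auth should also list the action if polkit has picked it up.

```python
import xml.etree.ElementTree as ET


def read_policy_defaults(xml_text: str) -> dict:
    """Map each action id in a polkit .policy file to its defaults."""
    root = ET.fromstring(xml_text)
    actions = {}
    for action in root.iter("action"):
        defaults = action.find("defaults")
        # e.g. {"allow_any": "auth_admin", "allow_inactive": ..., "allow_active": ...}
        actions[action.get("id")] = (
            {child.tag: (child.text or "").strip() for child in defaults}
            if defaults is not None
            else {}
        )
    return actions
```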

Of course, you also need to write the code to check privileges. To save time, here’s a convenience function for checking polkit privileges:

Pastebin

Vim screenshot because I couldn’t get it to format

Make sure to add self.dbus_info = None and self.polkit = None to the __init__ function.

Here’s a general summary:

1–8: Get a reference to the message bus’s own D-Bus object and use it to look up the Unix process ID of the process that called the method (we’ll need this for polkit).

10–15: Get a reference to the Polkit object.

17–32: Use the Polkit object to check authorization. If it times out, try again. The polkit CheckAuthorization method takes these arguments:

Subject: the kind of subject ("unix-process" here) plus a dictionary of details (here, the pid as a UInt32 and a start-time of 0)
Privilege: a string naming the polkit action we defined earlier (com.example.HelloWorld.auth)
Details: details describing the action; for example, you can set a message for the authorization dialog that overrides the default one (here we leave it empty)
CheckAuthorizationFlags: 0 = no attempt to authenticate the user, or 1 = show them an authentication dialog if needed
CancellationID: just leave this one empty
timeout: not a polkit argument but a dbus-python keyword argument giving the number of seconds to wait for a reply; make it generous, since the call doesn’t return until the user has dealt with the dialog

CheckAuthorization returns a struct of the form (bool is_authorized, bool is_challenge, dict details). is_authorized is likely the one you want to look at; it tells you whether the caller is authorized. We return False if is_authorized (is_auth in the code) is false, and True otherwise; if you prefer, you can raise an exception instead of returning False.
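Since the function above is only available as a screenshot, here is a hedged sketch of a similar check, written as a standalone function rather than a method, so it doesn’t cache the bus and polkit objects the way self.dbus_info and self.polkit do. The D-Bus names and CheckAuthorization arguments follow polkit’s Authority interface; check_polkit_privilege, parse_authorization_result and AuthorizationResult are hypothetical names, and retrying on org.freedesktop.DBus.Error.ServiceUnknown (polkitd having exited) is an assumption about what “times out” means here.

```python
from typing import NamedTuple


class AuthorizationResult(NamedTuple):
    """The (bba{ss}) struct returned by CheckAuthorization."""
    is_authorized: bool
    is_challenge: bool
    details: dict


def parse_authorization_result(reply) -> AuthorizationResult:
    """Convert the raw D-Bus reply into plain Python values."""
    is_authorized, is_challenge, details = reply
    return AuthorizationResult(bool(is_authorized), bool(is_challenge), dict(details))


def check_polkit_privilege(conn, sender, action_id, interactive=True, retries=1):
    """Return True if the D-Bus caller `sender` is authorized for `action_id`."""
    import dbus  # dbus-python; imported here so the helpers above work without it

    # Ask the bus daemon which process is behind the caller's connection.
    bus_daemon = dbus.Interface(
        conn.get_object("org.freedesktop.DBus", "/org/freedesktop/DBus"),
        "org.freedesktop.DBus",
    )
    pid = bus_daemon.GetConnectionUnixProcessID(sender)

    # "unix-process" subject: pid (uint32) and start-time 0, as described above.
    subject = ("unix-process", dbus.Dictionary({
        "pid": dbus.UInt32(pid, variant_level=1),
        "start-time": dbus.UInt64(0, variant_level=1),
    }, signature="sv"))
    flags = dbus.UInt32(1 if interactive else 0)  # 1 = AllowUserInteraction

    for attempt in range(retries + 1):
        # Fresh proxy on each attempt, in case polkitd exited and must be restarted.
        authority = dbus.Interface(
            conn.get_object("org.freedesktop.PolicyKit1",
                            "/org/freedesktop/PolicyKit1/Authority"),
            "org.freedesktop.PolicyKit1.Authority",
        )
        try:
            reply = authority.CheckAuthorization(
                subject,
                action_id,
                dbus.Dictionary({}, signature="ss"),  # no extra details
                flags,
                "",           # no cancellation ID
                timeout=600,  # seconds; the user may take a while to answer the dialog
            )
        except dbus.exceptions.DBusException as err:
            # Assumption: ServiceUnknown means polkitd went away, so try again.
            if (err.get_dbus_name() == "org.freedesktop.DBus.Error.ServiceUnknown"
                    and attempt < retries):
                continue
            raise
        return parse_authorization_result(reply).is_authorized
    return False  # not reached: the last attempt either returns or raises
```

Inside SayHello, this might be called as check_polkit_privilege(conn, sender, "com.example.HelloWorld.auth"), using the sender and conn keyword arguments set up in the decorator.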

Let’s try it out:

Whoops — it gives us an error. Only services running as root or running as an action owner (that we have to specify in the config file) can actually check for authorization, which makes sense given that non-privileged services would have little use for authorization. We could just run the service as root, but because it’s on a SessionBus, the service won’t be accessible to regular users, which ruins the point of authentication. We’ll need to convert this service into a SystemBus service.

Converting to SystemBus

SystemBus services are a little more involved than SessionBus services. By default, the system bus doesn’t let services claim a bus name (or receive method calls from other users), so a SystemBus service needs a config file that allows it; without one, you’ll get an access-denied error when claiming the name. Write this file to /etc/dbus-1/system.d/:

com.example.HelloWorld.conf

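<!DOCTYPE busconfig PUBLIC
 "-//freedesktop//DTD D-BUS Bus Configuration 1.0//EN"
 "http://www.freedesktop.org/standards/dbus/1.0/busconfig.dtd">
<busconfig>
  <type>system</type>
  <policy user="root">
    <allow own="com.example.HelloWorld"/>
  </policy>
  <policy context="default">
    <allow send_destination="com.example.HelloWorld"/>
  </policy>
</busconfig>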

Once the config file is in place (and SessionBus has been swapped for SystemBus in service.py), D-Bus should allow us to run our service as root: sudo python3 service.py

Now does it work?

Yes it does.

Congratulations — you’ve just built a D-Bus service with polkit authentication! If you want further D-Bus guidance, check out the official dbus-python docs.

Gotchas — debugging

Polkit can be finicky sometimes; I lost several hours debugging why I wasn’t getting an authorization dialog before I realized I needed to have my authentication agent running first (to be fair, that particular case was more my own fault than polkit’s, but the point stands). Some useful tips for debugging a polkit service:

Print out is_challenge. If is_challenge is true, you probably don’t have your authentication agent running; while this shouldn’t be a problem on most major DEs, you may have issues if you configured your own WM setup.
Print out details; details can show you things you might have missed, such as an incorrect argument placement.
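Both tips can be folded into a tiny helper that turns the CheckAuthorization reply into readable lines for logging. This is a minimal sketch; explain_authorization is a hypothetical name, and the hint text reflects the tips above rather than anything polkit itself reports.

```python
def explain_authorization(is_authorized, is_challenge, details):
    """Turn a CheckAuthorization reply into human-readable debugging lines."""
    lines = [f"is_authorized={bool(is_authorized)} is_challenge={bool(is_challenge)}"]
    if is_challenge:
        # Often means no authentication agent was available to show a dialog
        lines.append("hint: is_challenge is true; is a polkit authentication agent running?")
    # details can reveal problems such as arguments passed in the wrong order
    for key, value in sorted(dict(details).items()):
        lines.append(f"detail: {key}={value}")
    return lines


if __name__ == "__main__":
    for line in explain_authorization(False, True, {}):
        print(line)
```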

Credits

A huge thanks to this guy on Ubuntu Forums from 2009 for providing example code; it really helped me figure out how polkit and D-Bus work (the _check_polkit_privilege() function is mostly copied from his code, with some modifications).

Also thanks to the D-Bus and dbus-python official documentation.