Your questions, Part I | BSD Now 210
Posted on: September 7, 2017

We take a look at the reimplementation of NetBSD using a microkernel, check out what makes DHCP faster, see what high-process-count support in DragonFly BSD has to offer, and answer the questions you’ve always wanted to ask us.
RSS Feeds:
MP3 Feed | iTunes Feed | HD Vid Feed | HD Torrent Feed
Become a supporter on Patreon:
– Show Notes: –
Headlines
A Reimplementation of NetBSD Using a Microkernel
- MINIX author Andy Tanenbaum writes in Part 1 of “A Reimplementation of NetBSD Using a Microkernel”:
Based on the MINIX 3 microkernel, we have constructed a system that to the user looks a great deal like NetBSD. It uses pkgsrc, NetBSD headers and libraries, and passes over 80% of the KYUA tests. However, inside, the system is completely different. At the bottom is a small (about 13,000 lines of code) microkernel that handles interrupts, message passing, low-level scheduling, and hardware-related details. Nearly all of the actual operating system, including memory management, the file system(s), paging, and all the device drivers, run as user-mode processes protected by the MMU. As a consequence, failures or security issues in one component cannot spread to other ones. In some cases a failed component can be replaced automatically and on the fly, while the system is running, and without user processes noticing it.
Research at the Vrije Universiteit has resulted in a reimplementation of NetBSD using a microkernel instead of the traditional monolithic kernel.
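As a purely conceptual illustration of the design sketched in the abstract above (this is Python, not MINIX code, and it does not use MINIX’s real IPC primitives), a driver can be thought of as an ordinary unprivileged process sitting in a receive/reply message loop; if it crashes, only that process dies, and a supervisor such as MINIX’s reincarnation server can restart it:

from multiprocessing import Process, Queue

def disk_driver(requests: Queue, replies: Queue) -> None:
    # The "driver" is just a user-mode process in a receive/reply loop.
    # A crash here kills only this process, not the rest of the system.
    while True:
        op, block = requests.get()
        if op == "read":
            replies.put(("ok", b"\x00" * 512))  # pretend to read a 512-byte block
        else:
            replies.put(("error", "unsupported operation"))

if __name__ == "__main__":
    requests, replies = Queue(), Queue()
    Process(target=disk_driver, args=(requests, replies), daemon=True).start()
    requests.put(("read", 0))   # "send" a message to the driver...
    print(replies.get())        # ...and block until the reply arrives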
The latest work has been adding live update, making it possible to upgrade to a new version of the operating system WITHOUT a reboot and without running processes even noticing. No other operating system can do this.
The system is built on MINIX 3, a derivative of the original MINIX system, which was intended for education. However, after the original author, Andrew Tanenbaum, received a 2 million euro grant from the Royal Netherlands Academy of Arts and Sciences and a 2.5 million euro grant from the European Research Council, the focus changed to building a highly reliable, secure, fault-tolerant operating system, with an emphasis on embedded systems. The code is open source and can be downloaded from www.minix3.org. It runs on x86 and ARM Cortex-A8 (e.g., BeagleBones). Since 2007, the website has been visited over 3 million times and the bootable image file has been downloaded over 600,000 times. The talk will discuss the history, goals, technology, and status of the project.
- Part 2 is also available.
Rapid DHCP: Or, how do Macs get on the network so fast?
- One of life’s minor annoyances is having to wait on my devices to connect to the network after I wake them from sleep. All too often, I’ll open the lid on my EeePC netbook, enter a web address, and get the dreaded “This webpage is not available” message because the machine is still working on connecting to my Wi-Fi network. On some occasions, I have to twiddle my thumbs for as long as 10-15 seconds before the network is ready to be used. The frustrating thing is that I know it doesn’t have to be this way. I know this because I have a Mac. When I open the lid of my MacBook Pro, it connects to the network nearly instantaneously. In fact, no matter how fast I am, the network comes up before I can even try to load a web page. My curiosity got the better of me, and I set out to investigate how Macs are able to connect to the network so quickly, and how the network connect time in other operating systems could be improved.
I figure there are three main categories of time-consuming activities that occur during network initialization:
Link establishment. This is the activity of establishing communication with the network’s link layer. In the case of Wi-Fi, the radio must be powered on, the access point detected, and the optional encryption layer (e.g. WPA) established. After link establishment, the device is able to send and receive Ethernet frames on the network.
Dynamic Host Configuration Protocol (DHCP). Through DHCP handshaking, the device negotiates an IP address for its use on the local IP network. A DHCP server is responsible for managing the IP addresses available for use on the network.
Miscellaneous overhead. The operating system may perform any number of mundane tasks during the process of network initialization, including running scripts, looking up preconfigured network settings in a local database, launching programs, etc.
My investigation thus far is primarily concerned with the DHCP phase, although the other two categories would be interesting to study in the future. I set up a packet capture environment with a spare wireless access point, and observed the network activity of a number of devices as they initialized their network connection. For a worst-case scenario, let’s look at the network activity captured while an Android tablet is connecting:
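(The capture output referred to above is not reproduced in these notes. For readers who want to reproduce this kind of capture themselves, here is a minimal sketch using Python and scapy, assuming scapy is installed and the wireless interface is called wlan0; it records just the ARP and DHCP traffic seen during network initialization.)

from scapy.all import sniff

# ARP plus DHCP (UDP ports 67/68) is all we need to study network initialization.
packets = sniff(iface="wlan0",
                filter="arp or (udp and (port 67 or port 68))",
                timeout=60)
for pkt in packets:
    print(round(pkt.time, 3), pkt.summary())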
This tablet, presumably in the interest of “optimization”, is initially skipping the DHCP discovery phase and immediately requesting its previous IP address. The only problem is that this is a different network, so the DHCP server ignores these requests. After about 4.5 seconds, the tablet stubbornly tries again to request its old IP address. After another 4.5 seconds, it resigns itself to starting from scratch, and performs the DHCP discovery needed to obtain an IP address on the new network.
In all fairness, this delay wouldn’t be so bad if the device was connecting to the same network as it was previously using. However, notice that the tablet waits a full 1.13 seconds after link establishment to even think about starting the DHCP process. Engineering snappiness usually means finding lots of small opportunities to save a few milliseconds here and there, and someone definitely dropped the ball here.
In contrast, let’s look at the packet dump from the machine with the lightning-fast network initialization, and see if we can uncover the magic that is happening under the hood:
The key to understanding the magic is the first three unicast ARP requests. It looks like Mac OS remembers certain information about not only the last connected network, but the last several networks. In particular, it must at least persist the following tuple for each of these networks:
1. The Ethernet address of the DHCP server
2. The IP address of the DHCP server
3. Its own IP address, as assigned by the DHCP server
During network initialization, the Mac transmits carefully crafted unicast ARP requests with this stored information. For each network in its memory, it attempts to send a request to the specific Ethernet address of the DHCP server for that network, in which it asks about the server’s IP address, and requests that the server reply to the IP address which the Mac was formerly using on that network. Unless network hosts have been radically shuffled around, at most only one of these ARP requests will result in a response—the request corresponding to the current network, if the current network happens to be one of the remembered networks.
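A rough sketch of what those probes could look like, written with Python and scapy rather than whatever Apple actually uses (the MAC/IP tuples below are invented for illustration):

from scapy.all import Ether, ARP, srp

# Hypothetical remembered networks: (DHCP server MAC, DHCP server IP, our old IP)
remembered = [
    ("00:11:22:33:44:55", "192.168.1.1", "192.168.1.50"),
    ("66:77:88:99:aa:bb", "10.0.0.1", "10.0.0.23"),
]

probes = [
    Ether(dst=srv_mac) /          # unicast, sent straight to the remembered server
    ARP(op="who-has",             # an ordinary ARP request...
        pdst=srv_ip,              # ...asking about the server's IP
        psrc=old_ip,              # ...with our former address as the sender IP
        hwdst=srv_mac)
    for srv_mac, srv_ip, old_ip in remembered
]

# At most one probe should be answered: the one for the network we are actually on.
answered, _ = srp(probes, iface="wlan0", timeout=0.1, verbose=False)
for sent, reply in answered:
    print("recognized network: DHCP server is", reply.psrc,
          "and our old address", sent[ARP].psrc, "is presumably still usable")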
This network recognition technique allows the Mac to very rapidly discover if it is connected to a known network. If the network is recognized (and presumably if the Mac knows that the DHCP lease is still active), it immediately and presumptuously configures its IP interface with the address it knows is good for this network. (Well, it does perform a self-ARP for good measure, but doesn’t seem to wait more than 13ms for a response.) The DHCP handshaking process begins in the background by sending a DHCP request for its assumed IP address, but the network interface is available for use during the handshaking process. If the network was not recognized, I assume the Mac would know to begin the DHCP discovery phase, instead of sending blind requests for a former IP address as the Galaxy Tab does.
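The background handshake described above amounts to an INIT-REBOOT style DHCPREQUEST for the remembered address, with no DISCOVER at all. A hedged sketch of that single packet, again using scapy (the interface name and address are placeholders, not values from the article):

import random
from scapy.all import Ether, IP, UDP, BOOTP, DHCP, sendp, get_if_hwaddr

iface = "wlan0"                      # placeholder interface name
assumed_ip = "192.168.1.50"          # the address we believe is still ours
mac = get_if_hwaddr(iface)

# INIT-REBOOT: no DISCOVER, just "may I keep this address?" while the
# interface is already configured and usable.
request = (Ether(dst="ff:ff:ff:ff:ff:ff") /
           IP(src="0.0.0.0", dst="255.255.255.255") /
           UDP(sport=68, dport=67) /
           BOOTP(chaddr=bytes.fromhex(mac.replace(":", "")),
                 xid=random.randint(0, 0xFFFFFFFF)) /
           DHCP(options=[("message-type", "request"),
                         ("requested_addr", assumed_ip),
                         "end"]))
sendp(request, iface=iface, verbose=False)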
The Mac’s rapid network initialization can be credited to more than just the network recognition scheme. Judging by the use of ARP (which can be problematic to deal with in user-space) and the unusually regular transmission intervals (a reliable 1.0ms delay between each packet sent), I’m guessing that the Mac’s DHCP client system is entirely implemented as tight kernel-mode code. The Mac began the IP interface initialization process a mere 10ms after link establishment, which is far faster than any other device I tested. Android devices such as the Galaxy Tab rely on the user-mode dhcpcd program, which no doubt brings a lot of additional overhead such as loading the program, context switching, and perhaps even running scripts.
The next step for some daring kernel hacker is to implement a similarly aggressive DHCP client system in the Linux kernel, so that I can enjoy fast sign-on speeds on my Android tablet, Android phone, and Ubuntu netbook. There already exists a minimal DHCP client implementation in the Linux kernel, but it lacks certain features such as configuring the DNS nameservers. Perhaps it wouldn’t be too much work to extend this code to support network recognition and interface with a user-mode daemon to handle such auxiliary configuration information received via DHCP. If I ever get a few spare cycles, maybe I’ll even take a stab at it.
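For reference, that minimal in-kernel client is the IP autoconfiguration code normally used for NFS root. As far as I know it is enabled with the following kernel options and requested at boot time with the ip= parameter (shown as a sketch; details vary by kernel version):

CONFIG_IP_PNP=y         # kernel-level IP autoconfiguration
CONFIG_IP_PNP_DHCP=y    # ...using DHCP

# and on the kernel command line:
ip=dhcp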
- You can also find other ways of optimizing the dhclient program and how it works in the dhclient tutorial on Calomel.org.
BSDCam Trip Report
Over the decades, FreeBSD development and coordination has shifted from being purely on-line to involving more and more in-person coordination and cooperation. The FreeBSD Foundation sponsors a devsummit right before BSDCan, EuroBSDCon, and AsiaBSDCon, so that developers traveling to the con can leverage their airfare and hammer out some problems. Yes, the Internet is great for coordination, but nothing beats a group of developers spending ten minutes together to sketch on a whiteboard and figuring out exactly how to make something bulletproof.
In addition to the coordination efforts, though, conference devsummits are hierarchical. There’s a rigid schedule, with topics decided in advance. Someone leads the session. Sessions can be highly informative, passionate arguments, or anything in between.
BSDCam is… a little different. It’s an invaluable part of the FreeBSD ecosystem. However, it’s something that I wouldn’t normally attend.
But right now is not normal. I’m writing a new edition of Absolute FreeBSD. To my astonishment, people have come to rely on this book when planning their deployments and operations. While I find this satisfying, it also increases the pressure on me to get things correct. When I wrote my first FreeBSD book back in 2000, a dozen mailing lists provided authoritative information on FreeBSD development. One person could read every one of those lists. Today, that’s not possible—and the mailing lists are only one narrow aspect of the FreeBSD social system.
Don’t get me wrong—it’s pretty easy to find out what people are doing and how the system works. But it’s not that easy to find out what people will be doing and how the system will work. If this book is going to be future-proof, I needed to leave my cozy nest and venture into the wilds of Cambridge, England. Sadly, the BSDCam chair agreed with my logic, so I boarded an aluminum deathtrap—sorry, a “commercial airliner”—and found myself hurtled from Detroit to Heathrow.
And one Wednesday morning, I made it to the William Gates building of Cambridge University, consciousness nailed to my body by a thankfully infinite stream of proper British tea.
BSDCam attendance is invitation only, and the facilities can only handle fifty folks or so. You need to be actively working on FreeBSD to wrangle an invite. Developers attend from all over the world. Yet, there’s no agenda. Robert Watson is the chair, but he doesn’t decide on the conference topics. He goes around the room and asks everyone to introduce themselves, say what they’re working on, and declare what they want to discuss during the conference. The topics of interest are tallied. The most popular topics get assigned time slots and one of the two big rooms. Folks interested in less popular topics are invited to claim one of the small breakout rooms.
Then the real fun begins. I started by eavesdropping in the virtualization workshop. For two hours, people discussed FreeBSD’s virtualization needs, strengths, and weaknesses. What needs help? What should this interface look like? What compatibility is important, and what isn’t? By the end of the session, the couple dozen people had developed a reasonable consensus and, most importantly, some folks had added items to their to-do lists.
Repeat for a dozen more topics. I got a good grip on what’s really happening with security mitigation techniques, FreeBSD’s cloud support, TCP/IP improvements, advances in teaching FreeBSD, and more. A BSDCan devsummit presentation on packaging the base system is informative, but eavesdropping on two dozen highly educated engineers arguing about how to nail down the final tidbits needed to make that a real thing is far more educational.
To my surprise, I was able to provide useful feedback for some sessions. I speak at a lot of events outside of the FreeBSD world, and was able to share much of what I hear at Linux conferences. A tool that works well for an experienced developer doesn’t necessarily work well for everyone.
Every year, I leave BSDCan tired. I left BSDCam entirely exhausted. These intense, focused discussions stretched my brain.
But, I have a really good idea where key parts of FreeBSD development are actually headed. This should help future-proof the new Absolute FreeBSD, as much as any computer book can be future-proof.
Plus, BSDCam throws the most glorious conference dinner I’ve ever seen.
I want to thank Robert Watson for his kind invitation, and the FreeBSD Foundation for helping defray the cost of this trip.
Interview – The BSDNow Crew
News Roundup
Beta Update – Request for (more) Testing
- https://beta.undeadly.org/ has received an update. The most significant changes include:
The site has been given a less antiquated “look”. (As the topic icons have been eliminated, we are no longer seeking help with those graphics.)
The site now uses a moderate amount of semantic HTML5.
Several bugs in the HTML fragment validator (used for submissions and comments) have been fixed.
To avoid generating invalid HTML, submission content which fails validation is no longer displayed in submission/comment previews.
Plain text submissions are converted to HTML in a more useful fashion. (Instead of just converting each EOL to <br>, the converter now generates proper paragraphs and interprets two or more consecutive EOLs as indicating a paragraph break.)
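A minimal sketch of that conversion strategy (in Python for illustration; this is not the site’s actual code):

import html
import re

def text_to_html(text: str) -> str:
    # Two or more consecutive EOLs mark a paragraph break; single EOLs
    # inside a paragraph become <br> as before.
    paragraphs = re.split(r"\n{2,}", text.strip())
    return "\n".join("<p>" + html.escape(p).replace("\n", "<br>\n") + "</p>"
                     for p in paragraphs)

print(text_to_html("first line\nstill the same paragraph\n\na new paragraph"))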
- The redevelopment remains a work-in-progress. Many thanks to those who have contributed!
As before, constructive feedback would be appreciated. Of particular interest are reports of bugs in behaviour (for example, in the HTML validator or in authentication) that would preclude the adoption of the current code for the main site.
High-process-count support added to master
We’ve fixed a number of bottlenecks that can develop when the number of user processes runs into the tens of thousands or higher. One thing led to another and I said to myself, “gee, we have a 6-digit PID, might as well make it work to a million!” With the commits made today, master can support at least 900,000 processes with just a kern.maxproc setting in /boot/loader.conf, assuming the machine has the memory to handle it.
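For example, a loader.conf(5) entry along these lines should do it (900,000 is the figure from the test below; the machine needs enough RAM to back that many processes):

# /boot/loader.conf
kern.maxproc="900000"

# after a reboot, verify with:
# sysctl kern.maxproc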
And, in fact, as today’s machines start to ratchet up there in both memory capacity and core count, with fast storage (NVMe) and fast networking (10GigE and higher), even in consumer boxes, this is actually something that one might want to do. With AMD’s Threadripper and EPYC chips now out, the Intel<->AMD CPU wars are back on! Boasting up to 32 cores (64 threads) per socket and two sockets on EPYC, terabytes of RAM, and motherboards with dual 10GigE built-in, the reality is that these numbers are already achievable in a useful manner.
In any case, I’ve tested these changes on a dual-socket Xeon. I can in fact start 900,000 processes. They don’t get a whole lot of CPU time, and running ‘ps’ would be painful, but it works and the system is still responsive from the shell with all of that going on.
xeon126# uptime
1:42PM up 9 mins, 3 users, load averages: 890407.00, 549381.40, 254199.55
In fact, judging from the memory use, these minimal test processes only eat around 60KB each. 900,000 of them ate only 55GB on a 128GB machine. So even a million processes is not out of the question, depending on the CPU requirements for those processes. Today’s modern machines can be stuffed with enormous amounts of memory.
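As a rough idea of what such a minimal test might look like (this is a sketch, not the author’s actual test program), a script that forks N children which simply sleep:

import os
import sys
import time

n = int(sys.argv[1]) if len(sys.argv) > 1 else 1000  # keep N modest unless kern.maxproc is raised
for _ in range(n):
    if os.fork() == 0:        # child: occupy a process slot and do nothing
        time.sleep(3600)
        os._exit(0)
print("spawned", n, "sleeper processes")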
Of course, our PIDs are currently limited to 6 digits, so a million is kinda the upper limit in terms of discrete user processes (versus pthreads, which are less restricted). I’d rather not go to 7 digits (yet).
CFT: Driver for generic MS Windows 7/8/10-compatible USB HID multi-touch touchscreens
- The following patch [1] adds support for generic MS Windows 7/8/10-compatible USB HID multi-touch touchscreens via the evdev protocol. It is intended to be a native replacement for the hid-multitouch.c driver found in Linux distributions and the multimedia/webcamd port.
- The patch is made for 12-CURRENT and most probably can be applied to recent 11-STABLE and 11.1-RELEASE (not tested).
- How to test:
1. Apply patch [1]
2. To compile this driver into the kernel, place the following lines into your kernel configuration file:
device wmt
device usb
device evdev
Alternatively, to load the driver as a module at boot time, place the following line in loader.conf(5):
wmt_load="YES"
3. Install the x11-drivers/xf86-input-evdev or x11-drivers/xf86-input-libinput port.
4. Tell Xorg to use the evdev or libinput driver for the device:

Section "ServerLayout"
    InputDevice "TouchScreen0" "SendCoreEvents"
EndSection

Section "InputDevice"
    Identifier "TouchScreen0"
    Driver "evdev"
#   Driver "libinput"
    Option "Device" "/dev/input/eventXXX"
EndSection
- The exact value of "/dev/input/eventXXX" can be obtained with the evemu-record utility from devel/evemu.
- Note 1: Currently, the driver does not support pens or touchpads.
- Note 2: wmt.ko should be kld-loaded before the uhid driver in order to take precedence over it! Otherwise, uhid can be kld-unloaded after loading wmt.
- wmt review: https://reviews.freebsd.org/D12017
- Raw diff: https://reviews.freebsd.org/D12017.diff
Beastie Bits
- BSDMag Programming Languages Infographic
- t2k17 Hackathon Report: Bob Beck on buffer cache tweaks, LibreSSL, and pledge progress
- New FreeBSD Journal
- NetBSD machines at Open Source Conference 2017 Kyoto
Feedback/Questions
- Send questions, comments, show ideas/topics, or stories you want mentioned on the show to feedback@bsdnow.tv