Jupiter Broadcasting

There be Dragons, BSD Dragons! | BSD Now 119

This week on BSD Now – it’s getting close to Christmas and the presents seem to be dropping early! We have a new DragonFly BSD release, loads of news, and an exciting interview with NetBSD developer Paul Goyette coming your way right now!

Thanks to:

Direct Download:

Video | HD Video | MP3 Audio | OGG Audio | Torrent | YouTube

RSS Feeds:

MP3 Feed | OGG Feed | iTunes Feed | Video Feed | HD Vid Feed | HD Torrent Feed

– Show Notes: –

Headlines

n2k15 hackathon reports


[ACM Queue: Challenges of Memory Management on Modern NUMA Systems](https://queue.acm.org/detail.cfm?id=2852078)
+ “Modern server-class systems are typically built as several multicore chips put together in a single system. Each chip has a local DRAM (dynamic random-access memory) module; together they are referred to as a node. Nodes are connected via a high-speed interconnect, and the system is fully coherent. This means that, transparently to the programmer, a core can issue requests to its node’s local memory as well as to the memories of other nodes. The key distinction is that remote requests will take longer, because they are subject to longer wire delays and may have to jump several hops as they traverse the interconnect. The latency of memory-access times is hence non-uniform, because it depends on where the request originates and where it is destined to go. Such systems are referred to as NUMA (non-uniform memory access).”
+ So, depending on which core a program is running on, it will see different throughput and latency to specific banks of memory. It is therefore usually optimal to allocate memory from the bank of RAM attached to the CPU the program is running on, and to keep the program on that same CPU rather than moving it around
+ There are a number of different NUMA strategies, including:
+ Fixed: memory is always allocated from a specific bank of memory
+ First Touch: memory is allocated from the bank connected to the CPU the application is running on at the time of the request, which can improve performance if the application stays on that same CPU and the load is balanced well
+ Round Robin or Interleave: memory is allocated evenly, each allocation coming from the next bank in turn so that all banks are used. This method can provide more uniform performance, because every memory access has the same chance of being local versus remote. When even, predictable performance is required, this can be better than a locality-focused policy that sometimes fails and falls back to slower remote accesses
+ AutoNUMA: a kernel task routinely iterates through each process’s allocated memory and tallies the number of pages on each node. It also clears the present bit on the pages, forcing the CPU to stop and enter the page-fault handler the next time a page is accessed. The handler records which node and thread are trying to access the page, then sets the present bit and allows execution to continue. Pages accessed from remote nodes are queued to be migrated to that node. After a page has been migrated once, though, further migrations require two recorded remote accesses, which is designed to prevent excessive migration (known as page bouncing)
+ The paper also introduces a new strategy:
+ Carrefour is a memory-placement algorithm for NUMA systems that focuses on traffic management: placing memory so as to minimize congestion on interconnect links and memory controllers. It tries to strike a balance between locality and keeping the interconnect between any given pair of CPUs uncongested, since congestion makes remote accesses even slower
+ Carrefour uses three primary techniques:
+ Memory collocation: moving memory to a different node so that accesses will likely be local
+ Replication: copying memory to several nodes so that threads on each node can access it locally (useful for read-only and read-mostly data)
+ Interleaving: moving memory so that it is distributed evenly among all nodes
+ FreeBSD is slowly gaining NUMA capabilities, and currently supports the fixed, round-robin, and first-touch policies. It also supports fixed-rr and first-touch-rr variants, which fall back to round-robin if the allocation fails because the fixed or first-touch domain is full
+ For more information, see numa(4) and numa_setaffinity(2) in 11-CURRENT


Is that Linux? No it is PC-BSD


Dual booting OS X and OpenBSD with full disk encryption


Interview – Paul Goyette – pgoyette@netbsd.org

NetBSD Testing and Modularity


iXsystems

News Roundup

HOWTO: L2TP/IPSec with OpenBSD


DragonFly 4.4 Released


Guide to install Ajenti on Nginx with SSL on FreeBSD 10.2


BSDCan 2016 CFP is up!


Beastie Bits


Feedback/Questions