Where and how Ubuntu kernels get their ZFS modules

By cks on 2024-03-07 04:59:21

One of the interesting and convenient things about Ubuntu for people like us is that they provide pre-built and integrated ZFS kernel modules in their mainline kernels. If you want ZFS on your (our) ZFS fileservers, you don't have to add any extra PPA repositories or install any extra kernel module packages; it's just there. However, this leaves us with a little mystery, which is how the ZFS modules actually get there. The reason this is a mystery is that the ZFS modules are not in the Ubuntu kernel source, or at least not in the package source.

(One reason this matters is that you may want to see what patches Ubuntu has applied to their version of ZFS, because Ubuntu periodically backports patches to specific issues from upstream OpenZFS. If you go try to find ZFS patches, ZFS code, or a ZFS changelog in the regular Ubuntu kernel source, you will likely fail, and this will not be what you want.)

Ubuntu kernels are normally signed in order to work with Secure Boot. If you use 'apt source ...' on a signed kernel, what you get is not the kernel source but a 'source' that fetches specific unsigned kernels and does magic to sign them and generate new signed binary packages. To actually get the kernel source, you need to follow the directions in Build Your Own Kernel to get the source of the unsigned kernel package. However, as mentioned this kernel source does not include ZFS.

(You may be tempted to fetch the Git repository following the directions in Obtaining the kernel sources using git, but in my experience this may well leave you hunting around in confusion trying to find the branch that actually corresponds to even the current kernel for an Ubuntu release. Even if you have the Git repository cloned, downloading the source package can be easier.)

How ZFS modules get into the built Ubuntu kernel is that during the package build process, the Ubuntu kernel build downloads or copies a specific zfs-dkms package version and includes it in the tree that kernel modules are built from, which winds up including the built ZFS kernel modules in the binary kernel packages. Exactly what version of zfs-dkms will be included is specified in debian/dkms-versions, although good luck finding an accurate version of that file in the Git repository on any predictable branch or in any predictable location.
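As an illustration (a sketch only, and with the caveat covered later that 'apt source' hands you the current version of the kernel package, not necessarily the one you're running), once you have the kernel source package unpacked you can look at that file directly; the directory name here is a stand-in for whatever apt actually unpacks:

$ apt source linux-image-unsigned-$(uname -r)
$ cd linux-*/                      # stand-in for the unpacked source directory
$ grep zfs debian/dkms-versions    # shows the pinned zfs-dkms version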

(The zfs-dkms package itself is the DKMS version of kernel ZFS modules, which means that it packages the source code of the modules along with directions for how DKMS should (re)build the binary kernel modules from the source.)

This means that if you want to know what specific version of the ZFS code is included in any particular Ubuntu kernel and what changed in it, you need to look at the source package for zfs-dkms, which is called zfs-linux and has its Git repository here. Don't ask me how the branches and tags in the Git repository are managed and how they correspond to released package versions. My current view is that I will be downloading specific zfs-linux source packages as needed (using 'apt source zfs-linux').

The zfs-linux source package is also used to build the zfsutils-linux binary package, which has the user space ZFS tools and libraries. You might ask if there is anything that makes zfsutils-linux versions stay in sync with the zfs-dkms versions included in Ubuntu kernels. The answer, as far as I can see, is no. Ubuntu is free to release new versions of zfsutils-linux and thus zfs-linux without updating the kernel's dkms-versions file to use the matching zfs-dkms version. Sufficiently cautious people may want to specifically install a matching version of zfsutils-linux and then hold the package.
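For instance, a sketch of what that pinning might look like; the version string here is purely hypothetical, and you'd use whatever version the kernel's dkms-versions file names:

$ ver=2.2.0-0ubuntu1               # hypothetical; take this from debian/dkms-versions
$ sudo apt-get install "zfsutils-linux=$ver"
$ sudo apt-mark hold zfsutils-linux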

I was going to write something about how you get the ZFS source for a particular kernel version, but it turns out that there is no straightforward way. Contrary to what the Ubuntu documentation suggests, if you do 'apt source linux-image-unsigned-$(uname -r)', you don't get the source package for that kernel version; you get the source package for the current version of the 'linux' kernel package, at whatever is the latest released version. Similarly, while you can inspect that source to see what zfs-dkms version it was built with, 'apt source zfs-dkms' will only give you (easy) access to the current version of the zfs-linux source package. If you ask for an older version, apt will probably tell you it can't find it.

(Presumably Ubuntu has old source packages somewhere, but I don't know where.)

Set a static IP address with nmtui on Raspberry Pi OS 12 'Bookworm'

By Jeff Geerling on 2024-03-07 00:24:33

Old advice for setting a static IP address on a Raspberry Pi said to edit the /etc/dhcpcd.conf file on the Pi itself and add it there.

But on Raspberry Pi OS 12 and later, dhcpcd is no longer used; everything goes through NetworkManager, which is configured via nmcli or nmtui. If you're booting into the Pi OS desktop environment, editing the IP settings there is pretty easy.

But setting a static IP via the command line is a little different.

First, get the interface information—you can get a list of all interfaces with nmcli device status:

$ nmcli device status
DEVICE         TYPE      STATE                   CONNECTION         
eth0           ethernet  connected               Wired connection 1 
lo             loopback  connected (externally)  lo                 
wlan0          wifi      disconnected            --                 

In my case, I want to set an IP on eth0, the built-in Ethernet.
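As a sketch of one way to do it with nmcli, you modify the connection and bring it back up; the addresses, gateway, and DNS server below are placeholder values, and the connection name should match whatever 'nmcli device status' showed for your interface:

$ sudo nmcli con mod "Wired connection 1" \
    ipv4.addresses 192.168.1.50/24 \
    ipv4.gateway 192.168.1.1 \
    ipv4.dns "192.168.1.1" \
    ipv4.method manual
$ sudo nmcli con up "Wired connection 1"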


Committing to Windows

By David Heinemeier Hansson on 2024-03-07 01:19:31
I've gone around the computing world in the past eighty hours. I've been flowing freely from Windows to Linux, sampling text editors like VSCode, neovim, Helix, and Sublime, while surveying PC laptops and desktops. It's been an adventure! But it's time to stop being a tourist. It's time to commit.
 
So despite my earlier reservations about giving up on TextMate, I've decided to make Windows my new primary abode. That's Windows with Linux running inside of it as a subsystem (WSL), mind you. I would never have contemplated a switch to Windows without being able to run Linux inside it. But it's still a change of scenery you could not possibly have convinced me was in the cards a few years ago!

Where the original expedition was motivated by Apple's callous call to nuke PWAs in the EU (which they later retracted), the present commitment is encouraged in part by Apple's atrocious handling of the Epic AB situation. I could not believe that Phil Schiller, the Apple executive in charge of App Store policy, would commit the following in writing:
 
 Your colorful criticism of our DMA compliance plan, coupled with Epic's past practice of intentionally violating contractual provisions with which it disagrees, strongly suggest that Epic Sweden does not intend to follow the rules.
 
So public criticism of Apple is now motivating grounds for being denied access to the App Store? What kind of overtly authoritarian bullshit is this?
 
But it's actually time to look past the negative motivations too. That's part of the reason for burning the boat, and committing to Windows for me personally. I don't want to compute purely out of spite. I want to compute out of passion. And, believe it or not, I've found a lot of surprising delights with this Windows/Linux combo that's sprouting just that kind of passion.
 
Like finally figuring out that fonts can look gorgeous on Windows too, if you run it with a great high-resolution screen and refrain from fractional scaling. I had this prejudice that Windows simply didn't know how to render fonts, and it turned out to be false. Awesome!

And VSCode continues to grow on me too. The key turned out to be resisting the urge to recreate TextMate, and something as simple as picking a radically different color theme helped break the constant comparison. So too did diving into the configuration, turning off all the IDE-y stuff, code suggestions, and more. Just focusing on VSCode as a text editor rendered in Tokyo Nights.

That theme inspiration came from my ongoing exploration of neovim. It's such a radical departure from editors like TextMate and VSCode, but that's half the reason I've been having fun. Even if the extreme focus on personalized configurations isn't actually well-aligned with my beliefs in convention over configuration.

But in the grand scheme, none of this matters. Windows is great. Running Linux inside of it at full speed is fantastic. Whether I end up with VSCode or neovim here, it's going to be fine.

What's going to be even better than fine is using this personal change of computing to counter the Mac monoculture we've been running at 37signals. One encouraged and sanctioned by yours truly, mind you, but also one at odds with the fact that more than half the users on our biggest product, Basecamp, live on Windows.

Again, it's not like I'm going to burn the MacBooks that have accumulated at our house. It's still OK to own more than one computer! But one of them has to be the primary one where you're doing your work, and that one for me is now going to be running Windows.

Some notes about the Cloudflare eBPF Prometheus exporter for Linux

By cks on 2024-03-08 05:01:56

I've been a fan of the Cloudflare eBPF Prometheus exporter for some time, ever since I saw their example of per-disk IO latency histograms. And the general idea is extremely appealing; you can gather a lot of information with eBPF (usually from the kernel), and the ability to turn it into metrics is potentially quite powerful. However, actually using it has always been a bit arcane, especially if you were stepping outside the bounds of Cloudflare's canned examples. So here's some notes on the current version (which is more or less v2.4.0 as I write this), written in part for me in the future when I want to fiddle with eBPF-created metrics again.

If you build the ebpf_exporter yourself, you want to use their provided Makefile rather than try to do it directly. This Makefile will give you the choice to build a 'static' binary or a dynamic one (with 'make build-dynamic'); the static is the default. I put 'static' into quotes because of the glibc NSS problem; if you're on a glibc-using Linux, your static binary will still depend on your version of glibc. However, it will contain a statically linked libbpf, which will make your life easier. Unfortunately, building a static version is impossible on some Linux distributions, such as Fedora, because Fedora just doesn't provide static versions of some required libraries (as far as I can tell, libelf.a). If you have to build a dynamic executable, a normal ebpf_exporter build will depend on the libbpf shared library you can find in libbpf/dest/usr/lib. You'll need to set a LD_LIBRARY_PATH to find this copy of libbpf.so at runtime.
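As a rough sketch (assuming the Makefile targets behave as described above), a dynamic build and a test run look something like this:

$ git clone https://github.com/cloudflare/ebpf_exporter
$ cd ebpf_exporter
$ make build-dynamic
$ LD_LIBRARY_PATH=$PWD/libbpf/dest/usr/lib ./ebpf_exporter --help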

(You can try building with the system libbpf, but it may not be recent enough for ebpf_exporter.)

To get metrics from eBPF with ebpf_exporter, you need an eBPF program that collects the metrics and then a YAML configuration that tells ebpf_exporter how to handle what the eBPF program provides. The original version of ebpf_exporter had you specify eBPF programs in text in your (YAML) configuration file and then compiled them when it started. This approach has fallen out of favour, so now eBPF programs must be pre-compiled to special .o files that are loaded at runtime. I believe these .o files are relatively portable across systems; I've used ones built on Fedora 39 on Ubuntu 22.04. The simplest way to build either a provided example or your own one is to put it in the examples directory and then do 'make <name>.bpf.o'. Running 'make' in the examples directory will build all of the standard examples.
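For example (using 'biolatency', which I believe is one of the provided examples; substitute the name of your own program):

$ cd examples
$ make biolatency.bpf.o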

To run an eBPF program or programs, you copy their <name>.bpf.o and <name>.yaml to a configuration directory of your choice, specify this directory with the ebpf_exporter '--config.dir' argument, and then use '--config.names=<name>,<name2>,...' to say which programs to run. The suffixes of the YAML configuration file and the eBPF object file are always fixed.
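Putting that together, a hedged sketch of deploying and running one program (the directory and the 'biolatency' name are just for illustration):

$ mkdir -p /etc/ebpf_exporter
$ cp examples/biolatency.bpf.o examples/biolatency.yaml /etc/ebpf_exporter/
$ ./ebpf_exporter --config.dir=/etc/ebpf_exporter --config.names=biolatency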

The repository has some documentation on the YAML (and eBPF) that you have to write to get metrics. However, it is probably not sufficient to explain how to modify the examples or especially to write new ones. If you're doing this (for example, to revive an old example that was removed when the exporter moved to the current pre-compiled approach), you really want to read over existing examples and then copy their general structure more or less exactly. This is especially important because the main ebpf_exporter contains some special handling for at least histograms that assumes things are being done as in their examples. When reading examples, it helps to know that Cloudflare has a bunch of helpers that are in various header files in the examples directory. You want to use these helpers, not the normal, standard bpf helpers.

(However, although not documented in bpf-helpers(7), '__sync_fetch_and_add()' is a standard eBPF thing. It is not so much documented as mentioned in some kernel BPF documentation on arrays and maps and in bpf(2).)

One source of (e)BPF code to copy from that is generally similar to what you'll write for ebpf_exporter is bcc/libbpf-tools (in the <name>.bpf.c files). An eBPF program like runqlat.bpf.c will need restructuring to be used as an ebpf_exporter program, but it will show you what you can hook into with eBPF and how. Often these examples will be more elaborate than you need for ebpf_exporter, with more options and the ability to narrowly select things; you can take all of that out.

(When setting up things like the number of histogram slots, be careful to copy exactly what the examples do in both your .bpf.c and in your YAML, mysterious '+ 1's and all.)

A realization about shell pipeline steps on multi-core machines

By cks on 2024-03-09 04:27:42

Over on the Fediverse, I had a realization:

This is my face when I realize that on a big multi-core machine, I want to do 'sed ... | sed ... | sed ...' instead of the nominally more efficient 'sed -e ... -e ... -e ...' because sed is single-threaded and if I have several costly patterns, multiple seds will parallelize them across those multiple cores.

Even when doing on the fly shell pipelines, I've tended to reflexively use 'sed -e ... -e ...' when I had multiple separate sed transformations to do, instead of putting each transformation in its own 'sed' command. Similarly I sometimes try to cleverly merge multi-command things into one command, although usually I don't try too hard. In a world where you have enough cores (well, CPUs), this isn't necessarily the right thing to do. Most commands are single threaded and will use only one CPU, but every command in a pipeline can run on a different CPU. So splitting up a single giant 'sed' into several may reduce a single-core bottleneck and speed things up.
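To make the difference concrete, here's a sketch with made-up patterns and file names; both versions produce the same output, but the second one can spread the work across up to three CPUs:

# everything on one CPU:
sed -e 's/foo/bar/g' -e 's/alpha/beta/g' -e 's/thing-[0-9]*/thing-N/g' input >output
# one CPU per pipeline stage:
sed 's/foo/bar/g' input | sed 's/alpha/beta/g' | sed 's/thing-[0-9]*/thing-N/g' >output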

(Giving sed multiple expressions is especially single threaded because sed specifically promises that they're processed in order, and sometimes this matters.)

Whether this actually matters may vary a lot. In my case, it only made a trivial difference in the end, partly because only one of my sed patterns was CPU-intensive (but that pattern alone made sed use all the CPU it could get and made it the bottleneck in the entire pipeline). In some cases adding more commands may add more in overhead than it saves from parallelism. There are no universal answers.

One of my lessons learned from this is that if I'm on a machine with plenty of cores and doing a one-time thing, it probably isn't worth my while to carefully optimize how many processes are being run as I evolve the pipeline. I might as well jam more pipeline steps whenever and wherever they're convenient. If it's easy to move one step closer to the goal with one more pipeline step, do it. Even if it doesn't help, it probably won't hurt very much.

Another lesson learned is that I might want to look for single threaded choke points if I've got a long-running shell pipeline. These are generally relatively easy to spot; just run 'top' and look for what's using up all of one CPU (on Linux, this is 100% CPU time). Sometimes this will be as easy to split as 'sed' was, and other times I may need to be more creative (for example, if zcat is hitting CPU limits, maybe pigz can help a bit).
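For example (file name and pattern invented), if plain zcat is the 100% CPU process in top, you might try:

# single-threaded decompression feeding the pipeline:
zcat big-logfile.gz | grep -c 'some pattern'
# pigz spreads some of the decompression-related work over a few threads, which may help a bit:
pigz -dc big-logfile.gz | grep -c 'some pattern'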

(If I have the fast disk space, possibly un-compressing the files in place in parallel will work. This comes up in system administration work more than you'd think, since we can want to search and process log files and they're often stored compressed.)


Could Apple leave Europe?

By David Heinemeier Hansson on 2024-03-08 00:44:41
Apple's responses to the Digital Markets Act, its recent 1.8b euro fine in the Spotify case, and Epic Sweden's plans to introduce an alternative App Store in the EU have all been laced with a surprising level of spite and obstinacy. Even when Steve Jobs was pulling power moves with Adobe and Flash or responding to Antennagate, we never saw such an institutional commitment to flipping off legislators and platform partners. It might have been ruthless, but it didn't come across as personal.

Which is curious! Because you'd think that a creative thinker like Steve Jobs would be more likely to wear his heart on his sleeve than a professional bean counter like Tim Cook. More likely to lash out. But assuming Cook is still signing off on the company's strategy, and it's hard to imagine otherwise, his cool cucumber public persona seems to be turning into more of a hot potato with every aggrieved move Apple pulls. Which raises questions!

Like, what's next if the EU keeps turning up the heat on that already hot potato? At what point does it start to boil? If they're already lashing out with malicious compliance, vindictive App Store evictions, and pissy press releases on account of where we are today, what might they do if the regulatory pressure in Europe doesn't relent next month, next quarter, or next year? What if the EU is actually serious about this?

Well, Apple could quit Europe. Stop selling its products in the EU. While it's a big market, it's actually not huge, by Apple standards. Some 8-10% of revenue. So maybe $35b per year, out of some $383b in total. At what point does Cook look at that number and say "not worth it, we're out"?

Prior to witnessing Apple's actions of the last few years, I would have said no way. Tim Cook just isn't the kind of CEO to make such a big move. He's too conservative, too timid, too focused on the bottom line. But that mental model has been seriously tested lately. A CEO that signs off on public letters like the one in response to their loss in the Spotify case might actually have it in them to do something big.
 
It's not without precedent that big tech companies threaten to leave a major market. Facebook famously threatened to do just that in Australia over the fight regarding newspaper royalties. But as far as I recall, nobody has actually done it. Not on a scale like Apple and the EU.
 
But we've gone through a lot of surprises in the last decade. Major, world-affecting events and decisions almost nobody would have contemplated as realistic possibilities just a few years prior to them happening.
 
I hope there are bureaucrats within the EU at least entertaining the possibility. Stranger things have happened. 

Google's sad ideological capture was exactly what we were trying to avoid

By David Heinemeier Hansson on 2024-03-09 00:23:31
The Gemini AI rollout should have been Google's day of triumph. The company made one of the smartest acquisitions in tech when they bought DeepMind in 2014. They helped set the course for the modern AI movement with the Transformer paper in 2017. They were poised to be right there, right at the forefront of a whole new era of computing. And then they blew it.

If it wasn't all so terribly dark and sad, it would actually be funny. Rendering George Washington as a Black man. Equivocating on whether Musk's memes are worse than actual, literal Hitler. Oh, and defending pedophilia. Yeah, the Gemini launch had it all. Like a risqué stand-up comic shocking her audience for effect. Except, Gemini wasn't joking.

In pictures and texts, it ironically made the point of the "AI safety" crowd incredibly well, but in the opposite direction. The threat from AI will come less from "perpetuating existing biases in the world" and more from "injecting the biases and ideology of its overseers".

How on earth could Google release something so twisted, so wrong to the world? The company's executives, as well as Google co-founder Sergei Brin, tried to brush it off as "bugs", but few people bought that story. It seemed more likely that Gemini was working just fine by the company's muddled Google AI Principles. A set of principles that unapologetically puts social justice and anti-bias instincts as prime directives #1 and #2. While failing to even mention "accuracy" or "usefulness".

But this part of the story has already been diagnosed to death. Gemini was a catastrophic, confidence-shattering launch. It also caused Google's stock price to take quite the dive. Presumably because it called into question whether all of those investments and years of research will ultimately be squandered on a futile search for cosmic justice. Investors are right to worry.

The part that's even more fascinating to me than the hilarious broken product is what kind of organization could possibly design and release such an abomination to the world. The answer came courtesy of a Pirate Wires report this week. It's shocking reading. Even if you've paid attention to the institutional capture by the social justice/woke/whatever ideology that peaked from 2020-2022.

While the rest of tech has started to return to sanity on this issue, Google clearly has not. It appears completely captured and paralyzed by this stifling ideology. An asylum run entirely by its most deranged inmates, holding everyone else captive. Even its founder duo, who seem either incapable or unwilling to act to restrain it.

But I can totally see how they got there. How Sergei and Larry could feel like it's too late, too hard, too painful to deal with the cultural capitulation. Because that's almost how Jason and I felt at times prior to April 2021, when some of the same forces and sentiments were spreading inside our own company.

The Pirate Wires report was entitled "Google's Culture of Fear", and that's exactly what it felt like at times at our company leading up to April 2021. That the ship was being forced in a bad direction, by bad actors, with bad ideas, but that if you were going to question the compass, there'd be hell to pay. Both internally and externally. You were going to be called names. Accused of horrible things. And, really, do you want to deal with all that right now? Maybe it'd be easier to just let dragons lie.

But the problem with ideological dragons like this is that they're never content with the political scalps or capital accumulated. There's always a hunger for more, more, more. Every little victory is an opportunity to move the goal posts further north. Make ever smaller transgressions punishable by ostracization and shame. Label even bigger swaths of normal interactions and behaviors as "problematic". It just never fucking ends.

That is unless you say "enough". Enough with the nonsense. Enough with the witch hunts. Enough with the echo chamber.

That's what we did at our company in April of 2021, and it hurt like hell for a couple of weeks. And that was at a small software company with no board or investors. I can't even imagine how it would have gone then if we'd had either of those. Good odds that they'd have buckled under the pressure, and Jason and I would have been pushed out in a futile attempt to appease the mob.

So I totally get why Sergei and Larry might have more than a little trepidation about rocking the boat. Google appears to be so deeply captured at this point, the rot has been left to fester for so long, that it's going to be extraordinarily painful to correct now.

On the other hand, there's more cover. The worst of the woke scourge has indeed passed in tech. Plenty of other companies have now dismantled their DEI bureaucracies or made them a shadow of their former might. It is possible to reverse course, and it's infinitely easier to do so in 2024 than it was in 2021. But it's still a motherfucker.
 
If I were a betting man, I'd bet it's going to happen, though. Maybe not as spectacularly and decisively as we did it at our company, with one clean cut. But gradually, like most major corporations have wound down the woke excesses while pretending it's all just a correction for "over hiring".
 
What's clear to me is that addressing this is existential to Google. Just like it was existential for us. If you follow these bad ideas to their logical conclusion, you end up with worse than useless products. You end up with a search engine that wants to lecture people rather than finding the facts. There's no mainstream market for such a bullshit product in the long run.
 
Eventually the market will force the correction. But Google is a very rich company. It could coast on the fumes of its former glory for a long time. Let's hope that there's more than an empty, hollowed out shell of a company left by the time they get this right and return to sanity.
 
I never thought I'd say this, but I'm actually rooting for Google on this one. Big tech is a game of thrones, and all us mere peasants are better off when the big powers all counter each other in a variety of ways. We need a stronger Google to counter a strong Apple and a strong Microsoft.
 
So. Hard choices, easy life. Easy choices, hard life. We made some incredibly hard choices in April of 2021. We've lived a comparably very easy life on that vector ever since we finished the cleanup. Sergei and Larry, you guys can do it too. But you have to want to do it. You have to want Google to be relevant in AI. You have to want to make the world's information accessible and useful again, without the ideological nonsense. Vamos! 

Some thoughts on usage data for your systems and services

By cks on 2024-03-10 04:10:39

Some day, you may be called on by decision makers (including yourself) to provide some sort of usage information for things you operate so that you can make decisions about them. I'm not talking about system metrics such as how much CPU is being used (although for some systems that may be part of higher level usage information, for example for our SLURM cluster); this is more on the level of how much things are being used, by who, and perhaps for what. In the very old days we might have called this 'accounting data' (and perhaps disdained collecting it unless we were forced to by things like chargeback policies).

In an ideal world, you will already be generating and retaining the sort of usage information that can be used to make decisions about services. But internal services aren't necessarily automatically instrumented the way revenue generating things are, so you may not have this sort of thing built in from the start. In this case, you'll generally wind up hunting around for creative ways to generate higher level usage information from low level metrics and logs that you do have. When you do this, my first suggestion is write down how you generated your usage information. This probably won't be the last time you need to generate usage information, and also if decision makers (including you in the future) have questions about exactly what your numbers mean, you can go back to look at exactly how you generated them to provide answers.

(Of course, your systems may have changed around by the next time you need to generate usage information, so your old ways don't work or aren't applicable. But at least you'll have something.)

My second suggestion is to look around today to see if there's data you can easily collect and retain now that will let you provide better usage information in the future. This is obviously related to keeping your logs longer, but it also includes making sure that things make it to your logs (or at least to your retained logs, which may mean setting things to send their log data to syslog instead of keeping their own log files). At this point I will sing the praises of things like 'end of session' summary log records that put all of the information about a session in a single place instead of forcing you to put the information together from multiple log lines.
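As a trivial illustration of the kind of 'end of session' record I mean (the tag and the fields here are invented), a service or wrapper script can emit a single summarizing line to syslog with logger(1):

logger -t ourservice "session-end user=someuser duration=342s files-fetched=12 bytes=48210422"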

(When you've just been through the exercise of generating usage data is an especially good time to do this, because you'll be familiar with all of the bits that were troublesome or where you could only provide limited data.)

Of course there are privacy implications of retaining lots of logs and usage data. This may be a good time to ask around to get advance agreement on what sort of usage information you want to be able to provide and what sort you definitely don't want to have available for people to ask for. This is also another use for arranging to log your own 'end of session' summary records, because if you're doing it yourself you can arrange to include only the usage information you've decided is okay.

The GPT-4 barrier has finally been broken

2024-03-08 19:02:39

Four weeks ago, GPT-4 remained the undisputed champion: consistently at the top of every key benchmark, but more importantly the clear winner in terms of "vibes". Almost everyone investing serious time exploring LLMs agreed that it was the most capable default model for the majority of tasks - and had been for more than a year.

Today that barrier has finally been smashed. We have four new models, all released to the public in the last four weeks, that are benchmarking near or even above GPT-4. And the all-important vibes are good, too!

Those models come from four different vendors.

  • Google Gemini 1.5, February 15th. I wrote about this the other week: the signature feature is an incredible one-million-token context, nearly 8 times the length of GPT-4 Turbo. It can also process video, which it does by breaking it up into one frame per second - but you can fit a LOT of frames (258 tokens each) in a million tokens.
  • Mistral Large, February 26th. I have a big soft spot for Mistral given how exceptional their openly licensed models are - Mistral 7B runs on my iPhone, and Mixtral-8x7B is the best model I've successfully run on my laptop. Medium and Large are their two hosted but closed models, and while Large may not quite outperform GPT-4, it's clearly in the same class. I can't wait to see what they put out next.
  • Claude 3 Opus, March 4th. This is just a few days old and wow: the vibes on this one are really strong. People I know who evaluate LLMs closely are rating it as the first clear GPT-4 beater. I've switched to it as my default model for a bunch of things, most conclusively for code - I've had several experiences recently where a complex GPT-4 prompt that produced broken JavaScript gave me a perfect working answer when run through Opus instead (recent example). I also enjoyed Anthropic research engineer Amanda Askell's detailed breakdown of their system prompt.
  • Inflection-2.5, March 7th. This one came out of left field for me: Inflection make Pi, a conversation-focused chat interface that felt a little gimmicky to me when I first tried it. Then just the other day they announced that their brand new 2.5 model benchmarks favorably against GPT-4, and Ethan Mollick - one of my favourite LLM sommeliers - noted that it deserves more attention.

Not every one of these models is a clear GPT-4 beater, but every one of them is a contender. And like I said, a month ago we had none at all.

There are a couple of disappointments here.

Firstly, none of those models are openly licensed or weights available. I imagine the resources they need to run would make them impractical for most people, but after a year that has seen enormous leaps forward in the openly licensed model category it's sad to see the very best models remain strictly proprietary.

And unless I've missed something, none of these models are being transparent about their training data. This also isn't surprising: the lawsuits have started flying now over training on unlicensed copyrighted data, and negative public sentiment continues to grow over the murky ethical ground on which these models are built.

It's still disappointing to me. While I'd love to see a model trained entirely on public domain or licensed content - and it feels like we should start to see some strong examples of that pretty soon - it's not clear to me that it's possible to build something that competes with GPT-4 without dipping deep into unlicensed content for the training. I'd love to be proved wrong on that!

In the absence of such a vegan model I'll take training transparency over what we are seeing today. I use these models a lot, and knowing how a model was trained is a powerful factor in helping decide which questions and tasks a model is likely suited for. Without training transparency we are all left reading tea leaves, sharing conspiracy theories and desperately trying to figure out the vibes.

Scheduling latency, IO latency, and their role in Linux responsiveness

By cks on 2024-03-11 04:31:46

One of the things that I do on my desktops and our servers is collect metrics that I hope will let me assess how responsive our systems are when people are trying to do things on them. For a long time I've been collecting disk IO latency histograms, and recently I've been collecting runqueue latency histograms (using the eBPF exporter and a modified version of libbpf/tools/runqlat.bpf.c). This has caused me to think about the various sorts of latency that affects responsiveness and how I can measure it.

Run queue latency is the latency between when a task becomes able to run (or when it got preempted in the middle of running) and when it does run. This latency is effectively the minimum (lack of) response from the system and is primarily affected by CPU contention, since the major reason tasks have to wait to run is other tasks using the CPU. For obvious reasons, high(er) run queue latency is related to CPU pressure stalls, but a histogram can show you more information than an aggregate number. I expect run queue latency to be what matters most for a lot of programs that mostly talk to things over some network (including talking to other programs on the same machine) and perhaps spend some of their time burning CPU furiously. If your web browser can't get its rendering process running promptly after the HTML comes in, or if it gets preempted while running all of that Javascript, this will show up in run queue latency. The same is true for your window manager, which is probably not doing much IO.

Disk IO latency is the lowest level indicator of things having to wait on IO; it sets a lower bound on how little latency processes doing IO can have (assuming that they do actual disk IO). However, direct disk IO is only one level of the Linux IO system, and the Linux IO system sits underneath filesystems. What actually matters for responsiveness and latency is generally how long user-level filesystem operations take. In an environment with sophisticated, multi-level filesystems that have complex internal behavior (such as ZFS and its ZIL), the actual disk IO time may only be a small portion of the user-level timing, especially for things like fsync().

(Some user-level operations may also not do any disk IO at all before they return from the kernel (for example). A read() might be satisfied from the kernel's caches, and a write() might simply copy the data into the kernel and schedule disk IO later. This is where histograms and related measurements become much more useful than averages.)

Measuring user level filesystem latency can be done through eBPF, to at least some degree; libbpf-tools/vfsstat.bpf.c hooks a number of kernel vfs_* functions in order to just count them, and you could convert this into some sort of histogram. Doing this on a 'per filesystem mount' basis is probably going to be rather harder. On the positive side for us, hooking the vfs_* functions does cover the activity a NFS server does for NFS clients as well as truly local user level activity. Because there are a number of systems where we really do care about the latency that people experience and want to monitor it, I'll probably build some kind of vfs operation latency histogram eBPF exporter program, although most likely only for selected VFS operations (since there are a lot of them).
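This isn't the ebpf_exporter program I'd eventually write, and it's not how ebpf_exporter is configured, but as a sketch of the general idea, a bpftrace one-liner can build a latency histogram for a single vfs_* entry point (here vfs_read), timed at the user-visible level:

sudo bpftrace -e '
kprobe:vfs_read { @start[tid] = nsecs; }
kretprobe:vfs_read /@start[tid]/ {
  // time from entering to leaving vfs_read(), in microseconds
  @usecs = hist((nsecs - @start[tid]) / 1000);
  delete(@start[tid]);
}'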

I think that the straightforward way of measuring user level IO latency (by tracking the time between entering and exiting a top level vfs_* function) will wind up including run queue latency as well. You will get, basically, the time it takes to prepare and submit the IO inside the kernel, the time spent waiting for it, and then after the IO completes the time the task spends waiting inside the kernel before it's able to run.

Because of how Linux defines iowait, the higher your iowait numbers are, the lower the run queue latency portion of the total time will be, because iowait only happens on idle CPUs and idle CPUs are immediately available to run tasks when their IO completes. You may want to look at io pressure stall information for a more accurate track of when things are blocked on IO.

A complication of measuring user level IO latency is that not all user visible IO happens through read() and write(). Some of it happens through accessing mmap()'d objects, and under memory pressure some of it will be in the kernel paging things back in from wherever they wound up. I don't know if there's any particularly easy way to hook into this activity.

Tracking World Time with Emacs

By Bozhidar Batsov on 2024-03-11 10:38:00

In today’s highly connected world it’s often useful to keep track of time in several time zones. I work in a company with employees all over the world, so I probably keep track of more time zones than most people.

So, what are the best ways to do this? I know what you’re thinking - let’s just buy an Omega Aqua Terra Worldtimer mechanical watch for $10,000 and be done with it!1 While this will definitely get the job done and improve the looks of your wrist immensely, there’s a cheaper and more practical option for you - Emacs. Did you know that Emacs has a command named world-clock that does exactly what we want?2 If you invoke it you’ll see something like this:

Seattle   Monday 11 March 02:45 PDT
New York  Monday 11 March 05:45 EDT
London    Monday 11 March 09:45 GMT
Paris     Monday 11 March 10:45 CET
Bangalore Monday 11 March 15:15 IST
Tokyo     Monday 11 March 18:45 JST

Hmm, looks OK but the greatest city in the world (Sofia, Bulgaria) is missing from the list… That’s totally unacceptable! We can fix this by tweaking the variable world-clock-list:

(setq world-clock-list
      '(("America/Los_Angeles" "Seattle")
        ("America/New_York" "New York")
        ("Europe/London" "London")
        ("Europe/Paris" "Paris")
        ("Europe/Sofia" "Sofia")
        ("Asia/Calcutta" "Bangalore")
        ("Asia/Tokyo" "Tokyo")))

Let’s try M-x world-clock again now:

Seattle      Monday 11 March 02:51 PDT
New York     Monday 11 March 05:51 EDT
London       Monday 11 March 09:51 GMT
Paris        Monday 11 March 10:51 CET
Sofia        Monday 11 March 11:51 EET
Bangalore    Monday 11 March 15:21 IST
Tokyo        Monday 11 March 18:51 JST

Much better!

By the way, you don’t really have to edit world-clock-list, as by default it’s configured to mirror the value of zoneinfo-style-world-list. The choice is yours.

You can also configure the way the world time entries are displayed using world-clock-time-format. Let’s switch to a style with shorter day and month names:

(setq world-clock-time-format "%a %d %b %R %Z")

This will result in:

Seattle      Mon 11 Mar 06:06 PDT
New York     Mon 11 Mar 09:06 EDT
London       Mon 11 Mar 13:06 GMT
Paris        Mon 11 Mar 14:06 CET
Sofia        Mon 11 Mar 15:06 EET
Bangalore    Mon 11 Mar 18:36 IST
Tokyo        Mon 11 Mar 22:06 JST

Check out the docstring of format-time-string (C-h f format-time-string) for more details, as the options here are numerous.

That’s all I have for you today. I hope you learned something useful. Keep hacking!

  1. Mechanical watches are another passion of mine. 

  2. It was named display-time-world before Emacs 28.1. The command was originally introduced in Emacs 23.1. 

Why we should care about usage data for our internal services

By cks on 2024-03-12 03:47:02

I recently wrote about some practically focused thoughts on usage data for your services. But there's a broader issue about usage data for services and having or not having it. My sense is that for a lot of sysadmins, building things to collect usage data feels like accounting work and likely to lead to unpleasant and damaging things, like internal chargebacks (which can create various problems of their own). However, I think we should strongly consider routinely gathering this data anyway, for fundamentally the same reasons that you should collect information on what TLS protocols and ciphers are being used by your people and software.

We periodically face decisions both obvious and subtle about what to do about services and the things they run on. Do we spend the money to buy new hardware, do we spend the time to upgrade the operating system or the version of the third party software, do we need to closely monitor this system or service, does it need to be optimized or be given better hardware, and so on. Conversely, maybe this is now a little-used service that can be scaled down, dropped, or simplified. In general, the big question is do we need to care about this service, and if so how much. High level usage data is what gives you most of the real answers.

(In some environments one fate for narrowly used services is to be made the responsibility of the people or groups who are the service's big users, instead of something that is provided on a larger and higher level.)

Your system and application metrics can provide you some basic information, like whether your systems are using CPU and memory and disk space, and perhaps how that usage is changing over a relatively long time base (if you keep metrics data long enough). But they can't really tell you why that is happening or not happening, or who is using your services, and deriving usage information from things like CPU utilization requires either knowing things about how your systems perform or assuming them (eg, assuming you can estimate service usage from CPU usage because you're sure it uses a visible amount of CPU time). Deliberately collecting actual usage gives you direct answers.

Knowing who is using your services and who is not also gives you the opportunity to talk to both groups about what they like about your current services, what they'd like you to add, what pieces of your service they care about, what they need, and perhaps what's keeping them from using some of your services. If you don't have usage data and don't actually ask people, you're flying relatively blind on all of these questions.

Of course collecting usage data has its traps. One of them is that what usage data you collect is often driven by what sort of usage you think matters, and in turn this can be driven by how you expect people to use your services and what you think they care about. Or to put it another way, you're measuring what you assume matters and you're assuming what you don't measure doesn't matter. You may be wrong about that, which is one reason why talking to people periodically is useful.

PS: In theory, gathering usage data is separate from the question of whether you should pay attention to it, where the answer may well be that you should ignore that shiny new data. In practice, well, people are bad at staying away from shiny things. Perhaps it's not a bad thing to have your usage data require some effort to assemble.

(This is partly written to persuade myself of this, because maybe we want to routinely collect and track more usage data than we currently do.)

Fixing nginx Error: Undefined constant PDO::MYSQL_ATTR_USE_BUFFERED_QUERY

By Jeff Geerling on 2024-03-12 05:57:08

I install a lot of Drupal sites day to day, especially when I'm doing dev work.

In the course of doing that, sometimes I'll be working on infrastructure—whether that's an Ansible playbook to configure a Docker container, or testing something on a fresh server or VM.

In any case, I run into the following error every so often in my Nginx error.log:

"php-fpm" nginx Error: Undefined constant PDO::MYSQL_ATTR_USE_BUFFERED_QUERY

The funny thing is, I don't have that error when I'm running CLI commands, like vendor/bin/drush, and can even install and manage the Drupal site and database on the CLI.

The problem, in my case, was that I had applied php-fpm configs using Ansible, but in my playbook I hadn't restarted php-fpm (in my case, on Ubuntu 22.04, php8.3-fpm) after doing so. So FPM was running with outdated config and didn't know that the MySQL/MariaDB drivers were even present on the system.
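The fix is as simple as restarting the FPM service once the new config (and any newly installed PHP extensions) are in place; on an Ubuntu 22.04 box with PHP 8.3, as in my case, that's something like:

# restart FPM so it picks up the new config and any newly installed extensions
sudo systemctl restart php8.3-fpm
sudo systemctl status php8.3-fpm --no-pager

(In an Ansible playbook, the equivalent is typically a handler that restarts php-fpm, notified by whatever tasks change its configuration or install PHP extensions.)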
