These are the ramblings of Matthijs Kooijman, concerning the software he hacks on, hobbies he has and occasionally his personal life.
Most content on this site is licensed under the WTFPL, version 2 (details).
On my server, I use LVM for managing partitions. I have one big "data" partition that is stored on an HDD, but for a bit more speed, I have an LVM cache volume linked to it, so commonly used data is cached on an SSD for faster read access.
Today, I wanted to resize the data volume:
# lvresize -L 300G tika/data
Unable to resize logical volumes of cache type.
Bummer. Googling for the error message showed me some helpful posts here and here that told me you have to remove the cache from the data volume, resize the data volume and then set up the cache again.
For this, they used lvconvert --uncache, which detaches and deletes the cache volume or cache pool completely, so you then have to recreate the entire cache (and thus figure out how you created it in the first place).
Trying to understand my own work from long ago, I looked through the documentation and found the lvconvert --splitcache command in lvmcache(7), which detaches a cache volume or cache pool but does not delete it. This means you can resize the volume and then simply reattach the cache, which is a lot less work (and less error-prone).
For an example, here is how the relevant volumes look:
# lvs -a
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
data tika Cwi-aoC--- 300.00g [data-cache_cvol] [data_corig] 2.77 13.11 0.00
[data-cache_cvol] tika Cwi-aoC--- 20.00g
[data_corig] tika owi-aoC--- 300.00g
Here, data is a "cache" type LV that ties together the big data_corig LV, which contains the bulk data, and the small data-cache_cvol LV, which contains the cached data.
After detaching the cache with --splitcache, this changes to:
# lvs -a
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
data tika -wi-ao---- 300.00g
data-cache tika -wi------- 20.00g
I think the previous data LV (of cache type) was removed, data_corig was renamed to data and data-cache_cvol was renamed back to data-cache.
Armed with this knowledge, here's how the full resize works:
lvconvert --splitcache tika/data
lvresize -L 300G tika/data @hdd
lvconvert --type cache --cachevol tika/data-cache tika/data --cachemode writethrough
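The last command might need some additional parameters depending on how you set up the cache in the first place. You can view the current cache parameters with e.g. lvs -a -o +cache_mode,cache_settings,cache_policy. (The @hdd argument to lvresize restricts the newly allocated extents to PVs tagged hdd, so the data volume does not grow onto the SSD.)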
Note that all of this assumes using a cache volume and not a cache pool. I was originally using a cache pool setup, but it seems that a cache pool (which splits cache data and cache metadata into different volumes) is mostly useful if you want to split data and metadata over different PVs, which is not at all useful for me. So I switched to the cache volume approach, which needs fewer commands and volumes to set up.
I killed my cache pool setup with --uncache before I found out about --splitcache, so I did not actually try --splitcache with a cache pool. However, I think the procedure is pretty much identical to the one described above, except that you need to replace --cachevol with --cachepool in the last command.
For reference, here's what my volumes looked like when I was still using a cache pool:
# lvs -a
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
data tika Cwi-aoC--- 260.00g [data-cache] [data_corig] 99.99 19.39 0.00
[data-cache] tika Cwi---C--- 20.00g 99.99 19.39 0.00
[data-cache_cdata] tika Cwi-ao---- 20.00g
[data-cache_cmeta] tika ewi-ao---- 20.00m
[data_corig] tika owi-aoC--- 260.00g
This is a data volume of type cache that ties together the big data_corig LV, which contains the bulk data, and a data-cache LV of type cache-pool. The data-cache LV in turn ties together the data-cache_cdata LV, which holds the actual cache data, and data-cache_cmeta, which holds the cache metadata.
A few months ago, I put up an old Atom-powered Supermicro server (SYS-5015A-PHF) again, to serve at De War to collect and display various sensor and energy data about our building.
The server turned out to have an annoying habit: every now and then it would start beeping (one continuous annoying beep), which would continue until the machine was rebooted. It happened sporadically, but kept coming back. When I used this machine before, it was located in a datacenter where nobody would care about a beep more or less (so maybe it had been beeping for years on end before I replaced the server). Now, however, it was in a server cabinet inside our local Fablab, where there are plenty of people to become annoyed by a beeping server...
I eventually traced this back to faulty sensor readings and fixed this by disabling the faulty sensors completely in the server's IPMI unit, which will hopefully prevent the annoying beep. In this post, I'll share my steps, in case anyone needs to do the same.
When sorting out some stuff today I came across an "Ecobutton". When you attach it through USB to your computer and press the button, your computer goes to sleep (at least that is the intention).
The idea is that it makes things more sustainable because you can more easily put your computer to sleep when you walk away from it. As this tweakers poster (Dutch) eloquently argues, having more plastic and electronics produced in China, shipped to Europe and sold here for €18 or so probably does not have a net positive effect on the environment or your wallet, but well, given this button found its way to me, I might as well see if I can make it do something useful.
I had previously started a project to make a "Next" button for Spotify. You could carry it around, and whenever you pressed it, it would skip to the next song using the Spotify API (wirelessly, with an ESP8266 inside). I had a basic prototype working, but then the project stalled on two things: figuring out an enclosure and finding sufficiently low-power addressable RGB LEDs. Documentation about the latter is lacking, so I resorted to testing two dozen different types of LEDs and creating a website to collect specs and test results for addressable LEDs. That website then ended up in the big collection of other Yak-shaving projects waiting for this magical moment where I suddenly have a lot of free time.
In any case, it seemed interesting to see if this Ecobutton could be used as a poor man's Spotify next button. Not super useful, but at least now I can keep the button around knowing I can actually use it for something in the future. I also produced some useful (and not readily available) documentation about remapping keys with hwdb in the process, so it was at least not a complete waste of time... Anyway, into the technical details...
Or: Forcing Linux to use the USB HID driver for a non-standards-compliant USB keyboard.
For an interactive art installation by the Spullenmannen, a friend asked me to have a look at an old paint mixing terminal that he wanted to use. The terminal is essentially a small computer, in a nice industrial-looking sealed casing, with a (touch?) screen, keyboard and touchpad. It was by "Lacour" and I think has been used to control paint mixing machines.
They had already gotten Linux running on the system, but could not get the keyboard to work and asked me if I could have a look.
The keyboard did work in the BIOS and grub (which also uses the BIOS), so we knew the hardware worked. Also, the BIOS seemed pretty standard, so it was unlikely that the keyboard used some very non-standard protocol or driver. I guessed that this was a matter of telling Linux which driver to use and/or where to find the device.
Inside the machine, it seemed the keyboard and touchpad were separate devices, controlled by some off-the-shelf microcontroller chip (probably with some custom software inside). These devices were connected to the main motherboard using a standard 10-pin expansion header intended for external USB ports, so it seemed likely that these were USB devices.
Recently, a customer asked me to have a look at an external hard disk he was using with his MacBook. It would show a file listing just fine, but when trying to open actual files, it would start failing. Of course there was no backup, but the files were very precious...
This started out as a small question, but ended up in an adventure that spanned a few days. It took me deep into the ddrescue recovery tool, through the HFS+ filesystem and past USB power port control. I learned a lot, discovered some interesting things and produced a pile of scripts that might be helpful to others. Since the journey seems as interesting as the end result, I will describe the steps I took here, "ter leering ende vermaeck" (for instruction and amusement).
I was previously running an ancient Windows XP install under VirtualBox for the occasional time I needed Windows for something. However, VirtualBox is no longer supplied as of Debian Stretch (due to security policy problems), so I've been experimenting with QEMU, KVM and virt-manager. Migrating my existing VirtualBox XP installation to virt-manager didn't work (it simply wouldn't boot), and I do not have any spare Windows keys lying around. I do, however, have Windows 7 installed alongside my Linux on a different partition, so I decided to see if I could get that to boot inside QEMU/KVM.
An obvious problem is the huge change in hardware between the real and virtual environment, but apparently recent Windows versions don't really mind this in terms of drivers, but the activation process could be a problem, especially when booting both virtually and natively. So far I have not seen any complications with either drivers or activation, not even after switching to virtio drivers (see below). I am using an OEM (preactivated?) version of Windows, so that might help in this area.
Update: When booting Windows in the VM a few weeks later, it started bugging me that my Windows was not genuine, and it seems it is no longer activated. Clicking the "resolve now" link gives a broken webpage, and going through system properties suggests contacting Lenovo (my laptop provider) to resolve this (or buying a new license). I'm not yet sure if this is really problematic, though. This happened shortly after replacing my hard disk, but I'm not sure if that's actually related.
Rebooting into Windows natively shows it is activated (again or still), but booting it virtually directly after that still shows as not activated...
Booting the installation was actually quite painless: I just used the wizard inside virt-manager, entered /dev/sda (my primary hard disk) as the storage device, pressed start, selected Windows in my bootloader and it booted Windows just fine.
Booting is not really fast, but once it runs, things are only a bit sluggish, which is acceptable.
One caveat is that this adds the entire disk, not just the Windows partition. This also means the normal bootloader (grub in my case) will be used inside the VM, which will happily boot the normal default operating system. Protip: Don't boot your Linux installation inside a VM inside that same Linux installation, because both instances will end up fighting over your filesystem. Thanks to fsck, which seems to have fixed the resulting garbage so far...
To prevent this, make sure to actually select your Windows installation in the bootloader. See below for a more permanent solution.
In some Arduino / C++ project, I was using a custom assert() macro that would show an error message, along with the current filename and line number, if the assertion failed. The filename was automatically retrieved using the __FILE__ macro. However, this macro returns a full path, while we had only little room to show it, so we wanted to show only the filename.
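For illustration, a runtime version of such a macro might look roughly like the sketch below. It chops off the directory part with strrchr only when the assertion fails, so the full __FILE__ path is still stored in the program. The post does not show the project's actual macro, so the names MY_ASSERT and runtime_basename are hypothetical. printf stands in for whatever output an Arduino sketch would really use (such as Serial), and unlike a real assert this sketch only prints and does not halt.

```cpp
#include <cstdio>
#include <cstring>

// Return the part of path after the last '/', or path itself if it
// contains no '/'. This runs at runtime, so the full path passed in
// (e.g. from __FILE__) still has to be stored in the program image.
inline const char *runtime_basename(const char *path) {
    const char *slash = std::strrchr(path, '/');
    return slash ? slash + 1 : path;
}

// Hypothetical assert macro: on failure, print the basename of the
// current file and the line number. It does not halt the program.
#define MY_ASSERT(cond)                                          \
    do {                                                         \
        if (!(cond)) {                                           \
            std::printf("Assertion failed at %s:%d\n",           \
                        runtime_basename(__FILE__), __LINE__);   \
        }                                                        \
    } while (0)
```

If MY_ASSERT(x > 0); is placed in a file such as /home/user/project/src/main.cpp, it prints something like "Assertion failed at main.cpp:" followed by the line number whenever x is not positive.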
Until now, we've been storing the full filename, and when an assert was triggered we would use the strrchr function to chop off all but the last part of the filename (commonly called the "basename") and display only that. This works just fine, but it is a waste of flash memory, storing all these (mostly identical) paths. Additionally, when an assertion fails, you want to get a message out ASAP, since who knows what state your program is in.
Neither of these is really a showstopper for this particular project, but I suspected there would be some way to use C++ constexpr functions and templates to force the compiler to handle this at compile time, and only store the basename instead of the full path. This week, I took up the challenge and made something that works, though it is not completely pretty yet.
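Working out where the path ends and the basename starts is fairly easy using something like strrchr. Of course, that's a runtime version, but it is easy to write a constexpr version by implementing it recursively, which allows the compiler to evaluate these functions at compile time. Recursion is needed because, in C++11, a constexpr function body may contain little more than a single return statement, so loops are not an option.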
For example, here are constexpr versions of strrchrnul(), basename() and strlen():
#include <stddef.h>

/**
 * Return the last occurrence of c in the given string, or a pointer to
 * the trailing '\0' if the character does not occur. This should behave
 * just like the regular strrchrnul function.
 */
constexpr const char *static_strrchrnul(const char *s, char c) {
    /* C++14 version
    if (*s == '\0')
        return s;
    const char *rest = static_strrchrnul(s + 1, c);
    if (*rest == '\0' && *s == c)
        return s;
    return rest;
    */

    // Note that we cannot implement this while returning nullptr when the
    // char is not found, since looking at (possibly offset) pointer
    // values is not allowed in constexpr (not even to check for
    // null/non-null).
    return *s == '\0'
        ? s
        : (*static_strrchrnul(s + 1, c) == '\0' && *s == c)
            ? s
            : static_strrchrnul(s + 1, c);
}

/**
 * Return one past the last separator in the given path, or the start of
 * the path if it contains no separator.
 * Unlike the regular basename, this does not handle trailing separators
 * specially (so it returns an empty string if the path ends in a
 * separator).
 */
constexpr const char *static_basename(const char *path) {
    return (*static_strrchrnul(path, '/') != '\0'
        ? static_strrchrnul(path, '/') + 1
        : path
    );
}

/** Return the length of the given string */
constexpr size_t static_strlen(const char *str) {
    return *str == '\0' ? 0 : static_strlen(str + 1) + 1;
}
So, to get the basename of the current filename, you can now write:
constexpr const char *b = static_basename(__FILE__);
However, that just gives us a pointer halfway into the full string literal. In practice, this means the full string literal will be included in the link, even though only a part of it is referenced, which voids the space savings we're hoping for. I confirmed this on avr-gcc 4.9.2, but I do not expect newer compiler versions to be smarter about this, since the linker is involved.
To solve that, we need to create a new char array variable that contains just the part of the string that we really need. As happens more often when I look into complex C++ problems, I came across a post by Andrzej Krzemieński, which shows a technique to concatenate two constexpr strings at compile time. (His blog has a lot of great posts on similar advanced C++ topics, a recommended read!) In that post, he has a similar problem: he needs to define a new variable that contains the concatenation of two constexpr strings.
For this, he uses some smart tricks with parameter packs (variadic template arguments). These allow declaring an array and setting its initial value by indexing into a pointer (e.g. char foo[] = {ptr[0], ptr[1], ...}). One caveat is that the length of the resulting string is part of its type, so it must be specified using a template argument. In the concatenation case, this can easily be derived from the types of the strings to concatenate, which gives nice and clean code.
In my case, the length of the resulting string depends on the contents of the string itself, which is more tricky. There is no way (that I'm aware of, suggestions are welcome!) to automatically deduce a template argument based on the value of a non-template argument. What you can do is use constexpr functions to calculate the length of the resulting string, and explicitly pass that length as a template argument. You also need to pass the contents of the new string as a normal argument, since template parameters cannot be arbitrary pointer-to-strings, only addresses of variables with external linkage. This introduces a bit of duplication.
Applied to this example, this would look like this:
constexpr const char *basename_ptr = static_basename(__FILE__);
constexpr auto basename = array_string<static_strlen(basename_ptr)>(basename_ptr);
This uses the static_string library published along with the above blogpost. For this example to work, you will need some changes to the static_string class (to make it accept regular char* as well); see this pull request for the version I used.
The resulting basename variable is an array_string object, which contains just a char array holding the resulting string. You can use array indexing on it directly to access individual characters, implicitly convert it to const char*, or explicitly convert it using basename.c_str().
So, this solves my requirement pretty neatly (saving a lot of flash space!). It would be even nicer if I did not need to repeat the basename_ptr above, or could move the duplication into a helper class or function, but that does not seem to be possible.
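For reference, here is a self-contained sketch of the same idea that does not need the static_string library. It computes the length with a constexpr function, passes that length explicitly as a template argument, and expands a parameter pack of indices to initialize a new char array. This is not the code from this post: fixed_string, make_fixed_string, basename_of and the STATIC_BASENAME macro are hypothetical names.

The sketch assumes C++14 for std::make_index_sequence from <utility>. AVR toolchains usually ship without the C++ standard library, so there you would have to write your own index sequence template. basename_of is a variant of static_basename that makes a single recursive call per character, instead of repeating the recursive call within one expression. This keeps compile-time evaluation cheap for long paths.

```cpp
#include <cstddef>
#include <utility>

// Walk the string once, remembering the position just after the most
// recent '/'. A single return statement, so also valid C++11 constexpr.
constexpr const char *basename_from(const char *p, const char *base) {
    return *p == '\0' ? base
                      : basename_from(p + 1, *p == '/' ? p + 1 : base);
}

constexpr const char *basename_of(const char *path) {
    return basename_from(path, path);
}

constexpr std::size_t static_strlen(const char *str) {
    return *str == '\0' ? 0 : static_strlen(str + 1) + 1;
}

// A fixed-size, NUL-terminated copy of a string; the length is part of
// the type, so it has to be known at compile time.
template <std::size_t N>
struct fixed_string {
    char data[N + 1];
    constexpr const char *c_str() const { return data; }
};

// Expands to {s[0], s[1], ..., s[N-1], '\0'} using the index pack I...
template <std::size_t N, std::size_t... I>
constexpr fixed_string<N> make_fixed_string_impl(const char *s,
                                                 std::index_sequence<I...>) {
    return fixed_string<N>{{s[I]..., '\0'}};
}

// N must be passed explicitly: it depends on the contents of s, which
// cannot be deduced from a function argument.
template <std::size_t N>
constexpr fixed_string<N> make_fixed_string(const char *s) {
    return make_fixed_string_impl<N>(s, std::make_index_sequence<N>{});
}

// Hypothetical convenience macro: the duplication is still there, but
// it is hidden from the call site.
#define STATIC_BASENAME(path) \
    make_fixed_string<static_strlen(basename_of(path))>(basename_of(path))
```

With this, static constexpr auto file_base = STATIC_BASENAME(__FILE__); gives a fixed_string holding just the file name, accessible through file_base.c_str(). The macro does not remove the duplication mentioned above. It only moves it out of the call site, and the path expression is still evaluated twice at compile time.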
I recently upgraded my systems to Debian Stretch, which caused GnuPG to stop working within Mutt. I'm not exactly sure what was wrong, but I discovered two things. First, GnuPG version 2 changed quite a few things and relies more heavily on the gpg-agent. Second, recent SSH versions can forward Unix domain sockets instead of just TCP sockets, which allows forwarding a gpg-agent connection over SSH.
Until now, I had my GPG private keys stored on my server, Tika, where my Mutt mail client also runs. However, storing private keys, even with a passphrase, on a permanently connected multi-user system never felt quite right. So this seemed like a good opportunity to set up proper forwarding for my gpg-agent, and keep my private keys confined to my laptop.
I already had some small scripts in place to easily connect to my server through SSH, attach to the remote tmux session (or start it), set up some port forwards and quickly reconnect when the connection fails. The port forwards included, in particular, a reverse port forward for SSH so my mail client and IRC client could open links in my browser. However, one annoyance was that when the connection fails, the server might not immediately notice. Reconnecting therefore usually left me with failed port forwards, since the remote listening port was still taken by the old session. This seemed like a good occasion to fix that as well.
The end result is a reasonably complex script that is probably worth sharing here. The script can be found in my scripts git repository. On the server, it calls an attach script, but that's not much more than attaching to tmux, or starting a new session with some windows if no session is running yet.
The script is reasonably well-commented, including an introduction on what it can do, so I will not repeat that here.
For the GPG forwarding, I based my setup on this blogpost. There, they suggest configuring an extra-socket in gpg-agent.conf. However, I found that gpg-agent already created an extra socket (whose path I could query with gpgconf --list-dirs), so I didn't use that extra-socket configuration line. They also talk about setting StreamLocalBindUnlink to clean up a lingering socket when creating a new one, but my script already handles that instead.
Furthermore, I added no-autostart to ~/.gnupg/gpg.conf. This prevents gnupg from autostarting a gpg-agent on the server side (in case the forwarding fails, or when I connect without this script, etc.). I'm not running a systemd user session on my server. If you are, you might need to disable or mask some ssh-agent sockets and/or services to prevent systemd from creating sockets for ssh-agent and starting it on demand.
My next step is to let gpg-agent also be my ssh-agent (or perhaps just use plain ssh-agent) to enforce confirming each SSH authentication request. I'm currently using gnome-keyring / seahorse as my SSH agent, but that just silently approves everything, which doesn't really feel secure.
On a small embedded system, I wanted to run a simple Rails application and have it automatically start up at system boot. The system is running systemd, so a systemd service file seemed appropriate to start the rails service.
Normally, when you run the Ruby on Rails standalone server, it binds to port 3000. Binding to port 80 normally requires root (or granting a special capability to the ruby binary, which then applies to all Ruby programs), but I don't want to run the Rails server as root. AFAIU, normal deployments use something like Nginx to open port 80 and let it forward requests to the Rails server, but I wanted a minimal setup, with just the Rails server.
An elegant way to bind port 80 without running as root is to use systemd's socket activation feature. Using socket activation, systemd (running as root) opens up a network port before starting the daemon. It then starts the daemon, which inherits the open network socket file descriptor, along with some environment variables to indicate this. Apart from allowing privileged ports without root, this has other advantages, such as on-demand starting, easier parallel startup and seamless restarts and upgrades (none of which is really important for my usecase, but it is still nice :-p).
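To make the inherited-socket part more concrete, here is a minimal C++ sketch of the daemon side of this protocol, modeled on how systemd's sd_listen_fds() works. systemd sets LISTEN_PID to the daemon's PID and LISTEN_FDS to the number of passed sockets, which start at file descriptor 3. This is a generic illustration, not what Rails itself does. The function name listen_fds_from_env is hypothetical, and a real C or C++ daemon would normally just call sd_listen_fds() from libsystemd.

```cpp
#include <cstdlib>
#include <unistd.h>

// systemd passes activated sockets starting at this file descriptor
// (SD_LISTEN_FDS_START in <systemd/sd-daemon.h>).
constexpr int kListenFdsStart = 3;

// Return how many sockets systemd passed to this process, or 0 if it was
// not socket-activated. Unlike sd_listen_fds(), this sketch neither
// unsets the environment variables nor sets FD_CLOEXEC on the sockets.
int listen_fds_from_env() {
    const char *pid_str = std::getenv("LISTEN_PID");
    const char *fds_str = std::getenv("LISTEN_FDS");
    if (pid_str == nullptr || fds_str == nullptr)
        return 0;  // started normally, not via a .socket unit

    // The variables are meant only for the process systemd started
    // directly; a child process that inherited them must ignore them.
    if (std::strtol(pid_str, nullptr, 10) != static_cast<long>(getpid()))
        return 0;

    long n = std::strtol(fds_str, nullptr, 10);
    return n > 0 ? static_cast<int>(n) : 0;
}
```

If this returns n > 0, the daemon calls accept() on descriptors kListenFdsStart up to kListenFdsStart + n - 1, instead of creating and binding its own socket. This is what lets it serve a privileged port such as 80 without root privileges.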
On my mailserver, I'm using SpamAssassin with a Bayes filter to detect spam. Such a filter needs to be trained with samples of spam and ham (non-spam) messages to let it learn what spam and ham look like, and it also needs to be retrained when the spam or ham changes over time. I have some automatic training set up, but for a while now I've seen the Bayes filter being completely wrong (showing a confident ham score for something that is very clearly spam). So I decided to retrain the filter from scratch, using the spam and ham messages I collected over time (I don't really throw away any e-mail).
Training with all my e-mail is not productive: more than 5,000 messages aren't really helpful AFAIU, and old messages are not representative of current messages. So I decided to take the most recent 2,000 spam and 2,000 ham messages and train with those. My spam is neatly collected in 2 mailboxes (Spam for obvious spam and ProbablySpam for messages that need an occasional review to find false positives), but my ham is sorted into dozens of different mailboxes. Hence, I needed some find magic to get a list of the most recent spam and ham messages. So, I built these commands:
# find Spam ProbablySpam -type f \( -path '*/cur/*' -o -path '*/new/*' \) -printf "%T@ %p\n" \
| sort -n | cut -d' ' -f 2 | tail -n 2000 > spam
# find . -type d \( -path ./Spam -o -path ./ProbablySpam -o -path ./Bulk -o -path ./Sent \) -prune -o \
-type f \( -path '*/cur/*' -o -path '*/new/*' \) -printf "%T@ %p\n" \
| sort -n | cut -d' ' -f 2 | tail -n 2000 > ham
# sa-learn --progress --spam -f spam
# sa-learn --progress --ham -f ham
After retraining with recent spam, the results were a lot better, so I'm no longer spending time every day deleting a couple dozen spam e-mails :-D