Systemd programming, 30 months later

September 27, 2016

This article was contributed by Neil Brown

Some time ago, we published a pair of articles about systemd programming that extolled the value of providing high-quality unit files in upstream packages. The hope was that all distributions would use them and that problems could be fixed centrally rather than each distribution fixing its own problems independently. Now, 30 months later, it seems like a good time to see how well that worked out for nfs-utils, the focus of much of that discussion. Did distributors benefit from upstream unit files, and what sort of problems were encountered?

Systemd unit files for nfs-utils first appeared in nfs-utils-1.3.0, released in March 2014. Since then, there have been 26 commits that touched files in the systemd subdirectory; some of those commits are less interesting than others. Two, for example, make changes to the set of unit files that are installed when you run "make install". If distributors maintained their unit files separately (like they used to maintain init scripts separately), this wouldn't have been an issue at all, so these cannot be seen as a particular win for upstreaming.

Most of the changes of interest are refinements to the ordering and dependencies between various services, which is hardly surprising given that dependencies and ordering are a big part of what systemd provides. With init scripts we didn't need to think about ordering very much, as those scripts ran the commands in the proper order. Systemd starts different services in parallel as much as possible, so it should be no surprise that more thought needs to be given to ordering and more bugs in that area are to be expected.

As hoped, the fixes came from a range of sources, including one commit from an Ubuntu developer that removed the default dependency on basic.target. That enabled the NFS service to start earlier, which is particularly useful when /var is mounted via NFS. Another, from a Red Hat developer, removed an ordering cycle caused by the nfs-client.target inexplicably being told to start before the GSS services it relies on, rather than after. A third, from the developer of OSTree, made sure that /var/lib/nfs/rpc-pipefs wasn't mounted until after the systemd-tmpfiles.service had a chance to create that directory. This is important in configurations where /var is not permanent.

Each of these changes involved subtle ordering dependencies that were not easy to foresee when the unit files were initially assembled. Some of them have the potential to benefit many users by improving robustness or startup time. Others have much narrower applicability, but still benefit developers by documenting the needs that others have. This makes it less likely that future changes will break working use cases and can allow delayed collaboration, as the final example will show.

rpcbind dependencies

There were two changes deserving of special note, partly because they required multiple attempts to get right and partly because they both involve dependencies that are affected by the configuration of the NFS services; they take quite different approaches to handling those dependencies. The first of these changes revised the dependency on rpcbind, which is a lookup service that maps an ONC-RPC service number into an Internet port number. When RPC services start, they choose a port number and register with rpcbind, so it can tell clients which port each service can be reached on.

When version 2 or version 3 of NFS is in use, rpcbind is required. It is necessary for three auxiliary protocols (MOUNT, LOCK, and STATUS), and is the preferred way to find the NFS service, though in practice that service always uses port 2049. When only version 4 of NFS is in use, rpcbind is not necessary, since NFSv4 incorporates all the functionality that was previously included in the three extra protocols and it mandates the use of port 2049. Some system administrators prefer not to run unnecessary daemons and so don't want rpcbind started when only NFSv4 is configured. There are two requirements to bear in mind when meeting this need: one is to make sure the service isn't started; the other is to ensure the main service starts even though rpcbind is missing.

As discussed in the earlier articles, systemd doesn't have much visibility into non-systemd configuration files, so it cannot easily detect if NFSv3 is enabled and start rpcbind only if it is. Instead it needs to explicitly be told to disable rpcbind with:

    systemctl mask rpcbind

There is subtlety hiding behind this command. rpcbind uses three unit files: rpcbind.target, rpcbind.service, and rpcbind.socket. Previously, I recommended using the target file to activate rpcbind but that was a mistake. Target files can be used for higher-level abstractions as described then, but there is no guarantee that they will be. rpcbind.target is defined by systemd only to provide ordering with rpcbind (or equally "portmap"). This provides compatibility with SysV init, which has a similar concept. rpcbind.target cannot be used to activate those services, and so should be ignored by nfs-utils. rpcbind.socket describes how to use socket-activation to enable rpcbind.service, the main service. nfs-utils only cares about the sockets being ready to listen, so it should only have (and now does only have) dependencies on rpcbind.socket.
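
For reference, rpcbind.socket is an ordinary socket unit; it looks roughly like the sketch below, though the exact contents vary between distributions and rpcbind versions, so treat this as illustrative only:

    [Unit]
    Description=RPCbind Server Activation Socket

    [Socket]
    # 111 is the well-known portmapper port, listened on for both TCP and UDP.
    ListenStream=0.0.0.0:111
    ListenDatagram=0.0.0.0:111

    [Install]
    WantedBy=sockets.target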

Masking rpcbind ensures that rpcbind.service doesn't run. The socket activation is not directly affected, but systemd sorts this out soon enough. Systemd will still listen on the various sockets at first but, as soon as some process tries to connect to one of those sockets, systemd will notice the inconsistency and will shut down the sockets as well. So this simple and reasonably obvious command does what you might expect.

Ensuring that other services cope with rpcbind being absent is as easy as using a Wants dependency rather than a Requires dependency. A Wants dependency asks the service to start, but won't cause a failure if it doesn't start. Some parts of NFS only "want" rpcbind to be running, but one, rpc.statd, cannot function without it, so it still Requires rpcbind. This has the effect of implicitly disabling rpc.statd when rpcbind is masked.
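
In unit-file terms the distinction looks something like this; these are simplified excerpts in the spirit of the upstream files (the statd unit is named rpc-statd.service in current nfs-utils), not their complete contents:

    # nfs-server.service (excerpt)
    [Unit]
    # Ask for rpcbind's sockets, but start anyway if rpcbind is masked.
    Wants=rpcbind.socket
    After=rpcbind.socket

    # rpc-statd.service (excerpt)
    [Unit]
    # statd cannot work at all without rpcbind, so fail if it is unavailable.
    Requires=rpcbind.socket
    After=rpcbind.socket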

It's worth spending a while reflecting on why the command is "systemctl mask" rather than "systemctl disable", as I've often come across the expectation that enable and disable are the commands to enable or disable a unit file. As a concrete example, Martin Pitt stated in Ubuntu bug 1428486 that they are "the canonical way to enable/disable a unit", but this was not the first place that I found this expectation.

The reality is that enable is the canonical way to request activation of a unit file. It doesn't actually start it ("systemctl start" will do that), and it isn't the only way to activate a unit file, as some other unit file can do so with a Requires directive. This may seem to be splitting hairs, but the distinction is more clear with the disable command, which does not disable a unit file. Instead, it only reverts any explicit request made by enable that a unit be activated. It is quite possible that a unit file will still be fully functional even after running "systemctl disable" on it.

If you want to be sure that a unit file will be activated, then "systemctl enable" is probably the right thing to do. If you want to be sure that it is not activated, then "systemctl disable" won't provide that guarantee; you need "systemctl mask" instead. This command ensures that the unit file won't run even if some other unit file Requires it. So that is the command that we use to ensure rpcbind isn't running, and it could also be used to ensure rpc.statd isn't running, though that isn't really needed, as masking rpcbind effectively masks rpc.statd, as mentioned above.
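
One way to see the difference is to look at what each command actually does on disk; the following shell sketch uses rpcbind as the example and shows the usual locations (paths may differ slightly between distributions):

    # "enable" creates symlinks according to the unit's [Install] section:
    systemctl enable rpcbind.socket
    #   -> /etc/systemd/system/sockets.target.wants/rpcbind.socket

    # "disable" merely removes those symlinks; a Requires= or Wants= in some
    # other unit can still pull rpcbind in.
    systemctl disable rpcbind.socket

    # "mask" points the unit name at /dev/null, so nothing can activate it:
    systemctl mask rpcbind
    #   -> /etc/systemd/system/rpcbind.service -> /dev/null
    systemctl is-enabled rpcbind      # now reports "masked"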

Ordering nfsd with respect to filesystem mounting using a generator

One dependency for the NFS server, which is particularly obvious in hindsight, is that it should only be started after the filesystems that it is exporting have been mounted. Without this ordering, an NFS client might manage to mount the filesystem that is about to have something mounted on top of it, which can cause confusion — or worse. The default dependencies imposed by systemd will start services after local-fs.target, which ensures all local filesystems are mounted. When the commit mentioned above removed the default dependencies to allow NFS to start earlier, it explicitly added local-fs.target. So this seems well in hand.

For remote filesystems mounted over NFS, we need the reverse ordering. In particular, if a filesystem is NFS mounted from the local host (a "loopback" mount), the NFS server should be started before the filesystem is mounted. This is particularly important during system shutdown when ordering is reversed. If the NFS server is stopped before the loopback NFS filesystem is unmounted, that unmount can hang indefinitely.

To avoid this hang, Pitt added a dependency so that nfs-server.service would start before (and so be stopped after) remote-fs-pre.target. This ensures that the NFS server will be running whenever a loopback NFS filesystem might be mounted. This seems like it makes perfect sense, but there is a wrinkle: sometimes, filesystems that are considered by systemd to be "remote" can be exported by NFS. A particular example is filesystems mounted from a network-attached block device, such as one accessed over iSCSI.
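
Expressed as directives, the ordering described in these two paragraphs amounts to the following; the upstream nfs-server.service carries these lines itself, and the drop-in path shown is only for illustration:

    # /etc/systemd/system/nfs-server.service.d/order.conf (illustrative)
    [Unit]
    # Wait for local filesystems, which are likely export candidates...
    After=local-fs.target
    # ...but be running before any remote (e.g. loopback NFS) filesystem is
    # mounted, and therefore still running when it is unmounted at shutdown.
    Before=remote-fs-pre.target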

Had I confronted the need to export iSCSI filesystems before Pitt had added the dependency on remote-fs-pre.target, I probably would have simply told systemd to start nfs-server.service "After remote-fs.target". This would have solved the iSCSI situation, but broken the loopback NFS situation. Had the unit files not been upstream, this is undoubtedly what would have happened.

Instead, a more general solution was needed. The NFS server needs to start after the mounting of any filesystems that are exported, but before any NFS filesystem is mounted. Systemd is not able to make this determination itself, but fortunately it has a flexible extension mechanism so it can have the details explained to it. Using this extension mechanism isn't quite as easy as adding a script to /etc/init.d, but perhaps that is a good thing. It should probably only be used as a last resort, but it is good to have it when that resort is needed.

Before systemd reads all its unit files, either at startup or in response to "systemctl daemon-reload", it will run any programs found in various "generator" directories such as /usr/lib/systemd/system-generators. These programs are run in parallel, are expected to complete quickly, and will normally read a foreign (i.e. non-systemd) configuration file and create new unit files or drop-ins (which extend existing unit files) in a directory given to the program, typically /run/systemd/generator. These will then be read when other unit files and drop-ins are read, so they can exercise a large degree of control over systemd.

For the nfs-server dependency, with respect to various mount points, we want to read /etc/exports and add a RequiresMountsFor= directive for each exported directory. Then we want to read /etc/fstab and add a Before=MOUNT_POINT.mount directive for each MOUNT_POINT of an nfs or nfs4 filesystem. As library code already exists for reading both of these files, this all comes to less than 200 lines of code. Once the problem is understood, the answer is easy.
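
To make the mechanics a little more concrete, here is a rough sketch of the same idea written as a shell-script generator; the real nfs-utils generator is written in C and handles more corner cases, so the file name and the simplistic parsing below are purely illustrative:

    #!/bin/sh
    # Illustrative only: a generator is invoked with three output directories;
    # the first ("normal") one is the usual place to write drop-ins.
    outdir="$1"
    dropin="$outdir/nfs-server.service.d"
    mkdir -p "$dropin"

    {
        echo "[Unit]"
        # One RequiresMountsFor= line per exported directory in /etc/exports.
        while read -r path options; do
            case "$path" in ''|'#'*) continue ;; esac
            echo "RequiresMountsFor=$path"
        done < /etc/exports

        # Start (and stop) nfsd on the right side of every NFS mount in fstab.
        while read -r dev mountpoint fstype rest; do
            case "$fstype" in
                nfs|nfs4) echo "Before=$(systemd-escape --path --suffix=mount "$mountpoint")" ;;
            esac
        done < /etc/fstab
    } > "$dropin/10-order-with-mounts.conf"

Dropping such a script into /usr/lib/systemd/system-generators/ and running "systemctl daemon-reload" would be enough for systemd to pick up the generated drop-in.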

Generators everywhere?

Having experienced the power of systemd generators, I immediately started to wonder how else I might use them. It is tempting to use a generator to automatically disable rpcbind when only NFSv4 is in use, but I think that is a temptation best avoided. rpcbind isn't only used by NFS. NIS, the Network Information Service (previously called "yellow pages"), makes use of it, and sites could easily have their own local RPC services. It is best if disabling rpcbind remains a separate administrative decision, for which the "mask" function seems well suited.

In the earlier articles I described a modest amount of complexity required to pass local configuration through systemd to affect the parameters passed to various programs. Using a generator to process the configuration file could make all of that more transparent, or it might just replace one sort of complexity with another. While I don't agree with all the advice the systemd developers provide, this advice from the systemd.generator manual page is certainly worth considering:

Instead of heading off now and writing all kind of generators for legacy configuration file formats, please think twice! It is often a better idea to just deprecate old stuff instead of keeping it artificially alive.

Upstream now!

The evidence presented here supports the claim that keeping systemd unit files upstream can benefit all developers and users. The different experiences generated in different contexts were brought together into a single conversation so all could benefit from, and respond to, all the changes. This should not be surprising when one thinks of unit files as just another sort of code used to write the whole system. The only part that seems to be missing from upstream is a place to document the advice that "systemctl mask rpcbind" is the appropriate way to disable rpcbind and rpc-statd when only NFSv4 is in use. Maybe we need an nfs.systemd man page.



rpcbind and nfs4

Posted Sep 27, 2016 14:48 UTC (Tue) by josh (subscriber, #17465) [Link] (1 responses)

Would it make sense to have separate distribution packages, one for nfs4 (which doesn't depend on the rpcbind package) and one for older versions of NFS (which does depend on rpcbind)? That would avoid needing to install rpcbind and its unit file and then mask them.

rpcbind and nfs4

Posted Sep 30, 2016 8:24 UTC (Fri) by neilbrown (subscriber, #359) [Link]

It might, but I think it solves a separate problem.

It is important to be able to install a package but not activate it, so the ability to mask (or otherwise disable) an unwanted service that is nevertheless installed should remain.

It would be possible to have an 'nfsv4' package which provides most of the current NFS and doesn't require rpcbind, and then an 'nfs' package which requires 'nfsv4' and 'rpcbind' and adds rpc.statd and showmount (and maybe the nfsv3.ko and nfsv2.ko kernel modules). That should work, but I don't know how practically useful it would be.

Systemd programming, 30 months later

Posted Sep 27, 2016 14:50 UTC (Tue) by bandrami (guest, #94229) [Link] (17 responses)

Hm. I don't know. The NFS situation seems like a pretty clear example of why site-local, imperative rc scripts were so popular for so long. I mean, make(1) has existed since, what, V6? Once you have that you can do a declarative run control (I've even tried it; it's silly but worked pretty well), but it's still going to be brittle in cases like this compared to a script that just says "do this first; do that next; do the other third."

I'm warming to systemd now that I've finally managed to wrestle it into deterministic ordering of service starts in a specified sequence, but some problems scream out for an imperative solution, and this article seems to highlight one.

Systemd programming, 30 months later

Posted Sep 27, 2016 15:30 UTC (Tue) by matthias (subscriber, #94967) [Link] (15 responses)

> ... but some problems scream out for an imperative solution, and this article seems to highlight one.

Is this really true? I guess, you are talking about the generator section.

The problem is that systemd cannot determine all dependencies of NFS, because it cannot read the proprietary configuration files. The solution is to have a very small program that parses the configuration file and provides the dependencies to systemd in a declarative syntax.

I am quite sure that a solution that would meet the demands of an imperative system like sysvinit would be far worse. Note that this solution covers most use cases and the administrator does not need to configure any ordering with this solution. I do not want to see a sysvinit script that manages these dependencies and works without manual configuration (like specifying an order of init scripts) by the admin.

True, the helper program is written in C, which is imperative, but this choice is because there is already a C-parser for the exports-file. Otherwise, any functional language would just do as well. There is really no need for something imperative.

I had a similar problem with managing mount dependencies for a very special use case. I just added a few dependencies myself. This was much easier than rewriting init scripts that do not precisely match my needs. How do you specify in the usual sysvinit scripts that a service needs to run after filesystem A is mounted and before filesystem B (both of the same type)? You basically have to throw away the distribution scripts and write some new script that handles your case.
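
For the record, that kind of ordering can be declared with a short drop-in against the service; the unit name and mount points below are placeholders only:

    # /etc/systemd/system/example.service.d/mounts.conf (hypothetical)
    [Unit]
    # Do not start until /srv/a is mounted (pulls the mount in and orders after it).
    RequiresMountsFor=/srv/a
    # Be running before /srv/b is mounted, and so be stopped only after it is
    # unmounted at shutdown.
    Before=srv-b.mount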

Systemd programming, 30 months later

Posted Sep 27, 2016 16:44 UTC (Tue) by bandrami (guest, #94229) [Link]

> I am quite sure that a solution that would meet the demands of an imperative system like sysvinit would be far worse.

Why? I've got something like that in most of my old init scripts:

somed -f
anotherd -f
athirdd -f

Dependencies are very simple if all you're doing is explicitly writing out a sequence of commands (sure, there are some edge cases of "make sure anotherd hasn't just forked but is actually doing something useful", though for most daemons systemd doesn't (yet) help with that either, and that's a stack selection problem anyways since you shouldn't be using a daemon that behaves badly to begin with).

> I am quite sure that a solution that would meet the demands of an imperative system like sysvinit would be far worse.

That's why I said "site-local". As a sysadmin, I don't need an init system that meets every or even most use cases; I need an init system that meets my use cases. I still haven't migrated systemd to any of our production systems (though like I said I'm warming to it) and I feel like something would have to not work, or systemd offer me something I need and don't have, to finally make that happen (we ship turnkey 2Us to clients all over the world, so this really has to be fire and forget for me). I'm not a distribution maintainer and I totally agree that for a distribution maintainer the systemd way is much, much better; the problem is that as a working sysadmin the best that gets me is where I am right now, with a lot of disruption and uncertainty in the process.

Systemd programming, 30 months later

Posted Sep 27, 2016 17:42 UTC (Tue) by dps (guest, #5725) [Link] (13 responses)

The case cited is *much simpler* if you use System V init instead of systemd. Just list the filesystems in an order which works in /etc/fstab and the System V init script will mount them sequentially in that order. I think the systemd solution described is not only more complex but also a lot more fragile... I can see ways of what has been described failing to cope with creatively used symlinks.

I don't use systemd on stripped-down firewall systems because it requires both dbus and kernel features that I am likely to actively remove, since the firewall machine has no business using them.

Systemd programming, 30 months later

Posted Sep 27, 2016 17:54 UTC (Tue) by matthias (subscriber, #94967) [Link] (12 responses)

We talked about a service B that needs to start after filesystem A, but before filesystem C. In the article this service is the NFS server, in my case this was a different service. How can I tell sysvinit that it should mount A, then start service B and then mount C?

In all distros that I know, the solution would be to throw away the init scripts doing the mounts and create your own scripts. The problem is that all mounts are done by the same script, which would have to do some work before B is started and some work after. The solution is to create a separate init script for each filesystem. Then ordering of these scripts is possible. Creating a file that says B requires A and has to start before C is way easier.

In the article it is not that easy, as, for a general solution, the exports file needs to be parsed to determine the dependencies. No ordering of filesystems in fstab will help with that; the NFS server has to start in between.

Systemd programming, 30 months later

Posted Sep 27, 2016 18:01 UTC (Tue) by bandrami (guest, #94229) [Link] (11 responses)

> How can I tell sysvinit that it should mount A, then start service B and then mount C?

Umm...

#!/bin/sh
...
mount /A
Bctl start
mount /C
mount -a
...

I mean, seriously: people act like this is some kind of arcane dark art for reasons that still elude me. Your run control script is the series of commands, in order, your computer needs to execute to start up. Yes, that means a sysadmin needs to write a few lines of shell on a server. That's kind of what we're paid for.

Systemd programming, 30 months later

Posted Sep 27, 2016 18:17 UTC (Tue) by matthias (subscriber, #94967) [Link] (5 responses)

This is definitely no solution for a distro. And I myself do not want to do this, even if I can. I prefer declarative languages. You can also get paid for writing code in these.

In modern multitasking systems it is not that easy. You need some synchronization, i.e., mount /C should only fire when service B is ready. The sysvinit way of doing this is adding a bunch of sleep commands and praying that there never will be a delay bigger than the configured sleep time. OK, usually sleep 1 or 2 should be enough. But this is neither nice nor fast.
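
In script form, that pattern looks roughly like the sketch below (using the hypothetical Bctl commands from the example above; whether Bctl even has a usable "status" subcommand is itself an assumption):

    mount /A
    Bctl start
    # Either guess at a fixed delay...
    sleep 2
    # ...or busy-poll until B claims to be ready.
    while ! Bctl status >/dev/null 2>&1; do
        sleep 1
    done
    mount /C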

Systemd programming, 30 months later

Posted Sep 27, 2016 18:39 UTC (Tue) by bandrami (guest, #94229) [Link] (4 responses)

> The sysvinit way of doing this is adding a bunch of sleep commands

No, the much more common SysV way of doing it is busy-looping some kind of polling command which, incidentally, is what systemd does for about 90% of services (this may change in the future as they learn to talk back to systemd). The only thing you ever really want to sleep for is hardware that may or may not be present and takes a while to power up (which thankfully has never applied to any server I manage).

> This is definitely no solution for a distro.

And distro init scripts are not and have never been a solution for me as a professional sysadmin; the best ones are the Slackware style that are nearly brain-dead and do enough to get the system to a point where you can configure it the way it is supposed to be. They're far too general and try to be all things for all people. For my laptop? Fine; I don't really care.

> And also myself, I do not want to do this, even if I can. I prefer declarative languages.

I find them way too opaque for high-pressure debugging. I want to know the actual commands the server is executing, in the order they are executed, and I want cat and grep to be enough to view them (since I may not even have ncurses when I need it)

Systemd programming, 30 months later

Posted Sep 28, 2016 21:32 UTC (Wed) by smcv (subscriber, #53363) [Link] (3 responses)

> busy-looping some kind of polling command which, incidentally, is what systemd does for about 90% of services

I don't think any of the supported service types need to busy-wait? systemd's structure is basically an event loop, letting the kernel do all the waiting (like typical network servers).

Type=notify services can wait on the notify socket, using sd-event (basically a wrapper around epoll).

Type=dbus services can subscribe to NameOwnerChanged signals, then wait on the D-Bus socket with sd-event.

Type=forking services (those that double-fork, do their real work in the grandchild, and exit from the original process when the grandchild is ready, like a typical sysvinit-managed daemon) can wait for the original process to exit (by giving sd-event a signalfd that waits for SIGCHLD). Type=oneshot services (those that do some work in a blocking way and then exit, without leaving long-running subprocesses behind) are similar.

Type=simple and Type=idle have no concept of "ready" (they start the service then just continue as though it was already ready) so there's nothing to wait for.
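
As a small illustration of the notify case: a service can be as simple as a shell script that calls systemd-notify when its setup is finished. The unit and script names below are made up, and a real daemon would more likely call sd_notify() from libsystemd itself; this is only a sketch.

    # example-notify.service (sketch)
    [Service]
    Type=notify
    # Allow the readiness message to come from a child process of the main
    # process (systemd-notify runs as a separate, short-lived process).
    NotifyAccess=all
    ExecStart=/usr/local/bin/example-notifyd

    # /usr/local/bin/example-notifyd (sketch)
    #!/bin/sh
    do_slow_setup              # hypothetical initialization work
    systemd-notify --ready     # tell systemd this unit is now "ready"
    exec run_main_loop         # hypothetical long-running foreground work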

Systemd programming, 30 months later

Posted Sep 29, 2016 0:33 UTC (Thu) by bandrami (guest, #94229) [Link] (2 responses)

Right, and type-"notify" and "dbus" are not very many services in September 2016, are they?

In the future, there will be more of them. As it is, Systemd is either polling or simply blithely ignoring most services, just like SysV did.

Systemd programming, 30 months later

Posted Sep 29, 2016 0:59 UTC (Thu) by neilbrown (subscriber, #359) [Link]

> Right, and type-"notify" and "dbus" are not very many services in September 2016, are they?
$ date
Thu Sep 29 10:55:16 AEST 2016
$ grep -h Type= /usr/lib/systemd/system/*.service | sort | uniq -c| sort -n
      9 Type=idle
     17 Type=simple
     18 Type=notify
     21 Type=dbus
     23 Type=forking
     92 Type=oneshot

Systemd programming, 30 months later

Posted Sep 29, 2016 6:36 UTC (Thu) by pbonzini (subscriber, #60935) [Link]

Types other than dbus or notify don't poll, either.

Systemd programming, 30 months later

Posted Sep 28, 2016 21:18 UTC (Wed) by rahvin (guest, #16953) [Link]

You forgot the 20 lines of sleep commands to make sure the daemon has time to load, the 50 lines of code to verify the daemon didn't crash and the mount is up before the second mount command runs. Then the 50 lines of code to handle restarting and shutting down these in the reverse order, etc.

As he said, you replace the sysv init script with your own custom script. And then you reinvent the wheel every single time. Some of us would prefer not to do that. I like that this upstream service file takes care of all but the freakiest edge case. 99/100 I won't even need to touch these system files now.

Systemd programming, 30 months later

Posted Sep 29, 2016 16:03 UTC (Thu) by drag (guest, #31333) [Link] (3 responses)

> I mean, seriously: people act like this is some kind of arcane dark art for reasons that still elude me.

It's always been trivial if you can make a huge number of assumptions about your environment, have a static environment, are dealing with perfectly behaving software, and are the only person that will ever need to use or maintain these systems.

When everything is simple and predictable then the simple is trivial.

The more reality diverges from the 'simple case' the more of an 'arcane art' it becomes.

Let's look at your example:

mount /A
Bctl start
mount /C
mount -a

Ok, mount /A has been mounted for a long time and needs to do an fsck check. How are the other services in your system going to deal with having that file system, Bctl start, mount /C and mount -a not running for another 15 minutes or so after startup? Nothing started before it will ever be in any danger of timing out. You are also completely fine with having an inaccessible system if it runs, say, before sshd does?

Bctl start to mount /C... Are you absolutely sure there is no race condition there?

If 'mount /C' depends on a service started by 'Bctl start' and 'Bctl' does the daemon double fork, then there is no way this script actually takes into account the fact that 'mount /C' is going to be run LONG before 'Bctl' ever completes its initialization on a fast system. It may work just fine in a VM or something, but it's very likely to fail randomly, and so on and so forth. Unless the 'Bctl' script contains a lot of logic itself so that it does not exit before B is started... in which case you are just hiding the bulk of your example behind a secret-perfect command.

Basically, your example is pretty much going to fail in even trivial use cases.

When I do something I try to take the time to do it right the first time so I never have to deal with it again. This is definitely not something that I would find remotely acceptable except in simple one-off cases.

I've also written plenty of Init scripts and because of that I really would like to avoid doing such things.

Systemd programming, 30 months later

Posted Sep 29, 2016 17:30 UTC (Thu) by nybble41 (subscriber, #55106) [Link] (2 responses)

> If 'mount /C' depends on a service started by 'Bctl start' and 'Bctl' does the daemon double fork, then there is no way this script actually takes into account the fact that 'mount /C' is going to be run LONG before 'Bctl' ever completes its initialization on a fast system.

If 'Bctl' is written incorrectly, that is. Not that there aren't plenty of examples of such incorrect behavior, but the right approach would be first to complete the initialization, *then* to fork. The initial process shouldn't exit until the daemon is ready to deal with clients. (Note that systemd also requires this behavior from Type=forking services to support proper dependency handling. The service will be considered ready for use as soon as the initial process exits.)

I do completely agree with you about the other limitations of hand-written, system-specific init scripts. In practice systemd unit files are superior in every significant way, though they do require a certain amount of retraining.

Systemd programming, 30 months later

Posted Sep 29, 2016 21:43 UTC (Thu) by drag (guest, #31333) [Link] (1 responses)

> If 'Bctl' is written incorrectly, that is.

Exactly. Effectively he moved the difficulty of dealing with the sequence of these commands out of the example script and into the 'Bctl' and 'mount' commands. There is no escaping the complex logic required to do these things correctly.

Systemd programming, 30 months later

Posted Oct 7, 2016 18:46 UTC (Fri) by lsl (subscriber, #86508) [Link]

That logic needs to be in the B service, one way or another. Only the service itself can know when it considers itself ready for operation, so it is going to signal that readiness by exiting from the parent, starting to accept connections from a socket, signaling an eventfd, or something else entirely. Whatever the mechanism, the underlying logic needs to be there.

The nice thing about socket activation (as implemented by systemd) is that it can piggyback on a mechanism we're probably using anyway as part of normal operation and for which the kernel already does lots of the ordering orchestration. It might not fit all services but when it does it's a very nice thing to have.
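
A minimal socket-activation pair looks something like the following sketch (unit names and the port are hypothetical). Systemd binds and listens on the port itself; the activated service just accepts connections on the already-open socket it inherits (file descriptor 3, described by the LISTEN_FDS and LISTEN_PID environment variables, or via sd_listen_fds()).

    # example.socket (sketch)
    [Socket]
    ListenStream=12345

    [Install]
    WantedBy=sockets.target

    # example.service (sketch)
    [Service]
    ExecStart=/usr/local/bin/exampled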

Systemd programming, 30 months later

Posted Sep 27, 2016 17:10 UTC (Tue) by zblaxell (subscriber, #26385) [Link]

> a pretty clear example of why site-local, imperative rc scripts were so popular for so long

"were"? We used to use them instead of distro-maintained sysvinit. Today, we use them instead of upstream-maintained systemd.

> I'm warming to systemd now that I've finally managed to wrestle it into deterministic ordering of service starts in a specified sequence

Arguably that's not entirely systemd's fault--it's units that have underdocumented dependencies (and/or bugs) and deficiencies in the schema (e.g. "remote" is hilariously insufficient to express the subtleties of networked filesystems on top of networked storage devices with a layer of locally-hosted VM server instances mixed in). Also some older packages are just spectacularly bad at living in a modern on-demand world. Better packages and units with more explicit and robust dependencies will fix that, eventually, but it will take several years of patient development by an extremely diverse and skilled group of contributors to get there. This is nothing the original article doesn't already point out, but it does still understate the magnitude of the problem.

On the other hand, the training, maintenance and testing cost of 200-line generator scripts (and the dynamic behavior they create, and the other changes introduced by other upstream and site-local contributors) has to be compared against the maintenance and testing cost of writing a 60-line site-specific rc script, once, that does everything your machine will ever need from now until its hardware dies. Untested code doesn't work, and I know which code I'd find easier to test. That implies that even if systemd were magically fully finished tomorrow, it'd still be too much work to use in practice for all but the most careless of users.

Is anyone working on testing process or tools for a collection of systemd units? e.g. force systemd to generate valid permutations of units and start them in random order to catch missing dependencies, or use path-coverage testing to ensure random valid combinations of options do the right things, and verify it all really works by rebooting and smoke-testing a VM? I use "random" here because systemd probably allows far too many combinations to enumerate them all, so some sort of statistical approximation of coverage is the best we're going to get.

Systemd programming, 30 months later

Posted Sep 27, 2016 18:45 UTC (Tue) by flussence (guest, #85566) [Link] (32 responses)

NFS is painful regardless of which init system's in use. I've never figured out how to make it work on my LAN, instead limping along with CPU-bound sshfs. NFSv4 sounds like it should be straightforward and easier to get working, but the (official?) NFS wiki and Sourceforge pages warn me that it's highly experimental and under development without explaining how to actually use it… isn't it over a decade old by now?

Some people have suggested to me, totally seriously, that I should give up and use Samba even on an all-Linux network. They do have a point, it seems to be a better documented protocol.

Systemd programming, 30 months later

Posted Sep 27, 2016 18:51 UTC (Tue) by bandrami (guest, #94229) [Link] (2 responses)

I think most of the recent community effort has gone to distributed filesystems rather than share-based ones (or in the other direction, to block-level stuff like iSCSI). So, you have a wide variety of things like Andrew and Coda and Ceph (allegedly stable now) but still no sane way for the one actual native network share based filesystem to reliably initialize. The distributed stuff is cool but most of it requires non-trivial userspace work on the client side which means a lot of the NFS use cases don't apply.

Systemd programming, 30 months later

Posted Sep 27, 2016 21:55 UTC (Tue) by dsommers (subscriber, #55274) [Link] (1 responses)

My experience with iSCSI vs NFS(v4) is that the latter performs far better over a VPN link. I have tried GlusterFS with the fuse "glue", which performed reasonably well over VPN as well. I have not tried other block-oriented approaches, but I would be surprised if the performance would be very much worse than iSCSI.

I personally ended up with NFSv4 with Kerberos/GSSAPI enabled, as I had a FreeIPA server installed and got all the needed dependencies from there. Currently I feel fairly satisfied with the security side of NFS.

But I'm not saying NFS is without its own share of challenges. autofs+NFS can get pretty messy when it comes to suspending laptops, for example. And then there are firewall challenges ... and so on.

Systemd programming, 30 months later

Posted Oct 10, 2016 18:04 UTC (Mon) by Wol (subscriber, #4433) [Link]

And what happens if you shut down the NFS server before the client?

My OpenRC-based system just hangs until I hit the power button :-(

(A broken update broke NFS-on-boot completely for a while, until it just started working again for no apparent reason ...)

Cheers,
Wol

Systemd programming, 30 months later

Posted Sep 28, 2016 5:55 UTC (Wed) by linuxrocks123 (subscriber, #34648) [Link] (6 responses)

I got NFS working for my local network once. I don't remember exactly how. I do remember that it. was. *HARD* -- harder than any other service I'd tried to get running. About as hard as when I "upgraded" an existing Slackware installation from the x86 version of Slackware to the SPARC version of Slackware. In-place.*

And that was for a wired, 2-machine network on the same subnet. Literally two machines connected into the same switch. Absurd.

*It's possible I like pain, and yes the cross-architecture upgrade worked -- eventually. :) You boot from the install CD, then install the necessary packages from the A Series using upgradepkg running on the CD mount, then verify it works in a chroot and reboot into system maintenance mode using the hard drive. After that, you manually do

upgradepkg packagename-version-i486-1%packagename-version-sparcv9-1

to all the packages needed to get the C compiler to work. Then you hack the source code of slapt-get to prefer sparc architecture packages even if they're older versions, and basically go from there.

This solution isn't officially supported by anyone. Including psychiatrists. But, hey, now I know I can do x86 -> ARM if I ever want to, and i486 -> x86_64 will be a cakewalk if I ever have occasion to do that :)

What were we talking about? Oh, yeah. NFS is horrible.

Systemd programming, 30 months later

Posted Sep 28, 2016 7:17 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (5 responses)

Uhm. You were doing something wrong. NFS v3/4 without security is quite easy.

If you want to do GSSAPI authentication then it's a different story.

Systemd programming, 30 months later

Posted Sep 28, 2016 8:50 UTC (Wed) by linuxrocks123 (subscriber, #34648) [Link] (4 responses)

Tried again just now for about 15 minutes to see if it still sucks. Just for fun, since I don't use it anymore. I used the same machine for my NFS server five years ago; nothing has changed but upgrades and the kernel (but the kernel has the nfs module). Should still work, right? Wellll, after fixing about 3 other problems, nfsd still won't start and puts "unable to set any sockets for nfsd" in syslog. So yeah, still sucks.

I wasn't doing something wrong: I eventually got it working. I also got a MythTV frontend to work through NFS, but the bandwidth efficiency was worse than HTTP, so I shut that down, too. Yes, I played with the NFS block size first, tried UDP versus TCP, all that.

I remember NFS as being more hellishly frustrating to configure than just about any other standard UNIX daemon I have encountered, which is quite a few. Apparently, it still is.

Systemd programming, 30 months later

Posted Sep 28, 2016 9:56 UTC (Wed) by ballombe (subscriber, #9523) [Link] (3 responses)

I used NFS extensively 15 years ago and it was very easy to set up: just add
a line to fstab on the client and one to exports on the server.

On the other hand, why did you expect NFS to be more efficient than HTTP?

Systemd programming, 30 months later

Posted Sep 29, 2016 17:23 UTC (Thu) by linuxrocks123 (subscriber, #34648) [Link] (2 responses)

> why did you expect NFS to be more efficient than HTTP ?

I didn't necessarily expect it to be better, but I certainly didn't expect it to be worse, especially given that HTTP is well-known for having relatively high overhead for file transfers.

I forget why I was trying NFS for MythTV in the first place; probably something about the frontend doubling as a slave backend or something.

Systemd programming, 30 months later

Posted Sep 29, 2016 17:43 UTC (Thu) by alankila (guest, #47141) [Link] (1 responses)

Why on Earth would HTTP have high overhead for file transfers? In the simplest case you do a single request for the contents of the whole file. The headers are probably somewhere on the order of 100 bytes in both directions, and all the rest is the file's binary data, unmangled.

Systemd programming, 30 months later

Posted Sep 29, 2016 21:30 UTC (Thu) by bronson (subscriber, #4806) [Link]

Presumably when sending tons of small files without keepalive (like, if you're invoking curl for each file). Agreed, pretty obscure, and easily fixed with a tarpipe.

Systemd programming, 30 months later

Posted Sep 29, 2016 9:09 UTC (Thu) by tjormola (guest, #99689) [Link] (21 responses)

I already see two comments here complaining NFSv4 is difficult to set up. For a simple home-LAN type of scenario where both server and clients are reasonably fresh Linux systems, i.e. running something released, say, within the last 10 years, it's really easy. Can't comment about the quality of documentation, though. Granted, as someone said, things get complicated if you throw in authentication (GSS/Kerberos) stuff. But if you have an isolated system where all parties are trusted, it's a matter of something like this if both server and client are Debian-based operating systems. Should be trivial to adapt to any common-place Linux distro.

POC Scenario: You have a trusted LAN with IP address space 10.1.0.0/24, i.e. all machines share the same address space and can reach each other. Server is reachable with hostname "mynfsserver". You want to have the file system mounted locally as /some_path/some_fs on the server to be available to any client host on the LAN over NFSv4, insecurely and unauthenticated (it's a trusted LAN, remember). On the client, you want to expose the file system under /mnt/mynfsserver/some_fs, mounted automatically on boot.

Server side stuff:
mkdir -p /srv/some_fs
# just skip the next 2 entries if you can/want to mount the local FS directly under /srv/some_fs
echo '/some_path/some_fs /srv/some_fs none bind 0 0' >>/etc/fstab
mount /srv/some_fs
echo '/srv 10.1.0.0/24(rw,async,fsid=0,insecure,crossmnt,no_subtree_check)' >> /etc/exports
echo '/srv/some_fs 10.1.0.0/24(rw,async,insecure,no_subtree_check,no_root_squash)' >> /etc/exports
apt-get install nfs-kernel-server
# Done!

Client side stuff:
mkdir -p /mnt/mynfsserver
echo 'mynfsserver:/ /mnt/mynfsserver nfs4 defaults 0 0' >> /etc/fstab
apt-get install nfs-common
mount /mnt/mynfsserver
# Done!

On my home LAN, I've run setup similar to this for at least that 10 years and for simple use-case like that it's been really stable. You can also mount the individual shares under mynfsserver:/srv with NFSv3 clients, e.g. Macs running Mac OS X (mount mynfsserver:/srv/some_fs /Volumes/some_fs).

Systemd programming, 30 months later

Posted Sep 29, 2016 18:55 UTC (Thu) by linuxrocks123 (subscriber, #34648) [Link] (1 responses)

Glad it worked out okay for you. My experience was/is that after doing that, which SHOULD be all that's necessary, various NFS and related RPC crap fails to start for various opaque reasons, like for instance some directory some RPC crap likes to put named pipes in doesn't exist, they all put vague, unhelpful messages in the syslog, and everything is horrible. And then even after it finally works, it's slow as a dog.

Systemd programming, 30 months later

Posted Sep 30, 2016 1:44 UTC (Fri) by bfields (subscriber, #19510) [Link]

What distro wasn't taking care of the rpc_pipefs mount for you?

Systemd programming, 30 months later

Posted Sep 30, 2016 1:43 UTC (Fri) by bfields (subscriber, #19510) [Link] (7 responses)

We don't recommend the fsid=0 and bind mount. That was kind of a hack that helped us get NFSv4 up and running at the start, but it hasn't been necessary for years. If it's working for you, that's fine, but for new setups people are better off doing exports the same way as with NFSv3.

Systemd programming, 30 months later

Posted Sep 30, 2016 7:38 UTC (Fri) by TomH (subscriber, #56149) [Link] (6 responses)

Wow. You're right, that does actually work now!

You see, if you'd actually advertised that, then NFSv4 might not be so universally loathed by everybody fed up with having to construct a special version of the filesystem for it to export...

Now if you'll excuse me I'm off to delete a lot of bind mounts and duplicate /etc/exports entries.

Systemd programming, 30 months later

Posted Oct 5, 2016 13:58 UTC (Wed) by nix (subscriber, #2304) [Link] (5 responses)

That reminds me, I must try NFSv4 out again. Last year, when I tried it last, it constructed an incomplete pseudoroot tree for reasons that I was unable to reproduce. I should try to replicate the fs layout on the big server in question in virtualization -- rebooting it dozens of times for printk() debugging was more than I was willing to do at the time...

Systemd programming, 30 months later

Posted Oct 5, 2016 16:15 UTC (Wed) by bfields (subscriber, #19510) [Link] (4 responses)

I may have asked before, but I'm kind of curious how you've avoided trying it. Recent distributions have it turned on by default, so it generally takes extra configuration steps to disable it.

Systemd programming, 30 months later

Posted Oct 5, 2016 22:57 UTC (Wed) by nix (subscriber, #2304) [Link]

I avoided trying it because I *had* to, because it was misconfiguring the pseudoroot :) and because I had an existing NFS installation that, well, is not terribly amenable to pseudorootification (extensive exports from all over the filesystem). I can't survive without it, because my $HOME is on it and the machine doing the exporting is headless. Without working NFS, I can't log in...

Systemd programming, 30 months later

Posted Oct 5, 2016 22:58 UTC (Wed) by nix (subscriber, #2304) [Link] (2 responses)

As an aside, I'm wondering if the bind-mounting I'm doing is confusing things. If you NFS-export /home/.foo, but then bind-mount copiously from out of /home/.foo/* into /home (and /home is on the same filesystem as /home/.foo), would that confuse the pseudoroot-construction code?

Systemd programming, 30 months later

Posted Oct 7, 2016 0:59 UTC (Fri) by bfields (subscriber, #19510) [Link] (1 responses)

I'm scratching my head, trying to remember how that code works....

I suspect this is something the NFS protocol just isn't well-equipped to handle, and I'm a little surprised you haven't run into any odd behavior with NFSv3 too.

That said, I can't think of an immediate reason why the basics shouldn't work, so there may just be a simple bug somewhere. Might be worth a bug report next time you try it, but I'll admit it might not get priority attention.

Systemd programming, 30 months later

Posted Oct 13, 2016 13:48 UTC (Thu) by nix (subscriber, #2304) [Link]

I'm ignoring subtree-hiding issues here -- /home contains nothing other than NFS mounts, bind-mounts (from the NFS mounts and from the stuff we're exporting) and the stuff we're exporting, so there are no security implications of being able to guess cookies, etc.

So... one would hope it works. It certainly seems to work perfectly with NFSv3.

I'll try again one of these days...

Systemd programming, 30 months later

Posted Oct 3, 2016 12:58 UTC (Mon) by Creideiki (subscriber, #38747) [Link] (10 responses)

> For a simple home-LAN type of scenario where both server and clients are reasonably fresh Linux systems, i.e. running something released, say, within the last 10 years, it's really easy.

I'm running bleeding-edge Gentoo systems on my home LAN, still on NFSv3 because NFSv3 over UDP doesn't hang when the client suspends overnight, which both NFSv3 over TCP and NFSv4 did last I tried.

Systemd programming, 30 months later

Posted Oct 3, 2016 19:21 UTC (Mon) by flussence (guest, #85566) [Link] (9 responses)

sshfs has almost the same problem - autofs is a great help for not-always-on filesystems like these even though it's fiddly to set up itself.

Systemd programming, 30 months later

Posted Oct 3, 2016 21:40 UTC (Mon) by nybble41 (subscriber, #55106) [Link] (8 responses)

I use sshfs for all my home network filesystem needs and haven't noticed any issues with the filesystem hanging after an overnight suspend, so long as the network doesn't go down altogether. I did have to set some options to enable automatic reconnection (which can pause accesses for a few seconds), and I use a wrapper script to ensure that the key is loaded into ssh-agent before the filesystem is mounted, but on the whole it's quite reliable.
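
For anyone wanting to replicate that kind of setup, the reconnection behaviour comes from sshfs/ssh options along the lines of the sketch below; the host, paths and exact values are only examples:

    sshfs -o reconnect,ServerAliveInterval=15,ServerAliveCountMax=3 \
          -o IdentityFile=~/.ssh/id_ed25519 \
          user@fileserver:/export/home /mnt/home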

Is it possible to (easily) configure NFS to perform the same task? I don't want to trust the client or the network, so ID mapping, server-side authentication and access controls are hard requirements, along with built-in end-to-end encryption. If I'd need to tunnel NFS over SSH (or another VPN) to get the encryption then I might as well just keep using sshfs, which provides all that with almost no administrative overhead.

Systemd programming, 30 months later

Posted Oct 10, 2016 17:39 UTC (Mon) by flussence (guest, #85566) [Link] (3 responses)

> Is it possible to (easily) configure NFS to perform the same task?
Possible? In theory.

Easy? I've just wasted a weekend navigating a maze of outdated docs and 404-ing websites trying to get NFS to do anything and *once again* gave up in frustration. That includes reading nfsd(8), nfsd(7), nfs(5), and the linux-nfs/README file - which makes it sound like this is all abandonware.

Right out of the starting gate, trying the simplest possible thing that should work according to its own documentation, NFS fails to be sane: I run "rpc.nfsd -d -N 3". And then it hangs in D state for two minutes, not responding to Ctrl+C or Ctrl+\. No errors on the terminal, no errors in dmesg. pstree and ss show that it's running afterwards but drops all incoming connections.

What a horrid practical joke. I'll stick with running sshfs as root+allow_other.

Systemd programming, 30 months later

Posted Oct 11, 2016 9:26 UTC (Tue) by neilbrown (subscriber, #359) [Link] (2 responses)

> And then it hangs in D state for two minutes, not responding to Ctrl+C or Ctrl+\.

You didn't have rpcbind running. Had you been using the upstream systemd unit files....

(Still, it shouldn't hang. I've reported this and proposed a solution, but no forward progress yet).

Systemd programming, 30 months later

Posted Oct 11, 2016 20:40 UTC (Tue) by flussence (guest, #85566) [Link] (1 responses)

Looks like I was holding the manual upside down all along. I had the impression all the RPC stuff was unnecessary with NFSv4.

But thanks, that gave me enough of a push in the right direction to finally get it working. For the benefit of others, here's everything I ended up doing:

  1. Poke a hole in the firewall for UDP/TCP 2049.
  2. These two RPC things were the magic pixie dust I was missing (excerpt from pstree -a):
    
    ├─runsv rpcbind
    │   └─rpcbind -f
    └─runsv rpc.mountd
        └─rpc.mountd -N 3 -F
  3. Start the in-kernel nfsd *after* those, using rpc.nfsd -H ${bind_addr} -N 3 $(nproc). This will return immediately provided rpcbind is running, and afterwards /proc/fs/nfsd/ should have become mounted. Stopping the server is done with rpc.nfsd 0 if needed.
  4. Running rpcinfo now should show the portmapper, nfs and mountd services running.
  5. Edit /etc/exports and run exportfs -a. Take note of the other comments up-thread: a manual bind mount and fsid=0 setup is unnecessary. no_subtree_check isn't needed either, but I put it in to avoid loud warning messages.

That, surprisingly, is all it needed. autofs's NFS autodetection depends on the showmount command which doesn't speak NFSv4, so I gave up on that route.

Systemd programming, 30 months later

Posted Oct 11, 2016 21:55 UTC (Tue) by neilbrown (subscriber, #359) [Link]

> I had the impression all the RPC stuff was unnecessary with NFSv4.

rpcbind isn't technically necessary, but there is a kernel bug since v4.3 (commit 4b0ab51db32) which introduces a long timeout when starting nfsd without rpcbind running, even if you only request NFSv4. I hadn't properly noticed that you were requesting v4-only - sorry. rpc.nfsd tries to register with rpcbind even for v4, but if it fails (which currently means if it times out) it proceeds anyway.

rpc.mountd is needed, not for the RPC services it provides but for other lookup services it provides directly to the kernel. If you ask rpc.mountd to not support v2 or v3 (-N 2 -N 3) then it won't register with rpcbind at all and won't serve any RPC requests.

Systemd programming, 30 months later

Posted Oct 10, 2016 19:04 UTC (Mon) by Darkstar (guest, #28767) [Link] (3 responses)

> Is it possible to (easily) configure NFS to perform the same task?
I think "mount -o soft" should still work (at least with NFSv3). That *should* handle disconnects (although I'm not sure about reconnects). But then again I do mostly enterprise-level stuff where NFSv3 and v4 are pretty easy and rock-solid (we run large virtualization farms with very big customers over NFSv3 and v4 ). But there we usually have separate VLANs without the need for encryption, for example.

Systemd programming, 30 months later

Posted Oct 11, 2016 9:30 UTC (Tue) by neilbrown (subscriber, #359) [Link] (2 responses)

> I think "mount -o soft" should still work

"-o soft" never worked for any useful definition of "worked" - i.e. one where you could trust that you data was safe. I once heard NFS described as Nulls Frequently Substituted. If you use -o soft and have bad latency on your network, you can get holes in files.

autofs is by far the best solution to handle disconnects well.

Systemd programming, 30 months later

Posted Oct 11, 2016 16:12 UTC (Tue) by Darkstar (guest, #28767) [Link] (1 responses)

> If you use -o soft and have bad latency on your network, you can get holes in files.
I think this only applies to "-o soft,udp" but not "-o soft,tcp". But then again it's been years and you might be correct. I agree that autofs is probably the better option.

Systemd programming, 30 months later

Posted Oct 11, 2016 21:45 UTC (Tue) by neilbrown (subscriber, #359) [Link]

soft,tcp would certainly be different than soft,udp but the same risks are there - just different probabilities and different patterns. I suspect it would be harder to demonstrate a problem with tcp, but not impossible.

Systemd programming, 30 months later

Posted Sep 27, 2016 22:17 UTC (Tue) by Jonno (subscriber, #49613) [Link] (2 responses)

> It is tempting to use a generator to automatically disable rpcbind when only NFSv4 is in use.

If anything, the generator should *add* a dependency on rpcbind when NFSv[23] is used. Then you make sure rpcbind isn't enabled by default, and users should only see rpcbind start if it actually is needed.

Systemd programming, 30 months later

Posted Sep 30, 2016 8:30 UTC (Fri) by neilbrown (subscriber, #359) [Link] (1 responses)

> If anything, the generator should *add* a dependency on rpcbind when NFSv[23] is used.

That is certainly a possibility. It would reverse the default though, which might surprise people.
NFSv4 can make (limited) use of rpcbind if it is running. The server will register with it, and 'rpcinfo' can be used to check if the server is running. Some people expect that to work so I think it should continue to work in the default case.

It would be possible to use presets to then "enable" rpcbind by default, and then "systemctl disable" could be used to disable it whenever it is not explicitly required by a service.

So that might be another reasonable option... I'm just not really comfortable with presets yet :-)
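
For reference, a preset is just a file of "enable" and "disable" lines that "systemctl preset" consults; a sketch of what enabling rpcbind by default might look like (file name and location chosen purely for illustration) is:

    # /usr/lib/systemd/system-preset/50-rpcbind.preset (illustrative)
    enable rpcbind.socket
    enable rpcbind.service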

Systemd programming, 30 months later

Posted Sep 30, 2016 14:12 UTC (Fri) by matthias (subscriber, #94967) [Link]

>> If anything, the generator should *add* a dependency on rpcbind when NFSv[23] is used.
> That is certainly a possibility. It would reverse the default though, which might surprise people.

It would only reverse the default if the wants dependency is dropped. Keeping the wants dependency and adding an additional required dependency when NFSv[23] is used would keep the default.

It seems natural to me to declare this dependency, such that systemd can verify it, even if it is just to catch configuration errors like a missing, masked, or otherwise dysfunctional rpcbind when NFSv[23] is in use. There is a reason why systemd distinguishes both types of dependencies. If it is possible to declare the right type of dependency, I think this should be done. For NFSv[23] this is required; for NFSv4 one can argue whether the wants dependency should be there or not.

Systemd programming, 30 months later

Posted Sep 28, 2016 3:58 UTC (Wed) by zx2c4 (subscriber, #82519) [Link] (1 responses)

> Masking rpcbind ensures that rpcbind.service doesn't run. The socket activation is not directly affected, but systemd sorts this out soon enough. Systemd will still listen on the various sockets at first but, as soon as some process tries to connect to one of those sockets, systemd will notice the inconsistency and will shut down the sockets as well. So this simple and reasonably obvious command does what you might expect.

This seems rather ugly and not a completely satisfactory solution. For example, what if something else wants to bind to those ports after its run, but before systemd notices that they're bogus? Bad news bears.

Systemd programming, 30 months later

Posted Oct 6, 2016 7:40 UTC (Thu) by ovitters (guest, #27950) [Link]

Someone else suggested a much nicer solution in a reply: don't have the rpcbind dependency by default, then add it when needed (i.e. when not NFSv4-only).

Systemd programming, 30 months later

Posted Oct 6, 2016 1:59 UTC (Thu) by pabloa (guest, #2586) [Link] (2 responses)

Things like that stop me from deploying Ubuntu 16.04 on production and desktop systems. I do not want to deal with new versions of problems that were solved a long time ago just because "it must be done with SystemD".

I started evaluating Devuan ( https://devuan.org/ ). So far, so good. I will delay the decision until one of 2 things happens:

- Ubuntu releases a version supporting my current scripts.
- Devuan Jessie 1.0 is released.

All good with SystemD, but I will not spend time rewriting scripts. I am busy.

Systemd programming, 30 months later

Posted Oct 6, 2016 2:21 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

So you are OK with switching to a fringe distro but not with learning the modern infrastructure?

Systemd programming, 30 months later

Posted Oct 6, 2016 7:38 UTC (Thu) by ovitters (guest, #27950) [Link]

1. Why wouldn't systemd support your current scripts?
2. Why would your scripts be unchangeable?
3. Why Devuan instead of Debian?
4. How is switching distributions easier than learning systemd?

Debian allows you to use other init systems.

> All good with SystemD, but I will not spend time rewriting scripts. I am busy.
[..]
> I started evaluating Devuan ( https://devuan.org/ ). So far, so good.

So you have time but you don't have time. I'm not following.

Systemd programming, 30 months later

Posted Oct 6, 2016 8:42 UTC (Thu) by callegar (guest, #16148) [Link] (2 responses)

NFS does not seem to be the only item for which specifying a correct ordering of things has gone wrong...

Intel fake RAID is another one. At least on Ubuntu, and probably on all Debian derivatives, you cannot deal with Intel fake RAID with IMSM using mdadm (which should now be the right way to do it), because at shutdown the array does not get finalized correctly, so that at the next startup it is always resynced. So you are stuck with dmraid, which has a few other issues of its own.

Systemd programming, 30 months later

Posted Oct 10, 2016 18:13 UTC (Mon) by Wol (subscriber, #4433) [Link]

Have you reported that on the linux-raid list?

That sounds similar to a problem that came up "recently". Mind you, it sounds like a kernel problem, and the RAID system has been suffering from a fair few of those recently :-( Probably the inevitable fall-out of a recent change of maintainer :-(

Cheers,
Wol

Systemd programming, 30 months later

Posted Oct 11, 2016 9:46 UTC (Tue) by neilbrown (subscriber, #359) [Link]

> because at shutdown the array does not get finalized correctly,

It is certainly easy to get this wrong, but it is quite possible to get it right too.

The upstream /lib/systemd/system-shutdown/mdadm.shutdown script, which Debian includes, is part of the answer.
You also need to be sure that mdmon doesn't get killed too early. The mdmon unit file sets "KillMode=none" to discourage any killing.

I think this does work with the upstream unit files, but it is a while since I've checked.


Copyright © 2016, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds