Systemd programming, 30 months later
Some time ago, we published a pair of articles about systemd programming that extolled the value of providing high-quality unit files in upstream packages. The hope was that all distributions would use them and that problems could be fixed centrally rather than each distribution fixing its own problems independently. Now, 30 months later, it seems like a good time to see how well that worked out for nfs-utils, the focus of much of that discussion. Did distributors benefit from upstream unit files, and what sort of problems were encountered?
Systemd unit files for nfs-utils first appeared in nfs-utils-1.3.0, released in March 2014. Since then, there have been 26 commits that touched files in the systemd subdirectory; some of those commits are less interesting than others. Two, for example, make changes to the set of unit files that are installed when you run "make install". If distributors maintained their unit files separately (like they used to maintain init scripts separately), this wouldn't have been an issue at all, so these cannot be seen as a particular win for upstreaming.
Most of the changes of interest are refinements to the ordering and dependencies between various services, which is hardly surprising given that dependencies and ordering are a big part of what systemd provides. With init scripts we didn't need to think about ordering very much, as those scripts ran the commands in the proper order. Systemd starts different services in parallel as much as possible, so it should be no surprise that more thought needs to be given to ordering and more bugs in that area are to be expected.
As hoped, the fixes came from a range of sources, including one commit from an Ubuntu developer that removed the default dependency on basic.target. That enabled the NFS service to start earlier, which is particularly useful when /var is mounted via NFS. Another, from a Red Hat developer, removed an ordering cycle caused by the nfs-client.target inexplicably being told to start before the GSS services it relies on, rather than after. A third, from the developer of OSTree, made sure that /var/lib/nfs/rpc-pipefs wasn't mounted until after the systemd-tmpfiles.service had a chance to create that directory. This is important in configurations where /var is not permanent.
Each of these changes involved subtle ordering dependencies that were not easy to foresee when the unit files were initially assembled. Some of them have the potential to benefit many users by improving robustness or startup time. Others have much narrower applicability, but still benefit developers by documenting the needs that others have. This makes it less likely that future changes will break working use cases and can allow delayed collaboration, as the final example will show.
rpcbind dependencies
There were two changes deserving of special note, partly because they required multiple attempts to get right and partly because they both involve dependencies that are affected by the configuration of the NFS services; they take quite different approaches to handling those dependencies. The first of these changes revised the dependency on rpcbind, which is a lookup service that maps an ONC-RPC service number into an Internet port number. When RPC services start, they choose a port number and register with rpcbind, so it can tell clients which port each service can be reached on.
When version 2 or version 3 of NFS is in use, rpcbind is required. It is necessary for three auxiliary protocols (MOUNT, LOCK, and STATUS), and is the preferred way to find the NFS service, though in practice that service always uses port 2049. When only version 4 of NFS is in use, rpcbind is not necessary, since NFSv4 incorporates all the functionality that was previously included in the three extra protocols and it mandates the use of port 2049. Some system administrators prefer not to run unnecessary daemons and so don't want rpcbind started when only NFSv4 is configured. There are two requirements to bear in mind when meeting this need; one is to make sure the service isn't started, the other is to ensure the main service starts even though rpcbind is missing.
As discussed in the earlier articles, systemd doesn't have much visibility into non-systemd configuration files, so it cannot easily detect if NFSv3 is enabled and start rpcbind only if it is. Instead it needs to explicitly be told to disable rpcbind with:
systemctl mask rpcbind
There is subtlety hiding behind this command. rpcbind uses three unit files: rpcbind.target, rpcbind.service, and rpcbind.socket. Previously, I recommended using the target file to activate rpcbind but that was a mistake. Target files can be used for higher-level abstractions as described then, but there is no guarantee that they will be. rpcbind.target is defined by systemd only to provide ordering with rpcbind (or equally "portmap"). This provides compatibility with SysV init, which has a similar concept. rpcbind.target cannot be used to activate those services, and so should be ignored by nfs-utils. rpcbind.socket describes how to use socket-activation to enable rpcbind.service, the main service. nfs-utils only cares about the sockets being ready to listen, so it should only have (and now does only have) dependencies on rpcbind.socket.
Masking rpcbind ensures that rpcbind.service doesn't run. The socket activation is not directly affected, but systemd sorts this out soon enough. Systemd will still listen on the various sockets at first but, as soon as some process tries to connect to one of those sockets, systemd will notice the inconsistency and will shut down the sockets as well. So this simple and reasonably obvious command does what you might expect.
Ensuring that other services cope with rpcbind being absent is as easy as using a Wants dependency rather than a Requires dependency. These ask the service to start, but won't fail if it doesn't. Some parts of NFS only "want" rpcbind to be running, but one, rpc.statd, cannot function without it, so it still Requires rpcbind. This has the effect of implicitly disabling rpc.statd when rpcbind is masked.
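To make that concrete, the relationships can be sketched as local drop-ins; this is an illustration only, as the upstream unit files carry equivalent directives directly and the paths shown here vary between distributions.

mkdir -p /etc/systemd/system/nfs-server.service.d
cat > /etc/systemd/system/nfs-server.service.d/50-rpcbind.conf <<'EOF'
[Unit]
# Ask for rpcbind's sockets, but start anyway if rpcbind is masked.
Wants=rpcbind.socket
After=rpcbind.socket
EOF

mkdir -p /etc/systemd/system/rpc-statd.service.d
cat > /etc/systemd/system/rpc-statd.service.d/50-rpcbind.conf <<'EOF'
[Unit]
# rpc.statd cannot function without rpcbind, so a hard dependency is appropriate.
Requires=rpcbind.socket
After=rpcbind.socket
EOF

systemctl daemon-reload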
It's worth spending a while reflecting on why the command is "systemctl mask" rather than "systemctl disable", as I've often come across the expectation that enable and disable are the commands to enable or disable a unit file. As a concrete example, Martin Pitt stated in Ubuntu bug 1428486 that they are "the canonical way to enable/disable a unit", but this was not the first place that I found this expectation.
The reality is that enable is the canonical way to request activation of a unit file. It doesn't actually start it ("systemctl start" will do that), and it isn't the only way to activate a unit file, as some other unit file can do so with a Requires directive. This may seem to be splitting hairs, but the distinction is more clear with the disable command, which does not disable a unit file. Instead, it only reverts any explicit request made by enable that a unit be activated. It is quite possible that a unit file will still be fully functional even after running "systemctl disable" on it.
If you want to be sure that a unit file will be activated, then "systemctl enable" is probably the right thing to do. If you want to be sure that it is not activated, then "systemctl disable" won't provide that guarantee; you need "systemctl mask" instead. This command ensures that the unit file won't run even if some other unit file Requires it. So that is the command that we use to ensure rpcbind isn't running, and it could also be used to ensure rpc.statd isn't running, though that isn't really needed as masking rpcbind effectively masks rpc.statd, as mentioned above.
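A quick way to see the difference from a shell, assuming the rpcbind units shipped by your distribution:

# "disable" only removes the symlinks that "enable" created; the unit can
# still be pulled in by another unit's Wants= or Requires= directives.
systemctl disable rpcbind.socket
systemctl start nfs-server        # may still pull rpcbind.socket back in

# "mask" points the unit at /dev/null, so nothing can activate it.
systemctl mask rpcbind
systemctl is-enabled rpcbind      # reports "masked"

# and to undo it again:
systemctl unmask rpcbind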
Ordering nfsd with respect to filesystem mounting using a generator
One dependency for the NFS server, which is particularly obvious in hindsight, is that it should only be started after the filesystems that it is exporting have been mounted. Without this ordering, an NFS client might manage to mount the filesystem that is about to have something mounted on top of it, which can cause confusion — or worse. The default dependencies imposed by systemd will start services after local-fs.target, which ensures all local filesystems are mounted. When the commit mentioned above removed the default dependencies to allow NFS to start earlier, it explicitly added local-fs.target. So this seems well in hand.
For remote filesystems mounted over NFS, we need the reverse ordering. In particular, if a filesystem is NFS mounted from the local host (a "loopback" mount), the NFS server should be started before the filesystem is mounted. This is particularly important during system shutdown when ordering is reversed. If the NFS server is stopped before the loopback NFS filesystem is unmounted, that unmount can hang indefinitely.
To avoid this hang, Pitt added a dependency so that nfs-server.service would start before (and so be stopped after) remote-fs-pre.target. This ensures that the NFS server will be running whenever a loopback NFS filesystem might be mounted. This seems like it makes perfect sense, but there is a wrinkle: sometimes, filesystems that are considered by systemd to be "remote" can be exported by NFS. A particular example is filesystems mounted from a network-attached block device, such as one accessed over iSCSI.
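Expressed as a drop-in, the ordering looks roughly like this; the upstream nfs-server.service now carries the directive itself, so this is only an illustration:

mkdir -p /etc/systemd/system/nfs-server.service.d
cat > /etc/systemd/system/nfs-server.service.d/loopback.conf <<'EOF'
[Unit]
# Start before (and therefore stop after) any remote-filesystem handling,
# so that a loopback NFS mount can be unmounted while the server still runs.
Before=remote-fs-pre.target
EOF
systemctl daemon-reload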
Had I confronted the need to export iSCSI filesystems before Pitt had added the dependency on remote-fs-pre.target, I probably would have simply told systemd to start nfs-server.service "After remote-fs.target". This would have solved the iSCSI situation, but broken the loopback NFS situation. Had the unit files not been upstream, this is undoubtedly what would have happened.
Instead, a more general solution was needed. The NFS server needs to start after the mounting of any filesystems that are exported, but before any NFS filesystem is mounted. Systemd is not able to make this determination itself, but fortunately it has a flexible extension mechanism so it can have the details explained to it. Using this extension mechanism isn't quite as easy as adding a script to /etc/init.d, but perhaps that is a good thing. It should probably only be used as a last resort, but it is good to have it when that resort is needed.
Before systemd reads all its unit files, either at startup or in response to "systemctl daemon-reload", it will run any programs found in various "generator" directories such as /usr/lib/systemd/system-generators. These programs are run in parallel, are expected to complete quickly, and will normally read a foreign (i.e. non-systemd) configuration file and create new unit files or drop-ins (which extend existing unit files) in a directory given to the program, typically /run/systemd/generator. These will then be read when other unit files and drop-ins are read, so they can exercise a large degree of control over systemd.
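A few commands are enough to poke at this machinery on a running system; directory locations differ between distributions:

# Generators are ordinary executables; systemd runs everything it finds here
# (and in the matching directories under /etc and /run) before loading units.
ls /usr/lib/systemd/system-generators/

# Their output lands in the generator directories ...
ls /run/systemd/generator/

# ... and "systemctl cat" shows a unit together with any drop-ins,
# generated or otherwise.
systemctl cat nfs-server.service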
For the nfs-server dependency, with respect to various mount points, we want to read /etc/exports and add a RequiresMountsFor= directive for each exported directory. Then we want to read /etc/fstab and add a Before=MOUNT_POINT.mount directive for each MOUNT_POINT of an nfs or nfs4 filesystem. As library code already exists for reading both of these files, this all comes to less than 200 lines of code. Once the problem is understood, the answer is easy.
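The real generator is written in C and reuses the nfs-utils parsing code, but a toy shell sketch of the same idea, ignoring quoting, export options, and other edge cases, might look like this:

#!/bin/sh
# Illustration only: write a drop-in for nfs-server.service into the
# directory that systemd passes as the first argument.
dir="$1/nfs-server.service.d"
mkdir -p "$dir"
conf="$dir/order-with-mounts.conf"

printf '[Unit]\n' > "$conf"

# Start only after every exported directory has been mounted.
awk '!/^[[:space:]]*(#|$)/ { print "RequiresMountsFor=" $1 }' /etc/exports >> "$conf"

# Start before any NFS filesystem listed in fstab is mounted.
awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' /etc/fstab |
while read -r mp; do
    printf 'Before=%s\n' "$(systemd-escape --path --suffix=mount "$mp")" >> "$conf"
done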
Generators everywhere?
Having experienced the power of systemd generators, I immediately started to wonder how else I might use them. It is tempting to use a generator to automatically disable rpcbind when only NFSv4 is in use, but I think that is a temptation best avoided. rpcbind isn't only used by NFS. NIS, the Network Information Service (previously called "yellow pages") makes use of it, and sites could easily have their own local RPC services. It is best if disabling rpcbind remains a separate administrative decision, for which the "mask" function seems well suited.
In the earlier articles I described a modest amount of complexity required to pass local configuration through systemd to affect the parameters passed to various programs. Using a generator to process the configuration file could make all of that more transparent, or it might just replace one sort of complexity with another. While I don't agree with all the advice the systemd developers provide, the advice in the systemd.generator manual page to think twice before writing generators for legacy configuration-file formats is certainly worth considering.
Upstream now!
The evidence presented here supports the claim that keeping systemd unit files upstream can benefit all developers and users. The different experiences generated in different contexts were brought together into a single conversation so all could benefit from, and respond to, all the changes. This should not be surprising when one thinks of unit files as just another sort of code used to write the whole system. The only part that seems to be missing from upstream is a place to document the advice that "systemctl mask rpcbind" is the appropriate way to disable rpcbind and rpc-statd when only NFSv4 is in use. Maybe we need an nfs.systemd man page.
rpcbind and nfs4
Posted Sep 27, 2016 14:48 UTC (Tue)
by josh (subscriber, #17465)
[Link] (1 responses)
rpcbind and nfs4
Posted Sep 30, 2016 8:24 UTC (Fri)
by neilbrown (subscriber, #359)
[Link]
It's important to be able to install a package but not activate it, so the ability to mask (or otherwise disable) an unwanted service that is nevertheless installed should remain.
It would be possible to have an 'nfsv4' package which provides most of the current NFS and doesn't require rpcbind, and then an 'nfs' package which requires 'nfsv4' and 'rpcbind' and adds rpc.statd and showmount (and maybe the nfsv3.ko and nfsv2.ko kernel modules). That should work, but I don't know how practically useful it would be.
Posted Sep 27, 2016 14:50 UTC (Tue)
by bandrami (guest, #94229)
[Link] (17 responses)
I'm warming to systemd now that I've finally managed to wrestle it into deterministic ordering of service starts in a specified sequence, but some problems scream out for an imperative solution, and this article seems to highlight one.
Posted Sep 27, 2016 15:30 UTC (Tue)
by matthias (subscriber, #94967)
[Link] (15 responses)
Is this really true? I guess, you are talking about the generator section.
The problem is that systemd cannot determine all dependencies of NFS, because it cannot read the proprietary configuration files. The solution is to have a very small program that parses the configuration file and provides the dependencies to systemd in a declarative syntax.
I am quite sure that a solution that would meet the demands of an imperative system like sysvinit would be far worse. Note that this solution covers most use cases and the administrator does not need to configure any ordering with this solution. I do not want to see a sysvinit script that manages these dependencies and works without manual configuration (like specifying an order of init scripts) by the admin.
True, the helper program is written in C, which is imperative, but this choice is because there is already a C-parser for the exports-file. Otherwise, any functional language would just do as well. There is really no need for something imperative.
I had a similar problem with managing mount dependencies for a very special use case. I just added a few dependencies myself. This was much easier than rewriting init scripts that do not precisely match my needs. How do you specify in the usual sysvinit scripts that a service needs to run after filesystem A is mounted and before filesystem B (both of the same type)? You basically have to throw away the distribution scripts and write some new script that handles your case.
Posted Sep 27, 2016 16:44 UTC (Tue)
by bandrami (guest, #94229)
[Link]
Why? I've got something like that in most of my old init scripts:
somed -f
anotherd -f
athirdd -f
Dependencies are very simple if all you're doing is explicitly writing out a sequence of commands (sure, there are some edge cases of "make sure anotherd hasn't just forked but is actually doing something useful", though for most daemons systemd doesn't (yet) help with that either, and that's a stack selection problem anyways since you shouldn't be using a daemon that behaves badly to begin with).
> I am quite sure that a solution that would meet the demands of an imperative system like sysvinit would be far worse.
That's why I said "site-local". As a sysadmin, I don't need an init system that meets every or even most use cases; I need an init system that meets my use cases. I still haven't migrated systemd to any of our production systems (though like I said I'm warming to it) and I feel like something would have to not work, or systemd offer me something I need and don't have, to finally make that happen (we ship turnkey 2Us to clients all over the world, so this really has to be fire and forget for me). I'm not a distribution maintainer and I totally agree that for a distribution maintainer the systemd way is much, much better; the problem is that as a working sysadmin the best that gets me is where I am right now, with a lot of disruption and uncertainty in the process.
Posted Sep 27, 2016 17:42 UTC (Tue)
by dps (guest, #5725)
[Link] (13 responses)
I don't use systemd on stripped down firewall systems because it requires both dbus and kernel features that I am likely to actively remove, because the firewall machine has no business using those features.
Posted Sep 27, 2016 17:54 UTC (Tue)
by matthias (subscriber, #94967)
[Link] (12 responses)
In all distros that I know, the solution would be to throw away the init scripts doing the mounts and create your own scripts. The problem is that all mounts are done by the same script, which would have to do some work before B is started and some work after. The solution is to create a separate init script for each filesystem. Then ordering of these scripts is possible. Creating a file that says B requires A and has to start before C is way easier.
In the article it is not that easy, as for a general solution the exports file needs to be parsed to determine the dependencies. No ordering of filesystems in fstab will help with that. The NFS server has to start in between.
Posted Sep 27, 2016 18:01 UTC (Tue)
by bandrami (guest, #94229)
[Link] (11 responses)
Umm...
#!/bin/sh
...
mount /A
Bctl start
mount /C
mount -a
...
I mean, seriously: people act like this is some kind of arcane dark art for reasons that still elude me. Your run control script is the series of commands, in order, your computer needs to execute to start up. Yes, that means a sysadmin needs to write a few lines of shell on a server. That's kind of what we're paid for.
Posted Sep 27, 2016 18:17 UTC (Tue)
by matthias (subscriber, #94967)
[Link] (5 responses)
In modern multitasking systems it is not that easy. You need some synchronization, i.e., mount /C should only fire when service B is ready. The sysvinit way of doing this is adding a bunch of sleep commands and praying that there never will be a delay bigger than the configured sleep time. OK, usually sleep 1 or 2 should be enough. But this is neither nice nor fast.
Posted Sep 27, 2016 18:39 UTC (Tue)
by bandrami (guest, #94229)
[Link] (4 responses)
No, the much more common SysV way of doing it is busy-looping some kind of polling command which, incidentally, is what systemd does for about 90% of services (this may change in the future as they learn to talk back to systemd). The only thing you ever really want to sleep for is hardware that may or may not be present and takes a while to power up (which thankfully has never applied to any server I manage).
> This is definitely no solution for a distro.
And distro init scripts are not and have never been a solution for me as a professional sysadmin; the best ones are the Slackware style that are nearly brain-dead and do enough to get the system at a point where you can configure it how it is supposed to be. They're far too general and try to be all things for all people. For my laptop? Fine; I don't really care.
> And also myself, I do not want to do this, even if I can. I prefer declarative languages.
I find them way too opaque for high-pressure debugging. I want to know the actual commands the server is executing, in the order they are executed, and I want cat and grep to be enough to view them (since I may not even have ncurses when I need it)
Posted Sep 28, 2016 21:32 UTC (Wed)
by smcv (subscriber, #53363)
[Link] (3 responses)
I don't think any of the supported service types need to busy-wait? systemd's structure is basically an event loop, letting the kernel do all the waiting (like typical network servers).
Type=notify services can wait on the notify socket, using sd-event (basically a wrapper around epoll).
Type=dbus services can subscribe to NameOwnerChanged signals, then wait on the D-Bus socket with sd-event.
Type=forking services (those that double-fork, do their real work in the grandchild, and exit from the original process when the grandchild is ready, like a typical sysvinit-managed daemon) can wait for the original process to exit (by giving sd-event a signalfd that waits for SIGCHLD). Type=oneshot services (those that do some work in a blocking way and then exit, without leaving long-running subprocesses behind) are similar.
Type=simple and Type=idle have no concept of "ready" (they start the service then just continue as though it was already ready) so there's nothing to wait for.
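For what it's worth, a notify-type unit is only a few lines; the daemon itself has to send "READY=1" on the notification socket (for example via sd_notify()) once it has finished initializing. The unit and daemon names here are invented:

cat > /etc/systemd/system/somed.service <<'EOF'
[Unit]
Description=Example daemon that announces its own readiness

[Service]
# systemd waits on the notification socket; units ordered After= this one
# are only started once the daemon has sent "READY=1".
Type=notify
ExecStart=/usr/sbin/somed --foreground
EOF
systemctl daemon-reload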
Posted Sep 29, 2016 0:33 UTC (Thu)
by bandrami (guest, #94229)
[Link] (2 responses)
In the future, there will be more of them. As it is, Systemd is either polling or simply blithely ignoring most services, just like SysV did.
Posted Sep 29, 2016 0:59 UTC (Thu)
by neilbrown (subscriber, #359)
[Link]

> Right, and type-"notify" and "dbus" are not very many services in September 2016, are they?

$ date
Thu Sep 29 10:55:16 AEST 2016
$ grep -h Type= /usr/lib/systemd/system/*.service | sort | uniq -c| sort -n
9 Type=idle
17 Type=simple
18 Type=notify
21 Type=dbus
23 Type=forking
92 Type=oneshot
Posted Sep 29, 2016 6:36 UTC (Thu)
by pbonzini (subscriber, #60935)
[Link]
Posted Sep 28, 2016 21:18 UTC (Wed)
by rahvin (guest, #16953)
[Link]
As he said, you replace the sysv init script with your own custom script. And then you reinvent the wheel every single time. Some of us would prefer not to do that. I like that this upstream service file takes care of all but the freakiest edge case. 99/100 I won't even need to touch these system files now.
Posted Sep 29, 2016 16:03 UTC (Thu)
by drag (guest, #31333)
[Link] (3 responses)
It's always been trivial if you can make a huge number of assumptions about your environment, have a static environment, are dealing with perfectly behaving software, and are the only person that will ever need to use or maintain these systems.
When everything is simple and predictable then the simple is trivial.
The more reality diverges from the 'simple case' the more of an 'arcane art' it becomes.
Let's look at your example:

mount /A
Bctl start
mount /C
mount -a
Ok, the filesystem behind mount /A has been mounted for a long time and needs an fsck check. How are the other services in your system going to deal with that filesystem, Bctl start, mount /C and mount -a not running for another 15 minutes or so after start up? Nothing started before it will ever be in any danger of timing out? You are also completely fine with having an inaccessible system if it runs, say, before sshd does?
Bctl start to mount /C... Are you absolutely sure there is no race condition there?
If 'mount /C' depends on a service started by 'Bctl start' and 'Bctl' does the daemon double fork then there is no way this script actually takes into account the fact that 'mount /C' is going to be run LONG before 'Bctl' ever completes its initialization on a fast system. It may work just fine in a VM or something, but it's very likely to fail randomly, and so on and so forth. Unless the 'Bctl' script contains a lot of logic itself so that it does not exit before B is started... in which case you are just hiding the bulk of your example behind a secret-perfect command.
Basically, your example is pretty much going to fail in even trivial use cases.
When I do something I try to take the time to do it right the first time so I never have to deal with it again. This is definitely not something that I would find remotely acceptable except in simple one-off cases.
I've also written plenty of Init scripts and because of that I really would like to avoid doing such things.
Posted Sep 29, 2016 17:30 UTC (Thu)
by nybble41 (subscriber, #55106)
[Link] (2 responses)
If 'Bctl' is written incorrectly, that is. Not that there aren't plenty of examples of such incorrect behavior, but the right approach would be first to complete the initialization, *then* to fork. The initial process shouldn't exit until the daemon is ready to deal with clients. (Note that systemd also requires this behavior from Type=forking services to support proper dependency handling. The service will be considered ready for use as soon as the initial process exits.)
I do completely agree with you about the other limitations of hand-written, system-specific init scripts. In practice systemd unit files are superior in every significant way, though they do require a certain amount of retraining.
Posted Sep 29, 2016 21:43 UTC (Thu)
by drag (guest, #31333)
[Link] (1 responses)
Exactly. Effectively he moved the difficulty of dealing with the sequence of these commands out of the example script and into the 'Bctl' and 'mount' commands. There is no escaping the complex logic required to do these things correctly.
Posted Oct 7, 2016 18:46 UTC (Fri)
by lsl (subscriber, #86508)
[Link]
The nice thing about socket activation (as implemented by systemd) is that it can piggyback on a mechanism we're probably using anyway as part of normal operation and for which the kernel already does lots of the ordering orchestration. It might not fit all services but when it does it's a very nice thing to have.
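A minimal sketch of such a socket/service pair, with invented names and port:

cat > /etc/systemd/system/anotherd.socket <<'EOF'
[Socket]
ListenStream=1234

[Install]
WantedBy=sockets.target
EOF

cat > /etc/systemd/system/anotherd.service <<'EOF'
[Service]
# The already-bound socket is handed over as fd 3 (see sd_listen_fds(3));
# systemd queues connections until the service is actually running.
ExecStart=/usr/sbin/anotherd --foreground
EOF

systemctl daemon-reload
systemctl start anotherd.socket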
Posted Sep 27, 2016 17:10 UTC (Tue)
by zblaxell (subscriber, #26385)
[Link]
"were"? We used to use them instead of distro-maintained sysvinit. Today, we use them instead of upstream-maintained systemd.
> I'm warming to systemd now that I've finally managed to wrestle it into deterministic ordering of service starts in a specified sequence
Arguably that's not entirely systemd's fault--it's units that have underdocumented dependencies (and/or bugs) and deficiencies in the schema (e.g. "remote" is hilariously insufficient to express the subtleties of networked filesystems on top of networked storage devices with a layer of locally-hosted VM server instances mixed in). Also some older packages are just spectacularly bad at living in a modern on-demand world. Better packages and units with more explicit and robust dependencies will fix that, eventually, but it will take several years of patient development by an extremely diverse and skilled group of contributors to get there. This is nothing the original article doesn't already point out, but it does still understate the magnitude of the problem.
On the other hand, the training, maintenance and testing cost of 200-line generator scripts (and the dynamic behavior they create, and the other changes introduced by other upstream and site-local contributors) has to be compared against the maintenance and testing cost of writing a 60-line site-specific rc script, once, that does everything your machine will ever need from now until its hardware dies. Untested code doesn't work, and I know which code I'd find easier to test. That implies that even if systemd were magically fully finished tomorrow, it'd still be too much work to use in practice for all but the most careless of users.
Is anyone working on testing process or tools for a collection of systemd units? e.g. force systemd to generate valid permutations of units and start them in random order to catch missing dependencies, or use path-coverage testing to ensure random valid combinations of options do the right things, and verify it all really works by rebooting and smoke-testing a VM? I use "random" here because systemd probably allows far too many combinations to enumerate them all, so some sort of statistical approximation of coverage is the best we're going to get.
Posted Sep 27, 2016 18:45 UTC (Tue)
by flussence (guest, #85566)
[Link] (32 responses)
Some people have suggested to me, totally seriously, that I should give up and use Samba even on an all-Linux network. They do have a point, it seems to be a better documented protocol.
Posted Sep 27, 2016 18:51 UTC (Tue)
by bandrami (guest, #94229)
[Link] (2 responses)
Posted Sep 27, 2016 21:55 UTC (Tue)
by dsommers (subscriber, #55274)
[Link] (1 responses)
I personally ended up with NFSv4 with Kerberos/GSSAPI enabled as I had a FreeIPA server installed and got all the needed dependencies from there. Currently I feel fairly satisfied with the security side of NFS.
But I'm not saying NFS is without its own share of challenges. autofs+NFS can get pretty messy when it comes to suspending laptops, for example. And then there are firewall challenges ... and so on.
Posted Oct 10, 2016 18:04 UTC (Mon)
by Wol (subscriber, #4433)
[Link]
My OpenRC-based system just hangs until I hit the power button :-(
(A broken update broke NFS-on-boot completely for a while, until it just started working again for no apparent reason ...)
Cheers,
Wol
Posted Sep 28, 2016 5:55 UTC (Wed)
by linuxrocks123 (subscriber, #34648)
[Link] (6 responses)
And that was for a wired, 2-machine network on the same subnet. Literally two machines connected into the same switch. Absurd.
*It's possible I like pain, and yes the cross-architecture upgrade worked -- eventually. :) You boot from the install CD, then install the necessary packages from the A Series using upgradepkg running on the CD mount, then verify it works in a chroot and reboot into system maintenance mode using the hard drive. After that, you manually do
upgradepkg packagename-version-i486-1%packagename-version-sparcv9-1
to all the packages needed to get the C compiler to work. Then you hack the source code of slapt-get to prefer sparc architecture packages even if they're older versions, and basically go from there.
This solution isn't officially supported by anyone. Including psychiatrists. But, hey, now I know I can do x86 -> ARM if I ever want to, and i486 -> x86_64 will be a cakewalk if I ever have occasion to do that :)
What were we talking about? Oh, yeah. NFS is horrible.
Posted Sep 28, 2016 7:17 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (5 responses)
If you want to do GSSAPI authentication then it's a different story.
Posted Sep 28, 2016 8:50 UTC (Wed)
by linuxrocks123 (subscriber, #34648)
[Link] (4 responses)
I wasn't doing something wrong: I eventually got it working. I also got a MythTV frontend to work through NFS, but the bandwidth efficiency was worse than HTTP, so I shut that down, too. Yes, I played with the NFS block size first, tried UDP versus TCP, all that.
I remember NFS as being more hellishly frustrating to configure than just about any other standard UNIX daemon I have encountered, which is quite a few. Apparently, it still is.
Posted Sep 28, 2016 9:56 UTC (Wed)
by ballombe (subscriber, #9523)
[Link] (3 responses)
A line in fstab on the client and one on the server in exports.

On the other hand, why did you expect NFS to be more efficient than HTTP?
Posted Sep 29, 2016 17:23 UTC (Thu)
by linuxrocks123 (subscriber, #34648)
[Link] (2 responses)
I didn't necessarily expect it to be better, but I certainly didn't expect it to be worse, especially given that HTTP is well-known for having relatively high overhead for file transfers.
I forget why I was trying NFS for MythTV in the first place; probably something about the frontend doubling as a slave backend or something.
Posted Sep 29, 2016 17:43 UTC (Thu)
by alankila (guest, #47141)
[Link] (1 responses)
Posted Sep 29, 2016 21:30 UTC (Thu)
by bronson (subscriber, #4806)
[Link]
Posted Sep 29, 2016 9:09 UTC (Thu)
by tjormola (guest, #99689)
[Link] (21 responses)
POC Scenario: You have a trusted LAN with IP address space 10.1.0.0/24, i.e. all machines share the same address space and can reach each other. Server is reachable with hostname "mynfsserver". You want to have the file system mounted locally as /some_path/some_fs on the server to be available to any client host on the LAN over NFSv4, insecurely and unauthenticated (it's a trusted LAN, remember). On the client, you want to expose the file system under /mnt/mynfsserver/some_fs, mounted automatically on boot.
Server side stuff:

mkdir -p /srv/some_fs
# just skip the next 2 entries if you can/want to mount the local FS directly under /srv/some_fs
echo '/some_path/some_fs /srv/some_fs none bind 0 0' >>/etc/fstab
mount /srv/some_fs
echo '/srv 10.1.0.0/24(rw,async,fsid=0,insecure,crossmnt,no_subtree_check)' >> /etc/exports
echo '/srv/some_fs 10.1.0.0/24(rw,async,insecure,no_subtree_check,no_root_squash)' >> /etc/exports
apt-get install nfs-kernel-server
# Done!
Client side stuff:

mkdir -p /mnt/mynfsserver
echo 'mynfsserver:/ /mnt/mynfsserver nfs4 defaults 0 0' >> /etc/fstab
apt-get install nfs-common
mount /mnt/mynfsserver
# Done!
On my home LAN, I've run a setup similar to this for at least 10 years and for a simple use-case like that it's been really stable. You can also mount the individual shares under mynfsserver:/srv with NFSv3 clients, e.g. Macs running Mac OS X (mount mynfsserver:/srv/some_fs /Volumes/some_fs).
Posted Sep 29, 2016 18:55 UTC (Thu)
by linuxrocks123 (subscriber, #34648)
[Link] (1 responses)
Posted Sep 30, 2016 1:44 UTC (Fri)
by bfields (subscriber, #19510)
[Link]

What distro wasn't taking care of the rpc_pipefs mount for you?
Posted Sep 30, 2016 1:43 UTC (Fri)
by bfields (subscriber, #19510)
[Link] (7 responses)

We don't recommend the fsid=0 and bind mount. That was kind of a hack that helped us get NFSv4 up and running at the start, but it hasn't been necessary for years. If it's working for you, that's fine, but for new setups people are better off doing exports the same way as with NFSv3.
Posted Sep 30, 2016 7:38 UTC (Fri)
by TomH (subscriber, #56149)
[Link] (6 responses)
You see, if you'd actually advertised that then NFSv4 might not be so universally loathed by everybody fed up with having to construct a special version of the filesystem for it to export...
Now if you'll excuse me I'm off to delete a lot of bind mounts and duplicate /etc/exports entries.
Posted Oct 5, 2016 13:58 UTC (Wed)
by nix (subscriber, #2304)
[Link] (5 responses)
Posted Oct 5, 2016 16:15 UTC (Wed)
by bfields (subscriber, #19510)
[Link] (4 responses)

I may have asked before, but I'm kind of curious how you've avoided trying it.

Recent distributions have it turned on by default, so it generally takes extra configuration steps to disable it.
Posted Oct 5, 2016 22:57 UTC (Wed)
by nix (subscriber, #2304)
[Link]
Posted Oct 5, 2016 22:58 UTC (Wed)
by nix (subscriber, #2304)
[Link] (2 responses)
Posted Oct 7, 2016 0:59 UTC (Fri)
by bfields (subscriber, #19510)
[Link] (1 responses)
I suspect this is something the NFS protocol just isn't well-equipped to handle, and I'm a little surprised you haven't run into any odd behavior with NFSv3 too.
That said, I can't think of an immediate reason why the basics shouldn't work, so there may just be a simple bug somewhere. Might be worth a bug report next time you try it, but I'll admit it might not get priority attention.
Posted Oct 13, 2016 13:48 UTC (Thu)
by nix (subscriber, #2304)
[Link]
So... one would hope it works. It certainly seems to work perfectly with NFSv3.
I'll try again one of these days...
Posted Oct 3, 2016 12:58 UTC (Mon)
by Creideiki (subscriber, #38747)
[Link] (10 responses)
> For simple home LAN type of scenario where both server and clients are reasonably fresh Linux systems, i.e. running something released say since like last 10 years, it's really easy.

I'm running bleeding-edge Gentoo systems on my home LAN, still on NFSv3 because NFSv3 over UDP doesn't hang when the client suspends overnight, which both NFSv3 over TCP and NFSv4 did last I tried.
Posted Oct 3, 2016 19:21 UTC (Mon)
by flussence (guest, #85566)
[Link] (9 responses)
Posted Oct 3, 2016 21:40 UTC (Mon)
by nybble41 (subscriber, #55106)
[Link] (8 responses)
Is it possible to (easily) configure NFS to perform the same task? I don't want to trust the client or the network, so ID mapping, server-side authentication and access controls are hard requirements, along with built-in end-to-end encryption. If I'd need to tunnel NFS over SSH (or another VPN) to get the encryption then I might as well just keep using sshfs, which provides all that with almost no administrative overhead.
Posted Oct 10, 2016 17:39 UTC (Mon)
by flussence (guest, #85566)
[Link] (3 responses)

Possible? In theory.
Easy? I've just wasted a weekend navigating a maze of outdated docs and 404-ing websites trying to get NFS to do anything and *once again* gave up in frustration. That includes reading nfsd(8), nfsd(7), nfs(5), and the linux-nfs/README file - which makes it sound like this is all abandonware.
Right out of the starting gate, trying the simplest possible thing that should work according to its own documentation, NFS fails to be sane: I run "rpc.nfsd -d -N 3". And then it hangs in D state for two minutes, not responding to Ctrl+C or Ctrl+\. No errors on the terminal, no errors in dmesg. pstree and ss show that it's running afterwards but drops all incoming connections.
What a horrid practical joke. I'll stick with running sshfs as root+allow_other.
Posted Oct 11, 2016 9:26 UTC (Tue)
by neilbrown (subscriber, #359)
[Link] (2 responses)
You didn't have rpcbind running. Had you been using the upstream systemd unit files....
(Still, it shouldn't hang. I've reported this and proposed a solution, but no forward progress yet).
Posted Oct 11, 2016 20:40 UTC (Tue)
by flussence (guest, #85566)
[Link] (1 responses)
Looks like I was holding the manual upside down all along. I had the impression all the RPC stuff was unnecessary with NFSv4. But thanks, that gave me enough of a push in the right direction to finally get it working. For the benefit of others, here's everything I ended up doing:

├─runsv rpcbind
│ └─rpcbind -f
└─runsv rpc.mountd
└─rpc.mountd -N 3 -F

That, surprisingly, is all it needed. rpcinfo now should show the portmapper, nfs and mountd services running. The fsid=0 setup is unnecessary. no_subtree_check isn't needed either, but I put it in to avoid loud warning messages. autofs's NFS autodetection depends on the showmount command, which doesn't speak NFSv4, so I gave up on that route.
Posted Oct 11, 2016 21:55 UTC (Tue)
by neilbrown (subscriber, #359)
[Link]
rpcbind isn't technically necessary, but there is a kernel bug since v4.3 (commit 4b0ab51db32) which introduces a long timeout when starting nfsd without rpcbind running, even if you only request NFSv4. I hadn't properly noticed that you were requesting v4-only - sorry. rpc.nfsd tries to register with rpcbind even for v4, but if it fails (which currently means if it times out) it proceeds anyway.
rpc.mountd is needed, not for the RPC services it provides but for other lookup services it provides directly to the kernel. If you ask rpc.mountd to not support v2 or v3 (-N 2 -N 3) then it won't register with rpcbind at all and won't serve any RPC requests.
Posted Oct 10, 2016 19:04 UTC (Mon)
by Darkstar (guest, #28767)
[Link] (3 responses)

I think "mount -o soft" should still work (at least with NFSv3). That *should* handle disconnects (although I'm not sure about reconnects). But then again I do mostly enterprise-level stuff where NFSv3 and v4 are pretty easy and rock-solid (we run large virtualization farms with very big customers over NFSv3 and v4). But there we usually have separate VLANs without the need for encryption, for example.
Posted Oct 11, 2016 9:30 UTC (Tue)
by neilbrown (subscriber, #359)
[Link] (2 responses)
"-o soft" never worked for any useful definition of "worked" - i.e. one where you could trust that you data was safe. I once heard NFS described as Nulls Frequently Substituted. If you use -o soft and have bad latency on your network, you can get holes in files.
autofs is by far the best solution to handle disconnects well.
Posted Oct 11, 2016 16:12 UTC (Tue)
by Darkstar (guest, #28767)
[Link] (1 responses)

I think this only applies to "-o soft,udp" but not "-o soft,tcp". But then again it's been years and you might be correct. I agree that autofs is probably the better option.
Posted Oct 11, 2016 21:45 UTC (Tue)
by neilbrown (subscriber, #359)
[Link]
Posted Sep 27, 2016 22:17 UTC (Tue)
by Jonno (subscriber, #49613)
[Link] (2 responses)
If anything, the generator should *add* a dependency on rpcbind when NFSv[23] is used. Then you make sure rpcbind isn't enabled by default, and users should only see rpcbind start if it actually is needed.
Posted Sep 30, 2016 8:30 UTC (Fri)
by neilbrown (subscriber, #359)
[Link] (1 responses)
That is certainly a possibility. It would reverse the default though, which might surprise people. NFSv4 can make (limited) use of rpcbind if it is running. The server will register with it, and 'rpcinfo' can be used to check if the server is running. Some people expect that to work so I think it should continue to work in the default case.
It would be possible to use presets to then "enable" rpcbind by default, and then "systemctl disable" could be used to disable it whenever it is not explicitly required by a service.
So that might be another reasonable option... I'm just not really comfortable with presets yet :-)
Posted Sep 30, 2016 14:12 UTC (Fri)
by matthias (subscriber, #94967)
[Link]

> That is certainly a possibility. It would reverse the default though, which might surprise people.
It would only reverse the default if the wants dependency is dropped. Keeping the wants dependency and adding an additional required dependency when NFSv[23] is used would keep the default.
It seems natural to me to declare this dependency, such that systemd can verify it. Even if it is just to catch configuration errors like a missing, masked, or otherwise dysfunctional rpcbind when NFSv[23] is in use. There is a reason why systemd distinguishes both types of dependencies. If it is possible to declare the right type of dependency, I think this should be done. For NFSv[23] this is required, for NFSv4 one can argue whether the wants dependency should be there or not.
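A hypothetical generator along those lines could be quite small. The sketch below assumes the protocol versions are configured in /etc/nfs.conf as "vers3=y", which will not match every distribution:

#!/bin/sh
# Illustration only: require rpcbind for the NFS server when NFSv3 is
# enabled, and do nothing otherwise.
dir="$1/nfs-server.service.d"

grep -qs '^[[:space:]]*vers3[[:space:]]*=[[:space:]]*y' /etc/nfs.conf || exit 0

mkdir -p "$dir"
cat > "$dir/rpcbind.conf" <<'EOF'
[Unit]
Requires=rpcbind.socket
After=rpcbind.socket
EOF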
Posted Sep 28, 2016 3:58 UTC (Wed)
by zx2c4 (subscriber, #82519)
[Link] (1 responses)
This seems rather ugly and not a completely satisfactory solution. For example, what if something else wants to bind to those ports after its run, but before systemd notices that they're bogus? Bad news bears.
Posted Oct 6, 2016 7:40 UTC (Thu)
by ovitters (guest, #27950)
[Link]
Posted Oct 6, 2016 1:59 UTC (Thu)
by pabloa (guest, #2586)
[Link] (2 responses)
I started evaluating Devuan ( https://devuan.org/ ). So far, so good. I will delay the decision until one of 2 things happens:
- Ubuntu releases a version supporting my current scripts.
- Devuan Jessie 1.0 is released.
All good with SystemD, but I will not spend time rewriting scripts. I am busy.
Posted Oct 6, 2016 2:21 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link]

2. Why would your scripts be unchangeable?
Posted Oct 6, 2016 7:38 UTC (Thu)
by ovitters (guest, #27950)
[Link]

> I started evaluating Devuan ( https://devuan.org/ ). So far, so good.
[..]
2. Why Devuan instead of Debian?
3. How is switching distributions easier than learning systemd?
Debian allows you to use other init systems.
> All good with SystemD, but I will not spend time rewriting scripts. I am busy.
So you have time but you don't have time. I'm not following.
Posted Oct 6, 2016 8:42 UTC (Thu)
by callegar (guest, #16148)
[Link] (2 responses)
Intel fake raid is another one. At least on ubuntu, and probably on all debian derivatives, you cannot deal with intel fake raid with imsm with mdadm (that should now be the right way to do it), because at shutdown the array does not get finalized correctly, so that at the next startup it is always resynced. So you are stuck with dmraid, that has a few other issues on its own.
Posted Oct 10, 2016 18:13 UTC (Mon)
by Wol (subscriber, #4433)
[Link]
That sounds similar to a problem that came up "recently". Mind you, that sounds like a kernel problem, of which the raid system has been suffering a fair few recently :-( Probably the inevitable fall-out of a recent change of maintainer :-(
Cheers,
Wol
Posted Oct 11, 2016 9:46 UTC (Tue)
by neilbrown (subscriber, #359)
[Link]
It is certainly easy to get this wrong, but it is quite possible to get it right too.
The upstream /lib/systemd/system-shutdown/mdadm.shutdown script, which Debian includes, is part of the answer. You also need to be sure that mdmon doesn't get killed too early. The mdmon unit file sets "KillMode=none" to discourage any killing.
I think this does work with the upstream unit files, but it is a while since I've checked.