
Another new ABI for fanotify

By Jonathan Corbet
November 11, 2009
"Fanotify" is a much-revised system for providing filesystem event notifications to user space, and, possibly, allowing user space to block open() operations on specific files. The intended use case is malware-scanning utilities, but there are others: hierarchical storage has been cited as one possibility. This code has had a long, hard path into the kernel for a couple of reasons: kernel developers are not big fans of malware scanning, and nailing down the user-space ABI has been challenging.

The first obstacle has been more-or-less overcome. Even developers who think that malware scanning is the worst sort of security snake oil can agree that having these utilities use a well-defined kernel interface is better than having them employ nasty tricks like hooking into the system call table. ABI difficulties can be harder to overcome, though. With the latest fanotify posting, developer Eric Paris may have resolved this issue for at least a portion of the fanotify functionality.

The new version does away with the novel interface using setsockopt() in favor of a couple of new system calls. The first of these is fanotify_init():

    int fanotify_init(unsigned int flags, unsigned int event_f_flags,
		      int priority);

This system call initializes the fanotify subsystem and returns a file descriptor that is used for further operations. Two flag values are implemented: FAN_NONBLOCK creates a nonblocking file descriptor, and FAN_CLOEXEC sets the close-on-exec flag. Currently, event_f_flags and priority are unused; they should be set to zero.
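The argument contract above can be made concrete with a small caller-side check. This is a minimal sketch, not part of the patch: the helper name fan_init_args_ok() is hypothetical, and the flag values are assumptions borrowed from the later mainline <linux/fanotify.h>, which may not match this 2009 posting. It does not call the kernel; it only encodes the rules the article describes.

```c
#include <stdbool.h>

/* Assumed values, borrowed from the later mainline <linux/fanotify.h>;
 * the 2009 patch may have used different numbers. */
#define FAN_CLOEXEC  0x00000001u
#define FAN_NONBLOCK 0x00000002u

/* Hypothetical helper: returns true if the arguments follow the contract
 * described for the proposed fanotify_init(). */
bool fan_init_args_ok(unsigned int flags, unsigned int event_f_flags,
                      int priority)
{
    /* Only FAN_NONBLOCK and FAN_CLOEXEC are implemented. */
    if (flags & ~(FAN_CLOEXEC | FAN_NONBLOCK))
        return false;
    /* Both remaining parameters are currently unused and should be zero. */
    return event_f_flags == 0 && priority == 0;
}
```

A caller could run such a check before invoking the system call, so that nonzero values in the reserved parameters are caught early rather than depending on whatever the kernel does with them.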

Management of notification events is then done with fanotify_mark():

    int fanotify_mark(int fanotify_fd, unsigned int flags,
		      int dfd, const char *pathname, u64 mask,
		      u64 ignored_mask);

This call is used to "mark" specific parts of the filesystem hierarchy, indicating an interest in events involving those files. fanotify_fd is the file descriptor returned by fanotify_init(). The flags parameter must contain exactly one of FAN_MARK_ADD or FAN_MARK_REMOVE, indicating whether this call adds new marks or removes existing ones; there are also a couple of flags to control the following of symbolic links and the marking of directories (without their contents).
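One way to read the flags rule is "exactly one action, optionally combined with the modifier flags." The sketch below encodes that reading. The names and values are assumptions taken from the later mainline <linux/fanotify.h>; this patch may name or number them differently, and the exact semantics of its directory-marking flag are not spelled out in the article. fan_mark_flags_ok() is a hypothetical helper.

```c
#include <stdbool.h>

/* Assumed names and values, taken from the later mainline
 * <linux/fanotify.h>; the 2009 patch may differ. */
#define FAN_MARK_ADD         0x00000001u
#define FAN_MARK_REMOVE      0x00000002u
#define FAN_MARK_DONT_FOLLOW 0x00000004u  /* symlink-following modifier */
#define FAN_MARK_ONLYDIR     0x00000008u  /* directory-marking modifier */

/* Hypothetical helper: true if flags names exactly one action
 * (add or remove), plus optionally the modifier flags, and nothing else. */
bool fan_mark_flags_ok(unsigned int flags)
{
    unsigned int action = flags & (FAN_MARK_ADD | FAN_MARK_REMOVE);
    unsigned int known  = FAN_MARK_ADD | FAN_MARK_REMOVE |
                          FAN_MARK_DONT_FOLLOW | FAN_MARK_ONLYDIR;

    if (action != FAN_MARK_ADD && action != FAN_MARK_REMOVE)
        return false;              /* neither action, or both at once */
    return (flags & ~known) == 0;  /* reject unknown bits */
}
```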

The file(s) to be marked are determined by dfd and pathname; these parameters work much as they do in the *at() system calls. If dfd is AT_FDCWD, the pathname is resolved relative to the current working directory. If, instead, dfd refers to a directory, the pathname lookup starts at that directory. If pathname is NULL, though, dfd itself is interpreted as the object to mark.
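The lookup rules can be modeled in a few lines of C. This sketch assumes fanotify_mark() follows the usual *at() conventions exactly, including the standard rule that an absolute pathname causes dfd to be ignored. classify_mark_target() and open_mark_target() are hypothetical helpers; the latter simply uses openat(2) and dup(2) to obtain a descriptor for the object that a given (dfd, pathname) pair would name.

```c
#define _POSIX_C_SOURCE 200809L  /* for openat() and AT_FDCWD */
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

enum mark_lookup {
    LOOKUP_DFD_ITSELF,  /* pathname == NULL: dfd is the object to mark */
    LOOKUP_ABSOLUTE,    /* absolute pathname: dfd is ignored */
    LOOKUP_FROM_CWD,    /* dfd == AT_FDCWD: relative to the cwd */
    LOOKUP_FROM_DIRFD   /* relative to the directory dfd refers to */
};

/* Hypothetical helper: which rule applies to this (dfd, pathname) pair. */
enum mark_lookup classify_mark_target(int dfd, const char *pathname)
{
    if (pathname == NULL)
        return LOOKUP_DFD_ITSELF;
    if (pathname[0] == '/')
        return LOOKUP_ABSOLUTE;
    if (dfd == AT_FDCWD)
        return LOOKUP_FROM_CWD;
    return LOOKUP_FROM_DIRFD;
}

/* Hypothetical helper: open the object the pair names.  openat() already
 * implements the last three rules; the NULL case just duplicates dfd.
 * Returns a new descriptor, or -1 with errno set. */
int open_mark_target(int dfd, const char *pathname)
{
    if (pathname == NULL)
        return dup(dfd);
    return openat(dfd, pathname, O_RDONLY);
}
```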

Finally, mask and ignored_mask control which events are reported. To generate a specific event, a file must have the appropriate flag set in mask and clear in ignored_mask. The flags are FAN_ACCESS (file access), FAN_MODIFY (a file is modified), FAN_CLOSE_WRITE (a writable file has been closed), FAN_CLOSE_NOWRITE (a read-only file has been closed), FAN_OPEN (a file has been opened), and FAN_EVENT_ON_CHILD (events on children of a directory). There is also a FAN_Q_OVERFLOW event for event queue overflows, but that is not currently implemented.
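The mask/ignored_mask rule reduces to a single bitwise test. The sketch below assumes the bit values used by the later mainline header (which match inotify's); the 2009 patch may have numbered them differently, and fan_should_report() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, as in the later mainline <linux/fanotify.h>. */
#define FAN_ACCESS         0x00000001ULL
#define FAN_MODIFY         0x00000002ULL
#define FAN_CLOSE_WRITE    0x00000008ULL
#define FAN_CLOSE_NOWRITE  0x00000010ULL
#define FAN_OPEN           0x00000020ULL
#define FAN_Q_OVERFLOW     0x00004000ULL
#define FAN_EVENT_ON_CHILD 0x08000000ULL

/* Hypothetical helper: an event (a single event bit) is generated only if
 * its bit is set in mask and clear in ignored_mask. */
bool fan_should_report(uint64_t mask, uint64_t ignored_mask, uint64_t event)
{
    return (event & mask & ~ignored_mask) != 0;
}
```

For example, a mark with mask FAN_OPEN | FAN_MODIFY and ignored_mask FAN_OPEN would report modifications but suppress open events.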

Once files have been marked, the application can simply read from the fanotify file descriptor to get events. The events look like:

    struct fanotify_event_metadata {
	__u32 event_len;
	__u32 vers;
	__s32 fd;
	__u64 mask;
    };

Here, event_len is the length of the structure, vers indicates which version of fanotify generated the structure, fd is an open file descriptor for the object being accessed, and mask describes what is actually happening.
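Because each record carries its own event_len, a reader can step through a read() buffer by that length rather than by sizeof. The sketch below mirrors the structure above using <stdint.h> types in place of the kernel's __u32/__s32/__u64. FAN_EXPECTED_VERS is a placeholder (the article does not give a version number), and fan_next_event() is a hypothetical helper, not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the structure shown above; padding is left to the compiler. */
struct fanotify_event_metadata {
    uint32_t event_len;
    uint32_t vers;
    int32_t  fd;
    uint64_t mask;
};

/* Placeholder: the article does not say which version number to expect. */
#define FAN_EXPECTED_VERS 1u

/* Hypothetical helper.  Copies the record starting at *off (which must not
 * exceed len) into *ev and advances *off by event_len.  Returns 1 on success,
 * 0 when no complete record remains, and -1 for a malformed record or an
 * unexpected version. */
int fan_next_event(const char *buf, size_t len, size_t *off,
                   struct fanotify_event_metadata *ev)
{
    if (len - *off < sizeof *ev)
        return 0;
    /* memcpy avoids unaligned access if records are packed back to back */
    memcpy(ev, buf + *off, sizeof *ev);
    if (ev->vers != FAN_EXPECTED_VERS)
        return -1;
    if (ev->event_len < sizeof *ev || ev->event_len > len - *off)
        return -1;
    /* Step by event_len, not sizeof, so records carrying extra trailing
     * data are still skipped correctly. */
    *off += ev->event_len;
    return 1;
}
```

A read loop would call fan_next_event() until it returns 0 and, presumably, close ev.fd after handling each event, since every event hands the reader an open file descriptor.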

There is one crucial component missing in these patches: there is no way for the fanotify user to react to these events. In particular, the ability to block an open() call, a core part of the malware-scanning process, is missing. That, presumably, is to be added in a future revision. Meanwhile, Eric has requested permission to put the notification code into linux-next, presumably with a 2.6.33 merge in mind. As of this writing, objections have not been forthcoming.




Another new ABI for fanotify

Posted Nov 12, 2009 3:57 UTC (Thu) by mezcalero (subscriber, #45103)

I hope fanotify fixes one thing that is really missing in the inotify API: some way to identify whether an event was triggered by the process that is listening.

I.e., what inotify currently sucks at is being used to read files or device nodes that have just been closed, e.g. in a loop such as "for (;;) { wait_until_someone_closes_a_file_after_writing(); check_what_changed(); }". Since the check_what_changed() call might itself open() and close() the file/device node, one would enter a loop that is very hard to break, because one cannot distinguish between events triggered by the process itself and events triggered by someone else. An easy fix for this could be to include the PID of the process that triggered an event. That way programs could simply ignore all events triggered by themselves.

Another new ABI for fanotify

Posted Nov 12, 2009 15:18 UTC (Thu) by eparis (guest, #33060)

Pid is included in later patches.

Also, you have an open fd which will not cause you to get events. So you can just operate on that fd and you won't hit the loop; if you open files yourself, you will get events for them.

Another new ABI for fanotify

Posted Nov 12, 2009 11:18 UTC (Thu) by etienne_lorrain@yahoo.fr (guest, #38022)

> The intended use case is malware-scanning utilities

Some would say the other intended use case is malware-spreading utilities: it is better to "infect" executables that are often executed than those that lie dormant... and having a standard interface for viruses would greatly simplify their development.
Moreover, because it seems you should be able to use multiple independent virus checkers, you can hook "under" or "over" a virus checker, to hide your virus from upper layers, or to add it once the file has been certified clean.

Another new ABI for fanotify

Posted Nov 12, 2009 15:26 UTC (Thu) by eparis (guest, #33060)

Clearly you don't understand the interface. I'm not going to say anything other than "you are wrong", but if you do decide to do some research and find a real problem with my architecture, please let me know and it will be addressed.

Another new ABI for fanotify

Posted Nov 13, 2009 2:46 UTC (Fri) by bronson (subscriber, #4806)

> having a standard interface for viruses would greatly simplify their development.

That's an argument for keeping useful features out of the kernel? Are you kidding??

Pretty much all viruses are transferred via network. Does that mean that the networking stack should be removed from the kernel?

Another new ABI for fanotify

Posted Nov 12, 2009 16:06 UTC (Thu) by xav (guest, #18536)

How fun ... here we have two consecutive articles, one dealing with making several file operations a transaction, the other trying to break down open() into two distinct operations (call userspace to check if I can open(), then open() for real).
I hardly see how both these approaches can coexist...

Another new ABI for fanotify

Posted Nov 13, 2009 2:41 UTC (Fri) by bronson (subscriber, #4806)

Why? Btrfs is dealing with bits on the platter, fanotify hooks in at the VFS. There's very little logical or conceptual overlap.

Another new ABI for fanotify

Posted Nov 13, 2009 10:13 UTC (Fri) by xav (guest, #18536)

There's not much overlap, but I wonder how you can implement a transaction containing one or more open() if these operations can block, call userspace and eventually abort.

Another new ABI for fanotify

Posted Dec 20, 2009 17:01 UTC (Sun) by Blaisorblade (guest, #25465)

> There's not much overlap, but I wonder how you can implement a transaction containing one or more open() if these operations can block, call userspace and eventually abort.
Transactions were invented in databases, and in that context it's obvious that part of a transaction may fail; even in btrfs, transactions allow for failures. So, what's the problem here?
A bigger problem is that during the transaction the filesystem is locked, so userspace needs to avoid modifying the fs during the check if btrfs is used. It's possible, I guess: the atime-change problem needs to be solved to perform reads, but that's doable. But if developers don't test this scenario, they won't notice.

I really want ... something that is almost this...

Posted Nov 13, 2009 2:28 UTC (Fri) by knobunc (subscriber, #4678)

I have a large filesystem that I index. I couldn't care less about scanning things for malware, but this mechanism could help there...

Except, it looks like the interface does not generate events for file moves.

I know about the other notification mechanisms, but the tree is rather large and I do not want to have to add inotify_watches for all of the directories within... I assume (perhaps erroneously) that inotify does not scale to tens of thousands of directories.

-ben

Fanotify has a bug in 3.1 or below

Posted Oct 12, 2011 3:23 UTC (Wed) by searockcliff (guest, #76465)


Copyright © 2009, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds
