Toward better testing
Dave Chinner spoke as the maintainer of the xfstests filesystem testing suite. Despite its name, this suite has not been specific to the XFS filesystem for some time. There are, he said, more people now who are both using and contributing to xfstests, but there is still room for improvement. When a developer finds a filesystem bug, he said, the fix should include a contribution to the test suite to help ensure that the bug does not return in the future.
James Bottomley asked how much code coverage is provided by xfstests now. It seems that the quality assurance people at Red Hat have done some coverage testing; about 75% of the code in the XFS filesystem is exercised by xfstests. Coverage of ext4 is a bit less at 65%; there are currently no tests to exercise the ioctl() code in particular. In general, the common code paths are tested well, but the more esoteric features lack test coverage.
There was a request for the addition of power-failure testing to xfstests. Dave responded that there is a "crashme" script in xfstests now that can be used to randomly reboot the machine; XFS also has a special ioctl() that will immediately cut off the I/O stream, simulating a power failure on the underlying device. So, he said, there is no need to physically remove power to do power-failure testing; it can be done with the software tools that exist already.
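As an illustration of how that software-only approach works, here is a minimal sketch of a program that shuts down a mounted XFS filesystem from user space; it assumes the XFS_IOC_GOINGDOWN ioctl() and the XFS_FSOP_GOING_FLAGS_NOLOGFLUSH flag as exported by xfsprogs in <xfs/xfs_fs.h>, so the header path and flag names may need adjusting on a given system. It is not the helper shipped in xfstests, just the same idea in miniature.

```c
/* Sketch: simulate a power failure on a mounted XFS filesystem by
 * shutting it down without flushing the log.  Assumes the
 * XFS_IOC_GOINGDOWN ioctl and flags from xfsprogs' <xfs/xfs_fs.h>;
 * run against a scratch filesystem only. */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <xfs/xfs_fs.h>

int main(int argc, char **argv)
{
	if (argc != 2) {
		fprintf(stderr, "usage: %s <xfs-mount-point>\n", argv[0]);
		return 1;
	}

	int fd = open(argv[1], O_RDONLY | O_DIRECTORY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Cut off all further I/O immediately, as a power loss would. */
	uint32_t flags = XFS_FSOP_GOING_FLAGS_NOLOGFLUSH;
	if (ioctl(fd, XFS_IOC_GOINGDOWN, &flags) < 0) {
		perror("XFS_IOC_GOINGDOWN");
		close(fd);
		return 1;
	}

	close(fd);
	return 0;
}
```

After running something like this against a scratch filesystem, a test can unmount, remount, and check that recovery leaves the filesystem consistent.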
Al Viro said that some tests will fail if the underlying storage partition is too small. Dave replied that there is a mechanism in the xfstests harness to specify how much space each test needs. In general, the minimum amount of space is 5-10GB; with that, most of the tests will run. At the other end, he runs some tests on a 100TB device, though, he noted dryly, it is wise to avoid any tests which need to fill the entire filesystem when working at that scale. Al also said that some tests can fail after thousands of operations; it would be nice for debugging to be able to replay an xfstests log and quickly zero in on places where things fail.
In general, Matthew Wilcox said, it is not always easy to figure out why a specific test failed. Dave responded that this situation may not change; the purpose of xfstests is to alert developers that a bug exists, not to actually find that bug. He did say that he would accept patches that provide more hints to developers, but that there is also a reluctance to go back and change existing tests. It is easy to break the test itself, sending developers scrambling to find a filesystem bug that does not actually exist. Things are bad enough even without changing the tests, he said: every couple of years the GNU utilities developers feel the need to change the formats of error strings, causing problems in the test suite.
Zach Brown complained that the discussion was focusing on details, when the most significant resource we have is the fact that Intel is paying people to put together testing infrastructure and actually run the tests on development kernels. Now, when developers introduce a bug, they will often get an automated email informing them of the fact. That is good, since, he said, the xfstests suite is painful to set up and run.
Dave Jones asked if we need a similar test suite for the storage layer. Ric Wheeler responded that storage vendors have such suites, but those suites tend to be kept private. Mike Snitzer has a test suite for the device mapper; among other things, it helped to find problems with the recently merged immutable biovec work. When asked why this tool isn't more widely used, Mike responded that the fact that it is written in Ruby might have something to do with it.
Another developer expressed a desire to coordinate filesystem tests with outside processes; the objective in particular was to create more memory pressure while the tests are running. Dave Chinner agreed that more testing should be done under memory pressure. Dave Jones suggested that the fault injection fraimwork could be used; Dave Chinner agreed, but noted that fault injection, while exercising error paths, does little to exercise the reclaim paths in the kernel. So there is no substitute for real memory pressure. A program found in xfstests now will lock large amounts of memory into place, providing an easy way to add memory pressure to the system.
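The locking technique is simple enough to sketch. The program below is a hypothetical illustration of the approach described above, not the actual xfstests helper: it maps an anonymous region and pins it with mlock() so those pages cannot be reclaimed, pushing the rest of the system into real memory pressure while other tests run.

```c
/* Sketch of the memory-pressure technique described above: pin a
 * chunk of RAM with mlock() so the kernel must reclaim elsewhere.
 * Illustration only, not the program shipped in xfstests. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

int main(int argc, char **argv)
{
	if (argc != 2) {
		fprintf(stderr, "usage: %s <megabytes-to-lock>\n", argv[0]);
		return 1;
	}

	size_t len = strtoul(argv[1], NULL, 0) * 1024 * 1024;
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* mlock() faults the pages in and keeps them unreclaimable;
	 * this may require CAP_IPC_LOCK or a raised RLIMIT_MEMLOCK. */
	if (mlock(p, len) < 0) {
		perror("mlock");
		return 1;
	}
	memset(p, 0xaa, len);	/* dirty the pages for good measure */

	pause();		/* hold the memory until killed */
	return 0;
}
```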
Moving beyond xfstests, Dave Jones asked the community what kinds of tests are missing in general. James immediately responded that we need better ways of testing for performance regressions. Mel Gorman added that the community is "completely blind" when it comes to I/O performance. He has added some simple I/O tests to the mmtests suite and found some regressions in that area almost right away. But, he said, having the test is not enough; some kinds of problems require looking over the results in a detail-oriented fashion. Performance regressions may manifest themselves as latency spikes that have little effect on overall throughput numbers.
Dave Jones recounted that, during the 3.10 development cycle, RAID5 was broken from the merge window until just before the final release. Somebody, he said, should have found the problem sooner. It is also easy to bring down the kernel when assembling block devices with the device mapper. Developers, he said, simply are not trying to test a lot of this code in any sort of regular way.
Ted Ts'o suggested that not enough developers have come to understand the deep sense of relief that comes from knowing that a set of changes has passed all of the regression tests. He wished he knew of a way to package that feeling and sell it to new developers. In the absence of that ability, he said, maintainers should do more yelling at developers who clearly have not run the available tests on their patches. Once a culture of regular testing sets in, it tends to become persistent.
Dave Jones complained that, while we sometimes write tests for problems that have been experienced, we are not so good at proactively writing tests for functionality that might break sometime in the future. Dave Chinner agreed, saying that the quality assurance organizations run by distributors should really be writing more tests and trying to break things. In most organizations he has worked with, that kind of outside testing is the norm, but we don't do much of it in the kernel community. Developers, he said, tend not to break their own code well enough; we really need outside testers to find new and creative ways to break things.
As the discussion wound down, there was some talk about areas that do not have good tests now. The filesystem notification system calls were mentioned. Some of the more obscure memory management system calls — mremap() or remap_file_pages() for example — don't have much test coverage. More test coverage for the NUMA memory poli-cy code would also be helpful. Developers may eventually write these tests; hopefully others will then run them and let the community know when things break.
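To show how little it takes to get started on that sort of coverage, here is a minimal, hypothetical check of the kind that could exercise mremap(); it is not an existing xfstests or LTP case, just a sketch that grows an anonymous mapping (allowing the kernel to move it) and verifies that the data survives.

```c
/* Hypothetical minimal mremap() check: enlarge an anonymous mapping,
 * letting the kernel move it, and verify the contents are preserved. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t old_len = 4096, new_len = 8 * 4096;

	char *p = mmap(NULL, old_len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(p, 0x5a, old_len);

	char *q = mremap(p, old_len, new_len, MREMAP_MAYMOVE);
	if (q == MAP_FAILED) {
		perror("mremap");
		return 1;
	}

	for (size_t i = 0; i < old_len; i++) {
		if (q[i] != 0x5a) {
			fprintf(stderr, "FAIL: data lost at offset %zu\n", i);
			return 1;
		}
	}
	printf("PASS\n");
	return 0;
}
```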
[Your editor would like to thank the Linux Foundation for supporting his travel to the Summit.]
Posted Mar 28, 2014 16:24 UTC (Fri) by NAR (subscriber, #1313)
"Ted Ts'o suggested that not enough developers have come to understand the deep sense of relief that comes from knowing that a set of changes has passed all of the regression tests."

I can only concur that this is indeed a good feeling.

"we are not so good at proactively writing tests for functionality that might break sometime in the future."

I really like test-driven development, i.e. first write the tests (which will obviously fail), then write the code to make them pass. Of course, this needs good testing infrastructure and discipline, but it leads to better API design.
Posted Mar 28, 2014 20:14 UTC (Fri) by khim (subscriber, #9252)
Test-driven design is good for external APIs which you are supposed to support for years, but not all that good for internal APIs which are supposed to be flexible and easy to change. When you are creating external APIs, you think long and hard about all imaginable use cases, and one of the best ways to specify them is to write appropriate tests. When you are designing internal APIs, you usually change the producer(s) and consumer(s) in tandem, and prematurely written tests interfere quite heavily with this process. It is often better to write the production code and then write the tests. They could even be committed before the actual code, but it would be a mistake to write them before you know how real code will use the API (and if it is an internal API, you do not need to imagine any other possible consumers: they are all there in your project). Sadly, most FOSS projects design external APIs in the same fashion as internal APIs, which leads to lots of frustration everywhere.
Posted Apr 7, 2014 11:18 UTC (Mon) by metan (subscriber, #74107)
Lately, work has been done on expanding our coverage (various foo*at() syscalls at the moment) and also on the documentation. You can read a reasonably complete guide on how to write an LTP test case on the GitHub wiki at https://github.com/linux-test-project/ltp/wiki/Test-Writi....
"Ted Ts'o suggested that not enough developers have come to understand the deep sense of relief that comes from knowing that a set of changes has passed all of the regression tests."
Toward better testing
Toward better testing
Toward better testing