
Testing in the Yocto Project

May 18, 2019

This article was contributed by Richard Purdie

The ever-increasing complexity of the software stacks we work with has given testing an important role. There was a recent intersection between the automated testing being done by the Yocto Project (YP) and a bug introduced into the Linux kernel that gives some insight into what the future holds and the potential available with this kind of testing.

YP provides a way of building and maintaining customized Linux distributions; most distributions are one specific binary build, or a small set of such builds, but the output from YP depends on how you configure it. That raises some interesting testing challenges, and the key to meeting them is automation. The YP's build processes are all automated and its test infrastructure can build compilers, binaries, packages, and then images, for four principal architectures (ARM, MIPS, PowerPC, and x86) in 32- and 64-bit variants, and for multiple C libraries, init systems, and software stacks (no-X11, X11/GTK+, Wayland, etc.). It can then build and boot-test them all under QEMU, which takes around six hours if everything needs to be rebuilt; that can drop to under two hours if there are a lot of hits in the prebuilt-object cache.

Not content with that, YP has been adding support for running, on a regular and automated basis, the test suites that many open-source projects include. These are referred to as packaged tests or "ptests" within the project. For example, a ptest might be what would run if you did "make check" in the source directory for the given piece of software, but packaged up to be able to run on the target. There are many challenges in packaging these up into entities that can run standalone on a cross-compiled target and in parsing the output into a standard format suited to automation. But YP has a standard for the output and the installed location of these tests, so they can be discovered and run.
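
As an illustration of how that standardized output lends itself to automation, here is a minimal Python sketch that tallies results in the conventional one-result-per-line ptest format ("PASS: name", "FAIL: name", "SKIP: name"); the script and its exact status set are illustrative rather than YP's actual tooling:

    import collections
    import re
    import sys

    # One result per line, e.g. "PASS: test_socket" or "FAIL: test_ssl".
    RESULT = re.compile(r"^(PASS|FAIL|SKIP|XFAIL|XPASS|ERROR): (.+)$")

    def summarize(lines):
        counts = collections.Counter()
        failures = []
        for line in lines:
            m = RESULT.match(line.strip())
            if not m:
                continue            # ignore the suite's other chatter
            status, name = m.groups()
            counts[status] += 1
            if status in ("FAIL", "ERROR"):
                failures.append(name)
        return counts, failures

    if __name__ == "__main__":
        counts, failures = summarize(sys.stdin)
        print(", ".join(f"{k}: {v}" for k, v in sorted(counts.items())))
        for name in failures:
            print("failed:", name)

Fed a captured log from a single ptest run on its standard input, it prints a per-status count followed by the names of any failing tests.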

While all architectures are boot-tested under QEMU, and those tests are run on batches of commits before they're merged into YP, right now only architectures with KVM acceleration have the ptests run. Also, the ptests are run less regularly due to the time they take (3.5 hours). This means ptests are currently run on 64-bit x86 a few times a week; aarch64 support is being tested using ARM server hardware.

When YP upgraded to the 5.0 Linux kernel recently, it noticed that some of its Python 3 ptests were hanging. These are the tests from upstream Python, and the people working on YP are not experts on Python or the kernel, but it was clear there was some problem with networking. Either Python was making invalid assumptions about the networking APIs or there was a kernel networking bug of some kind. It certainly seemed clear that there was a change in behavior. The bug was intermittent but occurred about 90% of the time, so it was easy to reproduce.

Due to that, YP developers were able to quickly bisect the issue down to a commit in the 5.0 kernel, specifically this commit ("tcp: implement coalescing on backlog queue"). The problem was reported to the netdev mailing list on April 7.
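
A bisection like that is commonly driven by "git bisect run" with a small script that builds each candidate revision and exits non-zero when the problem reproduces. The sketch below is only an illustration of that workflow; the build command, the reproducer script name, and the timeout are assumptions, not YP's actual procedure:

    #!/usr/bin/env python3
    # Hypothetical helper for "git bisect run": exit 0 for a good revision,
    # 125 to skip an unbuildable one, and 1 when the hang reproduces.
    import subprocess
    import sys

    def run(cmd, timeout=None):
        return subprocess.run(cmd, shell=True, timeout=timeout).returncode

    if run("make -j8") != 0:
        sys.exit(125)       # can't build this revision: tell bisect to skip it
    try:
        # reproduce-hang.sh is a placeholder for whatever runs the failing
        # Python 3 ptests against the freshly built kernel.
        rc = run("./reproduce-hang.sh", timeout=600)
    except subprocess.TimeoutExpired:
        sys.exit(1)         # the test hung: mark this revision as bad
    sys.exit(0 if rc == 0 else 1)

It would be invoked along the lines of "git bisect start; git bisect bad v5.0; git bisect good v4.20; git bisect run ./check.py", with the good and bad tags chosen to bracket the kernel upgrade.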

Nothing happened at first, since the YP kernel developers and Python recipe maintainers didn't have the skills to debug a networking problem like this and there wasn't much interest upstream. On April 23, though, Bruno Prémont also ran into the same problem in a different way. This time, the original patch author was able to figure out the problem. There was an off-list discussion about it and a patch that fixes the problem was created; it has made its way into the 5.0 stable series in 5.0.13.

The problem was in the 5.0 changes to the coalescing of packets in the TCP backlog queue, specifically that packets with the FIN flag set were being coalesced with other packets without FIN; the code paths in question weren't designed to handle FIN. Once that was understood, the fix was easy. This also highlighted potential problems with packets that have the RST or SYN flags set, or packets that lack the ACK flag, so it allowed several other possibly latent problems to be resolved at the same time.
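
To connect that back to the original symptom, here is a purely hypothetical Python sketch (not the actual CPython test) of the kind of pattern that hangs when a FIN is mishandled: the reading side only finishes once recv() returns an empty byte string, which requires the peer's FIN to arrive and be processed normally:

    import socket
    import threading

    def read_until_eof(listener):
        conn, _ = listener.accept()
        total = 0
        while True:
            data = conn.recv(4096)
            if not data:        # empty read means the peer's FIN arrived
                break
            total += len(data)
        conn.close()
        return total

    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    reader = threading.Thread(target=read_until_eof, args=(listener,))
    reader.start()

    client = socket.socket()
    client.connect(listener.getsockname())
    client.sendall(b"x" * 65536)   # enough data to queue several segments
    client.close()                 # sends FIN; if that FIN is mishandled,
    reader.join(timeout=30)        # the reader never sees EOF and hangs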

To YP, this is a great success story for its automated testing as it found a real-world regression. YP has had ptests for a while, but it has only recently started to run them in an automated way much more regularly. It goes to show the value in making testing more accessible and automated. It also highlights the value of these existing test suites to the Linux kernel; there is a huge set of potential tests out there that can help test the kernel APIs with real-world scenarios to help find issues.

The Yocto Project would welcome anyone with an interest in such automated testing; while it has made huge strides in improving testing for its recent 2.7 release, there is so much more that could be done with a little more help. For example, the project would like to expand the number of test suites that are run, improve the pass rates for the existing tests, and find new ways to analyze and present the test results. In addition, with more people available to triage the test data, the project could incorporate some pre-release testing to help find regressions and other problems even earlier.

[Richard Purdie is one of the founders of the Yocto Project and its technical lead.]

Index entries for this article
GuestArticles: Purdie, Richard



Testing in the Yocto Project

Posted May 18, 2019 16:31 UTC (Sat) by pj (subscriber, #4506) [Link]

This sounds like a bit of a black eye for the kernel devs: given a reproducible bug and bug report of same, there "wasn't much interest upstream" until one of the kernel devs hit the same bug. :(

Testing in the Yocto Project

Posted May 18, 2019 16:39 UTC (Sat) by dvdeug (subscriber, #10998) [Link] (29 responses)

It seems problematic that the bug was found, and even localized to a particular commit, yet it didn't get any attention until someone else ran across it.

Testing in the Yocto Project

Posted May 18, 2019 18:20 UTC (Sat) by mtaht (subscriber, #11087) [Link] (7 responses)

If we didn't essentially apply a Bloom filter to that first basic bug report we'd be even more awash in a sea of uncertainty and overload than we are now. How many non-reproducible bug reports do we have compared to reproduced ones? 10:1? 100:1?

I think having a diverse automated testbase so one group with an automated test suite could get verified by another group with a different automated testsuite, could possibly evolve out of all this. Ultimately we'll end up with AI's stacked upon AIs competing to keep the computing environment clean...

... duking it out with AIs stacked upon AIs competing to find holes and exploit them before the other AIs do... and other AIs ripping out the work of other AIs to find a place for themselves to execute....

... and befuddled humans as the bystanders in that rapidly escalating war hoping that no AI notices that the AIs don't actually need to keep the humans alive and around in order to thrive in the complexity collapse.

I just finished reading a pretty decent new SF book about this sort of stuff: https://www.amazon.com/Genesis-Geoffrey-Carr-ebook/dp/B07...

Testing in the Yocto Project

Posted May 19, 2019 2:54 UTC (Sun) by roc (subscriber, #30627) [Link]

> I think having a diverse automated testbase so one group with an automated test suite could get verified by another group with a different automated testsuite, could possibly evolve out of all this.

Why would you need that? I can't think of any project that requires a bug to be confirmed in two independent test suites before investigating it. I don't even know how you would know it was the same bug.

> Ultimately we'll end up with AI's stacked upon AIs competing to keep the computing environment clean...

Maybe, but the kernel community is still far behind what other big projects consider basic testing requirements. (And those basics don't require AI.)

Testing in the Yocto Project

Posted May 19, 2019 4:19 UTC (Sun) by dvdeug (subscriber, #10998) [Link] (5 responses)

If someone can tell you the exact commit where a change happened and can offer a reproducible test, I think that's a good sign that it's not an unreproducible bug report. Whenever a developer gets huffy because he discovers people trashing his program for a problem nobody reported, this is why they weren't reported. Users quickly learn that taking time to make a (good) bug report often goes unrewarded, and stop bothering to make them.

Testing in the Yocto Project

Posted May 19, 2019 9:20 UTC (Sun) by knuto (subscriber, #96401) [Link] (4 responses)

Or to put it a different way: Those who have good tests end up having to do the work of investigating other people's bugs caught by their tests. That is why automated tests that have to pass before code gets considered for merge are so much more powerful. That way the author of a change keeps the burden of proof for the change, instead of the test owner.

Testing in the Yocto Project

Posted May 19, 2019 10:37 UTC (Sun) by roc (subscriber, #30627) [Link] (3 responses)

Indeed, the rr test suite still catches kernel regressions from time to time, which we investigate and report. I don't think anyone else is running systematic tests of x86-32 ioctl compat code.

> That way the author of a change keeps the burden of proof for the change, instead of the test owner.

That's a good point, but if you're a kernel developer and you can keep the burden of testing, triaging, diagnosing and reporting bugs on other people, it makes sense to carry on with that system. Even if in many cases you're shifting work from paid kernel developers to unpaid open source project maintainers. Even if it means you're forcing downstream consumers to duplicate tons of work maintaining multiple test infrastructures and running the same sorts of tests, or even exactly the same tests. Even if it strangles the feedback loop between detecting and fixing regressions.

Testing in the Yocto Project

Posted May 20, 2019 9:16 UTC (Mon) by metan (subscriber, #74107) [Link] (2 responses)

> Indeed, the rr test suite still catches kernel regressions from time to time, which we investigate and report. I don't think anyone else is running systematic tests of x86-32 ioctl compat code.

Well, in SUSE we do run LTP compiled with -m32 on x86-64, which catches bugs in the compat layer from time to time, but the coverage of ioctls() is sparse at best. We do have some tests for /dev/tty*, /dev/net/tun, block device ioctls, /dev/random, namespace ioctls, and some Btrfs ones, and that's it.

Testing in the Yocto Project

Posted May 20, 2019 13:14 UTC (Mon) by roc (subscriber, #30627) [Link] (1 responses)

One thing we're doing which LTP probably isn't doing: we make syscall parameter buffers be immediately followed by an unmapped guard page, so if the kernel reads or writes too much memory, the test gets an EFAULT. Bugs where the kernel reads too much memory (and ignores what was read) are otherwise impossible to catch. We do this on x86-64 too of course but it catches fewer bugs there.
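
For illustration, here is a minimal sketch of that guard-page trick written in Python with ctypes (rr implements it in its own harness; none of this code is rr's). The mmap/mprotect constants are the usual x86-64 Linux values, and pipe() stands in for any system call that writes a fixed-size result into a caller-supplied buffer:

    import ctypes
    import ctypes.util
    import errno
    import os

    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.mmap.restype = ctypes.c_void_p
    libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                          ctypes.c_int, ctypes.c_int, ctypes.c_long]

    PAGE = os.sysconf("SC_PAGE_SIZE")
    PROT_NONE, PROT_READ, PROT_WRITE = 0, 1, 2      # assumed x86-64 values
    MAP_PRIVATE, MAP_ANONYMOUS = 0x02, 0x20

    # Map two pages, then turn the second one into an inaccessible guard.
    base = libc.mmap(None, 2 * PAGE, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0)
    assert libc.mprotect(ctypes.c_void_p(base + PAGE), PAGE, PROT_NONE) == 0

    # pipe() writes exactly two ints (8 bytes) into the buffer it is given.
    ok = ctypes.c_void_p(base + PAGE - 8)    # buffer ends at the guard page
    bad = ctypes.c_void_p(base + PAGE - 4)   # second int would hit the guard

    assert libc.pipe(ok) == 0                # in-bounds write succeeds
    r = libc.pipe(bad)                       # out-of-bounds write faults ...
    assert r == -1 and ctypes.get_errno() == errno.EFAULT   # ... visibly

Because the buffer ends exactly at the inaccessible page, any kernel access beyond the advertised size faults and shows up as EFAULT instead of silently touching adjacent memory.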

Testing in the Yocto Project

Posted May 20, 2019 13:21 UTC (Mon) by metan (subscriber, #74107) [Link]

No, we don't, but we should; thanks for the idea! I've created an LTP issue https://github.com/linux-test-project/ltp/issues/531 so that we can follow up on that.

Testing in the Yocto Project

Posted May 20, 2019 15:35 UTC (Mon) by rweikusat2 (subscriber, #117920) [Link] (20 responses)

In all seriousness, how do you expect this to work? Someone will need to invest the time to work out what's going on and how to fix it and for a relatively obscure problem, the someone who's being affected by it is the natural choice for "someone". Otherwise, it's going to languish until it also stops someone else from doing something he or she is trying to accomplish.

Testing in the Yocto Project

Posted May 20, 2019 16:04 UTC (Mon) by mathstuf (subscriber, #69389) [Link] (1 responses)

Sometimes you need expert analysis of what's going on, at least if you don't want it to take weeks to get a fix made. Looping in upstream after the bisect and then working with them works much better, in my experience, than working on a full fix and popping up with a patch days or weeks later, when the main developers might be able to say "oh, yes, that's a simple oversight" after a few emails.

See this bug report to git: https://marc.info/?t=151242550100001&r=1&w=2 And the followup patch after just one day: https://marc.info/?t=151251216400002&r=1&w=2. Could I have come up with that `read_cache()` line? Probably, but not in that timescale. The commit message would also certainly have been far inferior.

Testing in the Yocto Project

Posted May 20, 2019 17:35 UTC (Mon) by rweikusat2 (subscriber, #117920) [Link]

This trivial oversight isn't exactly a good example of the kind of statement you're apparently trying to make. Yes, sometimes, when working with open-source code, someone who's actually a programmer may be needed, because one would otherwise need to train some non-programmer for years (NB: not weeks). If there's no one at hand, things will have to wait until some programmer who's busy with something else needs it fixed, too. That's how open source works.

Apart from that, paraphrasing a famous quote, "the kernel is just a program".

Testing in the Yocto Project

Posted May 20, 2019 18:14 UTC (Mon) by dvdeug (subscriber, #10998) [Link]

Someone who is being affected by it probably doesn't have the knowledge to write up a patch or the political know-how to get it through the Linux kernel community. For all the complaints from kernel developers about GCC, I don't see any patches coming from the kernel developers; they seem to think it would be more efficient to let GCC developers develop the compiler while they themselves work on the kernel.

That philosophy works great for code that somebody threw up on GitHub because someone else might find it useful. But if maintainers get dismayed that a feature is being dismissed as broken with no bug reports, or that some group is running an ancient version of the kernel, this is why. The fact that updating the kernel could cause random problems, and that even dedicated searching for the bug, to the point of providing a reproducible test case and the commit that caused the problem, is not going to help, is a reason to lock the kernel to one version or carry a local patch if one can figure out what's wrong.

I see the problem. But if you want to make a program that's used by a wide audience, sometimes you have to respond to the needs of that wide audience, and when they can give you a test that's failing and even a commit that made the change, perhaps you should take a look and fix it. Otherwise, people will fork or stay on older versions or switch to another program that does the same thing. As you say, the kernel is just a program, and if Linux's rapid changes are a downside, no other Unix kernel is changing nearly as fast.

(I'd point out that this attitude doesn't exist inside the kernel community; if a commit won't bootstrap on ARM, it's not the problem of the ARM community, it's the problem of the commit submitter. There's a good argument that a problem like this doesn't need time to work out what's going on; once a particular commit was known to be causing a problematic change for user space, just roll back the commit, and if the submitter of the commit wants the change, they get to make it work right or explain why the problem is not a kernel bug.)

Testing in the Yocto Project

Posted May 20, 2019 20:51 UTC (Mon) by roc (subscriber, #30627) [Link] (16 responses)

Typically to fix a bug efficiently you need both developer knowledge and detailed information from the bug reporter; if you can combine both it's much less total work than forcing one party to solve it alone. In lots of projects, project developers are expected to take responsibility for working with bug reporters to work out what's going on and how to fix it, are skilled at doing so, and put a lot of effort into making it easy to report bugs, triaging and tracking reported bugs, tools for collecting data from users to help fix bugs, etc. Coordinating developers and bug reporters is a well understood problem, not some mysterious impossible challenge.

The Linux kernel is special. Linux is so dominant that downstream users are more or less stuck with it, so kernel development is structured to offload as many costs as possible to downstream. Even though it's a terribly inefficient setup and painful for downstream, downstream users have no leverage to alter the deal.

This will continue to make sense until Linux has serious competition, and then it won't.

Testing in the Yocto Project

Posted May 20, 2019 21:33 UTC (Mon) by rweikusat2 (subscriber, #117920) [Link] (13 responses)

> The Linux kernel is special. Linux is so dominant that downstream users are more or less stuck with it, so kernel development is
> structured to offload as many costs as possible to downstream. Even though it's a terribly inefficient setup and painful for
> downstream, downstream users have no leverage to alter the deal.

If *you* think this is terrible, you're perfectly entitled to that, but other people who are also "downstream users" might disagree. The kernel is really just a program and, as such, it can be made to dance to whatever tune someone currently prefers. It's even a pretty organized program; hence, changing it is reasonably easy compared to other open-source software I happen to have encountered.

Testing in the Yocto Project

Posted May 20, 2019 23:16 UTC (Mon) by roc (subscriber, #30627) [Link] (12 responses)

Various different downstream organizations have had to set up their own automated test farms, manage them, screen out intermittent failures, diagnose and report failures, etc. This entails a lot of duplicated work. This is what I mean by "terribly inefficient", i.e. "very inefficient". Obviously whether that's "terrible" depends on your point of view. As a downstream project maintainer running automated tests that sometimes catch kernel regressions, I'm not happy about it.

Imagine if Firefox and Chrome didn't do their own testing, and instead Linux distros, device vendors and Web developers felt obliged to set up independent browser-testing projects with duplicated but not always identical tests, their own triage and false-positive screening, various different channels for reporting test failures upstream (many of which are ignored), etc. It would obviously be insane. Yet the kernel does this and everyone seems to accept it.

Testing in the Yocto Project

Posted May 20, 2019 23:30 UTC (Mon) by corbet (editor, #1) [Link] (6 responses)

Kernel testing needs to improve a lot — and has moved significantly in that direction. But I do wonder how you might envision this working; what organization, in particular, would do the testing that you are looking for? Who should pay for it?

The problem with the kernel is that so many of the problems people run into are workload and hardware dependent. Developers increasingly test for the things they can test, but they can never test all of the available hardware, and they can never test that your workload, in particular, will not regress. So, while the project still very much needs to improve its act in this regard, I'm not sure I see an alternative to a lot of downstream testing anytime soon.

(And yes, a better answer to bug tracking is sorely needed, but there are many ghetto areas in kernel development that nobody wants to pay for, and that is one of them).

Testing in the Yocto Project

Posted May 21, 2019 0:06 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

Ideally this would be an independent lab funded by the Linux Foundation (or any other trade group). A large warehouse stuffed with various hardware that runs realistic workloads.

Testing in the Yocto Project

Posted May 21, 2019 0:20 UTC (Tue) by roc (subscriber, #30627) [Link]

> But I do wonder how you might envision this working; what organization, in particular, would do the testing that you are looking for? Who should pay for it?

The Linux Foundation seems to have taken on the role of trying to ensure that organizations that profit from Linux put money back into it. I'm not privy to the workings of LF so I don't know whether they can take this on. Still, if there was a consensus in the kernel community that central testing infrastructure is needed and we just have to figure out how to pay for it, that would be a big step forward.

One thing that might help: the kernel community could try to normalize the expectation that writing tests, building better test frameworks, and investigating failed tests is something every kernel developer needs to do (including the expectation that every bug fix and every new feature comes with automated tests). This is normal for a lot of other projects, from tiny no-money projects to large big-money projects. Then the companies paying people to do kernel development would effectively be taxed to support the testing effort.

> The problem with the kernel is that so many of the problems people run into are workload and hardware dependent.

The kernel workload is highly variable. Then again, for browsers your workload is *every Web page in the world*, from gigantic static pages to GMail to twitchy 3D shooters and everything in between. Mostly written in obfuscated code by tens of millions of developers with ... a range of skill levels ... whom you mostly can't talk to. Browsers are sensitive to hardware variation too, though less so than the kernel, sure. But desktop applications have their own special challenges: myriad combinations of software that your software depends on and interacts with, including ultra-invasive stuff like antivirus and malware that patches your code, all changing underneath you day to day, almost entirely closed-source and often controlled by your competitors. Mobile has similar issues.

I talk a lot about browsers because that's what I know. I don't think kernels and browsers are uniquely difficult.

I get that testing the kernel is hard. That's why greater cooperation and more efficient use of testing resources is so important.

> I'm not sure I see an alternative to a lot of downstream testing anytime soon.

Sure but so far I don't even see that the kernel community sees lack of upstream testing as a problem that needs to be solved.

Look at it this way: kernel people emphasize how important it is for people to upstream their code --- "upstream first", etc. That reduces duplication of effort, ensures things stay in sync, etc. But for some reason* they don't apply the same logic to tests. Tests, test harnesses, and test infrastructure should be upstream-first too for exactly the same reasons. Yes, there will always be stuff that is hard to upstream.

* Come on, we all know what the real reason is. Testing is less fun than coding features or fixing bugs. Figuring out test failures is even less fun (though I'm working on that).

Testing in the Yocto Project

Posted May 21, 2019 0:26 UTC (Tue) by roc (subscriber, #30627) [Link] (3 responses)

> (And yes, a better answer to bug tracking is sorely needed, but there are many ghetto areas in kernel development that nobody wants to pay for, and that is one of them).

You're paying for poor bug tracking already, the cost is just smeared out as people waste time finding and diagnosing bugs people have already found and diagnosed.

I assume the actual hardware/software running costs would be insignificant, so is the problematic cost what you'd have to pay people to do triage or what? I always assumed the barrier to better bug tracking was the difficulty of coming up with a single system and getting people to pay attention to it.

Testing in the Yocto Project

Posted May 21, 2019 0:31 UTC (Tue) by roc (subscriber, #30627) [Link] (2 responses)

One easy thing that might make a real difference is to get some people from somewhere that understands testing (e.g. Google) who also have cachet in the kernel community to give a talk or talks at a kernel event about what serious testing looks like at scale. And listen to them.

Testing in the Yocto Project

Posted May 23, 2019 4:25 UTC (Thu) by tbird20d (subscriber, #1901) [Link] (1 responses)

Please come to the Automated Testing Summit in Lyon, France, in October. It's a new event, where I hope we can talk about (and listen to testing experts on) exactly the issues you raise. The CFP should be available shortly, and personally I'd love to hear from you and other testers about your experiences. I'll announce the CFP on the automated testing e-mail list when it's ready (which should be within a few weeks).

Testing in the Yocto Project

Posted May 23, 2019 21:42 UTC (Thu) by roc (subscriber, #30627) [Link]

I live on the other side of the world and don't have money to travel that far. (Remember the part where unpaid project maintainers are being asked to assume the testing burden for paid kernel developers.) Besides, I'm not actually a testing expert, just one of many, many developers who have worked on a big project that takes upstream testing seriously.

Testing in the Yocto Project

Posted May 20, 2019 23:47 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

And it's not really excusable at this point; the kernel has more than enough powerful backing to get funds for a full testing lab, with a variety of different hardware and software.

Even in this case some duplication will still inevitably take place, but at least it can be greatly minimized.

Testing in the Yocto Project

Posted May 21, 2019 9:04 UTC (Tue) by metan (subscriber, #74107) [Link] (1 responses)

At least people from various kernel QA departments have started to talk to each other in the last year or so. I suppose that there is enough funding already, but the real problem is that every company works more or less in isolation. Maybe we can solve that with standard interfaces and components, so that each company can plug the hardware it cares about into a virtual kernel testing lab; at least that seems to be the way things have slowly started to move. The problem now is that the problem is complex and, as far as I can tell, there is no significant manpower behind it.

Testing in the Yocto Project

Posted May 23, 2019 4:33 UTC (Thu) by tbird20d (subscriber, #1901) [Link]

Indeed. A group started to get together last year, and made some progress at least articulating some of the issues involved.
See https://elinux.org/Automated_Testing_Summit_2018 and https://lwn.net/Articles/771782/

I'm aware of three different get-togethers to continue working on pushing stuff forward: a testing microconference at Plumbers, a kernel testing summit immediately after Plumbers, and the 2019 Automated Testing Summit after OSSEU/ELCE in France. Some of these have not gotten much visibility yet, but hopefully that will change shortly.

Testing in the Yocto Project

Posted May 25, 2019 17:14 UTC (Sat) by flussence (guest, #85566) [Link] (1 responses)

>Imagine if Firefox and Chrome didn't do their own testing, and instead Linux distros, device vendors and Web developers felt obliged to set up independent browser-testing projects with duplicated but not always identical tests, their own triage and false-positive screening, various different channels for reporting test failures upstream (many of which are ignored), etc. It would obviously be insane. Yet the kernel does this and everyone seems to accept it.

That insanity is already old news. Are you completely unaware of the long tail of blogs posting cargo-culted tweaks to get things like basic GPU acceleration and sound to work in browsers?

“This web browser best viewed with Nvidia on Windows 10” is not a good look, but everyone seems to accept it.

Testing in the Yocto Project

Posted May 27, 2019 23:11 UTC (Mon) by roc (subscriber, #30627) [Link]

> Are you completely unaware of the long tail of blogs posting cargo-culted tweaks to get things like basic GPU acceleration and sound to work in browsers?

I'm aware that real-world combinations of software and hardware vastly exceed anything you can test in any reasonable test farm. Firefox's upstream tests aren't perfect, and can't ever be, so tweaks to Firefox's configuration in unusual situations remain necessary. (Keep in mind, though, that many of those "cargo-culted tweaks" aren't addressing Firefox bugs but are choosing different points in a tradeoff, e.g. choosing performance over stability by using GPU drivers with known bugs.)

That doesn't affect what I wrote, though. It is still true that, unlike the kernel, Firefox and Chrome upstreams do massive testing in a way that's tightly integrated with the development process, and take responsibility for detecting, diagnosing and fixing regressions.

Testing in the Yocto Project

Posted May 23, 2019 22:01 UTC (Thu) by roc (subscriber, #30627) [Link] (1 responses)

And on cue, the 5.2.0 kernel breaks rr completely. https://github.com/mozilla/rr/issues/2360

Testing in the Yocto Project

Posted May 27, 2019 23:11 UTC (Mon) by roc (subscriber, #30627) [Link]

Phew, that was a false alarm.


Copyright © 2019, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds
