A report from the Automated Testing Summit
In the first session of the Testing & Fuzzing microconference at the 2018 Linux Plumbers Conference (LPC), Kevin Hilman gave a report on the recently held Automated Testing Summit (ATS). Since the summit was an invitation-only gathering of 35 people, there were many at LPC who were not at ATS but had a keen interest in what was discussed. The summit came out of a realization that there is a lot of kernel testing going on in various places, but not a lot of collaboration between those efforts, Hilman said.
The genesis of ATS was a discussion in a birds-of-a-feather (BoF) gathering at the 2017 Embedded Linux Conference Europe (ELCE) on various embedded board farms that were being used for testing. A wiki page and mailing list were created shortly after that BoF. The summit, which was organized by Tim Bird and Hilman, was meant to further the work going on in those forums and was set for October 25 in Edinburgh, Scotland during ELCE 2018. There were 22 separate testing projects represented at ATS, he said.
The main idea was to try to find common ground between all of the different projects and to figure out what was needed to collaborate more. A lot of effort is being put into all of these projects, but there is no place to share resources and development ideas. There are lots of different test suites and new ones are being developed, but there is no real coordination on what tests to run and how they should be run. Beyond that, there is no common definition of test plans or strategies and how to report results. In a way, running tests is easy, Hilman said; doing something with the results is the hard part.
He wanted to summarize what was discussed at the summit, but also pointed to some detailed minutes that were kept. One of the questions was why there hasn't been more collaboration along the way. Some companies think of their test suites and infrastructure as a kind of "secret sauce" but, as is often found in the open-source world, it turns out that most are doing much the same things. But some tests are written specifically for a particular lab setup, device, or test framework, so they are not all that useful to share. Some companies also fear sharing their tests because that might show how little testing they are actually doing, he said, especially if they have claimed to do far more testing than they really do.
Some time at ATS was spent bikeshedding over terminology, but there is a need to collect terms and definitions into a glossary. The discussion "took a fair amount of time", he said, but was useful because a lot of time was being wasted when people used unfamiliar terms or defined the same terms differently. Most of the participants come from the embedded world, so there are terms and concepts that may not have been familiar to those from the server and desktop spaces. For example, "DUT", which means "device under test", was confusing some attendees before it was clarified and added to the glossary.
There are multiple areas where collaboration is not happening but the ATS participants needed to figure out which areas to start with. They decided on trying to work out some common test definitions. That would include where to get the source for test suites, how to build them, how to run them, and how to determine the pass/fail criteria. Attendees also spent a lot of time discussing output formats, he said. There is a wide variety currently, including XML, plain text, JSON, and 0 or 1 exit status. That diversity makes it hard to collect up results from multiple test suites and coherently combine them for consumption by humans.
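As a rough illustration, a common test definition might look something like the following sketch; the field names, the repository URL, and the normalization into a single pass/fail record are invented for this example and are not anything that was specified at the summit.

```python
import subprocess

# A hypothetical "common test definition": where to get the test, how to
# build it, how to run it, and how to decide pass/fail. The field names and
# values are invented for illustration only.
TEST_DEFINITION = {
    "name": "example-smoke-test",
    "source": "https://example.org/tests/smoke.git",  # hypothetical repository
    "build": "make",
    "run": "uname -r",             # stand-in for the real test command
    "pass_criteria": "exit-code",  # pass if the run command exits with 0
}

def run_test(defn):
    """Run a test per its definition and return one normalized result record."""
    proc = subprocess.run(defn["run"].split(), capture_output=True, text=True)
    # Whatever the suite's native output format (XML, TAP, JSON, or a bare
    # exit status), reduce it to a minimal record a results service could store.
    return {
        "test": defn["name"],
        "status": "pass" if proc.returncode == 0 else "fail",
        "raw_output": proc.stdout,
    }

if __name__ == "__main__":
    print(run_test(TEST_DEFINITION))
```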
Pass/fail conditions are influenced by what is being tested and by whom. For example, the Linux Test Project has lots of tests, but many of those are never going to pass on some embedded targets. That means there are lists of specific tests to skip in different frameworks, but those often do not give any reason why a test appears on the list. It might be that there are hardware-specific reasons that the test cannot pass, but it might be that some developer was just lazy, didn't figure out why it didn't pass, and simply added it to the skip list. So, even the same test suite may be run differently on a particular test framework or hardware device.
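One way to make such skip lists more useful is to require a documented reason for every entry; the sketch below shows the idea, though the test names, board names, and structure are invented for this example and do not come from LTP or any particular framework.

```python
# A hypothetical annotated skip list: every entry must say *why* a test is
# skipped, so that "never diagnosed" entries are visible at review time.
# Test and board names are invented; this mirrors no particular framework.
SKIP_LIST = [
    {"test": "syscall_example01", "boards": ["tiny-board"],
     "reason": "needs more RAM than the target has"},
    {"test": "fs_example02", "boards": ["*"],
     "reason": "FIXME: fails intermittently, root cause not yet understood"},
]

def skip_reason(test_name, board):
    """Return the documented reason for skipping, or None to run the test."""
    for entry in SKIP_LIST:
        if entry["test"] == test_name and ("*" in entry["boards"]
                                           or board in entry["boards"]):
            return entry["reason"]
    return None
```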
The test definition collaboration will start with a survey to determine what various testing efforts are already using and how. Results from that will help guide further work. He doubts that there will be a single test framework or test specification mechanism that will come out of this work, but it should help narrow things down a ways. A survey was already done to gather information about the testing infrastructure in use; that survey and its results are available on the wiki.
ATS attendees also spent some time trying to understand what others are doing, particularly in terms of lab and infrastructure configuration. The gathering had an embedded bias to some extent, because of its roots, so there was a lot of discussion about power-distribution units (PDUs), handling serial consoles, and so on. There was also some discussion of the test frameworks from a high level; they tried "not to get too much down into the weeds" of specific frameworks, Hilman said.
The action items from the meeting include cleaning up the glossary. They will also be creating a survey to start the test definition collaborative effort. It turns out that most test labs have their own "scripts and hacks" for working with PDUs, but there was no place to collect them up for sharing. PDUDaemon is an existing project that handles PDUs, so another action item is to have people add their scripts and such to that project.
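For a sense of what those lab scripts tend to do, here is a rough Python sketch of power-cycling a board through a PDU-control service such as PDUDaemon; the server address, PDU name, and outlet number are made up, and the URL layout is an assumption that should be checked against the project's documentation rather than taken as its actual interface.

```python
import urllib.request

# Rough sketch of power-cycling a board via a PDU-control daemon's HTTP
# interface. The host, PDU hostname, and outlet number are hypothetical,
# and the URL layout is an assumption -- consult the PDUDaemon docs.
PDU_SERVER = "http://lab-server.example.com:16421"   # hypothetical lab host

def pdu_command(command, pdu_hostname, port):
    url = (f"{PDU_SERVER}/power/control/{command}"
           f"?hostname={pdu_hostname}&port={port}")
    with urllib.request.urlopen(url) as resp:
        return resp.status

if __name__ == "__main__":
    # Reboot the device plugged into outlet 4 of the PDU known as "pdu01".
    pdu_command("reboot", "pdu01", 4)
```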
There is a plan to meet again at next year's ELCE, which is in October in Lyon, France. There was a question about having a gathering with a wider scope and Hilman said that had been discussed. Perhaps also having a gathering at Plumbers or another conference would make sense. Another question concerned how many testing efforts oriented toward servers and desktops were represented. Hilman said that it was heavily skewed toward embedded testing, but that around half a dozen of the projects were for server/desktop testing frameworks and efforts.
Another question concerned comparative testing, such as comparing performance release to release. Much of that data is being collected in one form or another, but in order to be useful, the underlying testing regime, configuration, hardware, and so on must be reliably captured to allow apples-to-apples comparisons. Similarly, lab-to-lab differences, including hardware such as storage and networking, also play a role, so comparisons between different labs' data may be difficult.
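As a sketch of the environment metadata that would have to travel with each result to make such comparisons possible, consider something like the record below; which fields are actually sufficient is exactly the open question, and these are only plausible candidates rather than any agreed-upon set.

```python
import json
import platform

def environment_record():
    """Collect a minimal description of the test environment so that results
    can later be compared apples-to-apples. The fields here are plausible
    candidates only, not a standard."""
    return {
        "kernel": platform.release(),    # e.g. "4.19.0-rc1"
        "machine": platform.machine(),   # e.g. "x86_64" or "aarch64"
        "node": platform.node(),         # which lab machine produced the data
        # A real lab would also record the kernel config, toolchain,
        # firmware versions, storage and network hardware, and so on.
    }

if __name__ == "__main__":
    print(json.dumps(environment_record(), indent=2))
```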
He concluded his session by suggesting that interested people look at the wiki and participate on the mailing list.
[I would like to thank LWN's travel sponsor, The Linux Foundation, for assistance in traveling to Vancouver for LPC.]
Index entries for this article:
Conference: Linux Plumbers Conference/2018
A report from the Automated Testing Summit
Posted Nov 15, 2018 19:28 UTC (Thu) by tbird20d (subscriber, #1901)

I think it's really important to figure out why things stalled in previous efforts, and find natural incentives that will help people and companies commit resources to shared, rather than private, efforts. To me, it's like embedded Linux all over again. It took decades to overcome the fragmentation, and to convince companies to start sharing their code. (And we're still not as mature about handling open source in embedded as we are in the enterprise space.) I hope this effort won't take as long.

A report from the Automated Testing Summit
Posted Nov 15, 2018 19:31 UTC (Thu) by tbird20d (subscriber, #1901)

I think one key will be establishing some de-facto projects and standards that people can rally around. But this requires the very difficult task of breaking up currently monolithic systems into re-usable components. This is not going to be easy, for sure. Hopefully, we'll make some progress, but we'll see what happens.
A report from the Automated Testing Summit
Posted Dec 7, 2018 8:47 UTC (Fri) by sergeyb (guest, #102934)

> Attendees also spent a lot of time discussing output formats, he said. There is a wide variety currently, including XML, plain text, JSON, and 0 or 1 exit status. That diversity makes it hard to collect up results from multiple test suites and coherently combine them for consumption by humans.

Please don't reinvent the wheel! There are at least three popular test-report formats widely used in testing open-source (and commercial) software: TAP (the Test Anything Protocol), SubUnit, and JUnit (XML-based). Many testing frameworks support these formats, and you should have serious reasons before creating a new one. There is a page with a short description of each format and a comparison table.