
Drivers as documentation

By Jonathan Corbet
November 22, 2011
As a community, we are highly concerned with the quality of our code. Kernel code is reviewed for functionality, long-term maintainability, documentation, and more. Driver code is not always reviewed to the same degree, but it can be just as important - if our drivers do not work, our kernel does not work. There is an aspect to the long-term maintainability of drivers that could use more attention: the degree to which a driver documents how its hardware works.

One might argue that the job of documenting the hardware falls on whoever writes the associated datasheet. There is some truth to that claim, but, in many cases, only the original author of the driver has access to that datasheet. Those who come after can try to extract documentation from the vendor or to search for clandestine copies hosted on the net. But often the only option is to figure out the hardware from the one source of information that is actually available: the existing driver. If the driver source does not help that new developer, one can argue that the original author has fallen down on the job.

So, if a driver contains code like:

    writel(devp->regs[42], 0xf4ee0815);

it is missing something important. In the absence of the datasheet, there is no way for any other developer to have any clue of what that operation is actually doing.

The problem is worse than that, though; datasheets often omit useful information, obscure the truth, and lie through their teeth. The hardest part of getting a driver to work is often the process of figuring out what the hardware's features and special needs really are. It often seems, for example, that the datasheet is written before the process of designing the hardware begins. As time passes, the understanding of the problem grows, and deadlines loom, hardware engineers start to jettison features that cannot be made to work in time or that, in their sole and not-subject-to-appeal opinion, can be painlessly fixed in software. Updating the datasheet to match the actual hardware never happens.

Thoughtful driver developers will, on discovery of the imaginary nature of a specific hardware feature, add a comment to the driver; that way, no future maintainer has to figure out (the hard way, involving keyboard imprints on the forehead) why the driver does not use a specific, helpful-looking hardware capability.

Then there is the matter of "reserved" bits. There has not yet been a datasheet written that did not contain entries like:

Weird tangential functions register (offset 0xc8)
Bits  Function
17    Reserved: do not touch this bit or the terrorists will win

Somewhere, deep within the company, there will be a maximum of two engineers who know that the document is incomplete, but that nobody has ever gotten around to updating it. If you can corner one of those people, you can usually get them to admit that this bit should be documented as:

Weird tangential functions register (offset 0xc8)
Bits  Function
17    0 = DMA engine randomly locks up
      1 = DMA engine functions as expected
      Default value = 0

A developer who cannot get his hands within range of the neck of at least one of those hardware engineers will likely spend a lot of time figuring out that they need to set the "make it work" bit. This effort can involve reverse-engineering proprietary drivers or, in cases of pure desperation, playing with random bits to see what changes. Once that bit has been located, it is natural for the tired and frustrated developer to quietly set the bit before heading off in a determined effort to eliminate the memory of the entire process through the application of large amounts of beer. A particularly forward-thinking developer might make a note on a printed version of the datasheet for future reference.

But handwritten notes are not usually helpful to the next developer who has to work on that driver. A moment spent documenting that bit:

    #define WTF_PRETTY_PLEASE  0x00020000 /* Always set this or it locks up */

may save somebody else hours of unnecessary pain.
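As a minimal sketch of what that documentation might look like where the bit is actually set, the fragment below uses the register offset and bit from the example table above. The helpers reg_read() and reg_write() and the function wtf_enable_dma_fix() are hypothetical names; the helpers are user-space stand-ins for the kernel's MMIO accessors, operating on a plain array so that the fragment can be compiled on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Offset and bit position taken from the example table in the article. */
#define WTF_REG            0xc8        /* Weird tangential functions register */
#define WTF_PRETTY_PLEASE  0x00020000  /* Bit 17: always set this or it locks up */

/*
 * Hypothetical stand-ins for MMIO accessors; they index a plain array of
 * 32-bit words so the sketch can run outside the kernel.
 */
static uint32_t reg_read(const uint32_t *regs, unsigned int offset)
{
	assert(offset % 4 == 0);	/* registers are 32 bits wide */
	return regs[offset / 4];
}

static void reg_write(uint32_t *regs, unsigned int offset, uint32_t value)
{
	assert(offset % 4 == 0);
	regs[offset / 4] = value;
}

/*
 * The datasheet marks bit 17 of WTF_REG as "reserved" and it resets to 0.
 * With the bit clear, the DMA engine randomly locks up, so it must be set
 * before DMA is used. Read-modify-write to leave the other bits untouched.
 */
static void wtf_enable_dma_fix(uint32_t *regs)
{
	uint32_t val = reg_read(regs, WTF_REG);

	reg_write(regs, WTF_REG, val | WTF_PRETTY_PLEASE);
}
```

The comment above the function records what the datasheet claims, what the hardware really does, and why the write is there; that is the information the next developer will not be able to find anywhere else.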

It is tempting to think of a completed driver as being done. But driver code, like other kernel code, is subject to ongoing change. Kernel API changes must be dealt with, problems need to be fixed, and newer versions of the hardware must be supported. Depending on how much beer was involved, the original author may remember that device's peculiarities, but those who follow will not. Everybody would be better served if the driver did not just make the hardware work, but if it also made the reader understand how the hardware works.

Doing so is not usually hard. Define descriptive names for registers, bits, and fields rather than putting in hard-coded constants. Note features that are incompletely described, incorrectly described, or entirely science-fictional. Comment operations that have non-obvious ordering requirements or that do not play well together. And, in general, code with a great deal of sympathy for the people who will have to make changes to your work in the future. Some hardware can never be properly documented because the relevant information is simply not available; see this 2006 article for an example. But what information is available should be made available to others.
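To make those suggestions concrete, here is a small sketch for an entirely hypothetical device called "foo". Every register name, offset, bit, and behavior in it is invented for illustration, and the array-backed foo_read() and foo_write() helpers merely stand in for real MMIO accessors. It shows named registers and fields, a note on a feature that exists only in the datasheet, and a comment on an ordering requirement.

```c
#include <assert.h>
#include <stdint.h>

/* All names, offsets, and behaviors below are hypothetical. */
#define FOO_REG_CTRL		0x00	/* Control register */
#define   FOO_CTRL_ENABLE	0x00000001	/* Start the engine */
#define   FOO_CTRL_IRQ_EN	0x00000002	/* Interrupt on completion */
/*
 * Bit 2 is described in the (imaginary) datasheet as "checksum offload",
 * but the feature never made it into silicon. Setting it does nothing
 * useful; keep it clear.
 */
#define   FOO_CTRL_CSUM_BOGUS	0x00000004
#define FOO_REG_BURST		0x04	/* Burst configuration register */
#define   FOO_BURST_MASK	0x000000ff	/* Burst length in words */

#define FOO_NREGS		2

struct foo_dev {
	uint32_t regs[FOO_NREGS];	/* stand-in for the mapped registers */
};

static uint32_t foo_read(const struct foo_dev *dev, unsigned int offset)
{
	assert(offset / 4 < FOO_NREGS);
	return dev->regs[offset / 4];
}

static void foo_write(struct foo_dev *dev, unsigned int offset, uint32_t value)
{
	assert(offset / 4 < FOO_NREGS);
	dev->regs[offset / 4] = value;
}

static void foo_start(struct foo_dev *dev, uint32_t burst_words)
{
	uint32_t ctrl;

	/*
	 * Ordering matters: in this made-up device the burst length is
	 * latched when ENABLE goes from 0 to 1, so program it first.
	 */
	foo_write(dev, FOO_REG_BURST, burst_words & FOO_BURST_MASK);

	ctrl = foo_read(dev, FOO_REG_CTRL);
	ctrl &= ~(uint32_t)FOO_CTRL_CSUM_BOGUS;	/* see the definition above */
	ctrl |= FOO_CTRL_ENABLE | FOO_CTRL_IRQ_EN;
	foo_write(dev, FOO_REG_CTRL, ctrl);
}
```

None of this costs more than a few minutes once the author already knows the answers, and a reader can follow foo_start() without ever seeing the datasheet.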

Core kernel hackers are occasionally heard to make dismissive remarks about driver developers and the work they do. But driver writers are often given a difficult task involving a fair amount of detective work; they get this task done and make our hardware work for us. Writing drivers that adequately document the hardware is not an unreasonable thing to ask of these developers; they have the hardware knowledge and the skills to do it. The harder problem may be asking driver reviewers to insist that this extra effort be made. Without pressure from reviewers, many drivers will never enable readers to really understand what is going on.

Index entries for this article
Kernel: Development model/Kernel quality



Drivers as documentation

Posted Nov 23, 2011 23:50 UTC (Wed) by magnuson (subscriber, #5114) [Link] (3 responses)

As a digital hardware engineer myself I have to say that I got some hearty LOLs out of this article. Sadly, all of this is true. Specs are written before the hardware is coded and seldom are they updated after the fact once the rubber hits the road.

As for the example, I can tell you precisely why it's documented that way. "DMA engine randomly locks up" is a hardware bug that customers may demand be fixed at great expense. A typical full layer spin for an IC will run you about $500,000 these days, and that's if you push a lot of volume. "Reserved" is no problem at all. Just don't do that. Simple.

I've coded a few magic constants in my day but they've been clearly labeled as such with at least an attempt to describe what they do. Mostly they are escape hatches in case of bugs elsewhere... In any case my neck remains un-wrung so I must be doing something right.

Drivers as documentation

Posted Nov 24, 2011 11:57 UTC (Thu) by gnb (subscriber, #5132) [Link] (1 responses)

> "Reserved" is no problem at all. Just don't do that. Simple.

Ideally yes, but in the example given the bit defaults to the wrong value and needs to be set in defiance of the datasheet. This does happen.

Drivers as documentation

Posted Nov 25, 2011 5:56 UTC (Fri) by jzbiciak (guest, #5246) [Link]

In some cases, the "reserved" bits exist to disable or modify some feature, say as part of a test mode. They get marked "reserved", because a future product in the same family might want to use that bit for some other functionality. Or, the behavior of the test modes themselves might change between different revisions of the device.

So it doesn't surprise me at all that there might not be reasonable customer-level documentation for some of these "reserved" bits. At the same time, I can also see that workarounds for flaky designs ("Oops, the frobnitz accelerator sometimes frotzes when it should gronk, if two quux DMAs come in consecutive cycles") might rely on these "reserved" bits. The bit in the article above might be documented internally as "disable frobnitz acceleration (internal test purposes only; do not use for normal operation)".

Other times, it's due to the same peripheral getting used in different configurations on different chips, and the field in question should be "irrelevant" for this particular chip. So the bit exists and the feature it controls exists, but the feature isn't necessarily useful on this chip, or wasn't spec'd to be on this chip, and so it might not be tested. The fact that enabling it anyway stops DMAs from randomly crashing might be a happy accident.

In my day job, I'm at the head of one of these pipelines, writing specs that lead to the hardware and later to the customer documentation. I also had some good chuckles reading through this article. From what I've seen, the process of turning my specs into end customer documentation involves a lot of deleting (missing some of the internal implementation details, but invariably deleting some important detail customers need), and inserting several *ahem* interesting grammatical twists and confusing diagrams. I don't envy the folks that have to make the end-customer documentation from my specs, but sausage making is sausage making, no matter who makes it.

Drivers as documentation

Posted Dec 1, 2011 16:41 UTC (Thu) by LeftCoastDave (guest, #81645) [Link]

I'm glad you are a HW engineer and you admit it. In all my years of firmware, I've met very few HW engineers that will admit this.

This article hit very close to home, especially the part about DMA randomly locking up. I previously spent a lot of time on a deeply embedded firmware design where I would scan all DMA operations looking for certain lockup conditions, then abort the DMA hardware operation and replace the operation by a firmware memcpy.

I believe the issue with digital HW engineers comes from their work environment. Here are a few reasons:

1) Their designs tend to be quite in depth, and it is difficult to read verilog/vhdl code, so the code is heavily supported by word docs with block diagrams. This of course results in code slowly migrating away from documentation, and without the review process only the designer of the module knows how it works. They've done this for so long, that they believe this is the only way to do development.

2) Because of (1), brick walls form around modules, and only black box testing is done by simulations. Now no one looks at your code, no one appreciates being told that their code isn't up to par for readability, so they get very defensive and don't like to admit any fault internal to their design.

3) On top of all that chip schedules are always rushed, but they have no easy ability to go back and fix small mistakes, so they try and hide them under the carpet and blame firmware, or expect firmware to just figure out how to make it work.

4) It takes months for a chip to come back from the fab plant, months more for firmware to finish a dev kit for it and start shipping it out to customers, and months more before a customer finds a bug. A year's turnaround is normal; longer is not uncommon. Digital HW engineers are usually onto another project by that time.

Now every time a bug is found there is a knee-jerk reaction to blame firmware, even when a firmware team approaches the problem as "we don't know where the bug is, but we need help locating it". The HW engineer just doesn't want to open up that can of worms that they know was just a rush job. Excuses like "We aren't going to re-spin the chip for this bug, so why should I spend time looking into it" are the norm. This really doesn't help firmware implement a workaround when the firmware team doesn't know the root cause.

I believe the only way this can change is from a management viewpoint that digital designers aren't just designing a chip right now, but they need to support the chip for years, and that should be budgeted for right up front. Documentation including verilog/vhdl code comments can happen well after the chip has taped out. This needs to be a serious priority in any schedule.

Drivers as documentation

Posted Nov 24, 2011 10:30 UTC (Thu) by wsa (guest, #52415) [Link] (2 responses)

The recommendations given here on how code and documentation should look are very valid. Yet I feel you have been hit by a bad example (which _are_ in the kernel, sadly) and generalize from that, which, I feel, is inappropriate.

I am mainly active in driver subsystems which are used for current SoCs, so more than one. In those subsystems, hardcoded values like the ones you mentioned would hardly ever pass a review. Comments are also required every time the datasheet is wrong or unclear. Most reviewers are very aware of that and insist on that. The bigger problem here IMHO is that it is often very hard to see if a developer spent days on a simple writel(), because it is one of many writel() calls in the code. So, for a review which goes into that level of detail, a reviewer would have to study the datasheet at least as deeply as the driver author. Deeper would be even better. Given that usually around 60% of all changes per release go to drivers, this is unlikely to happen. Most subsystems are already short of reviewers, because the code affects fewer people. We have a scaling problem here, and sadly, this is no news. Spreading the word like this hopefully helps a little bit, although I'd think this should go more to driver developers than reviewers...

Drivers as documentation

Posted Nov 25, 2011 5:59 UTC (Fri) by jzbiciak (guest, #5246) [Link] (1 responses)

It wouldn't surprise me if some of these magic constants bleed into kernel code because they came from a poorly documented reference implementation from the vendor itself. It could be that the Linux driver writer never knew the full interpretation of the magic constant to begin with.

Drivers as documentation

Posted Nov 25, 2011 10:10 UTC (Fri) by wsa (guest, #52415) [Link]

Yup, that's why I wrote 'hardly'. There are also drivers which mainly copy sniffed behaviour, but work fine, nonetheless. But that's a detail, the bigger issues are outlined above.

Drivers as documentation

Posted Nov 24, 2011 13:21 UTC (Thu) by juliank (guest, #45896) [Link] (2 responses)

Documentation or self-documenting code is important. For example, for the nvec driver (in staging), I have added code documentation (mostly kernel-doc-style function documentation) and am now working on moving the magic constants and stuff into enums with readable and understandable names.

It looks fairly out of place compared to the rest of the kernel, but since we don't have any documentation on that device at the moment, and implement the driver in the community, having the documentation makes it easier to understand the driver and fix bugs.

We also have some strange things, like having to add an udelay() somewhere because the device otherwise locks up, and we don't know why. NVIDIA's original Android driver did not have those things and worked. Maybe their documentation has information about this, but we don't have it (although it might be made public if we're lucky).

Drivers as documentation

Posted Nov 25, 2011 6:02 UTC (Fri) by jzbiciak (guest, #5246) [Link] (1 responses)

Ok, this is somewhat offtopic, but it piqued my curiosity...

> We also have some strange things, like having to add an udelay() somewhere because the device otherwise locks up, and we don't know why

Memory ordering issue, and a barrier of some sort is required? Does the lockup happen on the same chip as the original usleep-less Android implementation?

Drivers as documentation

Posted Nov 25, 2011 14:44 UTC (Fri) by juliank (guest, #45896) [Link]

> Memory ordering issue, and a barrier of some sort is required?
> Does the lockup happen on the same chip as the original
> usleep-less Android implementation?

Seems I misremembered. It does not lock up, it just sends
incomplete messages. I added an udelay(100) in commit
de839b8f06bc5dd3f5037c4409a720cbb9bf21c3 [1] which seems to
prevent that.

[1] https://git.kernel.org/?p=linux/kernel/git/torvalds/linux...

Thanks for the memories :-)

Posted Nov 24, 2011 16:50 UTC (Thu) by felixfix (subscriber, #242) [Link]

Haven't written device drivers for a few years now, I miss it, but careers change in unexpected ways, and I doubt anyone would hire me for that now. But one of the most satisfying aspects was debugging the datasheets. Some were just slipshod and done in a hurry and fairly easy to figure out even before writing code. Some were horrible; I have memories of some Motorola QUICC chip which listed about 17 steps before even blinking an LED, and they had steps 8 and 9 misplaced or reversed or some such, in a completely non-obvious way.

Hair pulling while it went on, but as progress was made, satisfaction grew, and the final victory was sweet indeed.

Thanks for the memories. A part of me hopes the engineers are never given so much time and incentive that the datasheets are always correct ...

Must Be One

Posted Nov 24, 2011 17:36 UTC (Thu) by wildea01 (subscriber, #71011) [Link]

There's a lovely bit in the hardware configuration register for the SMSC9118 ethernet controller:

BIT: 20
DESCRIPTION: Must Be One (MBO). This bit must be set to “1” for normal device operation
TYPE: R/W
DEFAULT (reset value): 0

We do at least set it in Linux, but it's a bit cryptic:

#define HW_CFG_SF_ (0x00100000) /* R/W */

God knows what it's really doing.

Drivers as documentation

Posted Nov 24, 2011 19:57 UTC (Thu) by giraffedata (guest, #1954) [Link]

I have on multiple occasions written an entire manual for a piece of hardware as I reverse engineered it in trying to use it. The problems go beyond incorrect and flatly missing information, because there is also information that just isn't clear. There's a certain satisfaction in setting the world right by doing the job the hardware supplier failed to do.

But I don't see much value in agitating for device driver writers to do this work. The same reasons that drive a device maker not to provide adequate documentation also drive the driver writers.

Our salvation may be that the drivers, unlike the data sheets, are open source, so the next frustrated user might correct the problem.

Generating driver code from specification

Posted Nov 24, 2011 22:10 UTC (Thu) by cpeterso (guest, #305) [Link]

Device drivers are such a huge portion of Linux kernel code (and bugs!) that creating a domain-specific language might be a worthwhile abstraction. The DSL compiler could have lots of static error checking and then generate driver code. As kernel APIs or best practices change, the DSL compiler can be updated to regenerate new drivers without changing the input specs.

A sneaky benefit is that the spec DSL could be designed such that device manufacturers must better document how their hardware actually works. :)

Intel Labs is working on a project (with funding from Google) called Termite that generates driver code from a hardware specification language:

http://ertos.nicta.com.au/research/drivers/synthesis/

http://www.theregister.co.uk/2011/06/10/automatic_device_...

Drivers as documentation

Posted Nov 27, 2011 5:52 UTC (Sun) by neilbrown (subscriber, #359) [Link] (1 responses)

> So, if a driver contains code like:
>
> writel(devp->regs[42], 0xf4ee0815);
>
> it is missing something important.

One could argue that one important thing this code fragment is missing is GPL compliance.

> The source code for a work means the preferred form of the work for
> making modifications to it.

You cannot make sensible modifications to the above. You need it to be combined with documentation. So if the datasheet were included with the source code it might be OK. But without the data sheet ... it seems little different to binary firmware blobs.

(I've just been looking at init_lb035q02_panel() in

drivers/video/omap2/displays/panel-lgphilips-lb035q02.c

and feel that something is definitely missing in the freedom I am being given to understand and modify this code).

Do we need a sub-tree of Documentation which contains lots of PDFs of data sheets???

Datasheets

Posted Nov 27, 2011 15:36 UTC (Sun) by corbet (editor, #1) [Link]

The problem, of course, is that lots of driver developers get the datasheets under NDAs and cannot contribute them anywhere public. There are those who say that we simply should not develop drivers under such conditions, while others say it's better to have the driver with as much useful information crammed into it as possible.

GPL compliance seems like a hard claim to make, in any case. There is no other form of the code to be "the preferred form of the work for making modifications."


Copyright © 2011, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds
