A longstanding GnuTLS certificate validation botch

By Jake Edge
March 5, 2014

Something rather reminiscent of Apple's "goto fail;" bug has been found, but this time it hits rather closer to home for the free software community since it lives in GnuTLS. Certificate validation for SSL/TLS has been under some scrutiny lately, evidently to good effect. But this bug is arguably much worse than Apple's, as it has allowed crafted certificates to evade validation checks for all versions of GnuTLS ever released since that project got started in late 2000.

Perhaps the biggest irony is that the fix changes a handful of "goto cleanup;" lines to "goto fail;". It also made other changes to the code (including adding a "fail" label), but the resemblance to the Apple bug is too obvious to ignore. While the two bugs are actually not that similar, other than both being in the certificate validation logic, the timing and look of the new bug do give one pause.

The problem boils down to incorrect return values from a function when there are errors in the certificate. The check_if_ca() function is supposed to return true (any non-zero value in C) or false (zero) depending on whether the issuer of the certificate is a certificate authority (CA). A true return should mean that the certificate passed muster and can be used further, but the bug meant that error returns were misinterpreted as certificate validations.

Prior to the fix, check_if_ca() would return error codes (which are negative numbers) when it encountered a problem, and those were interpreted as true by the caller. The fix was made in two places: check_if_ca() now returns zero (false) when there are errors, and verify_crt() now tests the return value for != 1 rather than == 0.
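To make the failure mode concrete, here is a minimal sketch of the pattern. It is a hypothetical reduction, not the actual GnuTLS code: the function names, the parse_ok/is_ca parameters, and the error value are invented for illustration. It shows how a negative error code slips past a caller that only rejects zero, and how each half of the two-part fix closes the hole.

```c
#include <stdbool.h>

/* Hypothetical error code; GnuTLS's real error constants differ. */
#define ERR_PARSE (-1)

/* Buggy variant: returns 1 (CA), 0 (not a CA), or a negative error
 * code when the certificate cannot be parsed. */
static int issuer_is_ca_buggy(bool parse_ok, bool is_ca)
{
    if (!parse_ok)
        return ERR_PARSE;      /* negative, but nonzero: reads as "true" */
    return is_ca ? 1 : 0;
}

/* Fixed variant: every error path collapses to 0 (false). */
static int issuer_is_ca_fixed(bool parse_ok, bool is_ca)
{
    if (!parse_ok)
        return 0;
    return is_ca ? 1 : 0;
}

/* Caller before the fix: rejects only when the result is exactly 0. */
static bool caller_accepts_old(int ret)
{
    return ret != 0;
}

/* Caller after the fix: anything other than an explicit 1 is rejected. */
static bool caller_accepts_new(int ret)
{
    return ret == 1;
}
```

Either change alone rejects the malformed certificate in this sketch; making both means a future caller that gets one convention wrong is still covered by the other.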

It is hard to say how far back this bug goes, as the code has been restructured several times over the years, but the GnuTLS advisory warns that all versions are affected. There are a lot of applications that use GnuTLS for their SSL/TLS secure communication needs. This thread at Hacker News mentions a few, including Emacs, wget, NetworkManager, VLC, Git, and others. On my Fedora 20 system, attempting to remove GnuTLS results in Yum wanting to remove 309 dependent packages, including all of KDE, Gnucash, Calligra, LibreOffice, libvirt, QEMU, Wine, and more.

GnuTLS came about partly because the OpenSSL license is problematic for GPL-licensed programs. OpenSSL has a BSD-style license, but still includes the (in)famous "advertising clause". The license has been a source of problems before, so GPL programs often avoid it. One would hope that the OpenSSL developers are diligently auditing their code for problems similar to what we have seen from Apple and GnuTLS.

It was a code audit done by GnuTLS founder Nikos Mavrogiannopoulos (at the request of Red Hat, his employer) that discovered the bug. He may well have been the one to introduce it long ago, as he has done much of the work on the project—and the file in question (lib/x509/verify.c). He described it as "an important (and at the same time embarrassing) bug". It is clearly that, but it is certainly a good thing that it has at last been found and fixed.

Several commenters in various places have focused on the "goto" statement as somehow being a part of the problem for both Apple and GnuTLS. That concern seems misplaced. While, in both cases, a goto statement was located at the point where the bug was fixed, the real problem was twofold: botched error handling and incomplete testing. While Edsger Dijkstra's advice on goto and its harmful effects on the structure of programs is cogent, it isn't completely applicable here. Handling error conditions in C functions is commonly done using goto and, if it is done right, goto actually adds to the readability of the code. Neither Apple's nor GnuTLS's flaw can really be laid at the feet of goto.
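A minimal sketch of what "done right" can look like in standard C; joined_length() and dup_string() are hypothetical helpers invented for this example. The function has a single cleanup label, and its return value starts out as "failure" and is switched to a success value only on the last line before the label, so no jump to the label can accidentally report success.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: heap copy of a NUL-terminated string, or NULL. */
static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Hypothetical example: copy two strings and return their combined
 * length, or -1 on any error. */
static int joined_length(const char *a, const char *b)
{
    int ret = -1;              /* pessimistic default: failure */
    char *copy_a = NULL;
    char *copy_b = NULL;

    if (a == NULL || b == NULL)
        goto cleanup;

    copy_a = dup_string(a);
    if (copy_a == NULL)
        goto cleanup;

    copy_b = dup_string(b);
    if (copy_b == NULL)
        goto cleanup;

    ret = (int)(strlen(copy_a) + strlen(copy_b));   /* success is set last */

cleanup:
    free(copy_b);              /* free(NULL) is a harmless no-op */
    free(copy_a);
    return ret;
}
```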

In something of a replay of the admonishments in last week's article on the Apple flaw: all security software needs to be better tested. We are telling our users that we are protecting their communications with the latest and greatest encryption, but we are far too often failing them with implementation errors. Testing with bad certificates would seem to be a must; some presumably was done for both code bases, but obviously some possibilities of badly formed or signed certificates were skipped. More (and better) testing is indicated.

[ Thanks to Paul Sladen for the heads-up about this bug. ]


A longstanding GnuTLS certificate validation botch

Posted Mar 5, 2014 15:51 UTC (Wed) by zorro (subscriber, #45643) [Link] (90 responses)

I cannot help wonder why such critical code is written in C and not C++. With RAII and exceptions you get automatic cleanup and error propagation for free.

That said, refactoring code without an automated test suite that covers that code is just asking for trouble.

A longstanding GnuTLS certificate validation botch

Posted Mar 5, 2014 16:05 UTC (Wed) by pizza (subscriber, #46) [Link] (5 responses)

> I cannot help wonder why such critical code is written in C and not C++. With RAII and exceptions you get automatic cleanup and error propagation for free.

Probably because C++ actually gives you more ways to shoot yourself in the foot rather than fewer -- it has all of the problems of straight C, and a boatload of new ones.

And that's even before you start talking about buggy standard libraries, compiler/platform idiosyncrasies, and other things that make C++ considerably less portable than straight C.

A longstanding GnuTLS certificate validation botch

Posted Mar 5, 2014 20:36 UTC (Wed) by luto (subscriber, #39314) [Link] (4 responses)

Bah. If you use C++ cleanly (as opposed to saying "yay, fancy features!") you can do a decent job. But writing exception-safe code, for example, is hard.

I'm cautiously optimistic that Rust will improve the situation.

A longstanding GnuTLS certificate validation botch

Posted Mar 5, 2014 21:36 UTC (Wed) by tjc (guest, #137) [Link]

> I'm cautiously optimistic that Rust will improve the situation.

I'm hopeful, but not yet optimistic. I have "fought the long defeat" for too long to be optimistic without empirical evidence.

A longstanding GnuTLS certificate validation botch

Posted Mar 5, 2014 23:00 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

Rust adds a natural error-propagation boundary: the task. So exceptions simply crash the task, cleanly releasing all the resources associated with it.

It feels strange at first, but once you adapt the style to 'let it crash' tasks - it becomes natural.

A longstanding GnuTLS certificate validation botch

Posted Mar 6, 2014 12:58 UTC (Thu) by etienne (guest, #25256) [Link] (1 responses)

> It feels strange at first, but once you adapt the style to 'let it crash' tasks - it becomes natural.

But then programmers begin to use "let it crash" where, for instance, synchronization should have been used, and you end up needing a better performance analyzer that tells you the number of crashes per second in each subsystem, so you know which subsystem to rewrite to get better interactivity...

A longstanding GnuTLS certificate validation botch

Posted Mar 6, 2014 16:24 UTC (Thu) by apoelstra (subscriber, #75205) [Link]

> But then programmers begin to use "let it crash" where, for instance, synchronization should have been used, and you end up needing a better performance analyzer that tells you the number of crashes per second in each subsystem, so you know which subsystem to rewrite to get better interactivity...

Rust conditions ought to be for exceptional situations, just as exceptions are. So if you are in a situation with several crashes per second, you probably have a defective design (or some serious bugs). A task crashing would probably result in something like a '500 Internal Server Error' (or, for a GUI application, 'This tab has crashed, please reload the page and try again'), and these things should not be happening as a matter of course.

I certainly can't see a 'crash on everything' scenario becoming idiomatic.

A longstanding GnuTLS certificate validation botch

Posted Mar 5, 2014 16:12 UTC (Wed) by mpr22 (subscriber, #60784) [Link] (13 responses)

> I cannot help wonder why such critical code is written in C and not C++.

Because libraries implemented in C++ generally only get used by programs written in C++, because interfacing to libraries implemented in C++ from any other language entails significant pain. I like C++, but I have no difficulty understanding why people don't use it to implement libraries intended for general adoption in non-C++ programs.

A longstanding GnuTLS certificate validation botch

Posted Mar 5, 2014 21:45 UTC (Wed) by cesarb (subscriber, #6266) [Link] (8 responses)

> Because libraries implemented in C++ generally only get used by programs written in C++, because interfacing to libraries implemented in C++ from any other language entails significant pain.

What matters is the interface, not the implementation. You can have a library implemented in C++ but with a pure 'extern "C"' interface, and it'll be as easy to use as a library implemented in C.

No, the real problem with writing a library in C++ is the "which C++ runtime" question. The C++ ABI is not as solid as the C ABI, and there is more than one C++ runtime. Loading more than one C++ runtime into the same process, with the common ELF linking rules, is not a good idea.

(On Win32, there is more than one C runtime, but it's less of a problem because its symbol resolution rules make it easy to have several incompatible runtimes in the same process without many issues.)

But I believe the reason C is used instead of C++ is complexity and historical reasons. Many libraries were first written back when C++ compilers were not as good, so they used C by default; and C is simpler to write and understand.

A longstanding GnuTLS certificate validation botch

Posted Mar 6, 2014 3:10 UTC (Thu) by proski (subscriber, #104) [Link] (7 responses)

Does it mean "DLL hell" is now a problem on Linux and not on Windows? That's something to think about, especially if it hinders C++ adoption.

A longstanding GnuTLS certificate validation botch

Posted Mar 6, 2014 7:06 UTC (Thu) by Seegras (guest, #20463) [Link]

It still is a problem on Windows. And there's also a "Codec Hell" on Windows.

A longstanding GnuTLS certificate validation botch

Posted Mar 8, 2014 17:41 UTC (Sat) by smurf (subscriber, #17840) [Link] (1 responses)

… and hindering C++ adoption is a bad thing in what way exactly?

SCNR,

A longstanding GnuTLS certificate validation botch

Posted Mar 9, 2014 7:00 UTC (Sun) by HelloWorld (guest, #56129) [Link]

C++ is an abomination, and yet it's still much better than C.

A longstanding GnuTLS certificate validation botch

Posted Mar 10, 2014 19:05 UTC (Mon) by drag (guest, #31333) [Link] (3 responses)

> Does it mean "DLL hell" is now a problem on Linux and not on Windows?

It's always been a problem on both systems. Microsoft largely solved their issue by taking various approaches to versioning DLLs, preventing applications from overwriting library files, static compiling, heuristics involving locating and using libraries based on what was bundled with the applications and a few other things I only have a distant and vague memory of.

Linux distributions fixed their own version of this by simply making things really difficult for users who try to install anything that isn't carefully compiled, versioned, and controlled by whatever distribution they happen to be using. Then when users run into problems with missing libraries or conflicts, they are called idiots in online forums and such places for not using one of the applications that can be installed by the local package management software... and if that didn't work, the normal advice is to reformat the disk drive and install a different Linux OS that probably has a working version of whatever application they are struggling with.

> That's something to think about, especially if it hinders C++ adoption.

C++ ABI breakage caused me and many other Linux users a huge number of headaches in the past. It's now been a long time since I've run into such an issue.

However, I still consider it a very bad sign whenever I run into an application or API that makes use of 'boost' or any similar binding-generation thingy. If it happens to work, great, but if it doesn't, it's going to be a nightmare.

A longstanding GnuTLS certificate validation botch

Posted Mar 25, 2014 15:50 UTC (Tue) by nix (subscriber, #2304) [Link] (2 responses)

Uh, drag, Linux's DLL hell was largely (though not entirely) obviated by SONAMEs and symbol versioning. You seem to have become so obsessed with the claim that distributions are evil that you're allowing it to twist your view of the world into zones best described as mendacious reasoning.

A longstanding GnuTLS certificate validation botch

Posted Mar 25, 2014 17:21 UTC (Tue) by khim (subscriber, #9252) [Link] (1 responses)

Unfortunately that's not true. SONAMEs and symbol versioning could fix "DLL hell" in theory, but they are not used consistently (libraries do not bump SONAMEs when they change their ABI, and/or they do not use symbol versioning consistently, so you cannot use different versions in the same process). In the end the situation for third-party developers is better on Windows: at least there you can use boost in both an application and its plugin even if the versions of boost differ. On Linux… it does not really work.

A longstanding GnuTLS certificate validation botch

Posted Mar 25, 2014 18:50 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

After struggling with packaging an application on Linux for multiple platforms, I've learned to like Windows. It keeps developers honest: there is no single global prefix for libraries, so each dependency has to be referenced explicitly from the build files. So assembling all the dependencies in one installer package is pretty much trivial.

On Linux it's superficially easy to just install all the dependencies using apt-get or similar package managers. Everything will even work on developer's machine. But then comes the packaging time and extracting all these dependencies (even accidental ones) becomes a task which can easily result in a non-working package if you forget something.

And that's even without going into the !@#&!&*!^@ glibc versioning that forces me to use an ancient RHEL image to get a 'universal' binary.

A longstanding GnuTLS certificate validation botch

Posted Mar 6, 2014 20:19 UTC (Thu) by dashesy (guest, #74652) [Link] (3 responses)

Should also add that C++ is harder and fewer people can get it right. While some elite companies (Google, ...) can filter out people who do not understand stupid but hard C++ idioms like virtual inheritance, other projects should not raise the required skill level higher than they can afford to sustain over their lifespan. In lay terms, you never know whether the next person will understand the *fancy* feature you like correctly, or will just screw things up.

A longstanding GnuTLS certificate validation botch

Posted Mar 6, 2014 22:07 UTC (Thu) by khim (subscriber, #9252) [Link] (2 responses)

Actually Google does not do that. It's just not worth it.

What Google can do (and what it actually does) is create a kind of Google/C++ language where some missing pieces are filled in by Libraries/base and the most problematic pieces are removed by a style guide.

This approach works, but it's really hard to do: in a sense it means that each C++ project uses its own XXX/C++ language. The fact that there is no "batteries included" C++ really hurts it. And no, boost is not it: it's just a pile of tricks, some of which are great and some of which are really awful.

A longstanding GnuTLS certificate validation botch

Posted Mar 7, 2014 2:41 UTC (Fri) by dashesy (guest, #74652) [Link] (1 responses)

Yes the only sane way of using C++ is to take a subset of it and force it via policy, meanwhile enforce code review and the policy strictly.
I know many who love boost and cannot live without it, but IMO it is the worst type of hack: a language-level hack to force C++ into something that it is not. This is besides the fact that one has to make sure the compiler itself knows C++ quirks well enough to compile boost correctly!
On the other hand Qt is beautiful and can make C++ tolerable.

A longstanding GnuTLS certificate validation botch

Posted Mar 9, 2014 7:11 UTC (Sun) by HelloWorld (guest, #56129) [Link]

> I know many who love boost and cannot live without it, but IMO it is the worst type of hack: a language-level hack to force C++ into something that it is not.
This is Not Even Wrong. It's so vague it's meaningless and you're talking as if boost were a single library while in reality it's a collection of libraries, some of which are unquestionably great.

> This is besides the fact that one has to make sure the compiler itself knows C++ quirks enough to compile boost correctly!
The solution to broken compilers is to not use them.

RAII in C

Posted Mar 5, 2014 16:53 UTC (Wed) by cesarb (subscriber, #6266) [Link] (7 responses)

It's not well known, but at least with gcc you can have RAII in C (and given how compatible clang is with gcc, it probably also has the same extension). It's widely used in the systemd codebase (which, as is well-known around here, is not shy about using less-portable but useful features).

RAII in C

Posted Mar 5, 2014 17:03 UTC (Wed) by tjc (guest, #137) [Link] (6 responses)

Via the cleanup function, or is there some better way now?

RAII in C

Posted Mar 5, 2014 17:36 UTC (Wed) by bronson (subscriber, #4806) [Link] (5 responses)

Yeah! If they're talking about this:

    void cleanup_file(FILE **fp)
    {
        // handle a bunch of boundary conditions, probably using global variables
        ...
    }

    int main() {
        FILE *fp __attribute__ ((__cleanup__(cleanup_file))) = NULL;
        ...
    }

then ew.

But if there's a better way I really want to hear it!

RAII in C

Posted Mar 5, 2014 20:29 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

It'd be nicer with "__cleanup__(func, ctx)" so you could hide all of the global variables from the function. Alternatively, wrap FILE* in a struct with the data you'd have in ctx.
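A minimal sketch of the second suggestion, assuming GCC or Clang (the cleanup attribute is a compiler extension, not standard C). The struct tracked_file type, tracked_file_close(), AUTO_TRACKED_FILE, and write_scratch() are hypothetical names. The handler still receives only a pointer to the annotated variable, but because that variable is now a struct, whatever context the handler needs travels with it instead of living in globals.

```c
#include <stdio.h>

/* Hypothetical wrapper: the FILE * plus the context its cleanup needs. */
struct tracked_file {
    FILE *fp;
    int *close_count;          /* where to record that the file was closed */
};

/* The cleanup attribute passes a pointer to the annotated variable. */
static void tracked_file_close(struct tracked_file *tf)
{
    if (tf->fp != NULL) {
        fclose(tf->fp);
        tf->fp = NULL;
        if (tf->close_count != NULL)
            (*tf->close_count)++;
    }
}

#define AUTO_TRACKED_FILE \
    __attribute__((cleanup(tracked_file_close))) struct tracked_file

/* Writes a line to a temporary file; the file is closed on every return. */
static int write_scratch(const char *line, int *close_count)
{
    AUTO_TRACKED_FILE tf = { tmpfile(), close_count };

    if (tf.fp == NULL)
        return -1;
    if (fputs(line, tf.fp) == EOF)
        return -1;             /* early return: cleanup still runs */
    return 0;
}
```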

RAII in C

Posted Mar 5, 2014 21:29 UTC (Wed) by cesarb (subscriber, #6266) [Link] (2 responses)

systemd's way is:

    void foo(void) {
        _cleanup_fclose_ FILE *f = NULL;
        ...
    }

Using the following definitions:

    #define _cleanup_(x) __attribute__((cleanup(x)))

    static inline void fclosep(FILE **p) {
        if (*p)
            fclose(*p);
    }

    #define _cleanup_fclose_ _cleanup_(fclosep)

(Actually, fclosep is defined via another macro, DEFINE_TRIVIAL_CLEANUP_FUNC, but it expands to the static inline I typed above.)

Not that ugly (compare "_cleanup_fclose_ FILE *f = NULL;" with a C++ equivalent of "scoped_FILE f(nullptr);", assuming an equivalent scoped_FILE helper class). Yes, C++'s way is slightly cleaner to use (and way more complex to define), but the difference is not that big.
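For anyone who wants to try the pattern, here is a self-contained sketch that reproduces the definitions above and uses them in a function with several exit paths. It assumes GCC or Clang; count_bytes() is a hypothetical example function, not systemd code.

```c
#include <stdio.h>

#define _cleanup_(x) __attribute__((cleanup(x)))

static inline void fclosep(FILE **p) {
    if (*p)
        fclose(*p);
}

#define _cleanup_fclose_ _cleanup_(fclosep)

/* Hypothetical example: write `text` to a temporary file, rewind, and
 * count the bytes read back. Returns -1 on failure. The FILE is closed
 * automatically on every return path, including the early ones. */
static long count_bytes(const char *text)
{
    _cleanup_fclose_ FILE *f = NULL;
    long n = 0;

    if (text == NULL)
        return -1;             /* f is still NULL: fclosep() does nothing */

    f = tmpfile();
    if (f == NULL)
        return -1;

    if (fputs(text, f) == EOF)
        return -1;             /* f is closed here... */
    rewind(f);

    while (fgetc(f) != EOF)
        n++;
    return n;                  /* ...and here */
}
```

Initializing the variable to NULL matters: the handler runs on every exit from the scope, so it must be safe to call before anything has been opened.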

RAII in C

Posted Mar 6, 2014 18:43 UTC (Thu) by bronson (subscriber, #4806) [Link] (1 responses)

Hm, not bad... My overengineering spidey sense is tingling a bit but, I agree, it doesn't seem any worse than other non-GC solutions.

Looking forward to playing with these macros next time I'm waist deep in C.

RAII in C

Posted Mar 6, 2014 22:52 UTC (Thu) by cesarb (subscriber, #6266) [Link]

It might look like overengineering because I only showed one of the _cleanup_*_ definitions. The version of the systemd code I'm looking at has nine different _cleanup_*_ variants, four of them defined via the DEFINE_TRIVIAL_CLEANUP_FUNC macro. Search for _cleanup_ at http://cgit.freedesktop.org/systemd/systemd/tree/src/shar... (and pretend the rest of that file does not exist; it lives up to the tradition of a module called "util" being used as a dumping ground for assorted bits and pieces).
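The comment does not show DEFINE_TRIVIAL_CLEANUP_FUNC itself, so the following is a guess at how such a macro can be written, not a copy of systemd's definition. It generates a pointer-taking wrapper around any single-argument release function, so each new _cleanup_*_ variant costs about one line. free_counted(), its counter, and use_buffer() are hypothetical, added only so the behavior can be observed; GCC or Clang is assumed.

```c
#include <stdlib.h>

#define _cleanup_(x) __attribute__((cleanup(x)))

/* Approximation (not systemd's source): generates
 * `static inline void funcp(type *p)`, which calls func(*p) when *p is set. */
#define DEFINE_TRIVIAL_CLEANUP_FUNC(type, func)     \
    static inline void func##p(type *p) {           \
        if (*p)                                      \
            func(*p);                                \
    }

static int released = 0;       /* hypothetical: counts release calls */

static void free_counted(void *ptr)
{
    free(ptr);
    released++;
}

DEFINE_TRIVIAL_CLEANUP_FUNC(void *, free_counted)
#define _cleanup_free_counted_ _cleanup_(free_countedp)

static int use_buffer(size_t n)
{
    _cleanup_free_counted_ void *buf = malloc(n);
    if (buf == NULL)
        return -1;
    return 0;                  /* buf is released by free_countedp() here */
}
```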

RAII in C

Posted Apr 9, 2014 13:02 UTC (Wed) by psevon (guest, #96490) [Link]

The problem with cleanup is that it doesn't allow for passing the allocated object to an outer scope and still be subject to automatic cleanup, which is one of the main strengths of smartpointers in C++. It also doesn't work nicely together with setjmp/longjmp based exception mechanisms, since the intermediate scope (i.e., functions between the function doing setjmp and function doing longjmp in the call stack) cleanups will never be triggered. See my project https://github.com/psevon/exceptions-and-raii-in-c for an implementation of smartpointers and exceptions in C that avoids the above mentioned problems.
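A small demonstration of the second point, assuming GCC or Clang: the cleanup handler of an intermediate frame runs when that frame returns normally, but longjmp() jumps straight past it. The function names and the counter are hypothetical, used only to make the effect observable.

```c
#include <setjmp.h>

static int cleanups_run = 0;   /* counts handlers that actually ran */
static jmp_buf env;

static void note_cleanup(int *unused)
{
    (void)unused;
    cleanups_run++;
}

/* Intermediate frame: its cleanup handler runs on a normal return... */
static void middle(int do_jump)
{
    int guard __attribute__((cleanup(note_cleanup))) = 0;
    (void)guard;
    if (do_jump)
        longjmp(env, 1);       /* ...but is skipped when we jump over it */
}

/* Returns how many cleanup handlers ran during one call of middle(). */
static int run(int do_jump)
{
    cleanups_run = 0;
    if (setjmp(env) == 0)
        middle(do_jump);
    return cleanups_run;
}
```

With run(0) the handler fires once; with run(1) it never fires, which is exactly the leak an exception mechanism built on setjmp/longjmp has to work around.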

A longstanding GnuTLS certificate validation botch

Posted Mar 5, 2014 17:23 UTC (Wed) by nix (subscriber, #2304) [Link] (45 responses)

That doesn't help. The problem isn't error propagation: it's that different work is necessarily done on error and non-error paths, and error paths hugely outnumber non-error paths in most software but are hardly tested. Automatic error propagation doesn't reduce the number of error paths but can often *increase* it, by enabling silent, invisible error propagation from loci in the code that the developer has no idea can fail at all. (Witness how long it took to come up with rules for exception-safe containers in C++, and how complicated they are.)

It is very rare to test all failure paths in code. Heck, it's rare even to test all memory-allocation loci under conditions of failure: the only free software I know of that does such testing is SQLite. As for other error paths, it's probably best to assume that they're barely tested at all except by accident and by users. All of us writing software are to blame here -- but testing error paths comprehensively is terrifyingly hard and astonishingly boring, and our bosses and boredom thresholds will rarely let us spend twenty times longer writing the testcases than we spend writing the code. We probably need some sort of automated way to inject faults at every potential fault locus, and test the resulting code, but I know of few such frameworks :/
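One way to approach the "inject a fault at every locus" idea, at least for memory allocation, is a countdown: run the code once with the first allocation failing, then with the second, and so on, until a run completes without reaching the failure point. The sketch below assumes the code under test calls a hypothetical test_malloc() hook instead of malloc(); pair_new() and exercise_all_failures() are likewise invented for illustration, and real projects such as SQLite have their own, more elaborate machinery. Note that it only tests single failures in isolation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical hook: when fail_at > 0, the fail_at-th call returns NULL. */
static int alloc_calls = 0;
static int fail_at = 0;

static void *test_malloc(size_t n)
{
    alloc_calls++;
    if (fail_at > 0 && alloc_calls == fail_at)
        return NULL;
    return malloc(n);
}

struct pair { char *key; char *value; };

/* Code under test: three allocation sites, each with its own unwinding. */
static struct pair *pair_new(const char *k, const char *v)
{
    struct pair *p = test_malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->key = test_malloc(strlen(k) + 1);
    if (p->key == NULL)
        goto fail_key;
    p->value = test_malloc(strlen(v) + 1);
    if (p->value == NULL)
        goto fail_value;
    strcpy(p->key, k);
    strcpy(p->value, v);
    return p;

fail_value:
    free(p->key);
fail_key:
    free(p);
    return NULL;
}

static void pair_free(struct pair *p)
{
    if (p != NULL) {
        free(p->key);
        free(p->value);
        free(p);
    }
}

/* Fail each allocation site in turn. Returns the number of sites
 * exercised, or -1 if an injected failure went unnoticed. */
static int exercise_all_failures(void)
{
    for (int site = 1; ; site++) {
        alloc_calls = 0;
        fail_at = site;
        struct pair *p = pair_new("key", "value");
        fail_at = 0;
        if (alloc_calls < site) {      /* never reached the fault: done */
            if (p == NULL)
                return -1;             /* a clean run must succeed */
            pair_free(p);
            return site - 1;
        }
        if (p != NULL) {               /* fault was swallowed */
            pair_free(p);
            return -1;
        }
    }
}
```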

(Even if that were done, the problem of *multiple* errors happening in combination remains, as does the problem of multiple errors happening in narrow race windows in concurrent systems. Such multiple failure cascades (mostly in non-software domains) are the predominant cause of disasters in most safety-critical systems, such as aviation. Systematic testing is of course incapable of identifying such faults due to the exponential explosion of test cases with error paths (much faster than exponential if 'error races' are considered), and formal proof is impractical in all but a tiny minority of cases: doing it automatically is likely to be horribly limiting too, I can see giant neon letters labelled 'RICE'S THEOREM' lighting up the sky from here.)

We are, basically, screwed here. I can see no solution and no practical way to even improve the situation significantly. Testing all failure paths in isolation is barely possible, but that will only solve *some* of these problems, not all; perhaps not even most.

A longstanding GnuTLS certificate validation botch

Posted Mar 5, 2014 17:55 UTC (Wed) by raven667 (subscriber, #5198) [Link] (2 responses)

> We are, basically, screwed here. I can see no solution and no practical way to even improve the situation significantly.

This is probably the best perspective to start from, even though it is depressing for those who get into computers for the mathematical perfection. You get better by accepting risk and then managing it, rather than trying to eliminate risk using math or by expecting perfection.

How would computing and networking look if we just accepted the risk and rolled back all of the encryption, signature verification, complex policy tagging and enforcement and just had some very basic permissions. It's like ChaosMonkey, how you design complex systems to be robust by expecting failure rather than treating it as exceptional. Would we develop more robust audit features or flexibility to roll back malicious changes rather than trying to prevent them. Malicious activity on computer networks is ultimately caused by people, and as a species we've been dealing with malicious people since forever, how could those coping mechanisms be translated into the computer network?

Could we handle the problems that encryption and the like are intended to solve in meatspace rather than with technology in computerspace?

A longstanding GnuTLS certificate validation botch

Posted Mar 7, 2014 13:08 UTC (Fri) by robbe (guest, #16131) [Link] (1 responses)

> Would we develop more robust audit features or flexibility to roll back
> malicious changes rather than trying to prevent them.

Encryption is not about data integrity ... and you can't "roll back" confidentiality.

Even better, you usually can't even detect that confidentiality has been breached. That's in stark contrast to something like ChaosMonkey.

> Could we handle the problems that encryption and the like are intended to
> solve in meatspace rather than with technology in computerspace?

Do elaborate. How can an intimate conversation (attorney-client, spouse-spouse, priest-penitent) be kept private without technical means (encryption in this case)?

A longstanding GnuTLS certificate validation botch

Posted Mar 7, 2014 17:16 UTC (Fri) by raven667 (subscriber, #5198) [Link]

In meat-space you can use a long-range microphone, for example, to overhear intimate conversations, but there are rules about doing so and about using this information, which can be enforced regardless of the technical capability of gathering or preventing the gathering of this information. You are right that there is no perfect prevention built into this scheme; some number of people will abuse their ability to eavesdrop. But that's still true with technical prevention, because it is not perfect, and then you are no longer prepared to deal with the loss of confidentiality.

At some point it's always good to re-validate your assumptions: we keep spending more calories on prevention technology, so have we exceeded the cost of what the loss would be without the prevention?

A longstanding GnuTLS certificate validation botch

Posted Mar 5, 2014 19:34 UTC (Wed) by zorro (subscriber, #45643) [Link] (11 responses)

Part of the problem is error propagation. From the article:

A true return should mean that the certificate passed muster and can be used further, but the bug meant that error returns were misinterpreted as certificate validations. [...] Prior to the fix, check_if_ca() would return error codes (which are negative numbers) when it encountered a problem, which would be interpreted as a true value by the caller.

This problem is caused by the fact that C does not have a standard way to propagate errors. C++ does not have that problem: you throw an exception. If you are smart enough to develop a critical security library then you should be smart enough to write exception-safe C++ code as well.
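For comparison, C code can make "error" and "false" impossible to confuse without exceptions, by returning a status code and passing the boolean answer through an out-parameter. This is a sketch of that convention with hypothetical names, not GnuTLS's API; the warn_unused_result attribute is a GCC/Clang extension that makes the compiler complain if a caller ignores the status.

```c
#include <stdbool.h>

enum status { STATUS_OK = 0, STATUS_BAD_INPUT = 1 };

/* The status says whether the question could be answered at all;
 * *is_ca carries the answer and is written only on STATUS_OK. */
__attribute__((warn_unused_result))
static enum status issuer_is_ca(int parse_ok, int flag, bool *is_ca)
{
    if (!parse_ok)
        return STATUS_BAD_INPUT;
    *is_ca = (flag != 0);
    return STATUS_OK;
}

/* The caller has to get past the status check before it sees the answer. */
static bool accept_issuer(int parse_ok, int flag)
{
    bool is_ca = false;        /* fail closed if the answer is never written */
    if (issuer_is_ca(parse_ok, flag, &is_ca) != STATUS_OK)
        return false;
    return is_ca;
}
```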

A longstanding GnuTLS certificate validation botch

Posted Mar 6, 2014 1:37 UTC (Thu) by nix (subscriber, #2304) [Link] (10 responses)

You do realise that it took over a *decade* for the smartest C++ developers on the planet to figure out how to produce exception-safe containers in C++?

This is not remotely simple stuff. It's probably far harder than writing secure code.

A longstanding GnuTLS certificate validation botch

Posted Mar 6, 2014 14:10 UTC (Thu) by jwakely (subscriber, #60262) [Link]

But since then the correct approach has been enshrined in the standard library and documented in several widely-read books. It's a solved problem now, and it's really not that hard. The world has moved on since the 1990s. http://www.boost.org/community/exception_safety.html is more than a decade old now.

A longstanding GnuTLS certificate validation botch

Posted Mar 6, 2014 17:28 UTC (Thu) by zorro (subscriber, #45643) [Link] (8 responses)

> This is not remotely simple stuff. It's probably far harder than writing secure code.

And yet, here we are discussing two critical bugs in supposedly secure C code, both related to error handling and resource cleanup.

A longstanding GnuTLS certificate validation botch

Posted Mar 6, 2014 18:02 UTC (Thu) by pizza (subscriber, #46) [Link] (7 responses)

>> This is not remotely simple stuff. It's probably far harder than writing secure code.

> And yet, here we are discussing two critical bugs in supposedly secure C code, both related to error handling and resource cleanup.

Those two statements are not in contradiction. If gnutls was written in C++ the situation would have likely been far worse with many more (and even more difficult to test) problems lurking under the hood.

A longstanding GnuTLS certificate validation botch

Posted Mar 7, 2014 8:38 UTC (Fri) by zorro (subscriber, #45643) [Link] (6 responses)

That's pure speculation. How can you claim there would be many more problems lurking under the hood if gnutls were written in C++? Do you know how many problems there are lurking under the hood now?

A longstanding GnuTLS certificate validation botch

Posted Mar 7, 2014 10:20 UTC (Fri) by smurf (subscriber, #17840) [Link] (5 responses)

No, but we do know that C++ code, particularly when it's older, has failure modes which Mr. Stroustrup was unable to even conceive of when he first designed the thing and which C is completely incapable of.

A longstanding GnuTLS certificate validation botch

Posted Mar 7, 2014 10:51 UTC (Fri) by hummassa (guest, #307) [Link] (4 responses)

> No, but we do know that C++ code, particularly when it's older, has failure modes which Mr. Stroustrup was unable to even conceive of when he first designed the thing and which C is completely incapable of.

Now it seems that you're trolling. Which failure modes are those? The only failure modes I see in C++ are the C-related ones (null pointer dereferencing, buffer overflows, integer overflows and underflows).

A longstanding GnuTLS certificate validation botch

Posted Mar 7, 2014 12:22 UTC (Fri) by nix (subscriber, #2304) [Link] (3 responses)

Exception throws from unexpected places, leaving the code in an inconsistent state. (Yes, when properly written the code won't have any such bugs. When properly written, code has no bugs at all...)

A longstanding GnuTLS certificate validation botch

Posted Mar 7, 2014 19:34 UTC (Fri) by hummassa (guest, #307) [Link] (2 responses)

As someone else commented, the '90s have been over for quite some time now.

> Exception throws from unexpected places

those, nowadays, call unexpected() instead of "leaving the program in an inconsistent state". unexpected(), left to its own devices, will abort the program.

A longstanding GnuTLS certificate validation botch

Posted Mar 7, 2014 20:37 UTC (Fri) by cesarb (subscriber, #6266) [Link] (1 responses)

That's only if you are using exception specifications, which are AFAIK not recommended (except for C++0x's noexcept).

I think what nix meant is: if you are not very careful, you can write code which is not exception-safe. An exception thrown in the middle of that code will lead to inconsistent state. RAII helps a lot, but not everything can easily be expressed in RAII style.

And even if you are very careful, code can have bugs. Exception-safety bugs can be quite hard to see by just reading the code: you have to consider that every line of code within a function could throw an exception. Even apparently innocent code like "a = b + c;" can throw an exception, courtesy of operator overloading.

Contrast this with C, where only function calls can do nonlocal exits, and even then only in the presence of longjmp(). Most functions will not call longjmp() (and if you use it from a signal handler, you deserve to lose). In C, the code flow is much simpler: it's all explicit, and visible by looking at the function's body. Even gcc's cleanup extension does not change that.

Posted Mar 8, 2014 23:15 UTC (Sat) by nix (subscriber, #2304)

Quite. I'm not saying it's impossible to make it work, obviously it isn't. It's just not at all easy, and it's not obvious when you got it wrong.

I like exceptions, but I'm wary of them in much the same way as I would be of a gun that has a habit of firing spontaneously and exploding when fired. :)

Posted Mar 5, 2014 21:12 UTC (Wed) by wahern (subscriber, #37304)

Just interpose malloc with dlsym+RTLD_NEXT. Then use backtrace(3) and/or GCC's __builtin_frame_address and __builtin_return_address, plus libbfd for reading ELF information.

Put those two together, and you can programmatically fail malloc at any point in your program, more or less.

I've done this, but to profile memory usage, not for failure testing. For allocation failure testing I usually just interpose malloc, randomly return failure, and re-run for a very long time.
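A minimal sketch of the "randomly return failure" approach. To keep it in one file, it uses an explicit wrapper instead of a malloc() interposed via dlsym(RTLD_NEXT, ...) and LD_PRELOAD; fi_malloc(), fi_free(), dup_strings() and the 10% failure rate are all hypothetical. A small xorshift generator stands in for rand() so that any failing run can be replayed from its seed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fault-injection state: PRNG state, failure rate, live count. */
static uint32_t fi_state = 1;
static unsigned fi_fail_percent = 10;
static long fi_live = 0;              /* outstanding allocations, for leak checks */

static void fi_seed(uint32_t seed) { fi_state = seed ? seed : 1; }

/* xorshift32: deterministic for a given seed, so a bad run can be replayed. */
static uint32_t fi_next(void) {
    uint32_t x = fi_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return fi_state = x;
}

/* Stand-in for an interposed malloc(): fails about fi_fail_percent% of calls. */
static void *fi_malloc(size_t n) {
    if (fi_next() % 100 < fi_fail_percent)
        return NULL;
    void *p = malloc(n);
    if (p) fi_live++;
    return p;
}

static void fi_free(void *p) {
    if (p) { assert(fi_live > 0); fi_live--; free(p); }
}

/* Code under test: copy n strings; on any failure free everything, return NULL. */
static char **dup_strings(const char *const *src, size_t n) {
    char **out = fi_malloc(n * sizeof *out);
    if (!out) return NULL;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(src[i]) + 1;
        out[i] = fi_malloc(len);
        if (!out[i]) {
            while (i > 0) fi_free(out[--i]);   /* unwind the copies made so far */
            fi_free(out);
            return NULL;
        }
        memcpy(out[i], src[i], len);
    }
    return out;
}

static void free_strings(char **v, size_t n) {
    for (size_t i = 0; i < n; i++) fi_free(v[i]);
    fi_free(v);
}
```

Running this over thousands of seeds checks one invariant per run: either the call succeeds completely, or it returns NULL with no allocation left outstanding.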

Posted Mar 6, 2014 1:38 UTC (Thu) by nix (subscriber, #2304)

That's pretty much how SQLite does it, yes (sans the horribly unreliable backtrace() / __builtin_frame_*() functions: it uses gcov instead, IIRC, which is surely a better way to verify test coverage.)

Posted Mar 6, 2014 20:35 UTC (Thu) by wahern (subscriber, #37304)

AFAIU, gcov uses gdb for tracing/interposing malloc, and gdb gets its symbol information from libbfd (part of the gdb suite), so I'm not sure gcov could be any more reliable. I did have to write my own code for tracing the stack across a signal handler, but it basically does the same thing as gdb.

The beauty of rolling these tricks _into_ the application is that you can enable and disable them at any time. And in my use cases, they're also controllable from Lua. So if somebody wants to profile a bit of code, on their development box or out in the field, there's zero time wasted bootstrapping an environment. I also wrote a slew of async-signal-safe routines (like time and string formatting routines, which are notoriously unsafe in glibc) and mutexes which allow me to log a full-blown stack trace on a segfault or lockup. That reduces turnaround time on bug analysis tremendously. And all of this is fairly portable across different Linux environments, of which there are many at my workplace, without worrying about what packages are installed within the developer's or firmware's environment.

But I only put this stuff into heavily used and very complex applications. For smaller projects, yeah, it just makes sense to rely on the standard toolset.

Posted Mar 6, 2014 21:49 UTC (Thu) by nix (subscriber, #2304)

gcov doesn't interpose malloc at all. gcov just tells you which bits of the application you haven't tested yet (in particular, in this context, which failure paths): i.e. it's replacing the backtrace() horror. (But, of course, it's not rolled into the application, and it requires building with special flags and writes out coverage results as the application runs.)

For the malloc-interposition, one uses LD_PRELOAD as you suggested.

Posted Mar 7, 2014 6:47 UTC (Fri) by peter-b (subscriber, #66996)

My preferred way of using gcov is in conjunction with a set of unit tests. This helps one confirm that the code has been tested rather than merely run!

Posted Mar 7, 2014 12:20 UTC (Fri) by nix (subscriber, #2304)

Well, yes. The unit tests you'd use in conjunction with this one would be some evil thing that hooked up an LD_PRELOADed malloc() which listened to environment variables to tell it when to report failure ('make the Nth malloc fail', something like that), then iterated through all possible single-malloc-failure cases: probably the wrapper would signal via a special message on stderr or something when the 'Nth malloc' counter got high enough that it was never reached during program execution. Then you'd run your entire testsuite under such a failure-iterator, looking for abnormal failures (coredumps, etc: exit()s are probably expected, except in deep library code), and wait a very very long time.
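An in-process sketch of the "make the Nth malloc fail" iterator. It assumes a counter-based wrapper rather than the LD_PRELOADed malloc() and environment variables described above; nth_malloc(), build_list() and iterate_single_failures() are hypothetical names. Each run fails exactly one allocation, and the loop stops at the first N that the run never reaches, which plays the role of the "counter got too high" signal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static long alloc_count;   /* allocations attempted in the current run */
static long fail_at;       /* 1-based index of the allocation to fail */
static bool fail_reached;  /* did this run reach the injected failure? */
static long live;          /* outstanding allocations */

static void *nth_malloc(size_t n) {
    if (++alloc_count == fail_at) {
        fail_reached = true;
        return NULL;
    }
    void *p = malloc(n);
    if (p) live++;
    return p;
}

static void nth_free(void *p) {
    if (p) { live--; free(p); }
}

/* Hypothetical code under test: build a linked list holding 0..n-1. */
struct node { int value; struct node *next; };

static void free_list(struct node *head) {
    while (head) {
        struct node *next = head->next;
        nth_free(head);
        head = next;
    }
}

static struct node *build_list(int n) {
    struct node *head = NULL;
    for (int i = n - 1; i >= 0; i--) {
        struct node *nd = nth_malloc(sizeof *nd);
        if (!nd) { free_list(head); return NULL; }   /* error path under test */
        nd->value = i;
        nd->next = head;
        head = nd;
    }
    return head;
}

/* Fail allocation 1, then 2, ... until a run never gets that far.
   Returns the number of distinct failure points exercised. */
static long iterate_single_failures(int n) {
    for (long k = 1; ; k++) {
        alloc_count = 0;
        fail_at = k;
        fail_reached = false;
        struct node *list = build_list(n);
        if (!fail_reached) {        /* counter never got this high: done */
            free_list(list);
            assert(live == 0);
            return k - 1;
        }
        assert(list == NULL);       /* the failure must be reported */
        assert(live == 0);          /* and nothing may leak on that path */
    }
}
```

A real harness would run the whole test suite once per N, in separate processes, and look for crashes rather than assertion failures.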

Posted Mar 7, 2014 16:44 UTC (Fri) by jwakely (subscriber, #60262)

I've always thought this was a far better solution (but have never actually used it): http://blogs.gnome.org/otte/2007/11/03/robustness-testing/

Posted Mar 8, 2014 23:13 UTC (Sat) by nix (subscriber, #2304)

Oh, that's *neat*. You'd need to do a bit more work to make it work for threaded programs, programs with children, network state and the like, but it's still neat!

(I wonder if CRIU could be leveraged for this instead of a simple fork()?)

Posted Mar 5, 2014 21:48 UTC (Wed) by jwarnica (subscriber, #27492)

Along the same line of thinking, is "crash-only software", discussed here, a while ago: http://lwn.net/Articles/191059/

(Though, I grant, that is a higher-level architectural problem than low-level code, but the theory applies.)

Posted Mar 5, 2014 22:20 UTC (Wed) by paulj (subscriber, #341)

+1 to your comment. Exceptions can indeed make things worse, by exploding the number of possible branch-points for errors, while hiding them - making code harder for humans to reason about.

I do think, though, that there are at least some techniques that might reduce the rate at which certain security-sensitive errors are introduced, and the impact of bugs. In particular, state transitions could be specified in a more restricted and easier-to-verify way, e.g. as states and transitions in an FSM, rather than as hand-written branches acting directly on state. I've written a blog post on it:

http://paul.jakma.org/2013/12/05/code-and-error-handling-...

An FSM is easier to analyse than full-blown code. E.g. it is guaranteed you can detect unreachable states. The states and transitions can be abstracted out from the rest of the code, and specified in a more concise way, making it easier for humans to analyse too. Idempotent error states are easier to specify, making it easier and safer for /other/ code to interact with that code.
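A minimal sketch of the idea, assuming a made-up handshake with hypothetical states and events (not taken from the linked post). The transitions live in a table; every entry left out defaults to ST_ERROR, whose own row is empty, so the error state is absorbing and idempotent. Because the machine is plain data, a fixed-point pass over the table finds unreachable states such as ST_ORPHAN.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical states and events, for illustration only. ST_ERROR must be 0. */
enum state { ST_ERROR, ST_START, ST_HELLO, ST_CERT, ST_DONE, ST_ORPHAN, NSTATES };
enum event { EV_HELLO, EV_CERT_OK, EV_CERT_BAD, EV_FINISH, NEVENTS };

/* Allowed transitions. Anything not listed is zero-initialised, i.e. ST_ERROR. */
static const enum state next_state[NSTATES][NEVENTS] = {
    [ST_START]  = { [EV_HELLO]   = ST_HELLO },
    [ST_HELLO]  = { [EV_CERT_OK] = ST_CERT },
    [ST_CERT]   = { [EV_FINISH]  = ST_DONE },
    [ST_ORPHAN] = { [EV_FINISH]  = ST_DONE },   /* nothing ever leads here */
};

/* ST_ERROR's row is all zeros, so once in the error state we stay there. */
static enum state step(enum state s, enum event e) {
    return next_state[s][e];
}

/* Mark every state reachable from ST_START (simple fixed-point iteration). */
static void reachable_states(bool reach[NSTATES]) {
    memset(reach, 0, NSTATES * sizeof reach[0]);
    reach[ST_START] = true;
    bool changed = true;
    while (changed) {
        changed = false;
        for (int s = 0; s < NSTATES; s++) {
            if (!reach[s]) continue;
            for (int e = 0; e < NEVENTS; e++) {
                enum state t = next_state[s][e];
                if (!reach[t]) { reach[t] = true; changed = true; }
            }
        }
    }
}
```

Callers only ever call step(); there is no hand-written branch that could, say, treat an error code as a successful transition.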

It's no magic bullet, of course, but it can help. It applies to a variety of languages.

There's an unfortunate lack of general purpose FSM tools for C though, last I checked, to actually do the checking. The ones I know about seem to be heavily orientated to string-parsing FSMs.

Not relevant to the SSL bugs (Which were logical), but for network parsing code generally the code doing the syntactical parsing of external input should also use a bounded-buffer abstraction to access that external input. That some languages don't have built-in memory-access checking is not a barrier to implementing your own checking-abstractions!
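A sketch of such a bounded-buffer abstraction in C, with hypothetical names (struct bbuf, bb_get_u8() and friends). Parsing code never indexes the raw input directly; each read is bounds-checked, and an out-of-range read sets a sticky overrun flag and yields zeros instead of touching memory past the end, so a parser can check the flag once after decoding a record.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bounded view of untrusted input. */
struct bbuf {
    const uint8_t *data;
    size_t len;
    size_t pos;
    bool overrun;      /* sticky: set on the first out-of-bounds access */
};

static struct bbuf bb_init(const uint8_t *data, size_t len) {
    struct bbuf b = { data, len, 0, false };
    return b;
}

/* True if n more bytes are available; otherwise records the overrun. */
static bool bb_need(struct bbuf *b, size_t n) {
    assert(b->pos <= b->len);                 /* invariant: no underflow below */
    if (b->overrun || n > b->len - b->pos) {
        b->overrun = true;
        return false;
    }
    return true;
}

static uint8_t bb_get_u8(struct bbuf *b) {
    return bb_need(b, 1) ? b->data[b->pos++] : 0;
}

static uint16_t bb_get_u16be(struct bbuf *b) {
    if (!bb_need(b, 2)) return 0;
    uint16_t v = (uint16_t)((b->data[b->pos] << 8) | b->data[b->pos + 1]);
    b->pos += 2;
    return v;
}

/* Copies n bytes into out (which must hold n bytes); zero-fills on overrun. */
static void bb_get_bytes(struct bbuf *b, void *out, size_t n) {
    if (!bb_need(b, n)) { memset(out, 0, n); return; }
    memcpy(out, b->data + b->pos, n);
    b->pos += n;
}
```

For example, decoding a two-byte length followed by that many bytes needs no per-field bounds checks in the parser; if the declared length exceeds the remaining input, overrun is set and the whole record can be rejected.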

The issue isn't really C, but programming practice that under-values using layers of formal abstractions, as well as testing. None of this is new.

PS: I'd love to hear about other abstractions that could be applied to help systematically catch errors.

Posted Mar 5, 2014 22:25 UTC (Wed) by paulj (subscriber, #341)

Oh, and the first example in the final section, the series of operations wrapped in ifs for error-handling, looks a lot like the recently infamous Apple SSL code in outline. ;)

Posted Mar 6, 2014 1:42 UTC (Thu) by nix (subscriber, #2304)

I agree with all of that. We really need a proper DSL for that sort of error-handling FSM though: the example in your blog post is quite hard to read (as are all FSMs done via raw C array initialization, even if the predominant error state is preinitialized).

Posted Mar 6, 2014 9:36 UTC (Thu) by paulj (subscriber, #341)

Yes, agreed.

In my defence, the blog post does say after the example: "Ideally, a language would have concise, syntactical support for specifying the allowed state transitions." The example was meant to illustrate that it can be done even without such support (and it is even more verbose than it needs to be, not making use of default value initialisation). ;)

Posted Mar 6, 2014 10:27 UTC (Thu) by deepfire (guest, #26138)

I would argue that C itself is the problem, and no, C++ is not the solution.

The solution is higher-level languages that do not make you "balance the stack manually", to put it in a way that's understandable to C programmers.

Expressive types allow you to encode _and_ automatically check higher-level properties of your program, in a way that's Rice-Uspensky-tractable.

Posted Mar 7, 2014 21:32 UTC (Fri) by wahern (subscriber, #37304)

You can use Ragel's State Charts feature to write a concise FSM. State Charts allow you to define arbitrary state machines, and you don't need to literally be parsing any input. I could just as easily be feeding it a series of integers used in a switch-based FSM.

I wrote the following state chart (my first time using this method) for an asynchronous MySQL client library I wrote. I feed the state chart the tag byte of every packet received, and the FSM verifies that we're in the correct state. If the state doesn't transition to a valid new state as declared in the chart, then it errors out.

It looks like:

#
# MySQL Stream Tracer
#
# Separate FSM to keep track of what packets we're expecting. In
# the future this might need to be either all C code, or rolled into
# the stream machine.
#
machine label;

alphtype unsigned char;

access fsm->;

action greeting { packet->type = MYSQL_GREETING; }
greeting = any;

action okay { packet->type = MYSQL_OKAY; }
okay = 0x00;

action error { packet->type = MYSQL_ERROR; }
error = 0xff;

action soh { packet->type = MYSQL_SOH; }
soh = (1 .. 254);

action field { packet->type = MYSQL_FIELD; }
field = (0 .. 250);

action row { packet->type = MYSQL_ROW; }
row = (0 .. 253);

action etb { packet->type = MYSQL_ETB; }
etb = 0xfe;

packets =
start : ( greeting @greeting -> result ),
result : ( okay @okay -> result | error @error -> result | soh @soh -> fields ),
fields : ( field @field -> fields | etb @etb -> rows ),
rows : ( row @row -> rows | etb @etb -> result );

action oops { fsm->error = MYSQL_E_OUTOFORDER; goto error; }

main := packets $!oops;

Posted Mar 8, 2014 13:07 UTC (Sat) by paulj (subscriber, #341)

I didn't realise Ragel could handle binary input. Interesting, thanks!

Posted Mar 8, 2014 23:26 UTC (Sat) by nix (subscriber, #2304)

That's still not a lovely syntax, but it's *much* better.

I've been meaning to learn more about ragel. Now I have an excuse :)

Posted Mar 9, 2014 12:00 UTC (Sun) by ibukanov (guest, #3942)

The technique described in your blog effectively changes the code pattern:

int example(void) {
    /* ... */
    int errors = f1();
    if (errors) {
        error_cleanup1();
        return errors;
    }
    errors = f2();
    if (errors) {
        error_cleanup2();
        return errors;
    }
    errors = f3();
    if (errors) {
        error_cleanup3();
        return errors;
    }
    /* ... */
    normal_cleanup();
    return 0;
}

into

void example(int *errorp) {
    if (*errorp)
        return;
    /* ... */
    f1(errorp);
    f2(errorp);
    f3(errorp);
    /* ... */
    cleanup();
}

That is, this moves the error check into the callee, freeing the caller from doing the repeated error checks. This is possible as long as the error state is idempotent, and it indeed minimizes the number of branches in the code.

This pattern does not require using finite state machines for the whole code. I read about it in a book on embedded systems programming published over 30 years ago, and have always wondered why it never became mainstream.
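A self-contained version of the second pattern, with hypothetical step functions (parse_number(), check_range(), parse_port()): each step does nothing once *errorp is set, so the first error is preserved and the caller performs a single check at the end.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical error codes. */
enum { E_NONE = 0, E_PARSE = 1, E_RANGE = 2 };

/* Each step is a no-op once *errorp is set, so the first error sticks. */
static void parse_number(const char *s, long *out, int *errorp) {
    if (*errorp) return;
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE) {
        *errorp = E_PARSE;
        return;
    }
    *out = v;
}

static void check_range(long v, long lo, long hi, int *errorp) {
    if (*errorp) return;
    if (v < lo || v > hi) *errorp = E_RANGE;
}

/* Caller: no branch after each step, one check at the end. */
static int parse_port(const char *s, int *port) {
    assert(s != NULL && port != NULL);
    int err = E_NONE;
    long v = 0;
    parse_number(s, &v, &err);
    check_range(v, 1, 65535, &err);   /* harmless if parsing already failed */
    if (err == E_NONE) *port = (int)v;
    return err;
}
```

Here parse_port("http", &p) returns E_PARSE rather than E_RANGE: check_range() still runs, but sees the error already set and leaves it alone.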

Posted Mar 9, 2014 15:31 UTC (Sun) by vonbrand (guest, #4458)

This complicates the callee for no benefit, it turns out to have to check for errors before doing anything. The error testing is hidden (how do I know all of the callees are doing their job?), and moreover handling cases where an error in the first part allows to do (part of) the rest are a mess.

Just not calling the other tasks if they make no sense trades a call and a branch inside for a simple branch. But who's worrying about performance in error paths... it does cut down on code size if the tasks are used repeatedly (only the branch inside, not each time it is called upon). Perhaps that was the real reason for this coding pattern?

Posted Mar 9, 2014 16:39 UTC (Sun) by ibukanov (guest, #3942)

> This complicates the callee for no benefit, it turns out to have to check for errors before doing anything.

In that model, the callee has to check for errors only as a debugging tool, so that the origin of the error and its propagation through the code can be logged. In general, the idea is to treat errors like a NaN, which does not affect the data flow but only the result. That minimizes the number of rarely taken branches, potentially making it possible to test all code paths.

> how do I know all of the callees are doing their job?

There is no need for that as long as the code that detected the error properly taints the evaluation with error condition.

> handling cases where an error in the first part allows to do (part of) the rest are a mess.

I do not see the source of supposed extra mess. If one can recover from a particular error, the error state can be cleared.

Posted Mar 9, 2014 19:48 UTC (Sun) by ibukanov (guest, #3942)

> handling cases where an error in the first part allows to do (part of) the rest are a mess.

Consider what a good parser does when it detects a syntax error. First, it reports it. Second, it tries to recover from it, guessing if necessary, so that it can report *other* errors in the same pass. The end result is that it produces a valid parse tree reflecting its guesses, but that tree is tainted with errors, so code generation is never performed.

Effectively, this replaces all the error checks in all callers in the parser implementation with a single check in the code generator for the presence of errors.
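A toy illustration of that parser behaviour, with hypothetical names: parsing a comma-separated list of integers records each bad field, substitutes 0 as the "guess" and carries on, so every error is reported in one pass; the consumer (standing in for the code generator) refuses to produce anything while any error is recorded.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define MAX_ERRORS 8
#define MAX_VALUES 32
#define MAX_DIGITS 9      /* keeps every value within a 32-bit long */

struct parse_result {
    long values[MAX_VALUES];
    size_t nvalues;
    size_t error_pos[MAX_ERRORS];   /* byte offset of each bad field */
    size_t nerrors;                 /* total errors, even beyond MAX_ERRORS */
};

/* Parse e.g. "12,x,3": bad fields are recorded and replaced by 0. */
static void parse_list(const char *s, struct parse_result *r) {
    assert(s != NULL && r != NULL);
    r->nvalues = r->nerrors = 0;
    size_t i = 0;
    for (;;) {
        size_t start = i, ndigits = 0;
        long v = 0;
        int ok = isdigit((unsigned char)s[i]) != 0;
        while (isdigit((unsigned char)s[i])) {
            if (++ndigits > MAX_DIGITS) ok = 0;
            else v = v * 10 + (s[i] - '0');
            i++;
        }
        if (s[i] != ',' && s[i] != '\0') ok = 0;
        while (s[i] != ',' && s[i] != '\0') i++;   /* resync at next separator */
        if (!ok) {
            if (r->nerrors < MAX_ERRORS) r->error_pos[r->nerrors] = start;
            r->nerrors++;
            v = 0;                                 /* the "guess" */
        }
        if (r->nvalues < MAX_VALUES) r->values[r->nvalues++] = v;
        if (s[i] == '\0') break;
        i++;                                       /* skip ',' */
    }
}

/* The single consumer-side check: nothing is produced from tainted input. */
static int sum_values(const struct parse_result *r, long *out) {
    if (r->nerrors != 0) return -1;
    long total = 0;
    for (size_t i = 0; i < r->nvalues; i++) total += r->values[i];
    *out = total;
    return 0;
}
```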

Posted Mar 9, 2014 22:12 UTC (Sun) by vonbrand (guest, #4458)

If you'd like to find all errors, that is right. But if an error means curtains, better bail out early. In security-sensitive code, you'd better make sure there is no way the tainted state gets somehow untainted (by mistake or by nefarious intent).

Posted Mar 9, 2014 22:34 UTC (Sun) by ibukanov (guest, #3942)

> But if an error means curtains, better bail out early.

In security-sensitive code, it is critical to be able to test all the branches if formal verification is not feasible. An early bailout not only hinders that through exponential growth of the branch space, but also exposes the software to timing attacks.

As for accidental untainting, consider that GnuTLS and Apple bugs are exactly the examples of this occurring in the code that follows established practice of error handling in C. Yet I have not heard of bugs caused by untainting errors reported by a parser - typically an infrastructure to support reporting of multiple errors naturally minimizes the number of places in the code where the error state can be cleared. It is just harder to wipe out an error array than to clear or ignore a return value flag.

Posted Mar 10, 2014 0:37 UTC (Mon) by vonbrand (guest, #4458)

Bailing out early is what cuts down the exponential growth. Once the first sanity test has failed, finding out whether the thirtieth also fails adds no useful data (the state is tainted, i.e., known to be broken anyway).

Posted Mar 10, 2014 0:46 UTC (Mon) by vonbrand (guest, #4458)

In a compiler "bad untainting" leads to an error cascade, a sadly well-known phenomenon. But people just fix the obvious errors and compile again, they will rarely report that as a compiler bug.

Posted Mar 9, 2014 16:04 UTC (Sun) by paulj (subscriber, #341)

I don't know why it isn't used more. I've only really seen it in network protocol code, where the code is implementing an RFC that explicitly specifies a state machine. Which makes it seem the authors didn't think too much about the other applications.

I wonder if maybe many programmers just aren't aware of this approach.

Especially with security sensitive parsers of external input (e.g. network message) I absolutely cringe when I see hand-crafted, irregular parsers, twiddling pointers directly into buffers in C. It's 2014....

I've seen a PhD thesis that extended the Java language with support for specifying the allowed state transitions on objects. This allowed the compiler to do extra checks: sanity checks on the FSM and on users of the code. I think the author has moved on to other things, though.

I'd really like to see languages provide more support for more restricted abstractions, like FSMs, that can be layered over the code and be more easily checked for problems than that code.

Posted Mar 10, 2014 18:59 UTC (Mon) by zblaxell (subscriber, #26385)

The second form is better than the first, but the first was bad to start with. C code with gotos is better than the first one.

Even then, the transformation doesn't reduce complexity. It just moves complexity around the program in a way that may make practical risks worse.

In the first case, there are lots of branches in the top-level function, but the preconditions of each function are stronger. This means f10() doesn't have to recheck f1()'s work, and neither do f9(), f8(), etc. A bug in f3() that fails to detect a violation of f4()'s preconditions could ruin our day, but f4() is simpler in this case, and if we get f3() wrong we'll probably get f4() wrong at the same time anyway.

In the second case, there are fewer branches at the top level, but the invariants, preconditions, and postconditions of the functions are weaker. f10() has to cope with everything f1() let through, and so does f9(), f8(), etc. This can breed code duplication and maintenance mistakes, and increases the attack surface considerably. Imagine fixing six slightly different string-quoting bugs because every function after f3() has to check for conditions that f3() already identified as part of an error state. We can avoid this failure mode by having every function check *errors and return early, but then we've just created a variation on the first case that the compiler can't optimize as easily, and that requires wrapper functions around libraries that don't use this convention.

Either one has a linearly expanding set of states inside example(), assuming you don't make a dumb semantic mistake. There's pretty much no difference from a formal coverage testing point of view.

Posted Mar 10, 2014 19:45 UTC (Mon) by ibukanov (guest, #3942)

> In the second case, there are fewer branches at the top level, but the invariants, preconditions, and postconditions of the functions are weaker.

Preconditions and postconditions stay the same. What is different is the reaction to a violated precondition. In the first case, which follows a typical C pattern, the reaction is to bail out. In the second case, the reaction is to recover by guessing what the right data would be, while tainting the results.

> Imagine fixing six slightly different string-quoting bugs because every function after f3() has to check for conditions that f3() already identified as part of an error state.

There is no error state besides a pointer to the error indicator and the logging facilities. If f3() is responsible for enforcing the quoting, then in case of errors it should just fix up the data while tainting the calculation, so that the data as seen by the rest of the code contain the proper quoting. This is similar to what some parsers do to report multiple errors, when they rewrite the lookahead buffer as part of recovery.

Posted Mar 10, 2014 20:44 UTC (Mon) by vonbrand (guest, #4458)

It might well be dangerous to have e.g. f7() working on data known bad from before, some checks will have to be repeated in such a case (and f7() becomes more complex and fragile). And using this to be able to say that tests covered x% of f7() is nonsense, what is interesting isn't coverage in cases that don't do anything (because it failed before).

Posted Mar 10, 2014 21:14 UTC (Mon) by zblaxell (subscriber, #26385)

I don't see how early bailout is a "C pattern." Touching tainted data from the Internet is risky in every language. We want to stop doing it as soon as we can determine a negative result to reduce our attack surface (unless we are defending against a timing attack). In a language that isn't C we might throw an exception or use some other idiom instead of gotos or cleanup functions, but we'd still stop processing early to avoid exposing further code to attack.

Yes, we can do all sorts of wonderful analysis of invalid certificates if we keep going through all the parsing stages; however, at the end of the day a malformed certificate is still invalid, and needs only to be rejected.

Posted Mar 5, 2014 21:53 UTC (Wed) by Karellen (subscriber, #67644)

As I read it, even with a language that supports them, these are not the sort of errors you would handle with exceptions.

The code is checking for various conditions, with the checks returning success or failure, and the code itself needing to propagate success or failure. Having a function called check_if_ca() throw an exception from the checks it's doing, would be like having a string comparison function that threw if the strings weren't equal, or a character classification function throwing if the character you passed it didn't fall into the right category. It's just not the way you ought to handle that sort of "error".

You can very well argue that having strcmp() return 0 for a match, and non-0 for failure, where isdigit() returns non-zero for a match and 0 for a failure, is confusing, inconsistent, and ought to be changed. You might even be right. But (IMNSHO) changing them to throw on failure would be an even bigger mistake.

Posted Mar 6, 2014 0:38 UTC (Thu) by jwakely (subscriber, #60262)

> As I read it, even with a language that supports them, these are not the sort of errors you would handle with exceptions.

Maybe not, but destructors still make it easier to get cleanup correct, whether exiting a function by returning or by throwing an exception, and it's that kind of "must be performed before returning" cleanup that the "goto cleanup" and "goto fail" jumps are performing.

> Having a function called check_if_ca() throw an exception from the checks it's doing, would be like having a string comparison function that threw if the strings weren't equal, or a character classification function throwing if the character you passed it didn't fall into the right category. It's just not the way you ought to handle that sort of "error".

As you implied by putting "error" in quotes, neither the strcmp case nor isdigit case is an error at all.

> You can very well argue that having strcmp() return 0 for a match, and non-0 for failure,

The results "compares less than" and "compares greater than" are not "failure", they're two of the three valid results from strcmp. None of those results indicates an error. Similarly for isdigit, a false result is not an error.

> But (IMNSHO) changing them to throw on failure would be an even bigger mistake.

Well yes, obviously. An exception means something went wrong, not "the answer to your question is no". Using an exception to answer whether a character is a digit or not would be dumb. The result of strcmp is not "pass" or "fail" so throwing an exception makes no sense at all. Throwing an exception might make sense if the argument to isdigit is not representable as unsigned char, or if either argument to strcmp is null (and indeed C++ implementations are already allowed to do that because such arguments produce undefined behaviour).

Posted Mar 6, 2014 7:03 UTC (Thu) by Karellen (subscriber, #67644)

> Maybe not, but destructors still make it easier to get cleanup correct, whether exiting a function by returning or by throwing an exception, and it's that kind of "must be performed before returning" cleanup that the "goto cleanup" and "goto fail" jumps are performing.

You're absolutely right, and I totally agree... but I can't quite figure out how that's relevant here. AIUI the problem here has very little (if anything) to do with cleanup, and is down to confusion over whether success is indicated with a return value of 0 or 1.

Posted Mar 6, 2014 14:05 UTC (Thu) by jwakely (subscriber, #60262)

Is it really confusion about the correct value to return, or just failing to set the correct value in the relevant variable?

Part of the problem seems to be that the "goto cleanup" style requires a single exit from the function, so you have to be sure to set the correct return value before reaching that single exit point.

If you have destructors (or cleanup attributes, or whatever) that run automatically on exit from the function you just "return 0;" immediately as soon as you like, instead of setting a value, then hoping it doesn't get changed again before the end of the function, then running the cleanup code, and then returning that value.
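A sketch of the difference, using GCC's cleanup variable attribute (a GCC/Clang extension, not standard C); the function names are hypothetical. In the first version each rejection is an immediate return 0 and the buffer is still freed; the second is the single-exit goto style, where correctness depends on ret holding the right value at every jump.

```c
#include <assert.h>
#include <stdlib.h>

static int frees;   /* counts how often the cleanup handler ran */

/* GCC/Clang call this with &var whenever var goes out of scope,
   including on an early return. */
static void free_bytes(unsigned char **p) {
    free(*p);
    frees++;
}

#define AUTOFREE __attribute__((cleanup(free_bytes)))

/* Returns 1 if no byte repeats in s, else 0 (also 0 on allocation failure). */
static int all_bytes_distinct(const char *s) {
    assert(s != NULL);
    AUTOFREE unsigned char *seen = calloc(256, 1);
    if (seen == NULL) return 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
        if (seen[*p]) return 0;        /* early exit: 'seen' is still freed */
        seen[*p] = 1;
    }
    return 1;
}

/* Same logic, single-exit style: ret must be right at every goto. */
static int all_bytes_distinct_goto(const char *s) {
    int ret = 0;                       /* default to the rejecting answer */
    unsigned char *seen = calloc(256, 1);
    if (seen == NULL) goto cleanup;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
        if (seen[*p]) goto cleanup;    /* ret is still 0 here */
        seen[*p] = 1;
    }
    ret = 1;
cleanup:
    free(seen);
    return ret;
}
```

If the goto style is used, initialising ret to the rejecting value means a forgotten assignment fails closed rather than open.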

Posted Mar 6, 2014 14:34 UTC (Thu) by dlang (guest, #313)

> Is it really confusion about the correct value to return

Yes, there are multiple common standards

0 = success
!0 = success
>=0 = success

which one you want depends on how many variations of success and not success you have and want to differentiate between.

As I understand this bug, different parts of the code used different standards, and some function did the equivalent of:
int myfunction(void) {
    return otherfunction();
}
when otherfunction used one success standard but the caller of myfunction expected a different standard

so otherfunction() thought it was reporting a failure, but the caller of myfunction interpreted it as success.
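A simplified model of this class of bug, with hypothetical names (it is not the actual GnuTLS code): a helper returns negative error codes, a predicate passes them straight through, and the caller treats only 0 as failure, so an error counts as success. The fixed variants follow the shape of the fix described in the article: map errors to 0, and accept only an explicit 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical negative error code. */
#define E_DECODE (-5)

/* Hypothetical decoder: 1 = CA flag set, 0 = not set, <0 = decoding error. */
static int read_ca_flag(const unsigned char *der, int len) {
    if (der == NULL || len < 2) return E_DECODE;
    return der[1] == 0xff;
}

/* Buggy predicate: leaks the error code to a caller expecting 1/0. */
static int is_ca_buggy(const unsigned char *der, int len) {
    return read_ca_flag(der, len);
}

/* Fixed predicate: anything other than a definite "yes" becomes 0. */
static int is_ca_fixed(const unsigned char *der, int len) {
    return read_ca_flag(der, len) == 1 ? 1 : 0;
}

/* Buggy caller: only 0 means "reject", so E_DECODE is accepted. */
static int verify_buggy(const unsigned char *der, int len) {
    if (is_ca_buggy(der, len) == 0) return 0;
    return 1;
}

/* Fixed caller: accept only an explicit 1. */
static int verify_fixed(const unsigned char *der, int len) {
    if (is_ca_fixed(der, len) != 1) return 0;
    return 1;
}
```

Either half of the fix closes this particular hole on its own; doing both means a future helper that leaks an error code still fails closed.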

Posted Mar 6, 2014 0:59 UTC (Thu) by mathstuf (subscriber, #69389)

If I were to write it in a safer language (say…Haskell), I'd have something like:

sslChecks :: [CertChain -> Cert -> Reader SslContext (Maybe String)]
sslChecks = [validCert, acceptableAlgorithm, trustedChain, ...]

sslCheck :: CertChain -> Cert -> Reader SslContext (Maybe String)
sslCheck chain cert = (liftM mconcat . sequence) . map ($ cert) . map ($ chain) $ sslChecks

where a failure just bails out of the code at the end (the Reader monad stores the options any checks might care about). No style, error checking, boilerplate, or whatever to worry about. Just write a check, put it in the right place in the list of checks to make, and return an error string if necessary.

Posted Mar 6, 2014 1:08 UTC (Thu) by mathstuf (subscriber, #69389)

Bleh… and that's wrong, since mconcat isn't set up for "Nothing" to be the success :/ . I guess adding a Monoid instance for Either a () would work better.

Posted Mar 6, 2014 1:49 UTC (Thu) by nybble41 (subscriber, #55106)

That version of sslCheck will concatenate the error strings together end-to-end if multiple checks fail (Just "First ErrorSecond Error"). Did you intend something like this, which returns a list of errors?

sslCheck :: CertChain -> Cert -> Reader SslContext [String]
sslCheck chain cert = liftM catMaybes $ sequence $ sslChecks <*> pure chain <*> pure cert

Posted Mar 6, 2014 2:00 UTC (Thu) by mathstuf (subscriber, #69389)

That looks much better. I haven't done much Applicative work (which is how I tried it first) and missed the 'pure'. The original idea was to get just the first error message, but why not all :) .

Posted Mar 6, 2014 12:28 UTC (Thu) by dgm (subscriber, #49227)

> You can very well argue that having strcmp() return 0 for a match, and non-0 for failure, where isdigit() returns non-zero for a match and 0 for a failure, is confusing, inconsistent, and ought to be changed.

I see where you are coming from, but I cannot agree. strcmp() is not for checking equality, even if you can use it for that. It's an order operator, because the sign of the return value matters, so the return value cannot be a boolean.

In any case the _only_ way to remove error paths -either implicit or explicit- from your code is to create complete functions (http://en.wikipedia.org/wiki/Functional_completeness) whenever possible.

Posted Mar 6, 2014 21:31 UTC (Thu) by iabervon (subscriber, #722)

This actually is a reasonable situation for a semi-unexpected exception. The function is supposed to check if a cert is a CA or not, and the bug happens when the cert is mangled. It's like having strcmp throw an exception if one of the strings isn't nul-terminated in its allocation. It's likely that the intended behavior of the function is to treat garbage as not a trusted CA cert, but getting an exception would be secure, if not necessarily convenient.

Posted Mar 8, 2014 15:20 UTC (Sat) by jbailey (subscriber, #16890)

C++ in open source was truly miserable until about a decade ago. The standard was still being implemented. The code generated was pretty awful. The ABI kept changing.

Essentially, it wasn't worth the maintenance hassle of using the language, and many of the benefits couldn't even be used yet.

Now it's somewhat different, but the stigma still persists.

Posted Mar 8, 2014 23:26 UTC (Sat) by nix (subscriber, #2304)

If by 'a decade ago' you mean 'about 1998', then yes. But that's nearly two decades ago now. :)

Posted Mar 9, 2014 1:40 UTC (Sun) by jbailey (subscriber, #16890)

That may be the most graceful way anyone has ever called me old. Thanks. :)

A longstanding GnuTLS certificate validation botch

Posted Mar 9, 2014 1:56 UTC (Sun) by mjg59 (subscriber, #23239) [Link] (1 responses)

Didn't 3.4.0 bump the libstdc++ soname? That was 2004. Debian's last C++ ABI transition was 2005, IIRC - http://lwn.net/Articles/160330/ seems to agree.

A longstanding GnuTLS certificate validation botch

Posted Mar 10, 2014 7:35 UTC (Mon) by nix (subscriber, #2304) [Link]

... hm, there was gnu-versioned-namespace, in 2005, which is clearly not baked yet because its mangling was changing as recently as 2011.

But yes, you're right, the last major soname change to the non-gnu-versioned-namespace seems to have been in 2003, incorporated into 3.4.0; still years after I thought it was. Fallible human memory etc etc:

2003-01-23 Benjamin Kosnik <bkoz@redhat.com>

* configure.in (libtool_VERSION): To 6:0:0.

A longstanding GnuTLS certificate validation botch

Posted Mar 5, 2014 16:05 UTC (Wed) by smurf (subscriber, #17840) [Link]

It would be very interesting to add appropriate error messages to this code which could tell us (or whoever runs the code in question) that the bug is, or would have been, actively exploited.

A longstanding GnuTLS certificate validation botch

Posted Mar 5, 2014 16:49 UTC (Wed) by SEJeff (guest, #51588) [Link] (3 responses)

A telling (and rightful) critique of GnuTLS's code quality and overall design from Howard Chu, the chief architect behind OpenLDAP:

http://www.openldap.org/lists/openldap-devel/200802/msg00...

A longstanding GnuTLS certificate validation botch

Posted Mar 6, 2014 9:15 UTC (Thu) by nowster (subscriber, #67) [Link] (2 responses)

That was six years ago. Has that report been acted on in the meantime?

A longstanding GnuTLS certificate validation botch

Posted Mar 6, 2014 11:41 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (1 responses)

Given the age of this bug…I'm going to venture "no".

A longstanding GnuTLS certificate validation botch

Posted Mar 8, 2014 18:51 UTC (Sat) by ametlwn (subscriber, #10544) [Link]

Taking a look at the first complaint:

"It turns out that their corresponding set_subject_alt_name() API only takes a char * pointer as input, without a corresponding length. As such, this API will only work for string-form alternative names, and will typically break with IP addresses and other alternatives."

* gnutls_x509_crt_set_subject_alt_name:
* @crt: a certificate of type #gnutls_x509_crt_t
* @type: is one of the gnutls_x509_subject_alt_name_t enumerations
[...]
gnutls_x509_crt_set_subject_alt_name(gnutls_x509_crt_t crt,
gnutls_x509_subject_alt_name_t type,
const void *data,
unsigned int data_size,
unsigned int flags)

[...]
* Since: 2.6.0
[...]
/**
* gnutls_x509_subject_alt_name_t:
* @GNUTLS_SAN_DNSNAME: DNS-name SAN.
* @GNUTLS_SAN_RFC822NAME: E-mail address SAN.
* @GNUTLS_SAN_URI: URI SAN.
* @GNUTLS_SAN_IPADDRESS: IP address SAN.
* @GNUTLS_SAN_OTHERNAME: OtherName SAN.
* @GNUTLS_SAN_DN: DN SAN.
* @GNUTLS_SAN_OTHERNAME_XMPP: Virtual SAN, used by
* gnutls_x509_crt_get_subject_alt_othername_oid.
*
* Enumeration of different subject alternative names types.
*/
[...]

2.6.0 was released on 2008-10-06, about six months after the above-mentioned comment. So looking at or quoting six-year-old comments does indeed seem rather pointless.

A longstanding GnuTLS certificate validation botch

Posted Mar 7, 2014 16:00 UTC (Fri) by krake (guest, #55996) [Link] (7 responses)

That "can be reinterpreted as a boolean value" behavior is pretty annoying in C++ as well.

Especially in C++, since there is an explicit boolean type.

And none of the compilers seems to have even a warning switch that would detect that, let alone generate an error.

One of the things I really miss, coming from Java. If you want to check whether an integer value is non-zero, there is an operator for that. There is even an operator for checking whether a pointer is not null.

A longstanding GnuTLS certificate validation botch

Posted Mar 7, 2014 16:22 UTC (Fri) by khim (subscriber, #9252) [Link] (3 responses)

You cannot use an explicit comparison operator if you want to use said integer or pointer afterwards. Something like
  if (Renderer *renderer = getCurrentRenderer()) {
    // do some work with renderer
  }
will be impossible.

A longstanding GnuTLS certificate validation botch

Posted Mar 7, 2014 16:28 UTC (Fri) by krake (guest, #55996) [Link] (2 responses)

And why would I want to do that?

Even if someone wanted to do that for whatever reason, they would simply not activate the check for actual boolean conditions.

Everyone not doing something like that would get additional type safety, either as a warning or, ideally, as an error.

A longstanding GnuTLS certificate validation botch

Posted Mar 7, 2014 16:55 UTC (Fri) by jwakely (subscriber, #60262) [Link] (1 responses)

> And why would I want to do that?

Because it's a very useful idiom, very commonly used in real C++ programs.

It's useful for the same reasons that C++ and C99 don't require you to declare all variables at the start of a block: the variable is declared as soon as you need it (in the block following the condition) and is not in scope afterwards.

It gets even more useful with smart pointers and other RAII types:

if (std::shared_ptr<X> x = getX()) {
   // do stuff with *x
}
// x no longer in scope and resources already cleaned up

Importantly, std::shared_ptr is only contextually convertible to bool, so it can be tested in contexts such as if conditions, but bool b = x; will not compile, which gives the type safety that is missing from built-in pointer types. If built-in pointer types were only contextually convertible to bool, that would probably make everyone happy, except for a few programs foolishly relying on implicit (non-contextual) conversions.

A longstanding GnuTLS certificate validation botch

Posted Mar 7, 2014 17:01 UTC (Fri) by krake (guest, #55996) [Link]

Yes, sorry, I know about the use cases, I just don't find them really convincing.

In any case, allowing that for those who want it does not prohibit additional safety for those who prefer that.
Yet I am not aware of any compiler even having a warning flag for "not a boolean in condition".

Something like -Wimplicit-bool or --no-implicit-bool would be really nice for those of us who prefer to write boolean expression for boolean conditions.

A longstanding GnuTLS certificate validation botch

Posted Mar 7, 2014 17:45 UTC (Fri) by cesarb (subscriber, #6266) [Link] (2 responses)

The libreoffice project (which is mostly C++) has a clang plugin to warn on suspicious bool conversions. They have been using it lately to clean up an old pseudo-bool type (sal_Bool) from their codebase, and have already found a few bugs.

As an example of the kind of bug their plugins caught: "if (... && (x == SOME_CONSTANT || SOME_OTHER_CONSTANT))". This is an implicit conversion from int (or an enum) to bool, and it's also obviously wrong; the original programmer clearly meant "if (... && (x == SOME_CONSTANT || x == SOME_OTHER_CONSTANT))".
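A small sketch of why that expression is always true, with hypothetical values standing in for SOME_CONSTANT and SOME_OTHER_CONSTANT: `==` binds tighter than `||`, so the buggy form tests the second constant on its own, and any non-zero constant counts as true.

```c
#include <stdbool.h>

/* Hypothetical values for the two constants in the example above. */
enum { SOME_CONSTANT = 1, SOME_OTHER_CONSTANT = 2 };

/* Buggy: parses as (x == SOME_CONSTANT) || SOME_OTHER_CONSTANT.
 * SOME_OTHER_CONSTANT is non-zero, so this is true for every x. */
bool matches_buggy(int x)
{
    return x == SOME_CONSTANT || SOME_OTHER_CONSTANT;
}

/* Intended: compare x against each constant explicitly. */
bool matches_fixed(int x)
{
    return x == SOME_CONSTANT || x == SOME_OTHER_CONSTANT;
}
```

With these definitions, matches_buggy(42) returns true while matches_fixed(42) returns false; the int-to-bool conversion of the bare constant is what hides the mistake.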

A longstanding GnuTLS certificate validation botch

Posted Mar 7, 2014 17:50 UTC (Fri) by krake (guest, #55996) [Link]

Oh, this is great!

Thanks a lot!

A longstanding GnuTLS certificate validation botch

Posted Mar 10, 2014 8:38 UTC (Mon) by jezuch (subscriber, #52988) [Link]

> if (... && (x == SOME_CONSTANT || SOME_OTHER_CONSTANT))

Looks like they were programming in Perl 6 before Perl 6 even existed :)

A longstanding GnuTLS certificate validation botch

Posted Mar 10, 2014 21:26 UTC (Mon) by zblaxell (subscriber, #26385) [Link] (10 responses)

There's a lot of amusing language-bashing in here--everything from C++ RAII to toy languages is proposed as a solution, complete with buggy examples in the comments--but this was never a language problem. Both the Apple and GnuTLS bugs were a testing problem that happened to a C program, but the connection to C as language or culture ends there.

There needs to be a set of test inputs in the test suite that exercise every instruction of the code in every meaningfully distinct state so they can all be verified. A basic code coverage test should have caught these bugs easily as soon as someone noticed how hard (indeed impossible) it was to form a test input capable of touching code on both sides of some of those gotos.

gcov tries to do coverage analysis for C code (although it's a huge pain to use in practice). callgrind tries to do it for arbitrary C and C++ programs (although it has several problems and isn't much easier to use than gcov). Other languages have the instrumentation built in (and the bondage-and-discipline to make sure that every branch is on a separate line of code, so line-based code coverage analysis tools work). A crude but effective ad-hoc coverage tool can even be built out of preprocessor macros with a bit of cunning and discipline.
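One way such a macro-based tool might look, as a rough sketch: the COVER() macro, the hand-assigned point IDs, and the classify() function under test are all hypothetical, not taken from any real project. After the test suite runs, any point with a zero count marks code that no test input reached.

```c
#include <stddef.h>

/* Crude ad-hoc coverage: COVER(id) counts how often a given point in
 * the code was reached. IDs are small integers assigned by hand. */
#define COVER_POINTS 8
static unsigned cover_hits[COVER_POINTS];
#define COVER(id) (cover_hits[(id)]++)

/* Hypothetical function under test, with one coverage point per path. */
int classify(int ret)
{
    if (ret < 0) {
        COVER(0);   /* error path */
        return 0;
    }
    if (ret == 1) {
        COVER(1);   /* success path */
        return 1;
    }
    COVER(2);       /* any other non-negative value */
    return 0;
}

/* Count the coverage points in [0, n) that were never reached. */
size_t cover_missed(size_t n)
{
    size_t missed = 0;
    for (size_t i = 0; i < n; i++)
        if (cover_hits[i] == 0)
            missed++;
    return missed;
}
```

If the tests only call classify(-5) and classify(1), cover_missed(3) reports one unreached point, which is the signal that a test input is missing.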

A longstanding GnuTLS certificate validation botch

Posted Mar 11, 2014 13:28 UTC (Tue) by ms-tg (subscriber, #89231) [Link] (9 responses)

> Both the Apple and GnuTLS bugs were a testing problem

Amen to that. Even for simple web applications, we now
test every known error case, we have integration and unit
level tests, we measure code coverage on each build on
Travis CI... how can this not be done for widely-used
SSL libraries?

> that happened to a C program, but the connection to C
> as language or culture ends there.

I wonder if this is true? Until I see one of these
libraries, out of the hundreds of vital C-language
libraries on my system, approach the level of 100%
code coverage testing that "toy language" cultures
enforce, I am going to have to suspect that this
may actually be a C culture issue.

A longstanding GnuTLS certificate validation botch

Posted Mar 11, 2014 15:37 UTC (Tue) by nybble41 (subscriber, #55106) [Link] (8 responses)

>> that happened to a C program, but the connection to C
>> as language or culture ends there.

> I wonder if this is true?

In addition to the culture of limited testing you alluded to, I think there are some language issues here as well. C will let you do pretty much anything in a function, which is one of its strong points in certain kinds of code. The flip side to this, however, is that it means you have to _test_ for pretty much anything. You got an enum value... how do you know it's actually one of the defined values, when anyone can pass in whatever integer value they want? You got a pointer... how do you know that it points to valid memory of the correct type? How do you test for the absence of unexpected side-effects? Does the result depend on inputs other than the parameters?
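To make the enum point concrete, a sketch with a hypothetical enum (loosely modeled on a subject-alternative-name type, not GnuTLS's actual definition): C accepts any integer cast to an enum type, so a function that cares about validity has to check the range itself, and the tests must cover the out-of-range case too.

```c
#include <stdbool.h>

/* Hypothetical enum; nothing in C stops a caller from passing
 * (enum san_type)99 where this type is expected. */
enum san_type { SAN_DNSNAME = 1, SAN_RFC822NAME, SAN_URI, SAN_IPADDRESS };

/* Defensive range check: only the declared enumerators are valid. */
bool san_type_valid(enum san_type t)
{
    switch (t) {
    case SAN_DNSNAME:
    case SAN_RFC822NAME:
    case SAN_URI:
    case SAN_IPADDRESS:
        return true;
    }
    return false;   /* anything else, e.g. (enum san_type)0 or 99 */
}
```

In a language whose type system rules out such values, the final return false and the tests that exercise it would be unnecessary.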

In certain other languages (like Haskell) the type system ensures that things like out-of-range enums, invalid pointers, and undeclared side-effects simply can't happen unless you go out of your way to bypass the system (e.g. with something like unsafeCoerce or unsafePerformIO, which set off major warning flags). If a function is declared with type "MyEnum -> MyDataStructure -> String" then you only need to test it on valid enum values and data structures; the result is guaranteed to be a well-formed string dependent only on the two parameters, and there won't be any side-effects. This makes testing far simpler even before you consider libraries like QuickCheck.

A longstanding GnuTLS certificate validation botch

Posted Mar 11, 2014 20:01 UTC (Tue) by ms-tg (subscriber, #89231) [Link] (7 responses)

> In addition to the culture of limited testing you alluded to,
> I think there are some language issues here as well

Yes, true. But I wonder if discussing type systems is also a
distraction from the more pressing issue here? After all, even
with all the help of Haskell's type system, you *will* still
have bugs.

It seems to me that the lack of rigorous testing was:
(a) The most immediate cause of these bugs
(b) More common in projects written in C

I find it frustrating that discussions of these issues continually
drift towards language wars, rather than towards modern ideas about
unit testing, software composability, test-driven development, and
code coverage tracking.

Aren't these the more pressing questions?
(1) Where are the GnuTLS unit tests, so I can review and add more?
(2) Where is the new regression test covering this bug?
(3) What is the command to run a code coverage tool on the test
suite, so that I can see what coverage is missing?

Say what you will about "toy" languages, but that is what would
happen in any halfway mature Ruby or Python or Javascript project,
and I'm happy to provide links to back that up.

Say what you will about the non-systems languages on the JVM, but
that is also what would happen in any halfway mature Scala, Java,
or Clojure project.

It's only in C, the systems language in which so many of these
vital libraries are written, that this is not the case. Isn't it
time to ask why?

A longstanding GnuTLS certificate validation botch

Posted Mar 11, 2014 21:13 UTC (Tue) by nybble41 (subscriber, #55106) [Link] (6 responses)

> I find it frustrating that discussions of these issues continually
> drift towards language wars, rather than towards modern ideas about
> unit testing, software composability, test-driven development, and
> code coverage tracking.

These "language wars", as you put it, are pretty much all about modern ideas regarding unit testing, composability, test-driven development, and code coverage tracking. Specifically, they're about encouraging the development and use of languages which make such things easier and more reliable, and thus more likely to be implemented.

A longstanding GnuTLS certificate validation botch

Posted Mar 11, 2014 23:47 UTC (Tue) by pizza (subscriber, #46) [Link] (5 responses)

> Specifically, they're about encouraging the development and use of languages which make such things easier and more reliable, and thus more likely to be implemented.

I believe the point nybble41 was attempting to make is that test-driven development (and other such "modern" ideas) is independent of the underlying language used. (Look at GPSd for a case of TDD applied to a primarily-C project)

No amount of "encouragement" will make higher-level languages suitable for system/low-level tasks -- the features (reflection/introspection, dynamic compilation, "package repository") that make those languages more easily testable at the module level make them unsuitable for low-level tasks.

A longstanding GnuTLS certificate validation botch

Posted Mar 12, 2014 3:28 UTC (Wed) by nybble41 (subscriber, #55106) [Link] (1 responses)

> I believe the point nybble41 was attempting to make is that test-driven development (and other such "modern" ideas) is independent of the underlying language used.

I think you meant "ms-tg" rather than "nybble41".

While I agree that these concepts are equally applicable to any language, even C, my point was that the proper choice of language can significantly reduce the testing burden by making stricter guarantees regarding the behavior of the program at compile-time. Speaking as someone who works with system-level C code on FAA-certified embedded systems, C just gives you way too much rope to hang yourself with, and it shows in the amount of testing required to obtain full coverage.

As for not being able to do "system/low-level tasks" in high-level languages, I think the authors of House[1] and Kinetic[2] would disagree. While these two OS projects are not written entirely in Haskell, neither is the Linux kernel written entirely in C. Certain core operations require lower-level access to the system, via C and/or assembly, but drivers, network stacks, window systems, and command shells seem "low-level" and "system" enough to me.

[1] http://ogi.altocumulus.org/~hallgren/ICFP2005/house.pdf
[2] http://intoverflow.wordpress.com/kinetic/

A longstanding GnuTLS certificate validation botch

Posted Mar 12, 2014 17:50 UTC (Wed) by ms-tg (subscriber, #89231) [Link]

> While I agree that these concepts are equally applicable
> to any language, even C, my point was that the proper
> choice of language can significantly reduce the testing
> burden

Considering the *vast* quantity of C language code that makes up the modern software stack which was developed outside the culture of unit testing, can't I please persuade you (and others reading, perhaps) that by putting your language suggestions on a separate track, the community might optimize its efforts to introduce test coverage to the vast bulk of our Linux software?

For example, I suspect that the reason "C culture" seems impervious to adopting the lessons of test-driven development has a lot to do with the fact that the masses of developers who are interested in it are, by following your advice, moving to other languages and practicing it there.

In other words, by complecting the issue of unit testing and test coverage with the choice of language, are we not actively *contributing* to the continuing absence of these ideas from C culture, and thus from the bulk of our existing systems?

Food for thought, at least, I hope!

A longstanding GnuTLS certificate validation botch

Posted Mar 12, 2014 18:00 UTC (Wed) by ms-tg (subscriber, #89231) [Link]

> > Specifically, they're about encouraging the development
> > and use of languages which make such things easier and
> > more reliable, and thus more likely to be implemented.
>
> I believe the point nybble41^H^H^H^H ms-tg was attempting
> to make is that test-driven development (and other such
> "modern" ideas) is independent of the underlying language
> used. (Look at GPSd for a case of TDD applied to a
> primarily-C project)

Yes, thank you, exactly!

> No amount of "encouragement" will make higher-level
> languages suitable for system/low-level tasks -- the
> features (reflection/introspection, dynamic compilation,
> "package repository") that make those languages more
> easily testable at the module level make them unsuitable
> for low-level tasks.

This may be true, but I think it's worse than that.

Based on the comments here and elsewhere on the web, it
seems like there's a widespread message that one must
leave the C language in order to adopt modern ideas about
testing and test coverage!

This suggests a self-reinforcing phenomenon where the
majority of the folks interested in learning test-driven
development leave the C language to do it! And therefore,
the bulk of our existing C code remains inadequately
covered by automated tests, and continues to be written
in ways that make coverage difficult to add.

Couldn't one argue that the choice to advocate
for a modern type system (Haskell, ML, Rust, etc), when
the immediate issue is that C code has no test coverage,
is a textbook example of "The Perfect is the Enemy of the
Good"?

(apologies for the length of this rant ;)

A longstanding GnuTLS certificate validation botch

Posted Mar 12, 2014 18:10 UTC (Wed) by ms-tg (subscriber, #89231) [Link] (1 responses)

> (Look at GPSd for a case of TDD applied to a primarily-C project)

I think this is the HEAD version of the build file for GPSD:
http://git.savannah.gnu.org/cgit/gpsd.git/tree/SConstruct

Not for nothing, but:
1. Where is the build target that runs all the tests?
2. Where is the target that generates coverage stats?
3. Is there a link to where Travis CI, or another automated
system, is running the tests on each commit?

Perhaps I am confused, but it doesn't appear to me that even GPSD is doing these things. Please help me out if I've missed it!

A longstanding GnuTLS certificate validation botch

Posted Mar 14, 2014 22:03 UTC (Fri) by jkt_ (guest, #90352) [Link]

Travis-CI doesn't check every commit, but every push - at least their zero-cost version tied into GitHub.

A longstanding GnuTLS certificate validation botch

Posted Mar 17, 2014 11:36 UTC (Mon) by DavidMoffatt (guest, #80219) [Link] (1 responses)

I don't get it! "First, ensuring that check_if_ca() returned zero" makes sense, but why "!= 1 rather than == 0"? When is a negative number == 0?
Did they do some kind of overload of "=="?

Could we get a repo URL and change # so we could see the diff?

A longstanding GnuTLS certificate validation botch

Posted Mar 17, 2014 12:23 UTC (Mon) by jwakely (subscriber, #60262) [Link]

Second paragraph of the article:
https://bugzilla.redhat.com/attachment.cgi?id=867911&...
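The question is the crux of the bug: a negative error code is never == 0, so a caller that only rejects 0 lets errors through as if they were successes. The following hypothetical sketch (not the GnuTLS code; the function names and ERR_PARSE are invented for illustration) models both halves of the fix described in the article.

```c
/* Hypothetical negative error code, standing in for a GnuTLS error value. */
#define ERR_PARSE (-1)

/* Pre-fix pattern: 1 = issuer is a CA, 0 = not a CA,
 * negative = error while parsing the certificate. */
int check_issuer_old(int parse_ok, int is_ca)
{
    if (!parse_ok)
        return ERR_PARSE;   /* the error code leaks out as the result */
    return is_ca ? 1 : 0;
}

/* First half of the fix: report errors as 0 ("not a CA"). */
int check_issuer_new(int parse_ok, int is_ca)
{
    if (!parse_ok)
        return 0;
    return is_ca ? 1 : 0;
}

/* Old caller test: only 0 is rejected, so ERR_PARSE is accepted. */
int rejected_old(int ret)
{
    return ret == 0;
}

/* Second half of the fix: anything other than exactly 1 is rejected. */
int rejected_new(int ret)
{
    return ret != 1;
}
```

With the old pair, a malformed certificate (parse_ok = 0) is not rejected; either change on its own closes the hole, and applying both is defense in depth.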
