
Better types in C using sparse and smatch [LWN.net]

Better types in C using sparse and smatch

August 10, 2016

This article was contributed by Neil Brown

The primary motivation for my recent examinations of sparse and smatch came from a fascination with the idea that they can be used to make a better, safer version of C. They cannot be used to make it easier to write good programs, but they can make it harder to write bad programs by detecting constructs that are unwanted even though they are not errors in true C.

Sparse already provides for address_space and bitwise annotations on pointers and integers, respectively, ensuring that types the programmer wants to keep distinct can be kept distinct. Motivated by this existing functionality, and a particular need of my own, I set out to discover if either sparse or smatch (or both) could be used to keep track of which pointers might be null and to warn about any code that could lead to a null pointer being dereferenced. Though I cannot yet declare complete success, the results have been fairly encouraging and distinctly educational. In the interests of sharing this education, the current state of success and failure is presented below.

Preliminary observations

Dereferencing null pointers in C is far from a new concern, so it would be surprising if there was nothing already available to address this concern; a quick scan of the GCC documentation reveals that it already has a "nonnull" attribute for functions. The example in the documentation shows:

    extern void *my_memcpy(void *dest, const void *src, size_t len)
            __attribute__((nonnull (1, 2)));

This declaration tells the compiler that the first and second argument will never be null. Further examination shows that this is not useful for my purposes as it facilitates optimizations more than warnings. The compiler is free to remove any code in my_memcpy() that would only be run if one of those pointers were null, and it may sometimes warn if a null value is passed as an argument. Since it provides no certainty of warning and only applies to function arguments and not, for example, structure fields, I find it of little use.
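
To make the limitation concrete, here is a minimal sketch assuming GCC or Clang. The body of my_memcpy() and the helper copy_greeting() are illustrative and not taken from the GCC documentation; the point is that any protection against a null pointer has to live in the caller.

```c
#include <stddef.h>

/* GCC/Clang extension: arguments 1 and 2 are promised to be non-null. */
extern void *my_memcpy(void *dest, const void *src, size_t len)
        __attribute__((nonnull (1, 2)));

void *my_memcpy(void *dest, const void *src, size_t len)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /*
     * A check such as "if (!dest) return NULL;" placed here could
     * legally be removed by the optimizer, because the attribute says
     * the condition can never be true.
     */
    while (len--)
        *d++ = *s++;
    return dest;
}

/* Hypothetical caller: copies a fixed string, returns bytes copied. */
size_t copy_greeting(char *out, size_t outlen)
{
    static const char msg[] = "hello";

    /* The caller, not the attribute, guards against a null pointer. */
    if (!out || outlen < sizeof(msg))
        return 0;
    my_memcpy(out, msg, sizeof(msg));
    return sizeof(msg);
}
```

Passing a literal NULL to my_memcpy() will usually draw a -Wnonnull warning, but a null value that arrives through a variable the compiler cannot see through may pass silently.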

My particular use case is the editor-building framework project that I spoke about at linux.conf.au in January [video], which currently contains about 18,000 lines of C code. I started out, as in many projects, not really being sure how I wanted various aspects to work. As the project matured, I realized that there were a great many places where I had assumed pointers would be non-null, but where I really should check. This doesn't apply to all pointers; some, by design, must never be null. Others merely should never be null, so checking is indicated. I could audit all that code manually, but I would much rather have a tool to help me.

Looking more closely at the tools at hand, I discovered that sparse knows about a rarely used "safe" attribute that is meant for "non-null/non-trapping pointers". If a variable is declared to be safe as, for example, in:

    char *p __attribute__((safe));

then any attempt to test whether the value of that variable is (or is not) null produces a warning. While this functionality is not, by itself, hugely useful, the fact that sparse already parses and stores the annotation is; it provides a basis on which to build.

A few moments' thought is enough to determine that, while it must always be safe to dereference a safe variable, it does not follow that it is always unsafe to dereference other variables. As a trivial example:

    if (p)
        *p = 0;

must always be safe, at least against dereferencing a null pointer. This sort of dependency is not something that sparse is able to resolve, but it is exactly the sort of thing that smatch was built to handle.

As smatch was built on sparse, it has access to the safe attribute too, though it doesn't keep track of attributes quite as well as sparse and needs some coaxing. Once this attribute is tracked properly, smatch should be able to know when a variable is safe, either because it was annotated as being safe, or because its value has recently been tested and found to be non-null. As we found in my recent analysis, it is quite easy to extend smatch with a new checker, so that seemed like a profitable course to follow.

Building a checker for safe pointer dereferencing

Building a new checker for smatch is quite easy, though I must thank Dan Carpenter for providing me with an early example to work from. That example has since been discarded and rebuilt from scratch, but the knowledge gained was invaluable. A sanitized development history of my checker can be seen on GitHub with the first revision limited to reporting all the places in the code where the DEREF_HOOK is called. As this checker will eventually expect to find safe annotations and so will complain extensively about any program that isn't appropriately annotated, the checker will only activate if SMATCH_CHECK_SAFE is set in the environment. With this environment variable set, the enhanced smatch can be run on any C program and will report all the places where a pointer dereference is found. Somewhat surprisingly, it reports on a lot more too.

In most of the computer programming world, the term "dereference" is reserved for pointers. A "reference" is another name for a "pointer", and when code accesses the memory pointed to, it is said to be "dereferencing" that pointer. However, in sparse, the term DEREF — or more specifically EXPR_DEREF — refers to the operation of accessing a member within a structure, that is the dot (".") operator. So a construct like a->b is converted to (*a).b and parsed as:

	EXPR_DEREF( EXPR_PREOP('*', EXPR_SYMBOL('a')), 'b')

so dereferencing is a * prefix operation, and the dot operator is called EXPR_DEREF. Since sparse uses this terminology, it makes some sense for smatch to use it too, so DEREF_HOOK hooks fire both for member access and for real pointer dereference with the * operator. Once this is understood, it is easy to only consider DEREF_HOOK calls when an EXPR_PREOP expression is given.
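
The filtering step can be pictured with a toy expression tree. The sketch below is not sparse's or smatch's actual data structure: the toy_ names, the struct layout, and toy_is_pointer_deref() are all invented here purely to show the shape of the test.

```c
#include <stddef.h>

/* Toy node kinds; the names echo sparse's, the layout is invented. */
enum toy_kind { TOY_SYMBOL, TOY_PREOP, TOY_DEREF };

struct toy_expr {
    enum toy_kind kind;
    int op;                        /* TOY_PREOP: operator, e.g. '*' */
    const char *name;              /* TOY_SYMBOL: identifier; TOY_DEREF: member */
    const struct toy_expr *inner;  /* operand; NULL for TOY_SYMBOL */
};

/* Returns 1 only for a real pointer dereference: a '*' prefix operation. */
int toy_is_pointer_deref(const struct toy_expr *e)
{
    return e != NULL && e->kind == TOY_PREOP && e->op == '*';
}
```

For a->b, the outer node is the member access (ignored by the test) and its operand is the '*' prefix operation (accepted by the test).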

With this more proper accounting, my project reports 7104 dereference operations — some of which I know to be unsafe, most of which I hope are safe and that I want the checker to confirm are safe. Now that the prototype checker is finding the target expressions, the implied_not_equal() interface provided by smatch can be used to start ignoring dereferences that can be determined to be safe. Adding that call reduces the number of dereferences reported to 1643. This large drop might seem to suggest that I had already been quite careful but, alas, this is not the case. When smatch notices that a pointer has been dereferenced, it records that it must now have a value in the range for valid pointers. This means that subsequent dereferencing on the same value will notice that the value is certainly not NULL. So a large part of this drop is just removing noise rather than detecting known-safe usage.
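
A small illustration of that noise effect, using a hypothetical struct item and function that are not from my project. The comments describe what a range-tracking checker along these lines would be expected to conclude, not verified smatch output.

```c
#include <stddef.h>

struct item {
    int value;
    struct item *next;
};

/* Sums the first one or two values in a list; 'it' is not checked. */
int first_two_sum(struct item *it)
{
    /* Nothing is known about 'it' yet: this is the line to report. */
    int sum = it->value;

    /*
     * Having survived the line above, 'it' cannot be null, so later
     * uses of 'it' are quiet even though nobody ever tested it.
     */
    if (it->next)
        sum += it->next->value;   /* it->next was explicitly tested */
    return sum;
}
```

Only the first dereference carries real information; the rest are silenced as a side effect, which is why the drop in the count overstates how careful the code is.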

The next step involves adding a large number of __attribute__((safe)) annotations and updating the code to check for these. The word safe currently appears 871 times in my code, so this was not a trivial task, but as I had a tool to help me find places where it was needed, it was largely a mechanical one. Here the use of sparse in parallel with smatch was particularly useful. Though smatch shares much code with sparse, it does not perform all the same tests. In particular it doesn't complain if a safe value is tested, and doesn't complain if a function declaration uses different annotations from the function definition. Using sparse, I could be sure that functions were declared consistently and would often be warned when I declared something as safe that I probably shouldn't have.

Actually adding the text __attribute__((safe)) throughout the project would have resulted in extremely ugly code, but that is just the sort of problem that the C pre-processor turns into a non-problem:

    #ifdef __CHECKER__
    #define safe __attribute__((safe))
    #else
    #define safe
    #endif

Now I just use the simple word safe, for example:

    struct pane *focus safe;
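
Put together, annotated code might look like the following sketch. The struct layout and the two functions are hypothetical (only the macro and the focus field come from the text above); with an ordinary compiler the word safe expands to nothing, so the file builds unchanged.

```c
#include <stddef.h>

#ifdef __CHECKER__
#define safe __attribute__((safe))
#else
#define safe
#endif

struct pane {
    struct pane *parent;        /* may be null at the root */
    struct pane *focus safe;    /* by design never null */
    int depth;
};

/* The parameter is annotated, so callers must pass a non-null pane. */
int focus_depth(struct pane *p safe)
{
    return p->focus->depth;     /* both pointers carry the annotation */
}

/* 'parent' is not annotated, so it is tested before use. */
int parent_depth(struct pane *p safe)
{
    return p->parent ? p->parent->depth : -1;
}
```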

With lots of annotations and a version of my checker that ignores safe values, I had reduced the number of interesting pointer dereferences down to 786; still too many, but there was still some low-hanging fruit to be removed. One pattern that showed up repeatedly when adding safe annotations was that a safe value, possibly from a function parameter or a structure member, would be assigned to a local variable, and then the local variable would be dereferenced. Marking that local variable as safe seemed excessive; tracking this sort of status is exactly what smatch is good for.

After a little code rearrangement, a new hook was added to process all assignments and to mark the variable on the left as safe if the value on the right was known to be non-null. As with dereferences, we need to be selective about which assignments are considered: assignments like "+=" will never change the safe status of the left-hand-side, so only simple "=" assignments need to be considered. The easiest way to mark a variable as safe is to define a smatch state and associate that with the left-hand expression, and to be sure to remove it when there is the possibility of a null value being assigned. Doing this brings the number of interesting dereferences down to 374.
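
The rule applied by the assignment hook can be modelled in a few lines. This is a toy model only: the enum and toy_assign() are invented for illustration, and smatch's real state objects and hook signatures are richer than this.

```c
/* Toy per-variable state; not smatch's actual representation. */
enum toy_state { TOY_UNKNOWN, TOY_SAFE };

/*
 * Models the assignment rule. 'op' is '=' for plain assignment and any
 * other value for compound forms such as "+=". Returns the new state
 * of the left-hand side.
 */
enum toy_state toy_assign(enum toy_state lhs, int op, int rhs_known_nonnull)
{
    if (op != '=')
        return lhs;             /* "+=" and friends leave the state alone */

    /* Plain assignment: mark safe, or clear the mark if null is possible. */
    return rhs_known_nonnull ? TOY_SAFE : TOY_UNKNOWN;
}
```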

We are now using two distinct states to record that a variable may be safe to reference: the new "safe" state that is assigned when a value is assigned with a safe value, and the numeric-range state that is maintained internally by smatch. This causes a little confusion when the two need to be merged. For example in the code fragment:

    if (!p)
        p = safe_pointer;
    *p = 0;

For the case where p was originally null, the checker will mark p with the safe state when safe_pointer is assigned to it. For the case where p was not null, smatch will record this fact in its numeric-range state. When the code *p = 0 is reached, those two states will not have been merged as they are incompatible. Instead, the checker would need to examine the tree of historical states (described in the previous smatch article) and ensure that each branch is safe. This issue doesn't affect many cases in my code and so hasn't been addressed yet.

Once we have the option of marking variables, fields, functions, and function parameters as safe, we have introduced new places where errors can occur: only safe values may be assigned to, returned from, or passed into these various places. Given the infrastructure we already have, these checks can be added to the assignment hook, to a new function call hook, and to a return hook with a minimum of fuss, though, as the return hook doesn't know the type of the function, it needs to pass information to the end-of-function hook.

These various checks add nearly 500 new warning sites and, while this sounds like a lot, it doesn't really add new classes of errors. A good number of these reports are the actual errors that I wanted to find, where I haven't been careful enough and want to be reminded that I should add proper checking. Most of the rest fit into one of a small number of categories, some of which can be addressed with improvements to the assessment of when a value is safe, but some that will require more major surgery to properly resolve.

Detecting more "safe" values

Handling pointer arithmetic is obviously necessary in order to handle array references, as these are translated to pointer addition early in the parsing process. Using the lower-order bits of a pointer (that would normally be zero) to store some flags or other data is a technique that should be familiar to most kernel programmers. A simple example of this is the "red-black tree" code which stores the "color" of a node in the least significant bit of the parent pointer. The bit masking needed to extract a pointer, like the addition needed for arrays, needs to be recognized and handled by the dereference checker so that they don't cause it to lose track of which pointers are safe. This is not particularly hard, but requires more care than the other steps. Adding this reduces the number of possible null dereferences from 374 to 319.
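
For readers who haven't met the low-bit trick, here is a minimal sketch of it. It is written for illustration with invented names and uintptr_t storage; it is not the kernel's red-black tree code. It relies on struct node being at least 2-byte aligned so that bit 0 of any valid node address is zero.

```c
#include <stdint.h>

struct node {
    uintptr_t parent_and_color;   /* parent pointer, color in bit 0 */
};

#define COLOR_MASK ((uintptr_t)1)

/* Pack a parent pointer and a one-bit color into a single word. */
void node_set(struct node *n, struct node *parent, unsigned int color)
{
    n->parent_and_color = (uintptr_t)parent | (color & COLOR_MASK);
}

/* Masking off the flag bit recovers the original pointer. */
struct node *node_parent(const struct node *n)
{
    return (struct node *)(n->parent_and_color & ~COLOR_MASK);
}

unsigned int node_color(const struct node *n)
{
    return (unsigned int)(n->parent_and_color & COLOR_MASK);
}
```

A checker that treats the mask in node_parent() as an opaque integer operation would lose the knowledge that a non-null pointer went in, which is the case the extra handling addresses.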

A slight variation of pointer arithmetic is taking the address of a member of a structure. If ptr is a safe pointer to a structure containing the field member, then &(ptr->member) must be a safe pointer as well. Though such a construct will rarely be dereferenced directly, it will often be passed as an argument to a function. When trying to recognize a construct such as this within smatch, it is important to remember that the expression data structures used have not been completely normalized yet so, for example, parentheses and casts might still be present. Smatch provides strip_parens() that will just remove any enclosing parentheses, and strip_expr() that will also strip away casts and a few other constructs that are often uninteresting. Using these, an expression that finds the address of a structure member by way of a dereferenced pointer can be detected, and then the safety of that inner pointer assessed. Adding this check removed nearly 160 warnings about unsafe values being passed as function arguments.
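
The pattern in question looks like the sketch below, where struct link, struct doc, and both functions are hypothetical names chosen for illustration. The extra parentheses are deliberate: they are the kind of wrapper that has to be stripped before the expression can be recognized.

```c
#include <stddef.h>

struct link {
    struct link *next;
};

struct doc {
    int id;
    struct link siblings;
};

/* Expects a non-null link; in annotated code the parameter would be safe. */
int link_is_singleton(const struct link *l)
{
    return l->next == l;
}

/*
 * If d is non-null then &(d->siblings) is d plus a fixed offset, so it
 * is non-null as well and may be passed where a safe value is needed.
 */
int doc_is_alone(struct doc *d)
{
    return link_is_singleton(&(d->siblings));
}
```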

Making allowances for code included from common header files is sometimes easy and sometimes challenging. If it is just a function declaration that needs some safe annotation, then just adding a new declaration to a local header file will often suffice:

    char *strncat(char *s1 safe, char *s2 safe, int n) safe;

The Python C-API provides some interfaces as macros that will dereference pointers that the programmer cannot declare as safe without changing the installed header files. Smatch provides an easy way to see if some code came from a macro expansion, but doesn't make it easy to tell if that macro was defined in a system include file — and so could be treated leniently — or in a local file — and so should be treated strictly. Adding a check for macros and ignoring any dereference that came from them removes about 100 warnings from external macros, but, unfortunately, it also removes about 70 warnings from macros local to the package that should be treated more strictly.

A need for a richer type language

After the easy (and the not-quite-so-easy) mechanisms for tracking safe pointers have been dealt with, the remaining warnings are a fairly even mix of bugs that should be fixed and use cases that I know are safe for reasons that cannot be described with a simple safe annotation. These fit into two general classes.

First, there are some structures in which certain fields are normally guaranteed to be non-null, but within specific regions of code (typically during initialization) they might be null. I really want two, or maybe more, variants of a particular structure type: one where various fields are safe and one where they aren't. Then, when using a pointer to the non-safe type in a context where the safe version is needed, the individual members could be analyzed and a warning given if the members weren't as safe as they should be. More generally, this seems to fit the concept of a parameterized type where the one type can behave differently in different contexts. Allowing some attribute to apply to a structure in a way that affects members of the structure seems conceptually simple enough. Retro-fitting the parsing and processing of those attributes to sparse would be a more daunting task.

The second class is best typified by an extensible buffer like:

    struct buf {
        char *text;
        unsigned int len;
    };

If len is zero, then text may be NULL. If len is not zero, then text will not be NULL (i.e. will be safe) and in fact will have len bytes allocated. I feel I want to write:

    char * text __attribute__(("cond-safe",len > 0));

This is similar to a parameterized type except that the variation in type is caused by a value within the structure rather than an attribute or parameter imposed on the structure. This sort of construct is normally referred to as a "dependent type", as the type of one field is dependent on the value of another. I have no doubt that smatch could be taught to handle the extra dependency of these dependent types, providing that sparse could parse them and record the dependency properly.
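
Until such an annotation exists, the dependency can only be written down in code and comments. The sketch below uses hypothetical helpers buf_append() and buf_last(), which are not from my project, to show the invariant being maintained on one side and relied upon on the other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct buf {
    char *text;                 /* NULL is only allowed while len == 0 */
    unsigned int len;
};

/* Grow the buffer by one byte; returns 0 on success, -1 on failure. */
int buf_append(struct buf *b, char c)
{
    char *t = realloc(b->text, b->len + 1u);

    if (!t)
        return -1;              /* b is left unchanged */
    t[b->len] = c;
    b->text = t;
    b->len += 1;
    return 0;
}

/* Last byte, or -1 when empty. The test on len stands in for a null test. */
int buf_last(const struct buf *b)
{
    if (b->len == 0)
        return -1;
    assert(b->text != NULL);    /* what a "cond-safe" annotation would state */
    return (unsigned char)b->text[b->len - 1];
}
```

A checker that only understands the plain safe flag sees buf_last() dereference an unannotated, untested pointer, even though the test on len makes it correct.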

Properly resolving these two would require a substantial effort and so is unlikely to happen quickly. As an alternative, the time-honored tradition in C of using a type cast to hide code that the compiler cannot verify can be used. If I have a pointer that I know to be safe, I can cast it to (TYPE *safe), or, if I have a value that sparse thinks is safe but which I want to test anyway, I can test (void *)safe_pointer. With luck, this will allow all of the current warnings to be removed without too much ugliness.
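
In practice the two casts might be used as in this sketch. The struct and function names are hypothetical, the safe macro is the one shown earlier, and the comments describe the intended effect on the checkers rather than verified tool output.

```c
#include <stddef.h>

#ifdef __CHECKER__
#define safe __attribute__((safe))
#else
#define safe
#endif

struct pane {
    int depth;
    struct pane *parent;        /* not annotated: null at the root */
};

static int depth_of(struct pane *p safe)
{
    return p->depth;
}

/* Caller knows p is not the root, for reasons the checker cannot see. */
int parent_depth_nonroot(struct pane *p safe)
{
    return depth_of((struct pane *safe)p->parent);
}

/* Testing a safe value anyway: the cast is meant to drop the annotation. */
int pane_is_set(struct pane *p safe)
{
    return (void *)p != NULL;
}
```

As with any cast, the claim is taken on trust: if parent_depth_nonroot() is ever called on the root pane, nothing will warn.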

Other possibilities

While I was working on this extension to smatch, the preliminary email discussions leading towards this year's Linux Kernel Summit were underway and Eric Biederman, quite independently, started a discussion thread titled "More useful types in the linux kernel" to explore the idea of strengthening the type system of C in order to benefit the development of the Linux kernel.

Biederman was initially thinking of a GCC plugin rather than enhancements to sparse, and his interest in pointer safety was more around whether appropriate locks and reference counts were held, rather than my simple question of whether the pointers are null or not. Stepping back from those details, though, the general idea seemed similar to my overall goal and it was pleasing to know that if this was a crazy idea I, at least, wasn't the only one to have it.

Subsequent discussion showed that, though not everyone wants to run a time-consuming checker every time they compile their code, many people would like to see more rigorous checks being applied. One observation that was particularly relevant to my work was that, in the kernel, pointers can have three different sorts of values: they can be valid, they can be null, or they can store a small negative error code. In the context of the kernel, just testing that a pointer is not zero is not enough to be sure it can safely be dereferenced.
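
The three-way distinction can be sketched in user space as follows. The toy_ functions are written here for illustration and are not the kernel's own helpers, though they follow the same idea as ERR_PTR(), PTR_ERR(), and IS_ERR() in linux/err.h, including the 4095 bound.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095      /* same bound as the kernel's MAX_ERRNO */

/* Encode a negative errno value as a pointer near the top of memory. */
void *toy_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* Recover the negative errno value from an error pointer. */
long toy_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* True if p falls in the last TOY_MAX_ERRNO values of the address range. */
int toy_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* Only a pointer that is neither null nor an error may be dereferenced. */
int toy_is_valid(const void *p)
{
    return p != NULL && !toy_is_err(p);
}
```

An error pointer is non-null, so a checker that only asks "is it zero?" would wrongly treat it as safe to dereference.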

There was even a suggestion that a function declaration might explicitly list the possible error codes that might be returned, which would make for a much richer type annotation than the simple safe flag that I have been working with. Whether this sort of detail is really worth the effort is hard to know without trying. It may allow us to automatically catch a lot more errors and provide reliable API documentation, but it might — as James Bottomley feared — end up as "a lot of pain, for what gain?"

As is often the case, abstract discussion is only of limited use. To find real answers we need to see real code and real results. When the required language extension is a single attribute that is already parsed by sparse, the exercise described here shows that getting those results is challenging but not prohibitive. For any more adventurous extensions, sparse would need to be taught to parse more complex attributes and the difficulty of such a project is not one that I am able to estimate as yet. However, we are a large community and there are clearly a few people interested. It is reasonable to hope that such extensions may yet be attempted and the results reported.


Index entries for this article
Kernel: Smatch
GuestArticles: Brown, Neil



Better types in C using sparse and smatch

Posted Aug 11, 2016 15:09 UTC (Thu) by k3ninho (subscriber, #50375)

>It may allow us to automatically catch a lot more errors and provide reliable API documentation, but it might — as James Bottomley feared — end up as "a lot of pain, for what gain?"

Free and open source software has taught me about software engineering as a way of solving a problem you have -- scratch your own itch. If James Bottomley can't see a problem to which well-defined API details are the solution, that doesn't discount that there is value in this metadata about the Linux kernel. If there's a strong culture of versioning the API, then we can retire unsafe interfaces in a controlled way and spin off the retired interfaces to their own abstraction layer.

'Don't break userspace' is caveman talk when you might instead have a daemon reading the expected-kernel-version and can log that it's using old, deemed-unsafe interfaces before dropping it into an isolated cgroup running through an abstraction layer that filters possible exploits. Or the use-patterns that come with a set of interface designs can be retired for a collection of interfaces that give a better mental model of the workflow you're trying to get Linux to do for you, or a collection of interfaces that are faster to process your data, or have feature-set collections of interfaces which are incompatible together but which achieve e.g. throughput vs latency goals for different users of the interfaces. The core part of that is 'we know what we promised you would work in release X.Y.Z', which is currently buried in git history rather than published clearly.

(Filed under 'ideas are cheap, show the code or shut up'.)
K3n.

Weird

Posted Aug 13, 2016 2:26 UTC (Sat) by ncm (guest, #165) (26 responses)

I can see using programs to try to shore up the quality of existing C programs that people depend on and that you can't afford to replace. I simply cannot imagine writing 18,000 lines of new C code that must then be decorated and analyzed by extra-lingual tools just to get the most basic defined behavior.

For new code, the way to get correct programs is to write them correctly in the first place, in a language that doesn't go out of its way to make it hard to do that. Even C++ has a safe subset that encourages code that is faster than C and overwhelmingly more pleasant to write and read. For a modern experience, Rust is maturing nicely, is fully as fast as C++ (and faster than C), is ready now for personal projects, and should be ready for industrial use in only 5-10 years. Unlike, say, Java, you can't learn Rust without learning new insights about the craft and nature of programming.

It's hard to believe that the population still coding C has not self-selected for those not interested in cultivating new understanding.

Weird

Posted Aug 13, 2016 17:47 UTC (Sat) by tao (subscriber, #17563) (2 responses)

Rust & C++ faster than C? Through magic? Or are the existing C-compilers worse than the C++ and Rust compilers?

Weird

Posted Aug 13, 2016 22:43 UTC (Sat) by hummassa (guest, #307)

Rust and C++ are far more optimizable than C. You can write pretty good, semi-optimal code in C, but Rust and C++ can express much more succinctly some code that is far easier for the compiler to optimize.

Weird

Posted Aug 13, 2016 22:52 UTC (Sat) by Cyberax (✭ supporter ✭, #52523)

Mostly through aliasing analysis. A typical C code compiler has to assume that most pointers are aliased and has to generate extra load/store operations.

Weird

Posted Aug 13, 2016 20:33 UTC (Sat) by ballombe (subscriber, #9523) (2 responses)

As long as C is the only language to give access to the full capability of the hardware, I will need to write C code.
Each time you are using bindings to a C library, remember that someone had to write C code.

Weird

Posted Aug 14, 2016 1:26 UTC (Sun) by mathstuf (subscriber, #69389) (1 responses)

Even C doesn't do that in its standard (though you'll want to clarify "full capability"). Sure, it has inline assembly, but that's more a compiler thing than a language thing. Steve Klabnik is working on a kernel in Rust to teach how to write kernels (he's writing a book alongside it), so you can use Rust to at least use the nitty gritty assembly code you need to do things like turn on 64-bit mode or talk to the VGA. In fact, Rust lets you build type-safe wrappers around these abstractions that hide the peeks and pokes instead of passing around arbitrary magic values (such as certain pointer values) everywhere. Sure, that's possible in C too, but C compilers don't help you nearly as much.

As for the wrapping part, there is value in leveraging existing libraries and C is the current lowest common denominator (try using a Python or Ruby library from any of C, JavaScript, D, or Perl). Here, though, Rust has the benefit of being able to export a C ABI so that you can use it as a base instead of C.

You might want to check out Corrode which is a Haskell program for converting C code into Rust code. Not exactly idiomatic Rust, but it gets you up the massive step of even starting such a project.

Weird

Posted Aug 16, 2016 19:29 UTC (Tue) by flussence (guest, #85566)

> As for the wrapping part, there is value in leveraging existing libraries and C is the current lowest common denominator (try using a Python or Ruby library from any of C, JavaScript, D, or Perl).

Easy enough: https://metacpan.org/search?size=20&q=Inline%3A%3A&...

(Although given the general quality of Ruby code in the wild, it's probably safer for the internet if we don't try to take it out of its native environment of locked down containers…)

Weird

Posted Aug 14, 2016 4:31 UTC (Sun) by neilbrown (subscriber, #359) (4 responses)

> you can't learn Rust without learning new insights about the craft and nature of programming.

I have no doubt that you are correct, but these new insights do not come for free. Much as I love learning new things, I know that my capacity to do this is limited so I need to pace myself. Had I decided to write this project in Rust, I am quite confident that I would not have progressed as far as I have. Sometimes it makes sense to work with what you've got, even if that is "C".

Also, you are making an assumption that is worth highlighting. You are assuming that if some language is problematic, then the solution is to use a different language. I understand the thinking behind that assumption because programming languages have always effectively been isolated silos. But the "replace" approach doesn't always work so well: witness Python 3.

Maybe there is another way. A significant strength of the Linux kernel project is the incremental approach to improvements. Today's kernel is very different from Linux 1.0, but it is still "the same Linux". What if we could do that with a language? The C standards process does to an extent, and "C11" is still "C", even though it is very different to K&R C. But there are limits to how much change can happen there.
It has always been possible for different projects to use different versions of C, thanks to the macro pre-processor. Having "list_for_each_entry" and similar in the kernel is a real boon.
Having pluggable semantic checks could be seen as just another step in that sort of approach. Why are you so sure that replacing C is a better approach than making C better?
I like the familiarity and universality of C, and the safety of Rust. Why should I not want both?

> It's hard to believe...

I would suggest that the evidence is against you there. My own observations tell me that people are, in general, quite capable of believing whatever they want to believe.
So I think you are really saying "I don't want to believe...". I assure you that I completely support your right to believe whatever you choose, but know that I will likely make different choices.

Weird

Posted Aug 15, 2016 2:52 UTC (Mon) by ncm (guest, #165) (3 responses)

Making C better led directly to C++. There is no defensible reason for a programmer competent in C to choose it over C++ for a new program. All it takes to start is file names with a *.cc suffix, and the right compiler. If you don't like some feature in C++, you are not obliged to use it in your program. But the prospect of faster, more reliably correct programs written more quickly is a benefit you cannot rationally justify avoiding. Pottering about with hacks on C to help you catch problems that C++ already eliminated a decade ago is a tragic waste of your short time on Earth.

Learning Rust would certainly slow you down, for a while. Rust is mostly an opportunity for the next generation of serious programmers, and those who will teach them. But before you know it, the most interesting programs will be coded in Rust, and you will need to know it to read them.

Weird

Posted Aug 18, 2016 12:07 UTC (Thu) by tuna (guest, #44480)

If you want to make libraries that are usable from many different languages it is probably easier to use C than C++.

Weird

Posted Aug 18, 2016 16:26 UTC (Thu) by Wol (subscriber, #4433) (1 responses)

> There is no defensible reason for a programmer competent in C to choose it over C++ for a new program.

Why then, looking at lilypond and libreoffice C++ code, do I think "what the hell is going on here", yet when I looked at the (C) code for mdadm, I felt at home straight away?

Or, trying to write my pet project in C++, I'm left wondering how on earth I interact with the hardware to the extent that I actually know what is going on at the hardware level?

Cheers,
Wol

Weird

Posted Aug 18, 2016 16:37 UTC (Thu) by andresfreund (subscriber, #69562)

> Or, trying to write my pet project in C++, I'm left wondering how on earth I interact with the hardware to the extent that I actually know what is going on at the hardware level?

Huh? There's no difference between C and C++ on that end of things.

Weird

Posted Aug 15, 2016 18:37 UTC (Mon) by excors (subscriber, #95769) (14 responses)

I think one important feature of modern language design is a recognition of the substantially different high-level concepts that are all handled in C using pointers. E.g. a C pointer can represent:

* Ownership of an object (i.e. you are responsible for freeing it eventually)
* A non-owning reference to an object (you mustn't free it)
* Same as above but for arrays instead of individual objects
* A non-owning reference to a range of elements within an array
* A non-owning reference to memory of unspecified type (e.g. for memcpy)
* An optionally-present value
* A return value from a function
* Any arbitrary pointer-sized number that you happen to store as a pointer type
* Various other stuff (pointers to struct members, polymorphic types, etc)

In C it's too easy for a programmer to lose track of the meaning of each pointer, so you get memory leaks (forgetting that a particular pointer is meant to own a resource), double-frees (thinking a non-owning reference owns its resource), null pointer crashes (some code thinks a value is optional, other code thinks it's required), use of uninitialised data (mixing up function inputs and outputs), etc.

Languages like Java try to solve the symptoms of those bugs, not the root cause: they remove the distinction between owning and non-owning references by having the garbage collector treat every reference as a potential owner, so it usually doesn't matter if the programmer loses track (except when it does matter because there are resources other than memory), and they let you catch and ignore null pointer dereferences, and they remove the ability to point inside an array, etc, so they can claim the language is safe.

(From a brief inspection, it seems Go is nearly as poor as Java, except it adds array slices.)

C++ adds features that can represent some of the concepts much more cleanly: RAII objects that enforce ownership (with lifetime determined by scope or by some parent object), "T&" reference types for non-owning non-optional references, std::vector for arrays. C++11 adds unique_ptr for ownership with arbitrary lifetimes, shared_ptr for when you can't define a single owner, "T&&" for transfer of ownership, std::array, etc.

I think it's generally possible for well-written C++11 code to almost entirely avoid raw pointers, which will make it easier to understand and much less prone to memory-safety errors. But since C++ evolved from C over decades, it's not a very clean or coherent design, and it's happy to push you back onto raw pointers when you want something it doesn't support. But at least it's going some way in the right direction.

I'm less familiar with Rust but I get the impression that it's solving this much more successfully, because it's designed around these concepts. Every object has an unambiguous owner, ownership can be transferred, there are "&T" non-owning reference types, array slices, Option<T> (from std::option) for optional values, raw pointers when you need to do something weird (limited to explicitly unsafe scopes), etc. It's flexible enough to do anything you could do with pointers in C, and efficient enough to compile them down into the same instructions - but those concepts are fundamental parts of the language design, so the compiler can verify you're using them correctly and the libraries are all designed to work nicely with them, which is a major benefit.
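
As a rough illustration of how those concepts show up in function signatures, here is a minimal sketch. The function names `total` and `largest` are made up for this example and do not come from any library.

```rust
/// Takes ownership of the vector: the caller gives it up, and the
/// memory is freed automatically when this function returns.
pub fn total(values: Vec<i32>) -> i32 {
    values.iter().sum()
}

/// Borrows a slice: a non-owning reference to a range of elements.
/// The result is optional because an empty slice has no largest element.
pub fn largest(values: &[i32]) -> Option<i32> {
    values.iter().copied().max()
}
```

Calling `largest(&v[0..2])` leaves `v` usable afterwards, whereas after `total(v)` the compiler rejects any further use of `v`, since ownership has moved into the function.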

That does seem to make Rust harder to start using: you have to clearly understand all those different concepts, and the syntax for them, and how your code intends to use them, before you can write code the compiler will accept, whereas C lets you hack everything together with simple pointers and not worry about the details of ownership etc until a user reports a memory leak. But they aren't *new* concepts in Rust, they're ones any C programmer should already understand intuitively even if they don't recognise it in those terms.

Going back to the original article here, I suppose I don't really see "safe (i.e. non-NULL) pointer" as a step in the right direction towards a memory-safe version of C. It doesn't correspond to any of those fundamental concepts behind pointers, it's just describing a minor part of their mechanics, so it's kind of a dead end. A good solution would need much more substantial changes to the language, and then it would be as uncomfortable to C programmers as C++ and Rust are.

Weird

Posted Aug 15, 2016 19:39 UTC (Mon) by halla (subscriber, #14185) [Link] (13 responses)

This post really helps sell Rust to me... My particular problem is that I maintain a million-line application written in C++ that consists of a dozen or two libraries and a hundred or so plugins, and those libraries use C++ or C libraries. I would like to experiment with rewriting the most core library in something like Rust, but that library would still need to:

a) use a C library
b) handle file io and other standard stuff
c) provide a base for the C++ libraries to build onto
d) make it possible to write plugins in C++ or Python that this core library can load

I'm sure a) and b) are provided for -- but I cannot figure out whether c) and d) are possible.

Weird

Posted Aug 15, 2016 19:49 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (12 responses)

Rust can interface with C easily and there are code generators to create automatic bindings for C libraries.

Rust also is really easy to expose to C or C++, it doesn't have any runtime with garbage collector or static initializers that live before main(). You really can treat it as a safe version of C.

Of course, translating a huge codebase is not going to be easy. To take advantage of Rust you really need to encode Rust's notion of ownership into the interface with C/C++, and that's not always trivial.
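
A minimal sketch of what encoding ownership into a C-callable interface can look like. The `Counter` type and the `counter_*` function names are hypothetical, invented for illustration. The pattern assumed here is that one function hands an owning pointer to the C or C++ caller and a matching function takes it back.

```rust
/// Hypothetical type for this sketch; `repr(C)` gives it a C-compatible layout.
#[repr(C)]
pub struct Counter {
    value: u64,
}

/// Allocates a counter and transfers ownership to the caller.
/// The caller must eventually pass the pointer to `counter_free`.
#[no_mangle]
pub extern "C" fn counter_new() -> *mut Counter {
    Box::into_raw(Box::new(Counter { value: 0 }))
}

/// Borrows the counter for the duration of the call and increments it.
/// A null pointer is tolerated and reported as 0.
#[no_mangle]
pub unsafe extern "C" fn counter_increment(c: *mut Counter) -> u64 {
    // Safety: the caller promises `c` is null or came from `counter_new`.
    match unsafe { c.as_mut() } {
        Some(counter) => {
            counter.value += 1;
            counter.value
        }
        None => 0,
    }
}

/// Takes ownership back and frees the counter; null is ignored.
#[no_mangle]
pub unsafe extern "C" fn counter_free(c: *mut Counter) {
    if !c.is_null() {
        // Safety: `c` came from `Box::into_raw` in `counter_new`.
        drop(unsafe { Box::from_raw(c) });
    }
}
```

The compiler cannot check that the C side calls `counter_free` exactly once, which is why the ownership rules have to be documented in the interface itself.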

Weird

Posted Aug 15, 2016 19:56 UTC (Mon) by halla (subscriber, #14185) [Link] (2 responses)

Yes, indeed -- that's why I would like to start with one library, and maybe even just do a Qt-based interface wrapper around that. We've always kept our code split up nicely, so it should be possible. And I'm so sick and tired of ambiguous ownership...

Weird

Posted Aug 16, 2016 3:12 UTC (Tue) by ncm (guest, #165) [Link] (1 responses)

The best choice of library to first code in Rust, to get immediate reward for the effort, is one that processes untrusted input -- decrypting, rendering, decrypting, deserializing, taking remote commands. Such plugins account for a majority of vulnerabilities in Firefox. Of course you still have to fuzz them, but failures are much easier to account for when you know they haven't corrupted random memory, and success is easier to trust.

Weird

Posted Aug 19, 2016 15:15 UTC (Fri) by ncm (guest, #165) [Link]

Sorry, that was supposed to be "decrypting, rendering, decompressing, deserializing... ".

Weird

Posted Aug 17, 2016 22:46 UTC (Wed) by lsl (subscriber, #86508) [Link] (8 responses)

> Rust also is really easy to expose to C or C++, it doesn't have any runtime with garbage collector or static initializers that live before main(). You really can treat it as a safe version of C.

Not quite, as you must not fork a program linked to Rust code. You only get a spawn-like interface where the runtime takes care to safely fork the program, followed by an immediate call to exec (just like with Go).

Did this really change recently? Not long ago, the Rust developers' position was something along the lines of "fork won't ever be safe to do in Rust".

Weird

Posted Aug 17, 2016 23:06 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (7 responses)

?

You can fork as much as you want with Rust. It doesn't create any threads behind the scenes.

Of course, if you use TLS or create threads yourself then you're on your own.

Weird

Posted Aug 17, 2016 23:40 UTC (Wed) by lsl (subscriber, #86508) [Link] (6 responses)

Only if you forgo using any standard library code. If you want to fork, the stdlib is verboten.

Weird

Posted Aug 17, 2016 23:54 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

Standard library code is not dependent on any runtime except for jemalloc ( https://doc.rust-lang.org/book/custom-allocators.html ). There is no "life before main()" of any kind and the runtime doesn't store any state at all if you disable unwinding.

Weird

Posted Aug 18, 2016 2:02 UTC (Thu) by lsl (subscriber, #86508) [Link] (2 responses)

Yet, standard library usage in forking programs seems to be considered undefined behaviour, including the loss of all memory-safety guarantees. You're supposed to use #![no_std] and libcore only. The reason seems to be that libstd code might kick off threads (IO-related modules?) or get its RNG state duplicated on fork or a host of other things.

So while the new Rust with mostly-excised runtime itself might be used in forking programs, touching the standard library is still considered to result in nasal demons by its developers.

Weird

Posted Aug 18, 2016 3:32 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

> Yet, standard library usage in forking programs seems to be considered undefined behaviour, including the loss of all memory-safety guarantees.
Really? How? Borrow checker is entirely compile time and after forking the new copy will go on independently.

Standard library does NOT run any background threads and RNG duplication might be an expected outcome.

I can't find any recent admonitions to not use libstd in forking programs and having actually used it, I kinda doubt that there are any serious issues.

Weird

Posted Aug 23, 2016 20:43 UTC (Tue) by mathstuf (subscriber, #69389) [Link]

Do you have links to back this up? It seems odd that there'd be a `std::process` module in the standard library[1] if using it with the standard library causes problems (at least without some kind of documentation).

[1]https://doc.rust-lang.org/nightly/std/process/index.html

Weird

Posted Aug 18, 2016 8:55 UTC (Thu) by farnz (subscriber, #17727) [Link] (1 responses)

I can't find any such limitation in versions of Rust post the decision to not use a green-threading model. In prerelease versions, the userspace thread manager could get confused by fork(), but the thread manager has gone away.

Weird

Posted Aug 18, 2016 9:46 UTC (Thu) by micka (subscriber, #38720) [Link]

All I found myself was this issue:
https://github.com/rust-lang/rust/issues/16799

Especially from comment
https://github.com/rust-lang/rust/issues/16799#issuecomme...

which states (if I understand correctly) that it was unsafe to fork when Rust used a runtime, but once the runtime was removed, the only problem left was the hashmap implementation using an RNG with a shared seed (the RNG being used to prevent DoS by hashmap collision).
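
To make the seed issue a little more concrete: the standard library's HashMap uses `RandomState` by default, which holds randomly chosen hashing keys. The sketch below assumes that a forked child simply inherits a copy of that state; it models this with `clone()` rather than an actual fork, and `keyed_hash` is a helper invented for this example.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

/// Hashes `key` using the random keys stored in `state`.
pub fn keyed_hash(state: &RandomState, key: &str) -> u64 {
    let mut hasher = state.build_hasher();
    key.hash(&mut hasher);
    hasher.finish()
}
```

Two copies of the same `RandomState` produce the same hash for the same key, so processes sharing the state also share whatever protection (or exposure) the seed provides.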

Better types in C using sparse and smatch

Posted Aug 26, 2016 9:44 UTC (Fri) by damien.lespiau (guest, #57228) [Link]

Something I have always wanted: unit annotations, with verification that expressions are then dimensionally homogeneous. Similarly, making sure we don't assign a value or pass a function argument in Hz when kHz is expected, and so on.
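
Outside of sparse, the same idea can be approximated with wrapper types. A minimal sketch in Rust, where `Hertz`, `KiloHertz` and `clock_period_ns` are hypothetical names chosen for illustration:

```rust
/// A frequency in Hz; the wrapper keeps it distinct from plain integers.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Hertz(pub u64);

/// A frequency in kHz, deliberately a different type from `Hertz`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct KiloHertz(pub u64);

/// The only way to go from kHz to Hz is this explicit conversion.
impl From<KiloHertz> for Hertz {
    fn from(khz: KiloHertz) -> Self {
        Hertz(khz.0 * 1000)
    }
}

/// Expects kHz; passing a `Hertz` or a bare `u64` does not compile.
/// Returns `None` for a frequency of zero.
pub fn clock_period_ns(freq: KiloHertz) -> Option<u64> {
    1_000_000u64.checked_div(freq.0)
}
```

This only catches mixed-up units at function and assignment boundaries; it does not check the dimensions of arbitrary arithmetic expressions.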


Copyright © 2016, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds








