why rust Trustworthy Concurrent Systems Programing pdf pdf

  Jim Blandy

Why Rust?

  Why Rust? by Jim Blandy Copyright © 2015 O’Reilly Media. All rights reserved.

  Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA

95472.

  

O’Reilly books may be purchased for educational, business, or sales promotional use.

Online editions are also available for most titles ( ). For

more information, contact our corporate/institutional sales department:

800-998-9938 or corporate@oreilly.com.

  Editors: Meghan Blanchette and Rachel Proofreader: Melanie Yarbrough Roumeliotis David Futato

  Interior Designer: Production Editor: Melanie Yarbrough Randy Comer Cover Designer:

  Charles Roumeliotis Copyeditor: Illustrator: Rebecca Demarest September 2015: First Edition Revision History for the First Edition 2015-09-02: First Release 2015-09-014: Second Release

  

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Why Rust?, the

cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

While the publisher and the author have used good faith efforts to ensure that the

information and instructions contained in this work are accurate, the publisher and

the author disclaim all responsibility for errors or omissions, including without limi‐

tation responsibility for damages resulting from the use of or reliance on this work.

Use of the information and instructions contained in this work is at your own risk. If

any code samples or other technology this work contains or describes is subject to

open source licenses or the intellectual property rights of others, it is your responsi‐

bility to ensure that your use thereof complies with such licenses and/or rights.

  978-1-491-92730-4 [LSI]

Table of Contents

Why Rust?. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

  2 6 18 42

Why Rust?

  These are the problems Rust was made to address. Rust is a new systems programming language designed by Mozilla. Like C and C++, Rust gives the developer fine control over the use of memory, and maintains a close relationship between the primi‐ tive operations of the language and those of the machines it runs on, helping developers anticipate their code’s costs. Rust shares the ambitions Bjarne Stroustrup articulates for C++ in his paper “Abstraction and the C++ machine model”:

  Systems programming languages have come a long way in the 50 years since we started using high-level languages to write operating systems, but two thorny problems in particular have proven difficult to crack:

  • It’s difficult to write secure code. It’s common for security exploits to leverage bugs in the way C and C++ programs han‐ dle memory, and it has been so at least since the Morris virus, the first Internet virus to be carefully analyzed, took advantage of a buffer overflow bug to propagate itself from one machine to the next in 1988.
  • It’s very difficult to write multithreaded code, which is the only way to exploit the abilities of modern machines. Each new gen‐ eration of hardware brings us, instead of faster processors, more of them; now even midrange mobile devices have multiple cores. Taking advantage of this entails writing multithreaded code, but even experienced programmers approach that task with caution: concurrency introduces broad new classes of bugs, and can make ordinary bugs much harder to reproduce.

  

In general, C++ implementations obey the zero-overhead principle:

What you don’t use, you don’t pay for. And further: What you do

use, you couldn’t hand code any better.

  To these Rust adds its own goals of memory safety and data-race- free concurrency. The key to meeting all these promises is Rust’s novel system of own‐ ership, moves, and borrows, checked at compile time and carefully designed to complement Rust’s flexible static type system. The own‐ ership system establishes a clear lifetime for each value, making garbage collection unnecessary in the core language, and enabling sound but flexible interfaces for managing other sorts of resources like sockets and file handles. These same ownership rules also form the foundation of Rust’s trustworthy concurrency model. Most languages leave the relation‐ ship between a mutex and the data it’s meant to protect to the com‐ ments; Rust can actually check at compile time that your code locks the mutex while it accesses the data. Most languages admonish you to be sure not to use a data structure yourself after you’ve sent it via a channel to another thread; Rust checks that you don’t. Rust is able to prevent data races at compile time. Mozilla and Samsung have been collaborating on an experimental new web browser engine named Servo, written in Rust. Servo’s needs and Rust’s goals are well matched: as programs whose primary use is handling untrusted data, browsers must be secure; and as the Web is the primary interactive medium of the modern Net, browsers must perform well. Servo takes advantage of Rust’s sound concur‐ rency support to exploit as much parallelism as its developers can find, without compromising its stability. As of this writing, Servo is roughly 100,000 lines of code, and Rust has adapted over time to meet the demands of development at this scale.

Type Safety

  But what do we mean by “type safety”? Safety sounds good, but what exactly are we being kept safe from? Here’s the definition of “undefined behavior” from the 1999 stan‐ dard for the C programming language, known as “C99”:

  3.4.3 undefined behavior

  

behavior, upon use of a nonportable or erroneous program con‐

struct or of erroneous data, for which this International Standard

imposes no requirements

  Consider the following C program:

  int main(int argc, char **argv) { unsigned long a[1]; a[3] = 0x7ffff7b36cebUL; return 0; }

  According to C99, because this program accesses an element off the

  a

  end of the array , its behavior is undefined, meaning that it can do anything whatsoever. On my computer, this morning, running this program produced the output: undef: Error: .netrc file is readable by others.

  undef: Remove password or make file unreadable by others.

  .netrc Then it crashes. I don’t even have a file. main

  The machine code the C compiler generated for my function

  a

  happens to place the array on the stack three words before the

  0x7ffff7b36cebUL a[3]

  return address, so storing in changes poor

  main

  ’s return address to point into the midst of code in the C stan‐

  

.netrc

  dard library that consults one’s file for a password. When my

  main main

  returns, execution resumes not in ’s caller, but at the machine code for these lines from the library:

  warnx(_("Error: .netrc file is readable by others.")); warnx(_("Remove password or make file unreadable by others.")); goto bad;

  In allowing an array reference to affect the behavior of a subsequent

  return

  statement, my C compiler is fully standards-compliant. An “undefined” operation doesn’t just produce an unspecified result: it is allowed to cause the program to do anything at all.

  The C99 standard grants the compiler this carte blanche to allow it to generate faster code. Rather than making the compiler responsi‐ ble for detecting and handling odd behavior like running off the end of an array, the standard makes the C programmer responsible for ensuring those conditions never arise in the first place. Empirically speaking, we’re not very good at that. The 1988 Morris virus had various ways to break into new machines, one of which entailed tricking a server into executing an elaboration on the tech‐ nique shown above; the “undefined behavior” produced in that case was to download and run a copy of the virus. (Undefined behavior is often sufficiently predictable in practice to build effective security exploits from.) The same class of exploit remains in widespread use today. While a student at the University of Utah, researcher Peng Li modified C and C++ compilers to make the programs they trans‐ lated report when they executed certain forms of undefined behav‐ ior. He found that nearly all programs do, including those from well-respected projects that hold their code to high standards. In light of that example, let’s define some terms. If a program has been written so that no possible execution can exhibit undefined behavior, we say that program is well defined. If a language’s type system ensures that every program is well defined, we say that lan‐ guage is type safe. C and C++ are not type safe: the program shown above has no type errors, yet exhibits undefined behavior. By contrast, Python is type safe. Python is willing to spend processor time to detect and handle out-of-range array indices in a friendlier fashion than C:

  >>> a = [0] >>> a[3] = 0x7ffff7b36ceb Traceback (most recent call last): File "<stdin>", line 1, in <module> IndexError: list assignment index out of range >>>

  Python raised an exception, which is not undefined behavior: the

  a[3]

  Python documentation specifies that the assignment to should

  IndexError

  raise an exception, as we saw. As a type-safe language, Python assigns a meaning to every operation, even if that meaning is just to raise an exception. Java, JavaScript, Ruby, and Haskell are also type safe: every program those languages will accept at all is well defined.

  

Note that being type safe is mostly independent of

whether a language checks types at compile time or at

run time: C checks at compile time, and is not type

safe; Python checks at runtime, and is type safe. Any

practical type-safe language must do at least some

checks (array bounds checks, for example) at runtime. It is ironic that the dominant systems programming languages, C and C++, are not type safe, while most other popular languages are. Given that C and C++ are meant to be used to implement the foun‐ dations of a system, entrusted with implementing security bound‐ aries and placed in contact with untrusted data, type safety would seem like an especially valuable quality for them to have.

  This is the decades-old tension Rust aims to resolve: it is both type safe and a systems programming language. Rust is designed for implementing those fundamental system layers that require perfor‐ mance and fine-grained control over resources, yet still guarantees the basic level of predictability that type safety provides. We’ll look at how Rust manages this unification in more detail in later parts of this report. Type safety might seem like a modest promise, but it starts to look like a surprisingly good deal when we consider its consequences for multithreaded programming. Concurrency is notoriously difficult to use correctly in C and C++; developers usually turn to concurrency only when single-threaded code has proven unable to achieve the performance they need. But Rust’s particular form of type safety guarantees that concurrent code is free of data races, catching any misuse of mutexes or other synchronization primitives at compile time, and permitting a much less adversarial stance towards exploit‐ ing parallelism. We’ll discuss this more in the final section of the report.

  

Rust does provide for unsafe code, functions or lexical

blocks that the programmer has marked with the

unsafe keyword, within which some of Rust’s type

rules are relaxed. In an unsafe block, you can use unre‐

stricted pointers, treat blocks of raw memory as if they

contained any type you like, call any C function you

want, use inline assembly language, and so on.

Whereas in ordinary Rust code the compiler guaran‐

tees your program is well defined, in unsafe blocks it

becomes the programmer’s responsibility to avoid

undefined behavior, as in C and C++. As long as the

programmer succeeds at this, unsafe blocks don’t affect

the safety of the rest of the program. Rust’s standard

library uses unsafe blocks to implement features that

are themselves safe to use, but which the compiler isn’t

able to recognize as such on its own.

The great majority of programs do not require unsafe

code, and Rust programmers generally avoid it, since it

must be reviewed with special care. The rest of this

report covers only the safe portion of the language.

Reading Rust

  Before we get into the details of Rust’s semantics, let’s take a look at Rust’s syntax and types. For the most part, Rust tries to avoid origi‐ nality; much will be familiar, so we’ll focus on what’s unusual. The types are worth some close attention, since they’re the key not only to Rust’s performance and safety, but also to making the language palatable and expressive.

  Here’s a function that returns the greatest common divisor of two numbers:

  fn gcd(mut n: u64, mut m: u64) -> u64 { assert!(n != 0 && m != 0); while m != 0 { if m < n { let t = m; m = n; n = t; } m = m % n; } n } If you have experience with C, C++, Java, or JavaScript, you’ll proba‐ bly be able to fake your way through most of this. The interesting parts in brief:

  fn

  • >
    • The keyword introduces a function definition. The token after the argument list indicates the return type.

  mut

  • Variables are immutable by default in Rust; the keyword

  n m

  marks our parameters and as mutable, so we can assign to them.

  • In a variable or parameter declaration, the name being declared isn’t nestled inside the syntax of the type, as it would be in C and C++. A Rust declaration has a name followed by a type, with a colon as a separator.

  u64 i32

  • A value is an unsigned 64-bit integer; is the type of 32-

  

f32 f64

  bit signed integers; and and are the usual floating-point

  

isize usize

  types. Rust also has and types, which are 32-bit integers on 32-bit machines and 64-bit integers on 64-bit machines, in signed and unsigned varieties.

  ! assert!

  • The in the use of marks that as a macro invocation, rather than a function call. Rust has a flexible macro system that is carefully integrated into the language’s grammar. (Unfortu‐ nately, we don’t have space to do more than mention it in this report.)
  • The type of a numeric literal like is inferred from context; in

  gcd u64

  our function, those are zeros. You can specify a literal’s

  1729i16

  type by providing a suffix: is a signed 16-bit integer. If neither inference nor suffix determines a literal’s type, Rust

  i32 assigns it the type . let

  • The keyword introduces local variables. Rust infers types within functions, so there’s no need for us to state a type for our

  t u64 temporary variable : Rust infers that it must be . if while

  • The conditions of and expressions need no parenthe‐ sis, but curly brackets are required around the expressions they control.

  return

  • Rust has a statement, but we didn’t need one to return our value here. In Rust, a block surrounded by curly braces can be an expression; its value is that of the last expression it con‐ tains. The body of our function is such a block, and its last
n if

  expression is , so that’s our return value. Likewise, is an expression whose value is that of the branch that was taken.

  ?:

  Rust has no need for a separate conditional operator as in C;

  if-else one just writes the structure right into the expression.

  There’s much more, but hopefully this covers enough of the syntax to get you oriented. Now let’s look at a few of the more interesting aspects of Rust’s type system: generics, enumerations, and traits.

Generics

  It is very common for functions in Rust to be generic—that is, to operate on an open-ended range of argument types, rather than just a fixed selection, much as a function template does in C++. For

  std::cmp::min

  example, here is the function from Rust’s standard library, which returns the lesser of its two arguments. It can operate on integers of any size, strings, or really any type in which one value can be said to be less than another:

  fn min<T: Ord>(a: T, b: T) -> T { if a <= b { a } else { b } }

  <T: Ord>

  Here, the text after the function’s name marks it as a generic function: we’re defining it not just for one specific type, but

  T

  for any type , which we’ll use as the type of our arguments and

  T : Ord

  return value. By writing , we’ve said that not just any type

  T Ord

  will do: must be a type that is , meaning that it supports a com‐

  Ord

  parison ordering all values of that type. If a type is , we can use

  <= Ord

  the operator on its values. is an example of a trait, which we’ll cover in detail below.

  min

  With this definition, we can apply to values of any type we want, as long as the type orders its values:

  min(10i8, 20) == 10; // T is i8 min(10, 20u32) == 10; // T is u32

min("abc", "xyz") == "abc"; // strings are Ord, so this works

  T min

  Since the definition uses for both arguments, calls to must pass two values of the same type:

  min(10i32, "xyz"); // error: mismatched types. min

  The C++ analogue of our function would be: template<typename T> T min(T a, T b) { return a <= b ? a : b; } min

  However, the analogy isn’t exact: where the Rust stipulates that

  T Ord

  its argument type must be , the C++ function template says

  T min

  nothing about its requirements for . In C++, for each call to , the compiler must take the specific argument type at hand, substi‐

  T min tute it for in ’s definition, and see if the result is meaningful. min

  Rust can check ’s definition in its own right, once, and can check

  min

  a call to using only the function’s stated type: if the arguments

  Ord

  have the same type, and that type is , the call is well typed. This allows Rust to produce error messages that locate problems more precisely than those you can expect from a C++ compiler. Rust’s design also forces programmers to state their requirements up front, which has its benefits and drawbacks. One can have generic types as well as functions:

  struct Range<Idx> { start: Idx, end: Idx, } std::ops::Range

  This is the type from Rust’s standard library,

  0..10

  which represents the value of range expressions like ; these appear in iterations, expressions denoting portions of arrays and

  min

  strings, and so on. As in the definition of our generic function ,

  <Idx> Range

  the text after the name indicates that we’re defining a

  Idx

  structure that is generic in one type, , which we use as the type of

  start end the structure’s and fields. Range

  Making generic allows us to handle all these expressions as

  Range<T> T

  values for different types :

  • 10i32..10 // a Range<i32>
  • 2.0..0.25f64 // a Range<f64> 200..800 // a Range<T>, for the integer type T // determined from context

  Rust has a more general expression syntax for writing instances of any struct type. For example, the last range above could also be writ‐ ten:

  Range { start: 200, end: 800 } Rust compiles generic functions by producing a copy of their code specialized for the exact types they’re applied to, much as C++ gen‐ erates specialized code for function template instantiations. As a result, generic functions are as performant as the same code written with specific types used in place of the type variables: the compiler can inline method calls, take advantage of other aspects of the type, and perform other optimizations that depend on the types.

Enumerations

  enum

  Rust’s enumerated types are a departure from C and C++ types, but users of functional languages will recognize them as algebraic datatypes. A Rust enumerated type allows each variant to carry a dis‐ tinct set of data values along with it. For example, the standard

  Option

  library provides an type, defined as follows:

  enum Option<T> { None, Some(T) }

  T Option<T>

  This says that, for any type , an value may be either of

  

None Some(v)

  two variants: , which carries no value; or , which carries

  v T

  the value of type . Enumerated types resemble unions in C and

  enum

  C++, but a Rust remembers which alternative is live, prevent‐ ing you from writing to one variant of the enum and then reading another. C and C++ programmers usually accomplish the same pur‐

  enum

  pose by pairing a union type with an type, calling the combina‐ tion a “tagged union.”

  Option

  Since is a generic type, you can use it with any value type you want. For example, here’s a function that returns the quotient of two numbers, but declines to divide by zero:

  fn safe_div(n: i32, d: i32) -> Option<i32> { if d == 0 { return None; } return Some(n / d); } n d i32

  This function takes two arguments and , both of type , and

  

Option<i32> None Some(q)

  returns an , which is either or for some

  q safe_div None

  signed 32-bit integer . If the divisor is zero, returns ;

  Some(the quotient) otherwise it does the division and returns . The only way to retrieve a value carried by an enumerated type is to

  match

  check which variant it is, and handle each case, using a state‐

  safe_div

  ment. For example, we can call like this:

  match safe_div(num, denom) { None => println!("No quotient."), Some(v) => println!("quotient is {}", v) } match switch

  You can read the here as something like a statement

  Option<T> safe_div

  that checks which variant of returned. The

  Some v

  branch assigns the value the variant carries to the variable ,

  match None

  which is local to its branch of the statement. (The var‐ iant carries no values, so it doesn’t set any local variables.)

  match

  In some cases a full-blown statement is more than we need, so

  if let

  Rust offers several alternatives with varying ergonomics. The

  while let

  and statements use matching as the condition for

  Option

  branching or looping; and the type itself provides several

  match convenience methods, which use statements under the hood.

  Rust’s standard libraries make frequent use of enumerations, to great effect; we’ll see two more real-world examples later in the section on memory safety.

Traits

  min

  When we defined our generic function above, we didn’t simply

  min<T>(a: T, b: T) -> T

  define . One could read that as “the lesser

  T

  of two values of any type ,” but not every type is well-ordered. It’s not meaningful to ask, say, which of two network sockets is the

  min<T: Ord>(...) min

  lesser. Instead we defined , indicating that only works on types whose values fall in some order relative to each

  Ord

  other. Here, the constraint is a trait: a collection of functionality that a type can implement.

  Ord

  The trait itself is pretty involved, so let’s look at a simpler (but

  IntoIterator

  quite useful) example: the standard library’s and

  Iterator

  traits. Suppose we have a table of the names of the seasons in the United States’ Pacific Northwest:

  let seasons = vec!["Spring", "Summer", "Bleakness"];

  • Types implementing the
  • Types implementing the

  HashMap

  E

  in

  for V in E { ... }

  , a type must implement

  

IntoIterator

.

  The standard library’s container types like

  Vec

  ,

  , and

  

Iterator

  LinkedList

  all implement

  

IntoIterator

  out of the box. But as an example, let’s look at what it would take to implement iteration for our

  Vec<&str> type ourselves.

  Here’s the definition of the

  Iterator

  trait from the standard library:

  trait Iterator { type Item; fn next(&mut self) -> Option<Self::Item>; fn size_hint(&self) -> (usize, Option<usize>) { ... } fn count(self) -> usize { ... } fn last(self) -> Option<Self::Item> { ... }

fn nth(&mut self, n: usize) -> Option<Self::Item> { ... }

// ... some thirty-odd other methods omitted ... } There’s a lot there, but only the first two items actually concern us.

  traversing them in whatever way is natural. To be permitted as the

  method that returns an

  This declares

  Spring Summer Bleakness

  seasons

  to be a value of type

  Vec<&str>

  , a vector of references to statically allocated strings. Here’s a loop that prints the contents of

  seasons

  :

  for elt in seasons { println!("{}", elt); }

  Perhaps obviously, this prints:

  Rust’s

  into_iter

  for

  loop isn’t limited to vectors: it can iterate over any type that meets a few key requirements. Rust captures those requirements as two traits:

  

Iterator

  trait can produce a sequence of values for a

  for

  loop to iterate over, and decide when to exit the loop.

  Iterator values hold the loop state.

  IntoIterator

  trait have an

  This definition says that, in order to implement this trait, a type must provide at least two things:

  Item

  • Its type: the type of value the iteration produces. When iterating over a vector, this would be the type of the vector’s ele‐ ments.

  next Option<Item> Some(v)

  • A method, which returns : either ,

  

v None

  where is the next value in the iteration, or if we should exit the loop.

  

self

  When defining methods, the argument is special: it refers to

  this the value on which we’re invoking the method, like in C++. Iterator next &mut self

  The trait’s method takes a argument, meaning that it takes its self value by reference, and is allowed to modify it. A method can also take its self value by shared reference

  &self

  ( ), which does not permit modification, or by value (simply

  self ). next Iterator

  Other than , all the methods in have default defini‐

  { ... }

  tions (shown as above, omitting their code) which build on

  Item next

  the and definitions we provide, so we don’t need to write them ourselves (although we could if we liked).

  Iterator

  To implement for our vector of strings, we must first define a type to represent the current loop state: the vector we’re iterating over, and the index of the element whose value we should produce in the next iteration:

  struct StrVecIter { v: Vec<&'static str>, i: usize }

  &'static str

  The type is a reference to a string literal, like the names of the seasons in our example. (We’ll cover lifetimes like

  'static

  in more detail later, but for now, take it to mean that our vectors hold only string literals, not dynamically allocated strings.)

  StrVecIter

  Now that we have the type to hold our loop state, we

  Iterator

  can implement the trait for it:

  impl Iterator for StrVecIter { type Item = &'static str; fn next(&mut self) -> Option<&'static str> { if self.i >= self.v.len() { return None;

  • A type
  • A type

  into_iter

  Item type as our own.

  into_iter

  , which produces a value of our

  IntoIter type.

  Here’s how we could implement

  IntoIterator

  for our type

  Vec<&str>

  :

  impl IntoIterator for Vec<&'static str> { type Item = &'static str; type IntoIter = StrVecIter; fn into_iter(self) -> StrVecIter { return StrVecIter { v: self, i: 0 }; } }

  This defines the

  method for

  Iterator

  Vec<&str>

  to construct a value of the

  StrVecIter

  type we defined above, pointing to our vec‐ tor and ready to start iteration at the first element; accordingly,

  StrVecIter

  is our

  IntoIter

  type. And finally, our

  Item

  type is

  &str

  , with the same

  , which holds the loop state. This must imple‐ ment

  : each iteration of the loop gets a string. We could improve on this definition by passing it the vector by ref‐

  is the value for the next iteration, or

  } self.i += 1; return Some(self.v[self.i - 1]); } }

  We’ve provided an

  Item

  type: each iteration gets another

  &'static str

  value from the vector. And we’ve provided a

  next

  method, which produces either

  Some(s)

  , where

  s

  None

  IntoIter

  , indicating that we should exit the loop. This is all we need: all the other methods appearing in the

  Iterator

  trait defi‐ nition will fall back to their default definitions. With that in place, we can implement the

  IntoIterator

  trait. Here’s the trait’s definition, from the standard library:

  trait IntoIterator { type Item; type IntoIter: Iterator<Item=Self::Item>; fn into_iter(self) -> Self::IntoIter; }

  This says that any type implementing

  IntoIterator

  must provide:

  Item

  , the type of the values produced for each iteration of the loop.

  • A method

  for

  erence, not by value; as written, the loop will move the vector

  StrVecIter

  into the value, meaning that it can no longer be used after we’ve iterated over it. We can fix this readily by having

  StrVecIter

  borrow the vector instead of taking it by value; we’ll cover borrowed references later in the report. Like functions and types, trait implementations can be generic. Rust’s standard library uses a single implementation of

  IntoIterator

  to handle vectors of any type:

  impl<T> IntoIterator for Vec<T> { type Item = T; type IntoIter = IntoIter<T>; fn into_iter(self) -> IntoIter<T> { ... } }

  Iterators are a great example of Rust’s commitment to zero-cost

  for

  abstractions. While Rust’s loop requires the type representing

  Iterator

  the loop state to implement the trait, this doesn’t imply

  for

  that any sort of virtual dispatch is taking place each time the

  next

  loop invokes the iterator’s method. As long as the compiler knows the exact type of the iterator value, it can inline the type’s

  next

  method, and we’ll get the same machine code we’d expect from a handwritten loop.

  Iterator

  Implementing does more than just allow us to connect to

  for Iterator

  loops. The default method definitions on offer a nice collection of operations on sequences of values. For example, since

  IntoIterator

  ranges implement , here’s a function that sums the

  Iterator fold

  integers in the range 1..n using ’s method:

  fn triangle(n: i32) -> i32 { (0..n+1).fold(0, |sum, i| sum + i) }

  |sum, i| sum + i

  Here, the expression is a Rust closure: an anony‐

  sum i sum

  mous function that takes two arguments, and , and returns

  • i fold fold

  . We pass this closure as ’s second argument; calls it for each value the iterator produces, passing the running total and the iterator’s value as arguments. The closure’s return value is taken as

  

fold

  the new running total, which returns when the iteration is complete. for fold

  As with the loop, this is a zero-cost abstraction: the

  triangle

  method can be inlined into , and the closure can be inlined

  fold

  into . The machine code generated for this definition is as good as that for the same loop written out by hand. Traits usually appear in Rust code as bounds on type parameters,

  Ord T

  just as the trait bounded the type variable in our definition of

  min

  earlier. Since Rust compiles generic functions by specializing them to the actual types they’re being applied to, the compiler always knows exactly which implementation of the bounding traits to use. It can inline method definitions, and in general optimize the code for the types at hand. However, you can also use traits to refer to values whose specific type isn’t determined until runtime. Here, Rust must use dynamic dispatch to find the traits’ implementations, retrieving the relevant method definition from a table at runtime, much as C++ does when calling a virtual member function. For example, the following function reads four bytes from an input

  stream

  stream , and compares them against a given sequence of bytes. One might use a function like this to check the “magic num‐ ber” bytes at the beginning of a binary file:

  use std::io::Read; use std::io::Result; fn check_magic(stream: &mut Read, magic: &[u8])

  • > Result<bool> { let mut buffer = [0; 4]; if try!(stream.read(&mut buffer)) < 4 { return Ok(false); } return Ok(&buffer == magic); }

  std::io::Read

  The standard library defines as a trait with methods

  std::istream for reading from a stream of bytes, akin to in C++. read

  This trait’s method accepts a buffer, tries to fill it with bytes from the stream, and returns the number of bytes it transferred on success, or an error code on failure.

  stream &mut Read

  Our argument’s type, , is interesting: rather than being a mutable reference to some specific type, it is a mutable refer‐

  Read

  ence to a value of any type that implements . This sort of refer‐ ence is called a trait object, and supports all the trait’s methods and operations. This allows us to use a reference to any value that imple‐

  Read check_magic ments as the first argument to .

  

At runtime, Rust represents a trait object as a pair of

pointers: one to the value itself, and the other to a table

of implementations of the trait’s methods for that val‐

ue’s type. Our call to stream.read consults this table to

find the read implementation for stream’s true type,

and calls that, passing along the trait object’s pointer to

the value as the self argument.

  Trait objects allow data structures to hold values of mixed types, where the set of possible types is open-ended. For example, the fol‐ lowing function takes a vector of values, and joins them all into a string, knowing nothing about their types other than that they

  ToString

  implement the trait :

  fn join(v: &Vec<&ToString>, sep: char) -> String { let mut s = String::new(); for i in 0..v.len() { s.push_str(&v[i].to_string()); if i + 1 < v.len() { s.push(sep); } } s }

  We can pass this a vector containing an arbitrary mix of types:

  assert_eq!(join(&vec![&0,

&std::net::Ipv4Addr::new(192,168,0,1),

&"trilobite"], ','), "0,192.168.0.1,trilobite");

  When used in this way, traits are analogous to C++’s abstract base classes or Java’s interfaces: they use dynamic dispatch to allow code to operate on values whose types can vary at runtime. But the anal‐ ogy doesn’t extend much further:

  • When traits serve as bounds on type parameters to generic functions, there’s no dynamic dispatch involved. This is the most common use of traits in Rust.
  • Whereas a type’s base classes and interfaces are fixed when it is defined, the set of traits a type implements is not. You can
define your own traits and implement them on types defined elsewhere; you can even implement your traits on primitive

  i32 types like .

Memory Safety in Rust

  Now that we’ve sketched Rust’s syntax and types, we’re ready to look at the heart of the language, the foundation for Rust’s claims to memory safety and trustworthy concurrency. We’ll focus on three key promises Rust makes about every program that passes its compile-time checks:

  • No null pointer dereferences. Your program will not crash because you tried to dereference a null pointer.
  • No dangling pointers. Every value will live as long as it must.

  Your program will never use a heap-allocated value after it has been freed.

  • No buffer overruns. Your program will never access elements beyond the end or before the start of an array.

  Rust uses a different technique to back up each of these promises, and each technique has its own impact on the way you write pro‐ grams. Let’s take a look at each one in turn.

No Null Pointer Dereferences

  The simplest way to ensure that one never dereferences a null pointer is to never permit one to be created in the first place; this is the approach Rust takes. There is no equivalent in Rust to Java’s

  null nullptr NULL

  , C++’s , or C’s . Rust does not convert integers to pointers, so one can’t just use , as in C++. The default value of a pointer is irrelevant: Rust requires you to initialize each variable before using it. And so on: the language simply doesn’t provide a way to introduce null pointers. But all those other languages include explicit support for null point‐ ers for a good reason: they’re extremely useful. A lookup function can return a null pointer to indicate that the requested value was not found. A null pointer can indicate the absence of optional informa‐ tion. Functions that always return a pointer to an object under nor‐ mal circumstances can return null to indicate failure. The problem with null pointers is that it’s easy to forget to check for them. And since null pointers are often used to signal rare condi‐ tions like running out of memory, missing checks can go unnoticed for a long time, leaving us with the worst possible combination: checks that are easy to forget, used in a way that makes the omission usually asymptomatic. If we could ensure that our programs never neglect to check for null pointers, then we could have our optional values and our error results, without the crashes.

  Rust’s solution is to separate the concept of a pointer (a kind of value that refers to some other value in memory) from the concept of an optional value, handling the latter using the standard library’s

  Option

  enumerated type, presented earlier. When we need an

  P

  optional argument, return value, or so on of some pointer type , we

  Option<P>

  use . A value of this type is not a pointer; trying to deref‐ erence it or call methods on it is a type error. Only if we check which

  match Some(p)

  variant we have in hand using a statement and find

  

p

  can we extract the actual pointer —which is now guaranteed not to be null, simply because pointers are never null. The punchline is that, under the hood, this all turns back into null pointers in the compiled code. The Rust language permits the com‐

  enum Option piler to choose any representation it likes for types like . Option<P> None

  In the case of , the compiler chooses to represent as

  

Some(p) p

  a zero value, and as the pointer value , since the one can’t be mistaken for the other. Thus, after the compiler has ensured that all the necessary checks are present in the source code, it translates it to the same representation at the machine level that C++ would use for nullable pointers. For example, here’s a definition of a singly linked list whose nodes

  T

  carry a value of some type : struct Node<T> { value: T, next: Option<Box<Node<T>>> } type List<T> = Option<Box<Node<T>>>;

  Box Box<T>

  The type is Rust’s simplest form of heap allocation: a is

  T

  an owning pointer to a heap-allocated value of type . When the box is dropped, the value in the heap it refers to is freed along with it. So

  Option<Box<Node<T>>> None

  is either , indicating the end of the list,

  Some(b) b or where is a box pointing to the next node.