Rust: From &str to Cow

Original author: Joe Wilm

One of the first things I wrote in Rust was a struct with a &str field. As you might guess, the borrow checker did not let me do many things with it and severely limited the expressiveness of my APIs. This article demonstrates the problems that arise when storing raw &str references in struct fields, and how to solve them. Along the way I will show an intermediate API that makes such structs easier to use but makes the generated code less efficient. In the end, I want to arrive at an implementation that is both expressive and efficient.


Let's imagine that we are writing a library for working with the example.io API, and that we sign each call with a token, which we define as follows:


// Token for the example.io API
pub struct Token<'a> {
    raw: &'a str,
}

Then we implement a function new that creates an instance of the token from a &str.


impl<'a> Token<'a> {
    pub fn new(raw: &'a str) -> Token<'a> {
        Token { raw: raw }
    }
}

Such a naive token only works well for static strings (&'static str) that are embedded directly in the binary. However, suppose the user does not want to embed the secret key in the code, or wants to load it from some secret store. We could write code like this:


// Imagine that such a function exists
let secret: String = secret_from_vault("api.example.io");
let token = Token::new(&secret[..]);

Such an implementation has a big limitation: the token cannot outlive the secret key, which means that it cannot leave this stack frame.
But what if Token stored a String instead of a &str? That would let us get rid of the lifetime parameter on the struct, turning it into an owning type.
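To make the limitation concrete, here is a minimal sketch. The body of secret_from_vault and the helper token_len are stand-ins invented for this illustration. The token can be used freely while secret is alive, but a function cannot return a Token that borrows from one of its own locals.

```rust
// Borrowing token, as defined above.
pub struct Token<'a> {
    raw: &'a str,
}

impl<'a> Token<'a> {
    pub fn new(raw: &'a str) -> Token<'a> {
        Token { raw }
    }
}

// Hypothetical stand-in for a real secret store lookup.
fn secret_from_vault(_key: &str) -> String {
    String::from("s3cr3t")
}

// Fine: the token is created and used while `secret` is still alive.
fn token_len() -> usize {
    let secret = secret_from_vault("api.example.io");
    let token = Token::new(&secret[..]);
    token.raw.len()
}

// Does not compile if uncommented: the returned token would borrow from
// `secret`, which is dropped when the function returns.
//
// fn make_token<'a>() -> Token<'a> {
//     let secret = secret_from_vault("api.example.io");
//     Token::new(&secret[..])
// }

fn main() {
    println!("token length: {}", token_len());
}
```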


Let's make changes to Token and the new function.


struct Token {
    raw: String,
}
impl Token {
    pub fn new(raw: String) -> Token {
        Token { raw: raw }
    }
}

Every call site that supplies a String has to be updated accordingly:


// This works now
let token = Token::new(secret_from_vault("api.example.io"));

However, this hurts usability with &str. For example, the following code will not compile:


// does not compile
let token = Token::new("abc123");

The user of this API will need to explicitly convert the &str to a String.


let token = Token::new(String::from("abc123"));

You could try to accept a &str instead of a String in new and hide the String::from call in the implementation, but for callers that already have a String this is less convenient and requires an additional heap allocation. Let's see how it looks.


// the new function looks something like this
impl Token {
    pub fn new(raw: &str) -> Token {
        Token { raw: String::from(raw) }
    }
}
// a &str can be passed in directly
let token = Token::new("abc123");
// A String can still be used, but it has to be passed as a slice,
// and new will have to copy the data out of it
let secret = secret_from_vault("api.example.io");
let token = Token::new(&secret[..]); // inefficient!
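The extra copy can be observed by comparing buffer addresses. The sketch below is only an illustration; copies_input is a helper name made up for this purpose.

```rust
struct Token {
    raw: String,
}

impl Token {
    // Accepts a slice and always copies it into a fresh String.
    pub fn new(raw: &str) -> Token {
        Token { raw: String::from(raw) }
    }
}

// Hypothetical helper: returns true if the token ended up with its own
// copy of the bytes instead of sharing the caller's buffer.
fn copies_input(secret: &str) -> bool {
    let token = Token::new(secret);
    token.raw.as_ptr() != secret.as_ptr()
}

fn main() {
    let secret = String::from("abc123");
    println!("copied: {}", copies_input(&secret));
}
```

Even though the caller already owns a heap-allocated String, new cannot take it over, so the bytes are duplicated.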

However, there is a way to force new to accept arguments of both types without the need for memory allocation in the case of passing a String.


Meet Into


The standard library has a trait, Into, that will help solve our problem with new. The trait definition looks like this:


pub trait Into<T> {
    fn into(self) -> T;
}

The into function is defined quite simply: it consumes self (something that implements Into) and returns a value of type T. Here is an example of how this can be used:


impl Token {
    // Create a new token.
    //
    // Accepts either a &str or a String.
    pub fn new<S>(raw: S) -> Token
        where S: Into<String>
    {
        Token { raw: raw.into() }
    }
}
// &str
let token = Token::new("abc123");
// String
let token = Token::new(secret_from_vault("api.example.io"));

A lot of interesting things are happening here. First, the function has an argument raw of generic type S; the where clause restricts the possible types S to those that implement Into<String>. Since the standard library already provides Into<String> for &str and String, our case is handled without any extra work. [1] Although this API is now much more convenient to use, it still has a noticeable flaw: passing a &str to new requires allocating memory to store it as a String.
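The following sketch checks the claim that passing a String costs nothing extra: converting a String into a String is a plain move, so the heap buffer is reused. The helper reuses_buffer is a name invented for this illustration.

```rust
struct Token {
    raw: String,
}

impl Token {
    pub fn new<S>(raw: S) -> Token
        where S: Into<String>
    {
        Token { raw: raw.into() }
    }
}

// Hypothetical helper: returns true if the token reuses the heap buffer
// of the String it was given.
fn reuses_buffer(secret: String) -> bool {
    let before = secret.as_ptr();
    let token = Token::new(secret); // String -> String is a plain move
    token.raw.as_ptr() == before
}

fn main() {
    println!("String buffer reused: {}", reuses_buffer(String::from("abc123")));
    let token = Token::new("abc123"); // &str -> String allocates and copies
    println!("token from &str: {}", token.raw);
}
```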


The Cow type will save us [2]


The standard library has a special container called std::borrow::Cow, which lets us keep the convenience of Into<String> on the one hand and lets the struct hold values of type &str on the other.


Here is the scary-looking definition of Cow:


pub enum Cow<'a, B> where B: 'a + ToOwned + ?Sized {
    Borrowed(&'a B),
    Owned(B::Owned),
}

Let's understand this definition:


Cow<'a, B> has two generic parameters: a lifetime 'a and a generic type B, which is subject to the following bounds: 'a + ToOwned + ?Sized.
Let's look at them in more detail:


  • 'a: the type B cannot have a lifetime shorter than 'a.
  • ToOwned: B must implement the ToOwned trait, which allows borrowed data to be turned into owned data by making a copy of it.
  • ?Sized: the size of the type B may be unknown at compile time. This is what allows Cow to wrap dynamically sized types such as str, which is exactly our case.

There are two variants for the values that a Cow container can hold.


  • Borrowed(&'a B): a reference to a value of type B; the lifetime of the container is tied to the lifetime of the value it refers to.
  • Owned(B::Owned): the container owns a value of the associated type B::Owned.

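// For B = str, B::Owned is String, so Cow<'a, str> is effectively
// the following (illustration only, not valid Rust as written):
enum Cow<'a, str> {
    Borrowed(&'a str),
    Owned(String),
}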
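As a small sketch of the two variants in practice (describe is a helper name made up for this example), both can be matched on directly, and both dereference to &str:

```rust
use std::borrow::Cow;

// Hypothetical helper: reports which variant a Cow<str> currently holds.
fn describe(value: &Cow<'_, str>) -> &'static str {
    match value {
        Cow::Borrowed(_) => "borrowed",
        Cow::Owned(_) => "owned",
    }
}

fn main() {
    let borrowed: Cow<str> = Cow::Borrowed("abc123");
    let owned: Cow<str> = Cow::Owned(String::from("abc123"));

    // Cow<str> dereferences to str, so str methods work on both variants.
    println!("{} ({} bytes)", describe(&borrowed), borrowed.len());
    println!("{} ({} bytes)", describe(&owned), owned.len());
}
```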

In short, a Cow<'a, str> is either a &str with lifetime 'a or a String that is not tied to that lifetime.
That sounds great for our Token type: it will be able to store either a &str or a String.


use std::borrow::Cow;

struct Token<'a> {
    raw: Cow<'a, str>
}
impl<'a> Token<'a> {
    pub fn new(raw: Cow<'a, str>) -> Token<'a> {
        Token { raw: raw }
    }
}
// creating these tokens
let token = Token::new(Cow::Borrowed("abc123"));
let secret: String = secret_from_vault("api.example.io");
let token = Token::new(Cow::Owned(secret));

Now a Token can be created from both an owned type and a borrowed type, but the API has become less convenient to use.
Into can bring the same improvement to our Cow<'a, str> as it did to the plain String earlier. The final implementation of the token looks like this:


use std::borrow::Cow;

// Debug is derived so that the token can be printed with {:?}
#[derive(Debug)]
struct Token<'a> {
    raw: Cow<'a, str>
}
impl<'a> Token<'a> {
    pub fn new<S>(raw: S) -> Token<'a>
        where S: Into<Cow<'a, str>>
    {
        Token { raw: raw.into() }
    }
}
// creating tokens
let token = Token::new("abc123");
let token = Token::new(secret_from_vault("api.example.io"));
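To check which variant each call produces, here is a self-contained sketch of the same type. The is_borrowed method is a helper added only for this illustration; it is not part of the article's API.

```rust
use std::borrow::Cow;

struct Token<'a> {
    raw: Cow<'a, str>,
}

impl<'a> Token<'a> {
    pub fn new<S>(raw: S) -> Token<'a>
        where S: Into<Cow<'a, str>>
    {
        Token { raw: raw.into() }
    }

    // Hypothetical helper: true if the token only borrows its data.
    pub fn is_borrowed(&self) -> bool {
        matches!(self.raw, Cow::Borrowed(_))
    }
}

fn main() {
    let from_str = Token::new("abc123");
    let from_string = Token::new(String::from("abc123"));
    println!("&str   -> borrowed: {}", from_str.is_borrowed());
    println!("String -> borrowed: {}", from_string.is_borrowed());
}
```

A &str is stored as Cow::Borrowed without any allocation, while a String is moved into Cow::Owned.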

Now the token can be created transparently from both a &str and a String. Lifetimes tied to the token are no longer an issue for data created on the stack. You can even send a token between threads!


use std::thread;

let raw = String::from("abc");
let token_owned = Token::new(raw);
let token_static = Token::new("123");
thread::spawn(move || {
    println!("token_owned: {:?}", token_owned);
    println!("token_static: {:?}", token_static);
}).join().unwrap();

However, an attempt to send a token that holds a reference with a non-static lifetime will fail.


// Create a reference with a non-static lifetime
let raw = String::from("abc");
let s = &raw[..];
let token = Token::new(s);
// This will not work
thread::spawn(move || {
    println!("token: {:?}", token);
}).join().unwrap();

Indeed, the example above fails to compile with the following error:


error: `raw` does not live long enough
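One possible way around this, sketched here as an assumption rather than as part of the original API, is to convert the token into one that owns its data before sending it. The into_owned method on Token and the send_to_thread function are hypothetical names; Cow::into_owned is the standard library call doing the actual work.

```rust
use std::borrow::Cow;
use std::thread;

struct Token<'a> {
    raw: Cow<'a, str>,
}

impl<'a> Token<'a> {
    pub fn new<S>(raw: S) -> Token<'a>
        where S: Into<Cow<'a, str>>
    {
        Token { raw: raw.into() }
    }

    // Hypothetical helper: copy any borrowed data so that the token owns it
    // and no longer depends on the lifetime 'a.
    pub fn into_owned(self) -> Token<'static> {
        Token { raw: Cow::Owned(self.raw.into_owned()) }
    }
}

// Builds a token from a short-lived slice and reads it on another thread.
fn send_to_thread() -> String {
    let raw = String::from("abc");
    let token = Token::new(&raw[..]).into_owned();
    thread::spawn(move || token.raw.into_owned()).join().unwrap()
}

fn main() {
    println!("token from thread: {}", send_to_thread());
}
```

The cost is one allocation at the point where the token has to cross the thread boundary, and only if it was borrowed in the first place.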

If you crave more examples, take a look at the PagerDuty API client, which uses Cow heavily.


Thank you for reading!


Notes

1


If you go looking for the Into<String> implementations for &str and String, you will not find them. This is because there is a blanket implementation of Into for all types that implement the From trait; it looks like this:


impl<T, U> Into<U> for T where U: From<T> {
    fn into(self) -> U {
        U::from(self)
    }
}
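The same mechanism works for user-defined types. In the sketch below, Secret and takes_into are names invented for this illustration: implementing From<Secret> for String is enough for Secret to satisfy an Into<String> bound.

```rust
// Hypothetical newtype used only for this illustration.
struct Secret(String);

// Implementing From<Secret> for String...
impl From<Secret> for String {
    fn from(secret: Secret) -> String {
        secret.0
    }
}

// ...is enough for Secret to satisfy an Into<String> bound.
fn takes_into<S: Into<String>>(value: S) -> String {
    value.into()
}

fn main() {
    println!("{}", takes_into(Secret(String::from("abc123"))));
    println!("{}", takes_into("abc123"));
}
```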

2


Translator's note: the original article says nothing about how Cow works or about copy-on-write semantics.
In short, a Cow that borrows its data does not copy it; the data is copied into an owned value only when something tries to modify the value stored inside the container.
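A minimal sketch of that behaviour, using the standard Cow::to_mut method; the function ensure_prefix and the "tok_" prefix are made up for this example:

```rust
use std::borrow::Cow;

// Hypothetical helper: makes sure the value starts with "tok_",
// copying the input only when it actually has to be changed.
fn ensure_prefix(input: &str) -> Cow<'_, str> {
    let mut value = Cow::Borrowed(input);
    if !input.starts_with("tok_") {
        // to_mut() clones the borrowed data into a String on first write.
        value.to_mut().insert_str(0, "tok_");
    }
    value
}

fn main() {
    // Already prefixed: stays borrowed, nothing is copied.
    println!("{}", ensure_prefix("tok_abc"));
    // Needs a change: becomes an owned String.
    println!("{}", ensure_prefix("abc"));
}
```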

