Rust: From &str to Cow
- Translation
One of the first things I wrote in Rust was a struct with a &str field. As you might guess, the borrow checker did not let me do many things with it and severely limited the expressiveness of my APIs. This article aims to demonstrate the problems that arise when storing raw &str references in struct fields, and how to solve them. Along the way I will show an intermediate API that makes such structs easier to use but makes the generated code less efficient. In the end, I want to arrive at an implementation that is both expressive and efficient.
Let's imagine that we are writing a library for the example.io API, and that we sign each call with a token, which we define as follows:
// Token for the example.io API
pub struct Token<'a> {
    raw: &'a str,
}
Then we implement a function new that will create an instance of the token from a &str.
impl<'a> Token<'a> {
    pub fn new(raw: &'a str) -> Token<'a> {
        Token { raw: raw }
    }
}
Such a naive token only works well for static strings (&'static str) that are embedded directly in the binary. However, suppose the user does not want to embed the secret key in the code, or wants to load it from some secret store. We could write code like this:
// Imagine that such a function exists
let secret: String = secret_from_vault("api.example.io");
let token = Token::new(&secret[..]);
This implementation has a big limitation: the token cannot outlive the secret key, which means that it cannot leave the stack frame in which the key was created.
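The limitation can be sketched as follows. This is only an illustration: secret_from_vault is a hypothetical stand-in for the imaginary function above, and token_for is a helper name made up for this example.

```rust
// The borrowed Token from the article.
pub struct Token<'a> {
    raw: &'a str,
}

impl<'a> Token<'a> {
    pub fn new(raw: &'a str) -> Token<'a> {
        Token { raw }
    }
}

// Hypothetical stand-in for the imaginary secret store lookup.
fn secret_from_vault(_name: &str) -> String {
    String::from("s3cr3t")
}

// This would NOT compile: `secret` is dropped when the function returns,
// but the returned Token would still point into it.
//
// fn load_token() -> Token<'static> {
//     let secret = secret_from_vault("api.example.io");
//     Token::new(&secret[..])
// }

// Borrowing from the caller works, because the caller keeps the String alive.
fn token_for<'a>(secret: &'a str) -> Token<'a> {
    Token::new(secret)
}

fn main() {
    let secret = secret_from_vault("api.example.io");
    let token = token_for(&secret);
    println!("{}", token.raw);
}
```

The token is usable only while secret is still in scope, which is exactly the restriction described above.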
But what if Token stored a String instead of a &str? This would let us get rid of the struct's lifetime parameter, turning it into an owning type.
Let's make changes to Token and the new function.
struct Token {
    raw: String,
}
impl Token {
    pub fn new(raw: String) -> Token {
        Token { raw: raw }
    }
}
Every call site that provides a String must be updated:
// This works now
let token = Token::new(secret_from_vault("api.example.io"));
However, this hurts usability for &str. For example, the following code will not compile:
// does not compile
let token = Token::new("abc123");
The user of this API will need to explicitly convert the &str to a String.
let token = Token::new(String::from("abc123"));
You could try accepting &str instead of String in the new function and hiding String::from in the implementation, but then the String case becomes less convenient and requires an additional heap allocation. Let's see how it looks.
// the new function looks something like this
impl Token {
    pub fn new(raw: &str) -> Token {
        Token { raw: String::from(raw) }
    }
}
// a &str can be passed without any trouble
let token = Token::new("abc123");
// A String can still be used, but it has to be passed as a slice,
// and the new function will have to copy the data out of it
let secret = secret_from_vault("api.example.io");
let token = Token::new(&secret[..]); // inefficient!
However, there is a way to make new accept arguments of both types without allocating memory when a String is passed.
Meet Into
The standard library has a trait, Into, that will help solve our problem with new. The trait definition looks like this:
pub trait Into<T> {
    fn into(self) -> T;
}
The into function is defined quite simply: it consumes self (something that implements Into) and returns a value of type T. Here is an example of how this can be used:
impl Token {
    // Creates a new token
    //
    // Accepts both &str and String
    pub fn new<S>(raw: S) -> Token
        where S: Into<String>
    {
        Token { raw: raw.into() }
    }
}
// &str
let token = Token::new("abc123");
// String
let token = Token::new(secret_from_vault("api.example.io"));
A lot of interesting things are happening here. First, the function has an argument raw of generic type S; the where clause restricts the possible types S to those that implement the trait Into<String>.
Since the standard library already provides Into<String> for &str and String, our case is handled without any extra work. [1]
Although this API is now much more convenient to use, there is still a noticeable flaw in it: passing a &str to new requires a memory allocation in order to store it as a String.
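The difference between the two cases can be observed by comparing buffer addresses. The following is a minimal sketch built on the Token from this section; the pointer comparison is my own illustration and not part of the original API.

```rust
pub struct Token {
    raw: String,
}

impl Token {
    pub fn new<S>(raw: S) -> Token
    where
        S: Into<String>,
    {
        Token { raw: raw.into() }
    }
}

fn main() {
    // String: into() just moves the value, so the heap buffer is reused.
    let secret = String::from("abc123");
    let secret_ptr = secret.as_ptr();
    let owned = Token::new(secret);
    assert_eq!(owned.raw.as_ptr(), secret_ptr);

    // &str: into() has to allocate a new String and copy the bytes into it.
    let literal: &str = "abc123";
    let copied = Token::new(literal);
    assert_ne!(copied.raw.as_ptr(), literal.as_ptr());
    assert_eq!(copied.raw, literal);
}
```

The second case is the allocation we would like to avoid for string literals.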
The Cow type will save us [2]
The standard library has a special container called std::borrow::Cow, which lets us keep the convenience of Into<String> on the one hand and allows the struct to hold values of type &str on the other.
Here is a scary looking Cow definition:
pub enum Cow<'a, B> where B: 'a + ToOwned + ?Sized {
    Borrowed(&'a B),
    Owned(B::Owned),
}
Let's understand this definition:
Cow<'a, B> has two generic parameters: the lifetime 'a and some generic type B, which has the following bounds: 'a + ToOwned + ?Sized.
Let's look at them in more detail:
- 'a: Type B cannot have a lifetime shorter than 'a.
- ToOwned: B must implement the ToOwned trait, which allows borrowed data to be turned into owned data by making a copy of it.
- ?Sized: The size of type B may not be known at compile time. This is what allows dynamically sized types such as str to be used as B.
The Cow container can hold one of two variants:
- Borrowed(&'a B): A reference to an object of type B; the container cannot outlive the value it borrows from.
- Owned(B::Owned): The container owns a value of the associated type B::Owned.
To make this concrete, here is what the definition amounts to when B is str (in which case B::Owned is String):
// Illustrative only: Cow with B = str written out (not valid Rust syntax)
enum Cow<'a, str> {
    Borrowed(&'a str),
    Owned(String),
}
In short, Cow<'a, str> will be either a &str with lifetime 'a, or a String that is not tied to this lifetime.
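As a small illustration of the two variants, the sketch below builds one Cow<str> of each kind and shows that both can be read like a &str. The helper name variant is invented for this example.

```rust
use std::borrow::Cow;

// Report which variant a Cow<str> currently holds.
fn variant(value: &Cow<'_, str>) -> &'static str {
    match value {
        Cow::Borrowed(_) => "borrowed",
        Cow::Owned(_) => "owned",
    }
}

fn main() {
    let text = String::from("abc123");

    // Borrowed: points into `text`, so it cannot outlive it.
    let borrowed: Cow<str> = Cow::Borrowed(&text[..]);
    // Owned: carries its own String and is not tied to any other value.
    let owned: Cow<str> = Cow::Owned(String::from("abc123"));

    // Both dereference to str, so they are read in the same way.
    assert_eq!(borrowed.len(), owned.len());
    assert_eq!(&*borrowed, &*owned);

    println!("{} / {}", variant(&borrowed), variant(&owned));
}
```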
That sounds great for our Token type. It will be able to store both a &str and a String.
use std::borrow::Cow;

struct Token<'a> {
    raw: Cow<'a, str>
}
impl<'a> Token<'a> {
    pub fn new(raw: Cow<'a, str>) -> Token<'a> {
        Token { raw: raw }
    }
}
// creating these tokens
let token = Token::new(Cow::Borrowed("abc123"));
let secret: String = secret_from_vault("api.example.io");
let token = Token::new(Cow::Owned(secret));
Now Token can be created from both an owned type and a borrowed type, but the API has become less convenient to use. Into can bring the same improvement to our Cow<'a, str> as it did for a plain String earlier. The final implementation of the token looks like this:
use std::borrow::Cow;

struct Token<'a> {
    raw: Cow<'a, str>
}
impl<'a> Token<'a> {
    pub fn new<S>(raw: S) -> Token<'a>
        where S: Into<Cow<'a, str>>
    {
        Token { raw: raw.into() }
    }
}
// creating tokens
let token = Token::new("abc123");
let token = Token::new(secret_from_vault("api.example.io"));
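To check that this version really avoids the copy for string slices, a small sketch can inspect which Cow variant each call produces. The is_borrowed method is a hypothetical helper added only for this check; it is not part of the API described in the article.

```rust
use std::borrow::Cow;

pub struct Token<'a> {
    raw: Cow<'a, str>,
}

impl<'a> Token<'a> {
    pub fn new<S>(raw: S) -> Token<'a>
    where
        S: Into<Cow<'a, str>>,
    {
        Token { raw: raw.into() }
    }

    // Hypothetical helper: true when the token only borrows its data.
    pub fn is_borrowed(&self) -> bool {
        matches!(self.raw, Cow::Borrowed(_))
    }
}

fn main() {
    // A &str becomes Cow::Borrowed: nothing is allocated or copied.
    let from_literal = Token::new("abc123");
    // A String becomes Cow::Owned: the existing String is moved in.
    let from_string = Token::new(String::from("abc123"));

    assert!(from_literal.is_borrowed());
    assert!(!from_string.is_borrowed());
}
```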
Now the token can be transparently created from both a &str and a String. Token-related lifetimes are no longer an issue for data created on the stack. You can even send a token between threads!
// Assumes `use std::thread;` and `#[derive(Debug)]` on Token
let raw = String::from("abc");
let token_owned = Token::new(raw);
let token_static = Token::new("123");
thread::spawn(move || {
    println!("token_owned: {:?}", token_owned);
    println!("token_static: {:?}", token_static);
}).join().unwrap();
However, an attempt to send a token that holds a reference with a non-static lifetime will fail.
// Make a reference with a non-static lifetime
let raw = String::from("abc");
let s = &raw[..];
let token = Token::new(s);
// This will not work
thread::spawn(move || {
    println!("token: {:?}", token);
}).join().unwrap();
Indeed, the example above fails to compile with the following error:
error: `raw` does not live long enough
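One possible way around this error, which the original article does not cover, is to copy the borrowed data into an owned token before moving it to another thread. The sketch below assumes a hypothetical into_owned method on Token; it relies on Cow::into_owned from the standard library.

```rust
use std::borrow::Cow;
use std::thread;

#[derive(Debug)]
pub struct Token<'a> {
    raw: Cow<'a, str>,
}

impl<'a> Token<'a> {
    pub fn new<S>(raw: S) -> Token<'a>
    where
        S: Into<Cow<'a, str>>,
    {
        Token { raw: raw.into() }
    }

    // Hypothetical helper: clone borrowed data (if any) so that the
    // resulting token owns its String and is valid for 'static.
    pub fn into_owned(self) -> Token<'static> {
        Token { raw: Cow::Owned(self.raw.into_owned()) }
    }
}

fn main() {
    let raw = String::from("abc");
    let token = Token::new(&raw[..]); // borrows from `raw`
    let owned = token.into_owned(); // no longer borrows from `raw`

    let handle = thread::spawn(move || {
        println!("token: {:?}", owned);
        owned.raw.len()
    });
    assert_eq!(handle.join().unwrap(), 3);
}
```

The cost is one allocation at the point where the token has to outlive its source, and only when the token was actually borrowed.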
If you crave more examples, please take a look at the PagerDuty API client, which uses Cow heavily.
Thank you for reading!
[1] If you go looking for implementations of Into<String> for &str and String, you will not find them. This is because there is a blanket implementation of Into for all types that implement the From trait; it looks like this:
impl<T, U> Into<U> for T where U: From<T> {
    fn into(self) -> U {
        U::from(self)
    }
}
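As an illustration of this blanket implementation, the sketch below implements only From for a made-up Secret newtype and then passes it to the Into<String>-based Token::new from the article. Secret is hypothetical and exists only for this example.

```rust
// Hypothetical newtype used only for this illustration.
pub struct Secret(String);

// We implement From and nothing else.
impl From<Secret> for String {
    fn from(secret: Secret) -> String {
        secret.0
    }
}

pub struct Token {
    raw: String,
}

impl Token {
    pub fn new<S>(raw: S) -> Token
    where
        S: Into<String>,
    {
        Token { raw: raw.into() }
    }
}

fn main() {
    // Secret: Into<String> holds thanks to the blanket implementation.
    let token = Token::new(Secret(String::from("abc123")));
    assert_eq!(token.raw, "abc123");
}
```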
[2] Translator's note: the original article says nothing about how Cow works or about copy-on-write semantics.
In short: a Cow that borrows its data does not copy it; the data is cloned into an owned value only when something tries to modify the value stored inside the container or to take ownership of it.
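The copy-on-write behaviour can be sketched with Cow::to_mut from the standard library, which clones borrowed data the first time mutable access is requested. The variable names here are arbitrary.

```rust
use std::borrow::Cow;

fn main() {
    let original = String::from("abc");

    // Start out borrowing: no data has been copied yet.
    let mut value: Cow<str> = Cow::Borrowed(original.as_str());
    assert!(matches!(value, Cow::Borrowed(_)));

    // The first mutation clones the data into an owned String.
    value.to_mut().push_str("123");
    assert!(matches!(value, Cow::Owned(_)));

    // The copy was modified; the original is untouched.
    assert_eq!(value, "abc123");
    assert_eq!(original, "abc");
}
```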