6 things you can do with the Cow 🐄 in Rust 🦀

This content originally appeared on DEV Community 👩‍💻👨‍💻 and was authored by Konstantin Grechishchev

The Cow type is a mystery even for some intermediate-level Rust developers. It is defined as a simple two-variant enum:

pub enum Cow<'a, B>
where
    B: 'a + ToOwned + ?Sized,
{
    Borrowed(&'a B),
    Owned(<B as ToOwned>::Owned),
}

Despite this simple definition, it challenges developers to understand ownership and lifetimes, as well as the equally mysterious Borrow and ToOwned traits. As a result, programmers avoid using Cow, which often leads to extra memory allocations (which are not cheap) and less efficient software.

What are the situations when you might consider using Cow? Why does it have such a strange name? Let's try to find some answers today!

A function rarely modifying the data

Let's start with the most common and straightforward use case for the Cow type. It is a good illustration of the situation in which most developers (including me!) encounter Cow for the first time.

Consider the following function accepting and modifying the borrowed data (in this case &str):

fn remove_whitespaces(s: &str) -> String {
    s.to_string().replace(' ', "")
}

fn main() {
    let value = remove_whitespaces("Hello world!");
    println!("{}", value);
}

As you can see, it does nothing but remove all spaces from the string. What is wrong with it? What if in 99.9% of calls the string contains no spaces at all? Or consider a slight modification of the function in which the spaces are removed only when some other condition holds.

In such cases, we could avoid the to_string() call and the creation of an unnecessary copy of the string. However, if we are to implement such logic, we can use neither the String nor the &str type: the former forces a memory allocation and the latter is immutable.

This is the moment when Cow comes into play. We can return Cow::Owned when the string is modified and Cow::Borrowed(s) otherwise:

use std::borrow::Cow;

fn remove_whitespaces(s: &str) -> Cow<str> {
    if s.contains(' ') {
        Cow::Owned(s.to_string().replace(' ', ""))
    } else {
        Cow::Borrowed(s)
    }
}

fn main() {
    let value = remove_whitespaces("Hello world!");
    println!("{}", value);
}

The nice thing about Cow<str> is that it can always be dereferenced into &str later or converted into a String by calling into_owned. into_owned allocates memory only if the string was borrowed; if it was already owned, the existing String is returned as is.
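As a minimal sketch of these two conversions, the snippet below reuses remove_whitespaces from above. The describe helper is a hypothetical addition, used only to report which variant was returned:

```rust
use std::borrow::Cow;

fn remove_whitespaces(s: &str) -> Cow<'_, str> {
    if s.contains(' ') {
        Cow::Owned(s.replace(' ', ""))
    } else {
        Cow::Borrowed(s)
    }
}

// Hypothetical helper: reports which variant a Cow currently holds.
fn describe(value: &Cow<'_, str>) -> &'static str {
    match value {
        Cow::Borrowed(_) => "borrowed",
        Cow::Owned(_) => "owned",
    }
}

fn main() {
    let untouched = remove_whitespaces("Helloworld!");
    let modified = remove_whitespaces("Hello world!");
    println!("{} / {}", describe(&untouched), describe(&modified));

    // Deref coercion: a &Cow<str> can be used where a &str is expected.
    let view: &str = &untouched;
    println!("{}", view.len());

    // into_owned copies the borrowed value and unwraps the owned one.
    let a: String = untouched.into_owned();
    let b: String = modified.into_owned();
    println!("{} {}", a, b);
}
```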

A struct optionally owning the data

We often need to store references inside structs. If we avoid doing so, we are likely to end up cloning the data unnecessarily.

Consider

struct User<'a> {
    first_name: &'a str,
    last_name: &'a str,
}

Wouldn't it be nice to be able to create a User<'static> that owns its data? This way we could implement a function do_something_with_user(user) that accepts the same struct regardless of whether the data is owned or borrowed. Unfortunately, with the definition above the only way to create a User<'static> is to use &'static str.

But what if we have a String? We can solve the problem by storing Cow<'a, str> instead of &'a str inside the struct:

use std::borrow::Cow;

struct User<'a> {
    first_name: Cow<'a, str>,
    last_name: Cow<'a, str>,
}

This way, we can construct both owned and borrowed versions of the User struct:

impl<'a> User<'a> {

    pub fn new_owned(first_name: String, last_name: String) -> User<'static> {
        User {
            first_name: Cow::Owned(first_name),
            last_name: Cow::Owned(last_name),
        }
    }

    pub fn new_borrowed(first_name: &'a str, last_name: &'a str) -> Self {
        Self {
            first_name: Cow::Borrowed(first_name),
            last_name: Cow::Borrowed(last_name),
        }
    }


    pub fn first_name(&self) -> &str {
        &self.first_name
    }
    pub fn last_name(&self) -> &str {
        &self.last_name
    }
}


fn main() {
    // Static lifetime as it owns the data
    let user: User<'static> = User::new_owned("James".to_owned(), "Bond".to_owned());
    println!("Name: {} {}", user.first_name, user.last_name);

    // Static lifetime as it borrows 'static data
    let user: User<'static> = User::new_borrowed("Felix", "Leiter");
    println!("Name: {} {}", user.first_name, user.last_name);

    let first_name = "Eve".to_owned();
    let last_name = "Moneypenny".to_owned();

    // Non-static lifetime as it borrows the data
    let user = User::new_borrowed(&first_name, &last_name);
    println!("Name: {} {}", user.first_name, user.last_name);
}
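If a borrowed User later has to outlive the data it points to, it can be upgraded to an owning one. The into_owned method and the make_user function below are hypothetical additions (they are not part of the example above), sketched under the assumption that cloning the two fields at that point is acceptable:

```rust
use std::borrow::Cow;

struct User<'a> {
    first_name: Cow<'a, str>,
    last_name: Cow<'a, str>,
}

impl<'a> User<'a> {
    pub fn new_borrowed(first_name: &'a str, last_name: &'a str) -> Self {
        Self {
            first_name: Cow::Borrowed(first_name),
            last_name: Cow::Borrowed(last_name),
        }
    }

    // Hypothetical helper: clones any borrowed field so the result owns its data.
    pub fn into_owned(self) -> User<'static> {
        User {
            first_name: Cow::Owned(self.first_name.into_owned()),
            last_name: Cow::Owned(self.last_name.into_owned()),
        }
    }
}

// Hypothetical helper: builds a user from local strings and returns an owning copy.
fn make_user() -> User<'static> {
    let first_name = String::from("Eve");
    let last_name = String::from("Moneypenny");
    let borrowed = User::new_borrowed(&first_name, &last_name);
    // `borrowed` cannot be returned as is, because it refers to local variables.
    borrowed.into_owned()
}

fn main() {
    let user = make_user();
    println!("Name: {} {}", user.first_name, user.last_name);
}
```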

A clone on write struct

The examples above illustrate only one side of Cow: the ability to represent data whose borrowed or owned status is determined not at compile time, but at runtime.

But why is it named Cow then? Cow stands for clone on write (often also read as copy on write).

The true power of Cow comes with the to_mut method. If the Cow is owned, it simply returns a mutable reference to the underlying data; if it is borrowed, the data is first cloned into the owned form.

This allows you to build an interface around structures that lazily store references to the data and clone it only when a mutation is first required, if it ever is.
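A minimal sketch of this behavior on a Cow<str> follows. The ensure_suffix function is a hypothetical example, not something from the standard library:

```rust
use std::borrow::Cow;

// Hypothetical helper: appends a suffix only when the text does not already end with it.
fn ensure_suffix<'a>(text: &'a str, suffix: &str) -> Cow<'a, str> {
    let mut value = Cow::Borrowed(text);
    if !value.ends_with(suffix) {
        // The first call to to_mut clones the borrowed str into a String.
        value.to_mut().push_str(suffix);
    }
    value
}

fn main() {
    let unchanged = ensure_suffix("report.txt", ".txt");
    let changed = ensure_suffix("report", ".txt");
    println!("{} (borrowed: {})", unchanged, matches!(unchanged, Cow::Borrowed(_)));
    println!("{} (borrowed: {})", changed, matches!(changed, Cow::Borrowed(_)));
}
```

When the suffix is already present, to_mut is never called and the function returns without allocating.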

Consider code that receives a buffer of data in the form of &[u8]. We would like to pass it through some logic that conditionally modifies the data (e.g. by appending a few bytes) and then consume the buffer as &[u8]. Similar to the example above, we can't keep the buffer as &[u8] because we won't be able to modify it, but converting it to Vec<u8> would lead to a copy being made every time.

We can achieve the required behavior by representing the data as Cow<[u8]>:

use std::borrow::Cow;

struct LazyBuffer<'a> {
    data: Cow<'a, [u8]>,
}

impl<'a> LazyBuffer<'a> {

    pub fn new(data: &'a[u8]) -> Self {
        Self {
            data: Cow::Borrowed(data),
        }
    }

    pub fn data(&self) -> &[u8] {
        &self.data
    }

    pub fn append(&mut self, data: &[u8]) {
        self.data.to_mut().extend(data)
    }
}

This way we can pass borrowed data around without cloning up until the moment when (and if) we need to modify it:

fn main() {
    let data = vec![0u8; 10];

    // No memory copied yet
    let mut buffer = LazyBuffer::new(&data);
    println!("{:?}", buffer.data());

    // The data is cloned
    buffer.append(&[1, 2, 3]);
    println!("{:?}", buffer.data());

    // The data is not cloned on further attempts
    buffer.append(&[4, 5, 6]);
    println!("{:?}", buffer.data());
}

Keep your own type inside it

Most likely you would end up using Cow<str> or Cow<[u8]>, but there are cases when you might want to store your own type inside it.

In order to use Cow with a user-defined type, you need to implement an owned and a borrowed version of it. The two versions must be tied together by the following trait bounds:

  • The owned version should implement the Borrow trait to produce a reference to the borrowed type.
  • The borrowed version should implement the ToOwned trait to produce the owned type.
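The standard library ties String to str and Vec<u8> to [u8] in exactly this way. The sketch below only illustrates the two bounds; make_owned and view_len are hypothetical function names:

```rust
use std::borrow::Borrow;

// Works for any borrowed type that knows how to produce its owned counterpart.
fn make_owned<B: ToOwned + ?Sized>(borrowed: &B) -> B::Owned {
    borrowed.to_owned()
}

// Works for any owned type that can hand out a reference to the borrowed form.
fn view_len<O: Borrow<str>>(owned: &O) -> usize {
    let view: &str = owned.borrow();
    view.len()
}

fn main() {
    // str -> String
    let owned: String = make_owned("cow");
    // [u8] -> Vec<u8>
    let bytes: Vec<u8> = make_owned(&[1u8, 2][..]);
    println!("{} {:?} {}", owned, bytes, view_len(&owned));
}
```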

Implementing the Borrow trait is tricky and often requires unsafe code. Indeed, in order for fn borrow(&self) -> &Borrowed to return a reference to the Borrowed type, the referenced value must either be stored inside self or be produced unsafely.

In practice this often means that the borrowed type is an unsized (also known as dynamically sized) type. The size of such a type is not known at compile time, so its values can only exist behind a pointer or a reference.

Have you ever wondered why we use &str everywhere and almost never use str? You can't find the definition of the str type in the standard library, because it is a primitive type (part of the language). Since str is a dynamically sized type, it can only be used through a pointer type, such as &str. The trait object dyn T is another example of a dynamically sized type.
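One way to observe the difference is to compare the sizes of references. The sketch below assumes nothing beyond the standard library; is_fat_reference is a hypothetical helper name:

```rust
use std::fmt::Debug;
use std::mem::{size_of, size_of_val};

// Hypothetical helper: true when a reference to T carries extra metadata
// (a length or a vtable pointer) next to the data pointer.
fn is_fat_reference<T: ?Sized>() -> bool {
    size_of::<&T>() > size_of::<usize>()
}

fn main() {
    // A reference to a sized type is a single pointer.
    println!("&u8 fat: {}", is_fat_reference::<u8>());
    // References to dynamically sized types also store a length or a vtable pointer.
    println!("&str fat: {}", is_fat_reference::<str>());
    println!("&[u8] fat: {}", is_fat_reference::<[u8]>());
    println!("&dyn Debug fat: {}", is_fat_reference::<dyn Debug>());

    // The size of a str value is only known at runtime.
    println!("{}", size_of_val("Hello"));
}
```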

Imagine you would like to implement your own version of the String and str types.

use std::borrow::{Borrow, Cow};
use std::ops::Deref;

#[derive(Debug)]
struct MyString {
    data: String
}

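// repr(transparent) guarantees that MyStr has the same layout as str,
// which the pointer cast in the Borrow implementation relies on.
#[derive(Debug)]
#[repr(transparent)]
struct MyStr {
    data: str,
}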

Since str is unsized, so is MyStr. You can then tie MyString and MyStr together the same way String and str are tied together:

impl Borrow<MyStr> for MyString {
    fn borrow(&self) -> &MyStr {
        unsafe { &*(self.data.as_str() as *const str as *const MyStr) }
    }
}

impl ToOwned for MyStr {
    type Owned = MyString;

    fn to_owned(&self) -> MyString {
        MyString {
            data: self.data.to_owned()
        }
    }
}

The unsafe pointer cast inside the borrow method has probably drawn your attention. While it looks scary, it is a usual pattern in the standard library (have a look at, e.g., the Path type implementation). The cast is sound only when MyStr is guaranteed to have the same layout as str, which a single-field struct gets by being marked #[repr(transparent)]. Under that guarantee, we can cast a valid pointer to str to a pointer to MyStr and then convert it to a reference.

We could also optionally implement the Deref trait for convenience and store MyString and MyStr in a Cow as well, taking advantage of everything it provides.

impl Deref for MyString {
    type Target = MyStr;

    fn deref(&self) -> &Self::Target {
        self.borrow()
    }
}


fn main()  {
    let data = MyString { data: "Hello world".to_owned() };

    let borrowed_cow: Cow<'_, MyStr> = Cow::Borrowed(&data);
    println!("{:?}", borrowed_cow);

    let owned_cow: Cow<'_, MyStr> = Cow::Owned(data);
    println!("{:?}", owned_cow);
}

Borrow the type as dyn Trait

As mentioned above, the trait object is another example of a dynamically sized type. Somewhat surprisingly, we can use Cow to implement dynamic dispatch, much like Box<dyn Trait> and Arc<dyn Trait>.

Consider the following trait and struct implementations:

use std::borrow::{Borrow, Cow};
use std::fmt::Debug;
use std::ops::Deref;

trait MyTrait: Debug {
    fn data(&self) -> &str;
}

#[derive(Debug)]
struct MyString {
    data: String
}

impl MyTrait for MyString {
    fn data(&self) -> &str {
        &self.data
    }
}

As MyString implements MyTrait, we can borrow &MyString as &dyn MyTrait:

impl<'a> Borrow<dyn MyTrait + 'a> for MyString {
    fn borrow(&self) -> &(dyn MyTrait + 'a) {
        self
    }
}

We can also convert any MyTrait implementation to MyString:

impl ToOwned for dyn MyTrait {
    type Owned = MyString;

    fn to_owned(&self) -> MyString {
        MyString {
            data: self.data().to_owned()
        }
    }
}

Since we have defined Borrow and ToOwned, we can now put MyString into Cow<dyn MyTrait>:

fn main()  {
    let data = MyString { data: "Hello world".to_owned() };

    let borrowed_cow: Cow<'_, dyn MyTrait> = Cow::Borrowed(&data);
    println!("{:?}", borrowed_cow);

    let owned_cow: Cow<'_, dyn MyTrait> = Cow::Owned(data);
    println!("{:?}", owned_cow);
}

The above could be useful for implementing, e.g., a mutable vector of trait objects:

fn main()  {
    let data = MyString { data: "Hello world".to_owned() };
    let cow1: Cow<'_, dyn MyTrait> = Cow::Borrowed(&data);

    let data = MyString { data: "Hello world".to_owned() };
    let cow2: Cow<'_, dyn MyTrait> = Cow::Owned(data);

    let mut vector: Vec<Cow<'_, dyn MyTrait>> = vec![cow1, cow2];
}
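To show what mutating such a vector could look like, here is a sketch that repeats the definitions above and adds a hypothetical shout_all helper. The + 'static bound is spelled out explicitly because ToOwned is implemented above for dyn MyTrait with its default 'static bound:

```rust
use std::borrow::{Borrow, Cow};
use std::fmt::Debug;

trait MyTrait: Debug {
    fn data(&self) -> &str;
}

#[derive(Debug)]
struct MyString {
    data: String,
}

impl MyTrait for MyString {
    fn data(&self) -> &str {
        &self.data
    }
}

impl<'a> Borrow<dyn MyTrait + 'a> for MyString {
    fn borrow(&self) -> &(dyn MyTrait + 'a) {
        self
    }
}

impl ToOwned for dyn MyTrait {
    type Owned = MyString;

    fn to_owned(&self) -> MyString {
        MyString { data: self.data().to_owned() }
    }
}

// Hypothetical helper: appends "!" to every element, converting borrowed ones on demand.
fn shout_all(items: &mut [Cow<'_, dyn MyTrait + 'static>]) {
    for item in items.iter_mut() {
        // to_mut yields &mut MyString, cloning a borrowed element first.
        item.to_mut().data.push('!');
    }
}

fn main() {
    let original = MyString { data: "Hello".to_owned() };
    let borrowed: &(dyn MyTrait + 'static) = &original;
    let mut vector: Vec<Cow<'_, dyn MyTrait + 'static>> = vec![
        Cow::Borrowed(borrowed),
        Cow::Owned(MyString { data: "World".to_owned() }),
    ];

    shout_all(&mut vector);

    for item in &vector {
        println!("{}", item.data());
    }
    // The value the first element was borrowed from is left unchanged.
    println!("{}", original.data());
}
```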

Implement safe wrapper over FFI type

The above MyString example is exciting but somewhat artificial. Let's consider a real-life pattern in which you would like to store your own type inside a Cow.

Imagine you are using a C library in your Rust project. Let's say you receive a buffer of data from the C code in the form of a pointer *const u8 and a length usize. Say you would like to pass the data through a layer of Rust logic, possibly modifying it (does that make you think of Cow?). Finally, you might want to access the data (modified or not) in Rust as &[u8], or pass it into another C function as a pointer *const u8 and a length usize. (Here we assume that this C function does not release the memory. If this assumption surprises you, consider reading the article "7 ways to pass a string between 🦀 Rust and C".)

As we would like to avoid cloning the data unnecessarily, we would represent the buffer as the following struct:

use std::borrow::{Borrow, Cow};
use std::fmt::{Debug, Formatter};
use std::ops::Deref;

struct NativeBuffer {
    pub ptr: *const u8,
    pub len: usize
}

This struct does not own its data; it borrows it from the C pointer with an unknown lifetime.

For convenience only, we can implement the traits that let us access the buffer as a &[u8] slice and print it:

impl Borrow<[u8]> for NativeBuffer {
    fn borrow(&self) -> &[u8] {
        unsafe {
            std::slice::from_raw_parts(self.ptr, self.len)
        }
    }
}

impl Deref for NativeBuffer {
    type Target = [u8];

    fn deref(&self) -> &Self::Target {
        self.borrow()
    }
}

impl Debug for NativeBuffer {
    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
        let data: &[u8] = self.borrow();
        write!(f, "NativeBuffer {{ data: {:?}, len: {} }}", data, self.len)
    }
}
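Note that std::slice::from_raw_parts requires a non-null, properly aligned pointer even when the length is zero, so a pointer received from C is worth checking first. The sketch below adds a hypothetical from_raw constructor and an as_slice accessor (neither is part of the example above). Treating a null pointer as an empty buffer is an assumption about the C API:

```rust
use std::ptr::NonNull;

struct NativeBuffer {
    pub ptr: *const u8,
    pub len: usize,
}

impl NativeBuffer {
    // Hypothetical constructor: replaces a null pointer with a dangling but
    // well-aligned one and a zero length, which from_raw_parts accepts.
    //
    // Safety: if `ptr` is non-null, it must point to `len` readable bytes
    // that stay valid for as long as the returned value is used.
    pub unsafe fn from_raw(ptr: *const u8, len: usize) -> Self {
        if ptr.is_null() {
            Self {
                ptr: NonNull::<u8>::dangling().as_ptr() as *const u8,
                len: 0,
            }
        } else {
            Self { ptr, len }
        }
    }

    pub fn as_slice(&self) -> &[u8] {
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

fn main() {
    // Simulates the data coming across FFI (from C)
    let data = [1u8, 2, 3];
    let buffer = unsafe { NativeBuffer::from_raw(data.as_ptr(), data.len()) };
    println!("{:?}", buffer.as_slice());

    // A null pointer is mapped to an empty buffer instead of causing undefined behavior.
    let empty = unsafe { NativeBuffer::from_raw(std::ptr::null(), 0) };
    println!("{:?}", empty.as_slice());
}
```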

In order to store the NativeBuffer in the Cow we first need to define the owning version of it:

#[derive(Debug)]
struct OwnedBuffer {
    owned_data: Vec<u8>,
    native_proxy: NativeBuffer,
}

impl ToOwned for NativeBuffer {
    type Owned = OwnedBuffer;

    fn to_owned(&self) -> OwnedBuffer {
        let slice: &[u8] = self.borrow();
        let owned_data = slice.to_vec();
        let native_proxy = NativeBuffer {
            ptr: owned_data.as_ptr(),
            len: owned_data.len()
        };
        OwnedBuffer {
            owned_data,
            native_proxy,
        }
    }
}

The trick is to borrow the data as a slice and convert it to a Vec. We also need to store a NativeBuffer inside OwnedBuffer. It contains a pointer to the data inside the vector and its length, so that we can implement the Borrow trait:

impl Borrow<NativeBuffer> for OwnedBuffer {
    fn borrow(&self) -> &NativeBuffer {
        &self.native_proxy
    }
}

We can now define the method to mutate the data:

impl OwnedBuffer {

    pub fn append(&mut self, data: &[u8]) {
        self.owned_data.extend(data);
        self.native_proxy = NativeBuffer {
            ptr: self.owned_data.as_ptr(),
            len: self.owned_data.len()
        };
    }
}

It is important to keep the pointer and the length in the native proxy up to date, because extending the vector may reallocate its storage.

We can finally put our borrowed buffer in the Cow and implement the conditional mutation logic, for example:

fn main() {
    // Simulates the data coming across FFI (from C)
    let data = vec![1, 2, 3];
    let ptr = data.as_ptr();
    let len = data.len();

    let native_buffer = NativeBuffer { ptr, len };
    let mut buffer = Cow::Borrowed(&native_buffer);
    // NativeBuffer { data: [1, 2, 3], len: 3 }
    println!("{:?}", buffer);

    // No data cloned
    assert_eq!(buffer.ptr, ptr);
    assert_eq!(buffer.len, len);

    if buffer.len > 1 {
        buffer.to_mut().append(&[4, 5, 6]);
        // OwnedBuffer { owned_data: [1, 2, 3, 4, 5, 6], native_proxy: NativeBuffer { data: [1, 2, 3, 4, 5, 6], len: 6 } }
        println!("{:?}", buffer);

        // Data is cloned
        assert_ne!(buffer.ptr, ptr);
        assert_eq!(buffer.len, len + 3);
    }

    let slice: &[u8] = &buffer;
    // [1, 2, 3, 4, 5, 6]
    println!("{:?}", slice);
}

The buffer is cloned only if its length is greater than 1.
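To hand the (possibly modified) data back to C, the same pointer and length pair can be read from the Cow. The sketch below uses a plain Cow<[u8]> to stay short. c_checksum is a hypothetical stand-in, written in Rust, for a C function that reads the buffer without freeing it, and checksum is a hypothetical wrapper around it:

```rust
use std::borrow::Cow;

// Hypothetical stand-in for a C function taking a pointer and a length.
// It only reads the buffer and does not release the memory.
unsafe extern "C" fn c_checksum(ptr: *const u8, len: usize) -> u32 {
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    bytes.iter().map(|&b| u32::from(b)).sum()
}

// Hypothetical helper: passes the current contents of the Cow across the boundary.
fn checksum(buffer: &Cow<'_, [u8]>) -> u32 {
    // as_ptr and len come from [u8] through deref, for both variants.
    unsafe { c_checksum(buffer.as_ptr(), buffer.len()) }
}

fn main() {
    let data = vec![1u8, 2, 3];
    let mut buffer = Cow::Borrowed(&data[..]);
    // Borrowed: the pointer passed to C is the original one.
    println!("{}", checksum(&buffer));

    buffer.to_mut().extend_from_slice(&[4, 5, 6]);
    // Owned: the pointer now refers to the cloned and extended vector.
    println!("{}", checksum(&buffer));
}
```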

Summary

I sincerely hope that this post helped to demystify the Cow type and will increase its adoption in the Rust community! If you liked the article, please leave a reaction and consider reading my other posts!

