Using str and String interchangeably

Submitted by 喜你入骨 on 2020-02-03 04:46:47

Question


Suppose I'm trying to do a fancy zero-copy parser in Rust using &str, but sometimes I need to modify the text (e.g. to implement variable substitution). I really want to do something like this:

fn main() {
    let mut v: Vec<&str> = "Hello there $world!".split_whitespace().collect();

    for t in v.iter_mut() {
        if t.contains("$world") {
            // Does not compile: the temporary String is dropped at the end of this statement.
            *t = &t.replace("$world", "Earth");
        }
    }

    println!("{:?}", &v);
}

But of course the String returned by t.replace() doesn't live long enough. Is there a nice way around this? Perhaps there is a type which means "ideally a &str but if necessary a String"? Or maybe there is a way to use lifetime annotations to tell the compiler that the returned String should be kept alive until the end of main() (or have the same lifetime as v)?


Answer 1:


Rust has exactly what you want in the form of the Cow (clone-on-write) type.

use std::borrow::Cow;

fn main() {
    let mut v: Vec<_> = "Hello there $world!".split_whitespace()
                                             .map(|s| Cow::Borrowed(s))
                                             .collect();

    for t in v.iter_mut() {
        if t.contains("$world") {
            *t.to_mut() = t.replace("$world", "Earth");
        }
    }

    println!("{:?}", &v);
}

As @sellibitze correctly notes, to_mut() first converts the borrowed value into an owned String, which costs a heap allocation and a copy of the old text, only for the assignment to overwrite it immediately. If you are sure the Vec only holds borrowed strings, you can use this instead:

*t = Cow::Owned(t.replace("$world", "Earth"));
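
To make the effect of that assignment visible, here is a minimal sketch that wraps the loop in a helper and reports which tokens ended up borrowed and which owned. The function name substitute is an assumption made for illustration; it does not appear in the original answer.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not from the original answer): splits `input` on
/// whitespace and replaces `$world` with "Earth" in each token.
fn substitute(input: &str) -> Vec<Cow<'_, str>> {
    let mut v: Vec<Cow<str>> = input.split_whitespace().map(Cow::Borrowed).collect();
    for t in v.iter_mut() {
        if t.contains("$world") {
            // Overwrite the whole Cow; the old borrowed token is never copied.
            *t = Cow::Owned(t.replace("$world", "Earth"));
        }
    }
    v
}

fn main() {
    let v = substitute("Hello there $world!");
    for t in &v {
        let kind = match t {
            Cow::Borrowed(_) => "borrowed",
            Cow::Owned(_) => "owned",
        };
        println!("{} is {}", t, kind);
    }
}
```

Only the token that actually contains the variable should allocate; the others remain slices of the input.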

If the Vec contains Cow::Owned elements, this would still throw away their existing allocation. You can prevent that with the following very fragile, unsafe code inside your for loop. It manipulates the UTF-8 bytes of the string directly and relies on the fact that "Earth" is exactly one byte shorter than "$world", so removing the $ sign leaves exactly enough room for the replacement.

let mut last_pos = 0; // so we don't start at the beginning every time
while let Some(pos) = t[last_pos..].find("$world") {
    let p = pos + last_pos; // pos is relative to last_pos, p is the absolute index
    last_pos = p + 5; // resume the search after the 5 bytes of "Earth"
    unsafe {
        let s = t.to_mut().as_mut_vec(); // operating on Vec is easier
        s.remove(p); // remove $ sign
        for (c, sc) in "Earth".bytes().zip(&mut s[p..]) {
            *sc = c;
        }
    }
}

Note that this is tailored exactly to the "$world" -> "Earth" mapping. Any other mappings require careful consideration inside the unsafe code.
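
If the goal is only to keep an existing allocation, a safe alternative is String::replace_range from the standard library, which handles replacements of any length. This is a different technique from the unsafe byte manipulation above, and the helper name replace_in_place is hypothetical. A minimal sketch, assuming a non-empty pattern:

```rust
use std::borrow::Cow;

/// Hypothetical helper (not from the original answer): replaces every
/// occurrence of `from` in `t` with `to`, editing the String in place.
fn replace_in_place(t: &mut Cow<'_, str>, from: &str, to: &str) {
    if from.is_empty() {
        return; // an empty pattern would match at every position
    }
    let mut last_pos = 0; // absolute index where the next search starts
    while let Some(pos) = t[last_pos..].find(from) {
        let p = last_pos + pos; // `pos` is relative to `last_pos`
        // to_mut() allocates only if the Cow is still borrowed.
        t.to_mut().replace_range(p..p + from.len(), to);
        last_pos = p + to.len(); // skip past the text just inserted
    }
}

fn main() {
    let mut v: Vec<Cow<str>> = "Hello there $world!"
        .split_whitespace()
        .map(Cow::Borrowed)
        .collect();
    for t in v.iter_mut() {
        replace_in_place(t, "$world", "Earth");
    }
    println!("{:?}", v);
}
```

Tokens without a match never reach to_mut(), so they stay borrowed. The String may still reallocate when the replacement is longer than the pattern.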




Answer 2:


Use std::borrow::Cow, specifically as Cow<'a, str>, where 'a is the lifetime of the string being parsed.

use std::borrow::Cow;

fn main() {
    let mut v: Vec<Cow<'static, str>> = vec![];
    v.push("oh hai".into());
    v.push(format!("there, {}.", "Mark").into());

    println!("{:?}", v);
}

Produces:

["oh hai", "there, Mark."]
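
The example above uses 'static because it only stores literals and freshly built Strings. For a parser, 'a would usually be tied to an input buffer instead. The following is a rough sketch of that case; the function expand and the "$name" variable are assumptions made for illustration.

```rust
use std::borrow::Cow;

/// Hypothetical helper: returns the token unchanged (borrowed from the
/// input) unless it contains `$name`, in which case a new String is built.
fn expand<'a>(token: &'a str, name: &str) -> Cow<'a, str> {
    if token.contains("$name") {
        token.replace("$name", name).into() // String -> Cow::Owned
    } else {
        token.into() // &'a str -> Cow::Borrowed
    }
}

fn main() {
    // A runtime String, so the tokens borrow with a non-'static lifetime.
    let input = String::from("oh hai $name");
    let v: Vec<Cow<str>> = input.split(' ').map(|t| expand(t, "Mark")).collect();
    println!("{:?}", v);

    // into_owned() detaches the tokens from `input` if they must outlive it.
    let owned: Vec<String> = v.into_iter().map(|t| t.into_owned()).collect();
    drop(input);
    println!("{:?}", owned);
}
```

Returning Cow<'a, str> lets the caller decide later whether to keep borrowing from the input or to pay for owned copies.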


Source: https://stackoverflow.com/questions/31240091/using-str-and-string-interchangably
