Concatenate a vector of vectors of strings

耗尽温柔 提交于 2019-12-10 15:15:16

问题


I'm trying to write a function that receives a vector of vectors of strings and returns all vectors concatenated together, i.e. it returns a vector of strings.

The best I could do so far has been the following:

fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
    let vals : Vec<&String> = vecs.iter().flat_map(|x| x.into_iter()).collect();
    vals.into_iter().map(|v: &String| v.to_owned()).collect()
}

However, I'm not happy with this result, because it seems I should be able to get Vec<String> from the first collect call, but somehow I am not able to figure out how to do it.

I am even more interested to figure out why exactly the return type of collect is Vec<&String>. I tried to deduce this from the API documentation and the source code, but despite my best efforts, I couldn't even understand the signatures of functions.

So let me try and trace the types of each expression:

- vecs.iter(): Iter<T=Vec<String>, Item=Vec<String>>
- vecs.iter().flat_map(): FlatMap<I=Iter<Vec<String>>, U=???, F=FnMut(Vec<String>) -> U, Item=U>
- vecs.iter().flat_map().collect(): (B=??? : FromIterator<U>)
- vals was declared as Vec<&String>, therefore 
      vals == vecs.iter().flat_map().collect(): (B=Vec<&String> : FromIterator<U>). Therefore U=&String.

I'm assuming above that the type inferencer is able to figure out that U=&String based on the type of vals. But if I give the expression the explicit types in the code, this compiles without error:

fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
    let a: Iter<Vec<String>> = vecs.iter();
    let b: FlatMap<Iter<Vec<String>>, Iter<String>, _> = a.flat_map(|x| x.into_iter());
    let c = b.collect();
    print_type_of(&c);
    let vals : Vec<&String> = c;
    vals.into_iter().map(|v: &String| v.to_owned()).collect()
}

Clearly, U=Iter<String>... Please help me clear up this mess.

EDIT: thanks to bluss' hint, I was able to achieve one collect as follows:

fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
    vecs.into_iter().flat_map(|x| x.into_iter()).collect()
}

My understanding is that by using into_iter I transfer ownership of vecs to IntoIter and further down the call chain, which allows me to avoid copying the data inside the lambda call and therefore - magically - the type system gives me Vec<String> where it used to always give me Vec<&String> before. While it is certainly very cool to see how the high-level concept is reflected in the workings of the library, I wish I had any idea how this is achieved.

EDIT 2: After a laborious process of guesswork, looking at API docs and using this method to decipher the types, I got them fully annotated (disregarding the lifetimes):

fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
    let a: Iter<Vec<String>> = vecs.iter();
    let f : &Fn(&Vec<String>) -> Iter<String> = &|x: &Vec<String>| x.into_iter();
    let b: FlatMap<Iter<Vec<String>>, Iter<String>, &Fn(&Vec<String>) -> Iter<String>> = a.flat_map(f);
    let vals : Vec<&String> = b.collect();
    vals.into_iter().map(|v: &String| v.to_owned()).collect()
}

回答1:


I'd think about: why do you use iter() on the outer vec but into_iter() on the inner vecs? Using into_iter() is actually crucial, so that we don't have to copy first the inner vectors, then the strings inside, we just receive ownership of them.

We can actually write this just like a summation: concatenate the vectors two by two. Since we always reuse the allocation & contents of the same accumulation vector, this operation is linear time.

To minimize time spent growing and reallocating the vector, calculate the space needed up front.

fn concat_vecs(vecs: Vec<Vec<String>>) -> Vec<String> {
    let size = vecs.iter().fold(0, |a, b| a + b.len());
    vecs.into_iter().fold(Vec::with_capacity(size), |mut acc, v| {
        acc.extend(v); acc
    })
}

If you do want to clone all the contents, there's already a method for that, and you'd just use vecs.concat() /* -> Vec<String> */


The approach with .flat_map is fine, but if you don't want to clone the strings again you have to use .into_iter() on all levels: (x is Vec<String>).

vecs.into_iter().flat_map(|x| x.into_iter()).collect()

If instead you want to clone each string you can use this: (Changed .into_iter() to .iter() since x here is a &Vec<String> and both methods actually result in the same thing!)

vecs.iter().flat_map(|x| x.iter().map(Clone::clone)).collect()



来源:https://stackoverflow.com/questions/31126048/concatenate-a-vector-of-vectors-of-strings

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!