I build a HashSet from the matches of a regex after doing some filtering. I am trying to use it with Rayon, but I can't figure out how to make Rayon work with an existing iterator without converting it to a vector first. Is this possible?
let re = Regex::new("url=\"(?P<url>.+?)\"").unwrap();
let urls: HashSet<String> = re.captures_iter(&contents)
    .map(|m| Url::parse(m.name("url").unwrap().as_str()))
    .filter(|parsed_url| parsed_url.is_ok())
    .map(|parsed_url| parsed_url.unwrap())
    .filter(|parsed_url| parsed_url.has_host())
    .map(|parsed_url| parsed_url.into_string())
    .collect();
This answer is outdated for the latest version of Rayon. See the other answer for a possible solution; it may or may not apply to your use case.
Minimal reproduction:
extern crate rayon;
use rayon::prelude::*;
fn main() {
    let v = vec![1_i32, 2, 3, 4].into_iter();
    // no method named `par_iter` found for type `std::vec::IntoIter<i32>`
    let _ = v.par_iter().sum();
}
You cannot do that. Parallel iteration is only implemented for specific types. Here are the implementors:
- BinaryHeap
- BTreeMap
- BTreeSet
- HashMap
- HashSet
- LinkedList
- VecDeque
- Option
- Range
- Result
- Slice/Array
I think the reason you cannot parallelize them is that iterators are lazy. An iterator is basically a current item (an Option<Item>) and a next() method, so you cannot split it into two parts and execute them on different threads.
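To make the contrast concrete, here is a minimal std-only sketch (no Rayon involved). It assumes a slice as the splittable collection, and the helper names `split_sum` and `sequential_sum` are made up for illustration. A slice knows its length and can be cut at any index, while a generic iterator only hands out one item per `next()` call.

```rust
use std::thread;

/// Hypothetical helper: sums a slice by cutting it at the midpoint
/// and summing each half on its own scoped thread.
fn split_sum(data: &[i32]) -> i32 {
    // A slice knows its length, so it can be divided without consuming anything.
    let (left, right) = data.split_at(data.len() / 2);
    thread::scope(|s| {
        let l = s.spawn(|| left.iter().sum::<i32>());
        let r = s.spawn(|| right.iter().sum::<i32>());
        l.join().unwrap() + r.join().unwrap()
    })
}

/// Hypothetical helper: sums a generic iterator. Only `next()` is available,
/// so the items can only be obtained one after another.
fn sequential_sum(mut iter: impl Iterator<Item = i32>) -> i32 {
    let mut total = 0;
    while let Some(x) = iter.next() {
        total += x;
    }
    total
}

fn main() {
    let v = vec![1_i32, 2, 3, 4];
    println!("{}", split_sum(&v));
    println!("{}", sequential_sum(v.into_iter()));
}
```

Both calls compute the same sum. Only the slice version has a natural place to divide the work up front.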
This is now possible with ParallelBridge:
use rayon::iter::ParallelBridge;
use rayon::prelude::ParallelIterator;
use std::sync::mpsc::channel;
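fn main() {
    // Fill a channel, then drop the sender so the receiving iterator terminates.
    let rx = {
        let (tx, rx) = channel();
        tx.send("one!").unwrap();
        tx.send("two!").unwrap();
        tx.send("three!").unwrap();
        rx
    };
    // par_bridge() does not preserve the original order, so sort before comparing.
    let mut output: Vec<&'static str> = rx.into_iter().par_bridge().collect();
    output.sort_unstable();
    assert_eq!(&*output, &["one!", "three!", "two!"]);
}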
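For intuition, the Rayon documentation describes `par_bridge()` as pulling items with `next()` one at a time, synchronized between the threads that are ready for work, without guaranteeing order. The following is a rough std-only sketch of that idea, not Rayon's actual implementation. The `bridge_map` helper and its `workers` parameter are hypothetical names introduced here.

```rust
use std::sync::Mutex;
use std::thread;

/// Hypothetical helper: applies `f` to every item of a sequential iterator,
/// using `workers` threads that take turns pulling from the shared iterator.
fn bridge_map<I, T, R, F>(iter: I, workers: usize, f: F) -> Vec<R>
where
    I: Iterator<Item = T> + Send,
    T: Send,
    R: Send,
    F: Fn(T) -> R + Sync,
{
    let shared = Mutex::new(iter);
    let results: Mutex<Vec<R>> = Mutex::new(Vec::new());
    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                // Hold the lock only while fetching the next item.
                let item = shared.lock().unwrap().next();
                match item {
                    Some(x) => {
                        // The actual work happens outside the iterator lock.
                        let y = f(x);
                        results.lock().unwrap().push(y);
                    }
                    None => break,
                }
            });
        }
    });
    results.into_inner().unwrap()
}

fn main() {
    let words = vec!["one!", "two!", "three!"];
    let mut lengths = bridge_map(words.into_iter(), 2, |w| w.len());
    // Completion order depends on thread scheduling, so sort before using it.
    lengths.sort_unstable();
    println!("{:?}", lengths);
}
```

Because every item goes through a single lock, this approach only pays off when the per-item work is expensive compared with producing the item. The Rayon documentation gives a similar caveat about the serial iterator becoming a bottleneck.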
Source: https://stackoverflow.com/questions/48922420/how-do-i-use-rayon-with-an-existing-iterator