How can I read a file line-by-line, eliminate duplicates, then write back to the same file?

Submitted by 二次信任 on 2020-07-09 11:36:10

Question


I want to read a file, eliminate all duplicate lines, and write the rest back into the same file, like a duplicate cleaner. I'm using a Vec because a normal array has a fixed size, while the number of lines in my .txt file varies (am I doing this right?).

Read the lines into a Vec and delete duplicates (missing: writing back to the file):

use std::io;

fn main() {
    let path = Path::new("test.txt");
    let mut file = io::BufferedReader::new(io::File::open(&path, R));

    let mut lines: Vec<String> = file.lines().map(|x| x.unwrap()).collect();
    // dedup() only removes consecutive duplicates, so sort() first
    lines.sort();
    lines.dedup();

    for e in lines.iter() {
        print!("{}", e.as_slice());
    }
}

Read and write to the file (untested, but I guess it should work). Collecting the lines into a Vec is missing here because it doesn't seem to work without BufferedReader (or I'm doing something else wrong, which is also quite possible).

use std::io;

fn main() {
    let path = Path::new("test.txt");
    let mut file = match io::File::open_mode(&path, io::Open, io::ReadWrite) {
        Ok(f) => f,
        Err(e) => panic!("file error: {}", e),
    };  
    let mut lines: Vec<String> = file.lines().map(|x| x.unwrap()).collect();
    lines.sort();
    // dedup() only removes consecutive duplicates, so sort() first
    lines.dedup();

    for e in lines.iter() {
        file.write("{}", e);
    }
} 

So... how do I get those two together? :)


Answer 1:


Ultimately, you are going to run into a problem: you are trying to write to the same file you are reading from. In this case it's safe, because you read the entire file first and don't need its contents after that. However, if you did try to write through the same handle, you'd see that a file opened for reading doesn't allow writing! Here's code that attempts it:

use std::{
    fs::File,
    io::{BufRead, BufReader, Write},
};

fn main() {
    let mut file = File::open("test.txt").expect("file error");
    let reader = BufReader::new(&mut file);

    let mut lines: Vec<_> = reader
        .lines()
        .map(|l| l.expect("Couldn't read a line"))
        .collect();

    lines.sort();
    lines.dedup();

    for line in lines {
        file.write_all(line.as_bytes())
            .expect("Couldn't write to file");
    }
}

Here's the output:

% cat test.txt
a
a
b
a
% cargo run
thread 'main' panicked at 'Couldn't write to file: Os { code: 9, kind: Other, message: "Bad file descriptor" }', src/main.rs:12:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

You could open the file for both reading and writing:

use std::{
    fs::OpenOptions,
    io::{BufRead, BufReader, Write},
};

fn main() {
    let mut file = OpenOptions::new()
        .read(true)
        .write(true)
        .open("test.txt")
        .expect("file error");

    // Remaining code unchanged
}

But then you'd see that (a) the output is appended to the existing contents and (b) the newlines are missing from the newly written lines, because BufRead::lines doesn't include them.
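Both symptoms can be reproduced without touching the disk. The following is a minimal sketch that assumes an in-memory Cursor<Vec<u8>> behaves like a file opened for reading and writing for this purpose; the helper name dedup_without_rewinding is hypothetical and only used for illustration.

```rust
use std::io::{BufRead, Cursor, Write};

// Hypothetical helper: runs the read-then-write logic on an in-memory
// buffer that stands in for a file opened for both reading and writing.
fn dedup_without_rewinding(input: &str) -> String {
    let mut buffer = Cursor::new(input.as_bytes().to_vec());

    // `lines` strips the trailing "\n" from every line it yields.
    let mut lines: Vec<String> = (&mut buffer)
        .lines()
        .map(|l| l.expect("Couldn't read a line"))
        .collect();
    lines.sort();
    lines.dedup();

    // Reading left the cursor at the end of the data, so these writes
    // are appended after the original contents.
    for line in &lines {
        buffer
            .write_all(line.as_bytes())
            .expect("Couldn't write to buffer");
    }

    String::from_utf8(buffer.into_inner()).expect("buffer is not valid UTF-8")
}

fn main() {
    let result = dedup_without_rewinding("a\na\nb\na\n");
    // The original four lines are still there, followed by "ab" with no newlines.
    println!("{:?}", result);
}
```

The original data survives untouched and the deduplicated lines are glued onto the end as a single run of characters.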

We could reset the file pointer back to the beginning, but then you'd probably leave trailing data at the end (deduplicating is likely to write fewer bytes than were read). It's easier to just reopen the file for writing, which will truncate the file. Also, let's use a set data structure to do the deduplication for us!

use std::{
    collections::BTreeSet,
    fs::File,
    io::{BufRead, BufReader, Write},
};

fn main() {
    let file = File::open("test.txt").expect("file error");
    let reader = BufReader::new(file);

    let lines: BTreeSet<_> = reader
        .lines()
        .map(|l| l.expect("Couldn't read a line"))
        .collect();

    let mut file = File::create("test.txt").expect("file error");

    for line in lines {
        file.write_all(line.as_bytes())
            .expect("Couldn't write to file");

        file.write_all(b"\n").expect("Couldn't write to file");
    }
}

And the output:

% cat test.txt
a
a
b
a
a
b
a
b

% cargo run
% cat test.txt
a
b
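For comparison, the approach of resetting the file pointer can also be made to work if the file is explicitly truncated before writing. This is only a sketch of that alternative, not the method recommended above; dedup_in_place is a hypothetical helper name, and it relies on File::set_len and Seek::seek from the standard library.

```rust
use std::{
    collections::BTreeSet,
    fs::OpenOptions,
    io::{self, BufRead, BufReader, Seek, SeekFrom, Write},
};

// Hypothetical helper: deduplicate the lines of `path` through a single
// read/write handle instead of reopening the file.
fn dedup_in_place(path: &str) -> io::Result<()> {
    let mut file = OpenOptions::new().read(true).write(true).open(path)?;

    // Collect the unique lines in sorted order. The reader only borrows
    // the handle for the duration of this statement.
    let lines: BTreeSet<String> = BufReader::new(&file)
        .lines()
        .collect::<io::Result<_>>()?;

    // Discard the old contents, then move the cursor back to the start;
    // truncating alone does not change the cursor position.
    file.set_len(0)?;
    file.seek(SeekFrom::Start(0))?;

    for line in &lines {
        writeln!(file, "{}", line)?;
    }
    Ok(())
}

fn main() {
    if let Err(e) = dedup_in_place("test.txt") {
        eprintln!("couldn't deduplicate test.txt: {}", e);
    }
}
```

Truncating to zero length removes the risk of leftover bytes at the end, at the cost of two extra calls compared with simply reopening the file with File::create.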

A less efficient but shorter solution is to read the entire file as one string and use str::lines:

use std::{
    collections::BTreeSet,
    fs::{self, File},
    io::Write,
};

fn main() {
    let contents = fs::read_to_string("test.txt").expect("can't read");
    let lines: BTreeSet<_> = contents.lines().collect();

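    // File::create opens the file for writing and truncates it;
    // File::open would give a read-only handle and every write would fail.
    let mut file = File::create("test.txt").expect("can't create");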
    for line in lines {
        writeln!(file, "{}", line).expect("can't write");
    }
}
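Note that a BTreeSet, like sort followed by dedup, writes the lines back in sorted order. If the original order matters, one possible variation is to keep only the first occurrence of each line by tracking seen lines in a HashSet. This is a sketch under that assumption; dedup_preserving_order is a hypothetical helper name.

```rust
use std::{collections::HashSet, fs, io};

// Hypothetical helper: keeps the first occurrence of each line and
// preserves the order in which lines originally appeared.
fn dedup_preserving_order(contents: &str) -> String {
    let mut seen = HashSet::new();
    let mut out = String::with_capacity(contents.len());

    for line in contents.lines() {
        // `insert` returns false when the line was already in the set.
        if seen.insert(line) {
            out.push_str(line);
            out.push('\n');
        }
    }
    out
}

fn main() -> io::Result<()> {
    let contents = fs::read_to_string("test.txt")?;
    // fs::write creates or truncates the file before writing.
    fs::write("test.txt", dedup_preserving_order(&contents))
}
```

For the input b, a, b, a this keeps b and then a, whereas the set-based versions above would write a before b.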

See also:

  • What's the de-facto way of reading and writing files in Rust 1.x?
  • What is the best variant for appending a new line in a text file?


Source: https://stackoverflow.com/questions/27871299/how-can-i-read-a-file-line-by-line-eliminate-duplicates-then-write-back-to-the
