Can't build a rusqlite transaction inside loop: use of moved value and cannot borrow as mutable more than once at a time

Question

In order to speed up insertions into a SQLite DB using rusqlite, I want to build a transaction inside a for loop and only commit every N iterations.

The following code compiles but it builds a single transaction and commits it all in one go:

use rusqlite::{Connection, Result, NO_PARAMS};

fn main() -> Result<()> {
    let mut conn = Connection::open_in_memory()?;

    conn.execute(
        "CREATE TABLE entry (
            id   INTEGER PRIMARY KEY,
            data INTEGER
        )",
        NO_PARAMS,
    )?;

    let tx = conn.transaction()?;
    for i in 0..20 {
        tx.execute("INSERT INTO entry (data) VALUES (?1)", &[i])?;
    }
    tx.commit()?;

    Ok(())
}

My use case requires building transactions with several million inserts, so what I would like to do instead is accumulate inserts on the transaction and, when it reaches transaction_size inserts, commit it and start over with a new transaction. A non-compiling version would look like this:

let transaction_size = 5;
let tx = conn.transaction()?;
for i in 0..20 {
    if (i % transaction_size) == (transaction_size - 1) {
        tx.commit()?;
        let tx = conn.transaction()?;
    }
    tx.execute("INSERT INTO entry (data) VALUES (?1)", &[i])?;
}

The borrow checker won't allow this for two reasons.

error[E0382]: use of moved value: `tx`
  --> src/main.rs:18:13
   |
15 |     let tx = conn.transaction()?;
   |         -- move occurs because `tx` has type `rusqlite::transaction::Transaction<'_>`, which does not implement the `Copy` trait
...
18 |             tx.commit()?;
   |             ^^ value moved here, in previous iteration of loop

error[E0499]: cannot borrow `conn` as mutable more than once at a time
  --> src/main.rs:19:22
   |
15 |     let tx = conn.transaction()?;
   |              ---- first mutable borrow occurs here
...
19 |             let tx = conn.transaction()?;
   |                      ^^^^ second mutable borrow occurs here
20 |         }
21 |         tx.execute("INSERT INTO entry (data) VALUES (?1)", &[i])?;
   |         -- first borrow later used here

The first complaint makes sense to me. The second one does not, since the following compiles (but inserts only one row per transaction):

for i in 0..20 {
    let tx = conn.transaction()?;
    tx.execute("INSERT INTO entry (data) VALUES (?1)", &[i])?;
    tx.commit()?;
}

I've tried using let tx = if cond { tx.commit()?; conn.transaction()? } inside the loop, but it needs an else clause to type check.
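For comparison, the else branch the compiler asks for can simply hand back the transaction that is still open. The binding then has to be declared with let mut before the loop and reassigned (not re-declared with let) inside it, because a let in the loop body only lives for one iteration. Below is a std-only sketch of that borrow pattern; Conn, Tx and run are hypothetical mocks standing in for rusqlite's Connection and Transaction (Tx holds a &mut Conn and commit takes self), so it compiles without the crate. It is not rusqlite code.

```rust
// Hypothetical std-only mocks for rusqlite's Connection and Transaction.
struct Conn {
    committed: Vec<Vec<i32>>, // rows of each committed batch, in order
}

struct Tx<'conn> {
    conn: &'conn mut Conn, // exclusive borrow, like Transaction<'conn>
    pending: Vec<i32>,     // rows inserted but not yet committed
}

impl Conn {
    fn transaction(&mut self) -> Tx<'_> {
        Tx { conn: self, pending: Vec::new() }
    }
}

impl Tx<'_> {
    fn execute(&mut self, value: i32) {
        self.pending.push(value);
    }

    // Consumes the transaction, mirroring rusqlite's `commit(self)`.
    fn commit(self) {
        self.conn.committed.push(self.pending);
    }
}

fn run(n: i32, batch_size: i32) -> Vec<Vec<i32>> {
    let mut conn = Conn { committed: Vec::new() };
    // Declared once, before the loop, so every iteration sees the same binding.
    let mut tx = conn.transaction();
    for i in 0..n {
        // `if` is an expression, so both arms must produce a Tx: the first
        // arm commits and opens a new one, the `else` arm hands back `tx`.
        tx = if i > 0 && i % batch_size == 0 {
            tx.commit();
            conn.transaction()
        } else {
            tx
        };
        tx.execute(i);
    }
    tx.commit(); // the last, possibly shorter, batch
    conn.committed
}

fn main() {
    println!("{:?}", run(12, 5));
}
```

With n = 12 and batch_size = 5 it commits [0, 1, 2, 3, 4], [5, 6, 7, 8, 9] and [10, 11]; the commit after the loop picks up the short final batch.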

I can't figure out how to achieve my goal while making the compiler happy. Perhaps there is some way of doing it with unsafe features but I'm quite new to Rust.

EDIT

I forgot to mention that I would like to treat my iterator as single-use.

Using @Sébastien Renauld's idea of moving the logic that builds the transaction into a separate do_batch function, I made this version, which accumulates the data to be inserted in a mutable vector and then builds and commits a transaction for each chunk of transaction_size rows.

use rusqlite::{Connection, Result, Transaction, NO_PARAMS};
use std::vec::Vec;

fn do_batch<'a>(tx: &Transaction<'a>, transaction_accum: &Vec<i32>) -> Result<()> {
    for i in transaction_accum.iter() {
        tx.execute("INSERT INTO entry (data) values (?1)", &[i])?;
    }
    Ok(())
}

fn main() -> Result<()> {
    let mut conn = Connection::open_in_memory()?;

    conn.execute(
        "CREATE TABLE entry (
            id   INTEGER PRIMARY KEY,
            data INTEGER
        )",
        NO_PARAMS,
    )?;

    let transaction_size = 5;
    let mut transaction_accum: Vec<i32> = Vec::new();
    for i in 1..20 {
        transaction_accum.push(i);

        if (i % transaction_size) == (transaction_size - 1) {
            let tx = conn.transaction()?;
            do_batch(&tx, &transaction_accum)?;
            transaction_accum.clear();
            tx.commit()?;
        }
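    }
    // Commit whatever is left after the last full batch.
    if !transaction_accum.is_empty() {
        let tx = conn.transaction()?;
        do_batch(&tx, &transaction_accum)?;
        tx.commit()?;
    }
    Ok(())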
}

EDIT 2

After another suggestion from @Sébastien Renauld, I came across the itertools crate, which can chunk the output of an iterator. That gives the following clean solution. My only worry is that, in order to build the chunks, calling chunks() might realize the whole iterator behind the scenes. Is that the case?

use rusqlite::{Connection, Result, Transaction, NO_PARAMS};
use std::vec::Vec;
use itertools::Itertools;


fn do_batch<'a>(tx: &Transaction<'a>, transaction_accum: &Vec<i32>) -> Result<()> {
    for i in transaction_accum.iter() {
        tx.execute("INSERT INTO entry (data) values (?1)", &[i])?;
    }
    Ok(())
}

fn main() -> Result<()> {
    let mut conn = Connection::open_in_memory()?;

    conn.execute(
        "CREATE TABLE entry (
            id   INTEGER PRIMARY KEY,
            data INTEGER
        )",
        NO_PARAMS,
    )?;

    let transaction_size = 5;
    let my_iter = 1..20; // this is really a WalkDir from the walkdir crate
    for chunk in &my_iter.into_iter().chunks(transaction_size) {
        let tx = conn.transaction()?;
        do_batch(&tx, &chunk.collect())?;
        tx.commit()?;
    }
    Ok(())
}
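On the question of whether chunks() realizes the whole iterator: as far as I can tell from the itertools documentation, it does not. Its IntoChunks type is lazy and only buffers when several chunks are alive at the same time, so each chunk.collect() above holds one chunk at a time. The std-only sketch below does not use itertools; it swaps in Iterator::by_ref() with take() to show the same pull-on-demand behaviour, and uses a counter on a hypothetical stand-in for the WalkDir to record how many items had been pulled when each chunk was ready. chunk_lazily is an illustrative name.

```rust
use std::cell::Cell;

// Splits `0..total` into chunks of `chunk_size` using only std. `by_ref()`
// lets `take()` borrow the source instead of consuming it, so each round
// pulls at most `chunk_size` items and leaves the rest untouched.
// Returns, for each chunk, how many items had been pulled from the source
// when that chunk was complete, together with the chunk itself.
fn chunk_lazily(total: u32, chunk_size: usize) -> Vec<(u32, Vec<u32>)> {
    let pulled = Cell::new(0u32);
    // Stand-in for the WalkDir iterator; `inspect` counts every item it yields.
    let mut source = (0..total).inspect(|_| pulled.set(pulled.get() + 1));
    let mut log = Vec::new();
    loop {
        let chunk: Vec<u32> = source.by_ref().take(chunk_size).collect();
        if chunk.is_empty() {
            break; // source exhausted
        }
        // In the real program a transaction would be opened, filled from
        // `chunk`, and committed at this point.
        log.push((pulled.get(), chunk));
    }
    log
}

fn main() {
    for (pulled, chunk) in chunk_lazily(7, 3) {
        println!("pulled {} items so far, chunk = {:?}", pulled, chunk);
    }
}
```

For chunk_lazily(7, 3) the counter reads 3, 6 and 7 at the three chunks, so the source is consumed one chunk at a time rather than up front.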

Answer 1:

This is a SQL question more than it is a Rust one, but I'll explain both why you're running into this, and how it shows up in Rust.

This all stems from a basic misconception about transactional databases, and it applies to every RDBMS that supports transactions. The point of a transaction is to open what can be seen as a separate slate on the server; you make state changes on that slate, such as adding or deleting rows, and then turn it into the "real" state of the server. How this materializes depends on the DB engine you're using, but for the purposes of your question this analogy will do.

Instead of doing this, you are opening your transaction, doing one insert and then immediately handing the slate back with commit(). Notice its signature:

fn commit(self) -> Result<()>

Just as we would expect, commit() takes self, not &mut self. By committing (or rolling back), you are telling the server that you are done with this transaction.
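One way to see why a consuming commit() is useful is that it pairs with rollback-on-drop. rusqlite's Transaction rolls back by default when it is dropped without being committed (set_drop_behavior changes this), so a transaction abandoned by an early return through ? is undone automatically. The std-only sketch below models that with a hypothetical MockTx guard that logs statements instead of talking to SQLite; it is not rusqlite's implementation.

```rust
// Hypothetical std-only mock of a transaction guard (not rusqlite's type).
struct MockTx<'a> {
    log: &'a mut Vec<&'static str>, // stands in for the connection
    committed: bool,
}

impl<'a> MockTx<'a> {
    fn begin(log: &'a mut Vec<&'static str>) -> Self {
        log.push("BEGIN");
        MockTx { log, committed: false }
    }

    // Takes `self` by value, like `commit(self)`: after this call the
    // caller no longer owns a transaction, so it cannot be used again.
    fn commit(mut self) {
        self.log.push("COMMIT");
        self.committed = true;
    } // `self` is dropped here, after `committed` was set
}

impl Drop for MockTx<'_> {
    // A transaction that goes out of scope without commit() is rolled back.
    fn drop(&mut self) {
        if !self.committed {
            self.log.push("ROLLBACK");
        }
    }
}

fn committed_run() -> Vec<&'static str> {
    let mut log = Vec::new();
    let tx = MockTx::begin(&mut log);
    tx.commit();
    log
}

fn abandoned_run() -> Vec<&'static str> {
    let mut log = Vec::new();
    let tx = MockTx::begin(&mut log);
    drop(tx); // e.g. an early return via `?` before commit()
    log
}

fn main() {
    println!("{:?}", committed_run());
    println!("{:?}", abandoned_run());
}
```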

To fix this, you need to decide how you want to approach it on the database side. Batching is a good idea, as you've already found, but you need to make sure you can afford to have one batch fail and then repeat it. So we're going to split things up a bit.

First, we're going to build our batch builder. We'll need this, particularly if we ever intend to replay a batch:

fn do_batch<'a>(tx: &mut Transaction<'a>) -> Result<(), rusqlite::Error> {
    for i in 0..20 {
        tx.execute("INSERT INTO entry (data) values (?1)", &[i])?;
    }
    Ok(())
}

Then, we build the structure around it:

fn do_tx(mut conn: Connection) -> Result<(), rusqlite::Error> {
    for _batch in 0..20 {
        // Open the TX
        let mut tx = conn.transaction()?;
        do_batch(&mut tx)?;
        // Do your error handling here. If the batch fails, you want to decide whether to retry or abort.
        tx.commit()?;
    }
    Ok(())
}

It is always worth separating concerns if possible, and it is always worth passing a transaction around if you need it; that's what they are there for. Let your functions build the batch, then handle the commit/rollback behavior in an overarching structure of some sort.
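The comment in do_tx leaves the retry-or-abort decision open. One possible policy, sketched here with std only, is a bounded number of attempts per batch. with_retries and its closure are hypothetical; in real code the closure would open the transaction itself, so that a failed attempt drops (and therefore rolls back) its transaction before the next attempt starts.

```rust
// Hypothetical retry policy. `run_batch` stands in for "open a transaction,
// call do_batch, commit"; it receives the attempt number (starting at 1) and
// returns Err when the attempt failed and its transaction was rolled back.
fn with_retries<F>(max_attempts: u32, mut run_batch: F) -> Result<u32, String>
where
    F: FnMut(u32) -> Result<(), String>,
{
    let mut last_err = String::from("no attempts made");
    for attempt in 1..=max_attempts {
        match run_batch(attempt) {
            Ok(()) => return Ok(attempt), // committed on this attempt
            Err(e) => last_err = e,       // rolled back; try again
        }
    }
    Err(format!("giving up after {} attempts: {}", max_attempts, last_err))
}

fn main() {
    // Simulate a batch that fails twice (e.g. the database is busy), then succeeds.
    let outcome = with_retries(5, |attempt| {
        if attempt < 3 {
            Err("database is busy".to_string())
        } else {
            Ok(())
        }
    });
    println!("{:?}", outcome);
}
```

Retrying is only worthwhile for transient failures such as a busy database; an error like a constraint violation will fail the same way on every attempt, so a real policy would probably inspect the error before deciding to retry.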

As you mentioned in the comments, you are walking a tree. For the purposes of this answer, I'm going to assume you've already flattened your iterator (i.e. your N-dimensional tree is represented by a 1-dimensional iterator), and that this iterator lives under tree_walker.

The standard library's Iterator trait currently has no stable chunks() method, which is what you would need (the itertools crate provides one). For brevity, we'll just collect() and then use Vec::chunks(). For most workloads this shouldn't be a problem, but if that allocation turns out to be too large, you can reimplement chunking yourself relatively easily.

use rusqlite::{Connection, Error, Transaction};

fn do_batch<'a>(tx: &Transaction<'a>, transaction_accum: &[i32]) -> Result<(), Error> {
    for i in transaction_accum.iter() {
        tx.execute("INSERT INTO entry (data) values (?1)", &[i])?;
    }
    Ok(())
}

fn commit(
    mut conn: Connection,
    tree_walker: impl Iterator<Item = i32>,
    batch_size: usize,
) -> Result<(), Error> {
    // Materialize the whole iterator so that Vec::chunks() can be used.
    let collected: Vec<i32> = tree_walker.collect();
    // One transaction per chunk; try_for_each stops at the first error.
    collected.chunks(batch_size).try_for_each(|elements| {
        let tx = conn.transaction()?;
        do_batch(&tx, elements)?;
        tx.commit()
    })
}
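The collect() call above is the allocation mentioned earlier. If it does become a problem, chunking can be done over the iterator directly. Here is a std-only sketch of one way to do it; try_for_each_batch and record_batches are hypothetical names, not part of rusqlite or std. The helper keeps at most batch_size items in a reusable buffer, passes each full batch (and the shorter final one) to a closure, and stops at the first error. In real code the closure would open a transaction, call do_batch(&tx, batch) and commit, with E being rusqlite::Error.

```rust
/// Feeds `items` to `flush` in batches of at most `batch_size` items without
/// collecting the whole iterator first. Stops at the first error.
fn try_for_each_batch<I, T, E, F>(items: I, batch_size: usize, mut flush: F) -> Result<(), E>
where
    I: IntoIterator<Item = T>,
    F: FnMut(&[T]) -> Result<(), E>,
{
    assert!(batch_size > 0, "batch_size must be at least 1");
    // Only ever holds one batch; the allocation is reused between batches.
    let mut buffer = Vec::with_capacity(batch_size);
    for item in items {
        buffer.push(item);
        if buffer.len() == batch_size {
            flush(buffer.as_slice())?;
            buffer.clear();
        }
    }
    // Flush the final, possibly shorter, batch.
    if !buffer.is_empty() {
        flush(buffer.as_slice())?;
    }
    Ok(())
}

// Hypothetical driver: records each batch instead of writing to SQLite and
// simulates a failure on batch number `fail_on` (0-based), if given.
fn record_batches(n: i32, batch_size: usize, fail_on: Option<usize>) -> (Result<(), String>, Vec<Vec<i32>>) {
    let mut batches = Vec::new();
    let mut calls = 0;
    let result = try_for_each_batch(1..=n, batch_size, |batch| {
        let index = calls;
        calls += 1;
        if Some(index) == fail_on {
            return Err(format!("batch {} failed", index));
        }
        batches.push(batch.to_vec());
        Ok(())
    });
    (result, batches)
}

fn main() {
    let (result, batches) = record_batches(7, 3, None);
    println!("{:?} {:?}", result, batches);
}
```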

Answer 2:

There is an important misconception on line 6 in the following snippet:

let transaction_size = 5;
let tx = conn.transaction()?;
for i in 0..20 {
    if (i % transaction_size) == (transaction_size - 1) {
        tx.commit()?;
        let tx = conn.transaction()?; // <-- HERE
    }
    tx.execute("INSERT INTO entry (data) VALUES (?1)", &[i])?;
}

This line does not replace the tx variable that was created on line 2. Instead, it creates a new variable named tx that shadows the first one for the duration of the if block and is dropped at the end of it. So when execution reaches tx.execute, you are back to using the transaction you already committed instead of the new one. Because that outer tx is still used after the if block, it also keeps conn mutably borrowed, which is why the call to conn.transaction() produces the second error.
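The difference between shadowing and reassignment can be seen without any database. In this std-only sketch (the function names are illustrative), the inner let behaves like the let tx inside the if block, and the plain assignment behaves like the corrected version:

```rust
// Shadowing: the inner `let` creates a separate variable.
fn shadowing() -> (i32, i32) {
    let x = 1;
    let seen_inside;
    {
        // `let` declares a brand-new `x` that hides the outer one...
        let x = 2;
        seen_inside = x;
    } // ...and that inner `x` is dropped here, like the inner `tx`.
    (seen_inside, x) // the outer `x` still holds its original value
}

// Reassignment: no `let`, so the outer variable is overwritten.
fn reassignment() -> (i32, i32) {
    let mut x = 1;
    let before = x;
    {
        // No `let`: this writes a new value into the outer `x`.
        x = 2;
    }
    (before, x)
}

fn main() {
    println!("{:?} {:?}", shadowing(), reassignment());
}
```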

What you want is:

let transaction_size = 5;
let mut tx = conn.transaction()?; // <-- Note the `mut` so that we can change it later to a new one
for i in 0..20 {
    if (i % transaction_size) == (transaction_size - 1) {
        tx.commit()?;
        tx = conn.transaction()?; // <-- No `let` -> replace the existing `tx`
    }
    tx.execute("INSERT INTO entry (data) VALUES (?1)", &[i])?;
}
tx.commit()?; // <- Don't forget to commit the last transaction.
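To check that this loop shape satisfies the borrow checker without pulling in rusqlite, here is a std-only sketch with hypothetical Conn and Tx mocks: Tx holds a &mut Conn, as Transaction borrows the connection, and commit takes self. The loop body is the one above without the ?s; batches is an illustrative name.

```rust
// Hypothetical std-only mocks standing in for rusqlite's Connection and
// Transaction; only the ownership and borrowing shape is modelled.
struct Conn {
    committed: Vec<Vec<i32>>, // rows of each committed batch, in order
}

struct Tx<'conn> {
    conn: &'conn mut Conn, // exclusive borrow of the connection
    pending: Vec<i32>,     // rows inserted but not yet committed
}

impl Conn {
    fn transaction(&mut self) -> Tx<'_> {
        Tx { conn: self, pending: Vec::new() }
    }
}

impl Tx<'_> {
    fn execute(&mut self, value: i32) {
        self.pending.push(value);
    }

    // Takes `self` by value, mirroring `commit(self)`.
    fn commit(self) {
        self.conn.committed.push(self.pending);
    }
}

// The corrected loop, without the `?`s.
fn batches(n: i32, transaction_size: i32) -> Vec<Vec<i32>> {
    let mut conn = Conn { committed: Vec::new() };
    let mut tx = conn.transaction();
    for i in 0..n {
        if (i % transaction_size) == (transaction_size - 1) {
            tx.commit(); // moves `tx`, which ends its borrow of `conn`...
            tx = conn.transaction(); // ...so `conn` may be borrowed again
        }
        tx.execute(i);
    }
    tx.commit(); // the last transaction
    conn.committed
}

fn main() {
    println!("{:?}", batches(20, 5));
}
```

Tracing it with n = 20 and transaction_size = 5: the condition fires before the insert, so the committed batches are [0..=3], [4..=8], [9..=13], [14..=18] and [19]. Every row is still committed exactly once; if evenly sized batches matter, a condition such as i > 0 && i % transaction_size == 0 is one option.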

Source: https://stackoverflow.com/questions/58088362/cant-build-a-rusqlite-transaction-inside-loop-use-of-moved-value-and-cannot-bo

Tags

sqlite

rust

borrow-checker