How do I lock read/write to MySQL tables so that I can select and then insert without other programs reading/writing to the database?

梦毁少年i 2020-12-14 06:15

I am running many instances of a webcrawler in parallel.

Each crawler selects a domain from a table, inserts that url and a start time into a log table, and then starts crawling that domain.

5 Answers
  • 2020-12-14 06:48

    I got some inspiration from @Eljakim's answer and started this new thread where I figured out a great trick. It doesn't involve locking anything and is very simple.

    INSERT INTO crawlLog (companyId, timeStartCrawling)
    SELECT id, now()
    FROM companies
    WHERE id NOT IN
    (
        SELECT companyId
        FROM crawlLog AS crawlLogAlias
    )
    LIMIT 1
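
    If the crawler then needs to know which company it just claimed, one option, assuming crawlLog has an AUTO_INCREMENT primary key named id (an assumption, not stated in the question), is to read the row back via LAST_INSERT_ID():

    -- Returns the companyId this connection just inserted
    -- (LAST_INSERT_ID() is per-connection and the INSERT ... SELECT above adds at most one row)
    SELECT companyId FROM crawlLog WHERE id = LAST_INSERT_ID();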
    
  • 2020-12-14 06:49

    You probably don't want to lock the table. If you do that you'll have to worry about trapping errors when the other crawlers try to write to the database - which is what you were thinking when you said "...terribly complex and relies on many other things."

    Instead you should probably wrap the group of queries in a MySQL transaction (see http://dev.mysql.com/doc/refman/5.0/en/commit.html) like this:

    START TRANSACTION;
    SELECT @URL:=url FROM tablewiththeurls WHERE uncrawled=1 ORDER BY somecriterion LIMIT 1;
    INSERT INTO loggingtable SET url=@URL;
    COMMIT;
    

    Or something close to that.

    [edit] I just realized - you could probably do everything you need in a single query and not even have to worry about transactions. Something like this:

    INSERT INTO loggingtable (url)
    SELECT u.url
    FROM tablewithurls u
    LEFT JOIN loggingtable l ON l.url = u.url
    WHERE {some criterion used to pick the url to work on} AND l.url IS NULL;
    
  • 2020-12-14 06:52

    You can lock tables using the MySQL LOCK TABLES command like this:

    LOCK TABLES tablename WRITE;
    
    # Do other queries here
    
    UNLOCK TABLES;
    

    See:

    http://dev.mysql.com/doc/refman/5.5/en/lock-tables.html
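
    Applied to this question, a minimal sketch might look like the following (the companies/crawlLog names mirror another answer here and are assumptions). Note that a session holding locks from LOCK TABLES can only touch the tables it locked, so every table used between LOCK and UNLOCK must appear in the LOCK TABLES list:

    LOCK TABLES companies READ, crawlLog WRITE;

    -- Pick an unclaimed company; no other session can write crawlLog right now
    -- (a real crawler would also handle the case where no unclaimed company is found)
    SELECT id INTO @companyId
    FROM companies
    WHERE id NOT IN (SELECT companyId FROM crawlLog)
    LIMIT 1;

    INSERT INTO crawlLog (companyId, timeStartCrawling)
    VALUES (@companyId, NOW());

    UNLOCK TABLES;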

  • 2020-12-14 06:59

    Well, table locks are one way to deal with that, but they make parallel requests impossible. If the table is InnoDB you could force a row lock instead, using SELECT ... FOR UPDATE within a transaction.

    BEGIN;
    
    SELECT ... FROM your_table WHERE domainname = ... FOR UPDATE;
    
    # do whatever you have to do
    
    COMMIT;
    

    Please note that you will need an index on domainname (or whatever column you use in the WHERE clause) for this to work efficiently; without one, InnoDB may lock far more rows than intended. Such an index makes sense in general, and I assume you will have one anyway.
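
    For this question, a claim-then-log transaction might look like the sketch below (the domains table, its uncrawled flag, and the crawlLog columns are assumptions, not from the question). A second crawler that scans the same row blocks on the FOR UPDATE until the first transaction commits:

    BEGIN;

    -- Lock one unclaimed domain row; concurrent crawlers wait here
    SELECT id INTO @domainId
    FROM domains
    WHERE uncrawled = 1
    ORDER BY id
    LIMIT 1
    FOR UPDATE;

    -- Log the start time, then mark the domain as claimed
    INSERT INTO crawlLog (companyId, timeStartCrawling) VALUES (@domainId, NOW());
    UPDATE domains SET uncrawled = 0 WHERE id = @domainId;

    COMMIT;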

  • 2020-12-14 07:04

    I wouldn't use locking, or transactions.

    The easiest way to go is to INSERT a record in the logging table if it's not yet present, and then check for that record.

    Assume you have tblcrawels (cra_id) that is filled with your crawlers and tblurl (url_id) that is filled with the URLs, and a table tbllogging (log_cra_id, log_url_id) for your logfile.

    You would run the following query if crawler 1 wants to start crawling url 2:

    INSERT INTO tbllogging (log_cra_id, log_url_id) 
    SELECT 1, url_id FROM tblurl LEFT JOIN tbllogging ON url_id = log_url_id
    WHERE url_id=2 AND log_url_id IS NULL;
    

    The next step is to check whether this record has been inserted.

    SELECT * FROM tbllogging WHERE log_url_id=2 AND log_cra_id=1
    

    If you get a result, crawler 1 can crawl this url. If you get no result, another crawler has already inserted that row and is crawling the url.
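
    A possible shortcut (my assumption, not part of the original answer): instead of re-querying the table, check how many rows the INSERT affected, for example with ROW_COUNT() right after it on the same connection:

    -- Run immediately after the INSERT above, on the same connection
    SELECT ROW_COUNT();  -- 1 = this crawler claimed url 2; 0 = another crawler got there first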
