1 Scenario
In a distributed application, a single service is often provided by multiple processes, which may run on the same machine or be spread across different machines. When these processes share resources, a distributed lock may be needed to serialize access to them.
2 Approach
When a process needs to access the shared data, it creates a sequence-type child under the "/locks" node; call it thisPath. When thisPath is the smallest among all children, the process holds the lock and may access the shared resource. When it is done, it deletes thisPath, and the lock passes to the new smallest child.
With the outline clear, one detail remains: how does a process know whether thisPath is the smallest child? At creation time it can list the children with getChildren, find the node ranked immediately before thisPath (call it waitPath), and register a watch on waitPath. When waitPath is deleted, the process is notified, which means it now holds the lock.
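The predecessor lookup described above can be sketched in plain Java. This is only an illustration: the class name and the node names are made up, while real names carry the sequence suffix ZooKeeper assigns.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class WaitPathDemo {
    // Given all children of "/locks" and our own node name, return the node
    // to watch (the one ranked immediately before ours), or null if we are
    // the smallest child and therefore hold the lock.
    public static String findWaitPath(List<String> children, String thisPath) {
        List<String> sorted = new ArrayList<>(children);
        // fixed-width sequence suffixes sort correctly as strings
        Collections.sort(sorted);
        int idx = sorted.indexOf(thisPath);
        return idx <= 0 ? null : sorted.get(idx - 1);
    }

    public static void main(String[] args) {
        List<String> children = List.of("lock-0000000003", "lock-0000000001", "lock-0000000002");
        System.out.println(findWaitPath(children, "lock-0000000001")); // null: smallest, lock held
        System.out.println(findWaitPath(children, "lock-0000000003")); // lock-0000000002
    }
}
```

Watching only waitPath (rather than the whole children list) is what later sections call the chain trigger.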
3 Algorithm
Lock procedure:
- For each lock scenario, designate one root node in ZooKeeper that records the resource contention.
- When a lock object is created, it lazily creates a node under that root as its contention marker. (Tip: the node is EPHEMERAL_SEQUENTIAL, an auto-incrementing ephemeral node.)
- On lock, fetch all children of the root, i.e. the markers currently contending for the resource.
- Following fair competition, sort by the auto-incremented sequence and take the smallest node as the lock owner. If our own node id equals the owner id, the lock is acquired and we return. Otherwise, find the id ranked immediately before ours in the sorted order and watch for its release (an exists watcher), forming a chain of triggers.

Unlock procedure:
- Delete our own node. The next node in the queue receives the Watcher event, wakes up, acquires the lock, and proceeds.

Key points:
- Choosing EPHEMERAL_SEQUENTIAL for the node matters. The auto-increment property makes it easy to build a fair lock: each node wakes only its successor, forming a chain of triggers. This avoids the "thundering herd" (one release waking every waiting thread); the targeted wakeup improves performance.
- The EPHEMERAL property matters too. Talking to ZooKeeper is a network operation with many uncontrollable failure modes; for example, if the network drops, the previous node's release operation fails. An ephemeral node is tied to its session: once the session times out or exits abnormally, the node disappears, much like how an interrupted thread is removed from ReentrantLock's wait queue.
- The lock operation blocks, while the Watcher fires asynchronously, so a shared-mode boolean gate (BooleanMutex) is used for notification; it also makes lock reentrancy straightforward to handle. (Think of reentrant acquisition as repeated reads and lock release as a write takeover.)

Caveats:
- Using EPHEMERAL carries a risk: under abnormal conditions such as high network latency, a session timeout makes ZooKeeper consider the client gone and delete its marker, so the next id in line acquires the lock. Two processes may then run the task holding the lock at the same time, so choosing the session timeout well is important.
- Using PERSISTENT carries the opposite risk, a deadlock: if a process exits abnormally, its contention marker is never deleted and the next id can never acquire the lock.
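The chain trigger can be simulated without ZooKeeper. In this sketch (all names are made up), one CountDownLatch per waiter stands in for the exists watcher on the predecessor node: each waiter blocks on its predecessor only and, on release, wakes its successor only, so no release ever wakes the whole herd.

```java
import java.util.concurrent.CountDownLatch;

public class ChainWakeupDemo {
    // Runs n chained waiters; returns the order in which they ran.
    public static String runChain(int n) throws InterruptedException {
        CountDownLatch[] released = new CountDownLatch[n];
        for (int i = 0; i < n; i++) released[i] = new CountDownLatch(1);
        StringBuilder order = new StringBuilder();
        Thread[] workers = new Thread[n];
        for (int i = 0; i < n; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                try {
                    if (id > 0) released[id - 1].await(); // watch predecessor only
                    synchronized (order) { order.append(id); }
                    released[id].countDown();             // wake successor only
                } catch (InterruptedException ignored) { }
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        return order.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runChain(5)); // strict FIFO order: 01234
    }
}
```

Despite all threads starting concurrently, the chained wakeup forces strict FIFO ordering, which is exactly the fairness property the EPHEMERAL_SEQUENTIAL nodes give the ZooKeeper lock.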
4 Implementation
1. DistributedLock.java: the distributed lock

package com.king.lock;

import java.io.IOException;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

import org.apache.commons.lang3.StringUtils;
import org.apache.zookeeper.*;
import org.apache.zookeeper.data.Stat;

/**
 * ZooKeeper distributed lock
 */
public class DistributedLock {

    private static final int SESSION_TIMEOUT = 10000;

    private static final int DEFAULT_TIMEOUT_PERIOD = 10000;

    private static final String CONNECTION_STRING = "127.0.0.1:2180,127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183";

    private static final byte[] data = {0x12, 0x34};

    private ZooKeeper zookeeper;

    private String root;

    private String id;

    private LockNode idName;

    private String ownerId;

    private String lastChildId;

    private Throwable other = null;

    private KeeperException exception = null;

    private InterruptedException interrupt = null;

    public DistributedLock(String root) {
        try {
            this.zookeeper = new ZooKeeper(CONNECTION_STRING, SESSION_TIMEOUT, null);
            this.root = root;
            ensureExists(root);
        } catch (IOException e) {
            e.printStackTrace();
            other = e;
        }
    }

    /**
     * Acquire the lock; blocks and can be interrupted.
     */
    public void lock() throws Exception {
        // Initialization may already have failed
        if (exception != null) {
            throw exception;
        }

        if (interrupt != null) {
            throw interrupt;
        }

        if (other != null) {
            throw new Exception("", other);
        }

        if (isOwner()) { // lock reentrancy
            return;
        }

        BooleanMutex mutex = new BooleanMutex();
        acquireLock(mutex);
        // A ZooKeeper restart can lose the watcher and deadlock us;
        // a timed wait with retry can guard against that
        try {
            // mutex.lockTimeOut(DEFAULT_TIMEOUT_PERIOD, TimeUnit.MILLISECONDS); // block until the gate opens, with timeout
            mutex.lock(); // block until the gate opens
        } catch (Exception e) {
            e.printStackTrace();
            if (!mutex.state()) {
                lock();
            }
        }

        if (exception != null) {
            throw exception;
        }

        if (interrupt != null) {
            throw interrupt;
        }

        if (other != null) {
            throw new Exception("", other);
        }
    }

    /**
     * Try to acquire the lock without blocking.
     *
     * @throws InterruptedException
     * @throws KeeperException
     */
    public boolean tryLock() throws Exception {
        // Initialization may already have failed
        if (exception != null) {
            throw exception;
        }

        if (isOwner()) { // lock reentrancy
            return true;
        }

        acquireLock(null);

        if (exception != null) {
            throw exception;
        }

        if (interrupt != null) {
            Thread.currentThread().interrupt();
        }

        if (other != null) {
            throw new Exception("", other);
        }

        return isOwner();
    }

    /**
     * Release the lock.
     */
    public void unlock() throws KeeperException {
        if (id != null) {
            try {
                zookeeper.delete(root + "/" + id, -1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (KeeperException.NoNodeException e) {
                // do nothing
            } finally {
                id = null;
            }
        }
    }

    /**
     * Make sure the given path exists, creating it if necessary.
     * @param path
     */
    private void ensureExists(final String path) {
        try {
            Stat stat = zookeeper.exists(path, false);
            if (stat != null) {
                return;
            }
            zookeeper.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        } catch (KeeperException e) {
            exception = e;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            interrupt = e;
        }
    }

    /**
     * Return the root path of this lock.
     */
    public String getRoot() {
        return root;
    }

    /**
     * Whether the current process owns the lock.
     */
    public boolean isOwner() {
        return id != null && ownerId != null && id.equals(ownerId);
    }

    /**
     * Return the current node id.
     */
    public String getId() {
        return this.id;
    }

    // ===================== helper methods =============================

    /**
     * Perform the lock operation; the mutex argument controls whether it blocks.
     */
    private Boolean acquireLock(final BooleanMutex mutex) {
        try {
            do {
                if (id == null) { // build a unique marker for this lock
                    long sessionId = zookeeper.getSessionId();
                    String prefix = "x-" + sessionId + "-";
                    // First time through: create our node
                    String path = zookeeper.create(root + "/" + prefix, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
                    int index = path.lastIndexOf("/");
                    id = StringUtils.substring(path, index + 1);
                    idName = new LockNode(id);
                }

                if (id != null) {
                    List<String> names = zookeeper.getChildren(root, false);
                    if (names.isEmpty()) {
                        id = null; // abnormal case: recreate our node
                    } else {
                        // Sort the nodes
                        SortedSet<LockNode> sortedNames = new TreeSet<>();
                        for (String name : names) {
                            sortedNames.add(new LockNode(name));
                        }

                        if (!sortedNames.contains(idName)) {
                            id = null; // reset to null and recreate
                            continue;
                        }

                        // The first node is the owner
                        ownerId = sortedNames.first().getName();
                        if (mutex != null && isOwner()) {
                            mutex.unlock(); // open the gate and return
                            return true;
                        } else if (mutex == null) {
                            return isOwner();
                        }

                        SortedSet<LockNode> lessThanMe = sortedNames.headSet(idName);
                        if (!lessThanMe.isEmpty()) {
                            // Watch the nearest node queued before us
                            LockNode lastChildName = lessThanMe.last();
                            lastChildId = lastChildName.getName();
                            // Asynchronous watcher handling
                            Stat stat = zookeeper.exists(root + "/" + lastChildId, new Watcher() {
                                public void process(WatchedEvent event) {
                                    acquireLock(mutex);
                                }
                            });

                            if (stat == null) {
                                acquireLock(mutex); // the node is already gone, so no watcher was set; retry ourselves
                            }
                        } else {
                            if (isOwner()) {
                                mutex.unlock();
                            } else {
                                id = null; // our node may have expired, so id and ownerId differ
                            }
                        }
                    }
                }
            } while (id == null);
        } catch (KeeperException e) {
            exception = e;
            if (mutex != null) {
                mutex.unlock();
            }
        } catch (InterruptedException e) {
            interrupt = e;
            if (mutex != null) {
                mutex.unlock();
            }
        } catch (Throwable e) {
            other = e;
            if (mutex != null) {
                mutex.unlock();
            }
        }

        if (isOwner() && mutex != null) {
            mutex.unlock();
        }
        return Boolean.FALSE;
    }
}
2. BooleanMutex.java: the shared-mode boolean gate

package com.king.lock;

import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

/**
 * Shared-mode boolean gate
 */
public class BooleanMutex {

    private Sync sync;

    public BooleanMutex() {
        sync = new Sync();
        set(false);
    }

    /**
     * Block until the value becomes true.
     * @throws InterruptedException
     */
    public void lock() throws InterruptedException {
        sync.innerLock();
    }

    /**
     * Block until the value becomes true, with a timeout.
     * @param timeout
     * @param unit
     * @throws InterruptedException
     * @throws TimeoutException
     */
    public void lockTimeOut(long timeout, TimeUnit unit) throws InterruptedException, TimeoutException {
        sync.innerLock(unit.toNanos(timeout));
    }

    public void unlock() {
        set(true);
    }

    /**
     * Reset the boolean value.
     * @param mutex
     */
    public void set(Boolean mutex) {
        if (mutex) {
            sync.innerSetTrue();
        } else {
            sync.innerSetFalse();
        }
    }

    public boolean state() {
        return sync.innerState();
    }

    /**
     * AQS-based synchronizer backing the gate
     */
    private final class Sync extends AbstractQueuedSynchronizer {
        private static final long serialVersionUID = -7828117401763700385L;

        /**
         * State 1: wake every thread blocked while the state was FALSE
         */
        private static final int TRUE = 1;
        /**
         * State 0: the current thread blocks until woken
         */
        private static final int FALSE = 0;

        /**
         * Positive return value: proceed; negative: block
         */
        protected int tryAcquireShared(int arg) {
            return getState() == TRUE ? 1 : -1;
        }

        /**
         * AQS hook deciding whether the shared lock may be released
         */
        protected boolean tryReleaseShared(int ignore) {
            // Always true: release is always allowed
            return true;
        }

        private boolean innerState() {
            return getState() == TRUE;
        }

        private void innerLock() throws InterruptedException {
            acquireSharedInterruptibly(0);
        }

        private void innerLock(long nanosTimeout) throws InterruptedException, TimeoutException {
            if (!tryAcquireSharedNanos(0, nanosTimeout))
                throw new TimeoutException();
        }

        private void innerSetTrue() {
            for (;;) {
                int s = getState();
                if (s == TRUE) {
                    return; // already true
                }
                if (compareAndSetState(s, TRUE)) { // CAS to avoid concurrent updates to true
                    releaseShared(0); // open the gate, waking blocked threads
                    return;
                }
            }
        }

        private void innerSetFalse() {
            for (;;) {
                int s = getState();
                if (s == FALSE) {
                    return; // already false
                }
                if (compareAndSetState(s, FALSE)) { // CAS to avoid concurrent updates to false
                    return;
                }
            }
        }
    }
}
3. Notes:

4. Test class:

package com.king.lock;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.zookeeper.KeeperException;

/**
 * Distributed lock test
 * @author taomk
 * @version 1.0
 * @since 15-11-19 11:48 AM
 */
public class DistributedLockTest {

    public static void main(String[] args) {
        ExecutorService executor = Executors.newCachedThreadPool();
        final int count = 50;
        final CountDownLatch latch = new CountDownLatch(count);
        for (int i = 0; i < count; i++) {
            final DistributedLock node = new DistributedLock("/locks");
            executor.submit(new Runnable() {
                public void run() {
                    try {
                        Thread.sleep(1000);
                        // node.tryLock(); // non-blocking acquire
                        node.lock(); // blocking acquire
                        Thread.sleep(100);

                        System.out.println("id: " + node.getId() + " is leader: " + node.isOwner());
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    } catch (KeeperException e) {
                        e.printStackTrace();
                    } catch (Exception e) {
                        e.printStackTrace();
                    } finally {
                        latch.countDown();
                        try {
                            node.unlock();
                        } catch (KeeperException e) {
                            e.printStackTrace();
                        }
                    }
                }
            });
        }

        try {
            latch.await();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

        executor.shutdown();
    }
}
Console output:

id: x-239027745716109354-0000000248 is leader: true
id: x-22854963329433645-0000000249 is leader: true
id: x-22854963329433646-0000000250 is leader: true
id: x-166970151413415997-0000000251 is leader: true
id: x-166970151413415998-0000000252 is leader: true
id: x-166970151413415999-0000000253 is leader: true
id: x-166970151413416000-0000000254 is leader: true
id: x-166970151413416001-0000000255 is leader: true
id: x-166970151413416002-0000000256 is leader: true
id: x-22854963329433647-0000000257 is leader: true
id: x-239027745716109355-0000000258 is leader: true
id: x-166970151413416003-0000000259 is leader: true
id: x-94912557367427124-0000000260 is leader: true
id: x-22854963329433648-0000000261 is leader: true
id: x-239027745716109356-0000000262 is leader: true
id: x-239027745716109357-0000000263 is leader: true
id: x-166970151413416004-0000000264 is leader: true
id: x-239027745716109358-0000000265 is leader: true
id: x-239027745716109359-0000000266 is leader: true
id: x-22854963329433649-0000000267 is leader: true
id: x-22854963329433650-0000000268 is leader: true
id: x-94912557367427125-0000000269 is leader: true
id: x-22854963329433651-0000000270 is leader: true
id: x-94912557367427126-0000000271 is leader: true
id: x-239027745716109360-0000000272 is leader: true
id: x-94912557367427127-0000000273 is leader: true
id: x-94912557367427128-0000000274 is leader: true
id: x-166970151413416005-0000000275 is leader: true
id: x-94912557367427129-0000000276 is leader: true
id: x-166970151413416006-0000000277 is leader: true
id: x-94912557367427130-0000000278 is leader: true
id: x-94912557367427131-0000000279 is leader: true
id: x-239027745716109361-0000000280 is leader: true
id: x-239027745716109362-0000000281 is leader: true
id: x-166970151413416007-0000000282 is leader: true
id: x-94912557367427132-0000000283 is leader: true
id: x-22854963329433652-0000000284 is leader: true
id: x-166970151413416008-0000000285 is leader: true
id: x-239027745716109363-0000000286 is leader: true
id: x-239027745716109364-0000000287 is leader: true
id: x-166970151413416009-0000000288 is leader: true
id: x-166970151413416010-0000000289 is leader: true
id: x-239027745716109365-0000000290 is leader: true
id: x-94912557367427133-0000000291 is leader: true
id: x-239027745716109366-0000000292 is leader: true
id: x-94912557367427134-0000000293 is leader: true
id: x-22854963329433653-0000000294 is leader: true
id: x-94912557367427135-0000000295 is leader: true
id: x-239027745716109367-0000000296 is leader: true
id: x-239027745716109368-0000000297 is leader: true
5 An upgraded version
A distributed lock solves synchronization between processes. But when the requirement involves multi-thread + multi-process lock control, having every thread in a JVM talk to ZooKeeper over the network is costly, so a two-tier distributed lock is built on top of DistributedLock.
The idea is roughly a combination of ReentrantLock and DistributedLock:
When threads within one JVM compete, a thread must first take the first-tier ReentrantLock. Holding that, it then competes with threads in other JVMs for the distributed lock, and once it has both it runs its task.
Release goes in the reverse order: release the DistributedLock first, then the ReentrantLock. Consider what would happen if the ReentrantLock were released first: in a JVM where ReentrantLock contention is high, threads in other JVMs competing for the lock could easily starve.
1. DistributedReentrantLock.java: the multi-process + multi-thread distributed lock

package com.king.lock;

import java.text.MessageFormat;
import java.util.concurrent.locks.ReentrantLock;

import org.apache.zookeeper.KeeperException;

/**
 * Multi-process + multi-thread distributed lock
 */
public class DistributedReentrantLock extends DistributedLock {

    private static final String ID_FORMAT = "Thread[{0}] Distributed[{1}]";
    private ReentrantLock reentrantLock = new ReentrantLock();

    public DistributedReentrantLock(String root) {
        super(root);
    }

    public void lock() throws Exception {
        reentrantLock.lock(); // in-JVM threads compete for the first-tier lock first
        super.lock();
    }

    public boolean tryLock() throws Exception {
        // in-JVM threads compete for the first-tier lock first
        return reentrantLock.tryLock() && super.tryLock();
    }

    public void unlock() throws KeeperException {
        super.unlock();
        reentrantLock.unlock(); // release the outer lock last
    }

    @Override
    public String getId() {
        return MessageFormat.format(ID_FORMAT, Thread.currentThread().getId(), super.getId());
    }

    @Override
    public boolean isOwner() {
        return reentrantLock.isHeldByCurrentThread() && super.isOwner();
    }
}
2. Test code:

package com.king.lock;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.zookeeper.KeeperException;

/**
 * @author taomk
 * @version 1.0
 * @since 15-11-23 12:15 PM
 */
public class DistributedReentrantLockTest {

    public static void main(String[] args) {
        ExecutorService executor = Executors.newCachedThreadPool();
        final int count = 50;
        final CountDownLatch latch = new CountDownLatch(count);

        final DistributedReentrantLock lock = new DistributedReentrantLock("/locks"); // a single shared lock instance
        for (int i = 0; i < count; i++) {
            executor.submit(new Runnable() {
                public void run() {
                    try {
                        Thread.sleep(1000);
                        lock.lock();
                        Thread.sleep(100);

                        System.out.println("id: " + lock.getId() + " is leader: " + lock.isOwner());
                    } catch (Exception e) {
                        e.printStackTrace();
                    } finally {
                        latch.countDown();
                        try {
                            lock.unlock();
                        } catch (KeeperException e) {
                            e.printStackTrace();
                        }
                    }
                }
            });
        }

        try {
            latch.await();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

        executor.shutdown();
    }
}
6 Finally
Taking this one step further, a distributed read/write lock works on the same principle. Rough sketch:
- Contention markers: read_&lt;sequence id&gt; and write_&lt;sequence id&gt;.
- Sort by sequence id first. If the head of the queue is a run of read markers, all of those reads acquire the lock; if the head is a write marker, only that first write node acquires the lock.
- Watchers: a read node watches the exists of the nearest preceding write node; a write node watches the nearest preceding node, whether read or write.
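The owner-selection rule of that read/write variant can be sketched in plain Java. The node-name format (a read_/write_ prefix plus a sequence id) and the class name are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public class RwLockDemo {
    // Given the children of the lock root (names like "read_001" / "write_002"),
    // return the current owners: either the leading run of reads, or the single
    // leading write.
    public static List<String> owners(List<String> children) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort((a, b) -> suffix(a).compareTo(suffix(b))); // order by sequence id
        List<String> result = new ArrayList<>();
        for (String node : sorted) {
            if (node.startsWith("write_")) {
                if (result.isEmpty()) result.add(node); // a leading write owns the lock alone
                break;                                  // everything after a write waits
            }
            result.add(node); // leading reads all share the lock
        }
        return result;
    }

    private static String suffix(String node) {
        return node.substring(node.indexOf('_') + 1);
    }

    public static void main(String[] args) {
        System.out.println(owners(List.of("read_002", "write_003", "read_001")));
        // [read_001, read_002]
        System.out.println(owners(List.of("write_001", "read_002", "read_003")));
        // [write_001]
    }
}
```

The watcher wiring would follow the same chain idea as before: each non-owner watches only the nearest node whose release could change this computation.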
Source: https://www.cnblogs.com/wxd0108/p/6264748.html
