
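The author of this article leads the apache/dubbo-go project. sentinel-go is already built into dubbogo and ready to use; to use it on its own, see the article [Using sentinel in dubbo-go]. For other questions, join the dubbogo community [DingTalk group 23331795].
Introduction: this article analyzes Sentinel, the flow-control middleware open-sourced by Alibaba Group. Sentinel natively supports several languages, including Java, Go, and C++; this article covers only the Go implementation. Unless otherwise noted, "sentinel" below refers to Sentinel-Go.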

Basic concepts: Resource and Rule
1. Resource
    // ResourceType represents classification of the resources
    type ResourceType int32

    const (
        ResTypeCommon ResourceType = iota
        ResTypeWeb
        ResTypeRPC
    )

    // TrafficType describes the traffic type: Inbound or Outbound
    type TrafficType int32

    const (
        // Inbound represents the inbound traffic (e.g. provider)
        Inbound TrafficType = iota
        // Outbound represents the outbound traffic (e.g. consumer)
        Outbound
    )

    // ResourceWrapper represents the invocation
    type ResourceWrapper struct {
        // global unique resource name
        name string
        // resource classification
        classification ResourceType
        // Inbound or Outbound
        flowType TrafficType
    }
2. Entry
    // EntryOptions represents the options of a Sentinel resource entry.
    type EntryOptions struct {
        resourceType base.ResourceType
        entryType    base.TrafficType
        acquireCount uint32
        slotChain    *base.SlotChain
    }

    type EntryContext struct {
        entry *SentinelEntry
        // Use to calculate RT
        startTime uint64

        Resource *ResourceWrapper
        StatNode StatNode
        Input    *SentinelInput
        // the result of rule slots check
        RuleCheckResult *TokenResult
    }

    type SentinelEntry struct {
        res *ResourceWrapper
        // one entry bounds with one context
        ctx *EntryContext
        sc  *SlotChain
    }
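The value of SentinelEntry.sc comes from EntryOptions.slotChain, which holds the global SlotChain object api/slot_chain.go:globalSlotChain.
SlotChain is the collection of all flow-control components that sentinel provides; each component can simply be regarded as a Slot. See [3.5 SlotChain] for a detailed analysis.
The meaning of EntryOptions.acquireCount is hard to guess from its name. Only the comment on core/api.go:WithAcquireCount() makes it clear: EntryOptions.acquireCount is the number of executions in a batched action. If one RPC request calls one service interface on the server, the value is 1 [also the default of EntryOptions.acquireCount]; if it calls three service interfaces, the value is 3. Renaming it to EntryOptions.batchCount would therefore be clearer. To keep the change minimal, core/api.go:WithAcquireCount() can be kept while adding core/api.go:WithBatchCount() with the same behavior. The improvement has been submitted as pr 263.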
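The renaming suggestion can be illustrated with the functional-option pattern. The following is only a sketch: entryOptions and newEntryOptions are hypothetical stand-ins for illustration, not sentinel's actual definitions, and the real change in the pr may look different.

```go
package main

import "fmt"

// entryOptions is a hypothetical stand-in for sentinel's EntryOptions.
type entryOptions struct {
	acquireCount uint32
}

// EntryOption mutates entryOptions (functional-option pattern).
type EntryOption func(*entryOptions)

// WithAcquireCount sets how many calls one entry accounts for.
func WithAcquireCount(n uint32) EntryOption {
	return func(o *entryOptions) { o.acquireCount = n }
}

// WithBatchCount is an alias with a more descriptive name; it delegates
// to WithAcquireCount so existing callers keep working.
func WithBatchCount(n uint32) EntryOption {
	return WithAcquireCount(n)
}

// newEntryOptions applies opts on top of the default acquireCount of 1.
func newEntryOptions(opts ...EntryOption) entryOptions {
	o := entryOptions{acquireCount: 1}
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	fmt.Println(newEntryOptions().acquireCount)
	fmt.Println(newEntryOptions(WithBatchCount(3)).acquireCount)
}
```

Keeping the old option and adding an alias means neither name changes the stored value, so the two options stay interchangeable.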
3. Rule
    type TokenCalculateStrategy int32

    const (
        Direct TokenCalculateStrategy = iota
        WarmUp
    )

    type ControlBehavior int32

    const (
        Reject ControlBehavior = iota
        Throttling
    )

    // Rule describes the strategy of flow control, the flow control strategy is based on QPS statistic metric
    type Rule struct {
        // Resource represents the resource name.
        Resource        string          `json:"resource"`
        ControlBehavior ControlBehavior `json:"controlBehavior"`
        // Threshold means the threshold during StatIntervalInMs
        // If StatIntervalInMs is 1000(1 second), Threshold means QPS
        Threshold         float64 `json:"threshold"`
        MaxQueueingTimeMs uint32  `json:"maxQueueingTimeMs"`
        // StatIntervalInMs indicates the statistic interval and it's the optional setting for flow Rule.
        // If user doesn't set StatIntervalInMs, that means using default metric statistic of resource.
        // If the StatIntervalInMs user specifies can not reuse the global statistic of resource,
        // sentinel will generate independent statistic structure for this rule.
        StatIntervalInMs uint32 `json:"statIntervalInMs"`
    }
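Because Rule carries JSON tags, a rule can be described in JSON and decoded with the standard library. The sketch below mirrors the struct locally rather than importing the library; parseRule and the sample values ("orders", 10, 500, 1000) are assumptions for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type ControlBehavior int32

const (
	Reject ControlBehavior = iota
	Throttling
)

// Rule is a local mirror of the struct shown above, with the same JSON tags.
type Rule struct {
	Resource          string          `json:"resource"`
	ControlBehavior   ControlBehavior `json:"controlBehavior"`
	Threshold         float64         `json:"threshold"`
	MaxQueueingTimeMs uint32          `json:"maxQueueingTimeMs"`
	StatIntervalInMs  uint32          `json:"statIntervalInMs"`
}

// parseRule is a hypothetical helper that decodes one rule from JSON.
func parseRule(data string) (Rule, error) {
	var r Rule
	err := json.Unmarshal([]byte(data), &r)
	return r, err
}

func main() {
	// Throttle "orders" to 10 requests per 1000 ms, queueing at most 500 ms.
	r, err := parseRule(`{"resource":"orders","controlBehavior":1,"threshold":10,"maxQueueingTimeMs":500,"statIntervalInMs":1000}`)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%+v\n", r)
}
```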
4. Flow
1)TrafficShapingController
TrafficShapingController, as the name suggests, is the traffic-shaping controller: it is what actually enforces flow control.
    // core/flow/traffic_shaping.go

    // TrafficShapingCalculator calculates the actual traffic shaping threshold
    // based on the threshold of rule and the traffic shaping strategy.
    type TrafficShapingCalculator interface {
        CalculateAllowedTokens(acquireCount uint32, flag int32) float64
    }

    type DirectTrafficShapingCalculator struct {
        threshold float64
    }

    func (d *DirectTrafficShapingCalculator) CalculateAllowedTokens(uint32, int32) float64 {
        return d.threshold
    }
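The TrafficShapingCalculator interface computes the upper limit for flow control. If you do not use the warm-up feature, there is no need to dig into its implementations; one of them, DirectTrafficShapingCalculator, simply returns Rule.Threshold [the limit set by the user].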
    // TrafficShapingChecker performs checking according to current metrics and the traffic
    // shaping strategy, then yield the token result.
    type TrafficShapingChecker interface {
        DoCheck(resStat base.StatNode, acquireCount uint32, threshold float64) *base.TokenResult
    }

    type RejectTrafficShapingChecker struct {
        rule *Rule
    }

    func (d *RejectTrafficShapingChecker) DoCheck(resStat base.StatNode, acquireCount uint32, threshold float64) *base.TokenResult {
        metricReadonlyStat := d.BoundOwner().boundStat.readOnlyMetric
        if metricReadonlyStat == nil {
            return nil
        }
        curCount := float64(metricReadonlyStat.GetSum(base.MetricEventPass))
        if curCount+float64(acquireCount) > threshold {
            return base.NewTokenResultBlockedWithCause(base.BlockTypeFlow, "", d.rule, curCount)
        }
        return nil
    }
RejectTrafficShapingChecker uses Rule.Threshold to decide whether a Resource has exceeded its limit in the current time window. Its TokenResultStatus can only be Pass or Blocked.
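Stripped of the statistics plumbing, the reject decision is a single comparison. A minimal sketch, where rejectCheck is a hypothetical helper and not a function of the library:

```go
package main

import "fmt"

// rejectCheck reports whether acquireCount more requests still fit under
// threshold, given that curCount requests already passed in this window.
func rejectCheck(curCount float64, acquireCount uint32, threshold float64) bool {
	return curCount+float64(acquireCount) <= threshold
}

func main() {
	fmt.Println(rejectCheck(9, 1, 10))  // one more request fits exactly
	fmt.Println(rejectCheck(10, 1, 10)) // the window is already full
}
```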
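ThrottlingChecker aims to execute requests at a uniform rate. It subdivides a time window [for example 1s] into finer micro windows according to threshold, and at most one request executes in each micro window. Its TokenResultStatus can be Pass, Blocked, or Wait, meaning:
- Pass: the limit is not exceeded within the micro window, and the request passes;
- Wait: the limit is exceeded within the micro window, and the request is deferred by some number of micro windows; it has to wait during that time;
- Blocked: the limit is exceeded within the micro window, and the wait would be longer than the maximum the user is willing to accept [Rule.MaxQueueingTimeMs]; the request is rejected.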
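The three outcomes above can be sketched with a small pacing calculator. This is a simplified, single-goroutine illustration, not the library's ThrottlingChecker: the throttler type, its fields, and the explicit nowMs parameter are assumptions that keep the example deterministic.

```go
package main

import "fmt"

type status int

const (
	Pass status = iota
	Wait
	Blocked
)

// throttler spaces requests intervalMs apart (window length / threshold).
// It is not safe for concurrent use.
type throttler struct {
	intervalMs int64 // length of one micro window
	maxQueueMs int64 // longest wait the caller accepts
	lastPassMs int64 // time slot given to the latest admitted request
}

func newThrottler(threshold float64, windowMs, maxQueueMs int64) *throttler {
	interval := int64(float64(windowMs) / threshold)
	// Start one interval in the past so the first request passes at time 0.
	return &throttler{intervalMs: interval, maxQueueMs: maxQueueMs, lastPassMs: -interval}
}

// check returns the status for a request arriving at nowMs and, for Wait,
// how many milliseconds the caller should sleep.
func (th *throttler) check(nowMs int64) (status, int64) {
	expected := th.lastPassMs + th.intervalMs
	if expected <= nowMs {
		th.lastPassMs = nowMs
		return Pass, 0
	}
	waitMs := expected - nowMs
	if waitMs > th.maxQueueMs {
		return Blocked, 0
	}
	th.lastPassMs = expected // reserve the next micro window
	return Wait, waitMs
}

func main() {
	// 10 requests per 1000 ms gives 100 ms micro windows; queue at most 150 ms.
	th := newThrottler(10, 1000, 150)
	for _, now := range []int64{0, 0, 0, 300} {
		s, w := th.check(now)
		fmt.Println("now:", now, "status:", s, "waitMs:", w)
	}
}
```

A waiting request reserves its future micro window by advancing lastPassMs, which is why a third simultaneous request in this setup would have to wait 200 ms and is blocked instead.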
    type TrafficShapingController struct {
        flowCalculator TrafficShapingCalculator
        flowChecker    TrafficShapingChecker
        rule           *Rule
        // boundStat is the statistic of current TrafficShapingController
        boundStat standaloneStatistic
    }

    // resStat is passed in by the caller (see checkInLocal in the Check section).
    func (t *TrafficShapingController) PerformChecking(resStat base.StatNode, acquireCount uint32, flag int32) *base.TokenResult {
        allowedTokens := t.flowCalculator.CalculateAllowedTokens(acquireCount, flag)
        return t.flowChecker.DoCheck(resStat, acquireCount, allowedTokens)
    }
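In the Direct + Reject scenario, these three types do not add much. The main flow of the core function TrafficShapingController.PerformChecking() is:
- Get the current metrics value [curCount] of the Resource from TrafficShapingController.boundStat;
- If curCount + batchNum(acquireCount) > Rule.Threshold, reject; otherwise pass.
The four members of TrafficShapingController mean the following:
- flowCalculator computes the flow-control limit;
- flowChecker performs the flow-control check;
- rule stores the flow-control rule;
- boundStat stores the check results and time-window parameters, which serve as the basis for the next check.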
2)TrafficControllerMap
TrafficControllerMap maps each Resource name to its list of TrafficShapingController.
    // TrafficControllerMap represents the map storage for TrafficShapingController.
    type TrafficControllerMap map[string][]*TrafficShapingController

    // core/flow/rule_manager.go
    tcMap = make(TrafficControllerMap)
core/flow/rule_manager.go:LoadRules() builds a TrafficShapingController for each user-defined Rule and stores it in tcMap. It calls generateStatFor(*Rule) to construct TrafficShapingController.boundStat.
The core code of generateStatFor(*Rule) is as follows:
    func generateStatFor(rule *Rule) (*standaloneStatistic, error) {
        var retStat standaloneStatistic
        resNode := stat.GetOrCreateResourceNode(rule.Resource, base.ResTypeCommon)

        // default case, use the resource's default statistic
        readStat := resNode.DefaultMetric()
        retStat.reuseResourceStat = true
        retStat.readOnlyMetric = readStat
        retStat.writeOnlyMetric = nil
        return &retStat, nil
    }
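How rules end up grouped by resource can be sketched as below. This is an illustration under simplifying assumptions, not the library's LoadRules(): rule, controller, and buildControllerMap are hypothetical names, and statistics setup is omitted.

```go
package main

import "fmt"

// rule and controller are reduced, hypothetical stand-ins for Rule and
// TrafficShapingController.
type rule struct {
	Resource  string
	Threshold float64
}

type controller struct {
	rule *rule
}

type controllerMap map[string][]*controller

// buildControllerMap creates one controller per valid rule and groups the
// controllers by resource name, preserving rule order within a resource.
func buildControllerMap(rules []*rule) controllerMap {
	m := make(controllerMap)
	for _, r := range rules {
		if r == nil || r.Resource == "" {
			continue // skip invalid rules
		}
		m[r.Resource] = append(m[r.Resource], &controller{rule: r})
	}
	return m
}

func main() {
	m := buildControllerMap([]*rule{
		{Resource: "orders", Threshold: 10},
		{Resource: "orders", Threshold: 100},
		{Resource: "users", Threshold: 5},
	})
	fmt.Println(len(m["orders"]), len(m["users"]))
}
```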

Metrics
1. Atomic time wheel: AtomicBucketWrapArray

1)BucketWrap
    // BucketWrap represent a slot to record metrics
    // In order to reduce the usage of memory, BucketWrap don't hold length of BucketWrap
    // The length of BucketWrap could be seen in LeapArray.
    // The scope of time is [startTime, startTime+bucketLength)
    // The size of BucketWrap is 24(8+16) bytes
    type BucketWrap struct {
        // The start timestamp of this statistic bucket wrapper.
        BucketStart uint64
        // The actual data structure to record the metrics (e.g. MetricBucket).
        Value atomic.Value
    }
AtomicBucketWrapArray, built on BucketWrap, is used by several sentinel flow-control components, each with its own parameters. For example:
- slowRequestCounter, the underlying parameter of the slowRequestLeapArray used by core/circuitbreaker/circuit_breaker.go:slowRtCircuitBreaker;

    // core/circuitbreaker/circuit_breaker.go
    type slowRequestCounter struct {
        slowCount  uint64
        totalCount uint64
    }

- errorCounter, the underlying parameter of the errorCounterLeapArray used by core/circuitbreaker/circuit_breaker.go:errorRatioCircuitBreaker.

    // core/circuitbreaker/circuit_breaker.go
    type errorCounter struct {
        errorCount uint64
        totalCount uint64
    }
    // MetricBucket represents the entity to record metrics per minimum time unit (i.e. the bucket time span).
    // Note that all operations of the MetricBucket are required to be thread-safe.
    type MetricBucket struct {
        // Value of statistic
        counter [base.MetricEventTotal]int64
        minRt   int64
    }

    // There are five events to record
    // pass + block == Total
    const (
        // sentinel rules check pass
        MetricEventPass MetricEvent = iota
        // sentinel rules check block
        MetricEventBlock

        MetricEventComplete
        // Biz error, used for circuit breaker
        MetricEventError
        // request execute rt, unit is millisecond
        MetricEventRt
        // hack for the number of event
        MetricEventTotal
    )
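The thread-safety requirement on MetricBucket can be met with atomic counters indexed by event. The sketch below is an assumption about one way to do this, with hypothetical names (metricBucket, add, get, concurrentPass) and a reduced event set; it is not the library's code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type metricEvent int

const (
	eventPass metricEvent = iota
	eventBlock
	eventTotal // number of event kinds, used as the array length
)

// metricBucket keeps one counter per event; all access goes through
// sync/atomic so goroutines can update it without a lock.
type metricBucket struct {
	counter [eventTotal]int64
}

func (b *metricBucket) add(e metricEvent, n int64) {
	atomic.AddInt64(&b.counter[e], n)
}

func (b *metricBucket) get(e metricEvent) int64 {
	return atomic.LoadInt64(&b.counter[e])
}

// concurrentPass records one pass event from each of workers goroutines
// and returns the final pass count.
func concurrentPass(workers int) int64 {
	var bucket metricBucket
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			bucket.add(eventPass, 1)
		}()
	}
	wg.Wait()
	return bucket.get(eventPass)
}

func main() {
	fmt.Println(concurrentPass(100)) // prints 100
}
```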
2)AtomicBucketWrapArray
    // atomic BucketWrap array to resolve race condition
    // AtomicBucketWrapArray can not append or delete element after initializing
    type AtomicBucketWrapArray struct {
        // The base address for real data array
        base unsafe.Pointer
        // The length of slice(array), it can not be modified.
        length int
        data   []*BucketWrap
    }
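The idea of a fixed-length array whose slots are read and swapped atomically can be sketched as follows. This version stores unsafe.Pointer slots in a slice instead of doing the base-address arithmetic of the real type; slotArray, bucketWrap, get, and compareAndSet are hypothetical names for illustration.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"unsafe"
)

type bucketWrap struct {
	bucketStart uint64
}

// slotArray is a fixed-length array of bucket pointers. Slots are never
// appended or removed; each one is only loaded or swapped atomically.
type slotArray struct {
	slots []unsafe.Pointer
}

func newSlotArray(n int) *slotArray {
	return &slotArray{slots: make([]unsafe.Pointer, n)}
}

// get atomically loads the bucket stored in slot i (nil if empty).
func (a *slotArray) get(i int) *bucketWrap {
	return (*bucketWrap)(atomic.LoadPointer(&a.slots[i]))
}

// compareAndSet stores newWrap in slot i only if the slot still holds old.
func (a *slotArray) compareAndSet(i int, old, newWrap *bucketWrap) bool {
	return atomic.CompareAndSwapPointer(&a.slots[i], unsafe.Pointer(old), unsafe.Pointer(newWrap))
}

func main() {
	arr := newSlotArray(2)
	first := &bucketWrap{bucketStart: 800}
	fmt.Println(arr.compareAndSet(0, nil, first))                          // slot was empty
	fmt.Println(arr.compareAndSet(0, nil, &bucketWrap{bucketStart: 1000})) // slot already taken
	fmt.Println(arr.get(0).bucketStart)
}
```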
2. Time wheel
1)leapArray
    // Give a diagram to illustrate
    // Suppose current time is 888, bucketLengthInMs is 200ms,
    // intervalInMs is 1000ms, LeapArray will build the below windows
    //   B0       B1      B2     B3      B4
    //   |_______|_______|_______|_______|_______|
    //  1000    1200    1400    1600    800    (1000)
    //                                        ^
    //                                      time=888
    type LeapArray struct {
        bucketLengthInMs uint32
        sampleCount      uint32
        intervalInMs     uint32
        array            *AtomicBucketWrapArray
        // update lock
        updateLock mutex
    }
- bucketLengthInMs is the length of one time bucket, in milliseconds;
- sampleCount is the number of time buckets;
- intervalInMs is the length of the time window, in milliseconds.
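With these three parameters, locating the bucket for a timestamp is plain integer arithmetic. The helpers below are a sketch with hypothetical names (timeIdx, bucketStart); they reproduce the numbers in the diagram above, where time=888 falls into B4, which starts at 800.

```go
package main

import "fmt"

// timeIdx maps a timestamp to a slot in the ring of sampleCount buckets.
func timeIdx(now uint64, bucketLengthInMs, sampleCount uint32) int {
	return int((now / uint64(bucketLengthInMs)) % uint64(sampleCount))
}

// bucketStart rounds a timestamp down to the start of its bucket.
func bucketStart(now uint64, bucketLengthInMs uint32) uint64 {
	return now - now%uint64(bucketLengthInMs)
}

func main() {
	// bucketLengthInMs = 200, sampleCount = 5, so intervalInMs = 1000.
	fmt.Println(timeIdx(888, 200, 5), bucketStart(888, 200))   // 4 800
	fmt.Println(timeIdx(1000, 200, 5), bucketStart(1000, 200)) // 0 1000
	fmt.Println(timeIdx(1399, 200, 5), bucketStart(1399, 200)) // 1 1200
}
```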
The core function of LeapArray is LeapArray.currentBucketOfTime(), which returns the time bucket (BucketWrap) that corresponds to a given point in time. The code is as follows:
    func (la *LeapArray) currentBucketOfTime(now uint64, bg BucketGenerator) (*BucketWrap, error) {
        if now <= 0 {
            return nil, errors.New("Current time is less than 0.")
        }

        idx := la.calculateTimeIdx(now)
        bucketStart := calculateStartTime(now, la.bucketLengthInMs)

        for { //spin to get the current BucketWrap
            old := la.array.get(idx)
            if old == nil {
                // because la.array.data had initiated when new la.array
                // theoretically, here is not reachable
                newWrap := &BucketWrap{
                    BucketStart: bucketStart,
                    Value:       atomic.Value{},
                }
                newWrap.Value.Store(bg.NewEmptyBucket())
                if la.array.compareAndSet(idx, nil, newWrap) {
                    return newWrap, nil
                } else {
                    runtime.Gosched()
                }
            } else if bucketStart == atomic.LoadUint64(&old.BucketStart) {
                return old, nil
            } else if bucketStart > atomic.LoadUint64(&old.BucketStart) {
                // current time has been next cycle of LeapArray and LeapArray dont't count in last cycle.
                // reset BucketWrap
                if la.updateLock.TryLock() {
                    old = bg.ResetBucketTo(old, bucketStart)
                    la.updateLock.Unlock()
                    return old, nil
                } else {
                    runtime.Gosched()
                }
            } else if bucketStart < atomic.LoadUint64(&old.BucketStart) {
                // TODO: reserve for some special case (e.g. when occupying "future" buckets).
                return nil, errors.New(fmt.Sprintf("Provided time timeMillis=%d is already behind old.BucketStart=%d.", bucketStart, old.BucketStart))
            }
        }
    }
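- Get the time bucket old that corresponds to the time point;
- If old is nil, create a new bucket and try to store it into the time wheel with an atomic operation; on failure, retry;
- If old is exactly the bucket that contains the current time point, return it;
- If old's start time is earlier than the current bucket start, try to reset the bucket's start time and other values under an optimistic lock (TryLock); return once the locked update succeeds;
- If old's start time is later than the current time, the system clock has gone backwards, and an error is returned.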
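The branch logic above can be exercised with a much simpler sliding-window counter. Note the swap: this sketch replaces the CAS loop and TryLock of the real code with a single mutex, and its names (leapArray, bucket, add, sum) are hypothetical. It only illustrates the reuse, reset, and clock-regression cases.

```go
package main

import (
	"fmt"
	"sync"
)

type bucket struct {
	start uint64 // start timestamp of the bucket, in ms
	count int64
}

// leapArray is a mutex-protected ring of time buckets.
type leapArray struct {
	mu               sync.Mutex
	bucketLengthInMs uint64
	buckets          []bucket
}

func newLeapArray(sampleCount int, intervalInMs uint64) *leapArray {
	return &leapArray{
		bucketLengthInMs: intervalInMs / uint64(sampleCount),
		buckets:          make([]bucket, sampleCount),
	}
}

// add records n events at time now, resetting the slot if it still holds
// data from an earlier cycle of the ring.
func (la *leapArray) add(now uint64, n int64) error {
	la.mu.Lock()
	defer la.mu.Unlock()
	idx := int(now/la.bucketLengthInMs) % len(la.buckets)
	start := now - now%la.bucketLengthInMs
	b := &la.buckets[idx]
	switch {
	case start == b.start:
		// the bucket already covers now: reuse it
	case start > b.start:
		b.start, b.count = start, 0 // stale bucket: reset it
	default:
		return fmt.Errorf("time %d is behind bucket start %d", now, b.start)
	}
	b.count += n
	return nil
}

// sum adds up the buckets that still fall inside the window ending at now.
func (la *leapArray) sum(now uint64) int64 {
	la.mu.Lock()
	defer la.mu.Unlock()
	interval := la.bucketLengthInMs * uint64(len(la.buckets))
	var total int64
	for _, b := range la.buckets {
		if b.start <= now && now < b.start+interval {
			total += b.count
		}
	}
	return total
}

func main() {
	la := newLeapArray(5, 1000)
	_ = la.add(888, 1)
	_ = la.add(900, 2)
	fmt.Println(la.sum(999)) // both events share the bucket starting at 800
	_ = la.add(1888, 1)      // same slot one cycle later: the bucket is reset
	fmt.Println(la.sum(1900))
	fmt.Println(la.add(888, 1) != nil) // the clock went backwards
}
```

A single mutex is easier to reason about but serializes all writers; the spin-and-CAS design in the original keeps the common path (the bucket already exists and is current) free of locking.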
2)BucketLeapArray
    // The implementation of sliding window based on LeapArray (as the sliding window infrastructure)
    // and MetricBucket (as the data type). The MetricBucket is used to record statistic
    // metrics per minimum time unit (i.e. the bucket time span).
    type BucketLeapArray struct {
        data     LeapArray
        dataType string
    }
3. Reading and writing Metric data
1)SlidingWindowMetric
    // SlidingWindowMetric represents the sliding window metric wrapper.
    // It does not store any data and is the wrapper of BucketLeapArray to adapt to different internal bucket
    // SlidingWindowMetric is used for SentinelRules and BucketLeapArray is used for monitor
    // BucketLeapArray is per resource, and SlidingWindowMetric support only read operation.
    type SlidingWindowMetric struct {
        bucketLengthInMs uint32
        sampleCount      uint32
        intervalInMs     uint32
        real             *BucketLeapArray
    }
2)ResourceNode
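BaseStatNode.arr is created in NewBaseStatNode(), and the pointer SlidingWindowMetric.real also points to it.
ResourceNode, as the name suggests, represents a resource together with its Metrics storage, ResourceNode.BaseStatNode.
resNodeMap stores the Metrics data of all resources.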

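Flow-control process
- Build an EntryContext for the given Resource, storing its Metrics, the start time, and so on; Sentinel calls this StatPrepareSlot;
- Decide, based on the Resource's flow-control algorithm, whether it should be limited, and produce a result; Sentinel calls this RuleCheckSlot;
- After the decision, besides the user acting on the result, Sentinel also performs its own actions based on the result and records metrics such as the RT (time taken) of the whole process; Sentinel calls this StatSlot.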

1. Slot
    // StatPrepareSlot is responsible for some preparation before statistic
    // For example: init structure and so on
    type StatPrepareSlot interface {
        // Prepare function do some initialization
        // Such as: init statistic structure、node and etc
        // The result of preparing would store in EntryContext
        // All StatPrepareSlots execute in sequence
        // Prepare function should not throw panic.
        Prepare(ctx *EntryContext)
    }

    // RuleCheckSlot is rule based checking strategy
    // All checking rule must implement this interface.
    type RuleCheckSlot interface {
        // Check function do some validation
        // It can break off the slot pipeline
        // Each TokenResult will return check result
        // The upper logic will control pipeline according to SlotResult.
        Check(ctx *EntryContext) *TokenResult
    }

    // StatSlot is responsible for counting all custom biz metrics.
    // StatSlot would not handle any panic, and pass up all panic to slot chain
    type StatSlot interface {
        // OnEntryPass function will be invoked when StatPrepareSlots and RuleCheckSlots execute pass
        // StatSlots will do some statistic logic, such as QPS、log、etc
        OnEntryPassed(ctx *EntryContext)
        // OnEntryBlocked function will be invoked when StatPrepareSlots and RuleCheckSlots fail to execute
        // It may be inbound flow control or outbound cir
        // StatSlots will do some statistic logic, such as QPS、log、etc
        // blockError introduce the block detail
        OnEntryBlocked(ctx *EntryContext, blockError *BlockError)
        // OnCompleted function will be invoked when chain exits.
        // The semantics of OnCompleted is the entry passed and completed
        // Note: blocked entry will not call this function
        OnCompleted(ctx *EntryContext)
    }
2. Prepare
    // core/base/slot_chain.go

    // StatPrepareSlot is responsible for some preparation before statistic
    // For example: init structure and so on
    type StatPrepareSlot interface {
        // Prepare function do some initialization
        // Such as: init statistic structure、node and etc
        // The result of preparing would store in EntryContext
        // All StatPrepareSlots execute in sequence
        // Prepare function should not throw panic.
        Prepare(ctx *EntryContext)
    }

    // core/stat/stat_prepare_slot.go
    type ResourceNodePrepareSlot struct {
    }

    func (s *ResourceNodePrepareSlot) Prepare(ctx *base.EntryContext) {
        node := GetOrCreateResourceNode(ctx.Resource.Name(), ctx.Resource.Classification())
        // Set the resource node to the context.
        ctx.StatNode = node
    }
The StatNode of every Resource is stored in core/stat/node_storage.go:resNodeMap [type: map[string]*ResourceNode]. The function GetOrCreateResourceNode looks up the StatNode for a Resource name in resNodeMap; if none exists, it creates one and stores it in resNodeMap.
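A get-or-create lookup on a shared map is commonly written with a read lock for the fast path and a re-check under the write lock. The sketch below shows that pattern with hypothetical names (resourceNode, nodes, nodesMu, getOrCreateNode); whether the library uses exactly this locking scheme is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

type resourceNode struct {
	name string
}

var (
	nodes   = make(map[string]*resourceNode)
	nodesMu sync.RWMutex
)

// getOrCreateNode returns the node for name, creating it on first use.
func getOrCreateNode(name string) *resourceNode {
	// Fast path: most calls find an existing node under the read lock.
	nodesMu.RLock()
	node, ok := nodes[name]
	nodesMu.RUnlock()
	if ok {
		return node
	}

	// Slow path: re-check under the write lock, because another goroutine
	// may have created the node between the two locks.
	nodesMu.Lock()
	defer nodesMu.Unlock()
	if node, ok = nodes[name]; ok {
		return node
	}
	node = &resourceNode{name: name}
	nodes[name] = node
	return node
}

func main() {
	a := getOrCreateNode("orders")
	b := getOrCreateNode("orders")
	fmt.Println(a == b) // the same node is returned for the same name
}
```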
3. Check
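- Get all Rules for the Resource by its name;
- Iterate over the Rules and run Check against the Resource for each in turn; if any Rule decides the Resource must be limited [Blocked], return immediately; otherwise let the request through.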
    type Slot struct {
    }

    func (s *Slot) Check(ctx *base.EntryContext) *base.TokenResult {
        res := ctx.Resource.Name()
        tcs := getTrafficControllerListFor(res)
        result := ctx.RuleCheckResult

        // Check rules in order
        for _, tc := range tcs {
            r := canPassCheck(tc, ctx.StatNode, ctx.Input.AcquireCount)
            if r == nil {
                // nil means pass
                continue
            }
            if r.Status() == base.ResultStatusBlocked {
                return r
            }
            if r.Status() == base.ResultStatusShouldWait {
                if waitMs := r.WaitMs(); waitMs > 0 {
                    // Handle waiting action.
                    time.Sleep(time.Duration(waitMs) * time.Millisecond)
                }
                continue
            }
        }
        return result
    }

    func canPassCheck(tc *TrafficShapingController, node base.StatNode, acquireCount uint32) *base.TokenResult {
        return canPassCheckWithFlag(tc, node, acquireCount, 0)
    }

    func canPassCheckWithFlag(tc *TrafficShapingController, node base.StatNode, acquireCount uint32, flag int32) *base.TokenResult {
        return checkInLocal(tc, node, acquireCount, flag)
    }

    func checkInLocal(tc *TrafficShapingController, resStat base.StatNode, acquireCount uint32, flag int32) *base.TokenResult {
        return tc.PerformChecking(resStat, acquireCount, flag)
    }
4. Exit
- If RuleCheckSlot.Check() passes, StatSlot.OnEntryPassed() is executed; if it rejects, StatSlot.OnEntryBlocked() is executed;
- If RuleCheckSlot.Check() passes, the Action itself is executed;
- If RuleCheckSlot.Check() passes, SentinelEntry.Exit() --> SlotChain.exit() --> StatSlot.OnCompleted() is executed.
1)StatSlot.OnCompleted()
    // core/flow/standalone_stat_slot.go
    type StandaloneStatSlot struct {
    }

    func (s StandaloneStatSlot) OnEntryPassed(ctx *base.EntryContext) {
        res := ctx.Resource.Name()
        for _, tc := range getTrafficControllerListFor(res) {
            if !tc.boundStat.reuseResourceStat {
                if tc.boundStat.writeOnlyMetric != nil {
                    tc.boundStat.writeOnlyMetric.AddCount(base.MetricEventPass, int64(ctx.Input.AcquireCount))
                }
            }
        }
    }

    func (s StandaloneStatSlot) OnEntryBlocked(ctx *base.EntryContext, blockError *base.BlockError) {
        // Do nothing
    }

    func (s StandaloneStatSlot) OnCompleted(ctx *base.EntryContext) {
        // Do nothing
    }
2)SlotChain.exit()
    // core/base/slot_chain.go
    type SlotChain struct {
        // other fields omitted in this excerpt
        stats []StatSlot
    }

    func (sc *SlotChain) exit(ctx *EntryContext) {
        // The OnCompleted is called only when entry passed
        if ctx.IsBlocked() {
            return
        }
        for _, s := range sc.stats {
            s.OnCompleted(ctx)
        }
    }
3)SentinelEntry.Exit()
    // core/base/entry.go
    type SentinelEntry struct {
        // one entry bounds with one context
        ctx     *EntryContext
        sc      *SlotChain
        exitCtl sync.Once
    }

    func (e *SentinelEntry) Exit() {
        e.exitCtl.Do(func() {
            if e.sc != nil {
                e.sc.exit(e.ctx)
            }
        })
    }
StatSlot.OnCompleted() is called after an Action [for example, one RPC request-response invocation] completes. A component that needs the time cost (RT) of an Action computes it in its StatSlot.OnCompleted() from EntryContext.startTime.
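The RT calculation and the once-only exit can be sketched together. The types below (entryContext, rtStatSlot, entry) and the explicit nowMs argument are hypothetical simplifications that make the example deterministic; the real entry reads the clock itself.

```go
package main

import (
	"fmt"
	"sync"
)

type entryContext struct {
	startTimeMs uint64 // when the entry was created
}

// rtStatSlot accumulates response times of completed entries.
type rtStatSlot struct {
	totalRtMs uint64
	completed uint64
}

// onCompleted derives the RT of one action from the context's start time.
func (s *rtStatSlot) onCompleted(ctx *entryContext, nowMs uint64) {
	s.totalRtMs += nowMs - ctx.startTimeMs
	s.completed++
}

type entry struct {
	ctx     *entryContext
	slot    *rtStatSlot
	exitCtl sync.Once
}

// exitAt reports completion at nowMs; sync.Once ensures that calling it
// again has no effect, so the RT is recorded exactly once.
func (e *entry) exitAt(nowMs uint64) {
	e.exitCtl.Do(func() {
		e.slot.onCompleted(e.ctx, nowMs)
	})
}

func main() {
	slot := &rtStatSlot{}
	e := &entry{ctx: &entryContext{startTimeMs: 1000}, slot: slot}
	e.exitAt(1030)
	e.exitAt(1090) // ignored: the entry has already exited
	fmt.Println(slot.totalRtMs, slot.completed) // 30 1
}
```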
5. SlotChain
The SlotChain object stores all of its flow-control components.
    // core/base/slot_chain.go

    // SlotChain hold all system slots and customized slot.
    // SlotChain support plug-in slots developed by developer.
    type SlotChain struct {
        statPres   []StatPrepareSlot
        ruleChecks []RuleCheckSlot
        stats      []StatSlot
    }

    // The entrance of slot chain
    // Return the TokenResult and nil if internal panic.
    func (sc *SlotChain) Entry(ctx *EntryContext) *TokenResult {
        // execute prepare slot
        sps := sc.statPres
        if len(sps) > 0 {
            for _, s := range sps {
                s.Prepare(ctx)
            }
        }

        // execute rule based checking slot
        rcs := sc.ruleChecks
        var ruleCheckRet *TokenResult
        if len(rcs) > 0 {
            for _, s := range rcs {
                sr := s.Check(ctx)
                if sr == nil {
                    // nil equals to check pass
                    continue
                }
                // check slot result
                if sr.IsBlocked() {
                    ruleCheckRet = sr
                    break
                }
            }
        }
        if ruleCheckRet == nil {
            ctx.RuleCheckResult.ResetToPass()
        } else {
            ctx.RuleCheckResult = ruleCheckRet
        }

        // execute statistic slot
        ss := sc.stats
        ruleCheckRet = ctx.RuleCheckResult
        if len(ss) > 0 {
            for _, s := range ss {
                // indicate the result of rule based checking slot.
                if !ruleCheckRet.IsBlocked() {
                    s.OnEntryPassed(ctx)
                } else {
                    // The block error should not be nil.
                    s.OnEntryBlocked(ctx, ruleCheckRet.blockErr)
                }
            }
        }
        return ruleCheckRet
    }

    func (sc *SlotChain) exit(ctx *EntryContext) {
        if ctx == nil || ctx.Entry() == nil {
            logging.Error(errors.New("nil EntryContext or SentinelEntry"), "")
            return
        }
        // The OnCompleted is called only when entry passed
        if ctx.IsBlocked() {
            return
        }
        for _, s := range sc.stats {
            s.OnCompleted(ctx)
        }
        // relieve the context here
    }
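Could the number of RuleCheckSlot.Check() executions in SlotChain.Entry() be reduced? A related improvement has been submitted as pr 264 [addendum: the code has been merged; after load testing, the maintainer replied that sentinel-go's overall efficiency improved by 15%].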
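The three phases of SlotChain.Entry() (prepare, check with early exit on the first block, then statistics) can be sketched with plain function slices. This is a reduced illustration, not the library's types: slotChain, entryContext, and newDemoChain are hypothetical, and a check returns a bool instead of a TokenResult.

```go
package main

import "fmt"

type entryContext struct {
	acquireCount uint32
	blocked      bool
}

// slotChain holds the three kinds of slots as function slices.
type slotChain struct {
	prepares  []func(*entryContext)
	checks    []func(*entryContext) bool // false means blocked
	onPassed  []func(*entryContext)
	onBlocked []func(*entryContext)
}

// entry runs all prepare slots, then the checks until one blocks, then
// the statistic hooks that match the outcome. It returns true on pass.
func (sc *slotChain) entry(ctx *entryContext) bool {
	for _, prepare := range sc.prepares {
		prepare(ctx)
	}
	for _, check := range sc.checks {
		if !check(ctx) {
			ctx.blocked = true
			break // the first blocking rule ends the check phase
		}
	}
	hooks := sc.onPassed
	if ctx.blocked {
		hooks = sc.onBlocked
	}
	for _, hook := range hooks {
		hook(ctx)
	}
	return !ctx.blocked
}

// newDemoChain builds a chain with one check (acquireCount <= limit) and
// counters that record how many entries passed or were blocked.
func newDemoChain(limit uint32, passed, blocked *int) *slotChain {
	return &slotChain{
		checks: []func(*entryContext) bool{
			func(ctx *entryContext) bool { return ctx.acquireCount <= limit },
		},
		onPassed:  []func(*entryContext){func(*entryContext) { *passed++ }},
		onBlocked: []func(*entryContext){func(*entryContext) { *blocked++ }},
	}
}

func main() {
	var passed, blocked int
	sc := newDemoChain(3, &passed, &blocked)
	fmt.Println(sc.entry(&entryContext{acquireCount: 1}))
	fmt.Println(sc.entry(&entryContext{acquireCount: 5}))
	fmt.Println(passed, blocked)
}
```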
1)globalSlotChain
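globalSlotChain stores all of the flow-control component objects. Sample code follows. Since this article focuses only on the rate-limiting component, only its registration code is shown below.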
    // api/slot_chain.go

    func BuildDefaultSlotChain() *base.SlotChain {
        sc := base.NewSlotChain()
        sc.AddStatPrepareSlotLast(&stat.ResourceNodePrepareSlot{})
        sc.AddRuleCheckSlotLast(&flow.Slot{})
        sc.AddStatSlotLast(&flow.StandaloneStatSlot{})
        return sc
    }

    var globalSlotChain = BuildDefaultSlotChain()
2)Entry
In api/api.go:Entry(), globalSlotChain is used as the SlotChain parameter of EntryOptions.
    // api/api.go

    // Entry is the basic API of Sentinel.
    func Entry(resource string, opts ...EntryOption) (*base.SentinelEntry, *base.BlockError) {
        options := entryOptsPool.Get().(*EntryOptions)
        options.slotChain = globalSlotChain
        return entry(resource, options)
    }
Sentinel Go repo: https://github.com/alibaba/sentinel-golang
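Enterprise users are welcome to register: https://github.com/alibaba/Sentinel/issues/18
About the author
Yu Yu (github @AlexStocks) leads the apache/dubbo-go project. A programmer with more than ten years of front-line experience in server-side infrastructure development, he currently works on container orchestration and service mesh in the Trusted Native department at Ant Financial. An open-source enthusiast, he began by contributing code to Redis in 2015 and has since improved well-known projects including Muduo, Pika, Dubbo, and Dubbo-go.
This article is shared from the WeChat official account 云服务圈 (heidcloud).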
Source: oschina
Link: https://my.oschina.net/u/2896230/blog/4686340