Question
For the past few days I have been trying to compose an Rx query that processes a stream of events from a source and checks for the absence of some IDs. Absence is defined as follows: there is a series of time windows (e.g. every day from 9:00 to 17:00) during which an ID may go at most, say, twenty minutes without occurring in the stream. To complicate matters further, the allowed absence should be defined per ID. For instance, assuming three kinds of events A, B and C appearing in a combined stream of events (A, A, B, C, A, C, B and so forth), it could be defined that
- A events are monitored from 9:00 to 10:00 on each day, maximum absence of events being 10 minutes.
- B events are monitored from 9:00 to 11:00 on each day, maximum absence of events being 5 minutes.
- C events are monitored from 12:00 to 15:00 on each day, maximum absence of events being 30 minutes.
I think I first need to partition the stream into separate per-ID streams with GroupBy and then apply the absence rules to each resulting stream. I have mulled this over on the Microsoft Rx forums already (a big thanks to Dave), and I have some working code that both produces the rules and does the absence checking, but I struggle with, for instance, how to combine that with grouping.
So, without further ado, here is the code hacked together so far:
//Some sample data bits representing the events.
public class FakeData
{
    public int Id { get; set; }
    public string SomeData { get; set; }
}
//Note the Date part of DateTime.Now: it zeroes the clock time and leaves only the date. The purpose is to create start-end pairs of times, e.g. 9:00-17:00.
//The alarm start and end time points should match pairwise; they could be pairs of values...
var maxDate = DateTime.Now.Date.AddHours(17).AddMinutes(0).AddSeconds(0).AddDays(14);
var startDate = DateTime.Now.Date.AddHours(9).AddMinutes(0).AddSeconds(0);
var alarmStartPeriods = Enumerable.Range(0, 1 + (maxDate - startDate).Days).Select(d => new DateTimeOffset(startDate.AddDays(d))).ToList();
var alarmEndPeriods = Enumerable.Range(0, 1 + (maxDate - startDate).Days).Select(d => new DateTimeOffset(startDate.AddDays(d)).AddHours(5)).ToList();
And here is a query that does the absence checking without grouping, which is one of my sticking points. <edit: Maybe I should group the time points into pairs, add an ID, and use the resulting triplet in the query... </edit>
dataSource = from n in Observable.Interval(TimeSpan.FromMilliseconds(100))
             select new FakeData
             {
                 Id = new Random().Next(1, 5),
                 SomeData = DateTimeOffset.Now.ToString()
             };
var startPointOfTimeChanges = alarmStartPeriods.ToObservable();
var endPointOfTimeChanges = alarmEndPeriods.ToObservable();
var durations = startPointOfTimeChanges.CombineLatest(endPointOfTimeChanges, (start, end) => new { start, end });
var maximumInactivityTimeBeforeAlarmSignal = TimeSpan.FromMilliseconds(250);
timer = (from duration in durations
         select (from _ in Observable.Timer(DateTime.Now)
                 from x in dataSource.Throttle(maximumInactivityTimeBeforeAlarmSignal).TakeUntil(duration.end)
                 select x)).Switch();
timer.Subscribe(x => Debug.WriteLine(x.SomeData));
Questions:
- How should I try to GroupBy the incoming data by ID and still be able to define absence of events?
- One thing I noticed is that if the starting point of an alarm period is in the past (e.g. the query was started at 10:00 when the rule says to start monitoring at 9:00), the query won't start. The starting time should be pushed to the present time, I suppose. Is there a standard way to do that, or should I just introduce a conditional?
Other questions I could think of that would be nice (to entertain myself :)):
- How to keep tabs on the latest event that has occurred per ID?
- How to change the variables dynamically (as Dave already alluded to in the MS forums)?
- How to then, in the end, batch the events and store them somewhere (e.g. a database), as in this marvellous example on PeteGoo's blog?
The other option I can think of is to use System.Threading.Timer and ConcurrentDictionary explicitly, but one needs to keep on learning!
Regarding James' answer, here's a quick explanation of how it works and how I intend to use it.
Firstly, the observable does nothing until the first event comes in. So, if monitoring should start right away, some other Rx functionality needs to be added or a dummy event fired. Not a problem, I believe.
Secondly, a new timeout value is acquired from alarmInterval for any new ID. Here "new" includes an ID that has been absent for too long and has triggered an alarm.
I think this works well in that one can subscribe to this observable and do something with side effects. Some examples would be setting a flag, sending a signal, or whatever the business rules call for. Also, with proper locking and so forth, it should be easy to provide new timespans as per predefined alarm rules, with the absence period and the time window kept separate.
I'll have to work on other concepts related to this to have a better grasp on things. But my main concerns were satisfied with this. Life's well and good. :-)
Answer 1:
EDITED - Improved the code, simplifying the SelectMany to use TakeLast.
I wrote a blog post on detecting disconnected clients, which would work just as well for your scenario if you replace the timeToHold variable in the post with a function like alarmInterval below that returns the throttle TimeSpan for a given client ID.
e.g.:
// idStream is an IObservable<int> of the input stream of IDs
// alarmInterval is a Func<int, TimeSpan> that gets the interval given the ID
var idAlarmStream = idStream
    .GroupByUntil(key => key, grp => grp.Throttle(alarmInterval(grp.Key)))
    .SelectMany(grp => grp.TakeLast(1));
This gives you the basic functionality of constant monitoring without looking at the active monitoring periods.
To get the monitoring window functionality, I'd turn things around and filter the above output with a Where that checks whether the emitted ID falls within its monitoring time window. This makes it easier to deal with changing monitoring periods.
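As a rough sketch of that filter, the predicate could look something like the following. The rule table, the numeric IDs 1 to 3 standing in for the A, B and C events from the question, and the name IsMonitored are all assumptions made for illustration; the sketch uses C# top-level statements and only the base class library, with the Rx hookup shown as a comment.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical rule table: ID -> daily monitoring window as times of day.
// IDs 1, 2, 3 stand in for the A, B, C events from the question.
var windows = new Dictionary<int, (TimeSpan Start, TimeSpan End)>
{
    [1] = (TimeSpan.FromHours(9),  TimeSpan.FromHours(10)),
    [2] = (TimeSpan.FromHours(9),  TimeSpan.FromHours(11)),
    [3] = (TimeSpan.FromHours(12), TimeSpan.FromHours(15)),
};

// True when the instant falls inside the ID's window (start inclusive,
// end exclusive). IDs without a rule are treated as never monitored.
bool IsMonitored(int id, DateTimeOffset at) =>
    windows.TryGetValue(id, out var w)
    && at.TimeOfDay >= w.Start
    && at.TimeOfDay < w.End;

// Assumed hookup with the alarm stream (needs System.Reactive):
// idAlarmStream.Where(id => IsMonitored(id, DateTimeOffset.Now))

var halfPastNine = new DateTimeOffset(2024, 1, 15, 9, 30, 0, TimeSpan.Zero);
Console.WriteLine(IsMonitored(1, halfPastNine)); // ID 1 (A) has a 9:00-10:00 window
Console.WriteLine(IsMonitored(3, halfPastNine)); // ID 3 (C) has a 12:00-15:00 window
```

Note that with this arrangement the check runs when the alarm fires, that is, one alarm interval after the last event for that ID, so whether to test the alarm time or the time of the last event is a design decision the sketch leaves open.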
You could do something fancier by turning each monitoring window into a stream and combining those with the alert stream, but I'm not convinced of the benefits of the extra complexity.
The alarmInterval function will also give you an element of dynamic alarm intervals in that it can return new values, but these will only take effect after an alarm goes off for that ID thus ending its current group.
--- Heading off into some theorising here ---
To get this fully dynamic you will have to end the group somehow - you could do that in a few ways.
One would be to project the idStream using Select into a stream of a custom type that holds the ID plus a global counter value. Give this type an appropriate equality implementation so it will work with the GroupByUntil properly.
Now every time you change the alarm intervals, change the counter. This will cause new groups to be created for every ID. You can then add an additional check in the final filter that makes sure the output events have the most recent counter value.
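To make the counter idea a little more concrete, here is a minimal sketch. It uses a value tuple in place of the custom type, on the assumption that the tuple's built-in value equality is enough for GroupByUntil; the names rulesVersion, ToKey and IsCurrent are invented for the example, and thread safety is deliberately left out.

```csharp
using System;

// Global counter; bump it whenever the alarm intervals change.
// (Assumption: single-threaded here. Real code would need synchronised access.)
int rulesVersion = 0;

// Value tuples compare by value, so (Id, Version) can act as a group key
// without a hand-written Equals/GetHashCode.
(int Id, int Version) ToKey(int id) => (id, rulesVersion);

// Check for the final filter: keep only alarms raised under the current rules.
bool IsCurrent((int Id, int Version) key) => key.Version == rulesVersion;

// Assumed hookup (needs System.Reactive):
// idStream.Select(id => ToKey(id))
//         .GroupByUntil(k => k, grp => grp.Throttle(alarmInterval(grp.Key.Id)))
//         .SelectMany(grp => grp.TakeLast(1))
//         .Where(k => IsCurrent(k));

var before = ToKey(7);
rulesVersion++;                 // the alarm intervals were changed
var after = ToKey(7);

Console.WriteLine(before == after);   // keys differ, so ID 7 would get a new group
Console.WriteLine(IsCurrent(before)); // stale key, would be filtered out
Console.WriteLine(IsCurrent(after));  // current key, would pass the filter
```

Groups opened under the old counter value still run until their throttle fires; the final filter is what discards their alarms.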
Source: https://stackoverflow.com/questions/19313360/how-to-partition-groupby-a-stream-and-monitor-absence-of-elements-in-rx-within