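CEP: Complex Event Processing.
An order has been placed, but payment has not been confirmed within a given time.
A ride-hailing order has been created, but the pickup has not been confirmed within a given time.
A food delivery is past its scheduled delivery time by a given margin, and delivery has still not been confirmed.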
Apache FlinkCEP API
CEPTimeoutEventJob
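A brief look at the FlinkCEP source code
DataStream and PatternStream
A DataStream generally consists of events or elements of the same type. A DataStream can be turned into another DataStream through a series of transformations such as Filter and Map.
PatternStream is the abstraction of a stream on which CEP pattern matching is performed. It combines a DataStream with a Pattern and exposes methods such as select and flatSelect. A PatternStream is not a DataStream. It provides methods that emit each matched pattern sequence together with its associated events as a map (that is, Map<pattern name, List<event>>) into a SingleOutputStreamOperator, and SingleOutputStreamOperator is a DataStream.
The methods and variables in the CEPOperatorUtils utility class are named after "PatternStream", for example:
public static <IN, OUT> SingleOutputStreamOperator<OUT> createPatternStream(...) {...}
public static <IN, OUT1, OUT2> SingleOutputStreamOperator<OUT1> createTimeoutPatternStream(...) {...}
final SingleOutputStreamOperator<OUT> patternStream;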
SingleOutputStreamOperator
@Public
public class SingleOutputStreamOperator<T> extends DataStream<T> {...}
The PatternStream constructors:
PatternStream(final DataStream<T> inputStream, final Pattern<T, ?> pattern) {
    this.inputStream = inputStream;
    this.pattern = pattern;
    this.comparator = null;
}

PatternStream(final DataStream<T> inputStream, final Pattern<T, ?> pattern, final EventComparator<T> comparator) {
    this.inputStream = inputStream;
    this.pattern = pattern;
    this.comparator = comparator;
}
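Pattern, Quantifier, and EventComparator
Pattern is the base class for pattern definitions and follows the builder pattern. A defined pattern is used by NFACompiler to generate an NFA.
If you want to implement your own method along the lines of next and followedBy, for example timeEnd, extending and overriding Pattern should be feasible.
public class Pattern<T, F extends T> {

    /** Name of the pattern. */
    private final String name;

    /** The previous pattern. */
    private final Pattern<T, ? extends T> previous;

    /** The constraint an event must satisfy in order to be matched by the current pattern. */
    private IterativeCondition<F> condition;

    /** Window length; pattern matching takes place within this time span. */
    private Time windowTime;

    /** Pattern quantifier, i.e. how many events one pattern matches; the default is exactly one. */
    private Quantifier quantifier = Quantifier.one(ConsumingStrategy.STRICT);

    /** The condition an event must satisfy to stop collecting events into a looping state. */
    private IterativeCondition<F> untilCondition;

    /**
     * Applies to a {@code times} pattern; maintains how many times events may occur consecutively in the pattern.
     */
    private Times times;

    // Skip strategy applied after a match
    private final AfterMatchSkipStrategy afterMatchSkipStrategy;

    ...
}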
Quantifier describes the concrete behavior of a pattern. There are three main kinds:
Single (matches once), Looping (matches repeatedly), and Times (matches a fixed number of times or any count within a range).
Each Pattern can be optional (for single or looping matches) and can have a ConsumingStrategy set.
Looping and times patterns also have an additional inner ConsumingStrategy that applies between the events accepted within the pattern.
public class Quantifier {
    ...

    /**
     * Five properties that can be combined, although not every combination is valid.
     */
    public enum QuantifierProperty {
        SINGLE,
        LOOPING,
        TIMES,
        OPTIONAL,
        GREEDY
    }

    /**
     * Strategy describing which events are matched in this pattern.
     */
    public enum ConsumingStrategy {
        STRICT,
        SKIP_TILL_NEXT,
        SKIP_TILL_ANY,
        NOT_FOLLOW,
        NOT_NEXT
    }

    /**
     * Describes how many times events may occur consecutively in the current pattern.
     * A pattern condition is ultimately just a boolean: events for which it is true occur
     * "times" times in a row, or a number of times within a range. With a range of 2 to 4,
     * runs of 2, 3, and 4 events are all matched by the current pattern, so the same event
     * can be matched more than once.
     */
    public static class Times {
        private final int from;
        private final int to;

        private Times(int from, int to) {
            Preconditions.checkArgument(from > 0, "The from should be a positive number greater than 0.");
            Preconditions.checkArgument(to >= from, "The to should be a number greater than or equal to from: " + from + ".");
            this.from = from;
            this.to = to;
        }

        public int getFrom() {
            return from;
        }

        public int getTo() {
            return to;
        }

        // A range of counts
        public static Times of(int from, int to) {
            return new Times(from, to);
        }

        // An exact count
        public static Times of(int times) {
            return new Times(times, times);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (o == null || getClass() != o.getClass()) {
                return false;
            }
            Times times = (Times) o;
            return from == times.from && to == times.to;
        }

        @Override
        public int hashCode() {
            return Objects.hash(from, to);
        }
    }

    ...
}
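The remark in the Times comment, that a range of 2 to 4 yields runs of 2, 3, and 4 events and therefore matches the same event more than once, can be illustrated with a small plain-Java sketch. It is only a simplified model under stated assumptions: the class TimesRangeSketch and the method runsFrom are hypothetical names made up here, every input event is assumed to satisfy the pattern condition, and none of this is FlinkCEP code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of a times(from, to) quantifier; this is NOT Flink's implementation.
class TimesRangeSketch {

    // Returns every run that begins at index start and holds between from and to events.
    static <E> List<List<E>> runsFrom(List<E> events, int start, int from, int to) {
        if (from <= 0 || to < from) {
            throw new IllegalArgumentException("expected 0 < from <= to");
        }
        List<List<E>> runs = new ArrayList<>();
        for (int len = from; len <= to && start + len <= events.size(); len++) {
            runs.add(new ArrayList<>(events.subList(start, start + len)));
        }
        return runs;
    }

    public static void main(String[] args) {
        // Assumption: all four events satisfy the pattern condition.
        List<String> events = Arrays.asList("a", "b", "c", "d");
        System.out.println(runsFrom(events, 0, 2, 4)); // [[a, b], [a, b, c], [a, b, c, d]]
    }
}
```

Events "a" and "b" appear in all three runs, which is the repeated matching the comment refers to.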
EventComparator: a custom event comparator is written by implementing the EventComparator interface.
public interface EventComparator<T> extends Comparator<T>, Serializable {
    long serialVersionUID = 1L;
}
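A custom comparator might look roughly like the sketch below. To keep it runnable without Flink on the classpath, it declares a local stand-in interface with the same shape as the one above. The Event type, its priority field, and the class names are hypothetical. The sketch assumes the comparator's job is to break ties between events that carry the same timestamp.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class EventComparatorSketch {

    // Local stand-in with the same shape as the interface quoted above.
    interface EventComparator<T> extends Comparator<T>, Serializable {}

    // Hypothetical event type, used only for this illustration.
    static final class Event {
        final String id;
        final long timestamp;
        final int priority;

        Event(String id, long timestamp, int priority) {
            this.id = id;
            this.timestamp = timestamp;
            this.priority = priority;
        }
    }

    // Orders events by priority; meant as a tie-breaker for equal timestamps.
    static final class PriorityComparator implements EventComparator<Event> {
        private static final long serialVersionUID = 1L;

        @Override
        public int compare(Event a, Event b) {
            return Integer.compare(a.priority, b.priority);
        }
    }

    // Sorts by timestamp first and falls back to the custom comparator on ties.
    static List<String> orderedIds(List<Event> events) {
        List<Event> sorted = new ArrayList<>(events);
        Comparator<Event> byTime = Comparator.comparingLong(e -> e.timestamp);
        sorted.sort(byTime.thenComparing(new PriorityComparator()));
        List<String> ids = new ArrayList<>();
        for (Event e : sorted) {
            ids.add(e.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event("b", 100L, 2), new Event("a", 100L, 1), new Event("c", 50L, 9));
        System.out.println(orderedIds(events)); // [c, a, b]
    }
}
```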
NFACompiler and NFA
NFACompiler provides methods that compile a Pattern into an NFA or an NFAFactory. An NFAFactory can be used to create multiple NFAs.
public class NFACompiler {
    ...

    /**
     * NFAFactory: the interface for creating an NFA.
     *
     * @param <T> Type of the input events which are processed by the NFA
     */
    public interface NFAFactory<T> extends Serializable {
        NFA<T> createNFA();
    }

    /**
     * NFAFactoryImpl: the concrete implementation of NFAFactory.
     *
     * <p>The implementation takes the input type serializer, the window time and the set of
     * states and their transitions to be able to create an NFA from them.
     *
     * @param <T> Type of the input events which are processed by the NFA
     */
    private static class NFAFactoryImpl<T> implements NFAFactory<T> {

        private static final long serialVersionUID = 8939783698296714379L;

        private final long windowTime;
        private final Collection<State<T>> states;
        private final boolean timeoutHandling;

        private NFAFactoryImpl(long windowTime, Collection<State<T>> states, boolean timeoutHandling) {
            this.windowTime = windowTime;
            this.states = states;
            this.timeoutHandling = timeoutHandling;
        }

        @Override
        public NFA<T> createNFA() {
            // An NFA is made up of a set of states, the window length, and a flag for timeout handling
            return new NFA<>(states, windowTime, timeoutHandling);
        }
    }
}
NFA: non-deterministic finite automaton.
For more, see
https://zh.wikipedia.org/wiki/非確定有限狀態(tài)自動機
public class NFA<T> {

    /**
     * The set of all valid NFA states returned by NFACompiler.
     * These are directly derived from the user-specified pattern.
     */
    private final Map<String, State<T>> states;

    /**
     * The window length specified by Pattern.within(Time).
     */
    private final long windowTime;

    /**
     * A flag indicating whether timed-out matches are handled.
     */
    private final boolean handleTimeout;

    ...
}
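To make the idea of states and transitions more concrete, here is a toy two-stage matcher in plain Java. It is a deliberately simplified model, not Flink's NFA: it tracks a single partial match, ignores window time, and uses the hypothetical names TinyNfaSketch and firstMatch. The strict flag is meant to mimic the difference between next() (strict contiguity) and followedBy() (relaxed contiguity), using the status codes "02" for the initial state and "00" or "01" for the final state.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy two-stage matcher; a simplified model, not Flink's NFA.
class TinyNfaSketch {

    static final Predicate<String> INIT = "02"::equals;
    static final Predicate<String> END = s -> "00".equals(s) || "01".equals(s);

    // Returns the first match as stage name -> events, or null when nothing matches.
    static Map<String, List<String>> firstMatch(List<String> statuses, boolean strict) {
        String pendingInit = null;
        for (String status : statuses) {
            if (pendingInit == null) {
                // Start state: wait for an event that satisfies the "init" condition.
                if (INIT.test(status)) {
                    pendingInit = status;
                }
            } else if (END.test(status)) {
                // Final state reached: emit the map of stage name to matched events.
                Map<String, List<String>> match = new LinkedHashMap<>();
                match.put("init", Collections.singletonList(pendingInit));
                match.put("end", Collections.singletonList(status));
                return match;
            } else if (strict) {
                // Strict contiguity: a non-matching event discards the partial match.
                pendingInit = INIT.test(status) ? status : null;
            }
            // Relaxed contiguity: non-matching events are simply skipped.
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> statuses = Arrays.asList("02", "03", "01");
        System.out.println(firstMatch(statuses, false)); // {init=[02], end=[01]}
        System.out.println(firstMatch(statuses, true));  // null
    }
}
```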
PatternSelectFunction and PatternFlatSelectFunction
PatternSelectFunction's select() method is called with a map containing the matched events, which can be accessed by pattern name. Pattern names are specified when the Pattern is defined. select() returns exactly one result; if you need to return multiple results, implement PatternFlatSelectFunction instead.
public interface PatternSelectFunction<IN, OUT> extends Function, Serializable {

    /**
     * Generates one result from the given map of events. The events are identified by the
     * name of the pattern they are associated with.
     */
    OUT select(Map<String, List<IN>> pattern) throws Exception;
}
PatternFlatSelectFunction does not return an OUT; instead it uses a Collector to collect the matched events.
public interface PatternFlatSelectFunction<IN, OUT> extends Function, Serializable {

    /**
     * Generates one or more results.
     */
    void flatSelect(Map<String, List<IN>> pattern, Collector<OUT> out) throws Exception;
}
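The difference between the two interfaces can be sketched without Flink on the classpath. The stand-in interfaces SelectFn and FlatSelectFn below are hypothetical and only mirror the shape of the real ones, with java.util.function.Consumer standing in for Flink's Collector. The event strings are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class SelectVsFlatSelectSketch {

    // Stand-ins that mirror the two interfaces above; Consumer replaces Flink's Collector.
    interface SelectFn<IN, OUT> {
        OUT select(Map<String, List<IN>> pattern);
    }

    interface FlatSelectFn<IN, OUT> {
        void flatSelect(Map<String, List<IN>> pattern, Consumer<OUT> out);
    }

    // A made-up match: pattern name -> events matched under that name.
    static Map<String, List<String>> sampleMatch() {
        Map<String, List<String>> match = new LinkedHashMap<>();
        match.put("init", Arrays.asList("order-created"));
        match.put("end", Arrays.asList("order-paid"));
        return match;
    }

    // select style: exactly one result per match.
    static String summarize(Map<String, List<String>> match) {
        SelectFn<String, String> fn = p -> p.get("init").get(0) + " -> " + p.get("end").get(0);
        return fn.select(match);
    }

    // flatSelect style: zero, one, or many results per match.
    static List<String> flatten(Map<String, List<String>> match) {
        List<String> results = new ArrayList<>();
        FlatSelectFn<String, String> fn =
                (p, collector) -> p.values().forEach(events -> events.forEach(collector));
        fn.flatSelect(match, results::add);
        return results;
    }

    public static void main(String[] args) {
        System.out.println(summarize(sampleMatch())); // order-created -> order-paid
        System.out.println(flatten(sampleMatch()));   // [order-created, order-paid]
    }
}
```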
SelectTimeoutCepOperator and PatternTimeoutFunction
SelectTimeoutCepOperator is created when createTimeoutPatternStream() is called in CEPOperatorUtils.
The methods of SelectTimeoutCepOperator that the operator invokes repeatedly are processMatchedSequences() and processTimedOutSequences().
This is the template method pattern: they correspond to the processEvent() and advanceTime() methods of the abstract class AbstractKeyedCEPPatternOperator.
There is also FlatSelectTimeoutCepOperator with its corresponding PatternFlatTimeoutFunction.
public class SelectTimeoutCepOperator<IN, OUT1, OUT2, KEY>
        extends AbstractKeyedCEPPatternOperator<IN, KEY, OUT1, SelectTimeoutCepOperator.SelectWrapper<IN, OUT1, OUT2>> {

    private OutputTag<OUT2> timedOutOutputTag;

    public SelectTimeoutCepOperator(
            TypeSerializer<IN> inputSerializer,
            boolean isProcessingTime,
            NFACompiler.NFAFactory<IN> nfaFactory,
            final EventComparator<IN> comparator,
            AfterMatchSkipStrategy skipStrategy,
            // The parameter names confusingly say "flat", as do the member names in the SelectWrapper class
            PatternSelectFunction<IN, OUT1> flatSelectFunction,
            PatternTimeoutFunction<IN, OUT2> flatTimeoutFunction,
            OutputTag<OUT2> outputTag,
            OutputTag<IN> lateDataOutputTag) {
        super(
                inputSerializer,
                isProcessingTime,
                nfaFactory,
                comparator,
                skipStrategy,
                new SelectWrapper<>(flatSelectFunction, flatTimeoutFunction),
                lateDataOutputTag);
        this.timedOutOutputTag = outputTag;
    }

    ...
}

public interface PatternTimeoutFunction<IN, OUT> extends Function, Serializable {
    OUT timeout(Map<String, List<IN>> pattern, long timeoutTimestamp) throws Exception;
}

public interface PatternFlatTimeoutFunction<IN, OUT> extends Function, Serializable {
    void timeout(Map<String, List<IN>> pattern, long timeoutTimestamp, Collector<OUT> out) throws Exception;
}
CEP and CEPOperatorUtils
CEP is the utility class for creating a PatternStream; a PatternStream is just the combination of a DataStream and a Pattern.
public class CEP {

    public static <T> PatternStream<T> pattern(DataStream<T> input, Pattern<T, ?> pattern) {
        return new PatternStream<>(input, pattern);
    }

    public static <T> PatternStream<T> pattern(DataStream<T> input, Pattern<T, ?> pattern, EventComparator<T> comparator) {
        return new PatternStream<>(input, pattern, comparator);
    }
}
CEPOperatorUtils creates the SingleOutputStreamOperator (a DataStream) when PatternStream's select() or flatSelect() method is called.
public class CEPOperatorUtils {
    ...

    private static <IN, OUT, K> SingleOutputStreamOperator<OUT> createPatternStream(
            final DataStream<IN> inputStream,
            final Pattern<IN, ?> pattern,
            final TypeInformation<OUT> outTypeInfo,
            final boolean timeoutHandling,
            final EventComparator<IN> comparator,
            final OperatorBuilder<IN, OUT> operatorBuilder) {

        final TypeSerializer<IN> inputSerializer =
                inputStream.getType().createSerializer(inputStream.getExecutionConfig());

        // check whether we use processing time
        final boolean isProcessingTime =
                inputStream.getExecutionEnvironment().getStreamTimeCharacteristic() == TimeCharacteristic.ProcessingTime;

        // compile our pattern into a NFAFactory to instantiate NFAs later on
        final NFACompiler.NFAFactory<IN> nfaFactory = NFACompiler.compileFactory(pattern, timeoutHandling);

        final SingleOutputStreamOperator<OUT> patternStream;

        if (inputStream instanceof KeyedStream) {
            KeyedStream<IN, K> keyedStream = (KeyedStream<IN, K>) inputStream;

            patternStream = keyedStream.transform(
                    operatorBuilder.getKeyedOperatorName(),
                    outTypeInfo,
                    operatorBuilder.build(
                            inputSerializer,
                            isProcessingTime,
                            nfaFactory,
                            comparator,
                            pattern.getAfterMatchSkipStrategy()));
        } else {
            KeySelector<IN, Byte> keySelector = new NullByteKeySelector<>();

            patternStream = inputStream.keyBy(keySelector).transform(
                    operatorBuilder.getOperatorName(),
                    outTypeInfo,
                    operatorBuilder.build(
                            inputSerializer,
                            isProcessingTime,
                            nfaFactory,
                            comparator,
                            pattern.getAfterMatchSkipStrategy()
                    )).forceNonParallel();
        }

        return patternStream;
    }

    ...
}
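The two branches in createPatternStream() differ only in how events are grouped before matching. The plain-Java sketch below illustrates the effect under stated assumptions: KeyingSketch and partition are hypothetical names, events are made-up "aid:status" strings, and a constant byte key of 0 is used to imitate what a NullByteKeySelector-style selector does. With a real key each aid gets its own group; with the constant key every event lands in one group, which is consistent with the non-keyed branch being forced to run non-parallel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class KeyingSketch {

    // Groups events by the key that the selector extracts, preserving arrival order.
    static <E, K> Map<K, List<E>> partition(List<E> events, Function<E, K> keySelector) {
        Map<K, List<E>> groups = new LinkedHashMap<>();
        for (E event : events) {
            groups.computeIfAbsent(keySelector.apply(event), k -> new ArrayList<>()).add(event);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up events in the form "aid:status".
        List<String> events = Arrays.asList("A:02", "B:02", "A:00");

        // Keyed input: one group (and one independent match state) per aid.
        Function<String, String> byAid = e -> e.split(":")[0];
        System.out.println(partition(events, byAid)); // {A=[A:02, A:00], B=[B:02]}

        // Non-keyed input: a constant key puts every event into the same group.
        Function<String, Byte> constantKey = e -> (byte) 0;
        System.out.println(partition(events, constantKey)); // {0=[A:02, B:02, A:00]}
    }
}
```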
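FlinkCEP implementation steps
Steps for implementing match timeouts with FlinkCEP
A stream used for timeout CEP needs keyBy, that is, it should be a KeyedStream. If inputStream is not a KeyedStream, a zero-byte key is created (as seen in the CEPOperatorUtils source above).
KeySelector<IN, Byte> keySelector = new NullByteKeySelector<>();
The Pattern calls within last to set the window time. If the stream is grouped by primary key, at most one timeout event is matched per time window, so PatternStream.select(...) is sufficient.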
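Limitations of FlinkCEP timeouts
As with Flink window aggregation, if you use event time and rely on watermarks generated from events to move time forward, a subsequent event has to arrive before the window is triggered to compute and emit its result.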
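This limitation can be seen in a small plain-Java model. The sketch below is not Flink code and makes several simplifying assumptions: the class WatermarkTimeoutSketch is hypothetical, the watermark is simply the timestamp of the latest event, and the status codes follow the demo ("02" opens a sequence, "00" or "01" closes it). The point it tries to show is that a pending sequence is only reported as timed out once some later event pushes the watermark past the window.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of event-time timeouts; NOT how FlinkCEP is implemented.
class WatermarkTimeoutSketch {

    private final long windowMillis;
    // key -> event-time timestamp of the pending "init" event
    private final Map<String, Long> pendingInit = new LinkedHashMap<>();
    private final List<String> timedOut = new ArrayList<>();

    WatermarkTimeoutSketch(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Feeds one event; its timestamp is also used as the new watermark.
    void onEvent(String key, String status, long timestamp) {
        advanceWatermark(timestamp);
        if ("02".equals(status)) {
            pendingInit.put(key, timestamp);
        } else if ("00".equals(status) || "01".equals(status)) {
            pendingInit.remove(key);
        }
    }

    // Time only moves when an event arrives, so timeouts are only detected here.
    private void advanceWatermark(long watermark) {
        Iterator<Map.Entry<String, Long>> it = pendingInit.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (watermark - entry.getValue() >= windowMillis) {
                timedOut.add(entry.getKey());
                it.remove();
            }
        }
    }

    List<String> timedOut() {
        return new ArrayList<>(timedOut);
    }

    public static void main(String[] args) {
        WatermarkTimeoutSketch sketch = new WatermarkTimeoutSketch(60_000L);
        sketch.onEvent("ID-1", "02", 0L);
        System.out.println(sketch.timedOut()); // [] : nothing has advanced time yet
        sketch.onEvent("ID-2", "03", 61_000L);
        System.out.println(sketch.timedOut()); // [ID-1] : reported only after a later event
    }
}
```

If no further event ever arrives in this model, ID-1 is never reported, which is the shortcoming described above.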
Complete FlinkCEP timeout demo
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternFlatSelectFunction;
import org.apache.flink.cep.PatternFlatTimeoutFunction;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
import org.apache.flink.streaming.api.functions.IngestionTimeExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

// POJO and POJOSchema are the author's own classes and are not shown in this article.
public class CEPTimeoutEventJob {

    private static final String LOCAL_KAFKA_BROKER = "localhost:9092";
    private static final String GROUP_ID = CEPTimeoutEventJob.class.getSimpleName();
    private static final String GROUP_TOPIC = GROUP_ID;

    public static void main(String[] args) throws Exception {
        // Parameters
        ParameterTool params = ParameterTool.fromArgs(args);

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Use event time
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        env.enableCheckpointing(5000);
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
        env.getConfig().disableSysoutLogging();
        env.getConfig().setRestartStrategy(RestartStrategies.fixedDelayRestart(5, 10000));

        // Do not use the timestamp carried by the POJO itself
        final AssignerWithPeriodicWatermarks<POJO> extractor = new IngestionTimeExtractor<POJO>();

        // Keep in line with the number of partitions of the Kafka topic
        env.setParallelism(3);

        Properties kafkaProps = new Properties();
        kafkaProps.setProperty("bootstrap.servers", LOCAL_KAFKA_BROKER);
        kafkaProps.setProperty("group.id", GROUP_ID);

        // Consume messages from Kafka
        FlinkKafkaConsumer011<POJO> consumer =
                new FlinkKafkaConsumer011<>(GROUP_TOPIC, new POJOSchema(), kafkaProps);
        DataStream<POJO> pojoDataStream = env.addSource(consumer)
                .assignTimestampsAndWatermarks(extractor);

        pojoDataStream.print();

        // Group by the primary key aid, i.e. run match detection for each POJO
        // [different types of POJO can use different within times]
        // 1.
        DataStream<POJO> keyedPojos = pojoDataStream
                .keyBy("aid");

        // From the initial state to a final state: one complete POJO event sequence
        // 2.
        Pattern<POJO, POJO> completedPojo =
                Pattern.<POJO>begin("init")
                        .where(new SimpleCondition<POJO>() {
                            private static final long serialVersionUID = -6847788055093903603L;

                            @Override
                            public boolean filter(POJO pojo) throws Exception {
                                return "02".equals(pojo.getAstatus());
                            }
                        })
                        .followedBy("end")
                        // .next("end")
                        .where(new SimpleCondition<POJO>() {
                            private static final long serialVersionUID = -2655089736460847552L;

                            @Override
                            public boolean filter(POJO pojo) throws Exception {
                                return "00".equals(pojo.getAstatus()) || "01".equals(pojo.getAstatus());
                            }
                        });

        // Find the aid of events that have not reached a final state within 1 minute [kept short for testing]
        // If different types need different within times, e.g. some time out after 1 minute
        // and others after 1 hour, create multiple PatternStreams
        // 3.
        PatternStream<POJO> patternStream = CEP.pattern(keyedPojos, completedPojo.within(Time.minutes(1)));

        // Define the side output timedout
        // 4.
        OutputTag<POJO> timedout = new OutputTag<POJO>("timedout") {
            private static final long serialVersionUID = 773503794597666247L;
        };

        // OutputTag<L> timeoutOutputTag, PatternFlatTimeoutFunction<T, L> patternFlatTimeoutFunction, PatternFlatSelectFunction<T, R> patternFlatSelectFunction
        // 5.
        SingleOutputStreamOperator<POJO> timeoutPojos = patternStream.flatSelect(
                timedout,
                new POJOTimedOut(),
                new FlatSelectNothing()
        );

        // Print the timed-out POJOs
        // 6.7.
        timeoutPojos.getSideOutput(timedout).print();
        timeoutPojos.print();
        env.execute(CEPTimeoutEventJob.class.getSimpleName());
    }

    /**
     * Collects the timed-out events.
     */
    public static class POJOTimedOut implements PatternFlatTimeoutFunction<POJO, POJO> {
        private static final long serialVersionUID = -4214641891396057732L;

        @Override
        public void timeout(Map<String, List<POJO>> map, long l, Collector<POJO> collector) throws Exception {
            if (null != map.get("init")) {
                for (POJO pojoInit : map.get("init")) {
                    System.out.println("timeout init:" + pojoInit.getAid());
                    collector.collect(pojoInit);
                }
            }
            // end timed out and was never received, so it is not available here
            System.out.println("timeout end: " + map.get("end"));
        }
    }

    /**
     * Usually does nothing, but it could also send all matched events downstream. With relaxed
     * contiguity, events that were ignored or skipped over cannot be selected and sent downstream.
     * Receives the data that completed both init and end within one minute.
     *
     * @param <T>
     */
    public static class FlatSelectNothing<T> implements PatternFlatSelectFunction<T, T> {
        private static final long serialVersionUID = -3029589950677623844L;

        @Override
        public void flatSelect(Map<String, List<T>> pattern, Collector<T> collector) {
            System.out.println("flatSelect: " + pattern);
        }
    }
}
Test results (followedBy):
3> POJO{aid='ID000-0', astyle='STYLE000-0', aname='NAME-0', logTime=1563419728242, energy=529.00, age=0, tt=2019-07-18, astatus='02', createTime=null, updateTime=null}
3> POJO{aid='ID000-1', astyle='STYLE000-2', aname='NAME-1', logTime=1563419728783, energy=348.00, age=26, tt=2019-07-18, astatus='02', createTime=null, updateTime=null}
3> POJO{aid='ID000-0', astyle='STYLE000-0', aname='NAME-0', logTime=1563419749259, energy=492.00, age=0, tt=2019-07-18, astatus='00', createTime=null, updateTime=null}
flatSelect: {init=[POJO{aid='ID000-0', astyle='STYLE000-0', aname='NAME-0', logTime=1563419728242, energy=529.00, age=0, tt=2019-07-18, astatus='02', createTime=null, updateTime=null}], end=[POJO{aid='ID000-0', astyle='STYLE000-0', aname='NAME-0', logTime=1563419749259, energy=492.00, age=0, tt=2019-07-18, astatus='00', createTime=null, updateTime=null}]}
timeout init:ID000-1
3> POJO{aid='ID000-1', astyle='STYLE000-2', aname='NAME-1', logTime=1563419728783, energy=348.00, age=26, tt=2019-07-18, astatus='02', createTime=null, updateTime=null}
timeout end: null
3> POJO{aid='ID000-2', astyle='STYLE000-0', aname='NAME-0', logTime=1563419829639, energy=467.00, age=0, tt=2019-07-18, astatus='03', createTime=null, updateTime=null}
3> POJO{aid='ID000-2', astyle='STYLE000-0', aname='NAME-0', logTime=1563419841394, energy=107.00, age=0, tt=2019-07-18, astatus='00', createTime=null, updateTime=null}
3> POJO{aid='ID000-3', astyle='STYLE000-0', aname='NAME-0', logTime=1563419967721, energy=431.00, age=0, tt=2019-07-18, astatus='02', createTime=null, updateTime=null}
3> POJO{aid='ID000-3', astyle='STYLE000-2', aname='NAME-0', logTime=1563419979567, energy=32.00, age=26, tt=2019-07-18, astatus='03', createTime=null, updateTime=null}
3> POJO{aid='ID000-3', astyle='STYLE000-2', aname='NAME-0', logTime=1563419993612, energy=542.00, age=26, tt=2019-07-18, astatus='01', createTime=null, updateTime=null}
flatSelect: {init=[POJO{aid='ID000-3', astyle='STYLE000-0', aname='NAME-0', logTime=1563419967721, energy=431.00, age=0, tt=2019-07-18, astatus='02', createTime=null, updateTime=null}], end=[POJO{aid='ID000-3', astyle='STYLE000-2', aname='NAME-0', logTime=1563419993612, energy=542.00, age=26, tt=2019-07-18, astatus='01', createTime=null, updateTime=null}]}
3> POJO{aid='ID000-4', astyle='STYLE000-0', aname='NAME-0', logTime=1563420063760, energy=122.00, age=0, tt=2019-07-18, astatus='02', createTime=null, updateTime=null}
3> POJO{aid='ID000-4', astyle='STYLE000-0', aname='NAME-0', logTime=1563420078008, energy=275.00, age=0, tt=2019-07-18, astatus='03', createTime=null, updateTime=null}
timeout init:ID000-4
3> POJO{aid='ID000-4', astyle='STYLE000-0', aname='NAME-0', logTime=1563420063760, energy=122.00, age=0, tt=2019-07-18, astatus='02', createTime=null, updateTime=null}
timeout end: null