Confluent 4.1.0 ->KSQL : STREAM-TABLE join -> table data null

青春惊慌失措 2020-12-15 13:53

STEP 1: Run the producer to create sample data

./bin/kafka-avro-console-producer \
         --broker-list localhost:9092 --topic stream-test-topic \
1 Answer
  • 2020-12-15 14:37

    tl;dr: Your table data needs to be keyed on the column on which you're joining.
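
To see why an unkeyed table yields null on the joined side, here is a minimal sketch in plain Python (not KSQL). It assumes a simplified model in which the table is a dictionary indexed by the Kafka message key and the join looks up the stream's join column in that dictionary. The names `expense_table_by_message_key` and `left_join` are hypothetical; the sample values come from the topics in the question.

```python
# Illustrative model only: a plain dict standing in for a KSQL table.
# The table is indexed by the Kafka message key (pk1..pk3), not by EXPENSE_CODE.
expense_table_by_message_key = {
    "pk1": {"EXPENSE_CODE": "EXP001", "EXPENSE_DESC": "Regulatory Deposit"},
    "pk2": {"EXPENSE_CODE": "EXP002", "EXPENSE_DESC": "ABC - Sofia"},
    "pk3": {"EXPENSE_CODE": "EXP003", "EXPENSE_DESC": "Apple Corporation"},
}

deals = [
    {"DEAL_ID": "deal002", "DEAL_EXPENSE_CODE": "EXP002", "DEAL_BRANCH": "AMSTERDAM"},
    {"DEAL_ID": "deal003", "DEAL_EXPENSE_CODE": "EXP003", "DEAL_BRANCH": "AMSTERDAM"},
]


def left_join(stream_rows, table):
    """Left join: look up each row's join column value as a key in the table."""
    joined = []
    for row in stream_rows:
        # The lookup is by key only; the table's value columns are never scanned.
        match = table.get(row["DEAL_EXPENSE_CODE"])
        desc = match["EXPENSE_DESC"] if match else None
        joined.append((row["DEAL_EXPENSE_CODE"], desc))
    return joined


result = left_join(deals, expense_table_by_message_key)
print(result)  # [('EXP002', None), ('EXP003', None)]
```

In this model every lookup misses because "EXP002" is never a key of the table, which mirrors the null table data reported in the question.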

    Using the sample data above, here's how to investigate and fix.

    1. Use KSQL to check the data in the topics (no need for kafka-avro-console-consumer). Each line of output has the format timestamp, key, value.

      • stream:

        ksql> print 'stream-test-topic' from beginning;
        Format:AVRO
        30/04/18 15:59:13 BST, null, {"DEAL_ID": "deal002", "DEAL_EXPENSE_CODE": "EXP002", "DEAL_BRANCH": "AMSTERDAM"}
        30/04/18 15:59:13 BST, null, {"DEAL_ID": "deal003", "DEAL_EXPENSE_CODE": "EXP003", "DEAL_BRANCH": "AMSTERDAM"}
        30/04/18 15:59:13 BST, null, {"DEAL_ID": "deal004", "DEAL_EXPENSE_CODE": "EXP004", "DEAL_BRANCH": "AMSTERDAM"}
        
      • table:

        ksql> print 'expense-test-topic' from beginning;
        Format:AVRO
        30/04/18 16:10:52 BST, pk1, {"EXPENSE_CODE": "EXP001", "EXPENSE_DESC": "Regulatory Deposit"}
        30/04/18 16:10:52 BST, pk2, {"EXPENSE_CODE": "EXP002", "EXPENSE_DESC": "ABC - Sofia"}
        30/04/18 16:10:52 BST, pk3, {"EXPENSE_CODE": "EXP003", "EXPENSE_DESC": "Apple Corporation"}
        30/04/18 16:10:52 BST, pk4, {"EXPENSE_CODE": "EXP004", "EXPENSE_DESC": "Confluent Europe"}
        30/04/18 16:10:52 BST, pk5, {"EXPENSE_CODE": "EXP005", "EXPENSE_DESC": "Air India"}
        30/04/18 16:10:52 BST, pk6, {"EXPENSE_CODE": "EXP006", "EXPENSE_DESC": "KLM International"}
        

      At this point, note that the message key (pk<x>) does not match the column on which we will be joining (EXPENSE_CODE).

    2. Register the two topics:

      ksql> CREATE STREAM deals WITH (KAFKA_TOPIC='stream-test-topic', VALUE_FORMAT='AVRO');
      
       Message
      ----------------
       Stream created
      ----------------
      
      ksql> CREATE TABLE expense_codes_table WITH (KAFKA_TOPIC='expense-test-topic', VALUE_FORMAT='AVRO', KEY='EXPENSE_CODE');
      
       Message
      ---------------
       Table created
      ---------------
      
    3. Tell KSQL to query events from the beginning of each topic:

      ksql> SET 'auto.offset.reset' = 'earliest';
      Successfully changed local property 'auto.offset.reset' from 'null' to 'earliest'
      
    4. Validate that the table's declared key per the DDL (KEY='EXPENSE_CODE') matches the actual key of the underlying Kafka messages (available through the ROWKEY system column):

      ksql> SELECT ROWKEY, EXPENSE_CODE FROM expense_codes_table;
      pk1 | EXP001
      pk2 | EXP002
      pk3 | EXP003
      pk4 | EXP004
      pk5 | EXP005
      pk6 | EXP006
      

      The keys don't match. Our join is doomed!

    5. Magic workaround: let's rekey the topic using KSQL!

      • Register the table's source topic as a KSQL STREAM:

        ksql> CREATE STREAM expense_codes_stream WITH (KAFKA_TOPIC='expense-test-topic', VALUE_FORMAT='AVRO');
        
         Message
        ----------------
         Stream created
        ----------------
        
      • Create a derived stream, keyed on the correct column. This is underpinned by a re-keyed Kafka topic.

        ksql> CREATE STREAM EXPENSE_CODES_REKEY AS SELECT * FROM expense_codes_stream PARTITION BY EXPENSE_CODE;
        
         Message
        ----------------------------
         Stream created and running
        ----------------------------
        
      • Re-register the KSQL TABLE on top of the re-keyed topic:

        ksql> DROP TABLE expense_codes_table;
        
         Message
        ----------------------------------------
         Source EXPENSE_CODES_TABLE was dropped
        ----------------------------------------
        ksql> CREATE TABLE expense_codes_table WITH (KAFKA_TOPIC='EXPENSE_CODES_REKEY', VALUE_FORMAT='AVRO', KEY='EXPENSE_CODE');
        
         Message
        ---------------
         Table created
        ---------------
        
      • Check that the keys (declared vs. message) match on the new table:

        ksql> SELECT ROWKEY, EXPENSE_CODE FROM expense_codes_table;
        EXP005 | EXP005
        EXP001 | EXP001
        EXP002 | EXP002
        EXP003 | EXP003
        EXP006 | EXP006
        EXP004 | EXP004  
        
    6. Successful join:

      ksql> SELECT D.DEAL_EXPENSE_CODE, E.EXPENSE_DESC \
      FROM deals D \
        LEFT JOIN expense_codes_table E \
        ON D.DEAL_EXPENSE_CODE = E.EXPENSE_CODE  \
      WINDOW TUMBLING (SIZE 3 MINUTE) \
      GROUP BY D.DEAL_EXPENSE_CODE, E.EXPENSE_DESC;
      
      EXP006 | KLM International
      EXP003 | Apple Corporation
      EXP002 | ABC - Sofia
      EXP004 | Confluent Europe
      EXP001 | Regulatory Deposit
      EXP005 | Air India
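
The rekeying workaround above can be modelled in a few lines of plain Python (again, not KSQL). This is a sketch under a simplifying assumption: a topic is a list of (key, value) pairs and a table keeps the latest value per key. The helpers `mismatched_keys`, `rekey` and `materialize` are hypothetical names introduced for illustration; `rekey` plays the role that PARTITION BY EXPENSE_CODE plays in step 5.

```python
def mismatched_keys(records, key_column):
    """Return message keys that differ from the declared key column's value."""
    return [key for key, value in records if key != value[key_column]]


def rekey(records, key_column):
    """Re-emit each (key, value) record keyed on value[key_column]."""
    return [(value[key_column], value) for _old_key, value in records]


def materialize(records):
    """Build a table from a changelog: the latest value per key wins."""
    table = {}
    for key, value in records:
        table[key] = value
    return table


# Sample records from 'expense-test-topic' as printed in step 1.
source_records = [
    ("pk2", {"EXPENSE_CODE": "EXP002", "EXPENSE_DESC": "ABC - Sofia"}),
    ("pk3", {"EXPENSE_CODE": "EXP003", "EXPENSE_DESC": "Apple Corporation"}),
    ("pk4", {"EXPENSE_CODE": "EXP004", "EXPENSE_DESC": "Confluent Europe"}),
]

before = mismatched_keys(source_records, "EXPENSE_CODE")  # every key mismatches
rekeyed = rekey(source_records, "EXPENSE_CODE")
after = mismatched_keys(rekeyed, "EXPENSE_CODE")          # none mismatch now

table = materialize(rekeyed)
deal_codes = ["EXP002", "EXP003", "EXP004"]  # DEAL_EXPENSE_CODE values from step 1
joined = [
    (code, table[code]["EXPENSE_DESC"] if code in table else None)
    for code in deal_codes
]
print(before, after, joined)
```

Once the records are keyed on EXPENSE_CODE, the same key lookup that previously missed now finds a description for each deal.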
      