SQL Query JOIN with Table

后端 未结 2 451
日久生厌
日久生厌 2020-12-02 02:24
select user_id, prod_and_ts.product_id as product_id, prod_and_ts.timestamps as 
timestamps from testingtable2 LATERAL VIEW explode(purchased_item) exploded_table
as         


        
相关标签:
2条回答
  • 2020-12-02 02:35

    I think you can do what you want with two queries, but I'm not 100% sure. Often in this situation, it is sufficient to find things in the first table that don't match in the second table. You are also trying to get a "closest" match, which is why this is challenging.

    The following query looks for matches on user id and exactly one of the other two fields, and then combines them:

    SELECT table2.buyer_id, table2.item_id, table2.created_time, prod_and_ts.*
    from (select user_id, prod_and_ts.product_id as product_id, prod_and_ts.timestamps as timestamps
          from testingtable2 LATERAL VIEW
               explode(purchased_item) exploded_table as prod_and_ts
         ) prod_and_ts JOIN
         table2
         on prod_and_ts.user_id = table2.buyer_id and
            prod_and_ts.product_id = table2.item_id and
            prod_and_ts.timestamps <> UNIX_TIMESTAMP(table2.created_time)
    union all
    SELECT table2.buyer_id, table2.item_id, table2.created_time, prod_and_ts.*
    from (select user_id, prod_and_ts.product_id as product_id, prod_and_ts.timestamps as timestamps
          from testingtable2 LATERAL VIEW
               explode(purchased_item) exploded_table as prod_and_ts
         ) prod_and_ts JOIN
         table2
         on prod_and_ts.user_id = table2.buyer_id and
            prod_and_ts.product_id <> table2.item_id and
            prod_and_ts.timestamps = UNIX_TIMESTAMP(table2.created_time)
    

    This will not find situations where there is no match on either field.

    Also, I've written this using the "on" syntax rather than "where". I assume HIVE supports this.

    0 讨论(0)
  • 2020-12-02 02:58

    Your rep is too high to open a duplicate and especially 2 duplicates of the same question.

    Joining two Tables in Hive using HiveQL(Hadoop)

    Join Two Tables and get the output from both of them

    You don't have enough info to tie the records back for the third scenario.
    You can do a FULL OUTER JOIN with an OR and get everything back, match the rows that you have enough info on as in the first and second case you list, and identify recs that you don't by returning rows with nulls for the fields from the non matching table in the third scenario.

    SELECT DATEPART(d,B.T1time),DATEPART(d,A.Created_TIME),* 
    FROM SO_Table1HIVE A
    FULL OUTER JOIN SO_Table2HIVE B ON A.BUYER_ID = B.[USER_ID] 
    AND (B.t1time = A.Created_TIME OR B.PRODUCTID = A.ITEM_ID)
    

    Trying to match on the third scenario is a hack - the info is not there

    This will match them with any for the date specified that aren't matching on the other days, but again you will get Cartesian products.

    SELECT DATEPART(d,B.T1time),DATEPART(d,A.Created_TIME),* 
    FROM SO_Table1HIVE A
    FULL OUTER JOIN SO_Table2HIVE B ON A.BUYER_ID = B.[USER_ID] 
    AND (
        (B.t1time = A.Created_TIME OR B.PRODUCTID = A.ITEM_ID)
        OR
        (
            (A.Created_TIME <>  B.t1time AND B.PRODUCTID <> A.ITEM_ID AND DATEPART(d,B.T1time) = DATEPART(d,A.Created_TIME)) 
            AND a.ITEM_ID NOT IN(SELECT ITEM_ID
                       FROM SO_Table1HIVE A2
                       INNER JOIN SO_Table2HIVE B2 ON A2.BUYER_ID = B2.[USER_ID] AND (A2.Created_TIME =  B2.t1time OR B2.PRODUCTID = A2.ITEM_ID)
                       )
    
            AND B.PRODUCTID NOT IN(SELECT PRODUCTID
                       FROM SO_Table1HIVE A2
                       INNER JOIN SO_Table2HIVE B2 ON A2.BUYER_ID = B2.[USER_ID] AND (A2.Created_TIME =  B2.t1time OR B2.PRODUCTID = A2.ITEM_ID)
                       )
        )
    
    )
    

    You could use RANK() or try a top one, etc. RANK() or ROW_NO would probably be the best of these hacks if this were not a Hive question, but as I know your are using HQL, I am not going to write it up. You could pull them out in a separate table and run some logical update queries to update it, then use that as a lookup table to tie back.

     tbl1Tbl2Lookup
     ---------------
     id int identity
     table1info FK
     table2info FK 
    

    What you should probably do is what the person in the question you offered a bounty on suggested - since you really don't have a good way to query the third scenario and they offered you an alternative that is specific to HIVE.

    0 讨论(0)
提交回复
热议问题