Datalake analytic join

Submitted by 自古美人都是妖i on 2019-12-12 04:35:44

Question


I have two tables. I want to classify each URL in table [Activite_Site] using the URL prefixes in table [Categorie]. I've tried the query below, but it doesn't work. Does anyone have an idea? Thank you in advance.

Table [Categorie]
URL                         CAT
http//www.site.com/business B2B
http//www.site.com/office   B2B
http//www.site.com/job      B2B
http//www.site.com/home     B2C

Table [Actvite_Site]
URL
http//www.site.com/business/page2/test.html
http//www.site.com/business/page3/pagetest/tot.html
http//www.site.com/office/all/tot.html
http//www.site.com/home/holiday/paris.html
http//www.site.com/home/private/moncompte.html

I would like this output:

URL_SITE                                            CATEGORIE
http//www.site.com/business/page2/test.html         B2B
http//www.site.com/business/page3/pagetest/tot.html B2B
http//www.site.com/office/all/tot.html              B2B
http//www.site.com/home/holiday/paris.html          B2C
http//www.site.com/home/private/moncompte.html      B2C
http//www.site.com/test/pte.html                    Null

My query:

    SELECT A.URL AS URL_SITE,
           C.CAT AS CATEGORIE
    FROM Actvite_Site AS A
        LEFT OUTER JOIN Categorie AS C ON C.URL == A.URL.PadLeft(C.URL.Length)

Answer 1:


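Regarding the error E_CSC_USER_JOINCOLUMNSEXPECTEDONEACHSIDEOFCONDITION: U-SQL does not currently support derived columns in join conditions. Each side of the ON condition must be a plain column, so an expression such as A.URL.PadLeft(...) is rejected there.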

One way to achieve this is to find the matched URLs first, then the unmatched ones, and combine the two sets with UNION ALL.

@category = SELECT *
     FROM (
        VALUES
            ( "http//www.site.com/business", "B2B" ),
            ( "http//www.site.com/office", "B2B" ),
            ( "http//www.site.com/job", "B2B" ),
            ( "http//www.site.com/home", "B2C" )
        ) AS x(url, cat);


@siteActivity = SELECT *
     FROM (
        VALUES
            ( "http//www.site.com/business/page2/test.html" ),
            ( "http//www.site.com/business/page3/pagetest/tot.html" ),
            ( "http//www.site.com/office/all/tot.html" ),
            ( "http//www.site.com/home/holiday/paris.html" ),
            ( "http//www.site.com/home/private/moncompte.html" ),
            ( "http//www.site.com/test/pte.html" )
        ) AS x(url);


// Find matched URLs
@working =
    SELECT sa.url,
           c.cat
    FROM @siteActivity AS sa
         CROSS JOIN
             @category AS c
         // StartsWith avoids the exception Substring throws when a URL is shorter than the category prefix
         WHERE sa.url.StartsWith(c.url);


// Combine the matched and unmatched URLs
@output =
    SELECT url,
           cat
    FROM @working

    UNION ALL

    SELECT url,
           (string) null AS cat
    FROM @siteActivity AS sa
         ANTISEMIJOIN
             @working AS w
         ON sa.url == w.url;



OUTPUT @output TO "/output/output.csv"
USING Outputters.Csv(quoting:false);
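The combine step can probably be simplified. Because @working already holds every matched URL with its category, a single LEFT OUTER JOIN from @siteActivity to @working returns the matched rows and leaves cat null for the rest, replacing the UNION ALL and the ANTISEMIJOIN. This is an untested sketch meant to replace the @output statement above; it assumes the URLs in @siteActivity are unique, because duplicate URLs would be multiplied by the join.

```usql
// Sketch: replaces the @output statement above.
// Assumption: URLs in @siteActivity are unique; duplicates would multiply rows.
// Every activity URL is kept; cat stays null when no category prefix matched.
@output =
    SELECT sa.url,
           w.cat
    FROM @siteActivity AS sa
         LEFT OUTER JOIN
             @working AS w
         ON sa.url == w.url;

OUTPUT @output TO "/output/output.csv"
USING Outputters.Csv(quoting:false);
```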

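Because the CROSS JOIN compares every activity URL with every category URL, its cost grows with the product of the two row counts, so I wonder whether there is a more efficient way.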
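One possibly cheaper alternative avoids the CROSS JOIN altogether. It derives the category key from each activity URL in an earlier SELECT, where C# expressions are allowed, and then joins with a plain column-to-column equality, which U-SQL does accept. This untested sketch continues the answer's script in place of the @working and @output statements. It assumes every category URL is the site root "http//www.site.com/" followed by exactly one path segment, as in the sample data; nested category prefixes would still need the prefix-matching approach. The @root variable, the @keyed rowset and the prefix column are hypothetical names chosen for illustration.

```usql
// Sketch: continues the script above, in place of @working and @output.
// Assumption: every category URL is @root followed by exactly one path segment.
DECLARE @root string = "http//www.site.com/";

// Derive the join key in a SELECT, where C# expressions are allowed:
// keep everything before the first '/' that follows the root,
// or the whole URL when there is no further '/'.
@keyed =
    SELECT url,
           (url.Length > @root.Length && url.IndexOf('/', @root.Length) > 0
               ? url.Substring(0, url.IndexOf('/', @root.Length))
               : url) AS prefix
    FROM @siteActivity;

// Column-to-column equality, which U-SQL accepts in a join condition;
// unmatched URLs get a null cat from the LEFT OUTER JOIN.
@output =
    SELECT k.url,
           c.cat
    FROM @keyed AS k
         LEFT OUTER JOIN
             @category AS c
         ON k.prefix == c.url;

OUTPUT @output TO "/output/output.csv"
USING Outputters.Csv(quoting:false);
```

With the sample data, "http//www.site.com/office/all/tot.html" gets the key "http//www.site.com/office" and therefore B2B, while "http//www.site.com/test/pte.html" gets "http//www.site.com/test", which has no category, so its cat is null.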



Source: https://stackoverflow.com/questions/46362132/datalake-analytic-join
