How to avoid sub folders in snowflake copy statement

て烟熏妆下的殇ゞ submitted on 2021-02-11 18:15:51

Question


I need to exclude a certain folder under a prefix when loading data in Snowflake (COPY statement).

In the example below, I need to process the files directly under emp/ and exclude the files under emp/abc/.

Input :

s3://bucket1/emp/

E1.CSV
E2.CSV
/abc/E11.csv

s3://bucket1/emp/abc/ - E11.csv

Output :

s3://bucket1/emp/

E1.CSV
E2.CSV

Is there a pattern that can handle this?


Answer 1:


With the PATTERN keyword you can try to exclude certain files. However, a negated character class ([^...]) does not exclude a substring; it excludes any file that has any one of the listed characters at that position.

Assuming your stage URL is defined as s3://bucket1/emp/

LS @MY_STAGE pattern = '[^abc].*'; 
  • Excludes anything starting with a, b, or c
LS @MY_STAGE pattern = '[^a][^b][^c][^\\/].*';  
  • Excludes anything where:
    • The first character is a, OR
    • The second character is b, OR
    • The third character is c, OR
    • The fourth character is a forward slash /
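To see why the negated character classes exclude more than the abc/ folder, here is a minimal Python sketch. It assumes the pattern has to match the whole path relative to the stage URL (modelled here with re.fullmatch). The names bonus.csv and abx/E3.csv are hypothetical, added only for illustration. Python's re module is not Snowflake's regex engine, so treat this as an approximation rather than a reproduction of Snowflake's behaviour.

```python
import re

# Paths relative to the stage URL s3://bucket1/emp/ (assumption).
# "bonus.csv" and "abx/E3.csv" are hypothetical files, not from the question.
paths = ["E1.CSV", "E2.CSV", "abc/E11.csv", "bonus.csv", "abx/E3.csv"]


def matching(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [p for p in candidates if re.fullmatch(pattern, p)]


# Rejects every path whose first character is a, b, or c.
first_char = matching(r"[^abc].*", paths)

# Rejects a path if ANY of the four positional checks fails.
per_position = matching(r"[^a][^b][^c][^/].*", paths)

print(first_char)
print(per_position)
```

In this sketch the first pattern also drops bonus.csv, and the second also drops abx/E3.csv, even though neither file is under abc/.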

Edit

After testing with Sharvan's example, here is what I found:

Doesn't work: ls @my_stage PATTERN='^((?!/abc/).)*$'; because the leading forward slash is already part of the stage URL (it is appended automatically if not present), so including it in the pattern duplicates it.

Works: ls @my_stage PATTERN='^((?!abc/).)*$'; because the leading forward slash has been removed from the pattern.

Updated, as the forward slash does not need to be escaped.
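The difference between the two lookahead patterns can be sketched in Python. This assumes the pattern is matched against the path relative to the stage URL, so the file shows up as abc/E11.csv with no leading slash. That matching model is an assumption, and Python's re is only a stand-in for Snowflake's engine.

```python
import re

# Negative lookahead repeated at every position:
# the match fails if "abc/" starts anywhere in the path.
EXCLUDE_ABC = r"^((?!abc/).)*$"
# Same idea, but requiring a leading slash before "abc/".
WITH_LEADING_SLASH = r"^((?!/abc/).)*$"

# Relative paths as assumed to be seen for a stage at s3://bucket1/emp/
paths = ["E1.CSV", "E2.CSV", "abc/E11.csv"]

kept = [p for p in paths if re.fullmatch(EXCLUDE_ABC, p)]
kept_with_slash = [p for p in paths if re.fullmatch(WITH_LEADING_SLASH, p)]

print(kept)
print(kept_with_slash)
```

Under that assumption the version with the leading slash never finds /abc/ in the relative path and keeps all three files, while the version without it drops abc/E11.csv. Note that the pattern rejects any path containing abc/ anywhere, so a folder such as xabc/ would be excluded as well.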

Snowflake does not support backreferences (per its documentation), but the documentation does not mention lookaheads or lookbehinds, which I had assumed were unsupported.

https://docs.snowflake.net/manuals/sql-reference/functions-regexp.html#backreferences




Answer 2:


Use this pattern to exclude the prefix:

ls @stage PATTERN='^((?!/abc/).)*$'


Source: https://stackoverflow.com/questions/59417105/how-to-avoid-sub-folders-in-snowflake-copy-statement
