How to TRUNCATE and/or use wildcards with Databricks

灰色年华 2020-12-11 23:01

I'm trying to write a script in Databricks that will select a file based on certain characters in the name of the file or just on the datestamp in the file.

For ex

1 Answer
  • 2020-12-11 23:48

    You can list the filenames with dbutils and test each one against a pattern in an if-statement, for example with a substring check: if now in filename. So instead of reading files with a specific pattern directly, you get a list of files and then copy only the files that match your required pattern.
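The substring check extends naturally to shell-style wildcards. The following is a minimal sketch in plain Python, assuming the file names have already been collected into a list; `select_by_pattern` is a hypothetical helper introduced only for illustration, and the sample names mirror the ones used in this answer.

```python
from fnmatch import fnmatchcase

def select_by_pattern(names, pattern):
    """Return the names that match a shell-style wildcard pattern."""
    return [n for n in names if fnmatchcase(n, pattern)]

# Sample names with the same layout as the files in this answer
names = [
    "file1-2018-12-22 06-07-31.json",
    "file2-2018-02-03 06-07-31.json",
    "file3-2019-01-03 06-07-31.json",
]

# "*" matches any run of characters, "?" matches exactly one character
by_year = select_by_pattern(names, "*-2018-*.json")
by_day = select_by_pattern(names, "file?-2019-01-03*")

print(by_year)
print(by_day)
```

In a notebook, the list of names would presumably come from the name attribute of the entries returned by dbutils.fs.ls, as in the steps of this answer.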

    The following code works in a Databricks Python notebook:

    1. Writing three files to the filesystem:

    data = """
    {"a":1, "b":2, "c":3}
    {"a":{, b:3} 
    {"a":5, "b":6, "c":7}
    
    """
    
    dbutils.fs.put("/mnt/adls2/demo/files/file1-2018-12-22 06-07-31.json", data, True)
    dbutils.fs.put("/mnt/adls2/demo/files/file2-2018-02-03 06-07-31.json", data, True)
    dbutils.fs.put("/mnt/adls2/demo/files/file3-2019-01-03 06-07-31.json", data, True)
    

    2. Reading the filenames as a list:

    files = dbutils.fs.ls("/mnt/adls2/demo/files/")

    3. Getting the current date:

    import datetime
    
    now = datetime.datetime.now().strftime("%Y-%m-%d")
    print(now)
    

    Output: 2019-01-03

    4. Copying the files for the current date:

    # Copy only the files whose name contains the current date string
    for f in files:
      if now in f.name:
        dbutils.fs.cp(f.path, '/mnt/adls2/demo/target/' + f.name)
        print('copied     ' + f.name)
      else:
        print('not copied ' + f.name)
    

    Output:

    not copied file1-2018-12-22 06-07-31.json
    not copied file2-2018-02-03 06-07-31.json
    copied     file3-2019-01-03 06-07-31.json
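Matching the current date is only one case. As a hedged sketch of selecting on the datestamp itself, the plain-Python example below pulls the date out of each name and keeps the files that fall inside a date range. It assumes every name ends in the layout "YYYY-MM-DD HH-MM-SS.json" seen above; `STAMP`, `file_date` and `select_between` are hypothetical names introduced for illustration.

```python
import re
from datetime import date, datetime

# Assumed layout at the end of each name: YYYY-MM-DD HH-MM-SS.json
STAMP = re.compile(r"(\d{4}-\d{2}-\d{2}) \d{2}-\d{2}-\d{2}\.json$")

def file_date(name):
    """Return the date embedded in the file name, or None if there is none."""
    match = STAMP.search(name)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d").date()

def select_between(names, start, end):
    """Keep the names whose embedded date lies between start and end, inclusive."""
    selected = []
    for name in names:
        stamp = file_date(name)
        if stamp is not None and start <= stamp <= end:
            selected.append(name)
    return selected

names = [
    "file1-2018-12-22 06-07-31.json",
    "file2-2018-02-03 06-07-31.json",
    "file3-2019-01-03 06-07-31.json",
]

recent = select_between(names, date(2018, 12, 1), date(2019, 1, 31))
print(recent)
```

Comparing parsed dates rather than raw strings makes ranges and "latest file" queries straightforward; the selected names could then be passed to dbutils.fs.cp in the same way as in step 4.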
