I'm trying to write a script in Databricks that will select a file based on certain characters in the file name or just on the datestamp in the file name.
For ex
You can list the file names with dbutils and check whether a pattern matches in an if-statement: if now in filename. So instead of reading files with a specific pattern directly, you get a list of files and then copy only the files that match your required pattern.
The following code works in a Databricks Python notebook:
data = """
{"a":1, "b":2, "c":3}
{"a":{, b:3}
{"a":5, "b":6, "c":7}
"""
dbutils.fs.put("/mnt/adls2/demo/files/file1-2018-12-22 06-07-31.json", data, True)
dbutils.fs.put("/mnt/adls2/demo/files/file2-2018-02-03 06-07-31.json", data, True)
dbutils.fs.put("/mnt/adls2/demo/files/file3-2019-01-03 06-07-31.json", data, True)
files = dbutils.fs.ls("/mnt/adls2/demo/files/")
import datetime
now = datetime.datetime.now().strftime("%Y-%m-%d")
print(now)
Output: 2019-01-03
# Copy only the files whose name contains today's date
for f in files:
    if now in f.name:
        dbutils.fs.cp(f.path, '/mnt/adls2/demo/target/' + f.name)
        print('copied ' + f.name)
    else:
        print('not copied ' + f.name)
Output:
not copied file1-2018-12-22 06-07-31.json
not copied file2-2018-02-03 06-07-31.json
copied file3-2019-01-03 06-07-31.json
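
If you need to select on other characters in the name, or on a date other than today, the same idea can be pulled out into a small helper. This is only a sketch and not part of the code above: the names `DATESTAMP`, `datestamp_of` and `select_files` are made up for illustration, and it assumes the datestamp always appears in the name as YYYY-MM-DD. It works on plain strings, so it can be tried without dbutils.

```python
import fnmatch
import re
from datetime import date, datetime

# Assumption: the datestamp appears in the file name as YYYY-MM-DD.
DATESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}")

def datestamp_of(name):
    """Return the first date embedded in `name`, or None if there is none."""
    match = DATESTAMP.search(name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(), "%Y-%m-%d").date()
    except ValueError:
        # The digits matched the shape but are not a real calendar date.
        return None

def select_files(names, pattern="*", on_date=None):
    """Keep names that match the glob `pattern` and, if given, carry `on_date`."""
    selected = []
    for name in names:
        # fnmatchcase keeps the match case-sensitive on every platform.
        if not fnmatch.fnmatchcase(name, pattern):
            continue
        if on_date is not None and datestamp_of(name) != on_date:
            continue
        selected.append(name)
    return selected

names = [
    "file1-2018-12-22 06-07-31.json",
    "file2-2018-02-03 06-07-31.json",
    "file3-2019-01-03 06-07-31.json",
]
print(select_files(names, on_date=date(2019, 1, 3)))
print(select_files(names, pattern="file2-*"))
```

In a notebook you would build `names` from the listing, e.g. `[f.name for f in dbutils.fs.ls("/mnt/adls2/demo/files/")]`, and then copy each selected name with `dbutils.fs.cp` as in the loop above.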