Question
I am trying to read an Excel file with the .xlsx extension from Azure Blob Storage in my Azure Data Factory dataset. It throws the following error:
Error found when processing 'Csv/Tsv Format Text' source 'Filename.xlsx' with row number 3: found more columns than expected column count: 1.
What are the right column and row delimiters for Excel files to be read in Azure Data Factory?
Answer 1:
Excel files have a proprietary format and are not simple delimited files. As indicated here, Azure Data Factory does not have a direct option to import Excel files, e.g. you cannot create a Linked Service to an Excel file and read it easily. Your options are:
- Export or convert the data to flat files before transfer to the cloud, e.g. .csv, tab-delimited or pipe-delimited files are much easier to read than Excel files. This is your simplest option, although it obviously requires a change in process (a minimal conversion sketch follows this list).
- Try shredding the XML - an .xlsx file is a zipped package of XML parts, so you could create a custom task to open the Excel file as XML and extract your data, as suggested here.
- SSIS packages are now supported in Azure Data Factory (via the Execute SSIS Package activity) and have better support for Excel files, e.g. a Connection Manager. So it may be an option to create an SSIS package to deal with the Excel file and host it in ADFv2. Warning! I have not tested this; I am only speculating that it is possible. There is also the overhead of creating an Integration Runtime (IR) for running SSIS in ADFv2.
- Try some other custom activity, e.g. there is a custom U-SQL extractor for shredding XML on GitHub here.
- Try reading the Excel file using Databricks; there are some examples here, although spinning up a Spark cluster to read a few Excel files does seem like overkill. This might be a good option if Spark is already in your architecture (a Databricks read sketch also follows below).
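For the first option, here is a minimal sketch of converting a workbook to CSV before transfer, assuming pandas and openpyxl are installed; the file name and sheet index are placeholders for your own data:

```python
import pandas as pd

# Read the first worksheet of the workbook; openpyxl handles the .xlsx format.
# "Filename.xlsx" and sheet_name=0 are placeholders for your own file/sheet.
df = pd.read_excel("Filename.xlsx", sheet_name=0, engine="openpyxl")

# Write a plain comma-delimited file that an ADF delimited-text dataset can read.
df.to_csv("Filename.csv", index=False)
```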
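For the Databricks option, a hedged sketch using the third-party spark-excel library (com.crealytics.spark.excel), which must be attached to the cluster first; the storage path and option names are assumptions based on that library, not something native to ADF or Databricks:

```python
# Assumes the spark-excel (com.crealytics) library is installed on the cluster
# and the storage account is accessible; the wasbs:// path is a placeholder.
df = (spark.read
      .format("com.crealytics.spark.excel")
      .option("header", "true")       # treat the first row as column names
      .option("inferSchema", "true")  # let the reader guess column types
      .load("wasbs://container@account.blob.core.windows.net/Filename.xlsx"))

df.show()
```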
Let us know how you get on.
Answer 2:
Azure Data Factory does not have a direct option to import Excel files; however, they can be transferred through linked services with some configuration. Follow the steps below:
- Create a Linked Service to the source using the appropriate protocol (FTP, file share, SFTP, etc.).
- Create a Linked Service to your cloud storage.
- Add a Copy Data activity and define a file-system Source dataset and a Sink dataset using your previously defined linked services.
- In both Source and Sink, tick the Binary file option, so the file is copied byte-for-byte without being parsed.
- Publish and execute your pipeline (a minimal upload sketch follows this list).
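The point of the Binary option is that the Copy activity moves the bytes without trying to parse them. The same idea, sketched with the azure-storage-blob Python SDK; the connection string, container, and file names are placeholders, and this is an illustration of the binary transfer rather than part of the ADF pipeline itself:

```python
from azure.storage.blob import BlobServiceClient

# Placeholders: supply your own connection string, container and file names.
conn_str = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;"
service = BlobServiceClient.from_connection_string(conn_str)
blob = service.get_blob_client(container="mycontainer", blob="Filename.xlsx")

# Upload the workbook as an opaque binary blob; nothing attempts to parse the
# .xlsx contents, which is what the Binary dataset option achieves in ADF.
with open("Filename.xlsx", "rb") as data:
    blob.upload_blob(data, overwrite=True)
```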
Source: https://stackoverflow.com/questions/52514153/how-to-read-files-with-xlsx-and-xls-extension-in-azure-data-factory