Concurrent read/write to ADLA

Posted by 戏子无情 on 2019-12-11 05:44:43

Question


Q1: We are thinking of parallelizing reads and writes to ADLA tables and were wondering what the implications of such a design are. I think reads are fine, but what is the best practice for concurrent writes to the same ADLA table?

Q2: Suppose we have U-SQL scripts that have multiple rowsets and multiple OUTPUT/INSERT statements into the same or different ADLA tables. What is the transaction scope story in U-SQL? If any OUTPUT/INSERT statement fails, will all previous inserts be rolled back or not? How should transaction scope be handled?

Thanks Amit


Answer 1:


Before I answer, let me describe what happens when you insert into a table (I assume that is what you mean by writes to a table, as opposed to truncate/insert).

Each INSERT statement creates a new extent file for the table. Thus, if you insert new rows (the recommendation is to insert many rows at a time and not just one row), a new file gets created, and the metadata gets updated during the finalization phase so that the metadata service knows that the file belongs to the table.

So you should be able to run several inserts in parallel.
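To make the reasoning concrete, here is a toy Python model of it. It is not ADLA code and not how the service is implemented: `ToyTable`, `insert`, and `run_parallel_inserts` are hypothetical names made up for illustration. The only point it tries to show is that if every insert writes its own new extent and only the metadata registration is shared, parallel inserts do not touch each other's files.

```python
import threading
import uuid


class ToyTable:
    """Toy stand-in for a table (hypothetical, for illustration only)."""

    def __init__(self):
        self.extents = {}    # extent file name -> rows (the "files")
        self.metadata = []   # extent names the table knows about
        self._lock = threading.Lock()

    def insert(self, rows):
        # Each insert produces its own new extent; no existing extent is modified.
        name = f"extent-{uuid.uuid4().hex}"
        extent = list(rows)
        # Registering the extent stands in for the finalization-phase
        # metadata update described above.
        with self._lock:
            self.extents[name] = extent
            self.metadata.append(name)
        return name

    def read_all(self):
        # A read sees the rows of every extent registered so far.
        with self._lock:
            return [row for name in self.metadata for row in self.extents[name]]


def run_parallel_inserts(table, batches):
    # One thread per batch, standing in for several jobs inserting at once.
    threads = [threading.Thread(target=table.insert, args=(b,)) for b in batches]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


table = ToyTable()
run_parallel_inserts(table, [[(1, "a"), (2, "b")], [(3, "c")], [(4, "d"), (5, "e")]])
print(len(table.metadata), sorted(table.read_all()))
```

Three batches end up as three separate extents, which is also why inserting many rows per statement is preferable to many single-row inserts: each statement adds another file.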

The transactional scope is currently as follows (note that Azure Data Lake Analytics is a big data processing platform, not an OLTP platform, and thus does not offer a choice of transactional guarantees):

The batch processing of U-SQL in ADLA is done in 4 phases:

  1. Preparation contains the compilation, optimization and code generation
  2. Queuing where a job waits for all the needed resources
  3. Actual runtime execution phase
  4. Finalization phase where files and metadata get persisted

During the runtime phase, either all vertices succeed or, if a runtime error occurs, the job fails as a whole. So it is all or nothing.

Once the processing enters the finalization phase, the atomicity is reduced to the file or table level. You may generate three files, but finalizing one of them may fail for some reason. In that case the job fails, but the two files that were finalized successfully will still be created.
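The difference between the two phases can be sketched with a small Python simulation. This is a toy model under stated assumptions, not ADLA's implementation: `run_job`, `JobFailed`, `make_vertex`, and the `fail_on_finalize` parameter are all hypothetical, and the model assumes that the remaining outputs are still attempted after one of them fails to finalize.

```python
class JobFailed(Exception):
    """Raised when the toy job fails in either phase."""


def run_job(vertices, store, fail_on_finalize=frozenset()):
    # Runtime phase: every vertex must succeed. Results are only staged,
    # so an error here leaves `store` untouched (all or nothing).
    staged = {}
    try:
        for vertex in vertices:
            name, rows = vertex()
            staged[name] = rows
    except Exception as exc:
        raise JobFailed(f"runtime error: {exc}") from exc

    # Finalization phase: each output is persisted on its own.
    # Assumption: the other outputs are still attempted after one fails.
    failed = []
    for name, rows in staged.items():
        if name in fail_on_finalize:
            failed.append(name)
        else:
            store[name] = rows
    if failed:
        raise JobFailed(f"finalization failed for: {failed}")


def make_vertex(name, rows):
    # Hypothetical helper: a vertex that produces one named output.
    return lambda: (name, rows)


def broken_vertex():
    raise ValueError("bad row")


# Case 1: a runtime error persists nothing.
store_a = {}
try:
    run_job([make_vertex("f1", [1]), broken_vertex], store_a)
except JobFailed:
    pass

# Case 2: one of three outputs fails to finalize; the other two remain.
store_b = {}
try:
    run_job([make_vertex(n, [1]) for n in ("f1", "f2", "f3")], store_b,
            fail_on_finalize={"f2"})
except JobFailed:
    pass

print(sorted(store_a), sorted(store_b))
```

In both cases the job is reported as failed, but only in the second case is anything left behind, so a script that writes several outputs should be prepared to find some of them present after a failed job.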



Source: https://stackoverflow.com/questions/44690304/concurrent-read-write-to-adla
