Exporting SQL Server Table to multiple part files

Posted by 纵然是瞬间 on 2019-11-28 00:19:01

I think you could use the OFFSET and FETCH paging clauses introduced in SQL Server 2012, in conjunction with bcp:

SELECT *
FROM MyTable
ORDER BY ID --Primary Key
OFFSET 100000000 ROWS --skip the first 100,000,000 rows
FETCH NEXT 100000000 ROWS ONLY
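As a rough sketch of how the paging idea could be scripted, the following Python builds one bcp queryout command per page. The function name, the part_ file prefix, and the -T -w flags are assumptions for illustration, not something the query above prescribes.

```python
def paged_bcp_commands(table, key, total_rows, rows_per_file, prefix="part_"):
    """Build one bcp queryout command per page of rows_per_file rows."""
    commands = []
    for page, offset in enumerate(range(0, total_rows, rows_per_file), start=1):
        query = (
            f"SELECT * FROM {table} ORDER BY {key} "
            f"OFFSET {offset} ROWS FETCH NEXT {rows_per_file} ROWS ONLY"
        )
        # Assumed flags: -T (trusted connection) and -w (Unicode output)
        commands.append(f'bcp "{query}" queryout {prefix}{page}.csv -T -w')
    return commands


# Hypothetical table of 250 rows, exported 100 rows at a time
for command in paged_bcp_commands("MyDb.MySchema.MyTable", "ID", 250, 100):
    print(command)
```

Each page starts at a multiple of rows_per_file, so no row is skipped or exported twice as long as the table does not change between runs.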

Unfortunately, BCP's batch_size argument (-b) only sets how many rows are sent per batch on import; it does not split the output.

Ways I've done this type of splitting:

1 - Simple but non-repeatable: Create a command file (.cmd) that runs multiple BCPs over the table for specific row ranges. This probably requires an IDENTITY(1,1) based primary key on the table. BETWEEN is inclusive at both ends, so the ranges must not share a boundary value.

bcp "SELECT * FROM MyTable WHERE Id BETWEEN 1 AND 10000000" queryout …
bcp "SELECT * FROM MyTable WHERE Id BETWEEN 10000001 AND 20000000" queryout …
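If there are many ranges, the lines of the .cmd file could be generated rather than typed. This is a minimal sketch; the function name, the range_ output prefix, and the -T -w flags are made up for the example.

```python
def id_range_commands(table, first_id, last_id, ids_per_file, prefix="range_"):
    """Build bcp commands over inclusive, non-overlapping Id ranges."""
    commands = []
    low = first_id
    part = 1
    while low <= last_id:
        high = min(low + ids_per_file - 1, last_id)
        query = f"SELECT * FROM {table} WHERE Id BETWEEN {low} AND {high}"
        # Assumed flags: -T (trusted connection) and -w (Unicode output)
        commands.append(f'bcp "{query}" queryout {prefix}{part}.csv -T -w')
        low = high + 1  # the next range starts right after this one ends
        part += 1
    return commands


# Hypothetical Ids 1..25, ten Ids per file
print("\n".join(id_range_commands("MyTable", 1, 25, 10)))
```

Gaps in the IDENTITY values mean the files will not all hold the same number of rows, only the same span of Ids.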

2 - Simple and repeatable, uses a lot of disk: BCP out the entire table to a single file and use split to create as many new files as needed with a given number of bytes in each (note: splitting by lines would be a better idea IMO, since a byte split can cut a row in half). You can install split and any other utilities you want with Cygwin (GnuWin32 is no longer maintained).

 bcp MyDb.MySchema.MyTable out C:\MyFile.csv -T -w  
 split -b 10737418240 C:\MyFile.csv C:\MySplitFile_  

Generates the following files (split uses two-letter suffixes by default):

 C:\MySplitFile_aa
 C:\MySplitFile_ab
 …
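For the line-based variant mentioned above, here is a small sketch that does not depend on split. It assumes the export is UTF-16LE text (as produced with -w) and uses numbered suffixes; the function name and the file names in the demo are hypothetical.

```python
import os
import tempfile


def split_by_lines(source_path, lines_per_file, prefix, encoding="utf-16-le"):
    """Split a text file into numbered parts of at most lines_per_file lines."""
    part_paths = []
    target = None
    # newline="" keeps the original row terminators untouched
    with open(source_path, "r", encoding=encoding, newline="") as source:
        for line_number, line in enumerate(source):
            if line_number % lines_per_file == 0:
                if target is not None:
                    target.close()
                part_path = f"{prefix}{len(part_paths) + 1:03d}"
                target = open(part_path, "w", encoding=encoding, newline="")
                part_paths.append(part_path)
            target.write(line)
    if target is not None:
        target.close()
    return part_paths


# Demo on a throwaway five-row file, two rows per part
with tempfile.TemporaryDirectory() as folder:
    source_path = os.path.join(folder, "MyFile.csv")
    with open(source_path, "w", encoding="utf-16-le", newline="") as handle:
        handle.write("".join(f"row{i}\r\n" for i in range(5)))
    parts = split_by_lines(source_path, 2, os.path.join(folder, "MySplitFile_"))
    print([os.path.basename(path) for path in parts])
```

Rows are never cut in half this way, provided no column value contains an embedded line break.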

3 - Complex but repeatable, requires possibly insecure T-SQL: Use the xp_cmdshell extended stored procedure to call BCP inside a stored procedure that iterates through the table.

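 DECLARE @loop AS INT;
 --Use WHILE to loop as needed--
 DECLARE @sql AS VARCHAR(MAX);
 --Add code to generate dynamic SQL here--
 DECLARE @bcp AS VARCHAR(8000); --xp_cmdshell does not accept VARCHAR(MAX)--
 --@loop is an INT, so cast it before concatenating--
 SELECT @bcp='BCP "'+@sql+'" queryout C:\MyFolder\MyFile_'+CAST(@loop AS VARCHAR(10))+'.csv';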

FINAL NOTE: If you are using any NVARCHAR fields in your data then you need to use the -w flag and be aware that the output will be in UTF-16LE. I would strongly recommend converting that to UTF-8 using iconv (from 'Cygwin' again) before trying to do anything with it in Hadoop.
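If iconv is not at hand, the same conversion could be sketched in Python. This assumes the input really is UTF-16LE and drops a leading byte order mark if one is present; the function name and the file names are placeholders.

```python
import os
import tempfile


def utf16le_to_utf8(source_path, target_path, chunk_chars=1 << 20):
    """Re-encode a UTF-16LE text file as UTF-8 without loading it all at once."""
    # newline="" leaves the row terminators exactly as bcp wrote them
    with open(source_path, "r", encoding="utf-16-le", newline="") as source:
        with open(target_path, "w", encoding="utf-8", newline="") as target:
            first = True
            while True:
                chunk = source.read(chunk_chars)
                if not chunk:
                    break
                if first:
                    # Drop a byte order mark so it does not end up in the data
                    chunk = chunk.lstrip("\ufeff")
                    first = False
                target.write(chunk)


# Demo on a throwaway file
with tempfile.TemporaryDirectory() as folder:
    utf16_path = os.path.join(folder, "MyFile.csv")
    utf8_path = os.path.join(folder, "MyFile_utf8.csv")
    with open(utf16_path, "w", encoding="utf-16-le", newline="") as handle:
        handle.write("1\tZoë\r\n")
    utf16le_to_utf8(utf16_path, utf8_path)
    print(os.path.getsize(utf16_path), os.path.getsize(utf8_path))
```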

If you've got a single sortable primary key field, you could find the keys that define the boundaries of the rows you want, then just SELECT records using a WHERE that defines the boundaries.

It's similar to what Joe put for #1, but your keys need not be contiguous nor numeric. The following is a simple example:

DECLARE @maxrowsperfile AS bigint = 1048576
DECLARE boundaries CURSOR FOR
    SELECT the_sortable_key
    FROM
    (
        SELECT
            the_sortable_key
            , ROW_NUMBER() OVER(ORDER BY the_sortable_key) AS the_row_number
        FROM the_table
    ) AS t
    WHERE the_row_number % @maxrowsperfile = 0
    ORDER BY the_sortable_key

OPEN boundaries

-- [value A] must sort below every key in the table
DECLARE @lowerbound AS [key type] = [value A]
DECLARE @upperbound AS [key type] = [value A]
DECLARE @filecount AS int = 1

FETCH NEXT FROM boundaries
INTO @upperbound

IF @@FETCH_STATUS <> 0
    -- No boundary was found: the whole table fits in one file
    PRINT 'bcp "SELECT * FROM the_table" queryout file -w -T'
ELSE
    BEGIN
        -- Non-numeric keys must also be wrapped in single quotes in the generated query
        WHILE @@FETCH_STATUS = 0
        BEGIN
            PRINT 'bcp "SELECT * FROM the_table WHERE the_sortable_key > ' + CAST(@lowerbound AS varchar) + ' AND the_sortable_key <= ' + CAST(@upperbound AS varchar) + '" queryout file_' + CAST(@filecount AS varchar) + ' -w -T'
            SET @filecount = @filecount + 1
            SET @lowerbound = @upperbound
            FETCH NEXT FROM boundaries
            INTO @upperbound
        END
        -- Rows after the last boundary
        PRINT 'bcp "SELECT * FROM the_table WHERE the_sortable_key > ' + CAST(@lowerbound AS varchar) + '" queryout file_' + CAST(@filecount AS varchar) + ' -w -T'
    END
CLOSE boundaries
DEALLOCATE boundaries
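To make the boundary logic easier to check on a small data set, here is the same idea as a Python sketch. It assumes unique integer keys that are already sorted; the function name is invented for the example, and string keys would additionally need quoting.

```python
def boundary_where_clauses(sorted_keys, max_rows_per_file,
                           key_column="the_sortable_key"):
    """Return one WHERE clause per output file, given unique sorted keys."""
    # Every max_rows_per_file-th key (counting rows from 1) closes a file
    boundaries = sorted_keys[max_rows_per_file - 1::max_rows_per_file]
    clauses = []
    lower = None
    for upper in boundaries:
        if lower is None:
            clauses.append(f"{key_column} <= {upper}")
        else:
            clauses.append(f"{key_column} > {lower} AND {key_column} <= {upper}")
        lower = upper
    if lower is None:
        # Fewer rows than max_rows_per_file: a single file takes everything
        clauses.append("1 = 1")
    elif lower != sorted_keys[-1]:
        # Rows left over after the last boundary
        clauses.append(f"{key_column} > {lower}")
    return clauses


# Seven hypothetical keys with gaps, three rows per file
for clause in boundary_where_clauses([3, 7, 8, 15, 21, 22, 40], 3):
    print(clause)
```

Unlike the cursor version, this sketch skips the trailing file when the row count is an exact multiple of the file size, so no empty file is produced.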
BellevueBob

I would export the file first and then divide it externally. Assuming you are running on a Windows machine, there are several freeware tools out there that can help. See this other answer on Super User for more info.
