SQL Server Code Pages and Collations

前端 未结 2 1165
Happy的楠姐
Happy的楠姐 2020-12-09 13:55

Is there any way in SQL Server of determining what a character in a code page would represent without actually creating a test database of that collation?

Example. I

相关标签:
2条回答
  • 2020-12-09 14:22

    Actually I have found an answer to my question now. A bit clunky but does the job unless there's a better way out there?

    SET NOCOUNT ON;
    
    CREATE TABLE #Collations
    (
         code TINYINT PRIMARY KEY
    );
    
    WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),   --2
            E02(N) AS (SELECT 1 FROM E00 a, E00 b), --4
            E04(N) AS (SELECT 1 FROM E02 a, E02 b), --16
            E08(N) AS (SELECT 1 FROM E04 a, E04 b) --256
    INSERT INTO #Collations
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) - 1
    FROM E08    
    
    DECLARE @AlterScript NVARCHAR(MAX) = ''
    
    SELECT @AlterScript = @AlterScript + ' 
    RAISERROR(''Processing' + name + ''',0,1) WITH NOWAIT;
    ALTER TABLE #Collations ADD ' + name + ' CHAR(1) COLLATE ' + name + ';
    EXEC(''UPDATE #Collations SET ' + name + '=CAST(code AS BINARY(1))'');
    EXEC(''UPDATE #Collations SET ' + name + '=NULL WHERE ASCII(' + name + ') <> code'');
    '
    FROM   sys.fn_helpcollations()
    WHERE  name LIKE '%CS_AS'
           AND name NOT IN    /*Unicode Only Collations*/
                            ( 'Assamese_100_CS_AS', 'Bengali_100_CS_AS',
                             'Divehi_90_CS_AS', 'Divehi_100_CS_AS' ,
                             'Indic_General_90_CS_AS', 'Indic_General_100_CS_AS',
                                 'Khmer_100_CS_AS', 'Lao_100_CS_AS',
                             'Maltese_100_CS_AS', 'Maori_100_CS_AS',
                             'Nepali_100_CS_AS', 'Pashto_100_CS_AS',
                             'Syriac_90_CS_AS', 'Syriac_100_CS_AS',
                             'Tibetan_100_CS_AS' )
    
    
    EXEC (@AlterScript)
    
    SELECT * FROM #Collations
    
    DROP TABLE #Collations
    
    0 讨论(0)
  • 2020-12-09 14:30

    While MS SQL supports both code pages and Unicode unhelpfully it doesn't provide any functions to convert between the two so figuring out what character is represented by a value in a different code page is a pig.

    There are two potential methods I've seen to handle conversions, one is detailed here http://www.codeguru.com/cpp/data/data-misc/values/article.php/c4571 and involves bolting a custom conversion program onto the database and using that for conversions.

    The other is to construct a db table consisting of

    [CodePage], [ANSI Value], [UnicodeValue]
    

    with the unicode value stored as either the int representing the unicode character to be converted using nchar()or the nchar itself

    Your using the collation SQL_Ukrainian_CP1251_CS_AS which is code page 1251 (CP1251 from the centre of the string). You can grab its translation table here http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1251.TXT

    Its a TSV so after trimming the top off the raw data should import fairly cleanly.

    Personally I'd lean more towards the latter than the former especially for a production server as the former may introduce instability.

    0 讨论(0)
提交回复
热议问题