MySQL function to split strings by delimiter doesn't work with Polish special characters

别来无恙 submitted on 2021-02-08 06:52:37

Question


This function successfully extracts single lines from a string, but it fails when the text contains Polish special characters.

DELIMITER $$
DROP FUNCTION IF EXISTS SPLIT_STR $$

CREATE FUNCTION SPLIT_STR(x VARCHAR(1500) CHARSET utf8 COLLATE utf8_unicode_ci, delim VARCHAR(12) CHARSET utf8 COLLATE utf8_unicode_ci, pos INTEGER) 
RETURNS VARCHAR(500) CHARSET utf8 COLLATE utf8_unicode_ci
BEGIN
  DECLARE output VARCHAR(1500) CHARSET utf8 COLLATE utf8_unicode_ci;
  SET output = REPLACE(SUBSTRING(SUBSTRING_INDEX(x, delim, pos)
                 , LENGTH(SUBSTRING_INDEX(x, delim, pos - 1)) + 1) 
                 , delim
                 , '');
  RETURN output;
END $$

As you can see, I am setting the charset and collation manually (the same ones the whole database uses). I have also tried without the charset and collation settings, and it doesn't work either.

Input to reproduce (this is how it is stored in the DB, as a single field):

śńąśąńśąńśąńóńśńąśąńśąńśąńóń
śńąśąńśąńśąńóń
sas

By doing

SELECT
SPLIT_STR(slides.content1, '\n', 1),
SPLIT_STR(slides.content1, '\n', 2),
SPLIT_STR(slides.content1, '\n', 3)
FROM slides;

I only get the first line (the other two fields are empty):

śńąśąńśąńśąńóńśńąśąńśąńśąńóń

Answer 1:


CHAR_LENGTH() returns the length in characters, while LENGTH() returns the length in bytes. You should always use CHAR_LENGTH() when you intend to deal with the length in characters, and especially when dealing with multi-byte character sets, where the result between the two functions may differ.
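
To make the difference concrete, here is a small sketch. It assumes the client sends UTF-8 and the connection character set is utf8mb4 (set explicitly below); the literal 'śńą|x' is made-up sample data, not taken from the question. Each Polish letter occupies two bytes in UTF-8, so the byte length is twice the character length, and an offset computed with LENGTH() points past the end of the string when passed to SUBSTRING(), which counts positions in characters.

```sql
-- Assumption: the client encodes this script as UTF-8.
SET NAMES utf8mb4;

-- LENGTH() counts bytes, CHAR_LENGTH() counts characters.
SELECT LENGTH('śńą')      AS byte_len,   -- 6 (three 2-byte characters)
       CHAR_LENGTH('śńą') AS char_len;   -- 3

-- SUBSTRING() positions are measured in characters, so a byte-based
-- offset overshoots on multi-byte text.
SELECT SUBSTRING('śńą|x', LENGTH('śńą') + 1)      AS byte_offset,  -- '' (position 7 is past the 5-character string)
       SUBSTRING('śńą|x', CHAR_LENGTH('śńą') + 1) AS char_offset;  -- '|x'
```

This is the same overshoot that leaves the second and third fields empty in the question: the starting position is computed in bytes but applied in characters.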

Replacing LENGTH() with CHAR_LENGTH() in your function will likely fix the issue.
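
A sketch of what the function from the question might look like with that single change applied. The signature, charset, and collation are copied from the question; `IF EXISTS`, `DETERMINISTIC`, and the closing `DELIMITER ;` are additions of this sketch, not part of the original answer, and can be dropped if they do not suit your setup.

```sql
DELIMITER $$
-- IF EXISTS (an addition) avoids an error when the function is not defined yet.
DROP FUNCTION IF EXISTS SPLIT_STR $$

CREATE FUNCTION SPLIT_STR(
  x VARCHAR(1500) CHARSET utf8 COLLATE utf8_unicode_ci,
  delim VARCHAR(12) CHARSET utf8 COLLATE utf8_unicode_ci,
  pos INTEGER
)
RETURNS VARCHAR(500) CHARSET utf8 COLLATE utf8_unicode_ci
DETERMINISTIC  -- an addition; some servers with binary logging require it
BEGIN
  DECLARE output VARCHAR(1500) CHARSET utf8 COLLATE utf8_unicode_ci;
  -- Take everything up to the pos-th delimiter, skip the first pos-1 parts
  -- (offset measured in characters, not bytes), then strip the leading delimiter.
  SET output = REPLACE(SUBSTRING(SUBSTRING_INDEX(x, delim, pos)
                 , CHAR_LENGTH(SUBSTRING_INDEX(x, delim, pos - 1)) + 1)
                 , delim
                 , '');
  RETURN output;
END $$
DELIMITER ;
```

With this version, the query from the question (for example `SELECT SPLIT_STR(slides.content1, '\n', 2) FROM slides;`) should return the second line, provided the lines in `content1` are separated by a plain `\n`.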



Source: https://stackoverflow.com/questions/28816726/mysql-function-to-split-strings-by-delimiter-doenst-work-with-polish-special-ch
