Remove special characters from a database field

前端 未结 10 1673
梦如初夏
梦如初夏 2020-12-06 01:00

I have a database with several thousand records, and I need to strip down one of the fields to ensure that it only contains certain characters (Alphanumeric, spaces, and sin

相关标签:
10条回答
  • 2020-12-06 01:41

    I have created simple function for this

    DROP FUNCTION IF EXISTS `regex_replace`$$
    
    CREATE FUNCTION `regex_replace`(pattern VARCHAR(1000),replacement VARCHAR(1000),original VARCHAR(1000)) RETURNS VARCHAR(1000) CHARSET utf8mb4
        DETERMINISTIC
    BEGIN    
        DECLARE temp VARCHAR(1000); 
        DECLARE ch VARCHAR(1); 
        DECLARE i INT;
        SET i = 1;
        SET temp = '';
        IF original REGEXP pattern THEN 
            loop_label: LOOP 
                IF i>CHAR_LENGTH(original) THEN
                    LEAVE loop_label;  
                END IF;
    
                SET ch = SUBSTRING(original,i,1);
    
                IF NOT ch REGEXP pattern THEN
                    SET temp = CONCAT(temp,ch);
                ELSE
                    SET temp = CONCAT(temp,replacement);
                END IF;
    
                SET i=i+1;
            END LOOP;
        ELSE
            SET temp = original;
        END IF;
    
        RETURN temp;
    END
    

    Usage example:

    SELECT <field-name> AS NormalText, regex_replace('[^A-Za-z0-9 ]', '', <field-name>)AS RegexText FROM 
    <table-name>
    
    0 讨论(0)
  • 2020-12-06 01:45
    update mytable
    set FieldName = REPLACE(FieldName,'/','')
    

    That's a good place to start.

    0 讨论(0)
  • 2020-12-06 01:49

    The Replace() function is first choice. However, Special Characters can sometimes be tricky to write in a console. For those you can combine Replace with the Char() function.

    e.g. removing €

    Update products set description = replace(description, char(128), '');
    

    You can find all the Ascii values here

    Ideally you could do a regex to find all the special chars, but apparently that's not possible with MySQL.

    Beyond that, you'd need to run it through your favorite scripting language.

    0 讨论(0)
  • 2020-12-06 01:50

    This might be useful.

    This solution doesn't involves creating procedures or functions or lengthy use of replace within replace. Instead we know that all the ASCII characters that doesn't involves special character lies within ASCII codes \x20-\x7E (Hex representation). Source ASCII From Wikipedia, the free encyclopedia Below are all those characters in that interval.

    Hex: 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F 30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F 50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F 60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F 70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E
    Glyph:  space ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ↑ ← @ a b c d e f g h i j k l m n o p q r s t u v w x y z { ACK } ESC
    

    so as simple regular expression replace will do the job

    SELECT REGEXP_REPLACE(columnName, '[^\\x20-\\x7E]', '') from tableName;
    

    PHP Custom query string

    $query = "select REGEXP_REPLACE(columnName, '(.*)[(].*[)](.*)', CONCAT('\\\\1', '\\\\2')) `Alias` FROM table_Name";
    

    The above statement replaces the content in between brackets as well as brackets. e.g. if the column contains 'Staff Oreintation (CMST TOT)' then above statement will removes the brackets and its contant i.e. 'Staff Oreintation'.

    PS: If you are doing any DML (select, update ...) operation using prepare statement in stored procedure OR through PHP (creating a custom query string); then remember to escape the slash i.e.

    SET @sql = CONCAT("SELECT REGEXP_REPLACE(columnName, '[^\\\\x20-\\\\x7E]', '') from tableName");
    PREPARE stmt FROM @sql;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
    

    The above SQL statement does a simple regular expression replaces (actually removes) of all the special character; i.e. In the SQL a REGEX pattern is mention of all the special characters to be replaced with nothing.

    Explanation of the pattern

    A character group is started with the square bracket. The first character is caret which means; negation of all the characters mention in the group (i.e. with in the squares brackets). This simply means select compliment (other character than those selected) of all the characters in the group.

    Just to summarize the above statement will

    Unchanged: all the alphanumeric characters, punctuation characters, arithmetic operators.

    Remove all the Unicode characters (other than Latin alphabets) or special characters.

    0 讨论(0)
  • 2020-12-06 01:50

    Have a look at LIB_MYSQLUDF_PREG which needs to be compiled into the MySQL server, but has advanced regular expression facilities such as preg_replace which will help with your task.

    0 讨论(0)
  • 2020-12-06 01:56

    This may also be useful.

    First you have to know the character set of the database and / or of the table. For example, let us suppose you have a UTF-8 environment and you want to strip / remove symbols like circled registered symbols, circled copyright symbol, and registered trademark symbol from a field then search the internet via bing or yahoo or google for for the hex code values of these symbols in the UTF-8 system:

    Symbol                       Utf-8 Hex
    =======                      =========
    circled copyright              C2A9
    circled registered             C2AE
    Trademark (i.e., TM)           E284A2
    

    Then your scrubbing select sql for field f1 from table t1, using the hex / unhex facility in conjunction with the replace function, will most likely look like this:

    SELECT cast(unhex(replace(replace(replace(hex(f1),'C2A9',''),'C2AE',''),'E284A2','')) AS char) AS cleanf1 FROM t1 ;

    Above, note the original field to be scrubbed / cleansed is f1, the table is t1 and the output header is cleanf1. The "as char" casting is necessary because, w/o it, the mysql 5.5.8 on which I tested is returning blob. Hope this helps

    0 讨论(0)
提交回复
热议问题