How to Detect Non “GSM 7 bit alphabet” characters in input field

借酒劲吻你 2020-12-10 04:47

I am trying to detect whether a text input field contains any character that doesn't belong to the GSM 7 bit alphabet. The table with the characters is here http://www.dreamfabric.co

5 Answers
  • 2020-12-10 05:22

    The accepted answers will work, but they suffer from complexity (using a regex) and performance (needing to search through two arrays). Here's a solution that should perform better, because it uses a lookup Set and a loop that short-circuits as soon as a non-GSM7 character is found. Unicode code points are used so that different character encodings are not a problem when copying and pasting this code.

    const gsmCodePoints = new Set([
      0x000a, 0x000c, 0x000d, 
      0x0020, 0x0021, 0x0022, 0x0023, 0x0024, 0x0025, 0x0026, 0x0027, 0x0028, 0x0029, 0x002a, 0x002b, 0x002c, 0x002d, 0x002e, 0x002f,
      0x0030, 0x0031, 0x0032, 0x0033, 0x0034, 0x0035, 0x0036, 0x0037, 0x0038, 0x0039, 0x003a, 0x003b, 0x003c, 0x003d, 0x003e, 0x003f,
      0x0040, 0x0041, 0x0042, 0x0043, 0x0044, 0x0045, 0x0046, 0x0047, 0x0048, 0x0049, 0x004a, 0x004b, 0x004c, 0x004d,
      0x004e, 0x004f,
      0x0050, 0x0051, 0x0052, 0x0053, 0x0054, 0x0055, 0x0056, 0x0057, 0x0058, 0x0059,  0x005a, 0x005b, 0x005c, 0x005d, 0x005e, 0x005f, 
      0x0061, 0x0062, 0x0063, 0x0064, 0x0065, 0x0066, 0x0067, 0x0068, 0x0069, 0x006a, 0x006b, 0x006c, 0x006d, 0x006e, 0x006f, 
      0x0070, 0x0071, 0x0072, 0x0073, 0x0074, 0x0075, 0x0076, 0x0077, 0x0078, 0x0079, 0x007a, 0x007b, 0x007c, 0x007d, 0x007e,
      0x00a1, 0x00a3, 0x00a4, 0x00a5, 0x00a7,
      0x00bf,
      0x00c4, 0x00c5, 0x00c6, 0x00c7, 0x00c9,
      0x00d1, 0x00d6, 0x00d8, 0x00dc, 0x00df,
      0x00e0, 0x00e4, 0x00e5, 0x00e6, 0x00e8, 0x00e9, 0x00ec,
      0x00f1, 0x00f2, 0x00f6, 0x00f8, 0x00f9, 0x00fc,
      0x0393, 0x0394, 0x0398, 0x039b, 0x039e, 0x03a0, 0x03a3, 0x03a6, 0x03a8, 0x03a9,
      0x20ac,
    ]);
    
    function isGsmMessage(message) {
      for (const s of message) {
        const codePoint = s.codePointAt(0);
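        // for...of yields whole code points, so codePointAt(0) is always a number;
        // a truthiness check here would wrongly let U+0000 through.
        if (!gsmCodePoints.has(codePoint)) {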
          return false;
        }
      }
      return true;
    }
    
    isGsmMessage('foo'); // -> true
    isGsmMessage('⚡️ bar'); // -> false
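
    For an input field it is often more useful to know which characters are the problem, not only that one exists. The following is a minimal sketch along the same lines as the Set lookup above. The names `range` and `findNonGsmCharacters` are made up for this example, and the table is rebuilt from ranges only to keep the block self-contained.

    ```javascript
    // Inclusive range of code points, used to keep the table short.
    const range = (from, to) =>
      Array.from({ length: to - from + 1 }, (_, i) => from + i);

    // Same code points as the Set in the answer above, written with ranges.
    const gsmCodePoints = new Set([
      0x000a, 0x000c, 0x000d,
      ...range(0x0020, 0x005f),
      ...range(0x0061, 0x007e),
      0x00a1, 0x00a3, 0x00a4, 0x00a5, 0x00a7, 0x00bf,
      0x00c4, 0x00c5, 0x00c6, 0x00c7, 0x00c9,
      0x00d1, 0x00d6, 0x00d8, 0x00dc, 0x00df,
      0x00e0, 0x00e4, 0x00e5, 0x00e6, 0x00e8, 0x00e9, 0x00ec,
      0x00f1, 0x00f2, 0x00f6, 0x00f8, 0x00f9, 0x00fc,
      0x0393, 0x0394, 0x0398, 0x039b, 0x039e, 0x03a0, 0x03a3, 0x03a6, 0x03a8, 0x03a9,
      0x20ac,
    ]);

    // Hypothetical helper: returns each distinct non-GSM character once,
    // in order of first appearance. An empty array means the text is clean.
    function findNonGsmCharacters(message) {
      const offending = new Set();
      for (const ch of message) {
        if (!gsmCodePoints.has(ch.codePointAt(0))) {
          offending.add(ch);
        }
      }
      return [...offending];
    }
    ```

    The returned array can be shown next to the field, for example in a validation message. Unlike `isGsmMessage`, this version does not short-circuit, so it always scans the whole string.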
  • 2020-12-10 05:24

    I have a textarea with the id smscontent. I use the regex/code below to strip disallowed characters as the user types:

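    // Event names passed to .on() are separated by spaces, not commas.
    $('#smscontent').on('input change keyup', function(){
        // Remove every character that is not in the allowed set.
        $(this).val($(this).val().replace(/[^A-Za-z0-9 \r\n@£$¥!\"#$%&'\(\)*\+,_.\/:;<=>?^{}\\\[~\]]+/g, ''));
    });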
    

    To test the regex shared by Lajos - https://www.regextester.com/99623

    To test the regex used in this answer - https://www.regextester.com/?fam=106436
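
    The character class above leaves out some GSM characters (for example `-`, `|`, `€` and the accented letters), so those would be stripped as well. As a rough sketch, assuming the full GSM 03.38 default alphabet plus its extension table is wanted, the stripping step could be pulled into a plain function. `NON_GSM` and `stripNonGsm` are names invented for this example.

    ```javascript
    // Negated character class: matches runs of characters that are NOT in the
    // GSM 03.38 default alphabet or its extension table. Non-ASCII characters
    // are written as escapes so the file encoding does not matter.
    const NON_GSM = new RegExp(
      '[^\\n\\f\\r\\x20-\\x5f\\x61-\\x7e' +
      '\\u00a1\\u00a3-\\u00a5\\u00a7\\u00bf' +
      '\\u00c4-\\u00c7\\u00c9\\u00d1\\u00d6\\u00d8\\u00dc\\u00df' +
      '\\u00e0\\u00e4-\\u00e6\\u00e8\\u00e9\\u00ec' +
      '\\u00f1\\u00f2\\u00f6\\u00f8\\u00f9\\u00fc' +
      '\\u0393\\u0394\\u0398\\u039b\\u039e\\u03a0\\u03a3\\u03a6\\u03a8\\u03a9' +
      '\\u20ac]+',
      'gu' // g: replace every run, u: treat astral characters as one unit
    );

    // Hypothetical helper: returns the text with all non-GSM characters removed.
    function stripNonGsm(text) {
      return text.replace(NON_GSM, '');
    }
    ```

    Inside the jQuery handler this would be used as `$(this).val(stripNonGsm($(this).val()));`. Silently deleting what the user typed can be confusing, so showing a warning instead may be the better choice in some forms.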

  • 2020-12-10 05:27

    Try this

    http://www.frightanic.com/2012/04/10/regex-for-gsm-03-38-7bit-character-set/

  • 2020-12-10 05:33
    function isGSMAlphabet(text) {
        var regexp = new RegExp("^[A-Za-z0-9 \\r\\n@£$¥èéùìòÇØøÅå\u0394_\u03A6\u0393\u039B\u03A9\u03A0\u03A8\u03A3\u0398\u039EÆæßÉ!\"#$%&'()*+,\\-./:;<=>?¡ÄÖÑܧ¿äöñüà^{}\\\\\\[~\\]|\u20AC]*$");
    
        return regexp.test(text);
    }
    

    This regular expression should solve your problem.

  • You can put all valid characters in a string and then search the string repeatedly.

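    // Declared with var to avoid an implicit global. The backslash is escaped so
    // it is actually part of the set, and space, LF and CR are included too.
    var gsm = "@£$¥èéùìòÇØøÅåΔ_ΦΓΛΩΠΨΣΘΞ^{}\\[~]|€ÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà\n\r";
    var letter = 'a';
    var letterInAlfabet = gsm.indexOf(letter) !== -1;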
    

    Make sure you get your encodings right if you use this, i.e. save your JavaScript file as UTF-8 and tell the browser that it is UTF-8.
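
    One way to sidestep the encoding concern is to write every non-ASCII character as a `\u` escape, so the source file is pure ASCII. Below is a small sketch of the same string-search idea applied to a whole text. `GSM_ALPHABET` and `isInGsmAlphabet` are names chosen for this example only.

    ```javascript
    // GSM 03.38 default alphabet plus extension table. Every non-ASCII character
    // is a \u escape, so the encoding of the source file cannot corrupt it.
    var GSM_ALPHABET =
      '\n\f\r !"#$%&\'()*+,-./0123456789:;<=>?@' +
      'ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_' +
      'abcdefghijklmnopqrstuvwxyz{|}~' +
      '\u00a1\u00a3\u00a4\u00a5\u00a7\u00bf' +
      '\u00c4\u00c5\u00c6\u00c7\u00c9\u00d1\u00d6\u00d8\u00dc\u00df' +
      '\u00e0\u00e4\u00e5\u00e6\u00e8\u00e9\u00ec' +
      '\u00f1\u00f2\u00f6\u00f8\u00f9\u00fc' +
      '\u0393\u0394\u0398\u039b\u039e\u03a0\u03a3\u03a6\u03a8\u03a9' +
      '\u20ac';

    // Hypothetical helper: true when every UTF-16 unit of the text is found in
    // the alphabet string. Surrogate halves are never found, so emoji fail.
    function isInGsmAlphabet(text) {
      for (var i = 0; i < text.length; i++) {
        if (GSM_ALPHABET.indexOf(text.charAt(i)) === -1) {
          return false;
        }
      }
      return true;
    }
    ```

    Each lookup is a linear scan of the alphabet string, which is fine for SMS-sized input but slower than a Set for long texts.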
