Regex to validate JSON

挽巷 2020-11-22 11:37

I am looking for a regex that allows me to validate JSON.

I am very new to regexes, and I know enough to know that parsing with a regex is a bad idea, but can one be used for validation?

12 Answers
  • 2020-11-22 11:42

    Here is my regexp for validating a JSON string:

    ^\"([^\"\\]*|\\(["\\\/bfnrt]{1}|u[a-f0-9]{4}))*\"$
    

    It was written using the original syntax diagram.
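
    A minimal sketch of how such a string check behaves, written in JavaScript. This is a variant, not the answer's exact pattern: it consumes one character per loop iteration (the nested quantifier in the pattern above can backtrack heavily on input that does not match), rejects raw control characters, and accepts upper-case hex digits in \u escapes. The names JSON_STRING_RE and isJsonString are made up for this illustration.

```javascript
// Variant of the string pattern (an illustrative sketch, not the answer's regex).
// One alternative per character: an ordinary character, or a backslash escape.
const JSON_STRING_RE =
  /^"(?:[^"\\\u0000-\u001f]|\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4}))*"$/;

// Hypothetical helper: true when the whole text is one JSON string literal.
function isJsonString(text) {
  return JSON_STRING_RE.test(text);
}

console.log(isJsonString('"hello"'));          // true
console.log(isJsonString('"a\\u00E9b"'));      // true  (a \u escape with upper-case hex)
console.log(isJsonString('"bad \\x escape"')); // false (\x is not a JSON escape)
console.log(isJsonString('"tab\there"'));      // false (raw tab inside the string)
```

    The pattern only covers string literals; numbers, arrays and objects need the fuller grammars shown in the other answers.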

  • 2020-11-22 11:46

    I tried @mario's answer, but it didn't fully work for me: I downloaded the test suite from JSON.org (archive), and four tests failed (fail1.json, fail18.json, fail25.json, fail27.json).

    I investigated the errors and found that fail1.json is actually correct (according to the manual's note and RFC 7159, a bare string is also valid JSON). fail18.json is not a real failure either, because it contains perfectly valid, deeply nested JSON:

    [[[[[[[[[[[[[[[[[[[["Too deep"]]]]]]]]]]]]]]]]]]]]
    

    That leaves two files, fail25.json and fail27.json:

    ["  tab character   in  string  "]
    

    and

    ["line
    break"]
    

    Both contain invalid characters (a literal tab and a literal line break inside a string), so I updated the string subpattern like this:

    $pcreRegex = '/
              (?(DEFINE)
                 (?<number>   -? (?= [1-9]|0(?!\d) ) \d+ (\.\d+)? ([eE] [+-]? \d+)? )
                 (?<boolean>   true | false | null )
                 (?<string>    " ([^"\n\r\t\\\\]* | \\\\ ["\\\\bfnrt\/] | \\\\ u [0-9a-f]{4} )* " )
                 (?<array>     \[  (?:  (?&json)  (?: , (?&json)  )*  )?  \s* \] )
                 (?<pair>      \s* (?&string) \s* : (?&json)  )
                 (?<object>    \{  (?:  (?&pair)  (?: , (?&pair)  )*  )?  \s* \} )
                 (?<json>   \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) ) \s* )
              )
              \A (?&json) \Z
              /six';
    

    With this change, all legitimate tests from json.org pass.
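
    As a cross-check of the reasoning above, the four disputed inputs can be fed to a standard parser. The sketch below uses JavaScript's built-in JSON.parse rather than the PHP pattern. The helper name parserAccepts and the sample strings are illustrative assumptions: the top-level string is a stand-in for fail1.json, and the nesting depth of 20 is assumed.

```javascript
// Hypothetical helper: true when the built-in parser accepts the text.
function parserAccepts(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

const disputed = {
  // Stand-in for fail1.json: a bare string at the top level.
  topLevelString: '"just a string"',
  // Like fail18.json: deeply nested arrays (depth of 20 is an assumption).
  deepNesting: '['.repeat(20) + '"Too deep"' + ']'.repeat(20),
  // Like fail25.json: literal tab characters inside the string.
  literalTab: '["\ttab\tcharacter\tin\tstring\t"]',
  // Like fail27.json: a literal line break inside the string.
  literalLineBreak: '["line\nbreak"]',
  // For comparison: a tab written as the escape sequence \t.
  escapedTab: '["tab\\tcharacter"]',
};

for (const [name, text] of Object.entries(disputed)) {
  console.log(name, parserAccepts(text));
}
```

    The parser accepts the first two inputs and the escaped tab, and rejects the literal tab and the literal line break, which matches the classification in this answer.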

  • 2020-11-22 11:49

    Yes, a complete regex validation is possible.

    Many modern regex implementations (PCRE and Perl, for example) allow recursive regular expressions, which can verify a complete JSON serialized structure. The json.org specification makes it quite straightforward.

    $pcre_regex = '
      /
      (?(DEFINE)
         (?<number>   -? (?= [1-9]|0(?!\d) ) \d+ (\.\d+)? ([eE] [+-]? \d+)? )    
         (?<boolean>   true | false | null )
         (?<string>    " ([^"\\\\]* | \\\\ ["\\\\bfnrt\/] | \\\\ u [0-9a-f]{4} )* " )
         (?<array>     \[  (?:  (?&json)  (?: , (?&json)  )*  )?  \s* \] )
         (?<pair>      \s* (?&string) \s* : (?&json)  )
         (?<object>    \{  (?:  (?&pair)  (?: , (?&pair)  )*  )?  \s* \} )
         (?<json>   \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) ) \s* )
      )
      \A (?&json) \Z
      /six   
    ';
    

    It works quite well in PHP with the PCRE functions. It should work unmodified in Perl, and can certainly be adapted for other languages. It also succeeds with the JSON test cases.
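
    JavaScript's RegExp has no subroutine calls such as (?&json), so the pattern above cannot be ported there literally. One possible adaptation, sketched here as an assumption rather than as part of the original answer: keep regexes for the flat rules (number, literal, string) and move the nesting into a small recursive function. All names are hypothetical, and the string rule is slightly stricter than the one above because it rejects raw control characters.

```javascript
// Flat rules as regexes; they mirror <number>, <boolean> and <string> above.
const NUMBER = /-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?/;
const LITERAL = /true|false|null/;
const STRING = /"(?:[^"\\\u0000-\u001f]|\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4}))*"/;
const PUNCT = /[\[\]{}:,]/;

// Sticky (y) pattern: optional whitespace, then exactly one token.
const TOKEN_RE = new RegExp(
  `[ \\t\\n\\r]*(?:(${NUMBER.source})|(${LITERAL.source})|(${STRING.source})|(${PUNCT.source}))`,
  'y'
);

// Split the text into tokens, or return null if some part is not a token.
function tokenize(text) {
  const tokens = [];
  let pos = 0;
  let m;
  TOKEN_RE.lastIndex = 0;
  while ((m = TOKEN_RE.exec(text)) !== null) {
    const kind = m[4] !== undefined ? 'punct' : m[3] !== undefined ? 'string' : 'scalar';
    tokens.push({ kind, text: m[1] ?? m[2] ?? m[3] ?? m[4] });
    pos = TOKEN_RE.lastIndex;
  }
  // Whatever follows the last token may only be whitespace.
  return /^[ \t\n\r]*$/.test(text.slice(pos)) ? tokens : null;
}

// Recursive part: this takes the place of the (?&json) subroutine calls.
function isValidJson(text) {
  const tokens = tokenize(text);
  if (tokens === null) return false;
  let i = 0;

  function isPunct(token, ch) {
    return token !== undefined && token.kind === 'punct' && token.text === ch;
  }
  function value() {
    const t = tokens[i++];
    if (t === undefined) return false;
    if (t.kind !== 'punct') return true; // number, literal or string
    if (t.text === '[') return sequence(']', value);
    if (t.text === '{') return sequence('}', pair);
    return false;
  }
  function pair() {
    const key = tokens[i++];
    const colon = tokens[i++];
    return key !== undefined && key.kind === 'string' && isPunct(colon, ':') && value();
  }
  // Zero or more comma-separated items, up to the closing bracket.
  function sequence(close, item) {
    if (isPunct(tokens[i], close)) { i++; return true; }
    while (true) {
      if (!item()) return false;
      const t = tokens[i++];
      if (isPunct(t, close)) return true;
      if (!isPunct(t, ',')) return false;
    }
  }
  return value() && i === tokens.length;
}

console.log(isValidJson('{"a": [1, 2.5e3, true, null, {"b": "c\\n"}]}')); // true
console.log(isValidJson('[1, 2,]'));                                      // false
```

    Where a JSON parser is available, calling it is simpler. The sketch only shows how the grammar splits into a regular part and a recursive part.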

    Simpler RFC4627 verification

    A simpler approach is the minimal consistency check specified in RFC 4627, section 6. However, it is only intended as a security test and a basic precaution against invalid input:

      var my_JSON_object = !(/[^,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]/.test(
             text.replace(/"(\\.|[^"\\])*"/g, ''))) &&
         eval('(' + text + ')');
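
    The character check can be looked at separately from the eval call. A sketch, reusing the two regexes from the snippet above unchanged; the function name passesRfc4627Check is hypothetical. It illustrates what the filter catches and what it lets through.

```javascript
// Character-level pre-check: strip string literals, then make sure only
// JSON punctuation, digits, whitespace and the letters of literals remain.
function passesRfc4627Check(text) {
  const withoutStrings = text.replace(/"(\\.|[^"\\])*"/g, '');
  return !/[^,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]/.test(withoutStrings);
}

console.log(passesRfc4627Check('{"a": [1, true]}')); // true
console.log(passesRfc4627Check('alert(1)'));         // false: "(" is not allowed
console.log(passesRfc4627Check('[1,,]'));            // true, although it is not valid JSON
```

    Because inputs such as [1,,] pass the filter, it cannot replace a real parse; in current JavaScript, JSON.parse would be the usual follow-up instead of eval.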
    
  • 2020-11-22 11:50

    I created a Ruby implementation of Mario's solution, which does work:

    # encoding: utf-8
    
    module Constants
      JSON_VALIDATOR_RE = /(
             # define subtypes and build up the json syntax, BNF-grammar-style
             # The {0} is a hack to simply define them as named groups here but not match on them yet
             # I added some atomic grouping to prevent catastrophic backtracking on invalid inputs
             (?<number>  -?(?=[1-9]|0(?!\d))\d+(\.\d+)?([eE][+-]?\d+)?){0}
             (?<boolean> true | false | null ){0}
             (?<string>  " (?>[^"\\]* | \\ ["\\bfnrt\/] | \\ u [0-9a-f]{4} )* " ){0}
             (?<array>   \[ (?> \g<json> (?: , \g<json> )* )? \s* \] ){0}
             (?<pair>    \s* \g<string> \s* : \g<json> ){0}
             (?<object>  \{ (?> \g<pair> (?: , \g<pair> )* )? \s* \} ){0}
             (?<json>    \s* (?> \g<number> | \g<boolean> | \g<string> | \g<array> | \g<object> ) \s* ){0}
           )
        \A \g<json> \Z
        /uix
    end
    
    ########## inline test running
    if __FILE__==$PROGRAM_NAME
    
      # support
      class String
        def unindent
          gsub(/^#{scan(/^(?!\n)\s*/).min_by{|l|l.length}}/u, "")
        end
      end
    
      require 'test/unit' unless defined? Test::Unit
      class JsonValidationTest < Test::Unit::TestCase
        include Constants
    
        def setup
    
        end
    
        def test_json_validator_simple_string
          assert_not_nil %s[ {"somedata": 5 }].match(JSON_VALIDATOR_RE)
        end
    
        def test_json_validator_deep_string
          long_json = <<-JSON.unindent
          {
              "glossary": {
                  "title": "example glossary",
              "GlossDiv": {
                      "id": 1918723,
                      "boolean": true,
                      "title": "S",
                "GlossList": {
                          "GlossEntry": {
                              "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                                  "para": "A meta-markup language, used to create markup languages such as DocBook.",
                      "GlossSeeAlso": ["GML", "XML"]
                              },
                    "GlossSee": "markup"
                          }
                      }
                  }
              }
          }
          JSON
    
          assert_not_nil long_json.match(JSON_VALIDATOR_RE)
        end
    
      end
    end
    
  • 2020-11-22 11:50

    As was written above, if the language you use comes with a JSON library, use it to try decoding the string and catch the exception/error if it fails! If the language does not (I just had such a case with FreeMarker), the following regex can at least provide some very basic validation (it's written for PHP/PCRE so that more users can test and use it). It's not as foolproof as the accepted solution, but also not as scary =):

    ~^\{\s*\".*\}$|^\[\n?\{\s*\".*\}\n?\]$~s
    

    Short explanation:

    // we have two possibilities in case the string is JSON
    // 1. the string passed is "just" a JSON object, e.g. {"item": [], "anotheritem": "content"}
    // this can be matched by the following regex which makes sure there is at least a {" at the
    // beginning of the string and a } at the end of the string; whatever is in between is not checked!
    
    ^\{\s*\".*\}$
    
    // OR (character "|" in the regex pattern)
    // 2. the string passed is a JSON array, e.g. [{"item": "value"}, {"item": "value"}]
    // which would be matched by the second part of the pattern above
    
    ^\[\n?\{\s*\".*\}\n?\]$
    
    // the s modifier makes "." also match newline characters (these can occur in prettified JSON)
    

    If I missed something that would break this unintentionally, I'm grateful for comments!
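
    To see where the boundaries of this basic check lie, the pattern can be tried on a few inputs. The sketch below transcribes it to a JavaScript literal (the s flag is JavaScript's equivalent of the PCRE s modifier); the constant name LOOKS_LIKE_JSON is made up for the example. One known difference: in PCRE, $ also matches before a final newline, while in JavaScript it does not.

```javascript
// The PCRE pattern from above, transcribed to a JavaScript regex literal.
const LOOKS_LIKE_JSON = /^\{\s*".*\}$|^\[\n?\{\s*".*\}\n?\]$/s;

console.log(LOOKS_LIKE_JSON.test('{"item": [], "anotheritem": "content"}')); // true
console.log(LOOKS_LIKE_JSON.test('[{"item": "value"}, {"item": "value"}]')); // true
console.log(LOOKS_LIKE_JSON.test('{"item": oops}'));  // true, but not valid JSON
console.log(LOOKS_LIKE_JSON.test('[1, 2, 3]'));       // false, but valid JSON
console.log(LOOKS_LIKE_JSON.test('{}'));              // false, but valid JSON
```

    So the check only recognizes objects that start with a quoted key and arrays whose first element is such an object, and it never looks at what lies in between.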

  • 2020-11-22 11:56

    I realize that this is from over 6 years ago. However, I think there is a solution that nobody here has mentioned that is way easier than regexing:

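    function isAJSON(string) {
        try {
            JSON.parse(string);
        } catch (e) {
            // JSON.parse reports malformed input by throwing (normally a SyntaxError),
            // so any exception here means the text is not valid JSON
            return false;
        }
        return true;
    }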
    