Right now my regex is something like this:
[a-zA-Z0-9]
but it does not include accented characters, which I want it to. I would also like - and ' to be included.
Accented Characters: DIY Character Range Subtraction
If your regex engine supports lookaheads and non-ASCII character ranges (many do), this will work:
(?i)^(?:(?![×Þß÷þø])[-'0-9a-zÀ-ÿ])+$
Explanation
(?i) sets case-insensitive mode
^ anchor asserts that we are at the beginning of the string
(?:(?![×Þß÷þø])[-'0-9a-zÀ-ÿ]) matches one character, as follows:
(?![×Þß÷þø]) asserts that the char is not one of those in the brackets
[-'0-9a-zÀ-ÿ] allows dash, apostrophe, digits, letters, and chars in a wide accented range, from which the lookahead above subtracts the unwanted characters
+ matches that one or more times
$ anchor asserts that we are at the end of the string
Reference: Extended ASCII Table
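To try the pattern locally, here is a minimal sketch in Python, assuming the standard re module, which accepts the leading inline (?i) flag, the negative lookahead, and non-ASCII ranges in str patterns. The function name matches_pattern and the sample words are illustrative only.

```python
import re

# The pattern from the answer, unchanged. The negative lookahead
# (?![×Þß÷þø]) vetoes those characters before [-'0-9a-zÀ-ÿ] consumes one.
PATTERN = re.compile(r"(?i)^(?:(?![×Þß÷þø])[-'0-9a-zÀ-ÿ])+$")


def matches_pattern(text: str) -> bool:
    # True if every character of `text` is allowed by PATTERN.
    # Caveat: in Python, `$` also matches just before a trailing "\n";
    # use `\Z` instead if such inputs must be rejected.
    return PATTERN.match(text) is not None


for word in ["José", "O'Brien-Smith", "naïve", "Zoë2", "3×4", "crème brûlée"]:
    print(f"{word!r}: {matches_pattern(word)}")
```

The first four words pass; "3×4" fails because × is in the excluded set, and "crème brûlée" fails because the space is not in the character class.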
You just put:
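\p{L}\p{M}
inside the character class of your expression, e.g. ^[-'0-9\p{L}\p{M}]+$. In a Unicode-aware engine, \p{L} matches any letter in any script and \p{M} matches any combining mark, such as the accent of a decomposed é.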
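Python's built-in re module does not understand \p{L} or \p{M}. As a stdlib-only sketch of the same idea, each character's Unicode general category can be checked with unicodedata.category: categories starting with "L" are letters and those starting with "M" are marks. The EXTRA set (dash, apostrophe, ASCII digits, taken from the question) and the function names are hypothetical.

```python
import unicodedata

# Hypothetical extras from the question: dash, apostrophe, ASCII digits.
EXTRA = set("-'0123456789")


def is_letter_or_mark(ch: str) -> bool:
    # Stand-in for [\p{L}\p{M}]: L* = Lu, Ll, Lt, Lm, Lo; M* = Mn, Mc, Me.
    return unicodedata.category(ch)[0] in ("L", "M")


def allowed_by_category(text: str) -> bool:
    # Rough analogue of ^[-'0-9\p{L}\p{M}]+$ : non-empty, every char allowed.
    return bool(text) and all(c in EXTRA or is_letter_or_mark(c) for c in text)


print(allowed_by_category("Łódź"))        # True: Ł is U+0141, outside À-ÿ
print(allowed_by_category("Jose\u0301"))  # True: U+0301 is a combining mark
print(allowed_by_category("3×4"))         # False: × is a math symbol (Sm)
```

Unlike the Latin-1 ranges, this accepts letters from any script as well as decomposed accents.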
A version without the lookahead exclusion, which instead splits the accented range so that it skips × and ÷:
^[-'0-9a-zA-ZÀ-ÖØ-öø-ÿ]+$
Explanation
^ anchor asserts that we are at the beginning of the string
[...] allows dash, apostrophe, digits, letters, and chars in a wide accented range
+ matches that one or more times
$ anchor asserts that we are at the end of the string
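The range split can be checked mechanically. This Python sketch (the name SPLIT_RANGES is illustrative) scans the Latin-1 block U+00C0 to U+00FF and lists what the split-range character class leaves out; it assumes digits 0-9 are in the class, as the explanation describes.

```python
import re

# Split-range pattern, digits included. À-Ö is U+00C0-U+00D6,
# Ø-ö is U+00D8-U+00F6, ø-ÿ is U+00F8-U+00FF.
SPLIT_RANGES = re.compile(r"^[-'0-9a-zA-ZÀ-ÖØ-öø-ÿ]+$")

# Characters in U+00C0..U+00FF that fall into the gaps between the ranges.
gaps = [chr(cp) for cp in range(0xC0, 0x100) if not SPLIT_RANGES.match(chr(cp))]
print(gaps)  # ['×', '÷']

print(bool(SPLIT_RANGES.match("Ørsted-Þór")))  # True
```

Only U+00D7 (×) and U+00F7 (÷) are skipped, so letters such as Ø, Þ, ß and þ stay allowed in this version.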
Use a POSIX character class (http://www.regular-expressions.info/posixbrackets.html):
[-'[:alpha:]0-9]
or [-'[:alnum:]]
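The [:alpha:] character class matches whatever is considered an alphabetic character in your locale. Note that POSIX classes only work inside a bracket expression, and not every regex engine supports them.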
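Python's standard re module, for instance, has no POSIX bracket classes. A minimal sketch of a rough equivalent, assuming Python: [^\W\d_] (a word character that is neither a digit nor an underscore) is a common idiom for "letters" in str patterns. This is a swapped-in technique: it follows Unicode data rather than the current locale, so it is close to, but not identical to, [:alpha:]. The names POSIX_LIKE and allowed_posix_like are illustrative.

```python
import re

# Python's re has no [:alpha:]; [^\W\d_] is a common stand-in.
# It matches a char that is a word char (\w) but not a Unicode decimal
# digit (\d) and not "_", which leaves roughly the Unicode letters.
# Unlike [:alpha:], this follows Unicode data, not the current locale.
POSIX_LIKE = re.compile(r"^(?:[-'0-9]|[^\W\d_])+\Z")


def allowed_posix_like(text: str) -> bool:
    # Rough analogue of ^[-'[:alpha:]0-9]+$ ; \Z also rejects a trailing "\n".
    return POSIX_LIKE.match(text) is not None


for word in ["Łódź", "Ω-7", "O'Brien", "a_b", "3×4"]:
    print(f"{word!r}: {allowed_posix_like(word)}")
```

The alternation is needed because a negated class like [^\W\d_] cannot simply be merged with the dash, apostrophe and digits inside one bracket.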