Update: As per comments regarding the ambiguity of my question, I've increased the detail in the question.
(Terminology: by words I am referring to
This works fine:
('*)(?:'')*('?(?:\w+'?)+\w+('\b|'?[^']))(\1)
It works without problems on this data (one item per line):
'bou
it's
persons'
'open'
open
foo''bar
''foo
bee''
''foo''
'
''
On this data (all items on one line) you should strip the result, i.e. remove the surrounding spaces from each match:
'bou it's persons' 'open' open foo''bar ''foo ''foo'' ' ''
(Tested in The Regulator; the result is in $2.)
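For anyone without The Regulator at hand, here is a minimal Python sketch of the same idea: take capture group 2 of each match and strip the surrounding whitespace. The function name extract_words and the sample string are my own, not part of the answer, and Python's re engine may differ in details from the engine the pattern was originally tested in.

```python
import re

# Pattern copied verbatim from the answer above.
PATTERN = re.compile(r"('*)(?:'')*('?(?:\w+'?)+\w+('\b|'?[^']))(\1)")


def extract_words(text):
    """Return capture group 2 of every match, with surrounding whitespace removed.

    extract_words is a hypothetical helper name used only for this sketch.
    """
    return [m.group(2).strip() for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    # Each word is followed by a space, as in the single-line sample data.
    print(extract_words("it's persons' "))
```

The strip() call is what "remove spaces from matches" refers to: the last part of group 2 can consume the separator that follows a word.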
(?=.*\w)^(\w|')+$
'bout # pass
it's # pass
persons' # pass
' # fail
'' # fail
NODE          EXPLANATION
(?=           look ahead to see if there is:
  .*            any character except \n (0 or more times (matching the most amount possible))
  \w            word characters (a-z, A-Z, 0-9, _)
)             end of look-ahead
^             the beginning of the string
(             group and capture to \1 (1 or more times (matching the most amount possible)):
  \w            word characters (a-z, A-Z, 0-9, _)
 |             OR
  '             '\''
)+            end of \1 (NOTE: because you're using a quantifier on this capture, only the LAST repetition of the captured pattern will be stored in \1)
$             before an optional \n, and the end of the string
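The pass/fail list above can be reproduced with a short Python sketch. The helper name is_word is my own; the pattern is the one from this answer, applied to one token at a time.

```python
import re

# Pattern from this answer: the look-ahead demands at least one word
# character, and the anchored group allows only word characters and apostrophes.
PATTERN = re.compile(r"(?=.*\w)^(\w|')+$")


def is_word(token):
    """Return True if the whole token matches the pattern (hypothetical helper)."""
    return PATTERN.search(token) is not None


if __name__ == "__main__":
    for token in ["'bout", "it's", "persons'", "'", "''"]:
        print(f"{token!r}: {'pass' if is_word(token) else 'fail'}")
```

Note that this validates a single token; it does not extract words from a longer string, because ^ and $ anchor the match to the whole input.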
I am submitting this second answer because the question seems to have changed quite a bit and my previous answer is no longer valid. Anyway, if all the conditions have now been listed, try this:
(((?<!')')?\b[0-9A-Za-z]+\b('(?!'))?|\b[0-9A-Za-z]+('[0-9A-Za-z]+)*\b)
/('\w+)|(\w+'\w+)|(\w+')|(\w+)/
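A quick way to try the alternation above is a small Python sketch. The surrounding slashes are regex-literal delimiters and are dropped here; the helper name find_words is my own.

```python
import re

# The four alternatives are tried left to right at each position:
# leading apostrophe, inner apostrophe, trailing apostrophe, plain word.
PATTERN = re.compile(r"('\w+)|(\w+'\w+)|(\w+')|(\w+)")


def find_words(text):
    """Return the full text of every match (hypothetical helper)."""
    # group(0) is the whole match, whichever alternative produced it.
    return [m.group(0) for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    for sample in ["'bout", "it's", "persons'", "open", "'", "'open'"]:
        print(repr(sample), "->", find_words(sample))
```

One thing this makes visible: because the first alternative wins, a word quoted on both sides such as 'open' is returned with its leading apostrophe only.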
How about this?
'?\b[0-9A-Za-z']+\b'?
EDIT: the previous version didn't include the apostrophes on either side of a word.
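Here is a minimal Python sketch for trying this pattern on a few of the inputs discussed in this thread. The helper name find_words is my own, and the samples are only a handful of cases, not a full test of the requirements.

```python
import re

# Optional apostrophe, a word boundary, a run of alphanumerics and
# apostrophes, another word boundary, and an optional closing apostrophe.
PATTERN = re.compile(r"'?\b[0-9A-Za-z']+\b'?")


def find_words(text):
    """Return every match of the pattern in text (hypothetical helper)."""
    # The pattern has no capture groups, so findall returns whole matches.
    return PATTERN.findall(text)


if __name__ == "__main__":
    for sample in ["'bout", "it's", "persons'", "'open'", "'", "''", "it's open"]:
        print(repr(sample), "->", find_words(sample))
```

The \b anchors are what reject a lone ' or '': without an alphanumeric character next to it there is no word boundary, so the pattern cannot match.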