I'm trying to write a regex that will parse out the directory and filename of a fully qualified path using matching groups.
so...
/
I did a little research by trial and error. I found that every character available on the keyboard is eligible to be part of a file or directory name on a *nix machine, except '/'.
I used the touch command to create a file for each of the following characters, and it created a file every time.
(Comma-separated values below)
'!', '@', '#', '$', "'", '%', '^', '&', '*', '(', ')', ' ', '"', '\', '-', ',', '[', ']', '{', '}', '`', '~', '>', '<', '=', '+', ';', ':', '|'
It failed only when I tried creating '/' (because it is the root directory) and a filename containing / (because it is the path separator).
And touch . only changed the modified time of the current directory (.). However, a name containing a dot, such as file.log, is possible.
And of course, a-z, A-Z, 0-9, - (hyphen), and _ (underscore) work.
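The trial above can be repeated with a small script. This is only a sketch, assuming a Linux machine and a typical local filesystem; the helper name try_names and the sample name a/b are made up for illustration and are not part of the original experiment.

```python
import os
import tempfile

# Characters copied from the list above; each one is used as a file name.
CANDIDATES = ["!", "@", "#", "$", "'", "%", "^", "&", "*", "(", ")", " ", '"',
              "\\", "-", ",", "[", "]", "{", "}", "`", "~", ">", "<", "=", "+",
              ";", ":", "|"]

def try_names(names):
    """Return the names that could be created as files in a scratch directory."""
    created = []
    with tempfile.TemporaryDirectory() as scratch:
        for name in names:
            try:
                # Opening for writing creates the file, much like touch does.
                with open(os.path.join(scratch, name), "w"):
                    pass
                created.append(name)
            except OSError:
                pass  # the OS refused this name
    return created

print(try_names(CANDIDATES) == CANDIDATES)
print(try_names(["a/b"]))
```

A name such as a/b is rejected here because the / is read as a separator and the directory a does not exist in the scratch directory.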
So, by the above reasoning, we know that a file name or directory name can contain anything except / (forward slash). So, our regex will be derived from what will not be present in the file or directory name.
/(?:(?P<dir>(?:[/]?)(?:[^\/]+/)+)(?P<file>[^/]+))/
Root directory
A path can start with / when it is an absolute path, or with a directory name when it is relative. Hence, look for / with zero or one occurrence.
/(?P<filepath>(?P<root>[/]?)(?P<rest_of_the_path>.+))/
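To see what this first step captures, here is a small Python sketch. The group names filepath, root, and rest_of_the_path are taken from the names mentioned later in this answer, and the sample paths are made up for illustration.

```python
import re

# Step 1: optional leading slash, then everything else.
# Group names are reconstructed from the answer's text (an assumption).
STEP1 = re.compile(r"(?P<filepath>(?P<root>[/]?)(?P<rest_of_the_path>.+))")

def step1_groups(path):
    """Return the named groups of the step 1 pattern, or None if nothing matches."""
    m = STEP1.search(path)
    return m.groupdict() if m else None

print(step1_groups("/var/log/xyz/file.log"))  # absolute path: root is "/"
print(step1_groups("var/log/xyz/file.log"))   # relative path: root is empty
```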
Next, a directory and its child are always separated by /. And a directory name can be anything except /. Let's match /var/ first then.
/(?P<filepath>(?P<dir>(?P<root>[/]?)[^\/]+/)(?P<rest_of_the_path>.+))/
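A quick check of this step in Python might look like the sketch below. The group names are an assumption (in particular, calling the second group dir is a guess based on the final pattern), and the sample path is made up.

```python
import re

# Step 2: optional leading slash plus exactly one "name/" component,
# then the remainder. Group names are assumed, not confirmed by the source.
STEP2 = re.compile(
    r"(?P<filepath>(?P<dir>(?P<root>[/]?)[^\/]+/)(?P<rest_of_the_path>.+))"
)

def step2_groups(path):
    """Return the named groups of the step 2 pattern, or None if nothing matches."""
    m = STEP2.search(path)
    return m.groupdict() if m else None

groups = step2_groups("/var/log/xyz/file.log")
print(groups["dir"])               # only the first directory, with its slashes
print(groups["rest_of_the_path"])  # everything after the first directory
```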
Next, let's match all directories
/(?P<filepath>(?P<dir>(?P<root>[/]?)(?P<single_dir>[^\/]+/)+)(?P<rest_of_the_path>.+))/
Here, single_dir is xyz/ because it first matched var/, then found the next occurrence of the same pattern, log/, and then the next occurrence, xyz/. A repeated capture group keeps only its last match, so it shows the last occurrence of the pattern.
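The "last occurrence wins" behaviour of a repeated group can be checked in Python. This is a sketch with assumed group names and a made-up sample path; re.findall is used here only as one way to list every component, it is not part of the pattern being built.

```python
import re

# Step 3: the "name/" component is now a repeated named group.
STEP3 = re.compile(
    r"(?P<filepath>(?P<dir>(?P<root>[/]?)(?P<single_dir>[^\/]+/)+)"
    r"(?P<rest_of_the_path>.+))"
)

m = STEP3.search("/var/log/xyz/file.log")

# A repeated group only remembers its final repetition.
last_component = m.group("single_dir")

# To list every component, scan the captured directory part separately.
all_components = re.findall(r"[^\/]+/", m.group("dir"))

print(last_component)
print(all_components)
```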
Now, we know that we're never going to use groups like single_dir, filepath, and root. Hence, let's clean that up.
Let's keep them as groups, but not capture them.
And rest_of_the_path is just the filename! So, rename it. And a file will not have / in its name, so it's better to use [^/]+ instead of .+
/(?:(?P<dir>(?:[/]?)(?:[^\/]+/)+)(?P<file>[^/]+))/
This brings us to the final result. Of course, there are several other ways you can do it. I am just mentioning one of the ways here.
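As a usage sketch, the final pattern could be wrapped in a small Python helper. The function name split_path and the sample paths are made up for illustration; the group names dir and file are the ones described in this answer.

```python
import re

# Final pattern: only "dir" and "file" are captured.
PATH_RE = re.compile(r"(?:(?P<dir>(?:[/]?)(?:[^\/]+/)+)(?P<file>[^/]+))")

def split_path(path):
    """Return (dir, file) for a path, or None when the pattern does not match."""
    m = PATH_RE.search(path)
    if m is None:
        return None
    return m.group("dir"), m.group("file")

print(split_path("/var/log/xyz/file.log"))  # absolute path
print(split_path("var/log/file.log"))       # relative path
print(split_path("file.log"))               # no directory part at all
```

Note that dir keeps its trailing slash, and a bare filename does not match because the pattern requires at least one directory component.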
^ at the start of a pattern means the string starts with; inside brackets, as in [^...], it negates the character class.
(?P<name>pattern) means capture the group under a group name. We have two groups, named dir and file.
(?:pattern) means a non-capturing group: it is used for grouping but is not captured.
? means match zero or one.
+ means match one or more.
[^\/] matches any character except forward slash (/).
[/]? means that an absolute path starts with / and a relative path doesn't. So, match zero or one occurrence of /.
[^\/]+/ means one or more characters that aren't forward slash (/), followed by a forward slash (/). This will match var/ or xyz/, one directory at a time.
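The same breakdown can be kept next to the pattern itself by using Python's re.VERBOSE flag, which ignores whitespace in the pattern and allows comments. This is just a sketch of one way to lay it out; the sample path is made up, and the compact pattern is repeated only to compare the two.

```python
import re

# The final pattern, spread out with a comment for each part.
VERBOSE_PATH_RE = re.compile(r"""
    (?:                   # non-capturing wrapper around the whole path
      (?P<dir>            # named group: dir
        (?:[/]?)          # zero or one leading slash (absolute path)
        (?:[^\/]+/)+      # one or more directory names, each ending in a slash
      )
      (?P<file>[^/]+)     # named group: file, with no slash in it
    )
""", re.VERBOSE)

# The compact form of the same pattern, for comparison.
COMPACT_PATH_RE = re.compile(r"(?:(?P<dir>(?:[/]?)(?:[^\/]+/)+)(?P<file>[^/]+))")

sample = "/var/log/xyz/file.log"
verbose_groups = VERBOSE_PATH_RE.search(sample).groupdict()
compact_groups = COMPACT_PATH_RE.search(sample).groupdict()
print(verbose_groups == compact_groups)
```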