Question
I have an awk script where I need to validate a large number of lines containing dates.
I'm currently using either a regex-based solution for basic validation (without testing for things like leap years) or calling the UNIX date command to validate more accurately. The date command works well, but calling a system command for every line is pretty expensive in terms of performance.
I was hoping that someone here might be able to suggest a solution that is both accurate and fast.
Here's an example of my data:
20140804024614
20140803190020
20140803163320
20140803083222
20140803170321
20140803234044
20140804011857
20140803204008
20140803160026
20140803140120
Thanks.
Answer 1:
Given a whole lot of assumptions about your input file, this is probably all you need to print only the valid dates+times using GNU awk for time functions and gensub():
awk 'strftime("%Y%m%d%H%M%S",mktime(gensub(/(.{4})(..)(..)(..)(..)/,"\\1 \\2 \\3 \\4 \\5 ",1))) == $0' file
It will only work with dates since the epoch.
If you need to print some kind of "valid/invalid" message for each date/time:
$ cat file
20140230035900
20140804024614
$
$ awk '{print (strftime("%Y%m%d%H%M%S",mktime(gensub(/(.{4})(..)(..)(..)(..)/,"\\1 \\2 \\3 \\4 \\5 ",1))) == $0 ? "" : "in") "valid:", $0}' file
invalid: 20140230035900
valid: 20140804024614
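The above works by converting the date+time to seconds since the epoch with mktime(), then converting those seconds back to a date+time in the original format with strftime(). mktime() normalizes out-of-range fields (February 30 becomes March 2, for example), so the result is identical to what you started with only if the original date+time was valid.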
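To make the round trip easier to follow, here is the same idea spelled out as a function. Treat it as a sketch rather than a drop-in replacement: it uses substr() instead of gensub(), rejects anything that is not exactly 14 digits before calling mktime(), and treats a negative mktime() result as invalid. The names roundtrip_check and is_valid are made up for illustration. Setting TZ=UTC0 is an assumption on my part, meant to keep local times skipped by daylight saving changes from being reported as invalid. It also assumes awk on your system is GNU awk, or another awk that provides mktime() and strftime().

```shell
# Hypothetical helper: reads one YYYYMMDDHHMMSS value per line on stdin
# and prints "valid: ..." or "invalid: ..." for each line.
roundtrip_check() {
    # Assumption: awk provides mktime() and strftime() (GNU awk does).
    TZ=UTC0 awk '
    function is_valid(s,    spec, t) {
        # Only plain 14-digit strings are candidates.
        if (length(s) != 14 || s !~ /^[0-9]+$/) return 0
        # Build the "YYYY MM DD HH MM SS" string that mktime() expects.
        spec = substr(s, 1, 4) " " substr(s, 5, 2) " " substr(s, 7, 2)
        spec = spec " " substr(s, 9, 2) " " substr(s, 11, 2) " " substr(s, 13, 2)
        t = mktime(spec)
        # mktime() normalizes out-of-range fields, so formatting the seconds
        # back only reproduces the input when the input was a real date+time.
        return (t >= 0 && strftime("%Y%m%d%H%M%S", t) == s)
    }
    { printf "%s: %s\n", (is_valid($0) ? "valid" : "invalid"), $0 }
    '
}
```

It would be called as `roundtrip_check < file`. Like the one-liner, it only accepts dates on or after the epoch.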
Answer 2:
Check this:
checkFormat ()
{
    dateV="${1}"
    echo "${dateV}" | gawk '{
        # Expected layout: YYYYMMDDHH (years 1900-2099, hours 00-23).
        # gawk regexes have no non-capturing groups, so (19|20) is captured as a[2].
        if (match($0, /^((19|20)[0-9][0-9])(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])([01][0-9]|2[0-3])$/, a)) {
            year = a[1] + 0
            mon  = a[3] + 0
            day  = a[4] + 0
            hour = a[5] + 0
        }
        else {
            print "KO: " $0
            exit
        }
        if (day == 31 && (mon == 4 || mon == 6 || mon == 9 || mon == 11))
            print "KO: " $0    # months with only 30 days
        else if (day >= 30 && mon == 2)
            print "KO: " $0    # February never has 30 or 31 days
        else if (mon == 2 && day == 29 && !(year % 4 == 0 && (year % 100 != 0 || year % 400 == 0)))
            print "KO: " $0    # February 29 exists only in leap years
        else
            print "Correct date !:" $0
    }'
}
checkFormat 2014080417
checkFormat 20140803190035
Usage:
$ ./checker.sh
Correct date !:2014080417
KO: 20140803190035
NOTE: this only validates down to the hour (YYYYMMDDHH), which is why the 14-digit value is rejected. MINUTES and SECONDS will be your task :)
Check also: http://nixtip.wordpress.com/2011/11/28/an-awk-date-format-validator/
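For what it's worth, here is a rough sketch of how the same checks could be extended to minutes and seconds for the 14-digit YYYYMMDDHHMMSS values in the question. It uses only substr() and arithmetic, so no external command or time function is called. The names validate_stamps and valid_stamp are made up for illustration. It assumes one timestamp per line with nothing else on it, accepts any four-digit year, and rejects a seconds value of 60.

```shell
# Hypothetical helper: reads one YYYYMMDDHHMMSS value per line on stdin
# and prints "valid: ..." or "invalid: ..." for each line.
validate_stamps() {
    awk '
    # Returns 1 if s is a well-formed YYYYMMDDHHMMSS timestamp, else 0.
    function valid_stamp(s,    y, mo, d, h, mi, se, dim) {
        if (length(s) != 14 || s !~ /^[0-9]+$/) return 0
        y  = substr(s, 1, 4) + 0;  mo = substr(s, 5, 2) + 0
        d  = substr(s, 7, 2) + 0;  h  = substr(s, 9, 2) + 0
        mi = substr(s, 11, 2) + 0; se = substr(s, 13, 2) + 0
        if (mo < 1 || mo > 12 || h > 23 || mi > 59 || se > 59) return 0
        # Days in month: 30 for Apr/Jun/Sep/Nov, 28 or 29 for Feb, else 31.
        if (mo == 4 || mo == 6 || mo == 9 || mo == 11) dim = 30
        else if (mo == 2) dim = (y % 4 == 0 && (y % 100 != 0 || y % 400 == 0)) ? 29 : 28
        else dim = 31
        return (d >= 1 && d <= dim)
    }
    { printf "%s: %s\n", (valid_stamp($0) ? "valid" : "invalid"), $0 }
    '
}
```

Run it as `validate_stamps < file`. Since everything happens inside a single awk process, it avoids the per-line cost of calling date.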
Source: https://stackoverflow.com/questions/26761659/awk-date-validation