问题
How do we build Normalized table from DeNormalized text file one?
Thanks for your replies/time.
We need to build a Normalized DB Table from DeNormalized text file. We explored couple of options such as unix shell , and PostgreSQL etc. I am looking learn better ideas for resolutions from this community.
The input text file is various length with comma delimited records. The content may look like this:
XXXXXXXXXX , YYYYYYYYYY, TTTTTTTTTTT, UUUUUUUUUU, RRRRRRRRR,JJJJJJJJJ
111111111111, 22222222222, 333333333333, 44444444, 5555555, 666666
EEEEEEEE,WWWWWW,QQQQQQQ,PPPPPPPP
We like to normalize as follows (Split & Pair):
XXXXXXXXXX , YYYYYYYYYY
TTTTTTTTTTT, UUUUUUUUUU
RRRRRRRRR,JJJJJJJJJ
111111111111, 22222222222
333333333333, 44444444
5555555, 666666
EEEEEEEE,WWWWWW
QQQQQQQ,PPPPPPPP
Do we need to go with text pre-process and Load approach?
If yes, what is the best way to pre-process?
Are there any single SQL/Function approach to get the above?
Thanks in helping.
回答1:
Using gnu awk
(due to the RS)
awk '{$1=$1} NR%2==1 {printf "%s,",$0} NR%2==0' RS="[,\n]" file
XXXXXXXXXX,YYYYYYYYYY
TTTTTTTTTTT,UUUUUUUUUU
RRRRRRRRR,JJJJJJJJJ
111111111111,22222222222
333333333333,44444444
5555555,666666
EEEEEEEE,WWWWWW
QQQQQQQ,PPPPPPPP
{$1=$1}
Cleans up and remove extra spacesNR%2==1 {printf "%s,",$0}
prints odd partsNR%2==0
prints even part and new lineRS="[,\n]"
sets the record to ,
or newline
回答2:
Here is an update. Here is what I did in Linux server.
sed -i 's/\,,//g' inputfile <------ Clean up lot of trailing commas
awk '{$1=$1} NR%2==1 {printf "%s,",$0} NR%2==0' RS="[,\n]" inputfile <----Jotne's idea
dos2unix -q -n inputfile outputfle <------ to remove ^M in some records
outputfile is ready to process as comma delimited format
Any thoughts to improve above steps further?
Thanks in helping.
来源:https://stackoverflow.com/questions/20445701/how-do-we-build-normalized-table-from-denormalized-text-file-one