Linux write into same file with awk and tee: very odd behaviour

六月ゝ 毕业季﹏ submitted on 2019-12-24 12:07:59

Question


Just out of curiosity, I was trying to do something unusual, overwriting the same file I was reading from, in the context of the question "Unexpected new line when writing out in Unix Shell Script".

I found that on some attempts tee > to_same_file worked, as you can see in the very first attempt below, while subsequent attempts produced an empty file. My assumption is that this must be related to processing time: on the first attempt it took longer to get to tee, so there was time for the I/O to happen, whereas on the other attempts everything ran faster and there was no chance for the data to be read before the file was written. I'm just interested to understand why this odd behaviour occurred.

me@desktop:~/$ cp 2.csv 1.csv
me@desktop:~/$ cat 1.csv
ABCD89A, Admin, shop, Stall Count, 2014-01-06 09:00:00, 0
ABCD89N, Admin, shop, Stall Count, 2014-01-06 09:00:00, 0
me@desktop:~/$ awk  -F"," '{ 
     timestamp=$5;  
     gsub(":"," ",timestamp); 
     gsub("-"," ",timestamp);   
     EPOCH=(mktime(timestamp))
     } 
     {
      print $0","EPOCH
      }' 1.csv  2>&1 | tee > 1.csv
me@desktop:~/$ cat 1.csv
ABCD89A, Admin, shop, Stall Count, 2014-01-06 09:00:00, 0,1388998800
ABCD89N, Admin, shop, Stall Count, 2014-01-06 09:00:00, 0,1388998800
me@desktop:~/$ cp 2.csv 1.csv
me@desktop:~/$ cat 1.csv 
ABCD89A, Admin, shop, Stall Count, 2014-01-06 09:00:00, 0
ABCD89N, Admin, shop, Stall Count, 2014-01-06 09:00:00, 0
me@desktop:~/$ awk  -F"," '{ 
     timestamp=$5;  
     gsub(":"," ",timestamp); 
     gsub("-"," ",timestamp);   
     EPOCH=(mktime(timestamp))
     } 
     {
      print $0","EPOCH
      }' 1.csv  2>&1 | tee > 1.csv
me@desktop:~/$ cat 1.csv 
me@desktop:~/$ cp 2.csv 1.csv
me@desktop:~/$ awk  -F"," '{ 
     timestamp=$5;  
     gsub(":"," ",timestamp); 
     gsub("-"," ",timestamp);   
     EPOCH=(mktime(timestamp))
     } 
     {
      print $0","EPOCH
      }' 1.csv  2>&1 | tee > 1.csv
me@desktop:~/$ cat 1.csv 
me@desktop:~/$ cp 2.csv 1.csv
me@desktop:~/$ cat 1.csv 
ABCD89A, Admin, shop, Stall Count, 2014-01-06 09:00:00, 0
ABCD89N, Admin, shop, Stall Count, 2014-01-06 09:00:00, 0
me@desktop:~/$ awk  -F"," '{ 
     timestamp=$5;  
     gsub(":"," ",timestamp); 
     gsub("-"," ",timestamp);   
     EPOCH=(mktime(timestamp))
     } 
     {
      print $0","EPOCH
      }' 1.csv  2>&1 | tee -a > 1.csv
me@desktop:~/$ cat 1.csv 
me@desktop:~/$ 

Answer 1:


A small, self-contained test case with the same problem is this:

cat file | tee > file

This pipeline consists of two parts that run in parallel.

cat file tries to open and read from the file.

tee > file truncates the file: the shell opens it for writing with truncation before tee even starts.

Depending on whether the file is read (fully or partially) before or after it is truncated, you'll get all of your data, part of it, or just an empty file.
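To see this race empirically, the sketch below reruns cat file | tee > file on a fresh copy of a small file many times and tallies the outcomes. The scratch file name demo.txt and the count of 50 trials are arbitrary, hypothetical choices; the split between intact and empty results will vary with the machine, the scheduler and the system load, so no particular output is claimed here.

```shell
#!/bin/sh
# Sketch: repeat "cat file | tee > file" and count how often the data survives.
# Both sides of the pipe start concurrently, so each run depends on whether
# cat reads the file before or after the shell truncates it for tee.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
f="$dir/demo.txt"   # hypothetical scratch file
runs=50             # arbitrary number of trials
expected=$(printf 'line1\nline2')

intact=0; empty=0; other=0
i=0
while [ "$i" -lt "$runs" ]; do
    printf 'line1\nline2\n' > "$f"      # restore the original contents
    cat "$f" | tee > "$f"               # the racy pipeline from the answer
    content=$(cat "$f")
    if [ "$content" = "$expected" ]; then
        intact=$((intact + 1))
    elif [ -z "$content" ]; then
        empty=$((empty + 1))
    else
        other=$((other + 1))
    fi
    i=$((i + 1))
done
echo "intact=$intact empty=$empty other=$other (of $runs runs)"
```

Running it a few times should make the nondeterminism visible: the tallies change between runs even though the commands are identical.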




Answer 2:


What you've done is create a race condition between awk and tee. The awk process is opening 1.csv for reading while tee is being redirected to 1.csv in another process.

As is the nature of race conditions, the results are random and depend on who gets there first.
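For contrast, the outcome is not random at all when there is no pipeline: the shell performs the > truncation before it starts the command, so awk always sees an empty file. The sketch below also shows why the question's final tee -a attempt did not help: -a only changes how tee opens the files named as its own arguments, and tee > file has none, so the shell's > still truncates. The scratch file name is hypothetical.

```shell
#!/bin/sh
# Sketch: ">" truncates its target before the command runs.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
f="$dir/demo.txt"   # hypothetical scratch file

# 1) Single command, no pipe: awk opens "$f" only after the shell has
#    already truncated it, so the file always ends up empty.
printf 'line1\nline2\n' > "$f"
awk '{ print $0 ",x" }' "$f" > "$f"
size_after_awk=$(wc -c < "$f" | tr -d ' ')
echo "awk ... file > file leaves $size_after_awk bytes"

# 2) tee -a with no file operands: the shell's ">" still truncates "$f",
#    so only the new line remains.
printf 'line1\nline2\n' > "$f"
printf 'new\n' | tee -a > "$f"
after_tee=$(cat "$f")
echo "tee -a > file leaves: $after_tee"

# 3) tee -a with the file as an operand really does append.
printf 'more\n' | tee -a "$f" > /dev/null
after_append=$(cat "$f")
echo "tee -a file leaves: $after_append"
```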

To do this safely, you'll need to save it to a new file or use a tool like sponge (from moreutils).
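A minimal sketch of the new-file approach, assuming a POSIX shell with mktemp available: awk writes to a temporary file, and the original is replaced only once awk has finished reading it. The append-a-constant-field transformation is a hypothetical stand-in for the question's timestamp logic; mktime() is a gawk extension, so it is left out to keep the example portable.

```shell
#!/bin/sh
# Sketch: rewrite a file "in place" without a race by writing awk's output
# to a temporary file and replacing the original only after awk has exited.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
f="$dir/1.csv"   # scratch stand-in for the question's 1.csv
printf 'ABCD89A, Admin, shop, Stall Count, 2014-01-06 09:00:00, 0\n' > "$f"

# Create the temp file next to the target so that mv is a rename within
# the same filesystem rather than a copy.
tmp=$(mktemp "$(dirname "$f")/rewrite.XXXXXX")

# Hypothetical transformation: append a constant field to each line.
# The && means the original is only replaced if awk succeeded.
awk -F',' '{ print $0 ",EXTRA" }' "$f" > "$tmp" && mv "$tmp" "$f"

cat "$f"
```

With moreutils installed, sponge does the buffering for you: it soaks up all of its standard input before writing the output file, so awk -F"," '...' 1.csv | sponge 1.csv is safe. One side effect of the temp-file route is that the rewritten file inherits mktemp's restrictive permissions (typically 0600), so a chmod may be needed afterwards.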



Source: https://stackoverflow.com/questions/21149972/linux-write-into-same-file-with-awk-and-tee-very-odd-behaviour
