Convert .csv file for further manipulation using 'highfrequency' package on R

Posted by 家住魔仙堡 on 2019-12-06 12:29:06

Question


The highfrequency package is designed to convert .txt and .csv files from the NYSE TAQ and WRDS TAQ databases, respectively, into .RData files of xts objects, which can then be easily manipulated with the package.

The problem is that I have limited access to the WRDS database: I can only download tick data from the CRSP (Center for Research in Security Prices) database, not from the TAQ (Trades and Quotes) database. So my data look like this. The downloadable file contains tick data for the REIT index from 2014-01-01 to 2014-01-05. I manually renamed the ticker column header to PRICE, as suggested by Kris Boudt, one of the package's main authors.

The code that I use is the following:

 from="2014-03-01"
 to="2014-04-31"
 datasource="C:/Users/aris/Desktop/raw_data"
 datadestination="C:/Users/aris/Desktop/xts_data"
 convert(from = from,to=to,datasource = datasource,datadestination = datadestination,
 trades=TRUE,quotes=FALSE,ticker="REIT",dir=FALSE,extension="csv",header = TRUE,
 tradecolnames = NULL, quotecolnames = NULL,format = "%Y%m%d %H:%M:%S",onefile=TRUE)

I suspect that the problem lies in the argument format = "%Y%m%d %H:%M:%S", because in the .csv file the date and the time are separated by a comma. I tried putting a comma between %d and %H, like this: format = "%Y%m%d,%H:%M:%S", but it made no difference.

The error reads

 Error in `$<-.data.frame`(`*tmp*`, "COND", value = numeric(0)) :   
 replacement has 0 rows, data has 1048575

Any suggestions are welcome.


Answer 1:


Thanks to Joshua Ulrich, I gained some additional intuition and solved the problem(s). There is actually no need to edit the .csv file itself and add extra columns. Instead of setting tradecolnames = NULL, you tell the function which columns your file contains by setting tradecolnames = c("DATE","TIME","PRICE"). The problem with the non-existent directories is fixed by setting dir=TRUE. The final code looks like this:

from="2014-03-01"
to="2014-04-31"
datasource="C:/Users/aris/Desktop/raw_data"
datadestination="C:/Users/aris/Desktop/xts_data"
convert(from,to,datasource,datadestination,trades=TRUE,quotes=FALSE,
 ticker="REIT",dir=TRUE,extension="csv",header=TRUE,
 tradecolnames=c("DATE","TIME","PRICE"),
 format="%Y%m%d %H:%M:%S",onefile=TRUE)
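On the format question raised in the post: the sketch below uses only base R to show why the space-separated format string is a plausible choice even though DATE and TIME sit in separate CSV columns. It assumes (this is not confirmed in the thread) that the converter joins the two columns with a single space before parsing. The csv_text sample is a hypothetical in-memory stand-in for the real file, using the three rows quoted in this thread.

```r
# Hypothetical in-memory stand-in for the first rows of the CSV file.
csv_text <- "DATE,TIME,PRICE
20140102,9:30:00,1123.77
20140102,9:30:01,1122.81
20140102,9:30:02,1122.77"

trades <- read.csv(text = csv_text, header = TRUE,
                   colClasses = c("character", "character", "numeric"))

# Assumption: DATE and TIME are pasted together with one space before parsing,
# so the format string needs a space (not a comma) between %d and %H.
joined <- paste(trades$DATE, trades$TIME)
stamp_space <- as.POSIXct(joined, format = "%Y%m%d %H:%M:%S", tz = "GMT")
stamp_comma <- as.POSIXct(joined, format = "%Y%m%d,%H:%M:%S", tz = "GMT")

print(stamp_space)  # parsed timestamps
print(stamp_comma)  # NA values: the literal comma never matches the joined string
```

If that assumption holds, the comma in the file is consumed when the CSV is read into columns and never reaches the date parser, which would explain why changing the format string had no effect.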



Answer 2:


The highfrequency::convert function calls highfrequency:::makeXtsTrades, which expects the following columns in your text file: DATE,TIME,PRICE,SIZE,SYMBOL,EX,COND,CORR,G127.
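The error message in the question is consistent with a missing column. As a minimal base R sketch (it does not call highfrequency, and the exact statement inside the package is an assumption here), assigning to COND from a column that does not exist produces the same kind of failure:

```r
# Hypothetical three-row stand-in for a file that only has DATE, TIME and PRICE.
trades <- data.frame(DATE  = c("20140102", "20140102", "20140102"),
                     TIME  = c("9:30:00", "9:30:01", "9:30:02"),
                     PRICE = c(1123.77, 1122.81, 1122.77),
                     stringsAsFactors = FALSE)

# trades$COND is NULL because the column is absent, so as.numeric() yields
# numeric(0), and assigning a zero-length vector to a data frame column fails.
msg <- tryCatch({
  trades$COND <- as.numeric(trades$COND)
  "no error"
}, error = function(e) conditionMessage(e))

print(msg)
```

The captured message reports a replacement with 0 rows against the number of rows in the data, which matches the shape of the error in the question.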

I added empty columns to your text file, and did not get the error in your question. The edited text file looks like:

DATE,TIME,PRICE,SIZE,SYMBOL,EX,COND,CORR,G127
20140102,9:30:00,1123.77,,,,,,
20140102,9:30:01,1122.81,,,,,,
20140102,9:30:02,1122.77,,,,,,
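Adding the empty columns by hand is tedious for a file with about a million rows. A possible way to do it in base R is sketched below; the input text and the output path are hypothetical stand-ins, and all columns are read as character so that the values are written back unchanged.

```r
# Column layout named in this answer.
expected <- c("DATE", "TIME", "PRICE", "SIZE", "SYMBOL", "EX", "COND", "CORR", "G127")

# Hypothetical in-memory stand-in for the original three-column file.
csv_text <- "DATE,TIME,PRICE
20140102,9:30:00,1123.77
20140102,9:30:01,1122.81
20140102,9:30:02,1122.77"

raw <- read.csv(text = csv_text, header = TRUE, colClasses = "character")

# Add every expected column that the file lacks, filled with NA.
for (col in setdiff(expected, names(raw))) {
  raw[[col]] <- NA_character_
}
padded <- raw[expected]  # reorder to the expected layout

# Hypothetical output location; NA values are written as empty fields.
out_file <- file.path(tempdir(), "REIT_trades.csv")
write.csv(padded, out_file, row.names = FALSE, na = "", quote = FALSE)

padded_lines <- readLines(out_file)
print(padded_lines)
```

For a real file, read.csv(text = ...) would be replaced by a call that reads the file from disk.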

I got another error though.

Error in gzfile(file, "wb") : cannot open the connection
In addition: Warning message:
In gzfile(file, "wb") :
  cannot open compressed file '/home/josh/Desktop/z_xts/2014-01-02/REIT_trades.RData', probable reason 'No such file or directory'

So it looks like the convert function expects all the daily output directories to exist before you run it. The function runs and creates the output after I create those directories.
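If the directories have to exist beforehand, they can be created in a loop rather than by hand. The sketch below uses base R only and assumes one folder per calendar day named YYYY-MM-DD, as the path in the warning above suggests; the destination under tempdir() and the date range are hypothetical. Answer 1 reports that setting dir=TRUE in convert makes this step unnecessary.

```r
# Hypothetical destination folder; substitute the real datadestination.
datadestination <- file.path(tempdir(), "xts_data")

# One folder per calendar day in the range (dates chosen for illustration).
days <- seq(as.Date("2014-01-02"), as.Date("2014-01-05"), by = "day")
day_dirs <- file.path(datadestination, format(days, "%Y-%m-%d"))

for (d in day_dirs) {
  # recursive = TRUE also creates the parent folder if it is missing.
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

print(basename(day_dirs))
```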



Source: https://stackoverflow.com/questions/38326286/convert-csv-file-for-further-manipulation-using-highfrequency-package-on-r
