Why does knitr caching fail for data.table `:=`?

Submitted by 笑着哭i on 2019-12-20 10:36:23

Question


This is related in spirit to this question, but must be different in mechanism.

If you cache a knitr chunk that contains a data.table `:=` assignment, then on later compiles knitr acts as though that chunk had not been run, and later chunks do not see the effect of the `:=`.

Any idea why this is? How does knitr detect that objects have been updated, and what is data.table doing that confuses it?

It appears you can work around this by writing `DT = DT[, LHS:=RHS]`.
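A minimal sketch of why that workaround is cheap, assuming a plain R session with data.table installed (knitr is not involved here): `:=` adds the column in place and returns the same table, so rebinding the result with `=` does not copy any data. `data.table::address()` reports where an object lives in memory, which makes the "same object" claim checkable.

```r
library(data.table)

DT <- data.table(a = 1:3)
before <- address(DT)   # memory address of the table before the update

# := adds column c by reference and returns that same table, so the
# rebinding below copies nothing; it only adds an ordinary assignment.
DT = DT[, c := 5]

after <- address(DT)
identical(before, after)  # still the same object in memory
names(DT)                 # "a" "c"
```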

Example:

```{r}
library(data.table)
```
Data.Table Markdown
========================================================
Suppose we make a `data.table` in **R Markdown**
```{r, cache=TRUE}
DT = data.table(a = rnorm(10))
```
Then add a column using `:=`
```{r, cache=TRUE}
DT[, c:=5] 
```
Then we display that in a non-cached block
```{r, cache=FALSE}
DT
```
The first time you run this, the above will show a `c` column; 
from the second time onwards, it will not.

[Screenshot: output on second run]


Answer 1:


Speculation:

Here is what appears to be going on.

knitr quite sensibly caches objects as soon as they are created. It then updates their cached value whenever it detects that they have been altered.

data.table, though, bypasses R's normal copy-by-value assignment and replacement mechanisms, and modifies tables with the `:=` operator rather than with `=`, `<<-`, or `<-`. As a result, knitr does not pick up the signal that `DT` has been changed by `DT[, c:=5]`.
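A small sketch of what bypassing copy-by-value means in practice, assuming data.table is installed: after `DT2 <- DT`, both names refer to one table, so a `:=` on `DT` is also visible through `DT2`, even though no `<-` touches either name. `copy()` is data.table's way to get an independent table. A plain `data.frame` is included for contrast, since it follows R's usual copy-on-modify rule.

```r
library(data.table)

DT  <- data.table(a = 1:3)
DT2 <- DT              # no copy: both names refer to the same table
DT3 <- copy(DT)        # an independent deep copy

DT[, c := 5]           # modified by reference, no `<-` anywhere

names(DT2)  # "a" "c": the alias sees the new column
names(DT3)  # "a": the copy does not

# Ordinary data frames: modifying df leaves df2 untouched.
df  <- data.frame(a = 1:3)
df2 <- df
df$c <- 5
names(df2)  # "a"
```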

Solution:

Just add this block wherever you'd like the current value of `DT` to be re-cached. It costs essentially nothing in memory or time (`DT <- DT` copies only a reference, not the data), but it does effectively send a (fake) signal to knitr that `DT` has been updated:

```{r, cache=TRUE, echo=FALSE}
DT <- DT 
```
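One way to see that "signal", assuming (as I understand current knitr versions to do) that knitr picks the objects to save for a cached chunk by statically scanning the chunk's code for assignments with `codetools::findLocalsList()`. The helper `assigned_in()` below is hypothetical, written only for this illustration. The code strings are parsed, never evaluated, so only the codetools package (shipped with R as a recommended package) is needed.

```r
library(codetools)

# Hypothetical helper: names that a static scan reports as assigned
# in a chunk's code. The code is parsed only, never run.
assigned_in <- function(code) findLocalsList(parse(text = code))

assigned_in("DT = data.table(a = rnorm(10))")  # "DT"
assigned_in("DT[, c := 5]")                    # character(0): := is not an assignment to the scan
assigned_in("DT <- DT")                        # "DT": the no-op rebinding is visible
assigned_in("DT = DT[, c := 5]")               # "DT": the workaround from the question
```

Under that assumption, the chunk containing only `DT[, c:=5]` reports no assigned objects, so nothing new is saved for it, while either `DT <- DT` or `DT = DT[, c:=5]` names `DT` explicitly.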

Working version of example doc:

Check that it works by running this edited version of your doc:

```{r}
library(data.table)
```
Data.Table Markdown
========================================================
Suppose we make a `data.table` in **R Markdown**
```{r, cache=TRUE}
DT = data.table(a = rnorm(10))
```

Then add a column using `:=`
```{r, cache=TRUE}
DT[, c:=5] 
```

```{r, cache=TRUE, echo=FALSE}
DT <- DT 
```

Then we display that in a non-cached block
```{r, cache=FALSE}
DT
```
The first time you run this, the above will show a `c` column. 
The second, third, and nth times, it will as well.



Answer 2:


As indicated in the fourth comment under the answer by Josh O'Brien, I have added a new chunk option `cache.vars` to handle this very special case. In the second cached chunk, we can specify `cache.vars='DT'` so that knitr will save a copy of `DT`.

```{r}
library(data.table)
```
Data.Table Markdown
========================================================
Suppose we make a `data.table` in **R Markdown**
```{r, cache=TRUE}
DT = data.table(a = rnorm(10))
```
Then add a column using `:=`
```{r, cache=TRUE, cache.vars='DT'}
DT[, c:=5] 
```
Then we display that in a non-cached block
```{r, cache=FALSE}
DT
```

The output is the same, including the `c` column, no matter how many times you compile the document.



Source: https://stackoverflow.com/questions/15298359/why-does-knitr-caching-fail-for-data-table
