Please see two edits below (added later)...
I have loaded a big data.frame into memory (2.7 mio rows and 7 columns - 74 MB of RAM).
If I want to view the data using Tcl/Tk's Tktable widget via the tcltk2 package function tk2edit
- it takes over 15 minutes till the window is displayed with the data
- and about 7 GB of RAM (!) is consumed by R (incl. Tcl/Tk) en plus!
Example:
library(tcltk2)
my.data.frame <- data.frame(ID=1:2600000,
col1=rep(LETTERS,100000),
col2=rep(letters,1E5),
col3=26E5:1) # about 40 MB of data
tk2edit(my.data.frame)
The basic problem seems to be that each cell of the data.frame must loaded into an tcl array via two nested loops (see the code in this tktable question).
The tcltk2 package's function tk2edit
works the same way, over-simplified:
# my.data.frame contains a lot of rows...
for (i in 0:(dim(my.data.frame)[1])) {
for (j in 0:(dim(my.data.frame)[2]-1)) {
tclarray1[[i,j]] <- my.data.frame[i, j]
}
}
Question: Is there any way to optimize displaying big data.frames with tktable, e. g. by avoiding the nested loops? I just want to view data (no editing required)...
tktable
has the -variable
option where you can set the tcl array variable that contains ALL the data of the table. So we "only" have to find way to create a tcl array from an R data.frame with "one call to tcl from R"...
PS: This is not a problem of the tcltk2
package but seems to be a general problem how to "bulk load" data of a data.frame into Tcl variables...
PS2: The good thing is that Tktable
seems to be able to display such a lot of data efficiently (I can scroll and even edit cells without noticing any severe delays).
Edit 1 (09/01/2015): Adding pure Tcl/Tk benchmark results with Tktable
and data in an array
I have prepared a simple benchmark in Tcl/Tk to measure the execution time and memory consumption of filling a similar Tktable
:
#!/usr/bin/env wish
package require Tktable
set rows 2700000
set columns 4
for {set row 0} {$row <= $rows} {incr row} {
for {set column 0} {$column < $columns} {incr column} {
if {$row == 0} {
set data($row,$column) Titel$column
} else {
set data($row,$column) R${row}C${column}
}
}
}
ttk::frame .fr
table .fr.table -rows $rows -cols $columns -titlerows 1 -titlecols 0 -height 5 -width 25 -rowheight 1 -colwidth 9 -maxheight 100 -maxwidth 400 -selectmode extended -variable data -xscrollcommand {.fr.xscroll set} -yscrollcommand {.fr.yscroll set}
scrollbar .fr.xscroll -command {.fr.table xview} -orient horizontal
scrollbar .fr.yscroll -command {.fr.table yview}
pack .fr -fill both -expand 1
pack .fr.xscroll -side bottom -fill x
pack .fr.yscroll -side right -fill y
pack .fr.table -side right -fill both -expand 1
Results:
- Memory consumption: 3.2 GB
- Time until the table is displayed: 15 sec.
Conclusion: Tcl/Tk arrays are wasting memory, but the performance is very good (the runtime of 15 minutes when using R with tcltk seem to be caused by R to Tcl/Tk communication overhead.
Test setup: Ubuntu 14.04 64 Bit with 16 GB RAM...
Edit 2 (10/01/2015): Adding pure Tcl/Tk benchmark results of ttk::treeview with data in a list
To compare the memory consumption of Tktable
to ttk::treeview
I wrote this code:
#!/usr/bin/env wish
set rows 2700000
set columns 4
set data {}
set colnames {}
for {set i 0} {$i < $columns} {incr i} {
lappend colnames Title$i
}
for {set row 0} {$row <= $rows} {incr row} {
set newrow {}
for {set column 0} {$column < $columns} {incr column} {
lappend newrow R${row}C${column}
}
lappend data $newrow
}
ttk::treeview .tv -columns $colnames -show headings -yscrollcommand {.sbY set} -xscrollcommand {.sbX set}
foreach Element $data {
.tv insert {} end -values $Element
}
foreach column $colnames {
.tv heading $column -text $column
}
ttk::scrollbar .sbY -command {.tv yview}
ttk::scrollbar .sbX -command {.tv xview} -orient horizontal
pack .sbY -side right -fill y
pack .sbX -side bottom -fill x
pack .tv -side left -fill both
Results:
- Memory consumption: 2 GB (thereof data stored as list: 1.2 GB)
- Time until the table is displayed: 15 sec.
- Compare: 10 mio rows consume 7.2 GB of RAM but selecting a row takes serveral seconds (2 - 5) then (possible reason: Internal list traversal?)
Conclusion:
- The
treeview
is more memory efficient thanTktable
since it can use a list instead of an array. - For bigger data sizes (> a few million rows) the row selection is slow (the more at the end the slower!)
I have found one possible solution/workaround using Tktable
in an "unbound" (command) mode.
With the command
option of Tktable
you can specify a function that is called each time a cell shall be displayed on the screen. This avoids "loading" all the data from R to Tcl at once improving the "start-up" time and significantly reduces the memory consumption caused by TCL's way of storing arrays and lists.
This way every time you scroll a series of function calls are done to ask for the content of the visible cells.
It works for me even with over 10 mio. rows!
Drawback: Calling an R function that returns a Tcl variable for each cell is still far from being efficient. If you scroll for the first time you can watch the cells being updated. Therefore I am still looking for a bulk data transfer solution between R and Tcl/Tk.
Any suggestions to improve the performance are welcome!
I have implemented a small demo (with 1 mio. rows and 21 columns consuming 1.2 GB of RAM) and added some buttons to test different features (like caching).
Note: The long start-up time is caused by creating the underlying test data, NOT by Tktable!
library(tcltk)
library(data.table)
# Tktable example with -command ("unbound" mode) ---------------------------
# Doc: http://tktable.sourceforge.net/tktable/doc/tkTable.html
NUM.ROWS <- 1E6
NUM.COLS <- 20
# generate a big data.frame - this will take a while but is required for the demo
dt.data <- data.table(ID = 1:NUM.ROWS)
for (i in 1:NUM.COLS) {
dt.data[, (paste("Col",i)) := paste0("R", 1:NUM.ROWS, " C", i)]
}
# Fill one cell with a long text containing special control characters to test the Tktable behaviour
dt.data[3,3 := "This is a long text with backslash \\ and \"quotes\"!"]
tclRequire("Tktable")
t <- tktoplevel()
tkwm.protocol(t, "WM_DELETE_WINDOW", function() tkdestroy(t))
# Function to return the current row and column as "calculated" value (without an underlying data "model")
calculated.data <- function(C) {
# Function arguments for Tcl "substitutions":
# See: http://tktable.sourceforge.net/tktable/doc/tkTable.html
# %c the column of the triggered cell.
# %C A convenience substitution for %r,%c.
# %i 0 for a read (get) and 1 for a write (set). Otherwise it is the current cursor position in the cell.
# %r the row of the triggered cell.
return(tclVar(C)) # this does work!
}
# Function to return the content of a data.table for the current row and colum
data.frame.data <- function(r, c) {
if( r == "0")
return(tclVar(names(dt.data)[as.integer(c)+1])) # First row contains the column names
else
return(tclVar(as.character(dt.data[as.integer(r)+1, as.integer(c)+1, with = FALSE]))) # Other rows are data rows
}
frame <- ttklabelframe(t, text = "Data:")
# Add the table to the window environment to ensure killing it when the window is closed (= no more phantom calls to the data command handler)
# Cache = TRUE: This greatly enhances speed performance when used with -command but uses extra memory.
t$env$table <- tkwidget(frame, "table", rows = NUM.ROWS, cols = NUM.COLS, titlerows = 1, selecttype = "cell", selectmode = "extended", command = calculated.data, cache = TRUE, yscrollcommand = function(...) tkset(scroll.y, ...), xscrollcommand = function(...) tkset(scroll.x, ...))
scroll.x <- ttkscrollbar(frame, orient = "horizontal", command=function(...) tkxview(t$env$table,...)) # command that performs the scrolling
scroll.y <- ttkscrollbar(frame, orient = "vertical", command=function(...) tkyview(t$env$table,...)) # command that performs the scrolling
buttons <- ttkframe(t)
btn.read.only <- ttkbutton(buttons, text = "make read only", command = function() tkconfigure(t$env$table, state = "disabled"))
btn.read.write <- ttkbutton(buttons, text = "make writable", command = function() tkconfigure(t$env$table, state = "normal"))
btn.clear.cache <- ttkbutton(buttons, text = "clear cache", command = function() tcl(t$env$table, "clear", "cache"))
btn.bind.data.frame <- ttkbutton(buttons, text = "Fill cells from R data.table",
command = function() {
tkconfigure(t$env$table, command = data.frame.data, rows = nrow(dt.data), cols = ncol(dt.data), titlerows = 1)
tcl(t$env$table, "clear", "cache")
tkwm.title(t,"Cells are filled from an R data.table")
})
btn.bind.calc.value <- ttkbutton(buttons, text = "Fill cells with calculated values",
command = function() {
tkconfigure(t$env$table, command = calculated.data, rows = 1E5, cols = 40)
tcl(t$env$table, "clear", "cache")
tkwm.title(t,"Cells are calculated values (to test the highest performance possible)")
})
tkgrid(btn.read.only, row = 0, column = 1)
tkgrid(btn.read.write, row = 0, column = 2)
tkgrid(btn.clear.cache, row = 0, column = 3)
tkgrid(btn.bind.data.frame, row = 0, column = 5)
tkgrid(btn.bind.calc.value, row = 0, column = 6)
tkpack(frame, fill = "both", expand = TRUE)
tkpack(scroll.x, fill = "x", expand = FALSE, side = "bottom")
tkpack(scroll.y, fill = "y", expand = FALSE, side = "right")
tkpack(t$env$table, fill = "both", expand = TRUE, side = "left")
tkpack(buttons, side = "bottom")
来源:https://stackoverflow.com/questions/34667618/r-with-tcltk-tcltk2-improve-slow-performance-when-displaying-big-data-frame-wit