Create a consecutive integer column and then index a table stored in SQL Server using dplyr

Anonymous (unverified), submitted on 2019-12-03 07:36:14

Question:

I am processing some large tables stored in SQL Server, and creating an index sometimes reduces the time an R script needs to run. I tried to use dplyr's mutate function to create a new column (idx) of consecutive numbers and then use that idx column as the index. But mutate does not seem to work and consistently gives me this error:

    > tbl(channel,'tbl_iris') %>% mutate(idx=1:n())
    Error in from:to : NA/NaN argument
    In addition: Warning message:
    In 1:n() : NAs introduced by coercion

Right now I work around the error with the following, which seems clumsy to me because it pulls the whole table into R and copies it back:

    iris <- tbl(channel,'tbl_iris') %>%
      collect %>%
      mutate(idx=1:n())

    try(db_drop_table(channel,'##iris'))
    copy_to(channel,iris,'##iris',temporary=FALSE)
    db_create_index(channel,'##iris',columns='idx')

Is there any better way of doing this? Thanks!

Update 01

I tried mutate(idx = row_number()) as suggested by @Phil. It does not work and shows the following error messages:

    > tbl(channel,'##iris') %>%
    +   mutate(idx=row_number())
    Error: <SQL> 'SELECT  TOP 10 "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species", row_number() OVER () AS "idx" FROM "##iris"'
      nanodbc/nanodbc.cpp:1587: 42000: [Microsoft][ODBC SQL Server Driver][SQL Server]The function 'row_number' must have an OVER clause with ORDER BY.

    > tbl(channel,'##iris') %>%
    +   arrange(Species) %>%
    +   mutate(idx=row_number())
    Error: <SQL> 'SELECT  TOP 10 "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species", row_number() OVER (ORDER BY "Species") AS "idx" FROM (SELECT * FROM "##iris" ORDER BY "Species") "kwtundzona"'
      nanodbc/nanodbc.cpp:1587: 42000: [Microsoft][ODBC SQL Server Driver][SQL Server]The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.

Update 02

I tried the approach suggested by @Moody_Mudskipper, and it seems to work:

    > try(db_drop_table(channel,'##iris'))
    [1] 0
    > copy_to(channel,iris,'##iris',temporary=FALSE)
    > tbl(channel,'##iris') %>% head(.,1)
    # Source:   lazy query [?? x 5]
    # Database: Microsoft SQL Server 11.00.6251[dbo@WCDCHCMS9999\CMSAH_DC7_999/data_xx_yyy]
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species
             <dbl>       <dbl>        <dbl>       <dbl> <chr>
    1         5.10        3.50         1.40       0.200 setosa
    >
    > DBI::dbSendQuery(channel,"ALTER TABLE ##iris ADD idx INT IDENTITY(1,1) NOT NULL")
    <OdbcResult>
      SQL  ALTER TABLE ##iris ADD idx INT IDENTITY(1,1) NOT NULL
      ROWS Fetched: 0 [complete]
           Changed: 0
    > db_create_index(channel,'##iris',columns='idx')
    [1] 0
    Warning message:
    In new_result(connection@ptr, statement) : Cancelling previous query
    > tbl(channel,'##iris') %>% head(.,5)
    # Source:   lazy query [?? x 6]
    # Database: Microsoft SQL Server 11.00.6251[dbo@WCDCHCMS9999\CMSAH_DC7_999/data_xx_yyy]
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species   idx
             <dbl>       <dbl>        <dbl>       <dbl> <chr>   <int>
    1         5.10        3.50         1.40       0.200 setosa      1
    2         4.90        3.00         1.40       0.200 setosa      2
    3         4.70        3.20         1.30       0.200 setosa      3
    4         4.60        3.10         1.50       0.200 setosa      4
    5         5.00        3.60         1.40       0.200 setosa      5

I will modify my script to see whether this gives a performance boost similar to my earlier, clumsier method.

Apart from the warning message below, things appear to be working as planned.

    Warning message:
    In new_result(connection@ptr, statement) : Cancelling previous query

Answer 1:

To my knowledge, you can't add a column to an existing table on the server side with dbplyr, but for a simple statement like this it is just as easy to use DBI::dbSendQuery to get the desired effect. The following line creates an ID column:

DBI::dbSendQuery(channel, "ALTER TABLE tbl_iris ADD ID INT IDENTITY(1,1) NOT NULL") 
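The "Cancelling previous query" warning reported in the question is likely caused by the result object that dbSendQuery leaves open until it is cleared. As a minimal sketch, assuming the same `channel` connection, the statement could instead be run with DBI::dbExecute, which is meant for statements that return no rows and releases the result itself. The helper names `identity_sql` and `add_identity_column` are hypothetical and only used for illustration.

```r
library(DBI)

# Hypothetical helper: build the ALTER TABLE statement.
# Table and column names are quoted by the connection's own rules.
identity_sql <- function(con, table, column = "idx") {
  paste0(
    "ALTER TABLE ", as.character(dbQuoteIdentifier(con, table)),
    " ADD ", as.character(dbQuoteIdentifier(con, column)),
    " INT IDENTITY(1,1) NOT NULL"
  )
}

# Hypothetical helper: run the statement with dbExecute(), which does not
# leave an open result behind the way an uncleared dbSendQuery() result does.
add_identity_column <- function(con, table, column = "idx") {
  dbExecute(con, identity_sql(con, table, column))
}

# Usage against the connection from the question (not run here):
# add_identity_column(channel, "tbl_iris", "idx")
```

If dbSendQuery is kept, calling DBI::dbClearResult on the object it returns should have a similar effect.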

Then you can create the index with dplyr::db_create_index, or send another query:

 DBI::dbSendQuery(channel, "CREATE INDEX id ON tbl_iris (id);") 
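As a side note on the row_number() errors in the question: SQL Server requires an ORDER BY inside the OVER clause, and arrange() put the ORDER BY into a subquery instead, which SQL Server rejects. dbplyr's window_order() sets the ordering for window functions only, so it may be a way to get a row number inside a query. The sketch below uses a simulated SQL Server backend (lazy_frame with simulate_mssql) so the generated SQL can be inspected without a database; the two columns and the choice of ordering are assumptions for illustration.

```r
library(dplyr)
library(dbplyr)

# Simulated SQL Server table: no connection is needed to inspect the SQL.
iris_lazy <- lazy_frame(
  Sepal.Length = 5.1,
  Species = "setosa",
  con = simulate_mssql()
)

# window_order() supplies the ORDER BY used inside OVER (...),
# without adding an ORDER BY to a subquery.
numbered <- iris_lazy %>%
  window_order(Species, Sepal.Length) %>%
  mutate(idx = row_number())

# Print the SQL that would be sent to the server.
show_query(numbered)
```

This numbering is computed each time the query runs and is not stored in the table, so it does not replace the IDENTITY column when the goal is to build an index. Rows that tie on the ordering columns may also be numbered in an arbitrary order.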

