Long vectors stringdist package R

Submitted by 旧时模样 on 2021-02-11 05:59:27

Question


I posted a question some days ago, and while the solution seems to work in RStudio on Windows (though it takes forever and sometimes returns no results), I keep getting a "long vectors not supported" error when I run the same code with 30 CPUs on an HPC cluster. Any ideas why?

Here is a sample of the data:

> head(forfuzzy)
# A tibble: 6 x 3
  grantee_name                 grantee_city grantee_state
  <chr>                        <chr>        <chr>        
1 (ICS)2 MAINE CHAPTER         CLEARWATER   FL           
2 (SUFFOLK COUNTY) VANDERBILT~ CENTERPORT   NY           
3 1 VOICE TREKKING A FUND OF ~ WESTMINSTER  MD           
4 10 CAN                       NEWBERRY     FL           
5 10 THOUSAND WINDOWS          LIVERMORE    CA           
6 100 BLACK MEN IN CHICAGO INC CHICAGO      IL   
... 7 - 97000 rows to go

> head(filings)
# A tibble: 6 x 2
  grantee_name                       ein 
  <chr>                             <dbl>               
1 ICS-2 MAINE CHAPTER              123456             
2 SUFFOLK COUNTY VANDERBILT        654321            
3 VOICE TREKKING A FUND OF VOICES  789456            
4 10 CAN                           654987               
5 10 THOUSAND MUSKETEERS INC       789123               
6 100 BLACK MEN IN HOUSTON INC     987321      

rows 7-1200000 omitted for brevity

And the code with error message after 20 or so minutes of runtime:

n=10
lst=split(forfuzzy, cumsum(1:nrow(forfuzzy)-1)%%n==0)
knitr::opts_chunk$set(cache = TRUE, warning = FALSE, message = FALSE, cache.lazy = FALSE) # This was added and didn't change anything
df=purrr::map_dfr(lst, ~stringdist_inner_join(., filings, by="grantee_name", method="jw", p=0.25, max_dist=0.1, distance_col="distance"))
Error in do_dist(a = b, b = a, method = method, weight = weight, q = q,  : 
  long vectors not supported yet: ../../src/include/Rinlinedfuns.h:535
Calls: <Anonymous> ... list2 -> lapply -> FUN -> mf -> <Anonymous> -> do_dist
Execution halted

Any idea how I can get this to work? As mentioned, Windows sometimes crashes as well, but for a different reason: I think there is not enough space on my C drive.
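
One thing that may be worth checking is the grouping expression passed to split() in the code above. Because cumsum() is applied before %% n == 0, the grouping vector is logical, so split() returns only two groups (FALSE and TRUE) rather than chunks of n rows. If the join builds one comparison per pair of names, which appears to be how stringdist_inner_join works but is an assumption here, a group holding tens of thousands of rows paired against about 1.2 million filings would need far more than 2^31 - 1 elements, which would be consistent with the long vector error. The sketch below uses a hypothetical 25-row toy data frame (toy, g_question and g_chunks are illustrative names, not from the original code) to compare the two groupings:

```r
# Hypothetical toy stand-in for forfuzzy (25 rows), for illustration only
toy <- data.frame(grantee_name = sprintf("ORG %02d", 1:25),
                  stringsAsFactors = FALSE)
n <- 10

# Grouping as written in the question: cumsum() runs first, then %% and ==,
# so the result is a logical vector and split() yields just two groups
g_question <- cumsum(1:nrow(toy) - 1) %% n == 0
length(split(toy, g_question))

# Modulo test moved inside cumsum(): the group id increases every n rows,
# so split() yields chunks of at most n rows
g_chunks <- cumsum((1:nrow(toy) - 1) %% n == 0)
sapply(split(toy, g_chunks), nrow)
```

If the chunks do turn out to be too large, passing the second form of the grouping to split() would keep each join to n rows of forfuzzy against filings; whether that alone is enough on the HPC is untested here.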

Source: https://stackoverflow.com/questions/64549055/long-vectors-stringdist-package-r
