R: scraping additional data after POST only works for first page

Submitted by 不打扰是莪最后的温柔 on 2019-12-05 13:47:45

I think you are simply overthinking the problem. The issue lies in the XPath. The XPath you are using for data extraction is the same for all pages, namely //*[@id="ctl00_cphContent_gvwPreparations"]. The only component that changes in your code is txtPageNumber. In the code below, I've changed txtPageNumber to 3 (txtPageNumber=3). I suggest you focus on a different question: how do you automate the page numbering for data extraction? That way, you won't have to change txtPageNumber by hand in the POST body:

page<-rvest:::request_POST(pgsession,url,
                           body=list(
                             `ctl00$cphContent$gvwPreparations$ctl13$gvwpPreparations$txtPageNumber`=3,
                             `__VIEWSTATE`=pgform$fields$`__VIEWSTATE`$value,
                             `__VIEWSTATEGENERATOR`=pgform$fields$`__VIEWSTATEGENERATOR`$value,
                             `__VIEWSTATEENCRYPTED`=pgform$fields$`__VIEWSTATEENCRYPTED`$value,
                             `__EVENTVALIDATION`=pgform$fields$`__EVENTVALIDATION`$value,
                             `ctl00$cphContent$gvwPreparations$ctl13$gvwpPreparations$ddlPageSize`="10",
                             `__EVENTTARGET`="ctl00$cphContent$gvwPreparations$ctl02$ctl00",
                             `__EVENTARGUMENT`=""

                           ),
                           encode="form")
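A first step toward that is to make the page number a parameter instead of a hard-coded value. The sketch below wraps the same POST body in a function. `build_page_body()` and its `page_size` argument are hypothetical names, not part of rvest. It assumes `fields` is `pgform$fields` as returned by `html_form()`, and it only reads the hidden ASP.NET fields used in this answer.

```r
# Hypothetical helper: builds the form body for one results page.
# `fields` is assumed to be pgform$fields from html_form(); only the
# hidden ASP.NET fields used in the answer are read from it.
build_page_body <- function(fields, page_number, page_size = "10") {
  list(
    # The only value that changes from page to page
    `ctl00$cphContent$gvwPreparations$ctl13$gvwpPreparations$txtPageNumber` = page_number,
    `__VIEWSTATE`          = fields$`__VIEWSTATE`$value,
    `__VIEWSTATEGENERATOR` = fields$`__VIEWSTATEGENERATOR`$value,
    `__VIEWSTATEENCRYPTED` = fields$`__VIEWSTATEENCRYPTED`$value,
    `__EVENTVALIDATION`    = fields$`__EVENTVALIDATION`$value,
    `ctl00$cphContent$gvwPreparations$ctl13$gvwpPreparations$ddlPageSize` = page_size,
    `__EVENTTARGET`   = "ctl00$cphContent$gvwPreparations$ctl02$ctl00",
    `__EVENTARGUMENT` = ""
  )
}
```

With it, the request for page 5 would become `rvest:::request_POST(pgsession, url, body = build_page_body(pgform$fields, 5), encode = "form")`.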

The following code worked for me:

library(rvest)
library(dplyr)


url <- "http://www.spezialitaetenliste.ch/ShowPreparations.aspx?searchType=Substance&searchValue="
pgsession<-html_session(url)
pgform<-html_form(pgsession)[[1]]

page<-rvest:::request_POST(pgsession,url,
                           body=list(
                             `ctl00$cphContent$gvwPreparations$ctl13$gvwpPreparations$txtPageNumber`=3,
                             `__VIEWSTATE`=pgform$fields$`__VIEWSTATE`$value,
                             `__VIEWSTATEGENERATOR`=pgform$fields$`__VIEWSTATEGENERATOR`$value,
                             `__VIEWSTATEENCRYPTED`=pgform$fields$`__VIEWSTATEENCRYPTED`$value,
                             `__EVENTVALIDATION`=pgform$fields$`__EVENTVALIDATION`$value,
                             `ctl00$cphContent$gvwPreparations$ctl13$gvwpPreparations$ddlPageSize`="10",
                             `__EVENTTARGET`="ctl00$cphContent$gvwPreparations$ctl02$ctl00",
                             `__EVENTARGUMENT`=""

                           ),
                           encode="form")
# makes a table of all results on the requested page (page 3, entries 21 to 30)

read_html(page) %>%
  html_nodes(xpath = '//*[@id="ctl00_cphContent_gvwPreparations"]') %>%
  html_table(fill=TRUE) %>% 
  bind_rows %>%
  tibble()

# A tibble: 11 x 1
   .$``  $Präparat $`Galen. Form /~ $Packung $FAP  $PP   $SB   $`Lim-Pkt` $Lim 
   <chr> <chr>     <chr>            <chr>    <chr> <chr> <chr> <chr>      <chr>
 1 21.   Accolate  Tabl 20 mg       60 Stk   29.75 50.55 ""    ""         ""   
 2 22.   Accupaque Inj Lös 300 mg   Plast F~ 32.00 53.10 ""    ""         ""   
 3 23.   Accupaque Inj Lös 300 mg   Plast F~ 61.15 86.60 ""    ""         ""   
 4 24.   Accupaque Inj Lös 300 mg   Plast F~ 120.~ 154.~ ""    ""         ""   
 5 25.   Accupaque Inj Lös 350 mg   Plast F~ 33.97 55.35 ""    ""         ""   
 6 26.   Accupaque Inj Lös 350 mg   Plast F~ 66.88 93.20 ""    ""         ""   
 7 27.   Accupaque Inj Lös 350 mg   Plast F~ 129.~ 164.~ ""    ""         ""   
 8 28.   Accupro ~ Filmtabl 10 mg   30 Stk   8.56  18.00 ""    ""         ""   
 9 29.   Accupro ~ Filmtabl 10 mg   100 Stk  26.60 46.90 ""    ""         ""   
10 30.   Accupro ~ Filmtabl 20 mg   30 Stk   14.02 28.35 ""    ""         ""   
11 "Ein~ "Einträg~ "Einträge pro S~ "Einträ~ "Ein~ "Ein~ "Ein~ "Einträge~ "Ein~
# ... with 9 more variables: $`Swissmedic-Code` <chr>, $Zulassungsinhaberin <chr>,
#   $Wirkstoff <chr>, $`BAG-Dossier` <chr>, $Aufnahme <chr>, $`Befr. AufnahmeBefr.
#   Limitation` <chr>, $`O/G` <chr>, $`IT-Code` <chr>, $`ATC-Code` <chr>

# gives the raw text of the results table on this page (not yet very structured)

read_html(page) %>%
  html_nodes(xpath = '//*[@id="ctl00_cphContent_gvwPreparations"]') %>%
  html_text %>%
  head(10)


[1] " PräparatGalen. Form / DosierungPackungFAPPPSBLim-PktLimSwissmedic-CodeZulassungsinhaberinWirkstoffBAG-DossierAufnahmeBefr. AufnahmeBefr. LimitationO/GIT-CodeATC-Code\r\n\t\t\t\t\r\n                        21.\r\n                    \r\n                        Accolate\r\n                    \r\n                        Tabl 20 mg \r\n                    \r\n                        60 Stk\r\n                    \r\n                        29.75\r\n                    \r\n                        50.55\r\n                    \r\n                                                \r\n                    \r\n                        \r\n                    \r\n                      \r\n                    \r\n                        53750036\r\n                    \r\n                        AstraZeneca AG\r\n                    \r\n                        Zafirlukastum\r\n                    \r\n                        17053\r\n                    \r\n                        15.03.1998\r\n                    \r\n                        \r\n                        \r\n                    \r\n                        \r\n                    \r\n                        03.04.50.\r\n                    \r\n                        R03DC01\r\n                    \r\n\t\t\t\t\r\n                        22.\r\n                    \r\n                        Accupaque\r\n                    \r\n 
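As the output shows, the cell values in the raw text are separated by `\r\n` line breaks padded with spaces and tabs. One way to get a first bit of structure out of it is to split on those breaks and drop the empty pieces. `split_cells()` is a hypothetical helper. Note that empty table cells (such as the blank SB and Lim columns) are dropped as well, so positions in the result do not line up with columns; `html_table()` is the more reliable route when column alignment matters.

```r
# Hypothetical helper: splits raw html_text() output into non-empty cell values.
split_cells <- function(txt) {
  # Break on the CR LF line endings, then strip surrounding spaces/tabs/newlines
  cells <- trimws(unlist(strsplit(txt, "\r\n", fixed = TRUE)))
  # Keep only pieces that still contain text
  cells[nzchar(cells)]
}
```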