google-scholar

Google Scholar CAPTCHA verification problem

夙愿已清 submitted on 2021-02-11 13:54:14
Question: I'm working on a project for which I need to extract some data from Google Scholar. My PHP program takes a string from my local machine, passes it to Google Scholar, takes the first result from the search results page, and saves it to the database. I have to do this for almost 90 thousand strings/queries. The problem is that after a few hundred entries the program stops because Google Scholar asks for CAPTCHA verification. What can I do about that?
Answer 1: Because Google Scholar does

Python: How to access the elements in a generator object and put them in a Pandas dataframe or in a dictionary?

吃可爱长大的小学妹 submitted on 2021-01-28 00:56:16
Question: I am using the scholarly module in Python to search for a keyword. I get back a generator object, as follows:
    import pandas as pd
    import numpy as np
    import scholarly
    search_query = scholarly.search_keyword('Python')
    print(next(search_query))
    {'_filled': False, 'affiliation': 'Juelich Center for Neutron Science', 'citedby': 75900, 'email': '@fz-juelich.de', 'id': 'zWxqzzAAAAAJ', 'interests': ['Physics', 'C++', 'Python'], 'name': 'Gennady Pospelov', 'url_picture': 'https://scholar.google
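For the question above, a minimal sketch of pulling items out of a generator and reshaping them may help. It assumes each yielded item is a plain dict, as the printed output suggests. The `fake_search_keyword` function and its sample records are hypothetical stand-ins, so the example runs without network access or the scholarly package.

```python
from itertools import islice


def fake_search_keyword(keyword):
    """Hypothetical stand-in for a search call that yields one dict per author."""
    sample = [
        {"id": "a1", "name": "Author One", "interests": [keyword, "Physics"], "citedby": 10},
        {"id": "a2", "name": "Author Two", "interests": [keyword], "citedby": 5},
        {"id": "a3", "name": "Author Three", "interests": ["C++", keyword], "citedby": 1},
    ]
    for record in sample:
        yield record


search_query = fake_search_keyword("Python")

# A generator can only be consumed once; islice caps how many items are pulled.
rows = list(islice(search_query, 2))

# Dictionary keyed by the author id, one entry per record.
by_id = {row["id"]: row for row in rows}

# Column-oriented dict (column name -> list of values).
columns = {key: [row.get(key) for row in rows] for key in rows[0]}

print(by_id["a1"]["name"])
print(columns["citedby"])
```

With pandas installed, passing the same list of dicts to `pandas.DataFrame(rows)` gives one row per record and one column per key. Capping with `islice` matters because a search generator may yield a very large number of results.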

Retrieve citations of a journal paper using R

一笑奈何 submitted on 2021-01-24 07:15:20
Question: Using R, I want to obtain the list of articles that cite a scientific journal paper. The only information I have is the title of the article, e.g. "Protein measurement with the folin phenol reagent". Can anyone help me by producing a reproducible example that I can use? Here is what I have tried so far. The R package fulltext seems useful, because it can retrieve a list of IDs linked to an article. For instance, I can get the article's DOI:
    library(fulltext)
    res1 <- ft_search

Retrieve citations of a journal paper using R

允我心安 submitted on 2021-01-24 07:14:50
Question: Using R, I want to obtain the list of articles that cite a scientific journal paper. The only information I have is the title of the article, e.g. "Protein measurement with the folin phenol reagent". Can anyone help me by producing a reproducible example that I can use? Here is what I have tried so far. The R package fulltext seems useful, because it can retrieve a list of IDs linked to an article. For instance, I can get the article's DOI:
    library(fulltext)
    res1 <- ft_search

Scraping a large number of Google Scholar pages by URL

不羁的心 submitted on 2020-07-18 02:52:08
Question: I'm trying to get the full author list of every publication by an author on Google Scholar using BeautifulSoup. Since the author's home page shows only a truncated list of authors for each paper, I have to open each paper's link to get the full list. As a result, I run into a CAPTCHA every few attempts. Is there a way to avoid the CAPTCHA (e.g. pausing for 3 seconds after every request)? Or a way to make the original Google Scholar profile page show the full author list?
Answer 1: Recently I faced similar issue. I at
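The pause the question mentions can be sketched as a throttled loop, shown below. This is only an illustration of the idea, not the approach from the truncated answer, and nothing in the excerpt establishes that a fixed pause is enough to avoid a CAPTCHA. The `fetch_all` helper, its parameters, and the caller-supplied `fetch` function are all hypothetical names introduced for this example.

```python
import time


def fetch_all(urls, fetch, delay=3.0, max_retries=3, sleep=time.sleep):
    """Fetch each URL in turn, pausing between requests and backing off on failure.

    `fetch` is a caller-supplied function (hypothetical here) that returns the
    page text, or None when the request was blocked or failed.
    """
    results = {}
    for url in urls:
        wait = delay
        for _attempt in range(max_retries):
            page = fetch(url)
            if page is not None:
                results[url] = page
                break
            # Blocked or failed: wait, then double the wait before the next retry.
            sleep(wait)
            wait *= 2
        # Fixed pause before moving on to the next URL.
        sleep(delay)
    return results
```

Passing `sleep` as a parameter keeps the loop testable: a test can hand in `list.append` to record the waits instead of actually sleeping. URLs that still fail after `max_retries` attempts are simply left out of the result, so the caller can see which ones to retry later.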

Extract text from Google Scholar

强颜欢笑 submitted on 2020-01-17 08:35:49
Question: I am trying to extract the text snippet that Google Scholar gives for a particular query. By text snippet I mean the text below the title (in black letters). Currently I am trying to extract it from the HTML file using Python, but the file contains a lot of extra text such as /div><div class="gs_fl" ...etc. Is there an easy way, or some code, that can help me get the text without this redundant markup?
Answer 1: You need an HTML parser:
    import lxml.html
    doc = lxml.html.fromstring(html)
    text =
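The answer above uses lxml, and its code is cut off. As a separate illustration of the same idea (parse the HTML rather than pattern-match it), here is a sketch using only the standard library's `html.parser`. The class name `gs_rs` for the snippet container is an assumption made for this example; only `gs_fl` appears in the question. `SnippetExtractor` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class SnippetExtractor(HTMLParser):
    """Collect the text found inside <div> elements carrying a given class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0       # greater than 0 while inside a matching <div>
        self.snippets = []
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1  # nested <div> inside the snippet
        elif self.target_class in classes:
            self.depth = 1
            self._parts = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Join the pieces and collapse runs of whitespace.
                self.snippets.append(" ".join("".join(self._parts).split()))

    def handle_data(self, data):
        if self.depth > 0:
            self._parts.append(data)


# "gs_rs" is an assumed class name for the snippet container.
html = ('<div class="gs_rs">A <b>short</b> text snippet</div>'
        '<div class="gs_fl">Cited by 3</div>')
parser = SnippetExtractor("gs_rs")
parser.feed(html)
parser.close()
print(parser.snippets)
```

Inline tags such as `<b>` are dropped while their text is kept, and the `gs_fl` block is ignored because its class does not match.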

Can anybody share a simple example of using Mathematica and Google Scholar to extract academic research information

大兔子大兔子 submitted on 2020-01-01 06:27:22
Question: How can I use Mathematica and Google Scholar to find the number of papers a person published in 2011?
Answer 1: Google Scholar is not well suited to this goal, as it doesn't have a formal API AFAIK. It also doesn't provide results in a structured (e.g. XML) format. So we have to resort to a quick (and very, very fragile!) text pattern matching hack like: searchGoogleScholarAuthor[author_String] := First[StringCases[ Import["http://scholar.google.com/scholar?start=0&num=1&q=" <> StringDrop[

Can anybody share a simple example of using Mathematica and Google Scholar to extract academic research information

假装没事ソ submitted on 2020-01-01 06:27:15
Question: How can I use Mathematica and Google Scholar to find the number of papers a person published in 2011?
Answer 1: Google Scholar is not well suited to this goal, as it doesn't have a formal API AFAIK. It also doesn't provide results in a structured (e.g. XML) format. So we have to resort to a quick (and very, very fragile!) text pattern matching hack like: searchGoogleScholarAuthor[author_String] := First[StringCases[ Import["http://scholar.google.com/scholar?start=0&num=1&q=" <> StringDrop[

Get all publications by an author from Google Scholar using scholar.py

自古美人都是妖i submitted on 2019-12-23 21:53:32
Question: I am trying to get all the publications by an author using scholar.py (https://github.com/ckreibich/scholar.py). But whenever I run the script, I get only a fraction of the publications associated with the author. So running:
    ./scholar.py --author "albert einstein"
retrieves only a subset of the 1000+ publications associated with Einstein in Google Scholar. How can I get all of the publications for an author?
Source: https://stackoverflow.com/questions/39257172/get-all

Google Scholar: get links for cited papers (not cited by) [closed]

南楼画角 submitted on 2019-12-23 10:27:09
Question (closed as off-topic 5 years ago): This may seem like a stupid question, but I have been looking for this for quite some time and haven't found anything helpful. I want to download all papers cited within a given paper. Is there such a feature in Google Scholar? Or even just a page listing links to all the cited papers?
Source: https:/