I want to create a scraper using Google Spreadsheets with Google Apps Script. I know it is possible and I have seen some tutorials and threads about it.
The main ide
This has been discussed before. See here: What is the best way to parse html in google apps script
Unlike the older Xml service, XmlService is not very forgiving of malformed HTML. The trick in the answer by Justin Bicknell does the job. Even though the Xml service has been deprecated, it still continues to work.
Could you use JavaScript to parse the HTML? If your Google Apps Script retrieved the HTML as a string and returned it to a client-side JavaScript function, you could parse it outside of Google Apps Script. Any tags you want to scrape could then be sent to a dedicated Apps Script function that saves the content.
You could probably accomplish this more easily with jQuery.
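To make the client-side idea concrete, here is a rough sketch using the browser's built-in DOMParser rather than jQuery. It assumes the page is served through HtmlService so that google.script.run is available. The function names fetchHtml, saveValues, extractText and scrape are made up for illustration, and the server and client halves would normally live in separate files.

```javascript
// --- Server side (e.g. Code.gs) ---
// Hypothetical helper: fetch the page and return the raw HTML as a string.
function fetchHtml(url) {
  return UrlFetchApp.fetch(url).getContentText();
}

// Hypothetical helper: append each scraped value as its own row
// in the active sheet.
function saveValues(values) {
  var sheet = SpreadsheetApp.getActiveSheet();
  values.forEach(function (value) {
    sheet.appendRow([value]);
  });
}

// --- Client side (inside a <script> tag of an HtmlService page) ---
// Parse the HTML string in the browser and collect the trimmed text
// of every element matching the CSS selector.
function extractText(html, selector) {
  var doc = new DOMParser().parseFromString(html, 'text/html');
  return Array.from(doc.querySelectorAll(selector)).map(function (el) {
    return el.textContent.trim();
  });
}

// Ask the server for the HTML, parse it here, then send the
// extracted values back to the server to be saved.
function scrape(url, selector) {
  google.script.run
    .withSuccessHandler(function (html) {
      google.script.run.saveValues(extractText(html, selector));
    })
    .fetchHtml(url);
}
```

Calling scrape('https://example.com', 'h2') from the page would, under these assumptions, write the text of each h2 element into a new row. The browser's HTML parser tolerates malformed markup, which is the main reason to parse on the client instead of with XmlService.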