Getting all links of a webpage using Ruby
I'm trying to retrieve every external link of a webpage using Ruby. I'm using String#scan with this regex:

/href="https?:[^"]*|href='https?:[^']*/i

Then I use gsub to strip the href= prefix:

str.gsub(/href=['"]/, '')

This works fine, but I'm not sure it's efficient in terms of performance. Is this OK to use, or should I work with a proper parser (Nokogiri, for example)? Which way is better? Thanks!

Why don't you use a capture group in your pattern? E.g. /http[s]?:\/\/(.+)/i — then the first group is already the link you searched for.

Using regular expressions is fine for a quick and dirty script, but an HTML parser such as Nokogiri is the more robust choice.
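To illustrate the capture-group suggestion: a single `scan` with a group pulls out just the URL, so the separate `gsub` pass is unnecessary. This is a minimal sketch; the sample HTML and the `links` variable are illustrative, not from the question.

```ruby
# Illustrative sample input (not from the original question).
html = <<~HTML
  <a href="https://example.com/a">A</a>
  <a href='http://example.org/b'>B</a>
  <a href="/relative/c">C</a>
HTML

# One scan with a capture group: the group matches everything after
# href=" or href=' up to the closing quote, so no gsub is needed.
# Relative links are skipped because they don't start with http(s)://.
links = html.scan(/href=["'](https?:\/\/[^"']+)["']/i).flatten

puts links
```

Note that `scan` returns an array of one-element arrays when the pattern has a group, hence the `flatten`. Bounding the group with `[^"']+` (rather than the greedy `.+` from the comment) keeps the match from running past the closing quote.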