Getting the website title from a link in a string

Submitted by 时光怂恿深爱的人放手 on 2019-12-11 03:38:55

Question


string: "Here is the badges, https://stackoverflow.com/badges bla bla bla"

If a string contains a link (see above), I want to get the website title of that link.

It should return: Badges - Stack Overflow.

How can I do that?

Thanks.


Answer 1:


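#!/usr/bin/perl
use strict;
use warnings;

use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
$ua->timeout(10);    # give up after 10 seconds
$ua->env_proxy;      # honor proxy settings from the environment

my $response = $ua->get('http://search.cpan.org/');

if ($response->is_success) {
    # title() returns the <title> that LWP parsed from the HTML head
    print $response->title(), "\n";
}
else {
    die $response->status_line;
}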

See LWP::UserAgent. Cheers :-)
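
The snippet above fetches a fixed URL, while the question starts from a link embedded in a string. A minimal sketch of joining the two steps follows. The helper names first_url and page_title and the URL pattern are assumptions made for illustration, not part of the original answer, and the pattern is deliberately simple rather than a full URI parser.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: return the first http(s) URL in $text,
# or nothing if the text contains no such URL.
sub first_url {
    my ($text) = @_;
    my ($url) = $text =~ m{(https?://\S+)} or return;
    # Drop trailing punctuation that usually belongs to the sentence.
    $url =~ s/[.,;:!?)\]]+\z//;
    return $url;
}

# Hypothetical helper: fetch $url and return its <title>,
# or undef if the request fails or no title was parsed.
sub page_title {
    my ($url) = @_;
    my $ua = LWP::UserAgent->new(timeout => 10);
    $ua->env_proxy;
    my $response = $ua->get($url);
    return $response->is_success ? $response->title : undef;
}

# Only touch the network when a string is passed on the command line.
if (@ARGV) {
    my $url = first_url($ARGV[0]) or die "No URL found\n";
    my $title = page_title($url);
    print defined $title ? "$title\n" : "No title for $url\n";
}
```

Run it with the string as the single argument, for example: perl title.pl "Here is the badges, https://stackoverflow.com/badges bla bla bla". The title printed depends on what the site serves at that moment.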




Answer 2:


I use URI::Find::Simple's list_uris method and URI::Title for this.
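
A short sketch of how those two modules might be combined, assuming both are installed from CPAN. The variable names and the fallback message are illustrative; list_uris and title are the functions the modules export.

```perl
use strict;
use warnings;
use URI::Find::Simple qw(list_uris);
use URI::Title qw(title);

my $text = 'Here is the badges, https://stackoverflow.com/badges bla bla bla';

# list_uris returns every URI found in the text, in order.
my @uris = list_uris($text);

for my $uri (@uris) {
    # title() fetches the page and returns its title, or undef on failure.
    my $title = title($uri);
    print defined $title
        ? "$uri => $title\n"
        : "$uri => (no title found)\n";
}
```

This needs network access to resolve the title, and URI::Title can return undef when a page cannot be fetched, so the sketch checks for that before printing.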




Answer 3:


Depending on how the link is given and how you define the title, you need one approach or another.

In the exact scenario you have presented, you can get the URL with URI::Find, HTML::LinkExtractor, etc., and then my $title = URI->new($link)->path() gives you the title (here the path, /badges) alongside the link.

But if the website title is the link text, as in <a href="https://stackoverflow.com/badges"> badged</a>, then the question "How can I extract URL and link text from HTML in Perl?" will give you the answer.

If the title is encoded in the link itself and the link is the text itself of the link, how do you define the title?

  1. Do you want the last bit of the URI before any query? What happens with the queries set as URL paths?
  2. Do you want the part between the host and the query?
  3. Do you want to parse the link source and retrieve the title tag if any?

As always, going from a trivial first implementation to one that covers all the corner cases is a daunting task ;-)
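
The first two definitions in the list can be sketched with the URI module alone, without fetching anything. The helper names path_part and last_segment are hypothetical, chosen for this illustration.

```perl
use strict;
use warnings;
use URI;

# Option 2: the part between the host and the query string.
sub path_part {
    my ($link) = @_;
    return URI->new($link)->path;
}

# Option 1: the last non-empty path segment before any query.
sub last_segment {
    my ($link) = @_;
    my ($last) = grep { length } reverse URI->new($link)->path_segments;
    return $last;
}

my $link = 'https://stackoverflow.com/badges';
print path_part($link),    "\n";    # /badges
print last_segment($link), "\n";    # badges
```

Neither helper says anything about the page itself; for the third definition (the title tag of the linked page) the document has to be downloaded and parsed.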



Source: https://stackoverflow.com/questions/5532584/getting-the-website-title-from-a-link-in-a-string
