Question
This code extracts all the href links from a website, given the site's URL.
from BeautifulSoup import BeautifulSoup
import urllib2
import re

html_page = urllib2.urlopen("http://kteq.in/services")
soup = BeautifulSoup(html_page)
for link in soup.findAll('a'):
    if link.get('href') is None:
        continue
    result = re.sub(r"http\S+", "", link.get('href'))
    print result
When I run the above code, it extracts the href links of that website and prints the following output.
index
index
#
solutions#internet-of-things
solutions#online-billing-and-payment-solutions
solutions#customer-relationship-management
solutions#enterprise-mobility
solutions#enterprise-content-management
solutions#artificial-intelligence
solutions#b2b-and-b2c-web-portals
solutions#robotics
solutions#augement-reality-virtual-reality
solutions#azure
solutions#omnichannel-commerce
solutions#document-management
solutions#enterprise-extranets-and-intranets
solutions#business-intelligence
solutions#enterprise-resource-planning
services
clients
contact
#
#
#
#
#
#
#
#contactform
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
index
services
#
contact
#
iOSDevelopmentServices
AndroidAppDevelopment
WindowsAppDevelopment
HybridSoftwareSolutions
CloudServices
HTML5Development
iPadAppDevelopment
services
services
services
services
services
services
contact
contact
contact
contact
contact
#
#
#
#
Now I need to extract the CSS from the pages these href links point to. For example, I need to extract the CSS from the 'index' link that appears in the output. How can I do this?
Answer 1:
You can loop over the href links you have collected and pull the CSS links out of each page.
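import re
import urllib.request
from bs4 import BeautifulSoup

base_link = 'http://kteq.in/'
hrefs = ['index']
for href in hrefs:
    url = base_link + href
    html_page = urllib.request.urlopen(url)
    soup = BeautifulSoup(html_page, 'html.parser')
    css_links = []
    for tag in soup.find_all('link'):
        # Skip <link> tags without an href; re.search raises TypeError on None
        link_href = tag.get('href')
        if link_href is None:
            continue
        # The "." before "css" is unescaped, so it matches any character;
        # this is why "//fonts.googleapis.com/css" also matches
        css_links.append(re.search(r"[A-Za-z0-9:/.-]+.css", link_href))
    for match in css_links:
        if match is None:
            continue
        print(match[0])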
Going through the index page, I got the following CSS links.
Output:
bootstrap/bootstrap.min.css
https://maxcdn.bootstrapcdn.com/font-awesome/4.7.0/css/font-awesome.min.css
https://cdn.linearicons.com/free/1.0.0/icon-font.min.css
//fonts.googleapis.com/css
cards/card.css
GalleryStyle/set1.css
css/custom.css
page-transition/css/component.css
page-transition/css/animations.css
https://cdnjs.cloudflare.com/ajax/libs/normalize/5.0.0/normalize.min.css
https://cdnjs.cloudflare.com/ajax/libs/slick-carousel/1.5.5/slick.min.css
css/scrollpage.css
css/changingtext.css
css/color-slider.css
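Several of the links above are relative (css/custom.css) or protocol-relative (//fonts.googleapis.com/css), so they cannot be fetched as they stand. The following is a minimal sketch of one way to turn them into absolute URLs, not the method used above: it swaps BeautifulSoup and the regex for the standard library's HTMLParser and urljoin so it runs without third-party packages. The names StylesheetCollector and stylesheet_urls, the sample HTML, and the family=Roboto query string are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class StylesheetCollector(HTMLParser):
    """Collects the href of every <link> tag whose rel includes "stylesheet"."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel can hold several space-separated tokens, or be missing entirely
        rel = (attr.get("rel") or "").lower().split()
        if "stylesheet" in rel and attr.get("href"):
            self.hrefs.append(attr["href"])


def stylesheet_urls(html, page_url):
    """Return the stylesheet URLs in html, resolved against page_url."""
    collector = StylesheetCollector()
    collector.feed(html)
    return [urljoin(page_url, href) for href in collector.hrefs]


# Hypothetical sample page; in practice html would be the fetched page source
sample = """
<html><head>
<link rel="stylesheet" href="css/custom.css">
<link rel="stylesheet" href="//fonts.googleapis.com/css?family=Roboto">
<link rel="icon" href="favicon.ico">
</head></html>
"""

for url in stylesheet_urls(sample, "http://kteq.in/index"):
    print(url)
```

To get the CSS text itself, each returned URL could then be fetched with urllib.request.urlopen(url).read(). Filtering on rel="stylesheet" rather than on a ".css" suffix also picks up stylesheets whose URL has no such suffix, such as the Google Fonts link.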
Source: https://stackoverflow.com/questions/51905704/extract-css-from-href-links