Extract CSS from href links

Posted by 放荡痞女 on 2020-01-11 13:21:32

Question


The code below extracts all the href links from a web page, given the page's URL.

import re
import urllib.request

from bs4 import BeautifulSoup

html_page = urllib.request.urlopen("http://kteq.in/services")
soup = BeautifulSoup(html_page, "html.parser")
for link in soup.find_all("a"):
    href = link.get("href")
    if href is None:
        # Skip anchors that have no href attribute.
        continue
    # Blank out absolute URLs so that only relative links are printed.
    result = re.sub(r"http\S+", "", href)
    print(result)

When I run the above code, it extracts the href links from that page and prints the following output:

  index
  index
  #
  solutions#internet-of-things
  solutions#online-billing-and-payment-solutions
  solutions#customer-relationship-management
  solutions#enterprise-mobility
  solutions#enterprise-content-management
  solutions#artificial-intelligence
  solutions#b2b-and-b2c-web-portals
  solutions#robotics
  solutions#augement-reality-virtual-reality
  solutions#azure
  solutions#omnichannel-commerce
  solutions#document-management
  solutions#enterprise-extranets-and-intranets
  solutions#business-intelligence
  solutions#enterprise-resource-planning
  services
  clients
  contact
  #
  #
  #

  #
  #
  #
  #
  #contactform
  #
  #
  #
  #
  #
  #
  #
  #
  # 
  #
  #
  #
  #
  #
  #
  index
  services
  #
  contact
  #
  iOSDevelopmentServices
  AndroidAppDevelopment
  WindowsAppDevelopment
  HybridSoftwareSolutions
  CloudServices
  HTML5Development
  iPadAppDevelopment
  services
  services
  services
  services
  services
  services
  contact
  contact
  contact
  contact
  contact

  #
  #
  #
  #

Now I need to extract the CSS from the pages these href links point to. For example, I need the CSS used by the 'index' page, whose link appears in the output above. How can I do this?


Answer 1:


You can loop through all the href links you have collected and gather the CSS links from each of those pages.

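import re
import urllib.request

from bs4 import BeautifulSoup

base_link = 'http://kteq.in/'
hrefs = ['index']
css_links = []  # collected across all pages
for href in hrefs:
    url = base_link + href
    html_page = urllib.request.urlopen(url)
    soup = BeautifulSoup(html_page, 'html.parser')
    for tag in soup.find_all('link'):
        # Some <link> tags have no href; treat that as an empty string.
        # The unescaped '.' before 'css' matches any character, so URLs
        # ending in '/css' (e.g. Google Fonts) are matched as well.
        css_links.append(re.search(r"[A-Za-z0-9:/.-]+.css", tag.get('href', '')))

for match in css_links:
    if match is None:
        continue
    print(match[0])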

Running the code above against the index page gave me the following CSS links:

Output

bootstrap/bootstrap.min.css
https://maxcdn.bootstrapcdn.com/font-awesome/4.7.0/css/font-awesome.min.css
https://cdn.linearicons.com/free/1.0.0/icon-font.min.css
//fonts.googleapis.com/css
cards/card.css
GalleryStyle/set1.css
css/custom.css
page-transition/css/component.css
page-transition/css/animations.css
https://cdnjs.cloudflare.com/ajax/libs/normalize/5.0.0/normalize.min.css
https://cdnjs.cloudflare.com/ajax/libs/slick-carousel/1.5.5/slick.min.css
css/scrollpage.css
css/changingtext.css
css/color-slider.css
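
Several of the links above are relative (css/custom.css) or protocol-relative (//fonts.googleapis.com/css), so they cannot be fetched as they stand. The sketch below is one possible way to turn them into absolute URLs. It is not part of the original answer. It uses only the standard library (html.parser instead of BeautifulSoup) and filters on rel="stylesheet" instead of a regex. The names StylesheetLinkParser and stylesheet_urls and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class StylesheetLinkParser(HTMLParser):
    """Collect href values of <link rel="stylesheet"> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, e.g. "alternate stylesheet".
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "stylesheet" in rel_tokens and attr.get("href"):
            self.hrefs.append(attr["href"])


def stylesheet_urls(page_url, html):
    """Return absolute stylesheet URLs found in html, resolved against page_url."""
    parser = StylesheetLinkParser()
    parser.feed(html)
    return [urljoin(page_url, href) for href in parser.hrefs]


# Made-up sample markup, not the real kteq.in page.
sample_html = """
<head>
  <link rel="stylesheet" href="css/custom.css">
  <link rel="stylesheet" href="//fonts.googleapis.com/css">
  <link rel="icon" href="favicon.ico">
</head>
"""

for url in stylesheet_urls("http://kteq.in/index", sample_html):
    print(url)
```

Each resulting URL could then be passed to urllib.request.urlopen(url).read() to download the stylesheet text itself, which is what the question ultimately asks for.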



Source: https://stackoverflow.com/questions/51905704/extract-css-from-href-links
