Question
I have created a service that joins, minifies, and compresses CSS references in a CMS. Example:
Before:
<link href="/Files/css1.css" rel="stylesheet" type="text/css"/>
<link href="/Files/css2.css" rel="stylesheet" type="text/css"/>
<link href="/Files/css3.css" rel="stylesheet" type="text/css" media="all"/>
Now you can write:
<link href="/min.ashx?files=/Files/css1.css,/Files/css2.css,/Files/css3.css" rel="stylesheet" type="text/css" />
My next task is to automatically collect all such references in the head section and replace them with a single line, as in the example.
I should only replace those that fall within these rules:
- The href starts with '/Files/', to avoid trying to load external files
- Only links without a media attribute, or with media="all", should be included, as the resulting CSS file will have only one media setting.
I have access to the raw HTML of the page, but I am stuck on reliably locating the references, not knowing whether I should parse it as XML, use regex, or something else.
Can anyone point me in the right direction?
Answer 1:
Use HTML Agility Pack. Rough plan of attack:
Load the HTML content into an HtmlDocument object.
Find the link nodes in the HtmlDocument object via XPath (SelectNodes returns null when nothing matches):
var nodes = doc.DocumentNode.SelectNodes("//head/link[@type='text/css']");
Retrieve the hrefs from those nodes:
string href = nodes[0].Attributes["href"].Value;
Then replace the nodes with the new node.
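The plan above can be sketched roughly as follows. This is only an illustration, assuming the third-party HtmlAgilityPack NuGet package is referenced. The class name HeadCssCombiner and the method Combine are hypothetical; the '/Files/' prefix, the media rule, and the /min.ashx handler URL are taken from the question.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

public static class HeadCssCombiner // hypothetical helper name
{
    // Replaces the local, media-neutral stylesheet links in <head> with a
    // single link to the combining handler and returns the rewritten HTML.
    public static string Combine(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var links = doc.DocumentNode.SelectNodes("//head/link[@type='text/css']");
        if (links == null) return html;

        var targets = links.Where(IsCombinable).ToList();
        if (targets.Count == 0) return html;

        var hrefs = targets.Select(n => n.GetAttributeValue("href", ""));
        var combined = HtmlNode.CreateNode(
            "<link href=\"/min.ashx?files=" + string.Join(",", hrefs) +
            "\" rel=\"stylesheet\" type=\"text/css\" />");

        // Put the combined link where the first original link was,
        // then remove the remaining originals.
        var first = targets[0];
        first.ParentNode.ReplaceChild(combined, first);
        foreach (var node in targets.Skip(1)) node.Remove();

        return doc.DocumentNode.OuterHtml;
    }

    // Rules from the question: href under /Files/, and media absent or "all".
    private static bool IsCombinable(HtmlNode link)
    {
        string href = link.GetAttributeValue("href", "");
        string media = link.GetAttributeValue("media", "all"); // missing media counts as "all"
        return href.StartsWith("/Files/", StringComparison.OrdinalIgnoreCase)
            && media.Trim().Equals("all", StringComparison.OrdinalIgnoreCase);
    }
}
```

Calling HeadCssCombiner.Combine(rawHtml) leaves external links and links with other media values (for example media="print") in place. The exact serialization of the untouched markup depends on HTML Agility Pack's writer options, so compare the output semantically rather than byte for byte.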
Answer 2:
You can find the links that match your rules with regex:
<link href="(/Files/[^"]+)" .* media
It will give you the file path inside the quotes, e.g. '/Files/css1.css'. You can use that result to build up the string you wanted.
C# friendly regex:
@"<link href=""(/Files/[^""]+)"" .* media"
Use the Regex.Match method to get the captured groups: http://msdn.microsoft.com/en-us/library/twcw2f1c.aspx
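One caveat with the pattern above: it only matches tags where href is the first attribute and a media attribute follows on the same line, so links without media (css1.css and css2.css in the question) would be skipped. Regex.Match also returns only the first match. A possible variation, shown here as a sketch and not as the answerer's method, matches each link tag first and then checks href and media separately. The class and method names are hypothetical, and the patterns assume double-quoted attribute values.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CssLinkFinder // hypothetical helper name
{
    // One whole <link ...> tag.
    private static readonly Regex LinkTag =
        new Regex(@"<link\b[^>]*>", RegexOptions.IgnoreCase);
    // Double-quoted href and media attribute values (assumption: no single quotes).
    private static readonly Regex Href =
        new Regex(@"\bhref\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase);
    private static readonly Regex Media =
        new Regex(@"\bmedia\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase);

    // Returns the hrefs that satisfy the question's rules, in document order.
    public static List<string> FindLocalHrefs(string html)
    {
        var hrefs = new List<string>();
        foreach (Match tag in LinkTag.Matches(html))
        {
            Match href = Href.Match(tag.Value);
            if (!href.Success ||
                !href.Groups[1].Value.StartsWith("/Files/", StringComparison.Ordinal))
                continue; // external or missing href

            Match media = Media.Match(tag.Value);
            if (media.Success &&
                !string.Equals(media.Groups[1].Value.Trim(), "all",
                               StringComparison.OrdinalIgnoreCase))
                continue; // a media value other than "all"

            hrefs.Add(href.Groups[1].Value);
        }
        return hrefs;
    }
}
```

The combined URL can then be built with "/min.ashx?files=" + string.Join(",", CssLinkFinder.FindLocalHrefs(html)). This sketch scans the whole input and not just the head section, so it is best applied to the extracted head markup.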
Source: https://stackoverflow.com/questions/12705583/how-can-i-find-and-remove-css-references-in-html-head