Question
I need a script for cleaning HTML files, i.e. one that deletes everything after the closing </HTML> tag, for all files recursively on a partition. The use case is recovering web server content after a virus has injected code into multiple HTML files.
Answer 1:
You've tagged this question "vbscript" and "wscript", which I know nothing about, but if you have access to a Unix or Linux system, you might be able to use this command line:
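find /path/to/root -type f -exec grep -qi '</html>' {} \; -exec sed -i '' -n -e 's|</html>.*|</html>|I;1,/<\/html>/Ip' {} \;
Check how your sed handles the -i option before running this. FreeBSD sed takes the backup suffix as a separate argument (an empty '' means no backup), while GNU sed expects the suffix attached to -i, so there you would write plain -i. This works for me in FreeBSD.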
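One way to sidestep the -i difference is a small wrapper that picks the right form at run time. This is only a sketch: sed_inplace is a hypothetical helper name, not part of the answer above, and it assumes that a sed which accepts --version is GNU-style while one that rejects it is BSD-style.

```shell
# Hypothetical helper: run sed in place without keeping a backup file.
# Assumption: only GNU-style sed accepts --version; BSD sed rejects it.
sed_inplace() {
    if sed --version >/dev/null 2>&1; then
        sed -i "$@"        # GNU form: suffix (none here) is attached to -i
    else
        sed -i '' "$@"     # BSD form: suffix is a separate, here empty, argument
    fi
}
```

With this in place, the second -exec could call a script that uses sed_inplace instead of hard-coding one -i form.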
Always back up your data before trying anything like this.
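The backup step can also be built into the clean-up itself. What follows is a minimal sketch rather than the method above: it swaps sed for awk so that no -i option is involved, only touches .html and .htm files, and keeps a FILE.bak copy of every file it changes. The names clean_html_file and clean_tree are hypothetical. It assumes mktemp and find's -iname are available (common, but not POSIX), and it does not handle paths that contain newlines.

```shell
# Truncate one file after the first closing </html> tag (case-insensitive),
# keeping a one-time backup copy next to it as FILE.bak.
clean_html_file() {
    f=$1
    # Leave files without the tag untouched.
    grep -qi '</html>' "$f" || return 0
    tmp=$(mktemp) || return 1
    awk '{
        pos = index(tolower($0), "</html>")
        if (pos > 0) {
            # Keep the line up to and including the 7-character tag, then stop.
            print substr($0, 1, pos + 6)
            exit
        }
        print
    }' "$f" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Back up only once so a second run cannot overwrite the original copy.
    [ -e "$f.bak" ] || cp -p "$f" "$f.bak" || { rm -f "$tmp"; return 1; }
    # Overwrite in place (rather than mv) to keep ownership and permissions.
    cat "$tmp" > "$f"
    rm -f "$tmp"
}

# Walk a directory tree and clean every .html/.htm file found.
clean_tree() {
    find "$1" -type f \( -iname '*.html' -o -iname '*.htm' \) -print |
    while IFS= read -r f; do
        clean_html_file "$f"
    done
}
```

Calling clean_tree /path/to/root would then process the whole tree. Try it on a copy of the data first.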
Answer 2:
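Start with top-level test code:
' Each test case: input text, tag to search for, expected result.
Dim aTests : aTests = Array( _
    Array( "", "", "" ) _
  , Array( "<html></html>junk", "</html>", "<html></html>" ) _
)
Dim aTest
For Each aTest In aTests
    WScript.Echo qq(aTest(0))
    WScript.Echo qq(aTest(1))
    WScript.Echo qq(cutTail(aTest(0), aTest(1)))
    WScript.Echo CStr(aTest(2) = cutTail(aTest(0), aTest(1)))
    WScript.Echo
Next

' qq is not defined in the original post; it is assumed here to be a
' helper that wraps a string in double quotes for display.
Function qq(sTxt)
    qq = """" & sTxt & """"
End Function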
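Then write a function that solves your first subtask, cleaning a string:
Function cutTail(sTxt, sFnd)
    cutTail = sTxt  ' default: return the text unchanged when sFnd is absent
    ' vbTextCompare makes the search case-insensitive
    Dim nPos : nPos = InStr(1, sTxt, sFnd, vbTextCompare)
    ' keep everything up to and including the first occurrence of sFnd
    If 0 < nPos Then cutTail = Left(sTxt, nPos + Len(sFnd) - 1)
End Function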
Write a bare-bones Sub to traverse a folder tree and call a "do what I want" Sub for each file found:
Sub walkDirs(oDir, fFile)
    Dim oItem
    ' process the files in this folder first ...
    For Each oItem In oDir.Files
        fFile oItem
    Next
    ' ... then recurse into each subfolder
    For Each oItem In oDir.SubFolders
        walkDirs oItem, fFile
    Next
End Sub
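Test-drive it with a trivial worker Sub:
' goFS is not defined in the original post; it is assumed to be a global FileSystemObject.
Dim goFS  : Set goFS = CreateObject("Scripting.FileSystemObject")
Dim sRDir : sRDir = "..\data"
Dim fFile : Set fFile = GetRef("justPrint")
walkDirs goFS.GetFolder(sRDir), fFile

Sub justPrint(oFile)
    WScript.Echo "Processing:", qq(oFile.Path)
End Sub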
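Write a 'first attempt' version of a worker Sub that cleans a file:
' FileSystemObject I/O modes; a plain .vbs script does not predefine them.
Const ForReading = 1, ForWriting = 2

Sub cleanHtml(oFile)
    WScript.Echo "Processing:", qq(oFile.Path)
    If 0 = oFile.Size Then Exit Sub  ' ReadAll raises an error on an empty file
    Dim sAll : sAll = cutTail(oFile.OpenAsTextStream(ForReading).ReadAll(), "</html>")
    oFile.OpenAsTextStream(ForWriting).Write sAll
End Sub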
Use it on a test folder with a representative sample set of files. Look for problems:
Will cutTail fail for data like:
, Array( "<html></html>", "</HTml>", "<html></html>" ) _
, Array( "<html><!--</html>-->keep</html>junk", "</HTml>", "<html><!--</html>-->keep</html>" ) _
Will the traversal fail because of security restrictions?
Will your script clobber .js, .css, or .jpg files?
Source: https://stackoverflow.com/questions/9453161/fastest-way-to-recursively-clean-html-files-on-a-partition