Fastest way to recursively clean HTML files on a partition?

﹥>﹥吖頭↗ submitted on 2019-12-12 04:06:50

Question


I need a script that cleans HTML files, i.e. deletes everything after the </HTML> tag, for all files recursively on a partition. The goal is to recover web server content after a virus has injected code into many HTML files.


Answer 1:


You've tagged this question "vbscript" and "wscript", which I know nothing about. If you have access to a Unix or Linux system, though, you may be able to use this command line:

find /path/to/root -type f -exec grep -qi '</html>' {} \; -exec sed -n -i '' -e 's|</html>.*|</html>|I;1,/<\/html>/Ip' {} \;

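Check your sed documentation for the -i option before running this. BSD sed (FreeBSD, macOS) takes the backup suffix as a separate, mandatory argument, which may be empty (-i ''). GNU sed expects the suffix attached to the flag, so use plain -i there, or -i.bak to keep backups. This works for me on FreeBSD. One caveat: if </html> already appears on line 1, the 1,/<\/html>/ range does not end there. sed only starts looking for the end address on the following line, so later lines are still printed up to the next line containing </html>. GNU sed accepts 0,/<\/html>/I to cover that case.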

Always back up your data before trying anything like this.
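If the -i differences get in the way, you can express the same idea without in-place editing. The sketch below is not from this answer. It assumes POSIX sh, find and awk, and the function names cut_after_html and clean_tree are made up for illustration. It also restricts itself to .htm/.html files and rewrites only files that contain </html>.

```shell
#!/bin/sh
# Portable variant of the find/sed idea that avoids "sed -i" entirely, so it
# behaves the same with GNU and BSD tools. Names are illustrative only.

# cut_after_html FILE: keep everything up to and including the first </html>
# (any letter case) and drop the rest. i + 6 is the position of the ">" that
# closes the 7-character tag. Output goes to a temp file and is copied back
# with "cat >", which keeps the file's original permissions.
cut_after_html() {
    awk '
        found { next }
        {
            i = index(tolower($0), "</html>")
            if (i) { print substr($0, 1, i + 6); found = 1 }
            else print
        }
    ' "$1" > "$1.tmp" && cat "$1.tmp" > "$1" && rm -f "$1.tmp"
}

# clean_tree DIR: visit *.htm / *.html files (any case) below DIR and clean
# those that contain </html>. Assumes file names without embedded newlines.
clean_tree() {
    find "$1" -type f \( -name '*.[hH][tT][mM]' -o -name '*.[hH][tT][mM][lL]' \) |
    while IFS= read -r f; do
        if grep -qi '</html>' "$f"; then
            cut_after_html "$f"
        fi
    done
}
```

Try it on a copy of the site first, e.g. clean_tree /path/to/copy, and diff the result against the original.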




Answer 2:


Start with some top-level test code:

  Dim aTests : aTests = Array( _
      Array( "", "", "" ) _
    , Array( "<html></html>junk", "</html>", "<html></html>" ) _
  )
  Dim aTest
  For Each aTest In aTests
      WScript.Echo qq(aTest(0))
      WScript.Echo qq(aTest(1))
      WScript.Echo qq(cutTail(aTest(0), aTest(1)))
      WScript.Echo CStr(aTest(2) = cutTail(aTest(0), aTest(1)))
      WScript.Echo
  Next

' qq: wrap a string in double quotes so empty strings are visible in the output
Function qq(s)
  qq = """" & s & """"
End Function

A function that could solve your first subtask, cleaning a string:

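' Return sTxt up to and including the first match of sFnd (case-insensitive);
' return sTxt unchanged if sFnd does not occur.
Function cutTail(sTxt, sFnd)
  cutTail = sTxt
  Dim nPos : nPos = Instr(1, sTxt, sFnd, vbTextCompare)
  If 0 < nPos Then cutTail = Left( sTxt, nPos + Len(sFnd) - 1)
End Function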
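For readers on Unix, here is a rough shell counterpart of cutTail. It is a hypothetical cut_tail function that filters stdin to stdout instead of taking a string, and it assumes POSIX awk. Like cutTail, it keeps everything through the first match, ignores letter case, and passes input through unchanged when the marker is missing.

```shell
#!/bin/sh
# cut_tail MARKER: copy stdin to stdout, stopping right after the first
# occurrence of MARKER (letter case ignored, like vbTextCompare). Input that
# lacks the marker passes through unchanged, mirroring the "If 0 < nPos" guard.
# Note: awk -v interprets backslash escapes, so keep MARKER free of backslashes.
cut_tail() {
    awk -v mark="$1" '
        found { next }
        {
            i = index(tolower($0), tolower(mark))
            if (i) { print substr($0, 1, i + length(mark) - 1); found = 1 }
            else print
        }
    '
}
```

Usage: cut_tail '</html>' < infected.html > cleaned.html (the file names are placeholders). The same table of test cases used for cutTail can be run against it.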

Write a bare-bones Sub to traverse a folder tree and call a "do what I want" Sub for each file found:

Sub walkDirs(oDir, fFile)
  Dim oItem
  For Each oItem In oDir.Files
      fFile oItem
  Next
  For Each oItem In oDir.SubFolders
      walkDirs oItem, fFile
  Next
End Sub
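Here is the same traversal in POSIX sh, as a sketch. walk_dirs is a hypothetical name, and the worker is passed by function name, playing the role of the GetRef reference. In practice find(1) already does this walk; the function only mirrors the structure of walkDirs.

```shell
#!/bin/sh
# walk_dirs DIR WORKER: call the shell function (or command) named WORKER
# once per regular file below DIR, a folder's files first, then its
# subfolders, like walkDirs. The body is a ( ... ) subshell because POSIX sh
# has no "local"; each recursion level gets its own $dir and $item.
walk_dirs() (
    dir=$1 worker=$2
    # Files in this folder; the extra patterns pick up dot-files, and
    # patterns that match nothing fail the -f test and are skipped.
    for item in "$dir"/* "$dir"/.[!.]* "$dir"/..?*; do
        if [ -f "$item" ]; then
            "$worker" "$item"
        fi
    done
    # Then recurse into subfolders, skipping symlinks to avoid loops.
    for item in "$dir"/* "$dir"/.[!.]* "$dir"/..?*; do
        if [ -d "$item" ] && [ ! -L "$item" ]; then
            walk_dirs "$item" "$worker"
        fi
    done
)
```

For example, with just_print() { printf 'Processing: "%s"\n' "$1"; } defined, walk_dirs ../data just_print lists every file under ../data.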

Test-drive walkDirs with a trivial worker Sub:

  Dim goFS  : Set goFS  = CreateObject("Scripting.FileSystemObject")
  Dim sRDir : sRDir     = "..\data"
  Dim fFile : Set fFile = GetRef("justPrint")
  walkDirs goFS.GetFolder(sRDir), fFile

Sub justPrint(oFile)
  WScript.Echo "Processing:", qq(oFile.Path)
End Sub

Write a 'first attempt' version of a worker Sub that cleans a file:

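Sub cleanHtml(oFile)
  Const ForReading = 1, ForWriting = 2 ' not predefined in plain .vbs files
  WScript.Echo "Processing:", qq(oFile.Path)
  Dim oTS : Set oTS = oFile.OpenAsTextStream(ForReading)
  Dim sAll : sAll = cutTail(oTS.ReadAll(), "</html>") : oTS.Close
  Set oTS = oFile.OpenAsTextStream(ForWriting)
  oTS.Write sAll : oTS.Close
End Sub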
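One way to make a worker less risky, sketched in shell: run a filter over the file into a temporary file, and leave the file untouched if nothing changed. Otherwise, keep a .bak copy before overwriting. safe_rewrite is a hypothetical helper, not part of this answer; it assumes POSIX sh, cmp and cp.

```shell
#!/bin/sh
# safe_rewrite FILE CMD [ARG...]: run CMD as a filter (stdin -> stdout) over
# FILE. FILE is left untouched when the output is identical; otherwise the
# original is saved as FILE.bak and FILE is overwritten in place (keeping
# its permissions). If CMD fails, nothing is changed.
safe_rewrite() {
    f=$1
    shift
    tmp="$f.tmp.$$"
    if ! "$@" < "$f" > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    if cmp -s "$f" "$tmp"; then
        rm -f "$tmp"                        # already clean: no write, no backup
        return 0
    fi
    cp -p "$f" "$f.bak" && cat "$tmp" > "$f"   # backup first, then overwrite
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

CMD is any program that reads the page on stdin and writes the cleaned page to stdout. Because the backups end in .bak, an extension filter on .htm/.html will not pick them up on a second run.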

Run cleanHtml on a test folder with a representative sample of files, and look for problems:

Will cutTail fail for data like:

, Array( "<html></html>", "</HTml>", "<html></html>" ) _
, Array( "<html><!--</html>-->keep</html>junk", "</HTml>", "<html><!--</html>-->keep</html>" ) _
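The second case shows the weakness of cutting at the first match: cutTail would return just "<html><!--</html>", dropping "-->keep</html>". One possible heuristic, not taken from the answer, is to cut after the last occurrence instead. Here is a shell sketch assuming POSIX awk; cut_tail_last is a hypothetical name.

```shell
#!/bin/sh
# cut_tail_last MARKER: like a cut-at-first-match filter, but cuts after the
# LAST occurrence of MARKER (case-insensitive), so a </html> inside an earlier
# HTML comment does not end the page early. Trade-off: injected code that
# itself ends with </html> is kept. MARKER must not be empty. Reads the whole
# input into memory.
cut_tail_last() {
    awk -v mark="$1" '
        { line[NR] = $0 }
        END {
            m = tolower(mark)
            # Walk backwards to the last line that contains the marker and
            # find the position ("last") of its final occurrence on that line.
            for (n = NR; n >= 1; n--) {
                s = tolower(line[n]); off = 0; last = 0
                while ((i = index(substr(s, off + 1), m)) > 0) {
                    last = off + i
                    off = last
                }
                if (last) break
            }
            if (n < 1) {                      # no marker: pass input through
                for (k = 1; k <= NR; k++) print line[k]
                exit
            }
            for (k = 1; k < n; k++) print line[k]
            print substr(line[n], 1, last + length(mark) - 1)
        }
    '
}
```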

Will the traversal fail because of security restrictions (folders or files you are not allowed to read)?

Will your script clobber .js, .css, or .jpg files?
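For the last question, a simple guard is to check the extension before cleaning. Here is a sketch in POSIX sh. is_html_file is a hypothetical name, and the list of extensions is an assumption; add .shtml, .php and so on if your site uses them.

```shell
#!/bin/sh
# is_html_file PATH: succeed only for names ending in .htm or .html, in any
# letter case, so .js, .css, .jpg and .bak files are left alone.
is_html_file() {
    case $1 in
        *.[hH][tT][mM] | *.[hH][tT][mM][lL]) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, call the cleaner on a file only when is_html_file "$f" succeeds.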



Source: https://stackoverflow.com/questions/9453161/fastest-way-to-recursively-clean-html-files-on-a-partition
