Using Python to scrape DataSet and Query data from RDL

有些话、适合烂在心里 提交于 2019-12-23 03:17:19

问题


I set out today with the intent to parse an SSRS RDL file (XML) using Python in order to gather the DataSet and Query data. A recent project has me back tracking on a variety of reports and data sources with the intention of consolidating and cleaning up what we have published.

I was able to use this script to create a CSV file with the following columns: system path|report file name|command type|command text|

It's not very elegant, but it works.

What I'm hoping to be able to do with this post is solicit for any of you experts out there who have either tried this already or are experienced in XML parsing with Python to take a shot at cleaning it up and provided the ability to:

  • Include Headers, which would be XML tags
  • Include DataSet name in the column
  • Deliver results into single file

Here is the full code in my "rdlparser.py" file:

import sys, os

from xml.dom import minidom
xmldoc = minidom.parse(sys.argv[1])

content = ""
TargetFile = sys.argv[1].split(".", 1)[0] + ".csv"
numberOfQueryNodes = 0

queryNodes = xmldoc.getElementsByTagName('Query')
numberOfQueryNodes = queryNodes.length -1


while (numberOfQueryNodes > -1):
    content = content + os.path.abspath(sys.argv[1])+ '|'+ sys.argv[1].split(".", 1)[0]+ '|' 
    outputNode = queryNodes.__getitem__(numberOfQueryNodes)
    children = [child for child in outputNode.childNodes if child.nodeType==1]
    numberOfQueryNodes = numberOfQueryNodes - 1
    for node in children:
        if node.firstChild.nodeValue != '\n          ':
            if node.firstChild.nodeValue != 'true':
                content = content + node.firstChild.nodeValue + '|'
    content = content + '\n'

fp = open(TargetFile, 'wb')
fp.write(content)
fp.close()

回答1:


I know you asked for Python; but I figured Powershell's built in xml handling capabilities would make this fairly simple. While I'm sure it is not guru level, I think it came out pretty nicely (the lines starting with # are comments):

# The directory to search 
$searchpath = "C:\"

# List all rdl files    from the given search path recusrivley searching sub folders, store results into a variable
$files = gci $searchpath -recurse -filter "*.rdl" | SELECT FullName, DirectoryName, Name 

# for each of the found files pass the folder and file name  and the xml content
$files | % {$Directory = $_.DirectoryName; $Name = $_.Name; [xml](gc $_.FullName)}
            # in the xml content navigate to the the DataSets Element
            | % {$_.Report.DataSets} 
                    # for each query retrieve the Report directory , File Name, DataSource Name, Command Type, Command Text output thwese to a csv file
                    | % {$_.DataSet.Query} | SELECT  @{N="Path";E={$Directory}}, @{N="File";E={$Name}}, DataSourceName, CommandType, CommandText | Export-Csv Test.csv -notype   


来源:https://stackoverflow.com/questions/3206993/using-python-to-scrape-dataset-and-query-data-from-rdl

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!