Python sas7bdat module usage

温柔的废话 2021-02-20 12:41

I have to dump data from SAS datasets. I found a Python module called sas7bdat.py that says it can read SAS .sas7bdat datasets, and I think it would be simpler and more straightforward…

4 Answers
  • 2021-02-20 13:00

    As time passes, solutions become easier. I think this one is easiest if you want to work with pandas:

    import pandas as pd
    df = pd.read_sas('/support/sas/locked_data.sas7bdat')
    

    Note that it is easy to get a NumPy array with df.values (newer pandas versions recommend df.to_numpy() instead).
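    A slightly fuller sketch, assuming a reasonably recent pandas. With the sas7bdat format, read_sas returns text columns as raw bytes unless you pass encoding=, and passing chunksize= returns a reader that yields DataFrames, which helps with large files. The latin-1 default and the helper names read_sas_in_chunks and decode_bytes_columns are assumptions for illustration; use whatever encoding your SAS session actually wrote.

```python
import pandas as pd


def read_sas_in_chunks(path, encoding="latin-1", chunksize=100_000):
    """Hypothetical helper: read a .sas7bdat file chunk by chunk into one DataFrame.

    The latin-1 default is an assumption; match your data's real encoding.
    """
    # With chunksize set, read_sas returns a reader that yields DataFrames
    reader = pd.read_sas(path, format="sas7bdat", encoding=encoding,
                         chunksize=chunksize)
    try:
        return pd.concat(reader, ignore_index=True)
    finally:
        reader.close()


def decode_bytes_columns(df, encoding="latin-1"):
    """Hypothetical helper: decode bytes values left by read_sas(encoding=None)."""
    out = df.copy()
    for col in out.select_dtypes(include="object").columns:
        # Only bytes are decoded; other values pass through unchanged
        out[col] = out[col].map(
            lambda v: v.decode(encoding) if isinstance(v, bytes) else v
        )
    return out
```

    For example, ds = read_sas_in_chunks('/support/sas/locked_data.sas7bdat') followed by arr = ds.to_numpy() gives you the NumPy array.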

  • 2021-02-20 13:02

    Personally, I think the better approach would be to export the data using SAS and then process the external file as needed in Python.

    In SAS, you can do this...

    libname datalib "/support/sas";
    filename sasdump "/support/textfiles/locked_data.txt";
    
    proc export
        data = datalib.locked_data
        outfile = sasdump
        dbms = tab
        label
        replace;
    run;
    

    The downside is that, although the LABEL option makes the header use the column labels rather than the variable names, the labels are enclosed in double quotes. When processing the file in Python, you may need to remove them programmatically if they cause a problem. I hope that helps, even though it doesn't use Python the way you wanted.
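    On the Python side, a minimal sketch for reading that tab-delimited export with the standard csv module. csv's default quotechar is the double quote, so quoted header labels come back without their quotes; the extra strip('"') only guards against stray quotes. The helper name read_sas_export and the utf-8 default are assumptions; a SAS export may well use a Latin-1/Windows encoding instead.

```python
import csv


def read_sas_export(path, encoding="utf-8"):
    """Hypothetical helper: read a tab-delimited PROC EXPORT file into dicts.

    The encoding default is an assumption; check what your SAS session wrote.
    """
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.reader(f, delimiter="\t")
        # csv already removes the quotes around quoted fields; strip any leftovers
        header = [name.strip().strip('"') for name in next(reader)]
        # One dict per data row, keyed by the cleaned column labels
        return [dict(zip(header, row)) for row in reader]
```

    Every value comes back as a string, so convert numeric columns yourself (or load the same file with pandas.read_csv(path, sep="\t")).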

  • 2021-02-20 13:12

    I know I'm late to answer, but in case someone searches for a similar question, the best option is:

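    from sas7bdat import SAS7BDAT

    # SAS7BDAT can be used as a context manager, so the file gets closed afterwards
    with SAS7BDAT('/support/sas/locked_data.sas7bdat') as foo:
        # This converts to a DataFrame (requires pandas to be installed):
        ds = foo.to_data_frame()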
    
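    If you don't want pandas at all, you can iterate over the reader instead. This is a sketch under an assumption: as far as I know, iterating a SAS7BDAT object yields the column names as the first row (unless the module's skip_header option is set), followed by the data rows. Both helper names below are hypothetical.

```python
def rows_to_records(rows):
    """Turn an iterable whose first item is the header row into a list of dicts."""
    it = iter(rows)
    header = next(it, None)
    if header is None:  # empty input: no header, no records
        return []
    return [dict(zip(header, row)) for row in it]


def read_records(path):
    """Hypothetical helper: stream a .sas7bdat file into dicts without pandas."""
    from sas7bdat import SAS7BDAT  # imported lazily; only needed when called
    with SAS7BDAT(path) as reader:
        # Assumption: the first row yielded is the column names
        return rows_to_records(reader)
```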
  • 2021-02-20 13:13

    This is only a partial answer, as I've found no easy-to-read, concrete documentation.

    You can view the source code here

    This shows some basic info regarding what arguments the methods require, such as:

    • readColumnAttributes(self, colattr)
    • readColumnLabels(self, collabs, coltext, colcount)
    • readColumnNames(self, colname, coltext)

    I think most of what you are after is stored in the header object that is created when you open a file with SAS7BDAT. If you just print it, you'll get a lot of information, but you can also access its attributes directly. Most of what you're looking for is probably under foo.header.cols. I suspect the various header attributes are what you pass as parameters to the methods you mention.

    Maybe something like this will get you closer?

    from sas7bdat import SAS7BDAT
    foo = SAS7BDAT(inFile)  # your file here...

    # Python 3 syntax; header.cols may not exist in newer releases of the module
    for i in foo.header.cols:
        print('"Attributes"', i.attr)
        print('"Labels"', i.label)
        print('"Name"', i.name)
    

    Edit: Unrelated to this specific question, but the built-in type() and dir() functions come in handy when trying to figure out what is going on in an unfamiliar class or library.
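    A small sketch of that tip: combine dir() to list an object's names with type() to see what each one holds. The helper name public_attributes is hypothetical; you might call it on foo.header or on one of the column objects.

```python
def public_attributes(obj):
    """Map each public attribute name of obj to the type name of its value."""
    summary = {}
    for name in dir(obj):  # dir() lists attribute and method names
        if name.startswith("_"):  # skip private and dunder names
            continue
        try:
            value = getattr(obj, name)
        except Exception:  # some attributes raise when accessed; skip them
            continue
        summary[name] = type(value).__name__  # e.g. 'str', 'int', 'method'
    return summary
```

    For example, for col in foo.header.cols: print(public_attributes(col)) would show which fields each column object carries.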
