Limiting results in PROC SQL

一世执手 posted on 2019-11-28 19:09:27

Question


I am trying to use PROC SQL to query a DB2 table with hundreds of millions of records. During the development stage, I want to run my query on an arbitrarily small subset of those records (say, 1000). I've tried using INOBS to limit the observations, but I believe that this parameter is simply limiting the number of records which SAS is processing. I want SAS to only fetch an arbitrary number of records from the database (and then process all of them).

If I were writing a SQL query myself, I would simply use SELECT * FROM x FETCH FIRST 1000 ROWS ONLY ... (the equivalent of SELECT TOP 1000 * FROM x in SQL Server). But PROC SQL doesn't seem to have any option like this. It's taking an extremely long time to fetch the records.

The question: How can I instruct SAS to arbitrarily limit the number of records returned from the database?

I've read that PROC SQL uses ANSI SQL, which doesn't have any specification for a row-limiting keyword. Perhaps SAS didn't feel like making the effort to translate its SQL syntax to vendor-specific keywords? Is there no workaround?


Answer 1:


When SAS talks to a database via SAS syntax, part of the query can be translated into the equivalent DBMS language; this is called implicit pass-through. The rest of the query is "post-processed" by SAS to produce the final result. Which parts of SAS syntax are considered compatible with the DBMS, and are therefore sent to the DBMS instead of being executed by SAS, depends on the SAS version, the DBMS vendor and version, and in some cases even on connection/LIBNAME options.

I've worked a lot with MS SQL and Oracle via different versions of SAS, and I have never seen the PROC SQL options INOBS and OUTOBS translated into TOP xxx-style queries. So this is probably not supported yet, although it should be quite doable when the query touches only DBMS data (no joins to SAS data, etc.).

So I think you're left with so-called explicit pass-through: specific SAS SQL syntax for connecting to the database. This type of query looks like this:

proc sql;
    connect to oracle as db1 (user=user1 pw=pasw1 path=DB1);
    create table test_table as
    select *
    from connection to db1
        ( /* here we're in Oracle */
            select * from test.table1 where rownum < 20
        )
    ;
    disconnect from db1;
quit;

In SAS 9.3 the syntax can be simplified - if there's already a LIBNAME connection, you can reuse it for explicit pass-through:

LIBNAME ORALIB ORACLE user=...;

PROC SQL;
connect using ORALIB;
create table work.test_table as
        select *
        from connection to ORALIB (
....

When connecting through a LIBNAME, be sure to set the READBUFF option for reading (I usually use 5000 or so) and the INSERTBUFF option (1000 or more) when loading data into the database.
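As a minimal sketch of where those buffer options go, the LIBNAME statement below reuses the placeholder credentials from the first example (user1, pasw1, DB1), which are assumptions and not real values. The buffer sizes are just the figures mentioned above; the best values depend on row width and network latency.

```sas
/* Placeholder connection values taken from the earlier example; replace with your own. */
libname oralib oracle user=user1 pw=pasw1 path=DB1
        readbuff=5000      /* rows fetched per round trip when reading from the DBMS */
        insertbuff=1000;   /* rows sent per round trip when inserting into the DBMS */
```

Both options affect only how rows are batched between SAS and the DBMS; they do not limit how many rows a query returns.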

To see if implicit pass-through takes place, set sastrace option:

option sastrace=',,,ds' sastraceloc=saslog nostsuffix;



Answer 2:


Have you tried using the outobs option in your proc sql?

For example,

proc sql outobs=10;
    create table test as
    select * from schema.HUGE_TABLE
    order by n;
quit;

Alternatively, you can use SQL passthrough to write a query using DB2 syntax (FETCH FIRST 10 ROWS ONLY), although this requires you to store all your data in the database, at least temporarily.

Passthrough looks something like this:

proc sql;
    connect to db2 (user=&userid. password=&userpw.  database=MY_DB);

    create table test as
    select * from connection to db2 (
        select * from schema.HUGE_TABLE
        order by n
        FETCH FIRST 10 ROWS ONLY
    );
quit;

It requires more syntax and can't access your SAS datasets, so if outobs works for you, I would recommend that.



Source: https://stackoverflow.com/questions/11746815/limiting-results-in-proc-sql
