Why are static analysis tools missing this seemingly obvious case?

大兔子大兔子 提交于 2020-01-02 05:48:28

问题


I have a very simple C program with a potential buffer overflow using strcpy:

#include <string.h>
#include <stdio.h>

void buffer_overflow(char* dst, const char* src)
{
        strcpy(dst, src);
}

int main(int argc, char** argv)
{
        if(argc == 2)
        {
                char buffer[16] = {0};
                buffer_overflow(buffer, argv[1]);
                printf("[%d]: %s", (int)strlen(buffer), buffer);
        }

        return 0;
}

Neither clang static analyzer (using scan-build gcc -O0 -g3 -gdwarf-2) nor cppcheck (using cppcheck --enable=warning,style) find this as an issue.

Am I just asking too much from my static analysis tools?


回答1:


I can't speak for the quality of "your" static analysis tools.

Here's a dynamic analysis tool, CheckPointer, from my company that finds the problem(s) with your code (which I tested as "buggy.c"):

C:\DMS\Domains\C\GCC4\Tools\CheckPointer\Example>runexample
RunExample.cmd 1.2: Batch file to execute C CheckPointer example
Copyright (C) 2011-2016 Semantic Designs; All Rights Reserved
c:\DMS\Domains\C\GCC4\Tools\CheckPointer\Example\Source\buggy.c
*** Instrument source code for memory access checking
Copyright (C) 2011 Semantic Designs; All Rights Reserved
C~GCC4 CheckPointer Version 1.2.1001
Copyright (C) 2011-2016 Semantic Designs, Inc; All Rights Reserved; SD Confidential
Powered by DMS (R) Software Reengineering Toolkit
*** Unregistered CheckPointer Version 1.2
*** Operating with evaluation limits.
Parsing source file "C:/Users/idbaxter/AppData/Local/Temp/DMS/Domains/C/GCC4/Tools/CheckPointer/Example/Source/buggy.c" using encoding CP-1252 +CRLF $^J $^M $^e -1 +8 ...
Writing target file "C:/Users/idbaxter/AppData/Local/Temp/DMS/Domains/C/GCC4/Tools/CheckPointer/Example/Target/buggy.c" using encoding CP-1252 +CRLF $^J $^M $^e -1 +8 ...
*** Compiling sources with memory access checking code
gcc.exe -I"c:\DMS\Domains\C\GCC4\Tools\CheckPointer" -I.\Target -obuggy.exe Target\buggy.c ...


C:\DMS\Domains\C\GCC4\Tools\CheckPointer\Example\Source>C:/Users/idbaxter/AppData/Local/Temp/DMS/Domains/C/GCC4/Tools/CheckPointer/Example\buggy.exe foo
[3]: foo

C:\DMS\Domains\C\GCC4\Tools\CheckPointer\Example\Source>C:/Users/idbaxter/AppData/Local/Temp/DMS/Domains/C/GCC4/Tools/CheckPointer/Example\buggy.exe 0123456789ABCDE
[15]: 0123456789ABCDE

C:\DMS\Domains\C\GCC4\Tools\CheckPointer\Example\Source>C:/Users/idbaxter/AppData/Local/Temp/DMS/Domains/C/GCC4/Tools/CheckPointer/Example\buggy.exe 0123456789ABCDEF
*** Error: CWE-119: Improper Restriction of Operations within the Bounds of a Memory Buffer
       Dereference of pointer is out of bounds.
  in wrapper function: strcpy
called in function: buffer_overflow, line: 6, file: C:/Users/idbaxter/AppData/Local/Temp/DMS/Domains/C/GCC4/Tools/CheckPointer/Example/Source/buggy.c
called in function: main, line: 14, file: C:/Users/idbaxter/AppData/Local/Temp/DMS/Domains/C/GCC4/Tools/CheckPointer/Example/Source/buggy.c
*** Error: CWE-119: Improper Restriction of Operations within the Bounds of a Memory Buffer
       Dereference of pointer is out of bounds.
  in wrapper function: strlen
called in function: main, line: 15, file: C:/Users/idbaxter/AppData/Local/Temp/DMS/Domains/C/GCC4/Tools/CheckPointer/Example/Source/buggy.c
*** Error: CWE-119: Improper Restriction of Operations within the Bounds of a Memory Buffer
       Dereference of pointer is out of bounds.
  in wrapper function: printf
called in function: main, line: 15, file: C:/Users/idbaxter/AppData/Local/Temp/DMS/Domains/C/GCC4/Tools/CheckPointer/Example/Source/buggy.c
[16]: 0123456789ABCDEF
C:\DMS\Domains\C\GCC4\Tools\CheckPointer\Example\Source>



回答2:


This is a fairly open-ended question that doesn't have a right answer. But as someone who has worked on several static analyzers, I'll bite and briefly explain what's hard about this example.

Problem 1: False positives

You and I know that argv[1] could be an arbitrarily long string. (In part that knowledge is derived from the context of being a Stack Overflow question, in which code is usually presented absent any context or a-prior assumptions.) But if a static analyzer reported every strcpy where the source string was not known (by the static analyzer!) to have a bounded length, it would in practice report a very large number of what most developers would consider false positives (FPs): incorrect reports that they don't want to see.

That's because there a very large number of cases where strcpy is used correctly, but the reason it is correct is beyond the reasoning capacity of the analyzer, and might not even be present in the source code at all (e.g., "program X is only ever invoked by program Y, which never passes arguments longer than 80 characters"). In comparison, only a very small fraction of strcpy calls (with arguments the analyzer can't bound) are wrong. Let's be generous and say 10% of them are wrong--that's still a 90% FP rate if we report them all, far beyond what most developers will tolerate.

When a tool reports too many FPs, most developers quickly stop using them, at which point it's providing no value. Consequently, most static analyses choose to limit what they report to cases where the tool is fairly confident, at least by default.

Problem 2: Interprocedural analysis

Even for a tool that wants to report this (as you say Cppcheck does, when the strcpy is directly in main), the second problem is understanding what buffer_overflow does. It is impractical for a static analyzer to simply "inline" the contents of callees into callers arbitrarily deeply because the resulting AST would be enormous and the number of paths astronomical. Consequently, analyzers generally summarize callee behavior. The exact form of those summaries, and the algorithms that compute them, are the subject of active academic research and closely guarded trade secrets.

The behavior of strcpy is actually fairly complex compared to what a typical function summary can express. There is a size involved, but that size is derived by examining the contents of the data one of the pointers points at, specifically the location of the first NUL byte. Then that size affects what happens to the things pointed at by both arguments. That's a lot to encode in a summary in a general and scalable way, so most tools don't. As a result, the tool has only a very crude understanding of what buffer_overflow does, typically too crude to allow it to confidently report a defect here.



来源:https://stackoverflow.com/questions/55069152/why-are-static-analysis-tools-missing-this-seemingly-obvious-case

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!