Static Analysis for Secure Development
• Introduction
• Static analysis: What, and why?
• Basic analysis
• Example: Flow analysis
• Increasing precision
• Context-, flow-, and path sensitivity
• Scaling it up
• Pointers, arrays, information flow, …
Current Practice for Software Assurance
• Testing
  – Make sure program runs correctly on set of inputs
  – Benefits: Concrete failure proves issue, aids in fix
  – Drawbacks: Expensive, difficult, hard to cover all code paths, no guarantees
[Figure: inputs → program → outputs → oracle ("Is it correct?")]
Current Practice (cont’d)
• Code Auditing
  – Convince someone else your source code is correct
  – Benefit: humans can generalize beyond single runs
  – Drawbacks: Expensive, hard, no guarantees
[Figure: several dense columns of C source code from an SMTP server, marked "???"]
If You’re Worried about Security… A malicious adversary is trying to exploit anything you miss! What more can we do?
Static analysis
• Analyze program’s code without running it
• In a sense, we are asking a computer to do what a human might do during a code review
• Benefit is (much) higher coverage
  – Reason about many possible runs of the program
  – Sometimes all of them, providing a guarantee
  – Reason about incomplete programs (e.g., libraries)
• Drawbacks
  – Can only analyze limited properties
  – May miss some errors, or have false alarms
  – Can be time consuming to run
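The coverage point can be made concrete with a toy example. The following is a minimal sketch, not a real analyzer: it assumes a hypothetical program `lookup` and a hand-written interval abstraction (`Interval`, `mod_interval`, `may_go_out_of_bounds` are illustrative names, not from the slides). A handful of tests pass, while reasoning about the range of every possible index flags the one access that can fail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """All integers between lo and hi, inclusive."""
    lo: int
    hi: int

    def __add__(self, k: int) -> "Interval":
        # Adding a constant shifts both ends of the range.
        return Interval(self.lo + k, self.hi + k)


def mod_interval(m: int) -> Interval:
    """Range of x % m for any Python int x and positive m."""
    return Interval(0, m - 1)


def lookup(table, x):
    """Hypothetical program under analysis."""
    return table[(x % 8) + 1]


def may_go_out_of_bounds(index: Interval, length: int) -> bool:
    """True if some index in the range falls outside 0..length-1."""
    return index.lo < 0 or index.hi >= length


TABLE = list(range(8))

# Testing: a few concrete runs, all of which happen to succeed.
tested = [lookup(TABLE, x) for x in (0, 1, 2, 10, 20)]

# Analysis: one abstract run that covers every possible input x.
index_range = mod_interval(8) + 1
flagged = may_go_out_of_bounds(index_range, len(TABLE))
```

Here `flagged` is true because the index range reaches 8 while the table has only 8 elements; an input such as `x = 7` triggers the failure that the five test inputs missed.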
Impact
• Thoroughly check limited but useful properties
  – Eliminate categories of errors
  – Developers can concentrate on deeper reasoning
• Encourages better development practices
  – Develop programming models that avoid mistakes in the first place
  – Encourage programmers to think about and make manifest their assumptions
    – Using annotations that improve tool precision
• Seeing increased commercial adoption
The Halting Problem
• Can we write an analyzer that can prove, for any program P and inputs to it, that P will terminate?
  – Deciding this question is called the halting problem
[Figure: program P → analyzer → "Always terminates?"]
• Unfortunately, the halting problem is undecidable
  – That is, it is impossible to write such an analyzer: it will fail to produce an answer for at least some programs (and/or some inputs)
Some material inspired by work of Matt Might: http://matt.might.net/articles/intro-static-analysis/
Other properties?
• Perhaps security-related properties are feasible
  – E.g., that all accesses a[i] are in bounds
• But the halting problem can be reduced to checking these properties by transforming the program
  – I.e., a perfect array bounds checker could solve the halting problem, which is impossible!
• Other undecidable properties (Rice’s theorem)
  – Does this SQL string come from a tainted source?
  – Is this pointer used after its memory is freed?
  – Do any variables experience data races?
Halting ≈ Index in Bounds
• Proof by transformation
  – Change indexing expressions a[i] to (i >= 0 && i < a.length) ? a[i] : exit()
    - Now all array bounds errors instead result in termination
  – Change program exit points to out-of-bounds accesses a[a.length+10]
• Now if the array bounds checker
  – … finds an error, then the original program halts
  – … claims there are no such errors, then the original program does not halt
  – … contradiction with undecidability of the halting problem!
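The two transformation steps can be sketched on a toy program. This is only an illustration under stated assumptions: Python's `IndexError` stands in for an out-of-bounds access, and the names `original`, `transformed`, `out_of_bounds`, and `has_bounds_error` are hypothetical. The helper runs the program to observe the outcome, whereas the perfect checker in the proof would have to decide it without running anything.

```python
def original(a, i):
    """Toy program: one array read, then normal termination."""
    total = a[i]      # may be out of bounds
    return total      # program exit point


def out_of_bounds(a):
    """Stand-in for the access a[a.length+10]; always raises IndexError."""
    return a[len(a) + 10]


def transformed(a, i):
    # Step 1: guard the read so that it cannot go out of bounds itself;
    #         the failing branch becomes an exit point.
    # Step 2: every exit point becomes a deliberate out-of-bounds access.
    total = a[i] if 0 <= i < len(a) else out_of_bounds(a)
    return out_of_bounds(a)   # replaces the original 'return total'


def has_bounds_error(program, *args):
    """Run the program and report whether it accessed out of bounds."""
    try:
        program(*args)
    except IndexError:
        return True
    return False
```

For example, `original([1, 2, 3], 1)` halts without a bounds error, while `transformed([1, 2, 3], 1)` reaches the planted out-of-bounds access. A program that loops forever would never reach that access after transformation, so "has a bounds error" and "halts" coincide for the transformed program.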
Static analysis is impossible?
• Perfect static analysis is not possible
• Useful static analysis is perfectly possible, despite
  1. Nontermination: analyzer never terminates, or
  2. False alarms: claimed errors are not really errors, or
  3. Missed errors: no error reports ≠ error free
• Nonterminating analyses are confusing, so tools tend to exhibit only false alarms and/or missed errors
  – Tools fall somewhere between soundness and completeness
Soundness and Completeness
• Soundness: If analysis says that X is true, then X is true
  – “Things I say” lie inside “True things”
  – Trivially sound: Say nothing
• Completeness: If X is true, then analysis says X is true
  – “True things” lie inside “Things I say”
  – Trivially complete: Say everything
• Sound and complete: Say exactly the set of true things
  – “Things I say” are exactly the “True things”
Stepping back
• Soundness: if the program is claimed to be error free, then it really is
  – Alarms do not imply erroneousness
• Completeness: if the program is claimed to be erroneous, then it really is
  – Silence does not imply error freedom
• Essentially, most interesting analyses
  – are neither sound nor complete (and not both)
  – … usually lean toward soundness (“soundy”) or completeness
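These definitions can be read as set relations between the programs an analyzer flags and the programs that really contain errors. The sketch below assumes a made-up ground truth (`programs`, `erroneous`) and a hypothetical helper `classify`; none of these come from the slides. Here "sound" means every erroneous program is flagged, so a claim of error freedom can be trusted, and "complete" means every flagged program is erroneous.

```python
# Hypothetical ground truth: which programs really contain an error.
programs = {"p1", "p2", "p3", "p4", "p5"}
erroneous = {"p2", "p4"}


def classify(flagged):
    """Compare an analyzer's alarms against the ground truth."""
    false_alarms = flagged - erroneous    # flagged, but actually fine
    missed_errors = erroneous - flagged   # erroneous, but not flagged
    return {
        "sound": not missed_errors,       # silence implies error freedom
        "complete": not false_alarms,     # an alarm implies a real error
        "false_alarms": false_alarms,
        "missed_errors": missed_errors,
    }


flag_everything = classify(set(programs))  # never claims "error free"
flag_nothing = classify(set())             # never claims "erroneous"
typical = classify({"p2", "p3"})           # one real error, one false alarm
```

Flagging everything is trivially sound but raises three false alarms; flagging nothing is trivially complete but misses both errors; the `typical` analyzer is neither, which is where most practical tools sit.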
The Art of Static Analysis
• Analysis design tradeoffs
  – Precision: Carefully model program behavior, to minimize false alarms
  – Scalability: Successfully analyze large programs
  – Understandability: Error reports should be actionable
• Observation: Code style is important
  – Aim to be precise for “good” programs
  – It’s OK to forbid yucky code in the name of safety
  – False alarms viewed positively: reduces complexity
  – Code that is more understandable to the analysis is more understandable to humans
Tainted Flow Analysis
• The root cause of many attacks is trusting unvalidated input
  – Input from the user is tainted
  – Various data is used, assuming it is untainted
• Examples expecting untainted data
  – source string of strcpy (≤ target buffer size)
  – format string of printf (contains no format specifiers)
  – form field used in constructed SQL query (contains no SQL commands)
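The SQL case can be illustrated with a small runnable sketch. It uses Python's standard `sqlite3` module with an in-memory database; the table layout and the function names `lookup_unsafe` and `lookup_safe` are assumptions made for this example. The first function trusts the form field and splices it into the query text, while the second passes it as a bound parameter so it is never interpreted as SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)


def lookup_unsafe(field):
    """Sink that assumes 'field' is untainted: it becomes part of the SQL."""
    query = "SELECT secret FROM users WHERE name = '" + field + "'"
    return conn.execute(query).fetchall()


def lookup_safe(field):
    """Same query, but the field is bound as data, not parsed as SQL."""
    return conn.execute(
        "SELECT secret FROM users WHERE name = ?", (field,)
    ).fetchall()


# Tainted input: a form field that smuggles in an SQL condition.
tainted = "nobody' OR '1'='1"

leaked = lookup_unsafe(tainted)   # condition is always true: every row
blocked = lookup_safe(tainted)    # no user has this literal name
```

With the tainted field, `leaked` contains every user's secret and `blocked` is empty. A tainted flow analysis aims to report the path from the user-controlled field to the query string in `lookup_unsafe` without having to run it.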