An Empirical Study on Bad Practices in Bash Scripts
CS846
Charles Li, Yiwen Dong

Introduction
• Shell language is a powerful language for orchestrating shell commands
• It has been widely used for decades
  • System administration
  • Various shell command orchestration
• However, its syntax is not straightforward compared to modern languages
• This study focuses on Bash and its common bad practices, using ShellCheck and the IntelliJ Shell Parser

RQ1: How common are Bash scripts?
• GitHub
  • 320k public Shell repositories
• Octoverse (an annual survey by GitHub)
  • Shell has been among the top 10 programming languages in the last 5 years, based on the number of unique contributors in public and private repositories
• Top 1000 Shell repositories
  • 22240 bash files
  • 4927 sh files
  • 264 z-shell files
  • 5 c-shell files
  • 2 k-shell files

RQ2: What is the distribution of Shell language constructs?
• SIMPLE_COMMAND
  • echo "Hello, world!"
  • grep mysql /etc/passwd
• These constructs are Abstract Syntax Tree (AST) nodes defined in the IntelliJ parser
• Not every node is useful, as some are recursive expansion rules

RQ3: How frequently do bad practices occur in the Bash files?
• 8279/27167 (30.5%) of files exhibit no bad practices
• Some files have an extremely high number of errors
• Mean of 9.43
• Median of 3

Manual inspection
• Manual inspection of files with more than 2000 errors reveals large files with many of the same errors
• Errors SC2086, SC2162 and SC2140 cover 7048/7091 (99.39%) of all errors in these files

Name    Level    Count  Message
SC2086  Info     5488   Double quote to prevent globbing and word splitting.
SC2162  Info     804    read without -r will mangle backslashes
SC2140  Warning  756    Word is on the form "A"B"C" (B indicated). Did you mean "ABC" or "A\"B\"C"?

RQ4: What is the distribution of bad practices in the Bash files? What are the common bad practices?
• Top 10 most seen errors are …

Rank  Level    Suggestion  Count   Group           Message
1     info     SC2086      130567  Quote           Double quote to prevent globbing and word splitting.
2     style    SC2006      13694   Syntax          Use $(...) notation instead of legacy backticked `...`.
3     warning  SC2034      12575   Variable        foo appears unused. Verify it or export it.
4     info     SC1091      8546    Parsing         Not following: (error message here)
5     warning  SC2154      7935    Variable        var is referenced but not assigned.
6     warning  SC2046      4919    Quote           Quote this to prevent word splitting
7     style    SC2002      4522    IO              Useless cat. Consider 'cmd < file | ..' or 'cmd file | ..' instead.
8     error    SC1041      4155    Parsing         Found 'eof' further down, but not on a separate line.
9     error    SC1072      4131    Parsing         Unexpected ..
10    info     SC2016      3362    Quote           Expressions don't expand in single quotes, use double quotes for that.
11    info     SC2162      3312    IO              read without -r will mangle backslashes
12    error    SC1073      3271    Parsing         Couldn't parse this (thing). Fix to allow more checks.
13    warning  SC2027      3214    Quote           The surrounding quotes actually unquote this. Remove or escape them.
14    style    SC2004      3161    Variable        $/${} is unnecessary on arithmetic variables.
15    warning  SC2164      3064    Error Handling  Use cd ... || exit in case cd fails.

Occurrence of each group in the top 15 most common suggestions

Group           Count   Percentage
Parsing         20103   9.55%
Companion       0       0.00%
Syntax          13694   6.51%
Variable        23671   11.25%
Error Handling  3064    1.46%
Quote           142062  67.51%
IO              7834    3.72%
Logic           0       0.00%
Sum             210428
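
Four of the suggestions listed for RQ4 (SC2006, SC2016, SC2162, SC2164) can be tried out with a short Bash sketch. The variable names, sample strings and the nonexistent directory path are illustrative assumptions and do not come from the studied repositories.

```bash
#!/usr/bin/env bash
# Illustrative only: names, sample strings and paths are assumptions.

# SC2006: $(...) nests without escaping; nested backticks need \` escapes.
legacy=`basename \`dirname /usr/local/bin\``
modern=$(basename "$(dirname /usr/local/bin)")

# SC2016: single quotes suppress expansion, double quotes allow it.
user="alice"
single='Hello, $user'
double="Hello, $user"

# SC2162: read without -r treats a backslash as an escape character.
read plain <<'EOF'
C:\temp
EOF
read -r raw <<'EOF'
C:\temp
EOF

# SC2164: with "|| exit", a failed cd stops the (sub)shell before later
# commands can run in the wrong directory.
output=$(cd /nonexistent/dir/for/demo 2>/dev/null || exit 1; echo "ran anyway")
cd_status=$?

printf '%s\n' "legacy=$legacy modern=$modern"
printf '%s\n' "single=$single"
printf '%s\n' "double=$double"
printf '%s\n' "plain=$plain raw=$raw"
printf '%s\n' "cd_status=$cd_status output=$output"
```

Both command substitutions yield the same value, but only the backticked form needs escaping to nest. The read without -r drops the backslash, and the guarded cd leaves `output` empty because the subshell exits before `echo` runs.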

RQ5: What are the more error-prone shell language constructs?
• We associate each Abstract Syntax Tree (AST) node to errors identified by ShellCheck
• Error location from ShellCheck output is quite limited and some errors might be incorrectly associated to some parent nodes
• Results collected have 4 types of output
  • Error
  • Warning
  • Info
  • Style

Error
• FOR_CLAUSE
  • for URL in ${URLS[@]}; do...
  • for URL in "${URLS[@]}"; do...
  • Double quote array expansion to avoid re-splitting elements

Warning
• ASSIGNMENT_COMMAND
  • export foo="$(mycmd)"
  • Return value of mycmd is ignored
  • Better to have export on a separate line
• Some other errors are due to unused variables; ShellCheck is quite limited in finding references in external files

Info
• VARIABLE
  • rmdir $STAGING
  • rmdir "$STAGING"
  • Double quotes to prevent globbing and word splitting
• Case-by-case
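
The Error, Warning and Info examples above can be reproduced with a minimal Bash sketch. The array contents, the failing `mycmd` function, the `count_args` helper and the value of `STAGING` are hypothetical stand-ins chosen for illustration.

```bash
#!/usr/bin/env bash
# Illustrative only: URLS, mycmd, count_args and STAGING are assumptions.

count_args() { echo "$#"; }   # helper: prints how many arguments it received

# Error (FOR_CLAUSE): an unquoted array expansion re-splits its elements.
URLS=("http://example.com/a b" "http://example.com/c")
unquoted_n=$(count_args ${URLS[@]})     # 3 words
quoted_n=$(count_args "${URLS[@]}")     # 2 elements, as stored

# Warning (ASSIGNMENT_COMMAND): export hides the exit status of mycmd.
mycmd() { echo "value"; return 3; }     # hypothetical command that fails
export foo="$(mycmd)"
masked_status=$?                        # 0: the status of export itself
bar="$(mycmd)"
real_status=$?                          # 3: the status of mycmd
export bar                              # export on a separate line

# Info (VARIABLE): quote to prevent globbing and word splitting.
STAGING="build dir"
split_n=$(count_args $STAGING)          # 2 words
whole_n=$(count_args "$STAGING")        # 1 argument

printf '%s\n' "array: unquoted=$unquoted_n quoted=$quoted_n"
printf '%s\n' "status: masked=$masked_status real=$real_status"
printf '%s\n' "variable: unquoted=$split_n quoted=$whole_n"
```

Counting arguments stands in for the `rmdir` call on the slide so that the sketch has no side effects. With the unquoted form, `rmdir` would receive two directory names instead of one.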

Style
• COMMAND_SUBSTITUTION_COMMAND
  • echo `uname`
  • Use $(...) notation instead of legacy backticked `...`
  • Legacy syntax
  • Backtick is hard to nest

Threats to Validity
• Auto-generated scripts could skew the data
• ShellCheck is not perfect
• The data was prepared on a Windows machine with CRLF line endings, and some ShellCheck errors are sensitive to this
• There are exceptions in ShellCheck errors
  • False positive
• The boundary between Error/Warning/Info/Style is not clear-cut

Conclusion
• Shell language is popular, with Bash being the mainstream
• Quoting, variable handling, and syntax are the areas of Bash that could benefit most from the attention of researchers and developers