
C-Style Strings: CS2253, Owen Kaser, UNBSJ



  1. C-Style Strings CS2253 Owen Kaser, UNBSJ

  2. Strings
     ● In C and some other low-level languages, strings are just consecutive memory locations that contain characters. A special “null character” (ASCII code 0) terminates the string.
     ● Common string-processing library routines are a good source of assembly-language examples.

  3. Making a Constant String
     ● (Review) Use DCB, and don't forget the null character terminator.

         mystring  DCB  "hello",0
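For comparison, the same layout can be written in C. This is only a sketch: the array names and the helper `same_layout` are made up for illustration. It shows that a string literal and an explicit byte list, in the style of DCB "hello",0, occupy the same six bytes.

```c
#include <string.h>

/* The same six bytes written two ways: as a literal (the compiler appends
   the null character) and byte by byte, like DCB "hello",0 in assembly. */
static const char as_literal[] = "hello";
static const char as_bytes[]   = { 'h', 'e', 'l', 'l', 'o', 0 };

/* Hypothetical helper: returns 1 if both arrays hold identical bytes. */
int same_layout(void)
{
    return sizeof as_literal == sizeof as_bytes
        && memcmp(as_literal, as_bytes, sizeof as_bytes) == 0;
}
```

Five characters need six bytes of storage, because the terminator takes one.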

  4. A String Local Variable
     ● Suppose you know you need a string local variable. If you know the maximum length you could possibly need (say 50 characters), proceed as follows. The string needs 51 bytes including the null character; reserving 52 keeps SP word-aligned.

         mySubroutine  STMFD  SP!, {some regs, LR}
                       SUB    SP, SP, #52     ; maintain SP alignment
                       MOV    R0, #0          ; null character
                       STRB   R0, [SP]        ; terminate string
                       ; ... use space from SP to SP+51 for your string ...
                       ADD    SP, SP, #52     ; pop off space used by string
                       LDMFD  SP!, {some regs, PC}

  5. Stack Smashing
     ● Q: What if someone is allowed to put a 56-byte string into your 52-byte area?
     ● A: You affect the things in the memory addresses above your string.
     ● The STMFD saved the return address (LR) just above your string area, together with any other saved registers. A long enough overflow replaces it, so you have a wrong return address.
     ● A cracker can write some nasty machine code program as the overlong “string” and arrange for you to return to her program.
     ● Moral: String locals need to be very carefully checked to see that they are not too long.
     ● Some modern CPUs allow the stack region of memory to be marked as “nonexecutable” to help. You can still be forced to return to an arbitrary location in the existing program, which may be good enough for a cracker.
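The moral above can be sketched in C. This is a minimal illustration, not a hardened routine: `copy_checked`, `uses_local_string`, and `LOCAL_SIZE` are hypothetical names, and the 52-byte size simply mirrors the stack area from the earlier slide.

```c
#include <stddef.h>

#define LOCAL_SIZE 52  /* assumed: same size as the stack area used earlier */

/* Hypothetical helper: copy src into dst without ever writing more than
   dst_size bytes. Returns the number of characters copied (not counting
   the terminator), or -1 if src plus its terminator does not fit. */
long copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t i;
    for (i = 0; i < dst_size; i++) {
        dst[i] = src[i];
        if (src[i] == 0) {
            return (long)i;        /* terminator copied: the string fit */
        }
    }
    if (dst_size > 0) {
        dst[dst_size - 1] = 0;     /* too long: keep dst terminated anyway */
    }
    return -1;
}

/* A subroutine with a string local that refuses overlong input. */
long uses_local_string(const char *input)
{
    char local[LOCAL_SIZE];
    long n = copy_checked(local, sizeof local, input);
    if (n < 0) {
        return -1;   /* reject rather than overwrite the saved registers */
    }
    return n;        /* stand-in for real work on the local copy */
}
```

With this check, a 55-character input is rejected, while a 51-character input (52 bytes with its terminator) is accepted.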

  6. Returning a String
     ● Suppose your subroutine is supposed to return a string.
     ● You can just return the memory address of somewhere in memory that holds the characters of your string. (In C terminology, you return a pointer to your characters.)
     ● But that somewhere needs to be “safe”: not subject to arbitrary destruction.
     ● Any stack location below the top of the stack (that is, at an address lower than SP) is not safe.

  7. Bad Scenario
     ● main subroutine calls foo.
     ● foo has a local string variable, v, that it puts some lovely string into.
     ● foo returns the address of v to main.
     ● main turns around and calls bar.
     ● bar returns. main tries to use the lovely string. Unhappiness results.

  8. Bad Scenario, picture 1

  9. Bad Scenario, picture 2
     ● Because the string address sent by 'foo' to main was in the danger zone, 'bar' trashed it. Not bar's fault.
     ● Solution: Never return the address of a local variable.

  10. Non-Reentrant Solution
     ● If a subroutine S needs to return a string (whose maximum length is known), then it can put the string in a “buffer” memory location set aside just for S. And it can return the address of that buffer to its caller.
     ● S's buffer is safe enough...except from S itself. This approach means S won't be reentrant; in particular, S cannot be recursive.
     ● And callers to S should copy out the answer, in case anyone they invoke also calls S.

  11. Example

         S_buffer  DCB    0
                   SPACE  31                ; total length 32
         S         STMFD  SP!, {..., LR}
                   ; ... put some string into S_buffer ...
                   LDR    R0, =S_buffer     ; return value in R0
                   LDMFD  SP!, {..., PC}    ; return to caller
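The same pattern, and its hazard, can be sketched in C. The routine `shout` and its buffer are invented for illustration; the 32-byte size matches S_buffer above.

```c
#include <stddef.h>

/* Buffer set aside just for shout(), like S_buffer: 31 characters plus
   the terminator. */
static char shout_buffer[32];

/* Hypothetical routine: returns an upper-cased copy of s (ASCII letters
   only), truncated if needed so it always fits in the buffer. */
const char *shout(const char *s)
{
    size_t i;
    for (i = 0; i + 1 < sizeof shout_buffer && s[i] != 0; i++) {
        char c = s[i];
        if (c >= 'a' && c <= 'z') {
            c = (char)(c - 'a' + 'A');
        }
        shout_buffer[i] = c;
    }
    shout_buffer[i] = 0;       /* always terminate */
    return shout_buffer;       /* every call returns this same address */
}
```

Because every call returns the same address, a second call overwrites the first result. A caller that still needs the first answer has to copy it out before anything calls `shout` again.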

  12. Length of a String (in R0)

         strlen  mov    R1, #0          ; length counter
         loop    ldrb   R2, [R0], #1    ; get current character, advance R0
                 cmp    R2, #0
                 addne  R1, R1, #1
                 bne    loop
                 mov    R0, R1          ; return value in R0
                 mov    PC, LR          ; return

     ● Since this is a leaf method, we didn't need STM and LDM.
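For readers more comfortable with C, here is a rough equivalent of the loop above. The name `my_strlen` is an assumption, chosen only to avoid clashing with the library `strlen`.

```c
#include <stddef.h>

/* Counts characters up to, but not including, the null character. */
size_t my_strlen(const char *s)
{
    size_t length = 0;      /* plays the role of R1 */
    while (*s++ != 0) {     /* like ldrb with post-increment, then cmp */
        length++;           /* like addne */
    }
    return length;          /* the assembly version returns this in R0 */
}
```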

  13. Reverse (buffer version, untested)

         rev_buffer  SPACE  32
         reverse     mov    R1, R0           ; R1 is caller save
                     stmfd  SP!, {R1, LR}
                     bl     strlen           ; length in R0
                     mov    R1, #0
                     ldr    R2, =rev_buffer
                     strb   R1, [R2, R0]     ; mark end
                     sub    R0, R0, #1
                     ldr    R1, [SP]         ; recover start of input (R1 was stored at the lowest address)
         rev_loop    ldrb   R3, [R1], #1     ; the copying loop (label renamed to avoid clashing with strlen's loop)
                     cmp    R3, #0
                     beq    rev_done
                     strb   R3, [R2, R0]
                     sub    R0, R0, #1
                     b      rev_loop
         rev_done    ldmfd  SP!, {R1, LR}
                     ldr    R0, =rev_buffer  ; return value
                     mov    PC, LR

  14. Or, Use a Stack
     ● Can push a bunch of characters to stack from input. (And count them.)
     ● Pop them off, one at a time, and append to buffer.
     ● Then return address of buffer.
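The push/pop idea can be sketched in C with an explicit array standing in for the stack. The names `reverse_with_stack`, `rev_stack_buffer`, and `REV_MAX` are hypothetical, and the 31-character limit is an assumption matching the 32-byte buffer used earlier.

```c
#include <stddef.h>

#define REV_MAX 31                            /* assumed maximum length */
static char rev_stack_buffer[REV_MAX + 1];    /* result buffer, as in the buffer version */

/* Reverses at most REV_MAX characters of input into the static buffer. */
const char *reverse_with_stack(const char *input)
{
    char stack[REV_MAX];   /* explicit array standing in for the stack */
    size_t count = 0;      /* number of characters pushed */
    size_t out = 0;

    /* Push characters from the input, counting them. */
    while (count < REV_MAX && input[count] != 0) {
        stack[count] = input[count];
        count++;
    }
    /* Pop them off, one at a time, and append to the buffer. */
    while (count > 0) {
        rev_stack_buffer[out++] = stack[--count];
    }
    rev_stack_buffer[out] = 0;
    return rev_stack_buffer;   /* return the address of the buffer */
}
```

Since the last character pushed is the first one popped, the output comes out reversed without computing the length in advance.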

  15. Alternative Approach
     ● We can make the caller responsible for finding space for us to store the returned string.
     ● The address of the space for the returned string (probably in the caller's activation record) is passed as a parameter.
     ● This is a little better than the buffer approach.

  16. Reverse (param 2 has address)

         ; R0 = address of input string, R1 = address of space for the result
         reverse     stmfd  SP!, {R0, R1, LR}   ; save both: strlen overwrites R1 and R2
                     bl     strlen              ; length in R0
                     ldr    R1, [SP, #4]        ; recover address of output space
                     mov    R2, #0
                     strb   R2, [R1, R0]        ; mark end
                     sub    R0, R0, #1
                     ldr    R2, [SP]            ; recover start of input
         rev_loop    ldrb   R3, [R2], #1        ; the copying loop
                     cmp    R3, #0              ; ldrb does not set flags, so compare explicitly
                     beq    rev_done
                     strb   R3, [R1, R0]
                     sub    R0, R0, #1
                     b      rev_loop
         rev_done    ldmfd  SP!, {R0, R1, PC}   ; no return value

  17. Making It Robust
     ● When the address of an output buffer is passed in, you should usually pass along another parameter to indicate how long the buffer is.
     ● And the string routine should be coded to avoid overflowing the buffer.
     ● Without the “how long” parameter, the string routine would have no way of knowing when overflow might occur.
     ● Early design of the C string library didn't really seem to appreciate this enough. Later additions did, but by then, programmers had developed sloppy habits.
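A robust version of the caller-supplied-buffer approach might look like the following C sketch. The function name `reverse_into` and its return convention are assumptions made for illustration, and the sketch assumes the input and output areas do not overlap.

```c
#include <stddef.h>
#include <string.h>

/* Reverses src into the caller's buffer dst, which holds dst_size bytes.
   Returns 0 on success, or -1 if the reversed string plus its terminator
   would not fit (dst is then set to the empty string when it has room). */
int reverse_into(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);
    size_t i;

    if (len + 1 > dst_size) {      /* the "how long" check */
        if (dst_size > 0) {
            dst[0] = 0;
        }
        return -1;
    }
    for (i = 0; i < len; i++) {
        dst[len - 1 - i] = src[i]; /* fill the output from the end backward */
    }
    dst[len] = 0;                  /* mark end */
    return 0;
}
```

A caller would typically pass `sizeof` of its own array as the length, for example `reverse_into(out, sizeof out, "hello")` with `char out[6]`, which is the smallest buffer that fits.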
