
Hack The CPython Batuhan Taskaya @isidentical



  1. Hack The CPython Batuhan Taskaya @isidentical

  2. What is hacking?

  3. Why do we hack?

  4. Yes, we want FREEDOM! We want to use PEP313!

  5. Before we hack, Learn the internals

  6. Lexing - Tokenization
     - Read
     - Split
     - Set the first token

         #define NEWLINE 4
         #define INDENT 5
         #define DEDENT 6
         #define LPAR 7
         #define RPAR 8
         #define LSQB 9
         #define RSQB 10
         #define COLON 11
         #define COMMA 12
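
The token kinds listed above can also be observed from Python. What follows is a minimal sketch using the standard `tokenize` module; the sample source string is made up for illustration. It prints token names rather than numbers, since the numeric ids are an implementation detail that may differ between CPython versions.

```python
import io
import tokenize

# Made-up sample source with a block, parentheses and a comma.
source = "if x:\n    y = (1, 2)\n"

# generate_tokens() takes a readline callable that returns str lines.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# exact_type resolves generic OP tokens to specific kinds such as LPAR or COLON.
names = [tokenize.tok_name[tok.exact_type] for tok in tokens]

for tok, name in zip(tokens, names):
    print(f"{name:<10} {tok.string!r}")
```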

  7. Parsing - Parser
     - Generated by PGen2
     - Keeps a record of structures in arcs, DFAs, etc.
     - Keeps things that do not affect meaning (like whitespace)
     - Constructs a CST

  8. AST (where the actual hack begins)
     - Generated by ASDL
     - A highly relational tree that is constructed from the CST
     - Doesn’t keep anything it doesn’t need (like whitespace)
     - Can be manipulated easily

         class RewriteName(NodeTransformer):
             def visit_Name(self, node):
                 return ast.Name("a" + node.id, node.ctx)
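
To see a transformer like the one on this slide applied end to end, here is a minimal sketch. The sample source and the `namespace` dict are assumptions for illustration; `ast.copy_location` and `ast.fix_missing_locations` are added because `compile()` needs position information on new nodes.

```python
import ast


class RewriteName(ast.NodeTransformer):
    """Prefix every name with "a", as on the slide."""

    def visit_Name(self, node):
        new_node = ast.Name(id="a" + node.id, ctx=node.ctx)
        # Reuse the position of the node being replaced.
        return ast.copy_location(new_node, node)


tree = ast.parse("x = y + 1")            # illustrative input
tree = RewriteName().visit(tree)         # now equivalent to: ax = ay + 1
ast.fix_missing_locations(tree)

namespace = {"ay": 41}
exec(compile(tree, "<rewritten>", "exec"), namespace)
print(namespace["ax"])
```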

  9. Bytecode Generation
     - CFG construction
     - Compiling to a code object
     - Peephole

         >>> dis.dis("a.xyz(3)")
           1           0 LOAD_NAME                0 (a)
                       2 LOAD_METHOD              1 (xyz)
                       4 LOAD_CONST               0 (3)
                       6 CALL_METHOD              1
                       8 RETURN_VALUE
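
The code object behind that disassembly can be inspected and run directly. This is a small sketch; the class `A` is a hypothetical stand-in for whatever `a` refers to, and the exact opcodes printed vary between CPython versions, so no output is shown.

```python
import dis

# Compile the same expression as on the slide into a code object.
code = compile("a.xyz(3)", "<demo>", "eval")

# Names referenced by the bytecode, in order of first use.
print(code.co_names)

# Instruction-level view; opcode names depend on the interpreter version.
for instr in dis.get_instructions(code):
    print(instr.opname, instr.argrepr)


class A:
    """Hypothetical object that provides the xyz method."""

    def xyz(self, n):
        return n * 2


# A code object is evaluated against a namespace.
result = eval(code, {"a": A()})
print(result)
```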

  10. Evaluation
      - A big for loop (with computed gotos if built with GCC)
      - Tons of structs that try to track everything
      - Based on frame-by-frame execution on top of stacks
      - Global & local namespaces
      [figure: frame graph]
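
Frames and their namespaces are visible from Python code. The sketch below relies on `sys._getframe`, which is a CPython implementation detail; the function names `outer` and `inner` are made up for illustration.

```python
import sys


def inner():
    # Frame 0 is inner() itself, frame 1 is whoever called it.
    caller = sys._getframe(1)
    # Each frame carries its code object and its local namespace.
    return caller.f_code.co_name, dict(caller.f_locals)


def outer():
    local_value = 1
    return inner()


caller_name, caller_locals = outer()
print(caller_name, caller_locals)
```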

  11. Let’s Hack

  12. Walrus on Python 3.7
      A project that lets you use the walrus operator (:=) on Python 3.7 by means of a new source encoding

  13. The Strategy For Hacking
      - Should run before tokenization happens
      - Needs a new tokenizer or a modification to Python’s tokenize module
      - The source should be tokenized with that tokenizer
      - Needs an untokenizer that consumes a sequence of tokens to construct the source back
      - Should stream that source to the real tokenizer
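
The tokenize-then-untokenize round trip this strategy relies on can be tried with the standard library alone. A minimal sketch, assuming a made-up one-line source; it checks that the rebuilt source yields the same token stream, which is the property the approach needs.

```python
import io
import tokenize

source = "total = price * (1 + tax)\n"  # illustrative input

# Source text -> token stream.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# Token stream -> source text. Full 5-tuples let untokenize() restore spacing.
rebuilt = tokenize.untokenize(tokens)

# Feed the rebuilt source to the tokenizer again and compare the streams.
retokenized = list(tokenize.generate_tokens(io.StringIO(rebuilt).readline))
same_stream = [(t.type, t.string) for t in tokens] == [
    (t.type, t.string) for t in retokenized
]
print(repr(rebuilt), same_stream)
```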

  14. Modifying the Tokens
      - Add a new token under the `token` module (where Python keeps token names and ids)
      - Add a new key to `tokenize.EXACT_TOKEN_TYPES` for getting the token name when that token is streamed
      - Update the rule for tokenization (otherwise Python will emit error tokens because it can't understand :=)

          tokens.COLONEQUAL = 0xFF
          tokens.tok_name[0xFF] = "COLONEQUAL"
          tokenize.EXACT_TOKEN_TYPES[":="] = tokens.COLONEQUAL
          tokenize.PseudoToken = tokenize.Whitespace + tokenize.group(
              r":=",
              tokenize.PseudoExtras,
              tokenize.Number,
              tokenize.Funny,
              tokenize.ContStr,
              tokenize.Name,
          )

  15. Modifying The Source
      - A function that reads the walrused source and returns the 3.7-adapted source
      - Tokenizes the walrused source with the new modifications
      - Creates a copy of those tokens
      - Uses the real list for detection and the copy for modification

          def generate_walrused_source(readline):
              source_tokens = list(tokenize(readline))
              modified_source_tokens = source_tokens.copy()
              for index, token in enumerate(source_tokens):
                  if token.exact_type == tokens.COLONEQUAL:
                      <code for replacing that token>
              return untokenize(modified_source_tokens)
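
The detect-on-the-original, modify-the-copy pattern can be shown on a simpler rewrite than the walrus one. The sketch below is not the talk's replacement code: it swaps a made-up name `nil` for `None`, and the helper `rewrite_names` is hypothetical. It passes 2-tuples to `untokenize`, which ignores positions, so the output spacing may look odd while remaining valid Python.

```python
import io
import tokenize


def rewrite_names(source, old, new):
    """Return source with every NAME token equal to `old` replaced by `new`."""
    readline = io.StringIO(source).readline
    source_tokens = list(tokenize.generate_tokens(readline))
    modified_tokens = source_tokens.copy()

    # Scan the original list, write changes into the copy.
    for index, tok in enumerate(source_tokens):
        if tok.type == tokenize.NAME and tok.string == old:
            modified_tokens[index] = tok._replace(string=new)

    # (type, string) pairs select untokenize()'s position-free mode.
    return tokenize.untokenize((tok.type, tok.string) for tok in modified_tokens)


result = rewrite_names("x = nil\n", "nil", "None")
print(repr(result))
```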

  16. Creating a decode function for the Encoding
      - Reads the source
      - Decodes with the actual decoding
      - Streams into `generate_walrused_source`
      - Returns the clean source back

          def decode(input, errors="strict", encoding=None):
              if not isinstance(input, bytes):
                  input, _ = encoding.encode(input, errors)
              buffer = io.BytesIO(input)
              result = generate_walrused_source(buffer.readline)
              return encoding.decode(result)

  17. Adding a search function
      - `codecs.register` takes a search function that returns the `codecs.CodecInfo` if the given name is the codec’s name, or else returns `None`
      - To use walrus37 with encodings other than utf8, allow the user to specify an encoding and bind that encoding into the `decode` function

          def search(name):
              if "walrus37" in name:
                  # str.strip("walrus37") would strip a set of characters, not the prefix
                  encoding = name.replace("walrus37", "").strip("-") or "utf8"
                  encoding = lookup(encoding)
                  decoder = <partial decoder with given encoding>
                  walrus_codec = CodecInfo(...)
                  return walrus_codec
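
How a search function, a bound decoder and `codecs.register` fit together can be sketched with a toy codec. Everything named here is an assumption for illustration: the codec name `demo37`, the `transform` function (a plain string replacement standing in for the token rewrite) and the helper names. Since Python 3.9 the codec machinery passes names to search functions with hyphens turned into underscores, so the sketch strips both.

```python
import codecs
from functools import partial

PREFIX = "demo37"  # hypothetical codec name


def transform(source):
    # Stand-in for the real source rewrite.
    return source.replace("nil", "None")


def decode(data, errors="strict", *, base):
    # Decode with the underlying encoding, then rewrite the text.
    text, consumed = base.decode(bytes(data), errors)
    return transform(text), consumed


def search(name):
    if not name.startswith(PREFIX):
        return None  # not ours, let other search functions try
    # "demo37-latin1" may arrive as "demo37_latin1" on newer versions.
    suffix = name[len(PREFIX):].strip("-_") or "utf8"
    base = codecs.lookup(suffix)
    return codecs.CodecInfo(
        name=name,
        encode=base.encode,
        decode=partial(decode, base=base),
    )


codecs.register(search)

plain = codecs.decode(b"x = nil", "demo37")
latin = codecs.decode("caf\u00e9 = nil".encode("latin1"), "demo37-latin1")
print(plain, latin)
```

A source file would opt in with a coding declaration naming the registered codec, provided the codec is registered before the file is read.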

  18. Implementing Rejected PEPs
      A project that allows you to use features of rejected PEPs

  19. The Strategy For Hacking
      - Should run when imported
      - Should be effective only within the Allow(<pep num>) scope
      - If the syntax is used outside the scope, it should raise the proper error (for example, if `I` is used outside the PEP 313 scope it should raise NameError)

  20. Implementing PEPs (Example PEP313)
      - Should go through all names (a, x, obtainer, I, IV, test)
      - If the name is a valid roman literal
      - Get the value of that literal and then replace it with the proper number

          class PEP313(HandledTransformer):
              def visit_Name(self, node):
                  number = roman(node.id)
                  if number:
                      return ast.Num(number)
                  return node
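
A self-contained version of this idea might look like the sketch below. The `roman` helper here is an assumption, not the talk's implementation: it is deliberately lenient and accepts non-canonical forms such as `IIII`. It uses `ast.Constant` because `ast.Num` is deprecated since Python 3.8, and it only rewrites names that are being read.

```python
import ast

ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman(text):
    """Return the value of a roman numeral, or None if text is not one."""
    if not text or any(ch not in ROMAN for ch in text):
        return None
    values = [ROMAN[ch] for ch in text]
    total = 0
    for index, value in enumerate(values):
        following = values[index + 1] if index + 1 < len(values) else 0
        # A smaller digit before a larger one is subtracted (IV == 4).
        total += -value if value < following else value
    return total


class RomanLiterals(ast.NodeTransformer):
    def visit_Name(self, node):
        number = roman(node.id)
        if number is not None and isinstance(node.ctx, ast.Load):
            return ast.copy_location(ast.Constant(value=number), node)
        return node


tree = ast.parse("result = IV + X * a")
tree = ast.fix_missing_locations(RomanLiterals().visit(tree))
namespace = {"a": 2}
exec(compile(tree, "<pep313>", "exec"), namespace)
print(namespace["result"])
```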

  21. Scoping
      - Should go through all with statements
      - Find the with’s name and check if the name is `Allow`
      - Get the args of `Allow` (PEP number)
      - Dispatch the elements of that with to the proper PEP handler

          class PEPTransformer(Transformer):
              def visit_With(self, node):
                  if <name check>:
                      pep = <get first arg>
                      new_node = <get node>
                      copyloc(new_node, node)
                      fix_missing(new_node)
                  return node
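
One way the scoping step could work is sketched below, under several assumptions: the handler table `HANDLERS`, the toy handler `AnswerHandler` and the PEP number 1 are all made up, and the `with Allow(...)` wrapper is replaced by its transformed body so that no `Allow` object is needed at run time. The talk's own implementation may differ on each of these points.

```python
import ast


class AnswerHandler(ast.NodeTransformer):
    """Toy handler: reading the name `answer` yields 42."""

    def visit_Name(self, node):
        if node.id == "answer" and isinstance(node.ctx, ast.Load):
            return ast.copy_location(ast.Constant(value=42), node)
        return node


HANDLERS = {1: AnswerHandler}  # hypothetical PEP number -> handler table


class ScopedTransformer(ast.NodeTransformer):
    def visit_With(self, node):
        self.generic_visit(node)  # handle nested with blocks first
        call = node.items[0].context_expr
        is_allow = (
            isinstance(call, ast.Call)
            and isinstance(call.func, ast.Name)
            and call.func.id == "Allow"
            and call.args
            and isinstance(call.args[0], ast.Constant)
        )
        handler = HANDLERS.get(call.args[0].value) if is_allow else None
        if handler is None:
            return node
        # Returning a list splices the statements in place of the with block.
        return [handler().visit(stmt) for stmt in node.body]


source = """
with Allow(1):
    inside = answer
try:
    outside = answer
except NameError:
    outside = "NameError"
"""
tree = ast.fix_missing_locations(ScopedTransformer().visit(ast.parse(source)))
namespace = {}
exec(compile(tree, "<scoped>", "exec"), namespace)
print(namespace["inside"], namespace["outside"])
```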

  22. Runtime
      - Run when imported
      - Get the source code of the file that imported it
      - Transform that source into an AST
      - Dispatch the AST to the Scoping Handler
      - Get back the AST
      - Compile the AST to bytecode
      - Run the bytecode

          def allow():
              main = __import__("__main__")
              tf = PEPTransformer()
              f = main.__file__
              main_ast = ast.parse(<open>)
              main_ast = tf.visit(main_ast)
              fix_missing_locations(main_ast)
              bc = compile(main_ast, f, "exec")
              exec(bc, main.__dict__)

          allow()
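
The read, parse, transform, compile and exec pipeline can be exercised without touching `__main__`. In this sketch the transformer `DoubleInts`, the helper `run_transformed` and the temporary script are assumptions that stand in for the talk's `PEPTransformer` and the importing file; `visit_Constant` requires Python 3.8 or newer.

```python
import ast
import os
import tempfile


class DoubleInts(ast.NodeTransformer):
    """Hypothetical stand-in transformer: doubles every int literal."""

    def visit_Constant(self, node):
        if type(node.value) is int:
            return ast.copy_location(ast.Constant(value=node.value * 2), node)
        return node


def run_transformed(path, transformer, namespace):
    # Get the source code of the file.
    with open(path, encoding="utf-8") as handle:
        tree = ast.parse(handle.read(), filename=path)
    # Transform the AST and repair missing positions.
    tree = transformer.visit(tree)
    ast.fix_missing_locations(tree)
    # Compile to bytecode and run it in the given namespace.
    bytecode = compile(tree, path, "exec")
    exec(bytecode, namespace)
    return namespace


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "script.py")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("x = 21\n")
    namespace = run_transformed(path, DoubleInts(), {})

print(namespace["x"])
```

In the talk's version the path is `__main__.__file__` and the namespace is `__main__.__dict__`, so the transformed code runs in place of the importing script.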

  23. Rusty Return
      Implicitly return the last expression (like Rust)

  24. The Strategy For Hacking
      - Should run when a function is decorated
      - Should return the last expression
      - Should support arbitrarily nested branching

  25. Transforming AST (1)
      - Visit the function definition
      - Remove the @rlr from the decorators list (for preventing infinite recursion)

          class RLR(ast.NodeTransformer):
              def visit_FunctionDef(self, fn):
                  self._adjust(fn)
                  ds = filter(lambda d: d.id != "rlr", fn.decorator_list)
                  fn.decorator_list = list(ds)
                  return fn

  26. Transforming AST (2)
      - If the last node is an expression, replace the last node with `ast.Return`
      - Call itself back while the last statement is `ast.If`

          def _adjust(self, container: ast.AST, items: str = "body") -> None:
              items = getattr(container, items) if items is not None else container
              last_stmt = items[-1]
              if isinstance(last_stmt, ast.Expr):
                  items.append(ast.Return(value=items.pop().value))
              elif isinstance(last_stmt, ast.If):
                  self._adjust(last_stmt)
                  if len(last_stmt.orelse) > 0:
                      self._adjust(last_stmt.orelse, None)
              else:
                  return None
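
The two transformation slides can be combined into one runnable sketch. It is a simplified variant, not the talk's code: it works on a source string instead of a decorated function (so there is no `@rlr` to strip), and the class name `ImplicitReturn` and the sample function `sign` are made up.

```python
import ast


class ImplicitReturn(ast.NodeTransformer):
    def visit_FunctionDef(self, fn):
        self.generic_visit(fn)  # also handle nested function definitions
        self._adjust(fn.body)
        return fn

    def _adjust(self, body):
        last = body[-1]
        if isinstance(last, ast.Expr):
            # A trailing bare expression becomes `return <expression>`.
            body[-1] = ast.copy_location(ast.Return(value=last.value), last)
        elif isinstance(last, ast.If):
            # Recurse into both branches; `elif` is an If inside orelse.
            self._adjust(last.body)
            if last.orelse:
                self._adjust(last.orelse)


source = """
def sign(x):
    if x > 0:
        "positive"
    elif x < 0:
        "negative"
    else:
        "zero"
"""
tree = ast.fix_missing_locations(ImplicitReturn().visit(ast.parse(source)))
namespace = {}
exec(compile(tree, "<rusty>", "exec"), namespace)
sign = namespace["sign"]
print(sign(1), sign(-1), sign(0))
```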

  27. Poophole Optimizer
      An extra bytecode optimizer for Python

  28. The Strategy For Hacking
      - Should run when a function is decorated
      - Should go through the bytecode and apply only the optimizations the user specified
      - Should re-set the optimized bytecode

  29. Optimize Function
      - A decorator that takes a set of options
      - Creates a `dis.Bytecode` from the function
      - Calls optimizers by checking the given options
      - Re-sets the bytecode
      - Returns the function

          @classmethod
          def optimize(cls, el):
              def wrapper(func):
                  buffer = Bytecode(func)
                  if el:
                      buffer = elem(buffer)
                  reset_bytecode(func, buffer)
                  return func
              return wrapper
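
The decorator plumbing (options in, `dis.Bytecode` view, passes applied, code object written back) can be sketched as follows. The pass shown is a deliberately trivial toy that swaps one string constant; it is not a real optimization and not one of the talk's optimizers. The names `PASSES` and `swap_constant` are hypothetical, and `code.replace` requires Python 3.8 or newer.

```python
import dis


def swap_constant(code, buffer, old="debug build", new="release build"):
    """Toy pass: replace one string constant if an instruction refers to it."""
    if not any(instr.argval == old for instr in buffer):
        return code
    consts = tuple(new if const == old else const for const in code.co_consts)
    # replace() returns a new code object; the original is left untouched.
    return code.replace(co_consts=consts)


PASSES = {"swap_constant": swap_constant}  # hypothetical option -> pass table


def optimize(*options):
    def wrapper(func):
        buffer = dis.Bytecode(func)  # instruction-level view of the function
        code = buffer.codeobj
        for name in options:         # run only the passes the user asked for
            code = PASSES[name](code, buffer)
        func.__code__ = code         # re-set the function's bytecode
        return func
    return wrapper


@optimize("swap_constant")
def banner():
    return "debug build"


@optimize()
def untouched():
    return "debug build"


print(banner(), untouched())
```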

  30. Optimizers 1 (Example Elem Local Vars)
      - Go over the bytecode buffer
      - Keep a dict of variables whose value is a constant (like an int or string)
      - Find unused variables

          def _elem_locals(self, buffer, function):
              constant_loaded = False
              stack, symbols = [], {}
              for instr in buffer:
                  <create a list of symbols>
              unuseds = [(unused[0], unused[1]) for unused in symbols.values() if unused[2] == 0]

  31. Optimizers 2 (Example Elem Local Vars)
      - Remove unused parts from the bytecode
      - Remove unnecessary constants
      - Remove unnecessary symbols

          unused_consts, unused_varnames = [], []
          offset = 0
          for value, unused in unuseds:
              <replace code>
              <remove consts>
              <remove names>

  32. Catlizor v1-extended
      Assign hooks to Python functions without mutating the functions

  33. The Strategy For Hacking
      - Should not mutate the function itself
      - Should notify before a function call
      - Should notify during a function call (result = notify(call(x)))
      - Should notify after a function call
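
For comparison, observing calls without mutating the function is also possible from pure Python with `sys.setprofile`. This is a different technique from the C-level hooking the talk goes on to show: it covers only the before and after notifications, not the "during" one, and it adds interpreter-wide overhead. The names `make_observer`, `events` and `add` are made up for illustration.

```python
import sys

events = []


def make_observer(target):
    code = target.__code__

    def observer(frame, event, arg):
        # Only react to frames that execute the target's code object.
        if frame.f_code is code:
            if event == "call":
                events.append(("pre", dict(frame.f_locals)))
            elif event == "return":
                events.append(("post", arg))

    return observer


def add(a, b):
    return a + b


original_code = add.__code__
sys.setprofile(make_observer(add))
try:
    result = add(2, 3)
finally:
    sys.setprofile(None)  # always remove the profiler again

print(result, events)
```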

  34. Hooking
      - Write onto the memory address of the default function-call function
      - Written by @dutc

          #pragma pack(push, 1)
          jumper = {
              .push = 0x50,
              .mov = {0x48, 0xb8},
              .jmp = {0xff, 0xe0}
          };
          #pragma pack(pop)

          lpyhook(_PyFunction_FastCallKeywords,
                  &hookify_PyFunction_FastCallKeywords);

  35. Modifying
      - Adding hooks for pre, on-call and post actions
      - Calling the catlizor interface when these hooks are activated

          PyObject *
          hookify_PyFunction_FastCallKeywords(PyObject *func,
                                              PyObject *const *stack,
                                              Py_ssize_t nargs,
                                              PyObject *kwnames)
          {
              <code>
              <code>
          }

  36. Thanks @isidentical
