Overview of runtime audits in PEP 578

A lot of programming today leans on the wide ecosystem of packages around a language, so each program builds on others to solve its problem. As a program grows, more and more of the work it needs has already been solved by someone else, and that code is added to the codebase as a dependency. Choosing a dependency rests on factors like maintenance of the module, release cycle, docs, stability etc. More importantly, it rests on trust that the code does what it advertises. Though many modules are open source, programmers mostly use the API and rarely bother to read the source code.

But there are times when a module starts doing something malicious in one release, and this can go unnoticed for a long time unless someone looks into the suspicious activity and reports it publicly. This has happened many times. A few months back, a popular CSS related library in Ruby, downloaded around 28 million times, was found calling eval on unrelated code, which led to a security incident being reported on GitHub. Python is no different when it comes to such attacks, since there is no specific way to verify whether third party code performs some action like a network request, a call to eval etc., which brings us to the rationale of PEP 578.

PEP 578

Abstract

This PEP describes additions to the Python API and specific behaviours for the CPython implementation that make actions taken by the Python runtime visible to auditing tools. Visibility into these actions provides opportunities for test frameworks, logging frameworks, and security tools to monitor and optionally limit actions taken by the runtime.

This PEP proposes adding two APIs to provide insights into a running Python application: one for arbitrary events, and another specific to the module import system. The APIs are intended to be available in all Python implementations, though the specific messages and values used are unspecified here to allow implementations the freedom to determine how best to provide information to their users. Some examples likely to be used in CPython are provided for explanatory purposes.

This PEP provides an API for adding audit hooks that are triggered whenever a given event occurs, and a way for existing systems to use those events for monitoring and so on. Below I describe a simple program showing how this can be used.

Let’s say we want to write a program that does some numerical work and we want to calculate the product of a series. Naturally this can be written as below:

# stats.py
from functools import reduce

def product(series):
    return reduce(lambda acc, num: acc * num, series)

Suppose stats is a math library from PyPI that we already have as a dependency. It does this for us and we don’t want to reimplement it as a utility function, so we use it in the program:

# app.py
import stats

print(stats.product(range(1, 10)))

There is inherent trust that the module does what it says, so even for new releases of the stats dependency we don’t bother to look into the code unless the API changes. At some point the library could have gained a malicious change where it now makes an HTTP request, and the new code is as below.

# stats.py
from functools import reduce

def product(series):
    import urllib.request
    try:
        urllib.request.urlopen("http://example.com")
    except:
        pass
    return reduce(lambda acc, num: acc * num, series)

Unless we look back at the code or run in a sandboxed environment where network requests are monitored, there would be close to no clue about the HTTP activity. PEP 578 provides a way to monitor this. It adds a function sys.addaudithook to the sys module, through which a function can be registered as a callback for a defined set of events. This is similar to sys.settrace(trace), where every function call goes through the trace function, but here the callback only receives audit events. The list of events is available in the PEP. It includes a urllib.Request event raised by the urllib.request module, so an event is triggered for every HTTP request made through it. The callback gets the name of the event and the args with which the event was raised, so we can filter on them:

import sys
import stats

def audit_hook(event, args):
    if event in ['urllib.Request']:
        print(f"Network {event=} {args=}")

sys.addaudithook(audit_hook)

print(stats.product(range(1, 10)))

./python.exe audit_hooks_tut/app.py
Network event='urllib.Request' args=('http://example.com', None, {}, 'GET')
362880

Running the above program prints the event name and the args for the event. Events are raised from Python code through sys.audit(event, *args), and the hook receives the arguments as a tuple. The urllib.Request event is raised inside urllib.request's open method in the form sys.audit('urllib.Request', req.full_url, req.data, req.headers, req.get_method()), so every HTTP request made through urllib.request is passed to the hook function.
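To make the relationship between sys.audit and the hook concrete, here is a minimal sketch, assuming Python 3.8 or later. The event name demo.fetch and the fetch helper are made up for illustration and are not part of any library. Note that the arguments are passed positionally after the event name and reach the hook packed into a tuple.

```python
import sys

seen = []  # (event, args) pairs recorded by the hook

def record_hook(event, args):
    # The runtime raises many events; keep only our hypothetical one.
    if event == "demo.fetch":
        seen.append((event, args))

sys.addaudithook(record_hook)

def fetch(url, method="GET"):
    # Hypothetical helper: raise the event before doing the (omitted) work.
    sys.audit("demo.fetch", url, method)
    return f"{method} {url}"

fetch("http://example.com")
print(seen)  # [('demo.fetch', ('http://example.com', 'GET'))]
```

Recording events in a list like this is also roughly how a test could assert that a piece of code did, or did not, trigger a given action.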

Types of audit events

An HTTP request is one event, but there are other things you may want to know, like which imports a module makes, to see if it imports something it shouldn’t. We can rewrite the above example to check whether the code or its math related dependencies import the socket module, which is not needed for this use case but is imported internally by the urllib.request module.

import sys
import stats

suspicious_modules = ['socket']

def audit_hook(event, args):
    if event in ['urllib.Request']:
        print(f"Network {event=} {args=}")
    elif event in ['import'] and args[0] in suspicious_modules:
        print(f"Suspicious import action {event=} module={args[0]}")

sys.addaudithook(audit_hook)

print(stats.product(range(1, 10)))

./python.exe audit_hooks_tut/app.py
Suspicious import action event='import' module=socket
Network event='urllib.Request' args=('http://example.com', None, {}, 'GET')
362880

Other events include eval, code compilation using compile, pickling, file open etc., which might also not be required by the library. Since these events can also be raised from Python, a third party library can define custom events of its own. For example, say my program uses the stats module and also mod1 from PyPI, which stats uses indirectly as well. From my program I can monitor any events defined by mod1. In this case mod1 might provide a helper make_request, used by stats.py, that can be audited.

# mod1.py
import sys

def make_request(url):
    sys.audit('make_request', url)
    # the actual request would be made here

# stats.py
from functools import reduce

def product(series):
    import mod1
    mod1.make_request("http://example.com")
    return reduce(lambda acc, num: acc * num, series)

# app.py
import sys
import stats

suspicious_modules = ['socket']

def audit_hook(event, args):
    if event in ['urllib.Request', 'make_request']:
        print(f"Network {event=} {args=}")
    elif event in ['import'] and args[0] in suspicious_modules:
        print(f"Suspicious import action {event=} module={args[0]}")

sys.addaudithook(audit_hook)

print(stats.product(range(1, 10)))

With mod1 raising a custom audit event we can audit it from our program as shown. The above setup gives the below output.

./python.exe audit_hooks_tut/app.py
Network event='make_request' args=('http://example.com',)
362880
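The built-in events mentioned earlier (file open, compile, eval) can be watched with the same hook pattern. Below is a rough sketch, assuming the CPython 3.8+ event names open, compile and exec. The WATCHED set and the observed list are names made up for this example. The runtime may raise these events on its own as well, so the sketch only checks which event names showed up, not how many times.

```python
import os
import sys

WATCHED = {"open", "compile", "exec"}  # CPython event names (assumed 3.8+)
observed = []  # (event, first argument) pairs

def audit_hook(event, args):
    if event in WATCHED:
        observed.append((event, args[0] if args else None))

sys.addaudithook(audit_hook)

code = compile("1 + 1", "<demo>", "eval")  # raises "compile"
result = eval(code)                        # raises "exec" with the code object
with open(os.devnull) as fh:               # raises "open"
    fh.read()

events = {event for event, _ in observed}
print(sorted(events))
```

A math library triggering any of these would be at least worth a second look.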

Custom events also give library authors a way to expose hooks into their library so that they can be used for audit purposes. The PEP also provides a C API and other means to enable more fine grained auditing, which are explained in the linked implementation PR. The PEP has a section explaining limitations of sandboxing. The performance impact of this PEP was found to be negligible. It is also closely related to PEP 551 - Security transparency in the Python runtime. Thanks to Steve Dower, Christian Heimes and everybody who took part in the review.

As noted by Steve Dower in this tweet, the above is not the recommended way to use it in production apps. Please refer to PEP 551.
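The abstract mentions that hooks can optionally limit actions and not just observe them. A hook does this by raising an exception, which propagates out of the sys.audit call and aborts the audited action. Here is a small sketch, assuming Python 3.8+. The demo.connect event, the connect helper and the deny list are all hypothetical. Given the PEP's notes on sandboxing, this should be read as an illustration of the mechanism and not as a sandbox.

```python
import sys

BLOCKED_HOSTS = {"example.com"}  # hypothetical deny list

class AuditViolation(RuntimeError):
    """Raised by the hook to abort an audited action."""

def blocking_hook(event, args):
    # Only react to our made-up event; everything else passes through.
    if event == "demo.connect" and args[0] in BLOCKED_HOSTS:
        raise AuditViolation(f"blocked {event} to {args[0]}")

sys.addaudithook(blocking_hook)

def connect(host):
    # The event is raised before the (omitted) connection is made,
    # so an exception from the hook stops the action.
    sys.audit("demo.connect", host)
    return f"connected to {host}"

print(connect("python.org"))  # connected to python.org
try:
    connect("example.com")
except AuditViolation as exc:
    print(exc)  # blocked demo.connect to example.com
```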

This PEP is merged to the master branch and will be released as part of 3.8 beta 1 in a week. I am trying to write a post per feature, and if you like this post you might enjoy my other post on f-string debugging in Python 3.8. Feedback welcome on writing style.

Presses C-x C-s to save the post