Proxy Objects
The Python internals that make proxy objects possible
Introduction
If you’re a Python developer, a pattern you’ve probably come across, whether or not you’re aware of it, is the “proxy object”: an object which (doing what it says on the tin) “proxies” calls to another object. That is, calling any method or accessing any attribute on FooProxy passes the call or attribute access through to the underlying Foo object in some way that makes sense for its usage: FooProxy.doStuff() does the same thing as Foo.doStuff(), more or less.
It’s a powerful construct that enables things like database connection pooling and connection management, filesystem abstraction (writing to a local file, S3, or another destination can become transparent to the developer), and many others. You may wish to (as SQLAlchemy does) proxy the list object. The programmer can treat this proxy like a list, but acting on it transparently changes database tables. The programmer doesn’t need to add that logic explicitly, and the object can be passed to a library function that expects a list, because Python relies on duck typing (“if it quacks like a duck …”) rather than static type checks.
It’s all well and good to accept that fact, but as with so many other things, we can gain a better understanding of how proxies work and how to use them by asking what exactly they are.
Some Python Review
Object
In Python, everything derives from object. That is: every class is (eventually) a subclass of object, so every value is an instance of object. Complex classes are object, integers are object, True is object, functions are object, modules are object, types are object, None is object, everything is object.
>>> isinstance({}, object)
True
>>> isinstance(1, object)
True
>>> isinstance(True, object)
True
>>> isinstance(isinstance, object)
True
>>> import os
>>> isinstance(os, object)
True
>>> isinstance(type, object)
True
>>> isinstance(None, object)
True
This ends up being more than an implementation detail. We can get pretty far by treating it as one, and we can build entire decades-long careers working in Python without ever needing to think about this fact too hard, but when we do start to examine this property some interesting details emerge.
One question that arises from this examination: if everything is object, what then is the fundamental difference between an integer and a module? 1 + 1 is a sensible expression, and Python will return a sensible answer. urllib + datetime is nonsense, and Python will raise an exception if it encounters it. Both, though, are expressions of the form object + object. Both the concept of int and the concept of module are abstractions provided to us, the programmers, by the developers of the Python language. The only real difference between an int object and a module object is which extra methods and attributes each type implements.
Special Methods
Many classes implement methods that are meant to be called directly, such as datetime.now() or str.strip(), and these are the kinds of methods we’re most familiar with. But the existence of certain other method definitions is how Python knows to treat certain types of objects differently. Our ability to write code like 1 + 1 comes from a number of special methods (known in Python parlance as “dunder” methods, because their names begin and end with two underscore characters).
Python lacks the strictly enforced private methods of many other object-oriented languages, but as a matter of convention a leading underscore character indicates “you probably shouldn’t use this”, and dunder names are a convention for “really, though, this is a fundamental class property, it’s not meant for general usage.” Python does not enforce either convention, however, and programmers remain free to ignore them at their own peril or amusement.
>>> import datetime
>>> type(datetime)
<class 'module'>
>>>
>>> dir(datetime)
['MAXYEAR', 'MINYEAR', 'UTC', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'date', 'datetime', 'datetime_CAPI', 'time', 'timedelta', 'timezone', 'tzinfo']
>>> type(1)
<class 'int'>
>>>
>>> dir(1)
['__abs__', '__add__', '__and__', '__bool__', '__ceil__', '__class__', '__delattr__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floor__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__getstate__', '__gt__', '__hash__', '__index__', '__init__', '__init_subclass__', '__int__', '__invert__', '__le__', '__lshift__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__round__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'as_integer_ratio', 'bit_count', 'bit_length', 'conjugate', 'denominator', 'from_bytes', 'imag', 'is_integer', 'numerator', 'real', 'to_bytes']
Here, for instance, modules are objects that carry (among other things) the attributes __loader__ and __spec__, which Python’s import machinery sets and uses when importing a module, and numbers are objects that implement all the standard numeric methods (__add__ etc.).
This leaves open the question: “Why does 1 + 1 work but urllib + datetime does not? How does Python know that I can use + on numbers but not on modules?” The answer, simply, is: it doesn’t!
Operators
The Python interpreter provides us with a number of shorthands for our convenience which in actuality just call methods of their operands. The + character is interpreted by Python as “call the __add__ method of the left-hand object, using the right-hand object as its argument.” myvar + 1 means roughly the same thing as myvar.__add__(1), and the latter is essentially how Python operates on the two objects: it looks up __add__ on the type of the left-hand operand, and falls back to the right-hand operand’s __radd__ if that is missing or returns NotImplemented.
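As a minimal sketch of that answer, the snippet below checks which of the two types actually defines __add__ and captures the exception Python raises when neither operand can handle +. The variable names are illustrative only.

```python
import datetime
import urllib

# The operator and the explicit dunder call give the same result for ints.
via_operator = 1 + 1
via_dunder = (1).__add__(1)

# Operators are looked up on the type: int defines __add__, the module type does not.
int_has_add = hasattr(int, "__add__")
module_has_add = hasattr(type(datetime), "__add__")

# With no __add__ (or __radd__) to call, Python gives up and raises TypeError.
error_message = None
try:
    urllib + datetime
except TypeError as exc:
    error_message = str(exc)
```

Python never decides that modules “can’t be added”; it simply finds no method to call.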
We can even implement these methods ourselves with the confidence that Python will treat them the way we expect:
>>> class Foo:
...     def __init__(self, value):
...         self.value = value
...     def __add__(self, other):
...         return self.value + other
...
>>> foo = Foo(1)
>>> foo + 1
2
Class inheritance means we can override some methods to give us an object that is interchangeable with its parent but behaves differently:
>>> class AppendInt(int):
...     def __add__(self, other):
...         return int(str(self) + str(other))
...
>>> foo = AppendInt(1)
>>> foo + 2
12
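One detail the examples above leave out: they only handle the case where our object is on the left of the +. With the Foo class as written, 1 + foo raises a TypeError, because int.__add__ doesn’t know what to do with a Foo and Foo has no __radd__. The sketch below uses a hypothetical Metres class (not from any library) to show the reflected method and the NotImplemented return value.

```python
class Metres:
    """Hypothetical wrapper around a number, for illustration only."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Handles Metres + Metres and Metres + number.
        if isinstance(other, Metres):
            return Metres(self.value + other.value)
        if isinstance(other, (int, float)):
            return Metres(self.value + other)
        # Tell Python we can't handle this operand so it can try the other side.
        return NotImplemented

    def __radd__(self, other):
        # Called for number + Metres, after int.__add__ returns NotImplemented.
        return self.__add__(other)


left = Metres(1) + 2   # dispatched to Metres.__add__
right = 2 + Metres(1)  # int.__add__ declines, so Metres.__radd__ runs
```

Returning NotImplemented rather than raising lets Python try the other operand before it reports an error.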
Attributes
One important method inherited all the way down from object itself is __getattribute__ (along with its optional companion, __getattr__, which a class may define as a fallback). __getattribute__ is the method Python uses to implement the . operator, the operator used to access the attributes and methods of an object. Every time you call someobject.somemethod(), Python is effectively calling someobject.__getattribute__('somemethod') behind the scenes to get the function to call.
__getattribute__ is just another method every object inherits from its base, object, which means that like every other method in the inheritance hierarchy, it can be overridden.
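To make the equivalence concrete, here is a small sketch using a hypothetical Point class. It reads the same attribute three ways and then performs a method call as two separate steps, lookup followed by call.

```python
class Point:
    """Hypothetical class used only to demonstrate attribute lookup."""

    def __init__(self, x):
        self.x = x

    def double(self):
        return self.x * 2


p = Point(21)

via_dot = p.x                                  # the . operator
via_getattr = getattr(p, "x")                  # the built-in function
via_dunder = type(p).__getattribute__(p, "x")  # the underlying method

# A method call is an attribute lookup followed by a call.
method = p.__getattribute__("double")
result = method()
```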
A tangential note: overriding this method makes it extremely easy to leave a class in a state where attributes cannot be accessed at all, or where accessing any attribute leads to infinite recursion, because the body of the override itself needs to look up attributes on self. The related __getattr__ method is much safer to override: Python calls it only after normal attribute lookup has failed.
Overriding __getattribute__ or __getattr__ means that a class can respond to methods and attributes that it doesn’t even know about. A class can return defaults such that no matter which attribute you access, something will be returned; it can log accesses to unknown attributes (for debugging or telemetry); or it can translate snake_case to camelCase if that is the programming style you choose to enforce.
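As an illustration of the name-translation idea, this sketch assumes a hypothetical LegacyApi class whose methods are written in camelCase, and lets callers use snake_case names instead. It is one possible reading of the idea, not a recommended design.

```python
class LegacyApi:
    """Hypothetical class with camelCase method names."""

    def doStuff(self):
        return "did stuff"

    def __getattr__(self, name):
        # Reached only when normal lookup fails, e.g. for "do_stuff".
        head, *rest = name.split("_")
        camel = head + "".join(part.capitalize() for part in rest)
        if camel == name:
            # Nothing to translate: behave like a normal missing attribute.
            raise AttributeError(name)
        return getattr(self, camel)


api = LegacyApi()
snake_result = api.do_stuff()  # translated to api.doStuff()
camel_result = api.doStuff()   # found directly; __getattr__ not involved
```

Raising AttributeError for names that cannot be translated keeps hasattr() and similar checks working as usual.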
Proxy Objects
A proxy object is an object which, for various reasons, wraps another object and provides a transparent interface to the thing it’s meant to proxy.
Proxy objects can be implemented several ways, but typically follow the pattern of keeping a reference to an instance of their underlying class, overriding whichever methods need the side effect, and finally defining __getattr__ as a method that forwards everything else to the referent.
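That three-part pattern can be sketched as follows. LoggingProxy is a hypothetical class written for this article, wrapping a plain list and recording calls to append as its side effect.

```python
class LoggingProxy:
    """Hypothetical proxy that records appends to a wrapped object."""

    def __init__(self, target):
        self._target = target  # 1. reference to the underlying object
        self.calls = []        # where the side effect is recorded

    def append(self, item):
        # 2. An overridden method: add the side effect, then delegate.
        self.calls.append(("append", item))
        return self._target.append(item)

    def __getattr__(self, name):
        # 3. Anything not defined on the proxy is forwarded to the referent.
        return getattr(self._target, name)


items = LoggingProxy([3, 1, 2])
items.append(0)         # handled by the proxy's own append
items.sort()            # forwarded through __getattr__ to list.sort
count = items.count(1)  # also forwarded
```

One caveat: when Python invokes a special method implicitly (len(items), items + other, iteration), it looks the method up on the type and does not consult __getattr__, so a proxy that needs to support those operations has to define the corresponding dunder methods explicitly.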
An example we use at Crowdalert is Flask-SQLAlchemy. SQLAlchemy is extremely powerful and does many things, but its concepts of session and connection handling don’t 100% fit with Flask’s concepts of request and application contexts. Flask-SQLAlchemy abstracts those details away so that the API developer doesn’t need to handle session logic, and it accomplishes this, as expected, by overriding __getattr__ with a function that returns the attribute of its underlying connection (ref).
By John Sonnenschein
Last Updated 2024.07.08