Python is a beautiful, easy-to-read language. It’s also (usually) easy to write. For the most part, it’s simple to make your classes work well with the language by implementing a few “magic methods”, but writing some of them gets tedious.
A nice feature that Kotlin comes with is the data modifier, which, when used on a class, will implement the equals(), hashCode(), and toString() methods for it, based on the properties passed in through the constructor. I wanted to do something similar to this in Python, since implementing __eq__, __hash__, __str__, and __repr__ can be tedious. Also, including such method definitions makes your class look less appealing (those double-unders are a bit unsightly).
Let’s look at how this can be done.
My first thought was to use descriptors, since I was reading about them (and finally getting to the point where I really understand them) when the idea came to me. It didn’t take too long for me to give up on that idea. Why would I need to make non-data descriptors when I could give a simple function?
Decorators and Monkey-Patching
I had wanted to use a decorator from the beginning, but its implementation changed fairly quickly once I decided against descriptors.
The decorator will need to know two things: 1) what class to modify and 2) which fields to base all of the method calculations on. Since the class will be provided by the basic decorator call, I had to decide how I wanted to figure out what fields to use in the class. I briefly considered doing a search through the __dict__ or something else like that, but quickly dismissed it; there were too many chances to end up missing a field or including a field that the user didn’t want included. So, I decided to ask for them. The structure of the decorator looks like this:
def data(*field_names):
    def data_class(cls):
        # do stuff to the class
        return cls
    return data_class
data takes in a varargs of (supposedly) strings that are names of fields in the class (properties and other descriptors will work for this too). It then defines the actual decorator function and returns it to be used on the class.
The # do stuff… area can be filled with simple assignments to the __eq__, __hash__, __str__, and __repr__ methods, such as cls.__str__ = to_string(*field_names).
Now we need to define the functions that will provide the definitions of our methods. Let’s start with __eq__, shall we?
First, we need to tell the produced __eq__ method what fields are to be used. Since we can’t put any new parameters into the function signature (__eq__ has a specific signature it must follow to be used “magically” in Python), we need to provide the fields via a closure or class definition. Being a fairly simple definition, I decided to go with a closure. Here’s the start of it:
def equals(*field_names):
    def __eq__(self, other):
        # comparison logic goes here
    return __eq__
Now we need to implement the comparison logic. To do that, we’ll loop over the field_names, mapping them to the actual values on self and other, then comparing those values:
return all(getattr(self, field_name) == getattr(other, field_name) for field_name in field_names)
This works pretty well, but will raise an AttributeError if other doesn’t have the field. If self doesn’t have it, this is a big problem and should raise an error, but, since other isn’t necessarily expected to be the same type of object, we should just return False if this happens.
Since inlining this check will be ugly, we’ll move the mapping functionality to a function:
return all(_fields_are_equal(self, other, field_name) for field_name in field_names)
and define the function thusly:
def _fields_are_equal(self, other, field_name):
    self_value = getattr(self, field_name)
    try:
        other_value = getattr(other, field_name)
    except AttributeError:
        return False
    return self_value == other_value
Now we have our definition for the equality checker. Add the following line to the class decorator:
cls.__eq__ = equals(*field_names)
An interesting side effect is that this definition will allow the class to be equal to a namedtuple with the same list of field names (assuming the values of those fields are the same). Personally, I’m glad of this, since the basic idea behind this is to supply this functionality to simple data-based classes, which is mostly what a namedtuple is. For a little while, I considered doing a type check, but decided against that. It’s not particularly pythonic, and I actually like being the same as a namedtuple. You can change this, obviously.
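To see the namedtuple behaviour concretely, here is a small sketch. It assumes the equals and _fields_are_equal definitions above; PointTuple is a name made up for the example, and the method is patched on by hand rather than through the decorator.

```python
from collections import namedtuple


def _fields_are_equal(self, other, field_name):
    self_value = getattr(self, field_name)
    try:
        other_value = getattr(other, field_name)
    except AttributeError:
        # other lacks the field, so the objects cannot be equal
        return False
    return self_value == other_value


def equals(*field_names):
    def __eq__(self, other):
        return all(_fields_are_equal(self, other, name) for name in field_names)
    return __eq__


class Point2D:
    def __init__(self, x, y):
        self.x = x
        self.y = y


# Monkey-patch the generated method onto the class.
Point2D.__eq__ = equals('x', 'y')

# Hypothetical namedtuple with the same field names.
PointTuple = namedtuple('PointTuple', ['x', 'y'])

same_as_tuple = Point2D(1, 2) == PointTuple(1, 2)    # same fields, same values
reflected = PointTuple(1, 2) == Point2D(1, 2)        # tuple defers to our __eq__
different_value = Point2D(1, 2) == PointTuple(1, 3)  # y differs
missing_field = Point2D(1, 2) == "not a point"       # str has no x or y
```

The reflected comparison works because the tuple’s own comparison returns NotImplemented for a non-tuple, so Python falls back to the patched __eq__ on the right-hand operand.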
Again, our hashing function is going to need the field names, so it’ll be structured like this:
def hash_code(*field_names):
    def __hash__(self):
        # hashing code here
    return __hash__
Since our class will compare as equal to namedtuples with the same data, we should also have their hash codes match, in order to fit with the agreement between __eq__ and __hash__ (things that compare equal should have the same hash value). So, let’s simply make a tuple of the fields and run the hash function on that:
return hash(tuple(getattr(self, field_name) for field_name in field_names))
This gives us the same hash code as a namedtuple holding the same field values would give us. Now, don’t forget to add this line to the decorator function:
cls.__hash__ = hash_code(*field_names)
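A quick check of that claim, as a sketch: it patches hash_code onto a bare class by hand, and PointTuple is again a hypothetical name used only for illustration.

```python
from collections import namedtuple


def hash_code(*field_names):
    def __hash__(self):
        # Hash the field values exactly as a plain tuple of them would hash.
        return hash(tuple(getattr(self, name) for name in field_names))
    return __hash__


class Point2D:
    def __init__(self, x, y):
        self.x = x
        self.y = y


Point2D.__hash__ = hash_code('x', 'y')

# Hypothetical namedtuple, for comparison only.
PointTuple = namedtuple('PointTuple', ['x', 'y'])

point_hash = hash(Point2D(1, 2))
tuple_hash = hash(PointTuple(1, 2))
plain_hash = hash((1, 2))
```

All three values agree within one interpreter run, because a namedtuple inherits its hashing from tuple.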
I grouped __str__ and __repr__ together because, usually, they’re the same. With the data class decorator, they’re always the same. So, let’s start with the same basic building block of a function:
def to_string(*field_names):
    def __str__(self):
        # da code
    return __str__
The format I’m going for here is a pretty typical format of ClassName(field1=value1, field2=value2). The first thing we need is the name of the class. That’s easy, we just get it off of the type of self:
class_name = type(self).__name__
We’re going to need an opening paren right after, so let’s combine that right away. Replace the previous line with:
start = type(self).__name__ + '('
Next we’ll have to go through each field, getting the field name, then an =, and then the value in that field. Let’s define a quick helper function to get that bit:
def _field_printout(self, field_name):
    return field_name + '=' + str(getattr(self, field_name))
Each of those fields needs to be separated by a comma and a space, so we’ll do a join:
middle = ', '.join(_field_printout(self, field_name) for field_name in field_names)
Lastly, we need to close it all with a closing paren:
return start + middle + ')'
Put it together and add the following line in the class decorator:
cls.__str__ = cls.__repr__ = to_string(*field_names)
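Assembled, the string-building pieces might look like the sketch below. It follows the steps above, with the method patched onto a bare class by hand so the snippet runs on its own.

```python
def _field_printout(self, field_name):
    # Produces e.g. "x=1" for a single field.
    return field_name + '=' + str(getattr(self, field_name))


def to_string(*field_names):
    def __str__(self):
        start = type(self).__name__ + '('
        middle = ', '.join(_field_printout(self, name) for name in field_names)
        return start + middle + ')'
    return __str__


class Point2D:
    def __init__(self, x, y):
        self.x = x
        self.y = y


# The same generated function serves as both __str__ and __repr__.
Point2D.__str__ = Point2D.__repr__ = to_string('x', 'y')

text = str(Point2D(1, 2))    # 'Point2D(x=1, y=2)'
shown = repr(Point2D(1, 2))  # same text, since both names share one function
```

Because the helper calls str() on each value, a string field shows up without quotes; swapping in repr() there is one way to change that if you prefer.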
Here’s an example of its use:
@data('x', 'y')
class Point2D():
    def __init__(self, x, y):
        self.x = x
        self.y = y
That’s all you need. This very simple case would likely have been better off as a namedtuple, but if you wanted to add more methods to it, it’s easier to do so with the class than with the tuple.
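For reference, here is everything from this post gathered into one runnable sketch, followed by a few things you can then do with the decorated class. The variable names in the usage part are only illustrative.

```python
def _fields_are_equal(self, other, field_name):
    self_value = getattr(self, field_name)
    try:
        other_value = getattr(other, field_name)
    except AttributeError:
        return False
    return self_value == other_value


def equals(*field_names):
    def __eq__(self, other):
        return all(_fields_are_equal(self, other, name) for name in field_names)
    return __eq__


def hash_code(*field_names):
    def __hash__(self):
        return hash(tuple(getattr(self, name) for name in field_names))
    return __hash__


def _field_printout(self, field_name):
    return field_name + '=' + str(getattr(self, field_name))


def to_string(*field_names):
    def __str__(self):
        start = type(self).__name__ + '('
        middle = ', '.join(_field_printout(self, name) for name in field_names)
        return start + middle + ')'
    return __str__


def data(*field_names):
    def data_class(cls):
        # Monkey-patch the generated methods onto the decorated class.
        cls.__eq__ = equals(*field_names)
        cls.__hash__ = hash_code(*field_names)
        cls.__str__ = cls.__repr__ = to_string(*field_names)
        return cls
    return data_class


@data('x', 'y')
class Point2D:
    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point2D(1, 2)
q = Point2D(1, 2)

# Equal points collapse to one set entry because __eq__ and __hash__ agree.
unique_points = {p, q, Point2D(3, 4)}
```

Here p == q is true even though they are distinct objects, and repr(p) gives 'Point2D(x=1, y=2)'.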
Pick and Choose
If you don’t like the implementation of some of these functions, you can leave some out and define your own. When you do this, you no longer use the class decorator; instead, you can write your own decorator that only sets the ones you want, or you can set them manually. For example, if the Point2D class only wanted the __hash__ and __eq__ methods, it could be defined like this:
class Point2D():
    def __init__(self, x, y):
        self.x = x
        self.y = y

    _field_names = ('x', 'y')
    __hash__ = hash_code(*_field_names)
    __eq__ = equals(*_field_names)
Creating _field_names isn’t necessary, but it means you only need to write down the names once.
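For the write-your-own-decorator route, one possible shape is sketched below. The name eq_and_hash is hypothetical (it is not part of the code in this post), and the two factories are repeated from earlier so the snippet runs on its own.

```python
def _fields_are_equal(self, other, field_name):
    self_value = getattr(self, field_name)
    try:
        other_value = getattr(other, field_name)
    except AttributeError:
        return False
    return self_value == other_value


def equals(*field_names):
    def __eq__(self, other):
        return all(_fields_are_equal(self, other, name) for name in field_names)
    return __eq__


def hash_code(*field_names):
    def __hash__(self):
        return hash(tuple(getattr(self, name) for name in field_names))
    return __hash__


def eq_and_hash(*field_names):
    """Hypothetical decorator: sets only __eq__ and __hash__."""
    def decorate(cls):
        cls.__eq__ = equals(*field_names)
        cls.__hash__ = hash_code(*field_names)
        return cls
    return decorate


@eq_and_hash('x', 'y')
class Point2D:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    # A hand-written __repr__ is left alone by the decorator.
    def __repr__(self):
        return 'Point2D at ({}, {})'.format(self.x, self.y)


custom_text = repr(Point2D(1, 2))  # 'Point2D at (1, 2)'
still_equal = Point2D(1, 2) == Point2D(1, 2)
```

The field names are still written only once, in the decorator call, and the class keeps whatever string formatting you give it.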
You are free to use this code all you want, or you can go !!!HERE!!!! to download the file that contains all these definitions, including the documentation.