Description
You currently can't derive a subclass from a built-in type (e.g. list). There are many difficulties in implementing this; the main one is deciding how to represent an instance of a subclassed built-in.
Consider:
```python
class mytuple(tuple):
    pass

t = mytuple((1, 2, 3))
t.myelem = 1
```
This creates a new type called 'mytuple', derived from the built-in 'tuple' class. It then creates an instance of this type which holds the elements 1, 2, 3 in the tuple and also has the additional member 'myelem'.
How should such an instance be represented in memory? Its memory layout must somehow include the built-in tuple instance, as well as a dictionary to hold the 'myelem' attribute. Two choices come to mind:
1. Create an object which has a dictionary and a pointer to the built-in tuple instance; call the latter T. When a tuple method M is called on this instance, T must be extracted and passed to the built-in M, because M expects its first argument to be a tuple instance. The problem is that T is not the whole object, only a part of it. M has no way of knowing that T came from a subclassed tuple, so in situations where M needs the actual object (e.g. to return it), it cannot do the correct thing. One solution is to pass M the original object as well as T, but then every single C method must take an extra argument. Another solution is for every C method to start with a call to a helper function which extracts the correct pointer for that built-in instance, but that means a lot of overhead in time and ROM.
2. Create a composite instance which contains both a dictionary and the actual built-in tuple instance, i.e. the two sit one after the other in RAM. There are then 2 choices for the layout:
2a. Put the dictionary first, then the built-in tuple. This is easy because the dictionary takes a fixed amount of RAM, so the offsets of the dictionary and of the built-in tuple within the composite instance are simple to calculate. But it has the same problem as 1 above: when the object is passed to a built-in method M, M must be given a pointer to the tuple part within the full instance, and again M does not know what the original object is. The 2 solutions given in point 1 also work here: either pass 2 pointers (the original instance and the sub-instance), or pass a pointer to the original instance and make the function extract the sub-instance itself (using a helper function shared by every method).
2b. Put the built-in tuple first, then the dictionary. This makes it trickier to compute the offset of the dictionary: each built-in type would need to provide a function that returns how many bytes a given instance of it occupies. It must be a function rather than a fixed number, because types like tuple are variable in length. The advantage of this approach is that the pointer to the built-in instance is the same as the pointer to the full instance, so only 1 pointer needs to be passed to the function, and the function does not need to extract any sub-instance. (However, the type of the object, i.e. the first entry in the instance, will not be the built-in type.)
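To make the layouts concrete, here is a minimal C sketch built on a deliberately simplified, hypothetical object model: `type_t`, `obj_base_t`, `dict_t`, `tuple_t` and every helper name are illustrative only, not MicroPython's real structs. It shows that under options 1 and 2a the built-in method receives a pointer that differs from the instance pointer, while under 2b the two coincide and the dictionary sits at an offset that depends on the tuple's length.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified object model; not MicroPython's real structs. */
typedef struct type_s { const char *name; const struct type_s *parent; } type_t;
typedef struct { const type_t *type; } obj_base_t;
typedef struct { obj_base_t base; size_t used, alloc; void *table; } dict_t;
typedef struct { obj_base_t base; size_t len; intptr_t items[]; } tuple_t;

/* A built-in method M: it only ever receives a pointer to a tuple layout. */
static size_t tuple_len(const tuple_t *self) { return self->len; }

static size_t round_up(size_t n, size_t align) { return (n + align - 1) / align * align; }

/* Option 1: the instance points at a separately allocated tuple T.
   tuple_len(o->native) works, but tuple_len cannot get back to o. */
typedef struct {
    obj_base_t base;  /* the subclass type, e.g. mytuple */
    dict_t *dict;     /* instance attributes such as 'myelem' */
    tuple_t *native;  /* T */
} sub1_t;

/* Option 2a: dictionary first, tuple at a fixed offset after it. */
typedef struct {
    obj_base_t base;  /* the subclass type */
    dict_t dict;
    /* the tuple_t follows at sub2a_offset() */
} sub2a_t;

static size_t sub2a_offset(void) { return round_up(sizeof(sub2a_t), alignof(tuple_t)); }

static tuple_t *sub2a_native(sub2a_t *o) { return (tuple_t *)((char *)o + sub2a_offset()); }

static sub2a_t *sub2a_new(const type_t *subtype, const type_t *tuple_type, size_t len) {
    sub2a_t *o = calloc(1, sub2a_offset() + sizeof(tuple_t) + len * sizeof(intptr_t));
    assert(o != NULL);
    o->base.type = subtype;
    tuple_t *t = sub2a_native(o);  /* not the same pointer as o */
    t->base.type = tuple_type;
    t->len = len;
    return o;
}

/* Option 2b: tuple first, so the instance pointer IS the tuple pointer.
   The dictionary's offset depends on len, hence a per-type size function. */
static size_t tuple_bytes(size_t len) {
    return round_up(sizeof(tuple_t) + len * sizeof(intptr_t), alignof(dict_t));
}

static size_t tuple_instance_size(const tuple_t *t) { return tuple_bytes(t->len); }

static dict_t *sub2b_dict(tuple_t *t) { return (dict_t *)((char *)t + tuple_instance_size(t)); }

static tuple_t *sub2b_new(const type_t *subtype, size_t len) {
    tuple_t *t = calloc(1, tuple_bytes(len) + sizeof(dict_t));
    assert(t != NULL);
    t->base.type = subtype;  /* the first entry names mytuple, not tuple */
    t->len = len;
    return t;
}
```

The `round_up` calls are an assumption about alignment; a real port would follow its own allocator's rules.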
2b is how CPython does it. I would say implement either 1 or 2b. Here's a basic overview of the changes needed to existing built-in objects:
1. Each built-in method must, at the start, call a function to extract the correct pointer to the type that the method expects, something like `mp_tuple_t *self = mp_get_type(self_in, &mp_type_tuple);`. This could be written as a macro which first checks for the quick case where self_in is exactly the requested type and, only if that fails, calls a more sophisticated function. This function can raise an exception if it fails, so there is no need to check its return value. Also note that most functions currently have 2 lines at their beginning, one to assert the type and another to cast self_in to the correct type, so this would actually make existing code more succinct (though larger in ROM than when assertions are turned off).
2b. Two changes. First, the make_new slot in the type must take an extra argument: the number of extra bytes to allocate at the end of the instance. These bytes will hold the dictionary for the subclass (and perhaps other things). Second, each type must provide another function that computes the offset to the start of those extra bytes in a given instance.
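Item 1 can be sketched as follows, assuming option 1's layout (a wrapper holding a pointer to the native part). Only the names `mp_get_type` and `mp_type_tuple` come from the text above; the object model, `mp_get_type_slow`, `subinstance_t` and `is_subclass` are hypothetical, and `abort()` merely stands in for raising a TypeError.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, simplified object model for option 1 (not real MicroPython). */
typedef struct type_s { const char *name; const struct type_s *parent; } type_t;
typedef struct { const type_t *type; } obj_base_t;
typedef obj_base_t *obj_t;

typedef struct { obj_base_t base; size_t len; const long *items; } tuple_t;

/* Option 1 subclass instance: attributes plus a pointer to the native part. */
typedef struct { obj_base_t base; void *dict; obj_t native; } subinstance_t;

static const type_t mp_type_tuple = {"tuple", NULL};

/* Single-parent chain only; a real version would handle multiple bases. */
static int is_subclass(const type_t *t, const type_t *want) {
    for (; t != NULL; t = t->parent) {
        if (t == want) return 1;
    }
    return 0;
}

/* Slow path: find the native instance inside a subclass instance. */
static void *mp_get_type_slow(obj_t o, const type_t *want) {
    assert(o != NULL);
    if (o->type == want) return o;
    if (is_subclass(o->type, want)) return ((subinstance_t *)o)->native;
    /* A real implementation would raise TypeError here, so callers never
       need to check the result. */
    fprintf(stderr, "TypeError: expected %s, got %s\n", want->name, o->type->name);
    abort();
}

/* Fast path: an exact type match costs one comparison.
   Note: the macro evaluates `o` more than once. */
#define mp_get_type(o, want) \
    ((o)->type == (want) ? (void *)(o) : mp_get_type_slow((o), (want)))

/* A built-in method: one line replaces the usual assert + cast pair. */
static size_t tuple_len(obj_t self_in) {
    tuple_t *self = mp_get_type(self_in, &mp_type_tuple);
    return self->len;
}
```

Calling `tuple_len` on a plain tuple takes the fast path; calling it on a `mytuple` wrapper falls through to the slow path, which returns the embedded tuple pointer.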
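The two 2b changes can be sketched in the same spirit. The extended `make_new` signature, the `extra_offset` slot and the helper names are hypothetical illustrations of the proposal, not an existing MicroPython API, and constructor arguments are simplified to plain `long` values.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified type struct showing the two proposed 2b changes. */
typedef struct type_s type_t;
typedef struct { const type_t *type; } obj_base_t;

struct type_s {
    const char *name;
    const type_t *parent;
    /* change 1: make_new takes the number of extra bytes to allocate at the end */
    obj_base_t *(*make_new)(const type_t *type, size_t n_args, const long *args, size_t n_extra);
    /* change 2: offset of those extra bytes within a given instance */
    size_t (*extra_offset)(const obj_base_t *self);
};

typedef struct { obj_base_t base; size_t len; long items[]; } tuple_t;

static size_t tuple_bytes(size_t len) {
    size_t n = sizeof(tuple_t) + len * sizeof(long);
    size_t a = alignof(max_align_t);
    return (n + a - 1) / a * a;  /* keep the extra bytes suitably aligned */
}

static obj_base_t *tuple_make_new(const type_t *type, size_t n_args, const long *args, size_t n_extra) {
    tuple_t *t = calloc(1, tuple_bytes(n_args) + n_extra);
    assert(t != NULL);
    t->base.type = type;  /* may be a subclass such as mytuple */
    t->len = n_args;
    for (size_t i = 0; i < n_args; i++) t->items[i] = args[i];
    return &t->base;
}

static size_t tuple_extra_offset(const obj_base_t *self) {
    return tuple_bytes(((const tuple_t *)self)->len);
}

static const type_t mp_type_tuple = {"tuple", NULL, tuple_make_new, tuple_extra_offset};

/* Stand-in for the subclass's attribute dictionary. */
typedef struct { size_t used; void *table; } dict_t;

/* Generic subclass code: ask the native base where the extra bytes start. */
static dict_t *instance_dict(obj_base_t *self, const type_t *native) {
    return (dict_t *)((char *)self + native->extra_offset(self));
}

/* Constructing mytuple(...): the native make_new reserves room for the dict. */
static obj_base_t *subclass_new(const type_t *subtype, const type_t *native,
                                size_t n_args, const long *args) {
    obj_base_t *o = native->make_new(subtype, n_args, args, sizeof(dict_t));
    instance_dict(o, native)->used = 0;  /* calloc already zeroed it; shown for clarity */
    return o;
}
```

The subclass machinery never needs to know the tuple's internal layout; it only asks the native type for the offset.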
I would call solution 1 the elegant but potentially inefficient solution, and solution 2b the efficient but messy solution.
Solution 1 allows a class to derive from multiple built-in types, whilst solution 2b does not, because only one built-in instance can sit at the start of the object. (This is why CPython cannot support multiple built-in bases without a huge change to its code base.)
Note also that solution 2b requires additional code and complexity to type-check parameters. Consider:
```python
t = mytuple((1, 2, 3))
t.count(1)
tuple.count(t, 1)
tuple.count(None, 1)
```
The second line will succeed and does not need a check on the type of t. The third line will also succeed, but the call must first check that t is actually a tuple instance. The fourth line must fail, because None is not derived from tuple, and someone has to perform this check. The check that the first argument has the correct type can be handled by one master function that wraps calls made through the type (unbound method calls). Solution 2b requires this extra check, whereas solution 1 already has it at the start of every function.
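The master check can be sketched as a single wrapper used whenever a native method is called through its type. `method_desc_t`, `call_unbound` and the return-code convention (-1 where a real implementation would raise TypeError) are hypothetical, and tuple items are stored behind a pointer here purely to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified object model (not real MicroPython). */
typedef struct type_s { const char *name; const struct type_s *parent; } type_t;
typedef struct { const type_t *type; } obj_base_t;
typedef struct { obj_base_t base; size_t len; const long *items; } tuple_t;

static const type_t mp_type_tuple = {"tuple", NULL};
static const type_t mp_type_none = {"NoneType", NULL};

static int is_subclass(const type_t *t, const type_t *want) {
    for (; t != NULL; t = t->parent) {
        if (t == want) return 1;
    }
    return 0;
}

/* Native method: under 2b, self_in already points at a tuple layout,
   so the method itself performs no check. */
static long tuple_count(obj_base_t *self_in, long value) {
    const tuple_t *self = (const tuple_t *)self_in;
    long n = 0;
    for (size_t i = 0; i < self->len; i++) {
        if (self->items[i] == value) n++;
    }
    return n;
}

/* A native method as looked up on its type, e.g. tuple.count. */
typedef struct {
    const type_t *owner;
    long (*fn)(obj_base_t *self, long arg);
} method_desc_t;

static const method_desc_t tuple_count_desc = {&mp_type_tuple, tuple_count};

/* The single "master" check shared by all unbound native calls.
   Returns 0 on success; -1 where a real implementation would raise TypeError. */
static int call_unbound(const method_desc_t *m, obj_base_t *self, long arg, long *result) {
    if (self == NULL || !is_subclass(self->type, m->owner)) return -1;
    *result = m->fn(self, arg);
    return 0;
}
```

In terms of the example above: `t.count(1)` maps to a direct `tuple_count` call, `tuple.count(t, 1)` passes the check because mytuple derives from tuple, and `tuple.count(None, 1)` is rejected by `call_unbound`.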
Comments?