Category Archives: Python

The joy of legacy Zope/ZODB systems

I have a large number of legacy systems that – when all other avenues fail – become my responsibility to sort out. Some of those are very old Zope systems written by others and which never fail to reduce me to tears. This morning I came across some particularly good design decisions which I thought I’d share. Yes, that ‘good’ is sarcasm.

First of all let’s remember that Zope, by default, uses the ZODB. In my past lives when I used Zope, I used it as a frontend to Postgres (which sounds nuts now but at the time we didn’t have lots of fancy MVC frameworks to spend our hours with). The legacy systems I have currently been gifted with, however, use the ZODB to save their data. You could argue that for content management systems (which is what we’re talking about here) the ZODB is not a bad fit. I’m not going to argue one way or the other (although my own personal point of view is that it’s shit). However, for the purposes of today’s “issue” the data was real data, i.e. information that would sit quite comfortably in a relational database as god intended.

We’re talking about parent data with child records. This particular issue was that the customer had made a mistake and I needed to remove a bunch of child records from one particular parent record and this had to be done by the sixth of March or the world would end or something similar. Fair enough, so I poke around the code and the database.

Poking around a ZODB database isn’t like poking around a relational database. You can’t look at a database schema and say “these records in this table have these fields”. All you can say is that this record in this big bucket of crap has these fields. And the record next to it could be the same type of record with different fields. Or a completely different type of record. So you poke around and try to work out what is held where. If you fancy a laugh you take a look at the code that writes the records and try and make some sense of it but frankly you’re better off pulling the nerve endings away from your fingers one by one, with a rusty pair of pliers.

So eventually I find the child records and I find a method which is called something like ‘deleteChildRecord’. It turns out this doesn’t just delete child records – it deletes parent records as well because in this brave ZODB world they’re all sitting in the bucket of slop together. Which is OK in the scheme of things because by this time it’s only 05:30 in the morning and we’re beyond caring. By the way – there is a doc string on deleteChildRecord but it doesn’t seem to make much sense … at all. Then I realise the same doc string is used to document virtually every method in the file. Somebody copied and pasted the same doc string twenty times and never thought it useful to change it to something else. But that’s OK. We’re used to that.

So I write some code that works out which child records to delete and run it and get the client to check. They respond saying that when they look at the parent record they can still see ‘stubs’ of the deleted child records. The codes of the deleted records are still showing against the parent but no details of the child records. This doesn’t surprise me that much because in my ZODB world it’s very hard to get rid of data. Like the brown stains around the porcelain after a particularly heavy dump, there’s always some lingering remnant that remains in the crusty crevices of the database even after repeated flushings. Often it’s because the indexing system that Zope and its content management framework (CMF) use hasn’t removed its indexes of metadata even when the real record has been removed. Because you should never query the database directly (unless you fancy wallowing through 50GB of binary data record by record), code will just query the indexes (known as the catalog), which contain the main info you need, and will then pull out the records for each catalog item it finds.
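To make the ‘stub’ effect concrete, here is a toy model in plain Python dicts. It is not Zope’s catalog API, and every name in it is made up for illustration. It only shows how a listing driven by an index can still display the code of a record that no longer exists.

```python
def build_catalog(records):
    """Index just the metadata (here, the code) of every stored record."""
    return {rid: {"code": rec["code"]} for rid, rec in records.items()}


def listing(catalog, records):
    """Query the catalog first, then fetch the full record for each hit."""
    return [(catalog[rid]["code"], records.get(rid)) for rid in sorted(catalog)]


records = {
    "c1": {"code": "A1", "detail": "first"},
    "c2": {"code": "B2", "detail": "second"},
}
catalog = build_catalog(records)

del records["c2"]                   # record removed, catalog never told
stale = listing(catalog, records)   # B2 still listed, with nothing behind it

catalog = build_catalog(records)    # reindex from the records that remain
fresh = listing(catalog, records)   # the stub is gone
```

In this toy the reindex is safe only because the records are the source of truth and the catalog is derived from them.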

Am I boring you? By the way, it just occurred to me that the best way to ensure that all traces of a particular record are removed from Zope is to tell it that it’s vitally important that it should be kept. I can guarantee you’ll never see it again. Anyway – back to the story. Oh god. So I think … the catalog has not updated itself. I will rebuild the catalog. This I do. This is a simple thing to do – you just click on a button labelled ‘Rebuild catalog’.

I now have no data.

You see, whatever genius designed this part of the system decided to store the data in the place where you’re supposed to just store the indexes and metadata of a record. Just as you thought you were winning and the baddie had been despatched to the lower reaches of hell you see a shape in the window and it turns out you’re still thirty minutes from the final credits and you’re not halfway through the body count yet.

Fortunately, like the seasoned campaigner I am, I am not doing this on the live system – having transferred the 50GB bag of poo to a staging site before commencing this exercise. Hell, I don’t log onto a bloody legacy Zope site of ours without taking a backup, dumping it to tape and moving it 100 miles offsite.

Right, start again, and let’s start by looking at the code.

I can see code that creates parent records. I can see code that creates child records (all with the same doc string). I can actually see quite a lot of code that creates child records. And I can see code that deletes child records (with the same doc string). Unfortunately that code doesn’t tell a parent record that the child record has been removed, so if you do actually use that routine you’ll find the system fails spectacularly the next time you try and view anything. OK – so … the parent record stores the list of related children as a tuple of ids. So we write some code that converts the tuple (which is immutable) to a list, removes an id from the list every time we remove a child record, converts the list back to a tuple at the end and then writes it back to the parent record.

But of course that still doesn’t work because things such as ‘child count’ are not calculated automatically – they’re stored as properties on the parent record as well. So we manually count the records and update the child count property, but then find that even though we have updated that, there are two other counts held as two other fucking properties, one for each of the two possible types of child record, and the parent record is wetting itself in the only way it knows how: by vomiting Python tracebacks over the screen.

So in short when you delete a child record you have to manually a) tell the parent record the child record has gone b) tell the parent record to decrement a specific count for the type of child record you have deleted and c) tell the parent record to manually decrement a specific count for the total of child records you now have.
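Those three steps, sketched with plain dicts standing in for the ZODB objects. The field names (child_ids, child_count, count_<type>) are hypothetical and are not the real property names from this system.

```python
def remove_children(parent, children, doomed_ids):
    """Delete child records and keep the parent's bookkeeping in step.

    parent   -- dict with 'child_ids' (a tuple), 'child_count' and one
                'count_<type>' entry per child type (hypothetical names)
    children -- dict mapping child id -> {'type': ...}
    """
    remaining = list(parent["child_ids"])       # tuples are immutable, so use a list
    for child_id in doomed_ids:
        child = children.pop(child_id)          # delete the child record itself
        remaining.remove(child_id)              # (a) parent forgets the child
        parent["count_" + child["type"]] -= 1   # (b) per-type count
        parent["child_count"] -= 1              # (c) total count
    parent["child_ids"] = tuple(remaining)      # write back as a tuple at the end
    return parent


children = {"c1": {"type": "a"}, "c2": {"type": "b"}, "c3": {"type": "a"}}
parent = {"child_ids": ("c1", "c2", "c3"), "child_count": 3,
          "count_a": 2, "count_b": 1}

remove_children(parent, children, ["c1", "c2"])
print(parent)
```

Keeping all three updates in one function at least means they cannot drift apart from each other again.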

This I do – give the results to the client who says it all looks fine apart from one record which they don’t recognise the code for and has a completely different child type assigned to it from anything else and would I know why that was?

No, I don’t. I really don’t. I don’t understand this data, I don’t understand the structure (partly because I don’t think there is one). Frankly I hardly know where the floor is at this point and even such concepts as light and dark have gone hazy.

Python – comparing floats and decimals

No matter how old I get, I keep being bitten by the joys of having some data as floats and some as decimals.

ipdb> value
Decimal('1.473')
ipdb> from_value
1.473
ipdb> value < from_value
True

because …

ipdb> from decimal import *
ipdb> Decimal(from_value)
Decimal('1.4730000000000000870414851306122727692127227783203125')

So work out what accuracy you need and do something like

from_value = Decimal(from_value).quantize(Decimal('0.0001'))
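A self-contained version of the session above, assuming four decimal places is the accuracy you need. The str() route at the end is an extra option not shown in the session. It relies on Python 3 printing a float using its shortest round-tripping representation.

```python
from decimal import Decimal

value = Decimal('1.473')
from_value = 1.473  # a float, so not exactly 1.473 in binary

# Converting the float directly keeps all of its binary noise.
exact = Decimal(from_value)
print(value < exact)

# Quantize to the accuracy you care about before comparing.
rounded = exact.quantize(Decimal('0.0001'))
print(rounded, value == rounded)

# Alternative: go via str(), which drops the noise without picking a precision.
via_str = Decimal(str(from_value))
print(via_str, value == via_str)
```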

Django/Postgresql – inserting new record doesn’t return id.

You have an old database and you find the following happens.

>>> ap = ValidAirPressureCode(code='9', description='9', barcode='9', comments='9')
>>> ap.save()
>>> ap.id
>>> ap = ValidAirPressureCode.objects.get(code='9')
>>> ap.id
11

In short Django doesn’t return the id of a newly inserted record. If you insert records in admin and choose to continue editing, you’ll get an error saying it can’t find a record with a null id. The problem (in my case) is that Django uses a postgres function called “pg_get_serial_sequence” to identify where the last inserted number given to a table’s record is stored but if your tables were not created using the serial type, this will return null. In my case it was because the database was nine years old and serial types did not exist then.

A proposed solution can be seen on this ticket but at the time of writing it is not implemented.

To fix it in my case I took a look at what my sequences were called. For example:

id | integer| \
    not null default \
      nextval(('t_valid_airpressure_pk_seq'::text)::regclass)

I then wrote the following piece of middleware (and just added it to the middleware section of my settings.py).

class FixLastInsertId(object):

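    def __init__(self):
        """Monkey patch last_insert_id once at startup, then drop this middleware."""
        def my_last_insert_id(self, cursor, table_name, pk_name):
            """Try the stock pg_get_serial_sequence lookup first. That fails on
            the alc database, so fall back to the tablename_pk_seq naming
            convention, which does work there."""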

            sql = "SELECT CURRVAL(pg_get_serial_sequence('%s','%s'))" % (self.quote_name(table_name), pk_name)
            cursor.execute(sql)
            result = cursor.fetchone()[0]
            if not result:
                sql = "select currval('%s_pk_seq'::regclass)" % table_name
                cursor.execute(sql)
                result = cursor.fetchone()[0]
            return result

        from django.db import connection
        connection.ops.__class__.last_insert_id = my_last_insert_id

        # Tell Django to forget about this middleware now, we have
        # had our evil way.
        from django.core.exceptions import MiddlewareNotUsed
        raise MiddlewareNotUsed

Basically I monkey patch the last_insert_id method. I call the normal code and if that fails try with my own way. All my sequences have the same naming convention (tablename_pk_seq) so it works for me.
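A possible alternative to the monkey patch, not tried against this database: PostgreSQL lets you attach an existing sequence to a column with ALTER SEQUENCE ... OWNED BY, after which pg_get_serial_sequence should be able to find it. The helper below is a hypothetical sketch that only builds the statement for a table following the tablename_pk_seq convention. It assumes the primary key column is called id and that the sequence and table share a schema and owner, which OWNED BY requires.

```python
def own_sequence_sql(table_name, pk_name="id"):
    """Build the ALTER SEQUENCE statement tying <table>_pk_seq to its column."""
    sequence = "%s_pk_seq" % table_name
    return 'ALTER SEQUENCE "%s" OWNED BY "%s"."%s";' % (
        sequence, table_name, pk_name)


# Table name taken from the sequence shown earlier in this post.
statement = own_sequence_sql("t_valid_airpressure")
print(statement)
```

You would run the generated statement once per table on a copy of the database and check ap.id after a save before going anywhere near the live system.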

Thanks to Ross for pointing me in the right direction, although he didn’t approve of my monkey patching solution 😉

Converting Legacy Databases to Django 1.3 – Day 0.5

I have done this quite a few times in the past, but not recently. I was given three days to convert an existing postgres database (front ended with Zope) to Django 1.3. I thought it would be useful to document what I did here for my own future reference and to record any gotchas for posterity. The database I am converting is nine years old, so plenty of cruft through the years, although the basic structure is sound.

Note that the website of this application is not particularly complicated. Most of the core work of this application is done via backend processes written with Twisted. The front end is used for displaying the system status and allowing for data to be modified, with a few additional complications. The system will work without any web front end (although this is hardly ideal) without stopping production, so the risk is minimised.

This is the first day – although I only started from the beginning of the afternoon, so it’s the first half a day.

Python development with virtualenvwrapper, django, buildout and mercurial – a basic example

I have been trying to streamline my Python development processes over the years. Earlier this year @rossjones introduced me to virtualenvwrapper, which made things even simpler. I’m about to go into quite a concentrated period of development so I thought I’d take the opportunity to document what I do to make an easily reproducible Python environment. Please let me know of any errors or suggestions for better ways of doing this as I go along.

Note, I have just noticed this is not working on OS X Lion. Looking now at why. 20110816 17:28 BST
