Converting Legacy Databases to Django 1.3 – Day 0.5

I have done this quite a few times in the past, but not recently. I was given three days to convert an existing postgres database (front ended with Zope) to Django 1.3. I thought it would be useful to document what I did here for my own future reference and to record any gotchas for posterity. The database I am converting is nine years old, so plenty of cruft through the years, although the basic structure is sound.

Note that the website of this application is not particularly complicated. Most of the core work of this application is done via backend processes written with Twisted. The front end is used for displaying the system status and allowing for data to be modified, with a few additional complications. The system will work without any web front end (although this is hardly ideal) without stopping production, so the risk is minimised.

This is the first day – although I only started from the beginning of the afternoon, so it’s the first half a day.

Day 0.5

Having created the initial Django project (simple using my previously documented buildout) the next stage was to get the models from the existing database structure. The first step was to run the inspectdb command against my existing database. My app is called alc, so the command was a simple (from the root of the buildout project).

bin/django inspectdb > project/alc/models.py 

This took a little while (five or ten minutes only though), mostly because my database has some very complicated postgres views which take an age to calculate. Django does its best to guess what is what but you’re still going to end up having to tie up a lot of loose ends. The simplest way I find of doing this is to open up your models file side by side with a dump of the existing database structure (“pg_dump -s db_name > db_schema.sql” will give you the database structure only in postgres) and get cracking. First let’s just take a look at the existing Django project structure.

(ttas-django)ian@vm:~/projects/ttas-django$ tree
.
├── bin
│   └── django
├── buildout.cfg
├── develop-eggs
├── parts
└── project
    ├── alc
    │   ├── __init__.py
    │   ├── models.py
    │   ├── tests.py
    │   └── views.py
    ├── development.py
    ├── __init__.py
    ├── local_settings.py
    ├── media
    ├── production.py
    ├── settings.py
    ├── templates
    └── urls.py

Because I’m running my code in an Ubuntu VM, but postgres is on Windows I have a custom_settings.py file, which is called from my settings.py file. This is not under version control but within it I can specify my custom connection string for when I am developing locally. Everything else above is provided by the buildout.

Right – let’s get things going. Firstly I am going to do some tidying up of the models name.

For a starter take a look at this old table

class TValidSparetyre(models.Model):
    id = models.IntegerField(primary_key=True)
    code = models.CharField(max_length=1)
    description = models.CharField(max_length=50)
    barcode = models.TextField() # This field type is a guess.
    comments = models.TextField()
    created_at = models.DateTimeField()
    created_by = models.CharField(max_length=50)
    modified_at = models.DateTimeField()
    modified_by = models.CharField(max_length=50)
    valid_tyrebrandid = models.IntegerField()
    partno = models.CharField(max_length=11)
    sizeid = models.IntegerField()
    space_saver = models.IntegerField()
    class Meta:
        db_table = u't_valid_sparetyre'

You can see how Django has decided the class name depending on the actual table name. I’ll change that to something more legible, such as ValidSpareTyre. The barcode field is marked as being a guess. If I look at the postgres structure it tells me that this field is

barcode CHARACTER(1) NOT NULL,

This will translate to the Django model field:

barcode = models.CharField(max_length=1)

This table has the id correctly marked as the primary key. I found some tables in the database which had id columns not marked as the primary key on the postgres side (although that was their purpose). I needed to manually set the “primary_key=True” option for the id field in those cases.

There are a lot of views in this database which I do not need to have under django control. Django does try to make models out of them but gets confused. e.g. for the view with the following field definition

alc=# \d v_edi_assm3
              View "public.v_edi_assm3"
       Column        |       Type        | Modifiers
---------------------+-------------------+-----------
 datepoint           | date              |
 location            | character(2)      |
 suppliercode        | character varying |
 supplierdescription | character varying |
 mailbox             | text              |
 partno              | text              |
 items               | bigint            |

We end up with a model saying

class VEdiAssm3(models.Model):
    datepoint = models.DateField()
    location = models.TextField() # This field type is a guess.
    suppliercode = models.CharField(max_length=-1)
    supplierdescription = models.CharField(max_length=-1)
    mailbox = models.TextField()
    partno = models.TextField()
    items = models.BigIntegerField()
    class Meta:
        db_table = u'v_edi_assm3'
        app_label = 'alc'

If you wanted to keep the model definition you’d need to fix those max_lengths yourself for the suppliercode and supplierdescription. I will need some of these views through the Django interface, but not all of them but at the moment I am not sure which are which. Having got this far I am also aware of some tables which I am 95% sure are no longer needed. Really I just want to focus on the core ‘real’ tables first and then worry later about the remaining items.

At this stage we have all our Django models sitting within one models file so I am going to split them up into three types. As I know the application I am aware I have my core data, I have a bunch of supporting tables with part information (which is handled through a maintenance section) and I have everything else. So I am going to split my models.py file into three (models/main.py, models/maintenance.py and models/legacy.py). The legacy module will not be used – it’s just a holding place where I can temporary leave things in.

To move from one models.py file to a models module I need to add the app label for each django model in the Meta class. e.g:

    class Meta:
        db_table = u't_valid_sparetyre'
        app_label = 'alc'

A quick emacs macro sorts this out and I get dividing leaving myself with an empty models.py file which I delete. Although I am keeping the legacy code I don’t want Django to notice it so in my models/__init__.py file I have the following (note no mention of legacy):

from main import *
from maintenance import *

I am left with the following structure for the alc application directory.

(ttas-django)ian@vm:~/projects/ttas-django/project/alc$ tree .
.
├── __init__.py
├── __init__.pyc
├── models
│   ├── legacy.py
│   ├── main.py
│   └── maintenance.py
├── models.pyc
├── tests.py
└── views.py

Let’s see what Django is immediately complaining about by running the syncdb command. I got some errors regarding missing primary_key keywords for some tables’ id fields – but having fixed those we got to.

(ttas-django)ian@vm:~/projects/ttas-django$ bin/django syncdb
Creating tables ...
Installing custom SQL ...
Installing indexes ...
No fixtures found.
(ttas-django)ian@vm:~/projects/ttas-django$ 

Now is not the time to celebrate because our database is still pretty dumb. I know we have a lot of relationships between the tables but they’re in my head – not in the python code. Note, I am not sure if I had foreign key references in the database, whether Django would pick them up automatically, maybe somebody could let me know. Irrelevant here as the database structure does not contain that information.. So what we need to do is some more work on our models.

Take the following three models (I’ve only reproduced the specific fields we are interested in here)

class TMUKSignal(models.Model):
    id = models.IntegerField(primary_key=True)
    requestid = models.IntegerField()
       ...
    class Meta:
        db_table = u't_signal'
        app_label = 'alc'

class TMUKRequest(models.Model):
    id = models.IntegerField(primary_key=True)
    signalid = models.IntegerField()
    sisid = models.IntegerField()    
       ...
    class Meta:
        db_table = u't_request'
        app_label = 'alc'

class WISheet(models.Model):
    id = models.IntegerField(primary_key=True)
       ...
    class Meta:
        db_table = u't_sis'
        app_label = 'alc'

We have a signal coming in. The signal is later processed creating a request (so the signal can have one or no requests). The request is then assigned to a WISheet. A WISheet can have from 0 to 6 requests assigned to them. I need to modify the models here to make that happen. By the way, I don’t know why the request has a reference to the signal and the signal has a reference to the request. I might know by the end of this exercise but it looks plain daft to me.

After some modifications I am left with

class TMUKSignal(models.Model):
    id = models.IntegerField(primary_key=True)
    request = models.OneToOneField('TMUKRequest', db_column='requestid', related_name='signal')
       ...
    class Meta:
        db_table = u't_signal'
        app_label = 'alc'

class TMUKRequest(models.Model):
    id = models.IntegerField(primary_key=True)
#    signalid = models.IntegerField()
    wisheet = models.ForeignKey('WISheet', db_column='sisid', related_name='request_set')
       ...
    class Meta:
        db_table = u't_request'
        app_label = 'alc'

class WISheet(models.Model):
    id = models.IntegerField(primary_key=True)
       ...
    class Meta:
        db_table = u't_sis'
        app_label = 'alc'

I have set the requestid field on TMUKSignal to be a OneToOne field called request. Because it’s a one to one field I don’t need the signalid field on the request. I’ve also made it a little more legible going back the other way by setting ‘related_name’. By default Django would expect the field to be called request_id, so I’ve set the db_column to be the real field name of ‘requestid’. I’ve also made a similar change to the TMUKRequest model and the sisid field (now called wisheet). Now I can do the following through the django shell.

>>> request = TMUKRequest.objects.all()[0]
>>> request.signal
<TMUKSignal: TMUKSignal object>
>>> request.signal.request
<TMUKRequest: TMUKRequest object>
>>> request.signal.request.wisheet
<WISheet: WISheet object>
>>> request.signal.request.wisheet.request_set.all()
[<TMUKRequest: TMUKRequest object>, <TMUKRequest: TMUKRequest object>, <TMUKRequest: TMUKRequest object>, <TMUKRequest: TMUKRequest object>, <TMUKRequest:TMUKRequest object>, <TMUKRequest: TMUKRequest object>]

Having got this far, I spent the rest of the time tidying up the maintenance tables and trying to remember what they all did. Quite a few of them turned out never to be used. Others are still a little puzzling so tomorrow I shall focus on getting everything required available through admin. As I do that and compare with the existing screens I will make iterative changes to the models as their purpose and relationships become clearer.

Leave a Reply

Your email address will not be published. Required fields are marked *