Changing the type of a column in Postgres
Posted by dlandgren on 2008-11-07
Someone handed me a large spreadsheet at $work and said “We need to turn this into a web site”. So I wrote some code to transfer it to a database. A number of columns were numeric, so I assigned numeric datatypes to those columns.
Afterwards, when I imported the information, Postgresql threw up its hands in horror, complaining about my attempts to stuff non-numeric data into a field designated as numeric. I went back and looked at the spreadsheet more closely, and sure enough, buried away many lines below in the fine print, a cell contained “N/A”. I decided that I would just coerce such a value to zero, and was done with it.
It turned out later on, during a review, that no, “N/A” is really what needs to be displayed. I couldn’t add code to say ‘if the field read from the database is zero, display “N/A”‘ because there were already other rows that legitimately contained zero.
What we have in fact is data that is nearly almost always numeric, but not quite. So the solution is to change the column data type. At first I thought I’d be able to just
alter table t1 alter column ca_scientif type varchar(5);
… but Postgres doesn’t like that (surprisingly it doesn’t even produce an error message, it’s just that nothing changes). Hmm. After searching around on the web for a bit, I found half a solution, which I was able to fill out into a complete solution. The idea is to add a new temporary column with the right datatype to the table. Then copy the contents of the old column name over to the new name. Then drop the old column.
Now that the old column name no longer exists, we can create it again, this time with the right datatype. And then we can copy of the contents of the temporary column over to the new version of the old column, now with the right datatype.
Finally, we can then drop the temporary column since we don’t need it any more. The exact sequence of DDL statements looks like (for each column that needed munging)
alter table t1 add column new_ca_scientif varchar(5);
update t1 set new_ca_scientif = cast(ca_scientif as varchar(5));
alter table t1 drop column ca_scientif;
alter table t1 add column ca_scientif varchar(5);
update t1 set ca_scientif = new_ca_scientif;
alter table t1 drop column new_ca_scientif;
After all that’s been done for as many columns as necessary, the tablespace will be littered with dead tuples, so it’s a good idea to tidy up afterwards:
vacuum full t1;
And the deed is done.
Update: I discovered a minor flaw in this plan: if the original column had a comment attached to it, that gets thrown away. Hmmm. I needed those comments to help me track Excel columns to Postgresql names.
Fixing this turned out to be more difficult than I thought. Fortunately a question posted on Stackoverflow received a good answer. You have to grovel through the system catalogs to pull out the information. So I wrote a Perl program to do that for me. It’s always more fun to write a program that writes SQL rather than writing the SQL directly. As a result I was able to fully automate the process.
I’ve made the program available on my website. Keeping the source on my own server allows me to manage the issues of revisions easily.