2003-02-09

Converting Gregor’s Word of the Week to Movable Type

Last night I managed to get my Gregor's Word of the Week converted to a weblog.

Gregor's Word of the Week used to be a custom-built XML application with one big document containing the word ideas, future entries and published entries. I built XSLT stylesheets to produce RDF summaries and one HTML document per published entry (with links between them).

I was able to write a new XSLT stylesheet to convert the entries to Movable Type's import format and get them loaded (after a couple of false starts).

It turns out that Movable Type (2.51) does not have a 'test import' mechanism, so you cannot validate the import file before throwing it at the database. If you have errors, you'll probably still end up with some content loaded. Then you have a problem: how to proceed? In my case, I knew I was likely to have problems the first time through, so I made sure I was working in a fresh blog I hadn't spent much time configuring (so I could just kill it if things got ugly). Sure enough, I ran into a problem with the date format.

When you create entries, the date format is "2003-02-09 10:28:00" (24-hour time). The importer wants "02/09/2003 10:28:00 {AM,PM,}", which I take to mean that you may append AM or PM, or omit both for 24-hour time. The list in the "Edit Entries" screen uses "2003.02.09", and (with the default templates, anyway) the dates displayed in entries use month names.

The simplest way to handle this was not to parse my dates, already in the "2003-02-09" format in my source XML (I was tacking on a time of "12:00:00" to each of them because Movable Type requires times to be present). Instead, I simply piped the output of xsltproc mt.xsl wow.xml into a sed-like Perl one-liner perl -p -e 's{DATE: (....)-(..)-(..)}{DATE: $2/$3/$1}' (although I had to double those dollar signs because it was inside a make rule).
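The one-liner above can be written out as a small Perl subroutine. This is a sketch, not the exact rule from my Makefile: it assumes each DATE field starts at the beginning of its line and already carries a time, and it only rewrites lines that really start with a four-digit year.

```perl
use strict;
use warnings;

# Rewrite "DATE: YYYY-MM-DD ..." into "DATE: MM/DD/YYYY ...", the form the
# 2.51 importer accepts. The time (and anything else after the date) is
# kept as-is; lines that are not DATE fields pass through unchanged.
sub fix_date_line {
    my ($line) = @_;
    $line =~ s{^DATE: (\d{4})-(\d{2})-(\d{2})}{DATE: $2/$3/$1};
    return $line;
}
```

Used as a filter, it is just: while (<STDIN>) { print fix_date_line($_) }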

This allowed the entries to load, but later I discovered I hadn't gotten the stylesheet right. Movable Type has main and extended entry bodies, and I wanted to use the main one for the <intro> element in my source XML (where I put things like hints or other introductory material) and the extended entry for the <notes> (where I put the definition and other notes). I had omitted a separator, which left my main entry bodies empty.
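For reference, here is a sketch of what one entry in the import format looks like when built in Perl rather than XSLT. The field layout follows my reading of the Movable Type import documentation (single-line headers, a "-----" line before each multi-line field, and "--------" ending the entry); the hash keys (title, status, date, intro, notes) are hypothetical names chosen to mirror my source XML.

```perl
use strict;
use warnings;

# Build one entry in Movable Type's import format. Single-line header
# fields come first; each multi-line field (BODY, EXTENDED BODY) is
# preceded by a "-----" line, and "--------" terminates the entry.
sub mt_import_entry {
    my (%e) = @_;
    my $intro = $e{intro} // '';   # main body: hints, introductory remarks
    my $notes = $e{notes} // '';   # extended body: definition and notes
    my $out = '';
    $out .= "TITLE: $e{title}\n";
    $out .= "STATUS: $e{status}\n";
    $out .= "DATE: $e{date}\n";
    $out .= "-----\n";
    $out .= "BODY:\n$intro\n";
    $out .= "-----\n";             # separates BODY from EXTENDED BODY
    $out .= "EXTENDED BODY:\n$notes\n";
    $out .= "-----\n";
    $out .= "--------\n";          # end of this entry
    return $out;
}
```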

Now, I needed to be able to delete the entries so I could fix the import file and try again. I did not find a bulk erase function in Movable Type (there is an advanced editing function that is close, but I still would have had to select each entry from a big list). Instead, I took a gamble.

When I set up Movable Type, I opted to use MySQL for its database backend (it also allows you to use Berkeley DB). So, I logged into MySQL and did a little digging. It looked to me like the entries were stored in the mt_entry table. So, I found out the blog_id for my weblog and did a SQL DELETE FROM mt_entry WHERE entry_blog_id = ? and crossed my fingers. Nothing crashed when I went back into Movable Type, so I moved on.
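The same cleanup can be scripted with DBI. This is a minimal sketch, assuming the mt_entry table and entry_blog_id column described above; like my hand-typed command it touches only mt_entry, so any other tables that may refer to those entries are left alone. It reports how many rows it is about to remove and keeps the blog_id out of the SQL text with a placeholder.

```perl
use strict;
use warnings;

# Remove every entry belonging to one weblog. Takes an already-connected
# DBI handle and returns the number of rows counted before the delete.
sub delete_blog_entries {
    my ($dbh, $blog_id) = @_;
    my ($count) = $dbh->selectrow_array(
        'SELECT COUNT(*) FROM mt_entry WHERE entry_blog_id = ?',
        undef, $blog_id);
    $dbh->do('DELETE FROM mt_entry WHERE entry_blog_id = ?', undef, $blog_id);
    return $count;
}

# Command-line use: DSN, user, password, blog_id.
if (@ARGV == 4) {
    require DBI;
    my ($dsn, $user, $pass, $blog_id) = @ARGV;
    my $dbh = DBI->connect($dsn, $user, $pass, { RaiseError => 1 });
    my $n = delete_blog_entries($dbh, $blog_id);
    print "deleted $n entries for blog_id $blog_id\n";
    $dbh->disconnect;
}
```

With DBD::mysql installed, a run might look like: perl delete_entries.pl DBI:mysql:database=mt user secret 3 (the script name, database name, and credentials are placeholders).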

My entries imported fine now. The next thing I needed to do was to get my email notifications working — I send out an email each time I release a new Word of the Week, and I wanted to have all the functionality within Movable Type.

It turns out that Movable Type has a Notifications feature, so I tinkered with it. It took me some time to realize that notifications have to be triggered manually from the Edit Entry screen. My first scan of the documentation left me with the impression that sending would be automatic, and I was frustrated that it wasn't happening (I even spent some time rooting around my postfix configuration to make sure emails weren't being eaten by my server... oh, well).

One thing I noticed, though, was that splitting the entry body into two pieces (main and extended, as supported by Movable Type) was causing problems with the emails. I wanted them to contain the complete entry, but only the main entry text was being included, and in my case that is sometimes empty (I use it for introductory remarks such as hints).

I discovered the subroutine responsible for sending notifications in the lib/MT/App/CMS.pm file, and changed two lines to make it concatenate both halves of the entry when it builds the email text (see the patch below). With that in place, the HTML markup for the entry appears in the email. I'm hoping that works out OK for my subscribers (that is, that their email clients recognize and render the HTML, or that they don't mind seeing the markup).

The Notifications section of the Movable Type documentation gives an example HTML form you can add to your templates to allow people to enter their email addresses and click a button to be added to your Notifications list. I incorporated that form (with a few changes) into my main index template and tested it; it worked right away.

I didn't see any function for bulk-importing notifications, which would be very handy (I store my subscriber list in an XML file that I transform to a text file which drives my email sending process in the old system). So, I experimented with the POST program on my server to construct a request (based on what I learned from the implementation of the Subscribe Form) that would push an email address at the CGI script that lands a new address in the database.

It looked like this:

You have to enter data like this: blog_id=1&email=the.email@the.domain&_redirect=http://www.your-server.com/blog/ — replacing the value for blog_id with your weblog's blog_id in the database and the value for _redirect with your weblog's main URL. Then, hit control-D to end the input and the HTTP POST request is sent.
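What gets typed into POST is an ordinary application/x-www-form-urlencoded body. The sketch below builds the same body with percent-encoding applied, which is the conservative choice when an address contains characters such as spaces or angle brackets (the hand-typed version worked for me without it). The field names come from the subscribe form; the subroutine names are hypothetical.

```perl
use strict;
use warnings;

# Percent-encode one key or value, leaving only RFC 3986 unreserved
# characters (letters, digits, - . _ ~) as-is.
sub form_escape {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Build the POST body for mt-add-notify.cgi: the weblog's blog_id, the
# address to subscribe, and the page to redirect to afterwards.
sub notify_body {
    my ($blog_id, $email, $redirect) = @_;
    my @pairs = (
        [ blog_id   => $blog_id  ],
        [ email     => $email    ],
        [ _redirect => $redirect ],
    );
    return join '&',
        map { form_escape($_->[0]) . '=' . form_escape($_->[1]) } @pairs;
}
```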

Once I verified that this worked, I wrote a WWW::Mechanize-based script to automate the entire task (see below). It went without a hitch, and my whole subscriber list was imported.

At this point, all I had to do was make sure I converted the future entries to 'draft' status instead of 'publish' and send out the notification for the newest entry. Once again, no problems.
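Deciding which entries count as "future" is a date comparison. A sketch, assuming dates are available in the ISO "YYYY-MM-DD" form my source XML uses (such strings sort chronologically, so a plain string comparison works); the subroutine name is hypothetical, and its result is meant for the import format's STATUS field.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Entries dated after today stay drafts; today's entry and older ones are
# published. $today defaults to the local date in the same ISO format.
sub status_for {
    my ($entry_date, $today) = @_;
    $today //= strftime('%Y-%m-%d', localtime);
    return $entry_date gt $today ? 'draft' : 'publish';
}
```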

The fact that I have the source code to the system, and that it uses Perl and MySQL, makes it possible for me to solve my own problems when the solution is not already part of the system. Openness rocks!

Unified Diff for lib/MT/App/CMS.pm

Apply this diff to get email notifications to combine the main and extended body of an entry.

--- CMS.pm.orig 2002-10-30 15:36:37.000000000 -0500
+++ CMS.pm      2003-02-09 01:31:07.000000000 -0500
@@ -2471,7 +2471,7 @@
     $Text::Wrap::columns = $cols;
     if ($q->param('send_excerpt')) {
         my $excerpt = $entry->excerpt ||
-          MT::Util::first_n_words($entry->text, $blog->words_in_excerpt || 20);
+          MT::Util::first_n_words($entry->text || $entry->text_more, $blog->words_in_excerpt || 20);
         local $Text::Wrap::columns = $cols - 4;
         $body .= Text::Wrap::wrap("    ", "    ", $excerpt) . "\n\n";
         $body .= ('-' x $cols) . "\n\n";
@@ -2483,7 +2483,7 @@
     $body .= Text::Wrap::wrap('', '', $q->param('message')) . "\n\n";
     if ($q->param('send_body')) {
         $body .= ('-' x $cols) . "\n\n";
-        $body .= Text::Wrap::wrap('', '', $entry->text) . "\n";
+        $body .= Text::Wrap::wrap('', '', $entry->text . "\n" . $entry->text_more) . "\n";
     }
     my $subj = $blog->name . ' Update: ' . $entry->title;
     $subj =~ s![\x80-\xFF]!!g;
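One caveat with the second hunk: if an entry has no extended text, $entry->text_more is undefined and perl -w will warn when it is concatenated. A more defensive way to build the combined text is sketched below; the helper name is hypothetical, and it would be called as full_entry_text($entry->text, $entry->text_more) inside the Text::Wrap::wrap call.

```perl
use strict;
use warnings;

# Join the main and extended entry text for a notification email,
# skipping whichever half is undefined or contains only whitespace, so
# no "uninitialized value" warning or stray leading newline appears.
sub full_entry_text {
    my ($text, $text_more) = @_;
    my @parts = grep { defined $_ && /\S/ } ($text, $text_more);
    return join "\n", @parts;
}
```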

notify_import.pl

This program uses WWW::Mechanize to import a list of email addresses into a Movable Type weblog via the mt-add-notify.cgi script. You could write a similar script using pure LWP, but I've gotten to like WWW::Mechanize, so I used it here. This script assumes the first form on the main page is the subscription form.

The email addresses should be one per line, and they can have the extended form of "Full Name <email_addr@domain.name>".
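The script sends each line to the form unchanged. If you want to check the list before submitting it, a small parser can split the two accepted forms and flag anything else. This is a hypothetical pre-check, not part of notify_import.pl, and its address pattern is deliberately loose (something@something, no spaces or angle brackets).

```perl
use strict;
use warnings;

# Split a subscriber line into (name, address). Accepts a bare address or
# "Full Name <addr@domain>"; returns an empty list for anything else so
# the line can be reported instead of submitted.
sub parse_subscriber {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;    # trim, including the trailing newline
    if ($line =~ /^(.*?)\s*<([^<>\s]+\@[^<>\s]+)>$/) {
        return ($1, $2);
    }
    if ($line =~ /^([^<>\s]+\@[^<>\s]+)$/) {
        return ('', $1);
    }
    return;
}
```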

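#!/usr/bin/perl -w
#
# notify_import.pl
#
#   perl notify_import.pl http://www.mysite.com/blog/ < names.txt
#
# Assumes you have the mt-add-notify.cgi form as the first form on the
# page.
#
# Copyright (C) 2003 Gregor N. Purdy. All rights reserved.
# This program is free software. It is subject to the same license as Perl.
#

use strict;
use WWW::Mechanize;

# Exactly one argument: the weblog's main URL.
die "$0: usage: $0 http://www.mysite.com/blog/ < names.txt\n" unless @ARGV == 1;

my $url   = shift @ARGV;
my $agent = WWW::Mechanize->new();

# Load the main page, which carries the subscription form.
$agent->get($url);

# @ARGV is now empty, so <> reads the address list from STDIN.
while (<>) {
  chomp;
  next if m/^\s*$/;            # skip blank lines
  $agent->form_number(1);      # select the first form (mt-add-notify.cgi)
  $agent->field('email', $_);
  $agent->click();             # submit by clicking the form's button
  $agent->back();              # return to the page holding the form
}

exit 0;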
