Just One Question for Matt Turner
Matt Turner is the Applications Development Group Manager at PC World Online and has been developing content management systems and just about every other kind of web application imaginable for the past two years. Before that, he developed database applications for poor unconnected PCs. Matt is now convinced that relational databases are dinosaurs compared to XML datastores, but is willing to let others find this out on their own.
Matt was kind enough to take time out of his busy schedule to answer just one question…
Sippey: How is PC World Online using XML in production?
Turner: Last November I completed the conversion of PC World Online’s Daily News and Shareware sections to databased production systems. These systems allowed editors to directly enter copy, and featured server-side programs which wrote out the actual pages using HTML templates. The huge savings in time that used to be spent updating index pages and pouring text into templates by hand did not go unnoticed; the plan quickly became to bring this type of system to the rest of PC World.
The rest of PC World consists of articles from the print issue, as well as articles that are exclusive to the online edition. Unlike the Daily News and Shareware sections, whose articles all share a similar structure, these additional articles could be of any length, cover any topic and carry a potentially limitless amount of metadata, depending on the type of story. For instance, our reviews feature “capsules” of information about products (such as manufacturer, product name, estimated street price, etc.). These product reviews obviously contrast sharply with “how-to” articles, which feature product tips and tricks. But both types of articles are about products. And, in the future, we may want to present readers with a list of every product review and tips-and-tricks article for a single product, or for a category of products.
At the beginning of the project, we set out to create a relational database that would store all of this information. But after several days of contemplating data forms and wondering if the 5th normal form would help (if I could only imagine it), we were facing the prospect of complete failure.
And then a colleague suggested we should use XML.
Even though XML’s parent, SGML, has been around for quite a long time, we knew very little about it. But after a few conversations with an outside expert we realized that by storing information about the articles in the articles themselves, and by using a simple XML compiler and Perl, we could have article data as tightly structured as if it were in a relational database.
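As a rough illustration of what "storing information about the articles in the articles themselves" can look like, here is a minimal sketch in Python rather than the Perl mentioned above. The element and attribute names (`article`, `capsule`, `street-price`) and the sample product are hypothetical and are not taken from PC World's actual markup; the point is only that metadata embedded in the document can be pulled out as a structured record, much like a database row.

```python
import xml.etree.ElementTree as ET

# Hypothetical article markup: tag and attribute names are illustrative
# assumptions, not PC World's real document structure.
ARTICLE_XML = """
<article type="review">
  <title>Example Laptop Review</title>
  <capsule>
    <manufacturer>Example Corp</manufacturer>
    <product>ExampleBook 100</product>
    <street-price currency="USD">1999</street-price>
  </capsule>
  <body>
    <section><head>Performance</head><p>Body text goes here.</p></section>
  </body>
</article>
"""


def extract_capsule(xml_text):
    """Return the metadata embedded in an article as a flat dict."""
    root = ET.fromstring(xml_text.strip())
    record = {"type": root.get("type"), "title": root.findtext("title")}
    capsule = root.find("capsule")
    if capsule is not None:
        # Each child of <capsule> becomes one field of the record.
        for field in capsule:
            record[field.tag] = (field.text or "").strip()
    return record


record = extract_capsule(ARTICLE_XML)
print(record)
```

A "how-to" article in this sketch would simply omit the capsule or carry different fields, which is the kind of flexibility that is awkward to express in a fixed relational schema.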
With our consultant we held an in-house workshop where we created our Document Type Definition (DTD): the structure of the XML tags we would use in our documents. And we also started talking to Vignette Corporation about using their StoryServer production system to manage and display our XML documents.
Thanks to the hard work of the folks at Vignette, we had a working prototype by the beginning of January, and we were starting to write the programs to deliver our XML to the web. By the middle of February we were ready to put an entire issue through the system and, after a few very long days and weekends, we were able to launch the April 1998 online issue using our new XML system.
The system we now use is not complete by any measure, but even so it has exceeded our expectations. The simple fact that the editors can quickly edit and tag documents (using an XML editor) has streamlined our production 100%. We are also finally able to produce our complicated articles (some have 17 sections made up of major heads, subsections, etc.) through templates that can be tailored to different parts of the site, or even redesigned overnight (perhaps to accommodate a new ad placement, our favorite pastime!).
Beyond the immediate gain in production efficiency, we are now beginning to exploit the data transformation possibilities of XML. Licensing used to be as difficult as producing the articles in the first place. Now the same tools that produce our articles produce our licensee content in whatever format is needed. We are now part of the ICE initiative and look forward to ICE enabling our documents for even easier licensing.
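The idea of one tagged source feeding several output formats can be sketched as follows. This is an assumption-laden Python illustration, not a description of the StoryServer templates or of any licensee's actual format: the tag names, the sample article, and the two target formats (HTML and plain text) are all hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source article; tag names are illustrative only.
ARTICLE_XML = """<article>
  <title>Tips &amp; Tricks</title>
  <section><head>Shortcuts</head><p>Use the keyboard.</p></section>
  <section><head>Backups</head><p>Back up weekly.</p></section>
</article>"""


def to_html(root):
    """Render the article as simple HTML for the web site."""
    parts = ["<h1>%s</h1>" % escape(root.findtext("title", ""))]
    for section in root.findall("section"):
        parts.append("<h2>%s</h2>" % escape(section.findtext("head", "")))
        for para in section.findall("p"):
            parts.append("<p>%s</p>" % escape(para.text or ""))
    return "\n".join(parts)


def to_plain_text(root):
    """Render the same article as plain text, e.g. for a licensee feed."""
    lines = [root.findtext("title", "").upper(), ""]
    for section in root.findall("section"):
        lines.append(section.findtext("head", ""))
        for para in section.findall("p"):
            lines.append("  " + (para.text or ""))
    return "\n".join(lines)


root = ET.fromstring(ARTICLE_XML)
html_output = to_html(root)
text_output = to_plain_text(root)
print(html_output)
print(text_output)
```

Because both renderers read the same parsed tree, adding another output format means writing one more function rather than re-keying the article.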
In addition to reusing the articles as complete texts in licensing and stories for our site, we will, over the next few months, begin to allow users to compile custom information packets created by directly querying our XML data store. This targeted searching will be a huge step forward from simple free text searches and will revolutionize the way people find information on our site.
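To make the contrast with free-text search concrete, here is a small Python sketch of a structured query over tagged articles. The store layout, the `product` element, and its `category` attribute are assumptions made for illustration; the real data store and query mechanism are not described in this interview.

```python
import xml.etree.ElementTree as ET

# Hypothetical data store: three articles tagged with the product they cover.
STORE_XML = """<articles>
  <article type="review"><title>ExampleBook 100 Reviewed</title>
    <product category="laptop">ExampleBook 100</product></article>
  <article type="how-to"><title>ExampleBook Battery Tips</title>
    <product category="laptop">ExampleBook 100</product></article>
  <article type="review"><title>SamplePrint 5 Reviewed</title>
    <product category="printer">SamplePrint 5</product></article>
</articles>"""


def find_titles(store, product=None, category=None):
    """Return (article type, title) pairs matching a product and/or category."""
    matches = []
    for article in store.findall("article"):
        for prod in article.findall("product"):
            if product is not None and (prod.text or "").strip() != product:
                continue
            if category is not None and prod.get("category") != category:
                continue
            matches.append((article.get("type"), article.findtext("title")))
            break  # one matching product is enough to list the article
    return matches


store = ET.fromstring(STORE_XML)
by_product = find_titles(store, product="ExampleBook 100")
by_category = find_titles(store, category="printer")
print(by_product)
print(by_category)
```

The query for a single product returns both the review and the how-to article, which is the kind of combined listing described earlier for reviews and tips-and-tricks pieces.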
We will also start using XSL (Extensible Stylesheet Language) to feed our XML directly to XML-capable browsers (Netscape 5, IE 4) and allow the browsers to format the articles. This heralds the end of tables and will most likely cause much rejoicing by designers everywhere.
But I’m getting a bit far into how we will use XML…
Right now XML has allowed us to take a huge step forward in overhauling the production of PC World Online, and it’s laying the groundwork for a very flexible and expansive data store that will be the backbone of PC World Online for years to come.