Mad, Beautiful Ideas
Developing SQL

At the Registrar’s Ofice at Washington State University, the web programming that I do requires almost as much T-SQL as it does VBScript. In fact, I spend a fair amount of time deciding every day if a task would be better accomplished on the database or on the web server. Often times, the choice is obvious. Sometimes it isn’t.

For instance, it’s grading time at WSU, and one of our reports we make available for the administration provides statistics on grading; what courses have been graded, what courses need to be graded, etc. The problem we ran in to, was that the mechanism used by that report to determine who was instructing the course was different from the mechanism being used in other parts of the system, like our schedule of classes.

In the mechanism that was being used, the instructors name was being looked up in a table received in a nightly download from IT. This download has two problems. First, it does not resolve crosslisted or conjoint course relationships (a course listed with more than one course prefix and number). Second, for courses taught to students on an individual basis (research seminars, mostly), the data we had from IT would just choose an instructor. The more robust mechanism resolves the crosslist/conjoint relationship and also makes a notation for individually taught courses. Most of the code was already written, but by doing a lot of that code on the server, the queries run pretty quickly.

The problem, is that the mechanism that we’re using can’t resolve relationships beyond one level. So a child-of-a-child reports the wrong parent. I wrote up an attempt to do this lookup via a recursive SQL function. This was a disaster, raising the running time of our Sections view from about 2 seconds, to almost thirty. This view is invoked thousands of times a day. Clearly, this sort of a slow down was unacceptable. Interestingly enough, I can offload this lookup to the webserver, sacrificing about a meg of memory to cause that lookup to take almost no time at all. The problem is, this solution won’t work everywhere. In some places it would result in more calls into the database, which could end up slowing things down as well. Ideally, our data from IT would have all children point to their ultimate parent, but we can’t depend on waiting for a solution from them arriving anytime soon.

However, occasionally recursive SQL works really well. A more recent project is moving our Contact Us pages into the database. One thing I wanted were subcategories for a few listings. I needed to get the data out of the database in a sorted order, because I knew the SQL would be faster at sorting the data. However, resolving the children was proving difficult. Even using SQL Server 2005’s Common Table Expressions didn’t work, because it didn’t preserve the sort order I required. Luckily, recursion works very well in this case, probably because the data set is far smaller than the data set for our section data.

FUNCTION [dbo].[GetChildEntries] 
        (
                @ParentId int
        )
        RETURNS 
        @ReturnTable TABLE 
        (
                ParentSectionId int, 
                SectionId int,
                SectionName char(64),
                ShowTitles bit,
                EntryName char(64),
                EntryTitle char(64),
                NetworkAddress char(128),
                Phone char(10)
        )
        AS
        BEGIN
                DECLARE ChildSectionsCursor CURSOR FOR 
                        SELECT Id FROM ContactUsTest.dbo.Sections
                        WHERE ParentId=@ParentId ORDER BY SortOrder;
                DECLARE @CurrentId Int;
                OPEN ChildSectionsCursor;
                FETCH NEXT FROM ChildSectionsCursor INTO @CurrentId;
                WHILE @@FETCH_STATUS = 0 
                        BEGIN
                                INSERT INTO @ReturnTable SELECT * FROM
        ContactUsTest.dbo.GetEntryList(@CurrentId);
                                INSERT INTO @ReturnTable SELECT * FROM
        ContactUsTest.dbo.GetChildEntries(@CurrentId);
                                FETCH NEXT FROM ChildSectionsCursor INTO
        @CurrentId;
                        END
                CLOSE ChildSectionsCursor;
                DEALLOCATE ChildSectionsCursor;
                RETURN 
        END

The final solution uses one Stored Procedure (for ease of calling), and two functions, of which this is the recursive one. The more development I do with SQL, the more I’ve grown to love writing Stored Procedures and Functions. They are far easier to read while in the website code, and best of all, they abstract away the implementation details, giving me freedom to modify the underlying structure without having to rewrite a large amount of code.

However, Microsoft SQL Server (I haven’t done quite as much direct development in MySQL or PostgreSQL), has some caveats that can make writing code a bit of a pain. For instance, the SQL Server tends to report strange line numbers for errors. I’m still a bit fuzzy of what the interpreter ignores in the preamble to the procedure when reporting error line numbers. This is a hassle because it makes the “go to line” option nearly useless in debugging the error. Also, I found that if an error occured within a function called by a function, SQL Server would report the error at the line of the function called. Needless to say, this isn’t as helpful as it would be if the actual source of the error were identified.

Still, all that can be worked around, once you’ve learned the behavior. Inconvenient, yes. But still workable. What I really have trouble with, is SQL Server not handling recursive calls very well. As I said, there is overhead, in this case, the dataset is small enough (and will remain small enough) that I’m not worried about it. But making modifications to recursive functions is a hassle. If I change the return type or arguments at all, I have to comment out all the recursive calls, recompile, and the uncomment them and recompile, otherwise SQL complains because it can’t recognize that it’s modifying the object that no longer matches the definition. This would be a fairly trivial change on Microsoft’s part, and would make writing these tasks far easier. I would also like to see the overhead of calling functions reduced, but I can not say yet if that is a problem with Microsoft’s SQL server, or SQL servers in general.

I know recursion is often viewed as harmful, due to the added complexity it can add to code. In some circumstances, I’ve found that the overhead involved is highly harmful as well, but sometimes it really is the easiest and clearest way to implement an algorithm.