Printable HTML Pages

The secret number is: 660px

And you thought I was going to say it was 42.

660 pixels is the page width for printing HTML pages without clipping in most browsers (actually, I got to 672px before I saw clipping). Of course, your mileage may vary. It appears that IE defaults to 0.75 inch borders all the way around. Firefox has 0.5 inch borders, and Opera says it has 0.5 inch borders but looks like it has 0.25 inch borders. Anyway, when your browser magically converts from the screen context to the printer context, it seems to map that 660px to fit inside the 0.75 inch borders that IE defaults to. That works nicely for the other browsers as well, so that is the number I am going to start using.

Now, if you provide one CSS style sheet for printing and one for screen, you can eliminate the menus, nav bars, ads, and other non-critical elements of the page so that only the main content div is printed. Then you can make your content div a fixed 660 pixels wide.

That means you can still use the artistically pleasing “rule of thirds” for your theme’s layout, with 1000px pages that display nicely on a monitor and one third devoted to a nav bar that doesn’t show up in the printed page.

I’m not sure what the effect looks like on a mobile phone. You could use CSS to re-arrange the page for mobile devices so that the 660px was the entire width, and the nav bar / side section was available somewhere else.
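The split described above can be sketched with media-specific rules in a single stylesheet. The selectors here are made up for illustration, not taken from any particular theme:

```css
/* Sketch: one stylesheet, two media contexts (hypothetical selectors) */
@media screen {
  #content { width: 1000px; }
  #navbar  { width: 333px; float: left; }   /* rule of thirds */
}
@media print {
  #content { width: 660px; }                /* fits inside IE's default 0.75in margins */
  #navbar, #menus, #ads { display: none; }  /* only the main content div prints */
}
```

The same idea extends to a handheld media query if you want the 660px column to be the entire mobile layout.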

Sample User Authentication Database Schema

My User/Password Database Schema for Business Applications

When I first bumped into the built-in ASP.Net User Authentication Provider, I lived for a couple of hours in bliss, thinking to myself “that’s convenient, I’ll never have to worry about the user authentication portion of my websites and business applications again.” Then I tried to use it.

It really is pretty decent, but it is geared toward large “web 2.0” applications where users sign up and start contributing, not business applications where I.T. or H.R. adds users and controls groups very closely, and where there are more than just “admin” and “user” account types.

In my experience, businesses have many roles within their organization, and they want to be able to drill into the application and have a single button or text field to be hidden or disabled based on whether or not a user is in a particular group.

Some want them tied in to Active Directory, but some don’t even have computer logins for all their employees, and some don’t want applications (especially 3rd party applications) tied in to Active Directory (despite the marketing that claims otherwise).

authActivity

Id : Integer <<PK>> <<auto incr>>

DisplayName : String(64) <<not null>>

Detail : String

An Activity is a single thing a user might want to do in the system. For instance “Create users,” “Delete users,” “Re-activate users,” “Assign Employee ID Numbers,” or “View Social Security Numbers.”

As developers build features into the software, they create new records in this table. There is no reason to be skimpy and try to lump a bunch of activities into a single umbrella activity. You can create activities for every feature of the software if you want.

Activities are only created (or maintained) by development. The people who administrate the system do not have any power to create new activities (other than to bug development to create an activity) because it is the responsibility of the development team to make sure the code is aware of what activities could be performed in certain areas of the software and to enable/disable things based on the current user’s permission set. It wouldn’t do any good for a non-developer to create an activity (like say: “Reset Other People’s Passwords”) if the software has no concept of what that means.

The Detail field on this (and the other tables) is an XML field, not just a description field. The actual data stored in that field is really up to you, and your application could support any number of configuration fields inside of that Detail field. This flexible design is really convenient, but it comes at the cost of not having straightforward SQL access (sorting and filtering) to the contents of the field. You are almost required to pull it into your application and use LINQ to XML to get to the stuff you store in the Detail field.
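As a sketch of what pulling a setting out of the Detail field might look like with LINQ to XML (the element names inside Detail are hypothetical; the schema makes no demands on them):

```csharp
using System;
using System.Xml.Linq;

static class DetailDemo
{
    static void Main()
    {
        // Hypothetical Detail payload; the contents of this field are up to you.
        string detail = "<detail><MenuCaption>Create Users</MenuCaption></detail>";

        XElement root = XElement.Parse(detail);

        // The explicit string cast returns null when the element is missing,
        // which makes optional settings painless.
        string caption = (string)root.Element("MenuCaption");

        Console.WriteLine(caption); // Create Users
    }
}
```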

authRole

Id : Integer <<PK>> <<auto incr>>

DisplayName : String(64) <<not null>>

Detail : String

A role is a set of permissions. A company might have an “HR” and an “HR Admin” role, as well as a “BookKeeper” role and a “Salesman” role. Perhaps they will also have a “Tech Support” role and a “Systems Admin” role. Each role has any number of activities associated with it, and overlap is OK (the Systems Admin role and the Tech Support role can have a fair amount in common). It is also OK for a single user to be assigned multiple roles; for instance, a “Salesman” could also be a “Payroll Clerk.” This design is based entirely on what experience has dictated as I have built business software over the years.

authRoleActivity

Id : Integer <<PK>> <<auto incr>>

RoleId : Integer <<FK-authRole>>

ActivityId : Integer <<FK-authActivity>>

Detail : String

This table is simply the many-to-many join table that connects authRoles and authActivities.

authUser

Id : Integer <<PK>> <<auto incr>>

DisplayName : String(64) <<not null>>

UserName : String(32) <<not null>>

Password : String(32)

Detail : String

This should be self-explanatory, except for the fact that there is no field for keeping track of who is active. (See the next table for why.)

authUserActive

Id : Integer <<PK>> <<auto incr>>

UserId : Integer <<FK-authUser>>

AsOf : DateTime <<not null>>

Until : DateTime

Detail : String

As businesses get to be a little more mature, they are no longer worried about simply whether or not a user is active and therefore allowed to log in right now, but they are interested in WHEN a user was allowed to log in. They want to be able to create a user who will expire in a month. Or they want to be able to have their summer interns only be active during the summer months. And, when they are audited, they need to be able to show that a particular user had access to a particular system at a particular time.

In order to accommodate this, I created this table, which will have entries for each time period where a user is active. So, the system will look to see if there is a record in this table where the login day is between the AsOf date and the Until date. If there is one, then the user is active (or was active on the date you tested for).

You should use a constraint to make sure AsOf < Until. Also, you can either allow Until to be null, meaning "until we assign an end date," or set it to the max date for your DBMS as a default field value. Using a null is technically more "correct," but I think the query logic looks nicer if you can assume that Until contains a constant value guaranteed to be in the future.
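For example, the "is this user active today" check described above boils down to a single exists test. This is a sketch against the script further down; the user id and date are placeholders:

```sql
-- Sketch: was user 42 active on June 15th, 2010?
-- Relies on Until defaulting to dbo.MaxDateTime() rather than null.
select case when exists
    ( select 1
      from authUserActive
      where UserId = 42
        and '20100615' between AsOf and Until
    ) then 1 else 0 end as WasActive;
```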

authUserRole

Id : Integer <<PK>> <<auto incr>>

UserId : Integer <<FK-authUser>>

RoleId : Integer <<FK-authRole>>

AsOf : DateTime <<not null>>

Until : DateTime

Detail : String

This is a simple many-to-many join table assigning users to roles, with one little twist: it also has the AsOf / Until logic. When you are audited, it isn’t good enough to know whether or not a user had access to the system; you need to be able to show whether they had access to a particular feature.
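Putting the join tables together, a permission check ("can this user perform this activity right now?") might look like the following sketch. The user id and activity name are placeholders:

```sql
-- Sketch: can user 42 currently perform the "Delete users" activity?
select case when exists
    ( select 1
      from authUserRole ur
      join authRoleActivity ra on ra.RoleId = ur.RoleId
      join authActivity a on a.Id = ra.ActivityId
      where ur.UserId = 42
        and a.DisplayName = 'Delete users'
        and getdate() between ur.AsOf and ur.Until
    ) then 1 else 0 end as HasPermission;
```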

And, here is a SQL script for generating the Schema in SQL Server 2008 (don’t say I never gave you anything).

create function
dbo.MaxDateTime()
returns datetime as
begin
    return convert(datetime, '99991231 23:59:59:997')
end
go

create function
dbo.MinDateTime()
returns datetime as
begin
    return convert(datetime, '17530101')
end
go

create table
authActivity
( Id int primary key identity
, DisplayName varchar(64) unique not null
, Detail text
);

create table
authRole
( Id int primary key identity
, DisplayName varchar(64) unique not null
, Detail text
);

create table
authRoleActivity
( Id int primary key identity
, RoleId int not null foreign key references authRole(Id)
, ActivityId int not null foreign key references authActivity(Id)
, Detail text
, constraint nk_authRoleActivity unique (RoleId, ActivityId)
);

create table
authUser
( Id int primary key identity
, DisplayName varchar(64) unique not null
, UserName varchar(32) unique not null
, Password char(32)
, Detail text
);

create table
authUserActive
( Id int primary key identity
, UserId int not null foreign key references authUser(Id)
, AsOf datetime not null
, Until datetime not null default dbo.MaxDateTime()
, Detail text
, constraint nk_authUserActive unique (UserId, AsOf)
, constraint ch_authUserActive_AsOfUntil check (AsOf < Until)
);

create table
authUserRole
( Id int primary key identity
, UserId int not null foreign key references authUser(Id)
, RoleId int not null foreign key references authRole(Id)
, AsOf datetime not null
, Until datetime not null default dbo.MaxDateTime()
, Detail text
, constraint nk_authUserRole unique (UserId, RoleId, AsOf)
, constraint ch_authUserRole_AsOfUntil check (AsOf < Until)
);

C# With Block

So, one of my big “beefs” with C# is the lack of support for “with” blocks. If you just want the solution to my complaints, scroll down most of the way through the article.

I come from a C/C++ background, but spent several years in the VB6 world, and came to the conclusion that with blocks are pretty handy. They do an EXCELLENT job of self-documenting the fact that you intend to do several things to a single object. They are used most frequently (for me at least) as a way of moving values from one object to another… like this:

With cmdOK
  .Top = cmdCancel.Top
  .Left = cmdCancel.Left + cmdCancel.Width + 90
End With

I wish it had support for aliasing, and using more than one with block variable at a time because I tend to use it to work with 2 objects at once… I envision something like this…

With cmdOK as btn1
  With cmdCancel as btn2
    btn1.Top = btn2.Top
    btn1.Left = btn2.Left + btn2.Width + 90
  End With
End With

You might be asking: what do you gain other than a handful of keystrokes? I mean, technically, the With blocks require more keystrokes than the plain old code required in the first place.

Good point. Except the alias thing does something important. It makes it easy to copy and paste the code to do something similar someplace else: all you change is the aliases at the top, and you don’t have to worry about whether or not you found all the places where you used that variable.

It also makes it so that you can de-reference a property at the end of a long dot-chain, and give it a name that goes out of scope when you are done. For instance…

With MyApp.frmMain.cmdOK as btn1
  With MyApp.frmMain.cmdCancel as btn2
    btn1.Top = btn2.Top
    btn1.Left = btn2.Left + btn2.Width + 90
  End With
End With

That saves a temporary reference to the buttons rather than calling down the object tree with each iteration. I have often thought that it would be convenient if you could have one with block with two aliases, like this…

With MyApp.frmMain.cmdOK as btn1, MyApp.frmMain.cmdCancel as btn2
  btn1.Top = btn2.Top
  btn1.Left = btn2.Left + btn2.Width + 90
End With

But that is (was) VB and now life is about C#. Except C# doesn’t support With Blocks – because I guess With Blocks are for wimpy programming languages.

So, I decided, wouldn’t it be interesting to create a with block construct?
Here is the best I can do:

First, you need an extension class:

public static class Extender
{
    public delegate void Proc<T>(T arg);

    public static void With<T>(this T blockVariable, Proc<T> block)
    { block.Invoke(blockVariable); }
}

Then, you can use it like this:

class Program
{
    static void Main(string[] args)
    {
          //using System.Xml.Linq
          XElement element1 = new XElement("hello");
          XElement element2 = new XElement("world");

          element1.With(e1 =>
          {
              e1.Add(element2);
          });
     }
}

You could even nest two or more of these with blocks…

        element2.With(e2 =>
        {
            element1.With(e1 =>
            {
                  e1.Add(e2);
            });
        });

It ain’t great, but it isn’t half bad. And, it does fulfill all of the needs I have for a with block. It is self-documenting in the sense that it says: “I intend to do several things with this object.” It also lets you assign a temporary name to a variable without creating an orphan variable that isn’t needed elsewhere in your function. And, it could be copied and pasted to do something very similar elsewhere with minimal modification (only modify the top, and the parts that need to be different in the new case).

The truth of the matter is, with extension methods and Lambda, C# really is starting to mature and get to a point where it has much of what makes Ruby and Python compelling, and, at the same time, it enjoys compile time type safety, and a powerful development environment. I dig it.

Or, you could do this (not quite as self-documenting, and a little odd, but it would work):

(new XElement("hello")).With(e1 =>
{
    e1.Add("more stuff");
    doSomethingWith(e1);
});

This creates an object that falls out of scope immediately after the little block that uses it. This sort of move made more sense back in VB6, when the runtime actually deleted objects as soon as they were no longer referenced rather than letting garbage collection do it later on. It would also be a bit more readable in a VB6-style With block. But the fact that it works exactly as expected means that my C# with block is a genuine with block structure.

What do you think it will take to get the guys down at Microsoft to add With Block support to the core language? – Yeah, me neither.

Why Doesn’t LINQ Perform?

I’m not talking about speed or performance; LINQ is plenty fast for my needs. I’m wondering why LINQ can’t perform any ACTIONs, why it can’t DO anything.

Yes, I understand, it’s called “Language Integrated Query” because it is meant to be used as a way to ASK questions, not PERFORM operations.

That is all fine and dandy, but, despite its name, LINQ is actually a set manipulation language more than anything else. It is a language for working with entire sets of things in a uniform way.

Let’s get serious here: in 90% of cases, immediately following a LINQ query, we see a foreach statement that iterates over the items. Often, this loop has only a single action inside of it. For instance:

var itemsThatQualify =
	from i in itemList
	where i.hasSomeQuality()
	select i;

foreach(var item in itemsThatQualify)
{
	item.doSomething();
}

It’s getting to the point where I am tempted to perform the query inside of the foreach statement instead of before it. But I think the result is less readable; what do you think? …

foreach(var item in (
	from i in itemList
	where i.hasSomeQuality()
	select i))
{
	item.doSomething();
}

The best thing I can say about that is that it DOESN’T look like English at all. In fact, the lambda version of this statement is actually more readable because it is less wordy…

foreach(var item in itemList.Where(i => i.hasSomeQuality()))
{
	item.doSomething();
}

It ain’t great, but it’s a bit better than the last thing. But the first one is still more self-documenting.

I’m getting to the point where I wish I could do this…

from i in itemList
where i.hasSomeQuality()
perform i.doSomething();

I actually typed that out at work the other day and said “why did they NOT make that work?” That is elegant, and would streamline tons of repetitive coding.

Then, the wheels got turning. LINQ is written using generics and extension methods. Hmm. Both of those are available to me for building my own stuff. Why not write my own LINQ extension called “Perform”? Here it is, complete with a Main function that shows it off.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace LinqPerform
{
    public static class LinqExtension
    {
        public delegate void Proc<T>(T arg);

        public static void Perform<TSource>(
            this IEnumerable<TSource> source,
            Proc<TSource> performer)
        { foreach (TSource item in source) performer.Invoke(item); }
    }

    class Program
    {
        static void Main(string[] args)
        {
            // something enumerable to work with
            var intList = new List<int> { 0, 1, 2, 3, 4, 5 };

            // DoSomething() with each element
            intList.Perform(i => Console.Out.WriteLine(i));

            // get a little trickier (evens only)
            Console.Out.WriteLine(); // a visual break
            intList.Where(i => i % 2 == 0)
                .Perform(i => Console.Out.WriteLine(i));

            // get a little wordier (with odds only)
            Console.Out.WriteLine(); // a visual break
            (from i in intList
             where i % 2 == 1 //odds only
             select i
            ).Perform(i => Console.Out.WriteLine(i));

            // do a little more, VERY little more
            Console.Out.WriteLine(); // a visual break
            intList.Where(i => i % 2 == 0)
                .Perform(i =>
                {
                    Console.Out.Write(i.ToString() + " ");
                    Console.Out.Write((i+1).ToString() + " ");
                    Console.Out.WriteLine();
                });

            // wait for a return key
            Console.In.ReadLine();
        }
    }
}

It worked!!! I can’t believe it.
Now if I could only get native query syntax support…

from i in intList
where i % 2 == 1
perform Console.Out.WriteLine(i);

would look so much better than…

(from i in intList
 where i % 2 == 1 //odds only
 select i
).Perform(i => Console.Out.WriteLine(i));

But I guess I can make do.

HOW TO: Print-off a SQL Server 2008 Database Schema

Overview

This little utility can read either a Microsoft SQL Server 2008 database schema, or a Microsoft Access database schema, and generate an HTML file documenting the structure of the tables, along with some statistical information about some of the columns. Here is a sample of what the output looks like. The entire C# project is available as a zip file containing both source code and executable binaries. I have made it available under a BSD style license, please respect my copyright.

General Use

To use the utility, simply open an *.mdf or *.mdb file with the program. You can do that by specifying a path to the database from the command line, or by just running the program and using the open file dialog it presents to select a database file. Then, you will be prompted about where to save the resulting HTML output. Make sure you save it in a folder with the provided dbSchema.css file or it will be pretty bland. The CSS file provided will italicize the primary key(s), bold the required fields, hide the min/max columns, and put page breaks between database tables. I have made an effort to provide plenty of CSS classes in the HTML, so the look and feel of the resulting document should be entirely customizable. Have fun!

Disclaimers

Of course, you use this utility at your own risk, yadda yadda yadda. But let me make one actual warning about this program: it calculates min/max/count statistics on every column of every table, so don’t run it against a LARGE, active database. If you want to crunch on a big production database, extract a backup copy to a development server and run the numbers against that — don’t say I didn’t warn you. It took a couple of minutes of number crunching to spit out info for the database I updated the program for, and it has a hundred or so tables of various sizes a handful of which have a million or so records in them. That should give you a ballpark figure about how much load is put on the server when it milks your data into a nice little report.


Discussion

I recently pulled out an old program that I wrote in VB6 for printing off an Access database table structure, just to see how much my coding style has changed over the last 6 or 7 years. I decided to rewrite the whole thing using C# on the .Net 3.5 framework using the OleDB data provider, and then abstracted it a bit and added a second class that can pull from a SQL 2008 Express database. It uses SQL Server Express, but with a little bit of tweaking of the code (especially the connection string) it could easily pull from a true Microsoft SQL 2008 Server, and probably a 2005 server.

The first thing I noticed while rewriting the program was the shift in focus. In the old version, the real “work” was spent interacting with the printer – getting the fonts right, the bolding to make it easier to read, the page breaks, columns, avoiding table / field orphaning. The newer version completely avoided that whole ball of wax. Instead, I opted to output a simple html file and use CSS to get the page breaks and fonts looking exactly the way I like them. Focusing instead on just tackling the problem using elegant code, it was actually a fun project rather than just tedious work.

I found that I made extensive use of LINQ, which seemed odd when I thought about it, because I couldn’t use LINQ to SQL (what most developers usually use LINQ for). That would require that I knew the DB schema before I wanted to print it off, and that sort of destroys the purpose of writing a general purpose database schema printing program.

There is no “LINQ to OleDB” or “LINQ to ODBC,” and there certainly isn’t a “LINQ to MySQL” or “LINQ to Oracle.” And, even though there is a LINQ to SQL, and I am technically trying to print off the schema of a SQL Server database, LINQ to SQL still isn’t an option, because LINQ to SQL is really more of a link to a specific database schema, NOT a general link to any unknown SQL database, and certainly not a way to discover the database layout at runtime.

At its very core, LINQ isn’t about database queries; it is really a lot more extensive than that. LINQ is a language for interacting with SETS of OBJECTS. I made use of LINQ to DataSets, LINQ to XML, and plain old LINQ to Objects. This particular project was a lot of fun, because it reminded me just how much I love LINQ, and not just as a substitute for SQL. I especially like the new XElement XML functions that came out with LINQ to XML; they are quite a treat to work with (especially if you remember the old-school XML document API).

The entire XML document, covering all the tables and all the fields along with the statistics columns for each field, is generated from a single statement using no loops whatsoever. It’s really cool. Instead of using loops, it uses the XElement constructor that can accept a set of child elements as one of its parameters, combined with a LINQ statement that generates that set of elements in a single expression. Here is the statement that spits out the HTML file:

output.WriteLine(
  new XElement("html",
    new XElement("head",
      new XElement("title", "Database Schema"),
      new XElement("link",
        new XAttribute("rel", "stylesheet"),
        new XAttribute("type", "text/css"),
        new XAttribute("href", "dbSchema.css"),
        new XAttribute("media", "all"))),
    new XElement("body",
      new XElement("div", new XAttribute("id", "container"),
        from TableInfo table in reader.getSchema()
        select
        new XElement("div", new XAttribute("class", "dbtable"),
          new XElement("h1", table.TableName, new XAttribute("class", "tablename")),
          String.IsNullOrEmpty(table.Description) ? null :
          new XElement("h2", table.Description, new XAttribute("class", "tabledesc")),
          new XElement("table", new XAttribute("class", "tablecolumns"),
            new XElement("tr",
              new XElement("th", "Column Name", new XAttribute("class", "columnname")),
              new XElement("th", "Type", new XAttribute("class", "datatype")),
              new XElement("th", "Reference", new XAttribute("class", "reference")),
              new XElement("th", "Min", new XAttribute("class", "range")),
              new XElement("th", "Max", new XAttribute("class", "range")),
              new XElement("th", "Count", new XAttribute("class", "count")),
              new XElement("th", "Nulls", new XAttribute("class", "nulls")),
              new XElement("th", "Description", new XAttribute("class", "description"))),
            from col in table.Columns
            let isPk = table.PrimaryKeyFields.Contains(col.ColumnName)
            select
            new XElement("tr", new XAttribute("class", "column"
                  + (col.AllowNull ? " allownull" : " notnull")
                  + (isPk ? " primarykey" : "")
                  + (string.IsNullOrEmpty(col.ReferenceTo) ? "" : " foreignkey")),
              new XElement("td", col.ColumnName,
                new XAttribute("class", "columnname" )),
              new XElement("td", new XAttribute("class", "datatype"),
                col.DataType,
                col.MaxLength <= 0 || col.MaxLength > 16*1024 ? "" : "(" + col.MaxLength.ToString() + ")"),
              new XElement("td", nbsp(col.ReferenceTo), new XAttribute("class", "reference")),
              new XElement("td", nbsp(col.Stats.Min), new XAttribute("class", "range")),
              new XElement("td", nbsp(col.Stats.Max), new XAttribute("class", "range")),
              new XElement("td", nbsp(col.Stats.Count), new XAttribute("class", "count")),
              new XElement("td", nbsp(col.Stats.NullCount), new XAttribute("class", "nulls")),
              new XElement("td", nbsp(col.Description), new XAttribute("class", "description"))
            )
          )
        )
      )
    )
  ).ToString()
);

Yes, it does a lot in a single statement, but it isn’t hard to read. It “flows” in the same basic layout as an HTML document, and it reads more or less like a bunch of single lines, one for each element. But in truth, it is a single statement that creates an entire, complex HTML file in a single call to the constructor of the root element. The entire XML tree is created in its entirety and written out to the file without ever being assigned to a variable; no element of the HTML is assigned to a variable before being added to the tree.
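The nbsp() helper called in the statement above isn’t shown in the snippet. A minimal sketch of what it might look like (my assumption, not the original code):

```csharp
using System;

static class HtmlHelpers
{
    // Hypothetical sketch of the nbsp() helper used in the XElement statement:
    // render null/empty values as a non-breaking space so that empty <td>
    // cells still get a visible border when the browser draws the table.
    public static string nbsp(object value)
    {
        string s = value == null ? "" : value.ToString();
        return string.IsNullOrEmpty(s) ? "\u00A0" : s;
    }

    static void Main()
    {
        Console.WriteLine(nbsp(null) == "\u00A0"); // True
        Console.WriteLine(nbsp("users"));          // users
    }
}
```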

Summary

Sometimes I need a utility that lets me just print off a database schema in a readable format to set it on the desk next to me while I develop against the database and use it as the authoritative reference. This is the utility I developed for that purpose. Let me know if you’d like to see it updated to allow printing the table structures from another database engine.

My Email Schema in SQL Server 2008

Here is the SQL Server script for creating my simple Email Database for the samples I will be blogging about on this site. The schema is documented here. And, sooner or later, I’ll post SQL scripts for other languages as well.

Don’t forget to check out the Pros and Cons of using SQL Server to host my sample email database at the bottom of this blog entry.

-- This script requires SQL Server 2008
create database Email;
use Email;

-- We store messages in folders
create table Folder
  ( Id        int primary key identity
  , ParentId      int foreign key references Folder(Id)
  , DisplayName    varchar(64) not null
  , Description    text
  , constraint nk_Folder unique (ParentId, DisplayName)
  );

-- an address is a unique email address
create table Address
  ( Id        int primary key identity
  , FriendlyName    varchar(128)
  , UserName      varchar(64) not null  -- always store in lower case
  , DomainName    varchar(128) not null -- always store in lower case
  , DisplayName    as
        case when FriendlyName is null then
              UserName + '@' + DomainName
        else
              FriendlyName + ' <' + UserName + '@' + DomainName + '>'
        end
  , constraint nk_Address unique (UserName, DomainName)
  );

-- Each message represents a single email
create table Message
  ( Id        int primary key identity
  , FolderId      int foreign key references Folder(Id)  not null
  , SenderId       int foreign key references Address(Id) not null
  , SentOn      datetime  not null
  , Subject      varchar(1024)  not null
  , EmailHeader    text
  , EmailBody      text
  , IsUnread      bit    not null  default 1
  , IsJunk      bit    not null  default 0
  , IsStar      bit    not null  default 0
  , Priority      tinyint not null default 3
        check (Priority in (1,2,3,4,5))
  );

-- One entry for each header (to/cc/bcc) on a message
create table MessageAddress
  ( MessageId      int foreign key references Message(Id)  not null
  , HeaderType    char(4)  not null
        check (HeaderType in ('to', 'from', 'cc', 'bcc', 'r-to'))
  , AddressId      int foreign key references Address(Id)  not null
  , constraint pk_MessageAddress
        primary key (MessageId, HeaderType, AddressId)
  );

-- An attachment is a single binary file
create table Attachment
  ( Id        int primary key identity
  , FileName      varchar(512)  not null
  , Extension      varchar(64)  not null
  , Content      varbinary(max) not null
  , FileSize      bigint  not null
  , Hash        char(32)  not null
  , constraint nk_Attachment unique (Hash, FileSize)
  );

-- Join Table
create table MessageAttachment
  ( MessageId      int  not null
        foreign key references Message(Id)
  , AttachmentId    int  not null
        foreign key references Attachment(Id)
  , constraint pk_MessageAttachment
        primary key (MessageId, AttachmentId)
  );

declare @RootId int;
set @RootId = 1;
set identity_insert Folder on;
insert into Folder (Id, DisplayName)
values (@RootId, '');
set identity_insert Folder off;

-- some root level folders

insert into
Folder (ParentId, DisplayName)
values (@RootId, 'Inbox')
  , (@RootId, 'Outbox')
  , (@RootId, 'Sent')
  , (@RootId, 'Trash')
  , (@RootId, 'Drafts')
  ;

declare @InboxId int;
set @InboxId = (
  select Id
  from Folder
  where ParentId = @RootId
  and DisplayName = 'Inbox'
  );

-- a few sub folders
insert into
Folder (DisplayName, ParentId)
values ('Home', @InboxId)
  , ('School', @InboxId)
  , ('Work', @InboxId)
  ;

-- I guess we should add a couple of addresses
insert into
Address (FriendlyName, UserName, DomainName)
values ('Brent Larsen', 'brentoboy', 'gmail.com')
  , (null, 'jimbob', 'someplace.com')
  ;


Pros and Cons

Of using SQL Server for this database

The only reason I can think of to use SQL Server to host an email database is if you wanted to have your entire office share a single database for their email. If you have a Windows Server running a decent copy of SQL Server, chances are you already have Exchange installed. So, it would be pointless to do such a thing.

The only reason I can think of to write an email application is if you don’t like the available options (Outlook, Thunderbird) and want your own. I can’t think of a compelling reason to do that, but if I were going to build my own email application, I would probably be building it for a single person. So a smaller database engine, such as SQLite or Access, might be better suited.

But, as this theoretical database is merely a sample created for database discussion, let’s assume that there is a need for an email application built on top of such a database. What reasons might we have to build a SQL Server backend?

Firstly, if we are building the application that will use the database in C#, SQL Server is an excellent choice, as it enjoys its current place as the only database supported by LINQ to SQL (which is more than just hype; it’s good stuff). In most cases, from what I can see, SQL Server is the database of choice simply because .Net is the framework of choice, and it is an integrated part of the whole Microsoft development experience. And I’m not saying that maliciously, either. Microsoft Visual Studio has been the best programming IDE since Visual Basic 5. Compare it to Eclipse or any other IDE and you will see the competition lacking in bringing you a unified, well-designed, productive development environment. So there you have it: the most compelling reason I can think of to select SQL Server for our email program is that Visual Studio is so dang productive.

Example Email Database Schema

My Simple Example Database Schema

I decided to draw up a simple database schema to start using for examples on the site. In today's blog entry I will describe the different tables and their relationships, and, in the future, I will post up SQL scripts for creating the tables in different SQL dialects.

I selected an email database as my example schema because I wanted a schema that most people are familiar with so I can explain database concepts instead of spending too much time explaining the schema itself.

My table names are singular; I like it that way. If you want to comment on that, do it someplace where others can read about the root issue, like this post (this blog entry is about the schema, not the singular/plural debate).

I'll post SQL scripts for generating this database in various DBMSs here:

The Schema

Folder

Folder


Id : Integer <<PK>> <<auto incr>>

ParentId : Integer <<FK-Folder>>

DisplayName : String(64) <<not null>>

Description : String

Folders are organized into a hierarchical tree based on the ParentId as the self referencing field. I wish I could make the ParentId field not null constrained, but the root folder has no parent. I might someday blog about ways to approach this problem.
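To make the self-referencing relationship concrete, here is a small sketch of the Folder table and a parent lookup. It uses SQLite through Python purely for illustration (the real scripts will come later in each DBMS's own dialect), and the nullable ParentId is exactly the workaround for the root-folder problem described above:

```python
import sqlite3

# Sketch of the self-referencing Folder table in SQLite dialect.
# ParentId is left nullable so the root folder can exist without a parent.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Folder (
    Id          integer primary key autoincrement,
    ParentId    integer references Folder(Id),
    DisplayName varchar(64) not null,
    Description text
);
insert into Folder (ParentId, DisplayName) values (null, 'Root');
insert into Folder (ParentId, DisplayName) values (1, 'Inbox');
insert into Folder (ParentId, DisplayName) values (1, 'Sent');
""")

# A self-join resolves each folder's parent name; the root comes back as None.
rows = conn.execute("""
    select f.DisplayName, p.DisplayName
    from Folder f
    left join Folder p on f.ParentId = p.Id
    order by f.Id
""").fetchall()
for name, parent in rows:
    print(name, "<-", parent)
```

The left join (rather than an inner join) is what keeps the parentless root folder in the result set.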

Each email message will be stored in exactly one Folder. That relationship is defined in the Message table.

Address

Address


Id : Integer <<PK>> <<auto incr>>

FriendlyName : String(128)

UserName : String(64) <<not null>>

DomainName : String(128) <<not null>>

DisplayName : String <<read only>>

A row in the Address table represents a unique email address.

The FriendlyName is the actual person’s name or nickname usually found before the email address.

The UserName is the portion of the email address before the @ symbol, and should always be stored in lowercase.

The DomainName is the portion of the email address found after the @ symbol, and should always be stored in lowercase.

The DisplayName is a read-only field calculated (or concatenated) from the other fields. I figure it will be so common to concatenate the above fields into a readable email address that I created this field to do it for you.

The table should be constrained to not allow multiple records with exactly the same lowercase(username@domainname).
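Here is one way the computed DisplayName and the uniqueness rule might look, sketched in SQLite via Python for illustration (SQL Server would use a computed column and a unique constraint instead, and the exact "Name <user@domain>" format below is my own assumption, not part of the schema):

```python
import sqlite3

# Sketch of the Address table in SQLite dialect. The generated column plays
# the role of the read-only DisplayName field described above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Address (
    Id           integer primary key autoincrement,
    FriendlyName varchar(128),
    UserName     varchar(64)  not null,
    DomainName   varchar(128) not null,
    DisplayName  text generated always as (
        coalesce(FriendlyName || ' ', '') || '<' || UserName || '@' || DomainName || '>'
    ),
    unique (UserName, DomainName)  -- both columns are stored lowercase already
);
insert into Address (FriendlyName, UserName, DomainName)
    values ('Brent Larsen', 'brentoboy', 'gmail.com');
""")

display = conn.execute("select DisplayName from Address").fetchone()[0]
print(display)  # Brent Larsen <brentoboy@gmail.com>

# Inserting the same lowercase user@domain pair again violates the constraint.
try:
    conn.execute(
        "insert into Address (UserName, DomainName) values ('brentoboy', 'gmail.com')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
print(duplicate_allowed)  # False
```

Because UserName and DomainName are stored lowercase by convention, a plain unique constraint on the pair is enough to enforce the lowercase(username@domainname) rule.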

Message

Message


Id : Integer <<PK>> <<auto incr>>

FolderId : Integer <<FK-Folder>> <<not null>>

SenderId : Integer <<FK-Address>> <<not null>>

SentOn : DateTime <<read only>> <<not null>>

Subject : String(1024) <<not null>>

EmailBody : String <<not null>>

IsUnread : Boolean <<not null>> <<Default: True>>

IsJunk : Boolean <<not null>> <<Default: False>>

IsStar : Boolean <<not null>> <<Default: False>>

Priority : Integer <<Between 1 and 5>> <<Default: 3>>

A row in the Message table represents a single email message.

I think the model here is fairly self-explanatory. I will, however, point out that by defining the FolderId and SenderId here, we limit ourselves so that a Message can only be stored in a single folder at a time, and can only be sent from a single email address. I don't see either of these as a problem.

If, in the future, we decide we want a message to be stored in multiple folders, we could consider making virtual folders that have links to messages rather than attempt to keep track of a single message actually being in two folders.

Of course, an email header contains the sender along with the time it was sent, as well as other data that could be significant. So, why do we have fields for SenderId, SentOn and Subject? Mostly so we can sort and index based on those fields.

MessageAddress

MessageAddress


MessageId : Integer <<FK-Message>> <<not null>>

HeaderType : char(4) <<not null>> <<{to, from, cc, bcc, r-to}>>

AddressId : Integer <<FK-Address>> <<not null>>

This table is used to define a many to many relationship between Messages and Address. This sort of table is necessary because a single message could be sent to multiple “to” addresses as well as multiple “cc” addresses. So, each link has a header type so we can know the nature of each relationship between an email and its related addresses.

Even though an email header can contain tons of somewhat useless information, there are a few pieces of the header that are of interest from a data searching standpoint. I had considered making an associative-array style table that could store all the header fields from every message, but decided against it. But the From, To, CC, and BCC fields are of interest and may need to be queried against someday, so I made this table to allow many email addresses to be linked with a single email based on their relationship as defined in the header.

Attachment

Attachment


Id : Integer <<PK>> <<auto incr>>

FileName : String(512) <<not null>>

Extension : String(64) <<not null>>

Content : LargeBinary <<not null>>

FileSize : LongInteger <<not null>>

Hash : String(32) <<not null>>

A row in the Attachment table represents a single file attached to an email.

I would have included a foreign key field to link an attachment to the email to which it is attached, but attachments can be rather large, and it is somewhat common for the same attachment to be attached to multiple emails. So I decided to create the Hash and FileSize fields as a way of detecting whether or not a file is an exact duplicate of an existing file.

The take-home message here is that the relationship between Messages and Attachments is many-to-many, meaning one message can have multiple attachments, and one attachment can belong to multiple messages. In order to keep track of a many-to-many relationship, we need a separate table where one record represents a single link between the two tables (see the MessageAttachment table defined below).

The Hash field will be an MD5 checksum generated from the Content of the file.
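The dedup idea can be sketched in a few lines; the dictionary below merely stands in for a lookup against the Attachment table's Hash and FileSize columns:

```python
import hashlib

# Sketch of duplicate detection: hash the content and reuse the stored
# attachment when the (Hash, FileSize) pair has been seen before.
def attachment_key(content):
    """The (Hash, FileSize) pair used to spot exact duplicates."""
    return hashlib.md5(content).hexdigest(), len(content)

stored = {}  # (hash, size) -> attachment Id; stand-in for a DB lookup

def store_attachment(content):
    key = attachment_key(content)
    if key not in stored:
        stored[key] = len(stored) + 1  # pretend this is the auto-incremented Id
    return stored[key]

first = store_attachment(b"quarterly report bytes")
second = store_attachment(b"quarterly report bytes")  # exact duplicate
third = store_attachment(b"a different file")
print(first, second, third)  # 1 1 2
```

Checking FileSize alongside the MD5 hash is cheap insurance: two files with the same checksum but different sizes can never be the same file.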

MessageAttachment

MessageAttachment


MessageId : Integer <<FK-Message>> <<not null>>

AttachmentId : Integer <<FK-Attachment>> <<not null>>

This table is a simple many to many relationship table.

Initial thoughts on ASP.NET MVC

This blog entry is entirely abstract, no code whatsoever.  And it contains opinion.  If opinion offends you, or if you only want functional code to play with, this entry will probably just frustrate you. That said, my topic of discussion is the new MVC option for ASP.NET.

The Model View Controller Framework: Brought to You by Microsoft

Well, Microsoft has an official Model / View / Controller add-on available for ASP.NET. This might not be news to you. And, I venture to say that most people who use it will be coming from a previous iteration of ASP.NET and not really know much of Ruby on Rails, or Django (the open source MVC architectures). I went ahead and took the plunge and did the test drive of ASP.NET MVC just to see what I thought, and here are my initial feelings.

My first thought is that the Microsoft MVC is much more like Rails than it is like Django. The URL is parsed for you into /Controller/Action/id; of course, you can override it, but (in true Rails fashion) the default is built for you (unlike Django). It also has controller classes (like Rails) which have functions for each Action.

What ASP.NET MVC lacks is the Rails "magic". Honestly, I am not a fan of Rails magic; I prefer Django's philosophy that magic should not be what makes everything work. The magic I am talking about actually runs deeper than you might think. Rails and Django are written in so-called "scripting languages", where objects are very fluid and can define themselves at run time rather than being strict classes built at compile time. Because C# is a compiled language with strict classes, the effect on ASP.NET MVC is two-fold. (1) A lot of Rails magic isn't even possible, such as defining database relationships in the model using odd functions in the class definition like HasAndBelongsToMany, or creating functions that are called before the actions in a class (actions seem to be called directly by the framework). Also, you must explicitly return a View object in an action, even if you just want the view with the same name as the action to be returned. (2) You get early binding instead of late binding. This means you get compile-time warnings about calling functions or accessing data members that don't exist. It also means you get compile-time decisions about which overloaded function to call as opposed to run-time object evaluation. This means it is capable of being much faster.

In Rails and Django, it's the base classes provided by the model that make application development so powerful. The ActiveRecord and Django.Model classes are the beginning of development for an MVC app in either of those two frameworks. In Django, I usually start building an app by pulling up models.py and defining my data. It gives me a single file that lets me define all the fields and relationships in my data model. The framework uses that to produce the actual tables and SQL necessary to finish the task. When you are done, the "model" layer includes all the data-related stuff, and will also contain the validation information that you want controlled at the database level. In Django, the model is the starting point. When you want to add features, you start with the model and work out from there.

Rails is another animal altogether. You start with an update file, and you create or modify each table by creating an update function (and a corresponding downgrade function). These functions use ActiveRecord to keep things database neutral, but the end result is that the database is defined by the result of the update functions, and not really defined by classes with properties representing the database fields. The actual classes that represent records in the database are generated magically at runtime (hurray for scripting languages), and the files that define the classes only really contain information about relationships between different tables and other code that can't be generated on the fly by accepting the default implementation for everything. To be totally honest, I hate this method of doing things. It makes it terribly difficult to just print off something that lets you see your database structure and pick it all apart. The things that define your model are divided up into hundreds of little files, none of which tell enough of the story to be truly helpful, but all of which contain enough that they can't be ignored. The only thing easy about the way Rails defines its model is that the upgrade (and downgrade) path for your schema is significantly easier to walk than the upgrade path in Django (or ASP.NET MVC).

Microsoft's way is a bit more like Django, with a Microsoft-centered twist. Instead of defining the data classes in code and generating the DB creation scripts, you define the database in whatever database engine you intend to use, and then generate the classes to match using LINQ. This approach is the way .NET has always treated databases, and has the effect of making it hard(er) to switch to a new DB engine. You can see the repercussions of this: SQL Server Express ships free with Visual Studio Express (also free), and by default, people will use SQL Server. Once you generate your model classes and start building on top of them … you're hooked. To be totally honest though, who cares.

Everyone seems to agree that model/view/controller is a decent design pattern and makes for easier web application development. But no one seems to agree on what belongs in each section of the app. For instance, in Rails, the "Model" is largely generated by the database, and has nothing but database interaction code in it. The View files are all used to generate a page viewable in the browser. All the extra fluff (like business logic) is in the Controller layer. In Django, the Model defines the data, and contains very little more than table and field names. It could be used to enforce business logic, or not – you decide; it's your app. The controller is really the Django framework itself, along with a few things you define, which are usually settings, not code. So, the View contains all the page handling logic, as well as forms validation, etc.

So how does Microsoft split things up? The controller layer is for web page routing and forms validation (similar to Rails). The View layer contains aspx pages (again similar to rails), and the Model contains EVERYTHING else (business logic, LINQ classes, or some other database interaction code of your choice). The tutorial I went through even said that if the controller classes have action functions that start getting to be more than a few lines long you should consider moving that logic into a function that is stored in the model layer somewhere.

The model is the fundamental difference between ASP.NET MVC and previously existing MVC web frameworks. For Rails and Django, the model is what makes each one distinct and powerful. The model gets the focus of the development attention. The elegant design of the model is what is so darned attractive. In ASP.NET MVC the "model" is left for you to define. Now, just because it is left very open ended, that doesn't mean it is really a cop out. In fact, it seems that the addition of LINQ to the .NET framework is intended to be the model language for ASP.NET MVC. So, it isn't like they gave you nothing to work with; it is just a much fuzzier line than what I expected coming from a Rails/Django background.

The other fundamental difference I see is the complete lack of scaffolding. There are some scaffold generators out there, and there seems to be a plug-in or something in the works, but honestly, day one in Rails you learn to use scaffolding – it's the first big "wow, that rocks" experience you get when developing in Rails. With Django, the built-in admin pages have the same effect (if not significantly more so). The lack of scaffolding left my tour of the Microsoft MVC feeling a bit underwhelmed.

The other thing worth noting is that you lose a lot of what makes ASP.NET so productive. In ASP.NET, the page is central. All the logic for a page is contained on that page. You have instant Ajax without even knowing it by using an <asp:textbox> tag with scripts that run on the server. All the magic is generated for you to pass things back and forth between the logic defined in the page and actions on the client end. This can't be magically done for you when the processing center of the page is now a controller class that uses pages like functions instead of fully fledged independent applications. You have to handle your post-backs manually. Not that difficult, but it is a step backward (at least in that particular direction) from doing things the normal ASP.NET way. My hope is that ASP.NET MVC will get to be so popular, and enough people will harp on this shortcoming, that it will be fixed, or a suitable alternative for forms processing will be provided. Currently I don't see a convenient "form" base class that lets me generate and validate forms the way Django does, and I don't see the built-in Ajax functions that make Rails easy to use despite the fact that you have to handle it all in the controller. Forms processing is sort of underbaked.

This is a preliminary judgment of the new framework, and overall, I am excited to start using it. It seems good. As a semi-compiled language with early binding, it has the potential to be much faster than Rails (time will tell). It does have the effect of organizing the code so that designers can work on the view, application programmers can work in the controller, and systems architects can focus on the model without getting in each other's way. I'm reasonably impressed. I think it's a step forward. I think LINQ is better than ActiveRecord, but I doubt it will be as nice as Django Models. I guess only time will tell.

The other thing I want to see is whether or not the fanboys who author the books published by Microsoft Press will completely ignore the fact that MVC was first popularized by Ruby on Rails, and the ASP.NET MVC framework is obviously designed using Rails as the primary reference for how to do it right. Will Microsoft’s propaganda act as though they invented the idea, like everything else that comes out of Microsoft?

More importantly, I wonder if, like other things, Microsoft will learn from the other guys, and then go on to make something significantly more usable. Maybe in a couple of more major releases of the .NET framework, will ASP.NET MVC be truly superior? I bet it will.

Using XSLT to transform XML into HTML

It is getting to be more and more common for web services to produce XML data as opposed to serving up fully fledged HTML. Client-side applications are then responsible for taking the XML data they requested and rewrapping it into whatever format they want. XSLT is a sort of style sheet for taking a segment of XML data and converting it to another format. It is similar to CSS in several ways, but much more complex (and therefore much more powerful – and sometimes more annoying).

When I first bumped into XSLT I was immediately interested in its capacity to allow someone to enhance and simplify HTML. In much the same way that HTML got more readable when web developers started using CSS, it can be simplified even further by using XSLT. For instance, many CSS-based design patterns require the creation of multiple unnecessary DIVs in order to generate a flexible button background. It is not uncommon to see the following instead of the old-school way of flexible buttons via tables and corner images.

    <div class="button">
      <div><div><div><div><div><div>
        Click me!
      </div></div></div></div></div></div>
    </div>

Then, the CSS for the button class has background images assigned based on descendant DIVs and you create a button without including a bunch of images in the HTML. This is an improvement over traditional HTML/table buttons, but it has its drawbacks. Consider the fact that, in order for this button to be rendered correctly, you would need to remember to put exactly 6 child DIVs in your HTML.

If your designer wanted to spruce things up a bit, and add an extra image for a cool corner enhancer thing, you’d have to go find all the DIVs where class=”button” and then wrap the inner text in a SPAN or some equally mundane chore. Wouldn’t it be easier to define a new element altogether that just works (let’s call it COOL-BUTTON), and any changes to the way the cool button needs to be rendered in “real” html would just take care of themselves? What you need is a button “template” that gets applied to all your web pages when you use a COOL-BUTTON tag.

The COOL-BUTTON could be used like this:

    <cool-button>Click me!</cool-button>

And somewhere you would have a template file that says:

    <cool-button>  means

    <div class="button">
      <div><div><div><div><div><div>
        (contents here)
      </div></div></div></div></div></div>
    </div>

That is more or less what XSLT was made to do. Now that I have talked it up, let me say that XML style sheets are not an oasis for web developers. They are actually kind of ugly, but powerful enough that it is worth knowing what they are and what they do so you can choose to use them when the time is right.

I decided to invent my own outline XML language that allows me to put together a simple outline, and then write an XSL file to convert the outline XML into viewable HTML using ordered lists. The outline I built looks like this.




    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="outline.xsl"?>
    <outline>
      <li>First Heading</li>
      <sub>
        <li>sub heading</li>
        <sub>
          <li>point 1</li>
          <li>point 2</li>
        </sub>
        <li>another sub heading</li>
        <sub>
          <li>point 1</li>
          <li>point 2</li>
        </sub>
        <li>sub heading</li>
        <sub>
          <li>First Example</li>
          <li>Second Example</li>
          <li>Third Example</li>
        </sub>
      </sub>
      <li>Second Heading</li>
      <sub>
        <li>more</li>
        <li>even more</li>
      </sub>
      <li>Third Heading</li>
    </outline>
Remember, this is XML, not XHTML, so things might look a bit different. The first line is the XML declaration, just saying "this is XML." The second line says "format this XML using the outline.xsl style sheet" (outline.xsl is listed below).

The outline XML I have defined here uses <outline> tags to wrap the whole page, similar to the way <html> tags wrap an entire HTML document. Inside the outline, you either have items (<li> elements) or sub sections (<sub> elements). Sub sections contain more items and subsections. You (the XML author) don't have to worry about what kind of numbering system to use at each level of the outline; that is provided for you by the eXtensible style sheet. Nor do you have to worry about whether your designer uses tables or lists to display the elements (or something different altogether).

In order for a browser to display this outline, it has to be converted to valid HTML. This is done by the browser – assuming it can find the style sheet. Here is the stylesheet I wrote to handle this:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html" indent="yes"/>

    <xsl:template match="/">
      <html>
        <head>
          <title>Outline</title>
          <style type="text/css">
            ol { list-style-type: upper-roman; }
            ol ol { list-style-type: upper-alpha; }
            ol ol ol { list-style-type: decimal; }
            ol ol ol ol { list-style-type: lower-alpha; }
          </style>
        </head>
        <body>

          <!-- drop the transformed outline into the page body -->
          <xsl:apply-templates/>

        </body>
      </html>
    </xsl:template>

    <xsl:template match="outline/li">
      <h1><xsl:apply-templates/></h1>
    </xsl:template>

    <xsl:template match="sub">
      <ol>
        <xsl:apply-templates/>
      </ol>
    </xsl:template>

    <xsl:template match="outline/sub/li">
      <li><h2><xsl:apply-templates/></h2></li>
    </xsl:template>

    <xsl:template match="outline/sub/sub/li">
      <li><h3><xsl:apply-templates/></h3></li>
    </xsl:template>

    <xsl:template match="outline/sub/sub/sub/li">
      <li><h4><xsl:apply-templates/></h4></li>
    </xsl:template>

    </xsl:stylesheet>
The first thing you will probably notice is that this is not at all like CSS. This stylesheet language is valid XML. Personally, I think this is really gross. XML is a wonderful document markup language, and a beautiful language for defining hierarchical data. It is downright ugly for program control and looping, but someone thought XML would be a good language to write style sheets in, and here we are.

The first two lines declare the document and the stylesheet itself (basic boilerplate). The third line declares what format the output is intended to be interpreted as (you have to say "this is HTML" or the browser has to assume it isn't). The indent="yes" is a convenient way to tell the output stream to enforce proper indentation of sub-elements — pretty cool.

Lines 5 to 23 define one xsl:template element. The match="/" attribute says that the root node of the document (in this case <outline>) should be reformatted to look like an HTML page with a few styles for our ordered lists. Line 19 tells it to take the stuff inside the root node (the body of the XML document), apply templates to it, and insert the result here.

Lines 25 to 27 define another xsl:template element saying all first-level li elements should be wrapped in H1 tags.

Lines 29 to 33 say that any sub (no matter what level it is at) should just wrap things in an ordered list element.

Lines 35 to 46 say that li elements that go deeper into the tree should be wrapped in smaller and smaller headers.

You can view the results here.
Now, view the source for the resulting page. Notice it isn't the HTML; it is the outline XML. Which means that the way I made it work isn't even visible to the end user (unless they can read XSL files).

So, what is this doing in a blog about databases? I'm glad you asked. Some people believe that XML is the database language of the future. For instance, the QuickBooks SDK passes data back and forth via XML. You could take a customer element directly from a QB request and use XSLT to make it viewable in a browser in whatever HTML format you want it to have. If nothing else, XSLT is an interesting language.

XSLT could ultimately be used by someone to make a unified user interface markup language. You could define a markup language very similar to HTML that is more geared toward user interface programming rather than document formatting. Then, user interface designers would use the new language to design user interfaces, which could then be made viewable in a browser via one XSL style sheet, and viewable in an application framework designed to build forms based on the XML layout. It really is an interesting language; it opens the doors to all kinds of possibilities.

Oracle PL/SQL Associative Arrays Example

I used this example in a presentation the other day, so I figured I'd post it up for anyone who is interested. It shows off associative arrays indexed by a VARCHAR2 key (a feature available since Oracle 9i Release 2).

Sample Code showing use of Associative Arrays

    declare
    -- create an associative array data type
        type account_map
          is table of varchar2(40 char)
          index by varchar2(5 char);
    
    -- the associative array we will be working with
        asset_accounts account_map; --no need to initialize
    
    -- create a basic VARRAY with the keys we will be using
        type account_keys is varray(4) of varchar2(5 char);
        asset_account_keys account_keys := account_keys('CASH','AR','INV','PPE');
    
        current_key varchar2(5 char);
    begin
    --  add some values, notice we don't need to extend the associative array
        asset_accounts('CASH') := 'Cash or cash equivalents';
        asset_accounts('AR') := 'Accounts receivable';
        asset_accounts('INV') := 'Inventory';
        asset_accounts('PPE') := 'Property, plant and equipment';
    
    -- If we know the keys, we can display the values:
        dbms_output.put_line('--- by direct value ---');
        dbms_output.put_line('CASH: ' || asset_accounts('CASH'));
        dbms_output.put_line('AR: '  || asset_accounts('AR'));
        dbms_output.put_line('INV: '  || asset_accounts('INV'));
        dbms_output.put_line('PPE: '  || asset_accounts('PPE'));
    
    -- if we have a nice varray or nested table that knows the keys, we can use it
        dbms_output.put_line('--- by table of lookup keys ---');
        for i in 1..asset_account_keys.count loop
            dbms_output.put(asset_account_keys(i) || ': ');
            dbms_output.put_line(asset_accounts(asset_account_keys(i)));
        end loop;
    
    -- if we don't know what is in there, we can discover the keys.
        dbms_output.put_line('--- by first / next discovery ---');
        current_key := asset_accounts.first;  -- 'AR' is alphabetically first
    
        -- we loop until we run out of keys
        while current_key is not null loop
            -- display the element for the current key
            dbms_output.put(current_key || ': ');
            dbms_output.put_line(asset_accounts(current_key));
    
            -- get the next key
            current_key := asset_accounts.next(current_key);
        end loop;
    
    -- we can read them backwards too
        dbms_output.put_line('--- by last / prior discovery ---');
        current_key := asset_accounts.last;  -- 'PPE' is alphabetically last
    
        -- we loop until we run out of keys
        while current_key is not null loop
            -- display the element for the current key
            dbms_output.put(current_key || ': ');
            dbms_output.put_line(asset_accounts(current_key));
    
            -- get the previous key
            current_key := asset_accounts.prior(current_key);
        end loop;
    
    end;
    /