Example Email Database Schema

My Simple Example Database Schema

I decided to draw up a simple database schema to start using for examples on the site. In today’s blog entry I will describe the different tables and their relationships, and, in the future, I will post up sql scripts for creating the tables in different sql dialects.

I selected an email database as my example schema because I wanted a schema that most people are familiar with so I can explain database concepts instead of spending too much time explaining the schema itself.

My table names are singular; I like it that way, if you want to comment on that, do it someplace where others can read about the root issue (this blog is about this schema, not about the singular plural debate) like this post.

I’ll post Sql Scripts for generating this database in various DMBS’s here:

The Schema



Id : Integer <<PK>> <<auto incr>>

ParentId : Integer <<FK-Folder>>

DisplayName : String(64) <<not null>>

Description : String

Folders are organized into a hierarchical tree based on the ParentId as the self referencing field. I wish I could make the ParentId field not null constrained, but the root folder has no parent. I might someday blog about ways to approach this problem.

Each email message will be stored in exactly one Folder. That relationship is defined in the Message table.



Id : Integer <<PK>> <<auto incr>>

FriendlyName : String(128)

UserName : String(64) <<not null>>

DomainName : String(128) <<not null>>

DisplayName : String <<read only>>

A row in the Address table represents a unique email address.

The FriendlyName is the actual person’s name or nickname usually found before the email address.

The UserName is the portion of the email address before the @ symbol, should always be stored as lower case.

The DomainName is the portion of the email address found after the @ symbol, should always be stored as lower case.

The DisplayName is a read only field calculated (or concatenated) from the other fields. I figure it will be so common to concatenate the above fields together into a readable email address, I created this field to do it for you.

The table should be constrained to not allow multiple records with exactly the same lowercase(username@domainname).



Id : Integer <<PK>> <<auto incr>>

FolderId : Integer <<FK-Folder>> <<not null>>

SenderId : Integer <<FK-Address>> <<not null>>

SentOn : DateTime <<read only>> <<not null>>

Subject : String(1024) <<not null>>

EmailBody : String <<not null>>

IsUnread : Boolean <<not null>> <<Default: True>>

IsJunk : Boolean <<not null>> <<Default: False>>

IsStar : Boolean <<not null>> <<Default: False>>

Priority : Integer <<Between 1 and 5>> <<Default: 3>>

A row in the Message table represents a single email message.

I think the model here is fairly self explanatory, I will however point out that by defining the FolderId and SenderId here, we limit ourselves so that a Message can only be stored in a single folder at a time, and can only be sent from a single email address. I don’t see either of these as a problem.

If, in the future, we decide we want a message to be stored in multiple folders, we could consider making virtual folders that have links to messages rather than attempt to keep track of a single message actually being in two folders.

Of course, an email header contains the sender along with the time it was sent, as well as other data that could be significant. So, why do we have fields for SenderId, SentOn and Subject? Mostly so we can sort and index based on those fields.



MessageId : Integer <<FK-Message>> <<not null>>

HeaderType : char(4) <<not null>> <<{to, from, cc, bcc, r-to}>>

AddressId : Integer <<FK-Address>> <<not null>>

This table is used to define a many to many relationship between Messages and Address. This sort of table is necessary because a single message could be sent to multiple “to” addresses as well as multiple “cc” addresses. So, each link has a header type so we can know the nature of each relationship between an email and its related addresses.

Even though an email header can contain tons of somewhat useless information, there are a few pieces of the header that are of interest from a data seaching standpoint. I had considered making an associative array style table that could store all the header fields from every message, but decided against it. But, the From, To, CC, BCC fields are of interest and might want to be queried against some day, so I made this table to allow many email addresses to be linked with a single email based on their relationship as defined in the header.



Id : Integer <<PK>> <<auto incr>>

FileName : String(512) <<not null>>

Extension : String(64) <<not null>>

Content : LargeBinary <<not null>>

FileSize : LongInteger <<not null>>

Hash : String(32) <<not null>>

A row in the Attachment table represents a single file attached to an email.

I would have included a foreign key field to link an attachment to the email to which it is attached, but attachments can be rather large, and it is somewhat common for the same attachment to be attached to multiple emails. So I decided to create the Hash and Size fields as a way of detecting whether or not a file is an exact duplicate of an existing file.

The take home message here is that the relationship between Messages and Attachments is many-to-many meaning one message can have multiple attachments, and one attachment can belong to multiple message. In order to keep track of a many to many relationship, we need a separate table where one record represents a single link between the two tables. (See the MessageAttachment table defined below).

The Hash field will be an MD5 checksum generated from the Content of the file.



MessageId : Integer <<FK-Message>> <<not null>>

AttachmentId : Integer <<FK-Attachment>> <<not null>>

This table is a simple many to many relationship table.

Did you enjoy this post? Why not leave a comment below and continue the conversation, or subscribe to my feed and get articles like this delivered automatically to your feed reader.


No comments yet.

Leave a comment