How to select something “from dual” in a non-Oracle database

What is the dual equivalent in Postgres, MySQL, Sqlite, or MS SQL Server?

I had someone ask me yesterday what Sqlite calls the dual pseudotable (for those who don’t know, dual is a heavily used, Oracle-specific concept).

I stared at him blankly for a minute and thought about how best to answer without laughing. But honestly, it’s a good question.

The answer is simple enough too:

You drop the FROM clause.

For Example

If you want to

select 'hello world'
from dual;

in any SQL dialect other than Oracle, you simply say this:

select 'hello world';
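
As an aside, MySQL will actually humor you if you type the Oracle spelling anyway; as far as I know it treats dual as a dummy table name purely for compatibility:

-- MySQL only: accepted purely for Oracle compatibility
select 'hello world' from dual;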

So, why does Oracle need a from dual in the first place?

I don’t know, but I think it has something to do with a couple of water coolers (this is not a true story as far as I know).

You see, at Oracle, when they were putting together early drafts of the SQL standard, they had their little select clauses, and in the original specification some things were optional (like where clauses and order by clauses), but the from clause was not. I mean, why would you select something from nowhere?

That is where the water cooler comes in. A couple of developer guys down at the water cooler are taking a break because they are each struggling with something or other in the code, and the first guy vents his frustration: “I have to select something, but it isn’t really coming from a specific table. What do I do for a from clause?” The other guy has his own equally perplexing problem, and he vents it too.

Then, they come up with a crazy idea to kill both birds with one stone. A dual solution. They will create a pseudo table named dual and it will fix both of their problems. And off they went.

The other half of the story happens at another water cooler at the development offices of another SQL database engine (not sure who). Two very similar programmers are at the water cooler, and the same question comes up. The other guy just says “Duh, let’s just make the from clause optional!” which they do.

So, the real question is not why other databases don’t have a dual pseudo table. The real question is this: why does Oracle still require from dual when there isn’t a true target table? My guess is that whoever came up with dual was so impressed with themselves that they just can’t let it go, but that’s just a hunch.

Oracle: Insert All / Sequence Headaches (Error: ORA-02287)

I am really quite fond of Oracle’s INSERT ALL statement, despite the occasional critical blog entry.

The flaw I wish to harp on today is the error that pops up when you try to use a sequence in an INSERT ALL statement. I, along with countless other DB developers, use sequences just about every time I add a record to a database. It seems very short-sighted to restrict the INSERT ALL statement so that it can’t handle sequences.

This blog entry will show how I was able to bypass this limitation and make productive use of INSERT ALL.

I intend to show how to overcome this particular error:
ORA-02287: sequence number not allowed here

First, an example of a few statements that will fail, and some comments on why.

Suppose you have one table that has a one-to-many relationship with another table. Oracle allows you to use a single statement to add a row to the parent and also add one or more rows to the child table (or a whole bunch of child tables, really), all in one convenient INSERT ALL statement.
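
All of the snippets below use the same little schema, which is spelled out in full in the example near the end of this post. For reference, the relevant tables look like this:

create table ordered_list
  ( id     number
  , title  varchar2(120)
  );

create table ordered_list_node
  ( list_id        number
  , list_index     number
  , item_id        number
  );

create table color
  ( id number
  , title varchar2(20)
  );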

Unsuccessful Attempts

Here is what I tried that I thought should have worked.

-- this doesn't work, but should
-- ORA-02287: sequence number not allowed here
insert all
  into ordered_list values (list_id, list_title)
  into ordered_list_node values (list_id, 1, red_id)
  into ordered_list_node values (list_id, 2, yellow_id)
  into ordered_list_node values (list_id, 3, blue_id)
select master_sq.NextVal as list_id
  ,    'Primary colors' as list_title
  ,    (select id from color where title = 'Red') as red_id
  ,    (select id from color where title = 'Yellow') as yellow_id
  ,    (select id from color where title = 'Blue') as blue_id
from dual;

The part that doesn’t work is the first line of the SELECT clause, where I pull a value from the sequence named master_sq. It turns out that, even though this statement selects only a single row, Oracle could still get confused and pull multiple values from the sequence. I know it can get confused because of an attempt I made (described below) to cheat and bypass the restriction. Rather than fix that problem, they just made it raise an error.

I feel that raising an error is a real cop-out, because an obvious use of the INSERT ALL statement is to insert a row into a parent table along with its child rows. Why else would you insert into multiple tables at once, except to reuse the primary key of the main table? Why they don’t allow reading from sequences is beyond me, because sequences are where we get our primary key values in the first place. But I digress; let’s solve the problem rather than rant about the need for a solution.

So, the next thing I tried was to pull master_sq.NextVal BEFORE the INSERT ALL statement, and then just use CurrVal inside it. I figured this would work because CurrVal doesn’t run the risk of being called multiple times and doing something unexpected.

Here’s the next piece of SQL I tried that failed, prompting this whole blog entry:

-- this doesn't work, but could if they wanted to allow it
-- ORA-02287: sequence number not allowed here
select master_sq.nextval from dual;

insert all
  into ordered_list values (list_id, list_title)
  into ordered_list_node values (list_id, 1, red_id)
  into ordered_list_node values (list_id, 2, yellow_id)
  into ordered_list_node values (list_id, 3, blue_id)
select master_sq.CurrVal as list_id
  ,    'Primary colors' as list_title
  ,    (select id from color where title = 'Red') as red_id
  ,    (select id from color where title = 'Yellow') as yellow_id
  ,    (select id from color where title = 'Blue') as blue_id
from dual;

I know this COULD work because one of the solutions I came up with uses this method. But, someone at Oracle decided it would be easier to just block all uses of sequences in the select clause of an insert all statement.

This Works, But I Don’t Like It

The next thing I tried was this. It does work, but it defeats the purpose, and it isn’t as elegant as what I ended up with in the end:

insert into ordered_list (title) values ('Primary colors');

insert all
  into ordered_list_node values (list_id, 1, red_id)
  into ordered_list_node values (list_id, 2, yellow_id)
  into ordered_list_node values (list_id, 3, blue_id)
select (select id from ordered_list where title = 'Primary colors') as list_id
  ,    (select id from color where title = 'Red') as red_id
  ,    (select id from color where title = 'Yellow') as yellow_id
  ,    (select id from color where title = 'Blue') as blue_id
from dual;

In this chunk of code, I insert the parent row first, and then I find its id with a subquery when I build the child rows. I think this is a bad solution for two reasons. (1) You have to fetch the record you just inserted using something other than the primary key, which sometimes isn’t going to be unique, and that creates the overhead of searching a table for no good reason. And (2) it isn’t one statement, so there is the overhead of sending and compiling two statements, and receiving two responses from the server.

The Right Way

Here’s the most elegant thing I could come up with. Instead of using the sequence down in the SELECT clause, you put it up in the VALUES clauses. The parent record grabs the nextval and the child records grab the currval. Like so:

insert all
  into ordered_list values (master_sq.nextval, list_title)
  into ordered_list_node values (master_sq.currval, 1, red_id)
  into ordered_list_node values (master_sq.currval, 2, yellow_id)
  into ordered_list_node values (master_sq.currval, 3, blue_id)
select 'Primary colors' as list_title
  ,    (select id from color where title = 'Red') as red_id
  ,    (select id from color where title = 'Yellow') as yellow_id
  ,    (select id from color where title = 'Blue') as blue_id
from dual;

Now, even though this solution uses one statement and doesn’t need PL/SQL, it still has its drawbacks. For instance, if a trigger on my child table pulled from the same sequence in the background, currval would move out from under me and the child rows could end up pointing at the wrong parent.

Also, Oracle just happens to process these inserts sequentially, but SQL is not a sequential language. What if, down the road, they did the inserts in reverse order, or each on its own thread? Who knows how portable this answer is? On the other hand, it seems to work well, and these questions are the reasons we write unit tests and retest things before we move to a new generation of Oracle’s server, right?

I really wish that we could have “selected” the sequence’s nextval down in the SELECT clause into a single-row result set that didn’t keep hitting the sequence for a new value each time it was referenced, because then we wouldn’t have to worry about any of this. But I guess you can’t have your cake and eat it too.

The 1st Runner Up (Previously the Right Way)

This solution is not as good as the “Right way” because it requires a little PL/SQL magic, and it uses two statements, so it has a little more overhead.

-- You'll need to do some things (described below) in order for this to work
select master_sq.nextval from dual;

insert all
  into ordered_list values (list_id, list_title)
  into ordered_list_node values (list_id, 1, red_id)
  into ordered_list_node values (list_id, 2, yellow_id)
  into ordered_list_node values (list_id, 3, blue_id)
select getCurrVal('master_sq') as list_id
  ,    'Primary colors' as list_title
  ,    (select id from color where title = 'Red') as red_id
  ,    (select id from color where title = 'Yellow') as yellow_id
  ,    (select id from color where title = 'Blue') as blue_id
from dual;

As you can see, I wasn’t able to overcome the need to execute a separate SQL statement for updating the sequence so it contains a unique value for us to use. But, I did eliminate the need to search the table in order to find the recently inserted id.

I use the first select statement to “claim” a new ID from the sequence that we will use when we insert the record with its children.

To get it to work, all you need to do is create a PL/SQL function to read the current value of the sequence rather than read it right there in the INSERT ALL statement. Here is the function I used…

-- read the current value of the named sequence via dynamic SQL, so the
-- INSERT ALL statement never has to reference the sequence directly
create or replace function getCurrVal
( seq_name varchar2 )
return number is
  return_value number;
begin
  execute immediate 'select ' || seq_name || '.currval from dual'
    into return_value;
  return return_value;
end;
/

The Wrong Way

Before I came to the above solution, I tried this out. It works in the sense that it compiles, Oracle runs it without complaining, and records get inserted. But it fails in the sense that the parent id gets pulled from the sequence several times, so the child records don’t actually end up referencing the correct parent.

--DON'T do this, it does bad things.
insert all
  into ordered_list values (list_id, list_title)
  into ordered_list_node values (list_id, 1, red_id)
  into ordered_list_node values (list_id, 2, yellow_id)
  into ordered_list_node values (list_id, 3, blue_id)
select getNextVal('master_sq') as list_id
  ,    'Primary colors' as list_title
  ,    (select id from color where title = 'Red') as red_id
  ,    (select id from color where title = 'Yellow') as yellow_id
  ,    (select id from color where title = 'Blue') as blue_id
from dual;
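
In case you are wondering about getNextVal: I never showed it, but it is presumably just the NEXTVAL twin of the getCurrVal function above, something along these lines:

-- hypothetical companion to getCurrVal: pulls the NEXT value of the named sequence
create or replace function getNextVal
( seq_name varchar2 )
return number is
  return_value number;
begin
  execute immediate 'select ' || seq_name || '.nextval from dual'
    into return_value;
  return return_value;
end;
/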

The fact that this happens is really quite disturbing. Not just because it is inconvenient.

Think about this with me. Oracle doesn’t just run the getNextVal PL/SQL function once when it selects all the values that will be spread out across the tables we are inserting into; it actually runs the PL/SQL function once for every row that reads the list_id value as part of its insert operation.

That could have unforeseen consequences if you don’t know about it, and it seems awfully performance-heavy to be switching context between SQL and PL/SQL all the time like that. But, in the end, that is up to them to fix; this solution works, and seems to perform better than the other alternatives.
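
If you want to see the multiple pulls for yourself, one rough check (assuming the example schema below and a getNextVal function like the sketch above) is to watch how far the sequence moves:

-- note where the sequence is before the insert
select master_sq.nextval from dual;

-- run the "wrong way" INSERT ALL above, then look again; currval will have
-- advanced once for every row that read list_id, not just once for the statement
select master_sq.currval from dual;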

Example

Here is a full blown example from start to finish that illustrates using this method.

-- this sequence keeps a unique id across all tables that
-- are (or may some day be) listable so that the records they
-- reference are unique, even across multiple tables.
create sequence master_sq;

-- create tables for organizing listable items into ordered lists
create table ordered_list
  ( id     number
  , title  varchar2(120)
  );

create table ordered_list_node
  ( list_id        number
  , list_index     number
  , item_id        number
  );

-- create an auto-sequencing id for the ordered_list table
create or replace trigger ordered_list_auto_sequence
  before insert on ordered_list
  for each row when (new.id is null)
begin
  select master_sq.nextval into :new.id from dual;
end;
/

--  some fun test data
create table color
  ( id number
  , title varchar2(20)
  );

create or replace trigger color_auto_sequence
  before insert on color
  for each row when (new.id is null)
begin
  select master_sq.nextval into :new.id from dual;
end;
/

-- add colors to our color table, in no particular order
insert all
  into color (title) values ('Red')
  into color (title) values ('Violet')
  into color (title) values ('Green')
  into color (title) values ('Blue')
  into color (title) values ('Indigo')
  into color (title) values ('Orange')
  into color (title) values ('Yellow')
select * from dual;

-- make an ordered_list of the primary colors,
-- in the order people usually talk about them
select master_sq.nextval from dual;

insert all
  into ordered_list values (master_sq.nextval, list_title)
  into ordered_list_node values (master_sq.currval, 1, red_id)
  into ordered_list_node values (master_sq.currval, 2, yellow_id)
  into ordered_list_node values (master_sq.currval, 3, blue_id)
select 'Primary colors' as list_title
  ,    (select id from color where title = 'Red') as red_id
  ,    (select id from color where title = 'Yellow') as yellow_id
  ,    (select id from color where title = 'Blue') as blue_id
from dual;

-- make an ordered_list of colors, in rainbow order
select master_sq.nextval from dual;

insert all
  into ordered_list values (master_sq.nextval, list_title)
  into ordered_list_node values (master_sq.currval, 1, red_id)
  into ordered_list_node values (master_sq.currval, 2, orange_id)
  into ordered_list_node values (master_sq.currval, 3, yellow_id)
  into ordered_list_node values (master_sq.currval, 4, green_id)
  into ordered_list_node values (master_sq.currval, 5, blue_id)
  into ordered_list_node values (master_sq.currval, 6, indigo_id)
  into ordered_list_node values (master_sq.currval, 7, violet_id)
select 'Rainbow colors' as list_title
  ,    (select id from color where title = 'Red') as red_id
  ,    (select id from color where title = 'Orange') as orange_id
  ,    (select id from color where title = 'Yellow') as yellow_id
  ,    (select id from color where title = 'Green') as green_id
  ,    (select id from color where title = 'Blue') as blue_id
  ,    (select id from color where title = 'Indigo') as indigo_id
  ,    (select id from color where title = 'Violet') as violet_id
from dual;

-- Now, let's see the results...
-- pull some data from our test lists
select       clr.title as color_title
from         ordered_list list
  join       ordered_list_node node
    on       node.list_id = list.id
  join       color clr
    on       node.item_id = clr.id
where        list.title = 'Primary colors'
order by     node.list_index;

select       clr.title as color_title
from         ordered_list list
  join       ordered_list_node node
    on       node.list_id = list.id
  join       color clr
    on       node.item_id = clr.id
where        list.title = 'Rainbow colors'
order by     node.list_index;

The output from those last two select statements looks like this…

COLOR_TITLE
--------------------
Red
Yellow
Blue

3 rows selected.

COLOR_TITLE
--------------------
Red
Orange
Yellow
Green
Blue
Indigo
Violet

7 rows selected.

Inserting multiple rows in one SQL statement

Background

So, the other day, I opened up a MySQL dump file just to graze over it and see how they did things (you can call me weird if you want to; I prefer to think of it as “curious”). Anyway, I noticed something very cool that I thought was absolutely ingenious. They were inserting multiple rows in a single SQL statement, sort of like this…

INSERT INTO item (id, title)
    VALUES (1,'item1')
    , (2, 'item2')
    , (3, 'item3')
    ;

I instantly had tons of places (and reasons) I wanted to do this. First, my reasons:

  1. Most of the overhead of a SQL statement (especially an insert) is time spent sending
     the statement to the server. Each statement has overhead, so doing several things in
     a single statement can significantly improve performance. This is generally true even
     if the statement is significantly more complex (so long as it isn’t sort intensive).
  2. Updating indexes is often the slowest part of inserting rows into a table, and inserting
     several at one time only rebuilds the index once (after all the rows are added) rather
     than once per row. This can be a significant performance boost.
  3. If you are making a create-database SQL script, or generating scripts to import data
     from an external source (like an excel file), you can decrease the total size of the
     script by eliminating hundreds (or thousands) of repeated "INSERT INTO item (id, title) …"
     prefixes and tying the rows all together.

So, I started using multi-row insert statements, and I have really loved them. I was using them in MySQL and PostgreSQL, and they worked perfectly, just as described above, so I sort of assumed this was standard SQL. Then I tried it on an Oracle database (11g) and it failed, which is odd because Oracle prides itself on being among the first to implement new SQL standards. It turns out that, standard or not, Oracle simply does not support this form.

But, I decided it was worth knowing how to insert multiple rows in a single statement using each of the SQL dialects that I usually work with.

Of course, my ultimate goal isn’t just to see if I could do it with each engine, but to come up with a database neutral way to do it. I wanted to find the holy grail of SQL developers: the plain vanilla way that works everywhere.

This builds on a previous post where I show how to build a simple table with an auto incrementing primary key in each of several SQL dialects.
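
For the examples below I will assume an item table roughly like the one from that post. A minimal generic version looks something like this (the id column would use whichever auto-increment mechanism your engine prefers, as covered in that earlier post):

-- minimal table for the multi-row insert examples; swap in your engine's
-- auto-increment flavor for the id column (AUTO_INCREMENT, SERIAL, IDENTITY,
-- or an Oracle sequence plus trigger)
CREATE TABLE item
    ( id INTEGER PRIMARY KEY
    , title VARCHAR(50)
    );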

The Vanilla Way

This works in all the engines I evaluated: MySQL, Sqlite, PostgreSQL, and Microsoft SQL Server; with minor tweaking, it also works in Oracle.

INSERT INTO item (title)
    SELECT 'title1'
    UNION ALL SELECT 'title2'
    UNION ALL SELECT 'title3'
    ;

The Better way (that isn’t universally supported)

This works in MySQL 5.1, PostgreSQL 8.3, and Microsoft SQL Server 2008, but DOES NOT WORK in Oracle 11g or Sqlite 3.

INSERT INTO item (title)
    VALUES ('title1')
    , ('title2')
    , ('title3')
    ;

The Oracle Way

Of course, Oracle has its own way of doing things, so here is the preferred way to insert multiple records in Oracle.

INSERT ALL
    INTO item (title) VALUES ('title5')
    INTO item (title) VALUES ('title6')
    INTO item (title) VALUES ('title7')
SELECT * FROM dual;

Comments on the Plain Vanilla Way

Using UNION ALL to build one big table with all the rows you want to INSERT is not elegant. You have to look at it for a while and do a little bit of head scratching before you say “oh,” and then you will probably say “why in the heck did they do that?”

But it works great, and has two of the three benefits from inserting multiple rows at once.

5 points for all database engines that can do it.

Oracle gets only 3 points because it still requires that stupid from dual, effectively ruining the possibility of a truly vanilla way of doing things. That means, if your server is running Oracle, you have to write it like this:

INSERT INTO item (title)
    SELECT 'title1' FROM dual
    UNION ALL SELECT 'title2' FROM dual
    UNION ALL SELECT 'title3' FROM dual
    ;

Comments on the Better Way

The better way is easy to read, easy to write, easy to maintain, and easy to extend with more rows with minimal typing. It also has the added benefit of being faster.

In short, it is absolutely elegant. 5 more points for engines that support it.

Comments on the Oracle Way

The INSERT ALL method probably enjoys some speed optimization over the UNION ALL method, which is probably worth a point (that is why Oracle gets a 4 instead of just 3). I just can’t bring myself to offer more points than that in this case.

INSERT ALL still has the flexibility of allowing (or should I say requiring) a column list to be provided for each row. This added flexibility probably equates to extra processing time before the insertions take place, but I can’t look under the hood to see for myself, so don’t quote me.

This is an example of Oracle taking a cool statement for inserting into multiple tables at the same time and using it to cover inserting multiple rows into the same table too. I agree it should work that way (it makes sense that you might want to insert several rows in a child table as you insert one row in a parent table), but I don’t think that is a good excuse for not supporting the Better way. In fact, I think it is a real cop-out.

Why I used UNION ALL instead of UNION

UNION ALL will offer significantly better performance than UNION, because the UNION command assumes that if there are duplicates they should be removed (mathematically speaking, the union of two sets includes only one copy of the overlapped region.) If you use UNION ALL, it keeps both copies of the overlapped region.

Of course, you shouldn’t have any overlapped region, and you certainly don’t want the db engine to sort and then compare all the records to make sure you don’t have any duplicates before it moves forward. UNION ALL doesn’t do any of that sorting and filtering so it will have significant performance improvements over just using UNION.
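
If you want a quick way to see the difference, try something like this (Oracle spelling shown; drop the from dual everywhere else):

-- UNION sorts, compares, and collapses the duplicate: returns 1 row
select 1 from dual
union
select 1 from dual;

-- UNION ALL keeps every row and skips the duplicate check: returns 2 rows
select 1 from dual
union all
select 1 from dual;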

Points are awarded as follows:

MySQL (v 5.1): 10 out of 10
PostgreSQL (8.3): 10 out of 10
Microsoft SQL Server (2008): 10 out of 10
Sqlite (3): 5 out of 10
Oracle (11g): 4 out of 10

Oracle, catch up already.

Does anyone want to place bets about whether Sqlite will support the better insert statement before Oracle does?

For that matter, does anyone want to place bets about whether Oracle will ever stop requiring the FROM DUAL statement that no one else seems to need?

Maybe I should blog about one or more of the features where Oracle really shines, just so people don’t think I have an anti-Oracle bias.

Comparison of Auto Incrementing Primary Key feature in several SQL Dialects


The Problem


The whole point of using SQL to interact with a database is for developers to be able to use the same basic language to interact with any database. The intentions were good, but, as it turns out, even something as universally desired as a primary key that automatically increments itself is not standardized, and in some cases it isn’t even directly supported.

So, here is how you create an auto-generated primary key in the current version of each of several popular SQL dialects. I will demonstrate creating a table with an auto-incrementing primary key, and how to add rows to it in a plain vanilla way (one that works from any client). Unfortunately, there is no plain vanilla way to create the auto-incrementing primary key, so table creation must be done on a per-database basis.

Jump to Solution

MySQL solution using AUTO_INCREMENT
Sqlite solution using INTEGER PRIMARY KEY
Oracle solution using a PL/SQL Trigger
Microsoft SQL Server solution using IDENTITY
Postgres solution using SERIAL
INSERT INTO statement that works for all these SQL dialects

The CREATE TABLE Statements that work

Presented in order of how many points I award each solution

MySQL 5.1

CREATE TABLE item
    ( id INTEGER PRIMARY KEY AUTO_INCREMENT
    , title VARCHAR(50)
    );

Points: 10 out of 10

Comments on the MySQL Way

It seems to me that adding an AUTO_INCREMENT attribute to a column, just like a NOT NULL constraint or a DEFAULT property, is a very straightforward way of handling the problem. It is extremely obvious what an AUTO_INCREMENT field should do, even without consulting the documentation.

I give them full points because the MySQL way is easy to read, easy to write, easy to remember, and it handles both kinds of the generic insert statements below.



Sqlite3

CREATE TABLE item
    ( id INTEGER PRIMARY KEY
    , title
    );

Points: 9.5 out of 10

Comments on the Sqlite Way

Sqlite generally treats the data types assigned to a column as loose suggestions (it is dynamically typed and uses type affinities rather than strict column types). The INTEGER PRIMARY KEY declaration is one of the few exceptions that actually changes a column’s behavior.

It’s elegant, clean, not quite as obvious, but very well documented (it’s question #1 on their FAQ page).

I ding them half a point because there is no obvious way to have another column auto-increment. I’ve never had a reason to need that, but it isn’t really an option here, and the MySQL solution could handle it if I ended up needing it some day. Furthermore, someone coming from another SQL dialect might not consider it intuitive that an INTEGER PRIMARY KEY column is naturally auto-generating.



Oracle 11g

CREATE SEQUENCE item_seq;

CREATE TABLE item
    ( id NUMBER
    , title VARCHAR2(50)
    );

CREATE TRIGGER item_auto_increment
BEFORE INSERT ON item
FOR EACH ROW
DECLARE
    gap_between NUMBER;
BEGIN
    IF NVL(:new.id, 0) = 0 THEN
        :new.id := item_seq.NEXTVAL;
    ELSE
        -- make sure the sequence isn't going to overlap the key value just provided
        IF :new.id > item_seq.CURRVAL THEN
            gap_between := :new.id - item_seq.CURRVAL;

            -- DDL can't appear directly in PL/SQL, so issue the ALTER SEQUENCE dynamically.
            -- DDL also issues an implicit commit, which a trigger normally can't do, so in
            -- practice this branch would need to run in an autonomous transaction.
            EXECUTE IMMEDIATE 'ALTER SEQUENCE item_seq INCREMENT BY ' || gap_between;

            -- read the sequence so it jumps the gap
            gap_between := item_seq.NEXTVAL; -- should now be :new.id

            -- now set it back to increment 1 at a time
            EXECUTE IMMEDIATE 'ALTER SEQUENCE item_seq INCREMENT BY 1';
        END IF;
    END IF;
END;
/

note: The above trigger is not the standard trigger seen in most Oracle books and online tutorials; it is my own derived version. Here is the standard auto-incrementing trigger you might find elsewhere.

--not my style, but works great
CREATE TRIGGER item_auto_increment
BEFORE INSERT ON item
FOR EACH ROW WHEN (new.id is null)
BEGIN
    select item_seq.nextval into :new.id from dual;
END;
/

Points: 8 out of 10

Comments on the Oracle Way

First off, let me say: gross. In fact, let me say it again: gross. You have to create your own trigger to pull from a sequence. Furthermore, you have to explicitly create the sequence (PostgreSQL was nice enough to do that for you and tell you it did; I’m not sure how MySQL does it, but I doubt it uses a trigger, because it supported AUTO_INCREMENT before it supported triggers).

They SHOULD have let us do something nice like this…

 -- THIS DOESN'T WORK!!
CREATE TABLE item
    ( id NUMBER DEFAULT item_seq.nextval NOT NULL
    , title VARCHAR2(50)
    );

But no, it can’t be that easy. (You’ve got to love Oracle.)
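
(For what it is worth, later Oracle releases finally relented; my understanding is that 12c and up accept both the DEFAULT-from-a-sequence style above and a real identity column, something like the sketch below, though none of that helps you on 11g.)

-- Oracle 12c and later only: an identity column, no trigger or hand-made sequence required
CREATE TABLE item
    ( id NUMBER GENERATED BY DEFAULT AS IDENTITY
    , title VARCHAR2(50)
    );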

I ding 2 points for sheer ugliness, because having to do the whole thing yourself is tedious and error-prone. On the other hand, once it is there, it works; you can do everything needed as part of a table-creation script, and once it is done, developers can use the table just as easily as in the other SQL dialects discussed, so I won’t dock them too much for it.

The fact that you have to build the trigger yourself has one advantage: you can make it behave exactly the same way MySQL behaves, and so it contributes to the “vanilla” factor. Specifically, you can make it handle both NULL and 0 values the same way, and you can make sure that when a value is specified, the sequence gets updated so we don’t try to reuse that value later. Ugly, YES, but it works.

Why I like my trigger more than the industry standard

Upon careful inspection of my trigger, a lot of people would argue that it is long and kind of pointless. The normal way uses the WHEN clause so the trigger only executes when a null id is provided; mine doesn’t, so it is less efficient when an ID is provided. And it has a lot more code, which means the database has to do more thinking each time I insert a record. Seems inefficient.

To them I say: yes, but… I pretty much always let the trigger assign the key, so the extra work on the rare insert that supplies its own ID isn’t a big deal. Furthermore, my trigger protects against the possibility that someone will supply an out-of-sequence ID on a new record that, months down the road, will conflict with the key values generated by the sequence. That would cause a series of strange errors that no one can account for, and that I don’t want to be called about. It’s an ounce of prevention in a critical location, and it’s worth the overhead it might cause in the few cases where people aren’t doing things the normal way.



Microsoft SQL Server 2008

CREATE TABLE item
    ( id INT PRIMARY KEY IDENTITY
    , title VARCHAR(50)
    );

Points: 7 out of 10

Comments on the Microsoft Way

My only real comment here is that IDENTITY is not as obvious a name as AUTO_INCREMENT, but the use is simple enough, looks nice, works as expected, … I like it.

I ding a point for using a word that sounds more synonymous with PRIMARY KEY than with AUTO_INCREMENT; the word IDENTITY has little or nothing to do with picking sequentially growing key values for your primary key.

I ding 2 points for not being able to use the second form of the INSERT command (the one that uses the default signature for the table). This happens because they protect you from inserting anything into this column. I have nothing against protection, and in this case I would even agree that it is important protection. But the end result of this protection is that you can’t use the second form of the insert statement, because you are specifying a column value (NULL) for a read-only field.

Furthermore, oftentimes when you initially populate tables you create a bunch of records with specific primary key values. During that phase of database creation you don’t want the protection, so you could create the column as IDENTITY_MODIFIABLE, but that eliminates the protection once you are done with the initial records and do want it back.
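
(As an aside, SQL Server does have a session-level escape hatch for exactly that initial-population phase; if I remember right it looks like this, and the protection comes right back when you switch it off:)

-- temporarily allow explicit values in the IDENTITY column during initial population
SET IDENTITY_INSERT item ON;
INSERT INTO item (id, title) VALUES (1, 'seed item');
SET IDENTITY_INSERT item OFF;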



PostgreSQL 8.3

CREATE TABLE item
    ( id SERIAL PRIMARY KEY
    , title VARCHAR(50)
    );

Points: 6.5 out of 10

Comments on the Postgres Way

It’s clean, straightforward enough, and makes sense. I like it, but I don’t really love it; here’s why:

I dinged half a point for picking the keyword SERIAL. It is obviously not as good as AUTO_INCREMENT, because in their own documentation they explain it by saying it works like AUTO_INCREMENT in “other” databases. I decided not to ding them too badly for this, because SERIAL is a much better choice of words than IDENTITY; at least serial implies that it will be assigned sequential values. The first time I saw it, it made sense to me.

I ding one point for replacing the data type “INTEGER” with the pseudo data type “SERIAL” rather than using SERIAL as a modifier. I do this because it is good to know what kind of field it is so you can make your foreign keys match. The current implementation requires you to know what is going on behind the scenes in order to use it anyway. Furthermore, instead of adding one modifier, they created two pseudo types (the other is BIGSERIAL).

To be honest, I really like the idea of having the data type be SERIAL instead of INTEGER, but the way they implemented it ruins the reason I like the serial type. If SERIAL were its own data type, then Postgres could move it to a larger int size down the road and handle that for you when you upgrade. They could even substitute in a GUID or some other magic ID. BUT, instead, it makes the column an integer, and you have to know what the data type is in order to reference it. There is no SERIAL_REFERENCE data type. So you hard-code it to be an integer in other tables… so there is no benefit from a somewhat mysterious SERIAL datatype.
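
To make the foreign key point concrete, here is a hypothetical item_note table referencing the item table above; the referencing column has to be declared with the underlying integer type, because there is no SERIAL-style reference type to use:

-- the foreign key column is plain INTEGER even though item.id was declared SERIAL
CREATE TABLE item_note
    ( item_id INTEGER REFERENCES item (id)
    , note    VARCHAR(200)
    );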

I ding two more points because the insert statement using the default table signature fails complaining that you have provided a value for the column, which is not allowed (even though NULL isn’t much of a provided value).



Plain Vanilla Insert Statement (works for all of the above)

INSERT INTO item (title)
    VALUES ('some item title');

Insert Statement that works for MOST of the above

INSERT INTO item
    VALUES (null, 'some item title');

Inserting based on the default signature for the table is a common (albeit a little bit quick and dirty) way to add the initial records to a table in the schema creation SQL script. This sort of insert will work with an auto generated key for some of the databases, but not all of them. I dinged the Microsoft implementation and the PostgreSQL implementation of SQL each two points for not handling it gracefully.

I think that supplying a NULL value for a column is exactly the same thing as not supplying a value for the column, and 3 of the 5 database engines seem to agree with me on that. Microsoft and Postgres need to catch up with the game here.


QuickBooks SDK from Python 2.5

So, for a particular project, I found myself needing to get into a client’s QuickBooks file from a Python script. I had previously done this sort of thing via COM from another language, using the QuickBooks SDK.

Being somewhat new to Python, I had never even tried to use it to interface with Windows COM objects before, but I figured I knew enough about COM to fill in the gaps in whatever documentation I found, so I decided to give it a shot. I downloaded and installed the QBSDK from the Intuit site and got started putting together a script that “dumps” some of the data into export files.

Of course you are going to have to download and install the Python Win32 Extensions. (Yes, that means you can’t do this on Linux, but honestly, you can’t really run QB on Linux either, so I don’t feel too bad.)

Here is what I came up with:

import win32com.client

query_rqs = (
  'Host',
  'Company',
  'Account',
  'Entity',
  'Terms',
  'Class',
  'CustomerType',
  'VendorType',
  'JobType',
  'PaymentMethod',
  'ShipMethod',
  'SalesTaxCode',
  'Item',
  )

# minimal QBXML envelope used to wrap every request below; the qbxml version
# here is an assumption, so adjust it to match whatever your QuickBooks/SDK install supports
qbxml_header = """<?xml version="1.0"?>
<?qbxml version="6.0"?>
<QBXML>
  <QBXMLMsgsRq onError="continueOnError">
"""

qbxml_footer = """
  </QBXMLMsgsRq>
</QBXML>
"""

qb = win32com.client.Dispatch("QBXMLRP.RequestProcessor")
qb.OpenConnection("Data Sucker", "Data Sucker")
ticket = qb.BeginSession("",0)

for rq in query_rqs:
    print "Exporting %s(s)" % rq
    output_file = open("c:\\qb-export-data\\%s.xml" % rq, 'w')
    output_file.write(qb.ProcessRequest(ticket, "%s<%sQueryRq></%sQueryRq>%s" % (qbxml_header, rq, rq, qbxml_footer)))
    output_file.close()
print "Data Export Completed"

qb.EndSession(ticket)
qb.CloseConnection()
qb = None #drop our QB object

The first part, where we define ‘query_rqs’, just makes a convenient list of the object types we are going to query from the QB company file. Requesting data is a uniform enough process that I decided to just loop through these names below.

To understand the part about ‘qbxml_header’ and ‘qbxml_footer’, you simply have to understand that every exchange with the QB SDK is done in XML. This header/footer combination will be used to wrap every request we make, so I put it in variables for easy use a little lower down.

Then, I use the COM API to create a COM object of type “QBXMLRP.RequestProcessor”. The request processor is the primary object in the QBSDK: you use it to connect to QuickBooks, and then to send XML requests and receive XML responses. The QBSDK does not have COM objects for every QB object; instead, it uses XML to pass properties back and forth about the internal QB objects. In this way, they have an API that doesn’t need to change much in order to support different versions and/or editions of their product.

Anyway, the next line opens up a connection and tells QuickBooks that the application called “Data Sucker” wants permission to connect.

If this is the first time that “Data Sucker” has requested access to the company file, QB needs to be open, and it will prompt the user for permission; eventually you will connect (or be denied, depending on your user).

Then, we begin a session. The empty string parameter is the name of the company file we want access to; an empty string means “whatever company file is open right now,” which was good enough for my purposes. I suppose I could have supplied a path to a QB file, but it wasn’t necessary in this case. I also elected to use single-user mode (that is what the 0 in the second parameter means). If I am sucking data out of the company file, I don’t want it to be changing between requests; I want a snapshot that agrees with itself.

Then, I loop through all the requests (rq) that I listed at the beginning of the file. I create an output file with the XML returned by each request (one file per request).

After the loop, I tidy up a bit and look at the generated export files, and can see that life is good.

I have no idea whether or not this would work on a QB install without the QB SDK on the same PC. The Request Processor object might be installed as part of QuickBooks, and the QB SDK might only contain extra resources. If anyone tries this at home, it would be interesting to know if you can run the script without installing the QBSDK.

Also, I know that this only works on the Pro and Premier Editions of QuickBooks. The Basic Edition is intentionally not capable of interfacing with the QBSDK; it’s an upgrade incentive.

As this was also an exercise in COM, I want to point out the last line of the script, where I set qb = None. This is the only way I could think of to be sure I had decremented the object reference count and caused the COM engine to destroy the object. I don’t know if Python has issues with COM object destruction, but I do know that early releases of VB6 had memory leaks if you didn’t explicitly destroy objects (in some scenarios), so I figured better safe than sorry.

-Brent