Tuesday, November 13, 2007

Seven Oracle Database peculiarities for Microsor SQL Server Developer

Few weeks ago I've finally managed to complete the Oracle Database driver for my custom Object Relational Mapping engine (I hope to write a tutorial on writing your own ORM engines soon) and the next release of one of our applications finally supports three database systems: Microsoft SQL Server, Oracle Database and the Microsoft JET Engine. Three different database systems from one executable, switchable in the runtime.

'What's the big deal' - you could ask - 'there are plenty of ORM engines here so why did you waste you time to implement yet another one'?

Well, I hope to write more on it soon, nevertheless the core of the engine dates 2003 when I had no general idea of the ORM pattern and there were definitely no .NET ORMs available.

The engine evolved through years and has been used in few succesive business application widely used. In the same time I've used NHibernate, Willson's ORM and DevExpress XPO for other projects and 've been playing with few more so I think I can clearly see the pros and cons of variety of engines.

But back to the main topic. I suppose that you reading this are the .NET developer who usually works with the Microsoft SQL Server. However your goal, given to you by someone else or by yourself, is to finally get in touch with this mysterious world of Oracle Databases.

Do not worry, this does not hurt. After just few experiments you will notice that the standard System.Data.OracleClient library works correctly in most cases so there are no major technical problems.

What will surely surprise you are the peculiarities or rather fundeamental differences between Oracle and Microsoft databases, which will make you rethink several important solutions you are accustomed to.

Let's then start enumerating these differences:

1. Oracle Database Instance = Microsoft SQL Server Instance

In the world of Oracle the two terms, a server and a database have completely different meaning which is really confusing SQL Server Developers. When you run something you would normally call "a server instance" it is in fact called "the database instance". Inside this "database instance" ("server instance" in SQL Server terminology) you have "schemas" ("databases" in SQL Server terminology). The rest works in the same way - "schemas" ("databases") are colletions of tables.

Yet another confusing issue however, is the equivalence of Oracle terms "schema" and "user". This stems from the fact that each physical user connecting to the Oracle database using his/her own credentials owns his/her set of tables. Everytime you refer to a table in a SQL query you have to provide its full name consisting of the name of table's owner (schema or user whatever you call it) and the table name. You just cannot write:

SELECT * FROM THETABLE

or actually you can but this will always refer to THETABLE in your own schema (your own "database" in SQL Server terminology).

Whenever you plan to use different schemas ("databases" in SQL Server) which is quite common practice, you have to remember that no matter what identity you use to connect to the database, you have to fully qualify the names of objects in different schemas (databases). So you always have:

SELECT * FROM SCHEMANAME.THETABLE

Why this is peculiar? It seems that the typical question:

"How do you allow you application to create different databases and let the user select one database to use in his session"? makes Oracle guys helpless. It seems that the idea of granting users rights to access other databases is rather uncommon. Later I will show you how to grant users access to other schemas (databases) and how to retrieve a list of available schemas (databases).

2. Oracle names cannot exceed 30 characters

This is really annoying one. Having once my SQL Server database with only about 70 tables, I've found about 50 objects (table names, column names, primary and foreign key names) which were longer than 30 characters.

A typical example. How would you call a many-to-many table linking A_THE_TABLE_WITH_NAME and B_THE_ANOTHER_TABLE?

MM_THE_TABLE_WITH_NAME_THE_ANOTHER_TABLE?

Wrong! Oracle will not like you.

The refactorization of your database to meet this peculiar limitation will surely hurt. Be aware that converters (for example the Migration Workbench) usually just trim the names without even making any notes. Later on you'll just find out that the application does not work on the Oracle because you got

MM_THE_TABLE_WITH_NAME_THE_ANO

I hope I've "emulated" the trimming correctly :)

3. Empty literals are stored as NULLs

What would you expect from

INSERT INTO THETABLE (name) VALUES ('')

with [name] as the NOT NULL column?

If you'd expect an empty literal inserted into the table, you are wrong. Oracle magically converts empty literals into NULLs and in the above example you'll just get an error saying that NULLs are not allowed here. There is a long discussion whether this is or is not a bug. Please refer to Google to find out what both sides of the discussion think about it.

I prefer to solve the issue by replacing empty strings with the hard space (\u00A0) but nevertheless, you have to take care of it by yourself.

4. There is no IDENTITY datatype

Accustomed to defining a primary key as the INT IDENTITY (1,1) NOT NULL?

Forget it. There's no auto identity in Oracle. Instead there are sequences. A sequence acts like a table which can be asked for a nextval which makes the sequence increment its current counter. A typical usage is like this:

SELECT SCHEMANAME.SEQNAME.nextval FROM dual;

There are unfortunately different strategies of modelling the SQL Server's IDENTITES in Oracle using sequences. One of the most widely suggested approaches is to define a trigger which modifies the ID field automatically after a record is inserted into the database with just dummy ID value. How are you supposed to get the inserted value in such approach? Well, you can ask for the current value of the sequence but what if someone else managed to retieve few values from the sequence in the meantime?

Frankly, I do not like this approach just because Oracle guys are not able to calm my doubts on the concurrency here.

What I prefer is to:

  • ask the sequence for a next value
  • use this value inside the INSERT query

This approach has its own distadvantage - it requires TWO queries to insert anything into the database. The good thing is that the concurrency issues are gone.

5. Structural queries are not transactional

Application usually live long and many times you will upgrade the database structure.

My typical approach is to store the database version numer in one of tables and prepare set of scripts in the application. When the app starts it retrieves the version number from the database and executes scripts in a sequence, one after the other.

Scripts are usually complicated. They create tables, columns, drop objects and manipulate the data. In SQL Server you are safe just when you put the whole script inside a transaction - no matter what you do, all changes are rollbacked in case of an error.

Oracle does not work like this. If a query modifies the structure of the database, it will not be rolled back. When a script fails, every query that modifies the structure remains commited and every query that manipulates the data is rolled back.

What is the state of the database AFTER such rollback? God only knows.

6. There are no different numerical types

In Oracle Database all numeric types are of type NUMBER(X,Y) where X and Y define the precision. In your client code you'll then see every number as System.Decimal even though you'd expect bytes, shorts or ints like are accustomed to.

and my favourite 7. All names of objects are converted to upper case by default

Be aware that

CREATE TABLE MySchema.TheTable ...

will not create TheTable table but THETABLE table. The same rule applies to all other database objects. How are you supposed to refer to objects that use CamelCasing for example?

Well, just remember to use ' around the names:

CREATE TABLE 'MySchema'.'TheTable'.

Again, Oracle guys seem to be rather surprised that you'd expect the database system to keep your own casing rather than converting everyting to uppercase.

'It makes you less error prone' - the say.

'It confuses me' - I say.

Conclusion

If I am wrong in any of above statements, please feel free to correct me. After a year of experiences I think I slowly start to live with all these issues, however the first contact with the Oracle Database was not easy. I wonder if I would feel the same but doing the opposite ie. moving from Oracle to SQL Server.

2 comments:

Grzegorz Danowski said...

I liked the article because I currently working on my first project using Oracle database and I have similar observations.
Additionally I don't like Oracle stored procedures that retrieve data to .NET application (using Sys_RefCursor objects). In my opinion they are more complicated and harder to test compare to SQL Server procedures.

Unknown said...

I have to agree, although Oracle is a very strong rdbms, all uppercasing and no out-of-the-box 'identities' are a real pain, it feels like the 80s or early 90s again. Back then most systems (dbms/programming languages/OSes) were upper-cased and case-insensitive. I think most people agree that things are 'better' nowadays, so Oracle:
get to work !