CTP for SQL Server 2008 now available
Its final release has been pushed back to as much as six months after Microsoft's big launch party for it, which is still slated for next week. In the meantime, potential customers are being given a taste of some new and game-changing functionality.
Yesterday afternoon, Microsoft posted its Community Technology Preview for SQL Server 2008, which will be general businesses' first, best look at the next edition of what analysts perceive to be Microsoft's fastest growing product line -- faster than even Windows itself.
But one of the key benefits of the new RDBMS' architecture may be something businesses will want to try in a safe environment such as a virtual server, unless they're absolutely certain they're ready to divorce themselves from SQL Server 2005. It's Microsoft's technique for data compression, and although it was introduced in SS 2005 SP2, with SS 2008 it has extended its reach to literally all data types within a table.
Typically, when you think of "data compression," your mind conjures ZIP files or Lempel-Ziv algorithms. That's not what this is about: With an active database, which is almost as fluid a substance as can be stored in memory or on disk, compression is a very tricky thing. On a static basis, compression has been tried at the page level (with groups of records in a table), but the excess overhead for the compression dictionaries and the time spent in maintaining those -- and repairing them -- has often proven not to be worth the bother.
It was IBM that pressed the issue on data compression, having tinkered with a new scheme on DB2 since about 1998 before finally, formally rolling it out in 2002 for DB2 Version 8. It's a row compression concept that doesn't require much excess coding: Essentially, when you create a new SQL table or set up a procedure for altering an existing one, you declare the new table scheme to be compressed using the declaration COMPRESS YES.
What IBM's scheme then does is take certain columns of information within a table -- columns whose data tends to be variable in length anyway -- and replaces it with links to a hidden index table called a compression dictionary. That table is then treated differently, using data compression techniques that would otherwise wreak havoc if applied to the entire data table.
Prior editions of DB2 tried similar approaches to data compression, with some admins reporting the results were quite the opposite. Data dictionary size grew out of proportion, to the extent that it might have been better if "compression" had never been applied.
The popularity of the existence of data compression, in any form, in DB2 got Microsoft to thinking about how to go about the task for itself, in a way that was compatible with its own, very different schematic. Granted, SS 2005 already offered file compression (note the distinction), though it only worked properly for read-only databases. But for SQL Server 2005 SP2, Microsoft began rolling out an alternate take on the concept, in the form of a new variable type for data in records, called vardecimal.
"This storage format can be enabled at a table-level granularity," wrote Microsoft engineers Sunil Agarwal and Hermann Daeubler in 2007. "When enabled, SQL Server stores decimal and numeric data in the variable portion of the row instead [of] the fixed portion. You can use vardecimal storage format to reduce the size of your database if you have tables with decimal and numeric data types. How much space you save depends on the number of decimal or numeric columns, the data distributions, and the size of the table(s)."
Imagine in your mind a column full of people's addresses. In the existing varchar format, the spaces that would fill the remainder of each entry following the end of an address, would be truncated instead of stored as zeroes, which would consume excess space. With vardecimal, the concept is applied to value columns. Although four bytes may be required to represent the highest values of any entry in a column, it doesn't necessarily have to consume four bytes for every entry, especially for some whose contents are short integers like 0 or 15.
While vardecimal was introduced as a "try-it-you'll-like-it" concept in SS 2005 SP2, for SS 2008, it becomes the standard storage format for all values. That fundamentally changes the constitution of the database.
But it does not change the semantics of the database program, which will still look and behave as though fixed storage is being applied. As Agarwal wrote last November, "One important point to notes is that even though the SQL Server stores these data types in variable length format, the semantics of the data type remains unchanged (i.e., it is still the fixed length data type from the perspective of the application). This means that you can avail the benefits for data compression without requiring any changes in your application(s)."
Some other additions testers can expect to find in SS 2008 include integrated full-text search, which utilize high-speed text-searching algorithms (the kind you'd find in Internet search engines) in SQL Server for the first time; and policy-based management, which extends to SS the principle of administration by rule that may become the hallmark of Windows Server 2008.