SQL Server partitioning (III) — T-SQL to setup table partitioning (more complicated example)

To optimize performance of your warehouse, you can create multiple file groups within the database, perhaps one file group for each month. You could do so with statements similar to the following:

ALTER DATABASE Transaction_Warehouse  ADD FILEGROUP FG2005January   
ALTER DATABASE Transaction_Warehouse  ADD FILEGROUP FG2005February

Next add files to each file group with statements similar to this:

ALTER DATABASE Transaction_Warehouse  ADD FILE    (NAME = N'2005January',    FILENAME = N'E:\MSSQL\Data\2005January.ndf',    SIZE = 500MB,    MAXSIZE = 50000MB,    FILEGROWTH = 100MB)  TO FILEGROUP [FG2005January]

Define a partition function to assign each partition a range of date and time values.

CREATE PARTITION FUNCTION Transaction_Warehouse_PFN(datetime)  AS  RANGE LEFT FOR VALUES ( '20050101 12:00:00.000',      '20050228 23:59:59.997',                          '20050331 23:59:59.997',      '20050430 23:59:59.997',      '20050531 23:59:59.997',      '20050630 23:59:59.997',      '20050731 23:59:59.997',      '20050831 23:59:59.997',  '20050930 23:59:59.997',  '20051031 23:59:59.997',  '20051130 23:59:59.997',  '20051231 23:59:59.997')

The partition function will be used by the partition scheme to assign the range of dates to each file group:

CREATE PARTITION SCHEME Transaction_Warehouse_Partition_Scheme  AS  PARTITION Transaction_Warehouse_PFN  TO (  [FG2005January],  [FG2005February],  [FG2005March],  [FG2005April],  [FG2005May],  [FG2005June],  [FG2005July],  [FG2005August],  [FG2005September],  [FG2005October],  [FG2005November],  [FG2005December],   [PRIMARY])

Next create the table against the partition scheme as opposed to placing it on a single file group, as follows:

CREATE TABLE [dbo].[fact_warehouse_transaction](   [transaction_ident] [int] NOT NULL IDENTITY(1, 1),   [natural_key] [decimal](18, 0) NULL,   [dimension1_ident] [int] NULL,   [dimension2_ident] [int] NULL,  --… (many other columns)  [transaction_date_and_time] [datetime] NOT NULL  ) ON  [Transaction_Warehouse_Partition_Scheme] (transaction_date_and_time)

Using this schema, SQL Server will stripe fact_warehouse_transaction table records across twelve files (each associated with a single file group) based on the value of transaction_date_and_time column. This configuration is completely transparent from the SELECT / INSERT/ UPDATE / DELETE statements affecting the table. The statement doesn’t have to specify which partition is to be affected.

You can partition indexes much the same way as you partition tables. For example, you could create the following partitioned index on transaction_date_and_time column:

CREATE INDEX ix_transaction_warehouse_partitioned ON  [dbo].[fact_warehouse_transaction] (transaction_date_and_time)  ON Transaction_Warehouse_Partition_Scheme (transaction_date_and_time)

The advantage of table partitioning is that instead of scanning a gigantic table (or index), each query will affect a considerably smaller subset of the table. Furthermore, SQL Server can actually retrieve data from multiple partitions in parallel if the query needs to return data from more than one partition.

You can exploit partition swapping functionality in order to significantly reduce the downtime associated with loading data into the fact tables. If two tables are built using identical partition schemes, you can swap partitions so that the data in table A will be transferred into table B. In reality there is no data movement, only the metadata is being updated. Because of this, swapping partitions typically takes less than a second.

Let’s suppose you need to load data for December of 2005 into fact_warehouse_transaction. So far the table contains data for all other months of 2005. You can create another table using the same partition scheme, with identical column names using the statement similar to the following:

CREATE TABLE [dbo].[fact_warehouse_transaction_clone](   [transaction_ident] [int] NOT NULL IDENTITY(1, 1),   [natural_key] [decimal](18, 0) NULL,   [dimension1_ident] [int] NULL,   [dimension2_ident] [int] NULL,  --… (many other columns)  [transaction_date_and_time] [datetime] NOT NULL  ) ON  [Transaction_Warehouse_Partition_Scheme] (transaction_date_and_time)

Next you can load December 2005 data into fact_warehouse_transaction_clone. Since this table is simply a temporary working area locking mechanism SQL Server employs to populate this table really doesn’t matter. We’re only concerned with fact_warehouse_transaction availability. Once the clone table is populated you can issue a statement like following:

ALTER TABLE fact_warehouse_transaction_clone  SWITCH PARTITION 12 TO fact_warehouse_transaction PARTITION 12

This statement will take just a few seconds at most and result in having fact_warehouse_transaction being populated with all December 2005 data. At that point you can drop fact_warehouse_transaction_clone table.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s