With SQL Server 2005 and 2008, you can partition tables across multiple file groups. Partitioning is not a new concept: previous versions supported partitioned views, which could reside on a single server or span multiple servers. However, partitioned views had limitations that prevented their use in many enterprise-level applications. Table partitioning lets you spread a single table across multiple disk arrays. When a query is issued, the database engine automatically determines which partition(s) to read and can even retrieve data from multiple partitions in parallel. You can partition indexes built on partitioned tables, and you can also map all partitions to the same file group.
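To make the moving parts concrete, here is a minimal sketch (all object and file group names are hypothetical): a RANGE RIGHT partition function splits rows by date, a partition scheme maps each resulting range to a file group, and the table is then created on that scheme.

```sql
-- Three boundary values produce four partitions, so the scheme
-- must name four file groups.
CREATE PARTITION FUNCTION pfOrderDate (datetime)
AS RANGE RIGHT FOR VALUES ('2008-01-01', '2008-02-01', '2008-03-01');

CREATE PARTITION SCHEME psOrderDate
AS PARTITION pfOrderDate TO (fg2007, fg200801, fg200802, fg200803);

-- The partitioning column is supplied when the table is placed
-- on the scheme.
CREATE TABLE dbo.orders (
    order_id   int      NOT NULL,
    order_date datetime NOT NULL
) ON psOrderDate (order_date);
```

With RANGE RIGHT, each boundary value belongs to the partition on its right, which is the natural choice for date boundaries like the first of the month.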
Note: Partitioning is only supported in the Enterprise and Developer Editions of SQL Server 2005 and 2008. If you try to restore a partitioned database to an edition that doesn't support partitioning, such as SQL Server 2008 Standard, the restore will fail.
Suppose you have a data warehouse with a fact table called warehouse_transactions. This table grows by 50 million records per month. Your data warehouse is queried directly for reporting purposes and is also used as the data source for Analysis Services cubes. You notice that loading this many records into the fact table takes table-level locks, causing long periods of downtime when it is impossible to query the table. Furthermore, queries that read data from the fact table take longer each month due to the huge volume of data that needs to be examined.
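One common cure for exactly this load-locking problem is a sliding-window load using partition switching. The sketch below assumes hypothetical partition objects (pfTransactions, psTransactions, fg_current) and a staging table with a structure identical to the fact table; the partition number is illustrative.

```sql
-- Tell the scheme which file group the next new partition should use,
-- then split off a partition for the new month.
ALTER PARTITION SCHEME psTransactions NEXT USED fg_current;

ALTER PARTITION FUNCTION pfTransactions()
    SPLIT RANGE ('2008-07-01');

-- Bulk load dbo.warehouse_transactions_staging offline, then switch it
-- in. SWITCH is a metadata-only operation, so readers of the fact table
-- are blocked only momentarily instead of for the whole load.
ALTER TABLE dbo.warehouse_transactions_staging
    SWITCH TO dbo.warehouse_transactions PARTITION 8;  -- illustrative partition number
```

The staging table must live on the same file group as the target partition and have matching columns, indexes, and constraints for the switch to succeed.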
Sounds fantastic, like some kind of SQL Server magic? You probably want to partition a few huge tables right now… but wait a minute, and consider the following first:
“Can partitioning help us?”
Yes, partitioning may be a good answer for future growth and scalability, but it isn't the right answer in every case. Here's why:
Partitioned tables require Enterprise Edition.
Standard Edition can’t do partitioning, and Enterprise costs several times more than Standard; on a 2-CPU box, we’re talking about a $30-$50k difference. If we’re only buying Enterprise for the partitioning, there’s a better return on investment in putting that $30-$50k into faster storage instead. This is especially relevant for servers with direct attached storage, even in a RAID 10 config: don’t even consider partitioning unless the server has at least 10-20 drives dedicated to the data array alone.
Partitioning best practices means using both fast and slow storage.
Consider a data warehouse where a sales table is loaded every night with the previous day’s sales data. The most recent data is under heavy load: it’s constantly being accessed by both the ETL processes and the daily reports, because users really want to know what happened over the last few days to a month. We still have to keep years and years of sales data on hand for curious users who want to check trends, though, and we don’t want to keep all of that data on extremely fast (and extremely expensive) storage. If we partitioned by date range, we could keep the new data on fast storage and shuffle the older data off to slower storage.
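A partition scheme can express this tiering directly. In this sketch (file group names fg_slow and fg_fast are hypothetical), the historical date ranges are mapped to cheap storage while the current range lands on the fast array:

```sql
-- Three boundaries yield four partitions: three historical years plus
-- the current one.
CREATE PARTITION FUNCTION pfSalesDate (datetime)
AS RANGE RIGHT FOR VALUES ('2006-01-01', '2007-01-01', '2008-01-01');

-- Map the old ranges to the slow file group and the newest range to
-- the fast one.
CREATE PARTITION SCHEME psSalesDate
AS PARTITION pfSalesDate
TO (fg_slow, fg_slow, fg_slow, fg_fast);
```

As time passes you would split off new partitions on fg_fast and migrate aging data back to fg_slow, which is part of the ongoing maintenance cost to weigh before adopting this design.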
Partition functions and schemes are easy to design incorrectly.
Before implementing partitioning for the first time, either get personalized advice from a DBA who’s done it several times, or get an identical development system where you can test repeatedly. The luxury of an identical QA environment lets you try different partitioning strategies and replay the end-user load to see which one performs best; it’s a mistake to try it on a production system first. For example, when our DBA first started using partitioning, he thought he knew exactly the right strategy for our data, but he was wrong, really wrong, and it was only by testing in that QA environment that he found the right answer before going into production.
So before you put in the effort to partition your tables, think it through carefully…