DB2 Partition-By-Growth Tables: Very Nice, But Not For Everything
Not long ago, I was doing some work at a site that had DB2 for z/OS V8 running in production (as is true, I think, of most mainframe DB2-using enterprises at present - with a tip of the hat to those organizations that have installed or migrated to V9). A DBA came by with an idea for table partitioning that he wanted to bounce off me. What he wanted to achieve by way of this scheme was minimization of long-term administrative hassle for a large table. His proposal: define the table with an identity column (with the default start-at-1 and increment-by-1 options in effect), and specify that identity column as the partitioning key (this would utilize the table-controlled - as opposed to index-controlled - partitioning capability introduced with DB2 V8). He would decide ahead of time how many rows he wanted in a partition and specify partition limit key values accordingly. He'd set up an initial number of partitions for the table (enough for several weeks' worth of inserts, or maybe a few months), and he would periodically add more partitions (with progressively greater limit key values), via ALTER TABLE ADD PARTITION, to stay well ahead of the advancing insert activity. He even talked of automating this periodic partition-add process. This proposed set-up would allow the table to become very large (resting easy with the knowledge that the table could have up to 4096 partitions, and that the identity column/partitioning key value could go up to one less than 10 to the 31rst power), with populated partitions of about equal size (good), and with the benefit of partition-level utility operations - all in all, about as close to a set-it-and-forget-it approach as you could have for a big table in a DB2 V8 environment.
My initial response to the idea was, "Too bad you guys aren't yet on DB2 Version 9. Partition-by-growth would take care of all that for you." The DBA's eyes widened a little. "Partition by what?" he asked. I went on to explain that he was not alone in wanting a really easy way (administratively speaking) to handle really big tables, and that IBM had responded to that recognized need with a new type of table organization that reduces DBA oversight requirements even more than would his partition-by-identity plan: you just tell DB2 how large you want a table's partitions to be (via the DSSIZE specification for the associated tablespace), and that's that. There's one partition to begin with, and when that one gets full DB2 defines another one and starts using it, and this continues as needed to accommodate INSERT and/or LOAD activity (the MAXPARTITIONS value for the tablespace sets the limit on the number of partitions that can be defined for the table). No muss, no fuss - you get an object that's as easy to manage as a segmented tablespace (with the favorable space management characteristics of segmented, since a partition-by-growth table uses the new universal tablespace technology delivered with DB2 V9), with the advantage of very large size (a segmented tablespace can't grow beyond 64 GB) and partition-level utility operations (except for LOAD, which has to run at the tablespace level for a partition-by-growth table).
So, plenty of DB2 for z/OS DBAs are keen (or will be) to use partition-by-growth (PBG) tables. I don't want to rain on that parade, but you should consider that PBG is not the right solution for every large-table situation. In particular, in a data warehouse environment or otherwise when a large table's rows will be often retrieved in large bunches (versus singly or in small sets, as is often true for online transaction processing systems), a goal of performance optimization will likely lead toward the use of partition-by-range (the other flavor of DB2 V9 universal tablespace) for large tables. Here's why:
My initial response to the idea was, "Too bad you guys aren't yet on DB2 Version 9. Partition-by-growth would take care of all that for you." The DBA's eyes widened a little. "Partition by what?" he asked. I went on to explain that he was not alone in wanting a really easy way (administratively speaking) to handle really big tables, and that IBM had responded to that recognized need with a new type of table organization that reduces DBA oversight requirements even more than would his partition-by-identity plan: you just tell DB2 how large you want a table's partitions to be (via the DSSIZE specification for the associated tablespace), and that's that. There's one partition to begin with, and when that one gets full DB2 defines another one and starts using it, and this continues as needed to accommodate INSERT and/or LOAD activity (the MAXPARTITIONS value for the tablespace sets the limit on the number of partitions that can be defined for the table). No muss, no fuss - you get an object that's as easy to manage as a segmented tablespace (with the favorable space management characteristics of segmented, since a partition-by-growth table uses the new universal tablespace technology delivered with DB2 V9), with the advantage of very large size (a segmented tablespace can't grow beyond 64 GB) and partition-level utility operations (except for LOAD, which has to run at the tablespace level for a partition-by-growth table).
So, plenty of DB2 for z/OS DBAs are keen (or will be) to use partition-by-growth (PBG) tables. I don't want to rain on that parade, but you should consider that PBG is not the right solution for every large-table situation. In particular, in a data warehouse environment or otherwise when a large table's rows will be often retrieved in large bunches (versus singly or in small sets, as is often true for online transaction processing systems), a goal of performance optimization will likely lead toward the use of partition-by-range (the other flavor of DB2 V9 universal tablespace) for large tables. Here's why:
- Partition-by-range (PBR) tables enable multi-dimensional clustering. Yes, I know - multidimensional clustering is, in the strictest sense of the term, a feature of DB2 for Linux, UNIX, and Windows (LUW); however, a similar effect can be achieved with PBR tables (and indeed, with DB2 for z/OS V8 table-controlled partitioning) by partitioning on one key (i.e., one data dimension) and clustering on another. So, for example, a table could be partitioned by date and clustered within partitions by customer ID. This type of arrangement can lead to both query parallelism (a user might want data for a given customer ID over the past 6 months, and if each partition holds data for a particular week, the target rows will be spread over 20-plus partitions) and locality of reference (within partitions, the rows for the specified customer ID will be located near each other). That can be a powerful performance combination, and it's not possible with PBG tables because for such tables there is no notion of a partitioning key.
- PBR tables allow for data rotation. Suppose you want to keep a year's worth of data in a table, and you want to partition that data by week, but you don't want more than 53 partitions (we tend to think of a year as being 52 weeks, but of course fifty-two 7-day periods doesn't quite get you there). With a PBR table (and again, with DB2 V8 table-controlled partitioning), you can use actual date values as partition key limits and keep a rolling 53 weeks of data by taking advantage of ALTER TABLE ROTATE PARTITION functionality. Again, can't do that with PBG.
- PBR tables enable page-range screening. When a query predicate references the limit key of a PBR table, DB2 can avoid accessing partitions that would not contain qualifying rows - a potentially huge performance benefit for SELECTs that target large tables (page-range screening, also known as limited partition scan, has been around for a while, but DB2 for z/OS V9 further leverages this optimization technique by extending its use to table join predicates and to non-matching page-range predicates).
0 Comments:
Post a Comment
<< Home