Robert's Blog


Wednesday, January 27, 2010

Some Basic Information About SQL in DB2-Accessing Programs

DB2 has been around for a long time (more than 25 years), and a lot of people who work with DB2 have been doing so for a long time (myself included). Many of the younger folks I meet who are professionally engaged in DB2-related activity are developers. Some of these who came to DB2 after working with other relational database management systems might have been initially confused on hearing their DB2-knowledgeable colleagues talk about SQL as being "embedded" or "static" or "dynamic." Waters may have been further muddied when familiar words such as "package," "collection," and "plan" took on unfamiliar meanings in discussions about DB2-based applications. Throw in a few terms like "DBRM" and "consistency token," and you can really have a DB2 newbie scratching his or her head. Of late, I've seen enough misunderstanding in relation to programming for DB2 data access. My hope is that this post will provide some clarity. Although I am writing from a DB2 for z/OS perspective, the concepts are essentially the same in a DB2 for Linux/UNIX/Windows environment (some of the terminology is a little different).

First up for explanation: embedded SQL. Basically, this refers to SQL statements, included in the body of a program, that are converted into a structure, called a package, that runs in the DB2 database services address space when the program executes. The package is generated through a mechanism, known as the bind process, that operates on a file called a database request module, or DBRM. The DBRM, which contains a program's embedded SQL statements in a bind-ready form, is one of two outputs produced when the program is run through the DB2 precompiler. The other of these outputs is a modified version of the source program, in which the embedded SQL statements have been commented out and to which calls to DB2 have been added -- one call per SQL statement. Each of these DB2 calls contains a statement number, the name of the program's DB2 package, and a timestamp-based identifier called a consistency token. The statement numbers and the consistency token are also included in the aforementioned DBRM, and these serve to tie the program in it's compiled and linked form to the package into which the DBRM is bound: at program execution time, a DB2 call indicates the package to use (the match is on package name and and consistency token value), and identifies the section of the package that corresponds to the SQL statement to be executed.

The above paragraph is kind of a mouthful. Here's the key concept to keep in mind: the package associated with a program containing embedded SQL is generated before the program ever runs. To put it another way, DB2 gets to see the embedded SQL statements and prepare them for execution (by doing things like access path selection) before they are issued by the application program.

A few more items of information related to packages:

  • Packages are persistent -- they are stored in a system table in the DB2 directory database and loaded into memory when needed. Once cached in the DB2 for z/OS environmental descriptor manager pool (aka the EDM pool) a package is likely to stay memory-resident for some time, but if it eventually gets flushed out of the pool (as might happen if it goes for some time without being referenced), it will again be read in from the DB2 directory when needed.
  • Packages are organized into groups called collections.
  • For application processes that are local to a DB2 for z/OS subsystem (i.e., that run in the same z/OS system as the target DB2 data server), packages are executed through plans. So, a batch job running in a JES initiator address space -- or a CICS transaction, or an IMS transaction -- will provide to DB2 the name of a plan, which in turn points to one or more collections that contain the package or packages associated with the embedded SQL statements that the application process will issue. Applications that are remote to the DB2 subsystem and communicate with DB2 through the Distributed Data Facility using the DRDA protocol (Distributed Relational Database Architecture) make use of packages, but they do not refer to DB2 plans.
What about dynamic versus static SQL? Plenty of people who know a lot about DB2 will tell you that "static SQL" means the same thing as "embedded SQL." In my mind, the two terms are almost equivalent (some would say that this "almost" of mine is a matter of splitting hairs). It's true that static SQL is seen by DB2 and prepared for execution before the associated program is executed. That's what I said about embedded SQL, isn't it? Yes, but there is something called embedded dynamic SQL. That would be an SQL statement that is placed in a host variable that is subsequently referenced by a PREPARE statement. The SQL statement string in the host variable is dynamic (that is to say, it will be prepared by DB2 for execution when it is issued by the program), but -- and this is the splitting-hairs part -- PREPARE itself is not a dynamic SQL statement.

Dynamic SQL statements (again, those being statements that are prepared when issued by a program, versus being prepared beforehand through the previously described bind process) can of course be presented to DB2 without the use of PREPARE -- they can, for instance, take the form of ODBC (Open Database Connectivity) or JDBC (Java Database Connectivity) calls. They can also be issued interactively through tools such as SPUFI (part of the TSO/ISPF interface to DB2 for z/OS) and the command line processor (a component of DB2 for Linux/UNIX/Windows and of the DB2 Client).

Some DB2 people hear "dynamic SQL" and think "ad-hoc SQL." In fact, these terms are NOT interchangeable. Ad-hoc SQL is free-form and unpredictable -- it could be generated by someone using a query tool in a data warehouse environment. Ad-hoc SQL will be dynamic (prepared for execution by DB2 when issued), but dynamic SQL certainly doesn't have to be ad-hoc. There are tons of examples of applications -- user-written and vendor-supplied -- that send SQL statements to DB2 in a way that will result in dynamica statement preparation. That doesn't mean that users have any control over the form of statements so issued (users might only be able to provide values that will be substituted for parameter markers in a dynamic SQL statement). "Structured dynamic" is the phrase I use when referring to this type of SQL statement. Just remember: static SQL CANNOT change from execution to execution (aside from changes in the values of host variables). Dynamic SQL CAN change from execution to execution, but it doesn't HAVE to.

I'll close by pointing out that dynamic SQL statements are not, in fact, always prepared by DB2 at the time of their execution. Sometimes, they are prepared before their execution. I'm referring here to DB2's dynamic statement caching capability (active by default in DB2 V8 and V9 systems). When a dynamic SQL statement is prepared for execution, DB2 will keep a copy of the prepared form of the statement in memory. When the same statement is issued again (possibly with different parameter values if the statement was coded with parameter markers), DB2 can use the cached structure associated with the previously prepared instance of the statement, thereby saving the CPU cycles that would otherwise be consumed in re-preparing the statement from scratch. Dynamic statement caching is one of the key factors behind the growing popularity and prevalence of dynamic SQL in mainframe DB2 environments.

I hope that this overview of SQL in DB2-accessing programs will be useful to application developers and others who work with DB2. When all is said and done, the value of a database management system to an organization depends in large part on the value delivered by applications that interact with the DBMS. Code on!

Wednesday, January 13, 2010

DB2 for z/OS Data Sharing: Then and Now (Part 2)

In part 1 of this two-part entry, posted last week, I wrote about some of the more interesting changes that I've seen in DB2 for z/OS data sharing technology over the years (about 15) since it was introduced through DB2 Version 4. More specifically, in that entry I highlighted the tremendous improvement in performance with regard to the servicing of coupling facility requests, and described some of the system software enhancements that have made data sharing a more CPU-efficient solution for organizations looking to maximize the availability and scalability of a mainframe DB2 data-serving system. In this post, I'll cover changes in the way that people configure data sharing groups, and provide a contemporary view of a once-popular -- and perhaps now unnecessary -- application tuning action.

Putting it all together. Of course, before you run a DB2 data sharing group, you have to set one up. Hardware-wise, the biggest change since the early years of data sharing has been the growing use of internal coupling facilities, versus the standalone boxes that were once the only option available. The primary advantage of internal coupling facilities (ICFs), which operate as logical partitions within a System z server, with dedicated processor and memory resources, is economic: they cost less than external, standalone coupling facilities. They also offer something of a performance benefit, as communication between a z/OS system on a mainframe and an ICF on the same mainframe is a memory-to-memory operation, with no requirement for the traversing of a physical coupling facility link.

When internal coupling facilities first came along in the late 1990s, organizations that acquired them tended to use only one in a given parallel sysplex (the mainframe cluster on which a DB2 data sharing group runs) -- the other coupling facility in the sysplex (you always want at least two, so as to avoid having a single point of failure) would be of the external variety. This was so because people wanted to avoid the effect of the so-called "double failure" scenario, in which a mainframe housing both an ICF and a z/OS system participating in the sysplex associated with the ICF goes down. Group buffer pool duplexing, delivered with DB2 Version 6 (with a subsequently retrofit to Version 5), allayed the double-failure concerns of those who would put the group buffer pools (GBPs) in an ICF on a server with a sysplex-associated z/OS system: if that server were to fail, taking the ICF down with it, the surviving DB2 subsystems (running on another server or servers) would simply use what had been the secondary group buffer pools in the other coupling facility as primary GBPs, and the application workload would continue to be processed by those subsystems (in the meantime, any DB2 group members on the failed server would be automatically restarted on other servers in the sysplex). Ah, but what of the lock structure and the shared communications area (SCA), the other coupling facility structures used by the members of a DB2 data sharing group? Companies generally wanted these in an external, standalone CF (or in an ICF on a server that did not also house a z/OS system participating in the sysplex associated with the ICF). Why? Because a double-failure scenario involving these structures would lead to a group-wide failure -- this because successful rebuild of the lock structure or SCA (which would prevent a group failure) requires information from ALL the members of a DB2 data sharing group. If a server has an ICF containing the lock structure and SCA (these are usually placed within the same coupling facility) and also houses a member of the associated DB2 data sharing group, and that server fails, the lock structure and SCA will not be rebuilt (because a DB2 member failed, too), and without those structures, the data sharing group will come down.

Nowadays, parallel sysplexes configured with ICFs and no external CFs are increasingly common. For one thing, the double-failure scenario is less scary to a lot of folks than it used to be, because a) actually losing a System z server is exceedingly unlikely (it's rare enough for a z/OS or a DB2 or an ICF to crash, and rarer still for a mainframe itself to fail), and b) even if a double-failure involving the lock structure and SCA were to occur, causing the DB2 data sharing group to go down, the subsequent group restart process that would restore availability would likely complete within a few minutes (with duplexed group buffer pools, there would be no data sets in group buffer pool recover pending status, and restart goes much more quickly when there's no GRECP). Secondly, organizations that want insurance against even a very rare outage situation that would probably not exceed a small number of minutes in duration can eliminate the possibility of an outage-causing double-failure by implementing system duplexing of the lock structure and the SCA. System duplexing of the lock structure and SCA increases the overhead of running DB2 in data sharing mode (more so than group buffer pool duplexing), and that's why some organizations use it and some don't -- it's a cost versus benefit decision.

Another relatively recent development with regard to data sharing set-up is the availability of 64-bit addressing in Coupling Facility Control Code, beginning with the CFLEVEL 12 release (Coupling Facility Control Code is the operating system of a coupling facility, and I believe that CFLEVEL 16 is the current release). So, how big can an individual coupling facility structure be? About 100 GB (actually, 99,999,999 KB -- the current maximum value that can be specified when defining a structure in the specification of a Coupling Facility Resource Management, or CFRM, policy). For the lock structure and the SCA, this structure size ceiling (obviously well below what's possible with 64-bit addressing) is totally a non-issue, as these structures are usually not larger than 128 MB each. Could someone, someday, want a group buffer pool to be larger than 100 GB? Maybe, but I think that we're a long way from that point, and I expect that the 100 GB limit on the size of a coupling facility structure will be increased well before then.

DEALLOCATE or COMMIT? DB2 data sharing is, for the most part, invisible to application programs, and organizations implementing data sharing groups often find that use of the technology necessitates little, if anything, in the way of application code changes. That said, people have done various things over the years to optimize the CPU efficiency of DB2-accessing programs running in a data sharing environment. For a long time, one of the more popular tuning actions was to bind programs executed via persistent threads (i.e., threads that persist across commits, such as those associated with batch jobs and with CICS-DB2 protected entry threads) with the RELEASE(DEALLOCATE) option. This was done to reduce tablespace lock activity: RELEASE(DEALLOCATE) causes tablespace locks (not page or row locks) acquired by an application process to be retained until thread deallocation, as opposed to being released (and reacquired, if necessary) at commit points. The reduced tablespace lock activity would in turn reduce the type of data sharing global lock contention called XES contention (about which I wrote in last week's part 1 of this two-part entry). There were some costs associated with the use of RELEASE(DEALLOCATE) for packages executed via persistent threads (the size of the DB2 EDM pool often had to be increased, and package rebind procedures sometimes had to be changed or rescheduled to reflect the fact that some packages would remain in an "in use" state for much longer periods of time than before), but these were typically seen as being outweighed by the gain in CPU efficiency related to the aforementioned reduction in the level of XES contention.

All well and good, but then (with DB2 Version 8) along came data sharing Locking Protocol 2 (also described in last week's part 1 post). Locking Protocol 2 drove XES contention way down, essentially eliminating XES contention reduction as a rational for pairing RELEASE(DEALLOCATE) with persistent threads. With this global locking protocol in effect, the RELEASE(DEALLOCATE) versus RELEASE(COMMIT) package bind decision is essentially unrelated to your use of data sharing. There are still benefits associated with the use of RELEASE(DEALLOCATE) for persistent-thread packages (e.g., a slight improvement in CPU efficiency due to reduced tablespace lock and EDM pool resource release and reacquisition, more-effective dynamic prefetch), but take away the data sharing effieicncy gain of old that is now provided by Locking Protocol 2, and you might decide to be more selective in your use of RELEASE(DEALLOCATE). If you implement a DB2 data sharing group, you may just leave your package bind RELEASE specifications as they were in your standalone DB2 environment.

What new advances in data sharing technology are on the way? We'll soon see: IBM is expected to formally announce DB2 Version "X," the successor to Version 9, later this year. I'll be looking for data sharing enhancements among the "What's New?" items delivered with Version X, and I'll likely report in this space on what I find. Stay tuned.

Friday, January 8, 2010

DB2 for z/OS Data Sharing: Then and Now (Part 1)

A couple of months ago, I did some DB2 data sharing planning work for a financial services organization. That engagement gave me an opportunity to reflect on how far the technology has come since it was introduced in the mid-1990s via DB2 for z/OS Version 4 (I was a part of IBM's DB2 National Technical Support team at the time, and I worked with several organizations that were among the very first to implement DB2 data sharing groups on parallel sysplex mainframe clusters). This being the start of a new year, it seems a fitting time to look at where DB2 data sharing is now as compared to where it was about fifteen years ago. I'll do that by way of a 2-part post, focusing here on speed gains and smarter software, and in part 2 (sometime next week) on configuration changes and application tuning considerations.

Speed, and more speed. One of most critical factors with respect to the performance of a data sharing group is the speed with which a request to a coupling facility can be processed. In a parallel sysplex, the coupling facilities provide the shared memory resource in which DB2 structures such as the global lock structure and the group buffer pools are located. Requests to these structures (e.g., the writing of a changed page to a group buffer pool, or the propagation of a global lock request for a data page or a row) have to be processed exceedingly quickly, because 1) the volume of requests can be very high (thousands per second) and 2) most DB2-related coupling facility requests are synchronous, meaning that the mainframe engine that drives such a request will wait, basically doing nothing, until the coupling facility response is received (this is so for performance reasons: the request-driving mainframe processor is like a runner in a relay race, waiting with outstretched hand to take the baton from a teammate and immediately sprint with it towards the finish line). This processor wait time associated with synchronous coupling facility requests, technically referred to as "dwell time," has to be minimized because it is a key determinant of data sharing overhead (that being the difference in the CPU cost of executing an SQL statement in a data sharing environment versus the cost of executing the same statement in a standalone DB2 subsystem).

In the late 1990s, people who looked after DB2 data sharing systems were pretty happy if they saw average service times for synchronous requests to the group buffer pools and lock structure that were under 250 and 150 microseconds, respectively. Nowadays, sites report that their average service times for synchronous group buffer pool and lock structure requests are less than 20 microseconds. This huge improvement is due in large part to two factors. First, coupling facility engines are MUCH faster than they were in the old days. If you know mainframes, you know about this even if you are not familiar with coupling facilities, because coupling facility microprocessors are identical, hardware-wise, to general purpose System z engines -- they just run Coupling Facility Control Code instead of z/OS. Today's z10 microprocessors pack 10 times the compute power delivered by top-of-the-line mainframe engines a decade ago. The second big performance booster with regard to coupling facility synchronous request service times is the major increase in coupling facility link capacity versus what was available in the 1999-2000 time frame. Back then, many of the links in use had an effective data rate of 250 MB per second. Current links can move information at 2 GB (2000 MB) per second.

This big improvement in performance related to the servicing of synchronous coupling facility requests helped to improve throughput in data sharing systems. Did it also reduce the CPU cost of data sharing? Yes, but it's only part of that story. DB2 data sharing is a more CPU-efficient technology now than it was in the 1990s: overhead in an environment characterized by lots of what I call inter-DB2 write/write interest (referring to the updating of individual database objects -- tables and indexes -- by multiple application processes running concurrently on different members of a DB2 data sharing group) was once generally in the 10-20% range. Now the range is more like 8-15%. That improvement wasn't helped along all that much by faster coupling facility engines. Sure, they lowered service times for synchronous coupling facility requests, but the resulting reduction in the aforementioned mainframe processor "dwell time" was offset by the fact that much-faster mainframe engines forgo handling a lot more instructions during a given period of "dwelling" versus their slower predecessors (in other words, as mainframe processors get faster, you have to drive down synchronous request service times just to hold the line on data sharing overhead). Faster coupling facility links helped to reduce overhead, but I think that improvements in DB2 data sharing CPU efficiency have at least as much to do with system software changes as with speedier servicing of coupling facility requests. A tip of the hat, then, to the DB2, z/OS, and Coupling Facility Control Code development teams.

Working smarter, not harder. Over the years, IBM has delivered a number of software enhancements that lowered the CPU cost of DB2 data sharing. Some of these code changes boosted efficiency by reducing the number of coupling facility requests that would be generated in the processing of a given workload. A DB2 Version 5 subsystem, for example (and recall that data sharing was introduced with DB2 Version 4), was able to detect more quickly that a data set that it was updating and which had been open on multiple members of a data sharing group was now physically closed on the other members. As a result, the subsystem could take the data set out of the group buffer pool dependent state sooner, thereby eliminating associated coupling facility group buffer pool requests. DB2 Version 6 introduced the MEMBER CLUSTER option of CREATE TABLESPACE, enabling organizations to reduce coupling facility accesses related to space map page updating for tablespaces with multi-member insert "hot spots" (these can occur when multiple application processes running on different DB2 members are driving concurrent inserts to the same area within a tablespace). DB2 Version 8 took advantage of a Coupling Facility Control Code enhancement, the WARM command (short for Write And Register Multiple -- a means of batching coupling facility requests), to reduce the number of group buffer pool writes necessitated by a page split in a group buffer pool dependent index from five to one.

These code improvements were all welcome, but my all-time favorite efficiency-boosting enhancement is the Locking Protocol 2 feature delivered with DB2 Version 8. Locking Protocol 2 lowered data sharing overhead by virtually eliminating a type of global lock contention known as XES contention. Prior to Version 8, if an application process running on a member of a data sharing group requested an IX or IS lock on a tablespace (indicating an intent to update or read from the tablespace in a non-exclusive manner), that request would be propagated by XES (Cross-System Extended Services, a component of the z/OS operating system) to the coupling facility lock structure as an X or S lock request (exclusive update or exclusive read) on the target object, because those are the only logical lock states known to XES. Thus, if process Q running on DB2 Version 7 member DB2A requests an IX lock on tablespace XYZ, that request will get propagated to the lock structure as an X lock on the tablespace. If process R running on DB2 Version 7 member DB2B subsequently requests an IS lock on the same tablespace, that request will be propagated to the lock structure as an S lock on the resource. If process Q on DB2A still holds its lock on tablespace XYZ, the lock structure will detect the incompatibility of the X and S locks on the tablespace, and DB2B will get a contention-detected response from the coupling facility. The z/OS image under which DB2B is running will contact the z/OS associated with DB2A in an effort to resolve the apparent contention situation. DB2A's z/OS then drives an IRLM exit (IRLM being the lock management subsystem used by DB2), supplying the target resource identifier (for tablespace XYZ) and the actual logical lock requests involved in the XES-perceived conflict (IX and IS). IRLM will indicate to the z/OS system that these logical lock states are in fact compatible, and DB2B's z/OS will be informed that application process R can in fact get it's requested lock on tablespace XYZ. The extra processing required to determine that the conflict perceived by XES is in fact not a conflict adds to the cost of data sharing.

With locking protocol 2, the aforementioned scenario would play out as follows: the request by application process Q on DB2 Version 8 (or 9) member DB2A for an IX lock on tablespace XYZ gets propagated to the lock structure as a request for an S lock on the object. The subsequent request by application process R on DB2 Version 8 (or 9) member DB2B is also propagated as an S lock request, and because S locks on a resource are compatible with each other, no contention is indicated in the response to DB2B's system from the coupling facility, and the lock request is granted. Because IX and IS tablespace lock requests are very common in a DB2 environment, and because IX-IX and IX-IS lock situations involving application access to a given tablespace from different data sharing members are not perceived by XES as being in-conflict when Locking Protocol 2 is in effect, almost all of the XES contention that would be seen in a pre-Version 8 DB2 data sharing group goes away, and data sharing CPU overhead goes down. I say almost all, because the S-IS tablespace lock situation (exclusive read on the one side, and non-exclusive read on the other), which would not result in XES contention with Locking Protocol 1, does cause XES contention with Locking Protocol 2 (the reason being that Locking Protocol 2 causes an S tablespace lock request to be propagated to the lock structure as an X request -- necessary to ensure that an X-IX tablespace lock situation will be correctly perceived by XES as being in-conflict). This is generally not a big deal, because S tablespace lock requests tend to be quite unusual in most DB2 systems.

So, DB2 for z/OS data sharing technology, which was truly ground-breaking when introduced, has not remained static in the years since -- it's gotten better, and it will get better still in years to come. That's good news for data sharing users. If your organization doesn't have a DB2 data sharing group, make a new year's resolution to look into it. In any case, stop by the blog next week for my part 2 entry on the "then" and "now" of data sharing.

Tuesday, December 29, 2009

Clearing the Air Re: Indexes on DB2 for z/OS Partitioned Tables

With the end of the year in sight, it's a good time to tie up loose ends, as we say here in the USA. Thus it is that I've decided to focus, in this last post to my blog in 2009, on indexes as they pertain to DB2 for z/OS partitioned tables. That subject qualifies as a "loose end," because several years after the introduction of table-controlled partitioning with DB2 for z/OS V8, some folks are still not certain as to what can and cannot be done with indexes defined on partitioned tables. I'll try, in this entry, to clear things up.

When a table is partitioned by way of an index specification (the only way to partition a table prior to DB2 V8), index options for the table are pretty straightforward. The index that describes the partitioning scheme (i.e., the one with the PART integer VALUES (constant) clause in the CREATE INDEX statement) is called the partitioning index. No other index on that table is called a partitioning index, and no other index on the table is physically partitioned. Any index on a unique key can be defined as UNIQUE.
Only the partitioning index can be the table's clustering index.

Starting with DB2 V8, table partitioning can be controlled by way of a table's definition (through the PARTITION BY and PARTITION integer ENDING AT (constant) clauses of CREATE TABLE). Table-controlled partitioning (enhanced in DB2 V9 via the partition-by-range universal tablespace) is way better than index-controlled partitioning, but this important DB2 advancement did change -- considerably -- the landscape as far as index options are concerned. First and foremost: for a table-controlled partitioned table, any index on a key that starts with the table's partition-by column or columns is called a partitioning index
(and "starts with" means that the columns of a multi-column partitioning key appear in the order specified in the CREATE TABLE statement); so, if a table's partitioning key is COL_X, COL_Y then an index on COL_X, COL_Y, COL_A is a partitioning index, and so is an index on COL_X, COL_Y, COL_B (but an index on COL_Y, COL_X, COL_D would not be a partitioning index, because the order of the partition-by columns does not match the order specified in the table's definition). Among the implications of this rule: a table-controlled partitioned table may have several partitioning indexes, or it may not have any partitioning indexes. Furthermore, a partitioning index may or may not be physically partitioned. Continuing with the example of the table partitioned on COL_X, COL_Y, an index on COL_X, COL_Y, COL_D that does not have the PARTITIONED clause in its definition is partitioning (because its key begins with the table's partitioning key) but not partitioned (because it was not defined with the PARTITIONED clause). Possible, of course, doesn't necessarily mean advisable -- I don't see why you would have a partitioning index that is not also partitioned.

Next: for a table-controlled partitioned table, a secondary index (i.e., one that is not a partitioning index) can itself be partitioned -- you just have to specify PARTITIONED in the definition of the index (this will cause the index to be physically partitioned along the lines of the underlying table, so that partition 1 of the secondary index will contain the keys for rows in partition 1 of the table). A secondary index that is partitioned is called a data-partitioned secondary index, or DPSI (an index that is not partitioned is called a non-partitioned index, or NPI).

Third: there is a restriction on DPSIs with regard to uniqueness (a restriction that was loosened somewhat with DB2 V9). In a DB2 for z/OS V8 environment, no DPSI can be defined as unique (and remember: a DPSI is a partitioned index that is not a partitioning index -- a partitioning index can be unique). If you try to create a secondary index with both UNIQUE and PARTITIONED in the index definition, you'll get a -628 SQL code (and an accompanying error message indicating that "clauses are mutually exclusive"). In a DB2 V9 environment (and this was recently pointed out by DB2 consultant Peter Backlund in a thread on the DB2-L discussion forum), a DPSI can be defined as UNIQUE if the index key contains the table's partition by column or columns. Once again, consider the table partitioned on COL_X, COL_Y. A DPSI defined on COL_A, COL_Y, COL_B, COL_X could have the UNIQUE attribute because the index key contains all of the table's partition-by columns (and note that the partition-by columns do not have to be in any particular order within the DPSI's key -- they just have to be present within the key). Can there be multiple unique DPSIs defined on a DB2 V9 table-controlled partitioned table? Yes -- again, what's required is that the underlying table's partition-by columns be included in the key of a DPSI that is to be defined as UNIQUE.

Finally: with regard to the CLUSTER attribute, you have flexibility with a table-controlled partitioned table that you don't have with an index-controlled partitioned table. For an index-controlled partitioned table, the partitioning index will be the table's clustering index. For a table-controlled partitioned table, any one index can be the table's clustering index (and of course, a table can only have one index with the CLUSTER attribute). The clustering index could be a partitioning index or a secondary index (whether a DPSI or an NPI). The ability to cluster a table with one key and partition it by another key is, in my opinion, one of the key advantages of table-controlled partitioning over index-controlled partitioning (other pluses include the ability to add partitions to a table-controlled partitioned table, and the ability to rotate partitions in a "first to last" manner).

Is all that clear? I hope so. Table-controlled partitioning is a VERY good thing -- well worth the effort of getting your arms around the new rules regarding indexes on table-controlled partitioned tables.

Throughout my 27 years in IT, I've enjoyed the constancy of opportunities to learn new things. I look forward to more of the same in 2010. Have fun ringing in the new year!

Thursday, December 10, 2009

DB2 for z/OS and the Disk Subsystem

Earlier this week, we in the Atlanta DB2 Users Group were treated to a day with Roger Miller, the ebullient IBMer who, more than anyone else, has been the "face" of the DB2 for z/OS development organization since the product debuted more than 25 years ago (Roger spends a great deal of time in front of DB2 users -- at customer sites, at the briefing center at IBM's Silicon Valley Lab, and at user group meetings). Roger provided us with six hours of DB2 information, laced with a little bit of Shakespeare (appropriate, as Roger's presentation style certainly brings to mind the line, "All the world's a stage," from Shakespeare's As You Like It). Some of the material that I found to be particularly interesting had to do with advances in disk I/O technology over the years, and the effect of those enhancements on DB2-related I/O performance. In this post, I'll share some of that information with you, and I'll include a few of my own observations on DB2 and the disk subsystem, based on my work with DB2 for z/OS-using organizations.

Faster and faster. Roger talked about the major gains seen in mainframe processor performance over the past 10 years (the "clock rate" of the engines in IBM's top-of-the line z10 mainframe server is 4.4 GHz, versus 550 MHz for the G6 line in 1999), and he went on to point out that improvements in disk I/O performance have been just as dramatic. As I noted in a blog entry on DB2 I/O performance that I posted last year, a target of 20-30 milliseconds of wait time per synchronous read (i.e., for an on-demand read of a single 4K DB2 page into memory from the disk subsystem) was pretty much the norm well into the 1990s (this figure is usually obtained from a DB2 monitor accounting report or an online display of DB2 accounting data). Nowadays, 2 milliseconds of wait time per synchronous read is fairly common. There are a number of factors behind this order-of-magnitude improvement in disk I/O performance, none more important than the large increase in disk controller cache memory sizes since the mid-90s (32 MB of cache once seemed like a lot in the old days -- now you can get more than 300 GB of cache on a control unit), and the development of sophisticated algorithms to optimize the effective use of the cache resource (enterprise-class disk controllers run these algorithms on multiple high-performance microprocessors).

The cache impact on disk I/O performance was underscored by some numbers that Roger shared with our group: while the time required to retrieve a DB2 page from spinning disk will generally be between 4 and 8 milliseconds (a big improvement versus 1990s-era disk subsystems), a disk controller cache hit results in a wait time of 230 to 290 microseconds for a synchronous read (the low end of this range is for a system with the z10 High-Performance FICON I/O architecture, also known as zHPF).

For DB2 prefetch reads, the gains are even more impressive. A decade or so ago, reading 64 4K pages into a DB2 buffer pool in 90 milliseconds was thought to be good performance. These days, it's possible to get 64 pages into memory in 1.5 milliseconds -- a 60X improvement over the old standard.

Looking beyond cache, a fairly recent arrival on the enterprise storage scene is solid-state disk technology (SSD). Actually, SSD itself isn't all that new -- devices of this type have been in use for about 30 years. What's new is the cost-competitiveness of enterprise-class SSD systems -- still several times more expensive than a traditional spinning-disk system, on a cost-per-gigabyte basis, but close enough now to warrant serious consideration for certain performance-critical database objects that tend to be accessed in a random fashion (as versus a sequential processing pattern). Random access is the SSD sweet spot because the technology eliminates the seek time that elongates random I/O service times when pages are read from spinning disk; thus, the wait time for a DB2 synchronous read from an SSD devices will likely be about 740-840 microseconds, versus the aforementioned 4-8 milliseconds for a read from spinning disk.

64-bit addressing means that DB2 buffer pools can be way bigger than before, and that's good for performance because no I/O operation -- even one that results in a cache hit -- will come close to the speed with which a page in memory can be accessed. That said, a large-scale DB2 application accessing a multi-terabyte database is likely to drive a lot of I/O activity, even if you have a really big buffer pool configuration. Advanced disk storage and I/O subsystem technology make those page reads much less of a drag on application performance than they otherwise would be.

Disk space utilization: don't overdo it. High-performance storage systems cost plenty of money, and people who purchase the devices don't want to have a lot of unused disk capacity on the floor. Some, however, push the target space-utilization threshold too far. See, a storage system isn't just about application performance -- it's also about application availability. Fill your disk volumes too full, and you're asking for DB2 data set space allocation failures (and associated application outages). How full is too full? I can tell you that in my own experience, DB2 data set space allocation failures tend to be a problem when an organization goes for a disk space utilization rate of 90% or more (and keep in mind that we're not just talking about data set extensions related to data-load operations -- we're also talking about utility-related disk space usage for things like sort work files and online REORG shadow data sets). At the other end of the spectrum, I've seen a situation in which an organization set a 60% threshold for disk volume space utilization. This company's DBAs liked that policy a lot (it really cut down on middle-of-the-night rousting of whoever was on-call), but I can't say that I'd advocate such a target -- it seems too low from a cost-efficiency perspective.

Where do I stand? I'm pretty comfortable with a disk space utilization target in the 70-80% range. I might lean towards the lower end of that range in a DB2 Version 9 environment, as partition-level online REORG jobs will reorganize non-partitioning indexes in their entirety, thereby necessitating more shadow data set space (a potentially compensating factor would be DB2 9 index compression, which can significantly reduce disk space requirements for index data sets). Something else to keep in mind: in addition to avoiding overly-high utilization of disk space, another good practice with regard to minimizing DB2 data set space allocation failures is to let DB2 determine the amount of space that will be requested when a data set has to be extended. This capability, sometimes called sliding-scale space allocation, was introduced with DB2 for z/OS Version 8. I blogged about it a few weeks ago, and I highly recommend its usage. People who have taken advantage of this functionality have expressed great satisfaction with the results: much less DBA time spent in managing DB2 data set allocation processing, and a sharp drop in the occurrence of DB2 space allocation-related failures. Check it out.

Monday, November 30, 2009

DB2: DPSI Do, or DPSI Don't?

One of the more interesting features delivered in recent years for DB2 for z/OS is the data-partitioned secondary index, or DPSI (pronounced "DIP-see"). Introduced with DB2 for z/OS Version 8, DPSIs are defined on table-controlled partitioned tablespaces (i.e., tablespaces for which the partitioning scheme is controlled via a table-level -- as opposed to an index-level -- specification), and are a) non-partitioning (meaning that the leading column or columns of the index key are not the same as the table's partitioning key), and b) physically partitioned along the lines of the table's partitions (meaning that partition 1 of a DPSI on partitioned table XYZ will contain entries for all rows in partition 1 of the underlying table -- and only for those rows).

[For more information about partitioned and partitioning indexes on table-controlled partitioned tablespaces, see my blog entry on the topic, posted a few weeks ago.]

Here's what makes DPSIs really interesting to me:
  • They are a great idea in some situations.
  • They are a bad idea in other situations.
  • You may change the way you think about them in a DB2 9 context, versus a DB2 V8 environment.
DPSI do: a great example of very beneficial DPSI use showed up recently in the form of a question posted to the DB2-L discussion list (a great forum for technical questions related to DB2 for z/OS and DB2 for Linux, UNIX, and Windows -- visit the DB2-L page on the Web to join the list). The DB2 DBA who sent in the question was working on the design of a data purge process for a particular table in a database managed by DB2 for z/OS V8. The tablespace holding the data was partitioned (in the table-controlled way), with rows spread across seven partitions -- one for each day of the week. The sole purge criterion was date-based, and the DBA was thinking of using a partition-level LOAD utility, with REPLACE specified and an empty input data set, to clear out a partition's worth of data (a good thought, as purging via SQL DELETE can be pretty CPU-intensive if there are a lot of rows to be removed from the table at one time). He indicated that several non-partitioning indexes (or NPIs) were defined on the table, and he asked for input from "listers" (i.e., members of the DB2-L community) as to the impact that these NPIs might have on his proposed partition-level purge plan.

Isaac Yassin, a consultant and noted DB2 expert, was quick to respond to the question with a warning that the NPIs would have a very negative impact on the performance of the partition-level LOAD REPLACE operation. Isaac's answer was right on the money. It's true that the concept of a logical partition in an NPI (i.e., the index entries associated with rows located in a partition of the underlying table) makes a partition-level LOAD REPLACE compatible, from a concurrency perspective, with application access to other partitions in the target tablespace (a LOAD REPLACE job operating on partition X of table Y will get a so-called drain lock on logical partition X of each NPI defined on table Y). Still, technically feasible doesn't necessarily mean advisable. For a partitioned tablespace with no NPIs, a LOAD REPLACE with an empty input data set is indeed a very CPU- and time-efficient means of removing data from a partition, because it works at a data set level: the tablespace partition's data set (and the data set of the corresponding physical partition of each partitioned index) is either deleted and redefined (if its is a DB2-managed data set) or reset to empty (if it is a user-managed data set, or if it is DB2-managed and REUSE was specified on the utility control statement), and -- thanks to the empty input data set -- you have a purged partition at very little cost in terms of CPU consumption.

Put NPIs in that picture, and the CPU and elapsed time story changes for the worse. Because the NPIs are not physically partitioned, the index-related action of the partition-level LOAD REPLACE job (the deletion of index entries associated with rows in the target table partition) is a page-level operation. That means lots of GETPAGEs (CPU time!), potentially lots of I/Os (elapsed time!), and lots of index leaf page latch requests (potential contention issue with respect to application processes concurrently accessing other partitions, especially in a data sharing environment). Nick Cianci, an IBMer in Australia, suggested changing the NPIs to DPSIs, and that's what the DBA who started the DB2-L thread ended up doing (as I confirmed later -- he's a friend of mine). With the secondary indexes physically partitioned along the lines of the table partitions, the partition-level LOAD REPLACE with the empty input data set will be what the DBA wanted: a very fast, very efficient means of removing a partition's worth of data from the table without impacting application access to other partitions.

DPSI don't: there's a flip side to the partition-level utility benefits that can be achieved through the use of data partitioned (versus non-partitioned) secondary indexes, and it has to do with query performance. When one or more of a query's predicates match the columns of the key of a non-partitioned index, DB2 can quickly zero in on qualifying rows. If that same index is data-partitioned, and if none of the query's predicates reference the partitioning key of the underlying table, DB2 might not be able to determine that a partition of the table contains any qualifying rows without checking the corresponding partition of the DPSI (in other words, DB2 might not be able to utilize page range screening to limit the number of partitions that have to be searched for qualifying rows). If a table has just a few partitions, this may not be such a big deal. But if the table has hundreds or thousands of partitions, it could be a very big deal (imagine a situation in which DB2 has to search 1000 partitions of a DPSI in order to determine that the rows qualified by the query are located in just a few of the table's partitions -- maybe just one). In that case, degraded query performance might be too high of a price to pay for enhanced partition-level utility operations, and NPIs might be a better choice than DPSIs.

That's exactly what happened at a large regional retailer in the USA. A DBA from that organization was in a DB2 for z/OS database administration class that I taught recently, and he told me that his company, on migrating a while back to DB2 V8, went with DPSIs for one of their partitioned tablespaces in order to avoid the BUILD2 phase of partition-level online REORG. [With NPIs, an online REORG of a partition of a tablespace necessitates BUILD2, during which the row IDs of entries in the logical partition (corresponding to the target tablespace partition) of each NPI are corrected to reflect the new physical location of each row in the partition. For a very large partition containing tens of millions of rows, BUILD2 can take quite some time to complete, and during that time the logical partitions of the NPIs are unavailable to applications. This has the effect of making the corresponding tablespace partition unavailable for insert and delete activity. Updates of NPI index key values are also not possible during BUILD2.]

The DPSIs did indeed eliminate the need for BUILD2 during partition-level online REORGs of the tablespace (this because the physical DPSI partitions are reorganized in the same manner as the target tablespace partition), but they also caused response times for a number of queries that access the table to increase significantly -- this because the queries did not contain predicates that referenced the table's partitioning key, so DB2 could not narrow the search for qualifying rows to just one or a few of a DPSI's partitions. The query performance problems were such that the retailer decided to go back to NPIs on the partitioned table. To avoid the BUILD2-related data availability issue described above, the company REORGs the tablespace in its entirety (versus REORGing a single partition or a subset of partitions). [By the way, DPSI-related query performance problems were not an issue for the DBA who initiated the DB2-L thread referenced in the "DPSI do" part of this blog entry, because 1) the table in question is accessed almost exclusively for inserts, with only the occasional data-retrieval operation; and 2) the few queries that do target the table contain predicates that reference the table's partitioning key, so DB2 can use page-range screening to avoid searching DPSI partitions in which entries for qualifying rows cannot be present.]

DPSIs and DB2 9: the question as to whether or not you should use DPSIs is an interesting one, and it's made more so by a change introduced with DB2 9 for z/OS: the BUILD2 phase of partition-level online REORG has been eliminated (BUILD2 is no longer needed because the DB2 9 REORG TABLESPACE utility will reorganize NPIs in their entirety as part of a partition-level online REORG operation). Since some DB2 V8-using organizations went with DPSIs largely to avoid the data unavailability situation associated with BUILD2, they might opt to redefine DPSIs as NPIs after migrating to DB2 9, if DPSI usage has led to some degradation in query performance (as described above in the "DPSI don't" part of this entry). Of course BUILD2 avoidance (in a DB2 V8 system) is just one justification for DPSI usage. Even with DB2 9, using DPSIs (versus NPIs) is important if you need a partition-level LOAD REPLACE job to operate efficiently (see "DPSI do" above).

Whether you have a DB2 V8 or a DB2 V9 environment, DPSIs may or may not be a good choice for your company (and it may be that DPSIs could be used advantageously for some of your partitioned tables, while NPIs would be more appropriate for other tables). Understand the technology, understand your organization's requirements (and again, these could vary by table), and make the choice that's right in light of your situation.

Tuesday, November 17, 2009

Mainframe DB2 Data Serving: Vision Becomes Reality

One of the things I really like about attending DB2 conferences is the face-to-face time I get with people who otherwise would be on the other side of e-mail exchanges. I get a whole lot more out of in-person communication versus the electronic variety. Case in point: at IBM's recent Information on Demand event in Las Vegas, I ran into a friend who is a DB2 for z/OS database engineering leader at a large financial services firm. He talked up a new mainframe DB2 data serving reference architecture recently implemented for one of his company's mission-critical applications, and did so with an enthusiasm that could not have been fully conveyed through a text message. I got pretty fired up listening to the story this DBA had to tell, in part because of the infectious excitement with which it was recounted, but also because the system described so closely matches a vision of a DB2 for z/OS data-serving architecture that I've had in mind -- and have advocated -- for years. To see that vision validated in the form of a real-world system that is delivering high performance and high availability in a demanding production environment really made my day. I am convinced that what the aforementioned financial services firm (hereinafter referred to as Company XYZ) is doing represents the future of mainframe DB2 as a world-class enterprise data-serving platform. Read on if you want to know more.

Three characteristics of the reference DB2 for z/OS data architecture (so called because it is seen as the go-forward model by the folks at Company XYZ) really stand out in my mind and make it an example to be emulated:
  1. It is built on a DB2 data sharing / parallel sysplex foundation, for maximum availability and scalability (not only that -- these folks have done data sharing Really Right, as I'll explain).
  2. It leverages Big Memory (aka 64-bit addressing) for enhanced performance.
  3. The software stack on the mainframe servers is pretty short -- these are database machines, plain and simple.
A little elaboration now on these three key aspects of Company XYZ's DB2 for z/OS reference architecture:

The robust foundation: a DB2 data sharing group on a parallel sysplex mainframe cluster. It's well known that a standalone System z server running z/OS and DB2 can be counted on to provide very high levels of availability and scalability for a data-serving workload. These core strengths of the mainframe platform are further magnified when concurrent read/write access to the database is shared by multiple DB2 members of a data sharing group, running in the multiple z/OS LPARs (logical partitions) and multiple System z servers of a parallel sysplex. You're not going to beat the uptime delivered by that configuration: formerly planned outages for maintenance purposes are virtually eliminated (service levels of of DB2, z/OS, and other software components can be updated, and server hardware can be upgraded, with no -- I mean zero -- interruption of application access to the database), and the impact of an unplanned failure of a DB2 member or a z/OS LPAR or a server is greatly diminished (only data pages and/or rows that were in the process of being changed by programs running on a failed DB2 subsystem are temporarily unavailable following the failure, and those retained locks will be usually be freed up within a couple of minutes via automatic restart of the failed member). And scalability? Up to 32 DB2 subsystems (which could be running on 32 different mainframe servers) can be configured in one data sharing group.

Now, you can set up a DB2 data sharing group the right way, or the Really Right way. Company XYZ did it Really Right. Here's what I mean:
  • More z/OS LPARs and DB2 members than mainframes in the sysplex. I like having more than one z/OS LPAR (and DB2 subsystem) per mainframe in a parallel sysplex, because 1) you can route work away from one of the LPARs for DB2 or z/OS maintenance purposes and still have access to that server's processing capacity, and 2) more DB2 members means fewer retained locks and quicker restart in the event of a DB2 subsystem failure.
  • Dynamic VIPA network addressing. Availability and operational flexibility are optimized when remote DRDA clients use a dynamic VIPA (virtual IP address) to connect to the DB2 data sharing group (as long as at least one member of the data sharing group is up, a connection request specifying the group's VIPA can be successfully processed). A sysplex software component called the Sysplex Distributor handles load balancing across DB2 members for initial connection requests from remote systems (these will often be application servers), while load balancing for subsequent requests is managed at the DB2 member level.
  • Internal coupling facilities. ICFs (basically, coupling facility control code running in an LPAR on a mainframe server) are less expensive than external coupling facilities, not only with respect to acquisition cost, but also in terms of environmental expenses (floor space, power, cooling). [It's true that if the mainframe containing the ICF holding the lock structure and the shared communications area (SCA) should fail, and if on that mainframe there is also a member of the DB2 data sharing group, the result will be a group-wide outage unless the lock structure and SCA are duplexed in the second ICF. Company XYZ went with system-managed duplexing of the lock structure and SCA (DB2 manages group buffer pool duplexing in a very low-overhead way). Some other organizations using ICFs exclusively (i.e., no external coupling facilities) decide not to pay the overhead of system-managed lock structure and SCA duplexing, on the ground that a) a mainframe server failure is exceedingly unlikely, b) the group-wide outage would only occur if a particular mainframe (the one with the ICF in which the lock structure and SCA are located) were to fail, and c) the group-restart following a group-wide outage should complete within a few minutes. The right way to go regarding the use or non-use of system-managed lock structure and SCA duplexing will vary according to the needs of a given organization.]
Taking advantage of 64-bit addressing. Each of the LPARs in the parallel sysplex on which Company XYZ's model DB2 for z/OS data-serving system is built has more than 20 GB of central storage, and each DB2 subsystem (there is one per LPAR) has a buffer pool configuration that exceeds 10 GB in size. In these days of Big Memory (versus the paltry 2 GB to which we were limited not long ago), I don't think of a production-environment DB2 buffer pool configuration as being large unless the aggregate size of all pools in the subsystem is at least 10 GB. The reduced level of disk I/O activity that generally comes with a large buffer pool configuration can have a significant and positive impact on both the elapsed time and CPU efficiency of data access operations.

Lean, mean, data-serving machines. A production instance of DB2 for Linux, UNIX, and Windows (LUW) usually runs on a machine that is a dedicated data server -- data access code executes there, and that's it. Business-logic programs? They run on application servers. Presentation-logic programs? They might run on yet another tier of servers. The DB2 for LUW server Just Does Data. When I started my IT career in the early 1980s, a mainframe-based application was almost always entirely mainframe-based, by which I mean that all application functionality -- data access logic, business logic, and presentation logic -- was implemented in programs that ran on a mainframe server. Nowadays, I believe that the unmatched availability, scalability, reliability, and security offered by the System z platform is put to most advantageous use in the servicing of data access requests. In other words, I feel that a DB2 for z/OS system should be thought of, in an architectural sense, as a dedicated data server, just as we tend to think of DB2 for LUW systems (and other database management systems that run on Linux, UNIX, and/or Windows platforms) as dedicated data servers.

That's how DB2 for z/OS functions in Company XYZ's reference architecture: it just does data. Consequently, the software stack on the data-serving mainframes is relatively short, consisting of z/OS, DB2, RACF (security management), a data replication tool, some system automation and management tools, some performance monitoring tools, and little else. Transaction management is handled on application servers. Database access requests come in via the DB2 Distributed Data Facility (DDF), and much of the access logic is packaged in stored procedures (the preference at Company XYZ is DB2 9 native SQL procedures, because they perform very well and -- when invoked through calls that come through DDF -- much of their processing can be handled by zIIP engines).

Does this system, which looks so good on paper, deliver the goods? Absolutely. Volume has been taken north of 1400 transactions per second with excellent response time, and my DBA friend is confident that crossing the 2000-trans-per-second threshold won't be a problem. On the availability side, Company XYZ is getting industry-leading uptime. The message: System z is more than just capable of functioning effectively as a dedicated data server -- it works exceptionally well when used in that way. This is a clean, modern architecture that leverages what mainframes do best -- scale, serve, protect, secure -- in a way that addresses a wide range of application design requirements.

Here's a good coda for you: still at IOD in Las Vegas, and shortly after my conversation with the DBA from Company XYZ, I encountered another friend -- a lead DB2 technical professional at another company. He told me about the new DB2 for z/OS reference architecture that had recently been approved by his organization's IT executive management. The pillars of that architecture are a DB2 data sharing / parallel sysplex mainframe cluster, z/OS systems functioning as dedicated data servers, data requests coming in via the DB2 DDF, and data access logic packaged in DB2 9 native SQL procedures. I told this friend that he and his colleagues are definitely on the right track. It's a track that more and more DB2 for z/OS-using companies are traveling, and it could well be the right one for your organization.