Robert's Blog


Friday, March 5, 2010

Another Note on DB2 for z/OS Buffer Pool Page-Fixing

In the summer of 2008, I posted a blog entry on page-fixing DB2 buffer pools, a feature introduced with DB2 for z/OS Version 8. A recent discussion I had with a client about buffer pool page-fixing brought to light two aspects of this performance tuning option that, I believe, are overlooked by some DB2 users. In this post I'll describe how you can make a quick initial assessment as to whether or not the memory resource of a mainframe system is sufficient to support buffer pool page-fixing, and I'll follow that with a look at the "bonus" performance impact that can be realized by buffer pool page-fixing in a DB2 data sharing environment.

Gauging the server memory situation. As pointed out in the aforementioned 2008 blog entry on the topic, page-fixing a buffer pool can reduce CPU consumption by eliminating the requests that DB2 would otherwise have to make of z/OS to fix in memory -- and to subsequently release -- a buffer for every read of a page from, or write of a page to, the disk subsystem. These page fix/page release operations are individually inexpensive, but the cumulative CPU cost can be significant when the I/Os associated with a pool number in the hundreds (or thousands) per second. The prospect of removing that portion of a DB2 workload's CPU utilization may have you thinking, "Why not?" Well, there's a reason why PGFIX(NO) is the default setting for a DB2 buffer pool, and it has to do with utilization of a mainframe server's (or z/OS LPAR's) memory resource.

With PGFIX(NO), the real storage page frames occupied by DB2 buffers are candidates for being stolen by z/OS, should the need arise. If something has to be read into memory from disk, and there is no available page frame to accommodate that read-in, z/OS will make one available by moving its contents to a page data set on auxiliary storage (if that relocated page is subsequently referenced by a process, it will be brought back into server memory from auxiliary storage -- this is known as demand paging). z/OS steals page frames according to a least-recently-used algorithm: the longer a page frame goes without being referenced, the closer it moves to the front of the steak queue. If a DB2 buffer goes a long time without being referenced, it could be paged out to auxiliary storage.

So, page-fixing a buffer pool in memory would preclude z/OS from considering the associated real storage page frames as candidates for stealing. The important question, then, is this: would some of those pages be stolen by z/OS if they weren't fixed in memory from the get-go? If so, then page-fixing that pool's buffers might not be such a great idea: in taking away some page frames that z/OS might otherwise steal, buffer pool page fixing could cause page-steal activity to increase for other subsystems and application processes in the z/OS LPAR. Not good.

Fortunately, there's a pretty easy way to get a feel for this: using either your DB2 monitor (an online display or a statistics report) or the output of the DB2 command -DISPLAY BUFFERPOOL DETAIL, look for fields labeled "PAGE-INS REQUIRED FOR READ" and "PAGE-INS REQUIRED FOR WRITE" (or something similar to that). What these fields mean: a page-in is required for a read if DB2 wants to read a page from disk into a particular buffer, and that buffer has been paged out to auxiliary storage (i.e., the page frame occupied by the buffer was stolen by z/OS). Similarly, a page-in is required for a write if DB2 needs to write the contents of a buffer to disk and the buffer is in auxiliary storage.

If, for a pool, the PAGE-INS REQUIRED FOR READ and PAGE-INS REQUIRED FOR WRITE fields both contain zeros, it is likely that the pool, from a memory perspective, is "V=R" anyway (that is to say, the amount of real storage occupied by the pool is probably very close to, if not the same as, its size in terms of virtual storage). In that case, going with PGFIX(YES) should deliver CPU savings without increasing pressure on the server memory resource, since the page frames being stolen are probably not those that are occupied by that pool's buffers. If you want an added measure of assurance on this score, issue a -DISPLAY BUFFERPOOL DETAIL(*) command. The (*) following the DETAIL keyword tells DB2 that you want statistics for the pool since the time it was last allocated. That might have been days, or even weeks, ago (the command output will tell you this), and if you see that the "PAGE-INS REQ" fields in the read and write parts of the command output contain zeros for that long period of time, it's a REALLY good bet that the pool's occupation of real storage won't increase appreciably if you go with PGFIX(YES). For even MORE assurance that the memory resource of the z/OS LPAR in which DB2 is running is not under a lot of pressure, check the "PAGE-INS REQUIRED" numbers for the lower-activity pools (those with fewer GETPAGE requests than others). If even these show zeros, you should be in really good shape, memory-wise.

With all this said, keep a couple of things in mind. First, even though your "PAGE-INS REQUIRED" numbers may give you a high degree of confidence that going to PGFIX(YES) for a buffer pool would be a good idea, make sure to coordinate this action with your z/OS systems programmer. That person has responsibility for seeing that z/OS system resources (such as server memory) are effectively managed and utilized, and you need to make sure that the two of you are on the same page (no pun intended) regarding buffer pool page-fixing. If you've done your homework, and you let the z/OS systems programmer do his (or her) homework (such as looking at z/OS monitor-generated system paging statistics), getting to agreement should not be a problem. Second, be selective in your use of the PGFIX(YES) buffer pool option. The greater the amount of I/O activity for a pool, the greater the benefit of PGFIX(YES). I'd recommend considering page-fixing for pools for which the rate of disk I/O activity is at least in the high double digits (writes plus reads) per second (and be sure to include prefetch reads when calculating the rate of disk I/O operations for a buffer pool). By staying with PGFIX(NO) for your lower-activity pools, you ensure that DB2 will make some buffer pool-associated page frames available to z/OS for page-out, should something cause the LPAR's memory resource to come under significant pressure.

And for you data sharing users... Just a couple of weeks ago, someone told me that he was under the impression that page-fixing buffer pools would have a negative performance impact in a DB2 data sharing environment. NOT SO. Assuming (as mentioned above) that your server memory resource is sufficient to accommodate page-fixing for one or more of your buffer pools, the resulting CPU efficiency benefit should be MORE pronounced for in a data sharing group versus a standalone DB2 system. How so? Simple: the buffer pool page fix/page release activity that occurs for DB2 reads to, and writes from, the disk subsystem with PGFIX(NO) in effect also occurs for writes of pages to, and reads of pages from, coupling facility group buffer pools. Like disk I/Os, page read and write actions involving a group buffer pool can number in the thousands per second. PGFIX(YES) eliminates the overhead of page fix/page release requests for disk I/Os AND for group buffer pool page reads and writes. So, if you're running DB2 in a data sharing configuration, you have another incentive to check out the page-fix option for your high-use buffer pools.

Monday, February 22, 2010

A Couple of Notes on DB2 Group Buffer Pools

I have recently done some work related to DB2 for z/OS data sharing, and that has me wanting to share with you a couple of items of information concerning group buffer pools (coupling facility structures used to cache changed pages of tablespaces and indexes that are targets of inter-DB2 read/write interest). First I'll provide some thoughts on group buffer pool sizing. After that, I'll get into the connection between local buffer pool page-fixing and group buffer pool read and write activity. [Lingo alert: GBP is short for group buffer pool, and "GBP-dependent" basically means that there is inter-DB2 read/write interest in a page set (i.e., a tablespace or or an index or a partition).]

How do you know if bigger is better? A lot of folks know that a group buffer pool should be at least large enough to prevent directory entry reclaims (reclaims are basically "steals" of in-use GBP directory entries to accommodate registration of newly, locally cached pages of GBP-dependent page sets, and you want to avoid them because they result in invalidation of "clean" pages cached in local buffer pools). The key to avoiding directory entry reclaims is to have enough directory entries in a GBP to register all the different pages that could be cached in the GBP and in the associated local buffer pools at any one time (you also want to make sure that there are no GBP write failures due to lack of storage, but there won't be if the GBPs are large enough to prevent directory entry reclaims). For a GBP associated with a 4K buffer pool, and with the default 5:1 ratio of directory entries to data entries, sizing to prevent directory entry reclaims is pretty simple: you add up the size of the local pools and divide that figure by three to get your group buffer pool size; so, if there are two members in a data sharing group, and if BP1 has 6000 buffers on each member, directory entry reclaims will not occur if the size of GBP1 is at least 16,000 KB (the size of BP1 on each of the two DB2 members is 6000 X 4 KB = 24,000 KB, so the GBP1 size should be at least (2 X 24,000 KB) / 3, which is 16,000 KB). Let's say that your GBPs are all large enough to prevent directory entry reclaims (you can check on this via the output of the DB2 command -DISPLAY GROUPBUFFERPOOL GDETAIL). If you have enough memory in your coupling facility LPARs to make them larger still, should you? If you do enlarge them, how do you know if you've done any good?

Start by checking on the success rate for GBP reads caused by buffer invalidations (when a local buffer of DB2 member X holds a table or index page that is changed by a process running on DB2 member Y, the buffer in member X's local pool will be marked invalid and a subsequent request for that page will cause member X to request the current version of the page, first from the GBP and then, in case of a "not found" result, from the disk subsystem). Information about these GBP reads can be found in a DB2 monitor report or online display of GBP information, or in the output of a -DISPLAY GROUPBUFFERPOOL MDETAIL command. In a DB2 monitor report the fields of interest may be labeled as follows (field names can vary slightly from one monitor product to another -- note that "XI" is short for "cross-invalidation," which refers to buffer invalidation operations):

GROUP BP1..........................QUANTITY
---------------------------........--------
SYN.READS(XI)-DATA RETURNED............8000
SYN.READS(XI)-NO DATA RETURN...........2000

In -DISPLAY GROUPBUFFERPOOL MDETAIL output, you'd be looking for this:

DSNB773I - MEMBER DETAIL STATISTICS
.............SYNCHRONOUS READS
...............DUE TO BUFFER INVALIDATION
.................DATA RETURNED..................= 8000
.................DATA NOT RETURNED..............= 2000

The success rate, or "hit rate," for these GBP reads would be:

(reads with data returned) / ((reads with data returned) + (reads with data not returned))

Using the numbers from the example output above, the success rate for GBP reads due to buffer invalidation would be 8000 / (8000 + 2000) = 80%.

Here's why this ratio is useful: buffer invalidations occur when a GBP directory entry pointing to a buffer is reclaimed (not good, as previously mentioned), or when a page cached locally in one DB2 member's buffer pool is changed by a process running on another member of the data sharing group (these invalidations are good, in that they are required for the preservation of data coherency in a data sharing environment). If you don't have any buffer invalidations resulting from directory entry reclaims, invalidations are occurring because of page update activity. Because updated pages of GBP-dependent pages sets are written to the associated GBP as part of commit processing, a DB2 member looking for an updated page in a GBP should have a reasonably good shot at finding it there, if the GBP is large enough to provide a decent page residency time.

So, if you make a GBP bigger and you see that the hit ratio for GBP reads due to invalid buffer has gone up for the member DB2 subsystems, you've probably helped yourself out, performance-wise, because GBP checks for current versions of updated pages are more often resulting in "page found" situations. Getting a page from disk is fast, but getting it from the GBP is 2 orders of magnitude faster (3 orders of magnitude if you have to get the page from spinning disk versus disk controller cache).

By the way, the hit ratio for GBP reads due to "page not in buffer pool" (labeled as such in -DISPLAY GROUPBUFFERPOOL MDETAIL output, and as something like SYN.READS(NF) in a DB2 monitor report or display) is not so useful in terms of gauging the effect of a GBP size increase. These numbers reflect GBP reads that occur when DB2 member is looking in the GBP for a page it needs and which it doesn't have in a local buffer pool. This has to be done prior to requesting the page from disk if the target page set is GBP-dependent, but a GBP "hit" for such a read is, generally speaking, not very likely.

One more thing: if you make a GBP bigger and you are duplexing your GBPs (and I hope that you are), be sure to enlarge the secondary GBP along with the primary GBP. If you aren't duplexing your GBPs (and why is that?), make sure that all your structures can still fit in one CF LPAR (in a two-CF configuration) after the target GBP has been made larger.

Buffer pool page-fixing: good for more than disk I/Os. Buffer pool page-fixing, introduced with DB2 for z/OS V8, is one of my favorite recent DB2 enhancements (I blogged about it in an entry posted in 2008). People tend to think of the performance benefit of buffer pool page-fixing as it relates to disk I/O activity. That benefit is definitely there, but so is the benefit -- and this is what lots of people don't think about -- associated with GBP read and write activity. See, every time DB2 writes a page to a GBP or reads a page from a GBP, the local buffer involved in the operation must be fixed in server memory (aka central storage). If the buffer is in a pool for which PGFIX(YES) has been specified, that's already been done; otherwise, DB2 will have to tell z/OS to fix the buffer in memory during the GBP read or write operation and then release the buffer afterwards. A single "fix" or "un-fix" request is inexpensive, CPU-wise, but there can be hundreds of page reads and writes per second for a GBP, and the cumulative cost of all that buffer fixing and un-fixing can end up being rather significant. So, if you are running DB2 in data sharing mode and you aren't yet taking advantage of buffer pool page-fixing, now you have another reason to give it serious consideration.

Tuesday, February 9, 2010

Good News on the Mainframe DB2 Data Warehousing Front

Last week, I attended a 1-day IBM System z "Technology Summit" education event in Atlanta. It was a multi-track program, and the DB2 for z/OS track ("Track 2") was excellent, in terms of both content and quality of presentation delivery (and it was FREE -- check out the remaining North American cities and dates for this event at http://www-01.ibm.com/software/os/systemz/summit/). The first presentation of the day, delivered by Jim Reed of IBM's Information Management software organization, focused on mainframe market trends and IBM's DB2 for z/OS product strategy. Jim's talk contained several nuggets of information that underscore the solid present and bright future of DB2 for z/OS as a platform for business intelligence applications. In this post, I'll share this information with you, along with some of my own observations on the topic.

BI important? How about most important? Jim started out with a reference to a recent IBM Global CIO survey which asked participants to identify their top priority. You know what's hot in IT circles these days: virtualization, mobility apps, regulatory compliance. So, what came out on top with regard to CIO priorities? Analytics and business intelligence. That's not very surprising, as far as I'm concerned. Having spent years on optimizing efficiency, squeezing costs out of every facet of their operations, organizations are increasingly focused on optimizing performance. Are they offering the right mix of products and services to their customers? Are they selling to the right people? Are they delivering value in a way that separates them, in the eyes of their customers, from their competitors? Data warehouse systems are key drivers of success here, enabling companies to generate actionable intelligence from their data assets (and the breadth of these data assets keeps expanding, including now not just traditional point-of-sale and other business transactions, but e-mails, customer care interactions -- even company- and product-related comments posted on external-to-the-enterprise Web sites).

Big iron has big mo. At the same time that BI is heating up as an area of corporate endeavor, the mainframe -- long seen as a workhorse for run-the-business OLTP and batch workloads -- is growing in popularity as a platform for BI applications. Jim spoke of several factors that are putting wind in System z's sails with regard to data warehousing. He cited a Gartner report that spotlighted key BI issues with which companies are grappling now. This list of front-burner concerns included:
  • High availability. Data warehouses are more likely these days to get a "mission critical" designation. Many (including one I worked on just last month) are customer-facing systems, and a lot of those are the subject of service level agreements.
  • Mixed workload performance. This was identified as the number one performance issue for data warehouses. Mixed BI workloads, in which fast-running, OLTP-like queries vie with complex, data-intensive analytic processes for system resources, are becoming common as so-called "operational BI" gains prominence.
Then, of course, there's the matter of data protection, on which so much else depends. Jim mentioned that 33% of people recently surveyed indicated that they would QUIT doing business with a company if that company experienced a data security or privacy breach and was seen as being responsible for the incident.

So, to address these key issues, you'd probably want to build your data warehouse on a hardware/software platform known for high availability, sophisticated workload management capabilities, and strong, multi-layered data protection and access control. Hmmm. Sounds like mainframe DB2 to me. Keep in mind, too, that the well-known availability and workload management strengths of System z and z/OS and DB2 are made even stronger when DB2 is deployed in data sharing mode on a parallel sysplex mainframe cluster configuration.

Oh, and let's not forget that the legendary reliability of mainframe systems is not just a matter of advanced hardware and software technology (good as that stuff is) -- it also reflects the deep skills and robust processes (around change management, performance monitoring and tuning, capacity planning, business continuity, etc.) that typify the teams of professionals that support organizations' mainframe computing environments. As BI applications continue to move from "nice to have" to "must have" in the eyes of corporate leaders, it stands to reason that IT executives would want to house these essential systems on the server platform that exemplifies "rock solid," and to assign their care to the people in whom they have the utmost trust and confidence -- mainframe people.

One more trend driving BI workloads to System z is the increased frequency with which data warehouse databases are being updated. Not long ago, the "query by day, update at night" model predominated. Now, many BI application users demand that updates of source data values be reflected more quickly in the data warehouse database -- sometimes in a near-real-time manner. A lot of the source data that supplies data warehouses comes from mainframe databases and files, and locating the data warehouse close to that source data can facilitate round-the-clock updating.

Let's make a deal. The technical arguments for building a data warehouse on a mainframe platform are many and strong, but what about the financial angle? IBM has been pretty busy in this area of late. I already knew of DB2 Value Unit Edition pricing, which makes DB2 for z/OS available for a one-time charge for net new workloads of certain types, including data warehousing. I'll admit to not having known about IBM's System z Solution Edition series (announced in August of last year) before Jim talked about them during his presentation. Included in this set of offerings is the System Z Solution Edition for Data Warehousing, a package of hardware, software (including DB2), and services that can help an organization to implement a mainframe-based data warehouse system in a cost-competitive way.

If your organization is serious about data warehousing, get serious about your data warehouse platform. Mainframes deliver the availability, mixed workload performance management, security, and -- yes -- total cost of ownership that can improve your chances of achieving BI success. Analyze that.

Wednesday, January 27, 2010

Some Basic Information About SQL in DB2-Accessing Programs

DB2 has been around for a long time (more than 25 years), and a lot of people who work with DB2 have been doing so for a long time (myself included). Many of the younger folks I meet who are professionally engaged in DB2-related activity are developers. Some of these who came to DB2 after working with other relational database management systems might have been initially confused on hearing their DB2-knowledgeable colleagues talk about SQL as being "embedded" or "static" or "dynamic." Waters may have been further muddied when familiar words such as "package," "collection," and "plan" took on unfamiliar meanings in discussions about DB2-based applications. Throw in a few terms like "DBRM" and "consistency token," and you can really have a DB2 newbie scratching his or her head. Of late, I've seen enough misunderstanding in relation to programming for DB2 data access. My hope is that this post will provide some clarity. Although I am writing from a DB2 for z/OS perspective, the concepts are essentially the same in a DB2 for Linux/UNIX/Windows environment (some of the terminology is a little different).

First up for explanation: embedded SQL. Basically, this refers to SQL statements, included in the body of a program, that are converted into a structure, called a package, that runs in the DB2 database services address space when the program executes. The package is generated through a mechanism, known as the bind process, that operates on a file called a database request module, or DBRM. The DBRM, which contains a program's embedded SQL statements in a bind-ready form, is one of two outputs produced when the program is run through the DB2 precompiler. The other of these outputs is a modified version of the source program, in which the embedded SQL statements have been commented out and to which calls to DB2 have been added -- one call per SQL statement. Each of these DB2 calls contains a statement number, the name of the program's DB2 package, and a timestamp-based identifier called a consistency token. The statement numbers and the consistency token are also included in the aforementioned DBRM, and these serve to tie the program in it's compiled and linked form to the package into which the DBRM is bound: at program execution time, a DB2 call indicates the package to use (the match is on package name and and consistency token value), and identifies the section of the package that corresponds to the SQL statement to be executed.

The above paragraph is kind of a mouthful. Here's the key concept to keep in mind: the package associated with a program containing embedded SQL is generated before the program ever runs. To put it another way, DB2 gets to see the embedded SQL statements and prepare them for execution (by doing things like access path selection) before they are issued by the application program.

A few more items of information related to packages:

  • Packages are persistent -- they are stored in a system table in the DB2 directory database and loaded into memory when needed. Once cached in the DB2 for z/OS environmental descriptor manager pool (aka the EDM pool) a package is likely to stay memory-resident for some time, but if it eventually gets flushed out of the pool (as might happen if it goes for some time without being referenced), it will again be read in from the DB2 directory when needed.
  • Packages are organized into groups called collections.
  • For application processes that are local to a DB2 for z/OS subsystem (i.e., that run in the same z/OS system as the target DB2 data server), packages are executed through plans. So, a batch job running in a JES initiator address space -- or a CICS transaction, or an IMS transaction -- will provide to DB2 the name of a plan, which in turn points to one or more collections that contain the package or packages associated with the embedded SQL statements that the application process will issue. Applications that are remote to the DB2 subsystem and communicate with DB2 through the Distributed Data Facility using the DRDA protocol (Distributed Relational Database Architecture) make use of packages, but they do not refer to DB2 plans.
What about dynamic versus static SQL? Plenty of people who know a lot about DB2 will tell you that "static SQL" means the same thing as "embedded SQL." In my mind, the two terms are almost equivalent (some would say that this "almost" of mine is a matter of splitting hairs). It's true that static SQL is seen by DB2 and prepared for execution before the associated program is executed. That's what I said about embedded SQL, isn't it? Yes, but there is something called embedded dynamic SQL. That would be an SQL statement that is placed in a host variable that is subsequently referenced by a PREPARE statement. The SQL statement string in the host variable is dynamic (that is to say, it will be prepared by DB2 for execution when it is issued by the program), but -- and this is the splitting-hairs part -- PREPARE itself is not a dynamic SQL statement.

Dynamic SQL statements (again, those being statements that are prepared when issued by a program, versus being prepared beforehand through the previously described bind process) can of course be presented to DB2 without the use of PREPARE -- they can, for instance, take the form of ODBC (Open Database Connectivity) or JDBC (Java Database Connectivity) calls. They can also be issued interactively through tools such as SPUFI (part of the TSO/ISPF interface to DB2 for z/OS) and the command line processor (a component of DB2 for Linux/UNIX/Windows and of the DB2 Client).

Some DB2 people hear "dynamic SQL" and think "ad-hoc SQL." In fact, these terms are NOT interchangeable. Ad-hoc SQL is free-form and unpredictable -- it could be generated by someone using a query tool in a data warehouse environment. Ad-hoc SQL will be dynamic (prepared for execution by DB2 when issued), but dynamic SQL certainly doesn't have to be ad-hoc. There are tons of examples of applications -- user-written and vendor-supplied -- that send SQL statements to DB2 in a way that will result in dynamica statement preparation. That doesn't mean that users have any control over the form of statements so issued (users might only be able to provide values that will be substituted for parameter markers in a dynamic SQL statement). "Structured dynamic" is the phrase I use when referring to this type of SQL statement. Just remember: static SQL CANNOT change from execution to execution (aside from changes in the values of host variables). Dynamic SQL CAN change from execution to execution, but it doesn't HAVE to.

I'll close by pointing out that dynamic SQL statements are not, in fact, always prepared by DB2 at the time of their execution. Sometimes, they are prepared before their execution. I'm referring here to DB2's dynamic statement caching capability (active by default in DB2 V8 and V9 systems). When a dynamic SQL statement is prepared for execution, DB2 will keep a copy of the prepared form of the statement in memory. When the same statement is issued again (possibly with different parameter values if the statement was coded with parameter markers), DB2 can use the cached structure associated with the previously prepared instance of the statement, thereby saving the CPU cycles that would otherwise be consumed in re-preparing the statement from scratch. Dynamic statement caching is one of the key factors behind the growing popularity and prevalence of dynamic SQL in mainframe DB2 environments.

I hope that this overview of SQL in DB2-accessing programs will be useful to application developers and others who work with DB2. When all is said and done, the value of a database management system to an organization depends in large part on the value delivered by applications that interact with the DBMS. Code on!

Wednesday, January 13, 2010

DB2 for z/OS Data Sharing: Then and Now (Part 2)

In part 1 of this two-part entry, posted last week, I wrote about some of the more interesting changes that I've seen in DB2 for z/OS data sharing technology over the years (about 15) since it was introduced through DB2 Version 4. More specifically, in that entry I highlighted the tremendous improvement in performance with regard to the servicing of coupling facility requests, and described some of the system software enhancements that have made data sharing a more CPU-efficient solution for organizations looking to maximize the availability and scalability of a mainframe DB2 data-serving system. In this post, I'll cover changes in the way that people configure data sharing groups, and provide a contemporary view of a once-popular -- and perhaps now unnecessary -- application tuning action.

Putting it all together. Of course, before you run a DB2 data sharing group, you have to set one up. Hardware-wise, the biggest change since the early years of data sharing has been the growing use of internal coupling facilities, versus the standalone boxes that were once the only option available. The primary advantage of internal coupling facilities (ICFs), which operate as logical partitions within a System z server, with dedicated processor and memory resources, is economic: they cost less than external, standalone coupling facilities. They also offer something of a performance benefit, as communication between a z/OS system on a mainframe and an ICF on the same mainframe is a memory-to-memory operation, with no requirement for the traversing of a physical coupling facility link.

When internal coupling facilities first came along in the late 1990s, organizations that acquired them tended to use only one in a given parallel sysplex (the mainframe cluster on which a DB2 data sharing group runs) -- the other coupling facility in the sysplex (you always want at least two, so as to avoid having a single point of failure) would be of the external variety. This was so because people wanted to avoid the effect of the so-called "double failure" scenario, in which a mainframe housing both an ICF and a z/OS system participating in the sysplex associated with the ICF goes down. Group buffer pool duplexing, delivered with DB2 Version 6 (with a subsequently retrofit to Version 5), allayed the double-failure concerns of those who would put the group buffer pools (GBPs) in an ICF on a server with a sysplex-associated z/OS system: if that server were to fail, taking the ICF down with it, the surviving DB2 subsystems (running on another server or servers) would simply use what had been the secondary group buffer pools in the other coupling facility as primary GBPs, and the application workload would continue to be processed by those subsystems (in the meantime, any DB2 group members on the failed server would be automatically restarted on other servers in the sysplex). Ah, but what of the lock structure and the shared communications area (SCA), the other coupling facility structures used by the members of a DB2 data sharing group? Companies generally wanted these in an external, standalone CF (or in an ICF on a server that did not also house a z/OS system participating in the sysplex associated with the ICF). Why? Because a double-failure scenario involving these structures would lead to a group-wide failure -- this because successful rebuild of the lock structure or SCA (which would prevent a group failure) requires information from ALL the members of a DB2 data sharing group. If a server has an ICF containing the lock structure and SCA (these are usually placed within the same coupling facility) and also houses a member of the associated DB2 data sharing group, and that server fails, the lock structure and SCA will not be rebuilt (because a DB2 member failed, too), and without those structures, the data sharing group will come down.

Nowadays, parallel sysplexes configured with ICFs and no external CFs are increasingly common. For one thing, the double-failure scenario is less scary to a lot of folks than it used to be, because a) actually losing a System z server is exceedingly unlikely (it's rare enough for a z/OS or a DB2 or an ICF to crash, and rarer still for a mainframe itself to fail), and b) even if a double-failure involving the lock structure and SCA were to occur, causing the DB2 data sharing group to go down, the subsequent group restart process that would restore availability would likely complete within a few minutes (with duplexed group buffer pools, there would be no data sets in group buffer pool recover pending status, and restart goes much more quickly when there's no GRECP). Secondly, organizations that want insurance against even a very rare outage situation that would probably not exceed a small number of minutes in duration can eliminate the possibility of an outage-causing double-failure by implementing system duplexing of the lock structure and the SCA. System duplexing of the lock structure and SCA increases the overhead of running DB2 in data sharing mode (more so than group buffer pool duplexing), and that's why some organizations use it and some don't -- it's a cost versus benefit decision.

Another relatively recent development with regard to data sharing set-up is the availability of 64-bit addressing in Coupling Facility Control Code, beginning with the CFLEVEL 12 release (Coupling Facility Control Code is the operating system of a coupling facility, and I believe that CFLEVEL 16 is the current release). So, how big can an individual coupling facility structure be? About 100 GB (actually, 99,999,999 KB -- the current maximum value that can be specified when defining a structure in the specification of a Coupling Facility Resource Management, or CFRM, policy). For the lock structure and the SCA, this structure size ceiling (obviously well below what's possible with 64-bit addressing) is totally a non-issue, as these structures are usually not larger than 128 MB each. Could someone, someday, want a group buffer pool to be larger than 100 GB? Maybe, but I think that we're a long way from that point, and I expect that the 100 GB limit on the size of a coupling facility structure will be increased well before then.

DEALLOCATE or COMMIT? DB2 data sharing is, for the most part, invisible to application programs, and organizations implementing data sharing groups often find that use of the technology necessitates little, if anything, in the way of application code changes. That said, people have done various things over the years to optimize the CPU efficiency of DB2-accessing programs running in a data sharing environment. For a long time, one of the more popular tuning actions was to bind programs executed via persistent threads (i.e., threads that persist across commits, such as those associated with batch jobs and with CICS-DB2 protected entry threads) with the RELEASE(DEALLOCATE) option. This was done to reduce tablespace lock activity: RELEASE(DEALLOCATE) causes tablespace locks (not page or row locks) acquired by an application process to be retained until thread deallocation, as opposed to being released (and reacquired, if necessary) at commit points. The reduced tablespace lock activity would in turn reduce the type of data sharing global lock contention called XES contention (about which I wrote in last week's part 1 of this two-part entry). There were some costs associated with the use of RELEASE(DEALLOCATE) for packages executed via persistent threads (the size of the DB2 EDM pool often had to be increased, and package rebind procedures sometimes had to be changed or rescheduled to reflect the fact that some packages would remain in an "in use" state for much longer periods of time than before), but these were typically seen as being outweighed by the gain in CPU efficiency related to the aforementioned reduction in the level of XES contention.

All well and good, but then (with DB2 Version 8) along came data sharing Locking Protocol 2 (also described in last week's part 1 post). Locking Protocol 2 drove XES contention way down, essentially eliminating XES contention reduction as a rational for pairing RELEASE(DEALLOCATE) with persistent threads. With this global locking protocol in effect, the RELEASE(DEALLOCATE) versus RELEASE(COMMIT) package bind decision is essentially unrelated to your use of data sharing. There are still benefits associated with the use of RELEASE(DEALLOCATE) for persistent-thread packages (e.g., a slight improvement in CPU efficiency due to reduced tablespace lock and EDM pool resource release and reacquisition, more-effective dynamic prefetch), but take away the data sharing effieicncy gain of old that is now provided by Locking Protocol 2, and you might decide to be more selective in your use of RELEASE(DEALLOCATE). If you implement a DB2 data sharing group, you may just leave your package bind RELEASE specifications as they were in your standalone DB2 environment.

What new advances in data sharing technology are on the way? We'll soon see: IBM is expected to formally announce DB2 Version "X," the successor to Version 9, later this year. I'll be looking for data sharing enhancements among the "What's New?" items delivered with Version X, and I'll likely report in this space on what I find. Stay tuned.

Friday, January 8, 2010

DB2 for z/OS Data Sharing: Then and Now (Part 1)

A couple of months ago, I did some DB2 data sharing planning work for a financial services organization. That engagement gave me an opportunity to reflect on how far the technology has come since it was introduced in the mid-1990s via DB2 for z/OS Version 4 (I was a part of IBM's DB2 National Technical Support team at the time, and I worked with several organizations that were among the very first to implement DB2 data sharing groups on parallel sysplex mainframe clusters). This being the start of a new year, it seems a fitting time to look at where DB2 data sharing is now as compared to where it was about fifteen years ago. I'll do that by way of a 2-part post, focusing here on speed gains and smarter software, and in part 2 (sometime next week) on configuration changes and application tuning considerations.

Speed, and more speed. One of most critical factors with respect to the performance of a data sharing group is the speed with which a request to a coupling facility can be processed. In a parallel sysplex, the coupling facilities provide the shared memory resource in which DB2 structures such as the global lock structure and the group buffer pools are located. Requests to these structures (e.g., the writing of a changed page to a group buffer pool, or the propagation of a global lock request for a data page or a row) have to be processed exceedingly quickly, because 1) the volume of requests can be very high (thousands per second) and 2) most DB2-related coupling facility requests are synchronous, meaning that the mainframe engine that drives such a request will wait, basically doing nothing, until the coupling facility response is received (this is so for performance reasons: the request-driving mainframe processor is like a runner in a relay race, waiting with outstretched hand to take the baton from a teammate and immediately sprint with it towards the finish line). This processor wait time associated with synchronous coupling facility requests, technically referred to as "dwell time," has to be minimized because it is a key determinant of data sharing overhead (that being the difference in the CPU cost of executing an SQL statement in a data sharing environment versus the cost of executing the same statement in a standalone DB2 subsystem).

In the late 1990s, people who looked after DB2 data sharing systems were pretty happy if they saw average service times for synchronous requests to the group buffer pools and lock structure that were under 250 and 150 microseconds, respectively. Nowadays, sites report that their average service times for synchronous group buffer pool and lock structure requests are less than 20 microseconds. This huge improvement is due in large part to two factors. First, coupling facility engines are MUCH faster than they were in the old days. If you know mainframes, you know about this even if you are not familiar with coupling facilities, because coupling facility microprocessors are identical, hardware-wise, to general purpose System z engines -- they just run Coupling Facility Control Code instead of z/OS. Today's z10 microprocessors pack 10 times the compute power delivered by top-of-the-line mainframe engines a decade ago. The second big performance booster with regard to coupling facility synchronous request service times is the major increase in coupling facility link capacity versus what was available in the 1999-2000 time frame. Back then, many of the links in use had an effective data rate of 250 MB per second. Current links can move information at 2 GB (2000 MB) per second.

This big improvement in performance related to the servicing of synchronous coupling facility requests helped to improve throughput in data sharing systems. Did it also reduce the CPU cost of data sharing? Yes, but it's only part of that story. DB2 data sharing is a more CPU-efficient technology now than it was in the 1990s: overhead in an environment characterized by lots of what I call inter-DB2 write/write interest (referring to the updating of individual database objects -- tables and indexes -- by multiple application processes running concurrently on different members of a DB2 data sharing group) was once generally in the 10-20% range. Now the range is more like 8-15%. That improvement wasn't helped along all that much by faster coupling facility engines. Sure, they lowered service times for synchronous coupling facility requests, but the resulting reduction in the aforementioned mainframe processor "dwell time" was offset by the fact that much-faster mainframe engines forgo handling a lot more instructions during a given period of "dwelling" versus their slower predecessors (in other words, as mainframe processors get faster, you have to drive down synchronous request service times just to hold the line on data sharing overhead). Faster coupling facility links helped to reduce overhead, but I think that improvements in DB2 data sharing CPU efficiency have at least as much to do with system software changes as with speedier servicing of coupling facility requests. A tip of the hat, then, to the DB2, z/OS, and Coupling Facility Control Code development teams.

Working smarter, not harder. Over the years, IBM has delivered a number of software enhancements that lowered the CPU cost of DB2 data sharing. Some of these code changes boosted efficiency by reducing the number of coupling facility requests that would be generated in the processing of a given workload. A DB2 Version 5 subsystem, for example (and recall that data sharing was introduced with DB2 Version 4), was able to detect more quickly that a data set that it was updating and which had been open on multiple members of a data sharing group was now physically closed on the other members. As a result, the subsystem could take the data set out of the group buffer pool dependent state sooner, thereby eliminating associated coupling facility group buffer pool requests. DB2 Version 6 introduced the MEMBER CLUSTER option of CREATE TABLESPACE, enabling organizations to reduce coupling facility accesses related to space map page updating for tablespaces with multi-member insert "hot spots" (these can occur when multiple application processes running on different DB2 members are driving concurrent inserts to the same area within a tablespace). DB2 Version 8 took advantage of a Coupling Facility Control Code enhancement, the WARM command (short for Write And Register Multiple -- a means of batching coupling facility requests), to reduce the number of group buffer pool writes necessitated by a page split in a group buffer pool dependent index from five to one.

These code improvements were all welcome, but my all-time favorite efficiency-boosting enhancement is the Locking Protocol 2 feature delivered with DB2 Version 8. Locking Protocol 2 lowered data sharing overhead by virtually eliminating a type of global lock contention known as XES contention. Prior to Version 8, if an application process running on a member of a data sharing group requested an IX or IS lock on a tablespace (indicating an intent to update or read from the tablespace in a non-exclusive manner), that request would be propagated by XES (Cross-System Extended Services, a component of the z/OS operating system) to the coupling facility lock structure as an X or S lock request (exclusive update or exclusive read) on the target object, because those are the only logical lock states known to XES. Thus, if process Q running on DB2 Version 7 member DB2A requests an IX lock on tablespace XYZ, that request will get propagated to the lock structure as an X lock on the tablespace. If process R running on DB2 Version 7 member DB2B subsequently requests an IS lock on the same tablespace, that request will be propagated to the lock structure as an S lock on the resource. If process Q on DB2A still holds its lock on tablespace XYZ, the lock structure will detect the incompatibility of the X and S locks on the tablespace, and DB2B will get a contention-detected response from the coupling facility. The z/OS image under which DB2B is running will contact the z/OS associated with DB2A in an effort to resolve the apparent contention situation. DB2A's z/OS then drives an IRLM exit (IRLM being the lock management subsystem used by DB2), supplying the target resource identifier (for tablespace XYZ) and the actual logical lock requests involved in the XES-perceived conflict (IX and IS). IRLM will indicate to the z/OS system that these logical lock states are in fact compatible, and DB2B's z/OS will be informed that application process R can in fact get it's requested lock on tablespace XYZ. The extra processing required to determine that the conflict perceived by XES is in fact not a conflict adds to the cost of data sharing.

With locking protocol 2, the aforementioned scenario would play out as follows: the request by application process Q on DB2 Version 8 (or 9) member DB2A for an IX lock on tablespace XYZ gets propagated to the lock structure as a request for an S lock on the object. The subsequent request by application process R on DB2 Version 8 (or 9) member DB2B is also propagated as an S lock request, and because S locks on a resource are compatible with each other, no contention is indicated in the response to DB2B's system from the coupling facility, and the lock request is granted. Because IX and IS tablespace lock requests are very common in a DB2 environment, and because IX-IX and IX-IS lock situations involving application access to a given tablespace from different data sharing members are not perceived by XES as being in-conflict when Locking Protocol 2 is in effect, almost all of the XES contention that would be seen in a pre-Version 8 DB2 data sharing group goes away, and data sharing CPU overhead goes down. I say almost all, because the S-IS tablespace lock situation (exclusive read on the one side, and non-exclusive read on the other), which would not result in XES contention with Locking Protocol 1, does cause XES contention with Locking Protocol 2 (the reason being that Locking Protocol 2 causes an S tablespace lock request to be propagated to the lock structure as an X request -- necessary to ensure that an X-IX tablespace lock situation will be correctly perceived by XES as being in-conflict). This is generally not a big deal, because S tablespace lock requests tend to be quite unusual in most DB2 systems.

So, DB2 for z/OS data sharing technology, which was truly ground-breaking when introduced, has not remained static in the years since -- it's gotten better, and it will get better still in years to come. That's good news for data sharing users. If your organization doesn't have a DB2 data sharing group, make a new year's resolution to look into it. In any case, stop by the blog next week for my part 2 entry on the "then" and "now" of data sharing.

Tuesday, December 29, 2009

Clearing the Air Re: Indexes on DB2 for z/OS Partitioned Tables

With the end of the year in sight, it's a good time to tie up loose ends, as we say here in the USA. Thus it is that I've decided to focus, in this last post to my blog in 2009, on indexes as they pertain to DB2 for z/OS partitioned tables. That subject qualifies as a "loose end," because several years after the introduction of table-controlled partitioning with DB2 for z/OS V8, some folks are still not certain as to what can and cannot be done with indexes defined on partitioned tables. I'll try, in this entry, to clear things up.

When a table is partitioned by way of an index specification (the only way to partition a table prior to DB2 V8), index options for the table are pretty straightforward. The index that describes the partitioning scheme (i.e., the one with the PART integer VALUES (constant) clause in the CREATE INDEX statement) is called the partitioning index. No other index on that table is called a partitioning index, and no other index on the table is physically partitioned. Any index on a unique key can be defined as UNIQUE.
Only the partitioning index can be the table's clustering index.

Starting with DB2 V8, table partitioning can be controlled by way of a table's definition (through the PARTITION BY and PARTITION integer ENDING AT (constant) clauses of CREATE TABLE). Table-controlled partitioning (enhanced in DB2 V9 via the partition-by-range universal tablespace) is way better than index-controlled partitioning, but this important DB2 advancement did change -- considerably -- the landscape as far as index options are concerned. First and foremost: for a table-controlled partitioned table, any index on a key that starts with the table's partition-by column or columns is called a partitioning index
(and "starts with" means that the columns of a multi-column partitioning key appear in the order specified in the CREATE TABLE statement); so, if a table's partitioning key is COL_X, COL_Y then an index on COL_X, COL_Y, COL_A is a partitioning index, and so is an index on COL_X, COL_Y, COL_B (but an index on COL_Y, COL_X, COL_D would not be a partitioning index, because the order of the partition-by columns does not match the order specified in the table's definition). Among the implications of this rule: a table-controlled partitioned table may have several partitioning indexes, or it may not have any partitioning indexes. Furthermore, a partitioning index may or may not be physically partitioned. Continuing with the example of the table partitioned on COL_X, COL_Y, an index on COL_X, COL_Y, COL_D that does not have the PARTITIONED clause in its definition is partitioning (because its key begins with the table's partitioning key) but not partitioned (because it was not defined with the PARTITIONED clause). Possible, of course, doesn't necessarily mean advisable -- I don't see why you would have a partitioning index that is not also partitioned.

Next: for a table-controlled partitioned table, a secondary index (i.e., one that is not a partitioning index) can itself be partitioned -- you just have to specify PARTITIONED in the definition of the index (this will cause the index to be physically partitioned along the lines of the underlying table, so that partition 1 of the secondary index will contain the keys for rows in partition 1 of the table). A secondary index that is partitioned is called a data-partitioned secondary index, or DPSI (an index that is not partitioned is called a non-partitioned index, or NPI).

Third: there is a restriction on DPSIs with regard to uniqueness (a restriction that was loosened somewhat with DB2 V9). In a DB2 for z/OS V8 environment, no DPSI can be defined as unique (and remember: a DPSI is a partitioned index that is not a partitioning index -- a partitioning index can be unique). If you try to create a secondary index with both UNIQUE and PARTITIONED in the index definition, you'll get a -628 SQL code (and an accompanying error message indicating that "clauses are mutually exclusive"). In a DB2 V9 environment (and this was recently pointed out by DB2 consultant Peter Backlund in a thread on the DB2-L discussion forum), a DPSI can be defined as UNIQUE if the index key contains the table's partition by column or columns. Once again, consider the table partitioned on COL_X, COL_Y. A DPSI defined on COL_A, COL_Y, COL_B, COL_X could have the UNIQUE attribute because the index key contains all of the table's partition-by columns (and note that the partition-by columns do not have to be in any particular order within the DPSI's key -- they just have to be present within the key). Can there be multiple unique DPSIs defined on a DB2 V9 table-controlled partitioned table? Yes -- again, what's required is that the underlying table's partition-by columns be included in the key of a DPSI that is to be defined as UNIQUE.

Finally: with regard to the CLUSTER attribute, you have flexibility with a table-controlled partitioned table that you don't have with an index-controlled partitioned table. For an index-controlled partitioned table, the partitioning index will be the table's clustering index. For a table-controlled partitioned table, any one index can be the table's clustering index (and of course, a table can only have one index with the CLUSTER attribute). The clustering index could be a partitioning index or a secondary index (whether a DPSI or an NPI). The ability to cluster a table with one key and partition it by another key is, in my opinion, one of the key advantages of table-controlled partitioning over index-controlled partitioning (other pluses include the ability to add partitions to a table-controlled partitioned table, and the ability to rotate partitions in a "first to last" manner).

Is all that clear? I hope so. Table-controlled partitioning is a VERY good thing -- well worth the effort of getting your arms around the new rules regarding indexes on table-controlled partitioned tables.

Throughout my 27 years in IT, I've enjoyed the constancy of opportunities to learn new things. I look forward to more of the same in 2010. Have fun ringing in the new year!