<?xml version='1.0' encoding='UTF-8'?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'><id>tag:blogger.com,1999:blog-6806654330436722244</id><updated>2008-11-20T12:13:47.725-08:00</updated><title type='text'>Catterall Consulting</title><subtitle type='html'></subtitle><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default'/><link rel='alternate' type='text/html' href='http://www.catterallconsulting.com/roberts_blog.html'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default?start-index=26&amp;max-results=25'/><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://www.catterallconsulting.com/atom.xml'/><author><name>Robert Catterall</name><uri>http://www.blogger.com/profile/12629696535422235653</uri><email>noreply@blogger.com</email></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>54</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-6806654330436722244.post-8041034068189423201</id><published>2008-11-20T05:55:00.000-08:00</published><updated>2008-11-20T12:13:47.741-08:00</updated><title type='text'>As DB2 Evolves, so, too, do DBA Roles</title><content type='html'>&lt;span style="font-family:arial;"&gt;I recently gave a presentation on DB2 stored procedure trends and technology to a room full of application developers and DB2 DBAs, part of a "lunch and learn" program organized by a large company in the retail industry.  
Seated next to me at my table was a DB2 DBA manager, and as I sat down after delivering my presentation I mentioned to him that recent DB2 changes emphasizing application enablement (e.g., native SQL procedures in DB2 9 for z/OS, global variables in DB2 9.5 for Linux/UNIX/Windows, the &lt;a href="http://www-01.ibm.com/software/data/studio/"&gt;IBM Data Studio&lt;/a&gt; development tool) and reducing the need for "hands on" system administration (examples include automated memory and disk storage management enhancements in DB2 9.5 for LUW, and partition-by-growth tablespaces in DB2 9 for z/OS) offered an opportunity for companies to re-think the ways in which DB2 DBAs can be engaged to deliver value to the organization.  I went on to say that I was particularly interested in the rise of what might be called the application-focused DB2 DBA.&lt;br /&gt;&lt;br /&gt;The DB2 DBA manager smiled and told me that his group had indeed recognized and acted on that opportunity by creating a new role, called "Procedural DBA".  The title emphasizes stored procedure technology, which -  with DB2 for z/OS V9's great-leap-forward support for native SQL procedures, about which I &lt;a href="http://www.catterallconsulting.com/2008/11/db2-9-for-zos-stored-procedure-game.html"&gt;recently blogged&lt;/a&gt; -  provided the DB2 DBA team with something to "sell" that the retailer's application development teams were very ready to buy.  These application developers are -  as they should be -  focused on delivering functionality needed by the business.  They use modern programming languages and techniques, they need access to DB2 databases on both mainframe and distributed systems platforms, and they need to move fast -  which means, among other things, that they don't want to have to have the detailed DB2 database schema knowledge needed to embed SQL DML statements (SELECT, INSERT, UPDATE, DELETE) in their business-tier programs.  
For the development teams, being able to get the back-end DB2 database work done via stored procedures, and being able to call those procedures in a "native" way with respect to the programming language being used (e.g., via IBM's JDBC or ADO.NET drivers), is a very attractive proposition.&lt;br /&gt;&lt;br /&gt;That's the demand side of the DB2 stored procedure equation at the big retailer.  On the supply side, the DBA manager sees that all the pieces are now in place.  With the native SQL procedure support delivered in DB2 for z/OS V9, preparation and deployment of stored procedures written in SQL no longer depends on having a C compiler (the C compiler requirement was similarly removed via DB2 V9 for the Linux, UNIX, and Windows platforms).  Not only that, but the SQL programming extensions made available for native SQL procedures on the mainframe platform (such as nested compound statements) make it easier than ever to develop SQL procedures that can be deployed on different DB2 server platforms.  Throw in a couple of cherries on top like the zIIP-eligibility of native SQL procedures called via DRDA (for more cost-efficient mainframe computing) and the great SQL procedure development interface of the aforementioned IBM Data Studio tool, and you've got a package that looks great to everyone from developers to systems programmers.&lt;br /&gt;&lt;br /&gt;I told the DBA manager that this all sounded pretty cool, and that I'd expect some "traditional" DB2 DBAs to be eager to take the Procedural DBA role for a spin.  Again a smile from the manager, and a gesture to a person across the table: "Here's one of our first.  You can ask her about it."  The newly-minted Procedural DBA did indeed show a lot of enthusiasm in talking about her role.  
She liked all kinds of things about it, and found especially appealing the fact that it involved working with DB2 on both the mainframe and LUW platforms (DB2 for z/OS and DB2 for LUW have some systems programming and database administration differences, but from an application perspective they are virtually identical).  She also welcomed the opportunity to move in a new technical direction and to pick up and hone some new skills.  She and her Procedural DBA colleagues are poised to accelerate application implementation (a longtime president of Southwest Airlines once said that a plane makes zero money when it's on the ground, and an application doesn't make or save money until it's in production), because they'll be involved in the development process from concept through data modeling and data architecture all the way to deployment -  no heaving stuff over the transom with a "You guys take it from here" (or, on the other side, picking up the pieces of stuff so heaved).  The new Procedural DBA knows that she's caught a good wave, and she's looking forward to riding it for a long time.&lt;br /&gt;&lt;br /&gt;New technology can benefit an organization.  When the organization itself changes (for example, by defining new roles) to take advantage of new technology, that potential benefit can be substantially magnified.  I spent time the other day with people from a large retail-industry company who really get this.  
That makes for a good day indeed.&lt;br /&gt;&lt;/span&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/8041034068189423201/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6806654330436722244&amp;postID=8041034068189423201' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default/8041034068189423201'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default/8041034068189423201'/><link rel='alternate' type='text/html' href='http://www.catterallconsulting.com/2008/11/as-db2-evolves-so-too-do-dba-roles.html' title='As DB2 Evolves, so, too, do DBA Roles'/><author><name>Robert Catterall</name><uri>http://www.blogger.com/profile/12629696535422235653</uri><email>noreply@blogger.com</email></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6806654330436722244.post-3853875785906589908</id><published>2008-11-14T05:07:00.000-08:00</published><updated>2008-11-14T13:25:39.627-08:00</updated><title type='text'>DB2 9 for z/OS -  A Stored Procedure Game-Changer</title><content type='html'>&lt;span style="font-family:arial;"&gt;Stored procedures have long been a mainstay of enterprise-class applications built on a DB2 for Linux/UNIX/Windows foundation.  
They provide for robust security (among other things, keeping the SQL DML statements server-side helps to restrict what people with ill intent can learn about your database schema, thereby reducing exposure to hack attacks), scalability (stored procedures can significantly reduce network traffic), and flexibility (packaging table-touching SQL statements in stored procedures is a nice way of abstracting DBMS particulars from client-side developers, allowing these folks to focus more of their coding efforts on business logic).&lt;br /&gt;&lt;br /&gt;On the mainframe side, the story's been a little different, with stored procedure usage increasing since delivery of the feature with DB2 for z/OS Version 4, but not quite achieving what I'd call "escape velocity" (with the notable exception of some organizations that have embraced mainframe DB2 stored procedures in a major way).  I believe that DB2 9 for z/OS, available since March of 2007, will change that situation.&lt;br /&gt;&lt;br /&gt;It is not surprising that DB2 stored procedure usage didn't skyrocket from the get-go on the mainframe platform.  For one thing, z/OS users already had some robust and reliable solutions for the server-side packaging of DB2 SQL statements, namely, the CICS and IMS/TM transaction management subsystems.  Additionally, DB2 for z/OS stored procedures had some functionality holes early on that needed filling before some organizations would commit resources to leveraging the technology.  
With that in mind, I'd like to review some of the stored procedure enhancements that have come out in various releases of DB2 over the past dozen years:&lt;br /&gt;&lt;/span&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;DB2 V4:&lt;/span&gt; Stored procedure functionality introduced.&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;DB2 V5:&lt;/span&gt; Program calling a stored procedure can FETCH rows from a result set (previously, data could only be returned via output parameters -  a problem for varying-size and many-row result sets).&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;DB2 V6:&lt;/span&gt; CREATE/ALTER/DROP PROCEDURE statements added to DB2 DDL (before that, DBAs had to manually insert/update/delete rows in the SYSPROCEDURES catalog table).&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;DB2 V7:&lt;/span&gt; SQL Procedure Language introduced, enabling programmers (and development-oriented DBAs) to code stored procedure programs in SQL, versus having to embed SQL statements in stored procedure programs written in COBOL or C (to make this work, DB2 SQL was extended to include logic flow-control statements such as GOTO, IF, ITERATE, LEAVE, LOOP, REPEAT, and WHILE).&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;DB2 V8:&lt;/span&gt; Better synergy between DB2 and the z/OS Workload Manager in terms of optimizing the number of tasks in a WLM-managed stored procedure address space, plus the ability to specify an abend limit (a number of execution failures after which a stored procedure is placed in stopped status) at the individual stored procedure level (especially handy in a development environment).&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span 
style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;DB2 V9:&lt;/span&gt; Native SQL procedures, meaning stored procedures that are written in SQL and which execute in the DB2 Database Services address space (aka DBM1).&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-family:arial;"&gt;DB2 V9 native SQL procedure support is what I see as being the game-changer with regard to stored procedure usage on the mainframe platform.  Why?  Two reasons.  First, it provides for a more streamlined stored procedure invocation process, as pointed out by Peggy Zagelow, one of IBM's senior software developers, in &lt;a href="http://www.ibm.com/developerworks/blogs/page/pegggggy?entry=native_sql_procedures_in_dbm1"&gt;a blog entry&lt;/a&gt; earlier this year.  With external procedures (and SQL procedures of the non-native variety end up executing as external stored procedures coded in C), the underlying program needs a language environment in which to execute.  This is provided by a WLM-managed address space, and when the external procedure is called the caller's DB2 thread is suspended while it's switched from the caller's task (SRB or TCB) to the TCB associated with the external procedure program.  In some real-world cases, this thread suspension has led to processing delays and increased DBM1 virtual storage consumption.  Such problems can be dealt with through adjustments in WLM policy goals and/or DB2 thread limits, but with native SQL procedures they are eliminated entirely.  A native SQL procedure exists in the form of a package, which is -  as packages always have been -  a "runtime structure" generated from the SQL statements to be executed.  When a native SQL procedure is called, DB2 just switches from the caller's package to the stored procedure package.  No thread-suspension-and-task-switching, and therefore no delay in stored procedure execution.  
Another benefit of the run-in-DB2 model is the elimination of instruction pathlength associated with crossing back and forth between DBM1 and a stored procedure address space for each SQL statement issued by an external procedure (I don't have numbers, but I expect that native SQL procedures are quite competitive with COBOL external procedures with respect to CPU consumption).&lt;br /&gt;&lt;br /&gt;The second game-changing aspect of native SQL procedures is exploitation of zIIP processing resources (referring to the z9 Integrated Information Processor).  zIIPs, as you may know already, are specialty engines on a mainframe that can run eligible workloads and which do not factor into mainframe software pricing (as general-purpose engines do).  A native SQL procedure is zIIP eligible if it is invoked via a remote call through the Distributed Data Facility component of DB2 (commonly called DDF).  Tests have shown that the amount of CPU processing directed to a zIIP can exceed 50% for some stored procedures.  These zIIP MIPS are as inexpensive as they get on the mainframe platform, and native SQL procedures offer a great way to use 'em.  Why the restriction of remote versus local calls regarding the zIIP eligibility of native SQL procedures?  I believe that it reflects IBM's long-term vision of the role of DB2 on System z: a super-reliable, super-scalable, high-ratio-of-capacity-to-footprint data server supporting various application servers (e.g., WebSphere/Java, WebLogic/Java, Windows/.NET, Ruby on Rails, etc.).  Basically, IBM with its zIIP initiative is giving you a financial incentive to move towards that architecture.&lt;br /&gt;&lt;br /&gt;In addition to those Big Two native SQL procedure pluses, you get some advantages on the programming front versus external SQL procedures.  Important in this regard is support for nested compound statements.  
With the ability to use more than one compound statement in a SQL procedure, you can code compound statements within condition handlers, thereby providing for a native SQL procedure much more sophisticated error-handling capabilities versus an external SQL procedure.&lt;br /&gt;&lt;br /&gt;I feel pretty strongly that native SQL procedures are the way of the future as far as DB2 for z/OS is concerned.  If you are already using external SQL procedures, plan on migrating these to native SQL procedures in a DB2 V9 New Function Mode environment (not a difficult process).  If you are not yet using SQL procedures, I would encourage you to start using them.  To help you on your way, take advantage of the recently published IBM "red book" titled &lt;a href="http://www.redbooks.ibm.com/abstracts/sg247604.html?Open"&gt;&lt;span style="font-style: italic;"&gt;DB2 9 Stored Procedures: Through the CALL and Beyond&lt;/span&gt;&lt;/a&gt; (a complete update of an outstanding red book originally published in 2004 for DB2 V7).&lt;br /&gt;&lt;br /&gt;The mainframe DB2 stored procedure wave is now a big one.  
Catch it.&lt;br /&gt;&lt;/span&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/3853875785906589908/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6806654330436722244&amp;postID=3853875785906589908' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default/3853875785906589908'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default/3853875785906589908'/><link rel='alternate' type='text/html' href='http://www.catterallconsulting.com/2008/11/db2-9-for-zos-stored-procedure-game.html' title='DB2 9 for z/OS -  A Stored Procedure Game-Changer'/><author><name>Robert Catterall</name><uri>http://www.blogger.com/profile/12629696535422235653</uri><email>noreply@blogger.com</email></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6806654330436722244.post-570252814089809296</id><published>2008-11-05T16:22:00.000-08:00</published><updated>2008-11-06T19:11:42.869-08:00</updated><title type='text'>A Couple of Oldie-But-Goodie DB2 Tablespace Questions</title><content type='html'>&lt;span style="font-family:arial;"&gt;Some DB2 for z/OS questions first asked years ago came to be moot due to advances in product technology and functionality.  It used to be, for example, that people wanted to know how to reduce index page lock contention, but that was before type 2 indexes (delivered in DB2 V4) eliminated locking on index pages.  
Similarly, folks used to look for ways to get more than 64 GB of data into a table before that limit went to 16 TB and then to 128 TB (this for a non-LOB table -  the size limit is staggeringly large for a table with lots of LOB columns).&lt;br /&gt;&lt;br /&gt;On the other hand, there are some questions that were good 15 or more years ago and are still good today.  Two that fall into that category were put to me a few weeks ago:&lt;br /&gt;&lt;/span&gt;&lt;ol&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;How many tables should be created in a given tablespace?&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;Which tables in a database should be partitioned?&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;span style="font-family:arial;"&gt;The first of these questions is, I think, the more interesting of the two.  Consider some of the factors that can influence decisions regarding the ratio of tables to tablespaces:&lt;br /&gt;&lt;/span&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;Size matters, and sometimes it's all that matters.&lt;/span&gt;  If a table is going to occupy more than 64 GB of space on disk, the associated tablespace is going to have to be partitioned, and that means that only the one table can be assigned to the tablespace.&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;Production versus non-production databases.&lt;/span&gt;  Generally speaking, non-production DB2 environments (i.e., those used for test and development purposes) differ from their production counterparts in that they have more tables (there will often be several different versions of each table, in different schemas used at different stages of an application's development) and fewer rows (tables in a performance test database may be close to production-sized, but other non-production tables may have row counts that are 75% 
or more below production levels).  Both these characteristics of non-production DB2 systems -  more tables, and smaller tables - make a larger tables-to-tablespaces ratio attractive from a database administration perspective.  That's what segmented tablespaces are for.&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;DBD contention, or lack thereof.&lt;/span&gt;  Every DB2 database (and here I'm using the term "database" in the technical DB2 for z/OS sense, versus the generic notion of a set of logically-related tables) has associated with it a control block (a chunk of system-used information) called a database descriptor, or DBD.  Often, we don't think much about DBDs, but some applications are characterized by considerable DDL (data-definition language) activity - related, perhaps, to dynamic view creation for data security purposes - and dynamic (versus static) SQL (particularly common in a data warehouse/business intelligence system). In that kind of environment, one needs to give some thought to the potential for DBD contention associated with concurrently active DDL operations and (sometimes) with dynamic SQL DML (data manipulation language) statements executing at the same time as DDL statements. This might lead you to want more databases (in the DB2 for z/OS technical sense of the word) in your database (generic term).  That, in turn, could mean more tablespaces (a given tablespace is associated with one and only one DB2 database, again using the term "database" in the DB2 for z/OS technical sense), and if you have more tablespaces for the same number of tables then of course you'll have fewer tables per tablespace.&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;Utilities.&lt;/span&gt; Keep in mind that DB2 utilities tend to operate at the tablespace level.  
Depending on your situation, that could cause you to want a higher tables-to-tablespaces ratio (nice to back up a bunch of tables by image copying one tablespace) or a lower ratio (you might need the ability to recover an individual table to a prior point in time while leaving data in other tables at currency).&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-family:arial;"&gt;Now, how about the partitioning question?  Some things to keep in mind in this regard:&lt;br /&gt;&lt;/span&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;Again, size matters.&lt;/span&gt;  As noted above, you have to partition a table if it's going to hold more than 64 GB of data.  That said, partitioning can make lots of sense for tables that are merely large (say, 500 MB or more), as opposed to being huge. Some utilities can operate at the partition level, and it can be beneficial to have the ability to reorganize data (for example) in a large table a partition at a time, rather than REORGing the entire tablespace.  The same goes for backup and recovery.&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;Parallelism.&lt;/span&gt; While it's not impossible to get some DB2 for z/OS query-splitting activity when tables are not partitioned, parallelism is, to a very significant degree, driven by partitioning: the more partitions you have, the more parallel query processing you're likely to get (assuming that you have packages bound with DEGREE(ANY) and/or you set the value of the CURRENT DEGREE special register to ANY for dynamic SQL statements).  Keep in mind that query parallelism isn't just for data warehouse workloads.  It can also substantially reduce run times for read-intensive batch jobs running in operational (i.e., non-BI) systems.  
Also keep in mind that DB2 query parallelism is a great way to utilize the processing capacity of zIIP engines on a mainframe.&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-family:arial;"&gt;More good news on the partitioning front: the new universal tablespace introduced with DB2 for z/OS V9 gives you all the data size and utility-granularity and query-parallelism benefits of traditional partitioned tablespaces, with the quick mass-delete and good "insert-into-the-middle" performance associated with segmented tablespaces.&lt;br /&gt;&lt;br /&gt;More or fewer tables per tablespace?  Partition or don't partition?  Still good questions, these -  and they are questions that don't have one-size-fits-all answers.  Know your options, think about the needs of your organization (and your own priorities from a database administration perspective), and make the decisions that are right for your environment.&lt;br /&gt;&lt;/span&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/570252814089809296/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6806654330436722244&amp;postID=570252814089809296' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default/570252814089809296'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default/570252814089809296'/><link rel='alternate' type='text/html' href='http://www.catterallconsulting.com/2008/11/couple-of-oldie-but-goodie-db2.html' title='A Couple of Oldie-But-Goodie DB2 Tablespace Questions'/><author><name>Robert Catterall</name><uri>http://www.blogger.com/profile/12629696535422235653</uri><email>noreply@blogger.com</email></author><thr:total 
xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6806654330436722244.post-4781347372359790059</id><published>2008-10-29T15:42:00.001-07:00</published><updated>2008-10-29T18:14:17.568-07:00</updated><title type='text'>"Big C" or "little c," Champions All</title><content type='html'>&lt;span style="font-family:arial;"&gt;Earlier this week, at its Information on Demand conference in Las Vegas, IBM announced its &lt;a href="http://www.ibm.com/developerworks/wikis/display/champion/IBM+Data+Champions"&gt;Data Champions&lt;/a&gt; program, along with the first 23 people to be so designated.  I am delighted to be among this inaugural class of IBM Data Champions, in large part because of the other people in the group.  The 15 DB2-focused honorees (there are also Informix, U2, and Data Studio professionals on the list) are consultants and users who have made tremendous contributions to the worldwide DB2 community.  IBM, through this new program, has made us "Big C" (i.e., "official") Champions, but what we have in common -  and I suppose that this is the point of the award -  is that we have worked to further the efforts of the legions of "little c" (as in not officially known-as) champions all over the world who stand up and speak out for DB2 every working day of the year.&lt;br /&gt;&lt;br /&gt;Having long been a "little c" DB2 champion myself, I think that I can speak to what motivates so many people to be active DB2 advocates, even when there is no direct corresponding financial reward (as might be earned by an IBM software sales representative).  It comes down to having a strongly-held view that DB2 represents the very best in database technology -  in terms of advanced functionality, in terms of reliability and availability, in terms of scalability and performance, and in terms of value received for the dollar (or whatever your unit of currency) spent.  
During my first year with IBM, when I was training to be a Systems Engineer (an in-the-field technical support person), I delivered a presentation that impressed one of my instructors.  "You ought to consider going into sales," he said, but I wasn't interested in that line of work.  I never was comfortable with trying to persuade someone to buy something.  And yet, I've long spoken forcefully in favor of using DB2 for high-visibility, high-stakes applications -  whether as an IBMer, as a DB2 user, or as an independent consultant.  How to explain the seeming contradiction?  Simple: I don't like to "sell," but I'm very much ready to speak to the advantages of building applications on a DB2 foundation.  I'm just educating people -  telling them what I feel they ought to know.  This may sound like sales talk, but it doesn't feel that way to me.&lt;br /&gt;&lt;br /&gt;And so it is that all kinds of folks -  right now -  are championing DB2.  Many of these individuals -  including, as mentioned previously, my fellow "Big C" Champions -  have taken things a step further by working to equip "little c" champions with the knowledge that they need to be successful in their DB2 advocacy efforts.  We facilitate that knowledge transfer through our leadership of regional and international DB2 user groups, through our presentations and Webcasts, and through our writing in technical journals and blogs.  We help people to learn about new DB2 features, we share best practices, and we encourage people to think big with respect to using DB2 technology in new ways.  We're Champions working with champions.  That's called community.  It's something that money doesn't buy, and it's something that DB2 has in spades.  
I encourage you to be a part of it.&lt;br /&gt;&lt;/span&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/4781347372359790059/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6806654330436722244&amp;postID=4781347372359790059' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default/4781347372359790059'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default/4781347372359790059'/><link rel='alternate' type='text/html' href='http://www.catterallconsulting.com/2008/10/big-c-or-little-c-champions-all.html' title='&quot;Big C&quot; or &quot;little c,&quot; Champions All'/><author><name>Robert Catterall</name><uri>http://www.blogger.com/profile/12629696535422235653</uri><email>noreply@blogger.com</email></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6806654330436722244.post-7696146418732333160</id><published>2008-10-23T18:43:00.001-07:00</published><updated>2008-10-23T20:52:49.467-07:00</updated><title type='text'>So, What Makes for "Good" DB2 I/O Performance?</title><content type='html'>&lt;span style="font-family: arial;"&gt;Recently, someone e-mailed me a portion of a DB2 monitor accounting report for an application program, and asked me if any of the numbers in the report were of the "red flag" variety (i.e., indicative of a performance problem).  
In responding, I mentioned that one figure did indeed stand out in a negative way: the 30-millisecond average wait time per synchronous read I/O (meaning: on average, every time DB2 had to bring a particular table or index page -  as opposed to a chunk of pages in a prefetch request -  into the buffer pool from the disk subsystem on behalf of the application program, said program had to be suspended for 30 milliseconds waiting for the I/O operation to complete).  Funny thing is, 30 milliseconds of wait time per synchronous DB2 read was once considered to be good performance.  How times have changed.&lt;br /&gt;&lt;br /&gt;When I first started working with DB2 in the latter half of the 1980s, IBM's 3380 disk storage device was king of the hill.  The associated 3880 storage controllers had little, if any, cache memory (maybe 16 or 32 megabytes), and the cache management algorithms were not very sophisticated.  On top of that, data was transmitted between disk subsystem and mainframe DB2 server (DB2 for Linux/UNIX/Windows was not yet on the scene) over bundles of copper wire.  It all seemed pretty fast to us at the time, and DB2 users did indeed aim to get the average wait time per synchronous read I/O below 30 milliseconds (20 milliseconds per sync read was indicative of really good performance).&lt;br /&gt;&lt;br /&gt;The 1990s saw huge leaps forward in the capabilities of high-end disk storage systems (and saw these devices become options for users of LUW servers as well as for mainframes).  
Disk controller cache sizes jumped way up to multi-gigabyte territory, and the controllers got maximum bang from that resource thanks to advanced cache-management software that ran on powerful CPUs built into the units (a cool example of sophisticated cache handling was adaptive staging, whereby the disk controller monitored data-access patterns and shifted between cylinder-at-a-time and track-at-a-time staging of data to cache memory depending on whether sequential or random access was predominant at the time).  Non-volatile memory for super-fast disk write operations became a standard feature, and fiber-optic disk-to-server connections really opened the throttle with respect to data transfer rates.  I well remember the first time -  around 1995 or so -  that I looked over a DB2 monitor statistics report at an organization that had implemented a disk storage subsystem with gigabytes of controller cache.  I was amazed to see that the average wait time per synchronous DB2 read I/O was about 5 milliseconds.  Since that time, speed has increased even more, to the point that some DB2-using companies see average sync read wait times in the vicinity of 2 to 3 milliseconds -  an order of magnitude better than what I'd seen in the DB2 Version 1 days.&lt;br /&gt;&lt;br /&gt;So, what if you are a DB2 user and you see an average wait time per synchronous read I/O that's in the 20- to 30-millisecond range?  That is NOT good by today's standards, but what can you do about it?  Some thoughts:&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family: arial;"&gt;&lt;span style="font-style: italic;"&gt;Spinning disk -  even fast disk -  ain't enough&lt;/span&gt;.  To get great I/O performance, you need a lot of hits in the disk controller cache.  To get that, you probably need gigabytes of cache memory in front of your disk volumes.  
&lt;span style="font-style: italic;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family: arial;"&gt;&lt;span style="font-style: italic;"&gt;Check the size of your DB2 buffer pool configuration&lt;/span&gt;.  Whaddya got?  A couple hundred meg worth of buffer space?  HELLO!  It's a 64-bit world, folks!  A buffer pool config that's less than a gigabyte in size is kind of dinky, in my book.  "OK," to me, means at least 2-4 GB, and "big" is north of 10 GB (yeah, you need the server memory to back it up, but you can get hundreds of gigabytes of memory on a high-end system these days).  A too-small buffer pool means that your disk storage subsystem will get pounded, I/O wise (maybe hundreds of I/O operations per second), and even high-performance disk devices can get bogged down with I/O contention.&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family: arial;"&gt;&lt;span style="font-style: italic;"&gt;Clustering matters&lt;/span&gt;.  Did you put much thought into performance implications when you chose clustering keys for your DB2 tables?  Locality of reference (i.e., rows to be retrieved by a program being in close physical proximity to each other) can make a very big difference in the number of pages that DB2 has to examine (and bring into the buffer pool if they're not already there) in executing SQL statements.  Are your programs getting 20 rows from 1 or 2 pages, or from 20 pages?&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family: arial;"&gt;&lt;span style="font-style: italic;"&gt;Data compression can be a performance-booster&lt;/span&gt;.  It's not just about saving disk space.  Mainframers have known this for a long time, and now DB2 for LUW has a great data compression capability, as well.  
When you compress a table, the number of rows per page will typically increase by 2-3 times, and that can mean a drop in page I/O requests.&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family: arial;"&gt;&lt;span style="font-style: italic;"&gt;Do NOT try to fool the DB2 optimizer&lt;/span&gt;.  I've heard of sites where they fudge DB2 catalog statistics in an attempt to get DB2 to not choose prefetch for data access when doing access path selection.  This is typically done because of an impression that prefetch reads "get in the way" of single-page synchronous reads, thereby slowing the latter.  Folks, today's DB2 optimizer is the product of 25 years of development effort and experience, and it knows what it's doing.  If you don't want DB2 to prefetch a lot of pages when a certain SELECT statement is executed, try telling it that you want the first few rows of the result set ASAP (via the OPTIMIZE FOR n ROWS clause), or that DB2 can quit fetching after a few rows have been retrieved (FETCH FIRST n ROWS ONLY).  When, through catalog stats that do not reflect reality, you trick DB2 into thinking that just a few data and/or index pages will be scanned to generate a result set when in fact lots of pages will be examined, you will very likely end up driving lots more single-page synchronous reads than should be occurring for your workload, and that can really gum things up.  It's best to be honest with DB2 -  keep your stats current and accurate.&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-family: arial;"&gt;Will technology advances eventually get us to where we expect average wait time per synchronous DB2 read I/O to be under a millisecond?  Based on what I've seen over the past 20 years, I'd lean towards a "yes" on that one.  Us veterans will then have fun regaling young 'uns with stories of these things on which we used to store DB2 data.  
"Disks," we called them...&lt;br /&gt;&lt;/span&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/7696146418732333160/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6806654330436722244&amp;postID=7696146418732333160' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default/7696146418732333160'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default/7696146418732333160'/><link rel='alternate' type='text/html' href='http://www.catterallconsulting.com/2008/10/so-what-makes-for-good-db2-io.html' title='So, What Makes for &quot;Good&quot; DB2 I/O Performance?'/><author><name>Robert Catterall</name><uri>http://www.blogger.com/profile/12629696535422235653</uri><email>noreply@blogger.com</email></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6806654330436722244.post-116053533204868530</id><published>2008-10-16T08:37:00.000-07:00</published><updated>2008-10-16T10:02:22.374-07:00</updated><title type='text'>DB2 Notes from Warsaw (Day 4)</title><content type='html'>&lt;span style="font-family: arial;"&gt;The 2008 International DB2 Users Group European Conference concluded a little while ago, and I'm looking forward to seeing many of my fellow attendees (and some new people) at the 2009 conference that will take place October 5 - 9 in Rome.  This was another good day, from my perspective.  
Some snapshots:&lt;br /&gt;&lt;/span&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family: arial;"&gt;&lt;span style="font-weight: bold;"&gt;Mainframe DB2 is rockin' and rollin'.&lt;/span&gt;  Roger Miller, a veteran member of the DB2 for z/OS development team at IBM's Silicon Valley Lab, gave his session attendees an update on the topic of DB2 for z/OS performance.  He talked up the z10, IBM's top-of-the-line mainframe server.  Processing capacity has long been a core strength of the mainframe platform, but this thing is a monster (in a good way, of course).  The z10's engines are more than 50% faster than those of its predecessor, and you can get up to 64 of the CPUs in one box (and z/OS manages a large number of processors very effectively).  Want a lot of memory on your z10 server?  You can get up to 1.5 terabytes.  What will people do with all those MIPS?  How about running a lot of the native SQL stored procedures that you can deploy in a DB2 V9 environment?  Roger talked about performance tests of DB2 V9 on a z10 server, with thousands of stored procedure calls per second.  Native SQL procedures generally consume 30-40% less CPU time than external SQL procedures, AND -  when invoked via DDF (the Distributed Data Facility component of DB2 for z/OS) -  they can run on zIIP processors (specialty engines that -  unlike general-purpose CPUs -  do not factor into mainframe software pricing).  Roger also talked about advances in disk I/O technology and mainframe I/O connections that benefit DB2 for z/OS performance: a chunk of 32 4KB pages can be brought into a DB2 buffer pool via prefetch in one millisecond.  
A number of organizations are migrating their DB2 Version 8 subsystems to Version 9, and Roger indicated that these companies can expect to see CPU reductions of around 3% upon migrating to Version 9 in Conversion Mode, and another 3% or so (for a total CPU efficiency gain of about 6%) once DB2 9 is in New Function Mode and new performance-enhancing features are being exploited.  Some of the best performance gains are expected for programs that access LOB (large object) data, as multiple improvements have been made to this component of DB2.  Roger also told attendees that the multi-row fetch and insert capability provided with DB2 V9 can dramatically reduce CPU costs for data-intensive programs, with the magnitude of this positive effect perhaps being greatest for distributed database applications that access DB2 via DDF.  Roger concluded by offering up some goodies that people can look for in "version next" of DB2 for z/OS, including lots more concurrently active threads, lots more stuff moved above the 2 GB level in the DB2 database services address space, and a hash technique for super-fast data row location.&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-family: arial;"&gt;&lt;/span&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family: arial;"&gt;&lt;span style="font-weight: bold;"&gt;The IBM panel did well, collectively and individually.&lt;/span&gt;  Several leaders of IBM's DB2 development group, including Curt Cotner (IBM Fellow and CTO for DB2) and Matt Huras (chief kernel architect for DB2 for Linux/UNIX/Windows) answered a variety of questions from attendees during a lively 90-minute session.  Some of the panel's answers had to do with new features expected to be delivered in the next release of DB2, with Jeff Josten of the DB2 for z/OS team mentioning an ALTER capability that will facilitate conversion of existing tablespaces to the new universal format (a combination of segmented and partitioned).  
Others had to do with outreach to various application programming communities (Curt spoke of IBM initiatives that are making it easier for people coding in languages such as PERL, Python, PHP, and Ruby to interact with DB2).  This being DB2's 25th anniversary year, the panelists were asked to name their favorite all-time DB2 technology advances.  On the DB2 for z/OS side, data sharing on the parallel sysplex got a mention, as did stored procedures and distributed database application support.  Matt Huras cited the change to a threaded process model (from an agent model) for DB2 on Linux and UNIX servers, and John Hornibrook talked up the real-time statistics updates that enable the DB2 for LUW optimizer to make better access path decisions.&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-family: arial;"&gt;&lt;/span&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family: arial;"&gt;&lt;span style="font-weight: bold;"&gt;Curt Cotner delivered a good "state of DB2" keynote address.&lt;/span&gt;  IBM's DB2 CTO talked about the enduring strength of DB2 for z/OS in the large-enterprise market, noting that 59 of the world's top 59 banks run DB2 for z/OS.  He also pointed out that UPS took top honors in a recent Winter Corporation survey of the world's largest databases and database workloads, with a peak load of over 1 billion SQL statements executed per hour on a DB2 for z/OS system.  The aforementioned (in the above item on the panel discussion) switch to a threaded process model for DB2 on Linux and UNIX servers (it already used a threaded process model in a Windows environment) was cited as an enabler of a more unified design for DB2 on the mainframe and Linux/UNIX/Windows platforms, since DB2 for z/OS has always used that process model (a fact  not well known due to nomenclature differences: the pieces of work that execute in a mainframe DB2 address space are called TCBs and SRBs, but they equate to Linux/UNIX/Windows threads).  
Near the end of his presentation, Curt spoke of the importance of IBM's new Data Studio tool, particularly as a means of enabling DB2 DBAs to much more effectively support application developers whose programs access DB2 databases from Java-based and other application server environments.  Previously, it could be very difficult to tie a poor-performing SQL statement noticed by a DB2 DBA to the Java (for example) program that issued it.  With Data Studio in the mix, DB2 can build a repository of application metadata that can greatly facilitate the linking of an SQL statement with the issuing program, thereby providing a DBA with information that he (or she) can use to help the associated programmer code a more efficient data-accessing statement.&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-family: arial;"&gt;Tomorrow morning, it's back to Atlanta for me.  I've enjoyed my first visit to Poland, and I hope to return at some point in the future.&lt;br /&gt;&lt;/span&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/116053533204868530/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6806654330436722244&amp;postID=116053533204868530' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default/116053533204868530'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default/116053533204868530'/><link rel='alternate' type='text/html' href='http://www.catterallconsulting.com/2008/10/db2-notes-from-warsaw-day-4.html' title='DB2 Notes from Warsaw (Day 4)'/><author><name>Robert Catterall</name><uri>http://www.blogger.com/profile/12629696535422235653</uri><email>noreply@blogger.com</email></author><thr:total 
xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6806654330436722244.post-1544546610072062670</id><published>2008-10-16T00:28:00.000-07:00</published><updated>2008-10-16T03:55:05.525-07:00</updated><title type='text'>DB2 Notes from Warsaw (Day 3)</title><content type='html'>&lt;span style="font-family:arial;"&gt;I'm blogging about my Day Three of the 2008 International DB2 Users Group European Conference on the morning of Day Four.  Yesterday (Wednesday) I delivered my presentation (on what I call ultra-availability).  Before that late-afternoon session, I was focused on business intelligence and data warehousing (much of the consulting work I've done over the past several months has been in this area).  The day ended with a DB2 25th birthday party sponsored by IBM and held at an old manor house in the country just outside of Warsaw (the long bus ride back to the hotel proved to be a bonus, as I'll explain later).&lt;br /&gt;&lt;br /&gt;Some of the take-aways from my Day Three:&lt;br /&gt;&lt;/span&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;BI is a hot topic in DB2-land.&lt;/span&gt;  I participated in the Business Intelligence and Data Warehousing Special Interest Group session (a SIG, in IDUG-ese -  basically a "birds of a feather" get-together).  The discussion was lively and covered a lot of ground.  One of the IBMers present reminded people of the special DB2 for z/OS pricing (called &lt;a href="http://www-01.ibm.com/software/data/db2/zos/edition-vue.html"&gt;DB2 for z/OS Value Unit Edition&lt;/a&gt;) available for qualifying BI workloads.  
The benefits and costs of ETL (extract/transform/load processes) were kicked around, with references to the need (or not) to aggregate data values bound for the data warehouse, differences between operational system and BI system database designs, and the potential integration of ETL with an organization's overall information lifecycle strategy (the latter brought up by consultant Jan Henderyckx, a reliably forward-thinking individual).  The need for BI-supporting DB2 people to really understand what business users are trying to accomplish with a data warehouse was emphasized.  One of the DB2 for z/OS participants spoke of the performance breakthroughs that can be achieved through the use of materialized query tables (MQT support, a feature well-known by DB2 for LUW people, was delivered on the mainframe platform with DB2 Version 8).  Near the end of the discussion, participants talked about the growing popularity of "operational BI", especially on the system Z platform -  a trend driven by users' desire to access detailed as well as aggregated data records, and by the importance of "data immediacy" (time-proximity to data change events).&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;The latest on data warehousing and DB2 for z/OS.&lt;/span&gt;  Willie Favero, one of IBM's top DB2 experts, delivered an excellent session on data warehousing and DB2 for z/OS.  He started out by showing text from IBM's 1983 announcement of DB2 which positioned the DBMS as an excellent foundation for decision-support applications (this was before the term "data warehouse" emerged as a label for BI database systems).  
Willie noted that two of the factors driving an increase in data warehouse activity on the mainframe DB2 platform are SLAs that match those for operational database systems (and so call for maximum uptime, a strength of DB2 and system Z), and ever-growing numbers of concurrent queries accessing data warehouses (the z/OS operating system is very good at managing very large numbers of concurrently active tasks).  A recently completed performance benchmark run at IBM's lab in Poughkeepsie, New York, was described: a 50-terabyte database, with a 300 billion-row table, and 200-300 query-issuing clients hitting the warehouse at one time.  Willie was very pumped up about the results of the benchmark, which will be documented in an IBM "red book" that should be out within the next couple of months.  An interesting slide in Willie's presentation showed the evolution of decision support applications from those that used query and reporting to gain insight into what happened, to deep analysis to try to better predict what will happen, to the current leading-edge systems that enable analysis of "right now" events so as to see more clearly things that are &lt;span style="font-style: italic;"&gt;happening now&lt;/span&gt;.  The benefits of DB2 for z/OS hardware-assisted compression were covered: 50% space savings, on average (with savings of 80% or more seen for some tables), with virtually no overhead on read (uncompress) operations.  A recent enhancement: starting with DB2 for z/OS V8, the 64 KB compression dictionary that goes with each compressed data set (each partition of a compressed partitioned tablespace has one) is stored "above the bar" (i.e., above the 2 GB level) in the DB2 address space.  
Willie wrapped up with some discussion of DB2 for z/OS query parallelism: 1) use it, because it works great, 2) let DB2 determine the degree of parallelism for queries, because it does that very well, 3) remember that sysplex query parallelism can deliver massive parallel processing by splitting a query across multiple DB2 subsystems in a data sharing group, and 4) remember that query parallelism is one of the best ways to drive usage of a zIIP processor (a specialized mainframe engine that can offload certain types of work from the general-purpose CPUs but which does not affect mainframe software pricing).&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;People are asking the right questions about high availability.&lt;/span&gt;  I delivered a presentation on ultra-availability, pushing people to think beyond merely "great" availability to "ultimate" availability, which might be described as "never down, never lose anything, even in a disaster recovery situation."  In discussions with attendees after the session, I heard a lot about the costs -  hardware, software, programming -  associated with getting closer and closer to ultimate availability for a data-serving system.  
Super-high availability can indeed be a pricey proposition, but IT people can serve their organizations by developing and costing solutions for ultra-availability (versus settling for less by assuming that the organization won't commit to achieving an audacious availability goal) and letting upper management make the call as to whether or not the potential payoff is worth the required investment.&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;You have to love a line of work that can stay fun over a 5-decade career.&lt;/span&gt;  On the bus ride back to the Hilton following the aforementioned DB2 25th anniversary party, I had the pleasure of sitting next to Tor Stryker (I hope that I have the spelling right), a lead data architect for one of Norway's top insurance companies.  Tor is in his fifth decade in IT (all with the same company), and he still gets a tremendous kick out of helping to advance his organization's IT capabilities.  He spoke of the days, way back, when programmers had to write code that would move sections of their own programs in and out of server memory because the whole thing couldn't fit within the few kilobytes of available space.  Recently, he's helped to extend the functionality of legacy CICS-DB2 programs to Java-based client systems by enabling them to be accessed via DB2 stored procedure calls, and he has been a leader in the development of rules-based applications that can be extended, functionality-wise, much more quickly than old-style monolithic applications, thereby enhancing his company's operational agility for competitive advantage.  All of this information was delivered with smiles and enthusiasm that were inspiring.  Getting paid to learn and to innovate is indeed a good thing, Tor.  
Thanks for the reminder.&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/1544546610072062670/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6806654330436722244&amp;postID=1544546610072062670' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default/1544546610072062670'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default/1544546610072062670'/><link rel='alternate' type='text/html' href='http://www.catterallconsulting.com/2008/10/db2-notes-from-warsaw-day-3.html' title='DB2 Notes from Warsaw (Day 3)'/><author><name>Robert Catterall</name><uri>http://www.blogger.com/profile/12629696535422235653</uri><email>noreply@blogger.com</email></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6806654330436722244.post-5789167109069672633</id><published>2008-10-14T04:59:00.000-07:00</published><updated>2008-10-14T15:47:53.779-07:00</updated><title type='text'>DB2 Notes from Warsaw (Day 2)</title><content type='html'>&lt;span style="font-family:arial;"&gt;A little rainy in the morning on this, the second day of the 2008 International DB2 Users Group European Conference, then just some high clouds after that.  
Following are a few interesting (to me, at least) nuggets of information from my day of technical sessions and hallway conversations (it's been said that some of the most useful knowledge that one gains from attending a conference such as this one comes from the so-called "coffee track"):&lt;br /&gt;&lt;/span&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;The rules of table partitioning really did change a lot with DB2 for z/OS Version 8.&lt;/span&gt;  Kurt Struyf, a Senior Consultant with &lt;a href="http://www.cp.be/"&gt;Competence Partners&lt;/a&gt; in Belgium, delivered a very good presentation on the table-based (versus index-based) partitioning feature that was delivered with DB2 for z/OS Version 8.  He went over some of the benefits of table-based partitioning, such as the ability to partition a table's data using one key while clustering data within partitions using another key (with index-based partitioning, the clustering and partitioning indexes are one and the same).  He talked about data-partitioned secondary indexes (DPSIs), noting that while they can be good for availability (enabling the avoidance of the BUILD2 phase of online REORG, which in some cases can result in a multi-minute loss of data availability when a subset of a tablespace's partitions are reorganized online), they do not provide an application performance advantage.  Kurt also covered the increase in the number of partitions that can be defined for a tablespace (now up to 4096), the dynamic adding of a new partition to an existing partitioned tablespace, and the partition-swap capability that enables one to replace data in a partition so that what had been the first partition now logically becomes the last (e.g., the partition that had held data for 1997 becomes the receptacle for 2008 data).  
The presentation also laid out nomenclature changes that came with DB2 V8: a &lt;span style="font-style: italic;"&gt;partitioning&lt;/span&gt; index now is one that does not &lt;span style="font-style: italic;"&gt;control&lt;/span&gt; partitioning, but which begins with the column (or columns) of the partitioning key of a table-based partitioned table (and a table-based partitioned table can have several partitioning indexes).  Indexes defined on a table-based partitioned table that are not partitioning indexes are called secondary indexes, and these can be partitioned (with entries in separate files that correspond to the table's partitions) or not (Kurt recommended that partitioning indexes be made partitioned indexes, as well).  The session concluded with a description of changes in the output of the DISPLAY DATABASE command that support table-based partitioned tables, and with examples of various recovery and reorganization scenarios involving table-based partitioned tables.  Kurt's presentation, and others delivered here this week, will be available to IDUG premier-level members on IDUG's Web site (www.idug.org) within a couple of months, and will be available on the site to basic-level members nine months after that.&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;DB2 for z/OS dynamic statement caching is NOT just about avoiding the CPU cost of statement preparation.&lt;/span&gt;  I had an interesting between-sessions discussion with Thomas Baumann, a long-time (and VERY knowledgeable) DB2 professional who works for Swiss Mobiliar, a large insurance company.  
Thomas described how Swiss Mobiliar uses the information in the dynamic statement cache (extracted via EXPLAIN STMTCACHE ALL and EXPLAIN STMTCACHE STMTID) to monitor and tune dynamic SQL statements that run on the company's production DB2 for z/OS system, and it seemed to me that while dynamic statement caching was initially touted as a way to avoid the CPU cost of re-preparing previously prepared dynamic SQL statements, the primary benefit of the technology to Swiss Mobiliar is in the area of enhanced performance management of the dynamic SQL workload.&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;.NET is a great application server platform to pair with a DB2 database (whether DB2 for z/OS or DB2 for Linux/UNIX/Windows).&lt;/span&gt;  Frank Petersen of Bankdata (a large supplier of IT services to 15 Danish banks) gave an excellent presentation on the various ways in which a .NET-based application can work very well together with a DB2 database.  He compared and contrasted the Java and .NET application environments (both have their strong points), and described the ways in which a .NET program can access DB2 data (focusing primarily on the ODBC, ADO.NET, and DB2 .NET data provider interfaces).  Going above and beyond a presenter's call of duty, Frank passed out to all session attendees a CD-ROM containing the source of a simple .NET-DB2 application program he had written, thereby providing a great starting point for further exploration of this programming technology.  
Nick Manu, who works in Belgium for insurance giant AXA, was sitting next to me in this session, and he talked about the tremendous performance results he'd seen in working with DB2 as a back-end database for programs running on Linux/UNIX/Windows-based application servers.&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;What should DB2 professionals be doing with their time and talents?&lt;/span&gt;  Jan Henderyckx, leader of the consulting firm &lt;a href="http://www.brainware.be/"&gt;BrainWare&lt;/a&gt;, delivered a thought-provoking presentation in which he urged DB2 professionals to focus on increasing the value that they deliver to their employing organizations by spending more of their time and creative energy on higher-value data-related activities, as opposed to just administering a DBMS.  To this end, Jan suggested that people work to develop a catalog of the database services that should be made available within their respective enterprises, along with risk assessments associated with NOT having those services, together with potential organizational benefits that could be realized if the services were to be established.  Among the services promoted by Jan were data governance (including not only data quality but data "lineage," which has to do with where data came from and how it got to where it is), data movement (which includes data replication), metadata management (emphasizing the need for metadata to deliver "actionability" -  in other words, to actually be useful to people within the organization), and SOA (with an interesting description of the concept of an enterprise information platform).  A challenge mentioned by Jan: getting to a unified approach to data service management and delivery can be complicated by the fact that at many companies various data-related roles are spread across multiple departments within an IT organization.  
He suggested that virtual teams of data professionals can be effective if the actual organizational structure can't be changed to bring data-related people together within a department.&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;The DB2 optimizer: 25 years old (more if you count its developmental years before the general availability of DB2) and getting better all the time.&lt;/span&gt;  Terry Purcell, optimizer guru with the DB2 for z/OS development organization at IBM's Silicon Valley Lab, gave a presentation on recent enhancements to the DB2 for z/OS optimizer.  He began by laying out IBM's ongoing priorities as they pertain to optimizer development: 1) improve access path selection (largely a matter of making SQL statement access path cost estimation ever more accurate and comprehensive), 2) provide more access path choices, and 3) provide better access path stability.  Terry then reviewed some of the optimizer-related enhancements delivered with DB2 for z/OS V8 (including materialized query tables, improved star-join processing, backward index scans, and the ability to change a table's clustering index key), and some of the V8 SQL enhancements (such as upping the SQL statement size limitation to 2 MB, the ability to GROUP BY an expression, and support for common table expressions and recursive SQL).  After that Terry covered DB2 V9 optimizer-related enhancements (examples include plan stability, which is basically the ability to back up and restore access paths; histogram statistics for SQL statement optimization; and the ability to create an index on an expression), along with V9 SQL-related enhancements (e.g., the MERGE statement, also known as "upsert"; OLAP improvements; and the SQL extensions INTERSECT and EXCEPT).  The last part of the presentation covered two of Terry's personal-favorite V9 optimizer enhancements.  
The first of these is a sort-avoidance technique that can kick in when a statement has a large initial result set and contains an ORDER BY and a FETCH FIRST n ROWS (with "n" being a relatively small value).  Rather than sort the large initial result set (say, a million or more rows) just to return the first three rows of the ordered set (if FETCH FIRST 3 ROWS is specified), DB2 V9 will start going through the values in the ORDER BY column, comparing them in good-sized chunks at a time and all the while retaining the three smallest values found (again, assuming a specification of FETCH FIRST 3 ROWS -  the number could be any relatively small value).  In the end, the rows with the 3 smallest values of the ORDER BY column are identified much more efficiently than they would be by a full-blown sort operation.  Terry concluded with a review of the new LASTUSED column of the SYSINDEXSPACESTATS table, which is populated by the DB2 real-time statistics function and which indicates the last time an index was used (and I mean used for data location or for referential integrity purposes -  not just accessed because of a row INSERT or DELETE that drives the insert or deletion of an entry in an index leaf page).  This should be especially useful when it comes to figuring out which indexes could be dropped without hurting application performance -  if you see that a given index hasn't been used in three months, for example, it might be a good candidate for dropping (which would save disk space and reduce the cost of insert and delete and some update operations targeting the associated table).  People with dynamic SQL workloads will especially like this feature, since the SYSPACKDEP catalog table just shows indexes used by static SQL statements.&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-family:arial;"&gt;Tomorrow afternoon I'll deliver my presentation (on what I call ultra-availability), and I'll post another blog entry with things learned and discussed during the day.    
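&lt;br /&gt;&lt;br /&gt;A postscript on the sort-avoidance technique described above: the general idea of scanning the ORDER BY values in chunks while retaining only the n smallest can be illustrated with a small sketch.  This is not a description of DB2's internal implementation; it is a minimal Python illustration that assumes numeric values held in memory, and the function name (top_n_smallest) and the chunk size are hypothetical choices made for the example.&lt;br /&gt;&lt;br /&gt;

```python
import heapq
from itertools import islice


def top_n_smallest(values, n, chunk_size=1000):
    """Return the n smallest values in ascending order without sorting all of them.

    Illustrative only: the function name and chunk_size are assumptions,
    not anything taken from DB2. Values must be numeric (they are negated).
    """
    # Max-heap emulated by storing negated values; it never holds more than n entries.
    retained = []
    iterator = iter(values)
    while True:
        chunk = list(islice(iterator, chunk_size))  # read the next chunk of values
        if not chunk:
            break
        for value in chunk:
            if len(retained) == n:
                # Push the new value and discard the largest of the n + 1 candidates.
                heapq.heappushpop(retained, -value)
            else:
                heapq.heappush(retained, -value)
    # Only the n retained values are sorted at the end.
    return sorted(-negated for negated in retained)


print(top_n_smallest([42, 7, 19, 3, 88, 3, 56, 1], 3))
```

&lt;br /&gt;Run as-is, the sketch prints [1, 3, 3].  The work done per value depends on n rather than on the size of the initial result set, which is why the approach pays off when n is small.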
&lt;br /&gt;&lt;/span&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/5789167109069672633/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6806654330436722244&amp;postID=5789167109069672633' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default/5789167109069672633'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default/5789167109069672633'/><link rel='alternate' type='text/html' href='http://www.catterallconsulting.com/2008/10/db2-notes-from-warsaw-day-2.html' title='DB2 Notes from Warsaw (Day 2)'/><author><name>Robert Catterall</name><uri>http://www.blogger.com/profile/12629696535422235653</uri><email>noreply@blogger.com</email></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6806654330436722244.post-7159987489532421086</id><published>2008-10-13T06:31:00.000-07:00</published><updated>2008-10-13T14:57:57.433-07:00</updated><title type='text'>DB2 Notes from Warsaw</title><content type='html'>&lt;span style="font-family:arial;"&gt;I'm in Warsaw, Poland, this week for the 2008 International DB2 Users Group European Conference.  It's been a good first day.  This is my first time in Poland, and I have to say that the level of human energy in Warsaw is quite impressive.  Construction cranes reach into the sky all around the city center, where multiple new commercial and residential high-rises are under construction (the Hilton, where we're meeting, is itself only about a year old).  
I've eaten very well, and I'm planning on trying the sushi place adjacent to the hotel sometime later in the week.&lt;br /&gt;&lt;br /&gt;A few notes and observations from Day 1:&lt;br /&gt;&lt;/span&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;The kick-off keynote was first-rate.&lt;/span&gt;  &lt;a href="http://www.bbc.co.uk/blogs/olympics/2008/03/about_marc_woods.html"&gt;Marc Woods&lt;/a&gt;, a gold medal-winning paralympic swimmer, delivered an excellent talk on the trials and triumphs he's experienced as a competitive swimmer since losing the lower part of one of his legs to cancer at age 17.  I've sat through many a keynote speech in my time.  Some stick with me, and some don't.  This one will.  Two key take-aways: don't be afraid to set audacious goals, and know that some wins require multiple years of planning and effort (so don't give up prematurely).&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;IDUG membership is growing nicely.&lt;/span&gt;  Just prior to Marc's keynote, IDUG President Julian Stuhler addressed the opening session audience and shared some information about the ongoing growth of IDUG membership.  There are now more than 12,000 registered IDUG members, representing more than 100 countries.  Registering as an IDUG member is easily done at the IDUG Web site (www.idug.org).  Basic membership is free, and premier membership, which offers additional benefits, is available for a modest annual fee (or by attending an IDUG conference such as this one going on now in Warsaw).&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;How big is a "big" DB2 for z/OS buffer pool?  Bigger than you might think. 
&lt;/span&gt;Thomas Baumann, a DB2 professional who has worked for Swiss Mobiliar (a large insurance company) for the past 16 years, delivered one of his typically excellent and information-packed presentations, this time on the subject of DB2 for z/OS virtual storage management (and, in particular, how it relates to the use of dynamic SQL).  In his session, Thomas mentioned that Swiss Mobiliar's primary DB2 for z/OS production subsystem has a buffer pool configuration that's in excess of 14 GB.  Swiss Mobiliar is running with DB2 V8, which brought 64-bit virtual and real storage addressing to the mainframe DB2 platform (up from the old 31 bits).  A few months ago, I posted a &lt;a href="http://www.catterallconsulting.com/2008/07/db2-for-zos-people-dont-be-memory.html"&gt;blog entry&lt;/a&gt; in which I urged DB2 for z/OS people to take advantage of 64-bit addressing, especially as it pertains to buffer pool sizing.  The folks at Swiss Mobiliar are certainly on board with regard to that message.  Thomas made an interesting observation: whereas people once thought of a 100,000-buffer pool as being big (that's 400 MB if we're talking about 4 KB buffers), now it's probably reasonable to think of a 100,000-buffer pool as being of medium size, with 1,000,000 or more buffers (4 GB if 4 KB buffers) being a better threshold for qualification as a "large" pool (and that's just one pool in what would likely be a multi-pool configuration).  
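&lt;br /&gt;&lt;br /&gt;The arithmetic behind those figures is easy to sketch.  The snippet below is just an illustration in Python; the function name (pool_size_mb) is made up for the example, and it uses the same round-number convention as the figures above (1 MB = 1,000 KB), so the results are approximations.&lt;br /&gt;&lt;br /&gt;

```python
def pool_size_mb(buffers, buffer_kb=4):
    """Approximate buffer pool size in MB, using round numbers (1 MB = 1,000 KB).

    The function name and the decimal convention are assumptions made for this sketch.
    """
    return buffers * buffer_kb / 1000


# A pool once considered big: 100,000 buffers of 4 KB each.
print(pool_size_mb(100_000))           # 400.0 (MB)
# A better threshold for a large pool today: 1,000,000 buffers of 4 KB each.
print(pool_size_mb(1_000_000) / 1000)  # 4.0 (GB)
```

&lt;br /&gt;&lt;br /&gt;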
Thomas's presentation (and all the others delivered here), crammed with performance analysis formulas and rules of thumb, will be available on IDUG's Web site within the next couple of months for IDUG premier member access, and nine months after that for basic-level member access.&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;DB2 for z/OS V9 has some very attractive features related to tables and tablespaces.&lt;/span&gt;  Phil Grainger of CA delivered an informative presentation on this topic.  He mentioned that the new SQL statement TRUNCATE TABLE essentially provides a means of doing with SQL what could be done before via an execution of the LOAD utility with a dummy input data set (i.e., an empty file) and the REPLACE option: namely, empty a table of data very quickly (even more quickly than a mass DELETE of a table in a segmented tablespace -  a process that just marks space occupied by table rows as being empty, versus actually deleting the rows one-by-one).  Staying with that analogy, Phil explained that the new ADD CLONE option of ALTER TABLE, combined with the new EXCHANGE statement, enables one to &lt;span style="font-style: italic;"&gt;very&lt;/span&gt; quickly do with SQL what could be done by way of the LOAD utility with the REPLACE option and a non-empty input data set.  Basically, you create a clone of a base table, then populate that clone with the data with which you want to replace the base table data, and then make the clone the new base table through execution of an EXCHANGE statement (this causes DB2 to update its catalog so that the table name is associated with the data set(s) of what had formerly been the clone table).  
Phil also talked up the benefits of the new universal tablespace (a tablespace that is both segmented&lt;span style="font-weight: bold;"&gt; &lt;/span&gt;&lt;span style="font-style: italic;"&gt;and&lt;/span&gt; partitioned), and of the new "by growth" table partitioning option (enabling one to get the benefits of partitioning -  especially useful for very large tables -  without having to specify partitioning range values).  Topping it off, we got some good information about reordered row format (now a DB2 standard), that being the label for a feature by which DB2 for z/OS V9 physically relocates varying-length columns to the "back" of table rows, "behind" all the fixed length columns, whilst continuing to return retrieved rows with the column order as specified in the CREATE TABLE statement (this under-the-covers column reordering enables DB2 to locate data in varying-length columns within a row in a much more efficient manner).&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;Trends favoring the use of DB2 for z/OS for data warehousing.&lt;/span&gt;  Over lunch with a couple of fellow attendees (including Lennart Henang of &lt;a href="http://henfield.se/"&gt;Henfield AB&lt;/a&gt;), I got into a discussion about the rise in data warehousing activity on the mainframe DB2 platform (about which I &lt;a href="http://www.catterallconsulting.com/2008/07/data-warehousing-on-db2-for-zos.html"&gt;blogged&lt;/a&gt; a few weeks ago).  It was mentioned that one driving factor could be the increased interest on the part of many organizations in getting closer and closer to "real time" BI, in which data changes are analyzed for insight very soon after (or even as) they occur (versus taking these changes and batch-loading them into a data warehouse nightly for analysis the next day).  
When that source data is in a DB2 for z/OS database, that can lead to a desire to have analysis also occur on that platform.  In fact, the now-64-bit architecture of DB2 for z/OS (enabling much larger buffer pool configurations), combined with the unparalleled ability of the z/OS operating system to manage concurrent workloads with widely divergent characteristics, is leading some companies to think very seriously about running analysis-oriented queries against the actual DB2 for z/OS database that is accessed by the operational online transaction-processing workload.  Think this can't be done?  Think again.  I'm not saying that you should toss a data warehouse system in favor of just querying the OLTP database (the database design of a data warehouse is generally much better suited to OLAP and other forms of intense analysis than is a typical OLTP database design), but some querying of the operational database for BI purposes could be a useful component of an overall business intelligence application system that would also feature a true warehouse.&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-family:arial;"&gt;That's it for now.  
More to come tomorrow.&lt;br /&gt;&lt;/span&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/7159987489532421086/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6806654330436722244&amp;postID=7159987489532421086' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default/7159987489532421086'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default/7159987489532421086'/><link rel='alternate' type='text/html' href='http://www.catterallconsulting.com/2008/10/db2-notes-from-warsaw.html' title='DB2 Notes from Warsaw'/><author><name>Robert Catterall</name><uri>http://www.blogger.com/profile/12629696535422235653</uri><email>noreply@blogger.com</email></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6806654330436722244.post-9025966003396580021</id><published>2008-10-06T22:31:00.000-07:00</published><updated>2008-10-07T21:04:21.192-07:00</updated><title type='text'>On DB2 and Table-Pinning</title><content type='html'>&lt;span style="font-family:arial;"&gt; I recently fielded, via e-mail, an entertaining question about "pinning" a DB2 table in memory (i.e., having the whole table resident in virtual storage on the DB2 server).  I use the term "entertaining" not  to poke fun at the question (or the questioner), but to convey that I enjoyed answering the question.&lt;br /&gt;&lt;br /&gt;Among some people who are familiar with certain non-IBM DBMSs but are relatively new to DB2, there can be a bit of confusion about table-pinning in a DB2 environment.  This may be due to the fact that table-pinning is covered in the documentation of some other DBMSs but is not mentioned in any of the DB2 manuals.  
Thus it is that one might conclude that DB2 does not support the pinning of tables in memory.  In fact, DB2 has ALWAYS supported table-pinning -  the difference is in nomenclature and in the process by which a DB2 table gets pinned.&lt;br /&gt;&lt;br /&gt;In some non-DB2 DBMS environments, a table that is to be pinned in memory must be explicitly marked as such.  With DB2, this is not the case.  Why?  Because some DBMSs do not allow a database administrator or systems programmer to control the size of a buffer pool, nor do they provide support for multiple buffer pools.  Instead, there is a single buffer pool (aka a page cache) and the DBMS dynamically determines the amount of memory to be allocated to the pool (based on the available memory on the server and the nature of the database-accessing workload).  Given that situation, a table that is not marked as to-be-pinned is unlikely to be entirely resident in memory unless it is quite small and very frequently accessed.&lt;br /&gt;&lt;br /&gt;Now, don't get me wrong -  DBMS management of buffer pool sizing can be quite useful in some cases, and indeed DB2 Version 9 provides a system-managed buffer pool sizing option.  Pinning a table in memory, however, is about the database administrator exerting control, and DB2 DBAs have been able to do this since Version 1 Release 1 back in the early 1980s.  How do I pin a DB2 table in memory?  Easy -  I just assign it to a buffer pool that has enough buffers to hold all the table's pages.  Here's a very simple example: I have a DB2 table, called XYZ, that occupies 1000 4K pages (the page size doesn't matter -  could be 4K pages, 8K, 16K, whatever).  I want to pin this table in memory.  I set up buffer pool BP8 (the actual buffer pool name doesn't matter) with 1000 4K buffers.  I assign table XYZ (actually, the tablespace in which XYZ is located) to BP8 and, voila, it's pinned in memory.  What if I also want to pin table ABC, with 500 4K pages, in memory?  No problem.  
I can set up BP9 with 500 buffers and assign table ABC's tablespace to that buffer pool, or I can add 500 buffers to BP8 and have both XYZ and ABC pinned in that pool.&lt;br /&gt;&lt;br /&gt;Of course, you could go overboard with table-pinning.  I, for one, would not want to pin a 1-terabyte table in memory.  On the other hand, I wouldn't confine my pinning plans to itty-bitty tables.  In today's 64-bit addressing world, you really can think about pinning a pretty big DB2 table -  say, one that has a hundred thousand or more pages.  I might actually do that if said table were really critical to the performance of an important application.&lt;br /&gt;&lt;br /&gt;How big is too big, when you're talking about table-pinning?  The answer to that question doesn't involve rocket science.  First of all, I would start out NOT pinning any tables in memory, and go the pinning route only if I need to boost the performance of a certain program or programs by eliminating read I/O wait time to the fullest extent possible.  If I do decide to pin, the amount of table-pinning that I do (in terms of the number of pages belonging to pinned tables) will depend in large part on the amount of memory that is available on the server.  If my server has 50 gigabytes of memory, should I think about pinning 50 gigabytes' worth of tables in memory?  Of course not -  you don't ever try to use ALL of a server's memory for DB2.  You also, however, don't want to under-utilize the server's memory resource (your company paid for the memory, so you want to get maximum bang for that buck).  When I'm trying to determine whether or not I'm using too much of a server's memory for DB2 buffers, I take a look at the rate of demand paging to auxiliary storage (that's a mainframe term, but you Linux/UNIX/Windows people will know what I mean).  
In other words, I want to see how often a referenced page (and I'm talking about ALL pages in memory, not just those holding DB2 data) has to be read in from auxiliary storage on disk (also known as page data sets in mainframe-ese) because it got paged out of virtual storage by the operating system (a system monitor product can provide this information).  If the rate of demand paging to auxiliary storage is less than one per second, my assessment would be that you're under-utilizing server memory, and a good way to rectify that situation would be to make more of that memory available to DB2 in the form of additional buffers, and one possible use of the extra buffer pool space would be more table-pinning.  If the rate of demand paging to disk is in the single digits of pages per second, I'd still be fine with giving more memory to DB2.  If the demand paging rate is in the double digits per second, I might want to pass on bulking up the DB2 buffer pools and doing more table-pinning.&lt;br /&gt;&lt;br /&gt;Can you pin DB2 indexes in memory, as well as (or instead of) tables?  Of course you can.  As with tables, it's a matter of assigning an index to a buffer pool that has enough buffers to hold all of the index's pages.&lt;br /&gt;&lt;br /&gt;Something I mentioned earlier bears repeating: it's a good idea to start out NOT pinning any DB2 objects in memory.  If you do that (i.e., you don't pin anything), and application performance is good, there's no need to pin.  Even if you need to cut down on disk read I/Os in order to improve application response time and/or throughput, try doing that by enlarging the buffer pool before you try pinning anything.  If a larger buffer pool doesn't do the trick, you can think about pinning certain DB2 objects in memory.&lt;br /&gt;&lt;br /&gt;So, don't be put off by differences in nomenclature and database administration in a DB2 environment versus other DBMSs.  
I put stuff in the trunk of my car, while my friends in the UK speak of stowing items in the boot, but we mean the same thing.  One has always been able to pin DB2 objects in memory, even though the DB2 doc doesn't use that particular term.  Go ahead and pin, my friends -  but pin wisely.&lt;br /&gt;&lt;/span&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/9025966003396580021/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6806654330436722244&amp;postID=9025966003396580021' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default/9025966003396580021'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default/9025966003396580021'/><link rel='alternate' type='text/html' href='http://www.catterallconsulting.com/2008/10/on-db2-and-table-pinning.html' title='On DB2 and Table-Pinning'/><author><name>Robert Catterall</name><uri>http://www.blogger.com/profile/12629696535422235653</uri><email>noreply@blogger.com</email></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6806654330436722244.post-7491172209768390198</id><published>2008-09-23T19:38:00.000-07:00</published><updated>2008-09-24T20:00:14.007-07:00</updated><title type='text'>Effective DB2 Application Tuning: Team-Based and Face-to-Face</title><content type='html'>&lt;span style="font-family:arial;"&gt;I recently spent an enjoyable few days engaged in a DB2 application performance tuning effort.  The work was fun because it was successful (we reduced the run time for some complex transactions by more than 90%), and it was successful largely because it involved a cross-functional team working in face-to-face mode.  
We had business experts (who knew about the functionality required of the application), application developers, data experts (very knowledgeable with regard to the logical design of the underlying database), and DB2 specialists.  Everyone contributed to the successful outcome.&lt;br /&gt;&lt;br /&gt;DB2 technical expertise is certainly helpful when it comes to improving the performance of a DB2 application, but alone it may not be sufficient to achieve really dramatic results.  Enlarging a DB2 buffer pool configuration might help by reducing I/O wait time.  Reclustering a table might lead to better locality of reference for some SQL statements (and, therefore, fewer associated DB2 page read requests).  Adding an index to a table could provide a more efficient data access path for certain queries.  Increasing the percentage of free space in index pages (when the key is not continuously-ascending) could reduce leaf page split activity and help to preserve fast index scan access.  All good, certainly, but big-time gains are often achieved at the statement level (as opposed to the database- or system-level), and this is where you really want to be able to leverage the complementary skills that an interdisciplinary team can bring to the table.&lt;br /&gt;&lt;br /&gt;Let me give you an example of what I'm talking about.  Cross-functional team sitting in a room, working to significantly reduce the run-time for a particular query at the heart of a complex transaction.  The query has an in-list non-correlated subquery predicate (of the form AND COL1 IN (SELECT...)), and there's an index on COL1 but it's not being used.  DB2 specialist in the room notes that the predicate references the inner table in a nested loop join operation, and an in-list non-correlated subquery predicate is not indexable in that situation. 
He goes on to point out that an in-list predicate with an actual list of values (literals or host variables, as in AND COL1 IN (:var1, :var2, :var3...)) as opposed to a subquery &lt;span style="font-style: italic;"&gt;would&lt;/span&gt; be indexable in this case. Data expert in the room indicates that the result set generated by the non-correlated subquery would be pretty small, numbering in the single digits of values.  Based on that information, an application developer in the room states that it would be pretty easy for the application code to build the query with an in-list with host variable values in place of the non-correlated subquery in-list (it's a dynamically constructed and executed SQL statement).  We try submitting the query with that change, DB2 uses an index in resolving the (now list-of-values) in-list predicate, and response time goes way down. Success, thanks to the contributions of several individuals on a query-tuning "SWAT team" (as the organization likes to call them) who applied their specialized knowledge in a complementary way.&lt;br /&gt;&lt;br /&gt;A similar example, from the same "SWAT" group: a different transaction is running too long, and the DB2 and data and application experts have already done what they can and are stumped as to what to do next to achieve the desired performance improvement. Business expert in the room speaks up and says that the really long-running part of the transaction is associated with functionality that in fact doesn't have to be delivered by that particular piece of the application.  Unneeded functionality removed - problem solved, thanks again to the presence of someone with the right domain knowledge in the room.&lt;br /&gt;&lt;br /&gt;Just about as important as having an application tuning team composed of people with DB2- and data- and application- and business-related expertise is having all of those people &lt;span style="font-style: italic;"&gt;in the same room&lt;/span&gt;. 
That may sound old-fashioned in this age of videoconferencing and virtual meetings, but I have time and again been impressed by just how much more effective a group of people working to solve a problem can be when they are working around the same physical (&lt;span style="font-style: italic;"&gt;not&lt;/span&gt; virtual) conference-room table. The ideas are better, and they come faster, and there can be a real snowball-rolling-down-the-hill effect, with people building on the contributions of others and adding their own contributions in a way that doesn't seem to happen when team members are distance-separated. Yes, travel costs raise expenses, but I feel that the return on that investment is typically very attractive. There have been plenty of times when I've been told that I have the choice of working with a group remotely or going to where the people are, and I always choose the latter route because I know that I'm much more productive when I can communicate in a face-to-face way with co-workers. Are organizations being "penny wise and pound foolish" when they refuse to foot the bill for getting problem solvers from different locations into the same room at the same time? If you have a really spread-out organization then you don't want to do the "hail, hail, the gang's all here" thing all the time, but given the kind of breakthrough productivity that can be realized through in-person communication, a few days here and there can go a long way with respect to achieving objectives.&lt;br /&gt;&lt;br /&gt;When the challenge is tough, get a group of people with the right mix of specialized knowledge, and get them together in the same place. 
It's a simple notion that can deliver powerful results.&lt;br /&gt;&lt;/span&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/7491172209768390198/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6806654330436722244&amp;postID=7491172209768390198' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default/7491172209768390198'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default/7491172209768390198'/><link rel='alternate' type='text/html' href='http://www.catterallconsulting.com/2008/09/effective-db2-application-tuning-team.html' title='Effective DB2 Application Tuning: Team-Based and Face-to-Face'/><author><name>Robert Catterall</name><uri>http://www.blogger.com/profile/12629696535422235653</uri><email>noreply@blogger.com</email></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6806654330436722244.post-4824764135277447885</id><published>2008-09-08T19:02:00.000-07:00</published><updated>2008-09-09T19:34:46.306-07:00</updated><title type='text'>DB2 for z/OS Data Warehousing: Query Performance</title><content type='html'>&lt;span style="font-family:arial;"&gt;&lt;a href="http://www.catterallconsulting.com/2008/08/db2-for-zos-performance-management-data.html"&gt;In my last installment&lt;/a&gt;, I riffed a little on DB2 for z/OS data warehousing.  I did a little compare and contrast on the topic of performance management in OLTP versus business intelligence application environments, and concluded with some words about system-level performance optimization for DB2-based BI workloads.  
Herewith, a few thoughts on the subject of analyzing and tuning the performance of data warehouse queries.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Don't be intimidated by the sheer size of a BI-oriented SELECT statement.&lt;/span&gt;  Complex queries of the type regularly found within a data warehouse workload can be downright scary-looking when initially examined (I was looking through one last week that featured a 27-table join operation). The biggest query can be broken down into bite-sized chunks. Keep in mind, too, that a "big" query, in terms of the number of tables referenced, is not necessarily a bad (i.e., poorly-written) query.  Particularly in a star-schema database design, in which a large, central fact table is surrounded (logically) by a multitude of dimension tables (some of which might have their own logically-associated sub-tables, making a diagram of the table arrangement look like a snowflake), it is not uncommon to see a lot of tables joined in a query.  Performance can still be fine, especially if the dimension tables are on the small side and only a small percentage of the fact table rows are accessed.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Get real comfortable with EXPLAIN.&lt;/span&gt; EXPLAIN output (the explanation of the access path chosen by DB2 for execution of a query) can be rather voluminous for a really big query, but, as just mentioned, you can get through it if you'll just be methodical and take it step-by-step. At each step along the way, ask yourself: does DB2's decision (regarding, for example, selection of an index, or use of a join method, or the order in which tables are joined) make sense to you? 
If not, figure out why you and DB2 disagree.&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;What do you know that DB2 doesn't?&lt;/span&gt; Suppose a query is running too slowly, and you determine that the cause is the sub-optimal access path chosen by the DB2 optimizer. If the choice of a different available path (perhaps using a certain index) seems obvious to you, and you're wondering why it isn't obvious to the optimizer, consider that you may have knowledge of the data that the optimizer doesn't have. The optimizer knows what's shown by the statistics in the DB2 catalog. How up-to-date and accurate are those statistics? This isn't just a matter of running RUNSTATS more frequently (though that's a good thing to do) -- it's also about &lt;span style="font-style: italic;"&gt;how&lt;/span&gt; you run RUNSTATS. For example, have you set up RUNSTATS to gather correlation statistics for a set of columns in a table via the COLGROUP keyword (with, perhaps, some distribution statistics thrown in through a specification of FREQVAL)? Want some help in figuring out the right way to run RUNSTATS to boost the performance of your data warehouse queries? 
Check out IBM's new (and &lt;span style="font-style: italic;"&gt;free&lt;/span&gt;) DB2 &lt;a href="http://www-01.ibm.com/software/data/db2/zos/downloads/osc.html"&gt;Optimization Service Center&lt;/a&gt; tool.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;You can generally index a table more liberally in a data warehouse versus an OLTP environment.&lt;/span&gt; When the workload is characterized by a high volume of quick-running transactions, with many (perhaps most) updating data in the DB2 for z/OS database, I like to be pretty conservative in terms of defining indexes on tables -- this because the aim is throughput and every index on a table makes every insert and every delete operation (and updates, if index-key column values are changed) more expensive.  In data warehouse systems, data updates are often accomplished en masse during nightly ETL (extract/transform/load) runs, and while these update operations have to complete within a certain time window, that can often be accomplished even with a pretty good number of indexes, on average, defined on tables in the database (in some cases, people will drop most indexes on data warehouse tables after online query hours, then update the table data, then rebuild the indexes before the start of the new query day). While I get uncomfortable if the number of indexes defined on a table in an OLTP system exceeds 4 or 5, I'm generally OK with more indexes per table -- maybe up to the vicinity of 8-10 -- when the database serves a BI purpose. 
More indexes are useful for BI applications because there is more uncertainty as to which columns will be referenced in the predicates of queries.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;Look for opportunities to use MQTs.&lt;/span&gt; Materialized query tables, relatively new on the mainframe DB2 scene, have for some time enabled DB2 for Linux/UNIX/Windows users to score big performance gains in data warehouse systems. The idea's pretty simple: a query may end up materializing a result set that is subsequently used in the generation of the final result set that's returned to the user. Such materializations are par for the course when aggregate functions (e.g., AVG and COUNT) are used. It can take a while to build this intermediate result set on the fly, and if it contains a lot of rows (it might not, even if it's based on an &lt;span style="font-style: italic;"&gt;evaluation&lt;/span&gt; of a lot of rows), query run time may be further elongated because the intermediate result set isn't indexed in any way. Enter the MQT -- essentially, a pre-built intermediate result set that is not only already-there, but indexable, as well. The really good part? DB2 can use an MQT in executing a query, even if the query doesn't explicitly reference the MQT -- DB2 is smart enough to figure out whether or not an MQT can be used to reduce a query's run time.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Partition those big tables.&lt;/span&gt; In an OLTP environment, partitioning decisions are often driven by availability considerations (e.g., reduced time to recover a partition versus an entire tablespace). For DB2-based BI applications, partitioning is a performance thing. Among other things, partitioning drives query parallelization. 
Something important to keep in mind: with DB2 V8 and beyond, a table can be partitioned on one key and clustered on another (meaning that data within a partition can be clustered by something other than the partitioning key). What's a big table? I'd say, most anything with a million or more rows.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Make the right table clustering decisions.&lt;/span&gt; Speaking of clustering, this is a really big deal in a data warehouse environment, where bunches of rows are typically retrieved in the execution of a single query (versus the smaller result sets -- often just one row -- that are more the norm for OLTP applications).  When you're going after a lot of rows, locality of reference (having the desired rows physically close to each other) can make quite a difference in query run-times. Also, when tables A and B are frequently joined, it's generally a VERY good idea to have the tables clustered in the same way.&lt;br /&gt;&lt;/span&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/4824764135277447885/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6806654330436722244&amp;postID=4824764135277447885' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default/4824764135277447885'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default/4824764135277447885'/><link rel='alternate' type='text/html' href='http://www.catterallconsulting.com/2008/09/db2-for-zos-data-warehousing-query.html' title='DB2 for z/OS Data Warehousing: Query Performance'/><author><name>Robert Catterall</name><uri>http://www.blogger.com/profile/12629696535422235653</uri><email>noreply@blogger.com</email></author><thr:total 
xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6806654330436722244.post-4757330001800526553</id><published>2008-08-27T06:59:00.000-07:00</published><updated>2008-08-27T13:53:09.325-07:00</updated><title type='text'>DB2 for z/OS Performance Management -- Data Warehouse vs. OLTP</title><content type='html'>&lt;span style="font-family:arial;"&gt;Much of the consulting work I've done lately (and that I will be doing over the next several weeks) has been focused on DB2 for z/OS-based data warehouse performance tuning. As this subject is very much on my mind, I've decided to share some related thoughts in my blog.&lt;br /&gt;&lt;br /&gt;Managing performance in a data warehouse environment is a different animal relative to the same task when the DB2 application is transactional in nature (e.g., an order processing or an online banking application), because these system types differ in several important ways:&lt;br /&gt;&lt;/span&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;SQL statement run time:&lt;/span&gt; In some OLTP (online transaction processing) environments, more than a thousand database-accessing transactions are executed every second, on average, during peak processing hours. Each of these transactions will typically contain multiple SQL statements, each of which will usually complete within a small fraction of a second of wall-clock time (while consuming an even smaller fraction of a second of CPU time).  In a data warehouse, some report-generating SELECT statements -- particularly of the month-end or quarter-end variety -- will run for several minutes, and maybe for an hour or more.&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;Concurrent threads: &lt;/span&gt;In a busy OLTP system, there could be multiple hundreds (perhaps a thousand or more) of concurrently active DB2 threads. 
A data warehouse is more likely to have fewer concurrently active threads (fewer in-flight units of work that remain in flight for longer periods of time, as noted above).&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;The "come-from" environment: &lt;/span&gt;Although stored procedures are an increasingly popular choice for mainframe DB2-based OLTP applications, it remains true that a whole lot of transaction-oriented SQL gets to DB2 for z/OS through local (to DB2) subsystems such as CICS and IMS/TM.  In a data warehouse environment, it is much more likely that queries will get to DB2 (and associated result sets will be returned to requesters) via DB2 Connect and the DB2 for z/OS Distributed Data Facility (aka DDF).&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-weight: bold;"&gt;Database design: &lt;/span&gt;&lt;/span&gt;A DB2 database used for OLTP work will likely have a traditional third-normal-form (or close to it) design. Increasingly, DB2 data warehouse databases are dimensional in their design, with sets of related tables arranged logically in a star-schema fashion.&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;&lt;span style="font-weight: bold;"&gt;Update windows: &lt;/span&gt;&lt;/span&gt;For an OLTP application, there often is no window of time during which database updates are processed. Instead, database updates happen 24/7. For many data warehouses, virtually all data updates are processed during nightly -- and often massive -- ETL runs (ETL is short for extract/transform/load). 
Query access is often unavailable while the ETL process is running, making timely completion of said process (e.g., by 6:00 AM each weekday morning) a matter of considerable importance.&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;Query result sets:&lt;/span&gt; These are generally rather small in an OLTP environment -- usually fewer than 100 rows, and sometimes only one or two rows. For a data warehouse, result sets that feed reports may contain many thousands of rows -- perhaps hundreds of thousands or more.&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;Query complexity:&lt;/span&gt; SELECT statements in online transactions tend to be quite simple -- maybe one or two tables accessed, and little or no dynamic table-building or data value or data type transformation. Some SELECTs associated with a data warehouse might be more than a page long (if you print them out), often involving large-scale joins, on-the-fly table-building (via nested table or common table expressions), recursive SQL (great for navigating hierarchies of data), data value transformation (via CASE expressions), and data type transformation (via CAST specifications and/or scalar functions).&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;Response time versus throughput: &lt;/span&gt;&lt;span&gt;Response time is, of course, not unimportant for OLTP applications, but the relatively simple nature of&lt;/span&gt; the transactions (plus indexes on the right DB2 table columns) often leads to sub-second run times. On top of that, a change in a transaction's elapsed time from, say, 0.85 seconds to 0.95 seconds (or the other way around) is likely to go unnoticed by system users. 
The focus of performance tuning efforts is often throughput: the number of transactions that can be run through the system in a given period of time. For a data warehouse, the aim of performance tuning tends to be reduced run times for individual queries (from an hour to under 10 minutes, for example, or from 10 minutes to less than one minute). &lt;br /&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;SQL you have to deal with, versus SQL you wrote (or at least reviewed and approved):&lt;/span&gt; In a data warehouse environment, SQL might be generated by a reporting or OLAP tool (OLAP stands for online analytical processing), and a DB2 DBA might not be able to change it. Result: it has to be dealt with as-is. Furthermore, some SELECTs may be relatively ad-hoc in nature -- either generated in real-time by a query tool or built dynamically by client-side code based on the selection by a user of some combination of presented search criteria (along with ORDER BY and GROUP BY options). Not every column can be indexed, and it's not always easy to anticipate the predicates that will come in with queries. In the OLTP world, on the other hand, application code reviews can ferret out poorly-written SQL statements (which can then be recoded) as well as providing information that can lead to smart indexing decisions.&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-family:arial;"&gt;Although DB2 for z/OS data warehouse performance monitoring and tuning is largely SQL statement-centered, system performance optimization is still important (a well-tuned SELECT might still run poorly if the overall DB2 system is hobbled by a resource bottleneck).  I always analyze system performance when I'm working with a DB2 for z/OS-based data warehouse.  
I'll list here a few of the system-related things that I like to look at, and in a subsequent post (probably sometime next week) I'll write about SQL statement analysis and tuning.  So, some key system-level metrics and checklist items:&lt;br /&gt;&lt;/span&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;Memory --&lt;/span&gt; &lt;a href="http://www.catterallconsulting.com/2008/07/db2-for-zos-people-dont-be-memory.html"&gt;In a previous post&lt;/a&gt;, I encouraged people to leverage the 64-bit real and virtual storage addressing capability that is relatively new on the DB2 scene (introduced with DB2 for z/OS V8). This is especially important for data warehouse workloads, which tend to be I/O intensive (lots of large index and table scans).  Bigger buffer pools are almost always better, but be smart about growing them (also covered in the cited post) and know when they're big enough (I might consider putting the brakes on buffer pool enlargement if the rate of demand paging to auxiliary storage for the z/OS LPAR on which DB2 is running gets into the low single digits per second).&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;CPU efficiency --&lt;/span&gt; Also in a previous post, I touted the CPU efficiency gains that could be realized through &lt;a href="http://www.catterallconsulting.com/2008/07/note-on-db2-for-zos-page-fixed-buffer.html"&gt;page-fixing DB2 buffer pools&lt;/a&gt;.  This is a pretty easy change that can yield a decent return (not spectacular, but not bad, either). Also of major importance is the "page-look efficiency" of a DB2 for z/OS-based data warehouse system.  By this I mean the ratio of GETPAGEs (requests by DB2 to look at index and tablespace pages) to the number of rows that are returned to requesters. 
Bring that ratio down (often through physical database design changes such as adding or altering indexes, re-clustering data rows, creating materialized query tables, etc.), and you've improved the CPU efficiency of your data warehouse workload (the CPU consumption of a DB2 workload is very much dependent on the level of GETPAGE activity in the system).&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-weight: bold;"&gt;Query parallelism --&lt;/span&gt; If your data warehouse system has some CPU headroom (i.e., if it's not consistently running at a utilization rate of 90% or more), consider trading some of the CPU "white space" for improved response time -- especially for queries that scan LOTS of tablespace and/or index pages -- through query CPU parallelism (if you're not already using this DB2 feature). It's been around since Version 4 of the mainframe DB2 product (and so is very much field-tested), and it can work across multiple DB2 subsystems in a parallel sysplex if you're running DB2 in data sharing mode. Concerned about one query splitting a lot and swamping your system?  Don't be.  First of all, you can put a cap on the degree to which DB2 will parallelize a query (you could limit this degree to 5 or 10, for example). Second, I don't know of any operating system that is the equal of z/OS when it comes to dynamically shifting processor resources among tasks in response to workload shifts. If a query splits 20 ways and is taking up a large chunk of CPU capacity in a z/OS LPAR that otherwise would not be very busy, so what? That's what parallelism is about. If some work does come into the system while that 20-way-split query is humming along, z/OS will throttle back resources allocated to the split query and reallocate them to the newly-arrived tasks. And, it does this very quickly. 
So don't be afraid to split.&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-family:arial;"&gt;As mentioned, I'll cover SELECT statement analysis and tuning in my next post (or very soon thereafter, if I feel compelled to blog about something else between now and then).&lt;br /&gt;&lt;/span&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/4757330001800526553/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6806654330436722244&amp;postID=4757330001800526553' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default/4757330001800526553'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default/4757330001800526553'/><link rel='alternate' type='text/html' href='http://www.catterallconsulting.com/2008/08/db2-for-zos-performance-management-data.html' title='DB2 for z/OS Performance Management -- Data Warehouse vs. OLTP'/><author><name>Robert Catterall</name><uri>http://www.blogger.com/profile/12629696535422235653</uri><email>noreply@blogger.com</email></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6806654330436722244.post-4980336899421194661</id><published>2008-08-14T11:32:00.000-07:00</published><updated>2008-08-14T18:37:59.208-07:00</updated><title type='text'>Some Interesting DB2 for z/OS Data Sharing Trends and Issues</title><content type='html'>&lt;span style="font-family: arial;"&gt;Once again, I've let more time than usual pass since the last post to my blog, the cause (again) being a period of particular busyness on my part.&lt;br /&gt;&lt;br /&gt;I've just finished teaching a DB2 data sharing implementation class for a large company in the health-care industry.  
The experience had me thinking a lot about the way data sharing worked, and how organizations used it, back when the functionality was introduced with DB2 Version 4 for OS/390 (predecessor to z/OS).  I'll share some of those thoughts in this entry.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Prime motivation: from capacity to availability.&lt;/span&gt; When DB2 data sharing showed up in the mid-1990s, IBM had recently decided to shift from very expensive (and very powerful) bipolar chip sets in its mainframe "engines" to less expensive (and -- at first -- much slower) CMOS microprocessors (the same integrated circuit technology used in PCs and other so-called distributed systems servers).  In order to support what were then the world's largest mainframe applications, big companies -- manufacturers, banks, brokerages, retailers, railroads, and package delivery firms, to name some of the represented industries -- had to harness the capacity of several of the new CMOS mainframes in a single-logical-image fashion. The mainframe cluster was (and is) called a parallel sysplex, and DB2 data sharing leveraged that shared-disk clustering architecture to enable multiple DB2 subsystems on multiple servers to concurrently access a single database in read/write mode.&lt;br /&gt;&lt;br /&gt;Well, the CMOS-based processors in today's mainframes are way faster than the bipolar variety ever were. Not only that, but you can get way more of them in one server than you could back in the nineties -- and tons more memory, to boot: the current top of the mainframe line, the IBM z10, can be configured with up to 64 engines and 1.5 &lt;span style="font-style: italic;"&gt;terabytes&lt;/span&gt; of central storage. Even for a huge DB2 data-serving workload, it's likely that few organizations would need processing capacity beyond what's available with one of these bad boys. Still, companies continue to implement parallel sysplexes and DB2 data sharing. Why? 
Availability, my friend. With a DB2 data sharing group, planned outages for DB2 maintenance (or z/OS or hardware maintenance) can be virtually eliminated, as you can apply software or hardware fixes with ZERO database downtime. DB2 data sharing on a parallel sysplex can also greatly reduce the impact of unplanned outages. A company with which I worked had a DB2 subsystem in a data sharing group fail (something that hadn't happened in a LONG time), &lt;span style="font-style: italic;"&gt;and system users didn't even notice the failure&lt;/span&gt;: database access continued via the other DB2 subsystems in the group, while the failing DB2 member was automatically (and quickly) restarted by the operating system.&lt;br /&gt;&lt;br /&gt;The primary motivation for implementing a DB2 data sharing group these days is the quest for ultra-high availability.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Plenty of DB2 subsystems in a group, but fewer hardware "footprints."&lt;/span&gt; A few years ago, it was not unusual to find three or more mainframes clustered in a parallel sysplex. With the processing capacity of individual mainframe "boxes" getting to be so large (through a combination of faster engines and more engines per server), organizations increasingly opt for two-mainframe sysplexes (also contributing to the decrease in hardware "footprints" within parallel sysplex configurations: the growing use of internal coupling facilities -- running in logical partitions within mainframe servers -- versus standalone external coupling facilities).  Interestingly, as the number of physical boxes in companies' sysplexes has declined, the number of DB2 subsystems in the data sharing groups running on these parallel sysplexes has often stayed the same or even gone up. I know of an organization that runs a nine-way DB2 data sharing group on a two-mainframe parallel sysplex. Why so many?  
There are several reasons:&lt;br /&gt;&lt;/span&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family: arial;"&gt;Having at least two DB2 subsystems per mainframe in the sysplex allows you to fully utilize the processing capacity of each mainframe even when you have a DB2 subsystem down for maintenance (recall that more and more organizations are using DB2 data sharing to enable the application of hardware and software maintenance without the need for a maintenance "window").&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family: arial;"&gt;When a DB2 subsystem in a data sharing group fails, the X-type locks (aka modify locks) held by that subsystem at the time of the failure are retained until that subsystem can be restarted (usually automatically, either in-place or on another server in the parallel sysplex) to free them up (this is done to protect the integrity of the database). If the data sharing group has more members, the number of retained locks held by a given member in the event of a failure is likely to be smaller, reducing the impact of the failure event. 
Additionally, having the same workload spread across more members could speed up restart time for a failed member, as there might be somewhat less data change roll-forward and rollback work to do during restart.&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family: arial;"&gt;Having more members in a data sharing group reduces the log-write load per member, as each member writes log records only for changes made by programs that execute on the member.&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family: arial;"&gt;The cost of going from n-way to n+1-way data sharing (once you've gone to 2-way data sharing) is VERY small, so the overhead cost of having more DB2 subsystems in a data sharing group is typically pretty insignificant.&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-family: arial;"&gt;&lt;span style="font-weight: bold;"&gt;Binding programs with RELEASE(DEALLOCATE) is not the data sharing recommendation it once was.&lt;/span&gt; Prior to DB2 for z/OS Version 8, XES (cross-system extended services, the component of the operating system that handles interaction with coupling facility structures such as the global lock structure) would perceive that an IS lock on tablespace XYZ held by a program running on DB2A is incompatible with an IX lock requested for the same tablespace by a program running on DB2B, when in fact the two locks are compatible. The local lock managers associated with DB2A and DB2B (i.e., the IRLMs) would figure out that the two locks are not in conflict with each other, but only after some inter-system communication that drove up overhead. 
In order to avoid incidences of such perceived-but-not-real global lock contention (called XES contention), people would bind programs with RELEASE(DEALLOCATE) to have them retain tablespace locks across commits. In a related move, people would seek to use more transactional threads that last through commits, such as CICS protected entry threads (batch threads automatically persist through commits, deallocating only at end-of-job).&lt;br /&gt;&lt;br /&gt;DB2 Version 8 introduced a clever new global locking protocol that eliminates perceived IX-IS and IX-IX inter-system tablespace lock contention. One effect of this development is that the decision on whether to bind programs with RELEASE(DEALLOCATE) or RELEASE(COMMIT) is now pretty much unrelated to data sharing. Do what you'd do (or as you've done) in a non-data sharing DB2 environment.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;The "double-failure" scenario is not so scary anymore.&lt;/span&gt; What I'm referring to here is the situation that can arise if the global lock structure used by a data sharing group is located in an internal coupling facility (ICF) that is part of a mainframe on which at least one DB2 member of the same data sharing group is running. In that case, if the whole box (the mainframe server) fails, both the lock structure and a related DB2 subsystem fail. When that happens, the lock structure can't be rebuilt because the rebuild process needs information from all of the DB2 members in the group, and -- as mentioned -- at least one of those DB2 subsystems failed when the mainframe failed. 
Without a lock structure, the whole group will fail, and a group restart will be necessary to get everything back again.&lt;br /&gt;&lt;br /&gt;Nowadays, with (as pointed out earlier) many organizations having fewer (though much more powerful) mainframe servers than they'd had some years ago, and with many companies strongly preferring to use ICFs (they are attractively priced versus standalone external CFs), folks are wanting to implement two-mainframe parallel sysplexes with an ICF in each mainframe. That set-up makes the "double-failure" scenario a possibility. Know what I say to folks leaning towards this configuration? I tell them to go ahead and to not sweat a "double failure," because 1) it's exceedingly unlikely (remember, the whole mainframe server would have to fail, and on top of that, it'd have to be the one with the ICF holding the lock structure), 2) even if it does happen, the group restart will clean everything up so that no committed database changes are lost, 3) group restart is a better-performing process than it was  in the earlier years of data sharing, when processors were slower and the recovery code was less sophisticated than it is in current versions of DB2, and 4)  given items 1-3 in this list, I can understand why organizations would balk at paying extra -- either in the form of an external CF or the overhead of system duplexing of the lock structure -- to avoid a situation that has a very low probability of occurrence and that does not (in my opinion) constitute a "disaster" even if it does occur.&lt;br /&gt;&lt;br /&gt;So, the DB2 data sharing landscape has changed since DB2 Version 4.  Here's something that hasn't changed: DB2 data sharing on a parallel sysplex is the most highly available and scalable  data-serving platform on the market. 
It's great technology that just keeps getting better.&lt;br /&gt;&lt;/span&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/4980336899421194661/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='https://www.blogger.com/comment.g?blogID=6806654330436722244&amp;postID=4980336899421194661' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default/4980336899421194661'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6806654330436722244/posts/default/4980336899421194661'/><link rel='alternate' type='text/html' href='http://www.catterallconsulting.com/2008/08/some-interesting-db2-for-zos-data.html' title='Some Interesting DB2 for z/OS Data Sharing Trends and Issues'/><author><name>Robert Catterall</name><uri>http://www.blogger.com/profile/12629696535422235653</uri><email>noreply@blogger.com</email></author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6806654330436722244.post-3814068611647667573</id><published>2008-07-28T09:27:00.000-07:00</published><updated>2008-07-28T11:34:16.131-07:00</updated><title type='text'>Data Warehousing on DB2 for z/OS</title><content type='html'>&lt;span style="font-family: arial;"&gt;Not so long ago, a lot of people had the idea that if you wanted to do advanced, large-scale (multi-terabyte) data warehousing with DB2, your best bet, platform-wise, was DB2 for Linux, UNIX, and Windows (LUW). 
This widespread impression was based on several factors, including:&lt;br /&gt;&lt;/span&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family: arial;"&gt;IBM's own product marketing and packaging efforts, which emphasized DB2 for LUW as the data warehouse platform of choice.&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family: arial;"&gt;The absence on the z/OS platform of some product features that were very useful from a business intelligence perspective (including materialized query tables, better star-join query optimization, and 64-bit addressability).&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family: arial;"&gt;A richer SQL on the DB2 for LUW platform, including support for common table expressions (which enable powerful data manipulation via recursive SQL) and convenient result set comparisons via EXCEPT and INTERSECT operations.&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family: arial;"&gt;A perception that the mainframe platform was not a cost-effective data warehouse solution.&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-family: arial;"&gt;It's time now for people who have looked upon DB2 for z/OS as an also-ran BI platform to rethink their opinions, because the situation with respect to all of the factors cited above has changed:&lt;br /&gt;&lt;/span&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family: arial;"&gt;IBM is promoting DB2 for z/OS as well as DB2 for LUW as a prime-time data warehousing platform.&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family: arial;"&gt;DB2 for z/OS has materialized query tables (which can dramatically improve response times for queries that involve data aggregations and/or large-scale joins), advanced star-join query optimization (important for so-called dimensional data warehouses),  and 64-bit addressing support (great for reducing I/O wait times via extra-large buffer pools).&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family: arial;"&gt;SQL on the DB2 for 
z/OS platform has been enriched with BI-friendly features such as common table expressions (enabling you to do with one SQL statement what might have taken several SQL statements and some user programming before), and (in V9) the INTERSECT and EXCEPT clauses of SELECT (both of which make the coding of result set comparisons much easier). Also available on System z now are advanced XML data management and query capabilities that were introduced with DB2 9 for 