The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

DBD::MVS_FTPSQL - DBI driver to query IBM DB2 mainframe databases through an IBM FTP server.

VERSION

This documentation refers to DBD::MVS_FTPSQL version 0.38.14.

SYNOPSIS

  require DBI;
  
  #DSN String that identifies the DB2 subsystem
  my $DSN = "hostname=...;ssid=..." ;

  #A mainframe account able to perform SELECTs  
  my ($username, $password) = ('...','...');
  
  my $dbh = DBI->connect("dbi:MVS_FTPSQL:$DSN", $username, $password) 
    or die "Cannot connect: " . $DBI::errstr;

It connects to the DB2 subsystem $ssid of the mainframe whose hostname is $hostname. Refer to the DSN string section of this document for details on matching the configuration of the mainframe you want to connect to (in particular the attributes hostname, port and ssid). It could be a good idea to take also a look at the EXAMPLES section below.

DESCRIPTION

This pure Perl module lets you submit SQL queries (that's it, only SELECT statements) to a DB2 subsystem installed on a mainframe, provided that:

  • The IBM FTP Communications Server (CS) on the mainframe side was installed with the (optional) SQL query function enabled ( see the section Installing the SQL query function on the Communications Server for additional information on this subject).

  • You supply, as mentioned in the SYNOPSIS section above, enough information, inside the DSN string of the DBI::connect() statement, to identify and locate the ssid (the four character DB2 subsystem identifier) you want to connect to and the hostname and port of the IBM FTP CS that will taxi your queries.

  • You can submit queries via QMF (Query Management Facility) or, in other words, your account has SELECT privileges on the subsystem that contains the tables you want to query.

  • Your account on the mainframe has write permissions to either a "swap" directory (that you can specify via the DSN facultative attribute remote_directory) or your home directory (if you omit it). This directory will be utilized as a temporary storage area for the files containing the sql statements you submit. No harm will be done to other files located there. For more details read the DSN string section of this document.

This document focuses primarily on specific issues regarding this particular DBI driver and it assumes that you are familiar with the DBI architecture. If not the case, please read the DBI documentation first to acquire a general knowledge of its classes and methods.

DSN string

The following instruction:

  my $dbh = DBI->connect("dbi:MVS_FTPSQL:$DSN", $username, $password)
            or die $DBI::errstr;  

establishes a connection to the DB2 subsystem identified by the DSN string $DSN, using as login credentials the $username and $password supplied. Notice that in the OS/390 or z/OS environment you don't connect to a database, instead you have to connect to the DB2 subsystem which gives access to all the databases it contains (a mainframe database has very little to do with its pc counterpart as it is basically a logical grouping of tables, other objects and so, in many ways, it's more similar to a pc schema).

The DSN string consists of a list of argument=value pairs separated by semicolons, like the example below:

  # Identifies the DB2 subsystem DDB2 accessible through an IBM FTP CS 
  # running at foo.com:9999
  my $DSN = 'hostname=foo.com;port=9999;ssid=DDB2'; 

The following is a list of allowed arguments and their meaning, arranged in order of relevance.

hostname

The mainframe hostname or ip address. This argument is mandatory.

port

Denotes the port on which the IBM FTP CS installed on the mainframe is listening and defaults to 21. Do not confuse this (ftp) port with the DRDA port; this is not a DRDA driver.

ssid

A 4 character string representing a DB2 subsystem identifier. You can omit this attribute and rely on the default specified in the IBM FTP CS configuration dataset FTP.DATA (the IBM book Communications Server: IP Configuration Reference covers this topic in detail ).

If, however, while connecting to the DB2 subsystem <ssid>, you receive the error message:

  DBD::MVS_FTPSQL::st execute failed: attempt to connect to <ssid> failed

this means that the default value doesn't match an existing DB2 subsystem and you need to explicitly specify it. Notice that this is the same error message that you receive if the ssid you specify doesn't exist. If you don't know the ssid of the DB2 subsystem(s) installed on the mainframe you are trying to connect to ask your system administrator or read the "How to find out the DB2 subsystem IDs" section of this document.

remote_prefix

The first 4 characters of the temporary dataset name that will be used to store the query on the mainframe. The default prefix is 'FSQL', so the dataset will be named with the first unassigned string of the sequence FSQL0001, FSQL0002, FSQL0003 and so on (this prevents overwriting existing files). Please note that you don't need to interact with those files directly as they only serve to upload the query to the mainframe before the execution and they will be deleted suddenly after. The files will be allocated for the minimum retention possible period (0 days) so if something goes wrong they will be deleted anyway the next day.

remote_directory

The "directory" (more precisely the first n-1 name segments of a dataset name, composed of n name segments, where the n-th name segment can be inappropriately called the "filename") on the mainframe where the temporary datasets are stored during the query execution phase. The directory must be writable for the user. The default is the user home directory. If specified, this attribute must obey MVS data set naming convention:

  • it may consist of one or more name segments separated by a period

  • every name segment can be 1 to 8 alphanumeric (A-Z plus 0-9) or national (-,#, @, $) characters long but

  • a restriction to the previous rule is that the first character of every name segment can't be a number or a sign (-)

If the dataset name is preceded by two slashes, then the path will be interpreted as absolute, otherwise the path will be evaluated as relative to the user home directory. Valid values for this attribute are:

  //FOO.BAR 
  //FIRST.SECOND.THIRD  
  DB2TEMP 
  ONE.TWO.THREE                

where the latter two are relative paths. If the user's home directory being connected with the mainframe is JSMITH and the preceding attribute remote_prefix is set to FDB2, then the temporary files will be created in the form:

  //FOO.BAR.FDB2XXXX 
  //FIRST.SECOND.THIRD.FDB2XXXX 
  //JSMITH.DB2TEMP.FDB2XXXX 
  //JSMITH.ONE.TWO.THREE.FDB2XXXX  

where XXXX is the smallest zero-padded 4-digit integer that will produce an unique filename. See how the total length of one of those dataset names must not exceed 44 characters.

timeout

The "client" timeout. Specifies the maximum time in seconds the client will wait for a response from the server before signaling failure to the caller. The default is 120 seconds and should be appropriate for most situations.

Don't confuse the "client" timeout DSN attribute with the "server" timeout that is instead the timeout after which an idle connection to the database (the IBM FTP server, in our case) is closed. To hold a connection open and get rid of the server timeout you should periodically issue ping commands (internally redefined as noop).

How to find out the DB2 subsystem IDs

I've asked this question to the comp.databases.ibm-db2 group and Jeroen van den Broek kindly answered me:

    Every DB2 subsystem has at least 3 address spaces associated with it, the names of which all start with the subsystem-id (SSID):

      <SSID>MSTR = system services address space
      <SSID>DBM1 = database services address space
      <SSID>DIST = DDF (distributed data facility) address space

    (next to these, you might have others, like the Stored Procedures address space and the IRLM (Integrated Resource Lock Manager) address space, but naming for these is not fixed) You should be able to identify your SSID's via SDSF's "Status Display" (option ST on SDSF's Primary Option Menu). Use the following subcommands to show the various types of fixed address spaces:

      SELECT *MSTR
      SELECT *DBM1
      SELECT *DIST

    from which you can deduct your SSID's.

Note that SDSF (System Display and Search Facility) is an IBM product which interfaces with the MVS spool that, among other things, allows the user to list all the jobs on the spool, not only the ones whose name starts with his user-id.

Locking and concurrency considerations

This section discusses issues related to how the intrinsic constraints of the "SQL through FTP" feature influence the concurrent access of data. An explanation of the concepts behind locking, concurrency and the way DB2 implements it is far beyond the scope of this document and although a brief introduction of isolation levels is provided, that knowledge is taken for granted. Please consult your DB2 documentation for a more thorough overview. It is also worth checking out a couple of interesting articles of Roger E. Sanders published on Db2 Magazine.

Ok, after having said that...

IBM DB2 for OS/390 or later supports four levels of isolation. These, ordered from the more to the less restrictive, are:

Repeatable Read (RR)

Share locks are acquired on all the rows referenced (not only those ones that will be returned) and they are released only when the transaction is committed or rolled back. Other concurrent transactions can't acquire exclusive locks on those rows (and hence will have to wait before modify the data) until the transaction owning the locks terminates. This prevents any interference between transactions themselves (the same query issued multiple times within the same transaction will ever return the same data) but also decreases concurrency, causing a slow down in performance.

Read Stability (RS)

Share locks are acquired only for those rows that are part of a result set. This prevents dirty reads (the reading of uncommitted data) and nonrepeatable reads while phantoms phenomena (described below) can occur. If a query is issued more than once in the same transaction, it may get additional (precisely phantom) rows, as another concurrent transaction can insert rows that match the search criteria of the query.

Cursor Stability (CS)

This is the default isolation level. It locks only the row (the page) that is currently being returned. As the cursor leaves the row, the lock is released and acquired for the next one, until all the data is returned. While this maximizes concurrency and prevents dirty reads it does not ensure that the data retrieved will not be changed by other transactions, so if the transaction reads the same row of data more than once odds are it gets different results each time (nonrepetable read phenomena).

Uncommitted Read (UR)

With this isolation levels the transaction (almost) doesn't acquire locks and doesn't check if the data that is retrieving is locked. This, at the price of risking reading non committed data, leads to two main advantages:

  • Better performance if compared with other isolation levels.

  • Ensures that a deadlock condition can not occur.

Notice that with this driver you can override the default isolation level only at query level. You can do so by ending the statement with a "with" clause whose syntax is:

  (fullselect)  WITH [RR|RS|CS|UR]

as illustrated by the following example:

  SELECT * FROM SYSIBM.SYSDUMMY1 WITH UR

When using IBM FTP CS as a medium to submit queries, there are two main limitations that affect your control over the way the data is locked and isolated between concurrent processes. These limitations are:

  • The inability to control transaction boundaries. That is, you can only issue select statements; every other statement - and this includes also commit and rollback - are not permitted. To put it briefly, you can't disable autocommit.

  • When a statement is execute()d, all the data requested it's transferred from the mainframe to the pc altogether. Subsequently, when you fetch() the rows, you will interact with a local temporary copy of the data.

The first condition implies it's not possible in any way to protect your application against nonrepetable read and phantom phenomena between two different executions of the same query. The second one makes de-facto the choice of RR or RS as isolation levels useless, because while you are (locally) fetching the data, the transaction is already terminated ( the real fetching of data happens contextually to the statement execution).

Using RR as isolation level, can make a difference when your query (maybe with the auxilium of a subquery) accesses the same table more than once, like in the following example:

  select max(salary) as sal_ko from staff where 
    salary = ( select max(salary) as sal_ok from staff)

If this query it's not executed with an isolation level RR, it may return a null value instead of the maximum salary. Let's clarify why this can take place. During execution, the table staff is processed two times, first time to determine the maximum salary sal_ok and later, to check which salary corresponds to sal_ok. If, between the two phases, a transaction that modifies the maximum salary is committed (like an update that increases the salary of that staff member) then sal_ko will not match any value.

Since RR or RS don't work as they should, this leave us with two options. Specifying CS (or omitting it as it's the default) and retrieve only data committed after the execution of the statement, or choosing UR and retrieve also non committed data. Remember that, in any case, you won't see any changes, committed or not, happening while you're fetching data because, as stated before, you're working on a local copy of the resultset, that was internally fetched during the execution of the query.

Although these limitations may seem harsh, it is important to realize that in the majority of the cases, CS or also UR are the best choice, because maximize concurrency and hence performance. This is particularly true when retrieving data from a mainframe, because there are a lot of other processes that accesses data, potentially more critical than your application (CICS applications for example).

EXAMPLES

Example 1: retrieving a single row of data

  use warnings;
  use strict;
  use DBI;

  # It connects to the IBM FTP CS server running at IP 123.456.789.123, port 4021
  #
  # All the queries will be routed to the DB2 subsystem DB2P
  #
  # jsmith/123456 must be a valid mainframe account able to query the tables of DB2P
  #
  my ($hostname,$port,$ssid) = ('123.456.789.123',4021,'DB2P');
  my ($username,$password)   = ('jsmith','123456');
  my $dbh = DBI->connect("dbi:MVS_FTPSQL:hostname=$hostname;port=$port;ssid=$ssid", $username, $password,
                        { RaiseError => 1 }) ||  confess  $DBI::errstr;
  #Notice that RaiseError is set to 1 so we don't need to test for the return code of each method call

  #Prepares the query
  my $sth = $dbh->prepare(<<EOSQL);
  SELECT 
     max (SALARY) as MOST_PAYED
    ,min (SALARY) as LESS_PAYED
  FROM Q.STAFF
  WHERE
    JOB = 'CLERK'
  OPTIMIZE FOR 1 ROWS
  WITH UR
  EOSQL

  #Executes it
  $sth->execute();

  #Retrieves the data  
  my ($clerk_max, $clerk_min) = $sth->fetchrow_array();

  $sth->finish();
  $dbh->disconnect();

Example 2: read (and structure) all the data at once

  use warnings;
  use strict;
  use DBI;
  use Data::Dumper qw(Dumper);

  # It connects to the IBM FTP CS server running at bigiron.localdomain
  # with the default port 21
  #
  # All the queries will be submitted to the default DB2 subsystem
  #
  # jsmith/123456 must be a valid mainframe account able to query the default DB2 subsystem
  my ($hostname,$username,$password) = ('bigiron.localdomain','jsmith','123456');
  my $dbh = DBI->connect("dbi:MVS_FTPSQL:hostname=$hostname", $username, $password,
                        { RaiseError => 1 }) or die $dbh->errstr;

  #Prepare, execute and retrieve all the databases in the selected 
  #subsystem, all in a single call and returns a reference to an hash of hash,
  #where the index key of the first hash is the database name.
  #To return the data in the form of an array of array or an array of hashes see
  #the method selectall_arrayref
  my $db_list = $dbh->selectall_hashref( "SELECT * FROM SYSIBM.SYSDATABASE",'NAME');

  print Dumper($db_list);

  $dbh->disconnect();

Example 3: looping through the data

  use warnings;
  use strict;
  use DBI;

  # Same ssid location as in example 2, but this time the temporary datasets
  # will be in the form: //TEMPDS.FTPSQL.QRYXXXXXXXX
  my ($hostname,$username,$password) = ('bigiron.localdomain','jsmith','123456');
  my ($remote_directory,$remote_prefix) = ('//TEMPDS.FTPSQL','QRY');

  my $dbh = DBI->connect(
    "dbi:MVS_FTPSQL:hostname=$hostname;remote_directory=$remote_directory;remote_prefix=$remote_prefix"
   ,$username, $password,
   { RaiseError => 1}) ||  confess  $DBI::errstr;

  my $sth = $dbh->prepare(<<EOSQL);
    SELECT 
       PARTNAME AS PART 
      ,PRODUCT AS PROD 
      ,PRODPRICE AS PRICE 
    FROM 
       Q.PARTS AS PT
      ,Q.PRODUCTS AS PR
    WHERE
          PT.PRODNO = PR.PRODNUM
      AND PR.PRODPRICE <= 
          (SELECT AVG(PRODPRICE) * 2 FROM Q.PRODUCTS)
    order by price, product
    fetch first 4 rows only
    WITH CS
  EOSQL

  $sth->execute();

  #Prints The column headers
  print join("\t",@{$sth->{'NAME'}})."\n";

  #Prints the data
  while (my @row  = $sth->fetchrow_array()) {
    print join("\t",@row)."\n";
  }

  $sth->finish();
  $dbh->disconnect();

There is not much more to say about using this driver to establish a connection. In the following examples, we will assume that a database handle $dbh to an active connection with a mainframe is available and we will focus on other aspects.

Example 4: binding parameters (input)

  my $sth = $dbh->prepare(<<EOSQL);
    SELECT 
       PARTNAME AS PART 
      ,PRODUCT AS PROD 
      ,PRODPRICE AS PRICE 
    FROM 
       Q.PARTS AS PT
      ,Q.PRODUCTS AS PR
    WHERE
          PT.PRODNO = PR.PRODNUM
      AND PARTNAME  = ?
    WITH CS
  EOSQL

  foreach my $partname (qw (WIRE BEARINGS COPPER)) {                                                          
    $sth->bind_param(1, $partname, SQL_VARCHAR);                                                                 
    $sth->execute();
    print join("\t",$sth->fetchrow_array())."\n";                                                                   
  } 

Example 5: binding columns (output)

  my $sth = $dbh->prepare(
    'SELECT PRODNAME, PRODPRICE FROM Q.PRODUCTS order by 1,2'
  );
  $sth->execute();
  
  my ($name,$price,$total)=('',0,0);
  $sth->bind_columns(\$name,\$price);
  
  my $delimiter = '-'x21 ."\n";
  
  print $delimiter;
  printf ("%-10s %10s\n" ,"Name","Price");
  print $delimiter;
  
  while ($sth->fetch()) {
    $total +=$price;
    printf ("%-12s %8.2f\n", $name,$price);
  }
  
  print $delimiter;
  printf ("%21.2f\n" ,$total);

The expected output is:

  ---------------------
  Name            Price
  ---------------------
  GENERATOR       45.75
  SCREWDRIVER      3.70
  SHAFT            8.65
  SWITCH           2.60
  RELAY            7.55
  SOCKET           1.40
  MOTOR           35.80
  CAM              1.15
  GEAR             9.65
  BUSHING          5.90
  SAW             18.90
  HAMMER           9.35
  CHISEL           7.75
  WRENCHSET       25.90
  ---------------------
                 184.05

Example 6: table_info()

  # Returns all the tables of the sample schema.
  # See the DBI manual for details on this method.
  #
  # Notice that DB2 does not have the concept of a catalog so
  # $catalog should ever be set to undef
  my ($catalog, $schema, $table, $type) = (undef ,'Q','','');
  my $sth = $dbh->table_info( $catalog, $schema, $table, $type );

  #Fetch all the rows in the form of an array of hashes where
  #the keys of the hashes are the column names
  my $refAOH = $sth->fetchall_arrayref({});
  $sth->finish();
  print Dumper($refAOH);

Example 7: column_info()

  # Returns all the columns of tables of the sample schema that
  # starts with 'N'
  my ($catalog, $schema, $table, $column ) = (undef ,'Q','','N%');
  my $sth = $dbh->column_info($catalog, $schema, $table, $column);
  my $refAOA = $sth->fetchall_arrayref([2,3,5,8]);
  $sth->finish();
  
  #Returns a list of fully qualified columns and their type and length in bytes
  printf ("%20s %20s %10s %7s\n".'-'x50,'TABLE','COLUMN','TYPE','LENGTH');
  print '-'x60 . "\n";
  map { printf ("%20s %20s %10s %7d\n",@{$_}) => $_ } @{$refAOA};

the output should be something like:

                 TABLE               COLUMN       TYPE  LENGTH
  ------------------------------------------------------------
             INTERVIEW            STARTTIME       TIME       3
           OBJECT_DATA                  SEQ   SMALLINT       2
                 PARTS               SUPPNO       CHAR       5
              PROFILES                SPACE       CHAR      50
              PROFILES             SYNONYMS    VARCHAR      31
               PROJECT               STARTD       DATE       4
                 SALES           SALESREPNO   SMALLINT       2
                 STAFF               SALARY    DECIMAL       5
              SUPPLIER                STATE       CHAR       2
              SUPPLIER               STREET    VARCHAR      15
  

INSTALLATION & PREREQUISITES

This driver relies on the following other Perl modules:

  Carp
  DBI
  IO::File
  Net::FTP

The automatic installation procedure via the CPAN module is the most recommended:

  perl -MCPAN -e "install Bundle::DBD::MVS_FTPSQL"

If you have never invoked CPAN, it will run through a series of configuration questions such as which CPAN mirror site to use. It's important that the network setup questions are answered correctly because network configuration errors may prevent access to CPAN repository and thus the download of the modules.

To install this module manually, run the following commands:

    perl Makefile.PL
    make
    make test
    make install

In order for this module to be of any use you need to have access to a mainframe running an ftp server configured for executing SQL query via FTP (see the section below to find out how this feature can be installed).

Installing the SQL query function on the Communications Server

To install the optional SQL query function and access the DB2 subsystems from FTP you need to bind the DBRM (Database Request Module) called EZAFTPMQ, located in the SEZADBRM library, to the plan used by FTP and grant execution privileges for that plan to PUBLIC. A sample JCL, that needs to be customized to perform the bind, is EZAFTPAB and can be found in the library SEZAINST. It is also advisable that your system administrator creates, if not already present, the FTP.DATA configuration data set and:

  • adds a DB2 statement that provides the ssid to be used by default.

  • specifies the DB2PLAN of the FTP server.

The IBM books entitled "<Your OS Version> Communications Server: IP Configuration Guide" describes such operations in detail.

CAVEATS

Rounding error when fetching numbers with more than 16 digits.

Due to a bug on the "sql through ftp" feature, when a numeric value is returned, only its first 16 digits are significant. This is shown by the following query:

  select 19999999999999999 from sysibm.sysdummy1

that returns the incorrect value 20000000000000000. A workaround for this bug is to cast to char every field returned by the query that is declared DECIMAL(17,..) or more, The query in the example above will become as follows:

  select char(19999999999999999) from sysibm.sysdummy1

Note that this affects only returned data, so any column of a subquery or any literal constant passed with the query are immune to this bug. The following queries do not require any fix.

  select empno from jobhistory where id=12345678901234567890

  SELECT  NUM - 990000000000000000 FROM (                                   
    SELECT 999999999999999999 AS NUM FROM SYSIBM.SYSDUMMY1        
  ) AS X  

"Fetch failed: Horizontal tab found. ..." error message

The "sql through ftp" feature returns data in the form of lines of text in which every field is delimited by tab characters (\x09 in ASCII that corresponds to \x05 in EBCDIC). If one of the field returned contains a tab character this driver fails to distinguish between values and delimiters and will return the error given in the title of this paragraph. This bug can be reproduced by the following query:

  SELECT 'X'|| CHAR(X'05') ||'X' AS TXT FROM SYSIBM.SYSDUMMY1

A workaround for this bug is , in case you receive the mentioned error, that you replace the problematic character with another string, like for example a blank character as in the query below:

 select TRANSLATE (TXT, ' ', x'05') from (
   SELECT 'X'|| CHAR(X'05') ||'X' AS TXT FROM SYSIBM.SYSDUMMY1
 ) as x

As for the preceding bug, this affects only data that is returned, so any column of a subquery or any literal constant passed with the query is immune.

Nullable columns are erroneously fetched as a sequence of '-'

  select NULLIF(1,1) FROM SYSIBM.SYSDUMMY1

This bug is a direct consequence of the way the "sql through ftp" feature returns nulls. A workaround for this consists in applying the function COALESCE() (whose alias is VALUE()) to all the returned columns that can contain null values,as in the following example.

  select coalesce (A,0) from (
    select NULLIF(1,1) AS A FROM SYSIBM.SYSDUMMY1
  ) as X

Notice that also this bug as the preceding concerns only fetched columns.

TO DO

-) Implement other database handle methods like primary_key, foreign_key_info, etc

-) Write more tests

AUTHOR

Clemente Biondo, <clemente.biondo@gmail.com>

ACKNOWLEDGEMENTS

I wish to thank Sonia Ingrassia for the very careful review of this document. I'd also like to thank to Jeroen van den Broek for his answer to the question: "How to find out the DB2 subsystem IDs".

And I wish to thank you for reading this far. I hope this work will be useful to you to some degree. Any comment will be really appreciated!.

SEE ALSO

DBI

Books:

  • IBM Communications Server IP: Configuration Guide

  • IBM Communications Server: IP Configuration Reference.

COPYRIGHT & LICENSE

Copyright 2007 Clemente Biondo <clemente.biondo@gmail.com>, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.