Friday, 1 July 2011

Perl tutorial-3

3. Useful Examples


3.1 Processing colon delimited Excel files

This example takes in a text file as parameter and creates a set of SQL statements which will create Oracle tables. 
#! /usr/local/bin/perl

# this is the normal perl formatting mechanism
# we don't've any standard headers, so we make
# STDOUT_TOP as blank.
format STDOUT_TOP =
. 
# now we format and print the column types nicely
# the @<<<... is the picture and for each picture line, we
# must have the next line made of the variables.
# during printing, picture is substituted by the actual
# values in the corresponding variables
# @< will left align, @> for right align, @| for center-align
format STDOUT =
   @<<<<<<<<<<<<<<<<<<<<<<<< @<<<<<<<<<<<<<<<<<<<< @<<<<<<<<<<<<<
$column_name, $data_type, $nullable
.

# you can have multiple table definitions in one input file
# so keep a count
$tableno=$colno=0;

# similarly, we will store the comments in hashes
%colcmnts=%tblcmnts=();
LINE: while(<>){
        chomp; # remove the new line character
        next LINE if (/^$/); # ignore null lines
        next LINE if (/^\s*#/); # ignore comment lines
        ($column_name, $data_type, $precision, $nullable, $comment) = split /:/;
        $column_name =~ tr/[A-Z]/[a-z]/ ; #lowercase column name
        if ($data_type eq ""){ #no data-type ? then this is a table
                $table_name = $column_name;
                $colno=0;
                print ");\n\n" unless ($tableno==0);
                print "CREATE TABLE $table_name AS (\n";
                $tableno++;
                #store table comment if there is a comment
                $tblcmnts{$table_name} = $comment unless ($comment eq "");
                next LINE;
        }
        #column comment
        $colcmnts{"$table_name.$column_name"} = $comment unless ($comment eq "");
        $colno++;
        #if precision is specified, we need to put it inside parantheses
        $data_type .="($precision)" unless ($precision eq  "");
        if ($colno==1) {
                $column_name = " $column_name";
        } else {
                $column_name = ",$column_name";
        }
        $data_type =~ tr/[a-z]/[A-Z]/ ; # uppercase datatype
        $nullable =~ tr/[a-z]/[A-Z]/ ;  # upper case "not null"
        # print to the format defined before
        write ;
}
print ");\n\n" ;



# print table comments
foreach my $key (sort keys(%tblcmnts)) {
        $comment = "'$tblcmnts{$key}'";
        print "COMMENT ON TABLE $key IS $comment;\n";
}
print "\n\n" ;

# print column comments
foreach my $key (sort keys(%colcmnts)) {
        $comment = "'$colcmnts{$key}'";
        print "COMMENT ON COLUMN $key IS $comment;\n";
}
print "\n\n" ;

__END__
# from here is the sample input file
# columns are
# column(table):data type:precision:nullable:comment
# for table name, data type is null
BARS_BATCH::::Batch Header Table
batch_number:number:5:not null:The batch number(sequential)
deposit_date:date::not null:Deposit date of the batch
payments:number:3:not null:Number of payments in the batch
payment_amount:number:11,2:not null:Dollar amount of the batch
pieces:number:3:not null:Number of cheques in the batch
payment_method:varchar2:2:not null:Method of payment(Cheque,Cash...)
clerk:varchar2:10:not null:Who entered the batch
origin:varchar2:2:not null:Where the batch originated (Field, HO...)
dirty:varchar2:1::Is the batch marked as dirty(1,0)
actual_payments:number:3::Number of payments actually entered
actual_amount:number:11,2::Amount actually entered
creadt:date::not null:Date batch was created
modidt:date:::Date batch was last modified
sts:varchar2:1:not null:Status of the batch
errcode:varchar2:16::Errors in the batch

BARS_GIFTS::::Batch Detail Table
batch_number:number:5:not null:The batch number(sequential)
doc_number:number:2:not null:The gift doc number(sequential within batch)
page:number:2:not null:Page number
account_id:number:8:not null:Active/new account id for the member
source:varchar2:14:not null:Active source
fund:varchar2:16::Active fund
gift_type:varchar2:2::Active gift type
credit_account:varchar2:4:not null:Active credit account (check the size!)
handle_flag:varchar2:1::Special handling flag
payment_amount:number:11,2:not null:Gift amount
total_payment_amount:number:11,2:not null:Cheque amount
creadt:date::not null:Date gift was created
modidt:date:::Date gift was last modified
errcode:varchar2:16::Errors in the gift

BARS_ACCOUNTS::::New members
account_id:number:8:not null:New account id for the member
title:varchar2:8::Title for the new member
first_name:varchar2:20::First name
middle_name:varchar2:20::Middle name
last_name:varchar2:40:not null:Last name(In TA, can be null)
suffix:varchar2:8::Suffix
phone_number:varchar2:15::Phone number
street_number:varchar2:8::Street number
street_name:varchar2:30::Street name
apt_no:varchar2:8::Apartment number
zipcode:varchar2:5::Zipcode
zipcode_ext:varchar2:5::Zipcode extension
city:varchar2:30::City
state:varchar2:2::State
freeline:varchar2:50::Free comments
extraline:varchar2:50::Free comments 2

BARS_CODES::::Default codes for LOVs
code_type:varchar2:2:not null:Code type
code:varchar2:20:not null:Code
codelb:varchar2:40:not null:Description

3.2 Processing fixed format text files

Following is an extract from one of our SQL*Loader control files. Note that the ocr_gift_fate and ocr_deposit_date columns are same and that there is a filler between positions 21 and 27. 
INTO TABLE ACQUIRED_DATA
   WHEN record_type =  'T'
(
record_type  POSITION(001:001) CHAR
 "DECODE (:record_type, 'T', 'BT', 'X', 'FT', 'D', 'D', 'O')",
ocr_batch_number POSITION(002:010) CHAR,
ocr_gift_date POSITION(011:021) CHAR,
ocr_deposit_date POSITION(011:021) CHAR,
target_payment_num POSITION(027:029) INTEGER EXTERNAL,
target_payment_amt POSITION(030:040) DECIMAL EXTERNAL
)
Here is our Perl code to read all the T records, split the record into corresponding variables and then print the batch number, payment number and the amount.
#! /usr/local/bin/perl -w

# read standard input
LINE : while(<>) {
    #ignore records other than  batch headers
    next LINE unless /^T/;

    # remove the new line character
    chomp;

    # split the record!
    ($rec_type, $ocr_batch_number, $ocr_gift_date,
     $filler, $target_payment_num,
     $target_payment_amount) = unpack("A1 A9 A11 A5 A3 A11",$_);

     #convert the number fields from scalar string to scalar number!
     $target_payment_num += 0;
     $target_payment_amount += 0;

     #voila! print it
     print "$ocr_batch_number, $target_payment_num, $target_payment_amount \n";
}

3.3 Report Generation and Formatting

This is same as the previous code except that we now format the output nicely (well, it is debatable whether this is nice!). We have also added a subroutine commify that will format numbers by adding commas. 
#! /usr/local/bin/perl

# use the POSIX module to access only the function
# strftime - to format a date nicely
use POSIX(strftime); 
# and use it right-away to format current time
# as MM/DD/YY HH:MI
$today=strftime("%m/%d/%y %H:%M",localtime());

# this subroutine adds commas to a number
#
sub commify {
        my $input = shift;
        $input = reverse $input;
        $input =~ s<(\d\d\d)(?=\d)(?!\d*\.)><$1,>g;
        return reverse $input;
}

#define the page header
## $% is the page number
format STDOUT_TOP=
                      THE NATURE CONSERVANCY
@<<<<<<<<<<<<<       Upload File Batch Report          Page : @>>>
$today,$%
------------------------------------------------------------------
Batch Number        Gift Date     Payments                  Amount
------------------------------------------------------------------
.

# define the page
format STDOUT=
@<<<<<<<<<<<<<<<<   @<<<<<<<<<<    @>>>>>>  @>>>>>>>>>>>>>>>>>>>>>
$ocr_batch_number,$ocr_gift_date,$target_payment_num,$target_payment_amount
.

# $= is the lines per page . Normal printers have this as 59
$= = 59;

# initialize the variables that hold report totals
$sum_num = $sum_amount = 0;

# read standard input
LINE : while(<>) {
        #ignore records other than  batch headers
        next LINE unless /^T/;

        # remove the new line character
        chomp;

        # split the record!
        ($rec_type, $ocr_batch_number, $ocr_gift_date,
         $filler, $target_payment_num,
         $target_payment_amount) = unpack("A1 A9 A11 A5 A3 A11",$_);

         #convert the number fields from scalar string to scalar number!
         $target_payment_num += 0;
         $target_payment_amount += 0;

         # add to the totals
         $sum_num += $target_payment_num;
         $sum_amount += $target_payment_amount;

         #add commas to the number
         $target_payment_num = &commify($target_payment_num);
         # dollar amount should have 2 decimal places 
         $target_payment_amount = "\$".&commify(sprintf("%.2f",$target_payment_amount));

         #voila! print it
         #print "$ocr_batch_number, $target_payment_num, $target_payment_amount \n";
         write;
}

##
# print a line before printing totals
#
$ocr_batch_number = "---------";
$ocr_gift_date = "-----------";
$target_payment_num = "------";
$target_payment_amount = "----------------------";
write;

##
# print totals
#
$ocr_batch_number = "TOTAL";
$ocr_gift_date = "";
$target_payment_num = &commify($sum_num);
$target_payment_amount = "\$".&commify(sprintf("%.2f",$sum_amount));
write;

3.4 DBM Databases

DBM is a standard UNIX database, which store data as key-value pairs. In this example, we will read in the batch records, check against a DBM database whether the batch exists in that. If it exists, we will print an error and if not we will insert the batch and values. This code helps in checking for duplicate uploading of batches. 
#! /usr/local/bin/perl

# set the name of your DBM file
$DBM_FILE = "batches.db";

# this will create two files, one for data and one for index
dbmopen %HASH, $DBM_FILE, 0666
    or die "Can't open $DBM_FILE: $!\n";

# read standard input
LINE : while(<>) {
        #ignore records other than  batch headers
        next LINE unless /^T/;

        # remove the new line character
        chomp;

        # split the record!
        ($rec_type, $ocr_batch_number, $ocr_gift_date,
         $filler, $target_payment_num,
         $target_payment_amount) = unpack("A1 A9 A11 A5 A3 A11",$_);

                #convert the number fields from scalar string to scalar number!
                $target_payment_num += 0;
                $target_payment_amount += 0;

                # key is the batch number
                # value is batch date + payments + amount
                # all joined by :
                ($Key,$Value) = ($ocr_batch_number,"$ocr_gift_date:$target_payment_num:$target_payment_amount");
                ##
                # check whether this batch is already loaded
                if ( defined($HASH{$Key}) ) {
                        # if so, print an error
                        ($b_date,$b_payments,$b_amount)=split(/:/,$HASH{$Key});
                        print "Error: The batch $Key ($b_payments for \$$b_amount) is already uploaded\n";
                } else {
                        # else, add to the batch database
                        $HASH{$Key} = $Value;
                }
}
dbmclose %HASH;

3.5 Exercise

Using the examples above, write a program to read all batch records from an input file, verify against a DBM database and print a formatted report. Duplicate batches should also be indicated in the report. Try to split the tasks (verifying against the database, reporting etc) into individual subroutines. 
Also add another routine to generate an Excel CSV file report, in addition to the normal report.

0 comments:

Post a Comment

 
Related Posts Plugin for WordPress, Blogger...