Monday, November 7, 2011

OLAP Window for Running Balance

Quick! What does this do? ...

How about this one?
... I thought not, neither do I.

But if you do know what they mean, this might not be the right blog for you :)

What's the point?


The point is, not only is the saying true "You can't judge a book by its cover", sometimes you can't judge an article by its first page.

Or, in the case of this Sybase iAnywhere White Paper, the first 20 pages...
Analytic Functions in SQL Anywhere by G. N. Paulley and B. Lucier
...not if you're looking for information about the new OLAP WINDOW, PARTITION and RANGE keywords and the related aggregate functions like RANK(), ROW_NUMBER() and FIRST_VALUE().

In other words, if you're not looking for a detailed discussion of statistical minutae formulae involving VAR_POP(), STDDEV_SAMP() and the like, don't give up, just skip forward to page 21: that's where the fun begins.

Question: How do I compute a running balance?


It's curious that among all the excellent examples in Analytic Functions in SQL Anywhere so few (i.e., none) have exactly matched my requirements over the years... but that's OK; sometimes students must do their own homework :)

In Example 11 on page 22, however, the article does come close; "cumulative shipped quantity" is pretty much the same as "running balance", isn't it? We shall see...

Example 11 (Queries with window functions)

Consider the following query which lists all products shipped in July and August 2001 and the cumulative shipped quantity by shipping date:

Select p.id, p.description, s.quantity, s.ship date,
       Sum(s.quantity) Over (Partition by s.prod id
                             Order by s.ship date
                             Rows Between Unbounded Preceding
                                          and Current Row) as cumulative qty
From sales order items s Join product p On (s.prod id = p.id)
Where s.ship date Between ‘2001-07-01’ and ‘2001-08-31’
Order by p.id

Here are the tables for the running balance computation:
CREATE TABLE opening_balance (
   account_number      VARCHAR ( 10 ) NOT NULL PRIMARY KEY,
   amount              DECIMAL ( 11, 2 ) NOT NULL );

CREATE TABLE transaction (
   account_number      VARCHAR ( 10 ) NOT NULL REFERENCES opening_balance,
   transaction_number  BIGINT NOT NULL DEFAULT AUTOINCREMENT PRIMARY KEY,
   transaction_date    DATE NOT NULL DEFAULT CURRENT DATE,
   transaction_type    VARCHAR ( 10 ) NOT NULL CHECK ( transaction_type IN ( 'Debit', 'Credit' ) ),
   amount              DECIMAL ( 11, 2 ) NOT NULL );

INSERT opening_balance VALUES 
   ( '200', 100.00 ),
   ( '300', 658.39 );

INSERT transaction VALUES 
   ( '200', DEFAULT, '2011-10-20', 'Debit',    20.00 ),
   ( '300', DEFAULT, '2011-10-21', 'Debit',   927.47 ),
   ( '200', DEFAULT, '2011-10-23', 'Credit',  200.00 ),
   ( '300', DEFAULT, '2011-10-25', 'Credit', 1242.67 ),
   ( '200', DEFAULT, '2011-10-29', 'Credit',  400.00 ),
   ( '300', DEFAULT, '2011-10-30', 'Credit',   78.05 ),
   ( '200', DEFAULT, '2011-11-01', 'Debit',   100.00 ),
   ( '300', DEFAULT, '2011-11-02', 'Debit',    63.89 );

COMMIT;

Here's a solution that uses SUM() OVER an OLAP WINDOW:
SELECT transaction.account_number,
       transaction.transaction_number,
       transaction.transaction_date,
       transaction.transaction_type,
       transaction.amount,
       opening_balance.amount
          + SUM ( CASE transaction_type 
                     WHEN 'Credit' THEN transaction.amount
                     WHEN 'Debit'  THEN -transaction.amount
                     ELSE transaction.amount
                  END CASE
           ) OVER running AS running_balance
  FROM transaction
          INNER JOIN opening_balance
          ON opening_balance.account_number = transaction.account_number
WINDOW running AS ( 
          PARTITION BY transaction.account_number
          ORDER BY transaction.transaction_number
          RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW )
 ORDER BY transaction.account_number,
       transaction.transaction_number;

The WINDOW clause on lines 16 through 19 divides all the rows into partitions by account_number, and then defines the partition in terms of the current row: for each row in the result set, the partition is defined as that row plus all the rows that precede it and have the same account number. In other words, the window is a "running partition" that grows bigger with each row.

The SUM() OVER expression on lines 7 through 12 adds up (and in the case of debits, subtracts) all the transaction.amount values in the partition. In other words, it computes a "running total" of all the transactions.

The reference to opening_balance.amount on line 6 takes care of the fact that a "running balance" is required, which is subtly different from a "running total"... sales quotas might reset at the beginning of each reporting cycle, but banks have longer memories :)

Here's the output:
account_number transaction_number transaction_date transaction_type amount running_balance

'200'          1                  '2011-10-20'     'Debit'           20.00    80.00
'200'          3                  '2011-10-23'     'Credit'         200.00   280.00
'200'          5                  '2011-10-29'     'Credit'         400.00   680.00
'200'          7                  '2011-11-01'     'Debit'          100.00   580.00

'300'          2                  '2011-10-21'     'Debit'          927.47  -269.08
'300'          4                  '2011-10-25'     'Credit'        1242.67   973.59
'300'          6                  '2011-10-30'     'Credit'          78.05  1051.64
'300'          8                  '2011-11-02'     'Debit'           63.89   987.75



Did you notice?
Both are relatively new to SQL Anywhere, both are small improvements that nevertheless make life easier.



Dilbert.com


4 comments:

Anonymous said...

Breck,

I would suggest to add a link from the according SQL Anywhere forum question to this elaborate answer.

You seem to hesitate to link to your blog from there - which is a pity IMHO:)

As your blog seems to get regular ideas from the forum questions (and answer attempts), it would be nice to connect both - and folks like me could add such links themselves but this seems like "earning reputation for someone else's great work":)

Regards
Volker

DazLiquid said...

Although that leaves you open to criticism for not posting things in the forum and linking to external sites instead. In this case the answer was given in the forum and the questioner was happy, and anyone who cares about what Breck thinks :) will have seen his view on his website. Seems like the perfect solution really.

Obviously it's a different matter if Breck had already discussed something on his site before the question is asked.

Breck Carter said...

@Volker and @DazLiquid: Blog articles are often written a few days ahead of the post date, so links can't be added to the forum right away... and then, being old, I often forget :)

DazLiquid said...

@Breck I'll let you off then :), especially as this is the only explanation of a window that I've read and understood :-) ta very much.