Monday, November 22, 2010

Pushing OPENSTRING and CROSS APPLY

The word "pushing" in the title has two meanings: First, to promote (push) the use of OPENSTRING to solve real-world problems, and second, to push OPENSTRING and CROSS APPLY beyond the examples shown in the Help.

Simply put...

SELECT ... FROM OPENSTRING is like coding SELECT ... FROM LOAD TABLE, and

CROSS APPLY lets you join OPENSTRING with another table in the FROM clause.

(For more information, scroll down to the "openstring-expression" section on the FROM clause page in the Help; also see Joins resulting from apply expressions.)

Let's say you have a table that contains a tab-delimited list of field values in a single string column, and you want to split those values apart and store them in separate columns in another table.

OPENSTRING and CROSS APPLY will let you do that in a single INSERT SELECT statement.

Here's an example of the input table; each value in the line_text column contains a tab-delimited list of field values describing a single radio program: station WAAV, dates Mon-Fri, time 10am and so on:
CREATE TABLE raw_text (
   line_number BIGINT NOT NULL 
                  DEFAULT AUTOINCREMENT
                  PRIMARY KEY CLUSTERED,
   line_text   LONG VARCHAR NOT NULL DEFAULT '' );

INSERT raw_text ( line_text ) VALUES ( 
   'Play - not available now\x09\x09WAAV\x09Mon-Fri\x0910am\x092 hours\x09Leland\x09NC\x09980 AM\x09\x09' );
INSERT raw_text ( line_text ) VALUES ( 
   'Play - not available now\x09\x09KFAR\x09Mon-Fri\x0912pm\x093 hours\x09Fairbanks\x09AK\x09660 AM\x09\x09' );
INSERT raw_text ( line_text ) VALUES ( 
   'Play - not available now\x09\x09News Talk 550\x09Mon-Fri\x0910am\x092 hours\x09Gainesville\x09GA\x09550 AM\x09\x09' );
INSERT raw_text ( line_text ) VALUES ( 
   'Play - not available now\x09\x09News Radio 1390\x09Mon-Fri\x0910am\x092 hours\x09Jackson\x09TN\x091390 AM\x09\x09' );
INSERT raw_text ( line_text ) VALUES ( 
   'Play - not available now\x09\x09KENN\x09Sat\x095pm\x093 hours\x09Farmington\x09NM\x091390 AM\x09' );
COMMIT;
Here's a table designed to hold each field value in a separate column:
CREATE TABLE radio_programs (
   line_number BIGINT NOT NULL PRIMARY KEY CLUSTERED,
   station     VARCHAR ( 100 ) NULL,
   dates       VARCHAR ( 100 ) NULL,
   times       VARCHAR ( 100 ) NULL,
   duration    VARCHAR ( 100 ) NULL,
   city        VARCHAR ( 100 ) NULL,
   state       VARCHAR ( 100 ) NULL,
   frequency   VARCHAR ( 100 ) NULL,
   line_text   LONG VARCHAR NOT NULL );
Here's the INSERT SELECT that fills the radio_programs table:
INSERT radio_programs
WITH AUTO NAME
SELECT *
  FROM raw_text 
       CROSS APPLY
       OPENSTRING ( VALUE STRING ( raw_text.line_text ) )
             WITH ( TABLE radio_programs (
                          filler(),
                          filler(),
                          station,
                          dates,
                          times,
                          duration,
                          city,
                          state,
                          frequency,
                          filler() ) )
             OPTION ( DELIMITED BY '\x09' ) AS programs;
Lines 6 through 18 shows the OPENSTRING call which acts like LOAD TABLE on the string in raw_text.line_text.

The VALUE clause on line 6 specifies where the input is coming from, and the OPTION clause on line 18 tells OPENSTRING that raw_text.line_text is tab-delimited rather than the default of comma-delimited.

The WITH clause on lines 7 through 17 tells OPENSTRING what to do with each of ten fields in each input string.

The TABLE clause on line 7 tells OPENSTRING that the radio_programs table is going to be the model or template for OPENSTRING to use. The column names on lines 10 through 16 tell OPENSTRING which radio_programs columns correspond to which input fields. The filler() fields on lines 8, 9 and 17 tell OPENSTRING to skip fields 1, 2 and 10 in the input.

The AS clause on line 18 is required to give the OPENSTRING call a correlation name in the FROM clause... but that name isn't used in this example.

The CROSS APPLY join operator on line 5 makes it possible for the OPENSTRING call to include a reference to a column in the raw_text table.

Here's what the final result looks like:


No comments: