Using recursive queries

 

Some applications work with data that is recursive in nature. To query this type of data, you can use a recursive common table expression or a recursive view.

For example, a Bill of Materials (BOM) application works with the expansion of parts and its component subparts. For example, a chair might be made of a seat unit and a leg assembly. The seat unit might consist of a seat and two arms. Each of these parts can be further broken down into its subparts until there is a list of all the parts needed to build a chair. In the following trip planner examples, airline flights and train connections are used to find transportation paths between cities. The following table definitions and data are used in the examples.

CREATE TABLE FLIGHTS (DEPARTURE CHAR(20),
                      ARRIVAL CHAR(20), 
                      CARRIER CHAR(15),
                      FLIGHT_NUMBER CHAR(5), 
                      PRICE INT)



INSERT INTO FLIGHTS VALUES('New York', 'Paris', 'Atlantic', '234', 400)

INSERT INTO FLIGHTS VALUES('Chicago', 'Miami', 'NA Air', '2334', 300)

INSERT INTO FLIGHTS VALUES('New York', 'London', 'Atlantic', '5473', 350)

INSERT INTO FLIGHTS VALUES('London', 'Athens' , 'Mediterranean', '247', 340)

INSERT INTO FLIGHTS VALUES('Athens', 'Nicosia' , 'Mediterranean', '2356', 280)

INSERT INTO FLIGHTS VALUES('Paris', 'Madrid' , 'Euro Air', '3256', 380)

INSERT INTO FLIGHTS VALUES('Paris', 'Cairo' , 'Euro Air', '63', 480)

INSERT INTO FLIGHTS VALUES('Chicago', 'Frankfurt', 'Atlantic', '37', 480)

INSERT INTO FLIGHTS VALUES('Frankfurt', 'Moscow', 'Asia Air', '2337', 580)

INSERT INTO FLIGHTS VALUES('Frankfurt', 'Beijing', 'Asia Air', '77', 480)

INSERT INTO FLIGHTS VALUES('Moscow', 'Tokyo', 'Asia Air', '437', 680)

INSERT INTO FLIGHTS VALUES('Frankfurt', 'Vienna', 'Euro Air', '59', 200)

INSERT INTO FLIGHTS VALUES('Paris', 'Rome', 'Euro Air', '534', 340)

INSERT INTO FLIGHTS VALUES('Miami', 'Lima', 'SA Air', '5234', 530)

INSERT INTO FLIGHTS VALUES('New York', 'Los Angeles', 'NA Air', '84', 330)

INSERT INTO FLIGHTS VALUES('Los Angeles', 'Tokyo', 'Pacific Air', '824', 530)

INSERT INTO FLIGHTS VALUES('Tokyo', 'Hong Kong', 'Asia Air', '94', 330)

INSERT INTO FLIGHTS VALUES('Washington', 'Toronto', 'NA Air', '104', 250)

CREATE TABLE TRAINS(DEPARTURE CHAR(20), ARRIVAL CHAR(20), RAILLINE CHAR(15), TRAIN CHAR(5), PRICE INT)

INSERT INTO TRAINS VALUES('Chicago', 'Washington', 'UsTrack', '323', 90)

INSERT INTO TRAINS VALUES('Madrid', 'Barcelona', 'EuroTrack', '5234', 60)

INSERT INTO TRAINS VALUES('Washington' , 'Boston' , 'UsTrack', '232', 50)

Now that the tables are set up, the data can be queried to find information about the airline network. Suppose you want to find out what cities you can fly to if you start in Chicago, and how many separate flights it will take to get there. The following query shows you that information.
WITH destinations (origin, departure, arrival, flight_count) AS    
    (SELECT a.departure, a.departure, a.arrival, 1 
            FROM flights a             WHERE a.departure = 'Chicago'
     UNION ALL
     SELECT r.origin, b.departure, b.arrival, r.flight_count + 1 
            FROM destinations r, flights b             WHERE r.arrival = b.departure)

SELECT origin, departure, arrival, flight_count FROM destinations

This query returns the following information.

Table 1. Results of the previous query
ORIGIN DEPARTURE ARRIVAL FLIGHT_COUNT
Chicago Chicago Miami 1
Chicago Chicago Frankfurt 1
Chicago Miami Lima 2
Chicago Frankfurt Moscow 2
Chicago Frankfurt Beijing 2
Chicago Frankfurt Vienna 2
Chicago Moscow Tokyo 3
Chicago Tokyo Hong Kong 4

This recursive query is written in two parts. The first part of the common table expression is called the intialization fullselect. It selects the first rows for the result set of the common table expression. In this example, it selects the two rows in the flights table that get you directly to another location from Chicago. It also initializes the number of flight legs to one for each row it selects.

The second part of the recursive query joins the rows from the current result set of the common table expression with other rows from the original table. It is called the iterative fullselect. This is where the recursion is introduced. Notice that the rows that have already been selected for the result set are referenced by using the name of the common table expression as the table name and the common table expression result column names as the column names.

In this recursive part of the query, any rows from the original table that you can get to from each of the previously selected arrival cities are selected. A previously selected row's arrival city becomes the new departure city. Each row from this recursive select increments the flight count to the destination by one more flight. As these new rows are added to the common table expression result set, they are also fed into the iterative fullselect to generate more result set rows. In the data for the final result, you can see that the total number of flights is actually the total number of recursive joins (plus 1) it took to get to that arrival city. A recursive view looks very similar to a recursive common table expression. You can write the previous recursive common table expression as a recursive view like this:

CREATE VIEW destinations (origin, departure, arrival, flight_count) AS    
     SELECT departure, departure, arrival, 1 
            FROM flights 
            WHERE departure = 'Chicago'
     UNION ALL
     SELECT r.origin, b.departure, b.arrival, r.flight_count + 1 
            FROM destinations r, flights b             WHERE r.arrival = b.departure)

The interactive fullselect part of this view definition refers to the view itself. Selection from this view returns the same rows as you get from the previous recursive common table expression.

 

Example: Two starting cities

Now, to make the query a bit more complicated, suppose you are willing to fly from either Chicago or New York, and you want to know where you could go and how much it would cost.

WITH destinations (departure, arrival, connections, cost) AS
    (SELECT a.departure, a.arrival, 0, price             FROM flights a 
            WHERE a.departure = 'Chicago' OR
                  a.departure = 'New York'
     UNION ALL 
     SELECT r.departure, b.arrival, r.connections + 1,
                  r.cost + b.price 
            FROM destinations r, flights b 
            WHERE r.arrival = b.departure) 

SELECT departure, arrival, connections, cost FROM destinations

This query returns the following information.

Table 2. Results of the previous query
DEPARTURE ARRIVAL CONNECTIONS COST
Chicago Miami 0 300
Chicago Frankfurt 0 480
New York Paris 0 400
New York London 0 350
New York Los Angeles 0 330
Chicago Lima 1 830
Chicago Moscow 1 1,060
Chicago Beijing 1 960
Chicago Vienna 1 680
New York Madrid 1 780
New York Cairo 1 880
New York Rome 1 740
New York Athens 1 690
New York Tokyo 1 860
Chicago Tokyo 2 1,740
New York Nicosia 2 970
New York Hong Kong 2 1,190
Chicago Hong Kong 3 2,070

For each returned row, the results show the starting departure city and the final destination city. It counts the number of connections needed rather than the total number of flight and adds up the total cost for all the flights.

 

Example: Two tables used for recursion

Now, suppose you start in Chicago but add in transportation by railway in addition to the airline flights, and you want to know which cities you can go to.

The following query returns that information:

WITH destinations (departure, arrival, connections, flights, trains, cost) AS
  (SELECT f.departure, f.arrival, 0, 1, 0, price 
          FROM  flights f 
          WHERE f.departure = 'Chicago' 
   UNION ALL
   SELECT t.departure, t.arrival, 0, 0, 1, price 
          FROM trains t 
          WHERE t.departure = 'Chicago'
   UNION ALL
   SELECT r.departure, b.arrival, r.connections + 1 , r.flights + 1, r.trains, 
             r.cost + b.price           FROM  destinations r, flights b  
          WHERE r.arrival = b.departure    UNION ALL
   SELECT r.departure, c.arrival, r.connections + 1 ,
             r.flights, r.trains + 1, r.cost + c.price  
          FROM destinations r, trains c 
          WHERE r.arrival = c.departure) 

SELECT departure, arrival, connections, flights, trains, cost FROM destinations

This query returns the following information.

Table 3. Results of the previous query
DEPARTURE ARRIVAL CONNECTIONS FLIGHTS TRAINS COST
Chicago Miami 0 1 0 300
Chicago Frankfurt 0 1 0 480
Chicago Washington 0 0 1 90
Chicago Lima 1 2 0 830
Chicago Moscow 1 2 0 1,060
Chicago Beijing 1 2 0 960
Chicago Vienna 1 2 0 680
Chicago Toronto 1 1 1 340
Chicago Boston 1 0 2 140
Chicago Tokyo 2 3 0 1,740
Chicago Hong Kong 3 4 0 2,070

In this example, there are two parts of the common table expression that provide initialization values to the query: one for flights and one for trains. For each of the result rows, there are two recursive references to get from the previous arrival location to the next possible destination: one for continuing by air, the other for continuing by train. In the final results, you would see how many connections are needed and how many airline or train trips can be taken.

 

Example: DEPTH FIRST and BREADTH FIRST options

The two examples here show the difference in the result set row order based on whether the recursion is processed depth first or breadth first.

The search clause is not supported for recursive views. You can define a view that contains a recursive common table expression to get this function.

The option to determine the result using breadth first or depth first is a recursive relationship sort based on the recursive join column specified for the SEARCH BY clause. When the recursion is handled breadth first, all children are processed first, then all grandchildren, then all great grandchildren. When the recursion is handled depth first, the full recursive ancestry chain of one child is processed before going to the next child.

In both of these cases, you specify an extra column name that is used by the recursive process to keep track of the depth first or breadth first ordering. This column must be used in the ORDER BY clause of the outer query to get the rows back in the specified order. If this column is not used in the ORDER BY, the DEPTH FIRST or BREADTH FIRST processing option is ignored.

The selection of which column to use for the SEARCH BY column is important. To have any meaning in the result, it must be the column that is used in the iterative fullselect to join from the initialization fullselect. In this example, ARRIVAL is the column to use.

The following query returns that information:

WITH destinations (departure, arrival, connections, cost) AS
    (SELECT f.departure, f.arrival, 0, price       
            FROM flights f 
            WHERE f.departure = 'Chicago'
     UNION ALL 
     SELECT r.departure, b.arrival, r.connections + 1,
                r.cost + b.price 
            FROM destinations r, flights b 
            WHERE r.arrival = b.departure) 
    SEARCH DEPTH FIRST BY arrival SET ordcol 

SELECT * FROM destinations ORDER BY ordcol

This query returns the following information.

Table 4. Results of the previous query
DEPARTURE ARRIVAL CONNECTIONS COST
Chicago Miami 0 300
Chicago Lima 1 830
Chicago Frankfurt 0 480
Chicago Moscow 1 1,060
Chicago Tokyo 2 1,740
Chicago Hong Kong 3 2,070
Chicago Beijing 1 960
Chicago Vienna 1 680

In this result data, you can see that all destinations that are generated from the Chicago-to-Miami row are listed before the destinations from the Chicago-to-Frankfort row.

Next, you can run the same query but request the result to be ordered breadth first.

WITH destinations (departure, arrival, connections, cost) AS 
    (SELECT f.departure, f.arrival, 0, price             FROM flights f 
            WHERE f.departure='Chicago' 
     UNION ALL 
     SELECT r.departure, b.arrival, r.connections + 1,
                r.cost + b.price 
            FROM destinations r, flights b 
            WHERE r.arrival = b.departure) 
    SEARCH BREADTH FIRST BY arrival SET ordcol 

SELECT * FROM destinations ORDER BY ordcol

This query returns the following information.

Table 5. Results of the previous query
DEPARTURE ARRIVAL CONNECTIONS COST
Chicago Miami 0 300
Chicago Frankfurt 0 480
Chicago Lima 1 830
Chicago Moscow 1 1,060
Chicago Beijing 1 960
Chicago Vienna 1 680
Chicago Tokyo 2 1,740
Chicago Hong Kong 3 2,070

In this result data, you can see that all the direct connections from Chicago are listed before the connecting flights. The data is identical to the results from the previous query, but in a breadth first order.

 

Example: Cyclic

The key to any recursive process, whether it is a recursive programming algorithm or querying recursive data, is that the recursion must be finite. If not, you will get into a never ending loop. The CYCLE option allows you to safeguard against cyclic data. Not only will it terminate repeating cycles but it also allows you to optionally output a cycle mark indicator that may lead you to find cyclic data.

The cycle clause is not supported for recursive views. You can define a view that contains a recursive common table expression to get this function.

For a final example, suppose we have a cycle in the data. By adding one more row to the table, there is now a flight from Cairo to Paris and one from Paris to Cairo. Without accounting for possible cyclic data like this, it is quite easy to generate a query that will go into an infinite loop processing the data.

The following query returns that information:

INSERT INTO FLIGHTS VALUES('Cairo', 'Paris', 'Euro Air', '1134', 440)



WITH destinations (departure, arrival, connections, cost, itinerary) AS (SELECT f.departure, f.arrival, 1, price, CAST(f.departure CONCAT f.arrival AS VARCHAR(2000)) FROM flights f WHERE f.departure = 'New York' UNION ALL SELECT r.departure, b.arrival, r.connections + 1 , r.cost + b.price, CAST(r.itinerary CONCAT b.arrival AS VARCHAR(2000)) FROM destinations r, flights b WHERE r.arrival = b.departure) CYCLE arrival SET cyclic_data TO '1' DEFAULT '0'

SELECT departure, arrival, itinerary, cyclic_data FROM destinations ORDER BY cyclic_data

This query returns the following information.

Table 6. Results of the previous query
DEPARTURE ARRIVAL ITINERARY CYCLIC_DATA
New York Paris New York     Paris 0
New York London New York     London 0
New York Los Angeles New York     Los Angeles 0
New York Madrid New York     Paris     Madrid 0
New York Cairo New York     Paris     Cairo 0
New York Rome New York     Paris     Rome 0
New York Athens New York     London     Athens 0
New York Tokyo New York     Los Angeles     Tokyo 0
New York Paris New York     Paris     Cairo     Paris 1
New York Nicosia New York     London     Athens     Nicosia 0
New York Hong Kong New York     Los Angeles     Tokyo     Hong Kong 0

In this example, the ARRIVAL column is defined in the CYCLE clause as the column to use for detecting a cycle in the data. When a cycle is found, a special column, CYCLIC_DATA in this case, is set to the character value of '1' for the cycling row in the result set. All other rows will contain the default value of '0'. When a cycle on the ARRIVAL column is found, processing will not proceed any further in the data so the infinite loop will not happen. To see if your data actually has a cyclic reference, the CYCLIC_DATA column can be referenced in the outer query.

 

Parent topic:

Retrieving data using the SELECT statement