Using recursive queries
Some applications work with data that is recursive in nature. To query this type of data, you can use a recursive common table expression or a recursive view.
For example, a Bill of Materials (BOM) application works with the expansion of parts and their component subparts. A chair might be made of a seat unit and a leg assembly. The seat unit might consist of a seat and two arms. Each of these parts can be further broken down into its subparts until there is a list of all the parts needed to build a chair.
In the following trip planner examples, airline flights and train connections are used to find transportation paths between cities. The following table definitions and data are used in the examples.
CREATE TABLE FLIGHTS (DEPARTURE CHAR(20),
                      ARRIVAL CHAR(20),
                      CARRIER CHAR(15),
                      FLIGHT_NUMBER CHAR(5),
                      PRICE INT)

INSERT INTO FLIGHTS VALUES('New York', 'Paris', 'Atlantic', '234', 400)
INSERT INTO FLIGHTS VALUES('Chicago', 'Miami', 'NA Air', '2334', 300)
INSERT INTO FLIGHTS VALUES('New York', 'London', 'Atlantic', '5473', 350)
INSERT INTO FLIGHTS VALUES('London', 'Athens', 'Mediterranean', '247', 340)
INSERT INTO FLIGHTS VALUES('Athens', 'Nicosia', 'Mediterranean', '2356', 280)
INSERT INTO FLIGHTS VALUES('Paris', 'Madrid', 'Euro Air', '3256', 380)
INSERT INTO FLIGHTS VALUES('Paris', 'Cairo', 'Euro Air', '63', 480)
INSERT INTO FLIGHTS VALUES('Chicago', 'Frankfurt', 'Atlantic', '37', 480)
INSERT INTO FLIGHTS VALUES('Frankfurt', 'Moscow', 'Asia Air', '2337', 580)
INSERT INTO FLIGHTS VALUES('Frankfurt', 'Beijing', 'Asia Air', '77', 480)
INSERT INTO FLIGHTS VALUES('Moscow', 'Tokyo', 'Asia Air', '437', 680)
INSERT INTO FLIGHTS VALUES('Frankfurt', 'Vienna', 'Euro Air', '59', 200)
INSERT INTO FLIGHTS VALUES('Paris', 'Rome', 'Euro Air', '534', 340)
INSERT INTO FLIGHTS VALUES('Miami', 'Lima', 'SA Air', '5234', 530)
INSERT INTO FLIGHTS VALUES('New York', 'Los Angeles', 'NA Air', '84', 330)
INSERT INTO FLIGHTS VALUES('Los Angeles', 'Tokyo', 'Pacific Air', '824', 530)
INSERT INTO FLIGHTS VALUES('Tokyo', 'Hong Kong', 'Asia Air', '94', 330)
INSERT INTO FLIGHTS VALUES('Washington', 'Toronto', 'NA Air', '104', 250)

CREATE TABLE TRAINS (DEPARTURE CHAR(20),
                     ARRIVAL CHAR(20),
                     RAILLINE CHAR(15),
                     TRAIN CHAR(5),
                     PRICE INT)

INSERT INTO TRAINS VALUES('Chicago', 'Washington', 'UsTrack', '323', 90)
INSERT INTO TRAINS VALUES('Madrid', 'Barcelona', 'EuroTrack', '5234', 60)
INSERT INTO TRAINS VALUES('Washington', 'Boston', 'UsTrack', '232', 50)

Now that the tables are set up, the data can be queried to find information about the airline network. Suppose you want to find out what cities you can fly to if you start in Chicago, and how many separate flights it will take to get there. The following query shows you that information.

WITH destinations (origin, departure, arrival, flight_count) AS
    (SELECT a.departure, a.departure, a.arrival, 1
       FROM flights a
       WHERE a.departure = 'Chicago'
     UNION ALL
     SELECT r.origin, b.departure, b.arrival, r.flight_count + 1
       FROM destinations r, flights b
       WHERE r.arrival = b.departure)
SELECT origin, departure, arrival, flight_count
  FROM destinations
This query returns the following information.
Table 1. Results of the previous query

ORIGIN     DEPARTURE    ARRIVAL      FLIGHT_COUNT
Chicago    Chicago      Miami        1
Chicago    Chicago      Frankfurt    1
Chicago    Miami        Lima         2
Chicago    Frankfurt    Moscow       2
Chicago    Frankfurt    Beijing      2
Chicago    Frankfurt    Vienna       2
Chicago    Moscow       Tokyo        3
Chicago    Tokyo        Hong Kong    4

This recursive query is written in two parts. The first part of the common table expression is called the initialization fullselect. It selects the first rows for the result set of the common table expression. In this example, it selects the two rows in the flights table that get you directly to another location from Chicago. It also initializes the number of flight legs to one for each row it selects.
The second part of the recursive query joins the rows from the current result set of the common table expression with other rows from the original table. It is called the iterative fullselect. This is where the recursion is introduced. Notice that the rows that have already been selected for the result set are referenced by using the name of the common table expression as the table name and the common table expression result column names as the column names.
In this recursive part of the query, any rows from the original table that you can get to from each of the previously selected arrival cities are selected. A previously selected row's arrival city becomes the new departure city. Each row from this recursive select increments the flight count to the destination by one more flight. As these new rows are added to the common table expression result set, they are also fed into the iterative fullselect to generate more result set rows. In the data for the final result, you can see that the total number of flights is actually the total number of recursive joins (plus 1) it took to get to that arrival city.
A recursive view looks very similar to a recursive common table expression. You can write the previous recursive common table expression as a recursive view like this:
CREATE VIEW destinations (origin, departure, arrival, flight_count) AS
    SELECT departure, departure, arrival, 1
      FROM flights
      WHERE departure = 'Chicago'
    UNION ALL
    SELECT r.origin, b.departure, b.arrival, r.flight_count + 1
      FROM destinations r, flights b
      WHERE r.arrival = b.departure

The iterative fullselect part of this view definition refers to the view itself. Selection from this view returns the same rows as you get from the previous recursive common table expression.
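The two-part evaluation described above can be modeled outside the database. The following Python sketch is only an illustration of the idea, not a description of how the database engine processes the query. It assumes a hypothetical in-memory list, FLIGHTS, that holds the departure and arrival columns of the sample rows, and a hypothetical function named destinations. Like the query, it does not terminate if the data contains a cycle.

```python
# Hypothetical in-memory copy of the sample FLIGHTS rows: (departure, arrival).
FLIGHTS = [
    ("New York", "Paris"), ("Chicago", "Miami"), ("New York", "London"),
    ("London", "Athens"), ("Athens", "Nicosia"), ("Paris", "Madrid"),
    ("Paris", "Cairo"), ("Chicago", "Frankfurt"), ("Frankfurt", "Moscow"),
    ("Frankfurt", "Beijing"), ("Moscow", "Tokyo"), ("Frankfurt", "Vienna"),
    ("Paris", "Rome"), ("Miami", "Lima"), ("New York", "Los Angeles"),
    ("Los Angeles", "Tokyo"), ("Tokyo", "Hong Kong"), ("Washington", "Toronto"),
]


def destinations(flights, start):
    """Return (origin, departure, arrival, flight_count) rows."""
    # Initialization fullselect: direct flights out of the start city.
    current = [(dep, dep, arr, 1) for dep, arr in flights if dep == start]
    result = list(current)
    # Iterative fullselect: join the newest rows back to the flights list
    # until an iteration produces no new rows.
    while current:
        current = [
            (origin, dep, arr, count + 1)
            for origin, _, prev_arrival, count in current
            for dep, arr in flights
            if prev_arrival == dep
        ]
        result.extend(current)
    return result


rows = destinations(FLIGHTS, "Chicago")
for row in rows:
    print(row)
```

With the sample data, the sketch produces the same eight rows as Table 1. Each pass of the loop corresponds to one more recursive join, which is why the flight count equals the number of joins plus 1.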
Example: Two starting cities
Now, to make the query a bit more complicated, suppose you are willing to fly from either Chicago or New York, and you want to know where you could go and how much it would cost.
WITH destinations (departure, arrival, connections, cost) AS
    (SELECT a.departure, a.arrival, 0, price
       FROM flights a
       WHERE a.departure = 'Chicago' OR
             a.departure = 'New York'
     UNION ALL
     SELECT r.departure, b.arrival, r.connections + 1,
            r.cost + b.price
       FROM destinations r, flights b
       WHERE r.arrival = b.departure)
SELECT departure, arrival, connections, cost
  FROM destinations
This query returns the following information.
Table 2. Results of the previous query

DEPARTURE    ARRIVAL        CONNECTIONS    COST
Chicago      Miami          0              300
Chicago      Frankfurt      0              480
New York     Paris          0              400
New York     London         0              350
New York     Los Angeles    0              330
Chicago      Lima           1              830
Chicago      Moscow         1              1,060
Chicago      Beijing        1              960
Chicago      Vienna         1              680
New York     Madrid         1              780
New York     Cairo          1              880
New York     Rome           1              740
New York     Athens         1              690
New York     Tokyo          1              860
Chicago      Tokyo          2              1,740
New York     Nicosia        2              970
New York     Hong Kong      2              1,190
Chicago      Hong Kong      3              2,070

For each returned row, the results show the starting departure city and the final destination city. The query counts the number of connections needed rather than the total number of flights, and it adds up the total cost for all the flights.
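To experiment with this query without access to the database described here, you can run an equivalent statement in SQLite through Python's standard sqlite3 module. This is a sketch under assumptions: the table is reduced to three columns, the column types are SQLite types rather than CHAR and INT, and the query is written with the WITH RECURSIVE keywords that SQLite documents. Row order is not guaranteed, so the sketch sorts the result itself.

```python
import sqlite3

# Sample rows reduced to (departure, arrival, price); an assumption for brevity.
FLIGHTS = [
    ("New York", "Paris", 400), ("Chicago", "Miami", 300),
    ("New York", "London", 350), ("London", "Athens", 340),
    ("Athens", "Nicosia", 280), ("Paris", "Madrid", 380),
    ("Paris", "Cairo", 480), ("Chicago", "Frankfurt", 480),
    ("Frankfurt", "Moscow", 580), ("Frankfurt", "Beijing", 480),
    ("Moscow", "Tokyo", 680), ("Frankfurt", "Vienna", 200),
    ("Paris", "Rome", 340), ("Miami", "Lima", 530),
    ("New York", "Los Angeles", 330), ("Los Angeles", "Tokyo", 530),
    ("Tokyo", "Hong Kong", 330), ("Washington", "Toronto", 250),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights (departure TEXT, arrival TEXT, price INTEGER)"
)
conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", FLIGHTS)

QUERY = """
WITH RECURSIVE destinations (departure, arrival, connections, cost) AS (
    SELECT a.departure, a.arrival, 0, a.price
      FROM flights a
     WHERE a.departure = 'Chicago' OR a.departure = 'New York'
    UNION ALL
    SELECT r.departure, b.arrival, r.connections + 1, r.cost + b.price
      FROM destinations r, flights b
     WHERE r.arrival = b.departure
)
SELECT departure, arrival, connections, cost FROM destinations
"""

rows = conn.execute(QUERY).fetchall()
# Sort by number of connections, then by starting city, for readable output.
for row in sorted(rows, key=lambda r: (r[2], r[0])):
    print(row)
conn.close()
```

The starting city is carried forward unchanged from the initialization rows (r.departure), while the cost column accumulates b.price at each step.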
Example: Two tables used for recursion
Now, suppose you start in Chicago but add in transportation by railway in addition to the airline flights, and you want to know which cities you can go to.
The following query returns that information:
WITH destinations (departure, arrival, connections, flights, trains, cost) AS
    (SELECT f.departure, f.arrival, 0, 1, 0, price
       FROM flights f
       WHERE f.departure = 'Chicago'
     UNION ALL
     SELECT t.departure, t.arrival, 0, 0, 1, price
       FROM trains t
       WHERE t.departure = 'Chicago'
     UNION ALL
     SELECT r.departure, b.arrival, r.connections + 1,
            r.flights + 1, r.trains, r.cost + b.price
       FROM destinations r, flights b
       WHERE r.arrival = b.departure
     UNION ALL
     SELECT r.departure, c.arrival, r.connections + 1,
            r.flights, r.trains + 1, r.cost + c.price
       FROM destinations r, trains c
       WHERE r.arrival = c.departure)
SELECT departure, arrival, connections, flights, trains, cost
  FROM destinations
This query returns the following information.
Table 3. Results of the previous query

DEPARTURE    ARRIVAL       CONNECTIONS    FLIGHTS    TRAINS    COST
Chicago      Miami         0              1          0         300
Chicago      Frankfurt     0              1          0         480
Chicago      Washington    0              0          1         90
Chicago      Lima          1              2          0         830
Chicago      Moscow        1              2          0         1,060
Chicago      Beijing       1              2          0         960
Chicago      Vienna        1              2          0         680
Chicago      Toronto       1              1          1         340
Chicago      Boston        1              0          2         140
Chicago      Tokyo         2              3          0         1,740
Chicago      Hong Kong     3              4          0         2,070

In this example, there are two parts of the common table expression that provide initialization values to the query: one for flights and one for trains. For each of the result rows, there are two recursive references to get from the previous arrival location to the next possible destination: one for continuing by air, the other for continuing by train. In the final results, you would see how many connections are needed and how many airline or train trips can be taken.
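The way the two initialization parts and the two recursive references combine can be modeled with a short Python sketch. It is an illustration only, assuming hypothetical in-memory lists FLIGHTS and TRAINS that hold the departure, arrival, and price columns of the sample rows that matter when starting from Chicago, and a hypothetical function named destinations.

```python
# Hypothetical subsets of the sample data: (departure, arrival, price).
FLIGHTS = [
    ("Chicago", "Miami", 300), ("Chicago", "Frankfurt", 480),
    ("Frankfurt", "Moscow", 580), ("Frankfurt", "Beijing", 480),
    ("Moscow", "Tokyo", 680), ("Frankfurt", "Vienna", 200),
    ("Miami", "Lima", 530), ("Tokyo", "Hong Kong", 330),
    ("Washington", "Toronto", 250),
]
TRAINS = [
    ("Chicago", "Washington", 90), ("Madrid", "Barcelona", 60),
    ("Washington", "Boston", 50),
]


def destinations(start):
    """Return (departure, arrival, connections, flights, trains, cost) rows."""
    # Two initialization parts: one seeds from flights, one from trains.
    current = [(d, a, 0, 1, 0, p) for d, a, p in FLIGHTS if d == start]
    current += [(d, a, 0, 0, 1, p) for d, a, p in TRAINS if d == start]
    result = list(current)
    while current:
        new_rows = []
        for dep, arr, conns, n_flights, n_trains, cost in current:
            # First recursive reference: continue by air.
            new_rows += [(dep, a, conns + 1, n_flights + 1, n_trains, cost + p)
                         for d, a, p in FLIGHTS if d == arr]
            # Second recursive reference: continue by train.
            new_rows += [(dep, a, conns + 1, n_flights, n_trains + 1, cost + p)
                         for d, a, p in TRAINS if d == arr]
        result.extend(new_rows)
        current = new_rows
    return result


rows = destinations("Chicago")
for row in rows:
    print(row)
```

The Toronto row shows a mixed itinerary: one train leg (Chicago to Washington) followed by one flight, for a total cost of 340.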
Example: DEPTH FIRST and BREADTH FIRST options
The two examples here show the difference in the result set row order based on whether the recursion is processed depth first or breadth first.
The search clause is not supported for recursive views. You can define a view that contains a recursive common table expression to get this function.
The option to determine the result using breadth first or depth first is a recursive relationship sort based on the recursive join column specified for the SEARCH BY clause. When the recursion is handled breadth first, all children are processed first, then all grandchildren, then all great grandchildren. When the recursion is handled depth first, the full recursive ancestry chain of one child is processed before going to the next child.
In both of these cases, you specify an extra column name that is used by the recursive process to keep track of the depth first or breadth first ordering. This column must be used in the ORDER BY clause of the outer query to get the rows back in the specified order. If this column is not used in the ORDER BY, the DEPTH FIRST or BREADTH FIRST processing option is ignored.
The selection of which column to use for the SEARCH BY column is important. To have any meaning in the result, it must be the column that is used in the iterative fullselect to join from the initialization fullselect. In this example, ARRIVAL is the column to use.
The following query requests depth first ordering:
WITH destinations (departure, arrival, connections, cost) AS
    (SELECT f.departure, f.arrival, 0, price
       FROM flights f
       WHERE f.departure = 'Chicago'
     UNION ALL
     SELECT r.departure, b.arrival, r.connections + 1,
            r.cost + b.price
       FROM destinations r, flights b
       WHERE r.arrival = b.departure)
    SEARCH DEPTH FIRST BY arrival SET ordcol
SELECT *
  FROM destinations
  ORDER BY ordcol
This query returns the following information.
Table 4. Results of the previous query

DEPARTURE    ARRIVAL      CONNECTIONS    COST
Chicago      Miami        0              300
Chicago      Lima         1              830
Chicago      Frankfurt    0              480
Chicago      Moscow       1              1,060
Chicago      Tokyo        2              1,740
Chicago      Hong Kong    3              2,070
Chicago      Beijing      1              960
Chicago      Vienna       1              680

In this result data, you can see that all destinations that are generated from the Chicago-to-Miami row are listed before the destinations from the Chicago-to-Frankfurt row.
Next, you can run the same query but request the result to be ordered breadth first.
WITH destinations (departure, arrival, connections, cost) AS
    (SELECT f.departure, f.arrival, 0, price
       FROM flights f
       WHERE f.departure = 'Chicago'
     UNION ALL
     SELECT r.departure, b.arrival, r.connections + 1,
            r.cost + b.price
       FROM destinations r, flights b
       WHERE r.arrival = b.departure)
    SEARCH BREADTH FIRST BY arrival SET ordcol
SELECT *
  FROM destinations
  ORDER BY ordcol
This query returns the following information.
Table 5. Results of the previous query

DEPARTURE    ARRIVAL      CONNECTIONS    COST
Chicago      Miami        0              300
Chicago      Frankfurt    0              480
Chicago      Lima         1              830
Chicago      Moscow       1              1,060
Chicago      Beijing      1              960
Chicago      Vienna       1              680
Chicago      Tokyo        2              1,740
Chicago      Hong Kong    3              2,070

In this result data, you can see that all the direct connections from Chicago are listed before the connecting flights. The data is identical to the results from the previous query, but in a breadth first order.
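The difference between the two orderings can be illustrated with a small Python sketch that walks the same data in both ways. It is not a model of how the SEARCH clause is implemented. It assumes a hypothetical list, FLIGHTS, that holds only the sample rows reachable from Chicago, and it visits the children of a row in list order, which is an assumption of the sketch rather than documented database behavior.

```python
from collections import deque

# Hypothetical subset of the sample data: (departure, arrival, price).
FLIGHTS = [
    ("Chicago", "Miami", 300), ("Chicago", "Frankfurt", 480),
    ("Frankfurt", "Moscow", 580), ("Frankfurt", "Beijing", 480),
    ("Moscow", "Tokyo", 680), ("Frankfurt", "Vienna", 200),
    ("Miami", "Lima", 530), ("Tokyo", "Hong Kong", 330),
]


def next_legs(city):
    """Flights that leave from the given city, in list order."""
    return [(arr, price) for dep, arr, price in FLIGHTS if dep == city]


def depth_first(start):
    """Follow each child's whole chain before moving to the next child."""
    rows = []

    def visit(city, connections, cost):
        for arr, price in next_legs(city):
            rows.append((start, arr, connections, cost + price))
            visit(arr, connections + 1, cost + price)

    visit(start, 0, 0)
    return rows


def breadth_first(start):
    """Process all children, then all grandchildren, and so on."""
    rows = []
    queue = deque([(start, 0, 0)])
    while queue:
        city, connections, cost = queue.popleft()
        for arr, price in next_legs(city):
            rows.append((start, arr, connections, cost + price))
            queue.append((arr, connections + 1, cost + price))
    return rows


dfs_rows = depth_first("Chicago")
bfs_rows = breadth_first("Chicago")
print([row[1] for row in dfs_rows])
print([row[1] for row in bfs_rows])
```

Both functions return the same eight rows. Only the order differs, and with this data the two orders match Table 4 and Table 5.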
Example: Cyclic
The key to any recursive process, whether it is a recursive programming algorithm or querying recursive data, is that the recursion must be finite. If not, you will get into a never-ending loop. The CYCLE option allows you to safeguard against cyclic data. Not only does it terminate repeating cycles, but it also allows you to optionally output a cycle mark indicator that can help you find cyclic data.
The cycle clause is not supported for recursive views. You can define a view that contains a recursive common table expression to get this function.
For a final example, suppose we have a cycle in the data. By adding one more row to the table, there is now a flight from Cairo to Paris and one from Paris to Cairo. Without accounting for possible cyclic data like this, it is quite easy to generate a query that will go into an infinite loop processing the data.
The following statements add the row that creates the cycle and then run a query that uses the CYCLE clause:
INSERT INTO FLIGHTS VALUES('Cairo', 'Paris', 'Euro Air', '1134', 440)

WITH destinations (departure, arrival, connections, cost, itinerary) AS
    (SELECT f.departure, f.arrival, 1, price,
            CAST(f.departure CONCAT f.arrival AS VARCHAR(2000))
       FROM flights f
       WHERE f.departure = 'New York'
     UNION ALL
     SELECT r.departure, b.arrival, r.connections + 1,
            r.cost + b.price,
            CAST(r.itinerary CONCAT b.arrival AS VARCHAR(2000))
       FROM destinations r, flights b
       WHERE r.arrival = b.departure)
    CYCLE arrival SET cyclic_data TO '1' DEFAULT '0'
SELECT departure, arrival, itinerary, cyclic_data
  FROM destinations
  ORDER BY cyclic_data
This query returns the following information.
Table 6. Results of the previous query

DEPARTURE    ARRIVAL        ITINERARY                               CYCLIC_DATA
New York     Paris          New York Paris                          0
New York     London         New York London                         0
New York     Los Angeles    New York Los Angeles                    0
New York     Madrid         New York Paris Madrid                   0
New York     Cairo          New York Paris Cairo                    0
New York     Rome           New York Paris Rome                     0
New York     Athens         New York London Athens                  0
New York     Tokyo          New York Los Angeles Tokyo              0
New York     Paris          New York Paris Cairo Paris              1
New York     Nicosia        New York London Athens Nicosia          0
New York     Hong Kong      New York Los Angeles Tokyo Hong Kong    0

In this example, the ARRIVAL column is defined in the CYCLE clause as the column to use for detecting a cycle in the data. When a cycle is found, a special column, CYCLIC_DATA in this case, is set to the character value of '1' for the cycling row in the result set. All other rows will contain the default value of '0'. When a cycle on the ARRIVAL column is found, processing will not proceed any further in the data so the infinite loop will not happen. To see if your data actually has a cyclic reference, the CYCLIC_DATA column can be referenced in the outer query.
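The cycle check described above can be modeled in Python by remembering which ARRIVAL values have already appeared on the path that leads to each row. This sketch is a simplified illustration, not the database's implementation. It assumes a hypothetical list, FLIGHTS, that holds the sample rows reachable from New York plus the added Cairo-to-Paris row, and it builds the itinerary by joining city names with single spaces.

```python
# Hypothetical subset of the sample data: (departure, arrival).
# The last row is the added flight that creates the Paris/Cairo cycle.
FLIGHTS = [
    ("New York", "Paris"), ("New York", "London"), ("London", "Athens"),
    ("Athens", "Nicosia"), ("Paris", "Madrid"), ("Paris", "Cairo"),
    ("Paris", "Rome"), ("New York", "Los Angeles"),
    ("Los Angeles", "Tokyo"), ("Tokyo", "Hong Kong"), ("Cairo", "Paris"),
]


def destinations(flights, start):
    """Return (departure, arrival, itinerary, cyclic_data) rows."""
    rows = []
    # Each pending entry: (arrival, path so far, arrivals seen on this path).
    pending = [(arr, [dep, arr], {arr}) for dep, arr in flights if dep == start]
    for arr, path, _ in pending:
        rows.append((start, arr, " ".join(path), "0"))
    while pending:
        new_pending = []
        for prev_arrival, path, seen in pending:
            for dep, arr in flights:
                if dep != prev_arrival:
                    continue
                new_path = path + [arr]
                if arr in seen:
                    # Arrival repeats on this path: mark the row as cyclic
                    # and do not expand it, so the loop ends.
                    rows.append((start, arr, " ".join(new_path), "1"))
                else:
                    rows.append((start, arr, " ".join(new_path), "0"))
                    new_pending.append((arr, new_path, seen | {arr}))
        pending = new_pending
    return rows


rows = destinations(FLIGHTS, "New York")
cyclic = [row for row in rows if row[3] == "1"]
print(cyclic)
```

Without the check on previously seen arrival values, the Paris and Cairo rows would keep generating each other and the loop would never finish.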