Writing UNION statements in MySQL 3.x
-------------------------------------

Date: 13 Nov, 2002 by Michael Bailey

For those of us who use MySQL, I think we can all share in our annoyance at the lack of the UNION command. We've been told that it will be implemented in the version 4.0 release, but we've been waiting a long time, and I doubt this will be resolved anytime soon. So, I guess that means we have to find our own way around the UNION statement.

For those of you who do not know, the UNION statement allows a query to combine the results of more than one SELECT statement into one all-powerful result set. This is useful when you have two tables with similar fields that you wish to combine into one SELECT. An example would be payments and charges in an accounting database. Each would be in a separate table, but each has similar fields (e.g. amount, date, description, account). Without UNION, you have to make two queries and combine them on the client side. This can be laborious in many situations, especially those that require sorting. As a rule of thumb, it's best to let SQL do as much of the processing as possible; it saves both transfer time and client-side processing time.

So, how can one use a UNION statement without "using" a UNION statement? The secret lies in the use of LEFT JOINs and a dummy table.

A LEFT JOIN is similar to an INNER JOIN (the standard type of join), except that every record of the left-hand table is returned even if no record of the joined table meets the join condition; in that case the fields from the joined table are NULL.

The dummy table is a table, created in the database, with only one field. I call the table _dummy and the field num. The table simply contains a different number in each record, starting at zero. You need as many records as SELECTs you plan to combine. For example, if you only need to combine two SELECTs, you only need 0 and 1 in _dummy. If you plan to combine four SELECTs, then you need 0 through 3.
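A minimal sketch of creating and filling the dummy table. It assumes Python's built-in sqlite3 module with an in-memory SQLite database standing in for a MySQL 3.x server, so the exact DDL may need small changes for MySQL. The helper name make_dummy is hypothetical; the names _dummy and num follow this article.

```python
import sqlite3


def make_dummy(conn: sqlite3.Connection, select_count: int) -> None:
    """(Re)create _dummy holding 0 .. select_count - 1.

    Hypothetical helper: one record per SELECT you plan to combine.
    SQLite stands in for MySQL 3.x here.
    """
    conn.execute("DROP TABLE IF EXISTS _dummy")
    # A single integer field; the primary key keeps each number unique.
    conn.execute("CREATE TABLE _dummy (num INTEGER NOT NULL PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO _dummy (num) VALUES (?)",
        [(n,) for n in range(select_count)],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_dummy(conn, 4)  # enough to combine four SELECTs
    nums = [r[0] for r in conn.execute("SELECT num FROM _dummy ORDER BY num")]
    print(nums)  # [0, 1, 2, 3]
```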
This dummy table lets you make separate queries within one query. Here's the basic outline of how it's done:

    SELECT [fields]
    FROM _dummy AS D
    LEFT JOIN [table1] ON (D.num = 0 AND [condition1])
    LEFT JOIN [table2] ON (D.num = 1 AND [condition2])
    ...
    WHERE D.num < [table count]

Let's explain what happens. The primary table from which the SELECT is performed is the _dummy table. The final WHERE clause restricts which numbers are drawn from _dummy. If _dummy holds exactly [table count] records it can be left out, though putting it in speeds things up a little bit; if _dummy holds more records, it is required, because each extra number matches none of the LEFT JOINs and would add a record of NULLs to the result.

SQL will first find the 0 record of _dummy. Since D.num is 0, the first LEFT JOIN will be performed. Note that if nothing fits the LEFT JOIN condition, a record of NULLs will still be returned. This is an unfortunate drawback of this method, though there is a workaround, which will be shown later. Once all of the records from table1 have been grabbed, SQL will look at the next record in _dummy, the 1. This will grab everything from the LEFT JOIN on table2. These two results will be combined.

The fields are accessed as they would be normally. Note, though, that in any record only the fields of one table will be valid. For example, when table1.[field] contains a value, table2.[field] will be NULL. To combine the results, you use the IFNULL() function. IFNULL() takes two parameters: it returns the value of the first parameter unless it's NULL, in which case the second is returned. So, if there is a field "id" in table1 and table2 that you want, you would use IFNULL(table1.id, table2.id). This returns the id from table1 unless it's NULL, in which case the id from table2 is returned. With three or more tables, nest the calls, e.g. IFNULL(table1.id, IFNULL(table2.id, table3.id)).

This is also how you can remove the NULL records mentioned above: add a condition to the WHERE clause that tests the IFNULL() expression with IS NOT NULL. The IFNULL() should try to return a field from one of the tables that is never NULL in a real record (a column declared NOT NULL, such as a primary key, is the safest choice). If it can't, the record is blank and should not be returned.
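The template can be tried end to end with a small sketch. It again assumes Python's sqlite3 module with SQLite standing in for MySQL 3.x; the payments and charges tables and their columns (id, posted, amount) are hypothetical, modeled loosely on the accounting example. Because SQLite has a native UNION ALL, the sketch compares the two. Note that the emulation behaves like UNION ALL rather than plain UNION: duplicate records are kept, not removed.

```python
import sqlite3

# Hypothetical schema modeled on the payments/charges example.
SCHEMA = """
CREATE TABLE _dummy   (num INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE payments (id INTEGER NOT NULL PRIMARY KEY, posted TEXT, amount REAL);
CREATE TABLE charges  (id INTEGER NOT NULL PRIMARY KEY, posted TEXT, amount REAL);
INSERT INTO _dummy (num) VALUES (0), (1);
"""

# Dummy-table technique: D.num = 0 switches on the payments join,
# D.num = 1 switches on the charges join; IFNULL() merges the columns.
EMULATED = """
SELECT IFNULL(P.posted, C.posted) AS on_date,
       IFNULL(P.amount, C.amount) AS amt,
       D.num AS src
FROM _dummy AS D
LEFT JOIN payments AS P ON (D.num = 0)
LEFT JOIN charges  AS C ON (D.num = 1)
WHERE D.num < 2 {extra}
ORDER BY on_date, src, amt
"""

# Drops the record of NULLs produced when a joined table is empty.
# Tests the primary keys, which are never NULL in a real record.
NOT_BLANK = "AND IFNULL(P.id, C.id) IS NOT NULL"

# SQLite's native equivalent, used only for comparison.
NATIVE = """
SELECT posted AS on_date, amount AS amt, 0 AS src FROM payments
UNION ALL
SELECT posted, amount, 1 FROM charges
ORDER BY on_date, src, amt
"""


def emulated_union(conn: sqlite3.Connection, drop_blank: bool = True) -> list:
    sql = EMULATED.format(extra=NOT_BLANK if drop_blank else "")
    return conn.execute(sql).fetchall()


def native_union_all(conn: sqlite3.Connection) -> list:
    return conn.execute(NATIVE).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO payments (posted, amount) VALUES (?, ?)",
                     [("2002-11-01", 50.0), ("2002-11-10", 25.0)])
    conn.executemany("INSERT INTO charges (posted, amount) VALUES (?, ?)",
                     [("2002-11-05", 80.0)])
    print(emulated_union(conn) == native_union_all(conn))  # True

    conn.execute("DELETE FROM charges")  # charges is now empty
    print(emulated_union(conn, drop_blank=False))  # includes (None, None, 1)
    print(emulated_union(conn))  # the blank record is filtered out
```

Selecting D.num alongside the merged columns is a cheap way to tell which table each record came from, which is handy for sorting or labeling the combined rows.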
Example:

    SELECT IFNULL(table1.id, table2.id) AS id
    FROM _dummy AS D
    LEFT JOIN table1 ON (D.num = 0)
    LEFT JOIN table2 ON (D.num = 1)
    WHERE D.num < 2
      AND IFNULL(table1.id, table2.id) IS NOT NULL
    ORDER BY id

This query will return all of the ids from both tables and sort them. If either table1 or table2 is empty, no NULL records will be returned. Without the second condition in the WHERE clause, an empty table2 would have produced a NULL record.

What I believe to be the most useful thing about this type of query is the use of grouping. Consider again my first example of payments and charges. Here is a query that adds up all of the charges and payments for each account to find its current balance. Besides _dummy, three tables are referenced: accounts holds information about each account, payments holds the payments that have been made, and charges holds all of the charges that have been applied to each account. The payment amount must be subtracted because all amounts are stored as positive values. Here's the query:

    SELECT A.acc_id,
           IFNULL(SUM(C.amount), 0) - IFNULL(SUM(P.amount), 0) AS balance
    FROM accounts AS A
    INNER JOIN _dummy AS D ON (D.num < 2)
    LEFT JOIN payments AS P ON (D.num = 0 AND A.acc_id = P.acc_id)
    LEFT JOIN charges AS C ON (D.num = 1 AND A.acc_id = C.acc_id)
    GROUP BY A.acc_id

The IFNULL(..., 0) wrappers matter: if an account has no payments, every P.amount in its group is NULL, so SUM(P.amount) is NULL, and NULL minus anything is NULL; without them, that account's balance would come out NULL (and likewise for an account with no charges). Because each payment and each charge lands on its own record, the sums are not inflated the way they would be by joining payments and charges to accounts directly, which pairs every payment with every charge of the same account.

As you can see, grouping can be used to create complex and useful queries. The example just shown is only a small sample of what can be done.

========== Conclusion ==========

Even though we are still waiting for the anticipated release of MySQL 4.0, we need not live without UNION statements until then. I hope this is of some use to somebody. Since I came up with this idea, it has helped me in many situations. The dummy table and this method can be employed in many ways to build very complex queries that would otherwise have been impossible. If you have any questions, I would be happy to answer them as far as possible.
My name is Michael Bailey and you can reach me at mpbailey@byu.edu