Handling duplicate rows

 

When SQL evaluates a select-statement, every row that satisfies the search condition of the select-statement qualifies for the result table, so the result table can contain several rows. Some of those rows might be duplicates of one another.

You can specify that you do not want any duplicates by using the DISTINCT keyword, followed by the list of expressions:

SELECT DISTINCT JOB, SEX …

DISTINCT means that each distinct row appears only once in the result table. If a selected row duplicates a row that is already in the result table, the duplicate row is ignored (it is not put into the result table). For example, suppose you want a list of employee job codes. You do not need to know which employee has which job code. Because several people in a department are likely to have the same job code, you can use DISTINCT to ensure that the result table has only unique values.

The following example shows how to do this:

  SELECT DISTINCT JOB
    FROM CORPDATA.EMPLOYEE
    WHERE WORKDEPT = 'D11'

The result is two rows.

JOB
DESIGNER
MANAGER
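
DISTINCT applies to the whole list of expressions, not to a single column. As an illustrative sketch (this query is not part of the original example, and it assumes the JOB and SEX columns of the CORPDATA.EMPLOYEE sample table), the following statement returns each combination of JOB and SEX once, so the same job code can appear on more than one row when it occurs with different SEX values:

```sql
-- Illustrative only: uniqueness is judged on the (JOB, SEX) pair,
-- not on JOB alone.
SELECT DISTINCT JOB, SEX
  FROM CORPDATA.EMPLOYEE
  WHERE WORKDEPT = 'D11'
  ORDER BY JOB, SEX;
```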

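If you do not include DISTINCT in a SELECT clause, you might find duplicate rows in your result, because SQL returns the JOB column's value for each row that satisfies the search condition. For DISTINCT, null values are considered equal to each other, so rows that are null in the same columns and match in every other selected column are treated as duplicates, and only one of them is returned.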
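
The difference can be seen with a small scratch table. The sketch below is an assumption made for illustration: JOBDEMO and its contents are hypothetical and are not part of the CORPDATA sample schema.

```sql
-- Hypothetical scratch table, used only for this illustration.
CREATE TABLE JOBDEMO (EMPNO CHAR(6) NOT NULL, JOB CHAR(8));

INSERT INTO JOBDEMO VALUES ('000001', 'DESIGNER');
INSERT INTO JOBDEMO VALUES ('000002', 'DESIGNER');
INSERT INTO JOBDEMO VALUES ('000003', NULL);
INSERT INTO JOBDEMO VALUES ('000004', NULL);

-- Without DISTINCT: one result row per qualifying row (four rows).
SELECT JOB FROM JOBDEMO;

-- With DISTINCT: duplicates are dropped, and the two nulls count as
-- duplicates of each other (two rows: DESIGNER and a single null).
SELECT DISTINCT JOB FROM JOBDEMO;
```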

If you include DISTINCT in a SELECT clause and you also use a shared-weight sort sequence, fewer values might be returned. A shared-weight sort sequence causes values that differ only in the case of their letters to be weighted the same, so they compare as equal. If 'MGR', 'Mgr', and 'mgr' are all in the same table, only one of these values is returned.
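
A similar collapsing of values can be approximated by folding case explicitly. The sketch below is a stand-in technique, not the sort sequence mechanism itself: it uses the UPPER function, and the TITLEDEMO table is hypothetical. One difference to keep in mind is that UPPER returns the folded value, whereas with a shared-weight sort sequence the value returned is one of the stored values.

```sql
-- Hypothetical scratch table, used only for this illustration.
CREATE TABLE TITLEDEMO (JOB VARCHAR(8));

INSERT INTO TITLEDEMO VALUES ('MGR');
INSERT INTO TITLEDEMO VALUES ('Mgr');
INSERT INTO TITLEDEMO VALUES ('mgr');

-- With no shared-weight sort sequence in effect, the three values
-- compare as different, so three rows are returned.
SELECT DISTINCT JOB FROM TITLEDEMO;

-- Folding case before DISTINCT is applied makes the three values
-- equal, so one row ('MGR') is returned.
SELECT DISTINCT UPPER(JOB) AS JOB FROM TITLEDEMO;
```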

 

 

Related concepts


Sort sequences and normalization in SQL