Tips for using VARCHAR and VARGRAPHIC data types in databases
Variable-length column (VARCHAR or VARGRAPHIC) support allows you to define any number of columns in a table as variable length. If you use VARCHAR or VARGRAPHIC support, the size of a table can typically be reduced.
Data in a variable-length column is stored internally in two areas: a fixed-length or ALLOCATE area and an overflow area. If a default value is specified, the allocated length is at least as large as the value. The following points help you determine the best way to use your storage area.
When you define a table with variable-length data, decide the width of the ALLOCATE area. If the primary goal is:
- Space saving: use ALLOCATE(0).
- Performance: the ALLOCATE area should be wide enough to incorporate at least 90% to 95% of the values for the column.
It is possible to balance space savings and performance. In the following example of an electronic telephone book, the following data is used:
- 8600 names that are identified by: last, first, and middle name
- The Last, First, and Middle columns are variable length.
- The shortest last name is 2 characters; the longest is 22 characters.
This example shows how space can be saved by using variable-length columns. The fixed-length column table uses the most space. The table with the carefully calculated allocate sizes uses less disk space. The table that was defined with no allocate size (with all of the data stored in the overflow area) uses the least disk space.
Variety of Support Last Name Max/Alloc First Name Max/Alloc Middle Name Max/Alloc Total Physical File Size Number of Rows in Overflow Space Fixed Length 22 22 22 567 K 0 Variable Length 40/10 40/10 40/7 408 K 73 Variable-Length Default 40/0 40/0 40/0 373 K 8600 In many applications, performance must be considered. If you use the default ALLOCATE(0), it will double the disk unit traffic. ALLOCATE(0) requires two reads; one to read the fixed-length portion of the row and one to read the overflow space. The variable-length implementation, with the carefully chosen ALLOCATE, minimizes overflow and space and maximizes performance. The size of the table is 28% smaller than the fixed-length implementation. Because 1% of rows are in the overflow area, the access requiring two reads is minimized. The variable-length implementation performs about the same as the fixed-length implementation.
To create the table using the ALLOCATE keyword:
CREATE TABLE PHONEDIR (LAST VARCHAR(40) ALLOCATE(10), FIRST VARCHAR(40) ALLOCATE(10), MIDDLE VARCHAR(40) ALLOCATE(7))If you are using host variables to insert or update variable-length columns, the host variables should be variable length. Because blanks are not truncated from fixed-length host variables, using fixed-length host variables can cause more rows to spill into the overflow space. This increases the size of the table.
In this example, fixed-length host variables are used to insert a row into a table:
01 LAST-NAME PIC X(40). … MOVE "SMITH" TO LAST-NAME. EXEC SQL INSERT INTO PHONEDIR VALUES(:LAST-NAME, :FIRST-NAME, :MIDDLE-NAME, :PHONE) END-EXEC.The host-variable LAST-NAME is not variable length. The string “SMITH”, followed by 35 blanks, is inserted into the VARCHAR column LAST. The value is longer than the allocate size of 10. Thirty of thirty-five trailing blanks are in the overflow area.
In this example, variable-length host variables are used to insert a row into a table:
01 VLAST-NAME. 49 LAST-NAME-LEN PIC S9(4) BINARY. 49 LAST-NAME-DATA PIC X(40). … MOVE "SMITH" TO LAST-NAME-DATA. MOVE 5 TO LAST-NAME-LEN. EXEC SQL INSERT INTO PHONEDIR VALUES(:VLAST-NAME, :VFIRST-NAME, :VMIDDLE-NAME, :PHONE) END-EXEC.The host variable VLAST-NAME is variable length. The actual length of the data is set to 5. The value is shorter than the allocated length. It can be placed in the fixed portion of the column.
Running the Reorganize Physical File Member (RGZPFM) command against tables that contain variable-length columns can improve performance. The fragments in the overflow area that are not in use are compacted by the Reorganize Physical File Member (RGZPFM) command. This reduces the read time for rows that overflow, increases the locality of reference, and produces optimal order for serial batch processing.
Choose the appropriate maximum length for variable-length columns. Selecting lengths that are too long increases the process access group (PAG). A large PAG slows performance. A large maximum length makes SEQONLY(*YES) less effective. Variable-length columns longer than 2000 bytes are not eligible as key columns.
Using LOBs and VARCHAR in the same table
Storage for LOB columns allocated in the same manner as VARCHAR columns. When a column stored in the overflow storage area is referenced, currently all of the columns in that area are paged into memory. A reference to a "smaller" VARCHAR column that is in the overflow area can potentially force extra paging of LOB columns. For example, A VARCHAR(256) column retrieved by application has side-effect of paging in two 5 MB BLOB columns that are in the same row. In order to prevent this, you may want to use ALLOCATE keyword to ensure that only LOB columns are stored in the overflow area.
Parent topic:
General DB2 UDB for iSeries performance considerations
Related information
Reorganize Physical File Member (RGZPFM) command
Reorganizing a physical file
Embedded SQL programming