IBM Tivoli Directory Integrator
Delta feature for Iterator mode
Connectors in Iterator mode can generate Delta entries.
This feature uses the Delta Engine and the Delta Store to detect changes.
Delta Engine
The Delta Engine allows us to
read through a data source and detect changes since the previous time
we did so. This way we can detect new entries, changed entries
and even deleted entries. For certain data sources (such as LDIF files
and LDAP servers), Tivoli Directory Integrator can even detect
whether attributes and values within entries have been changed. We can
configure Delta settings on Connectors in Iterator mode only.
The
Delta Engine knows whether Entries or Attributes have been added, changed or deleted by keeping a local copy of each Entry in a persistent
store, which is part of the System Store. This local repository is
called a Delta Store and consists of Delta tables.
Each time the AssemblyLine is run, the Delta Engine compares the data
being iterated with its copy in the Delta Table. When a change is
detected the Connector returns a Delta Entry.
Do
not manually modify Delta Store tables. Otherwise, the Delta snapshot
information will become inconsistent, and the Delta Engine will fail.
In versions earlier than IBM Tivoli Directory Integrator V6.1, snapshots written to the Delta Store during Delta Engine processing
were committed immediately. As a result, the Delta Engine would consider
a changed Entry as handled even if processing in the AssemblyLine Flow section
failed. This limitation is addressed through the Commit parameter
on the Connector Delta tab. This parameter controls when the
Delta Engine commits snapshots taken of incoming data to the System
Store.
Unique Attribute name
For the Delta
mechanism to be able to uniquely identify each Entry, specify
a unique Attribute to use as a Delta key. The values of this attribute
must be unique in the data source being read. We can specify the Delta
key in the Delta tab of the Connector, by entering or selecting an
attribute name in the Unique Attribute Name parameter. This attribute
must be found in the Input Map of the Iterator, and can either be
an attribute read from the connected system or a computed attribute
(using script in the Attribute Mapping).
We can also specify multiple
attributes by separating them with a plus sign ( + ):
LastName+FirstName+BirthDate
At
least one of the attributes specified in the Unique Attribute Name
parameter must contain a value. When several attributes are specified, their string values are concatenated into one string, which then becomes
the unique Delta identifier. Attributes with no values (for example, blank or NULL) are skipped when the Delta key is built for an Entry.
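The key-building rule described above can be sketched as follows. This is a minimal illustration, not the product's implementation: the function name `build_delta_key` is hypothetical, an Entry is modelled as a plain dictionary, and the values are assumed to be joined with no separator, since the text only says they are concatenated.

```python
def build_delta_key(entry, unique_attribute_name):
    """Build a Delta key from a '+'-separated Unique Attribute Name value."""
    names = [name.strip() for name in unique_attribute_name.split("+")]
    parts = []
    for name in names:
        value = entry.get(name)
        # Attributes with no value (missing, None, or blank) are skipped.
        if value is None or str(value).strip() == "":
            continue
        parts.append(str(value))
    if not parts:
        # At least one of the listed attributes must contain a value.
        raise ValueError("no key attribute contains a value")
    # Assumption: plain concatenation, no separator between the values.
    return "".join(parts)


entry = {"LastName": "Smith", "FirstName": "", "BirthDate": "1970-01-01"}
key = build_delta_key(entry, "LastName+FirstName+BirthDate")
```

Here the blank FirstName is skipped, so the key is built from LastName and BirthDate only. Note that skipping empty values means two different Entries can produce the same key if their non-empty values happen to concatenate to the same string, which is one reason the chosen attributes must be unique in the data source.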
Delta Store
The Delta Store is physically located
in the System Store. It consists of one Delta Systable (DS) and one
or more Delta Tables. Each Delta Table serves as the Delta Store
of a different Iterator Connector that has Delta enabled.
Although Delta
Store tables can be accessed with both the JDBC Connector and the
System Store Connector, it is inadvisable to change them without a
deep understanding of how these tables are structured and handled
by the Delta Engine.
Delta Table structure
Every Delta Table (DT)
contains information about each Entry processed by the Delta Engine
for a particular Connector. A Delta Systable (DS) maintains a list
of all Delta Tables currently in use by the Delta Store.
- Delta Systable - The Delta Systable (DS) contains information
about each Delta Table (DT) in the System Store. The purpose of the
DS is to maintain the sequence counter for each DT. The structure
for the DS is as follows:
Table 1. Delta Systable structure

Column | Type | Description
---|---|---
ID | Varchar | The DT identifier (name)
SequenceID | Int | The sequence ID from the last run
Version | Int | The DS version (1)
- Delta Table - Each Connector that requests a Delta store needs
to specify a unique Delta identifier to be associated with the Connector.
This identifier is also used as the name of the Delta Table in the
System Store. The Delta Table structure is as follows:
Table 2. Delta Table structure

Column | Type | Description
---|---|---
ID | Varchar | The unique value identifying an Entry
SequenceID | Int | The sequence number for the Entry
Entry | Long Varbinary | The serialized Entry object
Delta process
Given the above Delta Store structure, the sequence number is used to determine which entries are no longer
part of the source data set. Every time an AssemblyLine is run, the
sequence number for the Delta Table used by that particular Connector
is read from the Delta Systable. It is then incremented, and the
incremented value is used to mark the updated entries during
the entire AssemblyLine execution.
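The table layouts and the sequence-number step can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: SQLite stands in for the System Store database, the names DELTA_SYSTABLE and HR_DELTA are hypothetical, the primary keys are assumed, and Long Varbinary is approximated by BLOB.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the System Store.
conn = sqlite3.connect(":memory:")

# Delta Systable (DS): one row per Delta Table (hypothetical table name).
conn.execute(
    "CREATE TABLE DELTA_SYSTABLE ("
    "ID VARCHAR(255) PRIMARY KEY, SequenceID INT, Version INT)"
)
# Delta Table (DT): named after the Connector's Delta identifier
# (HR_DELTA is a hypothetical identifier).
conn.execute(
    "CREATE TABLE HR_DELTA ("
    "ID VARCHAR(255) PRIMARY KEY, SequenceID INT, Entry BLOB)"
)
# Register the Delta Table: sequence ID 41 from the last run, DS version 1.
conn.execute("INSERT INTO DELTA_SYSTABLE VALUES (?, ?, ?)", ("HR_DELTA", 41, 1))


def next_sequence_id(connection, delta_table_id):
    """Read the last-run sequence ID for a Delta Table and increment it."""
    row = connection.execute(
        "SELECT SequenceID FROM DELTA_SYSTABLE WHERE ID = ?", (delta_table_id,)
    ).fetchone()
    return row[0] + 1


# The value used to mark every Entry touched during this run.
current_seq = next_sequence_id(conn, "HR_DELTA")
```

The sketch deliberately does not write the incremented value back to the Delta Systable, because the text does not state at which point the Delta Engine persists it.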
The Delta Engine process works
in two passes.
- Read → Look up → Compare → Update → Set current SequenceID
  a. The Iterator reads entries from the input data source.
  b. The Delta process looks for the corresponding Entry in the Delta Table
  using the unique attribute's value.
  c. If a match is found, the Delta process compares each Attribute
  (and its values) to determine whether the Entry has been modified.
  Based on the result of the comparison, the Delta Engine
  returns a Delta Entry tagged with the relevant operation code, modify or unchanged:
    - Modify Entry - the Entry that was read and the corresponding Entry
    from the Delta Table are considered different; the Entry is updated
    in the Delta Table.
    - Unchanged Entry - the Entry that was read and the corresponding
    Entry from the Delta Table are considered equal.
  d. If a match is not found in the Delta Table, the Entry is treated
  as new:
    - Add Entry - the Entry is added to the Delta Table.
  e. In both cases c. and d., the sequence number value in the Delta
  Table is updated with the sequence number used for the current AssemblyLine
  execution.
- Check for data with (SequenceID < current SequenceID) → Mark as Deleted
Once End of Data is reached by the Iterator, the Delta
Engine makes a second pass through the Delta Table looking for those
entries not accessed during the first pass. These Entries are easily
recognized because their sequence number is not updated with the current
sequence number. Therefore any Entries in the Delta Table with a sequence
number lower than the current sequence number are considered to be
deleted entries and are returned as deleted.
This pass happens
only when the iteration through the input data completes successfully.
If an error occurs during that iteration, no Entries
will be tagged as deleted and returned by the AssemblyLine or removed
from the Delta Table. This does not affect the original data source,
and the next time the AssemblyLine is executed successfully the deleted
Entries will be processed correctly.
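The two passes described above can be sketched in a few lines. This is a simplified model rather than the Delta Engine itself: the function name `run_delta` is hypothetical, the Delta Table is modelled as a dictionary mapping the Delta key to a (SequenceID, Entry) pair, and Entries are plain dictionaries compared for equality.

```python
def run_delta(source_entries, delta_table, last_seq, key_attr):
    """Return (delta_entries, current_seq) for one simulated AssemblyLine run.

    delta_table maps Delta key -> (SequenceID, stored Entry) and is updated
    in place.
    """
    current_seq = last_seq + 1
    results = []

    # Pass 1: read, look up, compare, update, set the current SequenceID.
    for entry in source_entries:
        key = entry[key_attr]
        stored = delta_table.get(key)
        if stored is None:
            operation = "add"
        elif stored[1] != entry:
            operation = "modify"
        else:
            operation = "unchanged"
        delta_table[key] = (current_seq, dict(entry))
        results.append((operation, entry))

    # Pass 2: reached only if pass 1 finished without raising an error.
    # Entries whose SequenceID was not refreshed are no longer in the source.
    stale_keys = [k for k, (seq, _) in delta_table.items() if seq < current_seq]
    for key in stale_keys:
        _, old_entry = delta_table.pop(key)
        results.append(("delete", old_entry))

    return results, current_seq


table = {}
first, seq = run_delta(
    [{"uid": "a", "mail": "a@x"}, {"uid": "b", "mail": "b@x"}], table, 0, "uid"
)
second, seq = run_delta([{"uid": "a", "mail": "a@y"}], table, seq, "uid")
```

In the first run both Entries are tagged add. In the second run, Entry a is tagged modify because its mail value changed, and Entry b is tagged delete because it was not read and so kept the older sequence number. If reading the source raised an error partway through, the function would exit before the second pass, which mirrors the rule that deletions are only reported after a complete iteration.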
Row Locking
This parameter
is available in the Delta tab for Iterator connectors and the Delta
Function Component configuration. It allows us to set the transaction
isolation level used by the connection established to the Delta Store
database. Setting a higher isolation level reduces the transaction
anomalies known as 'dirty reads', 'non-repeatable reads' and 'phantom
reads' by using row and table locks. This parameter has the following
values:
- READ_UNCOMMITTED
- Corresponds to java.sql.Connection.TRANSACTION_READ_UNCOMMITTED;
indicates that dirty reads, non-repeatable reads and phantom reads
can occur. This level allows a row changed by one transaction to be
read by another transaction before any changes in that row have been
committed (a "dirty read"). If any of the changes are rolled back, the second transaction will have retrieved an invalid row.
- READ_COMMITTED
- Corresponds to java.sql.Connection.TRANSACTION_READ_COMMITTED;
indicates that dirty reads are prevented; non-repeatable reads and
phantom reads can occur. This level only prohibits a transaction from
reading a row with uncommitted changes in it.
- REPEATABLE_READ
- Corresponds to java.sql.Connection.TRANSACTION_REPEATABLE_READ;
indicates that dirty reads and non-repeatable reads are prevented;
phantom reads can occur. This level prohibits a transaction from reading
a row with uncommitted changes in it, and it also prohibits the situation
where one transaction reads a row, a second transaction alters the
row, and the first transaction rereads the row, getting different
values the second time (a "non-repeatable read").
- SERIALIZABLE
- Corresponds to java.sql.Connection.TRANSACTION_SERIALIZABLE;
indicates that dirty reads, non-repeatable reads and phantom reads
are prevented. This level includes the prohibitions in TRANSACTION_REPEATABLE_READ
and further prohibits the situation where one transaction reads all
rows that satisfy a WHERE condition, a second transaction inserts
a row that satisfies that WHERE condition, and the first transaction
rereads for the same condition, retrieving the additional "phantom"
row in the second read. This is generally the slowest but safest option, and the default value for the Row Locking parameter.
For more information about transaction isolation levels, see the online documentation of the java.sql.Connection interface: http://java.sun.com/j2se/1.5.0/docs/api/java/sql/Connection.html.
Each
database server sets a default transaction isolation level; the default
value for Derby, Oracle and MS SQL Server is TRANSACTION_READ_COMMITTED.
However, the default value of the Row Locking parameter
of SERIALIZABLE will override this when using a Delta component (that
is, the Delta functionality in Iterator Connectors or the Delta Function
Component).
Some database servers may not support all transaction
isolation levels, therefore please refer to the specific database
documentation for accurate information about supported transaction
isolation levels.
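As a quick reference, the four Row Locking values can be tabulated against the anomalies they still permit. The integer values are those of the java.sql.Connection TRANSACTION_* constants; the helper name `possible_anomalies` is hypothetical and only restates the descriptions above.

```python
# Integer values of the java.sql.Connection TRANSACTION_* constants.
JDBC_ISOLATION = {
    "READ_UNCOMMITTED": 1,
    "READ_COMMITTED": 2,
    "REPEATABLE_READ": 4,
    "SERIALIZABLE": 8,
}

# Anomalies ordered so that each stricter level prevents one more of them.
ANOMALIES = ("dirty read", "non-repeatable read", "phantom read")

# Number of anomalies (counted from the left of ANOMALIES) each level prevents.
PREVENTED_COUNT = {
    "READ_UNCOMMITTED": 0,
    "READ_COMMITTED": 1,
    "REPEATABLE_READ": 2,
    "SERIALIZABLE": 3,
}


def possible_anomalies(level):
    """Return the anomalies that can still occur at the given level."""
    return ANOMALIES[PREVENTED_COUNT[level]:]


for level in JDBC_ISOLATION:
    print(level, JDBC_ISOLATION[level], possible_anomalies(level))
```

The table makes the trade-off explicit: SERIALIZABLE, the default for the Row Locking parameter, leaves no anomalies possible but relies on the most locking.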
Transaction isolation levels are maintained
by the database server itself for every connection established to
the database. Therefore, when a Delta component (with the transaction
isolation level set to REPEATABLE_READ or SERIALIZABLE
and the Commit parameter set to On
Connector Close) starts its transaction, all other queries
trying to modify the same data will be blocked. This means that other
components that need to modify the same data will have to wait until
the first component commits its transaction on termination. This waiting
may cause the issued SQL queries to time out and leave the data unmodified.
Also,
when a component has the Commit parameter set
to No autocommit, we must commit
the transactions manually, and in such a way that other components do not wait
indefinitely to perform a modification.
Detect or ignore changes only in specific attributes
The parameters Attribute List and Change
Detection Mode configure the ability of the Delta Engine
to detect changes only in specific attributes instead of in all received
attributes.
The Attribute List parameter is
a comma-separated list of the attributes that are affected by Change
Detection Mode. The Change Detection Mode parameter
specifies how changes in these attributes are handled. It has
three values:
- IGNORE_ATTRIBUTES
- (“Ignore changes for the following Attributes”) - Changes in every
attribute specified in the Attribute List parameter
will be ignored during the compute changes process.
- DETECT_ATTRIBUTES
- (“Detect changes for the following Attributes”) - This option
has the opposite effect - the only detected changes will be in the
attributes listed in the Attribute List parameter.
- DETECT_ALL
- (“Use all Attributes for change detection”) - This instructs the
Delta Engine to detect changes in all attributes. When this option
is selected the Attribute List parameter is disabled
since no list of affected attributes is needed.
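The effect of the three modes can be sketched as a filter applied to both the stored and the received Entry before they are compared. This is an illustrative model only: the function names are hypothetical, Entries are plain dictionaries, and attribute names are assumed to match exactly (including case).

```python
def attributes_to_compare(entry, mode, attribute_list=""):
    """Return the part of an Entry that takes part in change detection."""
    names = {name.strip() for name in attribute_list.split(",") if name.strip()}
    if mode == "DETECT_ALL":
        # The Attribute List is not used in this mode.
        return dict(entry)
    if mode == "IGNORE_ATTRIBUTES":
        return {k: v for k, v in entry.items() if k not in names}
    if mode == "DETECT_ATTRIBUTES":
        return {k: v for k, v in entry.items() if k in names}
    raise ValueError("unknown Change Detection Mode: " + mode)


def is_modified(stored, received, mode, attribute_list=""):
    """Compare two Entries using only the attributes selected by the mode."""
    return (attributes_to_compare(stored, mode, attribute_list)
            != attributes_to_compare(received, mode, attribute_list))


stored = {"cn": "Niki", "mail": "niki@example.com", "lastLogin": "2007-10-15"}
received = {"cn": "Niki", "mail": "niki@example.com", "lastLogin": "2007-10-16"}

all_attrs = is_modified(stored, received, "DETECT_ALL")
ignoring = is_modified(stored, received, "IGNORE_ATTRIBUTES", "lastLogin")
detecting = is_modified(stored, received, "DETECT_ATTRIBUTES", "cn, mail")
```

Only lastLogin differs between the two Entries, so the comparison reports a modification under DETECT_ALL but not when lastLogin is ignored or when detection is limited to cn and mail.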
Example use case
When using the
Delta Engine, the received entries sometimes contain attributes that
we consider unimportant and want to ignore. Such attributes
should not affect the result of the Delta computation: when
several Entries differ only in these attributes, comparing them leads
to unnecessary updates of the Delta Store table.
The solution
for this case is to use the Attribute List and Change
Detection Mode parameters.
Here is an example scenario
in which two AssemblyLines receive changelog entries from two replicas
of an LDAP server, and the changes are applied to one Delta Store.
To illustrate this, we will use the following example changelog entries:
Entry1:
Entry attributes:
targetdn (replace): 'cn=Niki,o=IBM,c=us'
changetime (replace): '20071015094646'
$dn (replace): 'changenumber=78955,cn=changelog'
ibm-changeInitiatorsName (replace): 'CN=ROOT'
changenumber (replace): '78955'
objectclass (replace): 'top' 'changelogentry' 'ibm-changelog'
changetype (replace): 'modify'
cn (replace): 'Niki' 'Niky'
changes (replace): 'replace: cn
cn: Niki
cn: Niky
-
'
Entry2:
Entry attributes:
targetdn (replace): 'cn=Niki,o=IBM,c=us'
changetime (replace): '20071015094817'
$dn (replace): 'changenumber=10076,cn=changelog'
ibm-changeInitiatorsName (replace): 'CN=ROOT'
changenumber (replace): '10076'
objectclass (replace): 'top' 'changelogentry' 'ibm-changelog'
changetype (replace): 'modify'
cn (replace): 'Niki' 'Nikolai'
changes (replace): 'replace: cn
cn: Niki
cn: Nikolai
-
'
Entry3:
Entry attributes:
targetdn (replace): 'cn=Niki,o=IBM,c=us'
changetime (replace): '20071037454817'
$dn (replace): 'changenumber=112,cn=changelog'
ibm-changeInitiatorsName (replace): 'CN=ADMIN'
changenumber (replace): '112'
objectclass (replace): 'top' 'changelogentry' 'ibm-changelog'
changetype (replace): 'modify'
cn (replace): 'Niki' 'Nikolai'
changes (replace): 'replace: cn
cn: Niki
cn: Nikolai
-
'
In these entries, the modified attributes are cn and changes; the
attributes that can be ignored are changetime, $dn,
ibm-changeInitiatorsName, and changenumber. The ignored
attributes will not
be considered when comparing the received Entry with the stored Entry.
Therefore these attributes have to be listed in the Attribute
List parameter. To specify that we want to ignore
them, the Change Detection Mode parameter needs
to be set to Ignore changes for the following Attributes.
This
is the workflow when the AssemblyLines receive the entries:
- When AL1 receives Entry1, it will be returned as modify and
saved in the Delta Store table.
- When AL2 receives Entry2, its changetime, $dn, ibm-changeInitiatorsName, and changenumber attributes are modified but will be ignored. However,
the cn and changes attributes are also modified, and therefore the
resulting Delta Entry will be tagged as modify and saved
in the Delta Store table.
- When AL2 receives Entry3, its changetime, $dn, ibm-changeInitiatorsName, and changenumber attributes are modified but will be ignored. The rest
of the attributes are equal, so the resulting Delta Entry will be tagged
as unchanged and will either be returned to the AssemblyLine
(only if the Return unchanged parameter is checked)
or skipped. The returned Delta Entry will be identical to the received
Entry3. In this case the Delta Store is not updated. If the Attribute
List and Change Detection Mode parameters were not used, Entry3
would have been tagged as modify and saved in the Delta
Store.
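The comparisons in this workflow can be reproduced with the example entries. The sketch below is a simplified model under stated assumptions: Entries are plain dictionaries, the objectclass and changetype attributes are omitted for brevity because they are identical in all three entries, and the helper name `visible` is hypothetical.

```python
# Attributes listed in the Attribute List parameter (mode: ignore changes).
IGNORED = {"changetime", "$dn", "ibm-changeInitiatorsName", "changenumber"}


def visible(entry):
    """Return the attributes that still take part in the comparison."""
    return {k: v for k, v in entry.items() if k not in IGNORED}


entry1 = {
    "targetdn": "cn=Niki,o=IBM,c=us",
    "changetime": "20071015094646",
    "$dn": "changenumber=78955,cn=changelog",
    "ibm-changeInitiatorsName": "CN=ROOT",
    "changenumber": "78955",
    "cn": ["Niki", "Niky"],
    "changes": "replace: cn\ncn: Niki\ncn: Niky\n-\n",
}
entry2 = {
    "targetdn": "cn=Niki,o=IBM,c=us",
    "changetime": "20071015094817",
    "$dn": "changenumber=10076,cn=changelog",
    "ibm-changeInitiatorsName": "CN=ROOT",
    "changenumber": "10076",
    "cn": ["Niki", "Nikolai"],
    "changes": "replace: cn\ncn: Niki\ncn: Nikolai\n-\n",
}
entry3 = {
    "targetdn": "cn=Niki,o=IBM,c=us",
    "changetime": "20071037454817",
    "$dn": "changenumber=112,cn=changelog",
    "ibm-changeInitiatorsName": "CN=ADMIN",
    "changenumber": "112",
    "cn": ["Niki", "Nikolai"],
    "changes": "replace: cn\ncn: Niki\ncn: Nikolai\n-\n",
}

# Entry2 against the stored Entry1: cn and changes differ, so it is a modify.
entry2_modified = visible(entry1) != visible(entry2)
# Entry3 against the stored Entry2: only ignored attributes differ.
entry3_modified = visible(entry2) != visible(entry3)
# Without the Attribute List, Entry3 would count as modified.
entry3_modified_without_ignore = entry2 != entry3
```

With the four attributes ignored, Entry2 still differs from Entry1 while Entry3 compares equal to Entry2, which matches the modify and unchanged tags described above.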