Recovering a corrupted search index after an unexpected failure
We can recover a corrupted, failed, damaged, or inaccessible search index if an unexpected hardware or software failure occurs. The search index can become corrupted in scenarios such as the following:
- An indexing task was interrupted, causing a partially built index to be replicated to production.
- A loss of network connectivity during index replication, causing one or more of the index data segment files to be corrupted.
- Running out of file descriptors or storage capacity during indexing or replication, causing the entire index to be deleted.
To determine what to back up and restore in the event of index corruption, we must understand which components are involved and how data flows through each of them. In a production environment, a repeater typically performs indexing, reading catalog data directly from the production database. After the repeater finishes indexing, members of the search subordinate server cluster pull the changes from the repeater and replicate the update locally on each subordinate server. A local copy of the index exists on each subordinate server in the solrhome location. The Solr home contains all the search server-related files. Therefore, it is important that your backup strategy copies everything under the Solr home of the repeater, that is, the master copy of the index in production.
If any index issues occur in the production system, the corrupted index can either be restored from a backup or rebuilt. IBM recommends that you set up a recurring task to copy your indexes to an alternative storage location at a regular interval, or right before each reindexing occurs. This practice dramatically reduces index restoration time. For example, if an index becomes corrupted, we can restore the most recent known working version of the search index onto the repeater and set it as the current index. The subordinate servers can then automatically synchronize themselves with the restored index version.
In general, it is strongly recommended to keep at least one copy of the most recent working search index. This backup copy should be kept current and refreshed each time a change is made to the search index. Then, in the event of an index failure or corruption, restoring from the recent backup is a quick and effective way to bring the site back up and running. An index backup is simply a copy of the index data files on the file system. The best time to create a backup is right after the final indexing is complete and a quick sanity and integrity test has been performed. It is optional but beneficial to keep multiple successive backups, so that you have more flexibility when rolling back to a prior version of the search index.
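For example, the following is a minimal backup sketch for a Linux system. It assumes the repeater's Solr home is at /opt/WebSphere/solr/home and that backups are kept under /backup/search; both paths are illustrative assumptions, so substitute your own Solr home and backup locations:

    # Copy the repeater's Solr home into a timestamped backup directory (example paths).
    SOLR_HOME=/opt/WebSphere/solr/home
    BACKUP_ROOT=/backup/search
    STAMP=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$BACKUP_ROOT/$STAMP"
    cp -rp "$SOLR_HOME/." "$BACKUP_ROOT/$STAMP/"

Take the copy while no indexing or replication is in progress, so that the backed-up segment files are consistent.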
To restore the search index to a prior version, complete the following task: Backing up a WebSphere Commerce Search index. Although the indexed data from the selected backup might be slightly out of date, the site can be brought back up with minimal downtime. While the site is up and running again on the restored backup, we can investigate the root cause and perform additional recovery plans, such as attempting to rebuild the corrupted search index on the same or another indexing server.
Procedure
- Back up Solr home, so that we can capture a working search environment, including search indexes.
Determine the locations of Solr home to back up, based on the environment:
Staging environment
- When indexing with staging propagation, business users apply changes to a staging area, which is later propagated into the production environment by IT administrators. An index repeater is used to capture the most recently deployed index content, while also serving as a backup.
Back up the Solr index home directory.
Determine the backup schedule based on when business users are not making further changes to be propagated into production, as shown in the scheduling sketch after this step.
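As one way to automate that schedule, the following is a hedged sketch of a recurring backup task using cron. It assumes the copy commands shown earlier are saved in a hypothetical script, /usr/local/bin/backup-solr-home.sh, and that a nightly window exists when business users are not making staging changes:

    # Example crontab entry: back up the Solr home every night at 01:30,
    # outside the window in which business users make staging changes.
    30 1 * * * /usr/local/bin/backup-solr-home.sh >> /var/log/solr-backup.log 2>&1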
- Check for problematic search indexes by running the CheckIndex tool.
CheckIndex is a tool available in the Lucene library. It checks the index files for problems and reports any broken segments. Optionally, the CheckIndex tool can repair a broken index by removing the problematic segments, typically with little loss of data, so that you do not have to restore the index from a backup or perform a full indexing of all documents that are stored in Solr.
- Go to the directory containing the Lucene library files.
- Run the following command:
java -cp .\lucene-core-5.5.4.jar org.apache.lucene.index.CheckIndex lucene_indexdir
Where lucene_indexdir is the full path to the Lucene index data directory.
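If CheckIndex reports broken segments and no usable backup exists, the tool can also attempt a repair. The following is a hedged example for Lucene 5.x: the -exorcise option removes unreadable segments, so any documents stored in those segments are lost, and the search server should be stopped first so that no writer holds the index open:

    java -cp .\lucene-core-5.5.4.jar org.apache.lucene.index.CheckIndex lucene_indexdir -exorcise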
If we can identify a problematic search index, we can back up and roll back search indexes based on snapshots. To do so, follow the steps in Backing up a WebSphere Commerce Search index.
In cases of problematic search indexes, restore the backed-up Solr home directory onto the affected server.
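As a hedged illustration of that restore on the repeater, reusing the example paths from the backup sketch above: stop the search server first so that no writer holds the index files open, and restart it afterwards so that the subordinate servers resynchronize from the restored index.

    # Replace the current Solr home with the most recent backup (example paths).
    SOLR_HOME=/opt/WebSphere/solr/home
    BACKUP_ROOT=/backup/search
    LATEST=$(ls -1d "$BACKUP_ROOT"/* | sort | tail -n 1)
    rm -rf "$SOLR_HOME"
    mkdir -p "$SOLR_HOME"
    cp -rp "$LATEST/." "$SOLR_HOME/"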