Wednesday, February 3, 2016

1931536 - HANA: Possible DB corruption with NFSv3 based scale out

Symptom
You run a distributed HANA installation with an NFSv3- or IBRIX-based storage subsystem and a STONITH-based custom implementation of the HANA Storage Connector API scripts.


Other Terms
HANA, scale out, NFS, NFSv3, IBRIX, Storage Connector API, data corruption, persistence, file locking, failover


Reason and Prerequisites
You run a distributed HANA installation with an NFSv3- or IBRIX-based storage subsystem. For HANA high availability on NFSv3- and IBRIX-based systems, your SAP HANA hardware partner must provide a custom implementation of the HANA Storage Connector API that ensures reliable I/O fencing and therefore error-free automatic host failover. This is usually a STONITH-based implementation (STONITH: "shoot the other node in the head") that prevents two HANA nodes from accessing the same HANA persistence during or after a failover.
If this I/O fencing does not work reliably, the HANA persistence can become corrupted and data can be lost; in that case, no failover should be performed.
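The fencing rule described above can be sketched in a few lines of Python. This is a simplified model, not the Storage Connector API itself: `take_over_volume`, `power_off`, and the volume-to-host dictionary are hypothetical names chosen for illustration. The point it shows is ordering: the standby host takes over the persistence only after the failed host is confirmed powered off, and any fencing failure aborts the failover instead of letting two hosts write.

```python
from collections.abc import Callable


class FencingError(Exception):
    """Raised when the failed host cannot be confirmed as powered off."""


def take_over_volume(
    volume_owner: dict[str, str],
    volume: str,
    failed_host: str,
    standby_host: str,
    power_off: Callable[[str], bool],
) -> None:
    """Reassign `volume` to `standby_host`, fencing `failed_host` first.

    `power_off` is a hypothetical stand-in for the partner's STONITH action;
    it must return True only when the host is confirmed to be down.
    """
    if volume_owner.get(volume) != failed_host:
        raise ValueError(f"{volume} is not owned by {failed_host}")
    if not power_off(failed_host):
        # Fencing not confirmed: the old host may still be writing, so the
        # only safe outcome is to abort the failover.
        raise FencingError(f"could not fence {failed_host}; failover aborted")
    # Only now is it safe for the standby host to attach the volume.
    volume_owner[volume] = standby_host


# Example with hypothetical host and volume names.
owners = {"mnt00001": "hana01"}
take_over_volume(owners, "mnt00001", "hana01", "hana03", power_off=lambda h: True)
print(owners)  # {'mnt00001': 'hana03'}

try:
    take_over_volume(owners, "mnt00001", "hana03", "hana02", power_off=lambda h: False)
except FencingError as exc:
    print(exc)  # could not fence hana03; failover aborted
```

Raising an exception rather than returning a flag makes an unconfirmed fence impossible to ignore silently, which is exactly the failure mode this note describes.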
Because of incorrect error handling, a failover is carried out even if the high-availability script does not work correctly or reports that no failover should take place. If both the currently active host and the failover host are still running in this situation, HANA services on both hosts may write to the same database persistence at the same time and corrupt it. The error exists in all SAP HANA database revisions up to and including revision 68.
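The difference between the faulty and the corrected behavior can be expressed as a small decision function. This is a hedged, simplified model for illustration only: `ScriptResult` and `failover_allowed` are hypothetical names and do not reflect SAP's internal code. Up to revision 68 the script outcome is effectively ignored; from revision 69 only a successful script run permits a failover.

```python
from enum import Enum


class ScriptResult(Enum):
    """Possible outcomes of the high-availability script (hypothetical model)."""
    OK = "ok"                    # fencing succeeded; failover may proceed
    NO_FAILOVER = "no_failover"  # script reports that no failover should happen
    ERROR = "error"              # script failed, crashed, or timed out


def failover_allowed(revision: int, result: ScriptResult) -> bool:
    """Simplified model of the failover decision (not SAP's actual code)."""
    if revision <= 68:
        # Defect described in this note: the failover is carried out
        # whatever the script reports.
        return True
    # Revision 69 and later: anything other than success blocks the failover.
    return result is ScriptResult.OK


for rev in (68, 69):
    print(rev, [failover_allowed(rev, r) for r in ScriptResult])
# 68 [True, True, True]
# 69 [True, False, False]
```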


Solution
Upgrade HANA to revision 69 or higher. As of this revision, the system reacts to a failure of the high-availability script as expected and does not perform a failover, which prevents corruption of the database.
Obtain the latest version of the implementation from your hardware partner or from the developer of the high-availability script, and implement it.
Note:
  • If you cannot upgrade to revision 69 or higher at this time, you must deactivate the high-availability mechanisms of the HANA database to prevent possible corruption of the HANA persistence if a node fails. This is the only available workaround for this problem.
You can deactivate the high-availability mechanisms in HANA Studio as follows:
In the Administration panel, choose "Configuration" on the "Landscape" tab. A list of the HANA nodes with their configured and current roles appears. At the top, right-click the "Configure Hosts for Failover" icon. Change the nameserver roles MASTER2 and MASTER3 to SLAVE and save the change. Then stop all standby nodes in your landscape.
This deactivates the failover capability of your HANA installation. After upgrading to revision 69 or higher, reactivate this functionality so that your HANA landscape is highly available again.
  • Contact your SAP HANA hardware partner to evaluate how likely the problem is to occur. Discuss with the partner the circumstances in which the currently active host cannot be reached during a failover and in which its deactivation is not ensured. If you test this in a production system, you must stop the SAP HANA database first; otherwise, verifying the failover could itself trigger the problem.
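The deactivation steps in the first item of the note above can be checked against a snapshot of the landscape list. The sketch below assumes you have transcribed that list into hypothetical `HostConfig` records (host name, configured nameserver role, standby flag, running state); these names are illustrative and are not an SAP HANA interface. It reports which changes are still missing before failover is effectively disabled.

```python
from dataclasses import dataclass


@dataclass
class HostConfig:
    """One host as listed under Landscape > Configuration (fields hypothetical)."""
    host: str
    nameserver_role: str  # configured role, e.g. "MASTER1", "MASTER2", "SLAVE"
    standby: bool         # configured as a standby node
    running: bool         # HANA services currently started on this host


def workaround_todo(hosts: list[HostConfig]) -> list[str]:
    """Return the steps still missing before failover is deactivated."""
    todo = []
    for h in hosts:
        if h.nameserver_role in ("MASTER2", "MASTER3"):
            todo.append(f"{h.host}: change nameserver role {h.nameserver_role} to SLAVE")
        if h.standby and h.running:
            todo.append(f"{h.host}: stop standby node")
    return todo


# Example landscape with hypothetical host names.
landscape = [
    HostConfig("hana01", "MASTER1", standby=False, running=True),
    HostConfig("hana02", "MASTER2", standby=False, running=True),
    HostConfig("hana03", "MASTER3", standby=True, running=True),
]
for step in workaround_todo(landscape):
    print(step)
# hana02: change nameserver role MASTER2 to SLAVE
# hana03: change nameserver role MASTER3 to SLAVE
# hana03: stop standby node
```

An empty result means both conditions from the note are met: no MASTER2/MASTER3 nameserver roles remain and no standby node is running.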




Header Data

Released On 08.11.2013 07:53:09
Release Status Released for Customer
Component HAN-DB SAP HANA Database
Priority Hot News
Category Program error
