Saturday, January 30, 2016

1977242 - How to handle HANA Alert 53: 'Pagedump files'

Symptom
This Knowledge Base Article (KBA) is part of a series of HANA Operations Recommendations.
Its focus is on providing best practice instruction on handling HANA Alerts related to Page Dump Files (Alert ID 53).
End-Users may experience issues related to file-system or disk inconsistencies or even HANA Database inconsistencies.


Environment
SAP HANA 1.0 revision 81 or higher


Cause
This alert is triggered when a page dump is written. When HANA reads a page and the reading fails due to a wrong checksum, the read action is repeated once and a page dump is written.
  1. If the repeat succeeds, the system will continue its operation - yet this kind of problem indicates there might be problems affecting the underlying file system or disks. Additional investigations as outlined below are needed.
  2. If the repeat fails for a second time, the system will crash.
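
The retry behaviour described above can be illustrated with a small sketch. This is not SAP HANA's implementation: the function name, the callables `read_fn` and `write_dump`, and the use of CRC32 as the checksum are all assumptions made for illustration, and the exact order of "write dump" and "repeat read" is not specified by this KBA.

```python
import zlib


class PageReadError(Exception):
    """Raised when a page fails its checksum on both read attempts."""


def read_page_with_retry(read_fn, expected_crc, write_dump):
    """Read a page, verify its checksum, and repeat the read once on mismatch.

    read_fn      -- hypothetical callable that returns the page bytes
    expected_crc -- checksum stored for the page (CRC32 here, illustration only)
    write_dump   -- hypothetical callable that records the failing page
    """
    data = read_fn()
    if zlib.crc32(data) == expected_crc:
        return data
    # First read failed verification: record a page dump, then repeat once.
    write_dump(data)
    data = read_fn()
    if zlib.crc32(data) == expected_crc:
        # Case 1: operation continues, but the storage should be investigated.
        return data
    # Case 2: the repeated read also failed.
    raise PageReadError("checksum mismatch on repeated read")
```

In this sketch a successful second read still leaves a dump behind, which corresponds to the situation in which Alert 53 is raised without a crash.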


Resolution

PROCEDURE

  1. Check Alerts
  2. Check Repeat and File System Consistency
  3. Check HANA Consistency

PATH

  • SAP HANA Studio – SAP HANA Administration Console – Alerts
  • Solution Manager – DBACOCKPIT – Choose HANA System – Current Status – Alerts
  • SAP HANA Studio – SAP HANA Administration Console – Diagnosis Files – Filter – Modified by time

HOW-TO

1.   Check Alerts
When checking the Alerts tab in HANA Studio or Solution Manager, there is an alert called “1 new pagedump files(s) found on host <HOSTNAME>”.
In HANA Studio, you will find the alert by going to Administration Console -> Alerts -> Show: All Alerts -> Filter: pagedump.
HANA_Studio.GIF
In Solution Manager you will find the alert by using transaction code DBACOCKPIT -> choose HANA system -> expand Current Status -> Alerts. Filters can be set on the Alert Name or Description. Note that the filter string in DBACOCKPIT should be *pagedump*.
Solman.GIF
Note that the “Check for new page dump file” alert is refreshed automatically every 15 minutes, so the time stamp on the alert may not be the exact time at which the dump was created.
To check the exact time in HANA Studio, go to Administration Console -> Diagnosis Files -> enter ‘pagedump’ in Filter -> sort the Modified column by time. The exact time can also be deduced from the name of the page dump file.
HANA_Studio2.GIF
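
If you have operating-system access to the directory that holds the diagnosis files, the same "filter by pagedump, sort by modification time" view can be approximated with a short script. This is a minimal sketch: the function name `list_pagedumps` is hypothetical, and the location of the trace directory on your system is an assumption you need to fill in yourself.

```python
from datetime import datetime
from pathlib import Path


def list_pagedumps(trace_dir):
    """Return (modified_time, file_name) pairs for files whose name
    contains 'pagedump', newest first."""
    entries = []
    for path in Path(trace_dir).iterdir():
        if path.is_file() and "pagedump" in path.name.lower():
            # File modification time, converted to local time.
            mtime = datetime.fromtimestamp(path.stat().st_mtime)
            entries.append((mtime, path.name))
    return sorted(entries, reverse=True)
```

Calling `list_pagedumps()` with your trace directory and printing each pair gives a list comparable to the Diagnosis Files tab sorted by modification time.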

2.   Check Repeat and File System Consistency
As mentioned in the cause section, there are two different circumstances:
  a. Page dumps without a crash
When you see page dumps but the system did not crash, a page read encountered a checksum mismatch and the repeated read succeeded. This may be caused by problems in the underlying file systems or disks.
Check file system consistency
In Unix and Unix-like operating systems, such as Linux and Mac OS X, there is a system utility which can be used to check the consistency of the file system:
'fsck' - the acronym for File System Check.
Generally, fsck is run automatically at boot time. There are two common triggers for automatically executing fsck. Either the operating system detects that a file system is in an inconsistent state (likely due to a non-graceful shutdown such as a crash or power loss), or the file system has been mounted a certain number of times since the last check (to keep small, undetected inconsistencies from growing).
A system administrator can also run fsck manually at any time.
Warning:
As running fsck to repair a file system which is mounted for read/write operations can potentially cause severe data corruption/loss, the file system is normally checked while unmounted, mounted read-only, or with the system in a special maintenance mode that limits the risk of such damage.
For more information about the usage of fsck, please refer to the Linux manual page of "fsck".
If you are not confident about executing the system utility yourself, we strongly recommend that you consult your hardware partner about the page checksum mismatch.
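
Before running fsck manually, it can help to confirm how the file system is currently mounted, in line with the warning above. The following is a minimal sketch that parses text in the Linux /proc/mounts format. The function names are hypothetical, and the check is only a convenience: it matches the device name exactly as it appears in the mount table, so a device mounted under a different name (for example a device-mapper alias) would not be recognised.

```python
def mount_mode(mounts_text, device):
    """Return 'rw', 'ro', or None (not mounted) for a device, given text in
    the /proc/mounts format: device mountpoint fstype options dump pass."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == device:
            options = fields[3].split(",")
            return "rw" if "rw" in options else "ro"
    return None


def safe_to_check(mounts_text, device):
    """True when the device is unmounted or mounted read-only."""
    return mount_mode(mounts_text, device) != "rw"
```

On Linux, the text can be obtained by reading /proc/mounts. A result of False means the file system is mounted read/write and should not be repaired in that state.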
  b. Page dumps with crash
If the system crashes after a page dump, it may indicate corrupted persistency. This might be caused by a HANA problem or a file-system problem:
Hardware issues
  • Disk (physical I/O errors, wrong data written to disks, bugs in controller firmware)
  • Network between database server and disks
  • Memory on database server (e.g. flipping bits)
Software issues
  • Operating system on database server
  • SAP HANA
     
3. Check SAP HANA Consistency
If you already see symptoms that indicate SAP HANA inconsistencies and you want to check whether, and to what extent, corruptions exist, there are several built-in procedures, scripts and tools that might help:
  • Metadata: CHECK_CATALOG procedure
  • Row and column store: CHECK_TABLE_CONSISTENCY procedure
  • Column store: uniqueChecker.py script
  • Row store: checkRowStore.py script
  • Backups: hdbbackupcheck tool
  • Backups: hdbbackupdiag --check
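
As an illustration of how the two procedures in the list above might be called from a script, the sketch below runs them through a Python DB-API cursor. Several things here are assumptions rather than statements from this KBA: the helper name `run_consistency_checks` is hypothetical, the parameter lists shown for CHECK_CATALOG and CHECK_TABLE_CONSISTENCY should be verified against SAP Note 1977584 for your revision, and the code assumes a driver that accepts `?` placeholders, sends None as NULL, and returns the procedure's result set through `fetchall()`.

```python
def run_consistency_checks(cursor, schema=None, table=None):
    """Run the catalog and table checks through a DB-API cursor and
    return the rows each check reports."""
    results = {}
    # Metadata check. Assumed arguments: action, schema, object, object type
    # (NULL meaning "all").
    cursor.execute("CALL CHECK_CATALOG('CHECK', NULL, NULL, NULL)")
    results["catalog"] = cursor.fetchall()
    # Table check. Assumed arguments: action, schema, table
    # (None is sent as NULL, meaning "all").
    cursor.execute(
        "CALL CHECK_TABLE_CONSISTENCY('CHECK', ?, ?)", (schema, table)
    )
    results["tables"] = cursor.fetchall()
    return results
```

Checking all tables at once can be expensive on a large system, so it may be preferable to pass a specific schema and table and to run the check outside peak hours.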


See Also
Additional Support:
For more information about these technical consistency checks for SAP HANA DB, please refer to SAP Note 1977584.


Keywords

SAP, SAP HANA, KBA, Alert, Page Dump, Operation Recommendations, #OpsRec-HANA
