Tuesday, November 1, 2016

1900257 - How to handle HANA Alert 43: 'Memory Usage of Services'

Symptom

When checking the Alerts tab in HANA Studio and Solution Manager, there is an alert called “indexserver (pid 45281) on host suse-hanajl01 is running out of memory! Used memory: 180500 MB, effective allocation limit for this service: 534219 MB”.
For HANA Studio, you would find the alert by going to Administration Console -> Alert -> Show: all alert.


For Solution Manager, you would find the alert by transaction DBACOCKPIT -> choose HANA system -> expand Current Status -> Alerts.


Remember for this “Check used memory against allocation limit” alert in Solution Manager, it is auto-refreshed every 15 minutes, so the time stamp on the alert may be not the exactly time when alert occurs.
To check the exact time, please go to Administration Console -> Alerts -> double click on the alerts.


Environment

  • SAP HANA DATABASE
  • BW on HANA

Cause

Normally there are 2 reasons for this alert:
  1. The threshold value of "Check used memory against allocation limit" in "Configure Check Settings" is improper. The default value is Low: 80, Medium: 90, High: 95.

  1. Out of memory (OOM) issue occurs on HANA DB
The other possible reasons for this current utilized percentage of main memory alert is OOM issue occurs on HANA system, including but not limited to
  • Execution of certain transactions
  • Data loading
  • Hanging of jobs
  • Long running SQL statements
They will generate a high workload on memory by either frequent access to database or joining between huge tables.
  1. Bug in some revisions
As we experienced in previous customer case, used memory cannot automatic released in some older revisions.  This bug would lead to memory issues and trigger the alert.

Resolution

  1. For the alert triggered due to improper threshold value setting, it could be resolved by reset it to default or more reasonable threshold value.


  1. For the alert triggered due to OOM issue

    1. Due to the various kinds of possible causes, the best way is raising a message to SAP to optimize the process. The component should be BC-DB-HDB, and open a remote connection to HANA DB.
    2. If we know which process runs out of memory, we could trace this process, and send the trace file to SAP for further analysis. The trace includes SQL trace and performance trace. Both of them could be done in the Trace Configuration tab.



For SQL trace, after changing it to “Active” and giving a new file name, click “Finish”.
For performance trace, give a new file name and a duration time, then click “Finish”.
    1. If we do not sure which process caused this alert, then we need try to figure out as the following steps.
- Go to Landscape tab to check which server encounters the high memory consumption issue. E.g: the memory consumption is very high in indexserver with Port 30003.


- Go to Performance -> Threads, check which threads are running. E.g: a SQL statement is executed in the figure.


- Running the SQL again, if the issue is reproduced, we could make sure that the statement is (one of) the main cause of the alert.
- Do the trace as previous mentioned, and enclose the trace files in the Customer Message.
    1. The trace files could be seen in Diagnosis Files tab
The SQL trace file can be opened by double clicks on it.
But for the performance trace file, if you want to read it, you need to use HDBAdmin tool to open it. Please refer to wiki page wiki HDBAdmin on Linux to execute the tool.


In the SQL trace file (richard0128.py), we could download it to local and open it with Notepad++, then find the SQL tracing info,

After you prepare your environment and execute HDBAdmin tool, we could load the .tpt file and read it.

Then we could filter the result with known info (e.g Service is indexserver, Method is SqlExecuteStatement, User is SYSTEM) and take a further analysis on the SQL statement.

Even if you could do it by yourself, we still recommends you to send the trace files to SAP unless you are very sure about how to analysis, optimize and configure in further steps.
    1. Before the statement analysis finished, we could try to unload the current unused huge tables from memory, to save the memory resource for the process execution.
The huge tables could be found in “System Information” -> “Schema Size (MB) for loaded tables” and “Size of tables on disk”.

After double click on them, we could sort the size column as the following figure shows, to justify if a table is in use, you can go to Performance -> Threads tab, to check if the table is used by any thread.

  1. If the revision of HANA DB is not the latest version, we strongly recommend to upgrade to the HANA DB to the latest version. If the current licensed memory cannot meet the workload requirements, you may need to increase it.

Keywords

allocation limit, Configure Check Settings, OOM, out of memory, threshold, Diagnosis Files, performance, HDBAdmin, licensed memory,

No comments:

Post a Comment