Symptom
You receive alerts on Savepoint Duration (Alert ID 54).
There may be overall performance issues in the HANA system with the response time of dialog queries and runtime of jobs, long running backups, temporarily a high percentage of blocked transactions (Alert ID 39) or general performance issues.
There may be overall performance issues in the HANA system with the response time of dialog queries and runtime of jobs, long running backups, temporarily a high percentage of blocked transactions (Alert ID 39) or general performance issues.
Cause
The duration of savepoints is exceeding the configured thresholds due
to performance issues in your SAP HANA System. The default
configuration for this alert is as follows:
Severity | Low | Medium | High |
Threshold | 5 min | 10 min | 15 min |
Resolution
Procedure
- Check the Alert Status with regard to performance alerts indicating CPU or IO related issues.
- Get details on the savepoint duration using the SQL statement HANA_IO_Savepoints_Rev71+ (HANA_IO_Savepoints) from KBA 1969700 - SQL Statement Collection for SAP HANA.
- Analyze IO Issues.
- Analyze CPU usage related issues.
- Perform further technical performance analysis.
Path
- SAP HANA Studio - SAP HANA Administration Console - Alerts.
- SAP Solution Manager - Transaction SM_WORKCENTER - Technical Monitoring Workcenter - System Monitoring.
- SAP HANA Studio - SAP HANA Administration Console - SQL Console.
How-to
1. Check Alert Status with regard to performance alerts indicating CPU or IO related issues
Check in particular for the following performance alerts:Layer | Alert ID | Alert Name | Issue |
SAP HANA | 5 | Host CPU Usage | high CPU usage |
60 | Sync/async read ratio | IO issue (storage, network) | |
61 | Sync/async write ratio | IO issue (storage, network) |
Layer | Alert Name | Issue |
SAP HANA | Savepoint critical phase duration | If red then the reported alert on savepoint duration is critical |
Host | Disk I/O - Average Wait time per disk | IO issue (storage) |
CPU - CPU Utilization | high CPU usage | |
CPU - I/O Wait | CPU I/O performance |
2. Get details on savepoint duration using the SQL statement HANA_IO_Savepoints_Rev71+ (HANA_IO_Savepoints)
from KBA 1969700 - SQL Statement Collection for SAP HANA
Execute the SQL Statement HANA_IO_Savepoints_Rev71+
(HANA_IO_Savepoints when your HANA revision is lower than 71) in the SQL
Console of SAP HANA Studio.SELECT START_TIME, CRIT_PHASE_START_TIME, HOST, PORT, SAVEPOINTS, SUM_DURATION_S, MAX_DURATION_S, AVG_DURATION_S, MAX_CRITICAL_PHASE_S, AVG_CRITICAL_PHASE_S, SUM_SIZE_MB, AVG_SIZE_MB, RS_SIZE_PCT FROM ( SELECT CASE WHEN NUM_SAVEPOINTS = 1 THEN CRIT_PHASE_START_TIME ELSE 'various' END CRIT_PHASE_START_TIME, START_TIME, HOST, LPAD(PORT, 5) PORT, LPAD(NUM_SAVEPOINTS, 10) SAVEPOINTS, LPAD(TO_DECIMAL(SUM_DURATION_S, 12, 2), 14) SUM_DURATION_S, LPAD(TO_DECIMAL(MAX_DURATION_S, 12, 2), 14) MAX_DURATION_S, LPAD(TO_DECIMAL(AVG_DURATION_S, 12, 2), 14) AVG_DURATION_S, LPAD(TO_DECIMAL(MAX_CRITICAL_PHASE_S, 12, 2), 20) MAX_CRITICAL_PHASE_S, LPAD(TO_DECIMAL(AVG_CRITICAL_PHASE_S, 12, 2), 20) AVG_CRITICAL_PHASE_S, LPAD(ROUND(SUM_SIZE_MB), 11) SUM_SIZE_MB, LPAD(ROUND(AVG_SIZE_MB), 11) AVG_SIZE_MB, LPAD(TO_DECIMAL(RS_SIZE_PCT, 12, 2), 11) RS_SIZE_PCT FROM ( SELECT CASE WHEN BI.AGGREGATE_BY = 'NONE' OR INSTR(BI.AGGREGATE_BY, 'TIME') != 0 THEN TO_CHAR(SP.START_TIME, BI.TIME_AGGREGATE_BY) ELSE 'any' END START_TIME, CASE WHEN BI.AGGREGATE_BY = 'NONE' OR INSTR(BI.AGGREGATE_BY, 'HOST') != 0 THEN SP.HOST ELSE MAP(BI.HOST, '%', 'any', BI.HOST) END HOST, CASE WHEN BI.AGGREGATE_BY = 'NONE' OR INSTR(BI.AGGREGATE_BY, 'PORT') != 0 THEN TO_CHAR(SP.PORT) ELSE MAP(BI.PORT, '%', 'any', BI.PORT) END PORT, TO_CHAR(MIN(SP.CRITICAL_PHASE_START_TIME), 'YYYY/MM/DD HH24:MI:SS') CRIT_PHASE_START_TIME, COUNT(*) NUM_SAVEPOINTS, SUM(SP.DURATION) / 1000000 SUM_DURATION_S, MAX(SP.DURATION) / 1000000 MAX_DURATION_S, AVG(SP.DURATION) / 1000000 AVG_DURATION_S, MAX(SP.CRITICAL_PHASE_DURATION + SP.CRITICAL_PHASE_WAIT_TIME) / 1000000 MAX_CRITICAL_PHASE_S, AVG(SP.CRITICAL_PHASE_DURATION + SP.CRITICAL_PHASE_WAIT_TIME) / 1000000 AVG_CRITICAL_PHASE_S, SUM(SP.TOTAL_SIZE) / 1024 / 1024 SUM_SIZE_MB, AVG(SP.TOTAL_SIZE) / 1024 / 1024 AVG_SIZE_MB, AVG(MAP(SP.TOTAL_SIZE, 0, 0, SP.FLUSHED_ROWSTORE_SIZE / SP.TOTAL_SIZE * 100)) RS_SIZE_PCT FROM ( SELECT HOST, PORT, BEGIN_TIME, END_TIME, MIN_CRITICAL_PHASE_DURATION_S, MIN_SAVEPOINT_DURATION_S, DATA_SOURCE, AGGREGATE_BY, MAP(TIME_AGGREGATE_BY, 'NONE', 'YYYY/MM/DD HH24:MI:SS:FF7', 'HOUR', 'YYYY/MM/DD HH24', 'DAY', 'YYYY/MM/DD (DY)', 'HOUR_OF_DAY', 'HH24', TIME_AGGREGATE_BY ) TIME_AGGREGATE_BY FROM ( SELECT /* Modification section */ '%' HOST, '%' PORT, TO_TIMESTAMP('1000/01/12 12:00:00', 'YYYY/MM/DD HH24:MI:SS') BEGIN_TIME, TO_TIMESTAMP('9999/01/13 12:00:00', 'YYYY/MM/DD HH24:MI:SS') END_TIME, -1 MIN_CRITICAL_PHASE_DURATION_S, -1 MIN_SAVEPOINT_DURATION_S, 'HISTORY' DATA_SOURCE, /* CURRENT, HISTORY */ 'NONE' AGGREGATE_BY, /* HOST, PORT, TIME or comma separated combinations, NONE for no aggregation */ 'NONE' TIME_AGGREGATE_BY /* HOUR, DAY, HOUR_OF_DAY or database time pattern, NONE for no aggregation */ FROM DUMMY ) ) BI, ( SELECT 'CURRENT' DATA_SOURCE, HOST, PORT, START_TIME, CRITICAL_PHASE_START_TIME, DURATION, CRITICAL_PHASE_DURATION, CRITICAL_PHASE_WAIT_TIME, TOTAL_SIZE, FLUSHED_ROWSTORE_SIZE FROM M_SAVEPOINTS UNION ALL SELECT 'HISTORY' DATA_SOURCE, HOST, PORT, START_TIME, START_TIME CRITICAL_PHASE_START_TIME, /* not available in history */ DURATION, CRITICAL_PHASE_DURATION, 0 CRITICAL_PHASE_WAIT_TIME, /* not available in history */ TOTAL_SIZE, FLUSHED_ROWSTORE_SIZE FROM _SYS_STATISTICS.HOST_SAVEPOINTS ) SP WHERE SP.HOST LIKE BI.HOST AND TO_CHAR(SP.PORT) LIKE BI.PORT AND SP.START_TIME BETWEEN BI.BEGIN_TIME AND BI.END_TIME AND ( BI.MIN_CRITICAL_PHASE_DURATION_S = -1 OR (SP.CRITICAL_PHASE_DURATION) / 1000000 >= BI.MIN_CRITICAL_PHASE_DURATION_S ) AND ( BI.MIN_SAVEPOINT_DURATION_S = -1 OR (SP.DURATION) / 1000000 >= BI.MIN_SAVEPOINT_DURATION_S ) AND SP.DATA_SOURCE LIKE BI.DATA_SOURCE GROUP BY CASE WHEN BI.AGGREGATE_BY = 'NONE' OR INSTR(BI.AGGREGATE_BY, 'TIME') != 0 THEN TO_CHAR(SP.START_TIME, BI.TIME_AGGREGATE_BY) ELSE 'any' END, CASE WHEN BI.AGGREGATE_BY = 'NONE' OR INSTR(BI.AGGREGATE_BY, 'HOST') != 0 THEN SP.HOST ELSE MAP(BI.HOST, '%', 'any', BI.HOST) END, CASE WHEN BI.AGGREGATE_BY = 'NONE' OR INSTR(BI.AGGREGATE_BY, 'PORT') != 0 THEN TO_CHAR(SP.PORT) ELSE MAP(BI.PORT, '%', 'any', BI.PORT) END ) ) ORDER BY START_TIME DESC, HOST, PORT |
- START_TIME: Start time of considered time interval
- CRIT_PHASE_START_TIME: Start time of (first) critical phase
- HOST: Host name
- PORT: Port
- SAVEPOINTS: Number of savepoints
- SUM_DURATION_S: Total savepoint duration (in s)
- MAX_DURATION_S: Maximum savepoint duration (in s)
- AVG_DURATION_S: Average savepoint duration (in s)
- MAX_CRITICAL_PHASE_S: Maximum duration of critical path (in s)
- AVG_CRITICAL_PHASE_S: Average duration of critical path (in s)
- SUM_SIZE_MB: Total savepoint write size (in MB)
- AVG_SIZE_MB: Average savepoint write size (in MB)
- RS_SIZE_PCT: Flushed data related to rowstore compared to overall flushed data (in %)
[EXAMPLE OUTPUT]
If critical savepoint phases repeatedly take more than 2 seconds than the issue should be handled with high priority.
3. Analyze IO Issues
If the alert status suggests IO related issues, then follow the analysis recommendations outlined in KBA 1999930 - FAQ: SAP HANA I/O Analysis.4. Analyze CPU usage related issues
If the alert status suggests CPU usage related issues, then follow the analysis recommendations outlined in KBA 1909670 - How to Handle Alert ‘Check host CPU utilization’.5. Perform further technical performance analysis
If further technical performance analysis from HANA side seems to be required, then create and proceed a Technical Performance Analysis Session from SAP Engagemenmt and Service Delivery workcenter in your SAP Solution Manager.If further assistance is required create an SAP incident on component SV-BO-DB-HAN and attach the documentation of the performed analysis results.
See Also
The attachment of this KBA provides a more detailed document for
analyzing savepoint duration. While the document follows the approach
outlined in the KBA text, there are some minor deviations on detail
level (e.g. use of alternative SQL-Statements for retrieving information
from system views and statistics tables).
Further information is available in the SAP HANA Troubleshoouting and Performance Analysis Guide.
Further information is available in the SAP HANA Troubleshoouting and Performance Analysis Guide.
Keywords
HANA Alert
Savepoint
Operations Recommendation
#OpsRec-HANA
Savepoint
Operations Recommendation
#OpsRec-HANA
No comments:
Post a Comment