Helping Notes: 1977220 - How to handle HANA Alert 54: Savepoint Duration

Symptom

You receive alerts on Savepoint Duration (Alert ID 54).
There may be overall performance issues in the HANA system with the response time of dialog queries and runtime of jobs, long running backups, temporarily a high percentage of blocked transactions (Alert ID 39) or general performance issues.

Cause

The duration of savepoints is exceeding the configured thresholds due to performance issues in your SAP HANA System. The default configuration for this alert is as follows:

Severity	Low	Medium	High
Threshold	5 min	10 min	15 min

Resolution

Procedure

Check the Alert Status with regard to performance alerts indicating CPU or IO related issues.
Get details on the savepoint duration using the SQL statement HANA_IO_Savepoints_Rev71+ (HANA_IO_Savepoints) from KBA 1969700 - SQL Statement Collection for SAP HANA.
Analyze IO Issues.
Analyze CPU usage related issues.
Perform further technical performance analysis.

Path

SAP HANA Studio - SAP HANA Administration Console - Alerts.
SAP Solution Manager - Transaction SM_WORKCENTER - Technical Monitoring Workcenter - System Monitoring.
SAP HANA Studio - SAP HANA Administration Console - SQL Console.

How-to

1. Check Alert Status with regard to performance alerts indicating CPU or IO related issues

Check in particular for the following performance alerts:

Layer	Alert ID	Alert Name	Issue
SAP HANA	5	Host CPU Usage	high CPU usage
	60	Sync/async read ratio	IO issue (storage, network)
	61	Sync/async write ratio	IO issue (storage, network)

If you operate your system with SAP Solution Manager, then check for the following alerts:

Layer	Alert Name	Issue
SAP HANA	Savepoint critical phase duration	If red then the reported alert on savepoint duration is critical
Host	Disk I/O - Average Wait time per disk	IO issue (storage)
	CPU - CPU Utilization	high CPU usage
	CPU - I/O Wait	CPU I/O performance

2. Get details on savepoint duration using the SQL statement HANA_IO_Savepoints_Rev71+ (HANA_IO_Savepoints)
from KBA 1969700 - SQL Statement Collection for SAP HANA

Execute the SQL Statement HANA_IO_Savepoints_Rev71+ (HANA_IO_Savepoints when your HANA revision is lower than 71) in the SQL Console of SAP HANA Studio.

SELECT
START_TIME,
CRIT_PHASE_START_TIME,
HOST,
PORT,
SAVEPOINTS,
SUM_DURATION_S,
MAX_DURATION_S,
AVG_DURATION_S,
MAX_CRITICAL_PHASE_S,
AVG_CRITICAL_PHASE_S,
SUM_SIZE_MB,
AVG_SIZE_MB,
RS_SIZE_PCT
FROM
( SELECT
    CASE WHEN NUM_SAVEPOINTS = 1 THEN CRIT_PHASE_START_TIME ELSE 'various' END CRIT_PHASE_START_TIME,
    START_TIME,
    HOST,
    LPAD(PORT, 5) PORT,
    LPAD(NUM_SAVEPOINTS, 10) SAVEPOINTS,
    LPAD(TO_DECIMAL(SUM_DURATION_S, 12, 2), 14) SUM_DURATION_S,
    LPAD(TO_DECIMAL(MAX_DURATION_S, 12, 2), 14) MAX_DURATION_S,
    LPAD(TO_DECIMAL(AVG_DURATION_S, 12, 2), 14) AVG_DURATION_S,
    LPAD(TO_DECIMAL(MAX_CRITICAL_PHASE_S, 12, 2), 20) MAX_CRITICAL_PHASE_S,
    LPAD(TO_DECIMAL(AVG_CRITICAL_PHASE_S, 12, 2), 20) AVG_CRITICAL_PHASE_S,
    LPAD(ROUND(SUM_SIZE_MB), 11) SUM_SIZE_MB,
    LPAD(ROUND(AVG_SIZE_MB), 11) AVG_SIZE_MB,
    LPAD(TO_DECIMAL(RS_SIZE_PCT, 12, 2), 11) RS_SIZE_PCT
FROM
( SELECT
      CASE WHEN BI.AGGREGATE_BY = 'NONE' OR INSTR(BI.AGGREGATE_BY, 'TIME') != 0 THEN TO_CHAR(SP.START_TIME, BI.TIME_AGGREGATE_BY) ELSE 'any'                             END START_TIME,
      CASE WHEN BI.AGGREGATE_BY = 'NONE' OR INSTR(BI.AGGREGATE_BY, 'HOST') != 0 THEN SP.HOST                                      ELSE MAP(BI.HOST, '%', 'any', BI.HOST) END HOST,
      CASE WHEN BI.AGGREGATE_BY = 'NONE' OR INSTR(BI.AGGREGATE_BY, 'PORT') != 0 THEN TO_CHAR(SP.PORT)                             ELSE MAP(BI.PORT, '%', 'any', BI.PORT) END PORT,
      TO_CHAR(MIN(SP.CRITICAL_PHASE_START_TIME), 'YYYY/MM/DD HH24:MI:SS') CRIT_PHASE_START_TIME,
      COUNT(*) NUM_SAVEPOINTS,
      SUM(SP.DURATION) / 1000000 SUM_DURATION_S,
      MAX(SP.DURATION) / 1000000 MAX_DURATION_S,
      AVG(SP.DURATION) / 1000000 AVG_DURATION_S,
      MAX(SP.CRITICAL_PHASE_DURATION + SP.CRITICAL_PHASE_WAIT_TIME) / 1000000 MAX_CRITICAL_PHASE_S,
      AVG(SP.CRITICAL_PHASE_DURATION + SP.CRITICAL_PHASE_WAIT_TIME) / 1000000 AVG_CRITICAL_PHASE_S,
      SUM(SP.TOTAL_SIZE) / 1024 / 1024 SUM_SIZE_MB,
      AVG(SP.TOTAL_SIZE) / 1024 / 1024 AVG_SIZE_MB,
      AVG(MAP(SP.TOTAL_SIZE, 0, 0, SP.FLUSHED_ROWSTORE_SIZE / SP.TOTAL_SIZE * 100)) RS_SIZE_PCT
    FROM
    ( SELECT
        HOST,
        PORT,
        BEGIN_TIME,
        END_TIME,
        MIN_CRITICAL_PHASE_DURATION_S,
        MIN_SAVEPOINT_DURATION_S,
        DATA_SOURCE,
        AGGREGATE_BY,
        MAP(TIME_AGGREGATE_BY,
          'NONE',        'YYYY/MM/DD HH24:MI:SS:FF7',
          'HOUR',        'YYYY/MM/DD HH24',
          'DAY',         'YYYY/MM/DD (DY)',
          'HOUR_OF_DAY', 'HH24',
          TIME_AGGREGATE_BY ) TIME_AGGREGATE_BY
      FROM
      ( SELECT                                                         /* Modification section */
          '%' HOST,
          '%' PORT,
          TO_TIMESTAMP('1000/01/12 12:00:00', 'YYYY/MM/DD HH24:MI:SS') BEGIN_TIME,
          TO_TIMESTAMP('9999/01/13 12:00:00', 'YYYY/MM/DD HH24:MI:SS') END_TIME,
          -1 MIN_CRITICAL_PHASE_DURATION_S,
          -1 MIN_SAVEPOINT_DURATION_S,
          'HISTORY' DATA_SOURCE,                        /* CURRENT, HISTORY */
          'NONE' AGGREGATE_BY,         /* HOST, PORT, TIME or comma separated combinations, NONE for no aggregation */
          'NONE' TIME_AGGREGATE_BY     /* HOUR, DAY, HOUR_OF_DAY or database time pattern, NONE for no aggregation */
        FROM
          DUMMY
      )
    ) BI,
    ( SELECT
        'CURRENT' DATA_SOURCE,
        HOST,
        PORT,
        START_TIME,
        CRITICAL_PHASE_START_TIME,
        DURATION,
        CRITICAL_PHASE_DURATION,
        CRITICAL_PHASE_WAIT_TIME,
        TOTAL_SIZE,
        FLUSHED_ROWSTORE_SIZE
      FROM
        M_SAVEPOINTS
      UNION ALL
      SELECT
        'HISTORY' DATA_SOURCE,
        HOST,
        PORT,
        START_TIME,
        START_TIME CRITICAL_PHASE_START_TIME,          /* not available in history */
        DURATION,
        CRITICAL_PHASE_DURATION,
        0 CRITICAL_PHASE_WAIT_TIME,                    /* not available in history */
        TOTAL_SIZE,
        FLUSHED_ROWSTORE_SIZE
      FROM
        _SYS_STATISTICS.HOST_SAVEPOINTS
    ) SP
    WHERE
      SP.HOST LIKE BI.HOST AND
      TO_CHAR(SP.PORT) LIKE BI.PORT AND
      SP.START_TIME BETWEEN BI.BEGIN_TIME AND BI.END_TIME AND
      ( BI.MIN_CRITICAL_PHASE_DURATION_S = -1 OR
        (SP.CRITICAL_PHASE_DURATION) / 1000000 >= BI.MIN_CRITICAL_PHASE_DURATION_S ) AND
      ( BI.MIN_SAVEPOINT_DURATION_S = -1 OR
        (SP.DURATION) / 1000000 >= BI.MIN_SAVEPOINT_DURATION_S ) AND
      SP.DATA_SOURCE LIKE BI.DATA_SOURCE
    GROUP BY
      CASE WHEN BI.AGGREGATE_BY = 'NONE' OR INSTR(BI.AGGREGATE_BY, 'TIME') != 0 THEN TO_CHAR(SP.START_TIME, BI.TIME_AGGREGATE_BY) ELSE 'any'                             END,
      CASE WHEN BI.AGGREGATE_BY = 'NONE' OR INSTR(BI.AGGREGATE_BY, 'HOST') != 0 THEN SP.HOST                                      ELSE MAP(BI.HOST, '%', 'any', BI.HOST) END,
      CASE WHEN BI.AGGREGATE_BY = 'NONE' OR INSTR(BI.AGGREGATE_BY, 'PORT') != 0 THEN TO_CHAR(SP.PORT)                             ELSE MAP(BI.PORT, '%', 'any', BI.PORT) END
)
)
ORDER BY
START_TIME DESC,
HOST,
PORT

[OUTPUT PARAMETERS]
- START_TIME:            Start time of considered time interval
- CRIT_PHASE_START_TIME: Start time of (first) critical phase
- HOST:                  Host name
- PORT:                  Port
- SAVEPOINTS:            Number of savepoints
- SUM_DURATION_S:        Total savepoint duration (in s)
- MAX_DURATION_S:        Maximum savepoint duration (in s)
- AVG_DURATION_S:        Average savepoint duration (in s)
- MAX_CRITICAL_PHASE_S: Maximum duration of critical path (in s)
- AVG_CRITICAL_PHASE_S: Average duration of critical path (in s)
- SUM_SIZE_MB:           Total savepoint write size (in MB)
- AVG_SIZE_MB:           Average savepoint write size (in MB)
- RS_SIZE_PCT:           Flushed data related to rowstore compared to overall flushed data (in %)
[EXAMPLE OUTPUT]

If critical savepoint phases repeatedly take more than 2 seconds than the issue should be handled with high priority.

3. Analyze IO Issues

If the alert status suggests IO related issues, then follow the analysis recommendations outlined in KBA 1999930 - FAQ: SAP HANA I/O Analysis.

4. Analyze CPU usage related issues

If the alert status suggests CPU usage related issues, then follow the analysis recommendations outlined in KBA 1909670 - How to Handle Alert ‘Check host CPU utilization’.

5. Perform further technical performance analysis

If further technical performance analysis from HANA side seems to be required, then create and proceed a Technical Performance Analysis Session from SAP Engagemenmt and Service Delivery workcenter in your SAP Solution Manager.
If further assistance is required create an SAP incident on component SV-BO-DB-HAN and attach the documentation of the performed analysis results.

Helping Notes

Monday, February 1, 2016

1977220 - How to handle HANA Alert 54: Savepoint Duration

Procedure

Path

How-to

1. Check Alert Status with regard to performance alerts indicating CPU or IO related issues

2. Get details on savepoint duration using the SQL statement HANA_IO_Savepoints_Rev71+ (HANA_IO_Savepoints)
from KBA 1969700 - SQL Statement Collection for SAP HANA

3. Analyze IO Issues

4. Analyze CPU usage related issues

5. Perform further technical performance analysis

No comments:

Post a Comment

Monday, February 1, 2016

1977220 - How to handle HANA Alert 54: Savepoint Duration

Procedure

Path

How-to

1. Check Alert Status with regard to performance alerts indicating CPU or IO related issues

2. Get details on savepoint duration using the SQL statement HANA_IO_Savepoints_Rev71+ (HANA_IO_Savepoints) from KBA 1969700 - SQL Statement Collection for SAP HANA

3. Analyze IO Issues

4. Analyze CPU usage related issues

5. Perform further technical performance analysis

No comments:

Post a Comment

2. Get details on savepoint duration using the SQL statement HANA_IO_Savepoints_Rev71+ (HANA_IO_Savepoints)
from KBA 1969700 - SQL Statement Collection for SAP HANA