Symptom
You want to understand the impact of I/O operations on your SAP HANA database and determine potential bottlenecks.
Environment
SAP HANA
Cause
1. What is the definition of I/O in the context of this SAP Note?
2. Where do I find information about storage requirements in SAP HANA environments?
3. Why do I need to take care for I/O although SAP HANA is supposed to be an in-memory database?
4. Which indications exist that the SAP HANA database suffers from I/O accesses?
5. What can I check in order to understand if SAP HANA suffers from I/O?
6. What can I do if the SAP HANA related figures indicate bottlenecks in the I/O area?
2. Where do I find information about storage requirements in SAP HANA environments?
3. Why do I need to take care for I/O although SAP HANA is supposed to be an in-memory database?
4. Which indications exist that the SAP HANA database suffers from I/O accesses?
5. What can I check in order to understand if SAP HANA suffers from I/O?
6. What can I do if the SAP HANA related figures indicate bottlenecks in the I/O area?
Resolution
1. What is the definition of I/O in the context of this SAP Note?
I/O are input / output operations in various constellations, e.g. logical I/O when accessing memory, physical I/O when requesting blocks from the operating system respectively disk and network I/O when transferring data via network.This SAP Note covers the area of disk I/O. This includes all read and write requests sent to the operating system - regardless if the actual data is physically read from disk or found in a low-level cache.
2. Where do I find information about storage requirements in SAP HANA environments?
Attached to SAP Note 1900823 you can find a storage whitepaper describing SAP HANA storage requirements in detail.3. Why do I need to take care for I/O although SAP HANA is supposed to be an in-memory database?
It is correct that the majority of SAP HANA activities is purely performed in memory, but the following scenarios require I/O:Scenario | Details | Impact of slow I/O |
Row store load | As part of the SAP HANA startup the row store is loaded from disk into memory. This involves a significant amount of read I/O. | Increased SAP HANA startup times |
Column store table load | Also the tables located in the column store need to be loaded into memory before they are accessed. Unlike row store tables the loading doesn't take place during the startup. Instead it happens later on (e.g. during preloading or if the table is accessed the first time). As columns can be unloaded it is possible that the same columns are loaded into memory various times. | Partially increased database request times |
Hybrid LOBs | Large values of hybrid LOB columns are not loaded into the column store. Instead they are always retrieved from disk (or from the SAP HANA internal page cache). See SAP Note 1994962 for more information. | Reduced memory footprint, performance overhead when accessing the same data several times |
Savepoints, snapshots | During savepoints the information in memory is synchronized with the disks, so all modified information is flushed to disk involving write I/O to data files. Snapshots are savepoints that are kept for future use. | INSERT / UPDATE / DELETE statements are blocked during critical savepoint phase |
Commits | When a commit is executed, all changes performed by the transaction are written to the logs involving write I/O to the log area. | Increased commit times |
Backups | Backups copy data and log information in order to be able to restore the database in case of problems. | Long running backups If I/O is overloaded due to backup activities the overall system performance can suffer |
Persistence garbage collection | Persistence garbage collection cleans up no longer required relics on persistence level after a transactions are finished. See SAP Note 2169283 for more information related to garbage collection. | SAP HANA internal locking and performance regressions if persistence garbage collection isn't able to keep of with the transaction load. |
4. Which indications exist that the SAP HANA database suffers from I/O accesses?
The following SAP HANA alerts indicate problems in the I/O area:Alert | Name | SAP Note | Description |
28 | Most recent savepoint operation | 1977291 | Determines how long ago the last savepoint was defined, that is, how long ago a complete, consistent image of the database was persisted to disk. |
34 | Unavailable volumes | 1900682 | Determines whether or not all volumes are available |
53 | Pagedump files | 1977242 | Identifies new pagedump files that have been generated in the trace directory of the system. |
54 | Savepoint duration | 1977220 | Identifies long-running savepoint operations |
60 | Sync/Async read ratio | Identifies a bad trigger asynchronous read ratio. This means that asynchronous reads are blocking and behave almost like synchronous reads. This might have negative impact on SAP HANA I/O performance in certain scenarios. | |
61 | Sync/Async write ratio | Identifies a bad trigger asynchronous write ratio. This means that asynchronous writes are blocking and behave almost like synchronous writes. This might have negative impact on SAP HANA I/O performance in certain scenarios. |
Check ID | Details |
310 | I/O read throughput data min. (MB/s, last day) |
311 | I/O read throughput data avg. (MB/s, last day) |
315 | I/O write throughput data min. (MB/s, last day) |
316 | I/O write throughput data avg. (MB/s, last day) |
320 | I/O read throughput log min. (MB/s, last day) |
321 | I/O read throughput log avg. (MB/s, last day) |
325 | I/O write throughput log min. (MB/s, last day) |
326 | I/O write throughput log avg. (MB/s, last day) |
330 | Max. trigger read ratio (data) |
331 | Max. trigger write ratio (data, log) |
340 | Log switch wait count ratio (%) |
341 | Log switch race count ratio (%) |
350 | Blocking savepoint phases > 10 s (last day) |
351 | Blocking savepoint phase avg. (s, last day) |
352 | Blocking savepoint phase max. (s, last day) |
355 | Time since last savepoint (s) |
357 | Savepoint write throughput (MB/s) |
358 | Savepoints taking longer than 900 s (last day) |
360 | Number of failed I/O reads |
361 | Number of failed I/O writes |
370 | Unused space in data files (%) |
652 | Number of page dumps (last day) |
915 | Min. data backup throughput (GB/h, last week) |
916 | Avg. data backup throughput (GB/h, last week) |
1160 | Average COMMIT time (ms) |
__close_nocancel
DataRecovery::LoggerImpl::writeLog
disk_space
__endmntent_internal+0xa
FileAccess::DirectoryEntry::DirectoryEntry
FileAccess::DirectoryEntry::findFirst
FileAccess::DirectoryEntry::findNext
FileAccess::DirectoryIterator::operator++
FileAccess::fileSize
FileAccess::FileSystem::init
FileAccess::getFileSystem
__fopen_internal
__getdents
__GI__IO_file_open
__GI___lxstat64
__GI___statfs
_IO_new_fclose
_IO_new_file_close_it
_IO_new_file_fopen
_IO_setb_internal
__opendir
__open_nocancel
__readdir_r
__realpath
System::UX::endmntent
System::UX::opendir
System::UX::readdir_r
System::UX::stat
An increased number of SAP HANA threads with one of the following details exist (SAP Note 2114710):
Thread State | Lock Name |
Wait IO | |
Semaphore Wait | LogBufferFreeWait |
Semaphore Wait | LoggerBufferSwitch |
Semaphore Wait | POSTCOMMIT_FINISH_SMP |
Semaphore Wait | PrefetchCallback |
Semaphore Wait | PrefetchIteratorCallback |
5. What can I check in order to understand if SAP HANA suffers from I/O?
You can perform the following detailed checks to understand if and when there are indications for problems in the I/O area:Action | SQL statement (SAP Note 1969700) | Details | ||||||||||
Check duration of critical savepoint phases | SQL: "HANA_IO_Savepoints" | The critical savepoint phase is particularly critical for the
application because during that time all change operations (like INSERT,
UPDATE, DELETE) are blocked. Long durations of the critical savepoint
phase are often linked to I/O writes. If critical savepoint phases
repeatedly take more than 2 seconds, and I/O issue might be the root
cause. See SAP Note 2100009 for more information regarding savepoints. |
||||||||||
Check for long running savepoints | SQL: "HANA_IO_Savepoints" | Savepoints may run for a long time for several reasons, e.g. high write volume, I/O bottlenecks or a high prepare flush retry count. See SAP Note 2100009 for more information. | ||||||||||
Check trigger ratios | SQL: "HANA_IO_KeyFigures_Total" | Increased trigger ratios can indicate synchronous rather than asynchronous I/O requests. This constellation can increase the runtime of critical I/O activities - e.g. the critical savepoint phases - and should therefore be avoided. See SAP Note 1930979 for more information. | ||||||||||
Check current total I/O information | SQL: "HANA_IO_KeyFigures_Total" | The total I/O information (collected since last SAP HANA restart)
provides a first global perspective on I/O activities including average
times and read and write throughputs. You can check if these values meet
your expectations and if you see differences between different SAP HANA
systems. Be aware that the values also depend on the usual request sizes. If the average request size is higher, the average request times are typically higher, but at the same time the throughputs are better. |
||||||||||
Check historic total I/O information | SQL: "HANA_IO_KeyFigures_Total_History" | The historic total I/O information provides further insight in the I/O activities over time. It can help you identifying times with degraded I/O times or with a particularly high I/O volume. | ||||||||||
Check current detailed I/O information | SQL: "HANA_IO_KeyFigures_Details" | The detailed I/O information (collected since last SAP HANA restart) breaks the total values up into different I/O buffer size classes. In this way you can understand better how small, medium and large I/O requests behave. Additionally it is possible to distinguish between synchronous and asynchronous I/O. | ||||||||||
Check historic detailed I/O information | SQL: "HANA_IO_KeyFigures_Details_History" | The historic detailed I/O information provides the most granular I/O information, because it allows to differentiate by historic times, I/O buffer size classes and synchronous / asynchronous I/O. It can help to pinpoint temporary I/O performance regressions, e.g. due to increased load or external factors. | ||||||||||
Check for tables with most I/O | SQL: "HANA_Tables_IOStatistics" | This command can be used if a particularly high amount of I/O is caused by specific tables. If certain tables are identified, you can check from application side if the amount of changes can be reduced (e.g. delta load rather than full load, reduced tracing or logging). | ||||||||||
Check for backups with a low throughput | SQL: "HANA_Backups_BackupRuns" | The column MB_PER_S in the output of this command provides information about the backup throughput. Values below 200 GB / h can indicate bottlenecks in the I/O area. | ||||||||||
Check active threads | SQL: "HANA_Threads_CurrentThreads" | Increased activities of the following special threads can point to I/O problems in certain areas (see SAP Note 2114710).
|
||||||||||
Check call stacks | SQL: "HANA_Threads_Callstacks" | If an unusual high amount of threads is active in call stacks mentioned in "Which indications exist that the SAP HANA database suffers from I/O accesses?" above, I/O problems can be responsible for it. |
6. What can I do if the SAP HANA related figures indicate bottlenecks in the I/O area?
If you found indications on SAP HANA side that disk I/O is slower than expected, you have to perform a more detailed analysis on lower layers of the I/O stack:- Operating system
- File system
- I/O sub system
- Network between SAP HANA instances and I/O sub system
SAP Note 1943937 provides a hardware check tool (hwcct) that can be used to check if the measured performance is in line with the expectations. SAP Note 2212741 contains some hwcct troubleshooting tips. 1661146 provides a tool to check SAP HANA appliances on Lenovo / IBM.
Be aware that the I/O performance rules of thumb don’t apply for X5 / Westmere based hardware (SAP Note 2187426) and worse values can be accepted if the business needs are met.
If the problem is not resolved with all recommended settings in place, you should get in touch with your teams and partners from the operating system, network, hardware and I/O area so that they can perform more detailed analysis on these lower layers of the I/O stack.
No comments:
Post a Comment