Tuesday, January 19, 2016

1999998 - FAQ: SAP HANA Lock Analysis

Symptom
SAP HANA connections are hanging because they have to wait for locks.


Environment
SAP HANA


Cause
In order to synchronize the access to resources (records, objects, low level locks, ...) it is required to use lock mechanisms.


Resolution

1. Where do I find information about SAP HANA lock analysis?

SAP Note 1858357 provides an overview how to analyze lock situations.
The SAP HANA Troubleshooting and Performance Analysis Guide at SAP HANA Troubleshooting and Performance Analysis Guide contains further information related to lock analysis.

2. Which indications exist for SAP HANA locking issues?

You suffer from terminations and short dumps due to SQL errors 131 or 133:
SQL error 131: transaction rolled back by lock wait timeout
SQL error 133: transaction rolled back by detected deadlock
SQL error 146: resource busy and NOWAIT specified
The following SAP HANA alerts indicate problems in the locking area:
Alert Name SAP Note  Description
16 Lock wait timeout configuration 1909707 Determines whether the 'lock_wait_timeout' parameter in the 'transaction' section of the indexserver.ini file is between 100,000 and 7,200,000.
49 Long-running blocking situations 2079396 Identifies long-running blocking situations.
59 Percentage of transactions blocked 2081856 Determines the percentage of transactions that are blocked.
SQL: "HANA_Configuration_MiniChecks" (SAP Notes 1969700, 1999993) returns a potentially critical issue (C = 'X') for one of the following individual checks:
Check ID Details
870 SelfWatchDog activity time (%, last hour)
1010 Age of oldest active trans. lock wait (s)
1011 Trans. lock wait durations > 600 s (last day)
1020 Threads currently waiting for locks
1021 Maximum threads waiting for locks (last day)
1030 Concurrently blocked transactions (last hour)
1031 Concurrently blocked transactions (last day)
1720 Supported nameserver lock file location

3. What types of locks exist?

We can distinguish the following SAP HANA lock types:
Lock type Lock wait thread state Scope  Views Details
Record lock ConditionalVariable Wait
(RecordLockWaitCondStat / TransactionLockWaitCondStat)
transactional M_RECORD_LOCKS
M_CONDITIONAL_VARIABLES
M_BLOCKED_TRANSACTIONS
M_OBJECT_LOCK_STATISTICS
Exclusive locks on record level, typically caused by concurrent changes of the same records by different transactions
Object lock ConditionalVariable Wait
(TableLockWaitCondStat / TransactionLockWaitCondStat)
transactional M_OBJECT_LOCKS
M_CONDITIONAL_VARIABLES
M_BLOCKED_TRANSACTIONS
M_OBJECT_LOCK_STATISTICS
Locks on object level, typically caused by DDL operations requiring an object lock:
  • Shared locks (INTENTIONAL EXCLUSIVE): Set in case of DELETE, INSERT, MERGE, SELECT FOR UPDATE, UPDATE and UPSERT operations on table
  • Exclusive locks (EXCLUSIVE): Set in case of DDL operations on table and in case of an explicit LOCK TABLE operation 
Read / write lock SharedLock Enter
ExclusiveLock Enter
internal M_READWRITELOCKS Read / write lock waits, e.g. waits during critical savepoint phase
Semaphore Semaphore Wait internal M_SEMAPHORES Low level locks based on semaphores, e.g. waits for a critical delta merge phase
Mutex Mutex Wait internal M_MUTEXES Low level locks based on mutexes
Barrier Barrier Wait internal Low level locks similar to mutexes
liveCache lock internal M_LIVECACHE_LOCKS Locks related to integrated liveCache (if used)
Nameserver topology special The nameserver topology is locked based on a file (/tmp/.hdb_<sid>_<inst_id>_lock)
Record and object locks are mainly linked to the application transactions while read / write locks, semaphores, mutexes and barriers are managed by SAP HANA internally.
Be aware that the table above concludes from a lock type to a thread state. The opposite way is not generally valid. For example, there can be "ConditionalVariable Wait" situation that are not linked to a record lock or object lock.

4. How can I check if my SAP HANA database suffers from lock waits?

On a very elementary level you can identify lock waits based on the thread states.
Current threads can be displayed via:
  • Transaction DBACOCKPIT: Performance --> Threads
  • SAP HANA Studio: Performance --> Threads
  • SQL: "HANA_Threads_CurrentThreads" (SAP Note 1969700)
Historic thread activities can be determined via:
  • SQL: "HANA_Threads_ThreadSamples_FilterAndAggregation" and "HANA_Threads_ThreadSamples_AggregationPerTimeSlice" (SAP Note 1969700)
The following thread states indicate lock wait situations:
  • Barrier Wait
  • ConditionalVariable Wait
  • ExclusiveLock Enter
  • Mutex Wait
  • Semaphore Wait
  • SharedLock Enter
  • Sleeping
Not every occurrence of these states is critical. It happens frequently that a thread submits a request to another thread and waits for a related semaphore until the called thread has returned the result.
The comprehensive check command SQL: "HANA_Configuration_MiniChecks" (SAP Note 1969700) contains also lock related checks. An 'X' in column C indicates potentially critical situations. Example output:


The following SQL errors also indicate lock related problems:
  • SQL error 131: transaction rolled back by lock wait timeout
  • SQL error 133: transaction rolled back by detected deadlock
The following SAP HANA alerts are another indication for locking issues:
  • Alert 49: Long-running blocking situations
  • Alert 59: Percentage of transactions blocked
Below you can find more specific information how to analyze certain lock wait situations.

5. How can transactional lock waits be analyzed and optimized?

If a high number of transactional (record and object) lock waits exist, but at the same time there are also many semaphore and mutex waits, we can usually assume that the record and object lock waits are a consequence of the mutex and semaphore waits. In this case you should focus on the analysis of mutex and semaphore waits (see below). Otherwise you can use the following approaches to analyze transactional lock situations.
Current lock wait situations are visible as blocked transactions:
  • View M_BLOCKED_TRANSACTIONS
  • SQL: "HANA_Locks_Transactional_LockWaits_Current" (SAP Note 1969700)
  • SAP HANA Studio: Performance -> Blocked Transactions
  • DBACOCKPIT: Diagnostics -> Locks -> Blocked Transactions
Historic lock wait situations can be found in the related history:
  • View HOST_BLOCKED_TRANSACTIONS
  • SQL: "HANA_Locks_Transactional_LockWaits_History" (SAP Note 1969700)
  • SQL: "HANA_LoadHistory_Services" (SAP Note 1969700)
Among others this information provides information about the involved table and transactions. With some further research it is usually possible to identify the related business scenarios.
The following table lists typical approaches to optimize record and object locks:
Lock type Optimization approaches
Record lock Check from an application and / or scheduling perspective if you can reduce the amount of concurrent changes of the same records by different transactions.
Check from application side if you can reduce the critical time frame between DML operation and next COMMIT (lock is always held until next COMMIT is executed).
Manually terminate the lock holding transaction if it is responsible for a critical current problem.
Lock wait related rcommendations for specific tables like FKKDIHDTMP, NRIV or /PLMB/FRW_BUFFER can be found in SAP Note 2000002 ("Are there standard recommendations for specific SQL statements available?").
Object lock Make sure that critical DDL operations (e.g. offline repartitioning) are executed at times with minimum concurrent workload.
Check if you can reduce the amount of critical DDL operations (e.g. by using online operations).
If another transaction holds an uncommitted change on a table, a DDL operation on the same table like CREATE INDEX will escalate this row level lock to a table level lock because it prevents any new DML operation to be started until it acquires the lock itself and finalizes its tasks. In order to minimize this risk, you can reduce the lock wait timeout on transaction level by executing the following command before starting the DDL operation:
SET TRANSACTION LOCK WAIT TIMEOUT <timeout_in_ms>
A value of 0 for <timeout_in_ms> would result in an immediate termination of the DDL operation with the following error:
SQL error 131: transaction rolled back by lock wait timeout: Lock timeout occurs while waiting OBJECT_LOCK of mode EXCLUSIVE
For index creations additional options exist to reduce the risk and duration of object locks. See SAP Note 2160391 ("Are accesses to the underlying table locked when an index is created?") for more information. If an index is created with NOWAIT option and there is still an open change on the table, it is terminated with the following error:
SQL error 146: resource busy and NOWAIT specified 
Manually terminate the lock holding transaction if it is responsible for a critical current problem.

6. How can internal lock waits be analyzed?

Read / write lock, semaphore and mutex waits are typically related to internal lock situations. If you experience a high number of threads waiting for these locks you can run SQL: "HANA_Sessions" (SAP Note 1969700) in the first step. The column WAITING_FOR can provide some additional expert insight for threads with states 'SharedLock Enter', 'ExclusiveLock Enter', 'Mutex Wait' and 'Semaphore Wait'.
Example:
waiting_for.JPG
In this example the lock names already indicate a contention related to the nameserver process.
Similar information can be found in column LOCK_WAIT_NAME of views M_SERVICE_THREADS and M_SERVICE_THREAD_SAMPLES.
The following table contains typical general reasons for contention on internal locks:
Check Details
Memory bottleneck Check if there are indications for a memory bottleneck like paging or column store unloads at times of the internal lock waits. If yes, identify and eliminate the root cause of the memory bottleneck. See SAP Note 1999997 for more details.
CPU bottleneck Check if there are indications for a high CPU consumption at times of the internal lock waits. If yes, try to identify the root cause like expensive SQL statements (SAP Note 2000002) or CPU-intensive processes outside of SAP HANA. Be aware that sometimes the high CPU consumption is a consequence of the internal locks because of active waits.
SAP HANA bug Bugs like permanent deadlocks on internal resources can be responsible for internal lock waits. Therefore check SAP Notes for potential bugs.
Furthermore the following typical root causes exist for certain scenarios:
Thread state Lock type  Reason
ExclusiveLock Enter
SharedLock Enter
Semaphore Wait
Sleeping
unnamed Mutex
unnamed Mutex
BTree GuardContainer
SleepSemaphore

If SELECT or DML accesses on one particular table are massively stuck in these states during some times (typically seconds to minutes), it can be caused by an expensive join holding a lock on the table for a longer time. Typically the lock types and call stacks are:
  • DML operations: ExclusiveLock Enter or Semaphore Wait / AttributeEngine::BtreeAttribute::GuardContainer::_acquire
  • SELECT operations: SharedLock Enter / AttributeEngine::BtreeAttribute::GuardContainerLocks::acquireRead
Known situations are:
Patch level Context Details
Rev. 70 - 73 Join statistics creation When join statistics are created during the first execution of a join query, a deadlock with DML operations may happen on Rev. 70 to 73. See SAP Note 2009806 for more information.
SPS <= 09 Join with expensive sorting Up to SAP HANA SPS 09 sorting is secured by a GuardContainerLock, so you have to avoid joins with expensive sorting when it causes trouble (e.g. by using restricting selection conditions or by avoiding the ORDER BY). Alternatively you can upgrade to SAP HANA SPS 10 or higher.
Fulltext indexing Check if the underlying table has fulltext indexes (e.g. by using SQL: "HANA_Indexes_Columns", SAP Note 1969700) and look at for threads in suspicious states like 'indexing' at the time of the locking issue (SAP Note 2114710). If fulltext indexes exist, you can suspend the indexing by executing the following command:
ALTER FULLTEXT INDEX "<schema_name>"."<table_name>" SUSPEND QUEUE
If the problem disappears, you can activate the queue during non-critical times in order to update the index using the following command:
ALTER FULLTEXT INDEX "<schema_name>"."<table_name>" ACTIVATE QUEUE
Be aware that the SUSPEND isn't persisted, so after a SAP HANA restart the queue will be processed again and you have to executed the SUSPEND command another time to deactivate it.
Concurrent read and update transactions If a column is read and at the same time another transaction wants to change it, both the update transactions and all subsequent readers could be blocked until the reader is finished. Optimizing the critical read activities (SAP Note 2000002) can help to improve this situation.
The same issue can theoretically also happen if no join is involved (but join engine functionality is implicitly used) or if other expensive operations like aggregations or text processing happen.
Job Exec Waiting CSPlanExecutorWaitForResult This lock indicates that a SQL executor waits for its job workers executing a column store statement (thread method: PlanExecutor calc). So it is basically an "idle" lock and itself not critical. If the lock is visible for a long time, you have to check why the execution of the SQL statement by the job worker(s) takes so long.
Mutex Wait AttributeStore Resource Load This lock is used when a database query requires a column which currently isn't in memory and needs to be loaded into the column store. The actual load is done by special threads (LoadField) and the thread preparing or executing the query has to wait for the "AttributeStore Resource Load" lock during this time. See SAP Note 2127458 for more information about loads and unloads.
Mutex Wait ceManager_GCLock This lock indicates a wait for the calculation engine garbage collector (CalcEngineGarbageCollector, see SAP Note 2169283).
Mutex Wait CSPlanExecutorLock Job worker threads of a parallelized execution have to wait for other threads of the same connection. You can consider these waits as idle waits. In order to improve the performance and reduce the wait time, you have to check what the other threads of the same connection are doing and how their activity can be optimized (e.g. by eliminating lock scenarios, resource bottlenecks or expensive SQL statements).
Mutex Wait Federation This lock is linked to Smart Data Access (SDA), i.e. accesses to remote databases. See SAP Note 2180119 and check if you can optimize the configuration and use of SDA. Make sure that you use supported client software (e.g. ODBC driver).
Mutex Wait fo-Stats This lock is required to synchronize the collection of operations of a specific job graph (i.e. of a specific database request), so only job worker processes are impacted. It mainly protects adding new jobs.
Mutex Wait jx_pq01-nopref
jx_pq02-nopref
jx_pq03-nopref
jx_pq04-nopref
jx_pq05-nopref
jx_pq06-nopref
These locks synchronize operations on job queues (separated by NUMA node and priority), so only job worker processes are impacted.
If you observe a high number of waits for this lock you can check if there are many expensive SQL statements (SAP Note 2000002) being processed with a high number of jobs.
Mutex Wait LastValuesHolder::AccessCoordinator This mutex synchronizes the accesses to the monitoring view M_STATISTICS_LASTVALUES, where the standalone statistics server stores data historized during the most recent snapshot. The most efficient way to bypass this bottleneck is an upgrade to the embedded statistics server that is possible as of Rev. 74 (see SAP Note 1917938). Alternatively you can reduce the collected data volume as described in SAP Note 2084747.
Mutex Wait LoggerBufferSwitch This lock indicates wait situations for a log buffer switch. Typical reasons are:
Reason Details
Slow I/O writes to log area Check according to SAP Note 1999930 if there are general or temporary I/O write regressions to the local log area.
System replication performance See SAP Note 1999880 ("Can problems with system replication impact the performance on the primary system?") and make sure that system replication is running smoothly without impacting the primary system.
Number and size of log buffers configured too small If none of the above reasons can explain the wait situations, you can consider to increase the number and the size of the log buffers. You can use SQL: "HANA_Logs_LogBuffers" (SAP Note 1969700) to check the current log buffer configuration. As a starting point you can double the configured size and number. The following parameters can be used to adjust size and number:
global.ini -> [persistence] -> log_buffer_size_kb









global.ini -> [persistence] -> log_buffer_count

Mutex Wait Queue.cpp: queueserver2::Queue::Queue Idle wait of "Request" / "Queue Pull" threads from preprocessor to indexserver (see SAP Note 2114710), usually not critical
Mutex Wait TRexAPI_TableDeltaMerge_statisticLock This lock is held while CPU and memory information is collected. This information is required to determine a proper number of merge tokens. See SAP Note 2057046 for more information.
Mutex Wait TrexNet_Requestor_EndPointsLock Among others, the critical path of this lock contains domain name service (DNS) accesses (e.g. "Communication::getHostAddress") up to SAP HANA SPS 09. If the DNS lookups are slowing down, threads may queue up on the TrexNet_Requestor_EndPointsLock lock. Therefore you should check if there are issues with DNS if you see threads queueing on this lock. See SAP Note 2222200 for more information regarding network checks in SAP HANA environments.
Consider switching the listeninterface parameter from .global to .internal in scale-out environments as recommended in SAP Note 2183363, because with .internal setting will suppress DNS requests.
Mutex Wait unnamed Mutex If DML operations are executed and committed with a high frequency (e.g. because of concurrent single record INSERTs during migrations and data loads), the undo free list can become a central bottleneck, persistence garbage collection is no longer able to keep up and result in mutex waits of DML operations. The call stack then shows a location like "getPageWithFreeSpaceFromFreeList". Avoid frequent COMMITs of small DML operations to come around this issue. See SAP Note 2169283 for more information related to SAP HANA garbage collection.
Mutex Wait
Semaphore Wait
Semaphore Wait
unnamed Mutex
AsyncRequest.cpp: TrexNet::RequestSet::go
AsyncRequest.cpp: (anonymous namespace)::IOThread::run
If you see these kinds of locks and only accesses to a certain table are blocked, you can check for the related thread types and thread methods (see SAP Note 2114710). In case you see significant amounts of
  • THREAD_TYPE: Request, THREAD_METHOD: __globalTransControl
  • THREAD_TYPE: AsyncWaiter, THREAD_METHOD: waiting
it is possible that the problem is caused by problems when modifying a partitioned table with more than one unique index. Check if the involved table has more than one unique index and is partitioned (across different nodes). If yes, check if it is possible to implement a workaround like disabling partitioning on this table. In particular you should avoid partitioning of BW SID tables (/BI0/S*). The underlying problem is fixed as of Rev. 97.
Semaphore Wait IndexMgr.cpp: TRexConfig::IndexHandle::acquire
Acquire Index Access
These waits are usually caused by delta merges. Typical constellations are:
Situation Details
Move of uncommitted rows from current to new delta storage  During a delta merge uncommitted rows need to be moved from the source delta storage to the target delta storage. When this happens, a lock is set on the table and change operations are blocked. See SAP Notes 2098801 and 2057046 for more information and make sure that the merge decision functions are set properly.
Lock escalation if index handle is held by other transaction If another transaction holds a - usually non-critical - shared lock and the delta merge wants to move uncommitted rows to the new delta storage, it escalates the lock to a critical exclusive lock blocking all kind of operations. This can e.g. happen in the following situations:
  • If on Rev. <= 102 both the table changes and the delta merge are blocked by this lock and at the same time the involved table has a LOB field that is frequently changed, the lock holder can be the LobGarbageCollector (see SAP Note 2169283 regarding LOB garbage collection). Due to a bug the LOB garbage collector can run much longer than necessary and at the same time the auto merge escalates the usual shared lock to an exclusive lock. As a workaround you can disable auto merges on the underlying table and trigger hard merges during non-critical times instead (see SAP Note 2057046). As a permanent solution implement Revision 102.01 or higher.
  • Also expensive queries can allocate the index handle lock for a long time, e. g. when writing data to an internal table (call stack: ItabCreateGuard::~ItabCreateGuard). The best solution in this case is to tune these bad queries, so that both their own runtime and the effect on the overall system is minimized.
If you suffer from this situation on a specific table, you can disable auto merge as a workaround and perform a hard merge during non-critical times.

Semaphore Wait IpmmTaskWait
MemoryReclaim
These locks can show up when memory reclaim activities like shrinks happen. See SAP Note 1999997 and make sure that memory is used efficiently so that the amount of reclaim operations is reduced.
Semaphore Wait LogBufferFreeWait This lock indicates that no freespace is available in the log buffer to record redo information. This is typically caused by disk I/O issues. See SAP Note 1999930 for more information.
Semaphore Wait MailSenderCallback Waits for this semaphore can be caused by the statistics server having problem to send out mails. In the related statistics server trace file you may find an entry related to the SMTP port 25 like:
(invalid)-><ip_address>/25_tcp ConnectWait
The permanent hanging situation is caused by SAP HANA bugs which are fixed with Rev. 83 and Rev. 90.
As a consequence of this problem the statistics server may no longer be replicated properly. In order to restart the system replication, you can restart the statistics server.
Semaphore Wait POSTCOMMIT_FINISH_SMP This lock is related to COMMIT processing and can be caused by the following issues:
  • Problems with system replication like increased log shipping latencies or full log shipping buffer. See SAP Note 1999880 -> "Can problems with system replication impact the performance on the primary system?" for more details.
  • I/O bottlenecks on primary system (see SAP Note 1999930)
  • Other general issues on primary system
Semaphore Wait PrefetchCallback
PrefetchIteratorCallback
These waits are active when the thread waits for pages requested from disk. A high number of these waits can indicate that the I/O stack is not able to keep up with the number of I/O requests. This may be normal (e.g. when a large number of columns is loaded in parallel after startup), but it can also indicate bottlenecks in the I/O area. See SAP Note 1999930 for more information regarding I/O optimization.
Threads waiting for this lock typically have call stacks including modules like DataContainer::PageChainContainerPrefetchedIterator::PrefetchBuffer::getNextPage.
Semaphore Wait TableDeltaMerge.cpp: TRexAPI::TableMergeRequest::TableMergeRequest Threads waiting in modules like TRexAPI::MergeMonitor::acquireToken for this lock, the number of merge tokens is too small to handle the current merge load. Occasional merge peaks are acceptable, but if it happens frequently, you should consider the following optimizations:
  • Check if the amount of DML change operations can be reduced, so that the merge pressure is implicitly reduced.
  • Check if the merges performance can be improved.
  • Check if there are merge failures on large tables (e.g. due to OOM).
  • Increase the number of merge tokens if feasible.
See SAP Note 2057046 for more details about merges and merge tokens.
Semaphore Wait TableLoad.cpp: TRexAPI::LoadFieldThread
TableLoad.cpp: ltt::vector::vector
This semaphore is used when TableReload threads trigger LoadField threads in order to load columns into memory, e.g. after a startup of SAP HANA. See SAP Note 2127458 for more information related to column loads. It is normal to see spikes of this wait at some times. Its runtime mainly depends on the I/O performance when columns are read from the data area on disk. If it takes too long, you should check the I/O performance based on SAP Note 1999930.
Semaphore Wait

SharedLock Enter
ExclusiveLock Enter
TranslationTable.cpp: JoinEvaluator::TranslationTable::fill
TranslationTable.cpp: JoinEvaluator::TranslationTable::completeFill

RWLock[TranslationTable.cpp]: JoinEvaluator::TranslationTable::TranslationTable

These locks happen in the context of populating translation tables which are used to map join columns (see SAP Note 1998599). Check the following suggestions to optimize the situation:
  • See SAP Note 2124112 and check if there are SQL statements with a high number of instantiations in the SQL cache as – in case of joins – they are candidates for creating a significant amount of translation tables.
  • See SAP Note 2217936 and make sure that caching of translation tables for temporary tables is deactivated.
  • If this is not possible (e.g. because you are on a release <= 97.01), you can consider increasing the translation table cache size (indexserver.ini -> [joins] -> translator_cache_size). You can for example start with doubling the existing value. Be aware that this setting is a trade-off between performance and memory consumption, because the heap allocator Pool/JoinEvaluator/TranslationTable will increase (SAP Note 1998599).
SharedLock Enter
ExclusiveLock Enter
calcEngine_CalcEngineManager_ScenarioMapRWLock These locks can be secondary symptoms when a thread is blocked by the ceManager_GCLock lock, i.e. by the calculation engine garbage collector (CalcEngineGarbageCollector, see SAP Note 2169283).
SharedLock Enter
ExclusiveLock Enter
ConsistentChangeLock During the blocking phase of savepoints DML operations are blocked with "SharedLock Enter" related to the lock type ConsistentChangeLock. The call stacks contain modules like DataAccess::SavepointSPI::lockSavepoint. See SAP Note 2100009 and check if the blocking phase of savepoints can be optimized.
SharedLock Enter
ExclusiveLock Enter
ExpensiveStatements_RingBufferLock This lock secures the ring buffer linked to the expensive statement trace. Usually these locks are not held for a long time. If you suspect that they are critical, you can disable the expensive statement trace for testing purposes. See SAP Note 2180165 for more information related to the expensive statement trace.
SharedLock Enter
ExclusiveLock Enter
FileIDMapping This lock secures persistence container accesses (e.g. for hybrid LOBs or virtual files). A high contention can happen when many persistence garbage collector threads (GCJob*) are concurrently active. In this case you can reduce the maximum number of persistence garbage collectors by setting the following parameter to a lower value (SAP Note 2100040).
global.ini -> [persistence] -> max_gc_parallelity
Additionally please open a SAP incident in order to report the contention and request a fix.
See SAP Note 2169283 for more information regarding SAP HANA garbage collection.
ExclusiveLock Enter
SharedLock Enter
JoinEvaluator_JEPlanData_Lock This lock situation happens when multiple threads of the same join engine job need to access the same plan data objects. This is a rather normal situation so that it is not necessarily an issue that this lock belongs to the most frequent ones in a system. The lock waits are always local to a single request / single connection. If the locks result in performance issues, you should check if the underlying SQL statement is already processed optimally or if it can be tuned (SAP Note 2000002).
SharedLock Enter
ExclusiveLock Enter
MergeMonitor_Lock
TRexAPI::TableMergeRequest::TableMergeRequest
These lock wait situations indicate that merge operations have to wait for merge tokens, e.g. due to a high amount of concurrent merges or bottlenecks during merging. See SAP Note 2057046 for more details. In very special cases you can adjust parameters like the following in order to control the merge throughput:
indexserver.ini -> [mergedog] -> load_balancing_func
indexserver.ini -> [mergedog] -> token_per_table
SharedLock Enter
ExclusiveLock Enter
MergeMVCCLock This lock can happen in case of concurrent merges and is typically a consequence of other issues (e.g. I/O problems or highly concurrent merges). See SAP Note 2057046 for more details regarding merges.
SharedLock Enter
ExclusiveLock Enter
SavepointLock This lock synchronizes savepoints and makes sure that a savepoint (or snapshot) isn't started at a time when another savepoint (or snapshot) is already running. As an example, a system replication related snapshot will wait for SavepointLock as long as a normal savepoint is in progress on primary side. If you frequently see these locks, you should check and optimize the savepoint runtime and performance (SAP Note 2100009).
SharedLock Enter
ExclusiveLock Enter
SessionContextCacheMapLock This lock secures the access to session context information. If you see many waits, it is usually not a problem itself, but a symptom of another underlying issue (e.g. slow or hanging COMMIT operations).
SharedLock Enter
ExclusiveLock Enter
TRexConfig_IndexMgrIndex_ReplayLock This lock is required when delta log information is loaded or replayed. If for example a large delta storage isn't loaded into memory, yet, the first thread requiring this information will load it in a module like Delta::log::Replay::replayLog. Other processes have to wait for the TRexConfig_IndexMgrIndex_ReplayLock lock. This situation can happen if the delta merge fails for some reason. Therefore you should check according to SAP Note 2057046 that delta merges are running fine.
An overview of internal lock waits can be retrieved via:
  • M_READWRITELOCKS, M_SEMAPHORES, M_MUTEXES
  • SQL: "HANA_Locks_Internal_LockWaits_Overview" (SAP Note 1969700)
  • liveCache related internal lock waits can be checked via SQL: "HANA_liveCache_LockWaits_Current" (SAP Note 1969700)
Be aware that many harmless internal lock waits are included in these tables (e.g. threads being idle, but waiting for internal locks at that time), so a top down analysis is not helpful.
If the SAP HANA database is no longer accessible due to the internal lock escalations, see SAP Note 1999020 for potential additional analysis steps.
If you are not able to determine the root cause of critical internal lock waits, open a SAP incident on component HAN-DB and provide a full system dump if possible (SAP Note 1732157).

7. How can special lock waits be analyzed?

Special lock waits are all lock waits that aren't covered by normal thread states and monitoring views. Due to a lack of monitoring data it is particularly hard to analyze them. In the following table you can find pro-active suggestions and further details:
Lock type Details
Nameserver topology The namesever topology tracks the current state of the database (e.g. existing tables and locations) and is implemented as shared memory. Concurrent accesses to the nameserver topology are synchronized via a file on operating system level:
/tmp/.hdb_<sid>_<inst_id>_lock
If there is a bottleneck accessing the /tmp directory (e.g. because it is mounted via NFS), the performance can be significantly impacted. In order to minimize the risk of synchronization delays you have to make sure that /tmp is located on fast, local devices. See SAP Note 2100296 for more information. Starting with SAP HANA Rev. 91 a message is written to the daemon trace if the /tmp file system is potentially critical. Mini check 1720 ("Supported nameserver lock file location") evaluates this information.
Typical call stacks of threads waiting for the lock are:
TrexIpc::NameserverLock::acquireRead
NameServer::FTreeContainer::getValue
An indirect indication for problems accessing this file can be increased SelfWatchDog thread activities (see SAP Note 2114710). Mini Check 870 performs a related check (see SAP Note 1999993). If you assume a problem, you can run SQL: "HANA_Threads_Callstacks" (THREAD_TYPE = 'SelfWatchDog') available via SAP Note 1969700 in order to have a look at the call stacks of the SelfWatchDog threads. If threads typically work in the following call stack, a problem accessing the lock file is likely:
__open_nocancel+0x20
__GI__IO_file_open+0x30
_IO_new_file_fopen+0xc7
__fopen_internal+0x80
checkLockFile
TrexService::SelfWatchdog::checkLockFile
TrexService::SelfWatchdog::run

8. Do timeouts for lock waits exist?

Transactional lock waits are terminated when the time limit defined with the following parameter is exceeded:
indexserver.ini -> [transaction] -> lock_wait_timeout
Its default value 1800000 which represents 1800000 ms / 1800 s / 30 minutes. This means that a lock wait is terminated after 30 minutes and the following error message is issued:
SQL error 131: transaction rolled back by lock wait timeout
This behavior is different from other databases like Oracle where no timeout for exclusive lock waits exist and transactions willl wait for the lock until the lock is available or the transaction is manually terminated.
For internal lock waits no timeout is implemented.

9. What happens in case of deadlocks?

Deadlocks are situations where two or more transactions lock each other cross-wise so that no transaction will ever be able to proceed. If SAP HANA recognizes a deadlock (typically in combination with record locks), one of the transaction will be terminated and the following error is issued:
SQL error 133: transaction rolled back by detected deadlock
This behavior is different from other databases like Oracle where a deadlock will only terminate the current DML operation and not the whole transaction.

10. Is it possible to identify the record responsible for a record lock?

It is currently not possible to identify the record responsible for a record lock. The view M_BLOCKED_TRANSACTIONS contains a reference to the record (WAITING_RECORD_ID), but this reference can't be evaluated in a reasonable way.

11. How can I determine how long certain types of locks are held?

Normally it is most important to analyze lock wait situations, i.e. the concurrent access to the same resources. For transactional locks you can additionally see how long locks are held (even if no transaction is waiting) by using the following approaches:
  • M_OBJECT_LOCKS, M_RECORD_LOCKS, M_OBJECT_LOCK_STATISTICS
  • SQL: "HANA_Locks_Transactional_Current" / SQL: "HANA_Locks_Transactional_Total" (SAP Note 1969700)

2 comments:

  1. Hi There,
    Thanks a trillion mate!
    It works like charm, saved a lot of energy & time.

    I have a seldom problem. Just to give some back ground.
    1. We have a Z DB table with 126 fields (Yes exactly 126) having 3 key fields which has like 250+ million entries.
    2. I need to update four fields in the Z table(Step1) from a CDS view which has 7 fields (3 key fields of the ZDB table + four fields which need to be updated).
    Here is the problem.
    Here is my code approach.
    a. this tab = data from the CDS_VIEW. (which will fetch like 20 million entries)
    I'm not sure how to approach the updating part. I cannot use MODIFY as I have to get the entire table data (which is huge performance).
    Just for Info we are using HANA DB.



    Appreciate your effort for making such useful blogs and helping the community.

    Thank you,
    Kumar

    ReplyDelete
  2. Hi There,

    THANKS SO MUCH for sharing this! I would love to buy you a coffee since I now won’t be up all night that has been driving me crazy (until now!!). I just wish I knew what was going wrong but so glad it’s in the right place now! Thanks again:)

    I have to create a table if it doesn't exist. SAP HANA Training
    I have gone through this answer https://archive.sap.com/discussions/thread/3383646 but here we use procedures and in it we are having a count of number of rows if greater than zero then the required execution is being made. But in my case the table is empty, even if it is created, so how could I check for it to be present in the DB? Is there any alternative way of creating a table (if it doesn't exists) other than the procedure check?


    Excellent tutorials - very easy to understand with all the details. I hope you will continue to provide more such tutorials.
    Obrigado,
    Radhey

    ReplyDelete