Tuesday, February 2, 2016

2169283 - FAQ: SAP HANA Garbage Collection

Symptom
You are interested in details about garbage collection activities within SAP HANA.


Environment
SAP HANA


Cause
1. What is garbage collection in SAP HANA environments?
2. Why is it important to pay attention for proper garbage collection?
3. Which indications exist for SAP HANA garbage collection problems?
4. What is multi-version concurrency control?
5. How can garbage collection be monitored?
6. What are typical reasons and solutions for problems with garbage collection?
7. When is garbage collection automatically triggered?
8. How can garbage collection be triggered manually?
9. Which threads are responsible for garbage collection?
10. How can the mvcc_anti_ager behavior and trace file entries be controlled?
11. Can garbage collection be considered as defragmentation?
12. What can be the reason for alerts 73 and 74 for services different from the indexserver?


Resolution

1. What is garbage collection in SAP HANA environments?

The term "garbage collection" in SAP HANA environments is rather unprecise, because it can be related to several disjunct activities. In this SAP Note we will use the following naming conventions when discussing different kinds of garbage collection:
Area Details
Row store version consolidation Garbage collection in the row store refers to the cleanup of no longer required consistent read versions of data. In order to avoid misunderstandings, it is better to name this activity "version consolidation".
Column store version consolidation Garbage collection in column store refers to the cleanup of no longer required consistent read versions of data. In order to avoid misunderstandings, it is better to name this activity "version consolidation". In column store both current and old versions of records are identified via unique document ID vectors (UDIV). If the number of UDIV values is significantly larger than the number of (current) records in a table, it can indicate the necessity of version consolidation.
Merges and reclaims of delta storage (SAP Note 2057046) and compression optimizations (SAP Note 2112604) happen when significant changes are executed against tables and implicitly take care for a cleanup of no longer required versions. Therefore version consolidation is typically no issue in column store.
Memory garbage collection Allocated memory areas are cleaned in order to make more memory available for other purposes.
Persistence garbage collection No longer required persistence areas (e.g. undo data) are cleaned so that the disk space can be re-used. Be aware that this is no actual defragmentation, so the allocated disk size won't reduce (see "Can garbage collection be considered as defragmentation?" below).
LOB garbage collection Garbage collection related to hybrid LOB columns (see SAP Note 1994962)
liveCache garbage collection Garbage collection related the the SAP HANA integrated liveCache
Calculation engine garbage collection Model garbage collection of the calculation engine

2. Why is it important to pay attention for proper garbage collection?

Problems with garbage collection can result in various critical problems like:
  • Increased memory requirements
  • Increased disk space utilization
  • Performance degradations up to system standstills

3. Which indications exist for SAP HANA garbage collection problems?

Garbage collection problems are often linked to long running transactions. The following indications can directly or indirectly point to problems in the garbage collection area.
The following trace file entries indicate potential issues in the area of garbage collection:
mvcc_anti_ager.cc(01291) : There are too many un-collected versions. 
('number of versions > 1000000' or 'maximum number of versions per record > 100000')
The cursor possibly block the garbage collection of HANA database.
mvcc_anti_ager.cc(01291) : There are too many un-collected versions on table "<schema>"."<table>" 
('number of versions for one of the partitions > 1000000' or 'maximum number of versions per record > 100000')
The transaction blocks the garbage collection of HANA database.
mvcc_anti_ager.cc(01199) : long running uncommitted write transaction detected.
mvcc_anti_ager.cc(01082) : The Connection is disconnected forcefully because it is blocking garbage collection for too long period.
Statement.cc(03190) : session control command is performed by ..., user=SYSTEM, query=ALTER SYSTEM DISCONNECT SESSION '<conn_id>'
mvcc_anti_ager.cc(00834) : long running cursor detected.
The open cursor possibly blocks the garbage collection of HANA database. 
Please close a cursor in application or kill the connection by "ALTER SYSTEM DISCONNECT SESSION '<conn_id>' "
The following SAP HANA alerts indicate potential problems linked to the garbage collection area:
Alert Name SAP Note  Description
42 Long-idling cursors 1900261 Identifies long-idling cursors.
47 Long-running serializable transactions 1977293 Identifies long-running serializable transactions.
48 Long-running uncommitted write transactions 1977276 Identifies long-running uncommitted write transactions.
73 Overflow of rowstore version space   Determines the overflow ratio of the rowstore version space.
74 Overflow of metadata version space   Determines the overflow ratio of the metadata version space.
75 Rowstore version space skew   Determines whether the rowstore version chain is too long.
SQL: "HANA_Configuration_MiniChecks" (SAP Notes 1969700, 1999993) returns a potentially critical issue (C = 'X') for one of the following individual checks:
Check ID Details
810 MVCC versions in row store
811 Max. MVCC versions in row store (last day)
812 Max. versions per record in row store
815 Age of transaction blocking row store MVCC (s)
830 Active commit ID range
832 Max. active commit ID range (last day)
840 Active update transaction ID range
841 Oldest active update transaction (s)
842 Max. active update trans. ID range (last day)
846 Table MVCC snapshot range
850 Persistence garbage collection history count
852 Undo and cleanup files
854 Undo and cleanup file size (GB)
856 Max. undo size of current transaction (MB)
857 Max. redo size of current transaction (MB)
1115 Longest running current SQL statement (s)
The following heap allocators are larger than usual (see SAP Note 1999997):
  • Pool/BitVector
  • Pool/PersistenceManager/UndoDirectory
  • Pool/RowEngine/Version
Threads are working significant times in call stacks containing at least one of the following modules:
  • DataContainer::VarSizeEntryUserDataHandler::getPageWithFreeSpaceFromFreeList
  • ptime::AntiAger::garbage_collector
  • ptime::filter_slot_backend
  • ptime::FlushVersion::collect
  • ptime::LinkVersion::removeDependency
  • ptime::LinkVersion::removeSlotVersionDependency
  • ptime::LockTable::collectGarbage
  • ptime::LockTable::getVersion
  • ptime::LockTable::hash
  • ptime::Transaction::get_snapshot_trans_token
  • ptime::Transaction::lock_slot_checking
  • ptime::Transaction::validate_latest_version
  • ptime::TransVersion::collect
  • ptime::TransVersion::collectVersion
  • ptime::VersionCollector::collect
  • ptime::VersionCollector::Partition::collect
  • ptime::VersionCollectorJobNode::run
  • ptime::VersionMgr::getPageDeleteVersion
  • ptime::VersionMgr::getSlotVersion
  • ptime::VersionMgr::getVersion
  • ptime::VersionTs::compare
  • TRexAPI::TableUpdate::writeUdivs
  • TrexStore::UdivListManager::insertUdivsWriter
  • UnifiedTable::TableContainerImpl::insertMVCCDoWork

4. What is multi-version concurrency control?

Multi-version concurrency control (MVCC) is an implementation of isolation levels based on multiple versions of data. The most frequently used isolation level in SAP HANA is "read committed". This means that every database request sees the committed data of the point in time when the request (not the transaction) started. As a consequence different current and historic versions of the same data may need to be preserved in order to support the points in time when different SQL statements were started.

5. How can garbage collection be monitored?

The following possibilities exist to check for garbage collection related details:
Area SQL statement (SAP Note 1969700) / monitoring views Details
General SQL: "HANA_GarbageCollection_Overview"
Overview about various garbage collection aspects including row store, column store and persistence
General SQL: "HANA_GarbageCollection_Blockers_History"
HOST_LONG_IDLE_CURSORS
HOST_LONG_RUNNING_STATEMENTS
HOST_LONG_SERIALIZABLE_TRANSACTION
HOST_UNCOMMITTED_WRITE_TRANSACTION
Information about historic activities potentially responsible for a blocked garbage collection:
  • Idle cursors
  • Long-running database requests
  • Serializable transactions
  • Uncommitted update transactions
Row store version consolidation M_MVCC_TABLES
M_MVCC_OVERVIEW (>= SPS 08)
M_RS_TABLE_VERSION_STATISTICS
SQL: "HANA_GarbageCollection_RowStore_Versions"
Current information about versions in row store
Checked also by SQL: "HANA_Configuration_MiniChecks" (SAP Note 1999993):
Check ID Check Description Information Source
811 Number of existing versions M_MVCC_TABLES.NUM_VERSIONS
812 Maximum versions per record M_MVCC_TABLES.MAX_VERSIONS
Row store version consolidation M_LOAD_HISTORY_SERVICE (>= SPS 09)
SQL: "HANA_LoadHistory_Services"
Historic information about versions in row store also shown in the SAP HANA Studio load graph section, also evaluated via SQL: "HANA_Configuration_MiniChecks" (SAP Note 1999993):
Check ID Check Description Information Source
811 Max. MVCC versions in row store (last day) M_LOAD_HISTORY_SERVICE.MVCC_VERSION_COUNT

Column store version consolidation  M_CS_MVCC (>= SPS 09)
SQL: "HANA_GarbageCollection_ColumnStore"
This view provides memory allocation information related to column store version consolidation.
Persistence garbage collection M_GARBAGE_COLLECTION_STATISTICS
M_UNDO_CLEANUP_FILES
SQL: "HANA_GarbageCollection_Persistence"
SQL: "HANA_Transactions_UndoCleanupFiles"

These views contain information about persistence related garbage collection and still existing undo and cleanup files for which garbage collection is still required. It is also checked by SQL: "HANA_Configuration_MiniChecks" (SAP Note 1999993):
Check ID Check Description Information Source
850 Persistence garbage collection history count M_GARBAGE_COLLECTION_STATISTICS.HISTORY_COUNT
852 Undo and cleanup files M_UNDO_CLEANUP_FILES (TYPE != 'FREE')
854 Undo and cleanup file size (GB) M_UNDO_CLEANUP_FILES.RAW_SIZE (TYPE != 'FREE')

6. What are typical reasons and solutions for problems with garbage collection?

Typical reasons and solutions for problems in the area of garbage collection described in the table below. Additional information can be found in the SAP HANA Troubleshooting and Performance Analysis Guide ("Transactional Problems") and in SAP Note 1858357.
Reason Details
Known bugs Already known bugs are described in the following SAP Notes:
SAP Note Impacted Revisions Description
1833835 < 52 A long running operation on a single table blocked the garbage collection globally with earlier revisions
1977214 < 70 Sequences (e.g. DDLOG_SEQ) not being buffered can impact garbage collection
1979087 < 71 BW infocube compression blocking garbage collection
2021186 < 90 Persistence garbage collection impacts SAP HANA restart times
2083809 60 - 69.07
70 - 74.01
80
Increasing number of versions without corresponding long running transaction
2122316 <= 85.01, < 93 Blocked garbage collection in case of backups and auditing
2124306 < 93 SAP HANA Studio may keep a cursor open permanently when administering users and roles, fixed as of SAP HANA Studio Rev. 93
2135729 90 - 92 Long running garbage collection on Rev. 90 to 92
2145269 90 - 94 When a user with an EMAIL ADDRESS parameter is created on Rev. 90 to 94, it is not properly closed afterwards and so garbage collection is impacted.
2146673 80 - 85.02 Blocked persistence garbage collection after SAP HANA node crash on Rev. 80 to 85.02
1999997 < 85.04
90 - 94
Due to a bug with Rev. <= 85.03 and Rev. 90 to 94 index garbage collection is not necessarily triggered in time and the Pool/RowEngine/CpbTree allocator can unnecessarily grow. With Rev. 85.04 and Rev. 95 a fix is delivered.
2181671  96 Due to a bug in Rev. 96 a database inconsistency can be introduced if an out-of-memory situation collides with a garbage collection.
  < 102.01 This bug can result in threads permanently stuck in ptime::Join_hash::merge_job without the possibility to cancel it.
2116157 < 102.01 The consistency check CHECK_TABLE_CONSISTENCY can be stuck for a long time in the NOT NULL check resulting in a blocked garbage collection.
2253017 102.01 - 102.02
Time consuming deleted page list recycling (call stack ptime::DeletedPageList::recycle) can lead to performance regressions and stand-still situations.
  102.04 TREXviaDBSL statements can remain open with status 'Suspended' due to an improper handling of implicit LOB streaming.

Long running update transactions, statements, idle cursors and serializable transactions You can use SQL: "HANA_GarbageCollection_Blockers_History" (SAP Note 1969700) in order to check for long running operations being potentially critical for the garbage collection.
If you identify a particularly long running operation, you should check if the long runtime is expected. If it is not expected or if it causes significant trouble, you can terminate it, either on client side (e.g. by terminating the related SAP work process) or on SAP HANA side. SAP Note 2092196 describes how active threads and connections can be terminated.
Hanging threads You can use SQL: "HANA_Threads_CurrentThreads" (SAP Note 1969700) in order to check for hanging threads.
If you identify a long running thread, you should check if the long runtime is expected. If it is not expected or if it causes significant trouble, you can terminate it, either on client side (e.g. by terminating the related SAP work process) or on SAP HANA side. SAP Note 2092196 describes how active threads and connections can be terminated.
Internal inconsistency In rare cases it can happen that the garbage collection is permanently blocked, but SQL: "HANA_GarbageCollection_Overview" doesn't report any responsible, old transaction. This can happen in case of orphan internal transaction tokens or consistent view entries. You may find these orphan objects by looking at the SAP HANA internal monitoring views M_TRANSTOKEN_DIRECTORY_ and M_CONSISTENT_VIEW_STATISTICS_. In this scenario you can resolve the problem only by restarting SAP HANA. One critical bug caused by insufficient BLOB error handling is resolved with Rev. 102.
High frequency of DML operations on single row store records Avoid massive changes of single row store table records like the QIWKTAB issue described in SAP Note 2000002, because the row store version consolidation may no longer be able to keep up with the changes. Up to SPS 08 you should try to avoid more than 100,000 changes of individual records per hour. Starting with SPS 09 the row store version consolidation is optimized and so also higher throughputs are theoretically possible (although the application design would remain questionable).
High COMMIT frequency A high COMMIT frequency like the getPageWithFreeSpaceFromFreeList issue described in SAP Note 1999998 can result in situations where the persistence garbage collection no longer keeps up with the COMMIT rate. Avoid massive short and concurrent DML operations and use packages in case of mass DML operations rather than single record operations with COMMITs.
Blocking of garbage collection on global level Row store and column store version consolidation is typically blocked on a granular table level, there are some scenarios where garbage collection is blocked globally on all tables:
  • Internal transactions like delta merges (SAP Note 2057046)
  • Certain CHECK_TABLE_CONSISTENCY activities (see SAP Note 2116157 -> "How does CHECK_TABLE_CONSISTENCY behave in terms of MVCC and garbage collection?")
Therefore you have to make sure that these operations don't run particularly long during critical time frames.
Persistence garbage collection is generally blocked globally. This means that it will not start at all, even if only a single table's version consolidation is still blocked.
Calculation engine garbage collection: Frequent CREATE and DROP operations for scenarios If the calculation engine garbage collection causes trouble (e.g. due to ceManager_GCLock and calcEngine_CalcEngineManager_ScenarioMapRWLock waits as described in SAP Note 1999998), you should check if you can reduce the number of CREATE and DROP operations for calculation scenarios from an application perspective. Starting with SAP HANA SPS 09 the garbage collection behavior is improved and so the amount of problems should reduce.

7. When is garbage collection automatically triggered?

The following table describes when garbage collection is automatically triggered:
Area Details
Row store version consolidation Garbage collection is triggered after a COMMIT and when the time limit defined with the following parameter is reached:
Parameter Default Unit
indexserver.ini -> [transaction] -> mvcc_aged_checker_timeout
3600 s

Column store version consolidation Versions are automatically consolidated during merges (see SAP Note 2057046) and compression optimizations (see SAP Note 2112604).
Memory garbage collection Memory is reclaimed in order to make sure that SAP HANA doesn't run out of memory. It is controlled by the following SAP HANA parameters:
Parameter Default Unit Details
global.ini -> [memorymanager] -> async_free_target
95 % When proactive memory garbage collection is triggered, SAP HANA tries to reduce allocated memory below async_free_target percent of the global allocation limit.
global.ini -> [memorymanager] -> async_free_threshold
100  % Proactive memory garbage collection is triggered, when the async_free_threshold of the global allocation limit is reached. With the default of 100 % the garbage collection is quite "lazy" and only kicks in when there is a memory shortage. This is in general no problem and provides performance advantages, as the number of memory allocations and deallocations is minimized.
global.ini -> [memorymanager] -> gc_unused_memory_threshold_abs 
0 (disabled) MB Memory garbage collection is triggered when the amount of allocated, but unused memory exceeds the configured value (in MB).
global.ini -> [memorymanager] -> gc_unused_memory_threshold_rel
 -1 (disabled) Memory garbage collection is triggered when the amount of allocated memory exceeds the used memory by the configured percentage.
See SAP Note 1999997 for more information related to SAP HANA memory.
LOB garbage collection LOB garbage collection (LGC) is executed on a regular basis. The frequency is based on the following parameter setting:
Parameter Default Unit Store Details
indexserver.ini -> [lobhandling] -> garbage_collect_interval_s
900 s column Defines the time interval of column store LOB garbage collections, value 0 disables LOB garbage collection
indexserver.ini -> [row_engine] -> garbage_lob_file_handler_enabled
true   row Value 'true' activates row store LOB garbage collection
indexserver.ini -> [row_engine] -> garbage_lob_file_handler_interval
10 s row Defines the time interval of row store LOB garbage collections, value 0 disables LOB garbage collection
liveCache garbage collection Garbage collection in the embedded liveCache is triggered automatically based on the following parameters:
Parameter Default Unit Details
indexserver.ini -> [livecache] -> max_early_garbage_collection_jobs
0 (early garbage collection disabled) number of threads Setting this parameter to a positive value (<= 16) activates early garbage collection with up to the defined number of concurrent jobs.
indexserver.ini -> [livecache] -> early_garbage_collection_interval
30 minutes This parameter defines the garbage collection check interval. If max_early_garbage_collection_jobs is set to a positive value and a container exceeds the early_garbage_collection_threshold (see below), garbage collection will be triggered for this container.
indexserver.ini -> [livecache] -> early_garbage_collection_threshold
50 % The threshold is specified as the ratio of the container object history count (M_LIVECACHE_CONTAINER_STATISTICS.HISTORY_COUNT+GC_PENDING_OBJECT_DELETE_COUNT) to the overall object count (M_LIVECACHE_CONTAINER_STATISTICS.HISTORY_COUNT+OBJECT_COUNT) in percent. If max_early_garbage_collection_jobs is set to a positive value andthe threshold is exceeded, the EarlyGarbageCollection jobs try to remove object history data in the container.
indexserver.ini -> [livecache] -> version_threshold
2097152 byte Defines the size of a OMS version when liveCache considers garbage collection of this version
Calculation engine garbage collection It is triggered when a calculation view / calculation scenario is dropped or at latest after 5 minutes.

8. How can garbage collection be triggered manually?

Usually there is no requirement to trigger garbage collection manually, because SAP HANA will automatically take care for it whenever necessary and possible. In specific cases garbage collection can be triggered in the following ways:
Area SAP Note Details
Row store version consolidation  
ALTER SYSTEM RECLAIM VERSION SPACE
Column store version consolidation 2112604 Old (UDIV) versions within a table can only be cleaned with a compression optimization:
UPDATE "<table_name>" WITH PARAMETERS ('OPTIMIZE_COMPRESSION' = 'FORCE')
Memory garbage collection 1999997
hdbcons 'mm gc -f'
LOB garbage collection   Hybrid LOB garbage collection can be triggered for a table using:
ALTER TABLE "<table_name>" WITH PARAMETERS ('LOB_GARBAGE_COLLECTION' = '1') 

9. Which threads are responsible for garbage collection?

Garbage collection is performed by the following threads (see SAP Note 2114710 for details):
Area Threads Details
Row store version consolidation SQLExecutor
MvccAntiAgerChecker
MVCCGarbageCollector
JobWorker (RSVersionCollectorJobNode/Partition<part>)
Version consolidation during a COMMIT is executed by the SQLExecutor thread itself.
The MvccAntiAgerChecker is responsible to check for potentially critical situations like long-running update transactions and report it in the trace files.
The MVCCGarbageCollector performs periodic version consolidation that is not done by the SQLExecutor threads as part of COMMITs.
The JobWorker threads of type RSVersionCollectorJob are triggered by the MVCCGarbageCollector in order to perform parallelized version consolidation.
Column store version consolidation MergeAttributeThread
MergedogMerger
MergedogMonitor
SaveMergedAttributeThread
JobWorker (ChangeCompressionJob)
As a side-effect of merges and compression optimizations versions in column store are consolidated by the mentioned threads.
Memory garbage collection  MemoryCompactor This thread releases no longer required memory, e.g. in cases of a memory shortage.
Persistence garbage collection JobWorker (GCJob*)
LobGarbageCollector
These threads are responsible for cleaning up garbage in the persistence area. There can be a potentially high number of concurrently running GCJob* collectors. The maximum amount of these threads is controlled by the following parameter (see SAP Note 2100040):
Parameter
 Default
 Unit
global.ini -> [persistence] -> max_gc_parallelity
0 (-> Number of logical CPU cores)
number of threads
Internally the relevant module is named GarbageCollectorJob (e.g. visible in call stacks).
liveCache garbage collection EarlyGarbageCollection
These threads are responsible for liveCache garbage collection. The amount of these threads is controlled by the following parameter:
Parameter
 Default
 Unit
indexserver.ini -> [livecache] -> max_early_garbage_collection_jobs
0 (early garbage collection disabled)
number of threads (<= 16)
Calculation engine garbage collection CalcEngineGarbageCollector
This thread is responsible for model garbage collection of the calculation engine.

10. How can the mvcc_anti_ager behavior and trace file entries be controlled?

The trace file entries generated by the MvccAntiAgerChecker can be controlled with the following SAP HANA parameters:
Parameter Default Unit Warning
indexserver.ini -> [transaction] -> idle_cursor_alert_timeout
60 minutes long running cursor detected
indexserver.ini -> [transaction] -> idle_cursor_lifetime
720 minutes The open cursor possibly blocks the garbage collection of HANA database.
Please close a cursor in application or kill the connection by "ALTER SYSTEM DISCONNECT SESSION <session>".
The long running serializable transaction possibly blocks the garbage collection of HANA database.
The long running uncommitted transaction possibly incurs performance degradation of HANA database.
The long running external transaction possibly incurs performance degradation of HANA database.
Please commit/rollback a transaction in application or kill the connection by "ALTER SYSTEM DISCONNECT SESSION <session>".
indexserver.ini -> [transaction] -> max_versions_per_record_alert_threshold
100000 number of row store versions There are too many un-collected versions
indexserver.ini -> [transaction] -> max_versions_per_table_alert_threshold
100000 number of row store versions There are too many un-collected versions
indexserver.ini -> [transaction] -> number_versions_alert_threshold
1000000 number of row store versions There are too many un-collected versions
indexserver.ini -> [transaction] -> uncommitted_write_transaction_alert_timeout
60  minutes  long running uncommitted write transaction detected
Normally general the default values are fine and there is no need for adjustments.

11. Can garbage collection be considered as defragmentation?

Garbage collection removes no longer required information and so it can more be compared with data management and archiving than with defragmentation. This means that garbage collection doesn't necessarily reduce the amount of allocated space. This is particularly valid for the persistence garbage collection. If persistence garbage collection has problems, the allocated space on disk grows, but when garbage collection is properly executed, it won't reduce. See SAP Note 2000003 ("Where can fragmentation occur in SAP HANA environments?") and consider using ALTER SYSTEM RECLAIM DATAVOLUME if you want to release fragmented space on disk level.

12. What can be the reason for alerts 73 and 74 for services different from the indexserver?

The alerts 73 ("Overflow of rowstore version space") and 74 ("Overflow of metadata version space") are issued if a significant portion of the configured version space is occupied. Normally this indicates problems with garbage collection. If the alert is reported for services different from the indexserver although their number of row store versions is at a small level, it may be a consequence of a small version space configuration. The default version space is configured with the following parameter:
Parameter Default Details
<service>.ini -> [row_engine] -> lock_table_array_size
indexserver: 1000003

xsengine, scriptserver: 101 (SPS <= 10) / 10007 (SPS >= 11)

other services: 101
This parameter defines the size of the lock table / version space. It should be set to a prime number. Large values increase the memory footprint, smaller ones impose the risk of overflows and alerts.
If you receive the alert for services different from the indexserver, you can consider to increase the related lock table array size. The increase from 101 to 10007 for the xsengine and scriptserver starting with SPS 11 will only have a minimal impact on the memory footprint.


Keywords
START_MVCC_TIMESTAMP MIN_MVCC_SNAPSHOT_TIMESTAMP



Header Data

Released On 26.01.2016 09:43:31
Release Status Released to Customer
Component HAN-DB SAP HANA Database
Priority Normal
Category How To

1 comment: