
1698281 - Assess the memory consumption of a SAP HANA System

Symptom
Assess the memory consumption of a SAP HANA System


Other Terms
BW on HANA, sizing, memory


Reason and Prerequisites
After a DB migration (from a non-HANA database to the HANA database) you need detailed information about memory consumption.


Solution
The Python script memorySizing.py attached to this note analyzes the HANA database after a completed migration; it collects statistics on the memory consumption of a SAP HANA system. This includes memory usage information down to the level of single tables and attributes.

The information may be used for the following use cases:
  • provide information for sizing a SAP HANA system
  • understand the actual memory usage of a running SAP HANA database system
  • collect information to be used for support requests and for debugging

The output of this script is not intended to be used directly for license fee or pricing considerations.

The script helps to avoid some pitfalls that arise when using information
provided by the operating system, for example:

1) The internal memory management of SAP HANA may
report more allocated memory than is actually used.

2) Tables may be loaded into main memory only when they are accessed
for the first time. Thus, the reported memory may appear too low because
unloaded tables are not considered.

If you do not understand some parts of the output, you can create a message on component HAN-DB.
Installation and usage hints for the attached script

Log in to the HANA system as the NewDB admin user (sidadm user) and install the attachment with these steps:
1. download the attachment to the exe/python_support directory
2. navigate to the exe/python_support directory
3. unzip memorySizing.zip
4. tar xvf memorySizing.tar

After installation you are still logged in as the sidadm user, in the exe/python_support directory. Make sure that all environment variables, such as SAP_RETRIEVAL_PATH, are set; for example, execute
"source /usr/sap/<SID>/HDB<instance>/HDBSettings.sh" (for bash) or
"source /usr/sap/<SID>/HDB<instance>/HDBSettings.csh" (for csh).

The script reads the metadata of all available tables in the system. Schemas or tables that are not readable will not be included in the analysis, and error messages will be traced. For complete information, please use a user that has read permissions on all relevant tables.

Notice:
If you call the script on a HANA database with a revision lower than 24, an explicit unload is triggered on all tables to be loaded. This ensures that all parts not yet in memory are loaded, so that the script produces correct evaluation results. The explicit unload in particular can result in very long runtimes of the script. On higher revisions, only a 'LOAD ALL' SQL statement is issued on the tables, which loads the missing parts.
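The revision-dependent behaviour described above can be sketched as follows. This is an illustrative reconstruction, not the script's actual code; the function name and the exact SQL texts are assumptions based on the description in this note:

```python
def preparation_statements(revision, table):
    """Return the SQL a sizing run would issue to get a table fully
    into memory, per the note's description (illustrative only)."""
    if revision < 24:
        # Below revision 24: explicit unload first, then a full load,
        # so every table part ends up in memory. The unload is what can
        # make the script run very long on old revisions.
        return [f'UNLOAD "{table}"', f'LOAD "{table}" ALL']
    # Revision 24 and higher: a single LOAD ALL pulls in missing parts.
    return [f'LOAD "{table}" ALL']

print(preparation_statements(23, "SAPSR3.MARA"))
print(preparation_statements(24, "SAPSR3.MARA"))
```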
Usage Information:
The usage info below is pasted here informally; you should check the parameters by calling "python memorySizing.py --help" on the command line before using the script.

Usage:
    This script collects a number of sizing parameters from a HANA database instance.
    The results are printed to stdout unless you specify a file name with option --file.
    The output is formatted as CSV data with semicolon as field separator, suitable for programs like MS Excel.
    The script tries to load all tables and to merge all existing deltas in order to get correct results,
    because table parts that are not loaded are not counted in the aggregation, and the delta storage has a different memory-to-data ratio.

    For HANA revisions lower than 24, the tables are explicitly unloaded before loading, ensuring the correctness of the evaluation.
    This may cause the script to run for several hours!

    DO NOT USE THIS SCRIPT DURING DATA LOAD PHASES:

    The load and merge loops may run for a long time, the merging may slow down
    system performance, and new data loaded after a merge will produce new deltas.
    This may decrease the correctness of the evaluation results.

    Usage: memorySizing.py [options]

Options:
  -h, --help            show this help message and exit

  General Options:
    -a ADDRESS, --address=ADDRESS
                        server address (<server>[:<port>]) for DB connection
                        [default: localhost]
    -u USER, --user=USER
                        user for DB connection
    -p PASSWORD, --password=PASSWORD
                        password for DB connection
    -s SYSPASSWD, --syspasswd=SYSPASSWD
                        password for user SYSTEM
    -v, --verbose       if present write verbose output [default: False]
    -f OUTFILE, --file=OUTFILE
                        the file to write to [default: memorySizing.csv]

  Measurement Options:
    -l LOADTABLES, --loadTables=LOADTABLES
                        if yes (no), (do not) load tables before sizing
                        [default: yes]
    -i IMMEDIATEUNLOAD, --immediateUnload=IMMEDIATEUNLOAD
                        immediately unload a table after loading (possible
                        values: no, unloaded, partially_loaded, all) - only
                        relevant together with --loadTables yes [default: no]
    -m MERGECOMPRESS, --mergeCompress=MERGECOMPRESS
                        if yes (no), (do not) merge and compress tables before
                        sizing  [default: yes]
Remarks on Usage:

Parameters -a, -p, -u, and -s are used as connection parameters. The user must have privileges to read the relevant schemas and system views, e.g. SYS.M_CS_COLUMNS.
Information is collected for all nodes of a distributed SAP HANA DB that is identified by the connection parameters.

The file specified in parameter -f is overwritten if a file with the same name already exists. Output is formatted as Comma Separated Values (CSV) that can be imported into tools like MS Excel; the field delimiter is a semicolon. The file may also be attached to support messages. It consists of different blocks of measured results, each preceded by a line with a brief description of the data and a heading line giving the meaning of the columns of the data block.
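Since each block is plain semicolon-delimited CSV with a heading line, it can also be processed programmatically. A minimal Python sketch, using a hypothetical block excerpt with made-up column names and values (the real block layouts and headings may differ):

```python
import csv
import io

# Hypothetical excerpt of one data block from memorySizing.csv;
# real headings, tables, and figures will differ.
block = (
    "TABLE_NAME;MEMORY_SIZE_MB\n"
    "SAPSR3.MARA;512\n"
    "SAPSR3.VBAP;1024\n"
)

# The script writes semicolon-separated fields (suitable for MS Excel).
rows = list(csv.reader(io.StringIO(block), delimiter=";"))
header, data = rows[0], rows[1:]
total_mb = sum(int(row[1]) for row in data)
print(header)    # ['TABLE_NAME', 'MEMORY_SIZE_MB']
print(total_mb)  # 1536
```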

Parameter -l makes sure that all tables are loaded into main memory for sizing purposes.  Depending on the data volume this can take several hours.  However, on a multi-core system with x cores, the script will spawn x/2 threads to parallelize the loading task.
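The x/2 parallelism rule mentioned above can be expressed as a short function; the floor division and the minimum of one worker on single-core machines are assumptions, not the script's confirmed behaviour:

```python
import os

def loader_thread_count(cores=None):
    # Half the available cores, per the note's x/2 rule; at least 1.
    cores = cores if cores is not None else (os.cpu_count() or 1)
    return max(1, cores // 2)

print(loader_thread_count(16))  # 8
print(loader_thread_count(1))   # 1
```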
The script can be used to assess the memory when all data is in use, e.g. for sizing purposes. Note that additional main memory may be required, e.g. for merging tables, so the overall memory consumption reported cannot be used directly for sizing.
Parameter -i can be used to unload tables after they have been loaded. Once a table has been loaded into main memory, the main statistics on the table remain available for the rest of the run. To save main memory, or to restore the loaded-table state as it was before the script ran, use this option:
  • "-i no" will not unload any tables after they were loaded. Use this for the most precise results. This is the default.
  • "-i unloaded" will unload all tables that were completely unloaded before the script was started. Partially or fully loaded column tables will remain in memory as fully loaded.
  • "-i partially_loaded" will unload all tables that were completely unloaded or partially loaded before the script was started. Fully loaded column tables will remain fully loaded.
  • "-i all" will unload all tables unconditionally.
Parameter -m makes sure that all table updates are merged into the main storage and compressed before the sizing is applied, giving more robust information on memory usage.

The overall runtime of the script on a large database may be several hours. If you want a quick, rough overview of the memory usage, you may call the script with merging and loading switched off (-m no -l no); it then needs only a few minutes. For correct sizing information, call it afterwards with both options switched on.
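The two-pass workflow above can be sketched as command lines; MYUSER, MYPASS, and the output file names are placeholders, not values from this note:

```python
# Assemble the two invocations described above (placeholders only).
base = ["python", "memorySizing.py", "-u", "MYUSER", "-p", "MYPASS"]

# Pass 1: quick overview without merge/load - finishes in minutes.
quick = base + ["-m", "no", "-l", "no", "-f", "quickOverview.csv"]
# Pass 2: full run with merge and load - may take hours.
full = base + ["-m", "yes", "-l", "yes", "-f", "fullSizing.csv"]

print(" ".join(quick))
print(" ".join(full))
```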

Apart from merging and compressing table content, the system remains unchanged. Thus, the script can be run multiple times. Differences between the results of different calls are caused by user or system activity, for example collected system statistics data.
Understanding the results:
To understand the meaning of the script output, you should read the SAP HANA Administration Guide, to be found with this link: http://help.sap.com/hana_appliance#section4 - click "SAP HANA Database - Administration Guide" on the right side and go to page 31, chapter "Monitoring Memory Usage".



Header Data

Released On 25.11.2014 08:10:13
Release Status Released for Customer
Component HAN-DB SAP HANA Database
Priority Recommendations / Additional Info
Category Consulting
