Data Storage and Data Flow

SAP NetWeaver BI offers a number of options for data storage. These include the implementation of a data warehouse or an operational data store as well as the creation of the data stores used for the analysis.

Architecture

A multi-layer architecture serves to integrate data from heterogeneous sources, transform, consolidate, clean up and store this data, and stage it efficiently for analysis and interpretation purposes. The data can be stored with varying granularity in the layers.

●      Persistent Staging Area

After being extracted from a source system, data is transferred to the entry layer of the Enterprise Data Warehouse, the persistent staging area (PSA). The data from the source system is stored unchanged in this layer. It provides the backup status at a granular level and can offer further information at a later time in order to ensure a quick restart if an error occurs.

●      Data Warehouse

The way in which data is transferred from the PSA to the next layer incorporates quality-assuring measures and the cleanup required for a uniform, integrated view of the data. The results of these first transformations and cleanups are stored in the data warehouse layer. It offers integrated, granular, historic, stable data that has not yet been modified for a concrete usage and can therefore be seen as neutral. The data warehouse forms the foundation and the central database for the further (compressed) data stores that are built for analysis purposes (data marts). Without a central data warehouse, the enhancement and operation of data marts often cannot be properly designed.

●      Architected Data Marts

The data warehouse layer supplies the data for the mainly multidimensional analysis structures, which are also called architected data marts. Data marts should not necessarily be equated with summarized or aggregated data; highly granular structures that are oriented only to the requirements of the evaluation can also be found here.

●      Operational Data Store

An operational data store supports operational data analysis. In an operational data store, the data is processed continually or at short intervals and can be read for operative analysis. The mostly uncompressed datasets are therefore quite up to date, which optimally supports operative analyses.

Data Store

When you model the layers, various structures and objects are available for physical data storage; which ones you use depends on your requirements.

In the persistent staging area (PSA), the structure of the source data is represented by DataSources. The data of a business unit (for example, customer master data or item data of an order) for a DataSource is stored in a transparent, flat database table, the PSA table. The data storage in the persistent staging area is short- to medium-term. Since it provides the backup status for the subsequent data stores, queries are not possible on this level and this data cannot be archived.

Whereas a DataSource consists of a set of fields, the data stores in the data flow are defined by InfoObjects. The fields of the DataSource must be assigned to InfoObjects using transformations in the SAP NetWeaver BI system. InfoObjects are thus the smallest (metadata) units within BI. They map information in a structured form, which is required for building data stores. InfoObjects are divided into key figures, characteristics and units.

●      Key figures provide the transaction data, that is, the values to be analyzed. They can be quantities, amounts, or numbers of items, for example sales volumes or sales figures.

●      Characteristics are sorting keys, such as product, customer group, fiscal year, period, or region. They specify classification options for the dataset and are therefore reference objects for the key figures. Characteristics can contain master data in the form of attributes, texts or hierarchies. Master data is data that remains unchanged over a long period of time. The master data of a cost center, for example, contains the name (text), the person responsible (attribute), and the relevant hierarchy area (hierarchy).

●      Units such as currencies or units of measure define the context of the values of the key figures.

Consistency on the metadata level is ensured by using identical InfoObjects to define the data stores in the different layers.
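The division of InfoObjects into key figures, characteristics and units, and the assignment of DataSource fields to them, can be illustrated with a small Python sketch. It is not SAP code: the class names, the field names (cust_no, qty, uom) and the InfoObject names are hypothetical, and a transformation is reduced here to a plain renaming of fields.

```python
from dataclasses import dataclass
from enum import Enum


class InfoObjectType(Enum):
    KEY_FIGURE = "key figure"
    CHARACTERISTIC = "characteristic"
    UNIT = "unit"


@dataclass(frozen=True)
class InfoObject:
    name: str              # hypothetical technical name
    kind: InfoObjectType


# Hypothetical InfoObjects, defined once and reused in every layer.
CUSTOMER = InfoObject("CUSTOMER", InfoObjectType.CHARACTERISTIC)
QUANTITY = InfoObject("QUANTITY", InfoObjectType.KEY_FIGURE)
UNIT_OF_MEASURE = InfoObject("UNIT", InfoObjectType.UNIT)

# Assumed field names of a DataSource, each assigned to one InfoObject.
FIELD_TO_INFOOBJECT = {
    "cust_no": CUSTOMER,
    "qty": QUANTITY,
    "uom": UNIT_OF_MEASURE,
}


def transform(record: dict) -> dict:
    """Rename DataSource fields to InfoObject names.

    A field without an assignment raises KeyError.
    """
    return {FIELD_TO_INFOOBJECT[field].name: value
            for field, value in record.items()}


row = transform({"cust_no": "C100", "qty": 5, "uom": "PC"})
```

Because every data store in the sketch would be described with the same InfoObject instances, a record keeps the same column names from layer to layer, which is the kind of metadata consistency described above.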

DataStore objects permit complete granular (document level) and historic storage of the data. As with DataSources, the data is stored in flat database tables. A DataStore object consists of a key (for example, document number, item) and a data area. The data area can contain both key figures (for example, order quantity) and characteristics (for example, order status). In addition to aggregating the data, you can also overwrite the data contents, for example to map the status changes of the order. This is particularly important with document-related structures.
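The difference between overwriting and aggregating can be made concrete with a minimal Python sketch of a keyed store. The class DataStoreSketch and its field names are hypothetical and only imitate the behavior described above; which fields are overwritten and which are summed is an assumption made for the example.

```python
class DataStoreSketch:
    """Keyed flat table: some data fields are overwritten, others are summed."""

    def __init__(self, key_fields, overwrite_fields, additive_fields):
        self.key_fields = key_fields
        self.overwrite_fields = overwrite_fields
        self.additive_fields = additive_fields
        self.rows = {}  # key tuple -> data area (dict)

    def load(self, record):
        key = tuple(record[f] for f in self.key_fields)
        data = self.rows.setdefault(key, {})
        for f in self.overwrite_fields:
            if f in record:
                data[f] = record[f]                   # latest value wins
        for f in self.additive_fields:
            if f in record:
                data[f] = data.get(f, 0) + record[f]  # values accumulate


orders = DataStoreSketch(
    key_fields=("doc_no", "item"),
    overwrite_fields=("status",),
    additive_fields=("quantity",),
)
orders.load({"doc_no": "4711", "item": 10, "status": "open", "quantity": 5})
orders.load({"doc_no": "4711", "item": 10, "status": "delivered", "quantity": 3})
```

After both loads the store holds a single row for document 4711, item 10: the status reflects the most recent record, while the quantity is the sum of both records.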

Modeling of a multidimensional store is implemented using InfoCubes. An InfoCube is a set of relational tables that are compiled according to an enhanced star schema. There is a (large) fact table (containing many rows) that contains the key figures of the InfoCube as well as multiple (smaller) surrounding dimension tables containing the characteristics of the InfoCube. The characteristics represent the keys for the key figures. Storage of the data in an InfoCube is additive. For queries on an InfoCube, the key figures are automatically aggregated (summation, minimum or maximum) if necessary.

The dimensions combine characteristics that logically belong together, such as a customer dimension consisting of the customer number, customer group and the steps of the customer hierarchy, or a product dimension consisting of the product number, product group and brand. The characteristics refer to the master data (texts or attributes of the characteristic). The facts are the key figures to be evaluated, such as revenue or sales volume.

The fact table and the dimensions are linked with one another using abstract identifying numbers (dimension IDs). As a result, the key figures of the InfoCube relate to the characteristics of the dimension. This type of modeling is optimized for efficient data analysis.

[Figure: Structure of an InfoCube]
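To make the relationship between the fact table, the dimension tables and the dimension IDs more tangible, here is a small sketch that uses Python's built-in sqlite3 module. It models a plain star schema only; the additional structures of the enhanced star schema are left out, and all table names, column names and values are invented for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Dimension tables: characteristics that logically belong together.
    CREATE TABLE dim_customer (
        dim_id INTEGER PRIMARY KEY,
        customer_no TEXT,
        customer_group TEXT
    );
    CREATE TABLE dim_product (
        dim_id INTEGER PRIMARY KEY,
        product_no TEXT,
        product_group TEXT,
        brand TEXT
    );
    -- Fact table: key figures, linked to the dimensions by dimension IDs.
    CREATE TABLE fact (
        customer_dim INTEGER REFERENCES dim_customer(dim_id),
        product_dim INTEGER REFERENCES dim_product(dim_id),
        revenue REAL,
        sales_volume INTEGER
    );
    INSERT INTO dim_customer VALUES
        (1, 'C100', 'Retail'), (2, 'C200', 'Wholesale');
    INSERT INTO dim_product VALUES
        (1, 'P1', 'Beverages', 'BrandA'), (2, 'P2', 'Beverages', 'BrandB');
    -- Additive storage: a repeated combination is simply another fact row.
    INSERT INTO fact VALUES
        (1, 1, 100.0, 10), (1, 2, 50.0, 5), (2, 1, 200.0, 20), (1, 1, 25.0, 2);
""")

# A query aggregates the key figure over the requested characteristic.
revenue_by_group = dict(con.execute("""
    SELECT c.customer_group, SUM(f.revenue)
    FROM fact AS f
    JOIN dim_customer AS c ON c.dim_id = f.customer_dim
    GROUP BY c.customer_group
"""))
```

The query never touches the product dimension; only the dimension that carries the requested characteristic is joined to the fact table, which is one reason this layout suits analysis.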

You can create logical views (MultiProviders, InfoSets) on the physical data stores in the form of InfoObjects, InfoCubes and DataStore objects, for example to provide data from different data stores for a common evaluation. The link is created using the InfoObjects that the data stores have in common.
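One way to picture such a logical view is as a function that reads several stores and lines up their rows on a shared InfoObject without copying the data into a new store. The sketch below assumes a union-style combination; the function union_view, the InfoObject names and the sample rows are hypothetical and are not taken from the SAP system.

```python
from collections import defaultdict


def union_view(providers, characteristic, key_figures):
    """Combine rows from several stores over one common characteristic.

    Key figures that a store does not supply count as 0 for that store.
    """
    result = defaultdict(lambda: dict.fromkeys(key_figures, 0))
    for rows in providers:
        for row in rows:
            target = result[row[characteristic]]
            for kf in key_figures:
                target[kf] += row.get(kf, 0)
    return dict(result)


# Two hypothetical physical stores that share the InfoObject REGION.
actual_sales = [
    {"REGION": "North", "REVENUE": 100},
    {"REGION": "South", "REVENUE": 80},
    {"REGION": "North", "REVENUE": 20},
]
plan_sales = [
    {"REGION": "North", "PLAN_REVENUE": 150},
]

combined = union_view(
    [actual_sales, plan_sales],
    characteristic="REGION",
    key_figures=["REVENUE", "PLAN_REVENUE"],
)
```

Here combined holds actual and planned revenue side by side per region, although the two figures live in separate stores.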

The generic term for the physical data stores and the logical views on them is InfoProvider. The task of an InfoProvider is to provide optimized tools for data analysis, reporting and planning.

Data Flow

The data flow in the Enterprise Data Warehouse describes how the data is guided through the layers until it is finally available in the form required for the application. Data extraction and distribution can be controlled in this way and the origin of the data can be fully recorded. Data is transferred from one data store to the next using load processes. You use the InfoPackage to load the source data into the entry layer of SAP NetWeaver BI, the persistent staging area. The data transfer process (DTP) is used to load data within BI from one physical data store into the next one using the described transformation rules. Fields/InfoObjects of the source store are assigned to InfoObjects of the target store at this time.

You define a load process for a combination of source and target, and in it you define the staging method described in the previous section. You can make various settings for the load process; some of them depend on the type of data and source as well as on the data target. For example, you can define data selections so that only relevant data is transferred and the performance of the load process is optimized. You can also define whether the entire source dataset or only the data that is new since the last load is loaded into the target. The latter means that data transfer processes automatically permit delta processing, individually for each data target. The processing form (delta or entire dataset) for InfoPackages, that is, for loading into the SAP NetWeaver BI system, depends on the extraction program used.
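The interplay of data selection and delta processing can be sketched in a few lines of Python. The class LoadProcessSketch is hypothetical: it assumes that every source record carries the number of the request that delivered it, and that each load process remembers the last request it has transferred to its own target.

```python
class LoadProcessSketch:
    """One load process per source/target pair, with its own delta pointer."""

    def __init__(self, source, target, selection=None, delta=True):
        self.source = source      # list of (request_id, record) tuples
        self.target = target      # list that receives the records
        self.selection = selection or (lambda record: True)
        self.delta = delta
        self._last_request = 0    # highest request already transferred

    def run(self):
        loaded = 0
        newest = self._last_request
        for request_id, record in self.source:
            if self.delta and request_id <= self._last_request:
                continue          # already loaded by an earlier run
            if self.selection(record):
                self.target.append(record)
                loaded += 1
            newest = max(newest, request_id)
        self._last_request = newest
        return loaded


source = [
    (1, {"REGION": "North", "REVENUE": 100}),
    (1, {"REGION": "South", "REVENUE": 80}),
]
north_mart = []
load = LoadProcessSketch(source, north_mart,
                         selection=lambda r: r["REGION"] == "North")

first_run = load.run()            # transfers the matching record of request 1
source.append((2, {"REGION": "North", "REVENUE": 20}))
second_run = load.run()           # transfers only the new request 2
third_run = load.run()            # nothing new to transfer
```

Because the pointer belongs to the load process and not to the source, a second load process with a different target would start from request 1 again, which mirrors the idea of delta processing per data target.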

The Developer

Sunday, December 12, 2010