Intro SAP Data Archiving
Version 1.2
Copyright
Copyright 2006 SAP AG. All rights reserved. No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be changed without prior notice. Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors. Microsoft, WINDOWS, NT, EXCEL, Word, PowerPoint and SQL Server are registered trademarks of Microsoft Corporation. IBM, DB2, OS/2, DB2/6000, Parallel Sysplex, MVS/ESA, RS/6000, AIX, S/390, AS/400, OS/390, and OS/400 are registered trademarks of IBM Corporation. ORACLE is a registered trademark of ORACLE Corporation. INFORMIX-OnLine for SAP and INFORMIX Dynamic ServerTM are registered trademarks of Informix Software Incorporated. UNIX, X/Open, OSF/1, and Motif are registered trademarks of the Open Group. Citrix, the Citrix logo, ICA, Program Neighborhood, MetaFrame, WinFrame, VideoFrame, MultiWin and other Citrix product names referenced herein are trademarks of Citrix Systems, Inc. HTML, DHTML, XML, XHTML are trademarks or registered trademarks of W3C, World Wide Web Consortium, Massachusetts Institute of Technology. JAVA is a registered trademark of Sun Microsystems, Inc. JAVASCRIPT is a registered trademark of Sun Microsystems, Inc., used under license for technology invented and implemented by Netscape. SAP, SAP Logo, R/2, RIVA, R/3, SAP ArchiveLink, SAP Business Workflow, WebFlow, SAP EarlyWatch, BAPI, SAPPHIRE, Management Cockpit, mySAP Business Suite Logo and mySAP Business Suite are trademarks or registered trademarks of SAP AG in Germany and in several other countries all over the world. All other products mentioned are trademarks or registered trademarks of their respective companies.
Table of Contents
Introduction
    Contents and Objective of This Guide
1  The Purpose of Data Archiving
    1.1  The Critical Effects of Growing Data Volumes
    1.2  What Are the Benefits of Data Archiving?
        1.2.1  Greater System Availability
        1.2.2  Improved Performance and Response Times
        1.2.3  Save Costs by Optimizing Your Available Resources
    1.3  Data Archiving in the Context of Information Storage
        1.3.1  Requirements for Stored Data
        1.3.2  Print Lists
        1.3.3  Data Retention Tool
    1.4  Data Management: Prevention, Aggregation, Deletion, and Archiving
    1.5  Purpose and Suitability of Data Archiving
2  The Basic Data Archiving Functions
    2.1  Basic Terms Used in Data Archiving
        2.1.1  Archive Development Kit
        2.1.2  The Archiving Object
        2.1.3  Archive Administration
        2.1.4  Archiving Session
        2.1.5  Archive Management
        2.1.6  Residence Time
    2.2  The Data Archiving Process
        2.2.1  Data Is Written to the Archive
        2.2.2  Deleting Data from the Database
        2.2.3  The Storing of Archive Files
        2.2.4  Other Processes and Functions
    2.3  Important Data Archiving Features
        2.3.1  Data Security
        2.3.2  Archiving in Online Mode
        2.3.3  Data Compression
        2.3.4  Release- and Platform-Independent
3  Technology and Administration
    3.1  The Basis Technology of the SAP Archiving Solutions: ADK
        3.1.1  ADK Positioning and Components
        3.1.2  ADK Runtime Environment
        3.1.3  ADK as a Development Environment
    3.2  Tasks of the Data Archiving Administrator
        3.2.1  The Role "Data Archiving Administrator"
        3.2.2  Analysis
        3.2.3  Monitoring
        3.2.4  Settings
        3.2.5  Data Archiving Statistics
        3.2.6  Reorganization of the Database After Data Archiving
    3.3  Automated Production Operation
        3.3.1  Periodic Archiving
        3.3.2  Scheduling Data Archiving Jobs
        3.3.3  Interrupting and Continuing the Write Phase
        3.3.4  Options for Automating Dependent Processes
        3.3.5  Controlling Data Archiving Jobs Using an External Job Scheduler
4  Storing Archived Data
    4.1  Introduction
    4.2  Storage on an SAP-Certified Storage System Using SAP ArchiveLink
        4.2.1  Important Terms in Conjunction with SAP ArchiveLink
        4.2.2  The Function of ArchiveLink
        4.2.3  Storing Archive Files
        4.2.4  Accessing Archive Files
        4.2.5  Certified Systems and SAP Content Server
        4.2.6  Searching for Documents
    4.3  Storage Using an HSM System
        4.3.1  What Is HSM?
    4.4  Manual Storage
        4.4.1  Direct Integration
        4.4.2  Indirect Integration
        4.4.3  Advantages and Disadvantages of Manual Storage
        4.4.4  Criteria for Choosing a Storage Strategy
        4.4.5  ICC: Premium Integration Service
5  Access to Archived Data
    5.1  Introduction
    5.2  The Fundamentals
    5.3  Sequential Read Programs
    5.4  Archive Information System
        5.4.1  Creating an Infostructure
        5.4.2  Activating an Infostructure
        5.4.3  Building an Infostructure
        5.4.4  Displaying an Infostructure
        5.4.5  Deleting an Infostructure
        5.4.6  Using Display Variants
    5.5  Document Relationship Browser
        5.5.1  DRB and Data Archiving
        5.5.2  Searching Options
        5.5.3  Configuring and Personalizing DRB
A.  Glossary
B.  Additional Information and Services
    SAP Service Marketplace
    SAP Library
    Archiving Your SAP Data
    Training Courses
    Consulting and Services
Document History
Version   Date        Changes
1.0       May 2004    n.a.
1.1       May 2005    Corrected minor translation errors
1.2       July 2007   Updated terminology (incl. in graphics), sources of information, documentation path names, SAP Notes
Introduction
Even the most modern and technologically advanced database systems can suffer from performance bottlenecks caused by large data volumes. On the application side these bottlenecks manifest themselves as poor system performance, and on the administration side as increased use of resources. High data volumes can also have a considerable effect on the total cost of ownership (TCO) of a system, in spite of falling storage prices.

To avoid the negative effects of large data volumes on costs, performance, and system availability, business-complete data, that is, data no longer needed in everyday business processes, should be removed from the database. However, simply deleting this data is not a useful option in most cases, because the data often still needs to be available for read access. Therefore the data needs to be removed from the database and stored in such a way that it can still be read later.

SAP data archiving is the only method supported by SAP for removing application data from the database in a consistent, secure, and comprehensive manner. Consistency is ensured through checks performed by the archiving programs. Purely database-integrated archiving is not used, because the database does not know the business context of the data to be archived. Using data archiving you can select business objects, such as accounting documents, material master records, or HR master data, and remove them from the database without having to worry about the underlying table design of the linked data. The archived data is stored in a file system and can be moved from there to other storage media. For security reasons the archived data is not deleted from the database until the archive files have been read and thereby verified.
Figure 1: Database Growth with and without Data Archiving

This example is typical for many business systems: the database contains data of closed business processes, even though this data is no longer needed in everyday business processes and should no longer reside in the database. To avoid bottlenecks in your system, which are mainly caused by the amount of data in the database, you have the following options:

Expand your system resources
This implies upgrading your hardware, for example by adding disk space, increasing main memory, or upgrading the CPU. However, it is only a matter of time before your system's limits are reached again and performance suffers. Moreover, upgrades of this kind usually involve high costs. If you are facing especially strong data growth, data archiving is preferable, because it offers a better cost-benefit ratio than a hardware upgrade.

Reduce or limit data growth
The goal of this strategy is to live with your existing resources for as long as possible, without having to perform constant hardware upgrades. The idea is to focus on your current data volumes in the database and on data growth. By actively working on these parameters you can make the best possible use of your system resources. You can reduce data volumes by deleting or archiving data, if this makes sense from a business point of view. In addition, you can use data prevention to avoid the creation and persistent storage of unnecessary data, such as log files or spool requests. For more detailed information see the section entitled "Data Management: Prevention, Aggregation, Deletion, and Archiving".
important performance parameters and resources that come into play, such as the number of processors and their speed, as well as the size of the main memory. In an extreme case the database server has already reached its load limits and cannot be expanded any further. In this situation a company would have to look at other options that may involve new investments.

There are also other factors that affect the overall cost of a company's system landscape. These include the fact that data needs to be transferred from the production system to other development and consolidation systems. As a result it is not uncommon for a company to have three to four copies of its entire data set. The pure cost of the storage disks then makes up only a small fraction of total storage costs; administration costs are five to seven times higher than the cost of the hard disks, and if the costs of database administrators are included they are often even higher. Taking all these factors into account, it is not surprising that customers pay up to several thousand euros per gigabyte of production disk space.

Thus, any reduction in your total data holdings helps you use your existing resources, such as hard disks, memory, CPU, and network, more efficiently, and puts off new investments for a longer period of time. Lower data volumes also mean that you spend less time on database administration, which also has a positive effect on overall costs. From these arguments we can see that data archiving plays an important part in lowering a company's TCO and helps generate a faster ROI.
Display
The archiving solution must ensure that when archived data is accessed it can be read and interpreted, and that its content can be displayed in a visually correct manner. In this context readable means that the information can be displayed by a third person using the appropriate read programs.

1.3.1.2 Business Requirements

From a business point of view, only data that is no longer needed in everyday business processes can be removed from the database. Special checks have to be performed to make sure that only data from closed business processes is archived. Because of the high degree of integration between applications in the SAP system, business objects within a business process can have many dependencies. Therefore, before archiving a specific object, checks have to be performed to determine whether other objects have to be archived first or together with the business object in question.

Although data archiving can also be used for master data, such as material, condition, or HR data, transaction data makes up the bulk of the data that is archived. Master data generally remains in the system longer and uses less space in the database than transaction data, so it is not archived as often. Transaction data is created during practically every transaction within a process chain, which causes data growth in the database; it is therefore also called mass data.

Figure 2 shows part of a business process used to complete a sales order in the SAP R/3 Enterprise application Sales and Distribution (SD) and the corresponding archiving objects. A delivery is created based on a sales document. An invoice is then created in response to the delivery, and the sales process in SD is completed. The invoice is then passed to Financial Accounting, where further documents are created.
[Figure: sales document (archiving object SD_VBAK) → delivery document (RV_LIKP) → billing document (SD_VBRK) → ...]
Figure 2: Documents and Archiving Objects Used During the Business Process "Sales Order"

Depending on the complexity of the business process, other documents could be created in addition to the ones generated in SD, for example in Production Planning (PP) if goods are produced internally. Unlike most master data, the documents created as part of this kind of process chain are only relevant to operations for a limited amount of time. They can therefore be removed from the database after a relatively short period, using the appropriate archiving objects.

An example of master data that is often archived is product data. Depending on the life cycle of the product, the corresponding data may have to be kept for several decades. The customer service departments of a company have to be able to access certain product information during the entire life cycle of the product. This includes design-relevant CAD drawings, bills of material, built-in components, production resources and tools, and task lists. This list shows that in these processes the system has to access different data objects that characterize the same final product. In certain cases it may even be necessary to reconstruct the entire process in which the product was involved.

During internal audits it is often necessary to access archived data, to monitor compliance with specifications and to provide material that supports decisions. Generally, internal auditing takes on monitoring and control functions on behalf of management. Here it is important for the archiving solution to guarantee a true and fair view, security, and efficiency. Internal audits can touch any area of a business. Therefore, each company must decide which data should be kept in addition to that required by law.
Statistical analyses bring together important information about production, sales, and markets, and help management make business decisions. Some analysis programs offer the option to analyze, format, and weight archived data as well. However, this is not very common, because for technical reasons analyzing archived data can involve long runtimes. More specifically, this has to do with how the analysis program selects the archived data: an analysis involving archived data requires the system to open many archive files and read all of the relevant data objects in order to select the requested fields. As a result, runtimes for the analysis of archived data can be very long. Therefore, if you need to run comprehensive analyses that include archived data, we recommend that you use a data warehouse, such as SAP NetWeaver BI.

1.3.1.3 Legal and Tax Requirements

Businesses have to follow general accounting rules and produce legally compliant, complete, correct, and verifiable data. Although the details may vary from country to country, all countries adhere to the same general accounting principles, so we will not discuss these any further in the context of data archiving. Instead, we want to focus on tax-relevant data in this section. Considerations in this context are becoming increasingly important, because many countries are switching to paperless tax audits, that is, tax audits that take place on electronic data instead. For example, the German law that went into effect on January 1, 2002 (GDPdU) sets forth regulations regarding not only the actual data being audited, but also the form in which the auditor has access to this data.

Although tax laws may vary greatly from country to country, data retention requirements seem to be a common denominator among them. These requirements determine that tax-relevant documents may not be deleted; they must be kept in case they are needed during a tax audit. However, it is important to differentiate between keeping or storing data and data archiving. How the data is to be kept, and for how long, can differ from country to country, so it is difficult to make general statements in this respect. In Germany, for example, a company is required to keep most of its documents for ten years, although some only have to be kept for six. Another common requirement is that data must be kept in such a way that it cannot be changed, to prevent any kind of data manipulation.

What role does data archiving play in this context? Data archiving was not designed to be used for tax purposes or to meet other requirements in the context of data retention. Rather, its purpose is to help reduce the data load on the database. Data archiving deals with data from completed business processes, which only needs to be accessed in read-only mode. Therefore, the display programs for archived data often do not have the same functions as display programs for online data that resides in the database. Data archiving ensures that no data is lost: the data is merely moved from the database to the file system and continues to be part of the original system in which it was produced. It has the same significance as the database data that has not yet been archived. This means that tax authorities have access to all data in the system, with somewhat more restricted read functionality for certain data.
If, however, the data is taken out of the original system, for example by being stored on a storage system [1] that cannot be accessed from the original system, then it is necessary to also archive the context of this data, to ensure that it can be interpreted at a later point in time. Due to storage space limitations, the context must be limited to the most necessary information. SAP offers a tool called DART to extract tax-relevant data from the original system. It provides functions for creating transparent files with the actual data and for displaying this data. For more detailed information about DART see Chapter 1.3.3.

Finally, we would like to refer back to the differences that exist between countries. Because of these differences it is impossible for us to make any general statement as to which and how much information must be stored, and for how long. It is very important that you familiarize yourself with the requirements specific to your country. For more information about data archiving in the context of GDPdU and DART see SAP Note 670447.

[1] In this case we are not referring to the storage of archive files on an external storage system connected via the ArchiveLink interface, because with ArchiveLink storage systems the data can still be accessed from the original system. Rather, we are referring to cases where alternative archiving and storage processes, different from SAP data archiving, are used.
print lists has several advantages: it saves space, data can be made available more readily because it is stored centrally, it is possible to conduct indexed or text-based searches, and the reports, which are often used for tax audits, can be managed centrally. Print lists are produced while the data is still in the database. A stored or printed print list represents a defined business data relationship, based on the SAP data set as it appears at the time the list is stored or printed. Because print lists can be created individually at any time, it must be clear for existing print lists whether they are based on already stored data or on data that still resides in the database.

As of SAP R/3 3.1 it is possible to refer to already stored documents from the print lists themselves, using hyperlinks. If, for example, the document number "4711" is part of the print list information, you can set a hyperlink to the already scanned or stored original document, invoice R816. Depending on the type of print list used, the actual data relationship in the SAP system, which would imply a new version of the print list, can differ from the original print list. It is also possible that the data of the stored original document no longer coincides with the data of the SAP document. This is especially important to remember with print lists that are used in tax audits.

Print lists are not used to facilitate data searches. Rather, they provide an excerpt of the data in a system at a specific point in time and in a specific business context. Unlike other archived data, print lists do not need context information to be interpreted, because they already contain it. For this reason, print lists can be viewed as complete information units that can also be interpreted outside the original SAP system.

As of SAP R/3 3.1 it is possible to conduct indexed searches using freely defined indexes within a print list. Free searches in the list area currently being read have always been possible. The Archive Information System offers functions for searching historical, meaning archived, data. SAP ArchiveLink offers a search function for stored print lists: you can search via the management record using the object type, the document type, and the report name of the print list. In addition, the Document Management System (DMS) makes it possible to access print lists via individually definable search criteria; SAP ArchiveLink can also store document information records in DMS.
can indicate that spool requests should be automatically deleted after their output. This will considerably reduce the growth of spool table TST03 in the future.

2. Aggregation
In some cases it is possible to aggregate data, for example by forming sums. If aggregated information meets your requirements for certain data, you should make use of this method. An example would be line item aggregation in CO: you avoid writing a line in CO for every line that is written in the original document (for example, a material posting). Note that aggregation does not have an immediate effect, because it only influences future postings. Old documents are not affected and may still require archiving.

3. Deletion
A lot of the data that is not required to be stored in an archive can be deleted from the system shortly after its creation. An example of this type of data is batch input folders. Generally, these folders are created when data is transferred to the SAP system in the background, for example during a legacy data transfer, and they are not needed after the transfer. Processed folders can be deleted using a delete program.

4. Archiving
Archiving should be used for data that can be neither prevented nor deleted. Check how long your data needs to remain in the system from a business point of view, and only archive data that is no longer needed in the processes of the production system. Archiving is not a cure-all for reducing your data volumes as much as you like. The following figure can help you decide if and when to use data archiving:
[Figure: decision tree leading to data prevention, data deletion, or data archiving; the legend distinguishes measures with an impact on future data volumes from measures with an impact on current data volumes]
Figure 3: Deciding If and When to Archive

Data archiving should be an integral part of every company's data management strategy, as should the other methods for reducing data mentioned in this section. For more information about data management, including detailed recommendations for specific critical tables, see the Data Management Guide, which you can download from the SAP Service Marketplace (alias /data-archiving → Media Library → Literature & Brochures).
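To make the decision flow of Figure 3 concrete, the following Python sketch models the order in which the four data management methods are typically considered. It is an illustrative model only, not an SAP program; the function name and flags are invented for this example.

```python
def choose_data_management_method(can_prevent_creation: bool,
                                  can_aggregate: bool,
                                  needed_for_open_processes: bool,
                                  must_be_retained: bool) -> str:
    """Illustrative decision helper mirroring the logic of Figure 3."""
    if can_prevent_creation:
        # Stops unnecessary data (logs, spool requests) from being written at all;
        # affects future data volumes only.
        return "data prevention"
    if can_aggregate:
        # Summarized records instead of line items; also affects future postings only.
        return "aggregation"
    if needed_for_open_processes:
        # Data from open business processes must stay in the database untouched.
        return "keep in database"
    if not must_be_retained:
        # Business-complete data without retention requirements can simply be deleted.
        return "deletion"
    # Business-complete data that must remain readable is archived.
    return "data archiving"

# Example: closed business documents that must stay accessible for audits
print(choose_data_management_method(False, False, False, True))  # -> "data archiving"
```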
Before you start archiving, make sure you fulfill all documentation requirements by creating all the information (such as DART extracts) you may need for later audits.

4. Is data archiving enough for auditing purposes?
Data archiving was not designed as a tax and audit tool. It can support you in meeting the requirements of tax authorities by preserving data and keeping it available over a longer period of time. The tax and audit tools are the Audit Information System (AIS) and DART. If you want to use DART you should do so before you archive your data. During an audit it is also possible to access archived data, in case more detailed information is needed that does not appear in the other documents you previously created. Archived data should only be included in the auditing process if additional data is requested for the audit. To make the auditing process easier you should try to meet all requirements while the data is still in the database.

5. When is data archiving beneficial?
Archiving application data is beneficial when the effort spent on maintaining your database is becoming too expensive and when, at the same time, you can store and manage the archived data without spending large amounts of money. If the costs of accessing the archived data grow higher than the costs of storing and managing the data in the database, then the data should remain in the database. If you are weighing the costs of using data archiving, possibly to avoid having to purchase a new, more powerful database server, you should also include the costs of accessing the archived data in your calculation.
6. What status does archived data have?
Archived data cannot be modified and can therefore no longer be used in the processes of current business operations. It is inseparable from the system in which it was created and can only be accessed and interpreted from this system. If you need historical data for informational purposes, you can read-access the archived data. However, because this data has been kept away from any changes in the system (for example a reorganization), it cannot be guaranteed that its contents and structure match the context currently being used. Moreover, some archiving objects only offer very restricted access to archived data, for example only single document display or certain evaluations.

7. Can archived data be reloaded into the database?
From a technical standpoint, archived data can be reloaded into the database. However, because we are dealing with historical data, which has not been part of any changes in the database, you run the risk of generating inconsistencies in your database. Therefore we always discourage reloading data back into the database. An exception would be reloading archived data immediately after it has been archived. A reload of this kind would be an emergency measure after a failed or unsuccessful archiving session, for example because you archived the wrong data due to an error in the Customizing settings. Reloading is only possible with some archiving objects.

8. When should you start archiving your data?
You should begin with data archiving before it is too late, that is, before you have exhausted all the alternative measures for improving the condition of your system. This includes planning how big your system needs to be based on your anticipated data volume (sizing), and determining the residence times of your data, meaning the amount of time the data must remain in the database. You should also identify and fulfill any audit requirements before you begin archiving your data. Always keep in mind that data archiving slows down the growth of your database; it cannot stop the growth completely. The goal of data archiving is therefore to keep your system under control over a long period of time, not to return your system to a controllable state.

9. How can you access archived data?
There is no general answer to this question that is valid for all components of the SAP Business Suite. How archived data is accessed depends on the application and the archiving object that was used. You have the following options to access archived data:

Ideally, the user displays the archived data directly from his or her usual transaction. However, this can only work if index information about the archived data is kept in the database. This can be done using the Archive Information System (AS), for example.

In certain cases it is even possible to run an analysis using archived data. Because these types of evaluations usually involve long runtimes (see above) and usually do not make sense from a business point of view (because the data stems from closed business processes), only a few reports are available for this purpose. Therefore, if you need to run comprehensive analyses that include archived data, we recommend that you use a data warehouse, such as SAP NetWeaver BI. An analysis function always depends on the application, which determines how detailed the result of the analysis will be and whether a mixed analysis (online and archived data together) is allowed.

When data is displayed in list form, you are not notified that the data selection may be incomplete because some of the data has already been archived. The Archive Development Kit (ADK) and the Archive Information System (AS) can be used to adapt access to archived data to customer-specific needs. For more information about how to do this, attend training course BIT670 or see the corresponding documentation.
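The direct access described in question 9 relies on an index that maps a business object key to the archive file that contains it, similar to what the Archive Information System builds. The following Python sketch illustrates this idea with an invented in-memory index; the names and structures are hypothetical and not the actual AS implementation.

```python
from dataclasses import dataclass

@dataclass
class ArchiveIndexEntry:
    """Minimal index record: which archive file holds a given business object."""
    object_key: str      # e.g. a document number
    archive_file: str    # file that contains the archived business object
    offset: int          # position of the data object within the file

# Hypothetical infostructure kept in the database after archiving
archive_index = {
    "4711": ArchiveIndexEntry("4711", "FI_DOCUMNT_000123.ARCHIVE", 90412),
}

def display_document(document_number: str) -> str:
    """Look up a document in the archive index and report where to read it from."""
    entry = archive_index.get(document_number)
    if entry is None:
        return f"Document {document_number} not found in the archive index."
    return (f"Document {document_number} is archived in {entry.archive_file} "
            f"at offset {entry.offset}; the read program fetches it from there.")

print(display_document("4711"))
```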
10. Can print list archiving be used as a substitute for data archiving?
No, print list archiving is not a substitute for data archiving; the two complement each other. Print lists are produced while the data is still in the database. The print lists can later be archived by moving them to a storage system that is integrated via the SAP ArchiveLink interface. Print lists are mainly created and archived for use in future evaluations and audits. If at a later point in time you find that you need more information for an evaluation or an audit, you can still access the archived data at any time.
[Figure: ADK sits between the database and the archive files, handling adjustment of codepage and character formats, structural changes, compression, file handling, and job control]

Figure 4: ADK Integration and Functions

For more detailed information on ADK see Chapter 3, Technology and Administration.
can be completely removed from the database. In addition, an archiving object also contains the archiving programs and Customizing settings necessary for archiving its corresponding business object type. More concretely, an archiving object has the following components:

Data declaration
Defines which data from the database makes up a business object.

Archiving programs
Write program: Writes the business objects sequentially to the archive files.
Delete program: Deletes the business objects from the database after they have been read from the archive file and verified.
Preprocessing program (optional): Prepares the data for archiving, for example by setting an archivability indicator (also called a deletion indicator).
Read program: Used to display an archived business object.
Postprocessing program (optional): Processes data after archiving, for example to update statistics data or indexes.
Reload program (optional): Reloads archived business objects into the database.

Customizing settings
Define the archiving-object-specific parameters for a specific archiving session. These settings depend on the archiving object and may therefore vary.

The following figure shows the different components of an archiving object.
Figure 5: Components of an Archiving Object

Archiving classes
Another common term used in data archiving is "archiving classes". It refers to data objects that are not independently defined from a business process point of view, but that belong together from a technical point of view; examples include SAPscript texts, change documents, and classification data. These data objects are created when a business object is created or modified, and they are usually archived together with their corresponding business objects. Access to this archived data usually also takes place via archiving classes. Archiving classes are developed by SAP, but they may be used in customer-specific programs. However, they can only be used in connection with an archiving object.
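As an illustration of how the components listed above fit together, the following Python sketch models an archiving object as a plain data structure. It is only a conceptual model under the assumptions of this section; real archiving objects are defined in the SAP system itself, not in Python, and the stub programs and Customizing keys shown here are invented.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class ArchivingObject:
    """Conceptual model of an archiving object and its components."""
    name: str                                   # e.g. "SD_VBAK"
    tables: list[str]                           # data declaration: tables forming one business object
    write_program: Callable                     # writes business objects sequentially to archive files
    delete_program: Callable                    # deletes objects after they are verified in the archive
    read_program: Callable                      # displays archived business objects
    preprocessing_program: Optional[Callable] = None   # e.g. sets the deletion indicator
    postprocessing_program: Optional[Callable] = None  # e.g. cleans up index tables
    reload_program: Optional[Callable] = None          # emergency reload into the database
    customizing: dict = field(default_factory=dict)    # e.g. maximum archive file size

# Example definition with placeholder programs (the callables are stubs for illustration)
sd_vbak = ArchivingObject(
    name="SD_VBAK",
    tables=["VBAK", "VBAP", "VBUK"],
    write_program=lambda: None,
    delete_program=lambda: None,
    read_program=lambda: None,
    customizing={"max_file_size_mb": 100},
)
print(sd_vbak.name, sd_vbak.tables)
```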
[Figure: a sales document becomes archivable once its residence time has been reached]
[Figure: application data in database tables is written to archive files in the file system and can then be stored in a storage system; access modes range from online (display from the database) through online/nearline (archive access) to offline]
Figure 8: The Data Archiving Process

The data archiving process in the SAP system can be divided into the following steps:

1. Data is written to the archive
The data to be archived is read from the database and written sequentially to newly created archive files.

2. Data is deleted from the database
The delete program deletes the data from the database after it has been completely written to the archive files. To ensure the integrity of the archived data, the delete phase is not started until the created archive files have been read and verified.

3. Archive files are stored
The archive files that were created during the write phase can be moved to storage systems or to tertiary storage media. The storage phase is not part of the actual archiving process. It is also possible to start the storage phase before the deletion phase.
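The following Python sketch mirrors these three steps at a conceptual level, including the safety rule that nothing is deleted from the database until the archive file has been read back and verified. All class and function names are invented for this illustration; they do not correspond to actual ADK calls.

```python
class ArchiveFile:
    """Simplified stand-in for an archive file in the file system."""
    def __init__(self, name: str):
        self.name = name
        self.objects: list[dict] = []
        self.closed = False

    def write(self, obj: dict) -> None:
        self.objects.append(obj)

    def close(self) -> None:
        self.closed = True

    def read_all(self) -> list[dict]:
        # The delete phase re-reads the file to verify the data before deleting it.
        assert self.closed, "only correctly closed archive files may be processed"
        return list(self.objects)


def archive_business_objects(objects: list[dict], database: dict, stored_files: list) -> None:
    """Conceptual three-phase archiving run: write, verify and delete, store."""
    # Phase 1: write the business objects sequentially to a newly created archive file.
    archive_file = ArchiveFile("ARCHIVE_000001")
    for obj in objects:
        archive_file.write(obj)
    archive_file.close()

    # Phase 2: delete from the database only what can be read back from the archive.
    for obj in archive_file.read_all():
        database.pop(obj["key"], None)

    # Phase 3 (can also be configured to run before phase 2): move the file to storage.
    stored_files.append(archive_file)


database = {"0001": {"key": "0001", "status": "closed"}}
stored: list = []
archive_business_objects(list(database.values()), database, stored)
print(database, [f.name for f in stored])  # -> {} ['ARCHIVE_000001']
```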
same variant is used for two different sessions, you run the risk of archiving the same data twice, which could lead to erroneous results in the statistics of the archived data. If you have chosen a variant that has already been used, Archive Administration issues a warning.

However, a warning of this kind merely alerts you to the fact that you may be archiving the same data more than once; you can still continue to archive. Think carefully about how you proceed with your archiving. The program does not check whether the selection values overlap in variants with different names. It is therefore possible that when you schedule write jobs with such variants, you actually archive the same data more than once without receiving any warning.

When you execute the write program, ADK first generates an archiving session. Then the write program selects all the data that belongs to the first logical business object. As soon as the first business object is to be written, ADK creates a new archive file, and the subsequent business objects are written to this file. If the predetermined file size has been reached and there are still business objects to be written, ADK closes the current archive file and creates a new one. A business object is never divided and written to two different archive files; this ensures that all data of a logical business object is physically stored together in one archive file. A business object can therefore either be read completely or not at all, which prevents inconsistencies from occurring later, when the data is deleted. After the data of the last business object in your selection has been copied, ADK closes the last archive file.

From the ADK point of view the write phase is a purely technical process in which it does not matter whether the data originated from a single database table or from several different tables. This knowledge is stored in the archiving object on which a particular archiving session is based.

The write process ends when one of the following events occurs:
- Archiving has been completed.
- The archive file has reached the maximum size set in archiving-object-specific Customizing.
- The maximum number of business objects per archive file, set in archiving-object-specific Customizing, has been reached.

If, during the write phase, you find that there is not enough storage space available for all the archive files, or if you know that you only have a certain amount of space available, you can interrupt the write phase and continue it at a later point in time. For more detailed information about this process see Chapter 3.3.3, Interrupting and Continuing the Write Phase.

2.2.1.1 Archivability Check

To ensure that you do not archive any data that still belongs to open business processes, an archivability check is run before the write phase. The check makes sure that the data to be archived meets the archivability criteria and can therefore be removed from the database. Which archivability criteria are used for a specific business object type depends mainly on the application in which it was created. Generally, a business object, such as a sales or material document, is considered archivable if it has been completed, has reached the residence time defined for it, and is no longer used as a basis for other business objects. The archivability check takes place either directly in the write program or in a special check program that is scheduled as a preprocessing program. Which of these methods is used depends on the archiving object and cannot be determined by the user.
Checks in the write program
For a write program to be able to carry out the archivability check, it has to contain the entire technical and business process logic of a check program in addition to its own write logic. This allows it to select only those business objects that meet the archivability criteria and to write only these to the archive.
Check in the preprocessing program
In some cases the archivability check is carried out by a check program that runs before the write program and marks the archivable business objects with an indicator. Depending on the application this indicator is called the deletion flag or deletion indicator; the expression "to be archived" indicator is also used. This type of check program is scheduled in Archive Administration as a preprocessing program for a specific archiving object.

With some SAP solutions, such as SAP CRM, you have the additional option to run an archivability check for several archiving objects at once. This check is carried out by a check program that is independent of ADK and runs in the background. It marks the checked business objects with an archivability indicator.
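The sketch below ties together the write loop and the archivability check described above: objects are checked for completeness, residence time, and open dependencies, and the archivable ones are written to archive files that are closed once a maximum size is reached, without ever splitting a business object across files. The criteria, field names, and thresholds are illustrative assumptions, not the rules of any specific archiving object.

```python
from datetime import date, timedelta

RESIDENCE_TIME_DAYS = 180          # assumed Customizing value for this illustration
MAX_OBJECTS_PER_FILE = 2           # stands in for the maximum archive file size

def is_archivable(obj: dict, today: date) -> bool:
    """Simplified archivability check: business-complete, residence time reached,
    and not referenced by any still-open follow-on document."""
    old_enough = (today - obj["completed_on"]) >= timedelta(days=RESIDENCE_TIME_DAYS)
    return obj["status"] == "completed" and old_enough and not obj["open_follow_on_docs"]

def write_phase(objects: list[dict], today: date) -> list[list[dict]]:
    """Write archivable objects to archive files; a business object is never split."""
    archive_files: list[list[dict]] = []
    current: list[dict] = []
    for obj in objects:
        if not is_archivable(obj, today):
            continue                       # open objects stay in the database untouched
        if len(current) >= MAX_OBJECTS_PER_FILE:
            archive_files.append(current)  # close the full file and start a new one
            current = []
        current.append(obj)                # the whole object goes into one file
    if current:
        archive_files.append(current)
    return archive_files

docs = [
    {"id": "1", "status": "completed", "completed_on": date(2006, 1, 10), "open_follow_on_docs": []},
    {"id": "2", "status": "open",      "completed_on": date(2006, 1, 10), "open_follow_on_docs": []},
    {"id": "3", "status": "completed", "completed_on": date(2007, 6, 1),  "open_follow_on_docs": []},
]
print(write_phase(docs, today=date(2007, 7, 1)))  # only document "1" qualifies
```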
database. How the phases are executed has to be decided by each data archiving administrator according to his or her company's requirements.

Generally it is not sufficient to write application data to the archive and then remove it from the database. The archive files themselves must be kept available so that the data they contain can be accessed. In general you need a storage system; if you store your data manually, you need a strategy for managing and securely storing the archive files. The storage phase can begin as soon as one or more archive files have been generated by the write program and closed correctly.

Choosing the right time to store your data
You can determine when the archive files are to be stored in archiving-object-specific Customizing, in the Sequence area. Which option you choose is mainly determined by security considerations. For example, if you choose Store Before Deleting, the data is only deleted from the database once the archive file has been stored in a storage system. If you also set the Delete Program Reads from Storage System indicator you can increase data security further, although performance may decline: the verification read of the data during the delete phase then takes place directly from the storage system and not from the file system. In other words, before you delete you can double-check that the archive file was stored successfully.

To store archive files you have the following options (for more detailed information on this topic see Chapter 4, Storing Archived Data):

Storage system
If you are using a third-party storage system connected via the SAP ArchiveLink interface (see Chapter 4), you can have it store the archive files automatically. ADK sends a command to SAP ArchiveLink, which controls the communication with the storage system. The files are stored either when the write phase is completed or only after the delete phase has also been completed; you set the sequence of events in archiving-object-specific Customizing. The storage phase can also be triggered manually.

HSM systems
With this type of storage the archive files are created directly in a file system that is linked to the storage hierarchy of an HSM system (see Chapter 4). The HSM system independently takes care of storing and managing the data. Communication does not have to take place via SAP ArchiveLink, nor does the storage process have to be controlled from the SAP system side. All you need to do is enter the path to the target file system in Basis Customizing under Cross-Client File Names/Paths. The HSM system is set up in such a way that users can access the data as if it were located on a local file system. From the user's point of view data storage in an HSM system is transparent, meaning that he or she cannot see where the data is located. The only indication that the data being accessed may be archived, and therefore located on a slower medium, would be a slightly slower response time.

Alternative storage media
If you prefer not to store your data in a storage or HSM system, you can also manually move it to tertiary storage media (magnetic tapes, CD-ROMs, optical discs, and so on) or store it using standard backup mechanisms (backup, mirroring, and so on). The stored archive files are then managed by your IT department. This type of storage may be less costly and easier to implement, but it requires more maintenance and management effort. In addition you need a comprehensive management strategy to ensure the safety of the data (for example, periodically moving the data to a new, more robust storage medium). To be able to manually store your archive files on tertiary storage media, the archive files must have been closed correctly during the write phase and the automatic storage of the data has to be switched off. However, you cannot manually store your data before or during the delete phase, because the delete program must be able to access the archived data. A major disadvantage of the manual storage of archived data is that the management of this data is usually also performed manually, which makes it considerably more difficult to find and retrieve the archive files automatically later on.
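To clarify how the Customizing sequence settings change the order of the delete and store steps, here is a small Python sketch. The setting names are taken from the text above, but the control flow itself is an illustrative model, not ADK code.

```python
def storage_sequence(store_before_deleting: bool,
                     delete_reads_from_storage: bool) -> list[str]:
    """Return the order of steps after the write phase for the given Customizing settings."""
    if store_before_deleting:
        verify_source = "storage system" if delete_reads_from_storage else "file system"
        return [
            "store archive file in storage system",
            f"verification read of archive file from {verify_source}",
            "delete archived data from database",
        ]
    return [
        "verification read of archive file from file system",
        "delete archived data from database",
        "store archive file in storage system",
    ]

# Maximum-security variant: store first and verify against the stored copy
print(storage_sequence(store_before_deleting=True, delete_reads_from_storage=True))
```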
2.2.4.2 Scheduling pre- and postprocessing programs

Some archiving objects offer pre- and postprocessing programs in addition to the write and delete programs that every archiving object must provide. The purpose of these additional programs can vary from one archiving object to another. The following section describes what pre- and postprocessing programs are for and how they are used in different real-life scenarios. If an archiving object offers one or both of these programs, you can run them using the Preprocessing or Postprocessing functions.

Preprocessing program
A preprocessing program is mainly used to check the selected business objects for archivability and to set an archivability indicator or a status for each checked business object (see above). Business objects that have received an archivability indicator can no longer be changed. If a user calls up such a business object in change mode, a message window informs him or her that the business object has been marked for archiving and can therefore no longer be changed. The write program later simply selects the marked business objects and writes them to the archive without performing any further checks. This concept of splitting tasks between the check and the write programs is advantageous from a performance perspective and allows the archiving processes to be better integrated in overall system operations. It is mainly used for data archiving in SAP CRM. Another example of how a preprocessing program is used is the SAP ERP application Sales and Distribution (SD): in the archiving process for sales documents the preprocessing program is used as an analysis program, which determines the number of archivable documents but does not mark them with an archivability indicator.

Postprocessing program
Not all archiving objects offer postprocessing programs. They are used to perform tasks in the database that may be necessary after an archiving session has been completed, such as removing log entries, cleaning up index tables, and updating statistics data. Here the postprocessing program serves as a "clean-up" program for the data remaining after archiving. Postprocessing programs operate only on database data, not on archived data, so they generally do not need a connection to the archive. Postprocessing programs are generally executed after an archiving session, but they can also be used separately from data archiving. For example, after you have archived financial documents with archiving object FI_DOCUMNT, it may be useful to delete the secondary indexes for which the retention time has been exceeded. This is done by an index delete program that is started automatically in the background after the last delete program has finished. For the program to start automatically you must first make the appropriate settings in archiving-object-specific Customizing. If these settings were not made, you can also start it manually, for example if you are facing an acute lack of storage space in the database. In some applications postprocessing programs are used to delete previously archived business objects from the database. In the SAP ERP application Quality Management (QM), for example, the data of archived inspection lots is removed from the database with the help of a postprocessing program.
2.2.4.3 Reloading archived data
Even after many years and upgrades to new releases it is possible to display data archived with SAP data archiving, using the appropriate read programs. Therefore, in most cases it is not necessary to reload data into the production database, and reloading should only be done in emergencies. An example of an emergency would be a situation where you realize, after you have archived, that you have archived data that is still needed in the database. Such an error may occur due to wrong Customizing settings, choosing a residence time that is too short, or entering the wrong selection criteria. If you detect an error of this kind, you should reload the accidentally archived data into the database immediately after archiving. A reload of this kind is generally unproblematic. However, with some archiving objects not all data that was archived can be reconstructed in the database. When you reload a sales document, for example, the reload program cannot reload the cost center debits (controlling data) that are linked to the sales document; this data will therefore no longer exist in the database. Also, there are certain tables in data archiving from which data is only deleted; data deleted from these tables during the data archiving process cannot be reloaded. Another restriction involves data from certain archiving classes, which cannot always be reloaded. Before you reload archived data, you must seriously consider the consequences that this option may bring with it.
This is especially important if you are considering reloading data that has already been archived for some time. In such a scenario the risk that the archived data and the data in the database no longer coincide is very high. This has to do with the fact that the data would be reloaded from a historical context into a current database context. You may be overwriting documents that belong to a document number interval that was reset between the time of archiving and the time of reloading. Reloading can also affect the consistency of organizational data, because you may be reinserting data into the system that no longer exists. Thus, the longer the time between archiving and reloading, the higher the risk of encountering data inconsistencies after the reload. As a rule, you should never reload data across releases; if you do, SAP cannot be held responsible for any problems that may occur.
[Figure 9 depicts the SAP system with Archive Administration (transaction SARA), the archiving programs, the data archiving (DA) monitor, and background processing, together with the database holding the application data, the file system holding the archive files, and the connected HSM and storage systems.]
Figure 9: ADK as a Runtime Environment for Archiving Programs
The write program is scheduled in the form of a write job in Archive Administration. Within the write program, an ADK call generates a new archiving session that is entered in the ADK repository. The application data, which is read for a specific archiving object and checked for archivability, is transferred record by record to ADK and bundled into data objects by the ADK functions.
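As a rough illustration of this flow, the following sketch shows the skeleton of a write program built on the standard ADK function modules. The archiving object ZDEMO_OBJ, the table ZDEMO_HEAD, and the selection logic are hypothetical placeholders; real write programs are delivered with each archiving object.

REPORT zarch_write_sketch.
* Skeleton of an ADK-based write program (illustration only).
DATA: lv_handle TYPE i,                        " archive handle returned by ADK
      lt_head   TYPE STANDARD TABLE OF zdemo_head,
      ls_head   TYPE zdemo_head.

* Open a new archiving session for the (hypothetical) archiving object
CALL FUNCTION 'ARCHIVE_OPEN_FOR_WRITE'
  EXPORTING
    object         = 'ZDEMO_OBJ'
  IMPORTING
    archive_handle = lv_handle.

* Select the business objects marked as archivable by the preprocessing program
SELECT * FROM zdemo_head INTO TABLE lt_head
  WHERE arch_status = 'A'.

LOOP AT lt_head INTO ls_head.
  " Each business object becomes one data object in the archive
  CALL FUNCTION 'ARCHIVE_NEW_OBJECT'
    EXPORTING
      archive_handle = lv_handle.

  CALL FUNCTION 'ARCHIVE_PUT_RECORD'
    EXPORTING
      archive_handle   = lv_handle
      record           = ls_head
      record_structure = 'ZDEMO_HEAD'.

  " Further ARCHIVE_PUT_RECORD calls would add item and dependent records

  CALL FUNCTION 'ARCHIVE_SAVE_OBJECT'
    EXPORTING
      archive_handle = lv_handle.
ENDLOOP.

* Close the archive file; ADK creates and names files according to Customizing
CALL FUNCTION 'ARCHIVE_CLOSE_FILE'
  EXPORTING
    archive_handle = lv_handle.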
Data object services
Other ADK-internal data object services transform and compress a complete data object into a format that is platform-neutral and can be read across releases. Depending on the data, it is possible to achieve a compression rate of 1:10, or even higher if the data records contain many initial values. Before the data object can be written to a file, ADK makes sure that the metadata necessary for the subsequent technical interpretation of the archive files has been transferred from the ADK repository and the ABAP Dictionary. This particularly includes the name tabs of all tables and structures that belong to the archiving object. When it accesses the archive, the ADK runtime system checks whether the following conversions are necessary due to a changed system environment and, if so, carries them out.
Platform adjustment
If the code page or number format has changed, a platform adjustment has to be made. When archive files that originated in a non-Unicode system are read in a Unicode system (see the information box below), a code page conversion always takes place for the character-like data.
Schema adjustment
If the archived tables and structures have been changed with respect to their original definition in the ABAP Dictionary, a schema adjustment has to be made. The structural changes, however, must ensure compatibility of the structure components. This is accomplished by using the same semantics as the ABAP statement MOVE-CORRESPONDING: structure components that did not exist when the data was archived are returned with initial values, and components that are no longer used after the upgrade are not output. If components have the same name, the usual ABAP conversion rules between the different data types apply.
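A small, self-contained example of these MOVE-CORRESPONDING semantics; both structures are invented purely for illustration:

REPORT zschema_adjustment_demo.
* Illustrates the MOVE-CORRESPONDING semantics used for schema
* adjustment. Both structures are invented for this example.
TYPES: BEGIN OF ty_archived,        " layout at the time of archiving
         belnr   TYPE c LENGTH 10,
         budat   TYPE d,
         old_fld TYPE c LENGTH 5,   " dropped in the new layout
       END OF ty_archived,
       BEGIN OF ty_current,         " layout after the upgrade
         belnr   TYPE c LENGTH 10,
         budat   TYPE d,
         new_fld TYPE c LENGTH 8,   " added after archiving
       END OF ty_current.

DATA: ls_archived TYPE ty_archived,
      ls_current  TYPE ty_current.

ls_archived-belnr   = '0000004711'.
ls_archived-budat   = '20070115'.
ls_archived-old_fld = 'ABCDE'.

* Components with matching names are copied (and converted if needed);
* NEW_FLD remains initial, OLD_FLD is simply not transferred.
MOVE-CORRESPONDING ls_archived TO ls_current.

WRITE: / ls_current-belnr, ls_current-budat, ls_current-new_fld.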
Data Archiving and Unicode
Unicode is a universal character set used to facilitate a better exchange of data between different systems and across boundaries. It helps prevent problems that may arise when communicating systems use different code pages. These problems arise mainly because a code page is only applicable to specific languages and cannot be combined at will with other code pages. Using Unicode helps you avoid these problems. What are the conditions necessary for data archiving to work in a Unicode system? Generally it is sufficient if ADK and the archiving objects and archiving classes used adhere to the stricter ABAP syntax. ADK, as part of the SAP NetWeaver Application Server, meets these Unicode requirements, as do the SAP archiving solutions described in this document. However, any modified or customer-specific programs must be checked for their Unicode capabilities and, if necessary, converted. For help on this topic read the chapter "ABAP and Unicode" in the SAP Library, the ADK documentation (especially the function module documentation, which mentions the new parameter RECORD_REF used for reading data records), and the example programs delivered by SAP, SBOOKR, SBOOKR_2 and SBOOKR_3. If you want to convert your SAP system to Unicode, you can speed up the conversion by first removing data from your system through data archiving. Archive files that were created before the Unicode conversion are treated as a special case of the temporary platform adjustment (see above) that is performed automatically. Therefore, you never have to convert any archive files because of a Unicode conversion. For more detailed information about Unicode see SAP Notes 73606, 449918, 705447 and 379940, or the website http://www.unicode.org.
These conversions are carried out only temporarily during the runtime of the read, delete, and reload programs; archive files themselves are never changed. If, however, the changes in the system environment have been more comprehensive than described above, you can use ADK to implement special conversion programs that permanently convert your archive files.
ADK File Management
With the other runtime system services listed in Figure 9, ADK takes over further technical, application-independent tasks, so that the archiving programs are not responsible for them. ADK file management makes sure that new archive files are automatically created and named during the write phase as soon as the limits set in archiving-object-specific Customizing have been reached. Choosing a syntactically correct path that is later used to access the archive file is likewise not part of the logic of an archiving program.
Figure 10: Menu of the Role "Data Archiving Administrator"
This role contains the transactions you need for the following tasks, as well as links to the corresponding websites in the SAP Service Marketplace:
- Selection of the appropriate archiving objects based on an analysis of the database and table growth (DB15, DB02, SE16)
- Planning, controlling, monitoring and evaluating archiving sessions (SARA, SAR_SHOW_MONITOR, SARI)
- System settings, especially the Customizing of the storage of archive files (SAR_OBJ_IND_CUS, FILE, SF01)
Due to space and time limitations, we cannot describe each task of the data archiving administrator in detail in this document. We therefore focus on a few important tasks in the following sections. For more detailed information about all tasks that are part of data archiving, see the SAP Library.
3.2.2 Analysis
Large data volumes place a greater burden on the capacity of the database and on backups. In addition, in many high-availability scenarios data is replicated and mirrored, which means more investment in the necessary hardware. The costs for backups rise even more dramatically as data volumes grow, because the time needed for a backup increases proportionally to the data volume. The analysis phase is the most important part of an archiving project. An analysis can be approached from two different viewpoints: from a technical viewpoint and from a business viewpoint.
3.2.2.1 Analysis from a Technical Viewpoint
This type of analysis focuses on the database size and the growth of the database tables. The technical analysis must then be compared with the business view analysis. In addition, you must identify the archiving objects that are linked to the critical database tables. Before this step you should check whether there is anything you can do to reduce the data volume that is to be archived
through data management measures (see Chapter 1.4). The individual user departments know best which data is no longer needed and therefore does not need to be kept in the database. It is difficult to make general statements about the right time to remove data from the database. It is part of the system administrator's tasks to determine specific criteria that can be used to decide whether data can be removed from the database or not. Monitoring the database and ensuring that the production system's performance remains intact are also part of the administrator's routine tasks. For this purpose SAP provides the administrator with a series of powerful tools. We assume that, as a system administrator, you are familiar with the different monitoring tools and how to interpret the values they show. The following section gives a short introduction to each tool; the corresponding transactions or programs are listed in parentheses.
Database Monitor (DB02)
You can use the database monitor (transaction DB02) to determine the size of tables and how these tables have grown in the past. This transaction returns important database-specific figures. The display of the database monitor depends on the database you are using. With an Oracle database system, for example, the monitor displays the free space in the tablespaces or the size and growth of the individual tables and indexes. In addition to transaction DB02 you can also use the tool SAPDBA and transaction ST03 (performance monitor) to find other indicators that could help you with your data archiving decisions. You can also view memory space statistics using transaction DB15, by choosing Space Statistics.
Table Analysis (TAANA)
You can use this function to run analyses for database tables and analyze the corresponding table entries. The analysis function determines how many entries in a given table exist for a specific field value combination. It does this with the help of analysis variants, which contain the corresponding field lists. The analyses help you determine which archiving objects you need to use and can indicate which selection criteria would be the most effective during archiving. They also help you avoid having to analyze archiving objects or organizational units that do not generate large amounts of data. A simple sketch of this kind of field-value breakdown follows after the list of DB15 functions below. For more detailed information see the SAP Library.
Application Analysis (ST14)
You can use transaction ST14 to carry out application-specific analyses of table contents. You choose the application you want to analyze on the entry screen, schedule the analyses, and then evaluate them. These analyses can give you important information about document sizes and document type runtimes.
Tables and Archiving Objects (DB15)
After you have identified the critical tables, you must find out which archiving objects these tables belong to. You can see this with the help of the function Tables and Archiving Objects (transaction DB15). This function shows you which tables belong to which archiving objects and vice versa. This enables you to assign the tables you identified in the technical analysis to specific archiving objects, which can then be used to archive the table entries.
This transaction provides the following functions:
- Display of all the tables belonging to an archiving object and from which entries are to be archived
- Display of all the tables belonging to an archiving object and from which entries are deleted
- Display of database-specific space statistics, such as the number of records in a table, how much space a table occupies in the database (in KB), and other table details from the SAP and/or database statistics
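As mentioned above, here is a minimal sketch of the kind of field-value breakdown that a table analysis produces, assuming an analysis of the FI document header table BKPF by company code and fiscal year. In practice you would simply run TAANA with an appropriate analysis variant rather than write such a report.

REPORT ztable_breakdown_sketch.
* Counts table entries per field-value combination, similar to what a
* TAANA analysis variant delivers. Table and field list are examples.
TYPES: BEGIN OF ty_count,
         bukrs TYPE bkpf-bukrs,   " company code
         gjahr TYPE bkpf-gjahr,   " fiscal year
         cnt   TYPE i,
       END OF ty_count.

DATA: lt_count TYPE STANDARD TABLE OF ty_count,
      ls_count TYPE ty_count.

SELECT bukrs gjahr COUNT( * )
  FROM bkpf
  INTO TABLE lt_count
  GROUP BY bukrs gjahr.

LOOP AT lt_count INTO ls_count.
  WRITE: / ls_count-bukrs, ls_count-gjahr, ls_count-cnt.
ENDLOOP.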
3.2.2.2 Analysis from a Business Viewpoint
In this type of analysis the table entries are analyzed from a business point of view using standard analysis tools and are then assigned to the archiving objects. In principle, business objects can be archived independently of each other. However, there may be dependencies that require the data to be archived in a specific sequence. For more information on the archiving sequence see the network graphic (in Archive Administration under Environment -> Network Graphic). You may have to analyze your business processes to determine which processes generate large data volumes. Here it is important that you consider the individual objects in the context of the actual component
running in the production system and the corresponding business processes. The following section describes the individual steps that make up this type of analysis.
Identify the data that can be archived
From a business point of view it is mainly important to determine which data can be archived and when. You can only archive business objects that belong to closed business processes. To be able to do this you must first determine when a business process is considered closed; this depends on the application to which it belongs. Here it is also important to take into account the requirements of your company, such as different runtimes for a business object depending on its organizational unit (company code, plant, sales organization, and so on). Besides the analysis of the business processes, the CO analysis programs RARCCOA1 and RARCCOA2 (see the program documentation for more detail) are among the most important tools used to identify archivable data. In addition, you should determine whether data exists that is no longer needed from a business point of view. Such data can be, for example:
- Legacy data from data migrations
- Data from separated business areas (because the area was sold, for example)
- Test data
Keep in mind that this data can often belong to business processes that are not yet closed. Therefore you should plan enough time for the data removal in these cases.
3.2.3 Monitoring
An important task of the data archiving administrator is monitoring the archiving sessions. You can use the Archive Management function for this task and, as of SAP R/3 4.6, also the Data Archiving Monitor. The Archive Management function is available in Archive Administration (transaction SARA).
Archive Management
This function gives you an overview of all archiving sessions and archive files for a specific archiving object. You can use it to display current information about the archiving sessions, the archive files, and the archiving jobs. In addition, from this function you can branch directly to the spool list of the write job, to data archiving Customizing, to the archiving objects and tables (transaction DB15), to the Archive Information System, and to the data archiving statistics. The job overview displays the archiving sessions for a specific archiving object according to their status. Below we have listed some of the most common statuses for archiving sessions. For a description of the additional statuses see the SAP Library, or the legend in Archive Management.
- Incorrect Archiving Sessions: The write process was terminated before the first archive file was completed.
- Incomplete Archiving Sessions: The write process has not yet completed, the delete program has not yet run for all archive files, or a write process was terminated.
- Complete Archiving Sessions: Both the write and delete phases are completed.
- Archiving Sessions to Be Archived: The management data of the archiving session can be archived and deleted with archiving object BC_ARCHIVE.
The archiving sessions are listed according to their status. Within a status area the sessions are grouped into blocks of 20. When you expand an archiving session, you see a list of the archive files that belong to it.
Archiving Session Details
You can double-click every archiving session and each archive file to display its details. For the archiving session the date, time and user are displayed, among other information. In the change mode of
the detail screen you can enter an archiving session note and set the To Be Archived or the Invalid indicator. The management data of archiving sessions that have the To Be Archived indicator activated is archived the next time the archiving object BC_ARCHIVE is used. You then no longer have read access to the archiving sessions whose management data was archived.
Archive File Details
The detail screen for the archive file also contains information about the size of the archive file and the number of data objects it contains. If the archive file has been stored in a storage system, the field Storage System contains the status Stored. In the change mode you can change the name of the archive file and the logical path. You can also enter a note and a long text for the archive file. If a name appears for the archive file, the system assumes that the archive file is in the file system. In this case the system checks the accessibility of the archive file in the file system when you call up the detail screen. The result of this accessibility check is displayed in the last line of the detail screen. If the check was positive, the status Archive File Is Accessible (represented by a green traffic light) is displayed; if not, you see the status Archive File Is Not Accessible (red light). If the archive file has been stored in a storage system and no file name has been entered for the archive file, the system checks whether the file in the storage system can be accessed. The file then receives either the status Archive File Is Accessible in Storage System (yellow light) or Archive File Is Not Accessible (red light). Archiving sessions or archive files with terminated archiving sessions are represented by a lightning bolt icon. Archiving sessions that have not been completed are represented by a clock icon. In the detail screen of the archiving session or the archive file you can see the names of the jobs that belong to the file or the session. You can double-click the archiving session to display an overview of the job. Here you can monitor the status of the job; you can also see the job log, the spool list, and details for the job. Choose the function User Entries to branch to the selection criteria that were chosen for this particular archiving session when the write job was scheduled. These selection criteria are displayed for the job even if the variant no longer exists. Choose Goto -> Stored Files to display an overview of the archive files that have been transferred to the storage system. You can restrict the selection, for example, to stored archive files for all archiving objects, for a specific archiving object, or for a specific archiving session. The results list shows the content repository in which each file is located and the technical key of the archive file in the storage system. In addition, the overview contains information about the status of the archive file in the storage system.
Data Archiving Monitor
The CCMS Alert Monitor is a central monitoring tool for the entire IT environment (particularly for SAP systems). It comprises a series of dedicated monitors for individual system components.
A system administrator who has already monitored system availability and throughput using the Availability and Performance monitor, or who has used the Background Processing, File Systems, or Knowledge Provider monitors in the context of data archiving, will know how valuable it is that data archiving has been integrated into the monitoring infrastructure. The monitoring infrastructure is even more useful when auto-reaction methods have been implemented; they can, for example, send an e-mail or an SMS message if there is a problem. The data archiving monitor offers the following archiving-specific functions for monitoring the processes, recognizing problem areas, and analyzing problems:
- A general overview of all archiving sessions (here you do not enter an archiving object first, as you would in Archive Management)
- Progress reports about the processing of the archive files
- Compact information about the technical details of the write and delete jobs, such as starting time, runtime, size of the archive files, and number of archived data objects
- Alerts to indicate a possible need for action (for example, yellow alerts for delete jobs that have not run yet or incomplete delete jobs, and red alerts for certain error situations)
- A link to the jobs that triggered an alert and to their job logs, to help you analyze the warning
The data archiving monitor is not able to report all potential runtime exceptions (such as a job termination due to a system error that took place outside of ADK). We therefore recommend that you use the data
archiving monitor together with the other monitors mentioned earlier, or that you configure a monitor that is tailored specifically to your requirements. You can call up the data archiving monitor from the CCMS Monitor Collection (transaction RZ20) via the menu path SAP CCMS Monitor Templates -> Data Archiving, directly by using transaction SAR_SHOW_MONITOR, or from the role described earlier. For documentation go to Application Help or use the long texts that describe each node. You can also use the information and hints provided within the monitor itself.
3.2.4 Settings
3.2.4.1 Overview of Customizing
The settings for the archiving processes are made in data archiving Customizing, which you can reach from Archive Administration. The following section gives you a general overview of what you can do in data archiving Customizing. For more detailed information and examples see the corresponding documentation in the SAP Library. Data archiving Customizing is divided into the following areas:
Cross-Archiving-Object Customizing
The settings you make here affect all archiving objects across the board.
- Data archiving monitor: This tool provides the data archiving administrator with archiving-specific functions to monitor processes and to recognize and analyze problems. This includes, for example, the progress indicator for processed archive files, a general overview of all archiving sessions that have been processed for all archiving objects, and alerts that point you towards current or potential problems during archiving. See also Data Archiving Monitor.
- Access check during archive selection: Archive files for which the access check returned a negative result are represented in the selection screen with a lightning bolt symbol.
- Verification of archive files: During the write phase the program writes verification information into the archive file for every data object. With this information an archive file can be check-read before the actual procedure, such as deletion, reading, or reloading, takes place, to make sure that it and all of the data objects it contains have remained intact.
- Automatic interruption of the write phase: It may be necessary to interrupt the write phase due to low system resources, or because the time frame for the archiving session is too short. The interrupted write phase can be continued at a later point in time. The interruption function is part of ADK.
- Server groups for background processing: When you schedule archiving sessions in the background, you can enter the server groups on which the sessions are to be executed.
Archiving-Object-Specific Customizing
The settings you make here only apply to the archiving object that you entered previously in Archive Administration. Some settings, such as those for the postprocessing program, are only offered for archiving objects that include such a function.
- Logical File Name: Here you choose the logical file name under which the archive files are to be stored in the file system. At runtime a complete, platform-specific file name including the path is determined from the logical file name. The logical file name is maintained with transaction FILE (see Basis Customizing).
- Archive File Size: Here you can enter the maximum size an archive file can reach before it is closed. The file size can be determined either by entering a size, such as 100 MB, or by specifying the maximum number of data objects an archive file is to contain. As soon as one of these values is reached, the archive file is closed and a new one is created.
- Settings for Delete Program: Here you determine whether the delete program is to be executed in test or production mode, and whether or not the delete program is to be started automatically.
- Place File in Storage System: If you are using an external storage system, you can enter here the content repository in which the archive files are to be stored. You can also determine whether the archive files should be stored automatically and whether the storage step should take place before or after the delete phase.
Basis Customizing
Here you enter the file names and file paths used for the archive files. To define a logical file name you need a name and a path. The path, again, is of a logical nature, meaning that there is a rule that determines at runtime (platform-dependent) what the physical path is to be, based on the information you entered. File names and paths can be either client-specific (transaction FILE) or cross-client (transaction SF01).
Application-Specific Customizing
With some applications it is possible to define criteria for the archivability and deletability of application data. These criteria are then taken into account during archiving. Examples are the account type life and document type life for financial accounting documents. These parameters are set in the Customizing tables of the application in question.
3.2.4.2 Security Versus Performance
Data archiving involves the processing of mass data, and in this process it must be ensured that access to archived data is possible in a secure and efficient manner. It is the task of the data archiving administrator to ensure that the balance between security and performance during archiving is always in tune with the requirements of the user departments that run the business processes.
Verification of Archive Files
An important performance issue is the runtime needed to check whether the archive files are intact. During the write phase the system stores verification information based on a CRC-32 (cyclic redundancy check) checksum in the archive file. With this information an archive file can be check-read before the actual procedure, such as deletion, reading, or reloading, takes place, to make sure that it and all of the data objects it contains have remained intact. This procedure recognizes and reports incorrect archive files, and subsequent activities, such as the deletion of archived data from the database, are then not started. If you encounter incorrect files, contact an SAP consultant or an experienced data archiving consultant. You determine when the verification is to take place in Cross-Archiving-Object Customizing. Keep in mind, however, that although you achieve increased security by checking your archive files, you also generate longer runtimes; depending on the archiving object, they can be 10 to 30% longer.
File Accessibility Check During Archive Selection
During the selection of archive files for delete, read, or reload procedures, you can have the system check that an archive file exists. This means that the system checks whether the archive file can be accessed by Archive Administration and whether the metadata of the archive file is readable. Archive files for which the access check returned negative results are represented in the selection screen with a lightning bolt symbol. The kind of access check to be carried out is determined in Cross-Archiving-Object Customizing. You can let the system run checks on archive files that are still in the file system, and on archive files that have been stored in a storage system. If you do not need the access check at the time of the file selection, you should deactivate this setting in Customizing. Running an access check during archive file selection can be very time-intensive, especially if you have many files in the storage system. Therefore, you should use these checks only after careful consideration.
The same applies to archive files that are located in a file system connected to an HSM system.
Reversing the Order: Storing Before Deleting
After the archive file has been created during the write phase, there are two possibilities for how the delete and storage phases can be executed. You indicate which option you want to use in Cross-Archiving-Object Customizing:
Delete Before Storing
The write phase is followed by the delete phase, during which the data is deleted from the database based on the data in the archive files. The archive files are then stored in a storage system in the subsequent storage phase. With this option it is important to ensure either that the archive files are written to mirrored disks or a RAID file system, or that they are backed up before the delete phase.
Store Before Deleting
The write phase is followed by the storage phase, during which the archive files are written to a storage system. The data is not deleted from the database in the subsequent delete phase until the files have been stored successfully. In addition to these two settings, in Archiving-Object-Specific Customizing you can also enter a storage system (Content Repository) and indicate that the storage of archive files should be started automatically (Start Automatically). Moreover, when you choose Store Before Deleting you can also decide how the delete program is to read the data, with the indicator Delete Program Reads from Storage System:
- If you activate this option, the archive file is deleted from the file system once it has been stored. During the delete phase the delete program reads the data from the archive file in the storage system. This ensures that the delete program receives exactly the same data as was transferred to the storage system.
- If you deactivate this option, the archive file is not deleted from the file system immediately after it has been stored. During the delete phase the delete program reads the data from the archive file in the file system. This improves the performance of the delete program without sacrificing the early storage of the data.
To start the delete program automatically after the archive files have been created or after they have been stored, the setting Start Automatically has to be activated under Settings for Delete Program. If you have chosen Store Before Deleting (and only in this case), it does not matter for the storage whether the delete program is started in test or production mode. If you have chosen Delete Before Storing and you have activated the test mode variant, no data is actually stored. The data archiving administrator must decide on a case-by-case basis in which order the delete and store phases should be executed. If security is the most important consideration, the delete phase should take place after the store phase. This ensures that the data is not removed from the database until the archive files have been stored correctly. The process becomes even safer if the delete program is set to read the archive files from the storage system rather than the file system. If performance is the main focus, the delete phase should occur before the storage phase.
The data archiving administrator can use the following tools to display statistics:
Analysis Transaction
You can use the analysis transaction to display all statistics that were collected during data archiving for all archiving sessions. You can call the transaction either directly from the initial screen of Archive Administration or from Archive Management; you can also use transaction SAR_DA_STAT_ANALYSIS. To select the archiving sessions to display you can use the client, the archiving object, the archiving date, and the status of the archiving session. To further process the statistics, the data archiving administrator can use the comprehensive functions of the ALV Grid Control (SAP List Viewer), such as printing or exporting.
Standard Log
The standard log outputs the statistics collected during write, delete, read, and reload sessions in the form of a screen list if the programs were run in dialog mode, and a spool list if they were executed in the background. For the write program the standard log can be output in both production and test mode; for the delete, read, and reload programs the standard log can only be output in production mode. In addition to the statistics already mentioned, the standard log also includes detailed information about the number of processed structures.
3.2.5.3 Archiving Statistics Data
The statistics that are collected during archiving are saved in database tables and can be archived together with the ADK management data, using the archiving object BC_ARCHIVE.
[Figure 11 illustrates a database block (header and insertion area) and a tablespace in which the extents of a table are distributed over files 1 to n, with gaps between the records.]
Figure 11: Database Blocks and Tablespace Fragmentation
To help you avoid the negative effects of fragmentation on your database system, you can choose between index, table, and tablespace reorganization.
3.2.6.1 Index Reorganization
Of all the reorganization options, index reorganization is the most important measure for improving the performance of your database. To keep the time needed for database accesses short, you need a high buffer hit rate for the data you are trying to read. Especially with indexes, the probability that the data is already in the data buffer should be close to 100%. This requires that the indexes use as little working memory as possible.
[Figure 12 illustrates an index B-tree whose database blocks contain gaps, alongside the corresponding table.]
Figure 12: Index Fragmentation
When data is archived or even simply deleted, gaps are created in the indexes of the affected tables; see Figure 12. The database system may not be able to fill these spaces again, because the order of entries in
the index predetermines which data blocks can be filled. If there is no free space in these data blocks, new data blocks have to be used. As a result, and despite data archiving, the disk space needed for these indexes continues to grow. This in turn has a negative effect on the buffer hit rate, which means that the performance of the database system suffers. This effect can be resolved by an index reorganization (ALTER INDEX ... REBUILD), which is much faster than deleting and completely rebuilding the index. You should run an index reorganization after every larger data archiving session; a short sketch of the rebuild statement follows at the end of this section. If the database system uses a cost-based optimizer (CBO), the statistics will be obsolete after the deletion. However, we do not recommend that you immediately generate completely new statistics for the optimization of the access paths (UPDATE STATISTICS). Experience has shown that this kind of update is often not necessary, especially for large database tables; it can even be counterproductive. Generally, with smaller data volumes the optimizer opts for a full table scan, even though a faster index access would be possible. The risk that the optimizer takes a wrong decision is smaller with greater data volumes. It is therefore not necessarily advisable to update the statistics directly after data archiving, when the data volume is at its lowest.
3.2.6.2 Table Reorganization
Unlike the gaps in the indexes, unfavorable space distribution in data blocks does not have a major effect on performance, because the probability that these data blocks have to be loaded into the data buffer is much smaller than with the indexes. Table reorganizations cost much more than index reorganizations and should therefore only be used in exceptional cases.
3.2.6.3 Tablespace Reorganization
Although tablespaces may be highly fragmented after archiving, this fragmentation does not affect the performance of a database system. It is therefore not necessary to reorganize tablespaces. However, it may still be beneficial to reorganize your tablespaces to:
- Gain back space from your database system, after you have archived data that is no longer needed and the tables will only change slightly from then on. If, however, you expect the table from which data was archived to receive the same amount of new data again, a tablespace reorganization would not make sense.
- Make space for other tables. An alternative here is to accept that, after several archiving sessions, the gaps will be organized by the free space management function of your database system in such a way that they are large enough to be used for new data, even without a tablespace reorganization.
In sum, to improve the performance of your system it pays to run an index reorganization after every larger archiving session. When to run a table or tablespace reorganization can be determined by analyzing the history of how full your database blocks are on average. Even without these additional reorganization measures, data archiving stabilizes the size of your database: even if several archiving sessions are needed, each data block should at some point be completely emptied of old data and at the latest should then be available for new data. Compared to data archiving followed by reorganization measures, this kind of stabilization is not felt immediately after archiving and is less noticeable.
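Purely to make the rebuild statement concrete, the following hedged sketch issues an Oracle-style ALTER INDEX ... REBUILD via the ABAP ADBC classes. The index name is hypothetical and the syntax is database-specific; in practice a reorganization is normally triggered with the database administration tools (such as SAPDBA, mentioned above) rather than from ABAP.

REPORT zindex_rebuild_sketch.
* Illustration only: rebuild of a (hypothetical) fragmented index via
* ADBC. Requires appropriate database authorizations; normally this is
* done with the database administration tools, not from ABAP.
DATA: lo_stmt TYPE REF TO cl_sql_statement,
      lx_sql  TYPE REF TO cx_sql_exception,
      lv_msg  TYPE string.

CREATE OBJECT lo_stmt.
TRY.
    " Oracle syntax; other database platforms use different statements
    lo_stmt->execute_ddl( 'ALTER INDEX "ZDEMO_HEAD~0" REBUILD ONLINE' ).
    WRITE: / 'Index rebuild executed.'.
  CATCH cx_sql_exception INTO lx_sql.
    lv_msg = lx_sql->get_text( ).
    WRITE: / 'Index rebuild failed:', lv_msg.
ENDTRY.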
scheduling job, which starts the write program with the same variant without any additional checks. When you choose the period, make sure that for every archiving session the corresponding delete jobs will be completed before the next write job starts.
Prerequisite for Periodic Archiving
It would seem that whether or not periodic archiving is possible depends only on the archiving object and the variant. However, periodic archiving is not possible if the data to be archived is selected using absolute (document) numbers, items, or times. If, on the other hand, the archivability of the data is determined using specific statuses that can be set in application-specific Customizing outside the variant, using residence times, or using other relative time parameters, then the archiving object can be used for periodic job scheduling. In some cases you can enable periodic archiving by leaving out some absolute selection criteria. An example is the archiving of material documents: here you do not need to enter the material document number in the variant of the write program (program RM07MARC for archiving object MM_MATBEL), because the archivability of the data is determined in the application-specific Customizing of the document runtimes per transaction type. To speed up the selection of the data, however, we recommend that you enter the material document year in the variant. Then you only need to change the variant once a year.
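A minimal sketch of such a periodic scheduling job is shown below; it assumes an existing variant (the name PROD_MM is invented) and uses the standard background processing function modules. In practice, write jobs are usually scheduled from Archive Administration or with an external job scheduler, and ADK then takes care of job class and server selection as described in the next section.

REPORT zperiodic_archiving_sketch.
* Illustration only: schedules the MM_MATBEL write program RM07MARC as
* a monthly background job that reuses the variant PROD_MM (invented).
DATA: lv_jobname  TYPE tbtcjob-jobname VALUE 'ARCH_MM_MATBEL_WRITE',
      lv_jobcount TYPE tbtcjob-jobcount.

CALL FUNCTION 'JOB_OPEN'
  EXPORTING
    jobname  = lv_jobname
  IMPORTING
    jobcount = lv_jobcount.

* The write program always runs with the same variant; the relative
* selection criteria (residence times, document year) make this safe.
SUBMIT rm07marc USING SELECTION-SET 'PROD_MM'
       VIA JOB lv_jobname NUMBER lv_jobcount
       AND RETURN.

CALL FUNCTION 'JOB_CLOSE'
  EXPORTING
    jobcount  = lv_jobcount
    jobname   = lv_jobname
    sdlstrtdt = sy-datum          " first start: today...
    sdlstrttm = '230000'          " ...at 23:00
    prdmonths = 1.                " repeat monthly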
[Figure 13 depicts the job types in data archiving (preprocessing, write, delete, store, retrieve, read, reload, index, and postprocessing) and the scheduling dependencies between them, as triggered from Archive Administration.]
Figure 13: Job Types in Data Archiving and Their Dependencies
Figure 13 shows the dependencies between the job types and the sequence in which ADK automatically schedules them if you want your programs to be started automatically. The sequence in the figure refers to the Customizing setting Store Before Deleting. It is of course also possible for the write job to first (or only) start the delete jobs automatically. The figure also shows the scheduling options you have from within Archive Administration. For the start date you can choose from the usual options: immediately, at a specific time, after a specific job or event, with a specific operation mode, or according to a factory calendar. The scheduling job sees to it that the job it starts (usually the write job) is given job class A. This ensures that the privileged write job can run even on the most restrictively configured database servers. ADK tries to schedule the write job on the database server. This is especially advantageous when the archive directory is in a file system that is locally connected to the database server. Avoiding network traffic not only speeds up the write phase, which is generally not parallelized, but also decreases the risk of I/O errors during the creation of the archive files.
Use of Server Groups
In the SAP NetWeaver Application Server CCMS, you have access to centrally administered server groups for background processing. These server groups are analogous to RFC server groups and allow you to restrict the distribution of jobs to selected servers with background processes. The introduction of job server groups in CCMS led ADK, as of SAP Web AS 6.10, to replace the archiving-specific server group concept available with SAP R/3 4.5 with a uniform and extended function. Moreover, you no longer assign a job server group to individual archiving objects; the job server group assignment now affects the data archiving jobs of all archiving objects equally. The following general rule still applies: if the group contains a server that runs on the database server, the write job is scheduled on that server.
time windows during which the write phase must take place. You can also interrupt archiving sessions manually from Archive Administration. In Archive Management the interrupted archiving sessions are listed with a stop sign symbol. Not all archiving objects support the interruption function. You can use transaction AOBJ to find out for which archiving objects interruption is possible; these archiving objects have the indicator Interruption Possible activated in the detail view.
3.3.5.2 Scheduling Delete Jobs
Direct Scheduling
The special aspect of the delete phase is that the number and names of the archive files created during the write phase are not known before the delete job is started. ADK passes the file information to each automatically started delete job during the write phase. If you determine the name of the delete program using transaction AOBJ and schedule the program directly, the files are selected automatically: for a given archiving object, ADK determines the archive file with the smallest key in the status Writing Complete, in other words the oldest archive file that still contains data to be deleted. Directly scheduling the delete programs in this way gives the scheduler the greatest flexibility and the most complete control over the jobs. The only condition is that only those variants that have been entered in the Settings for Delete Program should be used for the test and production modes.
Indirect Scheduling
Figure 14 compares the direct and indirect scheduling of jobs. Indirect scheduling is also available in earlier releases; see SAP Note 205585. Here the scheduler merely starts the program RSARCHD, which then schedules the delete jobs. You largely decide yourself which archive files are then processed: in the variant for RSARCHD you enter the archiving object, the maximum number of files to be selected, and the maximum number of parallel delete jobs, among other information. Based on the sequence and status of the archive files, RSARCHD determines the number of delete jobs that are started with each execution.
Figure 14: External Scheduling of Delete Jobs
Using the variant you can also determine which of two processing methods RSARCHD uses: either the program ends immediately after the delete jobs have been scheduled, or it ends as soon as all scheduled jobs have the status Completed or one of the delete jobs has the status Canceled. The latter method makes it easier for you to monitor your delete jobs, because you only have to look at the RSARCHD job. If the RSARCHD job finishes properly, the entire delete phase can be considered finished properly. If, however, a delete job is canceled, the job status of RSARCHD reflects this immediately.
[Figure 15 shows the storage and display programs in the SAP system communicating via SAP ArchiveLink with an external content server.]
Figure 15: Integration and Components of SAP ArchiveLink
Up to and including SAP R/3 4.5A, SAP ArchiveLink alone met most of the functional requirements named above for SAP applications. An SAP application integrates SAP ArchiveLink functions and in this way can access the administrative and interface functions to process documents on external content servers. Particularly important with SAP ArchiveLink is that it operates with very simple and small management structures and application interfaces. The advantage of this kind of architecture lies particularly in the fact that it is easy to integrate into business applications and can handle mass data thanks to its slim management tables. As of SAP R/3 4.5B this concept was considerably expanded in terms of the functions offered by the management layer, with the development of the Knowledge Provider (KPro). KPro offers an application interface that is based on content models, which allow for very comprehensive and flexible modeling of documents. This provides the applications with a large number of functions that go beyond those offered by SAP ArchiveLink, such as versioning, variant creation, indexing based on content models, and many more. As a result, SAP applications can, depending on their document requirements, use KPro to integrate any number of data management functions, or simply use SAP ArchiveLink.
In sum, SAP ArchiveLink and KPro differ mainly in the application interface and management layers. The content server interface is the same in both components, so that the same document functions can be used with the same external content server. The components we have mentioned are generally named as follows:
External Content Server
In connection with KPro, one talks about content servers, which have content repositories. As of SAP R/3 4.5B these terms are also used in connection with SAP ArchiveLink. In addition, the terms "storage system", "archive system" and "external archive system" are used as synonyms.
Content Server Interface
As of SAP R/3 4.5B the terms SAP ArchiveLink Interface and SAP ArchiveLink HTTP Content Server Interface are used. The HTTP Content Server Interface, with a slightly less comprehensive array of features, was also released with SAP R/3 4.5B. In sum, we can say the following about the components and terms we have mentioned: All functions that are used for the integration of documents in SAP applications use the SAP ArchiveLink interface to certified storage systems to store documents. This way the same storage systems can be used for all application purposes, regardless of whether the application uses the SAP ArchiveLink or the KPro application interface. The terms SAP ArchiveLink Interface and KPro CMS can be used interchangeably and refer to the use of document-level application interfaces. This is especially important for data archiving, because as of R/3 4.6C it uses the KPro CMS application interface instead of the SAP ArchiveLink application interface, for reasons of simplicity. Generally, in this document we use the term SAP ArchiveLink, which is also the most commonly used term among SAP users; it originates from the well-established SAP ArchiveLink Interface. However, at some points in this document we distinguish between SAP ArchiveLink, KPro, and CMS, for the sake of precision.
4.2.1.2 The Concept of Document
Before we can talk about the purpose of SAP ArchiveLink, we must first define the term "document" in the context of SAP ArchiveLink, from both a business and a technical point of view. SAP ArchiveLink distinguishes between four document categories:
- Incoming Documents: Can trigger business processes; from a technical standpoint this category comprises every type of document that was not created by the SAP system itself, such as scanned documents or files on your local PC. SAP ArchiveLink passes these documents on for storage unchanged, in their original format.
- Outgoing Documents: Are created as part of a business process or as its result; from a technical point of view, this category comprises documents that were created with the SAP text processing systems SAPscript or SmartForms and are output using, for example, a printer, fax, or monitor. SAP ArchiveLink is used to pass outgoing documents to the storage system in PDF format.
- Print Lists: Are created during the evaluation of business processes; from a technical point of view they are the result of an evaluation program. They are transferred to the storage system via SAP ArchiveLink as ASCII files. For more information, see Chapter 1.3.2.
- Archive Files: Contain business process information that is removed from the SAP system database in the form of data which must remain accessible in the future. They cannot be displayed directly as documents.
From a technical point of view, the last three categories are produced within the SAP system.
[Figure 16 shows SAP applications (ERP with FI and HR, PLM, KM, CRM) connected via servers and clients, together with a scanner, to an external storage system consisting of a jukebox and an HSM system.]
Figure 16: Typical Configuration of an SAP ArchiveLink System
4.2.2.1 Communication Using SAP ArchiveLink
The SAP ArchiveLink interface consists of both server and client components. A typical call of the server interface is triggered when a file originating in the SAP system is to be passed to the storage system. An example on the client side is the request to display a document.
The SAP ArchiveLink interface is available in four versions, for which a third-party vendor can be certified. The versions are named after the SAP R/3 release with which the enhancements were first made available. (The interfaces are available to all SAP solutions.)
- ArchiveLink 2.1: First version of the SAP ArchiveLink interface. Server communication is based on remote procedure calls, while client integration takes place via the ArchiveLink Viewer.
- ArchiveLink 3.0: Server communication now takes place via SAP RFC, and on the client side OLE Automation 2.0 is supported.
- ArchiveLink 3.1: The interface is enhanced with additional functions.
- ArchiveLink 4.5: The interface is expanded to HTTP. RFC is eliminated and the ArchiveLink Viewer is no longer part of the interface.
Current SAP systems support the last two versions of SAP ArchiveLink, that is, 3.1 and 4.5. The HTTP Content Server Interface 4.5 is closely related to the SAP ArchiveLink Interface 4.5: it contains all HTTP components of the SAP ArchiveLink Interface 4.5, but does not include OLE Automation and the barcode BAPI. This means that a system that is certified only for the HTTP Content Server Interface 4.5 cannot be used for all SAP ArchiveLink scenarios. Therefore, almost all vendors are certified for the SAP ArchiveLink Interface 4.5.
Table 2: Storage Methods Via SAP ArchiveLink
The HTTP interface only supports synchronous communication; therefore, the storage of archive files via this interface takes place synchronously. The RFC interface (SAP ArchiveLink 3.1) supports both asynchronous and synchronous methods. In SAP R/3 releases up to 4.6 the storage (and retrieval) of archive files takes place asynchronously; as of SAP R/3 4.6C synchronous communication is used.
There are two different methods for accessing the information:
- Accessing an archive file block specified by a position and an offset: This is the case with single document accesses and with evaluations carried out directly in the storage area. A single document access requires only a few block accesses on the storage area, whereas during an evaluation the entire archive file is read sequentially in blocks. This may mean a large number of requests, each of which, however, involves only a small data volume.
- Accessing the entire file: This is the case, for example, when the file is copied into the local SAP file system. The function is executed manually by the administrator at the level of archiving sessions. This means that the number of parallel retrieval requests to the storage area is relatively low, but each request involves a large amount of data.
The method of accessing archive files can be synchronous or asynchronous, depending on the release and the interface version. The table is only valid for the function of copying entire archive files into the SAP file system, not for block accesses to data in the storage area; block accesses via a position and offset are always synchronous. In sum, the storage system you are using should meet the following requirements: it should support a high number of synchronous parallel requests, and these requests should cover both small and very large data areas.
[Figure 17 depicts an HSM system with three storage hierarchies and rule-based migration between them.]
Figure 17: Hierarchical Storage Management System (HSM)
HSM systems can be accessed as if they were file systems, meaning that you do not need any special interface to be able to use an HSM system for writing or reading data. The HSM software automatically converts the file and folder paths according to predefined rules. As a result, an HSM system can be accessed by any system as if it were a hard disk of potentially unlimited size. Through the transparent integration of different storage media and architectures, the HSM system presents itself as a single file system. The HSM concept fulfills two main objectives:
- Offering direct, standardized access to a storage pool that can always be expanded. The expansion takes place only within the HSM system; the application that accesses the data is not affected by it and does not have to be switched to a different file system or interface.
- Saving costs through the intelligent use of different media.
The potential cost savings generated by HSM systems can best be demonstrated with an example. Within a given application, some data is created that needs to be accessed often, and other data that needs to be accessed less often. For example, an invoice posted in SAP ERP is accessed most often during the year in which it is first posted. At the end of that year the information is only needed for evaluation purposes, and in the following years only in exceptional cases, for example for audits or test purposes. During the time the document is accessed frequently, the data must be available online at all times, meaning that it must be either in the database or in archive files that can be accessed directly and quickly. Data that is accessed infrequently can also be stored on slower, possibly offline media, which is less costly. It is exactly this aspect that HSM systems address through the integration of different storage systems and the use
As a result, data is migrated automatically and completely transparently for the user; it is not necessary to copy the data manually from one medium to another. A common architecture is a combination of hard disks, jukeboxes with optical discs, and tape robots. In this scenario, one rule could be that files that are accessed regularly are stored directly on the hard disk. If the access frequency falls below a certain threshold, the files are automatically transferred to optical discs; if the frequency falls below another threshold, the data is moved to magnetic tape.
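The rule-based migration described above can be pictured as a simple policy check. The following Python sketch is only an illustration of the idea; the tier directories, age thresholds, and use of the file access time are assumptions, and a real HSM product applies such rules internally and transparently:

import os
import shutil
import time

# Ordered storage tiers: fast and expensive first, slow and cheap last.
# Directory names and age thresholds (in days) are invented for this example.
TIERS = [
    ("/hsm/tier1_disk", 30),       # hard disk for recently accessed files
    ("/hsm/tier2_optical", 365),   # optical media for up to a year without access
    ("/hsm/tier3_tape", None),     # magnetic tape for everything older
]

def target_tier(days_since_access):
    # Return the first tier whose age threshold covers the file.
    for directory, max_age in TIERS:
        if max_age is None or days_since_access <= max_age:
            return directory
    return TIERS[-1][0]

def migrate(file_path):
    days = (time.time() - os.path.getatime(file_path)) / 86400
    destination = target_tier(days)
    if not file_path.startswith(destination):
        file_path = shutil.move(file_path, destination)
    return file_path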
The list of criteria discussed below is not meant to be exhaustive. It merely names a few of the criteria that can come into play when you choose your storage strategy. Each storage strategy should be analyzed based on the specific needs of your company, taking all criteria into account.

4.4.4.1 Security

Under the criterion of security you should ask the following questions in particular:

How secure is the path to the storage system? This question applies to the entire process, starting with the creation of the archive files and ending with their final storage on storage media outside the database. It is essential that the created archive files are transferred to their final storage destination unchanged. This is especially important when archive files are moved from one system to another. The process must ensure that the correct data is transferred to the final storage medium.

How safe is the storage medium? How secure a medium is depends largely on the type of medium being used, such as hard disk, MO, WORM, CD, or magnetic tape. These can differ considerably in terms of security, performance, and lifetime, and this must be taken into account in the purchase decision.

Which backup options does the storage system offer? The storage solution itself, which generally also includes its own database, must offer an adequate concept for securing the data that is stored on it. In addition, the backup options it provides must be integrable into your current system. If not, the implementation of your storage system will generate additional costs, because you will need to maintain and administer different backup systems.

4.4.4.2 Costs

The costs of storage scenarios can vary considerably. The products differ not only in terms of their initial costs, but also in terms of the follow-on costs they generate in the company. These include money spent on support, training, and knowledge transfer, as well as the expense of adapting internal IT infrastructures where necessary. In a long-term view, the continuous operating costs, rather than the initial costs, are the central cost component of a new storage solution.

4.4.4.3 Integration

A new storage system should be integrable into an existing IT landscape, both from a technical and from an administrative point of view. Here it is also important to take into account the costs such an integration would generate.

4.4.4.4 Additional usage

You should ask for what other purposes your new storage system could be used, besides the storage of archive files. Storage systems are usually also used in the context of document management and data backup. Many systems used to store archive files offer comprehensive additional features for document storage and management. Even though your main focus when you first implement a storage system may be data archiving, you should check whether your company is planning to implement a document management or document integration strategy in the future. You can also use your data archiving storage system to back up data from other areas of your business. In sum, a storage system can be useful for storing archive files, and also for backing up other file systems or databases in your company.

4.4.4.5 Performance

From the point of view of the SAP system, the purpose of the storage system is to make the stored data available to the programs that want to access it.
This can mean sequential access to the entire archive file, or direct access to a single business object whose location within the archive file is specified by its byte position and length. In addition, the storage system must support copying archive files completely to, and removing them completely from, the storage area. Generally, these types of operations are possible with all types of storage systems: in the case of an HSM system you would use the corresponding system commands, and in the case of storage systems certified for SAP ArchiveLink the operations are covered by the requirements of the certification.

4.4.4.6 Long-term storage
Especially when you store archive files, you must make sure that the storage system you use is suitable for long-term storage: the data stored there must remain accessible over several years. Statistically, archive files are accessed less and less frequently as time goes by. The less an archive file needs to be accessed, the lower the requirements for access time, until eventually the files can be stored offline and accessed only via manual administration. As access time becomes less important, because the data is accessed only seldom, you can also lower the costs of your data storage by using storage hierarchies. Data that is accessed infrequently or not at all can be moved to slower but less costly data carriers. To be able to use this option, your storage system must support this storage technology and provide time- or data-controlled automatic mechanisms for migrating data from fast and expensive to slow and less costly media.
storage media. When you need to access your data later, you can call up the corresponding print list. Here, however, the archive is not actually accessed.

Index using the Archive Information System
If the archived data needs to be accessed later, you can use the Archive Information System to search for, format, and display the data. For more information, see Chapter 5.4.

Document Relationship Browser
The Document Relationship Browser (DRB, transaction ALO1, user role SAP_DRB) is based on the Archive Information System. You can use it to display the relationships between data within a business process or even across different business processes (see Chapter 5.5).

Display from the application
For some archiving objects you can display the archived data directly from the application in a business view. In this display you cannot see whether the data being displayed has been archived or whether it still resides in the database; the only difference is that the archived data can no longer be changed.

The following chapter focuses on archive accesses using the Archive Information System and the Document Relationship Browser. We will also describe the use of sequential read programs. We will not go into further detail about print list storage, because it is only loosely related to the other topics.
An example of a sequential read program is program RKAARCS1, which belongs to the archiving object CO_ORDER (internal orders). This read program is available via the Read function in Archive Administration. After entering the selection criteria you can execute the program, which brings up the dialog box for selecting the archive files. Keep in mind that the selection criteria do not influence which archive files are offered for selection: you will always be offered all accessible archive files for your analysis, no matter what your selection criteria are. You should therefore make sure that the selection of archive files matches your selection criteria. If you do not choose all relevant files, not all of the data you want to see will necessarily be displayed. If you choose too many archive files, you may face long response times, because the program reads through all files sequentially.

As a next step, the read program reads the selected archive files sequentially and filters the data according to the selection criteria you entered. The selection criteria do not affect the runtime of the program; the runtime is determined by the selection of the archive files. The contents of the archive files are usually displayed as a list. With internal orders you have the option to use this list to navigate to more detailed information; however, this is not very typical for this type of analysis. You can run a read program in dialog mode or in the background. The scheduling of the program is similar to that of a delete program; the difference is that the read program requires a variant for the transfer of the selection criteria.

While the programs available in Archive Administration are usually dedicated archive read programs, there are also programs that were originally developed for evaluations of database data but were later extended to also perform archive accesses. A certain disadvantage of these programs is that users must know whether or not the data is in the archive, and if so, in which archive file it is stored. The advantage, however, is that the data is displayed in a format that users are familiar with. An example of this are the summary reports (Report Writer reports) in Overhead Cost Controlling. Use the function Data Source on the selection screen of this type of read program to indicate that the data should be read from the archive; here you also choose the archive file. From a technical standpoint, the selection of the data source (database or archive) and of the archive file to be read belongs to the selection screen, although the data source selection is not visible directly on the selection screen. This means that when you save a selection variant, the data source is also saved with it. This allows you, for example, to create a variant to select specific archives in the background. In the list that is displayed after you have executed the program, you cannot tell whether the data being displayed comes from the database or from the archive.
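The behavior described above — the selection criteria filter the output while the runtime depends on the archive files chosen — can be summarized in a conceptual Python sketch. It is not the actual RKAARCS1 program; the record layout, field names, and the placeholder for the ADK read routine are assumptions:

def read_data_objects(archive_file):
    # Placeholder for the archiving runtime reading the data objects of
    # one archive file; not a real SAP API.
    raise NotImplementedError("provided by the archiving runtime")

def sequential_read(selected_files, criteria):
    results = []
    for archive_file in selected_files:   # runtime grows with every selected file
        for obj in read_data_objects(archive_file):
            # The criteria only filter the displayed output; they never
            # prevent a selected file from being read completely.
            if all(obj.get(field) == value for field, value in criteria.items()):
                results.append(obj)
    return results

# Hypothetical call: two archive files chosen by the administrator,
# filtered by an (assumed) order number field.
# sequential_read(["CO_ORDER_000001.ADK", "CO_ORDER_000002.ADK"],
#                 {"order_number": "800123"})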
component of the Archive Information System); see Chapter 5.4.1, Creating an Infostructure. Like the archive index, the infostructure is filled with data either directly during the delete phase or subsequently by the user. Also similar to an archive index, the data of an infostructure is kept in a database table. Another component of the Archive Information System is the Archive Explorer. It is used to search for data within an infostructure and allows you to access and display archived data directly.

Each infostructure belongs to exactly one archiving object and to exactly one field catalog. A field catalog is a collection of fields that are used to index the archive files of a specific archiving object. The fields of an infostructure are always a selection of the fields of the corresponding field catalog. The field catalog also contains a series of technical properties that are likewise incorporated into the infostructure. Thanks to the field catalogs, you do not need to know the technical properties of your archiving object to create an infostructure; these properties are already contained in the field catalog. To create an infostructure you only need to select the fields from the field catalog.
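The relationship shown in Figure 18 below can also be pictured as selecting a subset of fields from a given catalog. In the following Python sketch the field names and the infostructure name are invented, and the technical key fields that the Archive Information System adds automatically are only hinted at in a comment:

# Assumed field catalog for a sales-document-like archiving object.
FIELD_CATALOG = ["VBELN", "KUNNR", "ERDAT", "VKORG", "NETWR"]

def create_infostructure(name, catalog, chosen_fields):
    # An infostructure is a named selection of fields from exactly one
    # field catalog; fields outside the catalog are not allowed.
    unknown = set(chosen_fields) - set(catalog)
    if unknown:
        raise ValueError("fields not in field catalog: %s" % sorted(unknown))
    # Technical properties (archive file key, offset, and so on) come from
    # the catalog definition and are added implicitly by the system.
    return {"name": name, "fields": list(chosen_fields)}

infostructure = create_infostructure(
    "ZSD_ORDERS", FIELD_CATALOG, ["VBELN", "KUNNR", "ERDAT"]
)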
[Figure: the fields of an archive infostructure are selected from the field catalog (target field, source field, reference field) of the archiving object, which in turn references the underlying DB tables]
Figure 18: Field Selection in the Field Catalog

In the following section we describe the different uses of the Archive Information System and provide some background to these uses. The steps are listed in the order in which a user or an administrator would normally perform them when the Archive Information System is used for the first time for an archiving object. These functions are available through the central administration of the Archive Information System (transaction SARI). For more information about the Archive Information System, see the application help in the SAP Library.
the archive. The data saved in a generated database table uses up disk space. For this reason it generally makes sense to delete the infostructure data of older archive files after a certain amount of time has passed. In contrast to the build function for archive infostructures, the delete function is not integrated into ADK. This means that the deletion of infostructures has to be started manually. This is particularly important when you reload archives: when you reload archived data (which should only be done in exceptional cases), the active infostructures of the data in question have to be deleted manually, and new infostructures have to be built manually for any archive files that may have been created during the reload.
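Because the deletion of infostructure data is a manual step, it is typically scheduled by the age of the archive files. The following sketch only illustrates the idea of removing index entries older than a retention threshold; the table name, column name, and use of SQLite are assumptions and do not correspond to the generated infostructure tables in an SAP system:

import sqlite3
from datetime import date, timedelta

KEEP_DAYS = 730  # assumption: keep infostructure entries for two years

def delete_old_index_entries(conn):
    # Remove index rows that belong to archive files older than the cutoff.
    cutoff = (date.today() - timedelta(days=KEEP_DAYS)).isoformat()
    cursor = conn.execute(
        "DELETE FROM infostructure_index WHERE archive_date < ?", (cutoff,)
    )
    conn.commit()
    return cursor.rowcount

# Hypothetical usage with an in-memory table:
# conn = sqlite3.connect(":memory:")
# conn.execute("CREATE TABLE infostructure_index (archive_date TEXT, object_key TEXT)")
# deleted_rows = delete_old_index_entries(conn)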
Search in DB
When you select this option, the program only searches for the documents in the database. Archived documents are completely ignored.

Search in DB and SAP AS
If you select this option, the program searches for the documents in the database and in the above-mentioned infostructures of the Archive Information System. The archive itself, however, is not accessed. As a result, not all fields may be filled in the results list, or not all desired records may be found, because the program treats fields that are not contained in the infostructure as empty and skips them.

Search in DB, SAP AS and Archive
When you select this option, the program searches for the documents in the database and in the Archive Information System. For documents found via the Archive Information System, any data that is still missing is then read from the archive. This means that here, too, only documents that are indexed in an appropriate infostructure are read.

This selection option only determines what will be displayed in the results list of the program, not which linked documents DRB eventually finds. It is therefore possible that, although you selected the option Search in DB, archived data will be displayed in DRB as linked objects. In many cases only the two options Search in DB and Search in DB, SAP AS and Archive return meaningful results. The option Search in DB and SAP AS is often faster than the latter option, but it frequently leads to confusing results, because the end user usually does not know which fields are contained in the infostructure and what effects this has on the selection.

Unlike in Financial Accounting, in Logistics DRB does not display archived documents in the same way as database documents. However, the display transactions for archived documents were designed to be similar to the corresponding display transactions for documents still in the database, and all the important fields are displayed. If the documents are still in the database, the usual document display transactions, such as VA03, are used. All other Logistics object types are connected to DRB in a manner similar to sales orders; the only differences are in the field catalogs used and in the fields that can be used to make selections and that can be integrated into the infostructures. For more information, see the documentation for the application-specific components of the Document Relationship Browser.
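The three options can be read as an increasing search scope, as in the following Python sketch. The three lookup functions are placeholders for database, infostructure, and archive access and are not real SAP APIs; the origin field used to decide whether an archive read is needed is also an assumption:

from enum import Enum

class Scope(Enum):
    DB = 1              # Search in DB
    DB_AS = 2           # Search in DB and SAP AS
    DB_AS_ARCHIVE = 3   # Search in DB, SAP AS and Archive

def search_documents(criteria, scope, db_lookup, as_lookup, archive_lookup):
    hits = list(db_lookup(criteria))
    if scope in (Scope.DB_AS, Scope.DB_AS_ARCHIVE):
        # Only documents indexed in an infostructure can be found here;
        # fields missing from the infostructure stay empty.
        hits += list(as_lookup(criteria))
    if scope is Scope.DB_AS_ARCHIVE:
        # Complete archived hits by reading the missing fields from the archive.
        hits = [archive_lookup(hit) if hit.get("origin") == "archive" else hit
                for hit in hits]
    return hits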
First you must create a selection variant for every program you want to use. You can use the field properties of the selection variant to preset and hide the Search in field. If you start the program with this variant, users no longer see these fields on the selection screen, and the desired value is used automatically. You can do the same for the entry lists of accounting documents and for the line item reports in cost accounting. However, here it is not possible to hide the fields for the data source selection, because they do not appear on the selection screen anyway; they are simply saved together with the variant. Of course, you can also control the entry lists for accounting and cost accounting documents via table ASACCESS01, as mentioned above. However, this means that all users are affected by any changes. If you really want to configure the system in such a way that the cost accounting line item reports automatically read from the archive for all users, then it is preferable to make your settings via table ASACCESS01.

After you have created variants for all programs you want to use, you can enter these programs in a role (transaction PFCG). If such a program is called from the role it was assigned to, it starts automatically with the presettings of the variant. This way you can create a role that contains all programs that lead to DRB and that are configured in such a way that they automatically access the archive. You can also use this mechanism to preset selection criteria other than those we have named.

5.5.3.2 Choosing Entry List Fields

Many of the programs contained in the role Document Relationship Browser were created with the help of the ABAP List Viewer. Whenever a list is displayed, you can change the layout, save the changed layout, and set a layout as the default. These settings can be user-specific or valid for all users.

5.5.3.3 Choosing Object Types To Be Displayed

The representation of complex business events and processes is usually also complex in DRB. In addition, due to the large number of object types that DRB supports in SAP ERP, you could face long runtimes when relationships are determined, because the program tries to determine all links, even if the user does not need all object types. For example, a user may be interested in the logistical chain of a business process, but not in all the details in accounting. Here it makes sense to simply hide the unwanted object types. The tool for configuring a more selective display is personalization. Depending on whether the settings are to be valid for individual users or for a role, you can call up the personalization tool either via User Maintenance (transaction SU01) or via Role Maintenance (transaction PFCG). Settings made for a role can be automatically copied to all users that are assigned to this role. In the role Document Relationship Browser the selection of object types is set so that all object types are displayed. When you hide object types, keep in mind that the documents in question are not only removed from the display, but also that they can no longer be used for determining other relationships. This means that not only the objects that were explicitly hidden are excluded from the display, but also the objects that depend on the hidden objects.

5.5.3.4 Choosing Fields in DRB

In the DRB navigation tree the default display only includes the type and description of the object. You can expand this display by adding other relevant fields.
In addition to the technical fields of the object key and the object type, two fields are particularly important here: the field Logical System shows in which logical system the data originates; this is relevant when you are looking at cross-system processes or business events. In the context of data archiving, the field Origin is particularly relevant: it shows whether a displayed business object is in the database or in the archive. Similar to the procedure for entry lists, here you can use layouts to make your field selection, save user-specific layouts, and create defaults.

5.5.3.5 Improved Performance in the DRB Tree

When displaying the document relationships in the DRB tree, DRB always determines the links one step ahead of the level that is currently being displayed, so that the expand symbol only appears for those nodes that actually have subnodes. This process may negatively affect performance when you call up DRB and when you expand documents in the DRB tree. This can be prevented by choosing the Optimum Performance option in the DRB personalization function under Settings for DRB Tree. If this option is set, DRB only determines the nodes that are to be displayed. You can also choose the Optimum Display option if performance is not an issue. For further measures to improve the runtime of DRB, see SAP Notes 558462 and 497820.
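The difference between Optimum Display and Optimum Performance corresponds to determining links one level ahead versus only on demand. The following Python sketch is a simplification; the node structure and the link-determination placeholder are assumptions:

def get_linked_objects(object_key):
    # Placeholder for the (potentially expensive) link determination.
    raise NotImplementedError("provided by the DRB link determination")

class DrbNode:
    def __init__(self, key):
        self.key = key
        self._children = None   # links not yet determined

    def determine_children(self):
        if self._children is None:
            self._children = [DrbNode(k) for k in get_linked_objects(self.key)]
        return self._children

def display_node(node, optimum_display):
    children = node.determine_children()
    if optimum_display:
        # Optimum Display: also determine each child's links one step ahead,
        # so the expand symbol appears only where subnodes really exist.
        expandable = [len(child.determine_children()) > 0 for child in children]
    else:
        # Optimum Performance: skip the look-ahead and resolve links only
        # when the user actually expands a node.
        expandable = [True] * len(children)
    return {"node": node.key,
            "children": [child.key for child in children],
            "expandable": expandable}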
Appendix A. Glossary
The terms included in this glossary are important for understanding the archiving concept introduced in this handbook. For other terms related to data archiving, see the SAP Terminology Database.

Storage system
A storage or archive system connected to the SAP system via a certified interface. It is used to store the archive files that are created during the write phase. A storage system can contain a variety of different storage media, but in general it is based on optical storage media such as CD-ROMs and WORMs.

ADK, see Archive Development Kit

Archive
The total of data saved in archive files.

Archive administration
A function that is called using transaction SARA. It is the central starting point for most user activities in data archiving, such as the planning of write and delete jobs, building and deleting indexes, and storing and retrieving archive files.

Archive file
A file that is created in the file system of the SAP system by the write program. It contains the archived data. The maximum size of an archive file can be determined by the user. It can contain one or several data objects and belongs to exactly one archiving session.

Archive Development Kit
Abbreviation ADK. The technical framework and basis for the SAP data archiving concept. ADK is a software layer between the SAP applications and the archive. It is the runtime and administration environment for most of the functions of SAP data archiving. ADK also provides a programming interface (ADK API) for the development of archiving programs by SAP or customers.

ArchiveLink
The interface used to control communication between an SAP application and an external storage system. It can be used to store data and documents created within SAP applications, and also to access this data.

Archive information structure
The central element of the Archive Information System. It is a kind of index that is created based on a field catalog (see Field catalog) and is used to find data in the archive.

Archive Information System
Abbreviation AS. A generic tool used to conduct searches in archives. It is integrated into the data archiving process. Searching and displaying the data is based on archive information structures, which are defined by the user and can be filled with data from the archive.

Archiving session
An archiving unit that is made up of a write and a delete phase and an optional storage phase. In addition to the actual archiving process, the data set written to the archive during the write phase is also called the archiving session. This session can be viewed as a whole in archive management under a unique ID, the archiving session number.

Archiving object
A logical object that encompasses all the application data linked through a business process that must be written to archive files and then deleted from the database. In addition, it comprises the corresponding archiving programs and the Customizing settings.

Archiving programs
The general term for all programs used during archiving, such as write, delete, read, and reload programs.
Archiving class
A mechanism used to archive certain data shared by different business objects, together with data objects from applications. Examples of archiving classes are SAPscript texts, change documents, and classification data.

Archive management
The part of Archive Administration in which archiving sessions and archive files for a specific archiving object are displayed and managed. It can be reached via Archive Administration (transaction SARA).

Business object
The representation of a central business object from a real-life scenario, such as an order or an invoice. From a technical viewpoint, a business object is an instance of a business object type with concrete values. Business objects are managed in the Business Object Repository (BOR). See also Data object.

Code page
Vendor- and hardware-specific coding that assigns a hexadecimal value to each character in a character set.

Data object
A logical data processing unit used in ADK. It contains application data that is linked through a business process. During data archiving it is written as a whole to the archive file by the write program. In an application context it corresponds to a business object (see Business object). The archiving object determines from which database tables data records are incorporated into the data object (see also Archiving object).

Direct access
Also called single document access. A read access to individual data objects in the archive. The pointer is positioned at the beginning of the data object within the archive file, and only the data object specified during the selection is read (for example, an invoice). This method uses an index, which can be generated, for example, with the Archive Information System.

Document storage
The electronic storage and management of documents, such as original documents, outgoing documents, and print lists, on an external storage system. See also Optical archiving.

Print list
The result of an application program, presented in list form. The print list can be printed on paper and/or stored in a storage system using SAP ArchiveLink.

Field catalog
A collection of fields that can be used to create or maintain an archive information structure.

HSM system
The abbreviation used for Hierarchical Storage Management system. This storage solution automatically distributes data according to individual rules (for example, the frequency of data accesses) across a hierarchy of different storage media (such as hard disks, MO discs, and magnetic tapes). To the accessing system the HSM system appears as a file system that stores files under a logically unchangeable file path.

Infostructure, see Archive information structure

Jukebox
An automated storage unit with different storage media. It is composed of disc drives, storage compartments, and a robot mechanism that automatically changes the optical discs. It facilitates access to comprehensive data archives without the need for intervention by an operator.

Knowledge Provider
Abbreviation KPro. A central SAP Basis service of the SAP NetWeaver Application Server, used for the storage and management of any documents or document-like objects.

KPro, see Knowledge Provider

Read program
A program that reads and evaluates archived data or outputs this data in list form.

Delete program
A program that first reads the data previously written to the archive by the write program and then deletes the corresponding data from the database.

Delete job
The execution of the delete program in the background. The delete job can be executed before or after the archive file has been stored in the storage system, depending on the Customizing settings.

Delete phase
The part of the archiving session during which data is deleted from the database. The delete phase of an archiving session begins with the start of the delete program for the first archive file. It ends when all files that belong to the archiving session have the status Deletion Complete.

Metadata
The information stored in archive files that is used to achieve platform-independent storage and interpretation of the stored data. Examples: schemas of the database tables, data type and length of a column, number format, code page.

Postprocessing program
A program that can be executed after the delete phase to perform further application-specific procedures on the data in the database (for example, the updating of statistics).

Offset
A value in relative addressing that specifies how far a certain element or a specific position is located from the starting point. The starting position of a data object in an archive file is specified by an offset.

Optical archiving
A widely used, although unfitting, term for document storage. The storage system used in this process is based on optical media, such as CDs and WORMs, which is why the expression refers to "optical" archiving.

Residence time
The amount of time that has to have passed before application data can be archived. The basis for calculating the residence time can be the entry date, the posting period, or the goods issue date, depending on the application. It is usually expressed in days.

Reload program
A program that reads archived data from the archive and reloads it into the database.

SARA, see Archive administration

Write program
A program that selects the data to be archived in the database and writes it to one or more archive files.

Write job
The execution of the write program in the background.

Write phase
The part of the archiving session during which data is written from the database to an archive file. The write phase of an archiving session begins and ends with the execution of the write program.

Sequential reading
A read access to archived data. The pointer is positioned directly at the beginning of the archive file and is moved forward sequentially. The Archive Development Kit transfers the data objects to the read program, which compares the data with the selection criteria entered by the user and evaluates or outputs the matching data. The read process is completed as soon as the end of the file has been reached.
When several archive files or archiving sessions are evaluated, the read process ends when the end of the last file has been reached.

Management data
The additional information about archive files and archiving sessions. Management data is stored in the database. Examples: number and size of data objects, archiving status, logical file path, physical file name, all archive files that belong to one archiving session, etc.

Preprocessing program
A program that can be executed before the write phase to check the data for archivability and to prepare the data for archiving, for example by setting a deletion indicator.
SAP Library
The following table contains an overview of the archiving documentation in the various solutions or components of the SAP Business Suite. You can find this documentation either via the SAP Library or the SAP Help Portal (http://help.sap.com).

General documentation on SAP data archiving:
SAP NetWeaver Library -> SAP NetWeaver by Key Capability -> Solution Life Cycle Management by Key Capability -> Data Archiving (CA-ARC) -> Introduction to Data Archiving

General documentation on SAP data archiving (XML):
SAP NetWeaver Library -> SAP NetWeaver by Key Capability -> Solution Life Cycle Management by Key Capability -> Data Archiving (CA-ARC) -> Introduction to Data Archiving -> XML-based Archiving

SAP NetWeaver AS:
SAP NetWeaver Library -> SAP NetWeaver by Key Capability -> Solution Life Cycle Management by Key Capability -> Data Archiving (CA-ARC) -> Data Archiving in SAP NetWeaver AS

SAP ERP:
SAP ERP Central Component -> Scenarios in Applications -> Data Archiving (CA-ARC)

SAP CRM:
SAP Customer Relationship Management -> Components and Functions -> Basic Functions -> Data Archiving

SAP NetWeaver BI:
SAP NetWeaver Library -> SAP NetWeaver by Key Capability -> Information Integration by Key Capability -> Business Intelligence -> Data Warehousing -> Data Warehouse Management -> Information Lifecycle Management -> Data Archiving Process
Helmut Stefani, SAP Press, 405 pages, 2007, ISBN 978-1-59229-116-8, USD 69.95
Training Courses
BIT660 Data Archiving
A three-day introductory course on the technology and concepts of SAP data archiving. Data archiving is explained using examples from the most important SAP ERP and SAP NetWeaver components.

BIT670 Data Archiving (Programming)
This two-day course explains how to develop read and analysis programs for archived data.

BIT614 Document Management at SAP: An Overview
An overview of the different document management options in an SAP system. The focus of this two-day course is the storage and management of documents using different SAP components such as SAP Content Server, SAP ArchiveLink, Records Management, Knowledge Warehouse, etc.

BIT615 Document Storage using SAP ArchiveLink
An introduction to document storage using SAP ArchiveLink. This two-day course discusses the storage of original documents in a storage system and their relationship with SAP business objects.

BC680 Data Retention Tool
An introduction to the Data Retention Tool (DART). DART is a tool used in the context of GDPdU to help provide data extracts to financial authorities during audits.