Captiva PDF
5 Tutorial
M. Scott Roth
February 2012
v1.0
www.armedia.com
msroth.wordpress.com
Table of Contents
1 Introduction
2.2.1 IA Values
3 Module Setup
3.1.1 Scanner
3.3 OCR
3.5.4 Folders
3.6.2.2 Mappings
5 Wrap Up
First, there is virtually zero information on the Internet or the EDN about how to set up
and develop a solution using InputAccel. One good website I found, EMCCaptiva.net,
was all about version 5.3 of InputAccel. InputAccel went through a major redesign with
version 6, so not much of this website was relevant.
The documentation that comes with InputAccel can be overwhelming. Much of the
documentation is contained in hyperlinked Windows help files and, at least to my mind,
doesn’t provide enough context or flow to function as a tutorial.
There are at least five different ways to configure and deploy InputAccel discussed in the
installation documentation. However, the discussions are all predicated upon you
knowing something about InputAccel and don’t offer any best practices or rules of thumb
to get you started.
The CaptureFlow Designer, the new GUI design tool that was released with InputAccel
version 6.0, has no documentation, though it does have online help and some sample
processes that are helpful. Because this tool is so new, no one on the Internet or EDN
professed knowledge of how to use it either.
Any process built in Process Developer (the previous InputAccel process development
tool) is not compatible with CaptureFlow Designer. This means there is no way to
reverse-engineer an existing process into the new tool for learning purposes.
There is not a unified debugging environment. Tracking down an error or root cause of a
failed process step requires checking several logs and servers.
And last, much of the “development” of an InputAccel process is configuration of the
modules. However, most of this configuration is hidden and not obvious. There is also
no unified development/configuration environment, so I was continually bouncing around
among the various modules to get the process configured and running.
Here’s why developing, configuring and deploying an InputAccel solution was easier than I
expected (the caveat being: “once I figured it out”):
First, the CaptureFlow Designer GUI tool is easy to use once you get the hang of it. You
simply drag modules from a palette onto a workspace and connect them to form your
process. Hopefully future versions of the tool will continue to refine its function and
expand its capabilities (e.g., better error handling, better custom variable configuration,
direct access to module setup, etc.).
The purpose of this paper is to describe the simple process I designed and implemented using
CaptureFlow Designer, various InputAccel modules, and the InputAccel Administration
Console. I have tried to capture the “gotchas” I encountered and tips for avoiding them or
remedying them, and advice given to me by seasoned InputAccel developers. My hope is that
someone will benefit from this narrative.
One note to make here: the users were used to scanning and indexing their own documents.
This setup is a little different from how InputAccel usually works. Usually, the tasks in an
InputAccel process are executed by different people in more of a pipeline paradigm. For
example, scanner operators do nothing but scan documents and make sure the quality is
acceptable. Once they release the document, it flows to a QA person who applies image
enhancement filters and techniques to improve the quality of the documents. QA releases the
document to someone who reads the document and applies the indexing metadata, etc.
In my process, a single person performs both the scanning and indexing. All of the other tasks
are automated. The result of this configuration is that when users apply indexing metadata to
their documents, they see all the other users’ documents in the IndexPlus module’s queue. For
my customer this was okay, but it is not the normal implementation of InputAccel.
In addition, there is a branch in the logic I have not depicted here, but will discuss in Section 2.1.
During the Indexing step, if the user fails to enter a ‘Case Number’, the ‘Update ODBC
Database’ step is skipped.
Once all the steps of your process are in place and in the right order, right-click on each one to
rename them. The name you give each step here will be the name used later by the InputAccel
server while processing documents.
In addition to naming your steps, you must also indicate at what level each step is triggered. You
do this by right-clicking each step and choosing the Level option. Your choices of trigger level
vary by step, but are generally between 0 and 7. Essentially you are telling each step at what level
in the stack of scanned pages its operation should occur. For example, documents are scanned
and released to the InputAccel server in batches. Image enhancement is applied to each page in
the batch. Indexing occurs on a per-document basis. See the Captiva InputAccel document,
System Overview: The Basics of InputAccel, for a good description of what these trigger levels
are and what they do.
IA Step              Level
ScanPlus             Batch (Level 7)
ImageEnhancement     Page (Level 0)
OCR                  Document (Level 1)
Index                Document (Level 1)
DocumentumExport     Document (Level 1)
UpdateDatabase       Document (Level 1)
End                  Delete Batch
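To make the effect of the trigger levels concrete, here is a minimal Python sketch (not InputAccel code) that counts how often each step would fire for a single batch. The step names and levels come from the table above; the `trigger_count` helper and the sample batch are hypothetical and exist only for illustration.

```python
# Trigger levels taken from the table above
# (0 = page, 1 = document, 7 = batch).
STEP_LEVELS = {
    "ScanPlus": 7,
    "ImageEnhancement": 0,
    "OCR": 1,
    "Index": 1,
    "DocumentumExport": 1,
    "UpdateDatabase": 1,
}


def trigger_count(step, pages_per_document):
    """Return how many times a step would fire for one batch.

    pages_per_document holds one page count per document, e.g. [3, 2]
    for a batch made of a 3-page and a 2-page document.
    """
    level = STEP_LEVELS[step]
    if level == 0:   # page level: once per page
        return sum(pages_per_document)
    if level == 1:   # document level: once per document
        return len(pages_per_document)
    if level == 7:   # batch level: once per batch
        return 1
    raise ValueError(f"level {level} is not modelled in this sketch")


batch = [3, 2]  # hypothetical batch: two documents, five pages in total
for name in STEP_LEVELS:
    print(f"{name}: {trigger_count(name, batch)} trigger(s)")
```

Only levels 0, 1 and 7 are modelled because those are the only ones used in this process.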
When you insert steps into the CaptureFlow Designer process, some of these variables are
defined for you. For example, when you insert the ImageEnhancement step after the Process
node, CaptureFlow Designer automatically assigns ImageEnhancement:0.ImageInput
= ScanPlus:0.OutputImage. This assignment passes the scanned pages from the
ScanPlus step to the ImageEnhancement step. Without this assignment, the
ImageEnhancement step would never trigger.
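The references used in these assignments follow a Step:Level.ValueName pattern. The following Python sketch splits such a reference into its parts, which can help when reviewing a long list of assignments by hand. It is an illustrative parser written for this paper's notation only; the function names are hypothetical and nothing here is part of an InputAccel API.

```python
import re

# Matches references written as Step:Level.ValueName, for example
# ScanPlus:0.OutputImage. Levels are limited to the 0-7 range.
IA_REF = re.compile(r"^(?P<step>\w+):(?P<level>[0-7])\.(?P<name>\w+)$")


def parse_ia_reference(text):
    """Split 'Step:Level.ValueName' into (step, level, value name)."""
    match = IA_REF.match(text.strip())
    if match is None:
        raise ValueError(f"not a Step:Level.ValueName reference: {text!r}")
    return match["step"], int(match["level"]), match["name"]


def parse_assignment(line):
    """Split 'target = source' into two parsed references."""
    target, source = line.split("=", 1)
    return parse_ia_reference(target), parse_ia_reference(source)


target, source = parse_assignment(
    "ImageEnhancement:0.ImageInput = ScanPlus:0.OutputImage"
)
print("receives:", target)
print("supplies:", source)
```

In this example both sides of the assignment sit at level 0, which matches the page-level trigger of the ImageEnhancement step.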
2.2.1 IA Values
InputAccel has hundreds of pre-defined variables that can contain or receive values throughout
the process. These pre-defined values are called “IA Values”, and just like steps, they exist and
operate at different levels. You can find descriptions of many of these variables by clicking the
Help button in the CaptureFlow Designer tool bar, and searching on “IA Values”.
When you click the <IA Value> link on the Assign to line in the Assign Values window, the
assignment window expands and the autocomplete feature engages to display the names of all of
the steps in your process. Simply select the step that contains the IA value you would like to set.
Next, you will be presented with a numbered list, 0 – 7. These represent the trigger levels at
which the IA values are defined. Select a level and autocomplete will display a list of all
available IA values at that level. Choose a value and click OK to close the assignment window.
Once you have completed an assignment, you will notice a More… link on the right side of the
Assign Values window for the assignment you just made. If you click on the More… link, the
assignment window will expand to display some options for applying conditions to your
assignment. Note that the condition must evaluate to TRUE or FALSE.
Simply define the variables you will need in your process. These variables will be available for
assignment in the Assign Values window, the same as IA Values. Note that the Level column
assigns these variables to a trigger level of the process, the same as the levels discussed in
Section 2.1.
2.2.3 My Assignments
The key assignments I made in my process to pass content and data from one step to the next
using the IA Values and Custom Values are:
2.3 Branching
Creating a branch, or an if-then decision, in your process is easy. Simply drag the Decision node
from the CaptureFlow Designer Steps palette into your process (see Figure 6). The interesting
part of the branch is setting up the condition to cause the branch to execute.
To enter the condition that will cause the branch to execute, click the Condition1 link. This will
open the Condition Editor. Give the condition a name (this will appear on the process diagram).
In my example, I created a condition called “Has Case Number” with the expression
CustomValues:1.Case_Number <> "". So, when the process gets to the branch node,
the InputAccel server will look to see if the Custom Value Case_Number has a value. (This
value will be entered in the indexing step discussed later in Section 3.4.) If it does, then the
branch is executed. Otherwise, the branch is ignored.
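The logic of this branch can be sketched in a few lines of Python. This is only a model of the decision, not code that runs inside InputAccel: the dictionary standing in for the Custom Values, the function names, and the assumed step order after indexing (taken from the table in Section 2.1) are all illustrative. The sketch also assumes that a missing value behaves like an empty string.

```python
def has_case_number(custom_values):
    """Mirror the condition CustomValues:1.Case_Number <> ""."""
    return custom_values.get("Case_Number", "") != ""


def steps_after_index(custom_values):
    """List the steps a document would pass through after indexing."""
    steps = ["DocumentumExport"]
    if has_case_number(custom_values):
        # The branch executes only when a case number was entered.
        steps.append("UpdateDatabase")
    steps.append("End")
    return steps


print(steps_after_index({"Case_Number": "2012-0042", "Doc_Type": "Memo"}))
print(steps_after_index({"Case_Number": "", "Doc_Type": "Memo"}))
```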
2.4 Compiling
When you save your process in CaptureFlow Designer, the file you create is an XPP file. To
prepare the process for deployment to the InputAccel server, compile the process by selecting
Compile from the CaptureFlow Designer Quick Access Toolbar. Compiling your process will
create an IAP file and an IPP file, either of which can be deployed to the InputAccel server (see
Section 2.5).
Note that IPP files can be loaded into Process Developer, InputAccel’s legacy design tool. Loading
an IPP file in Process Developer will reveal all of the code CaptureFlow Designer auto-generated
for you. However, any changes you make to the process in Process Developer cannot be loaded
back into CaptureFlow Designer.
2.5.1 Installation
Unlike the Process Developer tool, which allowed you to install your compiled process directly from
the development environment, CaptureFlow Designer has no such capability. To install a
CaptureFlow Designer process, you need to logon to the InputAccel Administration Console and
manually add the process. See the online Captiva Administration Guide for details pertaining to
this procedure.
Once a process is installed on the InputAccel server, you can update it by right-clicking on it and
choosing Add Upgraded Process from the context menu (see Figure 7). A major caveat here is
that you can only upgrade processes that have not had steps added, deleted or rearranged.
Otherwise, this procedure will create a new process on the InputAccel server for you, instead of
updating the selected process. Processes with changed IA Value expressions or new/deleted
Custom Values upgrade with no problem; just don’t change the steps once they are defined and
installed. If the steps in your process change, you can copy the module settings from one process
to another. See Section 2.5.2.
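Before attempting an upgrade, it can help to compare the step list of the installed process with the revised one. The Python sketch below expresses the rule described above. The step lists are typed in by hand, the function name is hypothetical, and step names are assumed to be unique within a process; InputAccel itself performs no such check through this code.

```python
def classify_step_change(installed, revised):
    """Describe how the ordered step list changed between two versions."""
    if installed == revised:
        return "unchanged"      # safe to use Add Upgraded Process
    if sorted(installed) == sorted(revised):
        return "rearranged"     # same steps, different order
    added = [s for s in revised if s not in installed]
    deleted = [s for s in installed if s not in revised]
    if added and deleted:
        return "added and deleted"
    if added:
        return "added"
    if deleted:
        return "deleted"
    return "changed"


installed = ["ScanPlus", "ImageEnhancement", "OCR", "Index",
             "DocumentumExport", "UpdateDatabase"]
revised = ["ScanPlus", "ImageEnhancement", "OCR", "Index",
           "DocumentumExport"]

change = classify_step_change(installed, revised)
print(change)
print("upgrade in place" if change == "unchanged" else "expect a new process")
```

Any result other than "unchanged" corresponds to the case where the Administration Console creates a new process instead of updating the selected one.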
The copy/paste feature is a huge time saver. You will have a much greater appreciation for it
after reading the configurations performed on each module discussed in Section 3.
To open a role, select it and then click the Settings button (see Figure 10). To understand the
role permissions available, see “Roles” in the online Captiva Administration Guide. The lower
two boxes on the screen (Available Members and Selected Members) are used to add or remove
users from this particular role. To add a user not already in the Available Members box, click
the Find Member button.
The Find Member button opens a search screen that will allow you to search the local machine
or domain for users to add to this role (see Figure 11). Enter the necessary information to find
the user and click the Search button. The search will populate the Results list box when it
completes. Select a user and click OK to add them to the role.
Any user who launches a module must be a member of the “Module Operator” role; this includes
users who will launch modules manually, as well as system users who will run unattended tasks
(e.g., OCR). In addition, any user who is doing scanning should be a member of the “Scan
Operator” role and any user doing indexing should be a member of the “Index Operator” role.
So, in my example process, all of my users are members of “Module Operators”, “Scan
Operators”, and “Index Operators”. My local system account on the IA Module server is a
member of the “Module Operators” group because it will run the unattended tasks.
If you prefer to launch the modules manually, you can do so from the command prompt. There
are two types of modules used by InputAccel: EXE modules and DLL modules. The EXE
modules can be launched directly. The DLL modules must be “hosted” by a program named
QuickModuleHost. The table below lists the syntax to launch each of the modules in my
process in setup mode. Note that you will need to know the process ID for which you are
configuring the module. The process ID can be found on the InputAccel Administration
Console. For my process, the process ID is 41601, and the InputAccel server name is
IAServer.
One final note about setting up modules is that I have had the best results launching the modules
in setup mode from the actual machines where the modules will be running, while logged in as
the user who will be running the process step. For example, I set up the ScanPlus and IndexPlus
modules from my workstation where those modules will be run by me. The ImageEnhancement,
OCR, DocumentumExport and ODBCExport modules I launched from the IA Module Server
where they will be run by a local system account. To accomplish this, I logged onto the server as
the user who will be running the modules, then accessed the InputAccel Administration Console
and double-clicked the module to launch it in setup mode.
3.1 ScanPlus
I have found that there are three important areas to configure in the ScanPlus module: Scanner,
Event Actions, and Auto-Batch Creation.
3.1.1 Scanner
Creating a scanner configuration and profile is straightforward. See the online help for ScanPlus
for detailed information. It is important that you create a scanner configuration profile for
each scanner in your system, meaning if each user has a different model or make of scanner,
create a profile for each of their scanners (see Figure 12). This will make deployment much
easier.
In my process, the image enhancement step applies a de-skew filter (see Figure 15) and occurs
automatically with no user intervention. You can configure the specifics of each filter by
right-clicking on the document thumbnail in the Filter list and choosing Properties.
If you are using a barcode detection filter it must be the first filter applied to the image or it
won’t detect the barcodes. If you do configure a barcode filter, note that it won’t evaluate the
sample image until you click on the filter thumbnail in the Filter list. I have had the best results
using the “Extended Barcode Detection Filter” and setting the Decode symbols option to off
(i.e., unselecting it).
The only OCR configuration needed for my simple example is to create a full-text searchable
PDF file. This is accomplished by creating a New Format on the Output Formats tab. For the
output Format, choose “Adobe PDF with image on text”, and set the Level to “Auto”. Figure 16
depicts this configuration.
In my process, the OCR step occurs automatically, with no user intervention. Recall from
Section 2.2.3 where I assigned IA Values, that the output from this step (i.e., the PDF file) is the
input for the Documentum export step.
DocumentumExport:1.InputFile1 = OCR:1.OutputFile1_OutputFile
One caveat with the OCR module: if you have sparse content with diverse character sets,
sometimes the module gets confused about which recognition engine it should use and can throw
memory fault exceptions. For example, I have noticed in my system that when I have
computer-generated fax coversheets with a printed header in the top third of the page, and a handwritten
personal note below that, the OCR module can’t decide which recognition engine to use, print or
handwriting. Content of this type frequently throws memory fault exceptions. I believe it is
because there is not a large enough sample set for the module to determine which recognition
engine gives the best results. To alleviate this problem, I manually set the Recognition engine to
“Machine Print (OMNIFONT_MTX)”. See Figure 17.
3.4 Indexing
The Indexing step is where users enter metadata about each document scanned, for example: a
memo’s subject, or the sender and recipient’s names, or the type of document. Each field that is
indexed should correspond to a Custom Value or an IA Value discussed in Section 2.2. These
values can then be used later in the process or exported to Documentum or a database.
To configure the Index step, launch IndexPlus in setup mode and click the Settings tab, then the
Settings… button.
In my example, I only request that the user enter a Case Number (optional) and the Document
Type (required). The Case Number entered is stored in the CustomValues:1.Case_Number
variable and later used to determine if an external database needs to be updated. The Document
Type entered is stored in the CustomValues:1.Doc_Type variable and is later passed to
Documentum as an attribute of the content object.
To add indexing fields, first create an Index Family in the Index Family window on the left. Be
sure to un-check the Generate value names automatically checkbox. Then add indexing fields
in the window on the right using the Insert … button at the bottom of the window. See Figure 18.
Following are the specifications for the Case Number indexing field:
Following are the specifications for the Document Type indexing field:
3.5.3 Cabinet
The first step in the configuration task was to define the cabinet in which to build the folder
structure. Right-click on the node representing the repository and choose Add definition… ->
Cabinet. On the Object tab (see Figure 21), make the following settings:
3.5.4 Folders
Right-click the cabinet node (called ExistingCabinet0) and choose Add definition… -> Folder. A
new folder is added to the cabinet; this will be the year portion of the storage path. Select the
folder to display the definition settings for it. On the Object tab (see Figure 22), make the
following settings:
Now, click the Attribute tab to define the attributes that will be set for the new folder object (see
Figure 23). Click the Populate Documentum system attributes checkbox and then click the
New button. In the Attribute name column, select object_name from the drop down list. In the
Value cell, enter or select @(CustomValues.y). Recall in Section 2.2.3, we defined the
CustomValues:1.y value to contain the current year.
Select the NewFolder0 node and create a subfolder for the month portion of the storage path in
the same manner you created the year folder. The value for its object_name attribute should
be @(CustomValues.Month).
Repeat the process once more for the day folder. Give the object_name a value of
@(CustomValues.Day).
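The nested year, month, and day folders produce a storage path that can be previewed with a short Python sketch. The cabinet name "Scans", the function name, and the zero-padded month and day are assumptions made for illustration; the real folder names are whatever the date Custom Values contain when the export step runs.

```python
from datetime import date


def folder_path(cabinet, scan_date):
    """Build the cabinet/year/month/day path for a given date."""
    year = f"{scan_date.year:04d}"
    month = f"{scan_date.month:02d}"   # zero padding is an assumption
    day = f"{scan_date.day:02d}"       # zero padding is an assumption
    return "/".join(["", cabinet, year, month, day])


# "Scans" is a hypothetical cabinet name.
print(folder_path("Scans", date(2012, 2, 14)))
```

Each path segment corresponds to one folder definition in the export module, so a document exported on that date would land three folders deep under the cabinet.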
Select the last folder you defined, right-click and select Add definition… -> Document. On the
Object tab, make sure the Object type is set to Document (dm_document). All of the other
settings on this tab should default to the same settings as the folders you configured.
For my correspondence, I set two attributes, object_name and title, using IA and Custom
Values entered or derived earlier in the process. Click the Attribute tab, make sure the Populate
Documentum system attributes checkbox is checked, and click the New button. Add the
object_name attribute and the title attribute. For the object_name attribute, assign it a
value of @(Batchname), and for the title attribute, assign the value
@(CustomValues:1.Case_Number). See Figure 24.
The @(Batchname) IA value contains the name of the scan batch which is controlled by the
auto-batch creation template we defined in the scanning module (see Section 3.1.3). The
@(CustomValues:1.Case_Number) contains the case number entered by the user in the
Index step (see Section 3.4).
In the Export content actions window, select Export content files with these settings. In the
File Value field, choose DocumentumExport.InputFile1 as the source of the content file
to export. This IA value contains the scanned image after it has gone through the Image
Enhancement filters we defined in Section 3.2. Recall that we specifically assigned the value of
DocumentumExport:1.InputFile1 = OCR:1.OutputFile1_OutputFile back
in Section 2.2.3.
Next, choose to Copy the file without modification (source can be any type). Also choose
Acrobat PDF (pdf) in the Documentum content type section.
There are other options here. If I had assigned a “raw” type to the
DocumentumExport:1.ImportImage variable and selected the Save the file with these
settings (source must be an image type) option, the Documentum export module could
convert the file to the type specified in the File Type field (e.g., a searchable PDF). For more
information about this option, see the Documentum Advanced Export Module’s online help
guide.
3.6.2.2 Mappings
Click the New… button in the Mapping panel to create a new mapping (i.e., a new query). The
Edit Mapping window will appear (see Figure 27).
In the Data source field, select the DSN you configured in Section 3.6.2.1.
In the Export from Level field, select Document. Remember from Section 2.1 that
InputAccel maintains scan information at numerous levels (e.g., batch, document, page).
We are configuring the ODBC Export module to trigger and export information on a per
document basis.
In the Action panel, choose Run SQL. This is not the obvious choice since we will be
writing an “Insert” query to update the database. However, I have determined that the
Run SQL selection here is the easiest option. I have encountered problems using the
“Insert” action that are magically resolved by selecting Run SQL.
Note: the table in my database is named test and only contains the two columns specified
here: Case_Number and Documentum_Id. The steps for creating this table in your database
are left to your abilities.
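For readers who want a starting point, the following Python sketch creates a comparable table and runs a parameterized insert against it. It uses SQLite from the Python standard library as a stand-in for the ODBC database. The column types, the sample values, and the exact wording of the INSERT statement are assumptions; only the table name and the two column names come from the note above.

```python
import sqlite3

# In-memory SQLite database standing in for the real ODBC data source.
conn = sqlite3.connect(":memory:")

# Column types are an assumption; the paper names only the columns.
conn.execute(
    "CREATE TABLE test ("
    "Case_Number TEXT NOT NULL, "
    "Documentum_Id TEXT NOT NULL)"
)


def record_export(connection, case_number, documentum_id):
    """Insert one row linking a case number to an exported object."""
    connection.execute(
        "INSERT INTO test (Case_Number, Documentum_Id) VALUES (?, ?)",
        (case_number, documentum_id),
    )
    connection.commit()


# Hypothetical sample values.
record_export(conn, "2012-0042", "0900000180001234")
rows = conn.execute("SELECT Case_Number, Documentum_Id FROM test").fetchall()
print(rows)
```

The two placeholders in the INSERT statement play the same role as the input parameters mapped in the ODBC Export module.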
3.6.2.2.2 Parameters
Below the query, in the field mapping area, map the input parameters as follows:
3.7 Multi
The Multi module was not directly included in our CaptureFlow Designer process, but was
indirectly included when the End node was configured to Delete Batches. Multi does not require
any configuration, but needs to be running on the IA Module Server in order to delete batches as
they complete.
If you would like to know what value the Image Enhancement step collected for the barcode
value, select the ImageEnhancement step in the Step Values node of the tree, and scroll the
right-hand window down to BarText0 to observe the value.
If you select a node in the Tree Values node of the tree, the Filter in the right-hand pane becomes
active. This will allow you to filter the values displayed for that node by step. In my example,
the most useful view is to select the Node 7 – Document 1 node, and filter on CustomValues.
This view shows me that the variables I set up in Section 2.2.2 for capturing date elements are
working correctly. You can also change a value in this view by right-clicking on it and choosing
Edit Value. This changed value will now move forward in this process. This is a great tool for
correcting a variable that was not set correctly by the process, or for forcing the process to behave
in a certain way. For example, I could enter a Case_Number here to cause the process to branch.
You can also view the image file, copy it from the server, or replace it with a different file by
right-clicking on one of the IA Values that references the image (e.g., Node 7 –
Document.OutputFile1_OutputFile). This can be useful if you want to view or manipulate an
image on your workstation, outside the context of the batch or InputAccel completely.
To clear an error and retrigger a step, right-click the step and choose Clear Task Errors and then
Retrigger. This will retrigger the step. If you want to skip this step, you can set the IA Value
RetriesLeft to 0 on most steps and retrigger. There can be ramifications for doing this if later
steps in the process depend upon the output of the skipped step; caveat emptor.
4.1.4 Logs
The InputAccel server keeps a log of all activities and errors it executes and encounters. The log
file can be accessed by clicking the Reports / Logs button on the InputAccel Administration
Console main page, and then selecting the View Logs link (see Figure 32). By default, log
messages show up in the far right column. Double-clicking a log entry will open the entry so the
details of the message can be viewed (see Figure 33). Unfortunately, sometimes the log
messages coming from the modules are incomplete or truncated. Therefore, reviewing the logs
is hit-or-miss as far as being a usable debugging tool.
The Log Screen, discussed in Section 4.1.4, gets its data from the Tbl_AuditErrorLog
table. Also:
There are numerous tables that deal with users, roles, and security as well as reporting.
Sometimes, viewing this data directly can aid in debugging a process.
ODBC tracing will write numerous messages to the trace file you indicate in the Log File Path
field. Unfortunately, ODBC trace files are not the easiest files to decipher. However, for gross
errors, you should be able to figure out the issue from the entries in this log file.
Also, many of the InputAccel modules have the option to save a log file if they are run in client
mode (as opposed to unattended mode).
Having struggled with the environment, the architecture, the paradigms, and concepts
encapsulated by InputAccel, I have formulated a few suggestions that I think would make
developing InputAccel solutions easier and more intuitive.
If CaptureFlow Designer is the central tool for development, then all activities involved with
development should be brought into the tool. For example:
You should be able to install a process from CaptureFlow Designer after it has been
compiled. Process Developer allows you to do this.
After a process has been installed, you should be able to launch each module for setup
from CaptureFlow Designer to continue the development process.
CaptureFlow Designer should provide the same debugging facilities as Process
Developer; specifically, step-by-step debugging of a batch, and the ability to load an
exported batch.
CaptureFlow Designer needs a better built-in facility for creating “Scripts” for modules
that is more on par with the capabilities of Process Developer.
Log rules and reporting should be more intuitive and, if possible, brought into the
CaptureFlow Designer tool.
Perhaps some of these capabilities will be realized in Captiva v7 expected late in 2012.
This document contains the extent of my experiences and best practices (as I have come to know
them). Hopefully the knowledge I have gained will be helpful to someone as green as I was
when I started my project.
5.2 Thanks
I want to thank Lee Grayson (Armedia) for his long-suffering patience with me and expert help in
getting me through my first InputAccel project. Much of what I learned and documented here
was from Lee or based upon knowledge I gained from one of his suggestions. I also want to
thank Jesse Rauch (EMC\Captiva) for all of his help, suggestions, and support with my project.
<SDG><