SIPAC Signal Intelligence Processing
SIPAC framework
Platform independent
Interactive analysis
Trainable classifiers
Impressions
[Figure: SIPAC overview. Elements shown: speech, text, image, automated content processing, scalable hardware, classifier training support, automated processing, user-specific extensions, interactive analysis, flow management]
Benefits
For the user
One interface for processing all types of information (speech, text, image etc.)
More time for more complex tasks due to savings on routine tasks
References
SIPAC has been installed for various customers with different requirements. System sizes range from a stand-alone workplace system based on a single high-end notebook to a scalable system with up to 30 client workplaces on a high-performance cluster architecture.
SIPAC was introduced to the market in 1999. New releases are launched every year.
2.
3.
User-owned software can also be used inside SIPAC. This software can be embedded by the
user or by MEDAV.
4.
Flow management composes procedures and processing elements. The goal is
to automatically analyze, filter, dispatch, channelize and process large volumes of data. The
challenge is not merely to concatenate the processing stages but to place them in a context and
relate them to one another. This scales to complex tasks and achieves high throughput, efficient information mining and an output that is limited to the important information. For example, once the language of a file has been determined, the subsequent software can use
this knowledge for language-specific processing.
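The idea of passing knowledge from one stage to the next can be sketched as follows. This is only an illustration in Python, not SIPAC's actual flow language: the stage functions, the `Context` dictionary and the keyword lists are all hypothetical, and the "language detector" is a toy stand-in for a trained classifier.

```python
from typing import Callable, Dict, List

Context = Dict[str, object]
Stage = Callable[[Context], Context]

def detect_language(ctx: Context) -> Context:
    # Toy stand-in for a language classifier: looks for a few German marker words.
    words = set(str(ctx["text"]).lower().split())
    ctx["language"] = "de" if words & {"der", "die", "das", "und"} else "en"
    return ctx

def spot_words(ctx: Context) -> Context:
    # Uses the language stored by an earlier stage to select a keyword list.
    keywords = {"en": {"signal", "antenna"}, "de": {"signal", "antenne"}}
    language = str(ctx.get("language", "en"))
    words = set(str(ctx["text"]).lower().split())
    ctx["hits"] = sorted(words & keywords[language])
    return ctx

def run_flow(stages: List[Stage], ctx: Context) -> Context:
    # Each stage receives the context produced by all previous stages.
    for stage in stages:
        ctx = stage(ctx)
    return ctx

result = run_flow([detect_language, spot_words],
                  {"text": "Die Antenne und das Signal"})
```

The point of the shared context is that `spot_words` never has to guess the language itself; it relies on what an earlier stage has already established.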
5.
SIPAC can be used for interactive analysis, automated processing and classifier training by
calling the provided services. Examples of services are file content analysis and language detection of speech samples.
6.
the parts of a SIPAC system, especially the archive server, the processing server and the
clients
System parts
A SIPAC system consists of the following software parts:
The SIPAC archive server for the storage of data and metadata
The SIPAC processing server for data processing, analysis and classification
SIPAC slave(s) for distributing processing tasks to different computers and/or operating systems
SIPAC client(s) as the interface for users to control the servers and tasks
A minimal SIPAC system requires one archive server, one processing server and one client.
Larger systems may consist of, for example, one archive server, one processing server, 30 slaves
and 10 clients.
SIPAC processing server
The SIPAC processing server is the central processing unit of the system. More hardware can be
added for faster processing or extended processing capabilities. For each computer attached to
SIPAC, a slave software license is required. If SIPAC slaves are available with the required functionalities, the SIPAC processing server is able to distribute tasks across the attached computers
for optimal performance.
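How such a distribution could work in principle is sketched below. This is a hypothetical illustration, not MEDAV's scheduler: the function `distribute`, the slave names and the function names are assumptions, and the strategy shown (send each task to the least-loaded slave that offers the required functionality) is just one simple choice.

```python
from typing import Dict, List, Set, Tuple

def distribute(tasks: List[Tuple[str, str]],
               slaves: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Assign each (task_id, required_function) to the least-loaded capable slave."""
    assignment: Dict[str, List[str]] = {name: [] for name in slaves}
    for task_id, function in tasks:
        # Only slaves that offer the required functionality are candidates.
        capable = [name for name, functions in slaves.items() if function in functions]
        if not capable:
            raise ValueError(f"no slave offers {function!r}")
        # Pick the candidate with the fewest tasks so far; the name breaks ties.
        target = min(capable, key=lambda name: (len(assignment[name]), name))
        assignment[target].append(task_id)
    return assignment

slaves = {
    "slave-a": {"speech_detection", "language_id"},
    "slave-b": {"language_id"},
}
tasks = [("t1", "language_id"), ("t2", "language_id"),
         ("t3", "speech_detection"), ("t4", "language_id")]
plan = distribute(tasks, slaves)
```

In this example the speech detection task can only go to `slave-a`, so the remaining language identification task is routed to `slave-b` to balance the load.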
The flow management of the SIPAC processing server is based on user-definable program
flowcharts (also known simply as flows) written in an XML-based command meta-language. The
command set covers all functions needed for controlling the automatic processing. The interface
for creating or changing flows is designed to be intuitive, and the system can assist the user when editing flows.
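To give a feel for what an XML-based flow might look like, the sketch below parses a small flow definition and derives an execution order. The XML vocabulary (`flow`, `step`, `service`, `after`) is invented for illustration and is not the SIPAC command meta-language; only Python's standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical flow definition; element and attribute names are assumptions.
FLOW_XML = """
<flow name="language-routing">
  <step id="detect" service="speech_detection"/>
  <step id="lang" service="language_identification" after="detect"/>
  <step id="stt" service="speech_to_text" after="lang"/>
</flow>
"""

def load_steps(xml_text):
    # Returns (id, service, after) tuples in document order.
    root = ET.fromstring(xml_text.strip())
    return [(s.get("id"), s.get("service"), s.get("after"))
            for s in root.findall("step")]

def execution_order(steps):
    """Order services so each step runs after the step named in its 'after' attribute."""
    done, order = set(), []
    pending = list(steps)
    while pending:
        ready = [s for s in pending if s[2] is None or s[2] in done]
        if not ready:
            raise ValueError("cyclic or dangling 'after' reference")
        for step in ready:
            order.append(step[1])
            done.add(step[0])
            pending.remove(step)
    return order

order = execution_order(load_steps(FLOW_XML))
```

Keeping the flow in a declarative file separates what should happen from the code that executes it, which is what makes flows editable by users.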
The SIPAC server provides an Application Programming Interface (API) with connection via
TCP/IP.
SIPAC archive server
The SIPAC archive server is the data storage of the system. All input data and metadata can be
stored there. For fast data access, it provides up-to-date retrieval tools such as data categorisation and full-text search.
The core functions of the archive server are:
SIPAC clients
The clients run the graphical user interfaces for both servers, the SIPAC processing server and the
SIPAC archive server. The clients are customized and provide specific graphical interfaces for the
different tasks (classifier training, flow definition, service application, monitoring, results and data
retrieval). Functions for system administration are also available.
For each operator workplace, a client license is necessary. For additional tasks, additional licenses
are necessary:
All components communicate via TCP/IP. If the components run on distributed machines, a LAN connection is required.
Tool sets
Different tool sets can be plugged into SIPAC. The tool sets can be bought from MEDAV as commercially available packages (Common, Speech, Text, ARGUS, Network) or created and compiled on demand,
either by MEDAV or by the user. Third-party
products can be part of a tool set. The available tool sets are described on the following pages.
User development kit
Optionally, a user development kit (UDK) is available. Using the UDK, experts can embed their own
software into the SIPAC environment. Once the customized software has been included, the new features
fit seamlessly into SIPAC and can be used for processing.
The UDK is command-line oriented. Programming knowledge is required. Expert training
is available.
Hardware
The use of hardware is very flexible: notebooks or standard computers can be used for small systems and rack-type servers for large systems. Different operating systems are supported.
Processing algorithms
A large variety of processing algorithms / software can be embedded into SIPAC.
It is often recommended, or even necessary, to train the parameters of the respective software on
data from the application in order to obtain optimal performance. For audio signals, for example, the
channel characteristics or language models may be trained separately.
Software can be provided by:
MEDAV: offers tool sets for the classification of audio, text and images (others on request)
The customer: e.g. task-specific software developed by the user
Third parties: software with a special focus, e.g. for automatic translation
To start the automated processing, files that have been sent to SIPAC are placed in a file directory
(a common approach in production). From this directory, the files are processed sequentially and/or
in parallel according to the user requirements. The result of the processing is added to the
meta-information of the respective file.
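The directory-based workflow described above can be sketched as follows. This is a simplified illustration under stated assumptions, not SIPAC code: `process_directory` and `count_words` are hypothetical names, the metadata is held in a plain dictionary, and a thread pool stands in for parallel processing.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict

def process_directory(inbox: Path,
                      analyse: Callable[[Path], Dict[str, object]],
                      workers: int = 4) -> Dict[str, Dict[str, object]]:
    """Run `analyse` on every file in `inbox` and merge the results into its metadata."""
    files = sorted(p for p in inbox.iterdir() if p.is_file())
    # Start with basic metadata that exists before any processing.
    metadata = {p.name: {"size_bytes": p.stat().st_size} for p in files}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with `files`.
        for path, result in zip(files, pool.map(analyse, files)):
            metadata[path.name].update(result)
    return metadata

def count_words(path: Path) -> Dict[str, object]:
    # Toy stand-in for a real analysis step.
    return {"word_count": len(path.read_text(encoding="utf-8").split())}

with tempfile.TemporaryDirectory() as tmp:
    inbox = Path(tmp)
    (inbox / "a.txt").write_text("one two three", encoding="utf-8")
    (inbox / "b.txt").write_text("four five", encoding="utf-8")
    demo = process_directory(inbox, count_words)
```

Each file keeps its original metadata, and the processing result is added alongside it rather than replacing it.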
Interactive processing provides more breakpoints at which the operator can work with inputs, procedures
and results, all the way from the original data to the completely processed result. In
most cases, procedures are developed, tested and evaluated interactively before they are
finalized as a production flow.
Tool sets
Tool sets are used as extension modules of the SIPAC system. Depending on the requirements of
the users, different tool sets can be added. Experts can create their own tool sets using the User Development Kit (UDK).
The available tool sets are described in the following:
Tool set - Common
This tool set is included in all SIPAC systems. It provides the following functions:
Tool set - Text
The text tool set is used for the classification of files containing text (it partly includes third-party
products):
Language identification
Entity recognition
Topic spotting
Word spotting
Term translation
Full text translation
Optical Character Recognition (OCR)
Some of the algorithms are language specific and, therefore, training data must be collected
beforehand.
Tool set - Speech
The speech tool set is used for the classification of audio files. Depending on the task, the
audio signal is analysed / processed with respect to the following criteria:
Speech detection
Gender identification
Language identification
Speaker identification
Topic spotting
Word spotting
Speech-to-phoneme
Speech-to-text
Some of the algorithms are language specific and, therefore, training data must be collected
beforehand to improve the classifiers for optimal performance.
Tool set ARGUS
ARGUS is a special analysis tool set for images. It provides the following functions:
Tool set STRIX
STRIX is a special analysis tool set for captured network data streams. It provides the following
functions:
Configuration
The configuration is done in the following steps. Both MEDAV and the users are able to configure
the system.
Task analysis
The task has to be defined exactly with respect to classifiers and data volume.
Example: A system shall be configured for the classification of the prevailing language in audio
files. The signals are obtained from the telephone line.
Expected data volume: 1000 hours of audio signals to be processed per day.
This leads to a need of speech classifiers for the requested languages and a calculation of the
necessary amount of processing hardware.
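The hardware calculation for this example can be approximated with simple arithmetic. The sketch below is an assumption-laden estimate, not a MEDAV sizing rule: the throughput factor (hours of audio one machine processes per wall-clock hour) and the utilisation margin are hypothetical parameters that would have to be measured for the actual classifiers and hardware.

```python
import math

def machines_needed(audio_hours_per_day: float,
                    throughput_factor: float,
                    utilisation: float = 0.8) -> int:
    """Estimate the number of processing machines for a daily audio volume.

    throughput_factor: hours of audio one machine processes per wall-clock hour
                       (hypothetical; must be measured in practice).
    utilisation:       fraction of the day a machine is assumed to be busy.
    """
    if throughput_factor <= 0 or not 0 < utilisation <= 1:
        raise ValueError("throughput_factor must be > 0 and 0 < utilisation <= 1")
    capacity_per_machine = 24 * throughput_factor * utilisation
    return math.ceil(audio_hours_per_day / capacity_per_machine)

# 1000 hours per day, assuming each machine runs 5x faster than real time at 80 % load.
estimate = machines_needed(1000, throughput_factor=5.0)
# Same volume, assuming real-time processing and full utilisation.
worst_case = machines_needed(1000, throughput_factor=1.0, utilisation=1.0)
```

Under the first set of assumptions one machine handles 96 hours of audio per day; under the second, only 24 hours, which shows how strongly the result depends on the measured classifier speed.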
Choice of available classifiers
If trained classifiers are available for the given task, nothing has to be done. Otherwise, training
data must be collected (or purchased) and the classifiers must be trained.
Example: Trained classifiers are available in telephone quality for all languages but Danish. Thus,
Danish telephone data must be collected. A classifier for Danish must be trained.
Configuration of the flow
The workflow describing the sequence and conditions of the automated processing must be established.
Example: After speech detection, language identification is performed for the parts of the audio
signal containing speech.
Test and evaluation
The flow is activated including the embedded classifiers. Testing is performed in order to ensure
correct function and high quality.
Example: SIPAC is run with a set of test files, and the language classification results are
evaluated.
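One simple way to evaluate such a test run is to compare the predicted language of each test file with its known reference label. The sketch below is illustrative only: the function `evaluate` and the sample labels are hypothetical, and plain accuracy is just one of several possible quality measures.

```python
from collections import Counter
from typing import Dict, List, Tuple

def evaluate(pairs: List[Tuple[str, str]]) -> Tuple[float, Dict[str, float]]:
    """pairs: (reference language, predicted language) for each test file."""
    if not pairs:
        raise ValueError("no test results to evaluate")
    totals, correct = Counter(), Counter()
    for reference, predicted in pairs:
        totals[reference] += 1
        if reference == predicted:
            correct[reference] += 1
    overall = sum(correct.values()) / len(pairs)
    # Per-language accuracy shows which classifiers need more training data.
    per_language = {lang: correct[lang] / totals[lang] for lang in totals}
    return overall, per_language

# Hypothetical results for four test files.
overall, per_language = evaluate([
    ("da", "da"), ("da", "sv"), ("en", "en"), ("en", "en"),
])
```

A low per-language value, as for Danish in this made-up sample, would point to the classifier that should be retrained first.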
After these steps, the SIPAC system is ready for use and can be started in the user scenario.
For optimal performance, it is recommended to perform the evaluation on a yearly basis in order to
maintain the high accuracy with respect to changes of the incoming data (e.g. different telephone
codecs).
Technical data
General
Linux
Solaris
Slave:
Storage of flows
Search functionality
SIPAC client
Command-line-oriented
Corporate Policy
Technology
in the products, development and in the company management
is state-of-the-art and represents a top level.
Quality
... in all divisions of our company is considered as the indispensable
prerequisite for a risk-free and successful cooperation with our customers
and business partners.
form the roots of the company and render the services necessary for
maintaining and expanding the technical basis and a trustful and fair
cooperation.
Growth
Compliance
... with the utmost sensitivity and in compliance with German and international
export regulations, we act on a worldwide basis.
MEDAV GmbH, Gräfenberger Str. 32-34, D-91080 Uttenreuth; Homburger Platz 3, D-98693 Ilmenau
Phone: +49 9131 583-0, Fax: +49 9131 583-11, E-Mail: info@medav.de, www.medav.de
w713od.065