Concept Bi Intel
ABSTRACT
Business intelligence (BI) is a promising approach that lets companies gather large amounts of data, access and
analyze that data, and present a high-level set of reports that condense it, so that they can make fundamental
business decisions regarding business actors (customers, suppliers, logistics…). To learn from the past and the
present and to anticipate new market trends, many companies are adopting Business Intelligence (BI) tools and
systems. There are numerous business intelligence tools to help managers make the best decisions. The paper
explains the concepts and the components of BI, the benefits of BI, tools for BI, the implementation of business
intelligence, and a proposal to combine BI and the semantic Web. Additionally, the paper describes the popularity
and the increasing use of data and information from open sources (OS) and its impact on competitive and
marketing intelligence.
Keywords: Business Intelligence Components, Business Intelligence (BI), Marketing Intelligence, Open
Source Tools for Business Intelligence, Semantic Web
Copyright © 2013, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
16 International Journal of Innovation in the Digital Economy, 4(3), 15-34, July-September 2013
The combination of several existing system capabilities characterizes the requirement for a unique BI system with unique characteristics. The development of a design theory for business intelligence systems from a conceptual model with several interrelated components is therefore useful.
Research in BI has primarily concentrated either on developing analytical tools for BI (Clarabridge, 2006; de Ville, 2006; Watson, Wixom, Hoffer, Anderson-Lehman, & Reynolds, 2006) or on its application in a specific business area (Fordham, Riordan, & Riordan, 2002) such as marketing.
The increasing standards, automation, and technologies in modern businesses have resulted in the availability of large amounts of data. The storage of this data is managed by data warehouse technologies organized under specified repositories. However, examining these large amounts of data, extracting the pertinent information, and transforming that information into knowledge that leads to the best decision making is the art of business intelligence.
Business intelligence applications have become the top spending priority of corporate information technology organizations (Gartner, 2009).
Rather than concentrating only on automation, organizations must have a strong focus on decisions and their relation to information. Businesses need to deal with how decisions are made and executed, how those decisions can be improved, and how information can be used to support them.
A design theory of BI consisting of a conceptual architecture with a specific design specification has been developed (Baldwin & Yadav, 1995; Hevner, March, Park, & Ram, 2004; Gregor & Jones, 2007).
Therefore, in this paper we explore the concepts of BI, its components, recent and important open source tools for BI, and a comparison between them.
The rest of the article is structured as follows. First, an introduction to BI and its benefits is given. Second, the BI components are described. Third, tools for BI are described. Fourth, a survey of open source BI tools is presented. Fifth, how to choose a BI tool is explained. Sixth, the integration of business intelligence and the semantic Web is proposed. Finally, the summary and the perspectives for future work are discussed.
BUSINESS INTELLIGENCE AND ITS BENEFITS
The term intelligence carries important meanings in business environments. The existence of a good source of business intelligence, which can range from data about existing customers to intelligence about competitors, determines the survival of businesses (Maguire & Robson, 2005).
However, sometimes information is collected to build up a background understanding of the environment without any clear purpose in mind (Curtis & Cobham, 2005).
Intelligence requires the ability to learn, to understand, or to deal with new or trying situations; the skilled use of reason; and the ability to apply knowledge to manipulate one’s environment or to think abstractly (Brackett, 1999). The ability to create information, more than the ability to locate it or mine it from a huge amount of data, is what relates most closely to intelligence. According to Turban et al. (2004), intelligence is creative human reasoning which enables the recognition of relationships between things, and the ability to sense qualities and spot patterns that explain how various items interrelate.
The output of the BI process (a process whose input is information) is actionable knowledge, which includes intelligence related to the business.
Organizations lack clarity on who should make decisions (Rogers & Blenko, 2006). The decision process requires a thorough knowledge of the scope of the problem and the potential benefits, and a responsibility to follow up on the results of key decisions. The variety of information inputs available to provide the intelligence needed in decision making is shown in Figure 1.
Because business intelligence focuses on the environment, it is similar to military intelligence (Cavalcanti, 2005). Military intelligence is a process of gathering and analyzing data that allows an understanding of the weaknesses of the enemy and the ability to take advantage of those weaknesses when planning an attack (ESRI, 2005). Hence, the better the knowledge of the enemy (competitor), the more successful the campaign.
BI is a concept defined and described differently by various academics.
Zeng et al. (2006) define BI as “The process of collection, treatment and diffusion of information that has an objective, the reduction of uncertainty in the making of all strategic decisions.”
Furthermore, Cui et al. (2007) hold that the BI process can be simplified by using a common interface with an ontology representation to access data from different sources and in different formats.
Several kinds of software are included in BI: ETL (Extraction, Transformation and Loading), data warehousing, database query and reporting (Berson et al., 2002). Open source Business Intelligence (BI) tools are becoming more common in industry, but are still not well known compared to other types of software. In the third section we present some essential open source tools available up to the year 2012.
The importance of business intelligence tools for enterprises has become concrete. Some benefits are listed below:
• BI tools enable employees to easily convert their business knowledge, based on analytical intelligence, into solutions to business problems: Examples of such problems include increasing the response rates of direct mail, telephone, e-mail, and Internet-based marketing campaigns;
• BI tools help align the organization towards its key objectives: Ultimately, organizations are built on the outcomes they deliver and not on their processes; the processes only enable the outcomes;
• Enables faster and fact-based decision making: The data in an organization should be organized so that a user who queries the data can explore further details. The way we perceive data influences and changes
our decisions; therefore it is essential that data be available in an accurate as well as easily accessible manner;
• Efficient collection and distribution of vital data and statistics: BI refers to a system that enables the collection of different data and metrics for the purpose of efficient decision making. This functionality helps organizations to enhance their competitive position and consequently establish themselves in a highly competitive market, where success or failure is determined by the organization itself or by its competition.
The following section presents the business intelligence components.
BUSINESS INTELLIGENCE COMPONENTS
The various BI components detailed in the following paragraphs are shown in Figure 2:
• Data Warehousing (DW): The most accepted definition came from Bill Inmon, who provided the following: “A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision making process.” The consolidation of data from several enterprise operational systems into a shared data warehouse is the main key to a successful BI system. The capability of enterprises to leverage information about their market place, customers, and operations to capitalize on business opportunities gives organizations the possibility to distinguish themselves. A data warehousing system is the backend, or infrastructural, component for achieving business intelligence. Online analytical processing (OLAP) refers to the analysis and evaluation of specially prepared data as a basis for corporate decisions;
• Master Data Management (MDM): The process of creating and managing data that an organization must have as a single master copy is called Master Data Management (MDM). Clearly defined master data is crucial for enterprises to avoid the risk of having multiple copies of data that are inconsistent with one another. BI is one of the areas linked to MDM. Moreover, MDM is the basis of the data management capability which serves business applications and processes. Among the many domains of MDM are Customer Data Integration (CDI), Product Information Management (PIM) and Vendor Information Management (VIM). It is not unusual to see MDM and data warehousing fall into the same project, although they have different goals, different reporting needs and different types of data;
• Business Process Management (BPM): A business process can be defined as an activity or set of activities accomplishing a specific organizational goal. Specifically, business process management (BPM) is an approach which builds a more effective, more efficient and more capable organizational workflow. Moreover, BPM aims to reduce human error with respect to the requirements of each role. Within a company, we
frequently consider that BPM refers to the point of connection between the line-of-business (LOB) and the IT department. Communication between IT and the LOB is simplified by the creation of Business Process Execution Language (BPEL) and Business Process Model and Notation (BPMN). Business people can quickly learn to use these languages and design processes because they are easy to read and learn. Processes designed in either language are easy for developers to translate into hard code because both BPEL and BPMN adhere to the basic rules of programming;
• Reports: Reporting is the solution to successfully using data to improve competitiveness in any BI process. The information produced by any advanced data mining technology has no value if users cannot easily manipulate it or quickly find answers to their questions. Reporting is easy only if the predefined reports of enterprise business intelligence products satisfy most needs. Moreover, business intelligence reporting is intended to improve business and financial analytics;
• Dashboards as an element of Corporate Performance Management (CPM): A business intelligence dashboard is a data visualization tool that displays the current status of the metrics and key performance indicators (KPIs) of an enterprise. A business intelligence dashboard, like the dashboard of a car, indicates the status at a specific point in time. Essential features of a BI dashboard product include a customizable interface and the ability to draw concurrent data from multiple sources. Vendors of business intelligence dashboards include Oracle and Microsoft. Excel is an example of a business application that can be used to create BI dashboards;
• Data Quality: The truly unique benefit that a company has is data quality. Technically, data quality represents the process of obtaining the data and cleaning it, standardizing it, and de-duplicating records, as well as doing some data enrichment. The services provided by a company can distinguish it, but those services are highly reliant upon good data. Because the data is used for both internal and external decision-making, data should be very tactical. The real problem could lie in the way the business operates at a tactical level, given that most business intelligence/performance management solutions gather data from internal source transaction systems. Organizations need to apply constant data cleaning processes, and to monitor and follow data quality levels over time to ensure consistently high quality data;
• Cube Intelligence: An OLAP cube is a set of data organized in a way that facilitates non-predetermined queries for aggregated information. The business intelligence concept uses the word “cube” because it best describes the resulting data. A cube aggregates the facts in each level of each dimension in a given OLAP schema.
BUSINESS INTELLIGENCE TOOLS
According to an IDC (International Data Corporation) study, in 2010 the BI tools market grew 11.4% to reach $8.9 billion in worldwide license and maintenance revenue (including software-as-a-service [SaaS] subscription contracts), compared with the 2.0% revised growth for 2009. The report, “Worldwide Business Intelligence Tools 2010 Vendor Shares,” defines the BI tools market as a combination of two components: end-user query, reporting, and analysis (QRA) and advanced analytics:
• QRA software: includes applications that are used primarily for data access and report-building by either technology or business users. Advanced analytics are tools used to accomplish tasks that are
either “hidden, not apparent, or too complex” to be obtained by QRA software. Multidimensional analysis tools include both online analytical processing (OLAP) servers and client-side analysis tools that provide a data management environment used for modeling business problems and analyzing business data. Figure 4 shows the business intelligence software taxonomy from the viewpoint of IDC;
• Advanced analytics: Data mining and statistical software (previously called technical data analysis) are the components of advanced analytics software. Technologies such as neural networks, rule induction, and clustering are used to discover relationships in data and to make predictions that are hidden, not apparent, or too complex to be extracted using query, reporting, and multidimensional analysis software. Gartner’s 2010 and 2011 top technologies lists highlight the benefits of using analytics and simulation to analyze, optimize, and improve business processes, as shown in Figure 3;
• Business Analytics Software: Business analytics helps companies to recognize subtle trends and patterns so that they can anticipate and shape events and improve outcomes. It allows companies not only to drive more top-line growth and control costs, but also to identify risks that could derail enterprise plans, which facilitates timely corrective action. The analytic applications of IBM are an example of business analytics software.
OPEN SOURCE TOOLS FOR BUSINESS INTELLIGENCE
BI tools are popular and widely used in industry. However, the use of open source tools for BI is still quite limited compared to other types of software. The dominant tools are closed-source commercial software. A better understanding of the limited use of open source BI tools requires considering the availability and the capability of each tool. To assist managers in making the best decisions, there are numerous business intelligence tools, which need to be evaluated and compared. Some previous surveys have been presented in Thomsen and Pedersen (2005) and Christian et al. (2009).
In this section we present some additional tools available up to the year 2012, covering the ETL (Extraction, Transformation, Loading) category, the DBMS category, the OLAP (Online Analytical Processing) server category and the OLAP client category.
ETL Tools for BI
In previous work (Christian et al., 2009), the ETL category covered only a few tools and was incomplete. We present some additional tools in this section. Table 1 gives a comparison between some of the tools described below:
• Pentaho Data Integration (pdi, kettle): Pentaho (2008) is a BI provider that offers ETL tools as a data integration capability. These ETL capabilities are based on the Kettle project. Pentaho is known for selling subscriptions such as support services or management tools. Kettle has a graphical designer for jobs and transformations. The latest version of Kettle is 4.2, from September 2011, and is released under the GNU LGPL licence;
• Talend Open Studio/JasperETL: Talend Open Studio (Talend, 2008) is developed by Talend and is released under the GNU GPL licence. Talend’s unified integration platform addresses projects such as data integration, ETL, data quality, master data management and application integration. Talend is characterized by its proven performance, ease of use, extensibility and robustness. Talend’s solutions are among the most widely used and deployed integration solutions. JasperETL is a renamed version of Talend Open Studio. The latest version of JasperETL is 4.5, from January 2012;
• KETL: KETL (Kinetic Networks, 2008), not to be confused with Kettle described above, is a production-ready ETL platform released partly under the GPL license and partly under the LGPL license. The engine is built upon an open, multi-threaded, XML-based architecture. KETL is designed to assist in the development and deployment of data
Tools | Licence | Operating System | Platform | Latest version | DM | DW | Dashboard | Category
Kettle (pdi) | GNU LGPL | Windows, Linux, Mac OS X | Java | 4.2 | √ | √ | √ | ROLAP
JasperETL | GNU GPL | Windows, Linux | Java | 4.5 | x | √ | √ | ROLAP
KETL | GNU GPL | Linux, Windows, Unix, Mac OS | Java | 2.1.33 | x | √ | √ | ROLAP
APATAR | GPLv2 | Linux, Windows | Java | 1.12.23 | √ | √ | √ | ROLAP
Pequel ETL | GPL | Windows, Linux, Mac OS X | Java | 3.05 | x | √ | x | Files and ROLAP
Clover ETL | LGPL | Mac OS X | Java | 3.3.0M2 | x | √ | x | ROLAP
Enhydra Octopus | LGPL | Linux, Windows, Unix, Mac OS | Java | 3.2.2 | x | √ | x | ROLAP
Benetl | LGPL | Windows, Linux | Java | 4.0 | √ | √ | x | Files
Scriptella | Apache | Windows, Linux | Java | 1.1 | √ | x | | ROLAP
GeoKettle | GNU LGPL | Mac OS X, Windows, Linux/Unix | Java | v3.2.0 | x | √ | x | Flat files, sensors, DBMS
cplusql | GPLv4 | Windows | C++, C, IDL | 0.9.0 | √ | x | x | Flat files, DBMS
integration efforts which require ETL and scheduling. The latest version of KETL is 2.1.33, from May 2009;
• Apatar: Apatar (2007) is a data integration and ETL tool developed by the Apatar company. The first version of the tool was released under the GPLv2 license at www.sourceforge.net in February 2007. Apatar is an open source data integration and ETL tool written in Java, with powerful Extract, Transform and Load capabilities for integrating on-premise data sources with the Web without coding. The latest version of Apatar is v1.12.23, from March 2011;
• Pequel ETL: Pequel (Gaffiero, 2007) is licensed under the GPL, implemented in Perl, and runs on UNIX-like platforms and on Windows using Cygwin. Pequel generates Perl and C code for the load job. It is mainly targeted at processing files to generate other files. The latest version of Pequel is 3.05, from June 2007;
• Clover ETL: CloverETL (Clover, 2008) is developed by OpenSys and Javlin. Clover ETL is a data integration software core that enables quicker project implementation and faster performance. With a low cost buy-in and a scalable architecture, it is positioned as good value in the growing business intelligence market. Unlike the previously described ETL tool, Clover.ETL is licensed under LGPL. A closed source GUI exists, but it is free of charge only if not used commercially, and it uses its own TL language for data transformations. The latest version of Clover is 3.3.0M2, from April 2012;
• Enhydra Octopus: Octopus (Teamlösungen, 2007) is an ETL tool created by the company Enhydra and licensed under LGPL. Octopus is a simple Java-based Extraction, Transformation, and Loading (ETL) tool. It uses JDBC to connect to data sources and targets and performs transformations defined in an XML file. Many different types of databases can be mixed (MS SQL, Oracle, DB2, QED, JDBC-ODBC with Excel and Access, MySQL, CSV, and XML). The latest version of Octopus is 3.2.2, from October 2011;
• Benetl: Benetl (Benoît, 2011) is a free ETL tool for files, released under the LGPL licence, that works with PostgreSQL 9.x and MySQL 5.x. It helps to easily manage “.csv”, “.txt” and “.xls” data source files. Benetl is written in Java, and a Linux version is available. Benetl creates everything it needs in the PostgreSQL or MySQL database. In order to retrieve data, the user has to define an entity (something to show or to calculate) and a date (a time stamp, the moment at which an event occurs). Furthermore, Benetl offers an automatic way to retrieve data using the scheduled tasks of the operating system. Benetl makes it easier to retrieve information from flat data files by filtering and organizing it according to date (time stamp) and entity. The latest version of Benetl is 4.0, from March 2012;
• Scriptella: Scriptella (Kupolov, 2008) is an open source ETL and script execution tool written in Java. It is released under the Apache License. It does not require the user to learn another complex XML-based language, but allows the use of SQL or another scripting language suitable for the data source to perform the required transformations. The latest version of Scriptella is 1.1, from January 2012;
• GeoKettle: GeoKettle (Thierry, 2010) is a free, powerful open source ETL tool for BI. It is a spatially-enabled version of the generic ETL tool Kettle (Pentaho Data Integration) and a part of the geospatial BI software stack developed initially by the GeoSOA research group at Laval University in Quebec, and now developed and supported by Spatialytics. GeoKettle provides a true and consistent integration of the spatial component. It is released under LGPL at http://www.geokettle.org with the latest version v3.2.0, from September 2011. It allows the definition of custom transformation steps by the user (“Modified JavaScript Value” step). GeoKettle provides support for topological predicates (intersects, crosses, etc.) and aggregation operators (envelope, union, geometry collection, ...), SRS definition and transformations, and cartographic preview. Kettle is natively designed to be deployed in cluster and web service environments. This makes GeoKettle a suitable software component to be deployed as a service (SaaS) in cloud computing environments such as those provided by Amazon EC2;
• Cplusql ETL: cplusql¹ is a distributed ETL tool written in the C++, C and IDL languages that extracts and transforms row-based data from databases and flat files for terabyte-scale data warehouse loading. The latest version of this tool is 0.9.0, from January 2004, licensed under GNU (GPLv4);
• Web-based free ETL tools: This category of tools is in most cases ROLAP-oriented (Relational OLAP). The ROLAP-oriented tools allow the user to define and create data transformations in Java (JasperETL) or in TL (Clover.ETL). Studies state that ETL tools organize heterogeneous data sources and complex file formats. Furthermore, they interact with different DBMSs (DataBase Management Systems). Data extraction from ERP (Enterprise Resource Planning) and CRM (Customer Relationship Management) systems is also supported by some of the tools.
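As a rough illustration of the extract, transform and load steps that the ETL tools listed above automate, the following minimal Python sketch uses only the standard library. The sample data, the table name `sales` and all function names are hypothetical and are not taken from any of the surveyed tools; an in-memory SQLite database stands in for the target DBMS.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source; a real job would read a file or query a DBMS.
RAW_CSV = """customer,amount,date
alice, 10.50 ,2012-01-03
BOB,4.25,2012-01-03
alice,7.00,2012-01-04
,3.00,2012-01-05
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without a customer, normalise names and types."""
    cleaned = []
    for row in rows:
        name = row["customer"].strip().lower()
        if not name:
            continue  # reject records that cannot be attributed to a customer
        cleaned.append((name, float(row["amount"]), row["date"].strip()))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text):
    """Run the three steps against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(text)))
    return conn


conn = run_etl(RAW_CSV)
# A simple aggregated query of the kind a reporting layer would issue.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Keeping the three steps as separate functions mirrors the way the surveyed tools separate source connectors, transformation definitions and target loaders, so that each step can be replaced independently.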
allow managers to keep running processes under direct control, so as to monitor and minimize the gaps (anomalous situations) in processes and consequently maximize the company’s profit. The analytical model on which SpagoBI analyses are based supplies features to produce high-quality reports and dashboards, to navigate data through an internal OLAP engine, and to build run-time ad-hoc queries. The latest version of SpagoBI is v3.3-01242012, from January 2012;
• MySQL: MySQL² includes the features of a commercial enterprise DBMS, including partitions, triggers, stored procedures and updatable views. MySQL also supports multiple storage engines, used for transaction processing and fast query execution. It is available under a commercial license or under the GPL. The latest version of MySQL is 5.5.24, from May 2012;
• PostgreSQL: PostgreSQL³ is a full-featured DBMS released under a variant of the BSD licence that includes many constructs that are common in proprietary commercial applications (sequences, tablespaces, temporary tables, functions, inheritance, triggers and views). PostgreSQL runs on Windows and on a large collection of UNIX-like operating systems. The latest version of PostgreSQL is v9.1.3, from February 2012;
• Ingres Database: Ingres⁴, developed by Ingres, is available under a commercial license or under the GPL. Ingres contains features demanded by the enterprise while providing the flexibility of open source, with a focus on reliability, security, scalability, and ease of use. Core Ingres technology forms the foundation of Ingres Database and of numerous other industry-leading DBMS systems as well. Each of these DBMSs includes all the core features needed for BI. It also offers some performance-boosting features (bitmap indexes and materialized views) that are very useful for business intelligence analyses. The latest version of Ingres is v10.0, from October 2010;
• Greenplum: Greenplum⁵ is an analytical relational DBMS designed for heavy workloads on terabytes of data. Its core is derived from PostgreSQL, which means that data is still stored in rows; however, Greenplum has an MPP (massively parallel processing) architecture. The latest version of Greenplum is 4.2.1.0, from June 2011. The Greenplum Chorus source code will be released under an open source license in the second half of 2012;
• Infobright: Infobright⁶ is an open source, freely available product under a GPL2 license. It is a truly columnar DBMS, although not as popular as Sybase IQ or Vertica. The free Community Edition is a simplified, open-sourced version of the commercial Enterprise Edition. Infobright lacks some features that can be important for large-scale deployment: DML, temporary tables, parallel query execution and some others. The latest version of Infobright is 4.0.6, from March 2012;
• LucidDB: LucidDB⁷ is also a columnar DBMS, and has been open source from its origin. It is licensed under GPL and LGPL, and developed by the LucidEra company and the non-profit Eigenbase Project organization. No large-scale deployments of LucidDB are known to us; however, some benchmarks show an impressive improvement over MySQL. While LucidDB offers many features relevant for data warehousing, it should be noted that it is still not a mature DBMS (Christian et al., 2009). The latest version of LucidDB is 0.9.4, from November 2011;
• MonetDB: MonetDB⁸ is another open-source columnar DBMS, designed in the Netherlands. The features of MonetDB include enhanced support for XML and multimedia objects, and support for modern CPU architectures. Since 2011 column store technology as pioneered in
MonetDB has found its way into the product offerings of all major commercial database vendors. The latest version of MonetDB is 11.9.1, from April 2012, released under an MPL-like licence. The innovation of MonetDB exists at all layers of a DBMS, e.g., a modern CPU-tuned query execution architecture, a storage model based on vertical fragmentation, automatic and self-tuning indexes, run-time query optimization, and a modular software architecture (Kersten, 2011). A sneak preview is given in the award-winning VLDB 2011 paper (Kersten, 2011);
• Firebird: Firebird⁹ is a relational database offering many ANSI SQL standard features that runs on Linux, Windows, and a variety of Unix platforms. Firebird is released under an MPL-like license and offers excellent concurrency, high performance, and powerful language support for stored procedures and triggers. It has been used in production systems, under a variety of names, since 1981. It is a multi-platform relational database management system based on the source code released by Inprise Corp (now known as Borland Software Corp) on 25 July 2000. The current version of Firebird is 2.5.1 (from May 2012);
• C-Store: C-Store (Mike et al., 2005) is a read-optimized relational DBMS that physically stores a collection of columns, each sorted on some attribute(s). C-Store is made available as open source software under the BSD license, and it contrasts with most current systems, which are write-optimized. Among the many differences in its design are: data is stored by column rather than by row; objects are carefully coded and packed into main memory storage during query processing; an overlapping collection of column-oriented projections is stored, rather than the current fare of tables and indexes; and transactions are implemented with high availability and snapshot isolation for read-only transactions. The C-Store query language is assumed to be SQL, with standard SQL semantics (Mike et al., 2005). The latest version of C-Store is 0.2, from October 2006;
• SQuirrel SQL Client: SQuirreL¹⁰ is a Java program with graphical capabilities that allows viewing the structure of a JDBC-compliant database, browsing the data in tables, issuing SQL commands, etc. SQuirreL is available under a commercial license or under the GPL and LGPL, and the latest version of SQuirreL is 3.3.0 (from May 2012). SQuirreL’s functionality can be extended through plugins (DBCopy, DBDiff, DB2, Graph, SQL Validator…). Among its many functionalities are syntax highlighting for database object source code, the SQL Scripts plugin that allows storing the results of queries directly to a file, improved memory management for large result sets, and the possibility to display errors in a temporary result tab for better visibility.
OLAP Servers
In the work of Christian et al. (2009), only two open source OLAP servers were found (Mondrian and Palo). In this paper, we present additional OLAP servers. Table 3 compares the OLAP servers described below:
• Saiku: Saiku¹¹ is an important project developed by the Pentaho community, as it may grow to take over from JPivot, the old official multidimensional browser. It was previously known as the Pentaho Analysis Tool (PAT) and is a modular open-source analysis suite. This tool provides a lightweight OLAP solution which remains easily embeddable, extendable and configurable.
Technically, Saiku is a RESTful server with a jQuery-based front end, which enables connections to XMLA providers: Mondrian, but also SAP BW, Microsoft Analysis Services, Hyperion Essbase, etc. Saiku offers all features
Copyright © 2013, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
International Journal of Innovation in the Digital Economy, 4(3), 15-34, July-September 2013 27
found in JPivot. It is functionally the most 2011. The most rewarding aspect of the
comprehensive new Web OLAP requester of in-memory technology included in Palo is
the paid version of JasperServer. The Saiku the speed advantage. The Palo OlAP server
server is shipped under the GPL v2 license works in memory and provides real time
and its latest version of Saiku is 2.3 from 2012: calculation, as well as cross data source
calculations. Compared to an old-style
• Palo OLAP server: Palo12 is an OLAP ROLAP system in-memory technologies
server that offers increased stability and have the potential to be as much as 100
performance to the Palo suite Business In- times faster than an OLAP system with a
telligence platform. PALO is a cube server disk based relational database. Multidimen-
optimized for a spreadsheet. It therefore sionality is an alternative way of organizing
does not have a visualization tool. It is a data in a database of Palo;
multi-user, high-performance data server • Mondrian/Pentaho Analysis Services:
application that allows workers throughout Mondrian13 is an Online Analytical Process-
an enterprise to access, change and col- ing (OLAP) written in Java. It is ROLAP
laborate on BI data instantaneously. The (Relational OLAP) server-based approach.
latest version of Palo is 3.2 from August Tools used Mondrian benefit from the
| OLAP client tools | Licence | Platform | Operating system | OLAP servers/API | Functionalities: Reports | Functionalities: DW | Functionalities: DM |
|---|---|---|---|---|---|---|---|
| FreeAnalysis | MPL | Web | Linux, Mac OS X, Windows, Solaris, FreeBSD, NetBSD, OpenBSD | XMLA | Tables and charts | √ | √ |
| JPalo client | GPL | Java and Web | Linux, Mac OS, Windows, Unix | Palo and XMLA | Tables and charts | √ | √ |
| JPalo Web client | GPL | Java and Web | Linux, Mac OS, Windows, Unix | Palo and XMLA | Tables | √ | √ |
| pocOLAP | LGPL | Java and Web | Linux, Mac OS, Windows | XMLA | Tables, charts and map | √ | √ |
| JPivot | CPL | Java and Web | Windows, Linux, Mac OS, BSD, Solaris | XMLA, olap4j | Tables and charts | √ | √ |
| STPivot | GNU GPL v3 | Java and Web | Windows, Linux, Mac OS, BSD, Solaris | XMLA, olap4j, Pentaho | Tables and charts | √ | √ |
| Jrubik | CPL | Java | Windows, Linux, Mac OS | Mondrian and olap4j support | Tables, charts, and maps | √ | x |
| REX | LGPL | Java | Windows, Linux, Mac OS, Unix, BSD | XMLA | Tables and charts, map | √ | x |
| JMagallanes | BSD | Java | Windows, Linux, Mac OS, Unix | DBMSs, XML, and Excel | Tables and charts and maps | √ | √ |
| BIOLAP | BSD | Java, Web, C++ | Linux, Windows | Excel, XML, Mondrian | Table, List, Map, Grid, Chart | √ | x |
| OpenI | GNU GPL v2 | Java | Windows | XMLA | Tables, charts | √ | √ |
available on the web site of BPM Conseil (v4.0, from October 2011). A 17-page manual is available in English¹⁹ and gives further details on this tool;
• JPalo (Client and Web Client): The JPalo²⁰ Client, API, and Web Client are tools to visualize and model data of a Palo or XMLA database. The latest version of Palo Pivot Client is 3.2 (released in April 2010). The latest version of Palo Web Client is 3.0 (from December 2009). Their main purpose is Business Intelligence reporting and planning. With the API, programmers can easily model their specific needs. The Palo Web Client 3.0 allows access by different users with various access profiles;
• pocOLAP: PocOLAP²¹ is a lightweight, open source OLAP data mining solution. PocOLAP is released under the LGPL license and provides a web-based, crosstab OLAP tool that looks like a standard spreadsheet. While it’s not an OLAP server or fully fledged data mining solution, pocOLAP makes your data easy to use and understand. Users can determine which cells to display across the top and side of the data warehouse. The latest version is 0.4.1, from October 2010;
• JPivot: JPivot is among the first open source OLAP clients, developed by Tonbeller. JPivot is licensed under the CPL. It is a Web client implemented in Java and JavaServer Pages (JSP) that allows users to create their own analyses (crosstabs, graphs, tabular listings) independently, the only requirements being a star schema in a data warehouse and the Mondrian open source OLAP server. The latest version of JPivot is 1.8, from March 2008;
• STPivot: STPivot²² is an open source web-based OLAP viewer, developed by Todo BI–Stratebi, that gives special attention to the user interface and interactions (foreground). It is a small improvement to JPivot whose first goal is to build a fresher web-based OLAP client solution, specially focused on ease of use. The improvement of the JPivot user experience is based on taking advantage of free user interface libraries and technologies (such as jQuery and Ajax). The latest version of STPivot is 3.6, from June 2010, and it is licensed under GNU GPL v3;
• Jrubik: Jrubik²³ is an OLAP rich client developed in Java/Swing for the manipulation of data represented as a ROLAP cube, and based on the JPivot and Mondrian projects. It is built on top of JPivot’s model (for building MDX queries, formatting the results as a pivot table…). The client connects to Mondrian OLAP data sources and can generate SVG maps that include the OLAP query results as graphical metadata. Jrubik is licensed under the CPL and its latest version is 0.9.7, from January 2009²⁴;
• REX: Rex²⁵ (waRehouse EXplorer) is a Java client that provides an easy-to-use GUI for browsing multidimensional data sources that support the XMLA protocol (Mondrian, Microsoft Analysis Services 2000 and 2005). Rex also includes an MDX editor and an MDX builder tool. REX is released under the LGPL and its latest version is 0.81, from May 2009;
• JMagallanes: JMagallanes is an end-user application for OLAP reports written in Java/J2EE. It combines static reports, a Swing pivot table and charts in a single UI. It reads from data sources such as SQL, Excel and XML. It can produce PDF, XML, and other application-specific files for later off-line viewing of reports. JMagallanes Olap is distributed under the BSD license, with the latest version being 1.0, from May 2006;
• BIOLAP: BIOLAP²⁶ is an open source OLAP tool for biology data, released under the BSD license, that makes it possible to perform Database Engines/Servers Enterprise OLAP tasks. In traditional OLAP, summaries are sets of numbers. In biology, data is often non-numeric and the summary may not be a number. To handle biology data sets there are two main requirements: aggregation functions that can summarize non-numeric data, and aggregation functions able to return more than just numbers. The solution is based on Mondrian. The latest version was released in July 2009;
• OpenI: OpenI²⁷ is a simple web application, developed by the company Loyalty Matrix, that does out-of-the-box OLAP reporting. It deploys on any J2EE server and supports the publishing of interactive OLAP reports. OpenI can use relational data sources and XMLA-based sources. For data mining datasets, OpenI integrates with the R project. The existing source code is available under the GNU GPL license. The latest version is 3.0.1, from April 2012;
• Web-based free OLAP tools: Several OLAP clients use free web-based OLAP servers. In terms of functionality, the most extended and widely used server is Pentaho Mondrian. Most of the OLAP clients studied are Java applications. Most of them can be used with XMLA (XML for Analysis)-enabled sources. However, they are not properly documented. In general, the web-based tools studied offer sufficient functionality, but they remain cumbersome due to traditional OLAP usage.

HOW TO CHOOSE BI TOOL

Business Intelligence involves the use of data to generate insight and gain an advantage. Its main objective is to enable business users to analyze and find what drives their business activity. We can state a number of concrete benefits, such as reduced cost, increased revenue, and improved customer satisfaction, as well as a number of intangible benefits, such as improved communication and improved job satisfaction. Some of the business drivers for BI implementation are:

• Making decisions based on facts;
• Being aware of operational and strategic planning;
• Examining the execution of plans;
• Discovering new marketing opportunities;
• Planning for product innovation;
• Standards fulfillment and auditing.

The first step in choosing the right BI tool is to prepare a list giving an overview of your requirements beyond the technical aspects. The second step is to launch your BI tool evaluation process based on that list of requirements. The evaluation of tools can be based on your own data. It is recommended to avoid making decisions based on RFI (Request for Information) responses. Prior to the evaluation process, most of the BI tools in today’s market look exactly the same. Judging the power and suitability of a BI tool for your organization relies on connecting the tool to your own data and starting to explore your own business. Testing the tool will reveal the degree of complexity of the tool and the ability of your business users to make their own discoveries and create dashboards/reports without technical assistance.

Another important issue is finding out the average implementation time (days, weeks or months). The required training time, the impact on the existing infrastructure (platform) and the access strategies (desktop, web (browser or support), mobile) are very important to know at the beginning of the evaluation process.

BI AND SEMANTIC WEB

Current research on BI and the challenging Semantic Web approach can be combined.

The relation between analytical tools (OLAP, DM, visualization…) and the approaches of business knowledge and mechanisms can lead to different stakeholders. Given that the Semantic Web provides agile ways, based on high semantic expressiveness, to locate relevant content on the Internet, BI architectures must also make use of semantics to support analytical processing. However, BI solutions need to use effective methods for content exploration, such as those already used by the billions of current Web users, yet without losing the potential envisioned for the Semantic Web (Smalltree, 2006).

BI research still requires the use of natural language to conduct analysis, despite the application of semantic technologies and methods of knowledge representation. The approach to information envisaged by the Semantic Web is becoming a trend in the area of BI. The need to use and extract knowledge to support decision making motivates the convergence of the new Business Intelligence (BI) solutions with Knowledge Engineering tools.

The word “ontology” seems to generate a lot of controversy in discussions about AI. An ontology is a description (or a specification of a conceptualization: a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents. Business semantics can be captured using ontologies. Moreover, ontologies can be used to define the required knowledge models for generating flexible and exploratory functionalities in analytical tools.
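
The idea that an ontology describes concepts and relationships, and that an analytical tool can exploit it for exploration, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration only: the concept names (Customer, Order, Product, Supplier), the relation names, and the helper functions are hypothetical and are not taken from any BI product or ontology library. The ontology is stored as subject-predicate-object triples, and a breadth-first traversal lists the concepts an analyst could navigate to from a starting concept.

```python
from collections import defaultdict, deque

# Hypothetical business ontology as (subject, predicate, object) triples.
# All concept and relation names are illustrative assumptions.
TRIPLES = [
    ("Customer", "places", "Order"),
    ("Order", "contains", "Product"),
    ("Product", "supplied_by", "Supplier"),
    ("PremiumCustomer", "is_a", "Customer"),
]


def build_index(triples):
    """Group outgoing (predicate, object) pairs by subject concept."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def related_concepts(index, start):
    """Return every concept reachable from `start`, in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        concept = queue.popleft()
        # .get avoids creating empty entries for concepts with no outgoing edges
        for _predicate, target in index.get(concept, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


if __name__ == "__main__":
    index = build_index(TRIPLES)
    # Concepts an analyst could drill into, starting from PremiumCustomer
    print(related_concepts(index, "PremiumCustomer"))
```

A real deployment would more likely rely on a standard ontology language and a reasoner than on hand-written triples; the sketch only shows how explicit relationships make exploratory navigation possible.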
Hevner, A. R., March, S. T., Park, J., & Ram, S. (2004). Design science in information systems research. Management Information Systems Quarterly, 28(1), 75–105.

Kersten, M. L., Idreos, S., Manegold, S., & Liarou, E. (2011). The researcher’s guide to the data deluge: Querying a scientific database in just a few seconds. In Proceedings of International Conference on Very Large Data Bases 2011 (VLDB) (pp. 585–597).

Kinetic Networks. (2008). KETL.org – Designed to support the community that uses the KETL™ open source ETL product. Available at http://ketl.org/.

Maguire, S., & Robson, I. (2005, April). Intelligence management: The role of environmental scanning. In Proceedings of the UKAIS Conference, University of Northumbria.

Mike, S., Daniel, A., Adam, B., Xuedong, C., Mitch, C., Miguel, F., … Stan, Z. (2005). C-store: A column oriented DBMS. VLDB, 553-564.

Negash, S., & Gray, P. (2004). Business intelligence. Communications of the Association for Information Systems, 13, 177–195.

O’Brien, J. A., & Marakas, G. M. (2007). Introduction to information systems (13th ed.). New York, NY: McGraw-Hill.

Pentaho. (2008). Pentaho commercial open source business intelligence: Kettle Project. Retrieved from http://kettle.pentaho.org/.

Rogers, P., & Blenko, M. (2006). How clear decision roles enhance organizational performance. Harvard Business Review, 84(1). ISSN 0017-8012

Smalltree, H. (2006, August 17). Business intelligence search: Five myths. In SearchBusinessAnalytics.com. Retrieved from http://searchbusinessanalytics.techtarget.com/news/1507286/Businessintelligence-search-Five-myths

Solutions, E. S. R. I. (2005). GIS for retail and commercial business. Retrieved from http://www.esri.com/industries/business/business/business_intelligence.html

Spago. (2009). Engineering ingegneria informatica. SpagoBI - Spago Solutions. Retrieved from http://www.spagoworld.org

Steele, R. D. (2002). The new craft of intelligence: Personal, public & political. Oakton, VA: OSS International Press.

Talend. (2008). Talend – first provider of open source data integration software. Retrieved August 7, 2008 from http://talend.com/

Teamlösungen. (2007). JDBC data transformations. Retrieved from http://www.enhydra.org/tech/octopus/

Thierry, B. (2010). GeoKettle: A powerful open source spatial ETL tool. FOSS4G 2010, Spatialytics inc. Quebec, Canada.

Thomsen, C., & Pedersen, T. B. (2005). A survey of open source tools for business intelligence. In Tjoa, A. M. & Trujillo, J. (Eds.), Proceedings of the 7th International Conference on Data Warehousing and Knowledge Discovery (pp. 74-84). Berlin Heidelberg, Germany: Springer.

Turban, E., Lee, J., & Viehland, D. (2004). Electronic commerce: A managerial perspective (International ed.). Pearson Prentice Hall.

Watson, H. J., Wixom, B. H., Hoffer, J. A., Anderson-Lehman, R., & Reynolds, A. M. (2006). Real-time business intelligence: Best practices at Continental Airlines. Information Systems Management, 23(1), 12. doi:10.1201/1078.10580530/45769.23.1.20061201/91768.2.

Zeng, L., Xu, L., Shi, Z., Wang, M., & Wu, W. (2007, October 8-11). Techniques, process, and enterprise solutions of business intelligence. In Proceedings of the 2006 IEEE Conference on Systems, Man, and Cybernetics, 2006, Taipei, Taiwan (Vol. 6, pp. 4722).

ENDNOTES

1 http://cplusql.sourceforge.net/
2 www.mysql.com
3 www.postgresql.org
4 www.ingres.com
5 http://community.greenplum.com/
6 http://www.infobright.org/
7 http://www.luciddb.org/
8 http://www.monetdb.org/Home
9 http://www.firebirdsql.org/
10 http://www.squirrel.com/ or http://squirrel-sql.sourceforge.net/
11 http://analytical-labs.com/
12 http://www.palo.net/
13 http://mondrian.pentaho.com/
14 http://www.jaspersoft.com/jaspersoft-olap
15 Bee project, bee.insightstrategy.cz/en/
16 http://savannah.nongnu.org/projects/lemur/
17 http://freeanalysis.sourceforge.net/index-fr.html
18 http://www.bpm-conseil.com/
19 http://forge.bpm-conseil.com/docman/view.php/11/45/BPM_Vanilla_FreeAnalysisSchemaDesigner_v4.0_EN.pdf
20 http://www.jpalo.com/en/
21 http://www.pocolap.org/
22 http://stpivot.jimdo.com/
23 http://rubik.sourceforge.net/jrubik/intro.html
24 http://sourceforge.net/projects/rubik/files/
25 http://sourceforge.net/projects/whex/
26 http://biolap.sourceforge.net/
27 http://openi.org/
Thabet Slimani received a PhD in Computer Science (2011) from the University of Tunisia. He is
currently an Assistant Professor of Information Technology at the Department of Information
Technology of Taif University in Saudi Arabia and a member of the LARODEC laboratory (University of
Tunisia), where he is involved in both research and teaching activities. His research interests are
mainly related to the Semantic Web, Data Mining, Business Intelligence, Knowledge Management
and, more recently, Web services. Thabet has published his research in international conferences
and peer-reviewed journals. He also serves as a reviewer for several conferences and journals.