BI Data House: What Is The Role of Data Warehousing in Business Intelligence?

BI Data House

Business intelligence (BI) and data warehousing (DW) work together. A data warehouse stores a company's data, drawn
from various sources, in databases. BI tools then analyze that data to generate insights.

Data warehousing is the process of transferring and storing data from multiple sources into a single repository. This
repository is known as a data warehouse (DWH).

A DWH is a central platform for consolidating and storing data from different sources. It prepares this data for
downstream BI and analytics.

BI relies on complex queries and on comparing multiple sets of data to inform decisions. It supports better business
decision-making by grounding strategic decisions in data.

Although BI and DW depend closely on each other, there are some important differences. For example, data
warehouses can be expensive and scarce resources. They can take months and millions of dollars to set up.

What Is the Role of Data Warehousing in Business Intelligence?

In business intelligence, data warehouses serve as the backbone of data storage. Business intelligence relies on
complex queries and comparing multiple sets of data to inform everything from everyday decisions to organization-
wide shifts in focus.

To facilitate this, business intelligence consists of three overarching activities: data wrangling, data storage,
and data analysis. Data wrangling is usually facilitated by extract, transform, load (ETL) technologies, which we’ll
explain in detail below, and data analysis is done using business intelligence tools, like Chartio.

Data warehouses are the glue holding this process together. They handle data storage in a form suited to online
analytical processing (OLAP): they integrate, summarize, and transform data, making it easier to analyze.
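The three activities described above (wrangling, storage, analysis) can be sketched in a few lines. This is a minimal illustration, not a production pipeline: it uses an in-memory SQLite database as a stand-in for a warehouse, and the source rows, the `transform` and `run_etl` helpers, and the `sales` table are all hypothetical names invented for the example.

```python
import sqlite3

# Hypothetical raw records from two source systems with inconsistent formats.
crm_rows = [("alice", "2024-01-05", "120.50"), ("BOB", "2024-01-06", "80")]
shop_rows = [("Alice", "05/01/2024", 30.0)]


def transform(name, date_text, amount):
    """Standardize one raw record: title-case name, ISO date, float amount."""
    if "/" in date_text:  # assumption: the shop export uses DD/MM/YYYY
        day, month, year = date_text.split("/")
        date_text = f"{year}-{month}-{day}"
    return (name.strip().title(), date_text, float(amount))


def run_etl(conn, *sources):
    """Extract rows from each source, transform them, and load them into one table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, sale_date TEXT, amount REAL)"
    )
    cleaned = [transform(*row) for source in sources for row in source]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", cleaned)
    conn.commit()
    return len(cleaned)


warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse
loaded = run_etl(warehouse, crm_rows, shop_rows)

# Analysis step: a BI tool would issue queries similar to this one.
totals = dict(
    warehouse.execute(
        "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
    )
)
print(loaded, totals)
```

Because both sources are standardized before loading, the two spellings of the same customer end up in one group when the data is analyzed.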

Even though data warehouses serve as the backbone of data storage, they’re not the only technology involved in
data storage. Many companies go through a data storage hierarchy before reaching the point where they absolutely
need a data warehouse.

Disadvantages of OLTP

Here are some disadvantages of online transaction processing (OLTP):

1. Performance Challenges: OLTP systems can face performance issues during peak loads due to
numerous concurrent transactions. This can lead to increased response times, impacting user experience
and overall system efficiency. Factors like locking, contention, and complex query processing can
contribute to performance bottlenecks.
2. Scalability Concerns: As transaction volumes grow, scaling an OLTP system can become complex and
expensive. Adding more hardware or resources might not be a viable solution, especially in cases where
the system architecture doesn't support seamless scaling.
3. Data Integrity Risks: Maintaining data consistency and integrity in an OLTP environment, especially
during concurrent transactions, can be challenging. Issues like ‘dirty reads,’ ‘phantom reads,’ and ‘lost
updates’ might occur if proper concurrency control mechanisms are not implemented effectively.
4. Complexity in Query Optimization: OLTP systems often involve complex queries and numerous indexes
to support transactional operations. Optimizing these queries for efficient execution without affecting
concurrent transactions requires significant expertise and ongoing fine-tuning.
5. High Costs: Implementing and maintaining OLTP systems can be costly. Licensing fees for database
software, hardware requirements, ongoing maintenance, and the need for specialized staff to manage
these systems can significantly contribute to the overall expenses.
6. Limited Reporting Capabilities: OLTP systems are designed for transactional processing, not for
complex analytical or reporting tasks. Generating extensive reports or performing analytics directly on
OLTP databases can hinder their performance and may not provide the desired insights efficiently.

When Should I Use a Data Warehouse for Business Intelligence?


There are generally four stages of data sophistication: source data, data lakes, data warehouses, and data marts.
Knowing when to invest in a data warehouse requires you to know each stage, but ultimately the data warehouse
stage is what unlocks the full value of your data.

Source Data

Source data is any individual set of data like databases, Excel spreadsheets, individual application reports, etc. It’s
structured (i.e., organized) yet siloed data that works fine alone but does not provide a larger picture of your
organization’s data as a whole.

Data Lake

For teams that need to centralize their source data in one place, a data lake is increasingly the next step.
A data lake serves as a central repository for all raw, unstructured (i.e., not organized) data.

If a data warehouse is like backing up a truck and unloading the data in an orderly fashion into a well-organized
shelving system, data lakes are like backing the truck up and dumping all the data into, well, a lake. James
Dixon, who coined the term “data lake,” describes it as the natural raw state of data that, for people with the diving
skills, serves as a frontier to explore.

Data Warehouse

Like a data lake, a data warehouse centralizes your data, but as we’ve established, it’s well-organized and set up
for efficient analysis. It’s a single source of truth for all data that’s easier to understand and navigate.

Data warehouses can hook right up to source data, but nowadays, we’re seeing more and more companies use
their data warehouse as a layer on top of their data lake. Following Dixon’s comparison, if a data lake is the
water/data in its natural, unorganized state, a data warehouse is where you treat it and make it ready for
consumption.

Data Mart

Using a data warehouse for some projects can be like swatting a fly with a sledgehammer. If, for instance, the
marketing team returns time and time again to the warehouse to make similar queries, you can set up a data mart.

Data marts are curated data sets created for specific use cases. Again, bringing up Dixon’s description, the
marketing team doesn’t need to go to the treatment center every time they need water. The data warehouse can
be used to package data/water into ready-to-drink “water bottles.”

In this data storage ecosystem, the data warehouse is still the backbone. It’s structured and relatively easy to
understand (like source data), yet it provides a holistic, centralized view (like a data lake), making it much easier to
use that data however you need (like creating data marts).
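As a rough illustration of the "water bottle" idea, a data mart can be as simple as a curated, pre-aggregated view over a warehouse table. The sketch below uses an in-memory SQLite database; the `warehouse_sales` table, the `marketing_mart` view, and the sample rows are hypothetical and only serve the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
conn.execute(
    "CREATE TABLE warehouse_sales "
    "(department TEXT, campaign TEXT, region TEXT, revenue REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        ("marketing", "spring_promo", "north", 1200.0),
        ("marketing", "spring_promo", "south", 800.0),
        ("marketing", "email_blast", "north", 300.0),
        ("finance", None, "north", 5000.0),
    ],
)

# The data mart: only the marketing rows, already summarized per campaign.
conn.execute(
    """
    CREATE VIEW marketing_mart AS
    SELECT campaign, SUM(revenue) AS total_revenue
    FROM warehouse_sales
    WHERE department = 'marketing'
    GROUP BY campaign
    """
)

# The marketing team queries the mart instead of the full warehouse table.
mart_rows = conn.execute(
    "SELECT campaign, total_revenue FROM marketing_mart ORDER BY campaign"
).fetchall()
print(mart_rows)
```

A view keeps the mart in sync with the warehouse automatically. A separate physical table would trade that freshness for faster reads.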

Characteristics of a Data Warehouse

A data warehouse is a repository that stores integrated, historical, and structured data from various sources within
an organization. Some key characteristics include:

1. Subject-Oriented: Organized around specific subjects or areas relevant to the organization (e.g., sales,
finance, inventory) rather than transactional data.
2. Integrated: Combines data from disparate sources across the organization into a single, coherent
repository, often through ETL (Extract, Transform, Load) processes.
3. Time-Variant: Stores historical data, allowing analysis of changes and trends over time to support
decision-making.
4. Non-Volatile: Data in a data warehouse is read-only and remains unchanged once it's stored, ensuring
consistency and providing a stable environment for analysis.
5. Supports Analytical Processing: Optimized for complex queries and analysis rather than transactional
processing, enabling business intelligence, reporting, and data mining.
6. Denormalized Structure: Often designed in a denormalized schema to facilitate faster query
performance for analytical operations.
7. Scalability: Scalable to accommodate increasing data volumes and user demands for analytics and
reporting.
8. Metadata: Contains metadata (data about data) that helps users understand the content, source, and
meaning of the stored data.
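Two of the characteristics above, time-variant and non-volatile, can be sketched together: each load appends a dated snapshot, and existing rows are never updated. The example below is only an illustration using SQLite. The `inventory_history` table, the trigger name, and the `load_snapshot` helper are hypothetical, and a real warehouse would enforce read-only access through its own permissions rather than a trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory_history (snapshot_date TEXT, product TEXT, units INTEGER)"
)
# Non-volatile: reject any attempt to change rows that are already stored.
conn.execute(
    """
    CREATE TRIGGER inventory_history_read_only
    BEFORE UPDATE ON inventory_history
    BEGIN
        SELECT RAISE(ABORT, 'warehouse rows are read-only');
    END
    """
)


def load_snapshot(snapshot_date, stock):
    """Time-variant: append one dated row per product instead of overwriting."""
    rows = [(snapshot_date, product, units) for product, units in stock.items()]
    conn.executemany("INSERT INTO inventory_history VALUES (?, ?, ?)", rows)


load_snapshot("2024-01-31", {"widget": 100})
load_snapshot("2024-02-29", {"widget": 70})

try:
    conn.execute("UPDATE inventory_history SET units = 0")
    update_blocked = False
except sqlite3.DatabaseError:
    update_blocked = True

# Because history is kept, the change over time can be analyzed.
trend = conn.execute(
    "SELECT snapshot_date, units FROM inventory_history "
    "WHERE product = 'widget' ORDER BY snapshot_date"
).fetchall()
print(update_blocked, trend)
```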

Functions of a Data Warehouse

A data warehouse is a specialized system used for storing and managing large volumes of structured and
unstructured data from various sources within an organization. It's designed for query and analysis rather than
transaction processing.

Functions of a data warehouse include:

1. Data Integration: Collecting data from multiple sources and organizing it into a unified format within the
warehouse.
2. Data Storage: Storing structured and sometimes semi-structured data in a way that facilitates efficient
querying and analysis.
3. Data Cleaning and Transformation: Preparing data for analysis by cleaning, transforming, and
standardizing it.
4. Data Management: Managing metadata, ensuring data quality, and maintaining historical data for
analysis purposes.
5. Query and Analysis: Providing tools and interfaces for users to run complex queries, generate reports,
and perform data analysis.
6. Decision Support: Supporting business decision-making by providing a centralized, consistent, and
reliable source of data for analysis.

Advantages of a Data Warehouse

Data warehouses offer several advantages:
1. Centralized Data: They consolidate data from various sources into a single repository, making it easier to
access and analyze information from different departments or systems within an organization.
2. Improved Decision-Making: Data warehouses enable efficient data analysis, allowing businesses to
make better-informed decisions based on comprehensive and reliable information.
3. Data Quality and Consistency: They often include data cleaning and transformation processes that
enhance data quality and ensure consistency across the organization.
4. Faster Query and Reporting: By structuring data for quick access and analysis, data warehouses speed
up querying and reporting, providing faster insights to users.
5. Scalability: They can scale to accommodate large volumes of data, ensuring the system’s performance
remains stable even as data volumes grow.
6. Historical Analysis: Data warehouses store historical data, allowing for trend analysis and long-term
strategic planning based on historical patterns.
7. Support for Business Intelligence (BI) and Analytics: They serve as a foundation for BI tools and
analytics platforms, facilitating complex analysis, forecasting, and data visualization.
8. Security and Governance: Data warehouses often have robust security measures in place to protect
sensitive information, along with governance protocols ensuring data compliance and integrity.

Disadvantages of a Data Warehouse

Some disadvantages of data warehouses include:
1. Costly Implementation: Building and maintaining a data warehouse can be expensive due to hardware, software, and personnel
costs.
2. Complexity: Integrating various data sources and transforming data for storage in a warehouse can be complex and
time-consuming.
3. Data Quality Issues: Ensuring data consistency and accuracy can be challenging, leading to potential errors or inconsistencies in
reports and analytics.
4. Scalability Challenges: Scaling a data warehouse to handle large volumes of data might require
significant upgrades, which can be both expensive and time-intensive.
5. Rigidity: Data warehouses are designed for structured data, which can limit their ability to handle
unstructured or semi-structured data efficiently.
6. Potential Data Redundancy: Storing massive amounts of data in a warehouse can lead to redundancy if
not managed properly, consuming extra storage space.
7. Long Development Time: Constructing a data warehouse, especially with custom requirements, can
take a long time from planning to implementation.
8. Difficulty in Real-time Processing: Traditional data warehouses may not excel in real-time or near real-
time processing, which can limit their usefulness in rapidly changing environments.

Data mart characteristics

Data marts are smaller subsets of a data warehouse that focus on specific business functions or departments within
an organization. Some of their key characteristics include:

1. Focused Scope: They contain data focused on a particular subject area, department, or specific set of
users, making them more specialized than data warehouses.
2. Subset of Data Warehouse: Data marts are derived from the larger data warehouse, often containing summarized or aggregated
data tailored to the needs of a particular user group or business function.
3. Optimized for Analysis: They are designed for the specific requirements of analysis, reporting, and decision-making within a
particular domain, enabling faster query performance.
4. User-Oriented: Tailored to the needs of specific user groups or departments, facilitating easier access to relevant data for analysis
and reporting purposes.
5. May Use Different Technologies: Data marts can use various technologies and structures like star schema, snowflake schema, or
other data modeling techniques to organize and store data efficiently.
6. Rapid Development: Due to their smaller size and focused nature, data marts can be developed more
quickly compared to a full-scale data warehouse.
7. May Exist Independently: While data marts often reside within a larger data warehouse architecture,
they can also exist independently, especially in cases where an organization’s data needs are specific to a
particular department or function.

Data mart function

A data mart serves as a specialized subset of a data warehouse, focusing on a specific department, team, or
business function within an organization. Its primary function is to store and manage a particular set of structured
data that is relevant to a specific group or area of the business. Data marts are designed to facilitate easier access,
analysis, and retrieval of data for users who require specific
insights and reports tailored to their needs, thereby supporting decision-making processes within that particular
domain or function.

OLAP vs OLTP

Online analytical processing (OLAP) and online transaction processing (OLTP) are two
different data processing systems designed for different purposes. OLAP is optimized for
complex data analysis and reporting, while OLTP is optimized for transactional processing
and real-time updates.

Data mart advantages


Data marts offer several advantages:
1. Focused Data: They contain a subset of data that's tailored to a specific business area or user group,
allowing for quicker access to relevant information without the complexity of an entire data warehouse.
2. Improved Performance: By storing only relevant data, data marts can enhance query performance and
reporting speed since they are smaller and more focused than comprehensive data warehouses.
3. Business Agility: Data marts are often designed for specific departments or business functions, enabling
faster decision-making and agility in responding to the needs of those specific areas within an
organization.
4. Ease of Use: They're user-friendly due to their specialized nature, making it easier for non-technical users
to access and analyze data pertinent to their responsibilities.
5. Cost Efficiency: Developing and maintaining data marts can be more cost-effective compared to
comprehensive data warehouses since they're smaller in scope and cater to specific needs.
6. Scalability: They can be easily expanded or modified as the business requirements change, allowing for
flexibility and scalability in adapting to evolving needs.

Data mart disadvantages


Data marts, while beneficial for specific purposes, have some disadvantages. Some of these include:
1. Limited in Scope: Data marts are designed for specific departments or users, which can limit the
overall view of the organization’s data. This narrow focus may lead to inconsistencies or inaccuracies
when attempting to analyze data across the entire organization.
2. Data Redundancy: If multiple data marts are created independently, there’s a risk of redundant or
conflicting data. This can result in discrepancies between different departments or teams using
different data marts.
3. Integration Challenges: Integrating data from various data marts or with the main data warehouse
can be complex. Ensuring data consistency and compatibility across different systems and platforms
can be a significant challenge.
4. Scalability Issues: As the organization grows or data needs change, scaling individual data marts
might become difficult or costly. Maintaining and expanding multiple separate data marts can be
resource-intensive.
5. Dependency on Source Systems: Data marts rely heavily on the quality and structure of data from
source systems. If these source systems change or encounter issues, it can affect the data quality in
the data marts.

Advantages of OLAP

OLAP (Online Analytical Processing) offers several advantages:

Fast Analysis: Enables quick multidimensional analysis of data for various business perspectives, allowing users to
make informed decisions faster.

Multi-Dimensional View: Provides a multidimensional view of data, allowing users to analyze information from
different angles, facilitating better insights and understanding.

Aggregation and Calculation: Supports complex calculations, aggregations, and summaries across various
dimensions, enabling the generation of reports and forecasts efficiently.

Interactive Exploration: Allows users to interactively explore data, drill-down, and drill-up to view details or
summaries at different levels of granularity.

Decision Support: Empowers decision-makers by offering a flexible environment to slice and dice data, aiding in
identifying trends, patterns, and correlations.

Improved Data Quality: Helps in maintaining data consistency and accuracy by integrating data from multiple
sources, ensuring a reliable foundation for analysis.

Scalability: Offers scalability to handle large volumes of data, accommodating business growth and evolving data
needs without compromising performance.

Enhanced Business Intelligence: Enables businesses to gain a competitive edge by extracting valuable insights
from data, leading to better strategic planning and execution.

What is online analytical processing (OLAP)?


Online analytical processing (OLAP) is software technology you can use to analyze business data from different points of view.
Organizations collect and store data from multiple data sources, such as websites, applications, smart meters, and internal systems.
OLAP combines and groups this data into categories to provide actionable insights for strategic planning. For example, a retailer
stores data about all the products it sells, such as color, size, cost, and location. The retailer also collects customer purchase data,
such as the name of the items ordered and total sales value, in a different system. OLAP combines the datasets to answer
questions such as which color products are more popular or how product placement impacts sales.

Why is OLAP important?

Online analytical processing (OLAP) helps organizations process and benefit from a growing amount of digital information. Some benefits of
OLAP include the following.

Faster decision making: Businesses use OLAP to make quick and accurate decisions to remain competitive in a fast-paced economy. Performing
analytical queries on multiple relational databases is time-consuming because the computer system searches through multiple data tables. On
the other hand, OLAP systems precalculate and integrate data so business analysts can generate reports faster when needed.

Non-technical user support: OLAP systems make complex data analysis easier for non-technical business users. Business users can create
complex analytical calculations and generate reports instead of learning how to operate databases.

Integrated data view: OLAP provides a unified platform for marketing, finance, production, and other business units. Managers and decision
makers can see the bigger picture and effectively solve problems. They can perform what-if analysis, which shows the impact of decisions taken
by one department on other areas of the business.

What is OLAP architecture?

Online analytical processing (OLAP) systems store multidimensional data by representing information in more than two dimensions, or
categories. Two-dimensional data involves columns and rows, but multidimensional data has multiple characteristics. For example,
multidimensional data for product sales might consist of the following dimensions:

• Product type

• Location

• Time
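One simple way to picture multidimensional data is as a mapping from one coordinate per dimension to a measure. The sketch below is illustrative only: it uses a plain Python dictionary, and the sample sales figures and the choice of "units sold" as the measure are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sales facts: (product type, location, month, units sold).
sales = [
    ("shoes", "Berlin", "2024-01", 10),
    ("shoes", "Paris", "2024-01", 4),
    ("hats", "Berlin", "2024-01", 7),
    ("shoes", "Berlin", "2024-02", 6),
]

# Each cell of the cube is addressed by one value per dimension.
cube = defaultdict(int)
for product, location, month, units in sales:
    cube[(product, location, month)] += units

berlin_shoes_jan = cube[("shoes", "Berlin", "2024-01")]
print(berlin_shoes_jan)
```

A two-dimensional table would need one sheet per month to hold the same information. Here the third dimension is just another part of the key.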

Data engineers build a multidimensional OLAP system that consists of the following elements.

Data warehouse: A data warehouse collects information from different sources, including applications, files, and databases. It processes the
information using various tools so that the data is ready for analytical purposes. For example, the data warehouse might collect information
from a relational database that stores data in tables of rows and columns.

ETL tools: Extract, transform, and load (ETL) tools are database processes that automatically retrieve, change, and prepare the data to a format
fit for analytical purposes. Data warehouses use ETL to convert and standardize information from various sources before making it available to
OLAP tools.

OLAP server: An OLAP server is the underlying machine that powers the OLAP system. It uses ETL tools to transform information in the
relational databases and prepare them for OLAP operations.

OLAP database: An OLAP database is a separate database that connects to the data warehouse. Data engineers sometimes use an OLAP
database to prevent the data warehouse from being burdened by OLAP analysis. They also use an OLAP database to make it easier to create
OLAP data models.

OLAP cubes: A data cube is a model representing a multidimensional array of information. While it’s easier to visualize it as a three-dimensional
data model, most data cubes have more than three dimensions. An OLAP cube, or hypercube, is the term for data cubes in an OLAP system.
OLAP cubes are rigid because you can’t change the dimensions and underlying data once you model it. For example, if you add the warehouse
dimension to a cube with product, location, and time dimensions, you have to remodel the entire cube.

OLAP analytic tools: Business analysts use OLAP tools to interact with the OLAP cube. They perform operations such as slicing, dicing, and
pivoting to gain deeper insights into specific information within the OLAP cube.
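The slicing, dicing, and pivoting operations mentioned above can be sketched on a toy cube. This is a simplified illustration in plain Python, not how any particular OLAP tool implements them. The cube contents and the `dice`, `slice_`, and `pivot` helpers are hypothetical.

```python
from collections import defaultdict

DIMENSIONS = ("product", "location", "month")

# Toy cube: one coordinate per dimension -> units sold (made-up numbers).
cube = {
    ("shoes", "Berlin", "2024-01"): 10,
    ("shoes", "Paris", "2024-01"): 4,
    ("hats", "Berlin", "2024-01"): 7,
    ("shoes", "Berlin", "2024-02"): 6,
}


def dice(cube, **allowed):
    """Keep cells whose coordinates are in the allowed set for each named dimension."""
    def keep(key):
        return all(
            key[DIMENSIONS.index(dim)] in values for dim, values in allowed.items()
        )
    return {key: value for key, value in cube.items() if keep(key)}


def slice_(cube, dim, value):
    """A slice fixes a single dimension to a single value."""
    return dice(cube, **{dim: {value}})


def pivot(cube, row_dim, col_dim):
    """Summarize the cube as a two-dimensional table of row_dim by col_dim."""
    r, c = DIMENSIONS.index(row_dim), DIMENSIONS.index(col_dim)
    table = defaultdict(lambda: defaultdict(int))
    for key, value in cube.items():
        table[key[r]][key[c]] += value
    return {row: dict(cols) for row, cols in table.items()}


january = slice_(cube, "month", "2024-01")
berlin_shoes = dice(cube, product={"shoes"}, location={"Berlin"})
by_product_and_city = pivot(cube, "product", "location")
print(by_product_and_city)
```

In this sketch a slice is just a dice with one dimension and one allowed value, which mirrors how the two operations relate conceptually.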

How does OLAP work?


An online analytical processing (OLAP) system works by collecting, organizing, aggregating, and analyzing data using the following
steps:

1. The OLAP server collects data from multiple data sources, including relational databases and data warehouses.

2. Then, the extract, transform, and load (ETL) tools clean, aggregate, precalculate, and store data in an OLAP cube
according to the number of dimensions specified.

3. Business analysts use OLAP tools to query and generate reports from the multidimensional data in the OLAP cube.

OLAP uses Multidimensional Expressions (MDX) to query the OLAP cube. MDX is a query language, like SQL, that provides a set of
instructions for manipulating databases.
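The precalculation in step 2 is what makes step 3 fast: totals are computed once at load time, so a report becomes a lookup. The sketch below illustrates the idea in plain Python under simplifying assumptions. It builds one aggregate table for every combination of dimensions, and the `precalculate` and `query` helpers are hypothetical stand-ins, with dictionary lookups taking the place of MDX.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("product", "location", "month")

# Hypothetical cleaned facts, as they might look after the ETL step.
facts = [
    {"product": "shoes", "location": "Berlin", "month": "2024-01", "units": 10},
    {"product": "shoes", "location": "Paris", "month": "2024-01", "units": 4},
    {"product": "hats", "location": "Berlin", "month": "2024-01", "units": 7},
]


def precalculate(facts):
    """Build one table of summed units for every subset of the dimensions."""
    aggregates = {}
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, size):
            table = defaultdict(int)
            for fact in facts:
                table[tuple(fact[d] for d in dims)] += fact["units"]
            aggregates[dims] = dict(table)
    return aggregates


def query(aggregates, **filters):
    """Answer a question by lookup in the matching precalculated table."""
    dims = tuple(d for d in DIMENSIONS if d in filters)
    return aggregates[dims].get(tuple(filters[d] for d in dims), 0)


aggregates = precalculate(facts)
total_units = query(aggregates)
berlin_units = query(aggregates, location="Berlin")
shoes_in_january = query(aggregates, product="shoes", month="2024-01")
print(total_units, berlin_units, shoes_in_january)
```

The number of aggregate tables doubles with every added dimension, which is one reason cubes become expensive to build and store as they grow.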

What are the types of OLAP?

Online analytical processing (OLAP) systems operate in three main ways.

MOLAP

Multidimensional online analytical processing (MOLAP) involves creating a data cube that represents multidimensional data from a
data warehouse. The MOLAP system stores precalculated data in the hypercube. Data engineers use MOLAP because this type of
OLAP technology provides fast analysis.

ROLAP

Instead of using a data cube, relational online analytical processing (ROLAP) allows data engineers to perform multidimensional
data analysis on a relational database. In other words, data engineers use SQL queries to search for and retrieve specific
information based on the required dimensions. ROLAP is suitable for analyzing extensive and detailed data. However, ROLAP has
slower query performance than MOLAP because results are computed at query time rather than precalculated.
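In a ROLAP-style setup, the aggregation happens in SQL at query time instead of being read from a precalculated cube. The following sketch uses SQLite as a stand-in relational database. The `sales` table, its rows, and the `rolap_query` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product TEXT, location TEXT, month TEXT, units INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("shoes", "Berlin", "2024-01", 10),
        ("shoes", "Paris", "2024-01", 4),
        ("hats", "Berlin", "2024-01", 7),
        ("shoes", "Berlin", "2024-02", 6),
    ],
)

ALLOWED_DIMENSIONS = {"product", "location", "month"}


def rolap_query(conn, dimensions):
    """Sum units grouped by the requested dimensions, computed at query time."""
    if not dimensions or not set(dimensions) <= ALLOWED_DIMENSIONS:
        raise ValueError(f"unknown dimension in {dimensions!r}")
    cols = ", ".join(dimensions)  # safe: names are checked against a whitelist
    sql = f"SELECT {cols}, SUM(units) FROM sales GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql).fetchall()


by_location = rolap_query(conn, ["location"])
by_product_month = rolap_query(conn, ["product", "month"])
print(by_location)
print(by_product_month)
```

Every call scans the detailed rows again, which keeps full detail available but costs more per query than reading a stored total.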

HOLAP

Hybrid online analytical processing (HOLAP) combines MOLAP and ROLAP to provide the best of both architectures. HOLAP
allows data engineers to quickly retrieve analytical results from a data cube and extract detailed information from relational
databases.

What is data modeling in OLAP?

Data modeling is the representation of data in data warehouses or online analytical processing (OLAP) databases. Data modeling is
essential in relational online analytical processing (ROLAP) because ROLAP analyzes data straight from the relational database. ROLAP
stores multidimensional data as a star or snowflake schema.

Star schema

The star schema consists of a fact table and multiple dimension tables. The fact table is a data table that contains numerical values
related to a business process, and each dimension table contains values that describe an attribute of the facts. The fact table
refers to the dimension tables through foreign keys, which are identifiers that match the key of the corresponding row in each dimension table.
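A small star schema can be sketched with SQLite to show how the fact table and dimension tables fit together. The table names (`fact_sales`, `dim_product`, `dim_store`), columns, and rows below are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes, one row per product or store.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);

    -- Fact table: numeric measures plus a foreign key to each dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id INTEGER REFERENCES dim_store(store_id),
        units INTEGER,
        revenue REAL
    );

    INSERT INTO dim_product VALUES (1, 'Runner', 'shoes'), (2, 'Beanie', 'hats');
    INSERT INTO dim_store VALUES (1, 'Berlin'), (2, 'Paris');
    INSERT INTO fact_sales VALUES (1, 1, 10, 500.0), (1, 2, 4, 200.0), (2, 1, 7, 70.0);
    """
)

# Analytical query: join facts to a dimension, then aggregate by an attribute.
revenue_by_category = conn.execute(
    """
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()
print(revenue_by_category)
```

Each dimension is a single join away from the fact table, which is what gives the schema its star shape.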

Characteristics of OLTP systems


In general, OLTP systems do the following:

Process a large number of relatively simple transactions: Usually insertions, updates, and deletions to data,
as well as simple data queries (for example, a balance check at an ATM).

Enable multi-user access to the same data, while ensuring data integrity: OLTP systems rely on
concurrency algorithms to ensure that no two users can change the same data at the same time and that all
transactions are carried out in the proper order. This prevents people using online reservation systems from
double-booking the same room and protects holders of jointly held bank accounts from accidental overdrafts.

Emphasize very rapid processing, with response times measured in milliseconds: The effectiveness of an
OLTP system is measured by the total number of transactions that can be carried out per second.

Provide indexed data sets: These are used for rapid searching, retrieval, and querying.

Are available 24/7/365: Again, OLTP systems process huge numbers of concurrent transactions, so any data loss
or downtime can have significant and costly repercussions. A complete data backup must be available for any
moment in time. OLTP systems require frequent regular backups and constant incremental backups.
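The double-booking example above can be sketched with a uniqueness constraint and a short transaction. This is a single-connection illustration using SQLite, so it shows the integrity rule rather than real concurrent access. The `bookings` table and the `book` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The database itself guarantees that a room is booked at most once per night.
conn.execute(
    "CREATE TABLE bookings (room TEXT, night TEXT, guest TEXT, UNIQUE (room, night))"
)


def book(conn, room, night, guest):
    """Try to reserve a room; return False if it is already taken."""
    try:
        with conn:  # commits on success, rolls back if an error is raised
            conn.execute(
                "INSERT INTO bookings VALUES (?, ?, ?)", (room, night, guest)
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = book(conn, "101", "2024-03-01", "Ana")
second = book(conn, "101", "2024-03-01", "Ben")  # same room, same night
booked = conn.execute("SELECT guest FROM bookings").fetchall()
print(first, second, booked)
```

Putting the rule in the database, rather than checking availability in application code first, means two requests that arrive at almost the same moment cannot both succeed.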

OLTP vs. OLAP


OLTP is often confused with online analytical processing, or OLAP. Both have similar acronyms and are online
data processing systems, but that's where the similarity ends.

OLTP is optimized for executing online database transactions. OLTP systems are designed for use by frontline
workers (e.g., cashiers, bank tellers, part desk clerks) or for customer self-service applications (e.g., online banking,
e-commerce, travel reservations).

OLAP, on the other hand, is optimized for conducting complex data analysis. OLAP systems are designed for use by
data scientists, business analysts, and knowledge workers, and they support business intelligence (BI), data mining,
and other decision support applications.

Not surprisingly, there are several distinct technical differences between OLTP and OLAP systems:

• OLTP systems use a relational database that can accommodate a large number of concurrent users and
frequent queries and updates, while supporting very fast response times. OLAP systems use a
multidimensional database, a special kind of database created from multiple relational databases that
enables complex queries involving multiple data facts from current and historical data. (An OLAP
database may be organized as a data warehouse.)
• OLTP queries are simple and typically involve just one or a few database records. OLAP queries are
complex queries involving large numbers of records.
• OLTP transaction and query response times are lightning-fast; OLAP response times are orders of
magnitude slower.
• OLTP systems modify data frequently (this is the nature of transactional processing); OLAP systems do not
modify data at all.
• OLTP workloads involve a balance of read and write; OLAP workloads are read-intensive.
• OLTP databases require relatively little storage space; OLAP databases work with enormous data sets and
typically have significant storage space requirements.
• OLTP systems require frequent or concurrent backups; OLAP systems can be backed up far less frequently.

It's worth noting OLTP systems often serve as a source of information for OLAP systems. And often, the goal of the
analytics performed using OLAP is to improve business strategy and optimize business processes, which can
provide a basis for making improvements to the OLTP system.

Examples of OLTP
Since the inception of the internet and the e-commerce era, OLTP systems have grown ubiquitous. They’re found in
nearly every industry or vertical market and in many consumer-facing systems. Everyday examples of OLTP
systems include the following:

• ATM machines (this is the classic, most often-cited example) and online banking applications
• Credit card payment processing (both online and in-store)
• Order entry (retail and back-office)
• Online bookings (ticketing, reservation systems, etc.)
• Record keeping (including health records, inventory control, production scheduling, claims processing,
customer service ticketing, and many other applications)

Disadvantages of OLAP
OLAP (online analytical processing) also has several disadvantages:
1. Complexity in Implementation: Setting up an OLAP system can be intricate and time-consuming.
Integrating data from various sources, designing the multidimensional database, and ensuring its
compatibility with existing systems can be challenging.
2. Scalability Issues: As data volumes grow, OLAP systems might face performance issues. Query response
times may increase significantly, leading to slower analysis, especially if the hardware infrastructure isn’t
robust enough to handle large datasets.
3. Data Freshness: OLAP databases are typically not real-time systems. There might be a delay between
data updates and their availability for analysis, which can be problematic for time-sensitive decision-
making processes.
4. Cost and Resource Intensiveness: Implementing and maintaining an OLAP system can be costly.
Licensing fees, hardware requirements, ongoing maintenance, and skilled personnel for management and
optimization contribute to the overall expenses.
5. Limited Detail Level: OLAP cubes are designed for aggregated views of data. Analyzing granular or
detailed information might be challenging within OLAP structures, potentially leading to the loss of
intricate details that could be essential for certain analyses.
6. Inflexibility in Schema Changes: Making alterations to the structure or schema of an OLAP cube can be
cumbersome. Adding new dimensions or modifying existing ones might require rebuilding the entire cube,
resulting in downtime and disruption to ongoing analyses.
7. Security Concerns: OLAP systems might store sensitive data, and managing access control to ensure
data security can be complex. Unauthorized access to OLAP cubes could lead to data breaches or misuse
of critical information.

Advantages of OLTP
OLTP (Online Transaction Processing) offers several advantages:

1. Real-time Processing: OLTP systems support real-time transaction processing, allowing immediate
access and updates to data. This is crucial for applications where up-to-date information is essential, such
as banking or e-commerce.
2. Concurrent Access: Multiple users can access and manipulate data simultaneously without interfering
with each other’s transactions, ensuring data integrity and consistency.
3. Atomicity, Consistency, Isolation, Durability (ACID): OLTP systems ensure that transactions are
processed reliably by adhering to ACID properties, guaranteeing that transactions are either completed
entirely or not at all, maintaining data consistency.
4. Optimized for Read and Write Operations: OLTP systems are designed to handle a large number of
short, individual transactions efficiently, making them suitable for applications that require frequent read
and write operations.
5. Support for Complex Queries: While OLTP systems are primarily focused on transaction processing,
modern OLTP databases also provide support for moderately complex queries, enabling some level of
analytical processing.
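
The "all or nothing" behaviour described under ACID can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for an OLTP database; the accounts table, the balances, and the transfer helper are hypothetical and exist only for this illustration.

```python
import sqlite3

# Hypothetical schema for illustration: two accounts that may not go negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as {account_id: balance}."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


ok = transfer(conn, 1, 2, 30)       # both updates succeed and are committed
failed = transfer(conn, 1, 2, 500)  # the debit violates the CHECK, so the credit is undone too
```

After the failed transfer, the balances are the same as before it started: the credit to account 2 that ran first is rolled back along with the rejected debit.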

Types of queries that an OLTP system can process

An OLTP system is an online database-modifying system. It therefore supports queries that insert, update, and delete information
in the database.

Consider a supermarket point-of-sale system. The following are sample queries that this system can process:
• Retrieving the description of a particular product.
• Filtering all products related to a supplier.
• Searching for a customer’s record.
• Listing products with a price below a given amount.
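
The sample queries above could look roughly like this in practice. This is a minimal sketch using Python's built-in sqlite3 module; the table names, columns, and rows are assumptions made up for the illustration, not part of any real point-of-sale product.

```python
import sqlite3

# Hypothetical point-of-sale schema and sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (
    id INTEGER PRIMARY KEY, description TEXT, supplier TEXT, price REAL
);
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO products VALUES
    (1, 'Whole milk 1L', 'DairyCo', 1.20),
    (2, 'Cheddar 200g', 'DairyCo', 3.50),
    (3, 'Basmati rice 1kg', 'GrainHouse', 2.80);
INSERT INTO customers VALUES (1, 'A. Khan'), (2, 'B. Silva');
""")

# Retrieve the description of a particular product.
description = conn.execute(
    "SELECT description FROM products WHERE id = ?", (2,)
).fetchone()[0]

# Filter all products related to one supplier.
dairy = [row[0] for row in conn.execute(
    "SELECT description FROM products WHERE supplier = ? ORDER BY id", ("DairyCo",)
)]

# Search for a customer's record.
customer = conn.execute(
    "SELECT id, name FROM customers WHERE name = ?", ("B. Silva",)
).fetchone()

# List products priced below a given amount.
cheap = [row[0] for row in conn.execute(
    "SELECT description FROM products WHERE price < ? ORDER BY id", (3.0,)
)]

# Data-changing statements: insert, update, and delete in one transaction.
with conn:
    conn.execute("INSERT INTO products VALUES (4, 'Green tea 50g', 'LeafLtd', 4.10)")
    conn.execute("UPDATE products SET price = 1.25 WHERE id = 1")
    conn.execute("DELETE FROM products WHERE id = 3")

product_count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
```

Each query touches only a handful of rows, which is typical of the short transactions an OLTP system is built for.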

OLTP Function
Online Transaction Processing (OLTP) is a type of data processing that executes transaction-focused tasks. These tasks include inserting,
deleting, or updating database data. OLTP is often used for financial transactions, order entry, retail sales, and CRM.

OLTP systems are user-friendly and can be used by anyone with a basic understanding of the application. They allow users to
read, write, and delete data quickly. OLTP systems respond to user actions immediately because they can
process queries very quickly.

OLTP systems consist of four main components:

• Databases
• Transactional hardware environment
• Application software
• User interfaces

OLTP systems are characterized by:

• A high volume of concurrent users accessing data
• Frequent data modification
• Data integrity
• Transactions that usually involve only a few database records and small amounts of data

OLTP focuses on maintaining data integrity across multi-access environments, query processing, and effectiveness.

What is CCR model


The CCR (Charnes–Cooper–Rhodes) model is a Data Envelopment Analysis (DEA) model that evaluates whether a
decision unit achieves scale and technical efficiency. It was first developed in 1978 and is based on the assumption
that an increase in production resources results in a proportional increase in output.

The CCR model is a model of constant returns to scale, while the BCC (Banker–Charnes–Cooper) DEA model is a
model of variable returns to scale.

The CCR ratio model calculates an overall efficiency for a unit by aggregating its pure technical efficiency and scale
efficiency into a single value. This efficiency is relative to the field and is never absolute.
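
To make "relative to the field" concrete, here is a minimal sketch for the simplest possible case: each unit has exactly one input and one output. Under constant returns to scale, a unit's efficiency is then its output/input ratio divided by the best ratio in the group. The general CCR model with several inputs and outputs requires solving a linear program per unit, which this sketch does not attempt. The branch names and figures are hypothetical.

```python
def ccr_efficiency(units):
    """Relative efficiency for units with one input and one output.

    units maps a unit name to an (input, output) pair. Assumes constant
    returns to scale, so only the output/input ratio matters.
    """
    productivity = {name: out / inp for name, (inp, out) in units.items()}
    best = max(productivity.values())
    return {name: p / best for name, p in productivity.items()}


# Hypothetical branches: (staff hours used, orders processed).
branches = {"A": (10, 20), "B": (20, 30), "C": (5, 10)}
scores = ccr_efficiency(branches)
```

Branches A and C score 1.0 because they share the best ratio in this group. Branch B scores 0.75, which only says it is less efficient than the best of its peers, not that it is inefficient in any absolute sense.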

The CCR model is a commonly used methodology to identify the relative efficiency of decision-making units (DMUs).
It is widely employed in many fields and industries.

What is Data Transformation

Data transformation is the process of converting, cleansing, and structuring data into a usable format that can
be analyzed to support decision-making and drive an organization's growth.

Data transformation is used when data needs to be converted to match the format of the destination system. This can
occur at one of two places in the data pipeline. First, organizations with on-site data storage typically use an extract,
transform, load (ETL) process, with the data transformation taking place during the middle ‘transform’ step.

Second, organizations today mostly use cloud-based data warehouses because they can scale their computing and storage
resources in seconds. With this scalability available, cloud-based organizations can skip the ETL process.
Instead, they load the raw data into the warehouse first and transform it there, a process called
extract, load, transform (ELT). Data transformation can be handled manually, automated, or done through a
combination of both.
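
As a small illustration of converting, cleansing, and structuring, the sketch below reshapes a few raw rows into the format a destination table might expect. The field names, date format, and sample values are assumptions chosen for the example.

```python
from datetime import datetime

# Hypothetical raw rows as they might arrive from a source system.
raw_rows = [
    {"Customer": "  alice ", "amount": "1,200.50", "date": "03/01/2024"},
    {"Customer": "BOB", "amount": "", "date": "15/02/2024"},
    {"Customer": "Carol", "amount": "75", "date": "28/02/2024"},
]


def transform(row):
    """Convert one raw row to the destination format, or return None if unusable."""
    if not row["amount"].strip():
        return None  # cleansing: drop rows that have no amount
    return {
        # structuring: rename fields to the destination's column names
        "customer": row["Customer"].strip().title(),
        # converting: text with thousands separators becomes a number
        "amount": float(row["amount"].replace(",", "")),
        # converting: day/month/year text becomes an ISO 8601 date string
        "order_date": datetime.strptime(row["date"], "%d/%m/%Y").date().isoformat(),
    }


clean_rows = [t for t in map(transform, raw_rows) if t is not None]
```

The same transform function could run before loading (ETL) or against data already loaded into the warehouse (ELT); only the place where it runs changes.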

Different Data Mining Tasks

There are a number of data mining tasks such as classification, prediction, time-series analysis, association, clustering,
summarization etc. All these tasks are either predictive data mining tasks or descriptive data mining tasks. A data mining system
can execute one or more of the above specified tasks as part of data mining.

Predictive data mining tasks build a model from the available data set that helps predict unknown or future values
of another data set of interest. A medical practitioner trying to diagnose a disease based on a patient's medical test results is
an example of a predictive data mining task. Descriptive data mining tasks usually find patterns that describe the data and derive
new, significant information from the available data set.

What is data mining? List the real-life applications of data mining.

Data mining is the process of analyzing large amounts of data to find patterns and trends. It's used in many fields,
including:

Credit risk management, Fraud detection, Spam filtering, Market research, Telecommunications, Retail, Financial
analysis, Intrusion detection, Higher education, Biological data.

Data mining involves:

• Cleaning raw data
• Finding patterns
• Creating models
• Testing models

Here are some examples of data mining in action:

Clustering
Dividing a large dataset into smaller, meaningful subsets. This helps understand the individual nature of the
dataset’s elements.

Biological data analysis
Using complex computational analysis to study and interpret biological datasets. This helps predict protein
structure, classify genes, and analyze cell mutation.

Google
Collecting a lot of data about you when you use their services. They then use computer programs to find
information they can use to target ads to you.

Basic Concept of Classification (Data Mining)

Data Mining: In general terms, data mining means digging deep into data in different forms to discover patterns and to
gain knowledge from those patterns. In the process of data mining, large data sets are first sorted, then patterns are identified and
relationships are established to perform data analysis and solve problems.

Classification is a task in data mining that involves assigning a class label to each instance in a dataset based on its features. The
goal of classification is to build a model that accurately predicts the class labels of new instances based on their features.

There are two main types of classification: binary classification and multi-class classification. Binary classification involves
classifying instances into two classes, such as “spam” or “not spam”, while multi-class classification involves classifying instances
into more than two classes.
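
A binary classifier can be sketched in a few lines. The example below uses a nearest-centroid rule, one of the simplest classification methods, chosen here only for brevity. The two features (say, number of links and number of exclamation marks in a message) and the training data are hypothetical.

```python
import math


def fit_centroids(features, labels):
    """Compute the mean feature vector for each class label."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Assign x the label of the closest class centroid (Euclidean distance)."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


# Hypothetical training set: (links, exclamation marks) per message.
train_X = [(8, 5), (7, 6), (1, 0), (0, 1)]
train_y = ["spam", "spam", "not spam", "not spam"]
model = fit_centroids(train_X, train_y)

label_a = predict(model, (6, 4))  # closer to the spam centroid
label_b = predict(model, (1, 1))  # closer to the not-spam centroid
```

The same two functions work unchanged for multi-class problems: adding a third label to train_y simply produces a third centroid.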

Advantages:

• Mining-based methods are cost-effective and efficient
• Helps in identifying criminal suspects
• Helps in predicting the risk of diseases
• Helps banks and financial institutions identify likely defaulters when deciding whether to approve cards, loans, etc.

Disadvantages:

Privacy: There is a chance that a company may give some information about its customers to other vendors or
use this information for its own profit.
Accuracy Problem: An accurate model must be selected in order to get the best accuracy and results.

The process of building a classification model typically involves the following steps:

Data Collection:

The first step in building a classification model is data collection. In this step, the data relevant to the problem at hand is collected.
The data should be representative of the problem and should contain all the necessary attributes and labels needed for
classification. The data can be collected from various sources, such as surveys, questionnaires, websites, and databases.

Data Preprocessing:
The second step in building a classification model is data preprocessing. The collected data needs to be preprocessed to ensure its
quality. This involves handling missing values, dealing with outliers, and transforming the data into a format suitable for analysis.
Data preprocessing also involves converting the data into numerical form, as most classification algorithms require numerical input.
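
The preprocessing steps named above can be sketched as follows. The records, the mean-imputation rule, the income cap used as an outlier rule, and one-hot encoding for the categorical field are all assumptions picked for the illustration; real projects choose these rules per dataset.

```python
from statistics import mean

# Hypothetical records: one missing age, one extreme income, one categorical field.
records = [
    {"age": 25, "income": 30000, "region": "north"},
    {"age": None, "income": 32000, "region": "south"},
    {"age": 35, "income": 900000, "region": "north"},
]


def preprocess(records, income_cap):
    """Return purely numerical rows: [age, income, one-hot region columns...]."""
    known_ages = [r["age"] for r in records if r["age"] is not None]
    fill_age = mean(known_ages)  # missing values: impute with the mean age
    regions = sorted({r["region"] for r in records})  # fixed column order
    rows = []
    for r in records:
        age = r["age"] if r["age"] is not None else fill_age
        income = min(r["income"], income_cap)  # outliers: cap extreme incomes
        one_hot = [1 if r["region"] == name else 0 for name in regions]
        rows.append([age, income] + one_hot)
    return rows


X = preprocess(records, income_cap=100000)
```

Every row of X is now numeric and has the same length, which is the form most classification algorithms expect.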

Principal Component Analysis:

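Principal Component Analysis (PCA) is a technique used to reduce the dimensionality of the dataset. PCA combines the
original features into a smaller set of new, uncorrelated features (principal components) that capture most of the
variance in the data, so that redundant information can be dropped.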
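
As a rough sketch of how PCA reduces dimensionality, the code below finds the single direction along which a set of 2-D points varies most and projects each point onto it, turning two numbers per point into one. It uses power iteration on the 2x2 covariance matrix rather than a library routine, and it assumes the start vector is not orthogonal to the direction being sought. The sample points are made up.

```python
def first_principal_component(points, iterations=100):
    """Return (unit direction of greatest variance, centered points) for 2-D data."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 sample covariance matrix.
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Power iteration: repeatedly multiply by the covariance matrix and normalize.
    vx, vy = 1.0, 0.0
    for _ in range(iterations):
        vx, vy = sxx * vx + sxy * vy, sxy * vx + syy * vy
        norm = (vx * vx + vy * vy) ** 0.5
        vx, vy = vx / norm, vy / norm
    return (vx, vy), centered


# Hypothetical data lying on the line y = 2x.
points = [(1, 2), (2, 4), (3, 6), (4, 8)]
direction, centered = first_principal_component(points)

# Reduce each 2-D point to a single coordinate along the principal direction.
projected = [x * direction[0] + y * direction[1] for x, y in centered]
```

Because these sample points lie exactly on a line, the one projected coordinate keeps all of their variance; with real data some variance is lost, and the analyst decides how many components to keep.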

(Next Question) Affinity methods: Affinity chromatography is a separation method based
on a specific binding interaction between an immobilized ligand and its binding partner.
Examples include antibody/antigen, enzyme/substrate, and enzyme/inhibitor interactions.

Partition method: Partitioning methods are a widely used family of clustering algorithms in data
mining that aim to partition a dataset into K clusters. These algorithms attempt to group similar data
points together while maximizing the differences between the clusters.
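
K-means is probably the best-known partitioning method, and a minimal version fits in a few lines. The sketch below is an illustration only: it initializes the centroids with the first K points (real implementations usually pick them at random or with a smarter scheme) and runs a fixed number of iterations. The data points are hypothetical.

```python
import math


def kmeans(points, k, iterations=10):
    """Partition points into k clusters; returns (centroids, labels)."""
    centroids = [list(p) for p in points[:k]]  # simplistic init: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


# Hypothetical 2-D data with two visibly separate groups.
data = [(1, 1), (9, 9), (1, 2), (8, 9), (2, 1), (9, 8)]
centroids, labels = kmeans(data, k=2)
```

Points near (1, 1) end up with label 0 and points near (9, 9) with label 1, which is the "similar points together, different clusters apart" behaviour described above.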

What Is Artificial Intelligence?


Artificial intelligence (AI) is the theory and development of computer systems capable of performing tasks that historically required human
intelligence, such as recognizing speech, making decisions, and identifying patterns. AI is an umbrella term that encompasses a wide variety of
technologies, including machine learning, deep learning, and natural language processing (NLP).

Yet, despite the many philosophical disagreements over whether “true” intelligent machines actually exist, when most people use the
term AI today, they’re referring to a suite of machine learning-powered technologies, such as ChatGPT or computer vision, that enable machines
to perform tasks that previously only humans could do, like generating written content, steering a car, or analyzing data.

Types of AI
As researchers attempt to build more advanced forms of artificial intelligence, they must also begin to formulate more nuanced understandings
of what intelligence or even consciousness precisely mean. In their attempt to clarify these concepts, researchers have outlined four types of
artificial intelligence.

1. Reactive machines
Reactive machines are the most basic type of artificial intelligence. Machines built in this way don’t possess any knowledge of
previous events but instead only “react” to what is before them in a given moment. As a result, they can only perform certain
advanced tasks within a very narrow scope, such as playing chess, and are incapable of performing tasks outside of their limited
context.
2. Limited memory machines
Machines with limited memory possess a limited understanding of past events. They can interact more with the world around them
than reactive machines can. For example, self-driving cars use a form of limited memory to make turns, observe approaching
vehicles, and adjust their speed. However, machines with only limited memory cannot form a complete understanding of the world
because their recall of past events is limited and only used in a narrow band of time.
3. Theory of mind machines
Machines that possess a “theory of mind” represent an early form of artificial general intelligence. In addition to being able to create
representations of the world, machines of this type would also have an understanding of other entities that exist within the world. As
of this moment, this reality has still not materialized.
4. Self-aware machines
Machines with self-awareness are theoretically the most advanced type of AI and would possess an understanding of the world,
others, and themselves. This is what most people mean when they talk about achieving AGI. Currently, this is a far-off reality.

AI benefits

1. Greater accuracy for certain repeatable tasks, such as assembling vehicles or computers.
2. Decreased operational costs due to greater efficiency of machines.
3. Increased personalization within digital services and products.
4. Improved decision-making in certain situations.
5. Ability to quickly generate new content, such as text or images.

AI Dangers

1. Job loss due to increased automation.
2. Potential for bias or discrimination as a result of the data set on which the AI is trained.
3. Possible cybersecurity concerns.
4. Lack of transparency over how decisions are arrived at, resulting in less than optimal solutions.
5. Potential to create misinformation, as well as inadvertently violate laws and regulations.

What is Supply Chain Optimization?

Supply chain optimization is the adjustment of a supply chain’s operations to ensure it is at its peak of efficiency.
Such optimization is based on certain key performance indicators that include overall operating expenses and
returns on the company’s inventory. The aim is to provide customers with the products at the lowest total cost
possible while retaining the highest profit margins. To achieve these goals, managers have to balance costs
incurred in manufacturing, inventory management, transportation, and fulfillment of customer expectations.

Considering how complex supply chain optimization is, it’s best to tackle this business process as a long-term
activity. What works is a blend of cost and service changes over time that take into account variations in resource
costs, carrier changes, customer demographics, and other factors that require constant examination.

Define electronic knowledge portals (EKP).

An Electronic Knowledge Portal (EKP) is a type of Enterprise Information Portal (EIP) that is focused on knowledge
production and integration. EKPs are a new way to access corporate expertise by combining knowledge
management (KM) technology with portal technology.

Knowledge portals are internet-based programs that provide a single point of access to organizational
knowledge. They support and encourage knowledge transfer, storage and retrieval, creation, integration, and
application by providing access to relevant knowledge artifacts.

Knowledge portals can contain a wide range of information, including:

• Process documentation and SOPs
• HR policies
• Guides, manuals, and tutorials

Knowledge management portals can promote a collaborative work environment by providing employees with a
platform to share ideas and knowledge. They can also help employees collaborate seamlessly on projects and stay
on the same page.
