ThoughtSpot SpotIQ AI Driven Analytics White Paper PDF
CHAPTER 1
ARTIFICIAL INTELLIGENCE: BEYOND THE HYPE
Fast forward to today and we are now once again celebrating the many advances in AI,
particularly in the field of machine learning. AI is no longer relegated to pure science fiction,
as often depicted by many Hollywood films. Since those early days of AI, it has been used
in many applications, from computer games, to simulating human movement in robotics, and
more recently, even making vehicles completely autonomous. AI has become as ubiquitous as
the enormous amounts of data that feed its underlying calculations and machine learning
algorithms. Combined with seemingly boundless compute power, this gives AI broad
applicability in many consumer services and enterprise products today.
[Figure: Artificial Intelligence, Machine Learning, Deep Learning]
So what exactly are artificial intelligence and machine learning, and how are they different?
In short, AI is a broad concept in which machines behave and think more like humans. Machine
learning is an application of AI that uses statistics and historical data to identify patterns and
automatically improve at accomplishing a given set of tasks over time. Machine learning itself
is classified into three categories: supervised learning, unsupervised learning,
and reinforcement learning.
With supervised learning, humans specify a desired outcome and manually classify a set of
training data, and the machine automatically learns how to classify new data to produce the
desired outcome. Spam filtering is a good example of supervised learning as spam filters learn
from humans’ explicit classification of emails as spam or legitimate. In contrast, with
unsupervised learning, the machine automatically determines how to classify data without
human intervention, continuously adapting and improving its ability to accomplish a task all on
its own. Playlist curation and content recommendations on sites like YouTube are examples of
unsupervised learning, as the system automatically learns each user’s preferences from their
interaction with the content, without any explicit action from the user. And with reinforcement
learning, the machine marches towards a particular goal and its behavior as it navigates down
its path is influenced by rewards or punishments such as a user’s explicit approval or disapproval
via a thumbs up or thumbs down action.
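To make the supervised learning idea concrete, here is a minimal sketch of a spam filter that learns from a handful of human-labeled messages. It uses a textbook naive Bayes classifier with Laplace smoothing; the training messages, the `train` and `classify` function names, and the "ham" label for legitimate mail are all illustrative assumptions, not a description of any production spam filter.

```python
import math
from collections import Counter

def train(examples):
    """Count words per label from (text, label) pairs that humans have classified."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, model):
    """Return the label with the higher naive Bayes log-probability."""
    word_counts, label_counts = model
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)
        for word in text.lower().split():
            # Laplace smoothing keeps unseen words from zeroing out the probability.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training data: the human-provided labels are the "supervision".
training_data = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train(training_data)
print(classify("claim your free money", model))
print(classify("agenda for the team meeting", model))
```

The more labeled examples the filter sees, the better its word statistics become, which is the sense in which a supervised system improves over time.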
Going one layer deeper, deep learning is a machine learning technique that processes data
through multi-layered neural networks – processes inspired by the structure of the human brain.
Deep learning algorithms are extremely powerful, producing desired outcomes by breaking down
a problem into smaller chunks and crunching through large amounts of data at scale. As an
example, in self-driving cars, one part of the algorithm recognizes cars in other lanes, another
recognizes pedestrians, and another may even recognize street signs. All these pieces working
together help the car navigate safely to its destination.
Everyday Applications of AI
AI is at work all around us as consumers. It’s not uncommon to roll out of bed in the morning
and ask your favorite virtual assistant, “What is the weather today?” Popular devices such as the
Amazon Echo and Google Home use AI techniques like far-field voice recognition to isolate your
voice in a noisy room and then use natural language processing to parse your question. The
assistant then recognizes your location and queries the weather APIs at its disposal to respond
with something like, “It is 77° Fahrenheit with clear skies in Memphis today.” This is a perfect
example of how AI and machine learning built into virtual assistants enable smart devices to act
like an intelligent human.
When you open your laptop and fire up a web browser, your initial destination is likely your favorite
search engine like Google. As you type your question in the search box, the search engine returns
relevant search suggestions in real time based on every character you type. But as you know, not all
search engines are created equal. Google’s search engine has become the gold standard because
of the quality of the results it returns, driven in part by its sophisticated PageRank link analysis
algorithm. PageRank uses the billions of documents on the web and data about the number and
quality of links pointing to a webpage to automatically estimate how important that page is, which
helps rank the results returned for your search terms.
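The core idea behind PageRank can be sketched in a few lines. The example below is the standard textbook power-iteration formulation with a damping factor of 0.85, run on a made-up four-page web; it is an illustration only and not a description of how Google's production search ranking works. The `pagerank` function name and the `web` graph are assumptions for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to.

    Every link target must also appear as a key in links.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page passes its rank on evenly to the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical web: page "c" receives links from "a", "b", and "d".
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph, page "c" ends up with the highest score because the most pages link to it, and page "d" ends up lowest because nothing links to it.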
Every time you upload a photo to Facebook, it uses facial recognition AI to pick out faces in
the photo and automatically makes recommendations on who should be tagged in what photo
based on patterns found in other photos of that person. More recently, Facebook researchers have
started rolling out a deep learning image recognition algorithm “DeepFace” that is said to be 97%
accurate. Similar technologies power Google’s image search and Apple’s new facial recognition
software that automatically recognizes your face to unlock your phone.
The list of AI-powered consumer experiences doesn’t end there, signaling how pervasive and
mainstream AI has become. Naturally, there is also a real sense of excitement about AI and machine
learning in the data and analytics community as this technology makes its way into the enterprise.
CHAPTER 2
THE AI OPPORTUNITY IN ANALYTICS
Unfortunately, while data volume is growing exponentially, the volume of insights we’re able to
extract from it is fundamentally limited. That’s because in today’s analytics paradigm there’s a
huge gap between data supply and data demand. On one end, you have lots of data consumers
across every line of business who are craving new insights. On the other end you’ve got few data
producers – the data experts – who are required to extract value from data. As more and more
data is collected, more pressure is being placed on this small group of trained experts.
[Figure: The data gap. Creators: IT, BI Teams, Analysts, Data Engineers. Consumers: Sales, Marketing, Service Ops, HR, Finance.]
That’s why AI and machine learning are so exciting in the world of analytics today. Infusing AI into
analytics workflows can transform your organization and bridge the supplier-consumer divide
by giving everyone access to the tools they need to make data-driven decisions. The good news
is that AI has already arrived and is changing the way business people — like marketers and
salespeople — interact with the data they have at their disposal. Today, the uses of AI in analytics
can be boiled down to three categories of technology: automated data discovery, search and
conversational analytics, and intelligent modeling and recommendations.
Automated Data Discovery
Automated data discovery encompasses a class of technologies that automate the process of data
analysis and exploration in real-time. This includes everything from selecting data sets to explore,
running queries automatically on your behalf, combing through results for insights, and choosing
a best-fit visualization paired with a natural language description of each insight.
The number of possible questions to ask of data is often too much for any human. With automated
data discovery technologies, business people can rely on machine-driven smarts to explore
complex datasets with a few clicks and get insights explained to them in natural language, without
the need for a trained analyst and the hours of time it would take them to explore the data manually
and build a report. Instead, data experts can focus on data governance, building bulletproof data
models, and preparing new datasets for analysis.
Conversational analytics technologies expand on this question-and-answer paradigm by leveraging
the power of bots and virtual assistants to extend access to data outside of the analytics
environment. Bots serve as a virtual liaison between humans and analytics engines and make it
possible for business users to query and interact with their data on-the-go. These virtual assistants
can be found within modern instant messaging clients like Slack and native mobile apps, helping to
provide critical insights and data context anywhere, at any time.
AI-powered data modeling and recommendations can help cut down time spent on this kind
of work by automatically generating statistics about data sets, inferring data types, identifying
hierarchies and relationships within data sets, and dynamically aggregating data at query-time.
This helps to create a new class of citizen data scientists and frees up time for data experts to
focus on the work they enjoy most like sourcing new datasets and doing more sophisticated
analyses like predictive analytics to help the business stay ahead of the curve.
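As a rough illustration of what automatically generating statistics and inferring data types can look like, the sketch below profiles a small set of rows whose values arrive as strings. The `infer_type` and `profile` helpers, the parsing order, and the sample rows are hypothetical simplifications; real modeling tools apply far richer heuristics.

```python
from datetime import datetime

def infer_type(values):
    """Guess a column type by checking whether every value parses (simple heuristic)."""
    def all_parse(parser):
        try:
            for value in values:
                parser(value)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    if all_parse(lambda value: datetime.strptime(value, "%Y-%m-%d")):
        return "date"
    return "text"

def profile(rows):
    """Return the inferred type, row count, and distinct count for each column."""
    summary = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        summary[name] = {
            "type": infer_type(values),
            "count": len(values),
            "distinct": len(set(values)),
        }
    return summary

# Hypothetical sample data, as it might arrive from a CSV file.
rows = [
    {"region": "West", "units": "12", "price": "9.99", "sold_on": "2017-03-01"},
    {"region": "East", "units": "7", "price": "12.50", "sold_on": "2017-03-02"},
    {"region": "West", "units": "3", "price": "9.99", "sold_on": "2017-03-05"},
]
for column, stats in profile(rows).items():
    print(column, stats)
```

A low distinct count on a text column such as `region` is the kind of signal a tool could use to treat it as a grouping attribute rather than a measure.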
This covers just a few examples of how AI is fundamentally changing the world of analytics. Applied
correctly, artificial intelligence has the potential to substantially impact or predict business
outcomes, exponentially improve employee productivity and decision-making, and even create
new jobs within the data team by reducing the need for data literacy.
Machine-generated insights also help to minimize errors in analysis and eliminate human bias,
bringing to our attention new metrics and business drivers that weren’t considered before.
Ultimately, the ease and speed with which new insights are detected enable a new class of citizen
data professionals who can source, prepare, and analyze their own data without the need for trained
resources. But what will it take for your organization to adopt these kinds of AI-driven analytics
technologies and change the way your business operates?
Trust-based AI
There’s a problem lurking at the core of AI in today’s world. Many understand what innovative AI
technologies can accomplish but few know how they work, creating a general feeling of distrust.
When you combine that with a category of technology like analytics that drives business-critical
decisions, you can understand why adoption of these technologies can be difficult.
And that’s why the key to adoption of AI-infused analytics is trust. When it comes to analytics, trust
is created by delivering accurate, relevant, and transparent results. To do this, machines should not
rely solely on their own built-in learning algorithms but must work together with humans to ensure
every result meets these standards of trust. That’s why we built SpotIQ - our new automated
data insights engine that makes it easy for any business person to automatically generate trusted
insights with a single click.
CHAPTER 4
INTRODUCING AI-DRIVEN ANALYTICS FROM THOUGHTSPOT
With SpotIQ, there is a huge opportunity to enable millions of business people to make smarter
decisions fueled by automated, AI-driven insights. Powered by an analytics platform
with massive computing power, it learns what matters most to teams based on user behavior,
automatically uncovers hidden trends and patterns in the data, and then pushes those insights
directly to people when it matters most.
Effective AI requires a platform-up approach
The ability to perform complex calculations on massive amounts of data at extremely high speeds
is crucial to deliver the promise of AI. However, making AI accessible to everyone by providing
a simple experience that can handle the user and data scale along with the complexities of the
enterprise is not an easy problem to solve. Traditional disk-based or even hybrid in-memory
and disk-based solutions are not adequate to meet the performance and scale demands of AI.
[Figure: ThoughtSpot platform components: Data Apps, Embedded Analytics, Search, Automated Insights, Bots, Data Connectors & API, BI & Visualization Server, Enterprise Security & Governance]
Instead, a distributed, massively parallel, in-memory execution engine will provide processing speeds
orders of magnitude faster than traditional architectures. Combine that with the enterprise-grade
requirements like security, governance, high availability, and manageability and what is needed is
a vertically aware analytics stack built from the ground up for AI-driven analytics.
ThoughtSpot’s next generation AI-driven analytics platform combines the precision of the world’s
first relational search engine with the smarts of a robust AI engine and the scale of a massively
parallel in-memory data cache and calculation engine. As data is cached in ThoughtSpot, the
entire data model is indexed, including the raw data, metadata, and relationships. This makes
it easy for anyone to perform database joins, drill anywhere in their data model, and change
aggregations on the fly, eliminating the need for rigid summary structures like cubes and
materialized views that often take technical resources hours to build.
ThoughtSpot is deployed on a cluster of nodes—each of which has its own set of services and
processes running in-memory. A Distributed Cluster Manager provides optimal distribution of
workload for scale and performance as well as for fault tolerance, redundancy, and failover,
while minimizing administrative overhead. ThoughtSpot also supports table sharding across
multi-node clusters, massively parallel processing of queries, query caching for frequently used
queries, just-in-time query compilation, and compression-aware query execution—all in an
in-memory, columnar data cache.
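The general pattern behind sharding and parallel query processing can be sketched as follows: split the rows across shards, compute a partial aggregate on each shard independently, and merge the partial results. This is a single-process illustration using threads, with hypothetical function names and round-robin sharding chosen for simplicity; it shows the structure of the computation and is not ThoughtSpot's engine.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def shard_rows(rows, num_shards):
    """Distribute rows across shards round-robin (a real system might hash a key)."""
    shards = [[] for _ in range(num_shards)]
    for index, row in enumerate(rows):
        shards[index % num_shards].append(row)
    return shards

def partial_sum(shard):
    """Aggregate one shard on its own, as a single node would."""
    totals = Counter()
    for region, amount in shard:
        totals[region] += amount
    return totals

def parallel_sum_by_region(rows, num_shards=4):
    """Run the per-shard aggregation in parallel, then merge the partial totals."""
    shards = shard_rows(rows, num_shards)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        partials = list(pool.map(partial_sum, shards))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts together
    return dict(merged)

# Hypothetical (region, amount) rows.
sales = [("west", 10), ("east", 5), ("west", 7), ("east", 3), ("north", 1)]
print(parallel_sum_by_region(sales))
```

Because sums can be merged in any order, each shard can be processed without coordinating with the others, which is what lets this style of query scale out across nodes.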
ThoughtSpot can also handle data model complexity with ease. ThoughtSpot’s BI & Visualization
Server can automatically identify complex data models, like many-to-many relationships found
in chasm traps and master-detail relationships found in fan traps. It can execute complex queries
against any data model, generate an ordered set of subqueries to execute, and automatically
determine the proper grouping and aggregation level to get you to a 100% accurate result.
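A small worked example shows why the aggregation level matters in a chasm trap. Below, one customer has two orders and three returns. Joining both fact tables directly multiplies the rows and inflates both totals, while aggregating each fact table in its own subquery before joining gives the right answer. The schema and data are invented for illustration, and the SQL is a generic sketch run on SQLite rather than the queries ThoughtSpot actually generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    CREATE TABLE returns (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Acme');
    INSERT INTO orders VALUES (1, 100), (1, 200);
    INSERT INTO returns VALUES (1, 10), (1, 20), (1, 30);
""")

# Naive query: 2 orders x 3 returns produce 6 joined rows, so both sums are inflated.
naive = conn.execute("""
    SELECT SUM(o.amount), SUM(r.amount)
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    JOIN returns r ON r.customer_id = c.id
""").fetchone()

# Safer query: aggregate each fact table to the customer level first, then join.
correct = conn.execute("""
    SELECT o.total, r.total
    FROM customers c
    JOIN (SELECT customer_id, SUM(amount) AS total
          FROM orders GROUP BY customer_id) o ON o.customer_id = c.id
    JOIN (SELECT customer_id, SUM(amount) AS total
          FROM returns GROUP BY customer_id) r ON r.customer_id = c.id
""").fetchone()

print("naive:  ", naive)    # (900.0, 120.0)
print("correct:", correct)  # (300.0, 60.0)
```

The true totals are 300 in orders and 60 in returns. The naive join reports 900 and 120 because each order row is repeated once per return row and vice versa.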
ThoughtSpot supports automated insight discovery for hundreds of thousands of users who each
generate thousands of explorations. These result sets are processed through scalable, custom-built AI
algorithms, and then insights are ranked and pushed to users automatically. To do this, SpotIQ has
a dedicated workload balancer to prioritize user requests, leverages the power of ThoughtSpot’s
data platform for calculations, and uses home-grown, optimized AI algorithms—like supervised
machine learning—to process results and pick the best insights for each user.
The result of this integrated, vertically-aware stack is a radically different analytics experience
that is lightning fast and scales to terabytes of in-memory data across multiple sources—all
with granular security and governance access control applied. Thousands of users can execute
complex queries on billions of rows of data, and receive answers in seconds.
Built into our Relational Search engine is DataRank, a machine-learning algorithm that ranks
search suggestions as you type. It gets smarter with use as it understands the characteristics
of your data, your profile, and usage patterns, and applies granular security rules to help guide
you to the right answer with personalized search suggestions. With Relational Search, we make
it effortless for non-technical business people to gain valuable insights from company data
in seconds. There’s no faster way to get access to information than search, if you know what
question you are asking. But what about the questions you don’t know to ask? In fact, take into
consideration all the different dimensions that business people care about across their business,
and you’ll find that the universe of possible questions to ask of the data is prohibitively massive.
As a result, it is not possible for a typical analytics team to pre-build reports or dashboards that
can answer all possible questions, especially those they may not have known to ask.
CHAPTER 5
SPOTIQ—THE POWER OF AUTOMATED INSIGHTS
Now imagine if an intelligent and powerful machine could access numerous data sets, generate
thousands of questions, analyze billions of data points, spot hidden trends and anomalies, and
proactively push relevant and personalized insights to you, all in seconds - with a single click of a
button. That is SpotIQ and the power of automated insights.
Built on our next-generation AI-driven analytics platform, SpotIQ leverages ThoughtSpot’s massively
scalable high-performance computing backend, working with Relational Search hand-in-hand to
curate deep and relevant insights for users that they may not have thought to look for on their
own. With a single click, SpotIQ can automatically ask thousands of questions about billions of
data points and bring back dozens of insights in seconds. It automatically generates a dashboard
full of personalized insights, each accompanied by a smart narrative in natural language explaining
what is meaningful in the data.
Humans working hand-in-hand with smart machines
For AI to be effective in delivering the most accurate, relevant and trusted insights, machines should
not rely solely on their built-in learning algorithms. Instead, rather than machines and humans working
independently of each other, humans are always in the loop with SpotIQ. As a result, the larger the
human and data scale, the smarter the platform.
SPOTIQ FINDS THE MOST RELEVANT INSIGHTS BASED ON YOUR SEARCH BEHAVIOR
Trust every insight you get
The concept of teaching the machine how to learn and building
in trust as part of the process is paramount to SpotIQ. As a
result, SpotIQ does not behave like a black box. Rather, with
SpotIQ, you always get full visibility into what question was asked,
which algorithm was used, how answers were calculated and
why each insight is deemed relevant and important to the user.
Combine that with a Relational Search engine that performs zero
guesswork or interpretation into each query that is executed and
you have a trust-based, transparent interaction model with humans
always at the core.
CHAPTER 6
HOW SPOTIQ WORKS
[Figure: How SpotIQ works, shown as steps numbered 1 through 7 with a User Feedback Loop. Legible callouts:]
DataRank: Understands your user profile and data characteristics to automatically narrow down explorations.
Sophisticated Algorithms: Processes results through dozens of insight-detection algorithms.
Natural Language Generation: Crafts a natural-language narrative so you can make faster, better-informed decisions.
3. Sophisticated insight-detection algorithms
SpotIQ then processes the results from its queries through a series of built-in insight-detection
algorithms. These algorithms help you uncover anomalies and outliers, identify relationships
between measures that you didn’t know about, and find upward or downward trends in noisy
data. You can even analyze an entire data set or granularly
explain differences between two data points. Over time, SpotIQ’s library of algorithms will
continue to grow to address our customers’ expanding set of use cases.
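To give a sense of what an outlier-detection pass over a query result might involve, here is a minimal sketch using the modified z-score, a common textbook technique based on the median and the median absolute deviation. The source does not say which methods SpotIQ uses, so the technique, the `find_outliers` name, the 3.5 threshold, and the sample data are all assumptions.

```python
import statistics

def find_outliers(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds the threshold."""
    median = statistics.median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = statistics.median([abs(value - median) for value in values])
    if mad == 0:
        return []  # no spread to measure against
    return [
        (index, value)
        for index, value in enumerate(values)
        if 0.6745 * abs(value - median) / mad > threshold
    ]

# Hypothetical daily order counts with one unusual day.
daily_orders = [102, 98, 101, 99, 100, 250, 97, 103]
print(find_outliers(daily_orders))
```

Using the median rather than the mean keeps a single extreme value from masking itself by dragging the average toward it.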
              TECHNIQUE             DESCRIPTION
Trend Lines   Regression analysis   Estimates a best-fit line for time series data and picks out
                                    the positive and negative trends that stand out the most
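The trend-line row above can be illustrated with ordinary least-squares regression over equally spaced time periods. The sketch below fits a line to each series and reports the steepest upward and downward slopes. The function names and sales figures are hypothetical, and SpotIQ's actual regression and ranking logic is not described in the source.

```python
def trend_line(values):
    """Fit a least-squares line to equally spaced points; return (slope, intercept)."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def strongest_trends(series_by_name):
    """Return the names of the series with the steepest rise and the steepest fall."""
    slopes = {name: trend_line(values)[0] for name, values in series_by_name.items()}
    return max(slopes, key=slopes.get), min(slopes, key=slopes.get)

# Hypothetical quarterly sales by region.
sales_by_region = {
    "west": [10, 12, 15, 19],
    "east": [20, 18, 15, 11],
    "south": [14, 15, 14, 15],
}
rising, falling = strongest_trends(sales_by_region)
print("strongest upward trend:", rising)
print("strongest downward trend:", falling)
```

Here the west region gains about 3 units per quarter and the east region loses about 3, while the south region is nearly flat, so only the first two would stand out as trends.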
5. Natural language narratives
Accompanying each automatically generated insight is a smart narrative that identifies
what is significant and meaningful about the data and describes it in a way you understand.
SpotIQ automatically generates the narratives in natural language so you no longer
have to spend time to study the data or rely on data experts to interpret it for you. More
consumable analytical insights lead to faster, better-informed decisions.
6. Best-fit visualizations
As each insight is computed, ThoughtSpot analyzes the characteristics of the resulting
dataset, intelligently determines the best-fit visualization for the analysis, and presents
an interactive chart back to the end user in an automatically generated dashboard. Charts
and Pinboards are first-class objects in ThoughtSpot, meaning users can
interact with the chart, ask additional ad-hoc questions, or pin it to additional Pinboards.
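One simple way to think about best-fit visualization is as a set of rules keyed on the kinds of columns in a result. The sketch below is a deliberately small, hypothetical rule set; the column kinds, chart names, and rule order are assumptions made for illustration and do not reflect ThoughtSpot's actual selection logic.

```python
def choose_chart(columns):
    """Pick a chart type from a list of (name, kind) pairs.

    kind is one of 'date', 'category', or 'measure' (illustrative categories).
    """
    kinds = [kind for _, kind in columns]
    measures = kinds.count("measure")
    if "date" in kinds and measures >= 1:
        return "line"      # a measure over time reads best as a line
    if "category" in kinds and measures >= 1:
        return "bar"       # a measure compared across categories
    if measures >= 2:
        return "scatter"   # the relationship between two measures
    if measures == 1:
        return "headline"  # a single number
    return "table"         # fall back to showing the raw rows

print(choose_chart([("month", "date"), ("revenue", "measure")]))
print(choose_chart([("region", "category"), ("revenue", "measure")]))
print(choose_chart([("revenue", "measure"), ("discount", "measure")]))
```

A production system would also weigh factors such as the number of distinct values and the number of rows, but the principle of mapping data characteristics to chart types is the same.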
CHAPTER 7
CONCLUSION: BRINGING AI TO THE MASSES
SpotIQ AI-driven analytics automatically uncovers answers to questions business people may
not have known to ask. Powering SpotIQ is a new breed of analytics architecture and in-memory
calculation engine that was built from the ground up for speed at scale on billions of rows of
data across multiple data sources - all while delivering sub-second performance and enterprise-
wide governance. Working hand-in-hand with Relational Search, it learns what’s most important
based on usage behavior, spots hidden trends and patterns in the data, and delivers trusted and
personalized insights to any business person. With SpotIQ, you now have the power of a thousand
analysts in your hand.
About ThoughtSpot
ThoughtSpot, the leader in Search and AI-Driven analytics for humans, is helping the largest
companies in the world succeed in the digital era by putting the power of a thousand analysts in
every business person’s hands. With ThoughtSpot, business people can use a Google-like search
to easily analyze billions of rows of data or automatically get trusted insights to questions they
didn’t know to ask - all with a single click. ThoughtSpot connects with any on-premise, cloud, big
data, or desktop data source, deploying 85 percent faster than legacy technologies. Customers like
Amway, Bed Bath and Beyond, Capital One, Celebrity Cruises, Chevron Federal Credit Union, De
Beers, Insurethebox and Scotiabank have put ThoughtSpot at the core of their business processes.
With ThoughtSpot, business leaders and frontline workers alike have made more than 3 million
data-informed decisions.
ThoughtSpot was co-founded in 2012 by Ajeet Singh, former co-founder and Chief Product Officer
at Nutanix, the largest tech IPO of 2016. With an engineering team built with Google, Amazon, and
Facebook DNA, ThoughtSpot has raised over $160M in funding from Lightspeed Venture Partners,
Khosla Ventures, General Catalyst Partners, Geodesic Capital and Capital One Growth Ventures.
The company is headquartered in Palo Alto, with offices in Seattle, London and Bangalore. For
more information please visit thoughtspot.com.