Slide 3
(MSITec7111)
Flexible schema
• Wide column representation: allow each tuple to have a
different set of attributes, and allow new attributes to be
added at any time
• Sparse column representation: the schema has a fixed but
large set of attributes, but each tuple may store only a
subset
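A minimal Python sketch contrasting the two representations. It assumes tuples are modeled as Python dicts (wide column) or as fixed-length tuples padded with None (sparse column). The attribute names and the to_sparse helper are hypothetical.

```python
# Hypothetical tuples for illustration only.

# Wide column: each tuple is a dict with its own set of attributes,
# and a new attribute can be added to a single tuple at any time.
wide = [
    {"ID": "p1", "brand": "Apple", "size": 13},
    {"ID": "p2", "brand": "Dell", "color": "black", "weight_kg": 1.2},
]
wide[0]["color"] = "silver"  # only p1 gains this attribute

# Sparse column: a fixed (possibly large) attribute list shared by all tuples;
# each tuple stores values for a subset, and the rest are None.
SCHEMA = ("ID", "brand", "size", "color", "weight_kg")

def to_sparse(row: dict) -> tuple:
    """Project a dict onto the fixed schema, filling missing attributes with None."""
    unknown = set(row) - set(SCHEMA)
    if unknown:
        raise ValueError(f"attributes not in schema: {sorted(unknown)}")
    return tuple(row.get(attr) for attr in SCHEMA)

sparse = [to_sparse(r) for r in wide]
```

The wide column form never rejects a new attribute, while the sparse column form only accepts attributes already listed in the schema.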
Multivalued data types
• Sets, multisets
E.g., set of interests {‘basketball’, ‘La Liga’, ‘cooking’,
‘anime’, ‘jazz’}
• Key-value map (or just map for short)
Store a set of key-value pairs
E.g., {(brand, Apple), (ID, MacBook Air), (size, 13),
(color, silver)}
Operations on maps: put(key, value), get(key),
delete(key)
• Arrays
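A minimal Python sketch of these multivalued types, assuming a Python set, collections.Counter (as a multiset) and dict (as a map) stand in for the database types. The put, get and delete helpers are hypothetical wrappers named after the map operations above.

```python
from collections import Counter

# Set: no duplicates, order irrelevant.
interests = {"basketball", "La Liga", "cooking", "anime", "jazz"}

# Multiset (bag): duplicates allowed; Counter maps element -> multiplicity.
plays = Counter(["jazz", "jazz", "anime"])  # hypothetical listening history

# Key-value map: a dict, wrapped in the three map operations.
def put(m: dict, key, value) -> None:
    m[key] = value      # insert, or overwrite an existing key

def get(m: dict, key):
    return m.get(key)   # None if the key is absent

def delete(m: dict, key) -> None:
    m.pop(key, None)    # no error if the key is absent

laptop = {}
for k, v in [("brand", "Apple"), ("ID", "MacBook Air"), ("size", 13), ("color", "silver")]:
    put(laptop, k, v)
delete(laptop, "color")
```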
Features of Semi-Structured Data Models
Arrays
• Widely used for scientific and monitoring applications
• E.g., readings taken at regular intervals can be represented
as an array of values instead of (time, value) pairs
[5, 8, 9, 11] instead of {(1,5), (2, 8), (3, 9), (4, 11)}
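A small Python sketch of the two encodings, assuming readings start at time 1 and are spaced one unit apart, as in the example. The to_pairs and to_array helpers are hypothetical.

```python
# Readings at times 1, 2, 3, ...: the time is implicit in the array index.
readings = [5, 8, 9, 11]

def to_pairs(values, start=1, step=1):
    """Expand an array into explicit (time, value) pairs."""
    return {(start + i * step, v) for i, v in enumerate(values)}

def to_array(pairs, start=1, step=1):
    """Rebuild the array from (time, value) pairs, assuming no gaps in time."""
    by_time = dict(pairs)
    return [by_time[start + i * step] for i in range(len(by_time))]

pairs = to_pairs(readings)
```

The array stores only the values, which is one reason array representations compress well for regularly sampled data.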
Multi-valued attribute types
• Modeled using the non-first-normal-form (NFNF) data model
• Supported by most database systems today
Array database: a database that provides specialized support
for arrays
• E.g., compressed storage, query language extensions, etc.
• Oracle GeoRaster, PostGIS, SciDB, etc.
Nested Data Types
Knowledge graph
Triple View of RDF Data
Querying RDF: SPARQL
Triple patterns
• ?cid title "Intro. to Computer Science"
• ?cid title "Intro. to Computer Science"
?sid course ?cid
SPARQL queries
• select ?name
where {
?cid title "Intro. to Computer Science" .
?sid course ?cid .
?id takes ?sid .
?id name ?name .
}
• Also supports
Aggregation, optional joins (similar to outer joins),
Subqueries, etc.
Transitive closure on paths
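A naive Python sketch of how a conjunction of triple patterns is evaluated: each pattern is matched against every triple, and bindings for shared variables such as ?cid and ?sid must agree, which amounts to a join. The triples and identifiers are hypothetical. Real SPARQL engines use indexes and join planning rather than this nested loop, and aggregation, optional joins and path queries are not covered.

```python
# Hypothetical triples; the identifiers are made up for illustration.
TRIPLES = {
    ("CS-101", "title", "Intro. to Computer Science"),
    ("CS-347", "title", "Database System Concepts"),
    ("sec-1", "course", "CS-101"),
    ("sec-2", "course", "CS-347"),
    ("00128", "takes", "sec-1"),
    ("00128", "name", "Zhang"),
    ("12345", "takes", "sec-2"),
    ("12345", "name", "Shankar"),
}

def is_var(term: str) -> bool:
    return term.startswith("?")

def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None          # conflicts with an earlier binding
            new[p] = t
        elif p != t:
            return None              # constant term does not match
    return new

def evaluate(patterns, triples):
    """All variable bindings that satisfy every pattern."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for t in triples:
                b2 = match(pattern, t, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    return bindings

query = [
    ("?cid", "title", "Intro. to Computer Science"),
    ("?sid", "course", "?cid"),
    ("?id", "takes", "?sid"),
    ("?id", "name", "?name"),
]
names = sorted(b["?name"] for b in evaluate(query, TRIPLES))
```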
RDF Representation (Cont.)
User-defined types
• create type Person
(ID varchar(20) primary key,
name varchar(20),
address varchar(20)) ref from(ID); /* More on this later */
create table people of Person;
Table types
• create type interest as table (
topic varchar(20),
degree_of_interest int);
create table users (
ID varchar(20),
name varchar(20),
interests interest);
Array, multiset data types also supported by many databases
• Syntax varies by database
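A Python sketch of the users table with its nested interest table, assuming dataclasses stand in for the SQL types. The hypothetical unnest helper flattens these non-first-normal-form rows into flat tuples, one per (user, interest) pair.

```python
from dataclasses import dataclass, field

@dataclass
class Interest:          # mirrors the interest table type
    topic: str
    degree_of_interest: int

@dataclass
class User:              # mirrors a row of users; interests is a nested table
    ID: str
    name: str
    interests: list[Interest] = field(default_factory=list)

def unnest(users):
    """Flatten nested rows into (ID, name, topic, degree_of_interest) tuples.
    A user with no interests produces no rows."""
    return [(u.ID, u.name, i.topic, i.degree_of_interest)
            for u in users for i in u.interests]

users = [User("00128", "Zhang", [Interest("jazz", 8), Interest("cooking", 5)])]
flat = unnest(users)
```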
Type and Table Inheritance
Type inheritance
• create type Student under Person
(degree varchar(20)) ;
create type Teacher under Person
(salary integer);
Table inheritance syntax in PostgreSQL and Oracle
• create table students
(degree varchar(20))
inherits (people);
create table teachers
(salary integer)
inherits (people);
• create table people of Person;
create table students of Student
under people;
create table teachers of Teacher
under people;
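A Python sketch of the table inheritance idea, using class inheritance in place of the SQL types. It assumes the behavior where a query on the parent table also returns rows stored in its subtables, as PostgreSQL does unless the query uses ONLY. The select_from helper and its only flag are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Person:
    ID: str
    name: str
    address: str

@dataclass
class Student(Person):   # like "create type Student under Person"
    degree: str

@dataclass
class Teacher(Person):   # like "create type Teacher under Person"
    salary: int

rows = [
    Person("p1", "Ana", "Main St"),
    Student("s1", "Zhang", "Elm St", "MSc"),
    Teacher("t1", "Katz", "Oak St", 90000),
]

def select_from(table_type, rows, only=False):
    """Rows of a table; unless only=True, rows of its subtables are included."""
    if only:
        return [r for r in rows if type(r) is table_type]
    return [r for r in rows if isinstance(r, table_type)]
```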
Reference Types
Measures of effectiveness
• Precision: what percentage of returned results are actually
relevant
• Recall: what percentage of relevant results were returned
• Often measured at a fixed number of answers, e.g., precision@10, recall@10
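A minimal Python sketch of precision@k and recall@k for a ranked answer list. It assumes precision@k divides by k (one common convention) and that the full set of relevant results is known. The document IDs are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top k positions that hold relevant results."""
    hits = sum(1 for d in ranked[:k] if d in relevant)
    return hits / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant results that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for d in ranked[:k] if d in relevant)
    return hits / len(relevant)

ranked = ["d3", "d1", "d7", "d2", "d9"]   # hypothetical ranked answers
relevant = {"d1", "d2", "d4"}             # hypothetical relevant set
```

Here the top 5 answers contain 2 of the 3 relevant results, so precision@5 is 2/5 and recall@5 is 2/3.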
Keyword querying on structured data and knowledge bases
• Useful if users don’t know the schema, or there is no
predefined schema
• Can represent data as graphs
• Keywords match tuples
• Keyword search returns closely connected tuples that
contain keywords
E.g., on our university database, given the query “Zhang Katz”,
Zhang matches a student, Katz matches an instructor, and the
advisor relationship links them
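A simplified Python sketch of keyword search over a tuple graph: find the tuples whose text contains each keyword, then return the shortest connecting path found by breadth-first search over undirected edges. The graph, tuple IDs and helpers are hypothetical; practical systems rank many candidate answer trees rather than returning a single path.

```python
from collections import deque

# Hypothetical tuples (nodes) with their text, and links between them.
TUPLES = {
    "student/00128": "Zhang",
    "instructor/45565": "Katz",
    "advisor/00128-45565": "",
    "department/Comp. Sci.": "Comp. Sci.",
}
EDGES = [
    ("advisor/00128-45565", "student/00128"),
    ("advisor/00128-45565", "instructor/45565"),
    ("instructor/45565", "department/Comp. Sci."),
]

def matches(keyword):
    """Tuples whose text contains the keyword as a word (case-insensitive)."""
    kw = keyword.lower()
    return {n for n, text in TUPLES.items() if kw in text.lower().split()}

def adjacency(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj

def connect(k1, k2):
    """Shortest path from a tuple matching k1 to one matching k2, or None."""
    adj = adjacency(EDGES)
    targets = matches(k2)
    start = sorted(matches(k1))
    queue = deque((s, [s]) for s in start)
    seen = set(start)
    while queue:
        node, path = queue.popleft()
        if node in targets:
            return path
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None
```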
Spatial Data
Region queries deal with spatial regions, e.g., asking for objects
that lie partially or fully inside a specified region
• E.g., PostGIS ST_Contains(), ST_Overlaps(), …
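A Python sketch of region queries on axis-aligned rectangles, assuming each object is approximated by a bounding rectangle. The contains and intersects methods are simple geometric checks, not the exact semantics of PostGIS ST_Contains() or ST_Overlaps(); the object names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle [xmin, xmax] x [ymin, ymax]."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, other: "Rect") -> bool:
        return (self.xmin <= other.xmin and other.xmax <= self.xmax and
                self.ymin <= other.ymin and other.ymax <= self.ymax)

    def intersects(self, other: "Rect") -> bool:
        # True if the rectangles share at least one point (touching counts).
        return not (other.xmax < self.xmin or self.xmax < other.xmin or
                    other.ymax < self.ymin or self.ymax < other.ymin)

def region_query(region, objects, fully=False):
    """Names of objects fully (fully=True) or at least partially inside region."""
    test = region.contains if fully else region.intersects
    return sorted(name for name, r in objects.items() if test(r))

objects = {"lake": Rect(1, 1, 2, 2), "park": Rect(3, 3, 6, 6), "farm": Rect(8, 8, 9, 9)}
region = Rect(0, 0, 4, 4)
```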
Nearness queries request objects that lie near a specified
location.
Nearest neighbor queries, given a point or an object, find the
nearest object that satisfies given conditions.
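A brute-force Python sketch of nearness and nearest neighbor queries over points, assuming Euclidean distance and a hypothetical category attribute as the condition. Real systems use spatial indexes such as R-trees or k-d trees instead of scanning every object.

```python
import math

# Hypothetical points of interest: name -> (x, y, category).
POIS = {
    "cafe A": (1.0, 1.0, "cafe"),
    "cafe B": (4.0, 5.0, "cafe"),
    "museum": (2.0, 0.5, "museum"),
}

def within(point, pois, d):
    """Nearness query: names of objects within distance d of point."""
    return sorted(name for name, (x, y, _) in pois.items()
                  if math.dist(point, (x, y)) <= d)

def nearest(point, pois, condition=lambda cat: True):
    """Nearest neighbor satisfying condition (linear scan), or None."""
    candidates = [(math.dist(point, (x, y)), name)
                  for name, (x, y, cat) in pois.items() if condition(cat)]
    return min(candidates)[1] if candidates else None
```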
Spatial graph queries request information based on spatial
graphs
• E.g., shortest path between two points via a road network
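A Python sketch of a spatial graph query: Dijkstra's algorithm computing the shortest path between two intersections of a small, hypothetical road network with non-negative road lengths.

```python
import heapq

# Hypothetical road network: intersection -> [(neighbor, road length in km)].
ROADS = {
    "A": [("B", 4.0), ("C", 1.0)],
    "B": [("A", 4.0), ("C", 2.0), ("D", 5.0)],
    "C": [("A", 1.0), ("B", 2.0), ("D", 8.0)],
    "D": [("B", 5.0), ("C", 8.0)],
}

def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (distance, path), or (inf, []) if unreachable."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            path = [node]
            while node in prev:          # walk predecessors back to the source
                node = prev[node]
                path.append(node)
            return d, path[::-1]
        if d > dist.get(node, float("inf")):
            continue                     # stale heap entry
        for nxt, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    return float("inf"), []
```

From A to D, the route A, C, B, D (1 + 2 + 5 = 8 km) beats both direct-looking alternatives A, B, D (9 km) and A, C, D (9 km).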
Spatial joins combine two spatial relations, with the location
playing the role of the join attribute.
Queries that compute intersections or unions of regions
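A Python sketch of a spatial join on rectangles, given as (xmin, ymin, xmax, ymax) tuples: it pairs objects from two hypothetical relations whose regions intersect and returns the shared region. It uses a nested loop for clarity, whereas real systems typically use spatial indexes. Unions are omitted because the union of two rectangles is generally not a rectangle.

```python
def intersection(r, s):
    """Intersection rectangle of r and s, or None if they do not intersect."""
    xmin, ymin = max(r[0], s[0]), max(r[1], s[1])
    xmax, ymax = min(r[2], s[2]), min(r[3], s[3])
    if xmin > xmax or ymin > ymax:
        return None
    return (xmin, ymin, xmax, ymax)

def spatial_join(left, right):
    """Nested-loop spatial join: (left name, right name, shared region) triples."""
    result = []
    for lname, lrect in left.items():
        for rname, rrect in right.items():
            common = intersection(lrect, rrect)
            if common is not None:
                result.append((lname, rname, common))
    return result

parks = {"central": (0, 0, 4, 4), "river": (6, 5, 7, 10)}   # hypothetical
zones = {"north": (0, 3, 10, 10), "south": (0, 0, 10, 2)}   # hypothetical
pairs = spatial_join(parks, zones)
```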