ECS640U/ECS765P Big Data Processing

Large Scale Graph Processing

Lecturer: Ahmed M. A. Sayed
School of Electronic Engineering and Computer Science

Credit: Joseph Doyle, Jesus Carrion, Felix Cuadrado, …

Weeks 6-11: Processing

Data Sources → Ingestion → Storage → Processing → Output

This week, we focus on graph processing.

Big Data Processing: Week 9
Topic List:

● Graph Applications
● Graph Databases
● Graph Databases with Python
● Pregel
● GraphX
Graph Definition

A graph G = (V,E), where

• V represents the set of vertices (nodes)
• E represents the set of edges (links)
• Both vertices and edges may contain additional information

Different types of graphs:

• Directed vs. undirected edges
• Cyclic vs. acyclic
• Temporal graphs (vertices and edges change over time)
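A minimal Python sketch of the definition, assuming made-up vertex names and properties: both vertices and edges carry property dictionaries, the adjacency list can be built as directed or undirected, and a depth-first search separates cyclic from acyclic directed graphs. The helper names are illustrative, not from any library.

```python
from collections import defaultdict

# Hypothetical example data: vertex properties and directed, labelled edges.
vertices = {"alice": {"age": 31}, "bob": {"age": 27}, "carol": {"age": 45}}
edges = [
    ("alice", "bob", {"type": "follows"}),
    ("bob", "carol", {"type": "follows"}),
    ("carol", "alice", {"type": "follows"}),
]


def build_adjacency(edges, directed=True):
    """Map each vertex to a list of (neighbour, edge_properties) pairs."""
    adj = defaultdict(list)
    for src, dst, props in edges:
        adj[src].append((dst, props))
        if not directed:
            adj[dst].append((src, props))  # undirected: store both directions
    return adj


def has_cycle(adj, vertex_ids):
    """Detect a directed cycle with a three-colour depth-first search."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {v: WHITE for v in vertex_ids}

    def visit(v):
        colour[v] = GREY
        for w, _ in adj.get(v, []):
            if colour[w] == GREY:  # back edge to the current path: a cycle
                return True
            if colour[w] == WHITE and visit(w):
                return True
        colour[v] = BLACK
        return False

    return any(colour[v] == WHITE and visit(v) for v in vertex_ids)


directed_adj = build_adjacency(edges)
undirected_adj = build_adjacency(edges, directed=False)
print(has_cycle(directed_adj, vertices))                # True: alice -> bob -> carol -> alice
print(has_cycle(build_adjacency(edges[:2]), vertices))  # False once carol -> alice is removed
```

Keeping the properties next to each adjacency entry mirrors the point that "both vertices and edges may contain additional information".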
Graphs are ubiquitous
Modeling and tracking interactions
Social Graphs

Social media defines interaction networks

• Contacts
• Messages
• Tags
Graph analysis can extract valuable information:
• Identify leaders in a community: measure of influence (centrality)
• Identify “special” nodes and communities
• Find the right fitness Instagram influencer to advertise your protein powder
Community Detection
Community detection, also called graph partitioning:
• Helps reveal hidden relations among the nodes in the network
• Many algorithms have been developed to detect communities

Figure: communities in a college football network, using colors for conferences and spatial clustering for the identified communities
https://www.ese.wustl.edu/~nehorai/research/network_science/Lu_Community_Detection_SR_2018.html
Bipartite graphs

Bipartite graph: the vertices are partitioned into two groups, and every edge connects a vertex in one group to a vertex in the other (there are no edges within a group)
https://en.wikipedia.org/wiki/Bipartite_graph
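Whether a graph is bipartite can be tested by 2-colouring it with breadth-first search (a standard technique, not taken from the slides). A Python sketch, assuming an undirected adjacency-list input and invented user/ad data:

```python
from collections import deque


def two_colour(adj):
    """Return {vertex: 0 or 1} if the undirected graph is bipartite, otherwise None.

    adj must list every vertex as a key, with each edge stored in both directions.
    """
    side = {}
    for start in adj:  # loop so disconnected parts are also coloured
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in side:
                    side[w] = 1 - side[v]  # neighbours go to the other group
                    queue.append(w)
                elif side[w] == side[v]:
                    return None  # an edge inside one group: not bipartite
    return side


# Hypothetical users-and-ads graph (edges only between a user and an ad).
clicks = {"u1": ["ad1", "ad2"], "u2": ["ad1"], "ad1": ["u1", "u2"], "ad2": ["u1"]}
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
print(two_colour(clicks))    # {'u1': 0, 'ad1': 1, 'ad2': 1, 'u2': 0}
print(two_colour(triangle))  # None: an odd cycle cannot be split into two groups
```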

Example: the “stable marriage/matching” problem: find a stable matching between two equally sized sets of elements, given an ordering of preferences for each element. A matching is a bijection from the elements of one set to the elements of the other set (https://en.wikipedia.org/wiki/Stable_marriage_problem).
A matching is not stable if there is an element A of the first set that prefers some element B of the second set over the element A is already matched with, and B also prefers A over the element B is already matched with.
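The stability condition can be checked directly. The Python sketch below (element names and preference lists are invented for illustration) lists every blocking pair; an empty result means the matching is stable. It only verifies a given matching; finding a stable one is a separate task, classically solved by the Gale-Shapley algorithm.

```python
def blocking_pairs(matching, prefs_a, prefs_b):
    """Return every (a, b) pair that makes the matching unstable.

    matching maps each element of the first set to its partner in the second (a bijection);
    prefs_a / prefs_b give each element's preference list, most preferred first.
    """
    partner_of_b = {b: a for a, b in matching.items()}
    # rank_b[b][a] = position of a in b's list (lower = more preferred)
    rank_b = {b: {a: i for i, a in enumerate(p)} for b, p in prefs_b.items()}
    pairs = []
    for a, current in matching.items():
        for b in prefs_a[a]:
            if b == current:
                break  # only partners that a ranks above its current one can block
            # a prefers b; the pair blocks if b also prefers a to its own partner
            if rank_b[b][a] < rank_b[b][partner_of_b[b]]:
                pairs.append((a, b))
    return pairs


prefs_a = {"A1": ["B1", "B2"], "A2": ["B1", "B2"]}
prefs_b = {"B1": ["A1", "A2"], "B2": ["A1", "A2"]}
print(blocking_pairs({"A1": "B2", "A2": "B1"}, prefs_a, prefs_b))  # [('A1', 'B1')]: not stable
print(blocking_pairs({"A1": "B1", "A2": "B2"}, prefs_a, prefs_b))  # []: stable
```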

Practical applications: Web advertising, click prediction

Contagion/epidemic networks

How quickly will COVID-19 spread on this graph?

Contact tracing and analysis of epidemic spreading

“Needle exchange” networks of drug users [Weeks et al. 2002]

Interesting Properties – Power Law

The Power Law in the degree distribution (or popularity)

• A small minority of nodes has a high degree (and therefore high influence)
• Networks with this property are called scale-free networks
• Quite common in many human/social networks

Figure: frequency vs. degree; central nodes or influencers are the few nodes with high degree
• Scale-free: the degree distribution follows a power law
• Random: degrees cluster around the average (a Poisson-like distribution), with no heavy tail

https://en.wikipedia.org/wiki/Scale-free_network
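Counting how many vertices have each degree is the first step in checking whether a network looks scale-free: plotted as frequency against degree (usually on log-log axes), a power law shows up as a heavy tail. A small Python sketch on a toy hub-and-spoke graph (invented data, not a real network):

```python
from collections import Counter


def degree_distribution(edges):
    """Return {degree: number of vertices with that degree} for an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(Counter(degree.values()))


# Hypothetical hub-and-spoke graph: one highly connected hub, many low-degree leaves.
hub_edges = [("hub", f"leaf{i}") for i in range(6)] + [("leaf0", "leaf1")]
for deg, count in sorted(degree_distribution(hub_edges).items()):
    print(f"degree {deg}: {count} vertices")
```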
Graph Management / Storage

Traditional relational and NoSQL databases can store graphs

But their query languages do not support native queries on graph elements (e.g., checking for relationships)

We need query languages/abstractions suitable for finding relationship patterns

Rise of graph DBs

• Neo4j: graph database management system (Java-based, not distributed)
• Titan: distributed graph database
• Amazon Neptune (Nov 2017): fully managed graph database
Graph Databases

A database that uses graph structures with nodes, edges and properties to store data

Provides index-free adjacency

• Every node stores direct pointers to its adjacent elements, so no index lookup is needed
• Fast for following relationships

Edges hold most of the important information and connect:

• nodes to other nodes
• nodes to properties/metadata (as in the Resource Description Framework, RDF)
Neo4j

Java-based graph database management system

Like SQL databases, it is ACID-compliant: transactions (logical units of work) are Atomic, Consistent, Isolated and Durable (https://en.wikipedia.org/wiki/ACID)

Property graph model: a powerful, schema-less way to model graph-based information

Good performance for datasets that are not massive

Not distributed, but supports sharding (partitioning)

Cypher: graph-specific query language

• ASCII-art syntax to define and match patterns
The property graph model

Entities – Vertices and Edges

Tags – Entities have type(s)

Properties – Key value pairs attached to entities

The property graph model: Books
Figure: a book graph annotated with its tags, properties and entities
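To make the three ingredients concrete, here is a Python sketch of a tiny property graph (the book data and class names are invented; Neo4j itself calls tags "labels"). Entities are nodes and typed relationships, tags are the label sets, and properties are plain key-value dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: set                                     # tags: the entity's type(s)
    properties: dict = field(default_factory=dict)  # key-value pairs


@dataclass
class Relationship:
    src: int
    dst: int
    rel_type: str                                   # relationships are typed, too
    properties: dict = field(default_factory=dict)


# Hypothetical book data (invented, not the contents of the figure).
nodes = {
    1: Node(1, {"Person", "Author"}, {"name": "Jane Doe"}),
    2: Node(2, {"Book"}, {"title": "Graphs in Practice", "year": 2020}),
    3: Node(3, {"Book"}, {"title": "Another Title", "year": 2022}),
}
rels = [
    Relationship(1, 2, "WROTE", {"role": "main author"}),
    Relationship(1, 3, "REVIEWED"),
]


def titles_written_by(author_name):
    """Follow WROTE relationships from a named person to the titles of Book nodes."""
    return [
        nodes[r.dst].properties["title"]
        for r in rels
        if r.rel_type == "WROTE"
        and nodes[r.src].properties.get("name") == author_name
        and "Book" in nodes[r.dst].labels
    ]


print(titles_written_by("Jane Doe"))  # ['Graphs in Practice']
```

No schema is declared anywhere: a new label or property key can be attached to any node, which is what "schema-less" means in practice.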
Why is SQL not suitable for graph-based data?

SQL: Modelling and Querying a Graph

Figure: a relationship graph between account holders, and the SQL query needed to get the non-immediate friends of Person001 who are up to 3 hops away

Imagine using such cumbersome SQL to query large social network graphs!
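For contrast, in a graph representation the same question is a short traversal. A Python sketch using breadth-first search with a hop limit; only Person001 comes from the slide, the rest of the friendship data is invented. "Non-immediate" is taken to mean a shortest distance of at least 2 hops.

```python
from collections import deque


def friends_within(adj, start, min_hops=2, max_hops=3):
    """People whose shortest hop distance from start lies in [min_hops, max_hops]."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if dist[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in adj.get(person, []):
            if friend not in dist:
                dist[friend] = dist[person] + 1
                queue.append(friend)
    return {p for p, d in dist.items() if min_hops <= d <= max_hops}


# Hypothetical undirected friendship graph (each friendship listed in both directions).
friends = {
    "Person001": ["Person002", "Person003"],
    "Person002": ["Person001", "Person004"],
    "Person003": ["Person001", "Person004"],
    "Person004": ["Person002", "Person003", "Person005"],
    "Person005": ["Person004", "Person006"],
    "Person006": ["Person005"],
}
print(sorted(friends_within(friends, "Person001")))  # ['Person004', 'Person005']
```

Each hop is one adjacency lookup per visited person, whereas the SQL version needs one more self-join per hop.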
Cypher query language

Query language for Neo4j

• Becoming a standard through the openCypher initiative (https://opencypher.org)

Declarative and expressive language

MATCH queries return all the graph elements that satisfy the whole pattern
Sample Cypher query on a graph

*..5 => variable-length relationship pattern matching paths of 1 up to 5 hops

Movies Database neo4j
Install neo4j: https://neo4j.com/docs/operations-manual/current/installation/
Download desktop edition: https://neo4j.com/download/

Open the Movies project in Neo4j Desktop, then run the command :play movies

Then follow the instructions to create the movies database and run the queries
Movies Database neo4j
• Find movies released in the 1990s
• Find actors up to 4 hops away from Kevin Bacon
• Find the shortest path between two actors
• Find the co-co-actors of Tom Hanks

Break and Quiz

Neo4j Python Library

Install the neo4j Python library:

pip install neo4j

For example: https://github.com/neo4j-examples/movies-python-bolt

The neo4j library lets Python code access a Neo4j database
Neo4j Python: Obtaining a JSON Graph

Obtain a JSON representation of the graph in the movies database, showing movie titles and their actors (cast)
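The conversion step can be sketched in plain Python once the query results are in hand. The records below are hypothetical literals, the Cypher query in the comment is only an assumed example, and the nodes/links layout is an assumption (a shape commonly consumed by force-directed graph viewers), not necessarily the exact format used by the linked example project.

```python
import json


def to_graph_json(records):
    """Convert (movie title, [actor names]) records into a nodes/links JSON string.

    Links refer to positions in the nodes list (an assumed layout).
    """
    nodes, links, index = [], [], {}

    def node_position(name, label):
        if (name, label) not in index:  # each movie or actor becomes exactly one node
            index[(name, label)] = len(nodes)
            nodes.append({"title": name, "label": label})
        return index[(name, label)]

    for title, cast in records:
        movie = node_position(title, "movie")
        for actor in cast:
            links.append({"source": node_position(actor, "actor"), "target": movie})
    return json.dumps({"nodes": nodes, "links": links})


# In practice the records could come from a Cypher query along the lines of
#   MATCH (m:Movie)<-[:ACTED_IN]-(a:Person)
#   RETURN m.title AS movie, collect(a.name) AS cast
# run through the neo4j driver; here they are hypothetical literals.
records = [
    ("The Matrix", ["Keanu Reeves", "Carrie-Anne Moss"]),
    ("The Matrix Reloaded", ["Keanu Reeves"]),
]
graph_json = to_graph_json(records)
print(graph_json)
```

An actor appearing in several movies becomes a single node with several links, which is what makes the result a graph rather than a list of casts.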
Neo4j Python: Search Functionality

Search the database for movies whose titles contain the substring given by the variable q

Regex annotations: a case-sensitivity flag (in Java regex syntax, (?i) turns on case-insensitive matching); .* matches any character zero or more times

https://neo4j.com/docs/cypher-manual/current/clauses/where/#query-where-regex
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html
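The matching behaviour can be tried without a database. This Python sketch assumes the pattern is built as "(?i).*" + q + ".*" and emulates Cypher's =~, which must match the whole string (hence re.fullmatch). Cypher uses Java regex syntax; for a pattern this simple Python's re behaves the same way. The re.escape call is an added precaution, not part of the original example.

```python
import re


def cypher_style_title_search(titles, q):
    """Emulate WHERE movie.title =~ $pattern with pattern = "(?i).*" + q + ".*".

    re.escape makes characters such as '.' in q match literally.
    """
    pattern = "(?i).*" + re.escape(q) + ".*"  # (?i): case-insensitive; .*: any characters
    return [t for t in titles if re.fullmatch(pattern, t)]


titles = ["The Matrix", "The Matrix Reloaded", "Top Gun", "A Few Good Men"]
print(cypher_style_title_search(titles, "matrix"))  # ['The Matrix', 'The Matrix Reloaded']
print(cypher_style_title_search(titles, "GUN"))     # ['Top Gun']
```

With the neo4j driver, the pattern would be passed as a query parameter rather than concatenated into the query text.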
Graph Traversal in MapReduce
Approach: parallel processing of each vertex
● Each Map/Reduce function has access to limited information: one node and its links
Iterative executions of a MapReduce job
● Map: compute something on each node, potentially sending information to that node or to other nodes, to be aggregated by the reducers
● Reduce: compute something on each unique node
● The output of the reducers in iteration #n becomes the input of the mappers in iteration #n+1
Finding the Shortest Path: Intuition

Breadth-First Search (BFS) algorithm (https://en.wikipedia.org/wiki/Breadth-first_search)

We can define the solution to this problem inductively:

● distanceTo(startNode) = 0
● For all nodes n directly reachable from startNode → distanceTo(n) = 1
● For all nodes n reachable from some other set of nodes S (the nodes with an edge to n),
distanceTo(n) = 1 + min(distanceTo(m) for all m ∈ S)
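A sequential Python sketch of this definition (the graph is invented). BFS reaches nodes in order of distance, so the first time n is discovered it is through a neighbour m with the smallest distanceTo(m), which gives exactly 1 + min(...).

```python
from collections import deque


def bfs_distances(adj, start):
    """Hop distance from start to every reachable node of an unweighted graph."""
    distance = {start: 0}  # distanceTo(startNode) = 0
    frontier = deque([start])
    while frontier:
        m = frontier.popleft()
        for n in adj.get(m, []):
            if n not in distance:  # first discovery happens via a closest m
                distance[n] = distance[m] + 1  # 1 + distanceTo(m)
                frontier.append(n)
    return distance


graph = {"s": ["a", "b"], "a": ["c"], "b": ["c", "d"], "c": ["e"], "d": ["e"], "e": []}
print(bfs_distances(graph, "s"))  # {'s': 0, 'a': 1, 'b': 1, 'c': 2, 'd': 2, 'e': 3}
```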
Visualizing Parallel BFS

Inefficient: we need to keep track of the list of visited nodes and pass it between jobs, along with the updated graph state/structure
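The same computation expressed as repeated MapReduce jobs, simulated in plain Python (no Hadoop involved; the function names are hypothetical). In this sketch the current distance stored with each node plays the role of the visited marker, and each job re-emits the whole graph structure ("GRAPH" records) so the next job can use it. That resending is exactly the overhead criticised here.

```python
import math
from collections import defaultdict

INF = math.inf


def bfs_map(node, record):
    """Mapper: re-emit the node's structure and distance, and propose distances to neighbours."""
    dist, neighbours = record
    yield node, ("GRAPH", neighbours)  # the graph structure must travel with the data
    yield node, ("DIST", dist)
    if dist < INF:
        for n in neighbours:
            yield n, ("DIST", dist + 1)


def bfs_reduce(node, values):
    """Reducer: keep the adjacency list and the smallest distance proposed for the node."""
    neighbours, best = [], INF
    for kind, value in values:
        if kind == "GRAPH":
            neighbours = value
        else:
            best = min(best, value)
    return node, (best, neighbours)


def mapreduce_bfs(adj, start, max_iterations=100):
    """Run one simulated MapReduce job per BFS level until no distance changes."""
    state = {n: (0 if n == start else INF, nbrs) for n, nbrs in adj.items()}
    for iteration in range(1, max_iterations + 1):
        shuffled = defaultdict(list)  # stands in for the shuffle/sort phase
        for node, record in state.items():
            for key, value in bfs_map(node, record):
                shuffled[key].append(value)
        new_state = dict(bfs_reduce(k, vs) for k, vs in shuffled.items())
        if new_state == state:  # converged: this job changed nothing
            return {n: d for n, (d, _) in state.items()}, iteration
        state = new_state  # output of job n is the input of job n+1
    return {n: d for n, (d, _) in state.items()}, max_iterations


graph = {"s": ["a", "b"], "a": ["c"], "b": ["c", "d"], "c": ["e"], "d": ["e"], "e": []}
distances, jobs = mapreduce_bfs(graph, "s")
print(distances, jobs)  # {'s': 0, 'a': 1, 'b': 1, 'c': 2, 'd': 2, 'e': 3} 4
```

Four full jobs are needed for a graph of depth 3: one per level, plus a final job that only confirms nothing changed.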
MapReduce graph processing performance
Iterative algorithms write to HDFS in each step
● Resending the graph structure in each iteration is VERY inefficient
Running one Map task per node, and sending messages to other nodes along the connections between graph nodes, results in significant communication cost.
In-memory systems are a much better fit for this type of computation → Spark framework
● Graph-specific in-memory systems have been developed recently
More Efficient Alternatives
Google’s Pregel
● Original Google paper
● Google’s Pregel model: Think like a vertex [1]
Apache Giraph
● Java-based
Apache Spark GraphX
● Extension of Spark with a graph-centric computation model (Scala)
● GraphFrames: Python API (used in this week’s lab)

… Ongoing research efforts in this space

[1] Malewicz, G., Austern, M. H., Bik, A. J., Dehnert, J. C., Horn, I., Leiser, N., & Czajkowski, G. (2010, June). Pregel: a system for large-scale graph processing. In Proceedings of the ACM SIGMOD International Conference on Management of Data.
Pregel: Think like a vertex

The Pregel framework allows you to write “vertex-centric” code.

The same user code, a compute() function, runs concurrently on each vertex of the graph.
Each instance of this function:
1. keeps track of information
2. can iterate over outgoing edges (each of which has a value)
3. can send messages to the vertices connected to those edges, or to any other vertices it may know about (e.g., having received a vertex ID via a message)

Execution follows the Bulk Synchronous Parallel (BSP) model.

https://people.cs.rutgers.edu/~pxk/417/notes/pregel.html
Pregel’s node/vertex-centric processing model
Pregel-style graph processing systems
Computation is iterative, in the form of supersteps
● In every superstep, a user function is executed at each vertex
Vertices can send messages to their neighbours
Messages arrive in the next superstep
Computation is executed in parallel
● Each vertex is independent of the others within the same superstep
● Messages are the synchronization mechanism

https://people.cs.rutgers.edu/~pxk/417/notes/pregel.html
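A sequential Python simulation of the superstep model, using maximum-value propagation as the vertex program (every vertex ends up with the largest value in its component). The compute() signature is a hypothetical simplification, not Pregel's real API; vote-to-halt and reactivation on message receipt follow the Pregel paper's description. Messages sent in superstep k are only read in superstep k+1.

```python
from collections import defaultdict


def run_pregel(values, out_edges, compute, max_supersteps=50):
    """Run vertex-centric supersteps until all vertices halt and no messages remain.

    compute(vertex, value, messages, superstep, neighbours) must return
    (new_value, [(target, message), ...], vote_to_halt).
    """
    values = dict(values)  # do not mutate the caller's dictionary
    inbox = {}             # messages delivered in the current superstep
    active = set(values)   # every vertex starts active
    superstep = 0
    while superstep < max_supersteps and (active or inbox):
        outbox = defaultdict(list)
        next_active = set()
        # A halted vertex is woken up again if it has received a message.
        for v in active | set(inbox):
            new_value, sent, halt = compute(
                v, values[v], inbox.get(v, []), superstep, out_edges.get(v, [])
            )
            values[v] = new_value
            for target, msg in sent:
                outbox[target].append(msg)
            if not halt:
                next_active.add(v)
        inbox, active = outbox, next_active  # messages arrive in the next superstep
        superstep += 1
    return values, superstep


def max_value_compute(v, value, messages, superstep, neighbours):
    """Each vertex ends up holding the largest value in its connected component."""
    new_value = max([value] + messages)
    if superstep == 0 or new_value > value:
        return new_value, [(n, new_value) for n in neighbours], False  # tell neighbours
    return new_value, [], True  # nothing new to share: vote to halt


# Hypothetical chain a - b - c - d (edges listed in both directions).
out_edges = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
final, steps = run_pregel({"a": 3, "b": 6, "c": 2, "d": 1}, out_edges, max_value_compute)
print(final, steps)  # {'a': 6, 'b': 6, 'c': 6, 'd': 6} 4
```

Within a superstep the loop order does not matter, because each vertex reads only its own value and its inbox; that independence is what lets a real Pregel system run the vertices in parallel.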
Google’s PageRank
PageRank is a link analysis algorithm
The rank value indicates the importance of a particular web page
A hyperlink to a page counts as a vote of support
A page that is linked to by many pages with high PageRank receives a high rank itself
Example: A PageRank of 0.5 means there is a 50% chance that a person clicking on a random link
will be directed to the document with a PageRank of 0.5

Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: bringing order to the web., WWW
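PageRank Example
Simplified PageRank (no damping factor), where B(Pi) is the set of pages linking to Pi:
r_{k+1}(Pi) = Σ_{Pj ∈ B(Pi)} r_k(Pj) / d(Pj)
• r_k(Pj): rank of the neighbour Pj after iteration k; initial value r_0(Pj) = 1/N (N = number of pages)
• d(Pj): outdegree of the neighbour Pj

With N = 6 pages, and P1 (outdegree 2) and P3 (outdegree 3) linking to P2:
r_1(P2) = r_0(P3)/d(P3) + r_0(P1)/d(P1) = (1/6)/3 + (1/6)/2 = 1/18 + 1/12 = 30/216 = 5/36
r_2(P2) = r_1(P3)/d(P3) + r_1(P1)/d(P1) = (1/12)/3 + (1/18)/2 = 1/36 + 1/36 = 1/18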
https://en.wikipedia.org/wiki/PageRank
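A Python sketch of the simplified update, using exact fractions so the arithmetic can be compared with the worked example. The 6-page link structure is invented, not the slide's figure: it is chosen only so that P1 (outdegree 2) and P3 (outdegree 3) are the sole pages linking to P2, which reproduces r_1(P2) = 5/36. The full PageRank formulation adds a damping factor (commonly 0.85) to model random jumps; this sketch deliberately omits it to match the slide.

```python
from fractions import Fraction


def simplified_pagerank(links, iterations):
    """Return [r_0, r_1, ...], where r_{k+1}(p) = sum of r_k(q) / d(q) over pages q linking to p."""
    pages = list(links)
    rank = {p: Fraction(1, len(pages)) for p in pages}  # r_0 = 1/N
    history = [rank]
    for _ in range(iterations):
        new_rank = {p: Fraction(0) for p in pages}
        for src, targets in links.items():
            if not targets:
                continue  # a dangling page passes no rank on in this simplified sketch
            share = rank[src] / len(targets)  # r_k(src) / d(src)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
        history.append(rank)
    return history


# Hypothetical 6-page graph: only P1 and P3 link to P2.
links = {
    "P1": ["P2", "P3"],
    "P2": ["P5"],
    "P3": ["P1", "P2", "P5"],
    "P4": ["P5", "P6"],
    "P5": ["P4", "P6"],
    "P6": ["P4"],
}
ranks = simplified_pagerank(links, iterations=2)
print(ranks[1]["P2"])          # 5/36
print(sum(ranks[2].values()))  # 1: total rank is preserved when every page has out-links
```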
Spark GraphX
Spark’s library for graph processing
Provides specialized RDDs for representing the graph structure and its information (property graphs)
Provides methods for creating graphs, transforming them, and computing multiple common graph metrics and algorithms
GraphX is written in Scala → GraphFrames provides a Python API for Spark’s graph processing
Spark GraphX Property Graphs
Spark GraphX RDD
Holds graph data and provides methods for manipulating them
VertexRDD[VertexData]: an RDD of (VertexId, VertexData) pairs
Vertex IDs must be 64-bit integers (VertexId is an alias for Long)
VertexData holds the vertex properties
EdgeRDD[EdgeData]
Each edge holds its source and destination vertex IDs and the edge properties
Technically a directed graph (a directed multigraph: parallel edges are allowed)
Triplets
Join of the source vertex, destination vertex, and edge properties
GraphX predefined methods
A GraphX Graph has multiple convenience methods that provide access to its information and implement relevant operations

● Access to the RDDs with the property information

graph.vertices, graph.edges, graph.triplets
● Returns (vertexId, degree) tuples, one per vertex that has at least one edge
graph.degrees
● Labels each vertex with the lowest vertex ID in its connected component
graph.connectedComponents()
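What these two methods return can be reproduced on a plain edge list. In this Python sketch the graph is invented, and the connected-components part swaps in a sequential union-find: GraphX computes the same labels by iterative label propagation on top of Pregel, so only the result, not the method, matches.

```python
from collections import Counter


def degrees(edges):
    """Sorted (vertexId, degree) pairs; like graph.degrees, isolated vertices are omitted."""
    deg = Counter()
    for src, dst in edges:
        deg[src] += 1
        deg[dst] += 1
    return sorted(deg.items())


def connected_components(vertex_ids, edges):
    """Label each vertex with the lowest vertex ID in its component (edge direction ignored)."""
    parent = {v: v for v in vertex_ids}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps the trees shallow
            v = parent[v]
        return v

    for src, dst in edges:
        a, b = find(src), find(dst)
        if a != b:
            parent[max(a, b)] = min(a, b)  # the smaller root ID becomes the component label
    return {v: find(v) for v in vertex_ids}


# Hypothetical graph with two components and one isolated vertex (11).
vertex_ids = [2, 3, 5, 7, 9, 10, 11]
edge_list = [(3, 7), (5, 3), (2, 5), (5, 7), (9, 10)]
print(degrees(edge_list))                           # [(2, 1), (3, 2), (5, 3), (7, 2), (9, 1), (10, 1)]
print(connected_components(vertex_ids, edge_list))  # {2: 2, 3: 2, 5: 2, 7: 2, 9: 9, 10: 9, 11: 11}
```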
GraphX predefined methods
v._2 is the vertex property tuple (name, position), so v._2._2 is the person’s position

graph.vertices.map(v => v._2._2).collect() // e.g. Array(student, postdoc, professor, professor); order is not guaranteed

graph.edges.filter(e => e.attr == "PI").count() // edges labelled "PI": returns 1
graph.vertices.filter { case (id, (name, pos)) => pos == "postdoc" }.count() // count all users who are postdocs: returns 1

https://spark.apache.org/docs/latest/graphx-programming-guide.html
Graph aggregate computation
Aggregate transformations send messages along each edge and process them at every vertex
graph.aggregateMessages: applies a user-defined sendMsg function to each edge triplet in the graph, then uses the mergeMsg function to aggregate those messages at their destination vertex.

The operation involves the following:

● sendMsg: EdgeContext[VD, ED, Msg] => Unit
Can send messages to either the source or the destination vertex, using the context (analogous to Map in MapReduce)
● mergeMsg: (Msg, Msg) => Msg
All the messages received by a vertex are reduced into one (analogous to Reduce in MapReduce)
Returns a VertexRDD[Msg] of (vertexId, result) pairs; vertices that received no message are not included

https://spark.apache.org/docs/latest/graphx-programming-guide.html#aggregate-messages-aggregatemessages
Age of the oldest follower of each node
(Scala code)

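import org.apache.spark.graphx._

// Assumes graph: Graph[Int, String], where each vertex attribute is the person's age
// and an edge src -> dst means "src follows dst"
val oldestFollower: VertexRDD[Int] =
  graph.aggregateMessages[Int](
    // sendMsg: send the follower's (source) age to the followed (destination) vertex
    ctx => ctx.sendToDst(ctx.srcAttr),
    // mergeMsg: keep the larger age, e.g. max(23, 42) = 42
    (a, b) => math.max(a, b)
  )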

http://webprojects.eecs.qmul.ac.uk/ag316/notesSite/BDP_slides/Week7%20%7C%20BigGraphs/ECS640-9-BigGraphs.pdf
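The map/reduce structure of aggregateMessages can be imitated in a few lines of plain Python. This is an illustrative re-implementation, not Spark's; the triplet tuple layout and the follower/age data are assumptions made for the sketch. It computes the oldest follower of each vertex, as in the Scala example.

```python
def aggregate_messages(triplets, send_msg, merge_msg):
    """Plain-Python imitation of GraphX aggregateMessages (not Spark's implementation).

    triplets: (src_id, src_attr, edge_attr, dst_id, dst_attr) tuples.
    send_msg(triplet) returns a list of (target_vertex_id, message) pairs.
    merge_msg(a, b) combines two messages addressed to the same vertex.
    """
    merged = {}
    for triplet in triplets:
        for target, msg in send_msg(triplet):  # "map" side: one call per edge
            if target in merged:
                merged[target] = merge_msg(merged[target], msg)  # "reduce" side
            else:
                merged[target] = msg
    return merged  # vertices that received no message are absent, as in GraphX


# Hypothetical data: edge src -> dst means "src follows dst"; vertex attribute = age.
ages = {1: 23, 2: 42, 3: 30, 4: 19}
follows = [(1, 3), (2, 3), (4, 1)]
triplets = [(s, ages[s], "follows", d, ages[d]) for s, d in follows]

oldest_follower = aggregate_messages(
    triplets,
    send_msg=lambda t: [(t[3], t[1])],  # send the follower's age to the followed vertex
    merge_msg=max,                      # keep the larger age, e.g. max(23, 42) = 42
)
print(oldest_follower)  # {3: 42, 1: 19}
```

Vertices 2 and 4 have no followers, so they never receive a message and do not appear in the result.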

End and Quiz
