1. The document discusses Project Geode, an open source distributed in-memory database for big data applications. It provides scale-out performance, consistent operations across nodes, high availability, powerful developer features, and easy administration of distributed nodes.
2. The document outlines Geode's architecture and roadmap. It also discusses why the project is being open sourced under Apache and describes some key use cases and customers of Geode.
3. The presentation includes a demo of Geode's capabilities including partitioning, queries, indexing, colocation, and transactions.
29. Pivotal Confidential–Internal Use Only
Geode Transactions
Across multiple Entries and Regions
Full ACID
Isolation level: Repeatable Read
JTA
– Last Resource
– Provider
Optimistic: conflict detection rather than locks
Faster than doing individual operations
Ability to suspend and resume
Works on colocated data
30.
Usage
CacheTransactionManager provides methods to begin, commit, rollback, suspend, and resume transactions.
E.g.
– CacheTransactionManager txMgr = cache.getCacheTransactionManager();
– txMgr.begin();
– region1.put(k1, v1);
– region2.get(k2);
– region2.put(k2, v2);
– txMgr.commit();
Single-entry operations are supported via ConcurrentMap methods
– putIfAbsent(K, V)
– replace(K, V, V)
– remove(K, V)
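For the single-entry operations, a Geode Region exposes standard java.util.concurrent.ConcurrentMap semantics. The sketch below demonstrates those semantics with a plain ConcurrentHashMap standing in for a Region (an assumption for illustration; a real Region is obtained from the cache):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SingleEntryOps {
    public static String demo() {
        // Stand-in for a Geode Region, which exposes the ConcurrentMap interface.
        ConcurrentMap<String, String> region = new ConcurrentHashMap<>();

        region.putIfAbsent("k1", "v1");    // creates the entry (returns null)
        region.putIfAbsent("k1", "other"); // no-op: "k1" is already present

        // replace(K, V, V): atomic compare-and-set on the value.
        region.replace("k1", "v1", "v2");  // succeeds: old value matched
        region.replace("k1", "v1", "v3");  // fails: current value is now "v2"

        // remove(K, V): removes only if the current value matches.
        region.remove("k1", "stale");      // fails: value mismatch, entry kept

        return region.get("k1");           // "v2"
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Each of these calls is atomic on a single entry, which is why they can be offered outside a full transaction.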
31.
Implementation
Repeatable Read: transaction state is held in a ThreadLocal
At commit()
– Grab a d-lock on the key set (transactions with a different key set can still execute concurrently)
– Conflict detection via reference checks
– Send the commit set to all replicas – no ack
– Send a commit message
– Recipients apply the commit only on getting the second message and keep track of last few transactions
Failure Scenarios
– Replica fails: no problem, it will do a GII (get initial image) operation when it starts up again
– Coordinator fails: replicas gossip to arrive at the outcome of the transaction
– If no member has the commit message, some members may be missing the commit set: abort the transaction
– If at least one member has the commit message, all members have the commit set: apply the transaction
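The optimistic, reference-check style of conflict detection described in these slides can be sketched as a toy simulation. The class and method names below are invented for illustration, not Geode internals: each read records the value reference it saw, and commit succeeds only if every touched key still holds the same reference.

```java
import java.util.HashMap;
import java.util.Map;

// Toy simulation of optimistic conflict detection via reference checks
// (hypothetical classes, not Geode code).
public class OptimisticTx {
    private final Map<String, Object> region;                      // committed state
    private final Map<String, Object> readSet = new HashMap<>();   // key -> reference seen
    private final Map<String, Object> writeSet = new HashMap<>();  // buffered writes

    public OptimisticTx(Map<String, Object> region) { this.region = region; }

    public Object get(String key) {
        if (writeSet.containsKey(key)) return writeSet.get(key);   // repeatable read
        Object v = region.get(key);
        readSet.putIfAbsent(key, v);                               // remember the reference
        return v;
    }

    public void put(String key, Object value) {
        get(key);                 // record what this write was based on
        writeSet.put(key, value);
    }

    // At commit: reference check – if any key we touched now holds a different
    // reference than the one we read, another tx won the race and we abort.
    public boolean commit() {
        for (Map.Entry<String, Object> e : readSet.entrySet()) {
            if (region.get(e.getKey()) != e.getValue()) return false;  // conflict
        }
        region.putAll(writeSet);  // apply the commit set
        return true;
    }
}
```

Because no locks are held while the transaction runs, two transactions on disjoint key sets never interfere, matching the "conflict detection rather than locks" bullet on the previous slide.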
#5: Explosion of user population, bursty nature of load, at a reasonable cost.
Rising expectations of uptime
Disruptive innovators
#6:
For an enterprise perspective on real time data: https://www.gesoftware.com/blog/can-memory-storage-solve-one-datas-greatest-challenges
First, let’s talk about consumer driven applications. Applications today are really about serving this market. Historically, enterprise applications were truly the focus, where the interactions were expected to be that… interactive. With consumer driven applications, you are really pushing the limits of instantaneous information to users, even going so far as to be predictive with what may be useful or appealing to them, to provide “value” to that consumer. What makes you different than your competitors in driving consumers to you is more about what extras you provide than the base service.
Consumer driven applications until recently were not so data driven. Data flowed through them, but they didn’t provide information back, except in a pull fashion.
Geode is the distributed NoSQL, in memory database for big data apps that need:
Scale out performance – as demand goes up and down, due to seasonal demand, flash types of events, or increasing data pipes, Geode can scale up and down with commodity hardware, providing predictable, linear scalability, without downtime.
Consistent database operations across globally distributed nodes: Geode focuses on data being consistent. This has been our bent from the beginning. The only thing we will sacrifice performance for is consistency. This is key to our customers that sell things like train tickets and stocks.
High availability, resilience, and global scale – Geode is intended for mission critical data and applications. Through a series of innovations, Geode can provide continuous availability of data at a global scale, through code changes, model changes, hardware changes, major version upgrades and smoking hole disasters
Powerful developer features – Even though Geode is pure Java, it is accessible through many APIs. It also provides a rich event framework, allowing applications to subscribe to individual data events in Geode
Easy administration of distributed nodes: Geode relies on you to say how to evenly distribute your data from a statistical perspective and takes care of the runtime implementation. Geode manages partitions of data between nodes in a dynamic fashion, removing the chore of mapping partitions to specific nodes, slaves, and masters in a cluster. Geode manages that for you, along with recovery of nodes. Additionally, Geode provides tools to help you manage and understand the behavior of the system.
#7: Please see the case study here. Talk about what they did first. Then go over the key points in this slide
http://blog.pivotal.io/pivotal/case-studies-2/china-railway-corp-for-chinese-new-year-chunyun
Key stats:
Query time reduced from 15 seconds to .2 sec – a 75x improvement (other customers have seen up to 100x)
Queries went from 3500 per second to 10s of 1000s per second
Train tickets to major cities sold out in 20 seconds
Growth is about 20% per year
Emphasize “room to grow” in this quote.
Why was this so successful? A few things:
CRC used a partitioned data set in memory. What this gave them was:
In memory performance. Data is “persisted” to at least 2 nodes in memory, rather than to disk, saving IO cost
Ability to scale up and down as needed, maximizing infrastructure usage. They don’t have to keep expensive dedicated hardware in place to accommodate peak usage
Optimized data distribution – CRC defines how the data should be partitioned. Geode takes care of the physical implementation of this actively, optimizing the distribution of data as machines are added and removed.
Geode is a shared nothing, in memory architecture. Data can be persisted to disk or another system, but it is out of band with the transaction, using the network to “persist” the data. This allows Geode to perform at maximum speeds with consistency. As more nodes are added, Geode can be notified to take advantage of these new nodes based on current resource usage. This gives us predictable linear performance as we scale up and down to meet peaky demands, while keeping data absolutely consistent – we can’t sell the same train ticket twice!
Colocation of related data sets allows us to scale transactions to hundreds of thousands of concurrent transactions running in a single cluster
As we distribute this data, our ability to operate on it is also scaled out. Queries, events, and functions that the grid operates on are all done in a distributed, parallel fashion.
#8: Read case study here:
http://www.pivotal.io/sites/default/files/Pivotal_vFabric_CS_Newedge_061213__0.pdf
Quote:
Group CIO, Alain Courbebaisse, says, “We have successfully implemented the most advanced post-trading platform of the clearing industry.”
As well as the strategic alignment, the NVision program resulted in a number of further, more tangible objectives being met:
Support for higher clearing volumes and creation of a platform that is horizontally scalable as more markets are connected to it
Reduced time to market for new market adapters
Global cache provides single version of the truth and removes synchronization issues created by latency sensitive global flows
Creation of a globally consistent trade flow across regions and exchanges, serving the derivative and equity business seamlessly
SIGNIFICANT IMPROVEMENT IN CLIENT ON-BOARDING TIME AND CLIENT SERVICE
Faster resolution window (investigate, resolve and re-submit)
Able to resolve reference data issues once and propagate out
Replay capability removes manual re-keying
Think about this – a trading platform. Operating in markets all over the world. A few things:
These have to be FAST to minimize risk – trades need to be executed as much as they can be in real time in order to make sure that the actual price was as close to the agreed to price as possible. The longer the delay, the more the possibility of drift between these prices, and the more risk is introduced.
The data must be consistent.
Geode provides
Performance optimized persistence – “first line persistence” can be to other nodes in memory, making the speed of persistence as fast as the network will allow it to be. Disk can still be used for long term storage, but your transaction in Geode is ACID compliant in memory. This takes disk i/o out of the critical path of the transaction, but still allows it to take place.
Consistency can be configured to meet your performance and data requirements. If you care less about accuracy and more about response time, you can configure data sets to those characteristics.
Distributed queries and regional functions – as queries and function calls are sent to the grid, their path is optimized as much as possible. What this means is that when queries or functions are sent to the grid, they are sent in parallel, and they are optimized to route directly to nodes that hold the data. When you query the grid or execute functions on it, you can be assured that your client will get a consistent overall view of the data. Queries and logic are distributed for optimal performance, but always lean towards consistency.
Indexing, triggers and event notifications are provided to react in real time to data as it is coming into the system
Data can also be configured to propagate to other geographically dispersed clusters, to allow you to operate on the data as close to the “action as possible” – minimize the distance data has to travel.
---------
We support persistence and make sure updates are consistent. We use replication for higher performance. Geode uses the concept of having redundant copies in memory to make the data more available. What this means is that instead of writing the data to disk to persist it, I will write it to one or more other nodes. If I am doing this on a reasonably fast network, I can “persist” the data in less than 1 ms. Data can be configured to write to local attached disk (shared nothing) or some other data source at the appropriate time via a rich event framework
#9: Read case study here:
http://blog.pivotal.io/pivotal/case-studies-2/how-argentina-pays-its-bills-19-million-cash-transactions-a-month-on-unreliable-networks-with-pivotal-Geode-and-spring
(Rapipago is their most well known brand)
With over 2,600 branches, 4,000 kiosk-based points of sale, and a huge call center, Rapipago is part of GIRE’s business of providing billing, collection, payment, and transaction processing services.
To put it simply, consumers visit our locations to pay their bills. Rapipago supports payments between 1200+ companies and their consumers—around 19 million transactions per month. Rapipago’s card, check, and cash-based transactions ultimately collect money on behalf of cell phone companies, automotive, banking, energy, gas, water, insurance, cable, credit cards, schools, municipalities, tourism operators, and more.
The biggest problem Geode has helped us solve has to do with unreliable network connectivity. This limited our ability to report on the business operations and take certain actions. In our network, we collect money from 6 AM until 9 PM, depending on the region. Before Geode, we had to wait until the next day to have visibility into how our network is collecting money. The previous system would process batch files each night. Our management team could only see transactions after a day’s time passed. In addition, many locations have unreliable network connections that make it harder to get a current view of the information.
With Geode, we can see the information in real-time, even if there is an unreliable network. The data is synced 24 x 7 as the network allows. Now, we have a much more accurate view of the cash at each location. From an operations perspective, having a better picture of each payment point throughout the day means we can now decide to do things like send an armed vehicle to take cash off the street when large sums of money start to accumulate.
In each Kiosk, transactions are captured in a local Geode instance. Geode also places the transactions in a shared branch region. This way, we can share information between kiosks within a branch.
In each branch, we have a Geode peer-to-peer topology set up—a branch’s kiosks are part of a distributed system. Before Geode, business rules were on the server side, and the system had to be online for the rules to run. With Geode’s P2P topology, we have those rules running in each kiosk, and they can be executed when the server is offline.
The kiosks also place information on the Geode WAN gateway to synchronize information to the central data center. With the WAN gateway, a returning network connection allows us to synchronize with the data center’s master database. When we don’t have internet connection to our central datacenter, we store all the transactions in the gateway’s queue. This is how we get a near real-time view of the entire network in a central place.
Can you explain more about how Geode handles the WAN synchronization?
Yes. This is the key function that allows us to have up to date information and deal with lost network connections. We use the WAN gateway to synchronize transactions between kiosks and our data center. Geode’s WAN gateway allows us to loosely couple multiple, independent Geode systems. So, each of our kiosks has its own Geode instance and region data is shared with the central Geode instance via the WAN gateway. If communications between sites fail or become slow, the systems still run independently, and persistent queues operate for messaging between sites.
Within a cluster, data can be made resilient to failures. This means that nodes can come and go without data loss. Additionally, data can be persisted to local disk as an added measure. If servers fail within a cluster, it is transparent to the client. Data is kept consistent, and the connection is automatically routed to an available node.
Cluster to cluster connectivity gives the data a way to survive smoking hole failures as well. The data is queued up to write asynchronously to other clusters. These queues can also be written to local disk to avoid data loss of queues in memory, in case a data center loses connectivity. The data is saved in the local cluster safely until network connectivity is restored. Gire specifically uses this to handle their unreliable network issues.
#10: Geode is written in 100% Java, but has access via several other languages. Before you say “but Java is slow”, keep in mind, some of the fastest trading platforms in the world utilize Geode at their core.
Geode has many ways to interact with it.
Web applications can be made more performant, without any code changes, by leveraging Geode as an HTTP session state cache (SSC) or Hibernate L2 cache – just configuration. Many customers have seen significant performance improvements just by plugging these features into their existing applications. Additionally, Geode supports a memcached API that can be used to enhance applications based on any of the 70 or so memcached clients where the existing cache just isn’t scaling to what is needed. This can be a nice transitional step towards Geode without major code changes.
Geode provides a RICH set of application functions that act as more than just a “driver” to Geode. When a client connects to a grid, that client can
Send functions to the grid and have the grid route the function to the appropriate nodes for execution, aggregate the results, and send them back to the client, without the client needing to be aware of being attached to a distributed grid.
Query the grid via OQL (a standard from the ODMG – Object Data Management Group)
Clients can subscribe to events on the server side, or listen for data matching particular criteria, without polling Geode. Geode’s extensive event framework provides client side event listeners as well. Clients can rely on data coming from the server, even if the client temporarily disconnects. The queues holding data going to clients can be made just as reliable as data in the grid
Developers of the grid themselves have access to a rich event framework that allows for real time monitoring and reaction to data as it flows through the system and passes transactional and physical boundaries.
3. There are native clients available for Java, C++, and C#
4. Being a k/v store, Geode accommodates user-defined objects that are arbitrarily complex (as long as they are serializable). Additionally, Geode natively supports JSON documents.
5. Geode allows versioning of object schemas in the grid, without having to restart the grid, or change client applications using old versions of the schema. Geode detects these changes, and manages the different version for you.
6. Additionally, when you are interacting with Geode, you can use the Java HashMap interface or Spring Data Geode. Serialization APIs are also available for extraordinary performance and memory tuning.
#11: Deploying data to Geode is done via “configuration” of the data. You specify how the data should be partitioned from a logical perspective. When this is deployed to the grid, Geode manages the specific physical implementation of this configuration. This means that while keeping the data consistent, it manages where the data lives for you, making the most of the resources you have available in the cluster.
A dashboard is provided to monitor the most critical aspects of your grid. Additionally, JMX APIs are available for integration into most enterprise monitoring environments, keeping monitoring costs down.
For historical root cause analysis, Geode comes with Visual Statistics Display (VSD) to allow you to correlate Application down to OS and hardware stats to performance tune the cluster for optimal throughput.
A rich CLI, gfsh, provides automation of the system via scripting.
#12: Geode can be run in a number of different topologies, providing flexibility for a number of deployments.
In it’s most basic form, Geode can be used embedded in a process, like a web server. This provides performant, transactional data, however, it doesn’t provide scale or high availability.
Moving to a peer to peer model, we can achieve scale across an application. This is generally for a data set that is very application specific – frequently used content in a custom web app for example, or HTTP SSC.
When our data becomes relevant across more than one application, it makes sense to deploy in a client-server type of architecture. With Geode, data can be deployed to a dedicated, individually scalable and configurable grid. Client applications, such as web apps, dashboards, and alerting, can be clients of the data in the grid. Clients can choose to interact with the grid in a pull fashion, or have the grid send updates in a push fashion, meaning real time dashboards are updated as new data is arriving, not after some polling interval has passed.
Independent Geode grids can also be kept in sync with each other via the WAN gateway, or async event listeners. Different distributed systems can either have exact copies of data, or some other version of the data, such as an aggregate.
Which brings us to geo-distributed. The WAN gateway can be used to distribute data reliably across the globe, getting data to where it needs to be for DR, reporting, or simply to be closer to the action (like with trading). Think about stock exchanges. Where the data lives matters. You do not want to have to do a global round trip to make your trade. You are constrained by the speed of light. Also to consider is the fact that you don’t always want certain data to cross international boundaries. We can put these rules in place at the boundaries between clusters.
#15: Reads are completely network-bound in these runs due to the 1 Gbit network used.
#18: There are a lot of different ways to partition data in SQLFire. By default, SQLFire will try to evenly distribute data at random across all servers. If that’s not good enough, you can exert a lot of control over how data is divided and distributed using list, range, or expression-based partitioning.
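As an illustration of the list and range partitioning styles mentioned in that note (the helper names below are invented; SQLFire and Geode express these choices declaratively, not as hand-written functions):

```java
// Hypothetical sketch of range- and list-based partition selection.
public class Partitioners {
    /** Range partitioning: orders with id < 1000 go to partition 0,
     *  id < 2000 to partition 1, everything else to partition 2. */
    public static int byRange(int orderId) {
        if (orderId < 1000) return 0;
        if (orderId < 2000) return 1;
        return 2;
    }

    /** List partitioning: route rows by an explicit list of column values. */
    public static int byList(String country) {
        switch (country) {
            case "US": case "CA": return 0;  // North America partition
            case "AR": case "BR": return 1;  // South America partition
            default:              return 2;  // everything else
        }
    }
}
```

The point in either style is that the application, not the system, decides which rows belong together, while the system still decides which server hosts each partition.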
#36: The region on the servers is broken up into buckets, and buckets are assigned to nodes. A key is hashed to a bucket.
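A minimal sketch of the key-to-bucket idea (the hash scheme here is an assumption for illustration; Geode's actual bucket hashing is an internal detail):

```java
public class BucketDemo {
    // Map a key to one of totalNumBuckets buckets.
    public static int bucketFor(Object key, int totalNumBuckets) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), totalNumBuckets);
    }

    public static void main(String[] args) {
        int buckets = 113;  // e.g. the region's configured total number of buckets
        // The same key always lands in the same bucket, so the node owning
        // that bucket can be located directly, without a broadcast.
        System.out.println("k1 -> bucket " + bucketFor("k1", buckets));
    }
}
```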
#37: Replication is synchronous and the key is locked. This is what gives consistency between the nodes.
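A toy sketch of that lock-then-replicate idea (hypothetical classes, not Geode code): the write takes a per-key lock and is applied to every replica before the put returns, so a read on any copy afterwards sees the new value.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of synchronous replication with a per-key lock.
public class SyncReplicatedMap {
    private final Map<String, String> primary = new ConcurrentHashMap<>();
    private final List<Map<String, String>> replicas;

    public SyncReplicatedMap(List<Map<String, String>> replicas) {
        this.replicas = replicas;
    }

    public void put(String key, String value) {
        // Per-key lock (an interned string is a simple stand-in here;
        // a real system would use a proper distributed lock on the key).
        synchronized (key.intern()) {
            primary.put(key, value);
            for (Map<String, String> r : replicas) {
                r.put(key, value);   // synchronous copy before put returns
            }
        }
    }

    public String get(String key) { return primary.get(key); }
}
```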
#56: GemFire is network-bound in this run, or it would be even faster; Cassandra is not.
Latency has similar ratios, with Cassandra having 3.5-4.5X higher latency.
This run was for 16 servers with 8 client nodes running 400 clients against 0.5 TB of data (1 KB object size) with redundancy, for a total of 1 TB of data.
GemFire was also 2X faster for the load phase, with both GemFire and Cassandra doing batched inserts (500 objects per insert).
#57: Reads are completely network-bound in these runs due to the 1 Gbit network used.