Oracle Clusterware
Oracle Clusterware is software that enables servers to operate together as if they were one server. Each
server looks like a standalone server; however, each server runs additional processes that communicate
with one another, so the separate servers appear as a single server to applications and end users.
Starting with version 10g Release 1, Oracle introduced its own portable cluster software, Cluster Ready
Services. The product was renamed Oracle Clusterware in 10g Release 2, and since 11g Release 2 it has been
part of the Oracle Grid Infrastructure software.
Scalability: multiple nodes allow a cluster database to scale beyond the capacity of a single-node database.
Availability: if any node fails, the other nodes in the cluster take over, so clients can continue working with little or no disruption.
Manageability: more than one database can be managed by Oracle Clusterware.
Ability to monitor processes and restart them if they stop
Eliminate unplanned downtime due to hardware or software malfunctions
Reduce or eliminate planned downtime for software maintenance
Oracle Cluster Registry (OCR): stores and manages configuration information about the cluster resources
managed by Oracle Clusterware, such as Oracle RAC databases, database instances, listeners, VIPs,
servers, and applications.
Oracle Local Registry (OLR, 11gR2): similar to the OCR, introduced in 11gR2, but it stores information only about the
local node. It is not shared with the other nodes of the cluster and is used by OHASD while starting or joining the cluster (see the ocrcheck example below).
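A quick way to inspect both registries is the ocrcheck utility shipped with the Grid Infrastructure home (a sketch only; locations, sizes, and the need for root privileges depend on the installation):

    $ ocrcheck                   # reports OCR version, used/free space, location(s) and integrity
    $ ocrcheck -local            # the same report for the node-local OLR
    $ cat /etc/oracle/ocr.loc    # on Linux, the file that records where the OCR is stored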
RAC Components
Cluster Synchronization Services (CSS): Manages the cluster configuration by controlling which nodes
are members of the cluster and by notifying members when a node joins or leaves the cluster.
Cluster Ready Services (CRS): The primary program for managing high availability operations within a
cluster. Anything that the crs process manages is known as a cluster resource, which could be a
database, an instance, a service, a listener, a virtual IP (VIP) address, an application process, and so on
(a crsctl example after this list shows how to view these resources and their restart policy).
The crs process manages cluster resources based on the resource's configuration information that is
stored in the OCR. This includes start, stop, monitor and failover operations. The crs process generates
events when a resource status changes. When you have installed Oracle RAC, crs monitors the Oracle
instance, Listener, and so on, and automatically restarts these components when a failure occurs. By
default, the crs process makes five attempts to restart a resource and then does not make further
restart attempts if the resource does not restart.
Event Management (EVM): A background process that publishes events that crs creates.
Oracle Notification Service (ONS): Allows Clusterware events to be propagated to the nodes in the
cluster, to middle-tier application servers, and to clients. EVMD publishes events through ONS.
RACG: Extends clusterware to support Oracle-specific requirements and complex resources. Runs server
callout scripts when FAN events occur.
Process Monitor Daemon (OPROCD): This process is locked in memory to monitor the cluster and
provide I/O fencing. OPROCD performs its check and then sleeps; if the wake-up takes longer than the
expected time, OPROCD resets the processor and reboots the node. An OPROCD failure results in
Oracle Clusterware restarting the node. OPROCD uses the hangcheck timer on Linux platforms.
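As an illustration of how crsd-managed resources and their restart policy can be inspected with crsctl (the resource name ora.orcl.db is a made-up example; real names depend on the installation):

    $ crsctl stat res -t                      # tabular status of all registered cluster resources
    $ crsctl stat res ora.orcl.db -p | grep RESTART_ATTEMPTS
                                              # the attribute that controls how many local restart attempts are made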
The list in this section describes the processes that comprise CRS. The list includes components that are processes on
Linux and UNIX operating systems, or services on Windows.
Cluster Ready Services (CRS): The primary program for managing high availability operations in a
cluster. The CRS daemon (crsd) manages cluster resources based on the configuration information that
is stored in OCR for each resource. This includes start, stop, monitor, and failover operations. The crsd
process generates events when the status of a resource changes. When you have Oracle RAC installed,
the crsd process monitors the Oracle database instance, listener, and so on, and automatically restarts
these components when a failure occurs.
Cluster Synchronization Services (CSS): Manages the cluster configuration by controlling which nodes
are members of the cluster and by notifying members when a node joins or leaves the cluster.
The cssdagent process monitors the cluster and provides I/O fencing. This service formerly was
provided by Oracle Process Monitor Daemon (oprocd), also known as OraFenceService on Windows. A
cssdagent failure may result in Oracle Clusterware restarting the node.
Oracle ASM: Provides disk management for Oracle Clusterware and Oracle Database.
Cluster Time Synchronization Service (CTSS): Provides time management in a cluster for Oracle
Clusterware.
Event Management (EVM): A background process that publishes events that Oracle Clusterware
creates.
Oracle Notification Service (ONS): A publish and subscribe service for communicating Fast Application
Notification (FAN) events.
Oracle Agent (oraagent): Extends clusterware to support Oracle-specific requirements and complex
resources. This process runs server callout scripts when FAN events occur. This process was known as
RACG in Oracle Clusterware 11g release 1 (11.1).
Oracle Root Agent (orarootagent): A specialized oraagent process that helps crsd manage resources
owned by root, such as the network, and the Grid virtual IP address.
This section describes the processes that comprise the Oracle High Availability Services stack. The list
includes components that are processes on Linux and UNIX operating systems, or services on Windows.
Cluster Logger Service (ologgerd): Receives information from all the nodes in the cluster and persists it in
a Cluster Health Monitor (CHM) repository-based database. This service runs on only two nodes in a cluster.
System Monitor Service (osysmond): The monitoring and operating system metric collection service
that sends the data to the cluster logger service. This service runs on every node in a cluster.
Grid Plug and Play (GPNPD): Provides access to the Grid Plug and Play profile, and coordinates updates
to the profile among the nodes of the cluster to ensure that all of the nodes have the most recent
profile.
Grid Interprocess Communication (GIPC): A support daemon that enables Redundant Interconnect
Usage.
Multicast Domain Name Service (mDNS): Used by Grid Plug and Play to locate profiles in the cluster,
as well as by GNS to perform name resolution. The mDNS process is a background process on Linux and
UNIX and a service on Windows.
Oracle Grid Naming Service (GNS): Handles requests sent by external DNS servers, performing name
resolution for names defined by the cluster.
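Both stacks register their daemons as local resources, so a quick health check and inventory can look like this (illustrative commands, run from the Grid Infrastructure home):

    $ crsctl check crs          # verifies Oracle High Availability Services, CRS, CSS and EVM in one shot
    $ crsctl stat res -t -init  # lists the stack resources, e.g. ora.crsd, ora.cssd, ora.evmd,
                                # ora.ctssd, ora.asm, ora.gpnpd, ora.mdnsd, ora.gipcd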
Cluster Interconnects
The cluster interconnect is the communication path used by the cluster for the synchronization of resources, and in
some cases it is also used to transfer data from one instance to another. Typically, the interconnect is a network
connection that is dedicated to the server nodes of the cluster (and is therefore sometimes referred to as the private
interconnect).
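Which interfaces are registered as public and as private interconnect can be verified with oifcfg (the interface names and subnets shown are placeholders):

    $ oifcfg getif
    eth0  192.168.10.0  global  public
    eth1  10.0.0.0      global  cluster_interconnect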
LMS
LMD
LMON
LCK0
DIAG
Global Cache Service Process (LMS)
LMS - Lock Manager Server process is used in Cache Fusion. It enables consistent copies of blocks to be
transferred from a holding instance's buffer cache to a requesting instance's buffer cache without a disk
write, under certain conditions.
It rolls back any uncommitted transactions for any blocks that are being requested for a consistent read by
the remote instance.
Global Enqueue Service Daemon (LMD)
LMD - Lock Manager Daemon process manages enqueue service requests for the Global Enqueue Service (GES).
It also handles deadlock detection and remote resource requests.
Global Enqueue Service Monitor (LMON)
The LMON process monitors the entire cluster to manage global enqueues and resources, and it handles the
recovery (reconfiguration) of global resources when an instance joins or leaves the cluster.
Instance Enqueue Process (LCK0)
The LCK0 process manages non-Cache Fusion resource requests such as library and row cache requests.
Diagnosability Daemon (DIAG)
This background process monitors the health of the instance and captures diagnostic data about process
failures within instances. The operation of this daemon is automated and updates an alert log file to record
the activity that it performs.
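These background processes are visible at the operating system level on each RAC node, for example (the instance name orcl1 is a placeholder):

    $ ps -ef | egrep 'ora_(lms|lmd|lmon|lck|diag)[0-9]*_orcl1'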
Clusterware and heartbeat mechanism
The cluster needs to know which nodes are members at all times. Oracle Clusterware uses two (02) types of heartbeats:
1. Network heartbeat: each node sends a heartbeat over the private interconnect at a regular interval; a node is
evicted if its network heartbeat is not received within the misscount timeout.
2. Disk heartbeat: each node of the cluster writes a disk heartbeat to the voting disk every second; a node is
evicted from the cluster if its heartbeat is not updated within the I/O (misscount/disktimeout) timeout (see the crsctl example below).
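The heartbeat timeouts and the voting disk configuration can be checked with crsctl, for example:

    $ crsctl get css misscount     # network heartbeat timeout in seconds (typically 30)
    $ crsctl get css disktimeout   # voting disk I/O timeout in seconds (typically 200)
    $ crsctl query css votedisk    # lists the configured voting disks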
This mechanism was changed in version 11.2.0.2 (the first 11g Release 2 patch set). After deciding which
node to evict, the Clusterware:
- attempts to shut down all Oracle resources/processes on the server (especially processes generating
I/O)
- stops itself (the Clusterware stack) on the node
- afterwards the Oracle High Availability Service Daemon (OHASD) will try to start the Cluster Ready
Services (CRS) stack again; once the cluster interconnect is back online, all relevant cluster resources
on that node will start automatically
- kills the node only if stopping the resources or the processes generating I/O is not possible (hanging in kernel
mode, I/O path, etc.)
Generally, Oracle Clusterware uses two rules to choose which nodes should leave the cluster in order to preserve
cluster integrity:
- In configurations with two nodes, the node with the lowest node number survives (the first node that joined the
cluster); the other one is asked to leave the cluster (see the olsnodes example below).
- With more cluster nodes, the Clusterware tries to keep the largest sub-cluster running.
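The node numbers used to decide which node survives a two-node tie can be listed with olsnodes (the node names shown are placeholders):

    $ olsnodes -n -s
    rac1    1    Active
    rac2    2    Active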
When does a node reboot?
-
Split-Brain scenario
The term "Split-Brain" is often used to describe the scenario when two or more co-operating processes in a
distributed system, typically a high availability cluster, lose connectivity with one another but then continue
to operate independently of each other, including acquiring logical or physical resources, under the
incorrect assumption that the other process(es) are no longer operational or using the said resources.
These notifications are advantageous because they prevent applications from waiting for TCP/IP timeouts when a
node fails, from trying to connect to a database service that is currently down, and from processing data received
from a failed node.
Applications can be notified using server-side callouts (a minimal callout sketch follows), Fast Connection Failover (FCF), or the ONS API.
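A minimal server-side callout sketch, assuming the documented callout directory Grid_home/racg/usrco (the script name and log path are invented for illustration); Oracle Clusterware runs every executable placed in that directory and passes the FAN event payload as arguments:

    #!/bin/sh
    # Grid_home/racg/usrco/log_fan_event.sh  (hypothetical script)
    # Append each FAN event payload, with a timestamp, to a local log file.
    echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >> /tmp/fan_events.log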
Why Use Virtual IP?
The goal is application availability.
When a node fails, the VIP associated with it is automatically failed over to one of the other nodes. When this
occurs, the following happens:
- the new node re-ARPs to the network, advertising the new MAC address for the VIP; directly connected clients
usually see errors on their connections to the old address
- subsequent packets sent to the VIP go to the new node, which replies with error (RST) packets, so clients get an
error immediately instead of waiting for a TCP timeout
Without a VIP, clients connected to a node that died will often wait for the TCP/IP timeout period (which
can be up to 10 minutes) before getting an error. As a result, you do not have a really good high availability
solution without using VIPs.
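The VIP configured for a node and the node it is currently running on can be checked with srvctl (the node name rac1 is a placeholder):

    $ srvctl config vip -n rac1    # VIP name, IP address and netmask configured for the node
    $ srvctl status vip -n rac1    # whether the VIP is enabled and where it is currently running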
Connecting with Public IP Scenario