Distributed Configuration
The distributed configuration consists of 3 files under the config/ directory:
Main topics:
- Replication
- Asynchronous replication mode
- Return distributed configuration at run-time
- Load Balancing
orientdb-server-config.xml
To enable and configure the clustering between nodes, add and enable the OHazelcastPlugin plugin. It is configured as a Server handler. The default configuration is reported below.
File orientdb-server-config.xml:
<handler class="com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin">
<parameters>
<!-- NODE-NAME. IF NOT SET IS AUTO GENERATED THE FIRST TIME THE SERVER RUN -->
<!-- <parameter name="nodeName" value="europe1" /> -->
<parameter name="enabled" value="true" />
<parameter name="configuration.db.default"
value="${ORIENTDB_HOME}/config/default-distributed-db-config.json" />
<parameter name="configuration.hazelcast"
value="${ORIENTDB_HOME}/config/hazelcast.xml" />
</parameters>
</handler>
Where:
Parameter | Description |
---|---|
enabled | To enable or disable the plugin: true to enable it, false to disable it. By default is true |
nodeName | An optional alias identifying the current node within the cluster. When omitted, a default value is generated as node |
configuration.db.default | Path of default distributed database configuration. By default is ${ORIENTDB_HOME}/config/default-distributed-db-config.json |
configuration.hazelcast | Path of Hazelcast configuration file, default is ${ORIENTDB_HOME}/config/hazelcast.xml |
default-distributed-db-config.json
This is the JSON file containing the default configuration for distributed databases. The first time a database run in distributed version this file is copied in the database's folder with name distributed-config.json
. Every time the cluster shape changes the database specific file is changed. To restore distributed database settings, remove the file distributed-config.json
from the database folder, and the default-distributed-db-config.json
file will be used.
Default default-distributed-db-config.json file content:
{
"autoDeploy": true,
"hotAlignment": false,
"executionMode": "undefined",
"readQuorum": 1,
"writeQuorum": 2,
"failureAvailableNodesLessQuorum": false,
"readYourWrites": true,
"servers": {
"*": "master"
},
"clusters": {
"internal": {
},
"index": {
},
"*": {
"servers" : [ "<NEW_NODE>" ]
}
}
}
Where:
Parameter | Description | Default value |
---|---|---|
autoDeploy | Whether to deploy the database to any joining node that does not have it. It can be true or false |
true |
hotAlignment | Whether the synchronization queue is left or not for a node leaving the cluster for hot alignment when the node joins the cluster again. It can be true or false |
false |
executionMode | It can be undefined to let to the client to decide per call execution between synchronous (default) or asynchronous. synchronous forces synchronous mode, and asynchronous forces asynchronous mode |
undefined |
readQuorum | On "read" operation (record read, query and traverse) this is the number of responses to be coherent before sending the response to the client. Set to 1 if you don't want this check at read time | 1 |
writeQuorum | On "write" operation (any write on database) this is the number of responses to be coherent before sending the response to the client. Set to 1 if you don't want this check at write time. Suggested value is N/2+1 where N is the number of replicas. In this way the quorum is reached only if the majority of nodes are coherent | 2 |
failureAvailableNodesLessQuorum | Whether to return error when the available nodes are less then quorum. Can be true or false |
false |
readYourWrites | Whether the write quorum is satisfied only when also the local node responded. This assures current the node can read its writes. Disable it to improve replication performance if such consistency is not important. Can be true or false |
true |
servers | (Since v2.1) Optional, contains the map of server roles in the format server-name : role . * means any server. Available roles are "MASTER" (default) and "REPLICA". For more information look at Server roles |
- |
clusters | if the object containing the clusters' configuration as map cluster-name : cluster-configuration . * means all the clusters and is the cluster's default configuration |
- |
The cluster configuration inherits database configuration, so if you declare "writeQuorum" at database level, all the clusters will inherit that setting unless they define your own. Settings can be:
Parameter | Description | Default value |
---|---|---|
readQuorum | On "read" operation (record read, query and traverse) is the number of responses to be coherent before to send the response to the client. Set to 1 if you don't want this check at read time | 1 |
writeQuorum | On "write" operation (any write on database) is the number of responses to be coherent before to send the response to the client. Set to 1 if you don't want this check at write time. Suggested value is N/2+1 where N is the number of replicas. In this way the quorum is reached only if the majority of nodes are coherent | 2 |
failureAvailableNodesLessQuorum | Decide to return error when the available nodes are less then quorum. Can be true or false |
false |
readYourWrites | The write quorum is satisfied only when also the local node responded. This assure current the node can read its writes. Disable it to improve replication performance if such consistency is not important. Can be true or false |
true |
servers | Is the array of servers where to store the records of cluster | empty for internal and index clusters and [ "<NEW_NODE>" ] for cluster * representing any cluster |
"<NEW_NODE>"
is a special tag that put any new joining node name in the array.
Default configuration
In the default configuration all the record clusters are replicated but internal
, index
, because all the changes remain locally to each node (indexing is per node). Every node that joins the cluster shares all the rest of the clusters ("*" settings). Since "readQuorum" is 1 all the reads are executed on the first available node where the local node is preferred if own the requested record. "writeQuorum" to 2 means that all the changes are in at least 2 nodes. If available nodes are less then 2, no error is given because "failureAvailableNodesLessQuorum" is false.
100% asynchronous writes
By default writeQuorum is 2. This means that it waits and checks the answer from at least 2 nodes before to send the ACK to the client. If you've more then 2 nodes configured, then starting from the 3rd node the response will be managed asynchronously. You could also set this to 1 to have all the writes asynchronous.
hazelcast.xml
A OrientDB cluster is composed by two or more servers that are the nodes of the cluster. All the server nodes that want to be part of the same cluster must to define the same Cluster Group. By default "orientdb" is the group name. Look at the default config/hazelcast.xml configuration file reported below:
<?xml version="1.0" encoding="UTF-8"?>
<hazelcast xsi:schemaLocation="http://www.hazelcast.com/schema/config hazelcast-config-3.0.xsd"
xmlns="http://www.hazelcast.com/schema/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<group>
<name>orientdb</name>
<password>orientdb</password>
</group>
<network>
<port auto-increment="true">2434</port>
<join>
<multicast enabled="true">
<multicast-group>235.1.1.1</multicast-group>
<multicast-port>2434</multicast-port>
</multicast>
</join>
</network>
<executor-service>
<pool-size>16</pool-size>
</executor-service>
</hazelcast>
NOTE: Change the name and password of the group to prevent external nodes from joining it!
Network configuration
Automatic discovery in LAN using Multicast
OrientDB by default uses TCP Multicast to discover nodes. This is contained in config/hazelcast.xml file under the network tag. This is the default configuration:
<hazelcast>
...
<network>
<port auto-increment="true">2434</port>
<join>
<multicast enabled="true">
<multicast-group>235.1.1.1</multicast-group>
<multicast-port>2434</multicast-port>
</multicast>
</join>
</network>
...
</hazelcast>
Manual IP
When Multicast is disabled or you prefer to assign Hostnames/IP-addresses manually use the TCP/IP tag in configuration. Pay attention to disable the multicast:
<hazelcast>
...
<network>
<port auto-increment="true">2434</port>
<join>
<multicast enabled="false">
<multicast-group>235.1.1.1</multicast-group>
<multicast-port>2434</multicast-port>
</multicast>
<tcp-ip enabled="true">
<member>europe0:2434</member>
<member>europe1:2434</member>
<member>usa0:2434</member>
<member>asia0:2434</member>
<member>192.168.1.0-7:2434</member>
</tcp-ip>
</join>
</network>
...
</hazelcast>
For more information look at: Hazelcast Config TCP/IP.
Cloud support
Since multicast is disabled on most of the Cloud stacks, you have to change the config/hazelcast.xml configuration file based on the Cloud used.
Amazon EC2
OrientDB supports natively Amazon EC2 through the Hazelcast's Amazon discovery plugin. In order to use it include also the hazelcast-cloud.jar library under the lib/ directory.
<hazelcast>
...
<join>
<multicast enabled="false">
<multicast-group>235.1.1.1</multicast-group>
<multicast-port>2434</multicast-port>
</multicast>
<aws enabled="true">
<access-key>my-access-key</access-key>
<secret-key>my-secret-key</secret-key>
<region>us-west-1</region> <!-- optional, default is us-east-1 -->
<host-header>ec2.amazonaws.com</host-header> <!-- optional, default is ec2.amazonaws.com. If set region
shouldn't be set as it will override this property -->
<security-group-name>hazelcast-sg</security-group-name> <!-- optional -->
<tag-key>type</tag-key> <!-- optional -->
<tag-value>hz-nodes</tag-value> <!-- optional -->
</aws>
</join>
...
</hazelcast>
For more information look at Hazelcast Config Amazon EC2 Auto Discovery.
Other Cloud providers
Uses manual IP like explained in Manual IP.
Asynchronous replication mode
In order to reduce the latency in WAN, the suggested configuration is to set executionMode
to "asynchronous". In asynchronous mode any operation is executed on local node and then replicated. In this mode the client doesn't wait for the quorum across all the servers, but receives the response immediately after the local node answer. Example:
{
"autoDeploy": true,
"hotAlignment": false,
"executionMode": "asynchronous",
"readQuorum": 1,
"writeQuorum": 2,
"failureAvailableNodesLessQuorum": false,
"readYourWrites": true,
"servers": {
"*": "master"
},
"clusters": {
"internal": {
},
"index": {
},
"*": {
"servers" : [ "<NEW_NODE>" ]
}
}
}
Starting from v2.1.6 is possible to catch events of command during asynchronous replication, thanks to the following method of OCommandSQL:
onAsyncReplicationOk()
, to catch the event when the asynchronous replication succeedonAsyncReplicationError()
, to catch the event when the asynchronous replication returns error
Example retrying up to 3 times in case of concurrent modification exception on creation of edges:
g.command( new OCommandSQL("create edge Own from (select from User) to (select from Post)")
.onAsyncReplicationError(new OAsyncReplicationError() {
@Override
public ACTION onAsyncReplicationError(Throwable iException, int iRetry) {
System.err.println("Error, retrying...");
return iException instanceof ONeedRetryException && iRetry<=3 ? ACTION.RETRY : ACTION.IGNORE;
}
})
.onAsyncReplicationError(new OAsyncReplicationOk() {
System.out.println("OK");
}
).execute();
Load Balancing
(Since v2.2) OrientDB allows to do load balancing when you have multiple servers connected in cluster. Below are the available connection strategies:
STICKY
, the default, where the client remains connected to a server until the close of databaseROUND_ROBIN_CONNECT
, at each connect, the client connects to a different server between the available onesROUND_ROBIN_REQUEST
, at each request, the client connects to a different server between the available ones. Pay attention on using this strategy if you're looking for strong consistency. In facts, in case the writeQuorum is minor of the total nodes available, a client could have executed an operation against another server and current operation cannot see updates because wasn't propagated yet.
Once a client is connected to any server node, it retrieves the list of available server nodes. In case the connected server becomes unreachable (crash, network problem, etc.), the client automatically connects to the next available one.
To setup the strategy using the Java Document API:
final ODatabaseDocumentTx db = new ODatabaseDocumentTx("remote:localhost/demo");
db.setProperty(OStorageRemote.PARAM_CONNECTION_STRATEGY, OStorageRemote.CONNECTION_STRATEGY.ROUND_ROBIN_CONNECT);
db.open(user, password);
To setup the strategy using the Java Graph API:
final OrientGraphFactory factory = new OrientGraphFactory("remote:localhost/demo");
factory.setConnectionStrategy(OStorageRemote.CONNECTION_STRATEGY.ROUND_ROBIN_CONNECT);
OrientGraphNoTx graph = factory.getNoTx();
Use multiple addresses
If the server addresses are known, it's good practice to connect the clients to a set of URLs, instead of just one. You can separate hosts/addresses by using a semicolon (;). OrientDB client will try to connect to the addresses in order. Example: remote:server1:2424;server2:8888;server3
.
Use the DNS
Before v2.2, the simplest and most powerful way to achieve load balancing seems to use some hidden (to some) properties of DNS. The trick is to create a TXT record listing the servers.
The format is:
v=opf<version> (s=<hostname[:<port>]> )*
Example of TXT record for domain dbservers.mydomain.com:
v=opf1 s=192.168.0.101:2424 s=192.168.0.133:2424
In this way if you open a database against the URL remote:dbservers.mydomain.com/demo
the OrientDB client library will try to connect to the address 192.168.0.101 port 2424. If the connection fails, then the next address 192.168.0.133: port 2424 is tried.
To enable this feature in Java Client driver set network.binary.loadBalancing.enabled=true
:
java ... -Dnetwork.binary.loadBalancing.enabled=true
or via Java code:
OGlobalConfiguration.NETWORK_BINARY_DNS_LOADBALANCING_ENABLED.setValue(true);
Troubleshooting
Users reported that Hazelcast Health Monitoring could cause problem with a JVM kill (OrientDB uses Hazelcast to manage replication between nodes). By default this setting is OFF, so if you are experiencing this kind of problem assure this is set: hazelcast.health.monitoring.level=OFF
History
v1.7
Simplified configuration by moving. Removed some flags (replication:boolean, now it’s deducted by the presence of “servers” field) and settings now are global (autoDeploy, hotAlignment, offlineMsgQueueSize, readQuorum, writeQuorum, failureAvailableNodesLessQuorum, readYourWrites), but you can overwrite them per-cluster.
For more information look at News in 1.7.
v2.2
Introduced Load balancing at client level. For more information look at load balancing.