

Migrate an Oracle RAC database to new servers with zero downtime

Most organizations can't afford any downtime on their Oracle systems. Here's how to move RAC databases to new server nodes and still stay up and running in an Oracle cluster.

I work with an Oracle RAC database running on physical servers that have reached their end of life. We needed to replace the hardware with new servers, and we wanted to move the database with zero downtime for our users, which can, in fact, be done.

To show how, I'm going to walk you through an example where I replace existing server nodes 1 and 2 with new nodes that will support new instances 3 and 4 in the Oracle Real Application Clusters (RAC) environment. At the end of this task, instances 1 and 2 will be removed.

The first step is to set up the new nodes with all the network bandwidth and shared disk space needed to support and access the Oracle RAC database. Next, the required administrative users and groups are created on the new nodes, along with Secure Shell (SSH) keys so that we can use passwordless SSH to administer the servers.
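
On Linux, that setup might look something like the following sketch. The group names, user name and numeric IDs shown here are only examples; whatever is used must match the existing cluster nodes.

# as root on each new node -- groups, users and IDs must match the existing nodes
groupadd -g 54321 oinstall
groupadd -g 54322 dba
useradd -u 54321 -g oinstall -G dba oracle

# as the oracle user, generate a key and copy it to every node for passwordless SSH
ssh-keygen -t rsa
ssh-copy-id oracle@new_node2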

Once the servers are fully set up, I run Oracle's Cluster Verification Utility (CVU) from one of the existing nodes to verify that we won't have issues, such as a bad cluster interconnect configuration or the wrong operating system libraries.

cluvfy stage -pre nodeadd -n new_node1,new_node2 -verbose

If the CVU finds any issues, we need to address them before proceeding. When everything checks out, I add the new nodes to the cluster. To do so, I enter the following from an existing node:

cd $GRID_HOME/addnode

./addnode.sh -silent "CLUSTER_NEW_NODES={new_node1,new_node2}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={new_node1-vip,new_node2-vip}"

Once the root scripts have completed and the new nodes are part of the cluster, I extend the database software to them. Again, I enter the following from an existing node:

cd $ORACLE_HOME/addnode

./addnode.sh -silent "CLUSTER_NEW_NODES={new_node1,new_node2}"

However, before I can start any Oracle RAC database instances on the new nodes, I need to add redo log threads for them. I can do so in SQL*Plus by issuing commands such as:

alter database add logfile thread 3 group 30 '/directory/redo30a.log' size 5g;
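
The remaining groups follow the same pattern. A minimal sketch, assuming three 5 GB groups per thread and illustrative group numbers and file names:

alter database add logfile thread 3 group 31 '/directory/redo31a.log' size 5g;
alter database add logfile thread 3 group 32 '/directory/redo32a.log' size 5g;
alter database add logfile thread 4 group 40 '/directory/redo40a.log' size 5g;
alter database add logfile thread 4 group 41 '/directory/redo41a.log' size 5g;
alter database add logfile thread 4 group 42 '/directory/redo42a.log' size 5g;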

These statements create the online redo log groups for threads 3 and 4, one thread for each of the new nodes, with at least three groups per thread. Once all the online redo log groups are created, the new threads need to be enabled.

alter database enable thread 3;

alter database enable thread 4;

Next, I create undo tablespaces for the new nodes to store information that's needed to roll back future changes to the database.

create undo tablespace undotbs03 datafile '/directory/undotbs03.dbf' size 30g;

create undo tablespace undotbs04 datafile '/directory/undotbs04.dbf' size 30g;

Then, I need to add entries to our parameter file to support each of the new database instances. First, I enter:

alter system set thread=3 sid='orcl3' scope=spfile;

alter system set instance_number=3 sid='orcl3' scope=spfile;

alter system set undo_tablespace='UNDOTBS03' sid='orcl3' scope=spfile;

The commands above are repeated for the fourth instance. In each new node, I also have to make sure that the $ORACLE_HOME/dbs directory has soft links to the database's server parameter file (i.e., SPFILE) and password file on shared storage.
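
For the fourth instance, the same three settings are repeated with its own SID, and the soft links might look like the sketch below. The shared-storage paths and file names are illustrative only; use whatever the database's SPFILE and password file are actually called.

alter system set thread=4 sid='orcl4' scope=spfile;
alter system set instance_number=4 sid='orcl4' scope=spfile;
alter system set undo_tablespace='UNDOTBS04' sid='orcl4' scope=spfile;

# as the database software owner on new_node1, where the instance is orcl3; repeat on new_node2 for orcl4
ln -s /shared/oracle/dbs/spfileorcl.ora $ORACLE_HOME/dbs/spfileorcl3.ora
ln -s /shared/oracle/dbs/orapworcl $ORACLE_HOME/dbs/orapworcl3.ora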

Once all of that is done, I'm now ready to start up the new instances on the new nodes.

srvctl add instance -db orcl -instance orcl3 -node new_node1

srvctl start instance -db orcl -instance orcl3
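
The same two commands are repeated for the fourth instance, assuming it is named orcl4 and runs on the second new node:

srvctl add instance -db orcl -instance orcl4 -node new_node2

srvctl start instance -db orcl -instance orcl4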

With both new instances added and started, we now have the Oracle RAC database running on both the old and new hardware.

I always prefer to use Oracle's services facility for application connectivity and workload management. The next step is to modify our existing service to run on the new nodes, but not the old ones.

srvctl modify service -db orcl -service service_name -modifyconfig -preferred "orcl3,orcl4"

srvctl start service -db orcl -service service_name -node new_node1

srvctl start service -db orcl -service service_name -node new_node2
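
To confirm which instances are now offering the service, a quick status check can be run:

srvctl status service -db orcl -service service_name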

Application connections are now made to the new nodes. I then stop the service from running on the old nodes.

srvctl stop service -db orcl -service service_name -node old_node1

srvctl stop service -db orcl -service service_name -node old_node2

However, stopping the service doesn't affect existing connections, so I must wait for all the users to disconnect from the old nodes in their own time. Once there are no active connections to instances 1 and 2, I can shut them down and remove them.
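
Before shutting them down, a quick check for remaining sessions might look like this sketch, which uses the gv$session view and assumes the old instances are numbers 1 and 2:

select inst_id, count(*) from gv$session where username is not null and inst_id in (1, 2) group by inst_id;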

srvctl stop instance -db orcl -instance orcl1 -stopoption immediate

srvctl remove instance -db orcl -instance orcl1

I repeat the above for instance 2. Next, I update the Oracle inventory on each old node so that its database software home no longer lists the other cluster nodes, using that node's own name in CLUSTER_NODES. I enter these commands on each old node in turn:

cd $ORACLE_HOME/oui/bin

./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={old_node1}" -local

Next, we let the inventory directory on the new nodes know they are the only ones left in the cluster for hosting the database software. On one of the new nodes, I enter:

cd $ORACLE_HOME/oui/bin

./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={new_node1,new_node2}" LOCAL_NODE=this_nodes_name

With the database software inventory updated on the old servers, it's time to remove the Oracle Grid Infrastructure software from them as well. On the nodes to be removed, I enter the following:

cd $GRID_HOME/bin

./crsctl stop crs

cd $GRID_HOME/oui/bin

./runInstaller -updateNodeList ORACLE_HOME=$GRID_HOME "CLUSTER_NODES={old_node1}" CRS=TRUE -silent -local

Again, we let the inventory on the new nodes know they are the only ones left in the cluster for Oracle Grid Infrastructure. I enter this on one of the new nodes:

cd $GRID_HOME/oui/bin

./runInstaller -updateNodeList ORACLE_HOME=$GRID_HOME "CLUSTER_NODES={new_node1,new_node2}" CRS=TRUE -silent

That's all there is to it. We were able to migrate an Oracle RAC database to new servers with zero downtime. We may have had to wait a while for all the existing user sessions to disconnect from the old instances, but our end users never noticed a thing.
