The Exa-guard project

Ok, I cannot name the customer (approval pending), but a very exciting project just came to an end.

You might be aware that I currently focus on Exadata and high availability, and sometimes nice projects come your way.

The title is an invented one, as the project was a combination of looooots of Exadata work and Data Guard. And admit it, it just sounds cool, doesn’t it? 😉

The beginning

The initial question was: “Can we expand our X3-2 non-prod Exadata systems so we can put the disk groups in high redundancy to practice rolling cell patching?”
The answer is always: YES … but …

This customer had an X5 elastic rack (4x X5 compute nodes, 3x X5 storage cells) running bare metal as the production system, an X3-2 elastic rack (4 compute nodes: 2x X3 and 2x X5, plus 3x X3 storage cells) and, as DR, an X3-2 half rack.
Yup, this customer has all their databases on Exadata. It was a strategic choice, and it turned out to be a very good one.

You might remember this blogpost about the EOL (end of life) dates of systems. Agreed, I need to update it, but that was the trigger to start this project.
Thing is, the X3-2s were nearing their end-of-life date, so the question was put back to the customer: “Are you going to do a big project, which puts you in a sustainable position for the future, or just add storage for about the same price (less effort) but postpone the big jump?”
Well, the answer was: we go for it and take the big jump.

The new environment

So, the big jump huh. Ok, good. That means buying a new X7 system. First remark: “yes, but the X5s are still good, right?” And yes, they are. So a new requirement was born: reuse as much as possible of the X5 components for non-prod and non-critical DR. Oh yes… did I mention already that we had a time limit? The datacenter will be taken down and a new one is being built as we speak. An extra requirement was added: “just for your interest, but we will combine this migration with the datacenter move, that should be doable, right?”. As always, the answer is… yes. So, the final picture.

A new X7-2 quarter rack (QR) for production in phase 1. Yes people, we did the prod move first… The X5-2 elastic rack was reduced to a QR, moved to the new datacenter (I will call it the DR datacenter) and reinstalled. The two freed-up X5 compute nodes moved into the X7-2 QR, which turned it into an elastic rack. These two X5 nodes will be used to protect the important non-prod databases with Data Guard.
Then two compute nodes needed to move from the “old” datacenter to the DR datacenter. These were placed in the X5 QR to turn it into an elastic rack again, and then the physical moves were done.

You see, it is a little complex, but splitting this into phases gives maximum availability and maximum reuse of the components in new roles; read: cost savings.

Good, this was the drawing board; let’s face the execution.

The runbook

Sorry people, my guide on this journey was not APEX, but a simple Excel sheet. We had around 50 production container databases to migrate and around 100 non-prod container databases.

Doing all this by hand is nearly impossible, but the cool thing is… you can script Data Guard! So that is what has been done. That way it was easy to set up all the Data Guard configurations in batches. The production databases were first in line. So a Data Guard setup was created from the old X5 nodes to the new X7 nodes, while the existing Data Guard towards the old datacenter with the X3s was kept in place. So: one primary, two standbys. After that had been done in a controlled way, everything was switched over to the new system and we kept it running and monitored for about a week. No major issues were found. After that, we could break the production Data Guard to the X5 system on the same site, and the X5 could be powered off and shipped to the new datacenter. Nodes 3 and 4 stayed behind, were moved into the X7 system and were already prepared to become the standby nodes for the non-production systems (which run in the other datacenter).
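To give an idea of what “scripting Data Guard” can look like, here is a minimal sketch only (made-up file names, TNS aliases and db_unique_names, not the actual project scripts): loop over a batch file exported from the runbook and let the broker add the new standby for every database.

#!/bin/bash
# Minimal sketch: add the new standby to the existing broker configuration
# for every database listed in a batch file exported from the runbook.
# dbs_batch01.txt (hypothetical) holds lines like:
#   <primary_tns_alias>,<standby_db_unique_name>,<standby_tns_alias>
# SYS_PWD is assumed to be set in the environment.
while IFS=, read -r PRIM_TNS STBY_UN STBY_TNS; do
  dgmgrl -silent sys/"${SYS_PWD}"@"${PRIM_TNS}" <<EOF > "/tmp/dg_add_${STBY_UN}.log" 2>&1
ADD DATABASE ${STBY_UN} AS CONNECT IDENTIFIER IS ${STBY_TNS} MAINTAINED AS PHYSICAL;
ENABLE DATABASE ${STBY_UN};
SHOW CONFIGURATION;
EOF
done < dbs_batch01.txt

Run a batch, check the logs, move on to the next batch; that is the whole trick.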

The X5 QR was reinstalled with OVM on Exadata, a number of clusters were created to meet the customer requirements, and Data Guard was set up between the primary datacenter (X7) and the new datacenter (X5). After a stabilisation phase with the prod databases replicating to two sites, the final step was to disable the Data Guard towards the old datacenter. This freed up the X5 nodes inside the X3-2 elastic config, and those nodes were shipped to the X5 in the new datacenter.
That X5 had in the meantime received a storage expansion with 3 X7 cells to be able to host all non-prod databases. So another extension was done (moving the X5 nodes from the old DC to the new one) and the final non-prod system looks like this: 4x X5 compute nodes, 3x X5 storage cells and 3x X7 storage cells. You can guess it, we did not mix generations of cells: each set of 3 cells hosts the disk groups for one cluster. Yup, only 2 clusters needed: prod standby and non-prod.
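To illustrate the “one set of cells per cluster” idea: purely a sketch (OEDA generated the real configuration, and the cell IPs, grid disk prefix, redundancy and versions below are made up), a cluster’s disk group is created only from the grid disks of its own three cells.

#!/bin/bash
# Purely illustrative: a disk group for one cluster, built only from the grid
# disks of its own three cells (made-up IPs, names and attribute values).
sqlplus -s / as sysasm <<'EOF'
CREATE DISKGROUP DATAC1 HIGH REDUNDANCY
  DISK 'o/192.168.10.11/DATAC1_CD_*',
       'o/192.168.10.12/DATAC1_CD_*',
       'o/192.168.10.13/DATAC1_CD_*'
  ATTRIBUTE 'compatible.asm'          = '12.2.0.1.0',
            'compatible.rdbms'        = '12.1.0.2.0',
            'cell.smart_scan_capable' = 'TRUE',
            'au_size'                 = '4M';
EOF

The other cluster gets its disk groups from the other set of three cells, so the generations never end up mixed inside one disk group.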

Cleaning out the old datacenter followed the same approach: generate a lot of config files, start a lot of batches and duplicate all the databases towards the new datacenter. After a few days of RMAN duplicates saturating the line between the datacenters, we ended up with all non-prod databases being replicated. For that one, we decided to do a big-bang approach on a Friday evening. So scripts were created to launch batches in the background to automate the switchovers, and after one hour (less than 60 seconds per database on average) all non-prod databases were running in the new datacenter on the virtualised Exadata. We left it running like this for two weeks, and in the meantime standby databases were created on the X5 nodes in the X7 rack on the primary site to protect the most important non-prod databases.
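The big-bang switchover batches looked, in spirit, something like this (again a simplified sketch with hypothetical names, not the real scripts): one background DGMGRL switchover per database, then wait for the whole batch.

#!/bin/bash
# Simplified sketch: fire one switchover per non-prod database in the background
# and wait for the whole batch. nonprod_dbs.txt (hypothetical) holds lines like:
#   <primary_tns_alias>,<target_standby_db_unique_name>
# SYS_PWD is assumed to be set in the environment.
switch_one () {
  local prim_tns=$1 target_un=$2
  dgmgrl -silent sys/"${SYS_PWD}"@"${prim_tns}" \
    "SWITCHOVER TO ${target_un}" > "/tmp/switchover_${target_un}.log" 2>&1
}

while IFS=, read -r PRIM_TNS TARGET_UN; do
  switch_one "${PRIM_TNS}" "${TARGET_UN}" &   # launch each switchover in the background
done < nonprod_dbs.txt
wait
echo "Switchover batch finished, check the logs in /tmp"

Running them in parallel is what keeps the total downtime to roughly an hour instead of a long serial evening.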
After two weeks, no issues, literally no issues had popped up, and the go/no-go meeting was short. The Data Guard to the X3-2 half rack was torn down and the X3-2 racks could be decommissioned.

Conclusion

Yes, it was a lot of work, a lot of meetings and a lot of follow-up, but no over-complicated steps were taken.

The simplicity of Data Guard in combination with the flexibility that Exadata offers made it possible to deliver this project on time. For me personally it was a great project that combined my (currently) two favourite technologies.
Does it sound impressive? Maybe… is that important? Not really. The customer did a datacenter migration with one hour of downtime on the non-prod systems and some minutes for the production system (to move it to the new hardware).
This resulted in a happy customer, and in the end, that is why we do this job, right?

As always, questions or remarks? Find me on Twitter: @vanpupi

