
The first performance related impressions of the new ODA X6-2M


New toys are always fun! When Oracle announced the “small” ODAs of the X6-2 generation, we were excited to test them. We were not the only ones, so it took a while before we got one, but in the first week of January it was playtime: an ODA X6-2M was delivered to our demo room and testing could begin.

Normally I would start a blog post with “how to install it”. That part is actually very simple and very well documented, so I skipped it here; if you want a post about it as well, just let me know.

The nice thing about the Database Appliance in the X6-2 generation is that it is now possible to run single-instance databases on Standard Edition. This is a good thing: one reason to consider it is that step-in costs can be reduced. Smaller companies get a database in a box that just works. Nice, isn’t it?

So how does it perform?
Well … first things first: SLOB, the wonderful tool by Kevin Closson (you can find him at http://kevinclosson.net ). SLOB stresses the storage so that you can find out how your system behaves. It is always one of the first things I run on a new system.

Marco Mischke (@dbamarco) was also playing with the X6, and he discovered an important performance difference between running your database on ASM and on ACFS. It has been classified as a bug and “fixed” in the latest ODA image. Guess which version I installed on the ODA? Right, the latest one. So we got in touch, and the first SLOB test looked good: it reached far higher numbers, so the problem appeared to be fixed.

But looking a bit further, I wanted to test on ASM as well.
You know what? I’ll just give you the results you’re looking for 🙂

OK, first: ACFS. Here we go.

ACFS IOPS Read

So with a limited set of workers we reach up to about 325,000 IOPS. Given that the system has 20 cores available, this comes down to 16,250 IOPS per core.
If we translate that into MB/s, we get this:

ACFS throughput MB

I left out the latencies here to keep the chart readable, but throughput peaks at about 2.5 GB/s. So here are the latencies over the tests:

ACFS read latencies.

I put the figures in Excel as well:

max read latency    2587.22 us   (2.59 ms)
max write latency   2094.74 us   (2.09 ms)

These are the maximum latencies during the test, occurring mainly at the end, at the highest worker counts. In my opinion, this is good.
If more details are needed, drop me a message and I will provide more information.
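For what it’s worth, the headline figures above are easy to cross-check; here is a quick sanity check in the shell, assuming the usual 8 KiB SLOB block size:

```shell
# Sanity-check the ACFS peak numbers (8 KiB blocks assumed).
iops=325000
cores=20
per_core=$(( iops / cores ))                                          # IOPS per core
gib_s=$(awk -v i="$iops" 'BEGIN { printf "%.2f", i * 8 / 1048576 }')  # 8 KiB blocks -> GiB/s
echo "${per_core} IOPS/core, ${gib_s} GiB/s peak"
```

325,000 × 8 KiB works out to roughly 2.48 GiB/s, which matches the ±2.5 GB/s peak in the chart.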

Let’s move on to ASM: exactly the same database, parameters, etc. I love 12c! You can move the datafiles online, so that’s how it was done.
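
For reference, the online move is a single statement per datafile; a sketch with a hypothetical file name and the +DATA disk group as target:

```sql
-- Move a datafile from ACFS to ASM online (12c); the path is hypothetical.
ALTER DATABASE MOVE DATAFILE '/u02/app/oracle/oradata/SLOB/slob_ts01.dbf'
  TO '+DATA';
```
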
ASM, your results please.

Oops, what’s that? 800,000 read IOPS! And the write numbers are only slightly better.

Then we go to the throughput:

So ASM is faster than ACFS. I was expecting it to be a bit faster, but not by this much.
For completeness, the latencies:

And then the figures:

max read latency    2508.65 us   (2.51 ms)
max write latency   2893.87 us   (2.89 ms)

This looks as expected. Good.

I talked to my team lead and performance-tuning expert, Geert De Paep, about this behaviour. You could see the lights in his eyes: he wants to test it as well, so I’m looking forward to his blog post. I can already tell you that, by running the queries manually on the Swingbench schema, Geert was able to see the same behaviour. So we still have to figure out what exactly happens when using ACFS. If it remains strange, we will contact Oracle. We will see.

If you run Swingbench with the preconfigured runbooks, the first bottleneck you hit is the CPU, due to all the PL/SQL in Swingbench. Knowing that, the next tests will cover logical I/O.

As always: questions, remarks? Find me on Twitter: @vanpupi


ORACLE DB IN THE AZURE CLOUD – PT2


During the BI-in-the-cloud project, one of the aspects we had to test was the network. Here is how we figured out how the network performs and, most of all, whether it is stable.

One of the most important things in a cloud environment is the network: it connects devices to each other and makes communication between them possible. Sounds obvious, right?

Some of the tests we did relied very heavily on the network (NFS, SMB, …), and in the beginning we didn’t manage to get them stable. At some point you have an “I should find some time for this” moment, and this was one of them: find a quick and easy way to check whether the network stays OK. So I came up with the most basic network test there is: ping! Ping? Pong. Yes, a simple ping. I know firewalls give lower priority to ping, but in this case they were configured well, so we were good to go.

The test consists of a tiny script that does 10 pings, some CLI magic to grep the round-trip times out of the output, and appends them to a file. It’s quick and dirty, and it would be a lot better to store the results in a database, but we just needed an idea: is the network stable or not? The script went into the crontab on each of the 3 servers, running every 5 minutes. This generates data, which I harvested after a couple of days. I would also like to mention (oh oh, comment storm coming up) that, regarding the network in this Microsoft Azure subscription, Windows and Linux servers perform the same. The prerequisite is that you configure them well, so we did that 🙂
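A minimal sketch of such a script (the host name and log path are made up; the real script isn’t shown here):

```shell
#!/bin/sh
# Quick-and-dirty network stability probe: 10 pings, keep the time=... values.
host=${1:-server2}                # hypothetical peer in the other availability set
log=${2:-/tmp/ping_${host}.log}   # one line per run, appended

extract_times() {
  # Pull the round-trip times (ms) out of ping output, space-separated.
  grep -o 'time=[0-9.]*' | cut -d= -f2 | tr '\n' ' '
}

times=$(ping -c 10 "$host" 2>/dev/null | extract_times)
echo "$(date '+%F %T') $host $times" >> "$log"
```

Cron it every 5 minutes (`*/5 * * * * /path/to/pingcheck.sh`) and after a few days the log has enough points to graph.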

The first test was done on 2 servers, one Linux and one Windows, each stored in a different availability set (AS).

PTest1svg

This is no Excel graph. I would like to thank my team lead Geert De Paep for letting me put my data into Pandora, a tool that turns database data into any kind of SVG graph you like. For those interested, I can share the Excel graph as well, but its high peaks drowned out the detail; to keep that detail I needed the exponential (log-scale) graphs, and Pandora was the ideal choice for that.

It looks to me as if, for every series of ping packets, the first one takes some time and then it gets pretty stable.

The second test was also done on 2 servers, one Linux and one Windows, but this time they are stored in the same availability set (AS). There is another difference, though: the network throughput we got on other machines was a bit disappointing. Hey Microsoft, can you do something about that? The answer was very easy: use the preview of Accelerated Networking. So that is what we did.

PTest2svg

There is some strange behaviour in the beginning, but as it is a preview, I assume something was still going on. Timings are a bit lower, which is good, and we see the same pattern: one “slower” ping and then good results. Between 18h and 20h, though, we see somewhat higher times every day. I should gather more data to see whether that is a recurring trend.

So that brings us to the third and final test: the same setup as the second one, except that it runs between 2 Linux boxes. Azure, your results please!

PTest3svg

The graph looks different, but look at the time axis: the Windows boxes were shut down between Christmas and New Year. No, no, not because Windows crashed; they were simply shut down and their resources reused for other things.
But I do like the consistency. Still the same behaviour: one longer ping, and then the rest lower but consistent.

As always: questions, remarks? Find me on Twitter: @vanpupi

Oracle DB in the Azure cloud – Pt1


A few months ago (around October) we were contacted with a simple question: can you run an Oracle database in the cloud, specifically the Azure cloud? Well … it depends. The little detail was that the database is about 34 TB, there are a few other multi-TB databases, AND there are a lot of copies of them. And the final go-live date is … end of 2016. Well, we accepted the challenge.

The deadline was strict, which is also why I had less time to blog and why this Azure cloud series won’t be completely chronological … but (spoiler alert) I’m keen to share what we ended up with.

This post focuses on how the database tests using SLOB were done. Credits to @kevinclosson for the SLOB tool and to @flashdba for his SLOB testing harness. Combining these 2 provides a very quick way of running consistent tests, and we needed such a framework because we were changing just about everything to see whether it impacted disk throughput/IOPS or not.

Why we chose those machines is for another post, but we opted for the DS15_v2 VM (details here). The machine description, borrowed from the Microsoft website: “Dv2-series, a follow-on to the original D-series, features a more powerful CPU. The Dv2-series CPU is about 35% faster than the D-series CPU. It is based on the latest generation 2.4 GHz Intel Xeon® E5-2673 v3 (Haswell) processor, and with the Intel Turbo Boost Technology 2.0, can go up to 3.1 GHz. The Dv2-series has the same memory and disk configurations as the D-series.”
Looks good, right? And we can attach up to 40 TB to the machine, which makes it a candidate for the future database servers.
It gets better: this family of servers can also use Microsoft Premium Storage, which is basically SSD-backed, and disk caching is possible if needed.
As the databases are a bit bigger, the only way to go was P30 disks (more details about them here), with a limit of 5,000 IOPS and 200 MB/s per disk. Should be OK as a first test.

The first test was done using iozone. Its results will appear in a different blog post, as I still need to do a second run to cross-check them. But let’s continue, though not before asking: if there are remarks, questions or suggestions for improvement, I’ll be happy to test them.
The VM was created with 1 storage account, and that storage account was completely filled up with 35 Premium Storage SSDs.
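
On paper those 35 disks add up to plenty of headroom; a quick back-of-the-envelope aggregate (the VM itself imposes its own I/O limits, so this is a theoretical ceiling, not what you will measure):

```shell
# Theoretical aggregate of 35 P30 disks (5000 IOPS and 200 MB/s each).
disks=35
total_iops=$(( disks * 5000 ))
total_mbs=$(( disks * 200 ))
echo "${total_iops} IOPS, ${total_mbs} MB/s aggregate"
```
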
Those disks were presented to the virtual machine, added into one big volume group, and a striped XFS filesystem was created on a logical volume to host the SLOB database.
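
Roughly, that provisioning could look like the sketch below. The device names, stripe size and mount point are made up, and it obviously only runs on a box with the disks actually attached:

```shell
# Sketch: one big striped volume across all 35 attached data disks (hypothetical device names).
pvcreate /dev/sd[c-z] /dev/sda[a-k]
vgcreate vgslob /dev/sd[c-z] /dev/sda[a-k]
lvcreate -n lvslob -i 35 -I 1m -l 100%FREE vgslob   # stripe over all 35 PVs
mkfs.xfs /dev/vgslob/lvslob
mkdir -p /u02/slob && mount /dev/vgslob/lvslob /u02/slob
```
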
The database was created using cr_db.sql from the SLOB create-database kit, after enabling it for 4k redo logs. After finishing all the steps to make it a physical I/O test, we were good to launch the testing harness. It ran for a while, and our top load profile looked like this during all the tests:

AWR_example_cloud

I think that’s OK? After that, it’s time to run slob2-analyze.sh to generate a CSV file. That CSV was loaded into Excel, and this was the result:

1SA40disks_cloud


First I split the write and read IOPS, but then I decided to use the total IOPS, as the graph follows the same trend. My understanding (please correct me if wrong) is that around 30,000 IOPS with an 8k database block is around 234 MB/s? These tests were done without disk caching.
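
The conversion is simple enough to check: 30,000 IOPS × 8 KiB blocks, expressed both in decimal MB/s and binary MiB/s (234 is the MiB/s figure):

```shell
# 30000 IOPS x 8 KiB blocks, in decimal and binary megabytes per second.
mb_s=$(awk 'BEGIN { printf "%.2f", 30000 * 8192 / 1000000 }')   # decimal MB/s
mib_s=$(awk 'BEGIN { printf "%.2f", 30000 * 8192 / 1048576 }')  # binary MiB/s
echo "${mb_s} MB/s (${mib_s} MiB/s)"
```
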

Then we decided to do the whole test again, but this time, instead of using 1 storage account with a bunch of disks, we used a bunch of storage accounts with only one disk each. The rest of the setup was exactly the same (a new VM of the same size, same volume group, same striping, …) and the database was created using the same scripts again. Here are the results:

40SA1Disk_cloud


I think it is remarkable that, even in the cloud, the way you present the disks to the machine really does matter. Take the 32-worker run, for example: with one storage account, remarkably less work was done.

More to come, of course. Feedback on what the next blog post should cover is welcome. Let’s make it interactive 🙂

As always: questions, remarks? Find me on Twitter: @vanpupi