Redo apply is slow? Or not really?

A question that often arrives in my mailbox is why redo apply on an Active Data Guard standby database is significantly slower than on a mounted standby. A famous wise man once said: “If the primary can generate it, the standby can apply it”. I totally agree.

If the primary can generate it, the standby can apply it.

Larry M. Carpenter

Let’s start by assuming that the redo apply best practices have been followed and that the synchronous or asynchronous transport best practices, depending on the transport method of choice, have been applied. Following these best practices often solves the problem, but if not, we obviously need to investigate further to determine what is really going on. Oh, by the way … make sure to rule out transport lag, as that is an obvious culprit: you can’t apply redo faster than it comes in. It sounds obvious, but because of other assumptions, transport lag is sometimes overlooked. When you suspect transport issues, you can investigate them with the oratcptest tool from My Oracle Support.
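A quick sanity check is to ask the standby itself how far behind it is. A minimal sketch, run on the standby, using the documented V$DATAGUARD_STATS view (if transport lag dominates, tune transport first; apply can never outrun redo arrival):

  -- Run on the standby: compare transport lag vs. apply lag.
  SELECT name, value, unit, time_computed, datum_time
  FROM   v$dataguard_stats
  WHERE  name IN ('transport lag', 'apply lag');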

So where does this behavior come from when people experience performance issues? To answer that question, it is vital to understand in which areas Active Data Guard performs more work, so that it can be tuned properly. We can roughly define six areas that impact redo apply performance (a query sketch for measuring the actual apply rate follows the list):

  • Environments using Oracle RAC at the standby site may experience increased redo apply lag caused by locking or cache fusion related wait events when using Multi-Instance Redo Apply on ADG. In this case, ADG recovery needs to acquire fusion locks on media recovery buffers to maintain global cache coherency. This can incur cross-instance communication cost, for which a high-performance, low-latency interconnect is recommended.
  • There is also maintenance of additional buffer cache queues. ADG recovery maintains additional data structures, for example object queues that link all buffers belonging to the same object into one queue to speed up queries. Mounted recovery doesn’t incur this cost.
  • Additional buffer cache pressure. If recovery runs on an open instance, recovery and queries contend for buffer cache resources, resulting in possibly more free buffer waits, more contention on buffer cache related latches, and recovery issuing more read I/Os because fewer buffers are available for media recovery. Mounted recovery doesn’t have this issue.
  • Data Guard recovery needs to perform additional handling of library cache and row cache invalidation redo, and of other redo, to maintain the consistency of the row cache and library cache, as well as other caches (e.g. the result cache) on Active Data Guard. Mounted recovery simply skips them.
  • Additional handling to maintain consistency of the In-Memory column store on ADG. When IMC is enabled on ADG, ADG recovery incurs the additional cost of analyzing every redo change, grouping changes into transactions, and invalidating the corresponding IMC cache entries at the right time. Mounted recovery doesn’t incur this cost.
  • CPU and I/O contention with queries running on the open standby.
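
To get a feel for how these areas affect your environment, it helps to first measure the rate at which the standby is actually applying redo. A minimal sketch against V$RECOVERY_PROGRESS, run on the standby while redo apply is active (the item names are as documented, but the exact set can vary per release):

  -- Apply rates and per-log timings of the current media recovery session.
  SELECT item, units, sofar, total
  FROM   v$recovery_progress
  WHERE  item IN ('Active Apply Rate',
                  'Average Apply Rate',
                  'Apply Time per Log',
                  'Checkpoint Time per Log');

Comparing the active apply rate with the redo generation rate of the primary tells you whether apply itself, rather than transport, is the limiting factor.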

In general, there should not be a big difference in recovery speed between mounted recovery and ADG recovery.

So, to conclude: when mounted recovery turns out to be faster than ADG recovery, it is important to dig in and understand why that is the case. The question remains the same: “What is the database waiting for?”
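
One way to answer that question is to look at what the recovery processes are currently waiting on. A hedged sketch against V$SESSION on the open standby; the program name patterns for the recovery coordinator (MRP0) and its parallel recovery workers (PR0n) are an assumption and may differ per release and platform:

  -- Current wait events of the redo apply coordinator and workers.
  SELECT program, event, state, seconds_in_wait
  FROM   v$session
  WHERE  program LIKE '%(MRP%'
     OR  program LIKE '%(PR%'
  ORDER  BY program;

If the top waits point at free buffer waits, cache fusion, or I/O, that maps directly back to the areas listed above.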

As always, questions or remarks?
Find me on Twitter @vanpupi
