Tuesday, January 8, 2008

NFS and Oracle - Mount options - noac, actimeo, forcedirectio, et al.

While doing my first install of Oracle RAC on Solaris using NFS as the clustered filesystem, I took the opportunity to test the various mount options specified by Oracle (Doc ID: Note:359515.1).

Let us take a look at the options specified.

1. Oracle binaries

rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,noac,vers=3,suid

I could understand rw,bg,hard,nointr,tcp,suid and vers=3, but noac, rsize and wsize did not make much sense to me.

rsize and wsize could be considerably higher than 32K. A Sun NFS server (while not supported by Oracle) supports up to 1MB, and Oracle does not complain if you set them higher. noac, however, seems pretty much backward. From the manpage of mount_nfs on Solaris 10:

noac

Suppress data and attribute caching. The data caching
that is suppressed is the write-behind. The
local page cache is still maintained, but data
copied into it is immediately written to the server.

Setting the noac option also disables attribute caching, but has the further effect of disabling client write caching. While this guarantees that data written by an application is written directly to a server, where it can be viewed immediately by other clients, it has a significant adverse effect on client write performance. Data written into memory-mapped file pages (mmap(2)) are not written directly to this server.

noac makes sense if you have a single Oracle home shared by all your RAC instances. I wonder how many DBAs would install RAC that way, since a single shared home takes away high availability as an option completely.

The performance impact of noac is significant: installs, patches and the like take forever, as every write has to be synced to the server before it can proceed.
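The cost of syncing every write can be felt even on a local disk. The following is only a rough sketch, not a reproduction of noac: it assumes that calling os.fsync after each 32K write is a reasonable local stand-in for write-through behaviour, and the function names (timed_write, compare) are made up for illustration.

```python
import os
import tempfile
import time


def timed_write(path, chunks, chunk_size, sync_each_write):
    """Write `chunks` blocks of `chunk_size` bytes; return elapsed seconds."""
    block = b"\0" * chunk_size
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(chunks):
            f.write(block)
            if sync_each_write:
                # Assumption: fsync per write approximates "no write-behind".
                f.flush()
                os.fsync(f.fileno())
    return time.perf_counter() - start


def compare(chunks=64, chunk_size=32768):
    """Time a buffered run and a sync-every-write run in a temp directory."""
    with tempfile.TemporaryDirectory() as d:
        buffered_path = os.path.join(d, "buffered.dat")
        synced_path = os.path.join(d, "synced.dat")
        buffered = timed_write(buffered_path, chunks, chunk_size, False)
        synced = timed_write(synced_path, chunks, chunk_size, True)
        return {
            "buffered_secs": buffered,
            "synced_secs": synced,
            "size_buffered": os.path.getsize(buffered_path),
            "size_synced": os.path.getsize(synced_path),
        }


if __name__ == "__main__":
    result = compare()
    print(f"buffered: {result['buffered_secs']:.4f}s")
    print(f"synced:   {result['synced_secs']:.4f}s")
```

Both runs write the same number of bytes; only the timing differs, and how much it differs depends heavily on the underlying storage.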

I started installing RAC with noac and found it incredibly slow going, so I did some testing: unzipping the 1.2GB 10.2.0.3 patchset for the DB (Solaris) under the various mount options. Unzipping exercises both read and write operations. While the system I was testing on is a 280R (a relic), it is still a good system for testing purposes.

Using noac (Oracle recommendation)

rw,bg,vers=4,proto=tcp,hard,intr,rsize=524288,wsize=524288,noac - 33 minutes, 34 secs

Using defaults (without noac, forcedirectio or actimeo)

rw,bg,vers=4,proto=tcp,hard,intr,rsize=524288,wsize=524288 - 7 minutes, 44 secs

I also tested with forcedirectio and actimeo. From the manpage of mount_nfs:

forcedirectio | noforcedirectio

If forcedirectio is specified, then for the duration
of the mount, forced direct I/O is used. If the
filesystem is mounted using forcedirectio, data is
transferred directly between client and server, with
no buffering on the client. If the filesystem is
mounted using noforcedirectio, data is buffered on
the client. forcedirectio is a performance option
that is of benefit only in large sequential data
transfers. The default behavior is noforcedirectio.

actimeo=n

Set min and max times for regular files and directories
to n seconds. See "File Attributes," below,
for a description of the effect of setting this
option to 0.

Setting actimeo=0 disables attribute caching on the client.
This means that every reference to attributes is satisfied
directly from the server though file data is still cached.
While this guarantees that the client always has the latest
file attributes from the server, it has an adverse effect on
performance through additional latency, network load, and
server load.

Using actimeo=0

rw,bg,vers=4,proto=tcp,hard,intr,rsize=524288,wsize=524288,actimeo=0 - 12 minutes, 28 secs


Using forcedirectio and actimeo=0

rw,bg,vers=4,proto=tcp,hard,intr,rsize=524288,wsize=524288,forcedirectio,actimeo=0 - 18 minutes, 12 secs


Using forcedirectio and noac

rw,bg,vers=4,proto=tcp,hard,intr,rsize=524288,wsize=524288,forcedirectio,noac - 19 minutes, 10 secs

The clear winner is the default set of options, which leaves write caching enabled on the client. If you are sharing Oracle homes, you would need noac to ensure that writes are consistent and all attributes are referred back to the NFS server.

However, if the Oracle homes are independent, I would assume it is perfectly safe to use the default options (albeit with higher rsize and wsize values).
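To put the unzip timings side by side, here is a small sketch that expresses each run as a multiple of the default-options run. The durations are the ones measured above; the helper names (to_seconds, slowdowns) are only for illustration.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds pair to total seconds."""
    return minutes * 60 + seconds


# Unzip timings from the tests above, keyed by the option that varied.
RUNS = [
    ("noac", to_seconds(33, 34)),
    ("defaults", to_seconds(7, 44)),
    ("actimeo=0", to_seconds(12, 28)),
    ("forcedirectio,actimeo=0", to_seconds(18, 12)),
    ("forcedirectio,noac", to_seconds(19, 10)),
]


def slowdowns(runs, baseline="defaults"):
    """Return each run's duration as a multiple of the baseline run."""
    base = dict(runs)[baseline]
    return {name: round(secs / base, 2) for name, secs in runs}


if __name__ == "__main__":
    ranked = sorted(slowdowns(RUNS).items(), key=lambda kv: kv[1])
    for name, factor in ranked:
        print(f"{name:<26} {factor:.2f}x")
```

On these numbers, noac alone is roughly 4.3 times slower than the defaults, while actimeo=0 costs about 1.6 times.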

Now moving onto the data files, the options specified are

2. Oracle Data Files

rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,noac,forcedirectio,vers=3,suid

Again, except for noac and forcedirectio, the other options are okay. What is odd is that forcedirectio and noac overlap: both push writes straight to the server. forcedirectio does it by bypassing client buffering entirely, while noac keeps the local page cache but writes through it immediately. noac additionally disables attribute caching.

A more sensible combination would be forcedirectio and actimeo=0: forcedirectio enables direct I/O and actimeo=0 disables file attribute caching. noac is backward, and I wonder why Oracle still insists on it. The problem is that if you do not enable noac, your instance will not start (it complains about the NFS options). While you can do all the installs, dbca will fail to start up the instance with errors as below.

WARNING:NFS file system /RACS/und1 mounted with incorrect options
WARNING:Expected NFS mount options:rsize=32768,wsize=32768,hard,noac
Thu Dec 27 12:36:17 2007
Errors in file /RACS/orabase/racshome/admin/bdump/racs1_dbw0_17195.trc:
ORA-01157: cannot identify/lock data file 2 - see DBWR trace file
ORA-01110: data file 2: '/RACS/und1/undotbs01.dbf'
ORA-27054: NFS file system where the file is created or resides is not mounted with correct options
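Before running dbca, it may help to compare a mount's option string against the list printed in the warning above. The sketch below is not Oracle's actual check; it is a hypothetical helper (parse_options, missing_options) that assumes flags must be present and that rsize/wsize values at or above the expected ones are acceptable, in line with the observation earlier that Oracle does not complain about higher values.

```python
# Expected options, copied from the WARNING line in the alert log above.
EXPECTED = "rsize=32768,wsize=32768,hard,noac"


def parse_options(option_string):
    """Split 'a,b=1,c' into {'a': None, 'b': '1', 'c': None}."""
    opts = {}
    for item in option_string.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        opts[key] = value if sep else None
    return opts


def missing_options(actual, expected=EXPECTED):
    """List expected options that are absent or numerically too small.

    Assumption: a larger rsize/wsize than expected is treated as fine.
    """
    have = parse_options(actual)
    want = parse_options(expected)
    problems = []
    for key, value in want.items():
        if key not in have:
            problems.append(key)
        elif value is not None:
            got = have[key]
            if got is None or int(got) < int(value):
                problems.append(f"{key}={got}")
    return problems


if __name__ == "__main__":
    tested = "rw,bg,vers=4,proto=tcp,hard,intr,rsize=524288,wsize=524288,actimeo=0"
    print(missing_options(tested))
```

For the actimeo=0 option string used in the unzip tests, this reports noac as the only missing option, which matches the complaint in the log.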

Now about the options for CRS and voting disks

3. CRS and Voting disks

rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,vers=3,noac,forcedirectio

Same argument as before: noac and forcedirectio overlap. It would be better to use forcedirectio and actimeo=0.

Oracle seems to have gotten it right on Linux with actimeo=0; however, I do not see a forcedirectio option for NFS there (neither version 3 nor version 4).

4 comments:

Anonymous said...

Hi Krishna, this is very interesting. Did you ever compare a simple mkfile of 1GB to an NFS share mounted with and without forcedirectio? In my tests, with forcedirectio or noac the 1GB file took about 1 minute to create; without them it takes only 15 seconds.
Do you see the same effect?

thanks chris

Krishna Manoharan said...

Hi Chris,

Thanks for dropping by.

Yes, I would expect this to be the case. Without forcedirectio, the filesystem page cache mechanism would buffer the I/O thus giving the impression that it is fast.

Regards
Krishna

bryan said...

You realize that if a second node extends a datafile or something like that on your shared storage even though ORACLE knows the file is big enough, the file system will refuse sometimes to write to these new blocks if noac is not set? Results in an ugly error in a highly transactional environment.

Anonymous said...

This is exactly what we have faced. We were seeing a huge performance impact with the noac option: 1GB files took about 4-5 minutes over a 1Gb network NFS mount. The resolution was to use the mount options actimeo=0,tcp,vers=3,hard,nointr. Do not use noac, as the network traffic will be high and write performance slow. actimeo=0 disables attribute caching on the client, which is what noac does for attributes, but the performance was much better.