/etc/system (Both Vxvm and OS settings)
Set the File Descriptors - The defaults are far too low and need to be raised. On Solaris 10 the hard limit already defaults to 65536, so increasing rlim_fd_cur should suffice.
Defaults (Solaris 9):
- rlim_fd_max = 1024
- rlim_fd_cur = 64
Defaults (Solaris 10):
- rlim_fd_max = 65536
- rlim_fd_cur = 256
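These are set in /etc/system (a reboot is required); a sketch, with the soft limit raised to an illustrative value:

```
* Solaris 10: rlim_fd_max already defaults to 65536,
* so raising the soft limit is usually enough
set rlim_fd_cur=8192
* On Solaris 9 the hard limit needs raising as well
set rlim_fd_max=65536
```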
- maxphys - The maximum size of a physical I/O request, in bytes. If a driver sees a request larger than this, it breaks the request into maxphys-sized chunks. File systems can and do impose their own limits. This value should be set higher than all related settings (vol_maxio, vol_maxspecialio, etc.) in the file system and volume manager layers.
Solaris 9 and 10
maxphys = 131072 (128K)
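maxphys also goes in /etc/system; for example, to allow physical I/Os of up to 8MB (an illustrative value, commonly used alongside volume managers):

```
* Allow physical I/O requests up to 8MB (value in bytes; illustrative)
set maxphys=8388608
```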
Virtual Memory values - The values below are best explained in the book Solaris Internals. They tie into the VM subsystem and play a vital role during heavy memory operations.
- maxpgio - Maximum number of page I/O requests that can be queued by the paging system. This number is divided by 4 to get the actual maximum used by the paging system. It is used to throttle the number of requests as well as to control process swapping. maxpgio is expressed in I/Os.
- slowscan - Minimum number of pages per second that the system looks at when attempting to reclaim memory. Folks set either slowscan or fastscan. I prefer to set slowscan.
- tune_t_fsflushr - Specifies the number of seconds between fsflush invocations.
- autoup - Along with tune_t_fsflushr, autoup controls how much memory is examined for dirty pages in each invocation and the frequency of file system sync operations.
On systems with more than 16GB memory, to reduce the impact of fsflush on the system, it is best to set autoup to higher values.
Solaris 9 and 10
maxpgio = 40
slowscan = The smaller of 1/20th of physical memory in pages and 100.
tune_t_fsflushr = 5
autoup = 30
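In /etc/system these become set entries; a sketch with illustrative values for a large-memory system (tune them against your own workload, not these numbers):

```
* VM tunables -- illustrative values, not recommendations
set maxpgio=65536
set slowscan=500
* Run fsflush every second, scanning 1/600th of memory per pass
set tune_t_fsflushr=1
set autoup=600
```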
CPU Affinity and Context switches -
- rechoose_interval - This setting encourages the scheduler to run a thread on the same CPU it last ran on. The understanding is that the CPU cache is still warm with that thread's instructions and data, improving efficiency. The rechoose_interval variable tells the kernel which CPU to select for a thread when a choice needs to be made: if a thread hasn't run in rechoose_interval ticks, it may be moved to another CPU; otherwise it continues to wait on the CPU it has been running on. A higher value of rechoose_interval "firms up" the soft affinity. The downside is that if this value is too high, processes can spread sluggishly across CPUs in an application where a single process forks a lot of children.
Default: 3 ticks
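This too is set in /etc/system; a sketch (the value is illustrative - higher firms up affinity at the cost of slower spreading of forked children):

```
* Default is 3 ticks; a higher value firms up soft CPU affinity
set rechoose_interval=150
```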
VxVM System kernel parameters -
- vol_maxio - I/Os larger than this are broken up in the Veritas VxVM layer. Physical I/Os are broken up based on the disk's capabilities and are unaffected by this logical I/O size setting.
Default: 512 sectors. Remember that 512 sectors = 256KB.
- vol_maxioctl - The size of the largest ioctl request that VxVM will handle; anything bigger is broken down. ODM uses ioctls, so it makes sense to make this larger than the biggest request (reads/writes) that Oracle can issue; 131072 (128KB) is the maximum.
Default: 32 KB
- vol_maxspecialio - The size of the largest I/O handled by an ioctl call as issued by the application (such as Oracle when using ODM). The ioctl request itself may be small, but it can request a large I/O operation.
Default: 512 sectors
- vol_default_iodelay - Count, in clock ticks, that utilities pause between issuing I/Os if they have been directed to throttle down but haven't been given a specific delay time. Utilities that resynchronize mirrors or rebuild RAID-5 columns use this value.
Default: 50 ticks
- voliomem_chunk_size - The granularity of memory chunks used by VxVM when allocating or releasing system memory. A larger granularity reduces the CPU overhead of memory allocation by allowing VxVM to retain a larger amount of memory.
- voliomem_maxpool_sz - The maximum memory requested from the system by VxVM for internal purposes. This tunable has a direct impact on the performance of VxVM as it prevents one I/O operation from using all the memory in the system.
Default: 5% of memory up to a maximum of 128MB.
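These VxVM tunables are parameters of the vxio driver and can also be set in /etc/system; a sketch with illustrative values (vol_maxio and vol_maxspecialio are in 512-byte sectors):

```
* VxVM (vxio) tunables -- illustrative values
* 2048 sectors = 1MB
set vxio:vol_maxio=2048
* 131072 bytes = 128KB, the maximum
set vxio:vol_maxioctl=131072
set vxio:vol_maxspecialio=2048
set vxio:voliomem_chunk_size=131072
* 134217728 bytes = 128MB
set vxio:voliomem_maxpool_sz=134217728
```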
Shared Memory settings -
I will cover these in a later discussion.
Vxfs Settings -
When using direct I/O (with ODM or the forcedirectio mount option), it makes no difference how you set the prefetch or write-back policies for volumes containing Oracle data files, since all I/O bypasses the file-system buffer cache. However, you can still set read-ahead and write-back for other volumes - application data, Oracle binaries, etc.
The below are the default settings for a concat filesystem (VxFS version 4.1):
read_pref_io = 65536
read_nstream = 1
read_unit_io = 65536
write_pref_io = 65536
write_nstream = 1
write_unit_io = 65536
pref_strength = 10
buf_breakup_size = 1048576
discovered_direct_iosz = 262144
max_direct_iosz = 1048576
default_indir_size = 8192
qio_cache_enable = 0
write_throttle = 0
max_diskq = 1048576
initial_extent_size = 8
max_seqio_extent_size = 2048
max_buf_data_size = 8192
hsm_write_prealloc = 0
read_ahead = 1
inode_aging_size = 0
inode_aging_count = 0
fcl_maxalloc = 162688000
fcl_keeptime = 0
fcl_winterval = 3600
oltp_load = 0
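These tunables can be inspected and changed at run time with vxtunefs; a sketch (the mount point and values are illustrative):

```shell
# Show the current VxFS tunables for a mounted file system
vxtunefs /apps

# Raise read-ahead at run time (illustrative values)
vxtunefs -o read_pref_io=262144,read_nstream=4 /apps
```

To persist across remounts, the equivalent line goes in /etc/vx/tunefstab, e.g. `/dev/vx/dsk/appdg/appvol read_pref_io=262144,read_nstream=4`.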
You could set values much higher than the defaults and see if that improves performance for volumes that do not hold Oracle data files. Normally, you would size the read-ahead and write-back in proportion to how your RAID group is set up on the array.
For example:
For a RAID-5 RG of, say, 6D+1, the read-ahead would be 6 * (I/O size of the array).
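As a sketch of that arithmetic, assuming a hypothetical 6D+1 RAID-5 group with a 64KB array stripe unit:

```shell
#!/bin/sh
# Read-ahead sized to one full data stripe: data columns * array I/O size.
# The 64KB stripe unit is an assumed example value.
COLUMNS=6
ARRAY_IO=65536
READ_PREF_IO=`expr $COLUMNS \* $ARRAY_IO`
echo "read_pref_io=$READ_PREF_IO"   # 6 * 64KB = 384KB = 393216
```

The result would then be applied with something like `vxtunefs -o read_pref_io=393216,read_nstream=6 /mountpoint`.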
For data files, it is always better to let the application (such as Oracle) handle read-ahead and write-back. Oracle has its own buffer cache and is intimately aware of what data is required to satisfy user requests.
HBA Settings -
The below settings are for an Emulex HBA. Similar configurations exist for QLogic as well.
Network Settings -
I have observed that a Solaris 10 system running mpath can easily send/receive 60MB/sec on a gigabit link. This of course also depends on how you push the traffic down the pipe. Normally I set the below parameters using the nddconfig script provided by SUNWjass.
- tcp_maxpsz_multiplier - Raising this parameter means we do fewer copy operations, with more data copied per operation.
- tcp_wscale_always - Ensures window scaling is available at least on the receiving side.
- tcp_cwnd_max - The maximum size to which the congestion window can open. This plays a vital role on gigabit links and can give you incredible results.
- tcp_max_buf - The maximum buffer size in bytes. It controls how large the send and receive buffers can be set by an application using setsockopt(3XNET).
- tcp_xmit_hiwat - Influences a heuristic that determines the size of the initial send window. The actual value is rounded up to the next multiple of the MSS, e.g. 8760 = 6 * 1460. On Solaris 10 the default is 1MB, so it would not normally need to be changed.
- tcp_recv_hiwat - Determines the maximum size of the initial TCP receive buffer. The specified value is rounded up to the next multiple of the MSS. On Solaris 10 the default is 1MB.
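The ndd commands below sketch how these might be applied (values are illustrative, not recommendations, and ndd settings do not persist across reboots - hence scripts like nddconfig):

```shell
# Illustrative TCP tuning for a gigabit link
ndd -set /dev/tcp tcp_maxpsz_multiplier 10
ndd -set /dev/tcp tcp_wscale_always 1
ndd -set /dev/tcp tcp_cwnd_max 2097152
ndd -set /dev/tcp tcp_max_buf 4194304
```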