Friday, September 17, 2010

Is there a silver bullet to improve IntelliJ indexing performance?

It's known that IntelliJ indexing is I/O intensive. Upgrading to a faster storage device, such as a solid-state drive (SSD), seems like a silver bullet for IntelliJ indexing performance issues. Here are a few reasons why this is not the case, and why upgrading to an SSD doesn't necessarily solve indexing problems:

The write performance of an SSD is quite different from its read performance. For random reads, an SSD performs much better than a typical hard disk, whereas for sequential reads the two perform almost the same. The problematic area with SSDs is writes. An SSD performs writes at the 'erase block' level: it merges the data being written with the data already in the erase block, and then rewrites the whole block. Erase blocks are 1-4MB in size. As a result, the write performance of an SSD is worse than that of a typical hard disk. To make things worse, existing file systems are not optimized for SSDs. Only a few file systems, like ZFS (for Solaris), are optimized for SSDs.

Here is an example that illustrates the effect of write operations on SSD performance: the input/output operations per second (IOPS) of an Intel X25 flash SSD drop from ~2000 IOPS in pure read mode to ~700 IOPS with an 80/20 read/write mix on 8k blocks. A typical hard disk can perform ~200 IOPS. (Refer to the following articles for further information related to SSD.)
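To get a feel for what the X25 figures imply, here is a minimal Python sketch. It assumes a deliberately simple serial model that is not from the articles referenced above: every operation takes 1/IOPS seconds for its kind, and reads and writes never overlap. The function names are made up for illustration.

```python
def mixed_iops(read_iops, write_iops, read_fraction):
    """Effective IOPS for a read/write mix.

    Assumption: operations are served one at a time, each taking
    1/IOPS seconds for its kind (a weighted harmonic mean).
    """
    write_fraction = 1.0 - read_fraction
    seconds_per_op = read_fraction / read_iops + write_fraction / write_iops
    return 1.0 / seconds_per_op


def implied_write_iops(read_iops, observed_mixed_iops, read_fraction):
    """Solve the same model for the write IOPS that explains an observed mix."""
    write_fraction = 1.0 - read_fraction
    seconds_per_write = (
        1.0 / observed_mixed_iops - read_fraction / read_iops
    ) / write_fraction
    return 1.0 / seconds_per_write


# Figures quoted in the post: ~2000 IOPS pure read, ~700 IOPS at 80/20.
ssd_write_iops = implied_write_iops(2000, 700, 0.8)
print(f"implied SSD write IOPS: {ssd_write_iops:.0f}")

# How the effective rate falls as the share of writes grows (same model).
for read_fraction in (1.0, 0.8, 0.5, 0.2):
    rate = mixed_iops(2000, ssd_write_iops, read_fraction)
    print(f"{read_fraction:.0%} reads -> {rate:.0f} IOPS")
```

Under this model the implied write rate comes out at a little under 200 IOPS, which is in the same range as the ~200 IOPS quoted for a typical hard disk. Real devices queue and overlap requests, so treat this as a back-of-the-envelope estimate only.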

Just as input/output has been neglected in computer architecture, it has also been neglected in the profiling world. Very few tools are available for profiling the input/output of a process. These tools are rudimentary, and some simply collect information from /proc/<pid>/io. They do not provide any information on the number of read and write operations performed by a process. Read/write data volumes or read/write rates are not a recommended way of measuring I/O performance. Refer to these articles to know why.

If we knew the mix of read/write operations performed by IntelliJ, we could compute the effective performance improvement or degradation due to an SSD. As we don't know the mix, the only way to measure the improvement is to run the build, indexing, etc. with and without an SSD. The time taken to build the community edition index remained almost the same with and without an SSD. Here are the index times for the community edition on an 8GB machine with a 64-bit JVM, using IntelliJ 10.5:

Storage Device    Num Files    Scanning Files Time (secs)    Indexing Time (secs)
SSD               78144        24.51                         128.71
HardDisk drive    78144        15.69                         141.1

Index times of IntelliJ community edition code using an SSD and a normal hard disk. (Avg of 10 runs)
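The arithmetic behind "almost the same" can be checked with a short Python sketch. The timings are copied from the table above; the dictionary layout and helper names are only illustrative.

```python
# Average timings in seconds, copied from the table above.
TIMINGS = {
    "SSD": {"scan": 24.51, "index": 128.71},
    "HardDisk drive": {"scan": 15.69, "index": 141.1},
}


def total_seconds(row):
    """Scanning time plus indexing time for one storage device."""
    return row["scan"] + row["index"]


def percent_change(baseline, new):
    """Relative change from baseline to new, in percent (negative = faster)."""
    return (new - baseline) / baseline * 100.0


hdd = TIMINGS["HardDisk drive"]
ssd = TIMINGS["SSD"]

for phase in ("scan", "index"):
    change = percent_change(hdd[phase], ssd[phase])
    print(f"{phase}: {change:+.1f}% on SSD relative to hard disk")

total_change = percent_change(total_seconds(hdd), total_seconds(ssd))
print(f"total: {total_change:+.1f}% on SSD relative to hard disk")
```

With these numbers, indexing is about 9% faster on the SSD, scanning is about 56% slower, and the combined time differs by only about 2%.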

To summarize, an SSD will be a silver bullet for IntelliJ indexing problems only if the underlying file system is optimized for SSDs (like ZFS on Solaris) and the number of read operations performed is much larger than the number of write operations. Until then, we need to find alternative solutions to IntelliJ indexing performance issues.

Some more observations and notes related to I/O profiling:

/proc/<pid>/io contains information about the I/O done by a process. Here is the amount of I/O performed by IntelliJ during a rebuild of the IntelliJ community edition. (Refer to the proc help for the terminology used in the io file.)

rchar: 4263207814
wchar: 1363709925
syscr: 4788271
syscw: 1342145
read_bytes: 1504063488
write_bytes: 3310067712
cancelled_write_bytes: 405561344

read_bytes is the number of bytes this process really caused to be read from storage (~1.5 GB). write_bytes is the number of bytes this process really caused to be written to storage (~3.3 GB).
Note that wchar, the number of bytes which this process has caused to be written (~1.36 GB), is less than write_bytes.
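As a rough aid for reading these counters, here is a small Python sketch that parses the contents of an io file and derives a few ratios. The function names and the choice of ratios are illustrative assumptions, not part of any tool mentioned in this post. Keep in mind that syscr and syscw count read and write system calls, not operations at the storage device.

```python
def parse_proc_io(text):
    """Turn the 'name: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters


def summarize(counters):
    """Derive a few ratios from the raw counters (illustrative choices)."""
    syscalls = counters["syscr"] + counters["syscw"]
    return {
        "read_syscall_share": counters["syscr"] / syscalls,
        "bytes_per_read_syscall": counters["rchar"] / counters["syscr"],
        "bytes_per_write_syscall": counters["wchar"] / counters["syscw"],
        "net_written_bytes": counters["write_bytes"]
        - counters["cancelled_write_bytes"],
    }


def read_pid_io(pid):
    """Read the counters of a live process (Linux only, needs permission)."""
    with open(f"/proc/{pid}/io") as handle:
        return parse_proc_io(handle.read())


# The values listed above for the IntelliJ community edition rebuild.
SAMPLE = """\
rchar: 4263207814
wchar: 1363709925
syscr: 4788271
syscw: 1342145
read_bytes: 1504063488
write_bytes: 3310067712
cancelled_write_bytes: 405561344
"""

sample_counters = parse_proc_io(SAMPLE)
sample_summary = summarize(sample_counters)
for key, value in sample_summary.items():
    print(f"{key}: {value:,.2f}")
```

For a running process, read_pid_io(pid) returns the same kind of dict, which can be passed to summarize in the same way.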

I/O Profiling Tools:
collectl can be used to monitor I/O on a system. Its capabilities are similar to those of the top command.