WD Green HDDs and excessive interrupts

Green as it’s cool, green as it’s quiet, just like the trees. You’d think it’s all good and perfect. It’s also supposed to consume far less power. Yay, greener planet… Except…

When I started buying them in bulk, 500GB was a lot, and a 32MB cache seemed preferable to the 16MB that the Caviar Blue offered at the time. From time to time, when I was in a hurry and couldn’t find a WD Green HDD, I’d settle for a Blue. After a couple of months a pattern started to emerge. Client after client started complaining about low performance. Their PCs would freeze, sometimes for as long as a couple of minutes, and then carry on working as if nothing had happened. At the time I couldn’t figure out why WD Green HDDs would start acting up like that after a couple of months of use.

The strange thing was that nothing was reported anywhere. Not a single suspicious entry in the system logs or in the so-called SMART log. Even on a couple of clients using Intel RAID, the “Intel RAID Chipset” seemed perfectly happy with the minutes-long interruptions caused by the HDDs. And in one case, an HDD suddenly died. Out of happiness, I presume.

Maybe quality has gotten worse?

Well, yeah, maybe. Who can argue with that? Remember the old Quantum hard drives? I was worried that they would outlive me.

And yet, interestingly enough, only the WD Green products suffered from this. It is understandable why WD Black would be an exception (it is much more expensive), but why would WD Blue hard drives be one as well?

What’s really going on?

S.M.A.R.T.

Self-Monitoring, Analysis and Reporting Technology. This technology, which is built into almost all of today’s hard drives, has a single job: to alert you when the hard drive is failing.

Sure, SMART is not perfect. For starters, I expected the system to alert me instantly when such long interruptions occur, as clearly something is wrong. But in this case, the vendors and their implementations are more to blame than the technology itself.

You can read more about the technology here.

Accessing the SMART data of an HDD depends heavily on how the drive is connected and how (or if) the controller exposes the data.

Luckily, smartmontools is here to help. This open source project runs under Windows, Linux and lots of other systems. It can not only access the SMART data of most hard drives, but also interpret it using its rich drive database. One of the most interesting things about smartmontools is its ability to read SMART data from hard drives behind most RAID controllers, including Intel ICHxR. So there is no need to set your BIOS storage setting to AHCI and boot a live CD just to read the data. smartmontools also comes with a manual, which I highly recommend reading, as the package can also run as a daemon that constantly monitors your hard drives and alerts you if needed. But for now, we only want to use it to extract the SMART data from a WD Green hard drive.

The system that I want to gather SMART data from is running Windows and has 4 WD Green HDDs in RAID 10.

First, let’s find out the correct paths to them:
smartctl --scan

/dev/sda -d scsi # /dev/sda, SCSI device
/dev/csmi2,0 -d ata # /dev/csmi2,0, ATA device
/dev/csmi2,1 -d ata # /dev/csmi2,1, ATA device
/dev/csmi2,2 -d ata # /dev/csmi2,2, ATA device
/dev/csmi2,3 -d ata # /dev/csmi2,3, ATA device

Great. I already know which one is failing so let’s jump straight to that:
smartctl --attributes /dev/csmi2,0

#ID ATTRIBUTE_NAME          VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     200   200   047    Pre-fail  Always       -       0
  3 Spin_Up_Time            149   145   039    Pre-fail  Always       -       3541
  4 Start_Stop_Count        098   098   050    Old_age   Always       -       2922
  5 Reallocated_Sector_Ct   200   200   051    Pre-fail  Always       -       0
  7 Seek_Error_Rate         100   253   046    Old_age   Always       -       0
  9 Power_On_Hours          039   039   050    Old_age   Always   FAILING_NOW 44802
 10 Spin_Retry_Count        100   100   050    Old_age   Always       -       0
 11 Calibration_Retry_Count 100   100   050    Old_age   Always       -       0
 12 Power_Cycle_Count       098   098   050    Old_age   Always       -       2920
192 Power-Off_Retract_Count 200   200   050    Old_age   Always       -       75
193 Load_Cycle_Count        001   001   050    Old_age   Always   FAILING_NOW 2206333
194 Temperature_Celsius     104   090   034    Old_age   Always       -       39
196 Reallocated_Event_Count 200   200   050    Old_age   Always       -       0
197 Current_Pending_Sector  200   200   050    Old_age   Always       -       0
198 Offline_Uncorrectable   200   200   048    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    200   200   050    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   200   200   008    Old_age   Offline      -       0

After trying to wrap my head around it for a while, I was astonished by the off-the-charts Load Cycle Count (LCC).
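The rows that matter can also be picked out programmatically instead of by eye. Here is a minimal Python sketch, not part of smartmontools: the function names are mine, the sample rows are copied from the output above, and it assumes the nine-column layout shown there (your smartctl version may print an extra FLAG column).

```python
def parse_attributes(text):
    """Parse attribute rows of `smartctl --attributes` output into dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 9 columns;
        # this skips the "#ID ATTRIBUTE_NAME ..." header and blank lines.
        if len(fields) < 9 or not fields[0].isdigit():
            continue
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[2]),
            "worst": int(fields[3]),
            "thresh": int(fields[4]),
            "when_failed": fields[7],
            "raw": fields[8],  # kept as text, raw formats vary between drives
        })
    return rows


def failing(rows):
    """Names of attributes whose normalized value is at or below the threshold."""
    return [r["name"] for r in rows if r["value"] <= r["thresh"]]


SAMPLE = """\
  5 Reallocated_Sector_Ct   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          039   039   050    Old_age   Always   FAILING_NOW 44802
193 Load_Cycle_Count        001   001   050    Old_age   Always   FAILING_NOW 2206333
"""

rows = parse_attributes(SAMPLE)
print(failing(rows))
```

The comparison mirrors how SMART itself decides: an attribute counts as failed once its normalized VALUE drops to or below THRESH, regardless of how harmless the raw number looks.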

LCC

The Load Cycle Count shows how many times the hard drive has parked (unloaded) its heads. This usually happens when you power off the PC or put it into standby. In the case of WD drives, the firmware can also park the heads on its own after a defined idle time has passed. Doing so saves some power and also keeps the drive cooler. Western Digital calls this technology IntelliPark.

The downside? When the heads are parked and unparked constantly, the parts involved (including the heads themselves) can wear out. In fact, according to the WD spec sheet, Green drives are rated for 300,000 load/unload cycles. A drive that hits that limit is at the end of its life and its reliability can no longer be guaranteed. This hard drive has been working for about 5 years and has recorded more than 2.2 million load cycles. At roughly 50 head unloads every single hour, it likely hit the limit after only about 8 months of running!
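The estimate can be reproduced from the two raw values in the SMART output. A minimal Python sketch, assuming the drive was powered on around the clock and the load cycles were spread evenly over its life; the variable names are mine.

```python
power_on_hours = 44802    # raw value of attribute 9 (Power_On_Hours)
load_cycles = 2206333     # raw value of attribute 193 (Load_Cycle_Count)
rated_cycles = 300_000    # load/unload rating quoted for Green drives

cycles_per_hour = load_cycles / power_on_hours
hours_to_limit = rated_cycles / cycles_per_hour
days_to_limit = hours_to_limit / 24
months_to_limit = days_to_limit / 30.44  # average month length, an approximation

print(f"{cycles_per_hour:.1f} load cycles per power-on hour")
print(f"rating used up after {hours_to_limit:.0f} power-on hours, "
      f"about {months_to_limit:.1f} months of continuous operation")
```

A machine that is switched off at night would take longer in calendar time, but the number of power-on hours needed stays the same.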

This issue has been brought up over and over and over, again and again.

Western Digital, however, does not want to officially admit that a poor decision led to this. They mainly blame “utilities, operating systems, and applications” instead.

The reason for this behavior lies in their amazing decision to ship the initial WD Green series with a default idle time of 8 seconds. That’s it. 8 seconds of no activity and the heads are parked. This of course makes the drive amazingly terrible for… well… anything. Not only does it make the drive slower, since unparking the heads takes time, it also creates a park/unpark loop as soon as a program accesses the drive every 9 seconds or so.
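To see how quickly an 8 second timer and a slightly longer access interval add up, here is a toy model in Python. It is a deliberate simplification and not how the firmware is implemented: it assumes perfectly periodic accesses, one park per idle gap, and a hypothetical function name of my own.

```python
def load_cycles(idle_timeout_s, access_interval_s, duration_s):
    """Count head parks when the drive is touched every access_interval_s seconds.

    Simplified model: the heads park once the drive has been idle for
    idle_timeout_s seconds and unpark again on the next access.
    """
    if access_interval_s <= idle_timeout_s:
        return 0  # the timer never expires between two accesses
    # One park (followed by one unpark) per gap between accesses.
    return duration_s // access_interval_s


HOUR = 3600
print(load_cycles(8, 9, HOUR))    # 8 s timer, something touches the disk every 9 s
print(load_cycles(300, 9, HOUR))  # 300 s timer, same access pattern
```

In this worst case the 8 second timer produces 400 load cycles per hour, while a 300 second timer produces none for the same workload. The drive above averaged about 49 per hour, so its real workload was far milder than the worst case and still did the damage.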

The Solution

Luckily, the solution is a rather simple one and is offered as a last resort in the same WD knowledge base. It does, however, require manual intervention. Even in newer WD Green HDDs, high LCC is still an issue, as the default idle time is set to 12 seconds. It is possible to adjust the idle time in the WD drive’s firmware by issuing a specific SMART command. WD’s official support page has a program just for that, which should work on virtually all WD products (even though the knowledge base article only targets some specific models). The file is called wdidle3.exe and can be downloaded from here. “Idle 3” is what they call this timer, and the program lets you adjust it. Note, however, that it must be run under DOS: make a bootable FreeDOS flash drive or CD/DVD, put wdidle3.exe on it and boot from it. Also, if you’re using a RAID system, you will probably need to disable it temporarily to get raw access to each HDD.

NOTE: These commands affect all connected and recognized WD hard drives. Non-WD HDDs are supposed to be recognized and reported, yet not affected by the program.

wdidle3.exe /R
reports the current Idle 3 timer of each drive.

wdidle3.exe /S300
sets the Idle 3 timer to 300 seconds (the maximum on most recent models).

wdidle3.exe /D
disables the Idle 3 timer entirely. Note, however, that there have been reports of issues after disabling Idle 3 on hard drives that shipped with it enabled.
Also, some older WD Blue drives come with the Idle 3 timer disabled. Some newer ones, however, seem to be set to the maximum of 300 seconds.

If you are a Linux/Unix user, idle3-tools is an open source alternative to the commercial WD software and does not require booting into DOS. On Debian-based systems it can be installed simply by running sudo apt-get install idle3-tools in a terminal.
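One thing to watch out for: as far as I understand the idle3-tools documentation, its idle3ctl tool works with the raw timer byte rather than with seconds. The Python sketch below shows the conversion as I understand it (0 means disabled, 1 to 128 count tenths of a second, 129 to 255 count 30 second steps). Treat that encoding as an assumption and check the idle3ctl manual for your drive before writing anything; the function name is mine.

```python
def idle3_raw_to_seconds(raw):
    """Convert a raw Idle 3 timer byte to seconds (None means disabled).

    Assumed encoding, based on my reading of the idle3-tools docs:
    0 = disabled, 1..128 = tenths of a second, 129..255 = 30 s steps.
    """
    if not 0 <= raw <= 255:
        raise ValueError("raw timer value must fit in one byte")
    if raw == 0:
        return None
    if raw <= 128:
        return raw / 10
    return (raw - 128) * 30


for raw in (80, 120, 138):
    print(raw, idle3_raw_to_seconds(raw))
```

Under this assumption, raw values 80 and 120 correspond to the 8 and 12 second defaults mentioned earlier, and 138 corresponds to the 300 second maximum.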

Do not forget to power cycle your system after applying the settings.

Hamy
a sysadmin in the wind