Thursday, 18 August 2016

WARNING: hpasmd: System Overheating (Zone 4, Location CPU, Temperature 85C) - SOLVED and EXPLAINED

If you find this message in your server log, you are probably looking for the reason why your HP server sometimes reboots itself for no apparent reason.

WARNING: hpasmd: System Overheating (Zone 4, Location CPU, Temperature 85C)

So the reason is overheating! As you know, every machine has an operating temperature range. The hardware manufacturer always states it in that boring PDF that you never read. If the temperature goes outside that range, the machine can malfunction. To avoid that, some manufacturers build in protection mechanisms. In the case of HP servers, that mechanism is a server reboot.

Reasons for system overheating on HP servers

The reasons for system overheating vary, from very high load on a particular CPU to too much dust blocking the air flow through the fans. Overheating can happen once every few months or every day. It all depends.

You have to rule out the possible causes one by one. Monitor the temperatures and try to correlate them with CPU load. HP has a tool called hpasm that lets you monitor various things. For temperatures, enter the hpasm CLI and use show temp:

hpasmcli> show temp
Sensor   Location        Temp       Threshold
------   --------        ----       ---------
#1       CPU#1           46C/114F   85C/185F
#2       CPU#2           43C/109F   85C/185F
#3       CPU#3           48C/118F   85C/185F
#4       CPU#4           50C/122F   85C/185F
#5       I/O_ZONE        37C/98F    60C/140F
#6       AMBIENT         22C/71F    40C/104F
#7       SYSTEM_BD       39C/102F   60C/140F
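
If you do not want to read that table by eye every time, the check can be scripted. The following is only a minimal sketch, not an HP tool: the function name hot_sensors and the default margin of 10 degrees are made up for illustration, and it assumes the output keeps the four-column layout shown above. It reads show temp output on standard input and prints every sensor whose reading is within the margin of its threshold.

```shell
#!/bin/sh
# hot_sensors [MARGIN]  (hypothetical helper, not part of hpasm)
# Reads "show temp" output on stdin and prints each sensor whose Celsius
# reading is within MARGIN degrees of its threshold (default: 10).
hot_sensors() {
    margin="${1:-10}"
    awk -v margin="$margin" '
        # Data rows start with a sensor number such as #1
        substr($1, 1, 1) == "#" {
            temp = $3; limit = $4
            sub(/C.*/, "", temp)     # "46C/114F" -> "46"
            sub(/C.*/, "", limit)    # "85C/185F" -> "85"
            # Skip rows without numeric readings
            if (temp ~ /^[0-9]+$/ && limit ~ /^[0-9]+$/ && limit - temp <= margin)
                printf "%s %s %dC (threshold %dC)\n", $1, $2, temp, limit
        }'
}

# Possible use on a live system (assumes hpasmcli is installed, run as root):
#   hpasmcli -s "show temp" | hot_sensors 10
```

With the readings from the table above nothing would be printed, because every sensor is more than 10 degrees below its threshold.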

If you have ruled out high load as the cause of the overheating, then it's probably an air flow problem. How to fix this? Shut down the server, open it and start the vacuum cleaner. :) No, I am not joking. You may find something like this:

On the left is the CPU heat sink before vacuum cleaning and on the right is the CPU heat sink after vacuum cleaning. As you can see, there was a wall of dust blocking normal air flow. This was the reason for the system overheating and the server reboots.

In case you still think that I am joking, take a look at this time graph of CPU temperatures.

As you can see, there is a significant drop in CPU temperatures after dust removal!
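
To get data for a graph like that, and to correlate temperature with CPU load, the readings have to be logged somewhere over time. Here is a rough sketch of one way to do it. The function name temp_csv, the CSV column order and the log file path are all assumptions for illustration, and it assumes the same show temp layout as earlier.

```shell
#!/bin/sh
# temp_csv STAMP LOAD  (hypothetical helper)
# Reads "show temp" output on stdin and prints one CSV row per sensor:
#   timestamp,load,sensor,location,temp_celsius
temp_csv() {
    stamp="$1"; load="$2"
    awk -v stamp="$stamp" -v load="$load" '
        substr($1, 1, 1) == "#" {
            temp = $3
            sub(/C.*/, "", temp)     # keep only the Celsius number
            if (temp ~ /^[0-9]+$/)
                printf "%s,%s,%s,%s,%s\n", stamp, load, $1, $2, temp
        }'
}

# Possible cron job body (hpasmcli must be installed; the log path is made up):
#   hpasmcli -s "show temp" \
#     | temp_csv "$(date +%Y-%m-%dT%H:%M:%S)" "$(cut -d' ' -f1 /proc/loadavg)" \
#     >> /var/log/hp-temps.csv
```

The load value here is the 1 minute load average from /proc/loadavg, which is a Linux-specific choice.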

The point is: keep your servers clean :)

BIOS update on HP Proliant DL 580 G4 - how to find the right (very old) BIOS ROM package, plus a few other tips and tricks

OK, OK, I know... It's 2016... And this is a server from 2007... 9 years... That's more than 7 years in Tibet :) Just kidding!

OK, so you have an old server that you need up and running, but you need to update its BIOS ROM. The BIOS ROM update procedure is pretty simple: you go to the manufacturer's web site or portal, find the necessary package, follow the instructions, and that's it. Basically it is, but... there are a few things you need to know before you go "OK, it's a 5 minute job". Good preparation can take days before something becomes a 5 minute job. Always keep that in mind.


For those who don't know, HP makes servers. And good ones! There are a few server classes, such as rack servers, blade servers, etc. Every year or two HP puts new servers on the market. I have been working with HP servers for 8+ years now. In case you are not familiar with HP naming, PROLIANT DL is the server line, 580 is the server type and G4 means Generation 4. Every few years HP puts out a new generation of servers. As of 2015 the current one is Generation 9. I hope you get the picture. G4 is very old!

So, to update the BIOS ROM you need to follow these steps:

The SCEXE components are self-extracting executable files. The SCEXE file unpacks itself, flashes the ROM, and cleans up.

To flash a ROM:
Download the file to the target server.

Execute sh CPxxxxxx.scexe where CPxxxxxx.scexe represents the filename of the component.


By the way, there are two types of update files: .exe and .scexe. The .exe is for Windows environments and the .scexe is for Linux environments.
As you can see, there is a small padlock next to Obtain software. This means that the package is not "free" to download. When you press Obtain software you will get this pop-up window:

So let's suppose that you don't have this. How do you get this file, or any other firmware update file for that particular old server? The answer is the Firmware update DVD. You can find one for every HP server. Just download it. It's an ISO file. You can burn the .iso file to a DVD, boot from it and update the BIOS ROM or some other firmware. Or you can mount the .iso file and find the one package that you need.

Easy? It sounds easy, but... as I mentioned in the first part of this post, good preparation is what makes a BIOS ROM update a 5 minute job. Why? Because on the HP website for the HP DL 580 G4 you can download Firmware update DVD 10.10. It would be perfectly logical to find the BIOS ROM package CP009618.scexe there. But... there is no such file on that .iso.

[root@]# mount -t iso9660 FW1010.2012_0530.49.iso /mnt/ -o loop
[root@]# cd /mnt/
[root@]# find . -name CP009618.scexe

Why? My guess is that HP has to keep up with new hardware, and one way to do that is to reduce support for old hardware.

So... now what?

If you look closely, there is a date in the version. The file CP009618.scexe is listed as version 2008.06.10 (2 Sep 2008). My idea was to find a Firmware update DVD from around 2008 and look for the file I needed there. And....

[root]# mount -t iso9660 FW840.2009_0209.17.iso /mnt/ -o loop
[root@]# cd /mnt/
[root@]# find . -name CP009618.scexe

This time the search is on Firmware update DVD 8.4.
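
If you have several Firmware update DVD images lying around, the mount-and-find routine can be repeated in a loop. This is just a sketch under assumptions: the function names find_package and search_isos are invented here, /mnt is used as the mount point as in the commands above, and mounting needs root.

```shell
#!/bin/sh
# find_package DIR NAME  (hypothetical helper)
# Prints the path of every file called NAME below DIR.
find_package() {
    find "$1" -type f -name "$2" 2>/dev/null
}

# search_isos NAME ISO...  (hypothetical helper, needs root for mount)
# Mounts each ISO read-only on /mnt and reports the ones containing NAME.
search_isos() {
    pkg="$1"; shift
    for iso in "$@"; do
        mount -t iso9660 -o loop,ro "$iso" /mnt || continue
        if [ -n "$(find_package /mnt "$pkg")" ]; then
            echo "$pkg found in $iso"
        fi
        umount /mnt
    done
}

# Possible use with the two images from this post:
#   search_isos CP009618.scexe FW1010.2012_0530.49.iso FW840.2009_0209.17.iso
```

An image that fails to mount is skipped rather than stopping the whole search.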

There it was... The package for the BIOS ROM update for HP Proliant DL 580 G4!
From here on it's 5 minutes of work.
Copy the file to the target server and follow the instructions.

[root]# sh CP009618.scexe
Online ROM Flash Engine Version: Linux-3.5.0-0
Name: HP ProLiant DL580 G4 (P59)
Software Version: 06/10/2008

The software is installed but is not up to date.

Current Version: 08/10/2007

Do you want to upgrade the software to a newer version (y/n) ?y

Flash in progress do not interrupt or your system may become unusable.
The installation procedure completed successfully.

A reboot is required to finish the installation completely.
Do you want to reboot your system now? yes

For the BIOS ROM update to take effect, you have to reboot the server.
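
The flash engine compares two dates in MM/DD/YYYY form (06/10/2008 against 08/10/2007 in the run above). The same comparison can be done by hand, either before flashing to see whether an update is needed, or after the reboot to confirm it took effect. A small sketch follows. The function names are invented for illustration, and it is an assumption that dmidecode on your server reports the ROM date in the same MM/DD/YYYY format.

```shell
#!/bin/sh
# to_sortable MM/DD/YYYY  (hypothetical helper)
# Prints the date as YYYYMMDD so two dates compare as plain integers.
to_sortable() {
    echo "$1" | awk -F/ '{ printf "%04d%02d%02d\n", $3, $1, $2 }'
}

# is_newer PACKAGE_DATE INSTALLED_DATE  (hypothetical helper)
# Exit status 0 when the package date is later than the installed date.
is_newer() {
    [ "$(to_sortable "$1")" -gt "$(to_sortable "$2")" ]
}

# Possible use (run as root; the date format is assumed to match):
#   installed=$(dmidecode -s bios-release-date)
#   if is_newer "06/10/2008" "$installed"; then
#       echo "ROM $installed is older than the package"
#   else
#       echo "ROM $installed is up to date"
#   fi
```

Comparing the dates as strings would not work here, because MM/DD/YYYY does not sort chronologically.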