Tuesday, 22 September 2015

High CPU load because of ksoftirqd processes caused by LEAP second issue - SOLVED

 High CPU, ksoftirqd, LEAP second - SOLVED EXPLAINED

If your server starts to act funny (slow response times, high CPU load, and so on), the first thing you do is run the top command. If your top output looks something like this

[Screenshot: top output on Linux showing very high CPU load]

you know that something strange is going on. The CPU load is very, very high! My first guess was that something was wrong with the Java application running on this server. But when I noticed that other servers had the same issue (very high CPU load), I knew that something unusual was happening, because those servers were in different locations, on different platforms (hardware, virtual), ran different OSes (but all Linux), and ran different applications (DB, Java, etc.). I also noticed these ksoftirqd processes. They did not use much CPU themselves, but it was strange that they were so high in the top CPU list.

[Screenshot: high CPU load because of ksoftirqd processes]

So all the servers with issues had the same problem: the ksoftirqd process was causing this! What is ksoftirqd?

ksoftirqd is a per-cpu kernel thread that runs when the machine is under heavy soft-interrupt load. Soft interrupts are normally serviced on return from a hard interrupt, but it's possible for soft interrupts to be triggered more quickly than they can be serviced.

So these soft interrupts were, for some reason, causing other processes (like the java process) to use too much CPU.
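One way to see whether soft interrupts really are firing at an unusual rate is to look at the kernel's counters in /proc/softirqs. Below is a minimal sketch, assuming a Linux system with awk. The helper name softirq_total is made up for this example, and the post does not establish which softirq type matters in the leap second case, so HRTIMER is only an illustration.

```shell
# Sum the per-CPU counters of one softirq type.
# Reads /proc/softirqs-style text on stdin; the type name (e.g. HRTIMER)
# is the first argument. softirq_total is a hypothetical helper name.
softirq_total() {
    awk -v name="$1:" '
        $1 == name {
            total = 0
            for (i = 2; i <= NF; i++) total += $i   # one column per CPU
            print total
        }'
}

# Example use on a live system (take two samples and compare them):
#   softirq_total HRTIMER < /proc/softirqs; sleep 5
#   softirq_total HRTIMER < /proc/softirqs
```

If the difference between the two samples is much larger on the affected server than on a healthy one, that would support the idea that soft interrupts are being raised faster than usual.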

What caused this? And why? To make things worse, I rebooted one server with high CPU load because of ksoftirqd processes, but this did not help!


After a few hours of searching the Internet for a possible cause, I remembered seeing a news item that mentioned that "tonight" (the night before the problem started) there was going to be one extra second, that this is normal, and that an extra second is added every few years. I continued to search for a solution, and then on one forum I read someone mentioning a "leap" second. What is a leap second?

From Wikipedia:
A leap second is a one-second adjustment that is occasionally applied to Coordinated Universal Time (UTC) in order to keep its time of day close to the mean solar time, or UT1. Without such a correction, time reckoned by Earth's rotation drifts away from atomic time because of irregularities in the Earth's rate of rotation. 
 
The NTP packet includes a leap second flag, which informs the user that a leap second is imminent. This, among other things, allows the user to distinguish between a bad measurement that should be ignored and a genuine leap second that should be followed. It has been reported that never, since the monitoring began in 2008 and whether or not a leap second should be inserted, have all NTP servers correctly set their flags on a December 31 or June 30. This is one reason many NTP servers broadcast the wrong time for up to a day after a leap second insertion.
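The leap second flag mentioned in the quote can be inspected on a host that runs ntpd, because ntpq -c rv prints it as a leap= variable. The sketch below assumes that output format (two binary digits after leap=). The helper names leap_indicator and describe_leap are invented for this example, and the descriptions follow the usual meaning of the NTP leap indicator bits.

```shell
# Extract the leap= value from ntpq-style output on stdin.
# leap_indicator is a hypothetical helper name.
leap_indicator() {
    sed -n 's/.*leap=\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Translate the two-bit leap indicator into words.
describe_leap() {
    case "$1" in
        00) echo "no leap second pending" ;;
        01) echo "leap second insertion announced" ;;
        10) echo "leap second deletion announced" ;;
        11) echo "clock not synchronized" ;;
        *)  echo "unknown" ;;
    esac
}

# Example use on a host running ntpd:
#   describe_leap "$(ntpq -c rv | leap_indicator)"
```

Running this on the day before a leap second would show whether the server has been told about it at all.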

So I started to search for a solution to this leap second issue as a possible cause of my problem, because it was 1.7.2015 (1 July 2015), the day after the leap second was inserted.
How do you solve this leap second issue that is causing high CPU load because of ksoftirqd processes?

# date -s now

As soon as I executed this, the CPU load started to drop! Within 2 minutes the CPU load was back to normal.
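Before resetting the clock on many servers, it can be worth confirming that the kernel actually handled a leap second. The sketch below searches kernel log text for the phrase "leap second". Treat the exact wording of the kernel message as an assumption; the helper name leap_second_logged is hypothetical.

```shell
# Succeed if the kernel-log text on stdin mentions a leap second.
# leap_second_logged is a hypothetical helper name.
leap_second_logged() {
    grep -qi 'leap second'
}

# Example use on an affected host (date -s needs root):
#   dmesg | leap_second_logged && date -s now
```

The date -s now part is the same command as above; the check only decides whether to run it.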
