High CPU, ksoftirqd, LEAP second - SOLVED EXPLAINED
If your server starts to act funny (slow response times, high CPU load, etc.), the first thing you do is run the top command. If your top output looks something like this
you know that something strange is going on. The CPU load is very, very high! My first guess was that something was wrong with the java application running on this server. But when I noticed that other servers had the same issue (very high CPU load), I knew that something strange was happening, because those servers were in different locations, on different platforms (hardware, virtual), with different OSes (but all Linux), and running different applications (DB, java, etc.). I also noticed these ksoftirqd processes. They did not use that much CPU, but it was strange that they were so high in the top CPU list.
High CPU load because of ksoftirqd processes
So all the servers with issues had the same problem: the ksoftirqd processes were causing this! What is ksoftirqd?
ksoftirqd is a per-cpu kernel thread that runs
when the machine is under heavy soft-interrupt load. Soft interrupts
are normally serviced on return from a hard interrupt, but it's possible
for soft interrupts to be triggered more quickly than they can be
serviced.
So, for some reason, these soft interrupts were causing other processes (like the java process) to use too much CPU.
What caused this? And why? To make matters worse, I rebooted one server with high CPU load caused by the ksoftirqd processes, but this did not help!
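If you want to see which kind of soft interrupt is busy, Linux exposes per-CPU counters in /proc/softirqs. Here is a minimal sketch that sums each softirq type across all CPUs. The function name `softirq_totals` is my own invention, and the optional file argument is only there so the function can be tried on a saved copy of the file.

```shell
# Sum each softirq type across all CPUs.
# Reads /proc/softirqs unless another file is given as $1.
softirq_totals() {
    awk 'NR > 1 {
        # NR == 1 is the "CPU0 CPU1 ..." header line, so skip it.
        name = $1
        sub(/:$/, "", name)          # "TIMER:" -> "TIMER"
        total = 0
        for (i = 2; i <= NF; i++) total += $i
        printf "%s %.0f\n", name, total
    }' "${1:-/proc/softirqs}"
}

# Example: show the three softirq types with the highest counts.
#   softirq_totals | sort -k2 -n -r | head -n 3
```

The counters are cumulative since boot, so running it twice a few seconds apart and comparing the numbers tells you more than a single snapshot.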
High CPU load because of ksoftirqd processes caused by LEAP second issue - SOLVED
After a few hours of searching the Internet for a possible cause, I remembered seeing a news item mentioning that "tonight" (the night before the problem started) one extra second was going to be added, that this is normal, and that an extra second is added every few years. I continued to search for a solution, and then on one forum I read someone mentioning a "leap" second. What is a leap second?
From Wikipedia:
A leap second is a one-second adjustment that is occasionally applied to Coordinated Universal Time (UTC) in order to keep its time of day close to the mean solar time, or UT1. Without such a correction, time reckoned by Earth's rotation drifts away from atomic time because of irregularities in the Earth's rate of rotation.
The NTP packet includes a leap second flag, which informs the user that a
leap second is imminent. This, among other things, allows the user to
distinguish between a bad measurement that should be ignored and a
genuine leap second that should be followed. It has been reported that
never, since the monitoring began in 2008 and whether or not a leap
second should be inserted, have all NTP servers correctly set their
flags on a December 31 or June 30. This is one reason many NTP servers broadcast the wrong time for up to a day after a leap second insertion.
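To make the leap second flag a bit more concrete: in the NTP packet format (RFC 5905) the leap indicator is the top two bits of the first header byte. The sketch below decodes it. The function name `ntp_leap_indicator` is made up for illustration, and it assumes you already have the first byte of a packet, for example from a packet capture.

```shell
# Decode the NTP leap indicator from the first byte of an NTP packet.
# $1 = first header byte, decimal or 0x-prefixed hex (e.g. 0x64).
ntp_leap_indicator() {
    first_byte=$(( $1 ))
    # The leap indicator is the two most significant bits of that byte.
    case $(( (first_byte >> 6) & 3 )) in
        0) echo "no warning" ;;
        1) echo "insert: last minute of the day has 61 seconds" ;;
        2) echo "delete: last minute of the day has 59 seconds" ;;
        3) echo "alarm: clock not synchronized" ;;
    esac
}

# Example: 0x64 is binary 01 100 100 (leap indicator 1, version 4, mode 4).
#   ntp_leap_indicator 0x64
```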
So I started to search for a fix for this leap second issue as a possible cause of my problem, because it was July 1, 2015, the day after a leap second was inserted.
How do you get rid of the high CPU load that the leap second causes through the ksoftirqd processes? Set the system clock, as root:
# date -s now
As soon as I executed this, the CPU load started to drop! Within 2 minutes the CPU load was back to normal.
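If you have many servers to fix, you may not want to sit and watch top on each one. Here is a rough sketch that reads the 1-minute load average from /proc/loadavg and checks it against a threshold. The function names `load_1min` and `load_is_below` are hypothetical, and the threshold is something you have to pick for your own machines.

```shell
# Print the 1-minute load average.
# Reads /proc/loadavg unless another file is given as $1.
load_1min() {
    cut -d ' ' -f 1 "${1:-/proc/loadavg}"
}

# Succeed (exit status 0) if the 1-minute load is below the threshold.
# $1 = threshold, $2 = optional file to read instead of /proc/loadavg.
load_is_below() {
    awk -v l="$(load_1min "$2")" -v t="$1" 'BEGIN { exit !(l + 0 < t + 0) }'
}

# Example: after resetting the clock, wait until the load drops below 2.
#   while ! load_is_below 2; do sleep 10; done
```

Keep in mind that the 1-minute load average is smoothed, so it lags a little behind what top shows.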