Tuesday, 22 September 2015

High CPU load because of ksoftirqd processes caused by the leap second issue - SOLVED

High CPU, ksoftirqd, leap second - SOLVED, EXPLAINED

If your server starts to act funny (slow response times, high CPU load, and so on), the first thing to do is run the top command. If your top output looks something like this

[Image: leap second Linux CPU load]

you know that something strange is going on. The CPU load is very, very high! My first guess was that something was wrong with the Java application running on this server. But when I noticed that other servers had the same issue (very high CPU load), I knew that something stranger was happening, because those servers were in different locations, on different platforms (hardware, virtual), ran different OSes (but all Linux), and ran different applications (DB, Java, etc.). I also noticed these ksoftirqd processes. They did not use that much CPU themselves, but it was strange to see them so high in the top CPU list.

High CPU load because of ksoftirqd processes

So all the affected servers had the same problem: the ksoftirqd processes were causing this! What is ksoftirqd?

ksoftirqd is a per-cpu kernel thread that runs when the machine is under heavy soft-interrupt load. Soft interrupts are normally serviced on return from a hard interrupt, but it's possible for soft interrupts to be triggered more quickly than they can be serviced.
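To see which kind of soft interrupt is keeping ksoftirqd busy, you can look at /proc/softirqs, which lists one counter per softirq type and per CPU. The function below is only a small sketch (the name softirq_totals is my own, not a standard tool): it adds up the per-CPU columns so that each softirq type gets a single total.

```shell
# Sum each softirq type across all CPUs.
# Usage: softirq_totals [file]   (defaults to /proc/softirqs)
softirq_totals() {
    awk 'NR > 1 {                      # line 1 is the CPU0 CPU1 ... header
        name = $1
        sub(/:$/, "", name)            # "TIMER:" -> "TIMER"
        total = 0
        for (i = 2; i <= NF; i++) total += $i
        printf "%s %.0f\n", name, total
    }' "${1:-/proc/softirqs}"
}
```

Running softirq_totals twice, a few seconds apart, and comparing the numbers shows which counter is growing fastest.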

So these soft interrupts were, for some reason, causing other processes (like the Java process) to use too much CPU.

What caused this? And why? To make things worse, I rebooted one of the servers with high CPU load caused by ksoftirqd processes, but this did not help!


After a few hours of searching the Internet for a possible cause, I remembered seeing a news item mentioning that "tonight" (the night before the problem started) one extra second was going to be added, that this is normal, and that an extra second is added every few years. I continued to search for a solution, and then on one forum I read someone mentioning a "leap" second. What is a leap second?

From Wikipedia:
A leap second is a one-second adjustment that is occasionally applied to Coordinated Universal Time (UTC) in order to keep its time of day close to the mean solar time, or UT1. Without such a correction, time reckoned by Earth's rotation drifts away from atomic time because of irregularities in the Earth's rate of rotation. 
 
The NTP packet includes a leap second flag, which informs the user that a leap second is imminent. This, among other things, allows the user to distinguish between a bad measurement that should be ignored and a genuine leap second that should be followed. It has been reported that never, since the monitoring began in 2008 and whether or not a leap second should be inserted, have all NTP servers correctly set their flags on a December 31 or June 30. This is one reason many NTP servers broadcast the wrong time for up to a day after a leap second insertion.

So I started to search for a fix for this leap second issue as the possible cause of my problem, because the date was 1.7.2015 (1 July 2015), the day after the leap second was inserted.
How do you fix the leap second issue that causes high CPU load with ksoftirqd processes?

# date -s now

As soon as I executed this, the CPU load started to drop! Within 2 minutes the CPU load was back to normal.
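Before resetting the clock on many servers, it can help to confirm that the kernel really handled a leap second. The sketch below assumes that the kernel log contains the words "leap second" when one is inserted (on the kernels I know of the line reads something like "Clock: inserting leap second 23:59:60 UTC", but treat the exact wording as an assumption). The function name mentions_leap_second is my own.

```shell
# Succeed if the text on stdin mentions a leap second (case-insensitive).
# Assumption: the kernel log uses the words "leap second" for this event.
mentions_leap_second() {
    grep -qi 'leap second'
}
```

For example, as root: dmesg | mentions_leap_second && date -s now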

Wednesday, 5 August 2015

CRON job not running marked as UNSAFE - SOLVED

Yesterday I came across a strange problem. I created a new user on a server. Let's call it user12_ABCDE. This user should periodically run a certain script for a simple FTP transfer. Of course, crontab is used. When I run the script manually, it runs perfectly! When I put it in crontab, nothing happens! I checked /var/log/messages and found this:

Aug  4 16:01:01 server /usr/sbin/cron[27897]: (user12_ABCDE) UNSAFE (user12_ABCDE)

So... I started to google it! In lots of places you can read that this is usually a permission issue. I checked the permissions, but everything was OK.
All other users could run their scripts from crontab with no problems. The UNSAFE error message appeared only for user12_ABCDE.

After some time I ran into this sentence about cron and UNSAFE:

"Some O/S restrict the range of characters in a username - some don't."
Source: http://compgroups.net/comp.unix.admin/cron-fails-with-unsafe-in-log/51152#sthash.mUlmf92A.dpuf

This put a bug in my ear...
I compared all the other usernames with the new one. All of them use only lowercase letters and underscores, except this new user, which has capital letters in its name. So... I created a new user called user12_abcde and tried to run its crontab. It ran with no problem! Hm... So this cron job was not running and was marked as UNSAFE only because I had capital letters in my username.
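To avoid this trap when creating users, the name can be checked before the account is made. The function below is only a sketch: the name is_conservative_username and the allowed pattern (lowercase letters, digits, underscore and hyphen, not starting with a digit or hyphen) are my own conservative assumption, not the exact rule that cron applies.

```shell
# Hypothetical helper: succeed only for names made of lowercase letters,
# digits, "_" and "-", starting with a lowercase letter or "_".
# LC_ALL=C keeps the a-z range from matching capitals in some locales.
is_conservative_username() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z_][a-z0-9_-]*$'
}
```

For example, is_conservative_username user12_abcde succeeds, while is_conservative_username user12_ABCDE fails.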

P.S.
I do not know whether this issue is fixed on newer Linux distros. I had this issue on SLES 10 with kernel 2.6.16.21-0.8-smp.
