I came across an obscure situation earlier this year with a new SharePoint farm that I had just designed and built. It was a fairly straightforward setup consisting of a dedicated SQL Server 2008 server, 2 x Web Front Ends using Windows NLB and 1 x Central Administration server, all running Windows Server 2008 R2 Enterprise in a virtual environment on VMware ESX.
So after building the SharePoint farm and restoring the Content Databases and Shared Services Provider, I thought to myself that everything was actually going to plan for once, until… I restarted one of the web front ends!
I then noticed that the server I had just restarted was taking an awfully long time to log in, often waiting 5-10 minutes before I would reach the logged-in desktop. My first area of troubleshooting was to launch Windows Task Manager, where I immediately found that the OWSTIMER.exe process was using 90-100% of the CPU without any drop. My first instinct was to kill the process, since by default Windows will restart it. Voilà! The OWSTIMER process was back to normal and the Windows server was back to its normal responsiveness.
My immediate thought was that this was most likely a one-off, and boy was I wrong! After restarting the server a second time later that day, I ran into the same issue.
My troubleshooting began, and the first area I ventured into was Central Administration / Operations / Timer job status and Timer job definitions to see if there were any jobs that were failing; however, things seemed to be in order. The next area of investigation was analysing the SharePoint logs (fun, fun, fun)… however, everything seemed to be in order there too.
Scratching my head, I ventured into the trusty Windows Event Viewer and analysed the Application and System logs, and there it was, hidden away. I noticed a date and time change from one event to the next around the time the machine was starting up.
E.g. one event would be dated 15/03/2010 and the next event would be dated 16/03/2010! This sudden change in date and time while the machine was starting its services was throwing OWSTIMER into a frenzy, with the culprit being VMware!
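Spotting that kind of jump by eye in Event Viewer is easy to miss, so here is a rough sketch of how you could look for it once the event timestamps have been exported in log order. The sample timestamps, the `find_time_jumps` name and the five-minute threshold are all my own assumptions for illustration; they are not taken from the actual logs.

```python
from datetime import datetime, timedelta

# Assumed export format: day/month/year, matching the dates quoted above.
TIMESTAMP_FORMAT = "%d/%m/%Y %H:%M:%S"


def find_time_jumps(raw_timestamps, threshold=timedelta(minutes=5)):
    """Return (previous, current) pairs of consecutive events whose
    timestamps differ by more than `threshold`, forwards or backwards."""
    parsed = [datetime.strptime(raw, TIMESTAMP_FORMAT) for raw in raw_timestamps]
    jumps = []
    for previous, current in zip(parsed, parsed[1:]):
        if abs(current - previous) > threshold:
            jumps.append((previous, current))
    return jumps


# Hypothetical events written during start-up, in the order they were logged.
events = [
    "15/03/2010 09:00:01",
    "15/03/2010 09:00:02",
    "16/03/2010 09:00:03",  # clock corrected after syncing with the domain
    "16/03/2010 09:00:04",
]

jumps = find_time_jumps(events)
for previous, current in jumps:
    print(f"Clock jumped from {previous} to {current}")
```

Comparing neighbouring events rather than each event against the current time means the check still works on old logs, and using the absolute difference catches a clock that jumps backwards as well as forwards.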
I had recently introduced a brand spanking new vSphere 4 ESX host into an existing cluster, and it had the incorrect date configured (in my case, it was running one day behind). In summary, my Windows Server 2008 R2 SharePoint front end server was picking up the VMware ESX host's time when it was starting up, and then a second later would synchronise with our Domain Controller and pick up the correct time. This sudden time shift (a full day's correction arriving within literally a single second) would cause the SharePoint Timer Service to have a fit and max out the processor.
I immediately corrected the date and time on the ESX host and I was back in business.
Please be aware that this may not be the answer to your OWSTIMER CPU issue, as I have come across plenty of posts about the same problem with differing causes.
I hope this post saves you some hours, because I spent plenty, including rebuilding the SharePoint farm in question.
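In hindsight, a quick comparison of each host's clock against a trusted reference before adding it to the cluster would have flagged the problem. The sketch below only shows the comparison itself: the host names and clock readings are hypothetical, the 30-second tolerance is an arbitrary assumption, and collecting the readings from each host is left out entirely.

```python
from datetime import datetime, timedelta


def hosts_out_of_sync(host_times, reference, tolerance=timedelta(seconds=30)):
    """Return {host: offset} for every host whose clock differs from
    `reference` by more than `tolerance`. A negative offset means the
    host is running behind the reference."""
    return {
        name: reading - reference
        for name, reading in host_times.items()
        if abs(reading - reference) > tolerance
    }


# Hypothetical readings taken at the same moment; "esx-new" is one day behind.
reference = datetime(2010, 3, 16, 9, 0, 0)
host_times = {
    "esx-01": datetime(2010, 3, 16, 9, 0, 2),
    "esx-02": datetime(2010, 3, 16, 8, 59, 58),
    "esx-new": datetime(2010, 3, 15, 9, 0, 0),
}

bad_hosts = hosts_out_of_sync(host_times, reference)
for name, offset in bad_hosts.items():
    print(f"{name} is off by {offset.total_seconds():.0f} seconds")
```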