How To Develop a Cloud Computing Error Logging Strategy
Error handling and trace logs are as old as the hills. Of course they're handled in the clouds, right? Keep laughing...
Tue, June 07, 2011
CIO — Error handling is such a pedestrian issue, you take it for granted. Same thing with transaction and trace logs — these are just so obvious. In a simple virtualization cloud (e.g., a MySQL server) you can take low-level error logging for granted at several layers of the software stack.
But that doesn't mean you can take persistence of those logs for granted. Your logs may be wiped out when the VM crashes if you haven't configured the VM properly, or if you haven't paid your cloud hosting provider to persist them. Without those server-side logs, your troubleshooting is set back several hours, if not days. In unattended systems, the lack of persistent logs becomes a serious issue, and there are good forensic reasons why you'll want to archive logs for long periods. So don't skimp here.
The situation is even more interesting with cloud-based applications. Some of them don't have any server-side logging at all, and even the best of them do full-scale logging only for a while, then turn it off without any explicit notification. The best of them have a good story for live debugging. But for reconstructing the crime scene of an error condition that occurred several hours ago...well, if you're lucky, you'll get an e-mail with the error stack trace. Goody.
So in the cloud (and particularly across clouds), you can't really count on server- (or service-) side error logging. Guess what: if you want to know what's really going on with server transactions, you'll have to write your own server-side error-throwing code. Generally speaking, you want to supplement the server cloud's native error logs with calls to a centralized error-logging service like Splunk, Exceptional, or syslog-based offerings. These services are cheap compared with the cost of an untrapped error.
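As a rough illustration of supplementing native logs with calls to a central collector, here is a minimal Python sketch using the standard library's syslog handler. The function names (build_error_logger, handle_request), the logger name, and the localhost:514 collector address are assumptions made for the example, not part of any particular vendor's API; a hosted service would normally supply its own endpoint or client library.

```python
import logging
import logging.handlers


def build_error_logger(name, handler=None, host="localhost", port=514):
    """Return a logger that forwards ERROR-and-above records to a collector.

    host/port are placeholder values for a syslog-compatible collector;
    pass your own handler to target a different service.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.ERROR)
    if handler is None:
        # SysLogHandler sends over UDP by default.
        handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


def handle_request(logger, work):
    """Run one unit of server-side work, logging any exception with its stack trace."""
    try:
        return work()
    except Exception:
        # logger.exception records the message plus the full traceback.
        logger.exception("request failed")
        raise  # re-raise so the caller still sees the error
```

Re-raising after logging keeps the error visible to the caller, so the central log supplements normal error handling rather than replacing it.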
Further, you need to develop a cloud logging strategy that focuses on the requester (or client) side for persisting the errors. Why do you need to develop this logging code?
• Only on the requester side can you know the application-level context under which the error occurred. On the server side, all the service knows for certain is a requester's URL/IP address, the call parameters, and a time stamp. Of course you could include a client-side thread identifier with each server call, but why not just do logging on both sides?
• Only the requester can log an error of "server not responding." And of course, only the requester can launch a retry or transaction-recovery strategy if the server's gone silent.
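The two points above can be sketched on the requester side roughly as follows. This is only an illustration under stated assumptions: call_with_logging, the send callable, the context dictionary, and the retry and delay defaults are all hypothetical names and values chosen for the example. The request identifier passed to send plays the role of the client-side identifier that could also be included with each server call.

```python
import logging
import time
import uuid

logger = logging.getLogger("client")


def call_with_logging(send, context, retries=3, delay=0.5, sleep=time.sleep):
    """Call send(request_id), logging application context on each failure.

    send    -- hypothetical callable that performs the remote call
    context -- application-level details only the requester knows
    """
    request_id = str(uuid.uuid4())  # same ID reused across all retries
    for attempt in range(1, retries + 1):
        try:
            return send(request_id)
        except (ConnectionError, TimeoutError) as exc:
            # Only the requester can record that the server went silent.
            logger.error(
                "server not responding: request_id=%s attempt=%d/%d "
                "context=%r error=%r",
                request_id, attempt, retries, context, exc,
            )
            if attempt == retries:
                raise  # out of retries; hand off to recovery code
            sleep(delay * attempt)  # simple linear backoff before retrying
```

A caller might pass something like {"screen": "checkout", "user": "u123"} as the context, so that the log entry explains what the application was doing when the server stopped answering.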