Error handling is such a pedestrian issue, you take it for granted. Same thing with transaction and trace logs \u2014 these are just so obvious. In a \n\nsimple virtualization cloud (e.g., a MySQL server) you can take low-level error logging for granted at several layers of the software stack.\n\n\nBut that doesn't mean you can take persistence of those logs for granted. Your logs may be wiped out if you didn't configure the VM properly, or if \n\nyou didn't pay your cloud hosting provider to persist the logs if the VM crashes. Without those server-side logs, your troubleshooting is set back \n\nseveral hours, if not days. In unattended systems, the lack of persistent logs becomes a serious issue, and there are good forensic reasons why you'll \n\nwant to archive logs for long periods. So don't skimp here.\n\n\nThe situation is even more interesting with cloud-based applications. Some of them don't have any server side logging at all, and even the best of \n\nthem do full-scale logging only for a while \u2014 turning the logging off without any explicit notification. The best of them have a good story for \n\ndebugging. But for reconstructing the crime of error conditions that occurred several hours ago...well, if you're lucky, you'll get an e-mail with the error \n\nstack trace. Goody.\n\n\nSo in the cloud (and particularly across clouds), you can't really count on server- (or service-) side error logging. Guess what: if you want to know \n\nwhat's really going on with server transactions, you'll have to write your own server-side error-throwing code. Generally speaking, you want to \n\nsupplement the server cloud's native error logs with calls to a centralized error-logging service like Splunk, Exceptional, or SysLog based offerings. \n\nThese things are cheap in comparison to an untrapped error.\n\n\nFurther, you need to develop a cloud logging strategy that focuses on the requester (or client) side for persisting the errors. Why do you need to \n\ndevelop this logging code?\n\n\n\u2022 Only on the requester side can you know the application-level context under which the error occurred. On the server side, all the service \n\nknows for certain is a requester's URL\/IP address, the call parameters, and a time stamp. Of course you could include a client-side thread identifier with \n\neach server call, but why not just do logging on both sides?\n\n\n\u2022 Only the requester can log an error of "server not responding." And of course, only the requester can launch a retry or transaction-recovery \n\nstrategy if the server's gone silent.\n\n\n\u2022 The requester knows (or can determine) exactly which transaction(s), user(s), and record(s) are affected by the error. The requester's error-\n\nhandling code may be able to execute a strategy that satisfies the application's needs (e.g., by putting the transaction in an "on hold" state) without the \n\nserver response. At the very least it can put the application data in a self-consistent state, so that the server error doesn't propagate into an \n\napplication-wide crash. Further, the requester can put as much transaction context and history as it wants to into local log files, to speed \n\ntroubleshooting and recovery.\n\n\nUnfortunately, client-side error handling and logging seem to be a "new requirement" for many developers. Just like code comments and \n\ndocumentation. Developers focus on satisfying the positive case, rather than dealing with the inevitable negative case (or even the null case of \n\n"nothing happened"). Cloud application developers need to be encouraged to develop better habits of error handling, since they often can't depend \n\non the server logs for the dirty work.\n\n\nIn developing the requester \/ client side error management, it's a good idea to think about where the logs should be kept. The traditional "throw a \n\nmessage into the text file" strategy doesn't work all that well in a virtualized environment, as that text file may be on another VM, with a file location \n\nthat might be tough to track down a month or so later. (And yes, your strategy needs to assume that you may not know about the need to \n\ntroubleshoot\/debug for several months after the fact, so the error logs will need to support some level of forensics.)\n\n\nWhat about storing the detailed error information in with the record that was involved? With a big stack trace (think J2EE), this can gobble up room. \n\nBut disk space is disk space, and a few BLObs won't really slow down your database much. The problem with this strategy is "what happens on the \n\nsecond (perhaps related) error involving a single record?" Your developers are smart enough to figure out how to archive past errors \u2014 give it \n\nto them as a programming challenge.\n\n\nA final area of error logging is notification: how is someone supposed to know to go looking for a problem? Many cloud environments don't have \n\na strong feature set for this, which means either a continuing accumulation of error conditions or a periodic chore for your admins and developers. \n\nWhether you want to fire off a message to a Twitter feed or turn on the USB port that powers on lava lamps, appropriate real-time notification is the \n\nfirst step toward error containment and rapid recovery.