
Use log analysis tools to help you find problems in your data center

Log analysis tools do the heavy lifting when it comes to identifying the root cause of issues occurring in your data center, but they come with their own challenges.

For as long as hardware and software have been going wrong, we've had logs to help us figure out what happened. Logs rarely spell everything out, but they do indicate where a problem started and what might have caused it. This IT staple has remained largely unchanged for many years; the applications themselves, however, have not. They have grown steadily in size and complexity, and small log files have given way to one or more massive ones. In addition, the software stack has moved from a single-server application to a multi-tier environment with many moving, interdependent parts. Given that size and those dependencies, tracking down a problem is an exhausting process, unless, of course, you take advantage of log analysis tools.

Troubleshooting manually

Without log analysis tools, you would first have to work out which server holds the log you need, and then which log file on that server might contain the relevant entries. Only then could you open the file and start searching for clues. Even opening the file can be a challenge: a text file that is several gigabytes in size is effectively big data for most desktops. The logs contain what you need, but locating the server, identifying the log and searching through the data by hand is often too much to take on.
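
To make the manual approach concrete, here is a minimal Python sketch of the "search the file yourself" step. It assumes a plain-text log that can be read as UTF-8, and the function names are hypothetical, not part of any tool mentioned in this article. It streams the file line by line instead of loading it into an editor, which keeps memory use flat but still leaves you to find the right server and file first.

```python
import re
from typing import Iterable, Iterator, Tuple


def grep_lines(lines: Iterable[str], pattern: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, text) for every line that matches `pattern`."""
    regex = re.compile(pattern)
    for lineno, line in enumerate(lines, start=1):
        if regex.search(line):
            yield lineno, line.rstrip("\n")


def grep_large_log(path: str, pattern: str) -> Iterator[Tuple[int, str]]:
    """Stream a log file from disk one line at a time.

    A file object iterates lazily, so memory use stays roughly constant
    even for a multi-gigabyte log that a desktop editor might refuse to open.
    Undecodable bytes are replaced rather than raising an error.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        yield from grep_lines(fh, pattern)
```

For example, `grep_large_log("/var/log/app/server.log", r"ERROR|timed out")` (a hypothetical path and pattern) would yield each matching line number and its text. Even streamed, a scan like this reads every byte of the file on every search, which is exactly the cost that indexing tools are built to avoid.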

Types of log analysis tools


Unfortunately, you can't simply scale back the size or volume of your logs if you want to keep the ability to troubleshoot and identify root causes. The good news is that a number of tools can help the system admin. VMware vRealize Log Insight, Loggly, Splunk and a host of others are available to ease this burden. These log analysis tools offer several distinct benefits:

  • Data collection at scale: One of the biggest challenges is that before you can correlate and use your logs, you first have to gather them and convert them into a format that can be indexed and searched. These tools can pull logs from a wide variety of sources and give you a solid base to start from.
  • Search and indexing: Once the data is collected, you can use it to analyze possible root causes. If an event occurred at a specific time, you can examine the log entries recorded around that time across many devices, revealing problems on a server or piece of hardware that might otherwise seem unrelated.
  • Insight: Better visibility into applications, servers and networks helps businesses address potential issues proactively, which reduces downtime and the poor customer satisfaction that comes with it.
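
The collection and search steps above can be sketched in a few lines of Python. This is a toy illustration, not how vRealize Log Insight, Loggly or Splunk work internally. It assumes two hypothetical line formats (ISO 8601 timestamps and Unix epoch seconds), normalizes both into one UTC-timestamped record type, sorts the records by time, and then uses binary search over the sorted timestamps as a stand-in for a real index to pull every entry within a window around an event.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, Iterable, List, Tuple

# A parser turns one raw line into (UTC timestamp, message).
Parser = Callable[[str], Tuple[datetime, str]]


@dataclass(frozen=True)
class LogRecord:
    timestamp: datetime  # always timezone-aware UTC
    source: str          # server or device that produced the line
    message: str


def parse_iso_line(line: str) -> Tuple[datetime, str]:
    # Hypothetical format: "2024-05-01T10:15:03+00:00 message text"
    ts, _, msg = line.partition(" ")
    return datetime.fromisoformat(ts).astimezone(timezone.utc), msg.strip()


def parse_epoch_line(line: str) -> Tuple[datetime, str]:
    # Hypothetical format: "1714558503|message text"
    ts, _, msg = line.partition("|")
    return datetime.fromtimestamp(int(ts), tz=timezone.utc), msg.strip()


def collect(sources: Dict[str, Tuple[Iterable[str], Parser]]) -> List[LogRecord]:
    """Normalize lines from every source into one list sorted by time."""
    records = []
    for name, (lines, parser) in sources.items():
        for line in lines:
            ts, msg = parser(line)
            records.append(LogRecord(ts, name, msg))
    records.sort(key=lambda r: r.timestamp)
    return records


def around(records: List[LogRecord], event: datetime, window: timedelta) -> List[LogRecord]:
    """Return records within +/- window of event; records must be sorted."""
    times = [r.timestamp for r in records]  # stand-in for a real time index
    lo = bisect_left(times, event - window)
    hi = bisect_right(times, event + window)
    return records[lo:hi]
```

Passing an event time and `timedelta(minutes=1)` returns entries from every source within a minute of the event, so a storage path failover logged a few seconds before an application timeout shows up right next to it even though the two lines came from different devices. A real tool would build and maintain its index once rather than per query.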

However, these tools are not without drawbacks, and a few things are worth knowing up front. Analyzing logs and applying machine learning to find issues is demanding work. It is resource-intensive and can require substantial physical or virtual resources. While the minimum requirement is 2 virtual CPUs (vCPUs) and 4 GB of memory, to truly take advantage of a product such as VMware vRealize Log Insight, you will need 8 to 16 vCPUs and 16 to 32 GB of memory. Even then, don't expect real-time results; the volume of data is typically too large for that. That is why this data crunching is often done after the fact, during cleanup, rather than while the event is unfolding.

Challenges of log analysis tools

Done in-house, working through logs is an intensive process with no real shortcuts, but you can take one by moving the work to the cloud. The cloud has resources on demand, and for crunching your logs it can scale about as wide as you need, for a price. If you decide to move to the cloud, keep in mind that it'll take more work on the front end to get the logs to the provider. Even though logs are typically text-based, they are often very large, so you'll need a big upload pipe. Logs usually don't carry passwords, but they can carry IP addresses, server names, ports and so on, so securing this data is critical once it goes off-site.

The final challenge is cost. The cloud is not cheap: you pay for every CPU cycle and every gigabyte of memory and storage. That's not to say your logs aren't worth the cost, but because even the cloud doesn't deliver real-time results, the premium over internal resources may not be worth paying.
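
One common way to reduce both the upload burden and the exposure of IP addresses and server names is to mask sensitive fields and compress logs before they leave the site. The Python sketch below shows the idea under simple assumptions: it masks dotted-quad IPv4 addresses with a regular expression, replaces a caller-supplied list of hostnames (any names used with it here are hypothetical), and then gzip-compresses the result. It is not a complete masking policy; IPv6 addresses, usernames and ports, for example, are left untouched.

```python
import gzip
import re
from typing import Iterable

# Dotted-quad IPv4 pattern; it does not check that each octet is <= 255.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact(line: str, hostnames: Iterable[str]) -> str:
    """Mask IPv4 addresses and the given hostnames in one log line."""
    line = IPV4.sub("<ip>", line)
    for name in hostnames:
        line = line.replace(name, "<host>")
    return line


def prepare_for_upload(lines: Iterable[str], hostnames: Iterable[str]) -> bytes:
    """Redact every line, then gzip the joined text for transfer."""
    names = list(hostnames)  # materialize so it can be reused for each line
    text = "".join(redact(line, names) for line in lines)
    return gzip.compress(text.encode("utf-8"))
```

Repetitive text logs usually compress well, which matters for the size of the upload pipe you need. As a rough illustration, 50 GB sent at a sustained 100 Mbit/s takes about 4,000 seconds (50 × 8,000 Mbit ÷ 100 Mbit/s), a little over an hour, so every reduction in size shortens the transfer proportionally.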

Logs provide insight into the processes that run in your data center and applications. With these tools, we can now correlate logs across multiple systems, giving the system administrator a level of insight that was never available before. With today's multi-server applications, this ability is badly needed. Real-time monitoring tools have long held the front line in the data center, identifying and correcting issues as they occur, but that is changing. Log analysis tools can now help identify the root cause, giving the admin the chance to prevent the issue from recurring, and that capability is taking center stage in the data center.
