One thing I have found over the years in many organizations is a lack of system logging, specifically when it comes to the virtualization stack. You may be asking yourself, "Why is this important?" Logging is a critical component because it gives you visibility into your environment. That visibility can be key to determining the root cause of a failure or narrowing in on a system attack, because it allows you to review the events that took place beforehand. A logging strategy is especially important for your virtualization stack because the stack touches so many external components. Proper logging may keep you from looking for an issue in the wrong place and wasting time. It will also most likely uncover issues you never knew existed; this has been the case in every environment I have seen after setting up system logging.
Below are some key things to keep in mind when implementing a logging strategy in your environment:
- Timestamps: Make sure Network Time Protocol (NTP) and time zones are configured correctly on every device you collect logs from.
- Data retention: Ensure that you have enough storage allocated to retain the amount of data required for your retention policy (30 days, 90 days, etc.). Also keep archival policies in mind for auditing purposes.
- Event correlation: Use timestamps and specific data to correlate events across devices (vSphere hosts, storage, network, etc.) and narrow in on which event actually took place prior to an incident.
- Auditing: You can use syslog data from devices to track logins, password changes and events like privileged execution of tasks. Each of these is important during an audit, whether it is PCI-DSS, SOX or just an internal audit conducted by your security team.
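To make the timestamp and event correlation points concrete, here is a minimal sketch of the idea: normalize every timestamp to UTC, then pull the events from all devices that fall inside a window before an incident. The device names, messages, timestamps and the ten-minute window are all hypothetical, and a real logging product would do this for you through its search interface.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical events: (ISO 8601 timestamp with offset, source device, message).
# Note the first entry is logged in a different time zone than the others.
EVENTS = [
    ("2024-03-01T09:58:12-05:00", "storage-array", "path failover on controller B"),
    ("2024-03-01T15:01:40+00:00", "esxi-host-01", "lost access to datastore ds01"),
    ("2024-03-01T14:59:30+00:00", "core-switch", "interface Eth1/7 link down"),
    ("2024-03-01T12:00:00+00:00", "esxi-host-02", "user root logged in"),
]


def to_utc(stamp):
    """Parse an ISO 8601 timestamp that carries an offset and convert it to UTC."""
    return datetime.fromisoformat(stamp).astimezone(timezone.utc)


def events_before(events, incident, window_minutes=10):
    """Return events from any device within the window before the incident, oldest first."""
    incident_utc = to_utc(incident)
    start = incident_utc - timedelta(minutes=window_minutes)
    hits = [
        (to_utc(ts), source, message)
        for ts, source, message in events
        if start <= to_utc(ts) <= incident_utc
    ]
    return sorted(hits)


if __name__ == "__main__":
    for when, source, message in events_before(EVENTS, "2024-03-01T15:02:00+00:00"):
        print(when.isoformat(), source, message)
```

Without the UTC conversion, the storage event would appear to be five hours older than the others and would fall outside the window, which is exactly the problem that consistent NTP and time zone settings prevent.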
There are numerous syslog products available; your needs and budget will dictate which to choose. Some of the more popular products can be extremely expensive (e.g., Splunk), but they are still great products. VMware has its own Log Insight, which is a great product, but you will most likely need to obtain content packs for many of the products you log from, which could end up costing a great deal. There are also many open source solutions available, which I personally lean toward because you get the majority of the features and functionality of a paid product. However, with an open source product you may have to write some of the syslog parsing rules and build dashboards to provide a visual representation of your events. Two popular open source products I tend to see are Graylog2 and the ELK stack (Elasticsearch, Logstash and Kibana).
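As a rough illustration of what a syslog parsing rule does, the sketch below splits a traditional BSD-style syslog line into named fields with a regular expression. It is not the rule syntax of Graylog2, Logstash or any other product; the field names and the sample line are assumptions for the example. The only fixed convention it relies on is that the leading priority value encodes facility and severity as facility * 8 + severity.

```python
import re

# Assumed layout: <PRI>Mmm dd hh:mm:ss host tag[pid]: message
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<tag>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields for a matching line, or None if it does not match."""
    match = SYSLOG_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    pri = int(record.pop("pri"))
    # The priority value packs two numbers: facility * 8 + severity.
    record["facility"] = pri // 8
    record["severity"] = pri % 8
    return record


if __name__ == "__main__":
    # Hypothetical sample line.
    sample = "<34>Oct 11 22:14:15 esxi-host-01 sshd[2211]: Failed password for root from 10.0.0.5"
    print(parse_line(sample))
```

Once lines are broken into fields like these, dashboards and searches can filter on host, severity or tag instead of scanning raw text.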
Once you have a good system logging product in place, you can also take advantage of other capabilities many of these products offer, such as automated alert notifications to your support staff. This can be accomplished by triggering specific actions based on an alert type or an exact alert message. Simply by implementing these alerts, you will become more proactive in your environment because you will be notified before an incident becomes a much larger issue.
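The alerting idea can be sketched in a few lines: each rule pairs a message pattern with an action, and every incoming log message is checked against the rules. This is a simplified stand-in for what a logging product does internally, not any product's actual alert configuration. The rule names, patterns and the `notify_support` function are hypothetical, and the notification is only recorded in a list so the example stays self-contained.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Stand-in for a real notification channel (email, chat, paging); hypothetical.
sent = []


def notify_support(rule_name, message):
    """Record the alert instead of actually contacting anyone."""
    sent.append((rule_name, message))


@dataclass
class AlertRule:
    name: str
    pattern: re.Pattern
    action: Callable[[str, str], None]


# Hypothetical rules: one keyed on a keyword, one on a specific message form.
RULES = [
    AlertRule("datastore-latency", re.compile(r"latency", re.IGNORECASE), notify_support),
    AlertRule("failed-login", re.compile(r"Failed password for \S+"), notify_support),
]


def evaluate(message, rules=RULES):
    """Run the action of every rule whose pattern matches; return the rule names that fired."""
    fired = []
    for rule in rules:
        if rule.pattern.search(message):
            rule.action(rule.name, message)
            fired.append(rule.name)
    return fired


if __name__ == "__main__":
    print(evaluate("Failed password for root from 10.0.0.5"))
    print(evaluate("user admin logged out"))
```

Keeping the action separate from the pattern means the same rule list can later route some alerts to email and others to a paging system without changing the matching logic.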
No matter which product you choose, just choose one and start getting ahead of your environment with system logging instead of scrambling to catch up when something goes wrong.