Agent Based Logging
Posted by Jason Bolden on Jul 25, 2021
Whether you're working in site reliability, business intelligence, or security, one thing rings true: logs are king. They are a window into your operations, providing insight into access, change, performance, and the who, what, when, where, and why. But as invaluable as logs can be, they can just as easily become burdensome and costly. Poor log hygiene plagues many organizations. While many business functions benefit from logs, they don't benefit from all logs, and the excess translates directly to cost. In this post we'll explore some options for logging instrumentation that aid in filtering, routing, and maintaining the flow of logs at the host level.
Objective
In this post we'll explore three popular logging agents to better understand the pros and cons of each solution in different scenarios. The contenders are Fluentbit, NXLog, and MiNiFi.
Created in 2011, Fluentd is one of the most popular logging agents in the cloud computing space. The project was open-sourced the same year, and Google maintains its own flavor as its standard cloud logging agent. In 2014, Fluentbit was created as a lighter-weight agent for IoT workloads.
NXLog was created back in 2009 as an alternative to msyslog. Originally a closed-source project, NXLog Community Edition was open-sourced in 2011 and has been free since. NXLog has a reputation in the cybersecurity space as a Windows event log collector. SANS' SEC555 course references NXLog as a reliable free option for gaining visibility into Windows and system logging.
Apache NiFi began as a project created by the NSA. It was later introduced to the Apache Software Foundation, and subsequently commercialized by Hortonworks (now Cloudera). The tool is a robust Data Flow Controller with the goal of making the automation and management of ETL processes simpler and more maintainable. MiNiFi is a sub-project that borrows the fundamental concepts defined by NiFi, but packages them in a smaller form factor for deployment to endpoints and IoT devices.
We'll run through a basic deployment to a Windows desktop to demonstrate local setup, and to Kubernetes for the cloud. In my personal experience, I've leveraged NiFi quite extensively. To be as objective as possible in our evaluation, we'll measure each tool according to the following criteria:
- Documentation: judged on completeness, number of examples, and searchability
- Ease of Use: how short is the time to get up and running, does the agent support monitoring, is it easy to maintain?
- Cloud Readiness: what cloud providers are supported? Does the documentation describe cloud installation?
- Architecture: how well is the agent designed, how resilient is it to operational disruption, can the base functionality be easily extended?
For the sake of time, the evaluation of documentation and ease of use will be based on my ability to set up each agent after 30 minutes of reading its documentation. The following figures depict our lab setup.
All configurations and scripts for this post can be found in this GitHub repo.
Tip
When investigating multiple tools that solve the same problem, it's always a good idea to use time boxes. While 30 minutes is a bit extreme, it's not uncommon to dedicate a day to hacking out an MVP to assess the feasibility of using one solution over another.
The Breakdown
Logging agents typically follow a three-part architecture. The `Source` defines how the agent interfaces with the log-producing system. The `Channel` is a transport layer; data are stored, transformed, and/or filtered in this layer. Lastly, the `Sink` defines the interface with the destination of the data. As we describe the architecture of the three solutions, we'll tie their components back to these three constructs for consistency and ease of comparison.
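As a toy illustration (not any agent's real API — all names here are made up), the three constructs compose like a small pipeline:

```python
def source():
    # Source: interface to the log-producing system
    yield from ["2021-07-25 login ok", "2021-07-25 debug noise", "2021-07-25 logout ok"]

def channel(records):
    # Channel: transport layer where records are transformed and/or filtered
    return (r.upper() for r in records if "debug" not in r)

def sink(records):
    # Sink: interface to the destination (a list stands in for a socket or file)
    return list(records)

print(sink(channel(source())))
```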
Note
The astute will recognize this terminology from the Apache Flume project. Flume was the first logging agent I used in a production setting. Many projects have sprung up since, but the underlying architecture remains more or less the same.
We'll also be using a short python script to produce our application logs in each test.
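A minimal sketch of what `log-gen.py` could look like, assuming only the `<ISO timestamp> <host> <message>` line format that the agents parse later in this post (the actual script is in the GitHub repo):

```python
import socket
import sys
import time
from datetime import datetime

def format_line(ts, host, message="Hello, World!"):
    # "<ISO timestamp> <host> <message>" -- the shape the agents' parsers expect
    return f"{ts:%Y-%m-%dT%H:%M:%S} {host} {message}"

def main(path, interval=2):
    # Append a fresh line every `interval` seconds so the agents have data to tail
    with open(path, "a") as f:
        while True:
            f.write(format_line(datetime.now(), socket.gethostname()) + "\n")
            f.flush()
            time.sleep(interval)

if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```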
This script is used in both the Local and Cloud setups. The following is the `Dockerfile` definition for our chatty-app.
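A hypothetical reconstruction of the `Dockerfile` (base image and paths are assumptions; see the GitHub repo for the actual file):

```dockerfile
# Hypothetical sketch of the chatty-app image
FROM python:3.9-slim
COPY log-gen.py /log-gen.py
# Write into /var/log so a sidecar agent can tail the shared volume
CMD ["python", "/log-gen.py", "/var/log/log-file.log"]
```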
Minikube and Docker should be installed on your machine if you intend to follow along. `chatty-app` must be built with the minikube environment variables set beforehand.
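The build commands aren't shown above; assuming the image name used in the pod specs below, they would look roughly like this. Pointing the Docker client at minikube's daemon is what lets `imagePullPolicy: Never` find the locally built image:

```shell
❯ eval $(minikube docker-env)
❯ docker build -t chatty-app .
```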
Fluentbit
Fluentbit doesn't deviate much from our established formula. The agent's configuration boasts a simple format and schema, built from `Sections` and indented key/value paired `Entries`.
30 Minute Sprint
After 30 minutes of combing the documentation, I felt comfortable enough to start hacking away at an MVP. The docs are organized using GitBook, making them fairly painless to navigate. If I had to be nit-picky, there are quite a few grammatical errors throughout. It's not unreadable, but it does throw one off a bit.
Concerning the bells and whistles, security is supported through the use of TLS: all outputs that require network I/O expose TLS configuration options. Agents are made resilient through buffering, which can persist to disk. Agent health can be monitored via API calls when so configured; alternatively, one could use the Prometheus Exporter output plugin to route monitoring metrics directly to a Prometheus server.
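For instance, the monitoring API is enabled from the `[SERVICE]` section (a sketch; 2020 is the documented default port):

```
[SERVICE]
    HTTP_Server On
    HTTP_Listen 0.0.0.0
    HTTP_Port   2020
```

With this enabled, metrics are available from endpoints such as `http://127.0.0.1:2020/api/v1/metrics`.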
Fluentbit also has a handy visualizer to aid in flow development. This can be very useful when troubleshooting complex dataflows.
Attention
Take care with buffering strategies on cloud workloads. Due to the ephemeral nature of container resources, one should ensure the Fluentbit storage paths point to persistent volumes in your K8S deployment.
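A sketch of filesystem buffering, assuming an illustrative storage path (in K8S this path should live on a persistent volume):

```
[SERVICE]
    storage.path /var/log/flb-storage/

[INPUT]
    name         tail
    path         /var/log/log-file.log
    storage.type filesystem
```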
Local Setup
After downloading the ZIP archive for Windows, create two new `*.conf` files to save the following settings:
`fluent-bit.conf` defines our data flow.

```
[SERVICE]
    flush        5
    daemon       Off
    log_level    info
    parsers_file parsers.conf

[INPUT]
    name   tail
    tag    log_file
    path   ../log-file.log
    parser custom

[INPUT]
    name         winlog
    tag          win_log
    channels     Security
    Interval_Sec 1
    db           winlog.sqlite

[OUTPUT]
    name   tcp
    match  *
    host   127.0.0.1
    port   8514
    format json_lines
```
`parsers.conf` holds our parsing definitions. In our case, we only define one parser to handle logs from the `log-gen.py` script.

```
[PARSER]
    Name        custom
    Format      regex
    Regex       ^(?<time>[^ ]*) (?<host>[^ ]*) (?<message>.*)?$
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S
```
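Fluentbit parsers use Ruby-compatible regular expressions, and the named groups translate directly to Python's `(?P<name>...)` syntax, so the parser can be sanity-checked offline (this snippet is an aid, not part of Fluentbit):

```python
import re

# Python spelling of the custom parser's Regex entry
pattern = re.compile(r"^(?P<time>[^ ]*) (?P<host>[^ ]*) (?P<message>.*)?$")

sample = "2021-07-28T21:35:05 BoldDesktop Hello, World!"
print(pattern.match(sample).groupdict())
```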
Attention
Ensure that the `tail` INPUT `path` parameter matches the path used by the `log-gen.py` script.
With our config files in hand, we can run the following commands to start our test.
- Run the logger script.
```shell
❯ python log-gen.py log-file.log
```
- In another terminal, with administrative privileges.
```shell
❯ cd td-agent-bit-*
❯ ./bin/fluent-bit.exe -c ./conf/fluent-bit.conf
```
- Lastly, on the receiving end, we'll start our TCP listener.
```shell
# From WSL
❯ nc -l 127.0.0.1 8514
{"date":1627405959.0,"host":"BoldDesktop","message":"Hello, World!"}
{"date":1627405961.0,"host":"BoldDesktop","message":"Hello, World!"}
{"date":1627405963.0,"host":"BoldDesktop","message":"Hello, World!"}
{"date":1627405965.0,"host":"BoldDesktop","message":"Hello, World!"}
{"date":1627405967.0,"host":"BoldDesktop","message":"Hello, World!"}
```
Note
I've left out the Windows event logs from the samples above to minimize sharing of sensitive information.
Cloud Setup
More time was spent configuring the chatty-app container and troubleshooting minikube than actually configuring the fluent-bit container.
The config files for this setup are listed below:
`fluent-bit.conf`

```
[SERVICE]
    flush        5
    daemon       Off
    log_level    info
    parsers_file parsers.conf

[INPUT]
    name   tail
    tag    log_file
    path   /var/log/log-file.log
    parser custom

[OUTPUT]
    name  stdout
    match *
```
`parsers.conf` is the same as before. `cloud-logging-fluentbit.yml` is our pod configuration for K8S.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
  - name: chatty-app
    image: chatty-app
    imagePullPolicy: Never
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  - name: fluent-bit
    image: fluent/fluent-bit
    volumeMounts:
    - name: varlog
      mountPath: /var/log
    - name: config
      mountPath: /fluent-bit/etc
  volumes:
  - name: varlog
    emptyDir: {}
  - name: config
    configMap:
      name: fluentbit-configmap
```
The fluent-bit container config mounts our user-defined configmap to the expected directory. To deploy, run the following.
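The exact deploy commands aren't reproduced above; assuming the configmap and pod names from the files in this section, they would look roughly like:

```shell
❯ kubectl create configmap fluentbit-configmap --from-file=fluent-bit.conf --from-file=parsers.conf
❯ kubectl apply -f cloud-logging-fluentbit.yml
❯ kubectl logs -f test-pod -c fluent-bit
```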
Once deployed, inspect the pod and view the logs of the fluent-bit container.
Note
Fluentd and Kubernetes both document running Fluentbit agents in a DaemonSet configuration rather than as sidecars. For consistency in our testing we will not follow that recommendation; however, the sidecar configuration is still a recommended logging pattern documented by Kubernetes.
NXLog
NXLog may appear simple conceptually; however, each component is very extensible.
30 Minute Sprint
NXLog is definitely a very mature project. The documentation is dense and thorough. When I stumbled on their expression language page, I knew I was headed into power user territory. NXLog `Routes` process `event records`, which are collections of `fields`. The configuration file is made up of two main constructs, `Directives` and `Modules`. There are four types of `Modules`: `Inputs` consume event records from source systems and optionally parse incoming records; `Processors` provide functionality for transforming, buffering, and/or filtering event records; `Outputs` emit event records to a destination system; and `Extensions` provide extended functionality to the NXLog language. `Directives` are parameters that define the various components of the NXLog configuration.
Several examples of buffering strategies are documented. Monitoring functionality is lackluster: the documentation covers OS-specific methods for ensuring NXLog runs as a service, but collection of health metrics is not mentioned. Security is supported via Input and Output modules that handle SSL/TLS configuration. There are several useful features that are only available in the enterprise version of the agent; their feature comparison section goes into more depth. In all, there's a solid knowledge base backing the project. It's almost overwhelming, but not so much that we can't get an MVP running quickly.
Local Setup
After downloading the installer and completing installation, create one new `*.conf` file to save the following settings:
`nxlog.conf` defines our data flow.

```
<Extension _json>
    Module xm_json
</Extension>

<Input winlog>
    Module im_wseventing
    Exec   to_json();
    <QueryXML>
        <QueryList>
            <Query Id="0" Path="Application">
                <Select Path="System">*</Select>
            </Query>
        </QueryList>
    </QueryXML>
</Input>

define EVENT_REGEX /(?x)^(\d+-\d+-\d+T\d+:\d+:\d+)\s+(\S+)\s+(.*)/

<Input file>
    Module im_file
    File   "C:\\logs\\log-file.log"
    <Exec>
        if $raw_event =~ %EVENT_REGEX%
        {
            $EventTime = strptime($1, '%Y-%m-%dT%T');
            $Host      = $2;
            $Message   = $3;
            to_json();
        }
        else drop();
    </Exec>
</Input>

<Output tcp>
    Module om_tcp
    Host   127.0.0.1
    Port   8514
</Output>

<Route winlog_to_tcp>
    Path winlog => tcp
</Route>

<Route file_to_tcp>
    Path file => tcp
</Route>
```
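NXLog regular expressions are PCRE-style, so `%EVENT_REGEX%` can be sanity-checked with Python's `re` module (the `(?x)` flag enables verbose mode in both); this snippet is just a testing aid, not part of NXLog:

```python
import re

# Python equivalent of the EVENT_REGEX define from nxlog.conf
EVENT_REGEX = re.compile(r"(?x)^(\d+-\d+-\d+T\d+:\d+:\d+)\s+(\S+)\s+(.*)")

m = EVENT_REGEX.match("2021-07-28T21:35:05 BoldDesktop Hello, World!")
print(m.groups())
```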
Attention
Ensure that the `File` directive matches the path used by the `log-gen.py` script.
Testing procedures are as follows.
- Run the logger script.
```shell
❯ python log-gen.py log-file.log
```
- In another terminal, with administrative privileges.
```shell
❯ cd $NXLOG_PATH
❯ ./nxlog.exe -c ./nxlog.conf
```
- Lastly, on the receiving end, we'll start our TCP listener.
```shell
# From WSL
❯ nc -l 127.0.0.1 8514
{"EventReceivedTime":"2021-07-27 22:34:56","SourceModuleName":"file","SourceModuleType":"im_file","EventTime":"2021-07-27 22:34:56","Host":"BoldDesktop","Message":"Hello, World!"}
{"EventReceivedTime":"2021-07-27 22:34:58","SourceModuleName":"file","SourceModuleType":"im_file","EventTime":"2021-07-27 22:34:58","Host":"BoldDesktop","Message":"Hello, World!"}
{"EventReceivedTime":"2021-07-27 22:35:00","SourceModuleName":"file","SourceModuleType":"im_file","EventTime":"2021-07-27 22:35:00","Host":"BoldDesktop","Message":"Hello, World!"}
{"EventReceivedTime":"2021-07-27 22:35:02","SourceModuleName":"file","SourceModuleType":"im_file","EventTime":"2021-07-27 22:35:02","Host":"BoldDesktop","Message":"Hello, World!"}
```
Attention
NXLog for Windows is an `.msi` installation. For my setup, I added the path to `nxlog.exe` to my system path for ease of use. `$NXLOG_PATH` is not defined by default.
Note
Again, I've left out the Windows event logs from the samples above to minimize sharing of sensitive information.
Cloud Setup
The K8S config for NXLog is a bit different than Fluentbit's. NXLog does not support writing to `stdout`, so we've configured it to write the logs to the agent's internal log file instead.
NXLog documentation provides instructions for building the agent's docker image locally; however, it can be pulled from DockerHub.
The config files for this setup are listed below:
`nxlog.conf`: logs are output to the `om_null` module, and we use the `log_info()` function to capture them instead.

```
User  nxlog
Group nxlog

LogFile  /var/log/nxlog.log
LogLevel INFO

<Extension _json>
    Module xm_json
</Extension>

define EVENT_REGEX /(?x)^(\d+-\d+-\d+T\d+:\d+:\d+)\s+(\S+)\s+(.*)/

<Input file>
    Module im_file
    File   "/var/log/log-file.log"
    <Exec>
        if $raw_event =~ %EVENT_REGEX%
        {
            $EventTime = strptime($1, '%Y-%m-%dT%T');
            $Host      = $2;
            $Message   = $3;
            log_info(to_json());
        }
        else drop();
    </Exec>
</Input>

<Output null>
    Module om_null
</Output>

<Route file_to_null>
    Path file => null
</Route>
```
`cloud-logging-nxlog.yml` is our pod configuration for K8S.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
  - name: chatty-app
    image: chatty-app
    imagePullPolicy: Never
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  - name: nxlog-ce
    image: nxlog/nxlog-ce
    args: ["-c", "/etc/nxlog/nxlog.conf"]
    volumeMounts:
    - name: varlog
      mountPath: /var/log
    - name: config
      mountPath: /etc/nxlog
  volumes:
  - name: varlog
    emptyDir: {}
  - name: config
    configMap:
      name: nxlog-configmap
```
Like before, the NXLog container config mounts our user-defined configmap to the expected directory. In this case, the path to the config is passed as a container argument. To deploy, run the following.
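The exact deploy commands aren't reproduced above; assuming the names used in this section, they would look roughly like:

```shell
❯ kubectl create configmap nxlog-configmap --from-file=nxlog.conf
❯ kubectl apply -f cloud-logging-nxlog.yml
❯ kubectl exec -it test-pod -c nxlog-ce -- /bin/sh
```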
Once deployed, open a terminal in the nxlog-ce container and execute `tail -f /var/log/nxlog.log` to see the agent logs.
Note
NXLog documents both DaemonSet and sidecar configurations for logging.
MiNiFi
Dubbed the Swiss Army Knife of dataflows, there's almost nothing you can't do with Apache NiFi. Eat your heart out on the documentation.
30 Minute Sprint
NiFi/MiNiFi uses `Processors` to create, transform, and transmit `Flowfiles`. That's pretty much it. Full disclosure: I have many years of experience working with NiFi. I've deployed and maintained NiFi clusters in multiple data centers, developed configurations for MiNiFi in the cloud, and deployed agents to thousands of self-service machines. I'm no stranger to the tool, so the 30-minute time cap doesn't really apply here.
That said, my experience with this tool did not help me as much as it should have for our two test cases. For the sake of time and, frankly, to limit the scope of this post, I had to cut the testing of the MiNiFi agent short as I was going way too far in the weeds trying to get the configuration working. While NiFi and MiNiFi sport all the features you'd want in an enterprise setting (security, resiliency, monitoring, flexibility, extensibility, etc.), the tool is massive and requires a lot of overhead to utilize effectively.
Not to add insult to injury, but MiNiFi comes in two versions, C++ and Java. The Java agent is, for all intents and purposes, a headless version of NiFi with fewer core packages included. It's still a hefty agent, but you get 100% feature parity with the server version. The C++ agent is much lighter; however, not all functionality is supported.
Local Setup
MiNiFi agents are configured using a single `yaml` file; however, it is not recommended that one write the configuration by hand in a text editor (you'll soon see why). The Quickstart walks through how to create a dataflow from the NiFi UI, save it as a template, export it, and convert it to the `.yml` config accepted by MiNiFi.
After conversion, the flow above becomes...
`conf.yml` weighs in at 171 lines, so it's omitted here; the full file is available in the GitHub repo.
Testing procedures are as follows. We're not going into great detail on this one.
- Run the logger script.
```shell
❯ python log-gen.py log-file.log
```
- In another terminal, ensure that the `config.yml` file is in the `conf` directory for MiNiFi. Start the agent.

```shell
❯ ./run-minifi.bat
```
- Lastly, on the receiving end, we'll start our TCP listener.
```shell
# From WSL
❯ nc -l 127.0.0.1 8514
2021-07-28T21:35:05 BoldDesktop Hello, World!
<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'><System><Provider Name='Microsoft-Windows-Security-SPP' Guid='{E23B33B0-C8C9-472C-A5F9-F2BDFEA0F156}' EventSourceName='Software Protection Platform Service'/><EventID Qualifiers='49152'>16394</EventID><Version>0</Version><Level>4</Level><Task>0</Task><Opcode>0</Opcode><Keywords>0x80000000000000</Keywords><TimeCreated SystemTime='2021-07-29T04:37:49.7185935Z'/><EventRecordID>3230</EventRecordID><Correlation/><Execution ProcessID='0' ThreadID='0'/><Channel>Application</Channel><Computer>BoldDesktop</Computer><Security/></System><EventData></EventData></Event>
<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'><System><Provider Name='SecurityCenter'/><EventID Qualifiers='0'>15</EventID><Version>0</Version><Level>4</Level><Task>0</Task><Opcode>0</Opcode><Keywords>0x80000000000000</Keywords><TimeCreated SystemTime='2021-07-29T04:38:03.9292335Z'/><EventRecordID>3231</EventRecordID><Correlation/><Execution ProcessID='0' ThreadID='0'/><Channel>Application</Channel><Computer>BoldDesktop</Computer><Security/></System><EventData><Data>Windows Defender</Data><Data>SECURITY_PRODUCT_STATE_ON</Data></EventData></Event>
<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'><System><Provider Name='Microsoft-Windows-Security-SPP' Guid='{E23B33B0-C8C9-472C-A5F9-F2BDFEA0F156}' EventSourceName='Software Protection Platform Service'/><EventID Qualifiers='16384'>16384</EventID><Version>0</Version><Level>4</Level><Task>0</Task><Opcode>0</Opcode><Keywords>0x80000000000000</Keywords><TimeCreated SystemTime='2021-07-29T04:38:23.8359934Z'/><EventRecordID>3232</EventRecordID><Correlation/><Execution ProcessID='0' ThreadID='0'/><Channel>Application</Channel><Computer>BoldDesktop</Computer><Security/></System><EventData><Data>2121-07-05T04:38:23Z</Data><Data>RulesEngine</Data></EventData></Event>
2021-07-28T21:35:07 BoldDesktop Hello, World!
2021-07-28T21:35:09 BoldDesktop Hello, World!
2021-07-28T21:35:11 BoldDesktop Hello, World!
```
Attention
The steps to convert the NiFi `.xml` template to the `.yml` config file have been omitted.
Cloud Setup
I'm leaving this one as an exercise for the brave. While I've done this in the past, revisiting the subject has made me realize the tool is not suited for these quick evaluation scenarios.
Conclusion
This evaluation took longer than expected. Though all three tools are great products, each has its own unique advantages and disadvantages in comparison.
| Tool | Documentation | Ease of Use | Cloud Readiness | Architecture |
|---|---|---|---|---|
| Fluentbit | | | | |
| NXLog | | | | |
| MiNiFi | | | | |
- Fluentbit: The youngest on the block, with lots of momentum. It's built for rapid prototyping and minimal enough to meet a majority of use cases while still being easy to use. Relatively speaking, the documentation is less mature than the others', but it still isn't bad. The architecture is simple and loses out in comparison to its predecessors, but it makes up for this in user friendliness and cloud readiness (Kubernetes includes Fluentd in its logging strategy documentation!).
- NXLog: Solidly in the middle ground. An old tool with a mature community. The NXLog language gives the user flexibility yet doesn't require in-depth knowledge to use.
- MiNiFi: Documentation and architecture are solid for the NiFi project. I will continue to advocate that this is hands down one of the best tools for dataflow management. However, the overhead of maintaining this platform is not for the startup or tinkerer. Cloudera does provide a solution that simplifies the management of agents and deployment of configurations, but it is behind a paywall. If you're in an enterprise setting and you're keen on a robust dataflow strategy, it may be worth the time investment. Otherwise, stick to one of the other two.
References
- Fluentbit Documentation
- NXLog Documentation
- MiNiFi Documentation
- Kubernetes Logging Architecture
- Minikube