ELK Stack... Not! FEK, it is! Fluentd, Elasticsearch & Kibana

If you are here, you probably know what Elasticsearch is and are, at some point, trying to get into the mix. You were searching for the keywords "logging and elasticsearch", or perhaps "ELK", and probably ended up here. Well, you might have to take the following section with a pinch of salt, especially the "ELK Stack" fam.

At least in my experience, working for start-ups teaches you a lot of lessons, and one of the biggest challenges is minimizing resource-utilization bottlenecks.
On one hand, logging and real-time application tracking are mandatory; on the other hand, there's a hard limit on the allocated system resources, which is probably an EC2 instance with 4 gigs of RAM.

ELK Stack 101:

Diving in, ELK => Elasticsearch, Logstash, and Kibana. Hmm, that doesn't add up; don't you think? Elasticsearch stores the transformed log entries; Logstash chops up the textual logs and transforms them to facilitate querying and the derivation of meaningful context, thereby serving as an input source to be visualized in Kibana.
Logstash uses grok patterns to chop up the logs, doesn't it? So, a fair amount of time needs to be invested in learning how these patterns differ from traditional regular expressions (there's a sample below).
But... but who's going to ship the logs from the application to Logstash? And this shipping needs to be seamless. Well, there's Filebeat, provided by Elastic, to ship all of those.
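For reference, a grok-based Logstash filter looks something like this; it's only a hypothetical sketch with made-up field names, but %{TIMESTAMP_ISO8601}, %{LOGLEVEL}, and %{GREEDYDATA} are patterns from the standard grok library:

# Hypothetical Logstash filter: grok patterns are named, reusable regexes
filter {
  grok {
    match => { "message" => "\[%{TIMESTAMP_ISO8601:timestamp}\] %{LOGLEVEL:severity}: %{GREEDYDATA:msg}" }
  }
}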

So, is it supposed to be the ELFK or perhaps the FLEK stack? (WT*)
You, be the judge!

Using four applications, all singing to each other, what could go wrong?

WARNING: The following infographic may contain horrifying CPU spikes that some readers might find disturbing.


Well.. Well.. Well.. What do we have here?

Extracting valuable information from logs is like an excavation: digging deep to unearth hidden treasures. But it can't come at the cost of resource utilization.

Introducing, the FEK Stack.

Enter Fluentd, a.k.a. td-agent, an open-source data collection tool written in Ruby (not Java!!! Ruby - 1, Java - 0).


The setup is so easy that you can be up and running in no time.
# td-agent a.k.a fluentd installation
wget http://packages.treasuredata.com.s3.amazonaws.com/2/ubuntu/trusty/pool/contrib/t/td-agent/td-agent_2.3.5-0_amd64.deb
sudo dpkg -i td-agent_2.3.5-0_amd64.deb
############################################
# elasticsearch installation
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.4.1.deb
sudo dpkg -i elasticsearch-5.4.1.deb
############################################
# kibana installation
wget https://artifacts.elastic.co/downloads/kibana/kibana-5.4.1-amd64.deb
sudo dpkg -i kibana-5.4.1-amd64.deb
############################################
# Essential td-agent gems installation
## s3
sudo /usr/sbin/td-agent-gem install fluent-plugin-s3 --no-document
## elasticsearch
sudo /usr/sbin/td-agent-gem install fluent-plugin-elasticsearch --no-document
############################################
# Permission assignments
## The buffer directory referenced by pos_file in the config below may not exist yet
sudo mkdir -p /var/log/td-agent/buffer
sudo chown td-agent:td-agent -R /var/log/td-agent/buffer
############################################


Navigate to /etc/td-agent/ and replace the existing configuration template (td-agent.conf) with the following configuration.

<source>
  @type tail
  path /home/ubuntu/logGenerator/*.log
  pos_file /var/log/td-agent/buffer/demolog.log.pos
  read_from_head true
  format /\[(?<timestamp>.*)\] (?<source>.*)\.(?<severity>.*): (?<message>.*)=>(?<identifier>.*)$/
  tag s3.demolog
</source>

<match s3.demolog>
  @type copy
  <store>
    @type elasticsearch
    host localhost
    port 9200
    index_name demolog
    include_tag_key true
    tag_key @log_name
    logstash_format true
    flush_interval 10s
  </store>
</match>

The parameters are mostly self-explanatory: the format keyword is where the regex for chopping up the log is given. Another thing to note is the tag keyword; the value given here must be reused in the <match> segment, since that's what bonds the source to its output mapping.
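To see how the capture groups line up, here's a made-up log line (purely illustrative) and the fields the regex above would pull out of it:

[2017-06-15 10:23:45] app.ERROR: Connection timed out=>req-1234

# timestamp  => 2017-06-15 10:23:45
# source     => app
# severity   => ERROR
# message    => Connection timed out
# identifier => req-1234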

For demonstration purposes, you can use the following project for random log file generation.

https://github.com/datawrangl3r/logGenerator

The configuration above is already in sync with the logs this project generates, so it shouldn't be a hassle.
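Once everything is in place, start the services and check that the index shows up. The service names below are the defaults shipped by the packages installed earlier, and the logstash-* index appears because logstash_format is set to true:

# Start (or restart) the three services
sudo service elasticsearch start
sudo service td-agent restart
sudo service kibana start

# Sanity check: a daily logstash-* index should appear once logs start flowing
curl -s 'http://localhost:9200/_cat/indices?v'

# Kibana should now be reachable at http://localhost:5601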

Thanks for reading.
Let me know how it all worked out in the comments below!
