Logging: Logstash and other things – Jordan Sissel from DreamHost – PuppetConf '12
I'm happy to introduce Jordan Sissel, who is the czar of logging for DreamHost. I always have to introduce somebody that practices hate-driven ops like the rest of us. So, Jordan. No workstation locked? All right, sweet.

So I'm here to talk to you guys about logs, and logstash, and other things. I did promise other things. My name is Jordan Sissel, as mentioned. I work at DreamHost doing logging stuff. This is one of the first times I've been able to work on an open source project full time, which is a pretty incredible milestone for me and for the project, I think. Originally I'm a sysadmin by trade: I started in operations eight to ten years ago, more of a development role now, but I still have that sysadmin mindset. I've been using Puppet for about four years. Things that I like: I really love NetHack; these are things I can talk about literally all day. I like coding, and as of about nine months ago I have a little baby girl, so I will talk about babies all day long. But I'm not here to talk about those things.

First off, I'd like to talk a little bit about some of my other projects. The first one being SysAdvent. It's an advent-calendar-type project: for the first 24 days of December, one article per day, very sysadmin focused, very getting-stuff-done focused. It's four years old, and this will be the fifth year of the project, so there are about a hundred articles online from the past four years, and they're really worth reading. Some Puppet tricks: when I use Puppet, I've been known to use it in masterless mode, which means I don't have a puppet master, which helps scaling very easily. I also do this thing called nodeless, where you have no node definitions and no external node classifier. If you're interested in those things, talk to me later; I'm happy to describe them. I also wrote this project called fpm, mostly because I was fed up with building RPMs, and then I switched jobs and had to use Debian packages, and all the tools are different, so I wanted one tool to sort of rule them all.

First up, a story: how everything kind of went wrong. You guys probably recognize some of this. If you're leaning
forward and trying to squint, you've already failed the test. There's too much data here to read, right? As a human, you can't. You'd have to sit here, get a pen and paper, track stuff, figure out what's going on. Is there a problem here? We don't know.

So this happens. By the way, you can't see the time, but it's 4:30 in the morning. Nagios is like, hey, hey, wake up! And you're like, dude, what disk is critical? What does that mean? Hold me. And this is you, basically, if four in the morning, any time, something wakes you up. So you log in, and you realize that your disk is full. Your disk is full because you don't rotate your logs. So you're like, screw it, it's 4:00 in the morning; delete that, I'll worry about it when I'm awake. What else sucks? Shitty error messages. What does that even mean? Apparently it means this. That's how I would feel.

So the next day, after you got woken up at 4:00 in the morning by full logs, somebody comes in, a business guy: "We have an angry customer, help!" And you haven't had sleep, really, but you're like, you know what, this is my time. You've got adrenaline pumping. I'm gonna save the day. You swing in, you're like, Perl! You fly out of the room. You make this graph, and the business guys can understand this: they see there's this weird anomaly there, and they work with the customer and figure out what the problem is. But now the business guy's like, you're my hero. And if you've ever looked at comics, or superhero universes: whenever there's a hero, a superhero, there's an opposite. There's a villain, a supervillain, right? Any time the superhero is out of touch, is lost, the supervillain comes in and screws things up. Right? So now everyone has to rely on you to wipe their butts, and that sucks. Once upon a time you were a sysadmin, you were an ops person, and it was cool, but now you're doing these one-liner things in Perl that felt cool initially, and now you're doing it every day, and it really is a drag. You're still getting woken up at 4:00 in the morning, right? Yes. And now you're a human keyboard, and by human keyboard I mean people are using you as their computer interface, which sucks. So: don't be a human keyboard.

Before I start talking
about logging and logstash and things like that, I want to frame the discussion. When I say "a log," what is a log? It's not one of these tree things. Cool. So this is a log, but what are the features of a log? It has a timestamp, and it has some data. But that's about as common a schema, if you will, as you'll find with logs; the data format is going to vary by the application. So the only two things I've found in common: it has a timestamp, and there's some data associated with it. What's the data? I don't know. So that's basically it. When I say "log," this is what I mean: a single event with some data attached to it.

And in general, the life of a log: the application emits it, or records it; you can transmit it somewhere over the network; you might analyze it; you'll store it somewhere for a little while; and you delete it, right? This is standard when you're logging to syslog and logrotate rotates it and it goes away after a couple of days. There are some related tools in this area that I feel are worth mentioning. For transportation: Flume, Fluentd, Scribe, and some syslog projects are very useful. For analytics, Hadoop is really popular, but there are also more log-focused ones, Graylog and ELSA, as well as long-term storage things like HDFS, Cassandra, and Elasticsearch. You might notice I've only listed open source tools, because I don't feel like commercial products need my help with marketing.

So how can logstash help? First, logstash is an open source tool. It's free: free as in beer, free as
in speech. It's pretty awesome, if you ask me. So before I talk about exactly what you can solve with logstash, I'm just gonna show a couple of screenshots of a demo. Here's a simple example, pretty much that same chunk of Apache log that we saw earlier; again, if you're leaning forward and squinting, you've already failed. In logstash, the config file tries to be a little bit simple. I'm saying: watch this file (there's an Apache log file), apply some kind of filter to it. I'll talk about grok in a little bit, but basically you can read this as: parse some kind of Apache log format, take the timestamp (which is a really dumb timestamp format, from Apache) and do something with it, and store it in Elasticsearch. Very pluggable system. And the result is that I can then do some cool search analytics on my logs.
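The config he describes (watch a file, grok it, fix the timestamp, store in Elasticsearch) looked roughly like this. A sketch only: the option names below follow later logstash releases and differed slightly in the 2012 syntax, and the file path is made up.

```
input {
  file {
    # a hypothetical Apache access log path
    path => "/var/log/apache2/access.log"
    type => "apache"
  }
}

filter {
  # COMBINEDAPACHELOG is one of the patterns that ships with grok
  grok { match => [ "message", "%{COMBINEDAPACHELOG}" ] }
  # turn Apache's "really dumb" timestamp into a standard one
  date { match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ] }
}

output {
  elasticsearch { }
}
```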
I can ask for the top downloads of logstash. I can find out that people are automating installations of logstash with Chef: the query at the top says agent:chef, and these are people installing it automatically. And if I zoom in, I can see that every six minutes this one particular address is running Chef and installing, I guess, or at least downloading, logstash. Which is weird, right? Every six minutes seems a little too frequent.

So what else can we solve with logstash? Regular expressions. That moment earlier, when you valiantly swung into the room, typed some Perl on a keyboard, and left, and it was amazing: you probably used some regular expressions. It might have looked like this. I hope not. Does anyone know what this would match? Shout it out. A timestamp? Well, there's like months twice; I don't know why that is. That's weird, right? Uh-huh, somebody's clever: there's IP addresses. So, so far we look at it and we say there's some time-ish thing and some IP-address-looking thing. You think tcpdump? How about HAProxy? Do you believe me? Could you prove it? Could you test it? If you managed to
write that, could you do it again the next day? If you wrote that, could you even read it the next day? So that's where grok comes in. Grok is, what's the best way to describe it, a thin layer on top of a regular expression engine. You write patterns and give them a name, so instead of writing something gnarly every time you want to match an IP address, you just write an IP-address pattern and reuse it everywhere else, just like you would with a code library or a Puppet module: you write it once and reuse it in lots of places. And it comes with a lot of default patterns to match numbers, network addresses (including IP addresses and MAC addresses), URLs, all other kinds of boring things that you don't have to write yourself. And the neat property is that it removes the need for you to have any regular-expression skills. Like, whoever shouted out that that previous regular expression matched part of an IP address somewhere: that skill set you can mostly just get rid of, which I think is cool. It frees up your brain cells. So the idea is
that you specify the syntax it should match, and you give it a semantic name. An example: in Apache there's a field that's the number of bytes sent in the request. Well, that's a number, and you want to call it "bytes." Very simple. Same for quoted strings. It also has more complex things, like URI paths with parameters, where it'll match really long parameters and try to chunk them up for you.

But not everything is a string, so grok gives you type conversions; number was an example. If you were to capture a number with a regular expression yourself, it would come out as a string, and if you wanted to use that in some calculation later, you would have to turn it into a number first. Grok helps you with that.

So here's an example. A sample log, pretty simple syslog, and you just use different patterns to describe the different parts. Again, you're using roughly English to describe the structure of this log format. The syslog timestamp, for example, breaks down into a month, some spaces, a month day, and a time. Same for hostname: I don't want to write a hostname regex. I wrote it once, I have some tests that verify that it works, and I never touch it again. Same for the syslog program, which is a subset. And I mentioned tests: how often do you write regular expressions and then write some cool test suite to verify that they match the things they need to, and also don't match the things they shouldn't? Well, grok ships with a lot of that. It ships with a number pattern, and a test suite to verify that if it generates a list of a thousand numbers, it actually matches correctly.
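The "roughly English" description of a syslog line looks something like this. The sample line is invented; SYSLOGTIMESTAMP, SYSLOGHOST, SYSLOGPROG, GREEDYDATA, and NUMBER are pattern names that ship with grok's default pattern set:

```
# a sample syslog line:
Dec 17 16:00:35 helios sshd[183]: Accepted publickey for jls

# described with named grok patterns:
%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:host} %{SYSLOGPROG}: %{GREEDYDATA:message}

# type conversion: capture Apache's byte count as an integer, not a string
%{NUMBER:bytes:int}
```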
A side rant, also related to pattern matching: time. So, can you guys identify this time format? Shout it out. Unix epoch. Is there a timezone? UTC, you hope. Epoch is defined as being in UTC, but I've seen applications that use epoch numbers in local time. What?

So compared to the last one, what is wrong with this one? What's missing? This is a timestamp from Apache. Milliseconds: if you have more than one request per second, you have no idea how far apart those were, right? And it's in a weird format; you can't sort it, because the first thing is the day. What's missing in this one, what's actually wrong with this one, besides milliseconds missing? Hmm? It has "Friday." That's kind of weird; it's duplicate information. If you ask the computer what day of the week November 21st '97 was, it's basically like, dude, it's a Friday, you don't need that information. This one? What's missing? What year is it? Right, you have no idea. You can assume the current year, but you're probably wrong. Yeah, what's wrong with this one? Right, yeah, exactly: which one is the year, which one is the month, which one is the day? Are those even things? Is it a star date? Well, this one has a period and a comma, which is kind of cool, because usually in the US we use periods for stuff and everywhere else uses commas. I'm sorry. What is it? Time since boot, in jiffies. How long is a jiffy? No one knows. No, it's year-month-day dot something, you hope. No timezone, but it has milliseconds, which is kind of cool. Oh, this one: any daemontools users? This is a format called TAI64. It's very weird. Also not based in UTC; it's based on TAI. Yeah. Anyway.

This is seven different time formats, and the likelihood of you seeing all seven of these formats on a single server is pretty high, right? So if you're trying to
correlate Apache errors with MySQL errors, your joins, how you sort everything together, are keyed on timestamp, and they use completely different formats. So logstash comes with a date filter to fix exactly this. For example, MySQL has its own pretty flavor of timestamps; the date filter helps you turn that into something standard, so that you can have all of your events with the same exact time format, so you can search and sort and compare and see exactly when everything happened. Seriously: stop inventing time formats. It makes me so angry.

So, some of the things you can do with logstash. This is one event. Everyone agrees this is one event: a stack trace from one thing that went wrong. A lot of log processing systems do everything by lines, so you'll get four events here, and if you're searching, you're not going to find it; you're going to find part of the stack trace, which kind of sucks. So logstash has a filter to solve this. You're saying: if the line starts with a space, it belongs to the previous line, so we're merging up.

Another good example is getting Apache metrics, that graph I showed you earlier, which I'll show again, using the statsd output, to graphite, so that you can get your response codes tracked, counted, and graphed, and you can alert on them: how many errors happened in the last hour? Alert on that. There are also a lot of outputs for different metric-type systems as well. So that config generates this data from your logs, and if you already have graphite installed, getting your logs into it is quite easy; it's as simple as this config. What else can we talk about?
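The two filters and outputs just described might be configured roughly like this. A sketch: option names follow the multiline filter and statsd output as documented, and the metric name is illustrative:

```
filter {
  multiline {
    # a line starting with whitespace belongs to the previous line
    pattern => "^\s"
    what => "previous"
  }
}

output {
  statsd {
    # count responses per HTTP status code; statsd flushes to graphite
    increment => "apache.response.%{response}"
  }
}
```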
So I started doing Apache log analysis at work, and we added how long each request took. And so I'm like, what are the values? Let's find out. What's the min? The min is negative 1.6 million, something like that. That's weird. A request took less than zero time? Does Apache have a time machine? I mean, I don't know; I don't think it does. It seems weird to me, right? So you start digging into it. I'm searching for "duration is less than 0," and I find a couple of hits, and I find only a few hosts are having this problem, which I think is a problem. So what's going on here? The request duration is less than 0. Well, like good operations folks, the team at DreamHost uses NTP, and every few minutes NTP is like, I'm fixing the clock, 'cause we got a little bit of skew. But Apache uses gettimeofday for counting durations, and I think the best analogy I can give is this: next time daylight savings happens, when we roll forward, ten seconds before it starts, hold your breath; ten seconds after, breathe out. You just held your breath for an hour. That was cool. That's the same thing Apache is doing: it's using a wall clock that's continuously being tuned by NTP to calculate durations, so I don't even know if I can trust this value. So yeah: time flowing backwards makes Apache report negative request times.
So that's some of the logstash power. The way logstash is designed, you have a bunch of inputs, some filters, and some outputs. Any of you Unix admins have seen this before: the Unix pipe model, very similar. If you're a Windows admin, you've seen PowerShell, a very similar model; you can pipe things to other things. Logstash comes with 23 inputs (and this is always growing), 18 filters, and 40 outputs. It has a lot of plugins. Inputs: syslog, files, other network things. Filters include the date filter, and grok, which will help you parse out your stuff. What else? Hmm: mutate, and anonymize, which lets you anonymize things if you need to. Outputs: there's a lot. There are way more outputs than anything else, because as it turns out, people don't really know where they want to have all their event data stored, so we just have lots of support
for it. I'll talk a little bit about the project's focuses: feature focus, design focus, and community focus. And community focus, I think, is the most important one. The feature focus, in general, is transporting logs from anywhere to anywhere else (that's why there are so many input and output plugins) and processing logs (that's why there are so many filter plugins), because there are a lot of logs in really weird formats, and logstash helps you structure them and get them into a shape where you can do analysis on them, and once you've processed them, ship them somewhere else in a nice format. I also want to provide search and analytics: those screenshots I showed are the web interface that logstash comes with.

As for design: as a sysadmin, I really hate it when a piece of software sounds cool, and a lot of people are excited about it, but to deploy it I have to make serious infrastructure changes to the way my systems and my network are designed. That sucks. So I want logstash, first and foremost, to fit your infrastructure. If you have to
Change a lot of things to even just try long stash the project is a failure I mentioned there’s a lot of plugins people are writing plugins pretty much every week we get a new one so Mike my comments there are that it is written in Ruby and that a lot of people
Don’t know Ruby but that doesn’t stop them from submitting plugins which are functional and they’re I get a lot of probably about five or six different plugins now where people send me patches to add them and they have this big apology apologetic like cover letter that says
Dear sir I am very sorry I don’t know Ruby I suck at it I’m a PHP guy but I wrote this please have it it works and that’s fine and I think that’s a good mark of a good extendable plug-in system for community wise I think this is my
Most important point that if a newbie is trying logstash and failing that’s a bug it’s either a bug in the code or – the bug in the documentation or something but it’s certainly not the new users fault I’ve had enough experience with open source projects where they have
That sort of send patches or die mentality that is because I am confident in writing code that’s fine with me I don’t have all the ideas and I’ll have all the bug reports so to me contributions are much much more than just code if you file bugs if you send
Ideas if you just file a bug and say you know I saw this weird thing I don’t know what happened that’s fine with me that’s data that’s useful data so yeah anyway contributions way more than just code and with that respect there’s a lot of community build up
around the logstash project. Kibana, the web interface you saw in the screenshots, was written by someone outside the community who joined the community and said, here's this awesome tool I wrote. There's a Puppet module available, so you can install logstash pretty easily, thanks to Puppet Labs, who recently had a module bounty where they got a logstash module and a couple of other modules submitted, which is awesome. And there's also a third tool, logstash-cli, so you can do searches and analytics from the command line, which is where a lot of Unix sysadmins are more comfortable.

So, how do I have logstash deployed at DreamHost right now? I'm taking in about 20,000 events a second of only Apache logs. If you don't know what DreamHost is: one of DreamHost's products is shared web hosting; they do WordPress and a bunch of other stuff. So I'm just trying to take all that data and get some analytics on it through logstash. This averages out to about 250 million events a day, roughly 75 gigs a day of data, from 160 web servers, and I'm probably gonna grow that to about a thousand soon. All of this is being sucked in and managed by seven logstash servers, and as far as resource utilization goes, I can grow to a thousand web servers sending data without even increasing the size of my logstash cluster, which is nice. And this, to me, is a small deployment. So, the plan:
what am I gonna use logstash for? I'm at DreamHost; DreamHost needs to get value out of its employees, so how can logstash help provide value for them? And you can do all of this yourself; this is what I intend for logstash to do for you and your company. First: help your ops guys trace stuff quicker. There are a lot of servers; how do we find out exactly where things are? Well, give them centralized search. I also want to reduce the ops team's human-keyboard problem. DreamHost is a very tech-customer-support-focused company, so there's a lot of technical support staff asking questions of the ops team, and a lot of times those team-to-team communications can become a big boundary, where you find walls being built and people fighting, and I find that you can sometimes insert tools to help alleviate some of those problems. Again: give log access to other teams, not just technical support. If the developers ask for root access, maybe they really just want access to the logs. What if you collected all those logs and provided them in a nice, neat interface? Logstash can help with that. Provide real-time visibility to customers: your customers log in, they think something's down. Wouldn't it be cool if you could proactively tell them, if you knew something was down? And I want to do this across thousands of servers and billions of events per day, which, again, I think is probably a medium-sized deployment.

So, in closing: you should get your stash on. logstash.net is the site for the project. There's a mailing list and an IRC channel for support. My best recommendation: if you're finding yourself having trouble (again, if newbies are having problems, it's a bug), these two support channels exist to make the documentation better. You come report our shitty documentation, we help you solve the problem, and then we fix the docs so the next person that comes along has a good time. And that's it. If you have questions, I'm happy to answer. Can you guys come up to the mic, or is that too far? That'll give me a chance to get a drink.

[Question] Jordan, so the question is: something-something legacy systems, how
well does it integrate with Windows? Is that a good summary? Okay, cool, I can answer that. So, I mentioned there are a lot of input plugins, 23-ish input plugins. One of those is the Windows event log plugin, which I'm told works on Windows, so that's one source of logs on Windows. A second source is files; you can use the file input for that. Windows has, like, two event log things, and they're both called "event log," right? I don't remember; it's been a while since I've done Windows administration. But anyway, I know that I've gotten reports that the event log plugin does work, so that's how you would do it. To answer your question about legacy stuff: I mentioned logstash is written in Ruby, but the releases are packaged up into jar files. The way that's achieved is I use JRuby, which is a Java implementation of the Ruby runtime. So if you're used to chasing down dependencies to try and get new projects to work, you won't have that problem with logstash, because logstash comes in one big jar file with everything you need in it, and that's the way I deploy it at work. Did I
answer your question? Cool. Any other questions? Go to the mic, to the mic!

[Question] Yo. Is there an HDFS output plugin? Or HBase? No, but it should be easy to write one. For example, we have a MongoDB output plugin, which no one uses, because they're smart, but, I don't know, no one uses it yet. The plugin itself, beyond boilerplate, is one line of code: it's basically "write the event." So doing the same thing for HDFS or HBase should be fairly straightforward. It might even be cool if we had the file output plugin support, like, S3 URLs and HDFS URLs. That would be cool. You should file a bug and we'll hop on that.

[Question] What was the reason that you decided to package up everything with JRuby inside of a jar file? Ah, so first, I already answered that
partially. When you try out a new project: graphite is a great example. Graphite is a great project, but it's hard to chase down all the dependencies and get everything set up right, and I didn't want to do that. I had used Hudson (now it's called Jenkins) for a number of years, and I rather enjoyed the fact that you could just download one file and run it, and you were good to go. That was it; that's all you had to do. As a sysadmin, that's pretty straightforward. Normally you're expecting to look at all kinds of documentation about how to deploy it, and then you're like: download all this and install it. Oh, but you're running an old version of Debian, so you have to download the new version of that thing that your upstream repository doesn't have... ugh. So with logstash, you just download one jar file and you can deploy it. And the reason I chose JRuby was a couple of things. You get proper access to real threads, so if you have multi-core CPUs you get more performance. And when there are times that a Ruby library is not sufficient, and I don't have the energy to write a new solution to that problem, there are lots and lots of Java libraries that may solve the same problem, and you can call those directly from JRuby, which is really awesome. I think I had a third reason, but
I forgot. Did that answer your question? Oh yeah, cool, right on.

[Question] This is really nice, but as far as I can see it's only for access logs, event logs, stuff like that. What about application-level logs? For instance, we personally use Sentry to log stuff like that. Does this integrate somehow; how does it overlap with Sentry? So, when I say a log is a timestamp with data: logstash itself doesn't come with built-in support for certain log types. It's really up to you to help it understand some of those things, and it provides you tools to make that easier. It gives you more efficient hammers with which to hit your logs, to make them more useful for you. The demo I showed was Apache logs, but, I don't know how better to explain it, it is a generic log processing system. It is designed with the idea that a log is a timestamp and some data, and what that data is doesn't really matter; the filter plugins exist to help you process those things. Does that answer your question?
[Question] Cool. Then I'll ask another question. How do you do indexing? I see that you can do pretty cool searches on your logs; how do you do the indexing? Is it in-flight, is it after the fact? So, how do I do indexing. I mentioned there are 40 output plugins; one of those outputs is Elasticsearch. Elasticsearch is an open source tool for full-text search. The language it speaks, basically, is JSON: if you say, hey, index this blob of JSON, it happily does that, and then you can search by field.
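So an event, by the time logstash hands it to Elasticsearch, is just a JSON document, roughly like this. A sketch: the exact field names, especially the `@`-prefixed ones, changed between logstash versions, and the values here are invented:

```json
{
  "@timestamp": "2012-09-27T16:00:35.000Z",
  "@type": "apache",
  "@tags": ["web"],
  "@message": "GET /logstash-1.1.1-monolithic.jar HTTP/1.1",
  "@fields": {
    "response": 200,
    "bytes": 4700041
  }
}
```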
Let's see, it's horizontally scalable, so you can add more machines, which is quite nice. It's pretty configurable. Wearing my sysadmin hat, it's one of the better server tools that I've ever run, because the defaults are very sane out of the box: if you don't have a config file, it works. It's great. It clusters pretty easily. What else can I tell you? The community for it is really amazing: if you have a question, they'll usually get you a good answer, and they've been really good about helping logstash along. Actually, because log data is fairly compressible (a lot of it is just English text, and that compresses quite well), they added block compression to Elasticsearch to help logstash make better, more efficient use of your disk. Cool. And if
any of you guys are hanging out in the PuppetConf IRC channel and want to forward a question, if they have any, that's cool too.

[Question] Quick question: how would you differentiate between a MySQL log and an Apache log and a JBoss log on the same machine, through your tool? I mean, can you say, show me all the... Okay, so, you can. To do that, at input time you can give every event a label, and there are two kinds of labels. There's a type, which is a single string, or tags, which is a list of things. Tags are useful if you want to tag by, say, cluster: if you have a hundred Apache servers and some of them belong to certain logical clusters, you can tag them with that. But in a sense they're all Apache logs, so you'll want to process them with the same filters. So you can see in the example here that I'm saying all of these logs that come from access.log are typed "web," and then in the filter section the filters only apply to logs of type "web."
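That routing, label at input time, select in the filter, looked something like this. A sketch: in the logstash of that era, filters took a `type` option directly (later versions use conditionals instead), and the MySQL log path is made up:

```
input {
  file { path => "/var/log/apache2/access.log"  type => "web" }
  file { path => "/var/log/mysql/mysql.log"     type => "mysql" }
}

filter {
  # this grok only runs against events of type "web"
  grok {
    type => "web"
    match => [ "message", "%{COMBINEDAPACHELOG}" ]
  }
}
```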
And that's how you sort of get your event routing done. Does that answer your question? Cool. Thanks, guys.

[Question] Oh, how quick does Elasticsearch data grow? Does it, uh, data creep? Say that one more time. How fast, or how quick, does Elasticsearch data grow, as
in data creep? Data creep, like, you mentioned there's a blog post you did where you're effectively storing everything twice, because you had the raw log entry in it. So, there is a problem, and that problem is grok. The problem is that grok tries to help you write a regular expression, without having to know regex, that parses pieces out of your log and gets you structured data. An example... I don't know if I have a good slide for that; let me see. Whoa, that's huge. Cool. So you can see parts of this, like that "200" at the very top line (I can point, yay). If you've used Apache before, you know that's the response code, right? If you think about how to represent that in a structured format, like JSON, you say "response code: 200" and you're done. But in Apache you get this weird format that you hope never changes, and this is, like, bytes, you hope; it's not labeled. So what happens is grok will take this as an event, parse it with the pattern that you give it, and create this nice structured data, along with the original message. So you essentially get data bloat. The best way to solve that is to not write unstructured log formats to begin with. There are a lot of tools that can output JSON, or Avro, or Protocol Buffers (Thrift is another good example) to solve this particular problem. You can make Apache output JSON, and I have a post explaining how to do that. And the duplication you can get rid of: once you parse this, and understand that this is a response code and this is bytes, you can just delete the entire original message, and then you don't have any duplication.
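Making Apache emit structured logs directly is just a `LogFormat` line, something along these lines. A sketch of the idea from his blog post, not a copy of it: the field choice and format name are mine, and escaping of quote characters inside logged values is left out for brevity:

```apache
# in httpd.conf: log one JSON object per request
LogFormat "{ \"timestamp\": \"%{%Y-%m-%dT%H:%M:%S%z}t\", \"clientip\": \"%a\", \"request\": \"%U\", \"response\": %>s, \"bytes\": %B }" json
CustomLog logs/access_json.log json
```

Note `%B` rather than `%b`: `%b` logs a literal `-` for zero bytes, which would break the JSON number.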
That gets a little bit too technical to explain here; I could, if I had a good venue for doing a live demo. But hopefully I answered that question. In the past couple of weeks I've gotten storage efficiency improved by a factor of about three and a half: if you parse this really thoroughly with grok, it inflates to something ridiculous, like 6x inflation in byte size, but now it's down to about one and a half. That'll be available in the next version, but you can use it in the current version. Hopefully I'm not rambling too much. We have a few minutes for a couple more questions, if you guys have any more IRC questions. Otherwise, I have stickers. Thank you!
"Logging: Logstash and Other Things" by Jordan Sissel, logging expert at DreamHost. From PuppetConf '12. Learn more about Puppet: http://bit.ly/OUeBLk Summary: Protocols are a mess. Too many formats, no common timestamps, and too many fancy one-liner scripts to handle. Pooh! There is hope! Logstash and other tools will help you with this. This talk will cover common logging practices (good and bad), some tools you might want to explore later, as well as an overview of Logstash and how you can use it to achieve victory in the fight against log abuse! Narrator Bio: Jordan is a hacker living under a rock in San Jose. He practices hate-driven development and gives out hugs upon request. When he's not hacking, he loves doing great things as a father and husband. Don't miss the next PuppetConf: http://www.puppetconf.com