Helpful awk command examples for working with Apache access logs

There are utilities that analyze access logs, Webalizer being one of them. But I have always believed that getting to know your access log files directly is the better approach when you want to do a more in-depth analysis of something specific.

Several commands, such as grep, cut, and sed, can be used for different scenarios, depending on what kind of information you need. In this post I will focus on the awk command and how to use it with access logs. awk is a pattern-directed scanning and processing language, and a very powerful one.

awk manual can be found here

Here is an example of an Apache access log
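As an illustration, a few made-up entries in Apache's combined log format might look like the ones below. The IP addresses, paths, and user agents are invented, and `sample_log` is a hypothetical helper that stands in for a real log file.

```shell
# Hypothetical entries in Apache's combined log format (all values invented).
sample_log() {
cat <<'EOF'
192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (X11; Linux x86_64)"
198.51.100.7 - - [10/Oct/2023:13:56:01 +0000] "GET /missing.html HTTP/1.1" 404 512 "http://example.com/index.html" "Mozilla/5.0 (X11; Linux x86_64)"
192.0.2.10 - - [10/Oct/2023:13:57:12 +0000] "POST /login.php HTTP/1.1" 200 1024 "-" "-"
EOF
}

# Print the sample entries
sample_log
```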

The format of each entry above is shown below; the columns are whitespace-delimited
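One way to see how awk numbers the columns is to print every field of a single entry with its index. This sketch assumes the combined log format (`%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"`); the entry itself is invented.

```shell
# A single hypothetical entry in combined log format.
entry='192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"'

# Print each whitespace-delimited field next to its awk field number.
printf '%s\n' "$entry" | awk '{
    for (i = 1; i <= NF; i++)
        printf "$%d = %s\n", i, $i
}'
```

With this entry, `$1` is the client IP, `$4` and `$5` together hold the timestamp, `$6` to `$8` hold the request line, `$9` is the status code, `$10` is the response size, and `$11` onward hold the referrer and user agent. Quoted values that contain spaces are split across several fields.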

To extract the above columns individually, we can use the following commands
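A minimal sketch of such commands, assuming the combined log format; `sample_log` is a hypothetical helper that prints invented entries in place of a real file.

```shell
# Hypothetical sample data standing in for a real access.log.
sample_log() {
cat <<'EOF'
192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
198.51.100.7 - - [10/Oct/2023:13:56:01 +0000] "GET /missing.html HTTP/1.1" 404 512 "-" "Mozilla/5.0"
EOF
}

sample_log | awk '{print $1}'       # client IP address
sample_log | awk '{print $4, $5}'   # timestamp, still wrapped in [ ]
sample_log | awk '{print $7}'       # requested path
sample_log | awk '{print $9}'       # HTTP status code
sample_log | awk '{print $10}'      # response size in bytes

# With a real file: awk '{print $1}' access.log
```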

 

You will see that the above commands use whitespace as the delimiter. We can change this with awk's -F option, however, as shown below
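For instance, splitting on the double quote keeps the request line, referrer, and user agent intact even when they contain spaces. This is a sketch on invented data (`sample_log` is a hypothetical helper), and it assumes none of the quoted values contain an escaped quote.

```shell
# Hypothetical sample data standing in for a real access.log.
sample_log() {
cat <<'EOF'
192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (X11; Linux x86_64)"
198.51.100.7 - - [10/Oct/2023:13:56:01 +0000] "GET /about.html HTTP/1.1" 200 512 "http://example.com/index.html" "curl/8.0.1"
EOF
}

# -F'"' makes the double quote the field separator.
sample_log | awk -F'"' '{print $2}'   # full request line
sample_log | awk -F'"' '{print $4}'   # referrer
sample_log | awk -F'"' '{print $6}'   # user agent
```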

Let’s work through some scenarios now. Say you want a list of unique IP addresses from your logs; you can run this command to get that info
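A minimal sketch, assuming the client IP is the first column; `sample_log` is a hypothetical helper printing invented entries.

```shell
# Hypothetical sample data standing in for a real access.log.
sample_log() {
cat <<'EOF'
192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
198.51.100.7 - - [10/Oct/2023:13:56:01 +0000] "GET /about.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"
192.0.2.10 - - [10/Oct/2023:13:57:12 +0000] "GET /contact.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"
EOF
}

# Print the first column, then sort and drop duplicates.
sample_log | awk '{print $1}' | sort -u

# Alternative using awk alone: print an IP only the first time it is seen.
sample_log | awk '!seen[$1]++ {print $1}'

# With a real file: awk '{print $1}' access.log | sort -u
```

The first form returns the addresses sorted; the second keeps them in order of first appearance.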

Or, if you want to see which IP addresses have been accessing a specific resource, you can use either of these commands
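Two possible variants are sketched here, using `/login.php` as a hypothetical resource and `sample_log` as a hypothetical helper printing invented entries.

```shell
# Hypothetical sample data standing in for a real access.log.
sample_log() {
cat <<'EOF'
192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
198.51.100.7 - - [10/Oct/2023:13:56:01 +0000] "POST /login.php HTTP/1.1" 200 1024 "-" "Mozilla/5.0"
198.51.100.7 - - [10/Oct/2023:13:56:09 +0000] "POST /login.php HTTP/1.1" 200 1024 "-" "Mozilla/5.0"
203.0.113.5 - - [10/Oct/2023:13:57:12 +0000] "GET /login.php HTTP/1.1" 200 1024 "-" "Mozilla/5.0"
EOF
}

# Variant 1: filter lines with grep, then print the IP column.
sample_log | grep 'login\.php' | awk '{print $1}' | sort -u

# Variant 2: let awk match the requested path (column 7) itself.
sample_log | awk '$7 ~ /login\.php/ {print $1}' | sort -u
```

The grep variant matches the string anywhere on the line, including the referrer, while the awk variant only looks at the requested path.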

Check if the requests are coming from automated scripts

When checking for automated scripts, we look for an empty user agent value, since these scripts generally don’t send any user agent information

Here is the command that you can use
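A sketch of one such command, assuming the combined log format, where a missing user agent is logged as `"-"` and an empty one as `""`. `sample_log` is a hypothetical helper printing invented entries.

```shell
# Hypothetical sample data standing in for a real access.log.
sample_log() {
cat <<'EOF'
192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (X11; Linux x86_64)"
198.51.100.7 - - [10/Oct/2023:13:56:01 +0000] "POST /login.php HTTP/1.1" 200 1024 "-" "-"
198.51.100.7 - - [10/Oct/2023:13:56:09 +0000] "POST /login.php HTTP/1.1" 200 1024 "-" ""
EOF
}

# Split on double quotes so the user agent is field 6, keep lines where it is
# empty or a lone dash, then print the IP (first whitespace-delimited column).
sample_log | awk -F'"' '$6 ~ /^-?$/' | awk '{print $1}' | sort -u
```

An empty user agent is only a hint: many scripts send a fake browser string, so treat the result as a starting point rather than a full list.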

To check how many times a resource has been requested

You can use the following command
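One way to do this, sketched on invented data (`sample_log` is a hypothetical helper), is to count occurrences of the requested path in column 7.

```shell
# Hypothetical sample data standing in for a real access.log.
sample_log() {
cat <<'EOF'
192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
198.51.100.7 - - [10/Oct/2023:13:56:01 +0000] "POST /login.php HTTP/1.1" 200 1024 "-" "Mozilla/5.0"
203.0.113.5 - - [10/Oct/2023:13:57:12 +0000] "GET /login.php HTTP/1.1" 200 1024 "-" "Mozilla/5.0"
EOF
}

# Count requests per path with sort and uniq, busiest first.
sample_log | awk '{print $7}' | sort | uniq -c | sort -rn

# The same count done inside awk with an associative array.
sample_log | awk '{count[$7]++} END {for (p in count) print count[p], p}' | sort -rn

# Count requests for one specific path only.
sample_log | awk '$7 == "/login.php"' | wc -l
```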

 

Identify issues with your web resources

Generally we will be looking for 404 errors. We can get this kind of report from Google Analytics as well, but Google won’t list internal linking resource errors. You can also use developer tools to check whether 404 errors are being produced on a certain page. Let’s use awk to do that now
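A sketch of such a pipeline, assuming the status code is column 9 and the requested path is column 7; `sample_log` is a hypothetical helper printing invented entries.

```shell
# Hypothetical sample data standing in for a real access.log.
sample_log() {
cat <<'EOF'
192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
198.51.100.7 - - [10/Oct/2023:13:56:01 +0000] "GET /old-page.html HTTP/1.1" 404 512 "http://example.com/index.html" "Mozilla/5.0"
203.0.113.5 - - [10/Oct/2023:13:57:12 +0000] "GET /missing.html HTTP/1.1" 404 512 "-" "Mozilla/5.0"
EOF
}

# Keep lines whose status matches 404, print status and path, then sort.
sample_log | awk '($9 ~ /404/)' | awk '{print $9, $7}' | sort

# With a real file: awk '($9 ~ /404/)' access.log | awk '{print $9, $7}' | sort
```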

We check column 9 and pattern-match it against the string 404. We then pipe the output to another awk command to print the required data, and pipe that output to the sort command to sort it

We can also use something like this
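For example, the filter and the print can be combined into a single awk program. This is a sketch on invented data (`sample_log` is a hypothetical helper), again assuming the status code is column 9.

```shell
# Hypothetical sample data standing in for a real access.log.
sample_log() {
cat <<'EOF'
192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
198.51.100.7 - - [10/Oct/2023:13:56:01 +0000] "GET /old-page.html HTTP/1.1" 404 512 "http://example.com/index.html" "Mozilla/5.0"
203.0.113.5 - - [10/Oct/2023:13:57:12 +0000] "GET /missing.html HTTP/1.1" 404 512 "-" "Mozilla/5.0"
EOF
}

# Pattern and action in one awk program: exact comparison on the status code.
sample_log | awk '$9 == 404 {print $9, $7}' | sort

# With a real file: awk '$9 == 404 {print $9, $7}' access.log | sort
```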

The above will produce a similar result without using two awk commands

There is so much more you can do with awk. All you have to do is understand how the command works and combine it with other commands, such as sed, to customize your output.

If you are using the awk command to achieve other things, please do leave a comment.
