Linux/logparse
Introduction[edit | edit source]
Logparse is a shell script that parses Caddy's JSON log files and outputs them as formatted text. It supports the Common Log Format
and Combined Log Format
, usually used by traditional HTTP servers like Apache, however it is easy to customise the output using the --selector
option or by using a config file. Logparse uses the powerful jq tool to process the JSON logs.
JSON, or JavaScript Object Notation, is a lightweight data interchange format that is easy for humans to read and write, and easy for machines to parse and generate. JSON is built on two structures: a collection of name/value pairs (often realised as an object or dictionary) and an ordered list of values (often realised as an array). Its simplicity and compatibility with most programming languages make it an ideal choice for configuration files, data storage, and transmission of data between a server and a client.
Caddy is a HTTP-1-2-3 server is a secure and powerful web server that uses JSON extensively, providing structured and easily parseable data. This makes it straightforward to analyse server logs programmatically, allowing for detailed monitoring and troubleshooting. By using JSON, Caddy ensures that log entries are consistent and comprehensive, enabling administrators to leverage tools like jq for querying logs, or integrate with other monitoring solutions that support JSON.
While JSON is excellent for machine parsing, it can be cumbersome when you need to quickly parse trough log files using shell tools like grep
, less
or tail
. The huge volume of data makes it hard to get a quick overview. I wrote logparse specifically to make it easier for me to read Caddy's JSON log files, though it should be easy to adapt the configuration for other JSON files.
Common Log Format[edit | edit source]
Traditional HTTP/web servers like Apache usually use plain text files to store access logs or error logs. A typical access_log file consists of one line per access, containing a number of predefined fields. Most servers supports extensive customisation of what fields that are included, but to make it easier for external tools to parse and analyse the logs, two standards exists, the Common Log Format (CLF) and the Combined Log Format. The Apache website https://httpd.apache.org/docs/2.4/logs.html has detailed information on these formats.
In the Common Log Format, the fields are Client IP
User ID
[datetime]
"Method
URI
Protocol"
Status code
Size
Common Log Format
192.168.0.122 - [2024-05-26 09:38:04] "GET /img/favicon.webp HTTP/3.0" 200 6866
In the Combined Log Format, the fields are: Client IP
User ID
[datetime]
"Method
URI
Protocol"
Status code
Size
"Referer"
"User agent"
Combined Log Format
192.168.0.122 - [2024-05-26 09:38:04] "GET /img/favicon.webp HTTP/3.0" 200 6866 "https://mirrors.tnonline.net/img/" "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0"
JSON Logs[edit | edit source]
The Caddy JSON log file contains a lot more information, which can be crucial when gathering statistics, analysing issues, etc. Each line represents a complete JSON object with both the request headers and the response headers, as well as error messages and other information.
Caddy JSON format
{"level":"info","ts":1716716284.7144635,"logger":"http.log.access.log16","msg":"handled request","request":{"remote_ip":"192.168.0.122","remote_port":"40159","client_ip":"192.168.0.122","proto":"HTTP/3.0","method":"GET","host":"mirrors.tnonline.net","uri":"/img/favicon.webp","headers":{"Upgrade-Insecure-Requests":["1"],"Sec-Fetch-Dest":["document"],"Sec-Fetch-User":["?1"],"Accept-Language":["en-GB,en;q=0.5"],"Accept-Encoding":["gzip, deflate, br"],"Alt-Used":["mirrors.tnonline.net"],"Sec-Fetch-Site":["same-origin"],"User-Agent":["Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0"],"Referer":["https://mirrors.tnonline.net/img/"],"Sec-Fetch-Mode":["navigate"],"Accept":["text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8"]},"tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h3","server_name":"mirrors.tnonline.net"}},"bytes_read":0,"user_id":"","duration":0.003348583,"size":6866,"status":200,"resp_headers":{"Date":["Sun, 26 May 2024 09:38:04 GMT"],"Strict-Transport-Security":["max-age=31968001; preload"],"X-Content-Type-Options":["nosniff"],"Last-Modified":["Wed, 19 Dec 2018 23:21:02 GMT"],"Accept-Ranges":["bytes"],"X-Frame-Options":["DENY"],"Cache-Control":["max-age=2592000, public"],"Content-Length":["6866"],"Content-Security-Policy":["default-src 'self'"],"Etag":["\"bqn9jjvvsuvq5aq\""],"Permissions-Policy":["none"],"Vary":["Accept-Encoding"],"Content-Type":["image/webp"],"Server":["Caddy"],"Referrer-Policy":["same-origin"]}}
If we "prettify" it using the jq
tool, it looks a lot better. However, we now have 109 lines of code for each log entry!
{
"level": "info",
"ts": 1716716284.7144635,
"logger": "http.log.access.log16",
"msg": "handled request",
"request": {
"remote_ip": "192.168.0.122",
"remote_port": "40159",
"client_ip": "192.168.0.122",
"proto": "HTTP/3.0",
"method": "GET",
"host": "mirrors.tnonline.net",
"uri": "/img/favicon.webp",
"headers": {
"Upgrade-Insecure-Requests": [
"1"
],
"Sec-Fetch-Dest": [
"document"
],
"Sec-Fetch-User": [
"?1"
],
"Accept-Language": [
"en-GB,en;q=0.5"
],
"Accept-Encoding": [
"gzip, deflate, br"
],
"Alt-Used": [
"mirrors.tnonline.net"
],
"Sec-Fetch-Site": [
"same-origin"
],
"User-Agent": [
"Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0"
],
"Referer": [
"https://mirrors.tnonline.net/img/"
],
"Sec-Fetch-Mode": [
"navigate"
],
"Accept": [
"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8"
]
},
"tls": {
"resumed": false,
"version": 772,
"cipher_suite": 4865,
"proto": "h3",
"server_name": "mirrors.tnonline.net"
}
},
"bytes_read": 0,
"user_id": "",
"duration": 0.003348583,
"size": 6866,
"status": 200,
"resp_headers": {
"Date": [
"Sun, 26 May 2024 09:38:04 GMT"
],
"Strict-Transport-Security": [
"max-age=31968001; preload"
],
"X-Content-Type-Options": [
"nosniff"
],
"Last-Modified": [
"Wed, 19 Dec 2018 23:21:02 GMT"
],
"Accept-Ranges": [
"bytes"
],
"X-Frame-Options": [
"DENY"
],
"Cache-Control": [
"max-age=2592000, public"
],
"Content-Length": [
"6866"
],
"Content-Security-Policy": [
"default-src 'self'"
],
"Etag": [
"\"bqn9jjvvsuvq5aq\""
],
"Permissions-Policy": [
"none"
],
"Vary": [
"Accept-Encoding"
],
"Content-Type": [
"image/webp"
],
"Server": [
"Caddy"
],
"Referrer-Policy": [
"same-origin"
]
}
}
logparse[edit | edit source]
Logparse takes file in JSON Lines text format as input and can output it as plain text in Common, Combined or a Custom format. If no options are used, the output will be in Common Log Format.
# logparse --help
Usage: /usr/bin/logparse [-c | -C | -s "selectors"] [-F <config_file>] filename Options: -c, --common Apache Common Log Format (default) -C, --combined Apache Combined Log Format -s, --selector Use a space separated list of selectors -F, --config-file Use a configuration file -h, --help Show this help message and exit
Usage[edit | edit source]
Common Log Format[edit | edit source]
Logparse uses the CLF as default option if no other options are used.
# logparse caddy.log
10.10.10.117 - [2024-05-26 11:42:52] "GET /gentoo/gentoo-distfiles/distfiles/9b/tl-hrlatex.source-2021.tar.xz HTTP/2.0" 200 3768 10.10.10.134 - [2024-05-26 11:43:50] "GET /gentoo/gentoo-portage/app-emulation/metadata.xml HTTP/2.0" 200 446 10.10.10.132 - [2024-05-26 11:44:00] "GET /gentoo/gentoo-distfiles/distfiles/b6/libvirt-python-9.9.0.tar.gz HTTP/2.0" 200 246357 240e:1:1::1234 - [2024-05-26 00:22:43] "GET /res/browse.css HTTP/3.0" 200 1577
Combined Log Format[edit | edit source]
The Combined Log Format contains the CLF plus the Referer and User-Agent fields. The label "null" is used in place of missing selectors.
# logparse -C caddy.log
10.10.10.117 - [2024-05-26 11:42:52] "GET /gentoo/gentoo-distfiles/distfiles/9b/tl-hrlatex.source-2021.tar.xz HTTP/2.0" 200 3768 "null" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" 10.10.10.134 - [2024-05-26 11:43:50] "GET /gentoo/gentoo-portage/app-emulation/metadata.xml HTTP/2.0" 200 446 "null" "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)" 10.10.10.132 - [2024-05-26 11:44:00] "GET /gentoo/gentoo-distfiles/distfiles/b6/libvirt-python-9.9.0.tar.gz HTTP/2.0" 200 246357 "null" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36" 240e:1:1::1234 - [2024-05-26 00:22:43] "GET /res/browse.css HTTP/3.0" 200 1577 "https://mirrors.tnonline.net/gentoo/gentoo-portage/app-dicts/aspell-lt/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/117.0.5938.60 Safari/537.36
Using custom selectors[edit | edit source]
With the --selector
option, it is possible to customise the output. The following selectors are available:
Selector | Description |
---|---|
ts | Unix Epoch timestamp in 0.1 μs resolution. |
datetime | Timestamp of the log entry in "YYYY-MM-DD HH:MM:SS" format. |
datetime_l | Localized timestamp of the log entry in "YYYY-MM-DD HH:MM:SS Z" format. |
datetime_ms | Timestamp of the log entry with milliseconds in "YYYY-MM-DD HH:MM:SS.sss" format. |
datetime_iso | ISO 8601 formatted timestamp of the log entry. |
client_ip | IP address of the client making the request. |
remote_ip | IP address of the remote host making the request. |
remote_port | Port number of the remote host making the request. |
proto | Protocol used for the request (e.g., HTTP/1.1, HTTP/2, HTTP/3). |
method | HTTP method used for the request (e.g., GET, POST). |
host | Host header value of the request. |
uri | URI of the request. |
user_agent | User-Agent header value of the request. |
user_id | User ID associated with the request, if available. |
referer | Referer header value of the request. |
accept | Accept header value of the request. |
accept_encoding | Accept-Encoding header value of the request. |
connection | Connection header value of the request. |
bytes_read | Number of bytes read from the client during the request. |
duration | Duration of the request in seconds. |
size | Size of the response in bytes. |
status | HTTP status code of the response. |
tls_resumed | Indicates if the TLS session was resumed (true/false). |
tls_version | TLS version used for the request (e.g., TLS 1.2, TLS 1.3). |
tls_cipher_suite | Cipher suite used for the TLS connection. |
tls_proto | Protocol used for the TLS connection. |
tls_server_name | Server name indicated in the TLS handshake. |
rh_content_length | Content-Length header value of the response. |
rh_content_type | Content-Type header value of the response. |
rh_server | Server header value of the response. |
rh_x_frame_options | X-Frame-Options header value of the response. |
rh_last_modified | Last-Modified header value of the response. |
rh_alt_svc | Alt-Svc header value of the response. |
rh_vary | Vary header value of the response. |
rh_etag | ETag header value of the response. |
rh_x_content_type_options | X-Content-Type-Options header value of the response. |
rh_referrer_policy | Referrer-Policy header value of the response. |
rh_cache_control | Cache-Control header value of the response. |
rh_accept_ranges | Accept-Ranges header value of the response. |
rh_content_security_policy | Content-Security-Policy header value of the response. |
rh_strict_transport_security | Strict-Transport-Security header value of the response. |
rh_permissions_policy | Permissions-Policy header value of the response. |
There are also special keywords that can be used to format the output.
Placeholder | Description |
---|---|
_ | Single Space |
space | Single Space |
tab | Single Tab |
q | Double quote ( " ) |
quote | Double quote ( " ) |
squote | Single quote ( ' ) |
i | Pipe ( | ) |
pipe | Pipe ( | ) |
lsqb | Left square bracket ( [ ) |
rsqb | Right square bracket ( ] ) |
lbrace | Left brace ( ( ) |
rbrace | Right brace ( ) ) |
lcurly | Left curly brace ( { ) |
rcurly | Right curly brace ( } ) |
colon | Colon ( : ) |
semicolon | Semicolon ( ; ) |
comma | Comma ( , ) |
dot | Dot ( . ) |
slash | Slash ( / ) |
backslash | Backslash ( \ ) |
hyphen | Hyphen ( - ) |
underscore | Underscore ( _ ) |
plus | Plus ( + ) |
equals | Equals ( = ) |
ampersand | Ampersand ( & ) |
percent | Percent ( % ) |
dollar | Dollar ( $ ) |
exclamation | Exclamation mark ( ! ) |
You can combine these selectors and formatting options as you like.
If you wanted to see the IP address, HTTP protocol and if TLS was used in the connection, you can use the datetime_ms
,client_ip
, tls_version
and tls_cipher_suite
selectors.
# logparse -s "datetime_ms _ client_ip _ proto _ tls_version _ tls_cipher_suite" caddy.log
2024-05-26 13:42:52.601 CET 10.10.10.117 HTTP/2.0 null null 2024-05-26 13:43:50.508 CET 10.10.10.134 HTTP/2.0 null null 2024-05-26 13:44:00.654 CET 10.10.10.132 HTTP/2.0 null null 2024-05-26 02:22:43.414 CET 240e:1:1::1234 HTTP/3.0 TLS 1.3 TLS_AES_128_GCM_SHA256
If you wanted to output a CSV file, you could use the semicolon
or tab
formatting option.
# logparse -s "ts tab client_ip tab proto tab status tab uri" caddy.log
1716723772.6014807 10.10.10.117 HTTP/2.0 200 /gentoo/gentoo-distfiles/distfiles/9b/tl-hrlatex.source-2021.tar.xz 1716723830.5081363 10.10.10.134 HTTP/2.0 200 /gentoo/gentoo-portage/app-emulation/metadata.xml 1716723840.6541977 10.10.10.132 HTTP/2.0 200 /gentoo/gentoo-distfiles/distfiles/b6/libvirt-python-9.9.0.tar.gz 1716682963.4146512 240e:1:1::1234 HTTP/3.0 200 /res/browse.css
Configuration File[edit | edit source]
The configuration file allows you to change the default output format by setting the use_selectors
variable.
File: logparse.config
use_selectors="datetime_ms tab client_ip tab size tab status tab uri"
Now when you call logparse without other options, the selectors used in the config file are used.
# logparse -F logparse.config caddy.log
2024-05-26 13:42:52.601 CET 10.10.10.117 3768 200 /gentoo/gentoo-distfiles/distfiles/9b/tl-hrlatex.source-2021.tar.xz 2024-05-26 13:43:50.508 CET 10.10.10.134 446 200 /gentoo/gentoo-portage/app-emulation/metadata.xml 2024-05-26 13:44:00.654 CET 10.10.10.132 246357 200 /gentoo/gentoo-distfiles/distfiles/b6/libvirt-python-9.9.0.tar.gz 2024-05-26 02:22:43.414 CET 240e:1:1::1234 1577 200 /res/browse.css
It is possible to change the JSON mappings used by logparse. This allows for customising logparse to use a different naming scheme, add new formatting characters or change the JSON mapping to match other types of JSON files.
This is the default configuration used by logparse.
# Enable enable debug output
#DEBUG=1
# Use these selectors
use_selectors="client_ip \
space user_id \
space lsqb datetime rsqb \
space quote method \
space uri space proto quote \
space status space size"
# Set format to 'custom' to set above 'use_selectors' as default.
format="common"
# JSON selector mappings
declare -A selectors=(
# Unix epoch timestamps
[ts]='.ts'
# Convert Unix epoch timestamps to datetime format.
# https://jqlang.github.io/jq/manual/#dates
[datetime]='.ts | strftime("%Y-%m-%d %H:%M:%S")'
[datetime_l]='.ts | strflocaltime("%Y-%m-%d %H:%M:%S %Z")'
[datetime_ms]='(.ts | tostring | split(".") | .[1][:3]) as $ms | .ts | strflocaltime("%Y-%m-%d %H:%M:%S.") + $ms + strflocaltime(" %Z")'
[datetime_iso]='.ts | todateiso8601'
[client_ip]='.request.client_ip'
[remote_ip]='.request.remote_ip'
[remote_port]='.request.remote_port'
[proto]='.request.proto'
[method]='.request.method'
[host]='.request.host'
[uri]='.request.uri'
[user_agent]='.request.headers["User-Agent"]'
[user_id]='.user_id'
[referer]='.request.headers.Referer'
[accept]='.request.headers.Accept'
[accept_encoding]='.request.headers["Accept-Encoding"]'
[bytes_read]='.bytes_read'
[duration]='.duration'
[size]='.size'
[status]='.status'
[tls_resumed]='.request.tls.resumed'
[tls_version]='.request.tls.version as $version |
if $version == 769 then "TLS 1.0"
elif $version == 770 then "TLS 1.1"
elif $version == 771 then "TLS 1.2"
elif $version == 772 then "TLS 1.3"
else $version end'
[tls_cipher_suite]='.request.tls.cipher_suite as $cs |
if $cs == 5 then "TLS_RSA_WITH_RC4_128_SHA"
elif $cs == 10 then "TLS_RSA_WITH_3DES_EDE_CBC_SHA"
elif $cs == 47 then "TLS_RSA_WITH_AES_128_CBC_SHA"
elif $cs == 53 then "TLS_RSA_WITH_AES_256_CBC_SHA"
elif $cs == 60 then "TLS_RSA_WITH_AES_128_CBC_SHA256"
elif $cs == 156 then "TLS_RSA_WITH_AES_128_GCM_SHA256"
elif $cs == 157 then "TLS_RSA_WITH_AES_256_GCM_SHA384"
elif $cs == 49159 then "TLS_ECDHE_ECDSA_WITH_RC4_128_SHA"
elif $cs == 49169 then "TLS_ECDHE_RSA_WITH_RC4_128_SHA"
elif $cs == 49170 then "TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA"
elif $cs == 49171 then "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA"
elif $cs == 49172 then "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA"
elif $cs == 49161 then "TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA"
elif $cs == 49162 then "TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA"
elif $cs == 49191 then "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256"
elif $cs == 49199 then "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256"
elif $cs == 49187 then "TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256"
elif $cs == 49195 then "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256"
elif $cs == 49200 then "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384"
elif $cs == 49196 then "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384"
elif $cs == 52392 then "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256"
elif $cs == 52393 then "TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256"
elif $cs == 4865 then "TLS_AES_128_GCM_SHA256"
elif $cs == 4866 then "TLS_AES_256_GCM_SHA384"
elif $cs == 4867 then "TLS_CHACHA20_POLY1305_SHA256"
elif $cs == 22016 then "TLS_FALLBACK_SCSV"
elif $cs == 52392 then "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305"
elif $cs == 52393 then "TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305" else $cs end'
[tls_proto]='.request.tls.proto'
[tls_server_name]='.request.tls.server_name'
[rh_content_length]='.resp_headers["Content-Length"]'
[rh_content_type]='.resp_headers["Content-Type"]'
[rh_server]='.resp_headers.Server'
[rh_x_frame_options]='.resp_headers["X-Frame-Options"]'
[rh_last_modified]='.resp_headers["Last-Modified"]'
[rh_alt_svc]='.resp_headers["Alt-Svc"]'
[rh_vary]='.resp_headers.Vary'
[rh_etag]='.resp_headers.Etag'
[rh_x_content_type_options]='.resp_headers["X-Content-Type-Options"]'
[rh_referrer_policy]='.resp_headers["Referrer-Policy"]'
[rh_cache_control]='.resp_headers["Cache-Control"]'
[rh_accept_ranges]='.resp_headers["Accept-Ranges"]'
[rh_content_security_policy]='.resp_headers["Content-Security-Policy"]'
[rh_strict_transport_security]='.resp_headers["Strict-Transport-Security"]'
[rh_permissions_policy]='.resp_headers["Permissions-Policy"]'
)
# Allows adding various formatting characters to the output
declare -A placeholders=(
[_]=' '
[space]=' '
[tab]='\t'
[q]='\"'
[quote]='\"'
[squote]="'"
[i]='|'
[pipe]='|'
[lsqb]='['
[rsqb]=']'
[lbrace]='('
[rbrace]=')'
[lcurly]='{'
[rcurly]='}'
[colon]=':'
[semicolon]=';'
[comma]=','
[dot]='.'
[slash]='/'
[backslash]='\\'
[hyphen]='-'
[underscore]='_'
[plus]='+'
[equals]='='
[ampersand]='&'
[percent]='%'
[dollar]='$'
[exclamation]='!'
)
It can be tricky to add new mappings, and to help analyse the actual jq
filter constructed by logparse, there is a debug mode that can be enabled. Simply set the environment variable DEBUG
before calling logparse.
# DEBUG=1 logparse -F logparse.config caddy.log
jq -r . | ( if .request.client_ip == "" then "-" else .request.client_ip end ) as $client_ip | ( " " ) as $space | ( if .user_id == "" then "-" else .user_id end ) as $user_id | ( " " ) as $space | ( "[" ) as $lsqb | ( if .ts | strftime("%Y-%m-%d %H:%M:%S") == "" then "-" else .ts | strftime("%Y-%m-%d %H:%M:%S") end ) as $datetime | ( "]" ) as $rsqb | ( " " ) as $space | ( "\"" ) as $quote | ( if .request.method == "" then "-" else .request.method end ) as $method | ( " " ) as $space | ( if .request.uri == "" then "-" else .request.uri end ) as $uri | ( " " ) as $space | ( if .request.proto == "" then "-" else .request.proto end ) as $proto | ( "\"" ) as $quote | ( " " ) as $space | ( if .status == "" then "-" else .status end ) as $status | ( " " ) as $space | ( if .size == "" then "-" else .size end ) as $size | "\($client_ip)\($space)\($user_id)\($space)\($lsqb)\($datetime)\($rsqb)\($space)\($quote)\($method)\($space)\($uri)\($space)\($proto)\($quote)\($space)\($status)\($space)\($size)" caddy.log
Download[edit | edit source]
Logparse source code is available at https://git.tnonline.net/Forza/logparse