Blog/MediaWiki cron jobs

From Forza's ramblings

MediaWiki Job Queues

A wall painting depicting a turtle with 12 animals painted in a circle
Astral calendar - turtle divination chart, with a central golden turtle representing the Bodhisattva, nine magic squares and symbols of the eight planets. Painting on the walls of Punakha Dzong monastery, Bhutan.

MediaWiki runs several background tasks to organise and maintain itself. These tasks are usually run in one of two ways:

  • directly when users browse the wiki.
  • by a background service, often referred to as a cron job.

The default configuration uses the first option. This makes sense on small wikis as it doesn't rely on external services. The downside is that it can introduce delays for visitors, and some tasks may be delayed as well.

The configuration setting $wgJobRunRate controls whether MediaWiki runs background tasks on each request ($wgJobRunRate = 1;) or leaves them to a background service ($wgJobRunRate = 0;).
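As an illustration, here is a minimal shell sketch that switches an installation to the background-service mode by rewriting the $wgJobRunRate line in LocalSettings.php. The function name set_job_run_rate is made up for this example and is not something MediaWiki ships. Editing the file by hand works just as well.

```bash
#!/bin/bash
# Sketch: set $wgJobRunRate in a LocalSettings.php file.
# set_job_run_rate is a hypothetical helper, not part of MediaWiki.
set_job_run_rate() {
    local settings="$1" rate="$2" tmp
    [ -f "$settings" ] || return 1
    tmp="$(mktemp)" || return 1
    # Drop any existing assignment, then append the new one at the end.
    grep -v '^[[:space:]]*\$wgJobRunRate[[:space:]]*=' "$settings" > "$tmp"
    printf '$wgJobRunRate = %s;\n' "$rate" >> "$tmp"
    # Write back with cat so the file keeps its owner and permissions.
    cat "$tmp" > "$settings" && rm -f "$tmp"
}
```

Assuming the wiki lives in /var/www/wiki, it could be called as set_job_run_rate /var/www/wiki/LocalSettings.php 0.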

It is important that the background tasks are run as the same user as the webserver or PHP instance.

crontab -e -u wikiuser
# Runs runJobs.php once every hour
0 * * * * /usr/bin/php /var/www/wiki/maintenance/runJobs.php --maxtime=3600 > /var/log/runJobs.log 2>&1

Systemd users may want to look into systemd timers instead of old-style crontab. https://www.freedesktop.org/software/systemd/man/latest/systemd.timer.html
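A rough sketch of what the timer variant could look like is shown below. It writes a oneshot service and an hourly timer that mirror the crontab entry above. The unit names (mw-runjobs.service, mw-runjobs.timer) and the function name are assumptions for this example; the user and paths are taken from the cron line and need adjusting to your setup.

```bash
#!/bin/bash
# Sketch: write a oneshot service plus an hourly timer into a directory.
# Unit names are assumptions; user and paths mirror the cron example.
write_runjobs_units() {
    local dir="${1:-/etc/systemd/system}"

    cat > "$dir/mw-runjobs.service" <<'EOF'
[Unit]
Description=MediaWiki runJobs

[Service]
Type=oneshot
User=wikiuser
ExecStart=/usr/bin/php /var/www/wiki/maintenance/runJobs.php --maxtime=3600
EOF

    cat > "$dir/mw-runjobs.timer" <<'EOF'
[Unit]
Description=Run MediaWiki jobs every hour

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

After running write_runjobs_units as root, systemctl daemon-reload followed by systemctl enable --now mw-runjobs.timer should activate the timer.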

Full documentation of MediaWiki job queues can be found at https://www.mediawiki.org/wiki/Manual:Job_queue

An alternative to cron

Running tasks at regular intervals may cause problems with long-running items like transcoding of video files. Too short an interval may lead to too many parallel executions, while too long an interval means that important tasks are delayed too much.
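If you stay with cron, one common way to avoid overlapping runs is to take a lock before starting runJobs.php. The sketch below uses flock(1) from util-linux. This is not part of the setup described on this page, and the function name run_exclusive and any lock file path are assumptions.

```bash
#!/bin/bash
# Sketch: run a command only if nobody else holds the lock file.
# run_exclusive is a hypothetical helper name.
run_exclusive() {
    local lock_file="$1"
    shift
    # -n: fail immediately instead of waiting when the lock is taken.
    flock -n "$lock_file" "$@"
}
```

In a crontab the same idea can be written inline by prefixing the php command with flock -n and a lock file path that the wiki user can write to.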

An alternative is to run a shell script that loops quickly, checks for important tasks on every pass, and limits how many long-running tasks can run before the next loop. The script has to be run as the MediaWiki user.

#!/bin/bash
# Put the MediaWiki installation path on the line below.
mw_install_path="/www/wiki.tnonline.net/htdocs/mediawiki"
runjobs="$mw_install_path/maintenance/runJobs.php"
log="/var/log/mediawiki/wiki.tnonline.net_runJobs.log"

echo "$(date +'%Y%m%d-%H%M%S')" : Starting runJobs.php service as PID: $BASHPID >> "$log"

# Wait a minute after the server starts up to give
# other processes time to get started.
sleep 60

echo "$(date +'%Y%m%d-%H%M%S')" : Entering job loop >> "$log"
while true; do
	# Job types that need to be run ASAP no matter how many of
	# them are in the queue. Those jobs should be very "cheap" to run.
	echo "$(date +'%Y%m%d-%H%M%S')" : Running priority jobs >> "$log" 2>&1
	/usr/bin/php "$runjobs" --type="enotifNotify" >> "$log" 2>&1
	echo "$(date +'%Y%m%d-%H%M%S')" : Priority jobs done >> "$log" 2>&1

	# Everything else, limit the number of jobs on each batch.
	echo "$(date +'%Y%m%d-%H%M%S')" : Running standard jobs >> "$log" 2>&1
	/usr/bin/php "$runjobs" --maxjobs=20 >> "$log" 2>&1
	echo "$(date +'%Y%m%d-%H%M%S')" : Standard jobs done >> "$log" 2>&1

	echo "$(date +'%Y%m%d-%H%M%S')" : Running video transcode jobs >> "$log" 2>&1
	/usr/bin/php "$runjobs" --maxjobs=1 --type=webVideoTranscode >> "$log" 2>&1
	echo "$(date +'%Y%m%d-%H%M%S')" : Video transcode jobs done >> "$log" 2>&1

	# Wait some seconds to let the CPU do other things,
	# like handling web requests, etc.
	echo "$(date +'%Y%m%d-%H%M%S')" : All done. Waiting for 15-25 seconds... >> "$log"
	sleep "$(shuf -i 15-25 -n1)" # random interval between 15 and 25 seconds
done

Run this bash script during system boot. Remember to run it as the MediaWiki user.
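To make it harder to start the script under the wrong account by accident, a small guard could be placed near the top. This is only a sketch: require_user is a made-up helper name, and the account name you pass in depends on your installation.

```bash
#!/bin/bash
# Sketch: refuse to continue unless running as the expected account.
# require_user is a hypothetical helper name.
require_user() {
    local expected="$1" current
    current="$(id -un)"
    if [ "$current" != "$expected" ]; then
        echo "Must run as '$expected', but running as '$current'" >&2
        return 1
    fi
}
```

With a MediaWiki account named wiki, the call would be require_user wiki || exit 1 before the job loop is entered.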

NOTE: Remember to stop the script before doing maintenance work or MediaWiki upgrades.

On a very busy server, it might be better to split priority, normal and transcoding tasks into three separate scripts. This will ensure that priority tasks will execute even if other long-running tasks are running.
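One possible way to structure such a split is a small shared loop function that each of the three scripts calls with its own runJobs.php arguments. This is a sketch, not a tested production setup: the function name job_loop, the PHP_BIN and RUNJOBS override variables, and the pause ranges in the usage comments are all assumptions.

```bash
#!/bin/bash
# Sketch: one reusable loop, parameterised per job class.
# PHP_BIN and RUNJOBS are hypothetical overrides; defaults match the script above.
php_bin="${PHP_BIN:-/usr/bin/php}"
runjobs="${RUNJOBS:-/www/wiki.tnonline.net/htdocs/mediawiki/maintenance/runJobs.php}"

# job_loop LABEL MIN_SLEEP MAX_SLEEP [runJobs.php arguments...]
job_loop() {
    local label="$1" min="$2" max="$3"
    shift 3
    while true; do
        echo "$(date +'%Y%m%d-%H%M%S') : Running $label jobs"
        "$php_bin" "$runjobs" "$@"
        echo "$(date +'%Y%m%d-%H%M%S') : $label jobs done"
        # Random pause between MIN_SLEEP and MAX_SLEEP seconds.
        sleep "$(shuf -i "$min-$max" -n1)"
    done
}

# Example calls, one per script (pause ranges are illustrative):
#   job_loop priority  5 10 --type=enotifNotify
#   job_loop standard 15 25 --maxjobs=20
#   job_loop transcode 15 25 --maxjobs=1 --type=webVideoTranscode
```

The function writes to standard output, so each script can redirect it to its own log file.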

OpenRC init.d script

Here's a basic init script that allows for stopping and starting the runJobs script.

#!/sbin/openrc-run
command="/www/wiki.tnonline.net/runjobs.sh"
command_user="wiki:wiki"
command_background=true
#command_args=""
pidfile="/run/${RC_SVCNAME}.pid"
start_stop_daemon_args="--interpreted"
name="MW runJobs"

description="MediaWiki runJobs daemon"

depend() {
	use caddy php-fpm
}

Note that I am using the Caddy web server, so replace caddy with apache or nginx if you use one of those.

Save this file as /etc/init.d/mw-runjobs, make it executable, and run rc-update add mw-runjobs. To start it, simply run rc-service mw-runjobs start.

Don't forget to change command= to point to where you saved the runJobs shell script.