
Automatically download and merge multiple PDF files with a Unix shell.

This is a simple little script I wrote to download the multiple parts of some weekly newspapers and merge them into a single PDF file. I have no idea why they post these broken up into many pieces instead of one file, and it’s a pain to open and close multiple files just to read the paper.

To use this script, you must know the exact filenames of the PDFs you want to download, and you need pdfmerge, installable with yum ($ sudo yum install pdfmerge) from the updates repo.
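If pdfmerge isn’t packaged for your distro, Ghostscript can usually do the same merge; a minimal one-liner (the filenames here are just placeholders):

$ gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=merged.pdf 01.pdf 02.pdf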

Example 1: http://epilogespafou.com/ (56 single page PDF files)

#!/bin/bash

# epilogespafou.com, sample structure
# http://epilogespafou.com/pdf/01.pdf
# http://epilogespafou.com/pdf/02.pdf
# ...
# http://epilogespafou.com/pdf/56.pdf

# check for required tools
for tool in pdfmerge seq wget; do
	if ! type "$tool" > /dev/null 2>&1; then
		echo "ERROR: $tool not found." 1>&2
		exit 1
	fi
done

# generate output filename, customize to your needs
fn=$(date --date 'last saturday' +%F).pdf
# check if output file already exists
if [ -e "$fn" ]; then
	echo "ERROR: $fn already exists!" 1>&2
	exit 1
fi

# define base url
url=http://epilogespafou.com/pdf
# prepare list of files to merge
tomerge=''
for i in $(seq -w 1 60); do
	# download the file (missing parts simply fail to download and are skipped below)
	wget -q "$url/$i.pdf"
	# if the download succeeded, add the file to the merge queue
	if [ -e "$i.pdf" ]; then
		tomerge="$tomerge $i.pdf"
	fi
done
done

# merge the queue into the output file; $tomerge is intentionally unquoted so it splits into separate arguments
pdfmerge $tomerge "$fn"

# clean up
for i in $(seq -w 1 60); do rm -f "$i.pdf"; done

In example 1, we use the -w flag with seq, which equalizes the width of the numbers by padding them with leading zeros so they match the filenames on the server (01.pdf, 02.pdf, and so on).
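You can see the padding directly in a terminal:

$ seq -w 8 11
08
09
10
11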

Example 2: http://foni-pafou.com/ (5 PDF files, max. 10 pages each, 48 pages total)

#!/bin/bash

# foni-pafou.com, sample structure
# http://foni-pafou.com/pdf/1.pdf
# http://foni-pafou.com/pdf/2.pdf
# ...
# http://foni-pafou.com/pdf/5.pdf

for tool in pdfmerge wget; do
	if ! type "$tool" > /dev/null 2>&1; then
		echo "ERROR: $tool not found." 1>&2
		exit 1
	fi
done

fn=$(date --date 'last saturday' +%F).pdf
if [ -e "$fn" ]; then
	echo "ERROR: $fn already exists!" 1>&2
	exit 1
fi

url=http://foni-pafou.com/pdf
tomerge=''
for i in {1..6}; do
	wget -q "$url/$i.pdf"
	if [ -e "$i.pdf" ]; then
		tomerge="$tomerge $i.pdf"
	fi
done

pdfmerge $tomerge "$fn"

for i in {1..6}; do rm -f "$i.pdf"; done
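Example 2 uses bash brace expansion instead of seq, since these filenames aren’t zero-padded. Note that plain {1..6} produces unpadded numbers; if you ever need padding, {01..06} does it in bash 4 and later:

$ echo {1..6}
1 2 3 4 5 6
$ echo {01..06}
01 02 03 04 05 06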

Stick your scripts into a cron job and you have your newspapers/magazines waiting for you as a single PDF each, for your viewing pleasure.
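For example, assuming you saved the first script as /home/user/bin/fetch-paper.sh (the path and schedule here are just placeholders), a crontab entry to fetch the paper every Saturday morning could look like this:

# m h dom mon dow command
30 7 * * 6 /home/user/bin/fetch-paper.sh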
