Scan processing

Matholymp has some support for processing scans of contestant scripts, using barcoded cover sheets to support automatically splitting up a PDF with scans of multiple scripts.

Matholymp provides a script, mo-process-script-scans, to do this processing. It expects to be run from a directory containing a file scans.cfg with the associated configuration information; the examples/scan-processing/ directory in the matholymp source distribution includes a version of scans.cfg that may be used as a basis for this configuration. The command-line arguments are the names of multi-script PDFs to process. For each of those files, mo-process-script-scans creates a corresponding log file with .log appended to that file's name.

Alternatively, if the script is run with the --watch option, the argument is instead the name of a directory to watch for newly arriving multi-script PDFs. This mode is intended to be used with the db/queue_scan subdirectory of the registration system directory. Each PDF must be complete before it appears in the named directory with a .pdf filename suffix.
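One way to meet the requirement that a PDF is complete before it appears with a .pdf suffix is to write it under a temporary name and rename it once the copy has finished. The following is a minimal sketch of that idea, not part of matholymp: the function name deliver_pdf and the .partial suffix are hypothetical, and it assumes that files in the watched directory without a .pdf suffix are ignored.

```python
import os
import shutil


def deliver_pdf(src_path, queue_dir):
    """Copy src_path into queue_dir so that it only appears as *.pdf once complete.

    Hypothetical helper: assumes the watcher ignores names not ending in .pdf.
    """
    final_path = os.path.join(queue_dir, os.path.basename(src_path))
    # Temporary name in the same directory, so the final rename stays on
    # one filesystem; the name ends in .partial rather than .pdf.
    tmp_path = final_path + '.partial'
    try:
        with open(src_path, 'rb') as src_file, open(tmp_path, 'wb') as tmp_file:
            shutil.copyfileobj(src_file, tmp_file)
        # os.replace renames atomically on POSIX filesystems, so the .pdf
        # name never refers to a partially written file.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Do not leave a stray partial file behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

A scanner-side job could call deliver_pdf('/path/to/batch.pdf', '/path/to/queue') for each finished scan, with both paths chosen to suit the local setup.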

The file pointed to by cover_sheet_key_file should exist before this command is run. It must contain the same 20 bytes of random data (generated afresh for each year’s event) as the file used in document generation to generate the cover sheets. The file pointed to by password_file should also exist, containing the password for the registration system account (with the Scan role) that is to be used to upload scripts.
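As an illustration of creating such a key file, the sketch below writes 20 random bytes to a new file. It is not part of matholymp: the function name write_key_file is hypothetical, and it assumes the file holds the raw bytes with no encoding or trailing newline. The same file would then need to be supplied both to document generation and to scan processing.

```python
import os
import secrets


def write_key_file(path, num_bytes=20):
    """Create path containing num_bytes of cryptographically strong random data.

    Hypothetical helper: assumes the key file stores raw, unencoded bytes.
    """
    key = secrets.token_bytes(num_bytes)
    # O_EXCL refuses to overwrite an existing key, which would invalidate
    # cover sheets already generated with it; mode 0o600 keeps it private.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, 'wb') as key_file:
        key_file.write(key)
```

Refusing to overwrite is a deliberate choice here: a fresh key is wanted once per event, and replacing it after cover sheets have been printed would leave those sheets unverifiable.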

Symlinks to individual uploaded scans are automatically created in db/scans in the registration system directory. The contents of this directory can be made available to coordinators through a web server to provide them with access to scans without needing a privileged registration system account.

The process for uploading the PDFs with scans of multiple scripts depends on the scanning process and systems in use. An example script that uses the REST interface for such uploads is watch-upload.py; configuration settings need to be inserted into it before use. A simpler version, upload.py, only uploads PDFs from a given directory in which all required PDFs exist at startup, without watching for new scans to appear.