The Friendly Python: Applying automation to media archive management

A recent client in the music industry wanted to transfer hundreds of assets to IMES: PDF documents dating from the 1990s. Their operators had recorded painstaking details whenever they restored an audio recording, creating one PDF per recording with a few dozen different fields. These fields covered the original recording and the restoration process, including the artist’s name, date, producer, original audio engineer, original sound levels, tape condition, restoration engineer, restoration levels, and even the baking temperature, if baking was needed.

As we dug into the project, we realized that some of these records originated from another company, and they were so old that the PDFs didn't even have fields: they were just single images with text in them. The client needed all of this information transferred into one massive Excel spreadsheet that could then be ingested into a database like FileMaker, so that their entire company would have access to the content. 

Ordinarily, scenarios like these would require multiple team members to spend hundreds of manual labor hours reviewing or tracking thousands of data points to complete the customer’s request. But using Python, we’re able to automate data mining requests like these to easily and efficiently deliver exactly what the client needs, at a tremendous savings of time and money to the client.

Python is a scripting language with which we can create customized automation, accomplishing tasks like the client scenario above in a fraction of the time it would take to do them by hand. Task automation uses scripts -- sets of instructions performed on a computer system -- to have a computer do much of what a human data analyst can do, but far faster and without the typos and slips that creep into manual data entry.

When setting up Python to solve this client challenge, we created two separate automation processes. First, instead of employing a crew to open each individual PDF and manually type in all the data, we wrote a Python script to read the data from every field in each PDF and save it to the spreadsheet exactly as it appeared in the original document. For most of the PDFs, this script was able to read the information from the internal fields and assign it to spreadsheet columns based on the PDF field names. Second, to address the older PDFs -- the ones that were just a single flat image -- we created another Python script that used OCR (Optical Character Recognition) to scrape the text from the image and then parsed that text into the appropriate categories.
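To make the two processes concrete, here is a minimal sketch of the mapping step, not the script IMES actually used. It assumes a PDF library has already returned each document's form fields as a dictionary of field name to value, and that an OCR tool has already returned the text of each flat image with one "Label: value" pair per line. The column names, function names, and the CSV output (which Excel can open) are all hypothetical choices made for illustration.

```python
import csv
import io

# Hypothetical column layout; the real reports had a few dozen fields.
COLUMNS = ["Artist", "Date", "Producer", "Tape Condition"]


def row_from_fields(fields):
    """Map PDF form-field names to spreadsheet columns.

    Fields that are missing or empty become blank cells, and field
    names that are not in COLUMNS are ignored.
    """
    return {col: (fields.get(col) or "").strip() for col in COLUMNS}


def row_from_ocr_text(text):
    """Parse 'Label: value' lines from OCR output into the same columns.

    Assumes one label per line; labels are matched case-insensitively
    and lines without a recognized label are skipped.
    """
    lookup = {col.lower(): col for col in COLUMNS}
    found = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        column = lookup.get(label.strip().lower())
        if sep and column:
            found[column] = value.strip()
    return row_from_fields(found)


def write_rows(rows, handle):
    """Write a header row followed by one CSV line per document."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
```

Both paths end in the same row shape, so a single spreadsheet can hold records from the fielded PDFs and the scanned ones alike. Real OCR output is rarely this tidy, so a production version would likely need fuzzier label matching and a review step for lines it cannot place.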

In both cases, some fields required additional processing to put them in the specific format the client needed, which the automation also handled. And get this: the client is still using the automation today!
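As an illustration of that kind of additional processing, the sketch below normalizes a date field into a single format. The list of input formats and the choice of ISO dates as the target are assumptions made for the example; the client's actual formats are not described here.

```python
from datetime import datetime

# Hypothetical input formats that might appear in decades-old reports.
DATE_FORMATS = ("%m/%d/%Y", "%m/%d/%y", "%B %d, %Y", "%Y-%m-%d")


def normalize_date(raw):
    """Return the date as YYYY-MM-DD, or the trimmed input if unrecognized.

    Each known format is tried in order. Values that match none of them
    are passed through unchanged so a person can review them later.
    """
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return text
```

Passing unrecognized values through, rather than guessing or raising an error, keeps the batch running and leaves the odd cases visible in the spreadsheet.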

Since scripts can also send commands to control hardware, IMES can process assets around the clock on behalf of customers, and at a remarkable rate. Python enables us to work smarter and faster, so that we can serve more customers with a more diverse set of needs. IMES has created its own internal script repository and serves customers with automated scripts at three IMES locations: Hollywood, CA; Boyers, PA; and Moonachie, NJ.

Bret Shefter

Bret is a Software Engineer at IMES in Hollywood, CA. He received his B.A. in English from Yale University and a J.D. from the University of California, Davis. He specializes in scripting and coding. He has designed automation within various IMES workflows to allow for greater efficiency and also enjoys working on data restoration. During his downtime, you can catch Bret in the movie Mutant Vampire Zombies from the ’Hood, from the same writer who created the infamous Sharknado films.
