What is OCR processing?
As a measure of records request complexity, the count of OCR-processed documents should decrease over time as documents become more and more “born digital.” But there’s a lot still standing in the way.
“OCR” means Optical Character Recognition – a process by which scanned documents are “interpreted” by software so that the images of characters (i.e. letters and numbers) are transformed into text that computers can read – also known as digitizing a document.
Example: if an old handwritten property deed is scanned and processed by OCR software, it can be turned into a Word or other type of electronic document that can be keyword searched, reorganized, extracted into another document, etc. Since searching for responsive documents relies heavily on finding keywords, looking automatically through electronic files without even having to open them is far more efficient than finding, organizing, opening, reading, reorganizing, and copying or scanning paper documents.
If you’ve dealt with OCR-processed documents before, you’ll know that the software can’t always interpret every character accurately, so documents rarely come out perfect and need editing – but the software is getting better all the time.
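To make the keyword-search idea concrete, here is a minimal sketch of searching OCR output while tolerating a few misread characters. It assumes the OCR step has already produced plain text for each file. The function name `find_keyword_hits`, the file names, and the sample text are all hypothetical, and this is not a description of how GovQA's module works. Fuzzy matching uses Python's standard `difflib` module.

```python
import difflib
import re


def find_keyword_hits(ocr_text_by_file, keyword, cutoff=0.8):
    """Return names of files whose OCR text contains the keyword or a near match."""
    keyword = keyword.lower()
    hits = []
    for name, text in ocr_text_by_file.items():
        # Split the OCR text into lowercase alphanumeric words.
        words = re.findall(r"[a-z0-9]+", text.lower())
        # get_close_matches tolerates small misreads such as "rn" for "m".
        if difflib.get_close_matches(keyword, words, n=1, cutoff=cutoff):
            hits.append(name)
    return hits


# Hypothetical OCR output, invented for illustration only.
sample = {
    "deed_1952.txt": "Grantor conveys an easernent across the north parcel",
    "minutes_1987.txt": "Council approved the annual budget",
}

easement_files = find_keyword_hits(sample, "easement")
budget_files = find_keyword_hits(sample, "budget")
```

An exact-match search would miss the misread word in the first file. The `cutoff` value trades missed documents against false hits, so a real review would still need a person to check the results.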
GovQA offers an add-on module called Attachment Search with OCR that optimizes the processing and searching of paper documents that aren’t machine-readable.
Why Paper Remains Relevant
In 2021, the body of documents and records that jurisdictions and agencies hold can run the gamut from almost no electronic files to almost no hard/paper files. While there is a general push to transition everything from paper to electronic, a host of logistical, financial, and practical issues mean paper will keep being part of the picture and necessitate continued use of OCR technology. Some of these, which we’ll explore below, are:
- Small jurisdictions/agencies have modest needs and/or resources
- Some agencies lack adequate document retention policies or practices
- Many agencies haven’t imaged or digitized older records they need to keep long-term or that they still use frequently
- Jurisdictions still need to provide records in paper form for many reasons, even if the records have been digitized
Small (some very small) jurisdictions/agencies
Small jurisdictions may not have the need or means to invest in staff or systems to digitize or image their archived and working documents – or adopt sophisticated systems to organize and store them for current needs or as insurance for possible large public records requests. There’s not much chance they’ll end up in the position of Gold Bar, Washington (pop. 2,000) which found itself mired in a public records request/lawsuit war that had the town facing insolvency and disincorporation.
Agencies with document retention policies or practices that fall short
Taken together, document retention policies and practices are like icebergs. There’s the part you can see above the water – the laws and policies stating what’s supposed to be kept and for how long. Then there’s the underwater part you can’t see – sometimes warehouses full of records that should have been disposed of but weren’t.
Document retention laws should define the body of documents you have to consider when responding to records requests. But if inadequate resources are devoted to purging records on schedule, they just pile up with each passing year – often neither digitized if they need to be kept nor disposed of if they don’t. And if documents that should have been disposed of are responsive to incoming requests, they have to be provided – adding needlessly to costs and possibly representing grounds for legal action.
It’s not necessarily under-resourced jurisdictions that fall behind in their document housekeeping. It can happen with better-resourced governments that get off track due to emerging and competing priorities – making that accumulation of old documents a potential ticking time bomb.
Agencies with working paper files
Some documents, like those related to property ownership or development, must be kept long-term or even forever. Some, such as property plats and infrastructure as-built plans, often need to be consulted to answer emerging questions and solve new problems even if they are very old.
For various reasons – including resistance to new technology, changing demands on resources that leave imaging/digitizing at the bottom of the priority list, and management preferences – many jurisdictions simply have not imaged or digitized these documents. If a records request comes in and these documents are responsive to it, agencies can be looking at days, weeks or in some cases even years of time spent reviewing, copying and scanning/digitizing the files.
In a worst-case scenario, agencies may do this scanning/digitizing and then not leverage the newly digitized files for future use. But there are also “best-laid plans” and other situations that mean digitizing does not eliminate the need to keep paper.
As part of 2019 public records request metrics reported to the Joint Legislative Audit and Review Committee (JLARC), the WA State Dept of Archaeology and Historic Preservation reported “most of our records have been digitized; if not, we scan and send. Sometimes the scan was incomplete so we pull from Archives and re-scan and send.” So even though they’ve digitized almost everything, they still need to keep paper archives.
Providing records in "old school" form
It seems logical that if jurisdictions do a great job at scanning/digitizing their documents they’ll never have to look back. Responsive document retrieval, review and provision will all be neatly and efficiently in the digital realm. But if that were the case, OCR-processed documents wouldn’t be a complexity factor.
With its 2019 PRR metrics, the WA Attorney General’s Office reported “AGO offers all records electronically. However, requesters who are incarcerated or residents of the DSHS [Dept. of Social and Health Services] Special Commitment Center are often prohibited from or do not want to receive records electronically; therefore, the agency provides them paper records.”
The City of Lake Stevens, WA reported “Metric [as defined by JLARC] does not provide an option for records viewed only. This number (49) is included with paper” (49 out of 291) – indicating that about one-sixth of requesters wanted to come in to city hall to examine the records rather than be sent paper copies or electronic files.
Requests to review in person aren’t uncommon and underscore what is really the fundamental right in public records requests – the right to access and inspect the records, and in the larger context the right of individuals to see the workings of government.
Arizona’s public records law makes this clear: “Public records and other matters in the custody of any officer shall be open to inspection by any person at all times during office hours.” While this provision is likely over a hundred years old and doesn’t reflect how an in-person inspection request would actually play out, it does evoke the fundamental nature of public records laws. It’s just that technology has been taking us to a place where it’s much easier to just provide copies.
The City of Kennewick, WA reported to JLARC that of its 2,202 requests in 2019, 329 were provided in paper form and that 97% of the 329 were for police department records, adding “we’ve worked actively to encourage electronic records for these customers (physical records are the most costly for them and most time-consuming for us)” but that people choose hard copies anyway, coming in to the office to pay for and collect the documents.
So while it may be true that having the preponderance of records digitized doesn’t mean the end of less efficient ways to connect records with requesters, there is no doubt that digitizing makes the responsive document search and review process much faster and easier.
OCR in lockdown
As with most of the GovQA PiPRIndex complexity markers, counts of OCR-processed documents are trending significantly upward over time. From 2018 to 2019, OCR-processed documents increased about 25%, with a further increase of about 13% from 2019 to 2020. The 2020 count does show a “Covid dip” (the second-quarter 2020 count dips significantly below the first-quarter count, in contrast with 2019 and 2018), but the rises in the third- and fourth-quarter counts are significantly larger than those of 2018 and 2019. This makes sense in the context of greatly increased numbers of PRRs in general in 2020, but it also spotlights another reality: although many staff were working from home from early spring through the rest of 2020, GovQA was working for them. During lockdown they were able to successfully access and OCR-process the responsive documents they needed.
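The two year-over-year figures compound rather than add. As a rough check, the sketch below combines the approximate 25% and 13% increases quoted above into a single change from 2018 to 2020. The helper name `compound_change` is hypothetical, and the inputs are the rounded percentages from the text, not underlying counts.

```python
def compound_change(*yearly_changes):
    """Combine successive fractional year-over-year changes into one overall change."""
    factor = 1.0
    for change in yearly_changes:
        factor *= 1.0 + change
    return factor - 1.0


# Approximate rates quoted in the text: about 25%, then about 13%.
overall = compound_change(0.25, 0.13)
print(f"Overall change from 2018 to 2020: about {overall:.0%}")
```

Under these rounded inputs, the 2020 count works out to roughly 41% above the 2018 count, slightly more than the 38% a simple sum would suggest.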
The Peers in Public Records Newsletter (formerly FOIA News) is a bi-monthly e-newsletter brought to you by GovQA. It is a collection of the latest trends in public record requests and government transparency initiatives, shared stories, live roundtables, informative case studies, and actionable knowledge that will help you calm the chaos and keep your organization compliant. Send your comments to firstname.lastname@example.org.
© Copyright 2021. PiPRIndex. All rights reserved.