When the University of North Carolina at Chapel Hill posted 200,000-plus pages of public records on its website last month, it allowed the public to read some of the 5 million pages of emails, memos and other documents collected as part of the investigation into the school’s long-running academic fraud scandal.
What the university did not provide was a way for the public to easily search those records. The documents were posted as PDFs with no “readable” text encoded in the files, meaning readers can’t search for names or other keywords and must look through the documents one by one.
To help sift through the records, WRAL News processed the documents to make the text searchable and created an app to open that search to the public. Users can easily share what they find on Twitter using the hashtag #UNCdocs or email findings to the WRAL newsroom.
WRAL-TV Public Records Researcher/Reporter Tyler Dukes recently spoke on-air with WRAL-TV Anchor/Reporter Gerald Owens about the app he created to make the documents accessible to viewers. Dukes and NMG Investigates/Special Projects Producer Kelly Hinchcliffe also put together a story on WRAL.com to explain the process and how viewers can search the available public records.
Dukes and Hinchcliffe explained their process:
How we created the app
WRAL News used a Web-based service called DocumentCloud to process the documents with optical character recognition, which attempts to match images of text with their corresponding characters. OCR is never 100 percent accurate. Sometimes, letters and characters are too small, blurry or rendered in a difficult font. But it does give readers a chance to identify text in the documents.
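Because OCR output can contain misread characters, an exact keyword search may miss pages a reader would want to see. The sketch below illustrates one way to tolerate such errors with approximate matching from Python's standard library. It is not how DocumentCloud or the WRAL app works; the function name `fuzzy_find`, the `threshold` value and the sample page text are all hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_find(pages, keyword, threshold=0.8):
    """Return (page_number, word) pairs whose word closely resembles keyword.

    `pages` maps page numbers to OCR text. `threshold` is an assumed
    similarity cutoff between 0 and 1; it is not taken from any real tool.
    """
    hits = []
    target = keyword.lower()
    for page_number, text in pages.items():
        for word in text.split():
            # Drop surrounding punctuation so "memo." compares as "memo".
            cleaned = word.strip(".,;:()\"'").lower()
            if not cleaned:
                continue
            score = SequenceMatcher(None, target, cleaned).ratio()
            if score >= threshold:
                hits.append((page_number, cleaned))
    return hits


# Hypothetical OCR output: on page 2 the second "l" was misread as "1".
pages = {
    1: "Please forward the enrollment memo to the dean.",
    2: "The enrol1ment figures were sent on Friday.",
    3: "No relevant text here.",
}

matches = fuzzy_find(pages, "enrollment")
# matches == [(1, 'enrollment'), (2, 'enrol1ment')]
```

An exact search for "enrollment" would find only page 1 here. The approximate match also surfaces page 2, at the cost of possible false positives if the threshold is set too low.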
We put the application together in a few days, based on how we thought our readers might be most interested in working with and reviewing the documents. While there still may be a few bugs, the most important thing for readers to know is that we want the application to evolve with their feedback.
We want it to be a helpful way for the public to review the public records released last month as well as additional ones we expect to see from UNC and other organizations. The best way to do that is to learn more about how our audience uses it.
Dumping hundreds of thousands of unsearchable pages in the form of almost three gigabytes of files on a website is the bare minimum of complying with North Carolina’s public records law. This application doesn’t change that, but we hope it becomes a tool we can use to make public records more accessible to the public.
If you find something of interest, or have suggestions about how we can make the documents easier to use, email us or tweet using the hashtag #UNCdocs.
Thanks to WRAL-TV’s Tyler Dukes & NMG’s Kelly Hinchcliffe for contributions to this capcom story.