Extracting Content from Academic Papers in PDF Format

We evaluated several toolkits designed for extracting text from PDF documents. In this article, we share our findings and the rationale behind our final choice: CERMINE, the extraction tool developed by ADA Lab and CeON at ICM UW.

Exception handling basics

One of the useful concepts that came with modern languages is exception handling. In a nutshell, it means catching errors at the point where they can be dealt with, as they occur, instead of checking the inputs and return values of every function call.
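As an illustration of that contrast, here is a minimal Python sketch. The function names (`parse_port_checked`, `parse_port`, `load_ports`) and the port-parsing scenario are hypothetical and chosen only for the example. The first function reports failure through its return value, so every caller has to test it; the second raises an exception, so a single handler can cover any number of calls.

```python
def parse_port_checked(text):
    """Return-value style: returns (ok, value); the caller must test ok."""
    if not text.isdecimal():
        return False, None
    port = int(text)
    if not 0 < port < 65536:
        return False, None
    return True, port


def parse_port(text):
    """Exception style: returns the port or raises ValueError."""
    port = int(text)  # int() raises ValueError on non-numeric input
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


def load_ports(texts):
    """Parse every entry; one handler covers a failure in any call."""
    try:
        return [parse_port(t) for t in texts]
    except ValueError as exc:
        print(f"invalid configuration: {exc}")
        return []
```

With the exception style, `load_ports` contains no per-call checks: the normal path reads straight through, and the error path lives in one place.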

String interning

String interning is a compiler optimization that detects when two string literals have the same content and automatically points them to the same object.
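The effect can be sketched in Python, with one caveat: the text above describes interning done automatically for literals, while this example uses the explicit runtime call `sys.intern` from the standard library as a stand-in, because whether identical literals share an object is an implementation detail of the interpreter. The helper `build` is hypothetical and exists only to produce strings at run time.

```python
import sys


def build(parts):
    """Assemble a string at run time so it is not a source literal."""
    return "".join(parts)


a = build(["inter", "ning"])
b = build(["inter", "ning"])

# The two strings have equal content; whether they are the same object
# is up to the interpreter, so the code does not rely on it.
same_content = a == b

# sys.intern returns the canonical object for a given content, so both
# calls hand back the very same object.
ia = sys.intern(a)
ib = sys.intern(b)
same_object = ia is ib

print(same_content, same_object)
```

Once strings are interned, equal content implies the same object, so an identity check can replace a character-by-character comparison and only one copy of the text is kept in memory.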