Introduction There are several ways to mine tables and other content from a pdf, using R. After a lot of trial & error, here’s how I managed to extract global exam results from an international, massive, yearly examination, the EDAIC.
This is my first use case of “pdf mining” with R, and also a fairly simple one. However, more complex and very fine examples of this can be found elsewhere, using both pdftools and tabulizer packages.
