Introduction

There are several ways to mine tables and other content from a PDF using R. After a lot of trial and error, here’s how I managed to extract global exam results from the EDAIC, a massive international examination held every year. This is my first use case of “PDF mining” with R, and a fairly simple one. More complex and very fine examples can be found elsewhere, using both the pdftools and tabulizer packages.


Anesthesiologist, MD, postdoc. Utter Rstats geek

Universidade de Santiago de Compostela
