
Import from PDF


    Adrian Williams

    Hi Rodi,

    Importing multi-structured data from PDF files is inherently tricky and error-prone. A PDF file may also consist only of images rather than any structured data, and Dataverse does not include any Optical Character Recognition (OCR) capabilities.

    We currently do not have an item on our roadmap to extract data from PDF files, but we remain open to reconsidering this if it becomes a more widespread customer requirement.

    There are a number of third-party products that can extract data from PDF files, either by applying OCR or by reading the semi-structured data in the files, and then export the results as text or Excel files.


    I understand that some users have also looked at creating a custom node with the Python node, using Python modules (e.g. PyPDF2) that can extract text from PDF files.
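    As a rough illustration of that approach, here is a minimal standalone sketch, assuming PyPDF2 (2.x or later) is installed. It uses PyPDF2's `PdfReader` and `extract_text()` to pull the text layer out of each page. The helper names `extract_pdf_text`, `pages_to_rows` and `rows_to_csv`, and the simple "one line per row, split on whitespace" rule, are hypothetical and only for illustration. Wiring the result into a Dataverse Python node is not shown, because that depends on the node's own input/output conventions. This only works for PDFs that have a text layer: image-only PDFs produce empty text because no OCR is performed.

```python
import csv
import io
from typing import Iterable, List, Optional


def extract_pdf_text(path: str) -> List[str]:
    """Return the text of each page of a PDF, one string per page.

    Assumes PyPDF2 is installed. Image-only (scanned) pages yield empty
    or near-empty strings because no OCR is performed.
    """
    # Imported lazily so the pure-Python helpers below work without PyPDF2.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    # Guard against a None result from extract_text() on odd pages.
    return [page.extract_text() or "" for page in reader.pages]


def pages_to_rows(pages: Iterable[str], delimiter: Optional[str] = None) -> List[List[str]]:
    """Hypothetical post-processing: turn page text into rows of fields.

    Each non-blank line becomes one row, prefixed with its 1-based page
    number. Fields are split on `delimiter`, or on runs of whitespace
    when `delimiter` is None. Real PDFs usually need layout-specific rules.
    """
    rows: List[List[str]] = []
    for page_no, text in enumerate(pages, start=1):
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue  # skip blank lines
            fields = [f.strip() for f in line.split(delimiter)]
            rows.append([str(page_no)] + fields)
    return rows


def rows_to_csv(rows: Iterable[List[str]]) -> str:
    """Serialise rows to CSV text (csv module default line ending is \\r\\n)."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()
```

    For example, `rows_to_csv(pages_to_rows(extract_pdf_text("report.pdf")))` would return CSV text for a hypothetical file `report.pdf`, which could then be loaded like any other delimited file. In practice the `pages_to_rows` step is where most of the effort goes, since table layouts differ from one PDF to the next.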


