Editor’s Note: This article was written by Michael Cartwright of Solid Documents. His company specializes in software to convert PDF to Excel and other common formats.
What is Metadata?
Metadata is data that describes another piece of information, such as an electronic document or file. For example, consider a video of a soccer match. The metadata would be the part of that video file that says ‘this is a video of a soccer match’ or ‘this video is called World Cup Playoffs’ or ‘this video is 5,353 kilobytes in size.’ Programs, images, documents, 3D engineering models, databases, and even entire electronic libraries can each contain metadata, which helps to define them, identify them, and make them searchable.
When it comes to the Portable Document Format (PDF), metadata can be added in the form of document properties. Typically, these document properties include things like search keywords, title, author, and subject. Adding this type of information about a PDF document allows for faster, more efficient archival and document retrieval.
Some metadata is automatically generated when you first create a PDF file. Automatically created metadata includes information such as which program was used to create the document, the file size, whether or not the file has been optimized for the Web, and so on. Other metadata, such as author and keywords, is added by you.
Metadata can be stored either internally or externally. In a PDF it is stored internally, meaning the metadata is attached to the file itself and goes wherever the PDF goes.
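As a rough illustration of the difference between automatically generated and user-supplied properties, the Python sketch below models a PDF's document properties as a plain dictionary. The key names (Title, Author, Subject, Keywords, Creator, Producer, CreationDate) mirror entries of the standard PDF document information dictionary. The sample values and the split_properties helper are hypothetical, and the sketch does not read or write a real PDF file.

```python
# Illustrative only: a plain dict standing in for a PDF's document properties.
# The values below are made up for the example.
properties = {
    "Title": "World Cup Playoffs Travel Guide",
    "Author": "James Smith",
    "Subject": "Travel packages for soccer fans",
    "Keywords": "travel abroad, cheap fares, flight plus hotel",
    "Creator": "Example Word Processor",   # program that authored the document
    "Producer": "Example PDF Converter",   # program that produced the PDF
    "CreationDate": "D:20100315093000",    # PDF-style date string
}

# Assumption: these are the fields a person normally fills in by hand.
USER_SUPPLIED = {"Title", "Author", "Subject", "Keywords"}


def split_properties(props):
    """Return (user_supplied, auto_generated) dictionaries."""
    user = {k: v for k, v in props.items() if k in USER_SUPPLIED}
    auto = {k: v for k, v in props.items() if k not in USER_SUPPLIED}
    return user, auto


user_supplied, auto_generated = split_properties(properties)
print("Added by you:", sorted(user_supplied))
print("Generated automatically:", sorted(auto_generated))
```

Because the properties travel inside the file, a tool that copies or e-mails the PDF carries both groups along with it.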
Uses for PDF Metadata
- Search Engine Optimization (SEO): Adding metadata information such as title, author, and keywords to your PDF makes it much easier to find your document using an online search engine. For example, imagine you run a travel agency, and your document is a brochure that outlines the benefits of your service over your competitors’ services. By including multiple keywords in your document metadata, such as ‘travel abroad’, ‘cheap fares’, and ‘flight plus hotel’, you increase the chances of a potential customer finding your document in a search.
- Database storage and retrieval: Adding metadata to your critical documents, such as legal contracts, transactional records, tax records, and financial reports, ensures that these documents can be efficiently archived and retrieved using a simple search by keyword, title, author, file size, and so forth.
- Copyright information: You can use metadata to store copyright data. Adobe Acrobat allows copyright metadata to be added manually. Document metadata is an ideal location for copyright information, because it is not readily visible within the body yet it is attached to the document, thus providing proper copyright notice.
- Review document properties: You can view a PDF’s metadata to determine its properties. For example, you can check to see if the file has been optimized for rapid downloading on the Web. Also, the document properties contain information about the program used to create the PDF.
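To make the storage-and-retrieval use more concrete, here is a minimal Python sketch of a keyword lookup over archived documents. It assumes the metadata has already been extracted into dictionaries and that keywords are stored as a comma-separated string. The file names, sample values, and the build_keyword_index helper are all hypothetical.

```python
from collections import defaultdict


def build_keyword_index(documents):
    """Map each lowercase keyword to the set of file names that carry it."""
    index = defaultdict(set)
    for filename, props in documents.items():
        for keyword in props.get("Keywords", "").split(","):
            keyword = keyword.strip().lower()
            if keyword:
                index[keyword].add(filename)
    return index


# Hypothetical archive: file name -> document properties.
archive = {
    "contract-2019.pdf": {"Title": "Supply Contract", "Keywords": "contract, legal, supplier"},
    "tax-2019.pdf": {"Title": "Tax Records 2019", "Keywords": "tax, financial"},
    "report-q4.pdf": {"Title": "Q4 Financial Report", "Keywords": "Financial, report"},
}

index = build_keyword_index(archive)
print(sorted(index["financial"]))  # ['report-q4.pdf', 'tax-2019.pdf']
```

Lowercasing and trimming each keyword before indexing means that ‘Financial’ and ‘financial’ land in the same bucket, which is a small instance of the consistency advice in the tips that follow.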
General Tips for Metadata Best Practices
- Minimize keyword ambiguity: To optimize a PDF for online locating and retrieval, it is essential that the keywords used are as clear and relevant as possible. Many words can mean the same thing, and one word can have many meanings, so choose your keywords carefully. For example, cars are also referred to as autos, automobiles, vehicles, and so on. If you want potential customers to find your document, use the most common or most widely accepted term in your keyword metadata.
- Consistency: If you are using metadata for archival and easy retrieval, consistency is critical when identifying titles, keywords, authors, and so on. Make sure each person who is part of the process of creating, storing, and retrieving documents uses consistent naming conventions and keyword usage. For example, when entering names, specify whether to use the entire first name (‘James’) or whether just the first initial (‘J.’) is acceptable.
- Prevent accidental unwanted disclosure: There are many instances, especially in the legal profession, where a document’s metadata may contain sensitive information. The metadata might accidentally be seen by third parties unless it is removed. Make sure to check and remove anything from the metadata that you do not want to share before sending an important document.
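The tips above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the synonym table, the choice of ‘car’ as the preferred term, the list of fields considered safe to share, and both helper functions are hypothetical, and the metadata is assumed to be available as a plain dictionary rather than read from a real PDF.

```python
# Assumption: the team has agreed that 'car' is the preferred keyword.
CANONICAL_KEYWORDS = {
    "auto": "car",
    "autos": "car",
    "automobile": "car",
    "automobiles": "car",
    "cars": "car",
}

# Assumption: only these properties are safe to send to third parties.
SHAREABLE_FIELDS = {"Title", "Subject", "Keywords"}


def normalize_keywords(raw):
    """Lowercase, map synonyms to the agreed term, and drop duplicates."""
    seen = []
    for word in raw.split(","):
        word = word.strip().lower()
        if not word:
            continue
        word = CANONICAL_KEYWORDS.get(word, word)
        if word not in seen:
            seen.append(word)
    return ", ".join(seen)


def scrub_for_sharing(props):
    """Keep only the properties on the shareable list."""
    return {k: v for k, v in props.items() if k in SHAREABLE_FIELDS}


draft = {
    "Title": "Used Car Price Guide",
    "Author": "James Smith",
    "Keywords": "Autos, automobiles, Cars, prices",
    "Creator": "Example Word Processor",
}

clean = scrub_for_sharing(draft)
clean["Keywords"] = normalize_keywords(clean["Keywords"])
print(clean)  # {'Title': 'Used Car Price Guide', 'Keywords': 'car, prices'}
```

An allow-list is used for scrubbing rather than a list of fields to remove, so any property nobody thought about is dropped by default instead of being disclosed by accident.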