Metadata, plain and simple, is data that is used to describe other data. A photograph is a good example of how it works. Every time you take a photo with a digital camera, metadata is recorded with the photo. The date & time, author, equipment used, and location would all be considered different pieces of metadata. You may not be aware that a piece of data has metadata attached to it, but let me assure you, it’s there. Most things in the digital world have metadata attached to them.
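To make this concrete, here is a minimal sketch of reading metadata that is already attached to a file. Reading the camera metadata inside a photo usually needs a third-party library, so this example sticks to Python's standard library and reads the metadata the operating system keeps about any file instead. The `describe_file` helper and the throwaway sample file are made up for illustration.

```python
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path):
    """Return a few pieces of metadata the operating system keeps about a file."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        # Last-modified time, converted from a Unix timestamp to UTC.
        "modified_utc": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


# Create a small throwaway file so the example runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("hello")
    sample_path = handle.name

file_metadata = describe_file(sample_path)
print(file_metadata)
os.remove(sample_path)
```

The file only contains the word "hello", yet the system can still tell us its name, its size, and when it was last changed. None of that is the data itself; it is data about the data.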
Why is it important in OSINT investigations?
Metadata can be attached to a wide range of items, such as webpages, photographs, spreadsheets, and videos, and sometimes it can be just as important as the data itself when it comes to conducting research and investigations.
Without good-quality metadata, the chances of your search being useful are low. If you can be confident in the metadata on the items you’re searching against, you will be able to find the content you actually need more efficiently.
It will help make sure you are reviewing the correct results and are investigating the most relevant information, giving you a huge advantage when searching for data online.
Just imagine you had 1 million random news articles in front of you and you were looking for any article regarding a specific incident that occurred last weekend in Texas. Instead of inspecting the 1 million articles, one by one, wouldn’t it be easier if you could search against the date the article was created or the location the article was referring to?
Well, that’s where metadata comes into play. When articles are created, metadata such as the date and location is created with them, making it possible for us to search for those specific articles. With filtering, we can search against the dates and locations we are interested in. This could help drop the workload from 1 million articles to 100, a much more manageable number.
Without the metadata, you would have to go through every single article, one by one, to see not only whether it is relevant to the incident you are investigating but also whether it refers to the correct time and place.
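The filtering idea can be sketched in a few lines of Python. This is only an illustration: the `Article` class, its `published` and `location` fields, and the sample articles are all invented here, and a real collection would have messier metadata than this.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    title: str
    published: date   # metadata: the date the article was created
    location: str     # metadata: the place the article refers to


def filter_articles(articles, start, end, location):
    """Keep only articles whose metadata matches the date range and location."""
    return [
        a for a in articles
        if start <= a.published <= end and a.location.lower() == location.lower()
    ]


# Made-up sample data standing in for the 1 million articles.
articles = [
    Article("Storm damage reported", date(2024, 5, 4), "Texas"),
    Article("City council vote", date(2024, 5, 4), "Ohio"),
    Article("Highway closure update", date(2024, 5, 5), "Texas"),
    Article("Election preview", date(2024, 4, 20), "Texas"),
]

# Search for anything from one weekend in Texas.
weekend = filter_articles(articles, date(2024, 5, 4), date(2024, 5, 5), "Texas")
for article in weekend:
    print(article.published, article.title)
```

Notice that the filter never reads the article text. It narrows four articles down to two using the metadata alone, which is exactly what makes the approach workable at a much larger scale.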
How is metadata created?
There are two main ways metadata is created: manually and automatically.
Manually created metadata is considered to be the more reliable and accurate way of creating metadata because the data is verified by a human eye before it is inputted, meaning there is ostensibly less room for error.
This doesn’t mean it is bulletproof, as human error is still a factor. As humans, we do tend to make mistakes. In addition, depending on the data, creating the metadata can get quite tricky. It requires a trained eye to cross-check information, ensuring that the metadata added is relevant and appropriate to the data it is added to.
Automated metadata tends to be fairly basic and less rich, but it helps to remove human error from creation. Another benefit is that it can handle a huge quantity of data and be created much more quickly. You would need a team of thousands to create a massive quantity of metadata by hand, and even then, they wouldn’t be able to create it as quickly.
Auto-tagging helps to solve the problem of time and resources, as metadata is generated automatically with little to no user interaction.
A downside to automated metadata tagging tools is that they need to be trained to understand the user’s intent. We basically need to teach them how to do their job, which could be costly upfront.
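Here is a rough sketch of what automatically created metadata can look like. It is deliberately simple: the `auto_tag` function and the `TAG_KEYWORDS` lists are hypothetical, and the hand-written keyword lists stand in for the training a real auto-tagging tool would need.

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical keyword lists; a real tool would learn these from training data.
TAG_KEYWORDS = {
    "weather": {"storm", "flood", "tornado"},
    "crime": {"arrest", "robbery", "police"},
}


def auto_tag(text):
    """Generate basic metadata for a piece of text with no human input."""
    # Normalise words so "Police" and "storm." still match the keyword lists.
    words = {word.strip(".,!?").lower() for word in text.split()}
    tags = sorted(tag for tag, keywords in TAG_KEYWORDS.items() if words & keywords)
    return {
        "word_count": len(text.split()),
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "tags": tags,
        "tagged_at": datetime.now(timezone.utc).isoformat(),
    }


record = auto_tag("Police confirm storm damage across Texas.")
print(record["tags"], record["word_count"])
```

The result is fast and consistent, but also fairly basic, which matches the trade-off described above. The tool only knows the topics someone has taught it, so an article about a wildfire would get no tags at all until a keyword list for it is added.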
What types of metadata are there?
When looking at metadata, there are a few different types to be aware of. I will talk about three of them: structural, descriptive, and administrative.
Structural metadata refers to how the data is formatted and assembled. For example, consider the table of contents in a book: it helps to explain how the pieces of data relate to one another. Another example would be the author of a series of books adding a note in a book indicating that it is the first of more to come.
Descriptive metadata is the type most people are aware of. It helps us identify specific qualities or characteristics of the data, such as titles, dates, and locations. An example would be a photograph tagged with a location of Texas, the specific camera used, and the day and time it was taken.
Administrative metadata is used to give important instructions about a file. It records any restrictions a file might have, such as permissions, as well as how the file was created. An example would be a file on a computer that I could access but you couldn’t. The permission levels that decide this are sometimes stored as administrative metadata.
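One way to picture the three types side by side is to group them on a single record. The sketch below is an assumption about how you might model this, not a standard format: the `FileMetadata` class, its field contents, and the `can_read` helper are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FileMetadata:
    # Structural: how the item fits together with other items.
    structural: dict = field(default_factory=dict)
    # Descriptive: what the item is and what it shows.
    descriptive: dict = field(default_factory=dict)
    # Administrative: how it was created and who may use it.
    administrative: dict = field(default_factory=dict)


def can_read(metadata, user):
    """Check the administrative metadata to see whether a user may open the file."""
    return user in metadata.administrative.get("readers", [])


# Made-up values for a single photograph.
photo = FileMetadata(
    structural={"album": "Road trip", "position": 1, "total": 12},
    descriptive={"title": "Sunset", "location": "Texas", "taken": "2024-05-04T19:42"},
    administrative={"created_with": "camera app", "readers": ["alice"]},
)

print(can_read(photo, "alice"))
print(can_read(photo, "bob"))
```

Here "alice" is listed as a reader and "bob" is not, so the same file is accessible to one person and off limits to the other, just like the example of a file I could access but you couldn’t.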