Introducing our co-hosted PhD in partnership with PIER
By Matt Burns and Dave Ranner
It’s no secret that the tech landscape is full of terms that are constantly changing, evolving, and being used in any number of ways. The world of online imaging intelligence and forensics is no different.
From MD5 hashes to Exif, there’s no shortage of complex jargon to be wary of. To help, we’re taking a look at some of the most commonly misunderstood terms in the tech and online imaging intelligence landscape – from understanding the differences between open-source software and intelligence to navigating the realms of the dark, deep, and open web.
Read on to learn more, and make sure to come back regularly – we’re always updating this list with the latest terms we encounter.
At a glance, we’re tackling:
A large part of what we do here at CameraForensics is thanks to the amazing network of collaborators and contributors developing and improving open-source software. From our R&D efforts right through to our web crawlers, open-source software helps us identify potential challenges, find new efficiencies, and build new capabilities, all of which helps us further our mission.
Learn more about the value of open-source software as we explore the possibilities of StormCrawler.
But what’s the difference between open-source software and open-source intelligence – two terms that mean completely different things but are often confused with each other?
While open-source software refers to software whose source code is openly published for anyone to use, modify, and share, and which is often developed collaboratively, open-source intelligence refers to any information gained from accessing a source on the open web (more on this term later). Different teams also use the term ‘open-source’ differently. For example, when developers use it, they usually mean software, whereas when law enforcement use it, they are usually referring to intelligence.
Exif or EXIF? That is the question. While it’s true that EXIF was more commonly used in previous years, reflecting its origin as an abbreviation of Exchangeable image file format, it has now evolved to become a term in its own right and no longer needs capitalisation.
Being pedantic, another common mistake is treating Exif and metadata as the same thing, which isn’t strictly correct. Exif is one specific subcategory of metadata, alongside the likes of XMP and IPTC: all Exif is metadata, but not all metadata is Exif.
We even made our own tool to help easily extract Exif data for analysis – learn how we did it.
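To make the difference between Exif and metadata in general more concrete, here is a minimal Python sketch that walks the header segments of a JPEG and reports which kinds of metadata block it finds. It relies on the standard JPEG layout, in which Exif and XMP both travel in APP1 segments (told apart by their identifier strings) and IPTC data is typically carried inside the Photoshop APP13 segment. The function names, the `make_segment` helper, and the synthetic sample file are illustrative assumptions only, and this is not a description of how our own extraction tool works.

```python
import struct

# Identifier strings that open each kind of metadata segment in a JPEG.
EXIF_ID = b"Exif\x00\x00"
XMP_ID = b"http://ns.adobe.com/xap/1.0/\x00"
PHOTOSHOP_ID = b"Photoshop 3.0\x00"  # IPTC data is typically carried in here


def list_metadata_blocks(jpeg: bytes) -> list[str]:
    """Return the kinds of metadata segments found before the image data."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing start-of-image marker")
    found = []
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker in (0xDA, 0xD9):  # start of scan or end of image
            break
        # The 2-byte big-endian length counts itself plus the payload.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if length < 2:  # malformed segment; stop rather than loop forever
            break
        payload = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(EXIF_ID):
            found.append("Exif")
        elif marker == 0xE1 and payload.startswith(XMP_ID):
            found.append("XMP")
        elif marker == 0xED and payload.startswith(PHOTOSHOP_ID):
            found.append("IPTC")
        pos += 2 + length
    return found


def make_segment(marker: int, payload: bytes) -> bytes:
    """Hypothetical helper: build one JPEG segment for the synthetic demo."""
    return b"\xff" + bytes([marker]) + struct.pack(">H", len(payload) + 2) + payload


# A synthetic, image-free "JPEG" that carries one Exif and one XMP block.
sample = (
    b"\xff\xd8"
    + make_segment(0xE1, EXIF_ID + b"placeholder")
    + make_segment(0xE1, XMP_ID + b"<x:xmpmeta/>")
    + b"\xff\xd9"
)
blocks = list_metadata_blocks(sample)
print(blocks)  # ['Exif', 'XMP']
```

Both blocks in the sample are metadata, but only one of them is Exif. A tool that reads Exif alone would miss whatever the XMP block contains.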
Helping to facilitate image searching, duplicate detection, content-based image retrieval (CBIR), and more, hashing is an essential process in image retrieval, used in the CameraForensics platform and other search platforms.
While not a term that’s commonly misunderstood, it’s often forgotten that ‘hash’ doesn’t describe a single, universal technique. There are many different processes involved, some more effective than others, and different types of hashes are designed for different purposes. For example, “cryptographic hashes” like MD5 and the SHA family are designed to verify the integrity of a file and are perfect for knowing if two files are identical copies, because changing even a single byte produces a completely different hash. “Fuzzy hashes” like PhotoDNA work in the opposite way, producing very similar hashes when images are only slightly modified. This makes them great for finding similar images and derivative works.
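The contrast between the two families of hashes can be sketched in a few lines of Python. PhotoDNA is proprietary, so this example swaps in a much simpler "average hash" purely as a stand-in for the fuzzy family. The tiny 4x4 greyscale "images" and all function names are illustrative assumptions.

```python
import hashlib


def crypto_hash(pixels: list[list[int]]) -> str:
    """SHA-256 of the raw pixel bytes: any change alters the digest."""
    flat = bytes(p for row in pixels for p in row)
    return hashlib.sha256(flat).hexdigest()


def average_hash(pixels: list[list[int]]) -> int:
    """Simple fuzzy hash: one bit per pixel, set if brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hashes (0 means identical)."""
    return bin(a ^ b).count("1")


# Tiny greyscale "images": dark on the left, bright on the right.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [10, 20, 200, 210],
    [15, 25, 205, 215],
]
brighter = [[p + 5 for p in row] for row in original]    # slight edit
inverted = [[255 - p for p in row] for row in original]  # very different image

# Cryptographic hash: the slight edit yields an unrelated digest.
print(crypto_hash(original) == crypto_hash(brighter))  # False

# Fuzzy hash: the slight edit leaves the hash unchanged,
# while the inverted image flips every bit.
print(hamming_distance(average_hash(original), average_hash(brighter)))  # 0
print(hamming_distance(average_hash(original), average_hash(inverted)))  # 16
```

A small Hamming distance suggests two images are visually similar. Real fuzzy hashes are far more robust than this stand-in, but the principle of comparing by distance rather than by exact match is the same.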
When you perform a search using a conventional search engine, you’re not searching the internet itself, but an index that the search engine has built from the web pages it has crawled (just like you search the index of an encyclopaedia to find out which page to go to).
Let that sink in for a moment.
This means that the quality of the results you get depends on the quality of the search index. Anyone can create an index with a couple of hundred web pages in it, but it won’t be nearly as comprehensive or usable as the index behind a site like Google, which indexes thousands of pages per second.
Our image index is currently based on 4 billion images with more being added all the time, meaning that any images not yet in the index won’t appear in searches. It’s important to recognise this distinction to gain a comprehensive insight into how the CameraForensics platform works.
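The idea that a search only ever looks inside an index can be shown with a toy example. The following Python sketch builds a very small inverted index (a mapping from each word to the pages that contain it) and then searches it. The page addresses and text are made up for illustration, and real search engines and our own platform are of course far more sophisticated than this.

```python
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of page addresses that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the indexed pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Pages that exist on our pretend web (hypothetical addresses).
the_web = {
    "a.example/cameras": "camera serial numbers in exif",
    "b.example/hashes": "md5 hashes verify files",
    "c.example/exif": "reading exif from photos",
}

# The crawler has only reached two of the three pages so far.
crawled = {url: the_web[url] for url in ("a.example/cameras", "b.example/hashes")}
index = build_index(crawled)

# c.example/exif mentions "exif" too, but it is not in the index,
# so the search cannot return it.
print(search(index, "exif"))  # {'a.example/cameras'}
```

The third page is just as relevant to the query as the first, yet it never appears in the results. That is the practical meaning of searching an index rather than the internet.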
What else makes a great index? Alongside how user-friendly and rapid the interface is, the quality of data is also incredibly important – enabling users to gain access to real and valuable intelligence that can actively aid research processes on-demand – or do we mean information? Or perhaps data?
While these terms are all used to refer to facts gathered through case research, or the potential to use facts, these terms are often used incorrectly and/or interchangeably. We thought we’d help out:
We all love data. These building blocks are used in every aspect of our lives and can be as small as a single binary digit or as large as a whole collection of facts.
“In and of itself, it is rare that action can or should be taken on raw, unevaluated information on its own,” writes Lisa Palmieri, President of the International Association of Law Enforcement Intelligence Analysts.
Information differs from raw data in that it has been somewhat cleansed and formatted, and it may contain some aspects of context.
When we ascribe value, purpose, and meaning to information, we get intelligence that is rich in context. Users can make strategic decisions and take affirmative action based on data that has been converted through the intelligence cycle.
The steps of this cycle (planning, collection, evaluation, collation, analysis, production, dissemination, and feedback) are all designed to ensure that the information gained has been appraised and transformed into material that is not only actionable but also reliable.
More complex still, evidence is intelligence that has been fully reviewed and tested, and that can be used to support claims made in front of a tribunal to indicate, or prove, that a theory is correct.
As a result, evidence must be rich in value and context, and must be able to withstand analysis and cross-referencing before it is deemed accurate and true.
Just like information, intelligence, and evidence, it’s important to understand the difference between the dark, deep, and open web, as our indexing efforts are distributed across these domains.
Think of the open web as the public face of the internet. It’s the part most of us use every day, and it is available from almost all browsers.
In contrast, the deep web is hidden away from conventional search engines and sits behind encryption, speciality software, or passwords. Your private email account is one example. The deep web is larger than the open web, and those interacting with it are mostly doing so for legitimate reasons – learn more in our blog.
This is drastically different from the dark web, which hosts illegal and hazardous domains and whose users wish to remain completely anonymous via tools such as Tor. Users of the dark web can, if they wish, access a wide range of illegal content. However, the dark web also has legitimate uses, for example by oppressed individuals in countries that restrict freedom of speech.
As such, users are advised to follow advanced privacy practices if they wish to access the dark web, such as taking extreme precautions for anonymity.
It’s important to note that illegal activity and online harm exist across all three: the open, deep, and dark web. The only thing that differs is who can access it.
We’re committed to providing online intelligence investigators with the tools and support needed to source and identify victims of exploitation and abuse online. Equipping others in our community with knowledge and education makes further action possible, helping us to make a greater worldwide impact.
Want to learn more? Visit our blog glossary of common digital image forensics jargon.