A digital fingerprint is the virtual counterpart of a human fingerprint: a set of data characteristic of a given device that can identify the user when they return to a given page. In this respect fingerprints resemble cookies, but they are much more advanced, and because they are computed from the device's properties rather than stored on it, the user cannot simply delete them. Specially prepared algorithms query the device and collect information from the browser, the operating system and even individual programs.
How does fingerprinting work?
Fingerprinting in computer science is a procedure based on carefully designed algorithms. They map large data items (such as computer files) to much shorter bit strings, which serve as digital fingerprints. A fingerprint does not allow the original data to be reconstructed; instead, it identifies that data for all practical purposes, just as a human fingerprint identifies a person. More advanced algorithms allow the creation of a fingerprint of the entire device, based on the fingerprints of its components.
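As a rough illustration of mapping data to a short bit string, the sketch below truncates a SHA-256 digest to 64 bits and then builds one fingerprint from several component fingerprints. The function names and the choice of SHA-256 are assumptions made for this example, not a description of any particular product.

```python
import hashlib


def fingerprint64(data: bytes) -> int:
    """Map arbitrary bytes to a 64-bit fingerprint (first 8 bytes of SHA-256)."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:8], "big")


def combined_fingerprint(component_fingerprints) -> int:
    """Fingerprint a whole from the fingerprints of its parts.

    The result depends on the order of the components (an assumption of
    this sketch; sort the list first if order should not matter).
    """
    joined = b"".join(fp.to_bytes(8, "big") for fp in component_fingerprints)
    return fingerprint64(joined)


# Hypothetical component data, for illustration only.
parts = [fingerprint64(b"browser data"), fingerprint64(b"operating system data")]
device = combined_fingerprint(parts)
```

The same input always yields the same fingerprint, while changing a single component changes the combined value.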
Digital fingerprinting techniques rest on the assumption of virtual uniqueness, which means that the probability that the algorithm will produce the same fingerprint for two different data sets must be negligible. Document files often differ only in small details, so a good fingerprinting algorithm should guarantee that even such near-identical files are told apart with the desired level of certainty. If two different files yielded the same fingerprint, we would have a collision, and unambiguous identification of the data would be impossible. For this reason, fingerprints are usually at least 64 bits long, which is meant to guarantee their virtual uniqueness.
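Why 64 bits? A back-of-the-envelope estimate can be made with the birthday approximation, assuming an idealized algorithm whose fingerprints are uniformly random (real algorithms only approximate this). The function name below is made up for this example.

```python
import math


def collision_probability(num_items: int, bits: int) -> float:
    """Approximate chance that at least two of num_items fingerprints collide.

    Uses the birthday approximation 1 - exp(-n(n-1) / (2 * 2^bits)) and
    assumes fingerprints are uniformly distributed over 2^bits values.
    """
    space = 2.0 ** bits
    return -math.expm1(-num_items * (num_items - 1) / (2.0 * space))


p32 = collision_probability(1_000_000, 32)
p64 = collision_probability(1_000_000, 64)
```

Under these assumptions, a million files with 32-bit fingerprints are almost certain to contain a collision, whereas with 64-bit fingerprints the probability is on the order of 10^-8.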
Rabin algorithm and cryptographic methods
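The prototype of today's fingerprinting solutions was the Rabin algorithm, which comes with a mathematically precise analysis of collision probability. It is fast and easy to implement: it treats the data as a polynomial over the two-element field GF(2) and takes its remainder modulo an irreducible polynomial. The algorithm requires a w-bit internal "key" to be chosen in advance, and its guarantee holds as long as the strings r and s being compared are selected without knowledge of that key: the probability that r and s receive the same fingerprint is then at most max(|r|, |s|)/2^(w-1), where |r| denotes the length of r in bits. The Rabin method does not, however, protect against malicious attacks: an adversary can easily discover the key and use it to modify files without changing their fingerprints.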
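The core of Rabin's scheme can be sketched in a few lines: the input bits are read as coefficients of a polynomial over GF(2), and the fingerprint is the remainder modulo an irreducible polynomial of degree w, which plays the role of the key. This is a minimal sketch, not a production implementation. It assumes a fixed, publicly known polynomial (x^64 + x^4 + x^3 + x + 1), whereas the scheme's guarantee requires the polynomial to be picked at random and kept secret.

```python
W = 64
# Assumption for this sketch: a fixed irreducible polynomial over GF(2),
# x^64 + x^4 + x^3 + x + 1. A real deployment would choose it at random.
POLY = (1 << 64) | 0x1B


def rabin_fingerprint(data: bytes, poly: int = POLY, w: int = W) -> int:
    """Remainder of the message polynomial modulo `poly` over GF(2)."""
    # Start from a leading 1 bit so that leading zero bytes still affect
    # the result.
    fp = 1
    for byte in data:
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            fp = (fp << 1) | bit      # multiply by x and add the next coefficient
            if fp >> w:               # degree reached w: subtract (XOR) the modulus
                fp ^= poly
    return fp


fp_a = rabin_fingerprint(b"fingerprinting")
fp_b = rabin_fingerprint(b"fingerprintinG")
```

Because reduction is a plain XOR, anyone who knows the polynomial can construct a second file with the same fingerprint, which is why the scheme offers no protection against deliberate tampering.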
Cryptographic hash functions such as MD5 and the SHA family can also serve as fingerprinting functions. Computing them takes much more time than the Rabin algorithm, but they provide greater security in the event of malicious attacks. However, they lack proven guarantees on collision probability, and MD5 in particular is no longer considered collision-resistant.
Fingerprinting in practice
A fingerprinting tool installed on a website asks the browser a series of questions about installed plugins and software, the size and resolution of the screen, the graphics card, the time zone, the list of supported fonts and many other variables.
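Once the answers have been collected, they can be reduced to a single identifier. The sketch below shows one possible server-side approach: serialize the attributes in a canonical order and hash the result. The attribute names and values are hypothetical, and the use of SHA-256 over JSON is an assumption of this example.

```python
import hashlib
import json


def device_fingerprint(attributes: dict) -> str:
    """Hash a dictionary of device attributes into a hex fingerprint.

    Keys are sorted so that the order in which attributes were collected
    does not influence the result.
    """
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute set, for illustration only.
visitor = {
    "screen": "1920x1080",
    "timezone": "Europe/Warsaw",
    "fonts": ["Arial", "Verdana"],
    "plugins": ["pdf-viewer"],
}
fp = device_fingerprint(visitor)
```

Any change to a single attribute, such as a different time zone, produces a completely different fingerprint, which is the source of the stability problem discussed later in the article.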
Digital fingerprints emerged as a way to compare and transmit large data sets efficiently, but also as a tool for copyright protection, fraud detection and tracking criminals. They can also be used to deduplicate data. Most often, however, fingerprints are used to identify a given user or their device: like ordinary fingerprints, they are meant to identify a person uniquely.
Thanks to digital fingerprints, a web browser or proxy server can check whether a remote file has been modified by fetching only its fingerprint and comparing it with the fingerprint of the previously downloaded copy.
The HashKeeper database, maintained by the National Drug Intelligence Center, is a repository of fingerprints of "good" and "bad" computer files, made available for use in law enforcement applications.
Fingerprinting is also used in online payment systems, for example in card-not-present transactions, to detect and block users who try to pay with stolen cards.
Virtual fingerprints are also often used in anti-fraud systems. One example is their use on our website. The algorithm created (and systematically improved) by the experts employed by TrafficWatchdog consists of over 200 different variables and makes it possible to identify and block fraudulent publishers and other online fraudsters.
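A minimal sketch of such a change check follows. It assumes the client kept the fingerprint of its cached copy and can obtain the fingerprint of the current remote version; the function names are hypothetical and no network access is shown.

```python
import hashlib


def content_fingerprint(body: bytes) -> str:
    """Fingerprint of a file's content (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(body).hexdigest()


def is_modified(remote_fingerprint: str, cached_fingerprint: str) -> bool:
    """The cached copy is stale if the two fingerprints differ."""
    return remote_fingerprint != cached_fingerprint


cached = content_fingerprint(b"<html>version 1</html>")
unchanged = is_modified(content_fingerprint(b"<html>version 1</html>"), cached)
changed = is_modified(content_fingerprint(b"<html>version 2</html>"), cached)
```

Only the short fingerprint needs to travel over the network; the full file is downloaded again only when the comparison reports a change.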
Disadvantages of fingerprinting solutions
No technology is perfect, and digital fingerprinting has its drawbacks too. If the algorithm is stable, that is, not very sensitive to changes made on the device, it tends to have low collision resistance: many devices end up with the same fingerprint. Solutions of this kind are usually used on sites with low traffic. If, on the other hand, the algorithm ensures highly unique fingerprints, it probably relies on many variables, and since a change in any one of them changes the fingerprint, the fingerprints it creates are short-lived. The usual remedy is to combine several types of fingerprints, but in moderation, because using too many different solutions also reduces stability. The biggest difficulty with methods based on fingerprinting is therefore continually finding the golden mean, so that the system is stable yet still produces sufficiently distinct fingerprints.
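One way to picture this trade-off is to compare attribute sets directly instead of requiring an exact fingerprint match, and to treat two visits as the same device when enough attributes agree. The sketch below is only an illustration of the idea; the threshold of 0.8 and the attribute names are arbitrary assumptions.

```python
def similarity(old: dict, new: dict) -> float:
    """Fraction of attributes (over all keys seen) that have equal values."""
    keys = old.keys() | new.keys()
    if not keys:
        return 1.0
    matches = sum(1 for k in keys if k in old and k in new and old[k] == new[k])
    return matches / len(keys)


def same_device(old: dict, new: dict, threshold: float = 0.8) -> bool:
    """A lower threshold is more stable; a higher one is more selective."""
    return similarity(old, new) >= threshold


# Hypothetical visits: the browser was updated between them.
first_visit = {"screen": "1920x1080", "timezone": "UTC+1", "gpu": "X1",
               "fonts": 212, "browser": "v100"}
second_visit = dict(first_visit, browser="v101")
score = similarity(first_visit, second_visit)
```

Here four of the five attributes still match, so the second visit is attributed to the same device. Raising the threshold to 1.0 would make the fingerprint more selective, but it would expire with every browser update.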