Diddy blud OpSec guide
Sections
[SX0] preamble
[SX1] tactics without running the forum
[SX2] tactics If LEA controls the forum
[SX3] aftermath: once they have identifying information
[SX0] preamble:
Law Enforcement Officers (LEOs) from Law Enforcement Administrations (LEAs) have a multitude of tactics and tools to try to catch people visiting CP forums ("Pedo-boards") or downloading on topic content.
This piece tries to give a glimpse on these, it does not try to be exhaustive, but shows the variety of what has been observed in the wild and has been confirmed or suspected to be from LEAs.
I write suspected because some of these were not positively identified as part of a live active LEA arsenal but could also have been used
- by blackmailers whom in turn could hand over information to LEA
- by vigilantes (Pedo Hunters) whom could try to locate us and conduct a campaign of destruction
- by researchers doing studies that will lead to LEA tools.
It will not try to address personal plausible deniability, the arrest interrogation techniques to obtain keys or other information about the techniques for taking over a server ...
[SX1] Without running the forum themselves:
1) metadata information : When a user shares one or several files, part of the file is the data that is used to playback/display the content, part of the file is information describing the data. Some software will include identifying information: GPS coordinates, computer username, software license number, phone identification number, hardware model and serial number, date and time of capture with timezone, operating system unique id, cloud software sessionID, source files hashes, etc. EXIF is the de-facto standard container holding these for pictures and movies.
Law enforcement administrations and child protection services "crawl" (they explore) CP sharing sites to download any content, and will gather basic and extended information to tag the content in their investigation database and, if the LEA is equipped, a "Secure Digital Evidence Vault"
illustration of what could be in the database entry:
- file names list starting with first observed/oldest name
- file size
- first observed date
- rarity
- checksum of the file (give the file a unique identifying string, that should be unique among all other files, to be able to quickly assess if it was already shared, if it was a variant, etc and to access quickly its detailed information in a database)
- locality assessment
- language assessment
- list of text strings visible
- list of audio words spoken
- metadata container, including
... date of capture of the data
..."capture/video taken" dates and hours
...camera model and ID
...GPS coordinates
...creator username
They usually won't care much about the little details but still they most likely grab the full container
locality and language assessment will be based on metadata as well as visual and audio clues : a book cover , wall illustrations, furniture, makeup and other products brands, names on tatoos.
They use AI models to get a language and location along with words and texts. Other models will give size and location of skin marks on a body, estimate ages, and these will be used in investigations to confirm. We live in the AI era. Paranoid countries spend a lot of money funding research, and it's not for the good of the people. They promote mass robotic examination rather than precision pincers investigation. But we digress.
If metadata are precise enough to geographically identify the producer who created the content, they will transmit the information to local law enforcement contacts, and those will in turn move to build a judiciary case, depending on the "urgency". If the content was created recently, the case is urgent and might be built in a matter of weeks, or even days if the content shows a child in high distress (hurtcore of babies/toddlers, kidnapping etc)
International cooperation means pictures, videos and texts are shared between countries for incremental analysis.
2) personal information shared through private messaging: agencies will send bait messages in forums, to initiate private discussions. In the private discussions, persons specialized in manipulation will try to extract private information from other users, like country/city of residence, phone number or messaging information, clear email , etc. usually it comes with the promise of fake encounter and sex, but it might be more insidious: unseen material to be paid in bitcoins or other traceable crypto currency.
3) active gathering of personal information through forum chat topics: agencies might try to gather information through topics that might not be damning taken alone, but may very well be when all information is cross-referenced. This is "Probing".
Examples:
on topic
Hot destinations in south east asia to have sex with children and how to engage locals ?
Have you ever made love to your brother or sister ?
Are you excited by your daughter ?
Would you engage in conversation with a girl sitting next to you in a plane ?
Do you meet many children in your dayjob ?
Would you babysit your neighbours daughters ?
This is used to gather targeted data for social engineering.
They might also probe personal interest off topic:
What are your favorite movies ?
What are you listening right now ?
These topics may be started by people just trying to make a forum alive, but are very valuable for LEAs.
4) post of dangerous content: LEA will post content with alluring preview and title (hoarded content, etc)
there are 5 kinds
- hosting site infected
- archive infected, not content itself
- content infected
- fake content that leads to entrapment
- indirect trap marquee
the goal is always the same: retrieve identifying information of the local computer/user, then send the information back to the sender. "deanonymization" is what the art is called.
If the hosting is infected: it will deploy malware when the user reaches the host or tries to download the file, by either relying on poor setting of the user environment (leaving javascript enabled) or by leveraging a zero day Tor Browser exploit that uses an unpatched vulnerability allowing to bypass the user environment settings (like activating javascript when it should not be activated, for example by making noscript forget to parse the page content, this has been done in the past by spoofing the page type), hence why people say to disable javascript entirely from within about:config in TB and not just rely on the safest setting, this shuts down the javascript engine at the lowest level.
If the archive is infected: it will deploy when the user tries to decompress the content, often the content is fake and will show a checksum error/wrong password
If the content is infected: it will deploy when the user tries to read the decompressed content or, if the operating system tries to build previews, might deploy when the operating system tries to create a thumbnail
If it is a fake content : the file will just open a stream from the LEA servers, and if the whole computer is not passing through TOR this will open an unprotected channel, damning if the user is not using a proxy chain for his media.
if it is an indirect trap marquee: the content will be accompanied by another file bookmark or "readme" style that will direct or ask to reach a compromised website address/contact a compromised email or the video will display said address/email in its video stream as an intro, overlay or outro.
The goal is always to get identifying information: usually an IP address (identifying address, like a postal adress for the Internet), accompanied preferably by a MAC address (in broad strokes, identifying the network hardware), user name, OS, license, geolocation if the system is a mobile system or the user allows tracking information... When the user has not forfeited his real identity and address in broad daylight, the data outside of IP and MAC are used to try to be able to more uniquely identify the user (IPs can be shared, MAC are not unique... but combination of IP+MAC + ... is very identifying, geolocation can be spoofed so it's not a damning evidence but combined with the rest gives strength).
The sensor code, if there is one, any can act like a virus, trying to elevate itself high enough in the system authority rings to be able to capture as much personal information as possible.
These may leave remnants, dormant, that will frequently try to contact "home" to make sure they identify you when you are at the weakest, they might even try to go to the top chain of communication, to be between the user's computer and the first proxy/vpn connection.
(blackmailers and secret services also deploy keyloggers and media grabbers, sending all keystrokes typed, screenshots, webcam shots, microphone samples, etc, not the common threat but it can happen)
This is why it is recommended to be careful when downloading from a new member, to never click on links that lead to hosts not in the recommended hosts of the forum, to not follow sites and contact email displayed in advertising CP content, to decompress and view on another system or virtual machine not connected to the internet and preferably use forgetful systems.
5) dangerous link in a post: a link posted in chat or other sections, may lead to a compromised site. Which will act like a compromised hosting site for content described above. One should tread very carefully when a link is posted, because two letters inverted from a known clearnet site could lead to a trap.
6) content host snitching: if the user uploads a file to a host, LEA can ask the host all the information they have on the uploader of the content. If the user uploaded outside of TOR to bypass a torblock (blacklisting of TOR network IPs), the information transmitted by the host will be used to further identification efforts (if it is a end user IP, the information is damning). They can also ask for all downloaders information for said content. It is important to understand the risks associated with each host (dl.free.fr for example is the file exchange service of an ISP, so any client of the ISP not doing a thorough enough compartmentalization might pass through his cookies damning identifying information, even if a vpn is used).
The worse snitch is Google, they actively monitor all content transmitted through their services before encryption, using AI models to identify potential unknown CP content ("child pornography" tag with high enough probability). This encompasses google drive content, gmail, and all other potential services (cloud AI etc). They have dedicated personnel handling these, and they will snitch on their customers to the local LEAs.
Microsoft is known to snitch as well.
If a user sees his account on service snitches blocked after a CP integration, even if it's just in a draft unsent, then the user has been identified and the personal information if any is available is sent to authorities
some notable other sources of busts, not generated by forums themselves but users behavior outside of the forum:
7) cloud drives sync: cloud storage providers can and some will systematically analyze content synchronized to their systems and will send to LEAs any leads on CP stored within their services
8) Peer to peer: LEA monitors peer to peer exchanges on known on topic content. They integrate into the sharing process to try to identify users sharing CP.
9) Webcam : LEA can use young-looking agents to try to trap users, trying to get personal information or setting up real life meeting.
[SX2] If LEA controls the forum:
1) They are monitoring any private conversation and have access to users passwords:
they will try to identify per user: times of activity, languages spoken, mail addresses, crypto currency wallets, passwords used, number of contacts, if producer or not, any identifying information (City,phone number, etc). They will try to access the mails with the passwords used on the site, they will try to match the passwords and mails with banks of password hashes they have access to from snitch providers (high profile cloud software like adobe creative cloud and the like, since they are used to create content) in the country they managed to identify (this has to be done locally usually, except if the agency is American, they have global reach i believe or other international cooperation members)]
2) VIP registration of payment: if there is a paid VIP access, they will use traceable means, or may have a trace of all previous transactions in the site database
3) They will try to infect users: using the same kind of technology as infected hosting sites , they will try to extract identifying information. The site will be used as a platform to test new exploits, trying to break people's defenses
4) If the user is a high enough profile target, they might try to ensnare him.
to ensnare: they will try to deanonymize the user by using advanced network techniques and exploit VPN vulnerabilities to try to get to the precious source IP or at least source vpn/proxy.
Deanonymizing network traffic is difficult, and a continuous stream of information exchange is preferable. They will manipulate packet (chunks of data) sizes, fabricate artificial latency patterns, to try to localize the user among the network connections. The goal is to trace back the user to his entry point in the network through traffic correlation.
As help in the task, the international LEAs have their own server nodes (it is indeed highly suspected states run nodes in the TOR network). They can try to overload the others to skew the valuable traffic more heavily onto their own, capturing more data from their targets.
They will also try to exploit TOR software weaknesses to try to make the j*b easier.
Technique like deep-learning are used to get a likelihood of a traffic packet string to be from the "compromised site-to-target user" communication exchanged. Deep-learning is just a matter of brute-force parameters alterations in neural network analysis to try to identify hidden correlations. If the communication pattern is properly finger-printable, the results are more statistically precise.
If the entry point is compromised, and traffic is identified as from the target, they have ensnared the asset.
[SX3] aftermath: once they have identifying information
they either:
1) have real source information, usable as is (IP linked to an end-user internet connection, username/location clear enough, GPS coordinates )
2) have transitory information, requiring further investigation (proxy/vpn IP)
If they have real source of information, cases are put in order of urgency and will be processed depending on the user profile (if he's a suspected politician, member of a British royal family, high level financial person etc), the threat (known terrorist, planning a kidnapping, active producer, etc), political agenda and the country agency's blood-thirst (astute if in need of funding)
If they have only transitory information, and the user is of enough interest, they will try to get identifying information from the proxy/vpn holders or other sources.
- some are willing snitches, they give information, case closed. They defend such treason by explaining the user acted against the "terms of service"... some might even be run by LEA !
- some are unwilling snitches, they give flood of information (dump of system data), further investigation is then required to gather the needed information
- some don't comply, the agency might proceed to a seizing of the servers, dump of system data.
When a dump of system data is done, if the host keeps logs of source IPs or account IDs that match the traffic (reaching TOR entry point at times compatible with visible activity) , then a case might be built. Otherwise, they will need to cross this information with ISPs of suspected location, to identify whom among their customers in the estimated area accessed the services.
An important tool used to make sure the right person has been identified, or identify hard-to-break people's defense is to exploit social engineering.
Social engineering is based on a global gather of social networks exchanges and open clear forums that will be put to test with personal information gathered on the CP forums then cross-referenced with extended sources of information.
Each message exchanged on the clear will have tags attached, determined by an AI. This will range from "plane travel", "strawberry positive", "family of 4"+"daughter 10yo"+"daughter 5yo", "C programming language litterate", etc
coming from Twitter, Facebook, medical forums, tech forums, etc
When LEA has conducted an intensive probing campaign, gathered personal information by PM, or seized the CP forum database, they will do a similar tagging j*b.
Both sources of information are then crossed to give possible candidates and likelihood of match (this takes a LOT of personal information, but some are weighted very heavily, for example liking "Pulp Fiction" or "ice cream" will have much less weight than "lives in Washington D.C." ).
If likelihood is high enough, and candidates few enough or the target of high enough value, they will proceed to extended cross references.
This will try to match similarities in text in for example
- Vocabulary used
- smileys habits (types and frequency, it becomes second nature to people, smile :-) )) ;D (^-^; )
- Spelling mistakes
- Punctuation use and text formatting
(See "writeprint" in wikipedia)
And match chronological information with their intelligence databases ( taxes, school directories, insurance, credit, medical, airlines, car registration, credit card transactions... )
If they know the user lives in DC, is 40 to 45, democrat, from a family of 5, with a family of 4 of his own, that he visited Thailand this year, has a master degrees in a scientific field, bought a car in February, streamed a specific netflix series at a specific date, writes two smileys per paragraph, makes the your->you're mistake 50% of the time...
it is obvious the likely candidates are severely reduced in number.
This is of particular danger for regulars, people who keep personas across boards and those who are a little too chatty and also writing truths.
There are humans behind investigations, and humans are cunning.