Jelly Beans Project
This is the official site for the findings, code, and data of the Jelly Beans paper. A draft version of the paper can be found here (PDF). The input domains were taken from the Tranco top 1 million list (list ID NYJW), available here.
Data
- Input Domains (17MB) - the Tranco list in alphabetical order
- Crawl Status (95MB) - when each domain was crawled and whether that crawl was successful
- Crawl Status Table - how to decode the numerical crawl status codes
- Search Elements (259MB) - the HTML elements or URLs used for search on each website
- Leakages (2.6GB) - for each domain, the leakages found in the URL, headers, and data
- Privacy Policy Links (18MB) - links to each domain's privacy policy, where one could be found
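The Input Domains file is described as the Tranco list in alphabetical order. The sketch below shows one way to derive such a list from a Tranco download. It is not the authors' script: it assumes the downloaded CSV has one "rank,domain" row per line with no header, and it uses Python's default string ordering, since the exact sort order used for the released file is not stated here. The file name in the usage note is hypothetical.

```python
import csv
from typing import Iterable, List


def alphabetical_domains(lines: Iterable[str]) -> List[str]:
    """Return the domains from a Tranco-style CSV, sorted alphabetically.

    Assumption: each row is "rank,domain" with no header row.
    The rank column is dropped; only the domain names are kept.
    """
    reader = csv.reader(lines)
    # Keep the second column (the domain) and skip blank or malformed rows.
    domains = [row[1].strip() for row in reader if len(row) >= 2]
    # Python's default ordering compares strings by code point.
    return sorted(domains)


if __name__ == "__main__":
    sample = ["1,google.com\n", "2,facebook.com\n", "3,amazon.com\n"]
    print(alphabetical_domains(sample))
```

To apply it to a full download, pass an open file, e.g. `with open("tranco_NYJW.csv", newline="") as f: domains = alphabetical_domains(f)` (a hypothetical file name). Because `csv.reader` accepts any iterable of strings, the same function works on a file object or an in-memory list.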