Jelly Beans Project

This is the official site for the findings, code, and data of the Jelly Beans paper. A draft of the paper is available here (PDF). The input domains are the Tranco top 1 million list (ID NYJW), available here.

Data

Input Domains (17MB) - the Tranco list in alphabetical order
Crawl Status (95MB) - when each domain was crawled and whether that crawl was successful
Crawl Status Table - how to decode the numerical crawl status codes
Search Elements (259MB) - the HTML elements or URLs used for search on each website
Leakages (2.6GB) - the leakages found for each domain across URLs, headers, and data
Privacy Policy Links (18MB) - links to the privacy policy of each domain, where one could be found

Code

The code is available as a zip file here (10MB), licensed under the 3-Clause BSD License. To preserve author anonymity, we will upload the code to GitHub only after the paper is accepted. We ask reviewers not to distribute this code until after the publication deadline.