Bulk Data Files

Shodan provides a few different datasets as bulk data files:

banners-daily, banners-hourly: contain all the banner/service information that the crawlers collected during a given day/hour. Each file is compressed using zstd and contains a single JSON-encoded banner per line. The most recent 30 days are always available for download. This data powers the /shodan/host/ set of API endpoints. Visit the data dashboard to get a sense of what the latest banners-daily file contains.
raw-daily: the legacy dataset containing the banner data. It’s compressed using gzip. We continue to support this dataset, but for new projects we recommend the banners-daily/banners-hourly datasets.
dnsdb: DNS data gathered using OSINT techniques. This data powers the /dns/domain/ endpoint of the API.
internetdb: SQLite database file that contains minimal service information but is small enough to fit into memory. It powers the InternetDB API.
cvedb: SQLite database containing information about the CVEs published to NVD. It powers our public CVEDB API.
internet-scanners: contains a list of IPs that have been observed scanning the Internet within the past 24 hours. This data is used to add the scanner tag on the banners.
ping: a CSV containing the results of a ping sweep of the Internet.
whoisdb: MMDB database file containing Whois information for all IPs on the Internet.
routesdb: database files in MMDB, SQLite, and JSONL formats containing Internet Routing Registry data for all IPs on the Internet.
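
To illustrate the one-banner-per-line layout described above, here is a minimal Python sketch that streams a gzip-compressed file such as one from the raw-daily dataset. The function name iter_banners is hypothetical, and the sketch assumes that raw-daily files also hold one JSON object per line. The zstd-compressed banners-daily files would need a zstd decompressor in place of gzip, which is not shown here.

```python
import gzip
import json
from typing import Iterator


def iter_banners(path: str) -> Iterator[dict]:
    """Yield one decoded JSON object per non-empty line of a gzip file.

    Assumption: the file stores one JSON-encoded banner per line.
    """
    # Open in text mode and read line by line so the whole file never
    # has to fit in memory.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Because the function is a generator, a loop over iter_banners("2021-01-26.json.gz") processes banners as they are decompressed instead of loading the whole file first.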

Bulk Data API

The Bulk Data API methods provide a programmatic way to discover and download all the raw data files that Shodan generates. The data itself is stored in the cloud for optimized delivery across regions. The current methods for the API are documented on the developer website.

The /shodan/data method returns a list of available datasets and metadata about them:

[
    {
        "scope": "monthly",
        "name": "internetdb",
        "description": "Minified database containing network information about all IPs on the Internet"
    },
    {
        "scope": "monthly",
        "name": "dnsdb",
        "description": "DNS data for active domains on the Internet"
    },
    {
        "scope": "daily",
        "name": "banners-daily",
        "description": "Data files containing all the information collected during a day"
    }
]
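
A response like the one above could be requested and summarized from Python roughly as follows. This is a sketch under assumptions: the base URL https://api.shodan.io and the key query parameter are not stated in this document and should be checked against the developer documentation, and the helper names datasets_url, fetch_json and summarize are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import List

# Assumption: the API base URL and the "key" query parameter are not given
# in this document; confirm both in the developer documentation.
API_BASE = "https://api.shodan.io"


def datasets_url(api_key: str) -> str:
    """Build the URL for the /shodan/data method."""
    return API_BASE + "/shodan/data?" + urllib.parse.urlencode({"key": api_key})


def fetch_json(url: str, timeout: float = 30.0):
    """Send a GET request and decode the JSON body (uses the network)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def summarize(datasets: List[dict]) -> List[str]:
    """Format each dataset entry as 'name (scope): description'."""
    return [
        "{} ({}): {}".format(d["name"], d["scope"], d["description"])
        for d in datasets
    ]
```

With a valid API key, summarize(fetch_json(datasets_url(key))) would give one line of text per available dataset.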

The /shodan/data/{dataset} method returns a list of URLs that can be used to download the files within a dataset. For example, the following shows part of the response for the /shodan/data/raw-daily request:

[
    {
        "url": "https://...",
        "timestamp": 1611711401000,
        "sha1": "5a91f49c90da5ab8856c83c84846941115c55441",
        "name": "2021-01-26.json.gz",
        "size": 104650655998
    },
    {
        "url": "https://...",
        "timestamp": 1611655444000,
        "sha1": "ea29acc25fc154ac64dde0ab294824ae7f1f64c9",
        "name": "2021-01-25.json.gz",
        "size": 152517565458
    },
    {
        "url": "https://...",
        "timestamp": 1611540775000,
        "sha1": "aed18f2a952df7731fec447d81ead8a96907000d",
        "name": "2021-01-24.json.gz",
        "size": 161275556509
    },
    ...
]
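
The metadata in each entry can be used to check a finished download. The following Python sketch assumes that size is the file length in bytes, that sha1 is the hex-encoded SHA-1 digest of the file contents, and that timestamp is a Unix time in milliseconds; none of these are spelled out above. The helper names entry_time and verify_download are hypothetical.

```python
import hashlib
import os
from datetime import datetime, timezone


def entry_time(entry: dict) -> datetime:
    """Convert an entry's timestamp to a UTC datetime.

    Assumption: the timestamp is expressed in Unix milliseconds.
    """
    return datetime.fromtimestamp(entry["timestamp"] / 1000, tz=timezone.utc)


def verify_download(path: str, entry: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the local file matches the entry's size and SHA-1."""
    # Cheap check first: a size mismatch means hashing is pointless.
    if os.path.getsize(path) != entry["size"]:
        return False
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Hash in fixed-size chunks so very large files are never fully
        # loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == entry["sha1"]
```

Under the millisecond assumption, the first entry above (1611711401000) corresponds to 2021-01-27 01:36:41 UTC, shortly after the day named in the file had ended.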

Downloading the Data

The Bulk Data API files are hosted on Backblaze B2, which supports downloading data in chunks. This means you can use multiple connections to download a single file, which significantly speeds up downloads, especially as the bulk data files continue to grow. The recommended tool for downloading the data is aria2c. The following sample command uses aria2c to download a file with 4 concurrent connections to the server:

aria2c -x 4 -s 4 -o filename.json.gz http://<bulk-data-url>

The aria2c process will pre-allocate the entire data file and then fill in the data as it is downloaded.
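
For readers curious about what downloading in chunks involves, the Python sketch below splits a file of known size into contiguous byte ranges, one per connection, and formats the matching HTTP Range header value. It illustrates the general idea only and is not a description of how aria2c works internally; the function names split_ranges and range_header are hypothetical.

```python
from typing import List, Tuple


def split_ranges(size: int, parts: int) -> List[Tuple[int, int]]:
    """Split bytes 0..size-1 into contiguous, inclusive (start, end) ranges."""
    if size <= 0 or parts <= 0:
        raise ValueError("size and parts must be positive")
    # Never create more ranges than there are bytes.
    parts = min(parts, size)
    base, extra = divmod(size, parts)
    ranges = []
    start = 0
    for index in range(parts):
        # The first `extra` ranges each carry one additional byte.
        length = base + (1 if index < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def range_header(start: int, end: int) -> str:
    """Format the value of an HTTP Range header for an inclusive byte range."""
    return "bytes={}-{}".format(start, end)
```

Each range could then be fetched on its own connection and written at its start offset in a pre-allocated output file, which is consistent with the pre-allocation behaviour noted above.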

Quickstart

If you’re just getting started and want to try out the Bulk Data API then check out the Shodan CLI. It supports all the Bulk Data API methods. For example, to get a list of the available datasets:

shodan data list

To get a list of files for a dataset:

shodan data list --dataset=banners-daily

And then to download a specific file within a dataset:

shodan data download internetdb internetdb.sqlite.bz2

However, the shodan data download command downloads the data using a single connection, which will be significantly slower than using a tool such as aria2c. Below is the equivalent aria2c command to download the InternetDB SQLite file using 4 concurrent connections:

aria2c -x 4 -s 4 -o internetdb.sqlite.bz2 https://f001.backblazeb2.com/b2ap...

Useful Links

Datapedia: https://datapedia.shodan.io/
aria2c: https://aria2.github.io/
Developer documentation: https://developer.shodan.io/api
Postman collection for REST API: https://www.postman.com/shodanhq/workspace/shodan/folder/5677612-ed460277-6845-4a40-9f5e-ba803cfa9f74
Shodan CLI and Python library: https://github.com/achillean/shodan-python