Working with the Green Web Open Datasets
The Green Web Foundation regularly publishes a dataset of green domain names, and who hosts them. We refer to this as the green domains dataset.
This data closely mirrors the data available over our Green Web check API. Generally speaking, any analysis you might use the Green Web check API for can also be carried out against the published datasets, without needing to hit the API for each check.
Understanding the Green Domains dataset
Every check of a website in the Green Web Foundation platform is recorded in a table called greenchecks. As of February 2025, this table is more than 5 billion rows long, which makes it rather unwieldy to work with.
For this reason, the dataset we publish contains a single, smaller table, greendomains, listing the domains that have shown up as 'green' in the twelve months before the snapshot was taken.
The columns for the table are documented below:
| Column | Description |
|---|---|
| id | the id of the last check |
| url | the domain checked |
| hosted_by | the organisation hosting this site |
| hosted_by_website | the website of the company providing the hosting for this site |
| partner | Deprecated: a status indicating whether the domain belongs to one of the Green Web partner organisations |
| green | Does this count as a green domain? 1 for yes, 0 for no. See our documentation page for more details on the basis we use to decide whether to count a domain as "green" |
| hosted_by_id | the id of the hosting company |
| modified | the time and date of the last check of this domain |
| created | the time and date when this domain was created in this system, taking into account the 12 month rolling cut-off window of this snapshot. If a domain was first seen more than 12 months before the date of the snapshot, but has not had a check showing it as green in the last 12 months, it will not be included in this snapshot. |
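If you have downloaded a snapshot, a quick way to get a feel for these columns is to open it with a SQLite client. The sketch below assumes the snapshot has been saved locally as a SQLite file called green_domains.db; the filename is an assumption, so substitute whatever path you saved it to.

```python
import sqlite3

# A minimal sketch of inspecting a downloaded snapshot with Python's
# built-in sqlite3 module. "green_domains.db" is an assumed local
# filename, not an official one.
conn = sqlite3.connect("green_domains.db")
conn.row_factory = sqlite3.Row

# List the columns of the greendomains table documented above.
columns = conn.execute("PRAGMA table_info(greendomains)").fetchall()
print([col["name"] for col in columns])

# Fetch a single row to see the shape of the data.
row = conn.execute("SELECT * FROM greendomains LIMIT 1").fetchone()
if row is not None:
    print(dict(row))

conn.close()
```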
Example uses of this dataset
Because this dataset provides similar data to the greencheck API, it can work like an offline cache for situations where making an API call for each check would be too slow, or where you want to avoid making network requests to Green Web servers that disclose which domains are being checked. We've listed some example uses below, with a minimal sketch of the offline cache pattern after the list:
- running local checks for privacy - a build of the privacy-protecting search engine searx uses this dataset to avoid needing to leak information about the domains being checked
- checking domains as part of a development workflow - tools that consume the Green Web Foundation's greencheck API, like Sitespeed.io or Website Carbon, can use this dataset to avoid being reliant on the Green Web API for running checks
- running analysis to understand how the proportion of the web running on "greener" hosting changes over time - the HTTP Archive Web Almanac uses this dataset in the sustainability section of its "State of the Web" report, presenting various visualisations based on this data
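As a sketch of the offline cache pattern mentioned above, the snippet below looks a domain up in a locally downloaded snapshot instead of calling the greencheck API for every check. It assumes the snapshot has been downloaded as a SQLite file named green_domains.db; the filename and the helper function name are illustrative, not part of the published dataset.

```python
import sqlite3

# Illustrative offline lookup against a downloaded green domains snapshot.
# "green_domains.db" is an assumed local filename, not an official one.
DB_PATH = "green_domains.db"


def check_domain(domain):
    """Return hosting details for a domain if it appears as green
    in the snapshot, or None if it is not listed."""
    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row
    try:
        row = conn.execute(
            "SELECT url, green, hosted_by, hosted_by_website, modified "
            "FROM greendomains WHERE url = ? AND green = 1",
            (domain,),
        ).fetchone()
        return dict(row) if row is not None else None
    finally:
        conn.close()


# Example usage: no network request leaves your machine.
print(check_domain("www.thegreenwebfoundation.org"))
```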
Running custom queries on the Green Web Foundation datasets
This dataset browser is based on Simon Willison's open source Datasette project. This means that if you know SQL, you can link directly to the results of custom SQL queries run against the latest green domains snapshot. See the relevant Datasette documentation for more.
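Because Datasette also exposes query results as JSON, you can fetch the results of a custom query programmatically as well as linking to them. The sketch below uses Datasette's standard ?sql= and _shape=array parameters; the base URL and database name here are assumptions, so check the address of the dataset browser you are using and adjust both values.

```python
import json
import urllib.parse
import urllib.request

# Assumed location of the Datasette instance and database name.
# Check the dataset browser you are using and adjust both values.
BASE_URL = "https://datasets.thegreenwebfoundation.org"
DATABASE = "greendomain"

# A custom SQL query: the ten hosting providers with the most green domains.
sql = """
    SELECT hosted_by, COUNT(*) AS domain_count
    FROM greendomains
    WHERE green = 1
    GROUP BY hosted_by
    ORDER BY domain_count DESC
    LIMIT 10
"""

# Datasette returns query results as JSON via the ?sql= parameter;
# _shape=array returns a plain list of row objects.
params = urllib.parse.urlencode({"sql": sql, "_shape": "array"})
url = f"{BASE_URL}/{DATABASE}.json?{params}"

with urllib.request.urlopen(url) as response:
    rows = json.load(response)

for row in rows:
    print(row["hosted_by"], row["domain_count"])
```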
There is a timeout limit on queries run through the browser, so if you need to run complicated or expensive queries against a specific snapshot, we recommend downloading the snapshot and working with it on your own computer or infrastructure.
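For example, a query that aggregates across the whole table, such as counting how many green domains were last checked in each month, runs comfortably against a local copy without worrying about the browser's timeout. As before, the filename green_domains.db is an assumption about where you saved the downloaded snapshot, and the query assumes the modified column is stored as an ISO-8601 timestamp.

```python
import sqlite3

# Run a heavier aggregate query against a locally downloaded snapshot,
# free of the timeout applied to the hosted dataset browser.
# "green_domains.db" is an assumed local filename.
conn = sqlite3.connect("green_domains.db")

# Count green domains by the month of their last check.
# Assumes "modified" is an ISO-8601 timestamp that strftime can parse.
query = """
    SELECT strftime('%Y-%m', modified) AS month,
           COUNT(*) AS domains_checked
    FROM greendomains
    WHERE green = 1
    GROUP BY month
    ORDER BY month
"""

for month, domains_checked in conn.execute(query):
    print(month, domains_checked)

conn.close()
```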
Licensing of the data
This dataset is released under the Open Database Licence.
Getting support with using the Green Web Foundation datasets
We provide limited, free support for using the Green Web Datasets we publish, and are happy to provide advice or answer questions about this data if you want to use it in classes or research.
If you're interested in further analysis of the shift of the web away from fossil fuels, the Green Web Foundation has data going back to 2009, and we're open to collaborations. Get in touch at hello@thegreenwebfoundation, or visit our contact page for more ways to reach us.