Working with the Green Web Open Datasets

The Green Web Foundation regularly publishes a dataset of green domain names and the organisations that host them. We refer to this as the green domains dataset.

This data closely mirrors what is available through our Green Web check API. Generally speaking, any analysis you might use the Green Web check API for can be done with the published datasets instead, without needing to call the API for each check.
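As a minimal sketch of replacing a single API call with a local lookup, the Python function below queries a downloaded snapshot using the standard library's sqlite3 module. It assumes the snapshot is a SQLite database (the format Datasette serves) containing the greendomains table documented below, and that the url column stores bare domain names in lowercase; the function name is_green is hypothetical.

```python
import sqlite3


def is_green(conn: sqlite3.Connection, domain: str) -> bool:
    """Return True if `domain` is listed as green in a local snapshot.

    Assumes `conn` is open on a SQLite copy of the green domains snapshot,
    with a `greendomains` table whose `url` column holds lowercase domains.
    """
    row = conn.execute(
        "SELECT green FROM greendomains WHERE url = ? LIMIT 1",
        # Normalise input to match the (assumed) lowercase stored domains.
        (domain.strip().lower(),),
    ).fetchone()
    # No row only means the domain is absent from this snapshot's
    # 12-month window; `green` is assumed to be stored as the integer 1.
    return row is not None and row[0] == 1
```

For example, after opening your downloaded copy with sqlite3.connect("green_domains.db") (a hypothetical filename), is_green(conn, "example.org") returns True only if that domain appears as green in the snapshot.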

Understanding the Green Domains dataset

Every check of a website on the Green Web Foundation platform is recorded in a table called greenchecks. As of February 2025, this table is more than 5 billion rows long, which makes it unwieldy to work with.

For this reason, the dataset we publish contains a single, smaller table, greendomains, listing the domains that have shown up as 'green' in the twelve months before the snapshot was taken.

The columns for the table are documented below:

Column | Description
id | the ID of the most recent check of this domain
url | the domain checked
hosted_by | the organisation hosting this site
hosted_by_website | the website of the organisation providing the hosting for this site
partner | Deprecated: a status indicating whether the domain belongs to one of the Green Web partner organisations
green | whether this counts as a green domain: 1 for yes, 0 for no. See our documentation page for more details on the basis we use to decide whether a domain counts as "green"
hosted_by_id | the ID of the hosting organisation
modified | the date and time of the most recent check of this domain
created | the date and time when this domain was first recorded in this system, taking into account the 12-month rolling cut-off window of this snapshot. If a domain was first seen more than 12 months before the snapshot date, it is only included if a check run against it in the last 12 months showed it as green.
Note: Columns marked as "deprecated" will be removed in future versions of the dataset. They may not be available after October 2025.
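Because the partner column may be removed, one way to keep queries working is to select the remaining documented columns by name rather than using SELECT *. The sketch below does this against a local SQLite copy of the snapshot (an assumption); domain_record and CURRENT_COLUMNS are hypothetical names.

```python
import sqlite3
from typing import Optional

# Documented greendomains columns, excluding the deprecated `partner`
# column, so this query is unaffected when that column is dropped.
CURRENT_COLUMNS = (
    "id", "url", "hosted_by", "hosted_by_website",
    "green", "hosted_by_id", "modified", "created",
)


def domain_record(conn: sqlite3.Connection, domain: str) -> Optional[dict]:
    """Return the snapshot row for `domain` as a dict, or None if absent."""
    # Column names come from the fixed tuple above, never from user input;
    # the domain itself is passed as a bound parameter.
    sql = (
        f"SELECT {', '.join(CURRENT_COLUMNS)} "
        "FROM greendomains WHERE url = ? LIMIT 1"
    )
    row = conn.execute(sql, (domain,)).fetchone()
    return dict(zip(CURRENT_COLUMNS, row)) if row else None
```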

Example uses of this dataset

Because this dataset provides data similar to the Green Web check API, it can work as an offline cache in cases where making an API call for each check would be too slow, or where you want to avoid making network requests to Green Web servers that would disclose which domains are being checked. We've listed some examples of usage below:
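To illustrate the offline-cache idea, this sketch loads every green domain from a local snapshot into an in-memory set once, so each later check is a set lookup with no query or network request. It assumes a SQLite copy of the snapshot and that each hostname (for example www.example.org and example.org) is stored as its own entry in url; GreenDomainCache is a hypothetical name, and holding the whole table in memory trades RAM for lookup speed.

```python
import sqlite3
from urllib.parse import urlsplit


class GreenDomainCache:
    """In-memory set of green domains built from a local snapshot."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        # One pass over the table; afterwards no database or network
        # access is needed, so nothing reveals which sites are checked.
        self._domains = frozenset(
            url for (url,) in conn.execute(
                "SELECT url FROM greendomains WHERE green = 1"
            )
        )

    def is_green(self, url_or_domain: str) -> bool:
        # Accept a full URL or a bare domain; compare hostnames in lowercase.
        if "://" in url_or_domain:
            host = urlsplit(url_or_domain).hostname or ""
        else:
            host = url_or_domain.strip()
        return host.lower() in self._domains
```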

Running custom queries on the Green Web Foundation datasets

This dataset browser is built on Datasette, Simon Willison's open source project. If you know SQL, this means you can run custom SQL queries against the latest green domains snapshot and link directly to the results. See the relevant Datasette documentation for more.
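Datasette accepts a query in the sql parameter of a database URL, and appending .json to the database path returns the results as JSON. The sketch below builds such shareable links; the base URL and database name you pass in are placeholders, since the exact address of the Green Web dataset browser is not given here, and datasette_query_url is a hypothetical name.

```python
from urllib.parse import urlencode


def datasette_query_url(base_url: str, database: str, sql: str,
                        as_json: bool = False) -> str:
    """Build a link that runs `sql` against a Datasette database."""
    path = f"{base_url.rstrip('/')}/{database}"
    if as_json:
        # `.json` asks Datasette for JSON; `_shape=array` requests a plain
        # list of row objects instead of the default nested structure.
        return f"{path}.json?" + urlencode({"sql": sql, "_shape": "array"})
    # urlencode percent-encodes the SQL so it survives in a query string.
    return f"{path}?" + urlencode({"sql": sql})
```

For example, datasette_query_url("https://datasets.example.org", "green_domains", "select count(*) from greendomains") gives https://datasets.example.org/green_domains?sql=select+count%28%2A%29+from+greendomains, where the host and database name are placeholders.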

Queries in the browser are subject to a time limit, so if you need to run complex, expensive queries against a specific snapshot, we recommend downloading the snapshot and working with it on your own computer or infrastructure.
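Once a snapshot is downloaded, heavier aggregate queries run locally without the browser's time limit. As a sketch, assuming the snapshot is a SQLite file, this counts green domains per hosting provider. Grouping by hosted_by_id as well as hosted_by is a choice made here so that two providers sharing a display name are not merged; top_hosts is a hypothetical name.

```python
import sqlite3
from typing import List, Tuple


def top_hosts(conn: sqlite3.Connection, limit: int = 10) -> List[Tuple[str, int]]:
    """Return (hosted_by, domain_count) pairs, largest providers first."""
    return conn.execute(
        """
        SELECT hosted_by, COUNT(*) AS domains
        FROM greendomains
        WHERE green = 1
        GROUP BY hosted_by_id, hosted_by
        ORDER BY domains DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
```

To avoid modifying your copy by accident, you can open it read-only with sqlite3.connect("file:green_domains.db?mode=ro", uri=True), where the filename is again hypothetical.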

Licensing of the data

This dataset is released under the Open Database Licence.

Getting support with using the Green Web Foundation datasets

We provide limited, free support for using the Green Web Datasets we publish, and are happy to provide advice or answer questions about this data if you want to use it in classes or research.

If you're interested in further analysis of the web's shift away from fossil fuels, the Green Web Foundation has data going back to 2009, and we're happy to discuss collaborations. Get in touch at hello@thegreenwebfoundation, or visit our contact page for more ways to reach us.