Public Utility Data Liberation (PUDL)

Catalyst Cooperative
United States of America

About

Launched: 2017
Record Updated: Jun 03, 2025
Open scholarly dataset
Web archiving system
The PUDL Project (pronounced "puddle") is an open source data processing pipeline that makes US energy data easier to access and use programmatically. PUDL provides free, up-to-date, version-controlled, and analysis-ready energy data products.
Hundreds of gigabytes of valuable data are published by US government agencies, but this data is often difficult to work with. PUDL takes the original spreadsheets, CSV files, and databases and turns them into a unified resource. This allows users to spend more time on novel analysis and less time on data preparation.
The project is focused on serving researchers, activists, journalists, policy makers, and small businesses that might not otherwise be able to afford access to this data from commercial sources and who may not have the time or expertise to do all the data processing themselves from scratch. PUDL data supports a number of other open-source energy modelling and analysis tools, such as PyPSA-USA, PowerGenome, and OpenGridEmissions.
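To illustrate the idea of turning differently formatted source files into a unified resource, here is a minimal sketch using only the Python standard library. It is not PUDL's actual pipeline: the two CSV extracts, their column names, the plant IDs, the values, and the `generation` table are all hypothetical and exist only to show the general pattern of renaming columns, converting units, and loading the result into one queryable table.

```python
import csv
import io
import sqlite3

# Hypothetical raw extracts from two agencies. Column names, plant IDs,
# and values are invented for illustration and are not real PUDL inputs.
AGENCY_A_CSV = """Plant Id,Year,Net Gen (MWh)
101,2022,1500.5
102,2022,320.0
"""

AGENCY_B_CSV = """plant_code,report_year,generation_kwh
101,2023,1600000
103,2023,45000
"""


def load_rows(text, id_col, year_col, gen_col, divide_by=1.0):
    """Yield (plant_id, report_year, net_generation_mwh) from one raw CSV.

    divide_by converts the source unit to MWh (for example 1000 for kWh).
    """
    for row in csv.DictReader(io.StringIO(text)):
        yield int(row[id_col]), int(row[year_col]), float(row[gen_col]) / divide_by


def build_database():
    """Load both extracts into a single in-memory SQLite table."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE generation ("
        "plant_id INTEGER, report_year INTEGER, net_generation_mwh REAL, "
        "PRIMARY KEY (plant_id, report_year))"
    )
    rows = list(load_rows(AGENCY_A_CSV, "Plant Id", "Year", "Net Gen (MWh)"))
    rows += load_rows(
        AGENCY_B_CSV, "plant_code", "report_year", "generation_kwh", divide_by=1000
    )
    con.executemany("INSERT INTO generation VALUES (?, ?, ?)", rows)
    return con


con = build_database()
result = con.execute(
    "SELECT plant_id, report_year, net_generation_mwh "
    "FROM generation ORDER BY plant_id, report_year"
).fetchall()
for record in result:
    print(record)
```

Once the sources share one schema and one unit, a single query can span records that originally came from different files, which is the kind of preparation work the project aims to take off users' hands.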

Mission

Rapid and equitable decarbonization requires free, accessible and open-source energy data. Government agencies all collect and publish primary US energy system data. However, existing data formats and known quality issues require extensive data pre-processing. Commercial products provide analysis-ready data, but opaque processing techniques and re-use restrictions limit their usefulness, and their high cost makes them inaccessible to many stakeholders working on critical environmental justice and energy transition issues.
Our long-term vision is an open-source PUDL ecosystem that: 1) provides equitable access to high-quality energy data; 2) serves as a community repository of well-documented, reproducible data cleaning and analysis pipelines essential to energy system research and advocacy; 3) is accessible to users working with different tooling; and 4) is an inclusive and welcoming environment for users and contributors.

Key Achievements


  • Recipient of an NSF Pathways Towards Open-Source Ecosystems (POSE) Phase I grant in 2024.

  • With support from GridLab, collaborated with the developers of the open-source energy system models PyPSA-USA, GenX, and GridPath to develop open energy data inputs that serve modellers' needs.

  • Launched a new user portal using AG Grid, Flask, DuckDB, Parquet files in the AWS Open Data Registry, and WebAssembly. This portal makes it possible for users to easily explore billions of rows of energy data stored in Parquet and export subsets to CSV.

Technical Attributes

Maintenance Status

Actively Maintained

Open Code Repository

Implemented

Technical Documentation

Implemented

Code License

Implemented

Open API

Implemented

Open Data Statement

Implemented

Technical Attribute Statements

Programming Languages

  • Python

Technology Readiness Level

  • Actual system proven in operational environment

Code Licenses Used

  • MIT License

Content Licensing

  • Creative Commons licenses

Standards

Metadata

  • DataCite metadata schema
  • JSON

Persistent Identifier

DOIs

Integrations

  • Creative Commons Licenses
  • DataCite
  • Zenodo

Community Engagement

Code of Conduct

Implemented

Community Engagement

Implemented

Community Statements

Community Engagement Activities

  • Blogs
  • Conference participation
  • Development sprints
  • Mailing lists and discussion forums (including Slack)
  • Social media
  • Staff roles with responsibility for community engagement
  • User research
  • Webinars and training

Engagement with Values Frameworks

  • Contributor Covenant Code of Conduct for Open Source and Other Digital Commons Communities

Policies & Governance

Policies

Privacy Policy

In Progress

Governance Structure & Processes

In Progress

Policy Statements

Board Structure

Board of directors

Board Level

As a democratic worker-owned co-operative, we operate with one vote per member. Currently, all members of Catalyst Cooperative serve on the board. The board has absolute control over the organization, a fiduciary responsibility to the co-operative, and is primarily responsible for ongoing financial planning and decision-making, approving policy changes, and hiring. According to the Catalyst Cooperative bylaws, the board is mandated to serve the following purposes:
"The Cooperative shall strive to perform data and policy analysis in support of creating a more just, livable, and sustainable world for all humanity with focus areas including climate and energy policies, electric utility regulation and finances, and land-use, transportation, and housing policies.
The Cooperative shall strive to provide members with fulfilling work that is equitably compensated, while also allowing them the flexibility to pursue other paths in life. The Cooperative may produce free or low-cost and thus more widely available data products, applications, and policy analyses to help educate the public, inform public policymaking, and level the playing field between profit-maximizing and non-commercial interests.
The Cooperative may also have as one or more of its purposes any purpose or purposes permitted for cooperatives under Colorado law. The Cooperative shall be operated on a cooperative basis for the mutual benefit of its members."
See our bylaws for more information.

Community Governance

  • Ad hoc

Additional Information

Organizational History

Electric utilities report a huge amount of information to the US government and other public agencies. This includes yearly, monthly, and even hourly data about fuel burned, electricity generated, operating expenses, power plant usage patterns, and emissions. Unfortunately, much of this data is not released in well-documented, ready-to-use, machine-readable formats. Data from different agencies tends not to be standardized or easily used in tandem. Several commercial data services clean, package, and re-sell this data, but at prices that are too high to be accessible to many smaller stakeholders.
PUDL's founders were working to make the case for early retirement of non-economic coal plants in Colorado, and they saw the need for analysis-ready data that public interest advocates could use. In 2017, PUDL grew out of their early data cleaning efforts.
We created Catalyst Cooperative to help more people bring data and analysis to the fight for clean energy and a stable climate, and to provide an organizational home for PUDL. To this day, PUDL remains a core part of the organization's work, but we also collaborate with researchers, policy makers, climate advocates, journalists, non-profits, and mission-aligned businesses to address data analysis and data engineering challenges. PUDL development is supported by foundations such as the Alfred P. Sloan Foundation and the Mozilla Foundation, as well as by organizations such as RMI and GridLab.

Organizational Structure

Business or Ownership Model

Other

Full-time Staff

6-10

Volunteers

0

Shareholders

Yes

Current Affiliations

RMI
GridLab

Funding

Primary Funding Source

50% grant-funded, 50% client contracting

Financial Reporting Level

  • Provider

Funding Needs

Typically, open source projects rely on a blend of grants and donations to keep their lights on. Grant funding remains an essential resource for us in supporting the development of new tooling and datasets. However, there is plenty of other work that we do on PUDL that needs more consistent funding, but that doesn't neatly fit into traditional RFPs or client-specific projects.
We need long-term operational funding to support ongoing maintenance and development of PUDL, including:
- Monthly archives of all our raw input datasets.
- Quarterly versioned data releases.
- Computing resources for continuous integration testing and nightly data builds.
- Maintenance of software dependencies, documentation, and example notebooks.
- Distribution of PUDL data to the AWS Open Data Registry, Zenodo, Kaggle, and web interface.
- Immediate support for bugs and data quality issues.
- Incremental adaptation of the PUDL infrastructure.