Dataverse

Harvard University Institute for Quantitative Social Science (IQSS)
United States of America

About

Launched: 2007
Record Updated: Oct 04, 2024
Repository software

The Dataverse Project is an open-source web application to share, preserve, cite, explore, and analyze research data. It helps make data available to others and makes it easier to replicate others' work. Researchers, journals, data authors, publishers, data distributors, and affiliated institutions all receive academic credit and web visibility.

A Dataverse repository is the software installation, which then hosts multiple virtual archives called Dataverse collections. Each Dataverse collection contains datasets, and each dataset contains descriptive metadata and data files (including documentation and code that accompany the data). As an organizing method, Dataverse collections may also contain other Dataverse collections.
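
The containment hierarchy described above (installation, then collections that may nest, then datasets holding metadata and files) can be pictured as a small tree. The following is a minimal Python sketch under that reading only. The class names `Installation`, `Collection`, `Dataset`, and `DataFile` and the example values are hypothetical illustrations, not the classes used in the Dataverse codebase.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Hypothetical model of a file in a dataset: data, documentation, or code."""
    name: str


@dataclass
class Dataset:
    """Hypothetical model: descriptive metadata plus the files that belong to it."""
    title: str
    metadata: dict[str, str] = field(default_factory=dict)
    files: list[DataFile] = field(default_factory=list)


@dataclass
class Collection:
    """Hypothetical model of a Dataverse collection; may hold datasets and sub-collections."""
    alias: str
    datasets: list[Dataset] = field(default_factory=list)
    collections: list[Collection] = field(default_factory=list)

    def all_datasets(self) -> list[Dataset]:
        """Datasets in this collection and, recursively, in its sub-collections."""
        found = list(self.datasets)
        for child in self.collections:
            found.extend(child.all_datasets())
        return found


@dataclass
class Installation:
    """Hypothetical model of one Dataverse repository (software installation)."""
    name: str
    collections: list[Collection] = field(default_factory=list)

    def count_datasets(self) -> int:
        """Total datasets across all top-level collections and their descendants."""
        return sum(len(c.all_datasets()) for c in self.collections)


# Example tree: installation -> department collection -> lab collection -> dataset
survey = Dataset(
    "Household Survey 2020",
    {"author": "Example Lab"},
    [DataFile("survey.csv"), DataFile("codebook.pdf"), DataFile("clean.R")],
)
lab = Collection("example-lab", datasets=[survey])
dept = Collection("example-dept", collections=[lab])
repo = Installation("Example Dataverse", [dept])
print(repo.count_datasets())  # 1
```

The recursion in `all_datasets` reflects the point that collections may contain other collections purely as an organizing method, while the dataset remains the unit that carries metadata and files.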

Mission

The mission of the Dataverse Project is to revolutionize the way data is managed and shared by automating tasks traditionally carried out by professional archivists. Our goal is to empower data creators by providing services that allow them to receive proper credit for their data while ensuring long-term preservation. We aim to eliminate a dilemma researchers faced in the past: choosing between control and credit on the one hand, and preservation on the other. The Dataverse Project breaks this dichotomy by letting you create a Dataverse collection on your website that keeps your branding and URL, offers academic citation for the data, and provides full credit and visibility. Dataverse also fully supports quality control, for example through curation workflows. At the same time, Dataverse repositories backed by institutions guarantee long-term preservation.

Key Achievements

  • The Dataverse UI is being redesigned as a single-page application (SPA) in which internal and external Dataverse services will be delivered solely through improved APIs. This change will improve UI responsiveness and enable the Dataverse community to develop its own UIs and applications using extended Dataverse API endpoints. Our roadmap for this infrastructure redesign is described in "Restructuring the Dataverse UI as a Single-Page Application": https://docs.google.com/document/d/19pbENuYyHErEmblbFGQ47_uJpTfqVKbn9O0QftVqeeU/edit#heading=h.9b7lzr4a7odc
  • Generalist Repository Ecosystem Initiative (GREI): a multi-year grant, starting in 2022, to develop collaborative approaches to data management and sharing by including and enriching generalist repositories, such as the Harvard Dataverse Repository, in the NIH data ecosystem. The announcement details the areas being explored: https://datascience.nih.gov/data-ecosystem/generalist-repository-ecosystem-initiative
  • CAFÉ grant: The BUSPH-HSPH Climate Change and Health Research Coordinating Center (CAFÉ) is a three-year cooperative agreement between the Boston University School of Public Health, the Harvard T.H. Chan School of Public Health, and the National Institutes of Health. CAFÉ aims to Convene, Accelerate, Foster, and Expand the climate and health community of practice (COP), both in the US and globally. Its Dataverse collection serves the climate and health COP as a repository for datasets of any kind that enable broad, interdisciplinary research in climate and health: https://github.com/Climate-CAFE/
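
The SPA redesign relies on clients that talk to Dataverse only through its HTTP APIs. As a rough illustration of what such a client does, the sketch below builds a request to the Dataverse Search API (`/api/search`, with the `q`, `type`, `per_page`, and `start` parameters described in the Dataverse API Guide) and reads result names from the JSON reply. The helper names `search_url` and `search_names`, the example server URL, and the assumed response shape (`data.items[].name`) are illustrative assumptions; check the API Guide for the version you run.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode
from urllib.request import urlopen


def search_url(base_url: str, query: str, kind: str = "dataset",
               per_page: int = 10, start: int = 0) -> str:
    """Build a Search API URL; `kind` is "dataverse", "dataset", or "file"."""
    params = {"q": query, "type": kind, "per_page": per_page, "start": start}
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


def search_names(base_url: str, query: str, **kwargs) -> list[str]:
    """Fetch one page of results and return item names (requires network access)."""
    with urlopen(search_url(base_url, query, **kwargs), timeout=30) as resp:
        payload = json.load(resp)
    # Assumed response shape: {"status": "OK", "data": {"items": [{"name": ...}, ...]}}
    return [item["name"] for item in payload["data"]["items"]]


url = search_url("https://demo.dataverse.org/", "climate", per_page=5)
print(url)
# https://demo.dataverse.org/api/search?q=climate&type=dataset&per_page=5&start=0
```

Keeping URL construction separate from the network call makes the request easy to inspect or test offline; `search_names` would then be called against a live installation, e.g. `search_names("https://demo.dataverse.org", "climate")`.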

Technical Attributes

    Maintenance Status

    Actively Maintained

    Open Code Repository

    Implemented

    Technical Documentation

    Implemented

    Open API

    Implemented

    Open Data Statement

    Implemented

    Open Product Roadmap

    Implemented

    Technical Attribute Statements

    Technology Readiness Level

    • Actual system proven in operational environment

    Code Licenses Used

    • Apache License, Version 2.0

    Content Licensing

    By default, datasets added to a Dataverse repository are released under the CC0 Public Domain Dedication; the Dataverse software has applied the CC0 waiver by default to all datasets since version 4.0. CC0 was chosen because it is widely recognized in the scientific community, which makes it a familiar option for data (to which copyright generally does not apply), and because it is already used by repositories and by scientific journals that require the deposit of open data. For more information on the CC0 waiver, please visit the Creative Commons website (https://creativecommons.org/share-your-work/public-domain/cc0). Data depositors can opt out of the CC0 waiver for their datasets if needed.

    Standards

    Hosting Options

    • Through third party vendor only

    Service Providers

    Integrations

    Rserve
    Binder
    Whole Tale
    OSF
    RSpace
    GitHub
    GitLab
    Renku
    OJS
    Archivematica
    REDCap
    iRODS
    DuraCloud
    Dropbox

    Community Engagement

    Code of Conduct

    Implemented

    Community Engagement

    Implemented

    Contribution Guidelines or Fora

    Implemented

    Community Statements

    User Contribution Pathways

    • Contribute to code
    • Contribute to documentation
    • Contribute to education or training
    • Contribute to working groups or interest groups

    More About Community Engagement

    Community Engagement Activities:

    Dataverse engages extensively with its user community through a variety of media.

  • Monthly Community Calls are held on Zoom to discuss upcoming releases, development contributions from the community, and other topics of interest: https://dataverse.org/community-calls.
  • Dataverse Community and Developer Mailing Lists provide open fora for community members.
  • Annual Community Meetings provide an opportunity to engage around themes such as sustainability of services and infrastructures and Indigenous Data Sovereignty: https://dataverse.org/events.
  • The Global Dataverse Community Consortium (GDCC) coordinates existing Dataverse community efforts internationally and provides a collaborative venue where institutions can leverage economies of scale in supporting Dataverse repositories around the world: https://dataversecommunity.global/.
  • DataverseTV showcases video content from the Dataverse community: https://dataverse.org/dataversetv.

Policies & Governance

    Governance Summary

    Dataverse is developed at Harvard's Institute for Quantitative Social Science (IQSS), along with many collaborators and contributors worldwide. Harvard has two governing boards, the Board of Overseers and the Harvard Corporation. There is also a Global Dataverse Community Consortium.

    Policies

    Commitment to Equity & Inclusion

    Implemented

    Privacy Policy

    Implemented

    Web Accessibility Statement

    In Progress

    Governance Structure & Processes

    Implemented

    Policy Statements

    Board Structure

    • None

    Community Governance

    • Ad hoc

    Additional Information

    Organizational History

    The Dataverse Project is being developed at Harvard's Institute for Quantitative Social Science (IQSS), along with many collaborators and contributors worldwide. It builds on our experience with the earlier Virtual Data Center (VDC) project, which ran from 1997 to 2006 as a collaboration between the Harvard-MIT Data Center (now part of IQSS) and the Harvard University Library. Precursors to the VDC date back to 1987 and include pre-web software that automatically transferred cataloging information by FTP to other sites across campus at designated times and, before that, a stand-alone software guide to local data.

    Organizational Structure

    Business or Ownership Model

    Fiscal sponsorship (academic institution)

    Current Affiliations

    • Funded by Harvard with additional support from the Alfred P. Sloan Foundation, National Science Foundation, National Institutes of Health, Helmsley Charitable Trust, IQSS's Henry A. Murray Research Archive, and many others.

    Funding