CIO.com

public data

By Cameron Hashemi-Pour

What is public data?

Public data is information that can be shared, used, reused and redistributed without restriction. It comes in a range of formats and sizes, such as data sets and statistics, and includes both processed, structured data and raw, unstructured data. Public data is typically stored on and accessed through corporate or government websites, and it is also held by businesses and other data providers.

There are many reasons to share data publicly. These include protecting the public, as when criminal records are shared; transparency, in the case of government entities that serve the general populace; and advancing new technologies such as artificial intelligence (AI) and machine learning (ML).

Ideally, industries can use public data that's relevant to their needs for purposes such as better targeting customers. In the tech sector, for example, if relevant public data is easily accessible, enterprises can use it to train AI and ML models that analyze information and glean insights.

Examples of public data providers and repositories

Providers of public data sets and statistics include both government-affiliated and nongovernment sources. In the U.S., the Freedom of Information Act guarantees that various types of data can be shared publicly, including environmental information and real estate and driving records. Some providers or repositories of public data include the following:

How public data is different from open data

The terms public data and open data are often used interchangeably. However, open data is more accessible than public data in general, and only a small percentage of all public data in existence is considered open data.

Open data is typically prepared and presented in structured formats and made available to anyone on government websites. For example, the World Bank's website promotes its data sets as open data that's preformatted, structured and free of restrictions. Public data, by contrast, encompasses both open data and data that's unstructured or otherwise public yet less accessible.
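Structured, preformatted open data can be consumed by software with little preparation. As a minimal sketch, assuming a data set published as CSV with hypothetical `country`, `year` and `value` columns (the codes and numbers below are made-up placeholders, not real statistics), Python's standard `csv` module can read and aggregate it directly:

```python
import csv
import io

# Hypothetical CSV snippet laid out like a structured open data file.
# Country codes and values are made-up placeholders, not real statistics.
SAMPLE = """country,year,value
AAA,2020,10.5
AAA,2021,11.0
BBB,2020,7.25
"""


def mean_by_country(text: str) -> dict:
    """Average the numeric 'value' column for each country in CSV text."""
    totals: dict = {}
    counts: dict = {}
    # DictReader maps each row to a dict keyed by the header line.
    for row in csv.DictReader(io.StringIO(text)):
        country = row["country"]
        totals[country] = totals.get(country, 0.0) + float(row["value"])
        counts[country] = counts.get(country, 0) + 1
    return {c: totals[c] / counts[c] for c in totals}


print(mean_by_country(SAMPLE))
```

Unstructured public data, such as scanned documents or free-form text, would first need an extraction step before it could be analyzed this way.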

How public data is different from private data

Private data is information, or whole data sets, made available only to designated individuals. It often contains information about people or businesses that would be too sensitive to share openly or outright harmful in the wrong hands.

Private data about individuals can include medical information, financial and bank records, Social Security numbers and other forms of government identification. For businesses, private data about customers or employees can be shared only with specific individuals.

In certain cases, aspects of individuals' private data can be made public as long as personally identifiable information remains private. For example, transcripts of phone calls and text messages can be made available to government entities, especially if they pertain to government business. These records can be anonymized, and their metadata can be used in public data sets if necessary.
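One common step toward releasing call metadata is to strip the content and replace identifiers with tokens. The sketch below assumes a hypothetical record layout (`caller`, `callee`, `started_at`, `duration_s`, `transcript`) and uses keyed hashing (HMAC-SHA256) to pseudonymize phone numbers. Note that this is pseudonymization, not full anonymization: the same number always maps to the same token, so records can still be linked, and a real release may also need aggregation or other safeguards.

```python
import hashlib
import hmac
from datetime import datetime

# Hypothetical secret key held by the data publisher and never released
# alongside the data set; without it, tokens can't be recomputed from numbers.
SECRET_KEY = b"replace-with-a-securely-generated-key"


def pseudonymize(number: str, key: bytes = SECRET_KEY) -> str:
    """Replace a phone number with a keyed token (truncated HMAC-SHA256)."""
    digest = hmac.new(key, number.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def to_public_record(call: dict) -> dict:
    """Keep only coarse metadata: pseudonymous parties, call date, duration."""
    started = datetime.fromisoformat(call["started_at"])
    return {
        "caller": pseudonymize(call["caller"]),
        "callee": pseudonymize(call["callee"]),
        "date": started.date().isoformat(),  # time of day is dropped
        "duration_s": call["duration_s"],
        # The transcript field is deliberately not copied.
    }


# Hypothetical record; 555-01xx numbers are reserved for fictional use.
record = {
    "caller": "+1-555-0100",
    "callee": "+1-555-0199",
    "started_at": "2023-06-19T14:32:05",
    "duration_s": 312,
    "transcript": "example call content",
}
print(to_public_record(record))
```

Using a secret key rather than a plain hash matters here: phone numbers come from a small space, so unkeyed hashes could be reversed by hashing every possible number.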

Data privacy has become a prominent topic as organizations work to protect and govern the use of private data, and laws are being enacted to enforce it.


19 Jun 2023

All Rights Reserved, Copyright 2007 - 2024, TechTarget | Read our Privacy Statement