Where can I find data?

April 8, 2021

A starting point

Working with data may feel intimidating at first. Core concepts are generally much easier to learn if you already understand how a dataset was constructed or are at least somewhat interested in what it has to say.

Before we get there, however, let’s brainstorm some ways in which people get data in the first place.

Surveys

Surveys remain a primary way for organizations to quantify the world. Although use cases vary (e.g., understanding why people like to travel or determining if current customers are happy), approaches are similar.

Agree on what you want to know, come up with questions that help reach that goal, put them into a questionnaire, and attempt to get people to complete it. Surveys are often preceded by one-on-one interviews or focus groups that uncover important themes, which can then be validated more widely through a formal survey.

There are many survey services available with varying levels of support, functionality, and pricing. SurveyMonkey is a good starting point and has a free tier. Qualtrics is a strong option when you need more flexibility and expect a larger number of respondents. Although Google Forms doesn’t look as pretty, it is also an effective and free tool for distributing basic online questionnaires.


Customers, members, or users

Customers, members, or users are another source of potential insights. Administrative records include information collected through normal business operations and can reveal who your customers are, what characteristics they share, and what their purchase or engagement history has been.

Beyond formal interactions, customer behavior is now easier to track than ever if you have a digital property like a website, app, or social media channel. The hope is to use such information to improve the existing user experience or identify new product opportunities.

Of course, such tracking, and your response to what it uncovers, raise a set of ethical and legal questions that we’ll save for another day.

Google Analytics is a very common (and free) tool for monitoring your site’s performance and your visitors’ behavior over time. You simply add a JavaScript tag near the top of your webpages, sit back, and watch the real-time metrics start coming in.


Secondary data sources

There is also a wide world of secondary data sources that exists beyond your organization’s servers.

Publicly available information is often released by governments or international organizations. Employment data from the U.S. Bureau of Labor Statistics and economic trends from the IMF World Economic Outlook are good examples.

Some companies create paid data services in which they sell curated insights such as those available from the SHRM Compensation Data Center, which is powered by salary.com.

Web scraping relies on programming languages such as Python or R to programmatically connect to specific web pages, find content of interest, and then store it in a useful format. Some sites have terms of use or other safeguards in place to prohibit such data gathering.
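
As a rough illustration of the find-and-store part of scraping, here is a minimal Python sketch that uses only the standard library. The page content, the table, and the names SAMPLE_HTML, TableParser, scrape_table, and to_csv are all made up for this example. In practice the HTML would first be downloaded from a site that permits it, for instance with urllib.request.urlopen.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page content standing in for a downloaded web page.
SAMPLE_HTML = """
<html><body>
  <h1>Office locations</h1>
  <table>
    <tr><th>City</th><th>Employees</th></tr>
    <tr><td>Denver</td><td>120</td></tr>
    <tr><td>Austin</td><td>85</td></tr>
  </table>
</body></html>
"""


class TableParser(HTMLParser):
    """Collect each table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row currently being read
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def scrape_table(html):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def to_csv(records):
    """Serialize the records as CSV text, one common storage format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = scrape_table(SAMPLE_HTML)
print(to_csv(records))
```

Real pages are rarely this tidy, so many people reach for a dedicated parsing library instead of writing a parser by hand. The overall shape (download, locate the content, store it) stays the same.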

As an alternative, more companies are developing public or gated APIs (Application Programming Interfaces) that enable others to access their data assets in a controlled way.
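
To make the idea of an API request more concrete, the following Python sketch prepares a request and reads a JSON reply without contacting any server. Everything specific here is an assumption for illustration: the endpoint api.example.com, the parameter names, the bearer-token style of key, and the shape and values of SAMPLE_RESPONSE. A real provider's documentation defines its own.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; a real API publishes its own base URL.
BASE_URL = "https://api.example.com/v1/series"


def build_request(params, api_key=None):
    """Prepare (but do not send) a GET request for the given query parameters."""
    url = f"{BASE_URL}?{urlencode(params)}"
    headers = {"Accept": "application/json"}
    if api_key:
        # Gated APIs often expect a key or token; the exact header varies by provider.
        headers["Authorization"] = f"Bearer {api_key}"
    return Request(url, headers=headers)


# Made-up response body in the kind of JSON structure an API might return.
SAMPLE_RESPONSE = """
{
  "series": "example_metric",
  "observations": [
    {"year": 2019, "value": 10.0},
    {"year": 2020, "value": 12.5}
  ]
}
"""


def parse_observations(body):
    """Turn the JSON text into a {year: value} dictionary."""
    payload = json.loads(body)
    return {obs["year"]: obs["value"] for obs in payload["observations"]}


request = build_request(
    {"series": "example_metric", "start_year": 2019},
    api_key="demo-key",
)
observations = parse_observations(SAMPLE_RESPONSE)

print(request.get_full_url())
print(observations)
```

Sending the prepared request would be a matter of passing it to urllib.request.urlopen and reading the response body. Compared with scraping, the provider decides what is exposed and in what format, which is what makes the access controlled.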

Finally, there is a growing number of tools that aggregate diverse datasets with which you can experiment and learn. These include Kaggle and Google Dataset Search.


Next steps

Regardless of where you find your data, the next steps will be understanding what you are looking at and documenting what you’ve found so that others can also use the data resource with confidence.

This article is part of an online textbook of data fundamentals to help individuals and teams build confidence with data.
