New to Google Search: Dataset Schema

Google recently announced the release of its Data Schema integration, which makes it easier for users to view data visually in search engine results pages (SERPs). With the new schema.org integration, users will be able to see published datasets as charts, graphs, and tables directly in the search results, rather than having to navigate to the page itself to see the data.

In a statement, Google explained a bit about the purpose behind the new schema.org integration and how it will work: “News organizations that publish data in the form of tables can add additional structured data to make the dataset parts of the page easier to identify for use in relevant Search features. … News organizations add the structured data to their existing HTML of a page, which means that news organizations can still control how their tables are presented to readers.”

According to Search Engine Journal, Google has been working with 30 of the world’s top data journalists to roll out the feature, including ProPublica, which expressed excitement about the new feature’s potential usefulness for data visualization.

The data schema feature is still in a pilot testing phase, but in its feature announcement, Google provided a visual of what the schema integration will look like when it’s completely rolled out:

Image via The Keyword

From this visual, we can see that instead of just showing relevant search results for a data-related query, users will be able to see the data itself in the SERP, without having to actually navigate to the source of the data.

Why searchable datasets are important to Google

The internet is full of databases of all kinds, from local and national governmental databases to scientific and anthropological databases. Easy and free access to this data is a key part of an open internet, which Google has loudly advocated for. With the release of Google’s Dataset Search feature, anyone from data journalists, to scientists, to your great Aunt Sally can have access to all of the data they need for their professional work, or for late-night games of trivia.

Dataset Search is similar to Google Scholar in that it compiles data from across the web, regardless of where it’s hosted. The only requirement is that the host page formats the data according to the dataset guidelines, which makes it possible for Google and other search engines to crawl the data and catalog it in the repository.

“The purpose of this markup is to improve discovery of datasets from fields such as life sciences, social sciences, machine learning, civic and government data, and more,” Google said in a press release about the data schema roll-out.

Webmasters and marketers will note that this new feature is very similar to the recently announced featured snippets tool, which shows highlighted organic search results to users in SERPs in an effort to answer users’ queries more quickly and efficiently. Thus far, snippets haven’t been based on structured data formats, but it will not be surprising to see snippets move in that direction as this new data schema feature improves and Google makes rich search results more accessible for all types of queries.

How to get your visual data into SERPs

You can tell Google’s crawlers that you have visual data that might be useful in its results pages by practicing good SEO habits. When you publish data, include supporting information like names, descriptions, creators, distribution formats, and sources as structured data. Google finds visual data to include via schema.org and by crawling the metadata you include on your pages.
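As a rough illustration of what that supporting information can look like, here is a minimal Python sketch that builds a schema.org Dataset description and prints it as JSON-LD. The property names (name, description, creator, distribution, isBasedOn) come from the schema.org vocabulary, but every value, URL, and organization name below is a made-up placeholder, so treat this as a starting point rather than Google's prescribed markup.

```python
import json

# All values below are hypothetical placeholders; swap in your own dataset's details.
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example City Air Quality Readings",
    "description": (
        "Hourly air quality readings collected by a hypothetical "
        "city monitoring network."
    ),
    # Who created the dataset.
    "creator": {"@type": "Organization", "name": "Example City Data Office"},
    # How the data can be downloaded, and in which format.
    "distribution": [
        {
            "@type": "DataDownload",
            "encodingFormat": "text/csv",
            "contentUrl": "https://example.com/data/air-quality.csv",
        }
    ],
    # Where the data came from.
    "isBasedOn": "https://example.com/sources/air-quality-sensors",
}

# Serialize the description as JSON-LD text.
json_ld = json.dumps(dataset, indent=2)
print(json_ld)
```

The printed JSON is what you would publish alongside the dataset on its page.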

Google wants to make it easy for anyone who publishes data on the web to format their information according to schema.org standards. The more data collectors and publishers adhere to these standards, the wider the range of datasets that will be available in rich search results in the future.

A few years ago, Google announced the roll-out of visual data for science datasets in search results. Now, with this new update, Google will support visual data for all types of data, including environmental and social sciences, as well as government and news organization data. As more data repositories restructure their data according to these schema.org standards, the variety of searchable data will grow and become more useful for all users.

A few examples of things that may count as a dataset in a Google search include the following:

  • Organized groups of tables
  • CSV files with data
  • Tables with data
  • Images with captured data
  • Collections of files that, grouped together, would qualify as a significant dataset
  • A proprietary-format file containing data
  • Structured objects with data that could be processed using a special tool
  • Machine learning files
  • Any other dataset that could be visually represented

As mentioned above, the feature is still in pilot, so you might not see rich results in SERPs for your datasets just yet. However, Google does recommend formatting your datasets as structured data so that when the feature is rolled out to all users, your site will be ready to appear in search results. Dataset Search currently supports multiple languages, and more are in the works. If your country or language isn’t currently supported, keep checking back. As the tool’s capabilities increase, so will worldwide access to datasets.

Google guidelines for datasets

To prepare your datasets to appear in rich results, Google recommends following its structured data guidelines. It’s important to note that Google does not guarantee that your data will show up in SERPs, but adhering to these guidelines and best practices will improve your chances of being featured.

There are three supported data formats you can use to be eligible to appear in rich results:

  • JSON-LD (which is recommended by Google)
  • RDFa
  • Microdata
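If you go with JSON-LD, the markup lives inside a script tag in the page's HTML. The sketch below shows one way to produce that tag in Python. The helper name to_json_ld_script and the sample values are hypothetical, and the escaping step is a general precaution for embedding JSON in HTML, not something the announcement calls for.

```python
import json


def to_json_ld_script(data: dict) -> str:
    """Wrap a dict in a JSON-LD script tag ready to paste into a page's HTML."""
    payload = json.dumps(data, indent=2)
    # Escape "</" so a value in the data cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical placeholder values for illustration only.
snippet = to_json_ld_script(
    {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": "Example dataset",
        "description": "A placeholder description for illustration only.",
    }
)
print(snippet)
```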

Ensure that you do not use robots.txt, noindex, or any other blocking method to prevent Googlebot from crawling your structured data. In addition, you should follow Google’s general webmaster quality guidelines to be sure that your data is improving the user experience. These guidelines include things like ensuring that your content is original and up-to-date, and not irrelevant or misleading, in addition to not promoting or endorsing illegal or harmful activity.
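One quick way to sanity-check the robots.txt part is with Python's standard library, as in the sketch below. The robots.txt rules and URLs are invented examples, and googlebot_can_fetch is a hypothetical helper name. This only covers robots.txt rules, so a noindex tag or header on the page would need to be checked separately.

```python
from urllib.robotparser import RobotFileParser


def googlebot_can_fetch(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt rules let Googlebot crawl the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


# Hypothetical robots.txt content; paste in your own site's rules instead.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

# A dataset page outside the blocked folder, and one inside it.
print(googlebot_can_fetch(robots_txt, "https://example.com/datasets/air-quality"))
print(googlebot_can_fetch(robots_txt, "https://example.com/private/raw-data"))
```

If a dataset page you want in search results comes back as blocked, that is a sign to revisit your robots.txt rules.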

You can test your pages for structured data compliance with the Structured Data Testing Tool, which will alert you to errors so you have the best chance of appearing in search results. Violating these guidelines may make your data ineligible for inclusion in rich results or negatively affect your search rankings. If you try to manipulate rich results with spam, you will be penalized by Google, and your pages could possibly be removed.

Sitemap best practices

In addition to the structured data guidelines, Google encourages data publishers to adhere to the sitemap, source, and provenance best practices detailed on Google’s developer blog. You can make it easier for Googlebot to find your data by using a sitemap file and sameAs markup to indicate how datasets appear on your site. If you have canonical (also known as “landing”) pages for each dataset, Google recommends that you add structured data for each dataset that appears on these canonical pages. If you add structured data to several copies of the same dataset in different areas of your site, you can use sameAs in that markup to link back to the canonical page. Google doesn’t require that you mark up every single mention of the same dataset, but if you do, it’s best practice to use sameAs.
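To make the sameAs and sitemap ideas a little more concrete, here is a small Python sketch. It builds markup for a secondary copy of a dataset that points back to its canonical page, then generates a bare-bones sitemap listing that canonical page. The URLs, dataset details, and the build_sitemap helper are all hypothetical, and the sitemap uses only the basic urlset, url, and loc elements from the sitemaps.org format.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical canonical ("landing") page for the dataset.
CANONICAL_URL = "https://example.com/datasets/air-quality"

# Markup for a secondary page that repeats the dataset.
# sameAs tells crawlers this is the same dataset as the canonical page.
copy_markup = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example City Air Quality Readings",
    "description": "A copy of the dataset shown on a secondary page.",
    "sameAs": CANONICAL_URL,
}


def build_sitemap(urls):
    """Build a minimal XML sitemap string listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for url in urls:
        loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
        loc.text = url
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([CANONICAL_URL])
print(json.dumps(copy_markup, indent=2))
print(sitemap_xml)
```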

You can view Google’s full dataset guidelines here.

Stressed about structured data? We can help.

As this new dataset schema feature improves and rolls out to all Google users, it will be vital for all sites that publish data to ensure that their datasets are correctly structured according to Google’s guidelines so they’ll have the best chance of being included in rich search results. The more datasets for which publishers are willing to provide structured data for Google to crawl, the better rich search results will be for all users. It’s a good idea to follow Google’s dataset guidelines carefully and test your site to ensure that your datasets are meeting the schema.org formatting standards. If you’ve done this and you’re having trouble seeing your published data in Dataset Search results, check out Google’s Developer site for instructions and help.

Better access to data is an awesome idea in theory, but collecting data takes a lot of time, not to mention all the time it takes to create content to support that data and manage all of that content on your site. Who has time to worry about whether your metadata is correctly formatted to allow Googlebot to crawl your site and find your data? Fortunately, you’ve got experts in your corner. Big Leap specializes in helping small businesses improve their search engine rankings through smart SEO strategy and consistent support. If you’re interested in getting your datasets cataloged in Google’s new rich data repository but don’t know where to start, schedule your free Big Leap SEO consultation today and start doing more with your data.

Meg Monk
Meg Monk is a freelance writer and content strategist based in Salt Lake City. When she's not writing about marketing strategy, she's camping in Utah's mountains in her 1976 Airstream or planning her next international trip - 29 countries and counting! You can find more of her work at megmonk.com.