Google Is Now Indexing CSV Files

Google quietly updated its Search Central documentation to note that it now indexes .csv files.

This opens up a new way for content to get crawled. Conversely, publishers who don’t want their .csv files crawled may need to update robots.txt to exclude those files.
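Google’s robots.txt syntax supports `*` (any run of characters) and a trailing `$` (end of URL), so a rule such as `Disallow: /*.csv$` is one way to keep Googlebot away from CSV files. The Python sketch below is a simplified matcher written only to show how such a rule applies to different URLs. It is not Google’s own parser, and the rule set and paths are hypothetical examples.

```python
from __future__ import annotations

import re

# Hypothetical robots.txt group aimed at Googlebot:
#   User-agent: Googlebot
#   Disallow: /*.csv$
# Represented here as (rule type, path pattern) pairs.
CSV_BLOCK_RULES = [("disallow", "/*.csv$")]


def pattern_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the match
    to the end of the URL (path plus any query string).
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal text, then rejoin the pieces with '.*' where '*' was.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(url_path: str, rules: list[tuple[str, str]]) -> bool:
    """Return True if url_path may be crawled under the given rules.

    The longest matching pattern wins; on a tie, 'allow' beats 'disallow'.
    A path that matches no rule is allowed.
    """
    best_len = -1
    best_allow = True
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty rule (e.g. "Disallow:") matches nothing
        if pattern_to_regex(pattern).match(url_path):
            allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


print(is_allowed("/data/report.csv", CSV_BLOCK_RULES))       # False
print(is_allowed("/data/report.csv?dl=1", CSV_BLOCK_RULES))  # True
print(is_allowed("/index.html", CSV_BLOCK_RULES))            # True
```

In this sketch the second URL stays crawlable because `$` anchors the end of the whole URL, so a CSV served with a query string is not caught by `/*.csv$`. Publishers relying on such a rule may want to test it against their real URL patterns.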

Comma-Separated Values (CSV)

Comma-separated values (CSV) files are text files that store data in a tabular format that can be displayed as a spreadsheet.

CSV files store data as plain text, which means they contain no style elements such as fonts, and no images or active links.

They are useful for tasks such as uploading a list of URLs to a crawler like Screaming Frog.

They are also useful for organizing data in a spreadsheet.
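To make the “plain text” point concrete, the Python sketch below shows a small, hypothetical CSV file of URLs as the raw text it actually is, then parses it with the standard-library `csv` module. The column names and URLs are made up for illustration; the quoted second field shows how a comma inside a value is handled.

```python
from __future__ import annotations

import csv
import io

# A hypothetical CSV file, shown as raw text: one record per line,
# fields separated by commas, no fonts, images, or live links.
RAW_CSV = """url,note
https://example.com/,home page
https://example.com/about,"About us, team"
"""


def read_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text into one dict per data row, keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(RAW_CSV)
for row in rows:
    print(row["url"], "->", row["note"])
```

The URLs are just strings in the file; any linking happens only in whatever tool reads it.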

CSV File Indexing Is New

Google’s ability to index CSV files appears to be new: a “filetype” search on Google for CSV files does not currently return any CSV files.

Searches like the following currently do not return CSV files:

  • filetype:csv site:.gov
  • filetype:csv site:.edu
  • filetype:csv site:.com

Google Has Already Indirectly Used CSV Files

Curiously, Google’s Dataset search appearance already used CSV files, though apparently only when they were described with structured data.

The Dataset structured data documentation on Google’s old Developers site (viewable on Archive.org) states that CSV files are an acceptable format for appearing in dataset search features.

The use of tabular data as a search appearance goes back to 2018, when Google announced that it would show that kind of data in search when the data is accompanied by structured data.

According to the original documentation:

“Datasets are easier to find when you provide supporting information such as their name, description, creator and distribution formats are provided as structured data…

Here are some examples of what can qualify as a dataset:

  • A table or a CSV file with some data
  • An organized collection of tables
  • A file in a proprietary format that contains data
  • A collection of files that together constitute some meaningful dataset
  • A structured object with data in some other format that you might want to load into a special tool for processing
  • Images capturing data
  • Files relating to machine learning, such as trained parameters or neural network structure definitions
  • Anything that looks like a dataset to you”

Google updated the above documentation in 2022 and redirected it to the new Search Central documentation.

The updated documentation makes it clearer that Google relies on structured data to use CSV files in its dataset search appearance.

But does this change mean that Google will eventually crawl CSV files and use them for search appearances (in addition to tabular data described with structured data)?

Here is what the current documentation says:

“Datasets are easier to find when you provide supporting information such as their name, description, creator and distribution formats as structured data.

Google’s approach to dataset discovery makes use of schema.org and other metadata standards that can be added to pages that describe datasets…

Here are some examples of what can qualify as a dataset:

A table or a CSV file with some data…”
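The “name, description, creator and distribution formats” mentioned in the documentation correspond to schema.org `Dataset` properties. The Python sketch below builds that kind of JSON-LD for a dataset offered as a CSV download, using a `DataDownload` entry with `encodingFormat` set to `text/csv`. The dataset, organization, and URL are hypothetical, and the helper `dataset_jsonld` is an illustrative name, not a Google API; Google’s Dataset documentation remains the authority on which properties are required or recommended.

```python
from __future__ import annotations

import json


def dataset_jsonld(name: str, description: str, creator: str, csv_url: str) -> dict:
    """Build schema.org Dataset markup whose distribution is a CSV download."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "creator": {"@type": "Organization", "name": creator},
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": "text/csv",
                "contentUrl": csv_url,
            }
        ],
    }


# Hypothetical values for illustration only.
markup = dataset_jsonld(
    name="Example city rainfall, 2015-2020",
    description="Daily rainfall totals recorded at one hypothetical weather station, one row per day.",
    creator="Example Weather Office",
    csv_url="https://example.com/data/rainfall.csv",
)

# Emit the JSON-LD wrapped in the script tag a web page would embed.
print('<script type="application/ld+json">')
print(json.dumps(markup, indent=2))
print("</script>")
```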

Is Google’s CSV Indexing Related to the Recent Update?

By definition, a core algorithm update is one in which Google makes “significant” and “broad changes” to its core algorithm.

It may be a coincidence that the indexing of CSV files and the core algorithm update happened at virtually the same time.

Still, it is worth considering whether Google improved its crawling engine to index CSV files or whether that capability already existed.

Read the updated list of indexable file types:

File types indexable by Google

Read Google’s Search Central Dataset Documentation:

Dataset (Dataset, DataCatalog, DataDownload) structured data

Featured image by Shutterstock/Jane Kelly