About Wikidata Walkabout

Welcome to Wikidata Walkabout, a site that provides a drill-down interface for exploring the data of Wikidata.

This site is maintained by Yaron Koren (yaron57@gmail.com), and runs on the open-source software Anvesha, which was created by Yaron Koren and Sahaj Khandelwal (though mostly Sahaj). The site, and software, were both first released in September 2020.

Wikidata is a fantastic resource; history may show that it was the most important website created in the 2010s. However, it is still difficult to navigate, and to query. "Navigating" Wikidata means simply clicking from one item to the next: there is no aggregation on the site, and no way to see an overview of anything. Querying Wikidata, meanwhile, requires going to the Wikidata Query Service page and typing in SPARQL. SPARQL, while a powerful query language, is not easy to learn; nor is it easy to keep looking up the Q and P identifiers of the entities that any specific query needs. We believe that this kind of direct SPARQL querying will always remain the domain of a relatively small number of specialists, rather than something available to the masses. That is where we hope Wikidata Walkabout can be useful.

Wikidata Walkabout makes heavy use of two Wikidata properties: "instance of" (P31), which defines which items belong to which classes, and "properties for this type" (P1963), which defines the set of filters made available for each class.
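As a rough illustration of the kind of query a drill-down interface has to produce behind the scenes, here is a minimal sketch in Python that builds a SPARQL query counting the direct instances of one class, grouped by the values of one filter property. The helper name `facet_query` and the exact query shape are hypothetical, not taken from Anvesha. It assumes the `wd:`, `wdt:`, `wikibase:` and `bd:` prefixes that the Wikidata Query Service predefines, and the usage line picks Q515 ("city") as the class and P17 ("country") as the filter property.

```python
import re


def facet_query(class_qid: str, property_pid: str, limit: int = 20) -> str:
    """Build a SPARQL query (hypothetical helper, not Anvesha's code) that
    counts direct instances of class_qid, grouped by their property_pid value."""
    # Validate the identifiers so that arbitrary text is never spliced into the query.
    if not re.fullmatch(r"Q[1-9]\d*", class_qid):
        raise ValueError(f"not a Wikidata item ID: {class_qid!r}")
    if not re.fullmatch(r"P[1-9]\d*", property_pid):
        raise ValueError(f"not a Wikidata property ID: {property_pid!r}")
    # Doubled braces in the f-string become literal SPARQL braces.
    return f"""SELECT ?value ?valueLabel (COUNT(?item) AS ?count) WHERE {{
  ?item wdt:P31 wd:{class_qid} .
  ?item wdt:{property_pid} ?value .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
GROUP BY ?value ?valueLabel
ORDER BY DESC(?count)
LIMIT {int(limit)}"""


if __name__ == "__main__":
    # Print a query counting "city" items per country.
    print(facet_query("Q515", "P17"))
```

The printed query can be pasted into the Wikidata Query Service page. Having to know that "city" is Q515 and "country" is P17 is exactly the lookup burden described above, which a filter interface can hide.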

There are various uses for this site:

There may be one more benefit of Wikidata Walkabout, though the software was not written with this purpose in mind: flattening Wikidata's class structure. There currently appears to be a major overuse of classes on Wikidata (that is, of "instance of" values) to store information that should instead be handled through other properties, and perhaps this site can play a role in changing that.

To take one example, there are about 9,000 items that are an instance of the "city" class. Obviously there are many more cities than this in the world: items for other cities are unfortunately spread out among many other subclasses and sub-subclasses of "city", including "city of the United States" (10,000 items), "urban municipality of Germany" (2,000 items) and "big city" (3,000 items). In some cases, like for Germany, there may be a valid reason to use a class other than "city". In most cases, though, "city" should be used, with additional properties covering the specifics of the city's country, population, etc. Perhaps better yet, a more general class like "human settlement" should be used for all of them. (The "human settlement" class already has over 600,000 items, although most of these are for places with fewer than 300 people.)

This is hardly unique to items about cities: for almost every class, there are superclasses and subclasses that can theoretically hold some or all of that same data, and often thousands of items do end up in these alternate classes. (Interestingly, the one big exception is the "human" class; nearly all of the millions of people with an entry in Wikidata belong to this class. Clearly there has been some supervision to get the data on humans into this pristine state.) The overall profusion of classes is a problem because it makes querying difficult, both for aggregating applications like Wikidata Walkabout, and generally for any SPARQL querying. It also makes editing more difficult than it should be, because editors can never be sure which classes to use. Hopefully, Wikidata Walkabout can play a part in reducing the number of classes and making it simpler to choose an "instance of" value.
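To make the querying problem concrete: a query for items whose "instance of" value is exactly "city" misses every item filed under a subclass, so a complete query has to follow "subclass of" (P279) links transitively, which SPARQL writes as the property path `wdt:P31/wdt:P279*`. The following minimal Python sketch models that difference on a tiny, made-up hierarchy. The class names echo the examples above, but the item names and links are hypothetical toy data, not real Wikidata content.

```python
from collections import deque

# Hypothetical toy data: class -> its direct superclasses ("subclass of").
SUBCLASS_OF = {
    "city of the United States": ["city"],
    "big city": ["city"],
    "city": ["human settlement"],
}

# Hypothetical toy data: item -> its direct classes ("instance of").
INSTANCE_OF = {
    "Item A": ["city"],
    "Item B": ["city of the United States"],
    "Item C": ["big city"],
    "Item D": ["human settlement"],
}


def superclasses(cls: str) -> set[str]:
    """Return cls plus every class reachable through "subclass of" links (breadth-first)."""
    seen = {cls}
    queue = deque([cls])
    while queue:
        for parent in SUBCLASS_OF.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def direct_instances(target: str) -> list[str]:
    """Items whose "instance of" value is exactly target (like wdt:P31 alone)."""
    return sorted(item for item, classes in INSTANCE_OF.items() if target in classes)


def transitive_instances(target: str) -> list[str]:
    """Items of target or of any subclass of it (like wdt:P31/wdt:P279*)."""
    return sorted(
        item
        for item, classes in INSTANCE_OF.items()
        if any(target in superclasses(c) for c in classes)
    )


if __name__ == "__main__":
    print("direct:", direct_instances("city"))
    print("transitive:", transitive_instances("city"))
```

Here `direct_instances("city")` finds only `Item A`, while `transitive_instances("city")` also finds `Item B` and `Item C`. If every city item used "city" directly, the cheap direct lookup and the hierarchy walk would return the same result, which is the simplification argued for above.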

Some current limitations of the software, and thus of the site:

We hope you enjoy using this site.

Yaron Koren
Sahaj Khandelwal