Explore Wikipedia through a data map. Pages are grouped by semantic similarity into topic clusters. Hover to see details, zoom in to explore fine-grained topics, and click to go to a page. Search page names to find interesting starting points for exploration.🧵
55,703 views • 1 year ago • via X (Twitter)
10 comments
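The post says pages are grouped by semantic similarity into topic clusters. The idea can be illustrated with a toy sketch. It assumes each page already has an embedding vector and uses a deliberately simple greedy threshold grouping on cosine similarity. This is a swapped-in teaching technique, *not* what Toponymy or DataMapPlot actually do. The page titles, vectors, and the `threshold` value are all hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def group_by_similarity(
    embeddings: dict[str, list[float]], threshold: float = 0.9
) -> list[list[str]]:
    """Toy greedy grouping (hypothetical, not the real pipeline): each page
    joins the first cluster whose seed page is at least `threshold` similar;
    otherwise it starts a new cluster with itself as the seed."""
    clusters: list[list[str]] = []
    seeds: list[list[float]] = []
    for title, vec in embeddings.items():
        for cluster, seed in zip(clusters, seeds):
            if cosine_similarity(vec, seed) >= threshold:
                cluster.append(title)
                break
        else:  # no sufficiently similar cluster found
            clusters.append([title])
            seeds.append(vec)
    return clusters


# Hypothetical 3-d "embeddings"; real embedding vectors have hundreds of dimensions.
pages = {
    "Tokyo Station": [0.9, 0.1, 0.0],
    "Shin-Osaka Station": [0.85, 0.15, 0.05],
    "Polish village A": [0.1, 0.9, 0.1],
    "Polish village B": [0.05, 0.95, 0.2],
}

print(group_by_similarity(pages))
# → [['Tokyo Station', 'Shin-Osaka Station'], ['Polish village A', 'Polish village B']]
```

The railway-station vectors point in nearly the same direction (cosine ≈ 0.996), so they land together, while the village vectors form their own group. A production pipeline would use real embeddings and a proper dimensionality-reduction and clustering step rather than a single fixed threshold.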

All of this is really just a tech demo for the tools backing it: Toponymy for creating topics and topic labels, and DataMapPlot for creating the interactive visualizations.

It does provide a novel way to explore Wikipedia, though. You can see the scope of all of English-language Wikipedia at once. There are surprising clusters (every Polish village, Japanese railway stations, etc.), dense topics, and unexpected connections to be found.

But most importantly, you can build this yourself using open source tools. A notebook with full end-to-end code is here: You can use the same tools and techniques to build a map for your own data.

Thank you to all the people who contributed to DataMapPlot and Toponymy! Toponymy is still very much in development, so please check it out, and if you have ideas or features to add, consider contributing. Documentation for Toponymy is coming soon.

Special thanks to @JayAlammar and @Nils_Reimers from @cohere for providing embedding vectors for all of Wikipedia.

For even more Wikipedia vectors, @nomic_ai just released vectorization and a data map for all of Wikipedia in all languages!


I'm browsing this and having the biggest nerd joy possible, thank you for this

It's missing filtering capabilities (e.g. by geography), and the search capabilities are limited: I searched for "doughnut economics" and it didn't return any results, whereas a search on Wikipedia does:

The search is just an exact string match on page names. This is less powerful, but it has the advantage of being easy to run entirely client-side for responsiveness. Filtering is possible in DataMapPlot, but I don't have the relevant data to enable it.
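A minimal sketch of the kind of title search described in the reply helps explain why "doughnut economics" can come up empty. It assumes "exact string match" means a literal, case-insensitive substring test on page titles. The thread doesn't say whether case is ignored, and the real map runs in the browser rather than in Python. The function name `search_titles` and the titles list are hypothetical.

```python
def search_titles(titles: list[str], query: str) -> list[str]:
    """Hypothetical title search: return titles that contain `query` as a
    literal substring, ignoring case. There is no stemming, no synonyms and
    no ranking, so only the exact characters typed can match."""
    q = query.casefold()
    return [t for t in titles if q in t.casefold()]


# Hypothetical page titles standing in for the map's labels.
titles = ["Doughnut (economic model)", "Ecological economics", "Doughnut"]

print(search_titles(titles, "doughnut"))            # → ['Doughnut (economic model)', 'Doughnut']
print(search_titles(titles, "doughnut economics"))  # → []
```

The query "doughnut economics" describes the topic, but no title contains that exact phrase, so nothing matches. The trade-off is the one named in the reply: a scan like this needs only the list of titles, so it can run client-side with no server. Full-text or semantic search would need an index or an embedding model behind it.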

