César A. Hidalgo

@cesifoti • 58,701 subscribers

Professor at Toulouse School of Economics & Director of the Center for Collective Learning. Founder Datawheel & https://t.co/2d5FR0ZOwx Latest book: The Infinite Alphabet

Videos

Shorter clip.

César A. Hidalgo

16,170 views • 8 days ago

**New DataViz Project** Curious about academic impact? Tired of rankings? Today we are introducing Rankless, a new data exploration platform that can help you explore the academic impact of thousands of universities.

All universities produce impact that is specific to certain topics and geographies, but rankings flatten that information. Rankless wants to change that.

Consider a comparison between the University of Utah and the University of Vienna, two universities ranked similarly in the Shanghai ranking. These universities differ in their geographical and topical footprint. The University of Utah specializes in Neuroscience and Medicine, whereas the University of Vienna specializes in Physics, Astronomy, and Environmental Sciences. Their geographic impact is also quite different. Utah receives a large fraction of its citations from medical centers in the U.S., Canada, and Israel, whereas Vienna receives many citations from technical institutes in Austria, Germany, and Hungary. These differences are easy to explore in Rankless but hard to see in rankings.

Rankless was developed by a talented team at the Center for Collective Learning (CCL) at Corvinus University of Budapest. It was brought to life by Endre Mark Borza, a Hungarian economist and data engineer at CCL, with the help of Máté Barkóczi, a Hungarian designer from MOME, and Veronika Hamar, executive director at CCL.

By moving beyond rankings, Rankless offers a fresh perspective on how universities influence each geography and topic, emphasizing diverse forms of impact and providing a richer understanding of academic influence. To learn more visit

César A. Hidalgo

124,686 views • 2 years ago

New PNAS paper. Historical GDP per capita data is scarce, but data on the places of birth, death, and occupations of famous individuals is abundant. In this paper we estimate the historical GDP per capita of hundreds of regions in Europe and North America using a machine learning model that leverages data on about 500k famous biographies. Our estimates roughly quadruple the availability of historical GDP per capita estimates for the last 700 years.

So why use biographies to augment historical GDP per capita data? Biographical data contains information about people who might have contributed directly to economic growth, like James Watt, or who were attracted to wealthy places looking for patrons, like Michelangelo. So we, mainly Philipp Koch, used this data to construct hundreds of features describing each European region. Then we trained a machine learning model to find the features that explained most of the variance in a cross-validation test, where we split regions multiple times into a training set and a test set. On average, the model explained about 90% of the variance in GDP per capita of the regions it had not seen during training.

But we wanted to go further, and Philipp really went to town by looking at different ways to validate our estimates. We found that our estimates correlate positively with historical measures of wellbeing, church building activity, urbanization, and body height. We also used these estimates to reproduce the basic Atlantic trade result of Acemoglu, Johnson, and Robinson and to explore the economic consequences of the famous Lisbon earthquake of 1755.

But what I personally loved most about this project, other than working with Philipp Koch and V, is that it shows that we can use machine learning methods to explore not only the future, but also the past. There is a bright and growing future in the use of machine learning for economic history. Hope you enjoy the paper and the data.

You can find links to the paper and a data exploration tool in the first comment.
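For readers curious about the validation step described in this post, here is a minimal sketch of repeated train/test splitting over regions with an out-of-sample R². It is not the paper's model: the data are synthetic, a one-feature least-squares line stands in for the machine learning model, and every name (`fit_line`, `repeated_holdout_r2`, `feature`, `log_gdp`) is hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(ys, preds):
    """Share of the variance in ys explained by preds."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def repeated_holdout_r2(xs, ys, n_splits=20, test_fraction=0.25, seed=0):
    """Average R^2 on held-out regions over several random splits."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    n_test = max(2, int(len(idx) * test_fraction))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        # Fit only on training regions, score only on unseen regions.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a + b * xs[i] for i in test]
        scores.append(r_squared([ys[i] for i in test], preds))
    return sum(scores) / len(scores)


# Synthetic stand-in data: one biography-derived feature per region
# (assumption) and a noisy linear relation to log GDP per capita.
data_rng = random.Random(42)
feature = [data_rng.uniform(0, 10) for _ in range(200)]
log_gdp = [6.0 + 0.3 * x + data_rng.gauss(0, 0.2) for x in feature]

mean_r2 = repeated_holdout_r2(feature, log_gdp)
print(f"mean out-of-sample R^2: {mean_r2:.2f}")
```

The key design point is that the score is computed only on regions excluded from fitting, so a high average reflects generalization to unseen regions and not just in-sample fit.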

César A. Hidalgo

54,324 views • 1 year ago

*New Paper on AI & Democracy* Imagine two approaches to democracy. The one we have today, where citizens choose a professional politician to represent them and others. Or an augmented form of democracy, where each citizen controls a personalized AI that helps them participate in thousands of nuanced decisions. This second approach is the idea of Augmented Democracy I introduced six years ago at TED.

In our latest paper we explore a simplified version of Augmented Democracy by combining off-the-shelf LLMs, such as ChatGPT, with data collected using a collaborative government program builder. This was an online game where people built a personalized government program using proposals extracted from the programs of the candidates in the 2022 presidential election in Brazil.

So how accurate are these augmented forms of democracy? Imagine a user who gave us 40 answers. We can use the first 20 to fine-tune a model that we can test using the 20 answers the model didn't see. We can then compare the accuracy of these predictions with the ones obtained by a "bundle" rule, which assumes that users who self-reported being on the left or right always choose the proposals from the candidate that shares their political identity. This showed us that LLMs were more accurate at predicting policy preferences than the bundle rule, meaning that the preferences captured in the participation data were more nuanced than a left-right axis, and that the LLMs can capture some of that nuance. Also, the LLMs can choose among policies coming from the same candidate, which is something that we cannot do using a bundle rule.

But can these LLMs help us complete the aggregate preferences of the population? Direct or unbundled forms of participation can result in incomplete data when people answer only a fraction of all questions. In our paper, we simulate this incompleteness by sampling the full dataset. We ask how close we can get to the full dataset by using a random sample, or a random sample augmented by predictions made by these LLMs. Overall, we find that LLM-augmented data gets much closer to the full dataset than a pure random sample.

These results do not mean that augmented democracy technology is ready, but they mean we are in a much better place to continue exploring this idea than six years ago. This paper was a collaborative effort with Jairo Gudino, PhD student at CCL at the University of Toulouse Capitole, and Umberto Grandi from IRIT, also at the University of Toulouse Capitole. We hope you find these results insightful!
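The incompleteness experiment described in this post can be illustrated with a toy simulation. This is only a sketch under loud assumptions, not the paper's procedure: the answers are synthetic, and the "predictor" is a hypothetical stand-in that looks at the true answer and flips it with probability `1 - accuracy`, purely to emulate a model of a given accuracy. No LLM is involved, and all names and parameter values are illustrative.

```python
import random


def simulate(n_users=60, n_items=200, sample_rate=0.1, accuracy=0.9, seed=1):
    """Compare two estimates of per-proposal support with the full data.

    Returns the mean absolute error of (a) the observed random sample
    alone and (b) the sample completed with predicted answers.
    """
    rng = random.Random(seed)
    # Full dataset: full[u][i] is 1 if user u supports proposal i, else 0.
    support = [rng.uniform(0.2, 0.8) for _ in range(n_items)]
    full = [[1 if rng.random() < support[i] else 0 for i in range(n_items)]
            for _ in range(n_users)]

    sample_err = 0.0
    augmented_err = 0.0
    for i in range(n_items):
        true_share = sum(full[u][i] for u in range(n_users)) / n_users
        observed = []   # answers users actually gave
        completed = []  # observed answers plus predictions for the rest
        for u in range(n_users):
            answer = full[u][i]
            if rng.random() < sample_rate:
                observed.append(answer)
                completed.append(answer)
            else:
                # Hypothetical predictor: correct with probability `accuracy`.
                guess = answer if rng.random() < accuracy else 1 - answer
                completed.append(guess)
        # Fall back to 0.5 when nobody answered this proposal.
        sample_share = sum(observed) / len(observed) if observed else 0.5
        augmented_share = sum(completed) / n_users
        sample_err += abs(sample_share - true_share)
        augmented_err += abs(augmented_share - true_share)
    return sample_err / n_items, augmented_err / n_items


sample_mae, augmented_mae = simulate()
print(f"sample only: {sample_mae:.3f}  augmented: {augmented_mae:.3f}")
```

In this toy setup the sparse sample is noisy because each proposal gets only a handful of answers, while the completed data trades that noise for a smaller bias from prediction errors. How that trade-off plays out with real predictions depends on how accurate the predictor actually is.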

César A. Hidalgo

26,912 views • 1 year ago
