Magis

SimCity & Data Commons

magis.substack.com

Alex Izydorczyk
Nov 15, 2022

SimCity was my favorite video game growing up. I began to appreciate the complexity of the game just as SimCity 4 was released, so I grew up playing that iteration. As a naive 11-year-old, I believed that SimCity was an accurate representation of the job of a government: I literally believed the mayor of Winnipeg could look at the city’s economy, energy, climate, and infrastructure in one glance. If that were the case, even if poor governing decisions were made, it would be easy to see the results and correct them. Something about this idea guided my future interests in statistics, economics, and operations research.

This raises the question: what would it take for the real world to be modeled and represented by the sort of data available in SimCity? Would it be possible for businesses and governments to actually view the economy as SimCity players do?

The problem is not an engineering one [1]. The amount of data required for such a representation is not particularly large when compared to the amount of data processed in other fields such as ad-tech, astronomy, or weather forecasting. Recent advances in data warehouse technologies, along with the modern data stack, have created a wealth of tools that researchers and data scientists could use to process the relatively modest amount of data such an effort would entail.

The problem is also not one of new measurements or surveying: the data for such a representation of our economies already exists. Government agencies in the United States alone publish hundreds of thousands of survey results per year across economics, climate, energy, and many other topics. Further, corporations themselves have tremendous amounts of data measuring what the economy is doing in real time [2]. There might be some challenges in designing incentive structures to access corporate data, but even the public data is valuable [3].

The missing piece is the integration of datasets and a user interface allowing for the manipulation of that data.

The integration of datasets is most important. Take a seemingly simple question: is heart disease prevalence associated with projected temperature increase across counties in the United States? [4]

This question could easily be the subject of an academic research paper and take months of data analysis to answer. In practice, most of this work would go into finding sources for the data and finding common identifiers between datasets so that they can be joined. Most likely, a graduate student would be doing this work, which consists of little more than reading methodology reports and data dictionaries, and stitching together code to join things.
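To make the "stitching together" step concrete, here is a minimal sketch of the kind of join involved. It assumes two sources that identify the same counties in slightly different ways (a zero-padded FIPS string in one, an integer in the other). The field names, helper functions, and all values are hypothetical placeholders for illustration, not real statistics or any particular agency's format.

```python
from math import sqrt

# Hypothetical toy records; every value is a placeholder, not a real statistic.
# Source A keys counties by a 5-character, zero-padded FIPS string.
heart_disease = [
    {"fips": "01001", "prevalence_pct": 6.0},
    {"fips": "01003", "prevalence_pct": 7.0},
    {"fips": "06037", "prevalence_pct": 5.0},
]

# Source B stores the same identifier as an integer, which drops leading zeros.
temperature = [
    {"county_id": 1001, "projected_increase_c": 2.0},
    {"county_id": 1003, "projected_increase_c": 3.0},
    {"county_id": 6037, "projected_increase_c": 1.0},
    {"county_id": 48201, "projected_increase_c": 2.5},  # no match in source A
]


def normalize_fips(value):
    """Return a county identifier as a zero-padded 5-character string."""
    return str(value).zfill(5)


def join_on_county(left, right):
    """Inner-join the two sources on the normalized county identifier."""
    right_by_id = {normalize_fips(r["county_id"]): r for r in right}
    joined = []
    for row in left:
        key = normalize_fips(row["fips"])
        match = right_by_id.get(key)
        if match is not None:
            joined.append(
                (key, row["prevalence_pct"], match["projected_increase_c"])
            )
    return joined


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


rows = join_on_county(heart_disease, temperature)
r = pearson([p for _, p, _ in rows], [t for _, _, t in rows])
print(f"{len(rows)} counties matched, correlation = {r:.3f}")
```

The analysis itself is a few lines; most of the code, and in practice most of the months, goes into discovering that the two identifiers refer to the same thing and reconciling their formats.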

The Data Commons project aims to solve this integration issue. Founded by R.V. Guha, the creator of schema.org and a Google Fellow [5], the Data Commons project integrates data from public domain sources in a knowledge graph. That knowledge graph powers what you see in Google searches when you type “Population of India” into the Google search bar [6]. As my contribution, this data is now also available in Snowflake.

The user interface problem is also a significant issue. There is a shortage of technical talent able to work with data. Further, those with the technical skills to manipulate data are often not the same people who have the most context about the questions to be answered. In academia, this problem is often solved by having graduate students do the data analysis while professors and researchers pose questions. In industry, data scientists and data analysts often create dashboards in response to specific requests from business leaders. Data Commons aims to solve this issue as well, and I have been working on a contribution here too, built with Streamlit.

Thanks for reading Magis! If you try the Streamlit App, the data, or are interested in working on this, please reach out to me.

[1] Or rather, there is no engineering risk. Building such a system is difficult to execute on, but there is no question whether it could be done.

[2] See the Magis post “Datanomics” by Alex Izydorczyk: “Pick up a macroeconomic textbook, and you’ll read chapters of theory, mathematical equations, and models of rational behavior. Heavy on mathematical and social sciences, such books are light on one essential thing: strong evidence that the theories, equations, and models correspond to the real world…”

[3] Byrne Hobart made a great recent point about corporations waking up to the value of first-party data. I highly recommend reading “Kroger / Albertsons: Buying Data in Bulk” in The Diff.

[4] Answer at Data Commons

[5] http://www.guha.com/

[6] If you hit the “Explore More” button, you’ll find that it takes you to datacommons.org.

1 Comment
Paweł Machnik
Feb 20

I admire your that you aim so grandiose endeavour! Good luck! I'm subscribing to learn more about the progress :)

Ps) Haven't heard about streamlit before, thanks!
