At Teleport we’re all about helping our users make sense of data. We try to go the extra mile to make sure we’re aggregating multiple sources to get more reliable data and work hard on visualizations to help our users make sense of all those cities out there.
I mean, just look at this!
Despite our best efforts, every now and then users contact us to say our numbers are off. That sort of feedback is extremely valuable to us, as it allows us to zoom in on problematic areas and work on improving our data. But just to keep you on your toes: once in a while, someone is wrong on the internet!
Here’s how it works.
Two friends, Mark and Jeremy, are bored and wondering which city has better internet, London or Bucharest, and how they’d compare to the rest of the world. Both try to use their current knowledge and beliefs to come up with a ranking.
What sort of data would they need to create such a ranking?
- An idea of the average internet speed in London
- An idea of the average internet speed in Bucharest
- Global minimum and maximum speeds to put their beliefs about London and Bucharest in context
In addition to average speeds, they also have a sense of how confident they are in these numbers.
Mark and Jeremy happen to live in London and have a 30 Mbit/s connection. Jeremy believes that most of London has internet with the same speed. He has seen a few ads promoting the same connection they have, and is quite confident that most people have upgraded to it.
He’s also quite sure that this must be close to as good as it gets anywhere in the world and is pretty sure that the most any city in the world could have is 40 Mbit/s on average. Unfortunately, he has no idea where Bucharest is. Sounds like some dodgy place which might not even have internet, and if it does, it’s probably dead slow. Jeremy’s beliefs are depicted below:
- London has internet speed of 30 Mbit/s and Jeremy is confident that’s the right number, hence the narrow distribution of possible values (dark blue)
- Global maximum is 40 Mbit/s (red) and global minimum is 0 (light blue)
- Bucharest could have anything from 0 to 5 Mbit/s but definitely not more (green)
The height of a curve at a particular speed indicates how strongly Jeremy believes that value to be true. For example, the most likely value for London in Jeremy’s mind is 30, but values like 28 or 32 would not be impossible either. For Bucharest, his knowledge is lacking, so every value in the range is as likely as any other (at least based on the prior knowledge Jeremy has).
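To make "height of the curve" concrete, here is a minimal sketch of Jeremy’s two beliefs as probability densities. The post only says his London belief is narrow and centered on 30, so the bell shape and the spread of 1.5 Mbit/s are assumptions made for illustration, as are the function names.

```python
import math


def normal_pdf(x, mean, sd):
    """Height of a bell curve centered on `mean` with spread `sd`, at point x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def uniform_pdf(x, low, high):
    """Flat curve: the same height everywhere in [low, high], zero outside."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


# Assumed spread for Jeremy's London belief; the post gives no number for it.
JEREMY_LONDON_SD = 1.5

# London: 30 is the peak, 28 and 32 are lower but clearly not impossible.
for speed in (28, 30, 32):
    print("London", speed, round(normal_pdf(speed, 30, JEREMY_LONDON_SD), 3))

# Bucharest: every speed from 0 to 5 Mbit/s is equally believable, 6 is ruled out.
for speed in (1, 4, 6):
    print("Bucharest", speed, uniform_pdf(speed, 0, 5))
```

A larger spread would flatten the London curve, which is the kind of belief Mark holds.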
Mark agrees with Jeremy about London having 30 Mbit/s on average. But he has his doubts and thinks it’s possible that it could be significantly more or less—his confidence in this value is lower. He has also watched the news and knows that in Asia, internet is crazy fast so the global maximum could be 50 Mbit/s. Of course he might have misheard and it could be quite a bit more or less. He’s not really sure. But he does know that Romania is in the EU and must have reasonably good internet. He’s confident it must be somewhere in between 10 and 20 Mbit/s.
All of Mark’s beliefs are summarized below. His uncertainty about London and the global maximum is reflected in the values being more spread out, which clearly differentiates his ideas from Jeremy’s.
Scores
Based on their beliefs, Mark and Jeremy proceed to calculate the internet speed scores for London and Bucharest. The score in this case is a number on a scale from zero to ten. The formula for this is not important, but it involves the person’s beliefs about the particular city and about the global minimum and maximum values.
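For the curious, one way such a score could be computed is sketched here. It assumes a plain linear rescaling between the global minimum and maximum, which is a guess for illustration and not necessarily the formula Teleport uses. The spreads (1.5 and 1.0 Mbit/s) and the helper names are also assumptions. Drawing many samples from the beliefs turns a belief about speeds into a belief about scores.

```python
import random
import statistics


def score(city_speed, global_min, global_max):
    """Assumed formula: rescale speed linearly onto 0..10 and clip to that range."""
    raw = 10 * (city_speed - global_min) / (global_max - global_min)
    return max(0.0, min(10.0, raw))


def sample_scores(draw_city, draw_max, global_min=0.0, n=10_000, seed=42):
    """Draw n (city speed, global maximum) pairs from the beliefs and score each."""
    rng = random.Random(seed)
    return [score(draw_city(rng), global_min, draw_max(rng)) for _ in range(n)]


# Jeremy: London about 30, Bucharest anywhere in 0..5, global maximum about 40.
# The standard deviations are assumed; the post only calls his beliefs narrow.
jeremy_london = sample_scores(lambda r: r.gauss(30, 1.5), lambda r: r.gauss(40, 1.0))
jeremy_bucharest = sample_scores(lambda r: r.uniform(0, 5), lambda r: r.gauss(40, 1.0))

print("London, typical score:", round(statistics.median(jeremy_london), 1))
print("Bucharest, highest sampled score:", round(max(jeremy_bucharest), 2))
```

With point values instead of samples, the same rescaling gives 30 out of 40, or 7.5, for Jeremy’s London and 30 out of 50, or 6, for Mark’s.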
Jeremy obtains scores like these:
As with the prior beliefs, the scores also have a level of confidence associated with them. The most likely score for London in Jeremy’s mind would be around 7.5 out of 10. Slightly lower and higher scores would be legit as well. For Bucharest, the score is anything from 0 to 1.3.
Mark’s scores look slightly different.
His prior beliefs were quite flexible, so his scores have a wider range as well. The most reasonable score for London for Mark is 6, but anything from 4 to 8 could make sense. For Bucharest, scores from 2 to 4.5 are possible.
Comparing results
Mark and Jeremy have a hard time agreeing about the scores now. They obtained quite different numbers. So they come and look at the scores on Teleport Cities, which gives Bucharest a score of 3 and London a score of 4 (just as an example—these are not necessarily the real numbers we have right now). This is what Jeremy sees when comparing with Teleport:
The numbers Teleport shows are clearly nowhere near what Jeremy considered possible values for the scores. It’s obvious that whoever came up with such different numbers compared to his expertise must be incompetent.
This is what Mark sees when comparing his scores with Teleport. His numbers don’t match Teleport’s exactly, but there is overlap in regions that Mark considers plausible, so he feels that Teleport’s scores are in agreement with his.
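The two reactions boil down to a simple check: does the published score fall inside the range a person considers plausible? A minimal sketch, using the ranges quoted in this post. Jeremy’s London range of 7 to 8 is an assumption, since the text only says "slightly lower and higher" than 7.5, and the function names are made up for this example.

```python
def is_plausible(published, low, high):
    """True if the published score lies within the range the person finds plausible."""
    return low <= published <= high


# Example Teleport scores from the post (not necessarily the real ones).
TELEPORT = {"London": 4.0, "Bucharest": 3.0}

# (low, high) score ranges per person. Jeremy's London range is assumed.
JEREMY = {"London": (7.0, 8.0), "Bucharest": (0.0, 1.3)}
MARK = {"London": (4.0, 8.0), "Bucharest": (2.0, 4.5)}


def agreement(ranges, published=TELEPORT):
    """For each city, report whether the published score fits the person's range."""
    return {city: is_plausible(published[city], *ranges[city]) for city in ranges}


print("Jeremy:", agreement(JEREMY))  # {'London': False, 'Bucharest': False}
print("Mark:", agreement(MARK))      # {'London': True, 'Bucharest': True}
```

Note that London sits right on the edge of Mark’s range. A hard cutoff is a simplification here, because his belief fades out gradually rather than stopping at 4.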
Conclusions:
Whatever anyone believes to be the “right” score for a place is highly subjective. It depends on:
- How good of an idea they have about the city and how sure they are about it
- How well they know the global scene: you could measure the speed of every individual connection in London, but without knowing what the values are globally, you can’t calculate a global score for London
Mark’s and Jeremy’s prior beliefs were not orders of magnitude apart. They even agreed on the average speed for London. But because Mark acknowledged the limitations of what he knows about this topic, he was able to accept the somewhat different scores Teleport was showing.
Jeremy, on the other hand, being very sure about his beliefs, cannot reconcile what’s in his head with what Teleport is showing. If Jeremy writes to us and says “you guys obviously don’t know what you’re doing!!!” then he might be right if he’s an expert on internet speeds in London as well as globally. Or he could just be an overconfident guy giving too much credit to his beliefs. We really have no way of telling.
This example used a relatively easy-to-measure quantity. Internet speed is a number we can test, collect and aggregate. Topics like culture are dramatically more subjective. What is a numerical measure of culture? What counts as a good restaurant scene in a city for me personally can be totally different from what a playboy billionaire might consider adequate.
I think this is a taste of one of the upcoming challenges for Teleport. User feedback should definitely get incorporated into our data, but in a way that takes into account who the user is. If people with kids are very happy with restaurants in a city, we might conclude that the restaurants there are quite child friendly. But this doesn’t mean that the restaurant score for a billionaire playboy should go up for this city. It might even have to go down.
P.S.: No data scientists or Teleport users were hurt during the making of this post.