These days I'm doing some data munging, and a kind colleague introduced me to a couple of simple tools that made me wonder how I ever lived without them: screen and xargs. If you use Linux and are not aware of these, know that your life can be better.
(Hoping this is not obvious to everyone. And if you have similarly miraculous tools, please let me know in the comments.)
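To give a flavor of both tools, here is a small sketch (file names and the session name are just illustrative). screen keeps long-running jobs alive after you disconnect, and xargs turns lines of input into command invocations:

```shell
# screen: keep long-running jobs alive across disconnections.
# The workflow is interactive, so it is shown as comments:
#   screen -S munging    # start a named session
#   <Ctrl-a d>           # detach; the job keeps running
#   screen -r munging    # reattach later, even from another login

# xargs: build command lines from stdin. Here we gzip three files,
# running at most two gzip processes in parallel (-P 2), one file
# per invocation (-n 1).
touch a.txt b.txt c.txt
printf '%s\n' a.txt b.txt c.txt | xargs -n 1 -P 2 gzip
ls a.txt.gz b.txt.gz c.txt.gz
```

The combination is what makes them shine: start a screen session, fire off a parallel xargs pipeline over thousands of files, detach, and go home.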
I am usually amazed by research that goes deep into some topic and at the same time is communicated intuitively. Even more so if it touches on a daily subject we're all familiar with. Even more so if it is about a topic I'm really interested in (that is, a topic I've been discussing with other people around tables, glasses and drinks).
The Edge online magazine has a fascinating piece by Lera Boroditsky about How our language shapes the way we think. Based on psychological experiments, Lera talks about different senses of time and space, how thinking about the moon as male or female makes a difference, and such. Really nice reading.
If after this you are still curious, there is a less scientific piece at the Economist about different and difficult features in languages. The text is a bit of a collection of language peculiarities, but it has interesting examples.
As part of the incredible reMap website, which magically enables you to navigate across the information visualizations on the visual complexity website, I found this puzzling visualization of Web domains and influential people mapped onto the Tokyo metro system. The work is by the Information Architects office, and it does an incredible job of mixing playfulness and serious information display:
According to a paper discussed in this news article on ScienceNOW, newborns are more likely to be girls in warmer climates. Less scientifically, one could say the tropics are more feminine than colder places.
In 'The Surprising Power of Neighborly Advice', Gilbert and collaborators examine an intriguing question: which is the better predictor of your future reaction to an event, your own forecast based on information about the event, or the reaction of someone close to you in your social network?
Interestingly, they find that the reaction of someone close in your social network is the better predictor. Not surprisingly, they also find that people often believe the contrary.
Two direct implications come to mind. The first, discussed by the authors, concerns how we weigh these two predictors when making decisions. The second, not discussed in the paper, concerns recommender systems: it suggests that recommendation is more effective when based on other people's reactions to events than on a user's stated preferences.
I spent some hours over the last few days browsing through Edward Tufte's nice book Visual Explanations. Although it sometimes gets into graphical issues far more complex than I normally need, it is great material both for learning how to present results and for using data to support analytical thinking. To help me retain some of the lessons I just learned, I decided to keep a few notes here:
- On presenting data visually, ensure there is a scale and a reference for the reader.
- It is often useful to look at data on scales one order of magnitude larger and smaller than the actual quantities.
- Place data in an appropriate context for assessing cause and effect. This includes reasoning about reasonable explanatory variables and expected effects.
- Make quantitative comparisons. "The deep question of statistical analysis is compared to what?"
- Consider alternative explanations and contrary cases.
- Assess possible errors in the numbers reported in the graphics.
- In particular, aggregations on time and space, although sometimes necessary, can mask or distort data.
- Make all visual distinctions as subtle as possible, but still clear and effective. Think of the elements in your displays as competing for attention through contrast: if all of them (background, axes, data, ...) have the same contrast, they will all get the same attention. Muting background elements, in particular, makes the data clearer.
- Keep criticizing and learning from visual displays you find useful or not.
Amazon S3 had a major availability incident this Sunday and today posted a very transparent update on their blog about the causes of the problem and the actions they are taking to prevent it from happening again.
From their report, it seems that a single-bit corruption in a control message (which is bound to happen in a system at such scale) combined with a gossip protocol that spread (apparently) too much information across the system, causing mayhem in the communication among servers. When the engineers understood what was going on, they realized that the way to bring the system back to normal operation was to stop it and clear its state, which is popularly known as restarting it.
Lessons learned? Mainly, (1) at a large enough scale, all kinds of bizarre behaviors will eventually show up, and (2) an efficient red button that brings the system back to a clean state is very useful if you are running a long-lived system. (I'd add that spreading the system's state too widely is a trade-off between global knowledge and robustness, but that would lead to a lengthy discussion.)
Interestingly, these are two lessons previously discussed by the operators of PlanetLab in a Usenix paper a while ago. PlanetLab has also experienced corrupted control messages (for which we typically do not compute checksums) and implemented a red button that has already been used on at least one occasion, in December 2003.
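The checksum fix is cheap enough to sketch in a few lines of shell. This is just an illustration (the message content is made up), using the standard cksum utility: the sender attaches a checksum to each control message, and the receiver recomputes and compares it, so a flipped bit is dropped at the edge instead of being gossiped through the system:

```shell
# Sender side: compute a CRC over the control message and ship both.
msg="SET server=42 state=up"
sum=$(printf '%s' "$msg" | cksum | awk '{print $1}')

# ... the message travels the network; suppose one bit flips ...
received="SET server=43 state=up"

# Receiver side: recompute the CRC and compare before acting on it.
rsum=$(printf '%s' "$received" | cksum | awk '{print $1}')
if [ "$rsum" != "$sum" ]; then
    echo "corrupt control message dropped"
fi
```

A CRC like cksum's is guaranteed to catch any single-bit flip, which is exactly the failure mode in the S3 report; it costs one extra field per message.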