I spent a few hours over the last few days browsing through Edward Tufte's nice book Visual Explanations. Although it sometimes gets into graphical issues far more complex than I normally need, it is great material both on how to present results and on using data to support analytical thinking. To help me retain some of the lessons I just learned, I decided to keep a few notes here:

- When presenting data visually, ensure there is a scale and a reference for the reader.

- It is often useful to look at data on scales one order of magnitude larger and smaller than the actual quantities

- Place data in an appropriate context for assessing cause and effect. This includes reasoning about reasonable explanatory variables and expected effects.

- Make quantitative comparisons. "The deep question of statistical analysis is compared to what?"

- Consider alternative explanations and contrary cases.

- Assess possible errors in the numbers reported in the graphics.

- In particular, aggregations over time and space, although sometimes necessary, can mask or distort data.

- Make all visual distinctions as subtle as possible, but still clear and effective. Think of the elements in your displays as obeying degrees of contrast: if all of them (background, axes, data, ...) have the same contrast, they'll all get the same attention. In particular, lowering the contrast of background elements clarifies the data.

- Keep criticizing and learning from visual displays you find useful or not.
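
The note on aggregation is easy to check with a toy example. The sketch below uses made-up hourly measurements (hypothetical numbers, not taken from the book) and the hypothetical helper `aggregate` to show how averaging over longer windows progressively hides a one-hour spike:

```python
from statistics import mean

# Hypothetical hourly measurements for one day: flat at 10, with a single spike.
hourly = [10] * 24
hourly[14] = 250  # one anomalous hour

def aggregate(values, window):
    """Average consecutive, non-overlapping windows of `window` values."""
    return [mean(values[i:i + window]) for i in range(0, len(values), window)]

six_hourly = aggregate(hourly, 6)  # four values for the day
daily = aggregate(hourly, 24)      # a single value for the day

print("hourly max:", max(hourly))
print("6-hour averages:", six_hourly)
print("daily average:", daily)
```

The hourly series makes the anomaly obvious, the 6-hour averages only hint at it, and the daily average merely looks like a somewhat busier day.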

Amazon S3 had a major availability incident this Sunday, and today they posted a very transparent update on their blog about the causes of the problem and the actions they are taking to prevent it from happening again.

From their report, it seems that a corrupted bit in a control message (which is bound to happen in a system of such scale), combined with a gossip protocol that (apparently) spread too much information across the system, caused mayhem in the communication between servers. When the engineers understood what was going on, they realized that the way to bring the system back to normal operation was to stop it and clear its state, which is popularly known as restarting it.

Lessons learned? Mainly, (1) if the scale is large enough, all kinds of bizarre behaviors will eventually show up and (2) having an efficient red button to bring the system to a clean state is very useful if you are running a long-lived system. (I'd add that how widely to spread the state of the system is a trade-off between global knowledge and robustness, but this would lead to a lengthy discussion.)

Interestingly, these are two lessons previously discussed by the operators of PlanetLab in a Usenix paper a while ago. PlanetLab has also experienced corrupted control messages (for which we typically do not do checksums) and implemented a red button which has already been used on at least one occasion, in December 2003.
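
To make the checksum point concrete, here is a minimal sketch of how a control message could carry a CRC-32 so that a flipped bit is caught before the message is acted upon or gossiped further. This is only an illustration: the message layout and the names `pack_message`, `unpack_message` and `flip_bit` are my own assumptions, not how S3 or PlanetLab actually frame their messages.

```python
import struct
import zlib

def pack_message(payload: bytes) -> bytes:
    """Prefix the payload with its CRC-32 (4 bytes, big-endian)."""
    return struct.pack(">I", zlib.crc32(payload)) + payload

def unpack_message(message: bytes) -> bytes:
    """Return the payload, or raise ValueError if the checksum does not match."""
    if len(message) < 4:
        raise ValueError("message too short")
    (expected,) = struct.unpack(">I", message[:4])
    payload = message[4:]
    if zlib.crc32(payload) != expected:
        raise ValueError("corrupted control message")
    return payload

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate a single-bit corruption at the given bit position."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)

# Hypothetical control message, for illustration only.
message = pack_message(b"node-17 state=healthy")
print(unpack_message(message))

try:
    unpack_message(flip_bit(message, 42))
except ValueError as error:
    print("rejected:", error)
```

A receiver that drops messages failing this check would not pass the corrupted state along, at the cost of four extra bytes and one CRC computation per message.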

Fubica pointed me to an interesting service today which I believe is an exciting exploration of business models that leverage commons-based peer production.

The service is FON, "a community of people making WiFi universal and free", according to the website. The principle is that people buy a sharing-enabled router which has one secured channel (isolated from the owner's) that anyone participating in FON can use to access the Internet. This way, if you share some of your WiFi, you get access to other people's. An extra feature is that the resulting wireless network can be accessed by people not participating in FON, for a fee. The revenue from this access is shared between the FON company and the owner of the WiFi spot used.

What I find most exciting, however, is FON's business model. They are not the providers of the WiFi service themselves. Instead, they facilitate it by building the necessary technology (the WiFi routers) and providing an authority which eases access control, allocation of the excess resources (which always have market potential) and billing.

However, I think what is new here is the type of system and maybe the extent to which this business model is being applied, not the business model itself.

Looking at the bigger picture, providing enhanced governance for peer production systems is already being explored as a business model. After all, someone does profit from websites like digg and flickr. Thinking a bit more, money is spent to have PlanetLab administrators handling security, among other aspects of the shared platform.

Considering these possibilities within the same framework opens some interesting questions: When is a service provided by an external (centralized?) entity necessary to enhance peer-production systems? How does introducing an economic valuation of the shared service affect the perception contributors have of the system, and therefore their behavior? How sustainable is sharing in such conditions, as users start to game the system and try to profit from it?

Paul Ehrlich has an exciting story in SEED magazine (which I find worthwhile to follow) detailing past and recent progress in the field of cultural evolution.

It is enlightening to follow Paul's journey through the parallels which can be drawn between the ways our culture and our genes evolve in response to our environment. Even more interesting, however, is his discussion of the differences between these phenomena and of his experiments investigating these differences using canoe-building techniques as a case study. His final remarks do a good job of summing it all up:

We directly tested a theory of cultural evolution. Our work has helped to uncover a piece of the larger, more complex process of culture change and has shown that it is reasonable to think of that change as evolution. Natural selection can operate in cultural evolution as well as in genetic evolution. Though canoe features may not be related to the genetic attributes of people who construct and use them, nor is natural selection likely the central force in cultural evolution, a comprehensive view of cultural evolution does now seem possible. And despite the daunting complexity, I believe we will one day understand how cultures evolve, and that it will help us all to survive.


Lan-houses (small establishments which provide paid access to the Internet and to some software, like text editors and games) are huge in Brazil, especially among the low-income population. The Brazilian Internet Committee estimates that 30% of Internet access in Brazil happens through lan-houses [pt], and it has been reported that Rocinha [en] alone has over 100 lan-houses.

Furthermore, being frequented by so many people on a regular basis, lan-houses have become community-gathering spaces. Ronaldo Lemos [pt] reports how, for example, children's birthdays are commonly celebrated in lan-houses these days. He and Antônio Carvalho Cabral, both of whom participate in a project examining the universe of lan-houses in detail, have been active voices in defending these small enterprises and their potential for social inclusion.

So, after reading an article on SciDev about podcasting in poor regions and thinking a bit about what Brazil and its lan-houses could learn from that, I've come up with a project idea and thought it would be worth writing it down.

The idea is simple: what if we could turn lan-houses into media-producing centers where the population could very easily record podcasts or videos with local news or art? Now add to that a simple system (maybe based on YouTube) through which people could vote for the most interesting media available in that community. Then maybe CD-RWs or DVD-RWs could easily be used to have selections of this media circulate periodically in the community, giving the community frequent access to its own voice and maybe strengthening its cultural identity.



We move reasonably differently from albatrosses and monkeys. An impressive paper in Nature this month uses traces from a very large number (6 million) of cell phone users to model patterns in the mobility of human beings. The following is an excerpt from its abstract ("Understanding individual human mobility patterns", by González, Hidalgo and Barabási) and to the side is a link to one of their graphs just because it looks cool:

We find that, in contrast with the random trajectories predicted by the prevailing Lévy flight and random walk models, human trajectories show a high degree of temporal and spatial regularity, each individual being characterized by a time-independent characteristic travel distance and a significant probability to return to a few highly frequented locations. After correcting for differences in travel distances and the inherent anisotropy of each trajectory, the individual travel patterns collapse into a single spatial probability distribution, indicating that, despite the diversity of their travel history, humans follow simple reproducible patterns. This inherent similarity in travel patterns could impact all phenomena driven by human mobility, from epidemic prevention to emergency response, urban planning and agent-based modelling.


Most interestingly, Nature published in the same issue an editorial which, although it praises this paper, discusses an interesting aspect of the modelling approach it takes:

To some extent this 'physicalization' of the social sciences is healthy for the field; it has already brought in many new ideas and perspectives. But it also needs to be regarded with some caution.

As many social scientists have pointed out, the goal of their discipline is not simply to understand how people behave in large groups, but to understand what motivates individuals to behave the way they do. The field cannot lose focus on that — even as it moves to exploit the power of these new technological tools, and the mathematical regularities they reveal. Comprehending capricious and uncertain human events at every level remains one of the most challenging questions in science.

A recently launched effort called Scientists Without Borders is trying to ease networking between scientists in poor countries and their peers worldwide.

This certainly has the interesting potential of connecting scientists in developing countries with people in the centers of excellence in their fields around the world. Nevertheless, I think there are also exciting possibilities in helping developing-world researchers discover each other. At least in computer science, it often happens that most of the research we have access to is that which is legitimized at US conferences, where US-made research prevails.

I believe that increasing the visibility of research being done in countries which face similar issues can enable new initiatives and approaches which have a genuinely developing-world perspective.

A story in The Economist about the myth that countries with a wider spread of new technologies (in this case, broadband) should be more productive than the rest has a few interesting facts:

Paul David, an economist at Oxford University, has shown that electric power, introduced in the 1880s, did not immediately raise productivity. Not until the late 1920s—when around half of America's industrial machinery were finally powered by electricity—did efficiency finally climb.


In 1987 Robert Solow, a Nobel Prize-winning economist, famously said: 'You can see the computer age everywhere but in the productivity statistics.' It was only in 2003 that The Economist felt comfortable boldly proclaiming: 'The 'productivity paradox' has been solved.'


Can the Japanese and Koreans (who finish at the top of OECD's charts) do something at 100Mb/s that the Americans, British and Germans (in the middle tier) can't at 20Mb/s? The idea that “bigger is better broadband” is orthodoxy, not economics. So is its corollary, the neo-Cartesian logic that goes: “Broadband ergo innovation.” But we have yet to see innovation happen at a high speed that couldn't happen at a slower one.


In short, though technology allows innovation, it does not imply it. Kind of obvious once you realize it, but I think it leads to the interesting insight that there are two ways of innovating, and one of them does not necessarily involve evolving the infrastructure.

I've recently been investing time in using activity similarity graphs as tools to understand the structure of sharing in BitTorrent and tagging communities, in two projects in collaboration with Elizeu, Matei and Adriana (all three of them have done interesting work on characterizing system usage patterns using similarity graphs in the past).
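
As a rough illustration of what an activity similarity graph can look like (a sketch under my own assumptions, not the exact method used in these works): users are nodes, and two users are connected when the sets of items they touched overlap, with the Jaccard index as the edge weight. The function names, the threshold and the toy data below are all hypothetical.

```python
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    """Jaccard index: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity_graph(activity: dict, threshold: float = 0.0) -> dict:
    """Map each pair of users to their similarity, keeping pairs above `threshold`.

    `activity` maps a user to the set of items (files, tags, ...) they touched.
    """
    edges = {}
    for u, v in combinations(sorted(activity), 2):
        weight = jaccard(activity[u], activity[v])
        if weight > threshold:
            edges[(u, v)] = weight
    return edges

# Toy data, for illustration only.
activity = {
    "alice": {"file1", "file2", "file3"},
    "bob": {"file2", "file3"},
    "carol": {"file9"},
}
graph = similarity_graph(activity)
print(graph)
```

Comparing every pair of users is quadratic in the number of users, which is part of why these graphs get big and why the choice of visualization tool matters.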

I'm still taking my first steps with this, but some of the graphs we're looking at are pretty big, which calls for graph visualization tools that are both versatile and efficient. I've been playing with two which are quite interesting:

This was made with GUESS, which is quite easy to use and has the great feature of accepting Python scripts to interact with the graph:



The problem I ran into was that, for very large graphs, the fact that GUESS does its rendering in a background thread makes it quite unresponsive (you don't really know whether what you asked the GUI to do is going to take long, and you don't have a button to cancel it).

I then found Cytoscape, which deals very nicely with very large graphs and even has some nice plugins for doing topology analysis. I only managed to plot this in Cytoscape:

Not long ago, I came across the fascinating TED website. I haven't had time to explore it as much as I would like yet, but two of the talks I did manage to watch caught my attention as especially interesting:

The first one is Yochai Benkler on sharing systems. Benkler has some of the most fascinating ideas and analyses of sharing systems, which I have been greatly interested in from the research perspective. I believe the intersection between the perspective of Benkler's work, P2P and grid computing still has a lot to be dug into.

The other nice talk I had the chance to see was Cameron Sinclair's talk on Architecture for Humanity. In a nutshell, he has founded an organization which has been applying the ideas of open source development to design in general, creating lots of innovation for disaster relief and humanitarian efforts. A great lesson on the potential of sharing for intellectual work, and some very interesting points on designing with the needs of communities in mind.

Both worth watching.
