on “Formal Complex Systems”

Can Complex Systems be formally defined? Is that even possible, or is it like trying to define sustainability? Is the attempt to formally define CS just helping create yet another buzzword that in the end everyone will hate?
Recent posts have all been about the burgeoning field of Complex Systems (CS). Despite the great work that many people have done so far, it still feels like a young field. It holds great promise for solving many of the world’s open questions, but no one really knows its full potential.

Information-Theoretic Measures for Objective Evaluation of Classifications

This work presents a systematic study of objective evaluations of abstaining classifications using Information-Theoretic Measures (ITMs). First, we define objective measures as those that do not depend on any free parameter. This definition provides technical simplicity for examining “objectivity” or “subjectivity” directly in classification evaluations. Second, we propose twenty-four normalized ITMs, derived from mutual information, divergence, or cross-entropy, for investigation. Contrary to conventional performance measures that apply empirical formulas based on users’ intuitions or preferences, the ITMs are theoretically sounder for realizing objective evaluations of classifications. We apply them to distinguish “error types” and “reject types” in binary classifications without needing cost terms as input data. Third, to better understand and select among the ITMs, we suggest three desirable features for classification assessment measures, which appear especially crucial and appealing from the viewpoint of classification applications. Using these features as “meta-measures”, we can reveal the advantages and limitations of ITMs from a higher level of evaluation knowledge. Numerical examples are given to corroborate our claims and compare the differences among the proposed measures. The best measure is selected in terms of the meta-measures, and its specific properties regarding error types and reject types are analytically derived.

[1107.1837] Information-Theoretic Measures for Objective Evaluation of Classifications. by Bao-Gang Hu, Ran He, XiaoTong Yuan

very interesting for future reference, if I’m back to ITMs for my data…
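If I do come back to this, a toy sketch might help fix ideas. One common normalized ITM is the mutual information between true labels and predictions, divided by the entropy of the true labels, computed straight from a confusion matrix. This is just one of the many possible normalizations (the paper studies twenty-four), and the code below is my own illustration, not taken from the paper:

```python
import math

def normalized_mutual_info(confusion):
    """Mutual information between true labels and predictions,
    normalized by the entropy of the true labels.

    confusion[i][j] = count of samples with true class i predicted as j.
    One of many possible normalizations; the paper studies 24 variants.
    """
    total = sum(sum(row) for row in confusion)
    p = [[c / total for c in row] for row in confusion]
    pt = [sum(row) for row in p]                        # true-label marginal
    pp = [sum(p[i][j] for i in range(len(p)))           # prediction marginal
          for j in range(len(p[0]))]
    mi = sum(p[i][j] * math.log2(p[i][j] / (pt[i] * pp[j]))
             for i in range(len(p)) for j in range(len(p[0]))
             if p[i][j] > 0)
    h_true = -sum(q * math.log2(q) for q in pt if q > 0)
    return mi / h_true

# A perfect binary classifier reaches the maximum value of 1;
# random guessing would give a value near 0.
print(normalized_mutual_info([[50, 0], [0, 50]]))  # 1.0
```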

Shortest Path Sardine

Fig.1 – (click to enlarge) The optimal shortest path among N=1084 points depicting a Portuguese sardine, as a result of one of our latest Swarm-Intelligence based algorithms. The problem of finding the shortest path among N different points in space is NP-hard, known as the Travelling Salesman Problem (TSP), and is one of the major and hardest benchmarks in Combinatorial Optimization (link) and Artificial Intelligence. (D. Rodrigues, V. Ramos, 2011)

Almost summer time in Portugal, great weather as usual, and the perfect moment to eat sardines with friends in open-air esplanades; in fact, a lot of grilled sardines. We usually eat grilled sardines with a tomato-onion salad along with barbecued cherry peppers in salt and olive oil. That’s tasty, believe me. Not tasty enough, however, for me and one of my colleagues, Vitorino Ramos (blog link/twitter link). We decided to take this experience a little further, creating the first shortest path sardine.

Fig. 2 – (click to enlarge) Our 1084 initial points depicting a TSP Portuguese sardine. Could you already envision a minimal tour between all these points?

As usual in the Travelling Salesman Problem (TSP), we start with a set of points, in our case 1084 points or cities (fig. 2). Given a list of cities and their pairwise distances, the task is to find the shortest possible tour that visits each city exactly once. The problem was first formulated as a mathematical problem in 1930 and is one of the most intensively studied problems in optimization. It is used as a benchmark for many optimization methods. TSP has several applications even in its purest formulation, such as planning, logistics, and the manufacture of microchips. Slightly modified, it appears as a sub-problem in many areas, such as DNA sequencing. In these applications, the concept of a city represents, for example, customers, soldering points, or DNA fragments, and the concept of a distance represents travelling times, cost, or a similarity measure between DNA fragments. In many applications, additional constraints such as limited resources or time windows make the problem considerably harder. (link)

Fig. 3 – (click to enlarge) A well done and quite grilled shortest path sardine, where the optimal contour path (in blue: first fig. above) with 1084 points was filled in black colour. Nice T-shirt!

Even for toy-problems like the present 1084-point TSP sardine, the number of possible paths is incredibly huge. And only one of those possible paths is the optimal (minimal) one. Consider for example a TSP with N=4 cities, A, B, C, and D. Starting in city A, the number of possible paths is 6: that is 1) A-B, B-C, C-D, D-A, 2) A-B, B-D, D-C, C-A, 3) A-C, C-B, B-D, D-A, 4) A-C, C-D, D-B, B-A, 5) A-D, D-C, C-B, B-A, and finally 6) A-D, D-B, B-C, C-A. That is, there are (N−1)! [i.e., N−1 factorial] possible paths. For N=3 cities, 2×1=2 possible paths; for N=4 cities, 3×2×1=6 possible paths; for N=5 cities, 4×3×2×1=24 possible paths; … for N=20 cities, 121,645,100,408,832,000 possible paths, and so on.
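The (N−1)! counting above is easy to check by brute enumeration for small N; a quick Python sketch:

```python
import math
from itertools import permutations

# Number of distinct tours starting (and ending) at a fixed city A:
# the remaining N-1 cities can be visited in any order, giving (N-1)! tours.
def tour_count(n_cities):
    return math.factorial(n_cities - 1)

# Enumerate the tours explicitly for N=4 to check against the text.
cities = ["A", "B", "C", "D"]
tours = [("A",) + p + ("A",) for p in permutations(cities[1:])]

print(len(tours))      # 6 tours, matching (4-1)! = 6
print(tour_count(20))  # 121645100408832000 tours for N=20
```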

The most direct solution would be to try all permutations (ordered combinations) and see which one is cheapest (using computational brute-force search). The running time for this approach, however, lies within a polynomial factor of O(n!), the factorial of the number of cities, so this solution becomes impractical even for only 20 cities. One of the earliest applications of dynamic programming is the Held–Karp algorithm, which solves the problem in time O(n²·2ⁿ).
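For the curious, the Held–Karp recurrence is short enough to sketch in Python. This is a textbook illustration for tiny instances, not an efficient implementation:

```python
from itertools import combinations

def held_karp(dist):
    """Exact TSP via Held-Karp dynamic programming, O(n^2 * 2^n).

    dist[i][j] is the distance from city i to city j; the tour
    starts and ends at city 0. Returns the minimal tour length.
    """
    n = len(dist)
    # C[(S, j)] = cost of the cheapest path that starts at city 0,
    # visits every city in frozenset S exactly once, and ends at j.
    C = {(frozenset([j]), j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            S = frozenset(subset)
            for j in S:
                C[(S, j)] = min(C[(S - {j}, k)] + dist[k][j]
                                for k in S if k != j)
    full = frozenset(range(1, n))
    return min(C[(full, j)] + dist[j][0] for j in range(1, n))

# A tiny 4-city example: the optimal tour 0-1-3-2-0 has length 80.
d = [[0, 10, 15, 20],
     [10, 0, 35, 25],
     [15, 35, 0, 30],
     [20, 25, 30, 0]]
print(held_karp(d))  # 80
```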

In our present case (N=1084) we had to deal with 1083 factorial possible paths, leading to the astronomical number of 1.19×10²⁸¹⁸ possible solutions. That’s roughly 1 followed by 2818 zeroes! – better now to check this Wikipedia entry on very large numbers. Our new Swarm-Intelligence based algorithm, running on a normal PC, was however able to find a minimal solution (fig. 1) within just several minutes. We will soon post more about our novel self-organized stigmergic-based algorithmic approach, but meanwhile, if you enjoyed these drawings, do not hesitate to ask us for a grilled cherry pepper as well. We will be pleased to deliver you one by email.
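We won’t reveal the swarm algorithm just yet, but as a point of comparison, here is what a classic local-search heuristic such as 2-opt looks like in Python. It is purely illustrative and has nothing to do with our stigmergic approach:

```python
import math
import random

def tour_length(points, tour):
    """Total length of a closed tour over 2-D points."""
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(points, tour):
    """Repeatedly reverse tour segments while that shortens the tour.

    Classic 2-opt local search, shown here only as a cheap stand-in
    for illustration -- NOT the swarm-based algorithm used for the
    sardine, which we have not published yet.
    """
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                new = tour[:i] + tour[i:j][::-1] + tour[j:]
                if tour_length(points, new) < tour_length(points, tour):
                    tour, improved = new, True
    return tour

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(30)]
start = list(range(30))
best = two_opt(pts, start)
print(tour_length(pts, best) <= tour_length(pts, start))  # True
```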

Fig. 4 – (click to enlarge) Zoom at the end sardine tail optimal contour path (in blue: first fig. above) filled in black, from a total set of 1084 initial points.

(this is a joint twin post with Vitorino Ramos)

World Top Coffee Drinkers

Who are the world’s top coffee drinkers?

Coffee Drinkers

World's Top 30 Coffee Imports – Are We Coffee Drinkers?

I’ve been playing a bit with R and experimenting with different data sets. The above plot shows the Top 30 coffee-importing countries ordered by their population. I found some of the facts a bit surprising:

  • The US is the biggest coffee importer in the world, but if we divide the coffee imports by the population, it ranks low, even below Portugal.
  • Belgium! Why is Belgium first? It has a population similar to Portugal’s, but it imports almost 10 times more coffee.

This is obviously just an exercise in data manipulation, but some curiosities arise. Wikipedia has a page on coffee consumption per capita that orders these countries differently: Belgium comes 8th and the top spot goes to Finland (Finns drink 4× the coffee we drink here in Portugal!), so don’t take this plot too seriously.
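For what it’s worth, the per-capita division itself is trivial. Here is a sketch in Python with placeholder numbers — the real figures are in the Google Fusion Tables and Wikipedia data, not these made-up ones, which I chose only so the ordering matches the observations above:

```python
# PLACEHOLDER data, for illustration only -- not the actual
# Google Fusion Tables / Wikipedia figures used for the plot.
imports_kg = {"Belgium": 290_000_000,
              "Portugal": 30_000_000,
              "United States": 850_000_000}
population = {"Belgium": 11_000_000,
              "Portugal": 10_600_000,
              "United States": 310_000_000}

# Divide total imports by population to get a per-capita figure.
per_capita = {c: imports_kg[c] / population[c] for c in imports_kg}

for country, kg in sorted(per_capita.items(), key=lambda kv: -kv[1]):
    print(f"{country}: {kg:.1f} kg per person")
```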

Data sources: Coffee Imports by country – Google Fusion Tables; World Population – Wikipedia

David Hales: why computer people need to think about political economy and other issues

Today at ISCTE-IUL we’ll have a seminar by David Hales. David Hales is a researcher at the Open University in the United Kingdom. His research sits at the overlap between computer science and social science. He has a background in Computer Science and Artificial Intelligence, but he has spent a lot of time doing simulations with Sociologists, Philosophers and even lapsed Economists.

He has been doing a lot of interesting research, and his talk today is about “The socio-economics of distributed systems (why computer people need to think about political economy and other issues)”.

The seminar is on Friday, 3rd June 2011, at 18h00, ISCTE-IUL, building 2, room C402.

Later this year, he will also be the keynote speaker of the PhD in Progress Workshop that I’m organising in Vienna during ECCS’11, so if you can’t see him talk today, you can always join us at the European Conference on Complex Systems and have a good time!

Some Graph/Network libraries for Python

Whether for Social Network Analysis, multi-agent simulation or text mining, networks are everywhere and everyone seems to be producing their own library. Here’s a quick collection of some popular libraries that integrate well with Python (among other things).

NetworkX (1.4) – http://networkx.lanl.gov/ – Pure Python; can be slow for a large number of nodes.

igraph (0.5.4) – http://igraph.sourceforge.net/index.html – Available as a Python extension module, R package, Ruby gem or C library. I’m finding it very useful with R – it integrates perfectly with other things I do in R.

boost – http://www.boost.org/ – C++.
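As a quick taste of the first of these, here is a minimal NetworkX example. The weighted shortest-path call uses Dijkstra under the hood (API as in the versions listed above; it assumes NetworkX is installed):

```python
import networkx as nx  # assumes `pip install networkx` or similar

# Build a small weighted, undirected graph.
G = nx.Graph()
G.add_edge("A", "B", weight=1.0)
G.add_edge("B", "C", weight=2.0)
G.add_edge("A", "C", weight=5.0)

# Shortest path by total edge weight: A-B-C (cost 3) beats A-C (cost 5).
print(nx.shortest_path(G, "A", "C", weight="weight"))         # ['A', 'B', 'C']
print(nx.shortest_path_length(G, "A", "C", weight="weight"))  # 3.0
```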