awk length of each line sorted

This is just an awk one-liner that helps me find long lines, usually in LaTeX documents, which makes merging different versions of my papers easier. It identifies the long lines so that you can easily break them into smaller ones. No more 3000-character lines.

awk '{ print length($0), NR }' file.txt | sort -n

It is all emotion, but we are all alone in the end.

Sentiment Analysis, sentiment analysis, sentiment analysis. Argh… How do SVM, Naive Bayes, Logistic Regression, Multilayer Perceptrons, Best-first Trees, Functional Trees, and C4.5 compare? Psomakelis et al. did a comparative study on Twitter data. Sentiment analysis is probably one of the most dangerous approaches to understanding crowd behaviour because it dictates what normality is. A normality that comes from big data processing. This means that for practical purposes — read commercial — many normal behaviours will become outliers… and you might become targeted for “corrective emotional measures” — meaning that you’ll buy that new godWatch they are trying to sell. Well, fascinating dystopia, this one that we are building. The science part of it is cool.

What is strategy, again? I just say… let the algorithms kick in and you’ll need to ask this question again, and again, and again, and you’ll need an algorithm to answer it, because very soon human brains will not be able to cope with the answer.

Lily, the first throw-and-shoot camera. Now, selfies suck. But everybody is taking selfies. Then came the selfie stick, which everybody thinks is ridiculous but which you see in every corner of any tourist location. And now??? The Selfie Drone? Crap… THE WORLD IS BECOMING A LONELY PLACE. GET A FRIEND, PEOPLE.

— And speaking of loneliness, Sudan needs a partner. This is a very sad story, but I imagine that human (selfie) stupidity will put the northern white rhino on the same shelf as the Dodo.

Are you Brainless today? Go read these.

— An indicator that we are seeing a shift in AI and that deep learning has become a big thing in research and industry — and my bet is that deep learning will have a bigger impact on society than the other buzzword starting with Big that also has DATA in it — is that the number of brain-related patents has risen dramatically. Yes, your AI brain in a silicon neural network, stowed away neatly in your pocket, is just around the corner.

— Clustering. Clustering and classification are probably the two main tasks you want machine learning to do. So, it is always good to see how others are doing theirs. In this paper Hasnat et al. do a comparative study of clustering methods for multinomial distributions.

100 synapses and you get yourself a mini brain, capable of doing image classification. But wait… it only classifies 3 letters… Hm. I think the implementation still needs to be improved a little before we have Skynet…

— If you study Twitter from a scientific perspective and are interested in how Twitter reflects politics, then head over to this paper by Murthy and Retto, where they compare the print coverage and tweets of the U.S. Republican primaries. It focuses on sentiment analysis — something that is becoming more prevalent in science these days. The study of emotion moves money, and scientists love money.

— And finally, the gods of the Mountains must be angry. So, for a minute think about how ephemeral everything is when put in the right perspective.

In defence of a Public Peer Review System

Scientists are used to having papers rejected. It comes with the territory, but usually the reviewers have some good reason — and give it to you — to reject the manuscript in the first place. Now this . . .

Many times the reviewer knows what he’s doing and the comments actually help improve the paper, but sometimes the reviews are just plain incomprehensible and look like they were written by a monkey punching a keyboard while watching itself in the mirror.

I’ve seen reviewers show a total lack of understanding of the subject at hand, be partial, be biased, or worse, not even be able to write a sentence in English. Some reviews I’ve read were written in some language similar to English that, to this day, none of the authors or the editors have been able to understand. A mystery for the science historians of the future. Worse, there are reviewers that, protected by anonymity, are bluntly rude. Real douchebags. This was another of these cases.

It came from a reviewer saying that the authors should consider finding some male scientists to co-author a paper that was authored only by female scientists. Worse, he went on rambling about the marginal superiority of male scientists. You can get the details of this particular case at RetractionWatch. The tweet above is just an illustration of this reviewer’s medieval thinking.

A matter of Accountability and Public Peer Review

This leads me to think that we urgently need to end this lack of accountability in the review process. We need to quickly implement Public Peer Review. The idea is that first drafts go to an online repository, like arXiv for example. Reviewers do their work and publish their reviews as comments in the repository. No anonymity, no protection of douches hiding behind the journal editors. They can still send private emails to the editors, but then what they write publicly has to match what they write privately. Reviews should be public. Authors can improve their paper and submit a new version. At this point, everyone knows what is going on. Good reviewers will be praised for their scientific acuity and honesty. No one will keep saying things from behind a blind, knowing they would otherwise be exposed to ridicule. It’s time to expunge those douches by exposing them publicly.

I imagine that some authors, editors and reviewers won’t like this open process. In any case, I imagine that it might be possible to keep the process private until some time before publication. After publication, this online repository should become available so anyone can trace back the process. I would prefer it to stay public always, but I imagine that some authors would prefer to have it closed, mainly when collaborating with industry under signed NDAs.

In any case, I think that the gatekeeper of this should be the author of the paper, not the editor or the reviewer. When submitting a new article, the author should be able to keep it private for some time — say 1 month — after the last reviewer submitted their review. This is to give the authors time to respond to the reviewers’ comments with a corrected version. After that month passes, the paper and comments should become public. Obviously, the authors can go public with the process anytime before that date.

Finally, with all these changes, I also think that all papers should have a byline with the reviewers in the final print. What is the problem with having “This paper was reviewed by X and Y; reviewers’ comments are available online at HTTP…” in a footnote on the first page? Credit should go where credit is due. Give reviewers credit for their good work and let the scientific community judge them when they are not good. Also, many other scientists, namely junior scientists starting their careers, could learn from good reviewers. They could improve their own reviews by seeing how good scientists do theirs.

Being public about the process is the first step to showing the quality of your work. I don’t know why so many are afraid of having their work publicly scrutinised. In a time where versioning is so simple — git is 10 years old now — and where tracing back changes and accountability for actions are so important, I don’t understand how we are still in the feudal ages of protecting reviewers’ identities. Can it be because most reviewing work IS NOT PAID by journals and they just don’t care to do quality work? If that is so, why keep this BROKEN SYSTEM?

We need PUBLIC PEER REVIEW, NOW!

Can you park your car like this?

One of the problems with cars is that they take space. A lot of space, mainly when they are not being used. A research project is developing robots that could make the task of parking cars a totally different experience.

The authors claim that “a swarm of robots is able to extract vehicles from confined spaces with delicate handling, swiftly and in any direction. The novel lifting robots are capable of omnidirectional movement, thus they can under-ride the desired vehicle and dock to its wheels for a synchronized lifting and extraction. The overall developed system applies reasoning about available trajectory paths, wheel identification, local and undercarriage obstacle detection, in order to fully automate the process.”

This is one amazing idea, and while right now they are probably targeting the security forces that need to deal with badly parked cars and potential security threats, I imagine that these would be great for managing parking silos. One could just drop the car at the entrance and a group of robots would take it and park it in the silo so as to optimise the available space. No more having to put up with those drivers that need the space of three cars to park a CORSA.

Well, in any case, the European team of AVERT is going to present this work at a conference, but I see a lot of potential in their idea to make this into a commercial success. The preprint of the paper is available for download.

In a not so far away future robots will replace valets and optimise parking space in silos. Great!

Your Grandfather’s Oldsmobile—NOT! self-driving cars around the corner

Your Grandfather’s Oldsmobile—NOT! – BLOG@UBIQUITY.

The self-driving car is coming to our streets. It might not be as soon as some predicted, but it will come in incremental steps. Technology has this feature: you dream about radical changes and then they appear slowly, one step at a time, and when you look back you realise that reality is bigger than your original dream. So let’s keep dreaming about self-driving cars, planes, bicycles or balloons. Someone will implement all the good things to get there and beyond.

Thinking in Minimum Spanning Trees

Minimum Spanning Trees are subgraphs of a graph that are TREES and connect all vertices together such that the total cost of the connections is minimal. Because of that, they are interesting objects.

Minimum Spanning Tree

Minimum spanning trees have many applications, from routing in movement applications to the traversal of databases. My favorite use is in game AIs: computing the cost of travel for characters according to the kind of terrain they have to travel through.

A Background on Spanning Trees.

  • First, TREES: what are they? A tree is a graph in which any two given nodes have exactly one path between them. In 1857, Cayley coined the term TREE and it stuck, though many graph TREES don’t have any resemblance to real trees.
  • Trees have no cycles, thus no transitivity, no triangles, no clustering coefficient. Trees are efficient when the cost of establishing connections is high (think, for example, of subway lines that are costly to build).
  • Minimum Spanning Trees are the optimisation of the COST of trees. If the graph is unweighted, the cost is unitary and all edges are equal — and one can have multiple equivalent Minimum Spanning Trees. If the graph is weighted, then you’ll usually have a unique Minimum Spanning Tree (it is guaranteed to be unique when all edge weights are distinct).
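These properties are easy to check mechanically. As a quick aside (my own sketch in Python, not from the igraph code below), here is a minimal test of whether a graph is a tree, using the facts above: exactly n - 1 edges and no cycles.

```python
def is_tree(n, edges):
    # A graph on n vertices is a tree iff it has exactly n - 1 edges
    # and contains no cycle (together, these also force connectivity).
    if len(edges) != n - 1:
        return False
    parent = list(range(n))            # union-find to detect cycles

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:                   # u and v already connected: a cycle
            return False
        parent[ru] = rv
    return True

print(is_tree(4, [(0, 1), (1, 2), (1, 3)]))   # True: a small tree
print(is_tree(3, [(0, 1), (1, 2), (2, 0)]))   # False: a triangle
```

The union-find structure is what makes the cycle check cheap: two endpoints sharing a root means the new edge would close a cycle.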

MST Algorithm

An ALGORITHM to compute minimum spanning trees of a weighted undirected graph is Prim’s Algorithm — although it was proposed 27 years earlier by a Czech mathematician called Vojtěch Jarník, and the algorithm should be named after him (sometimes it is).

Here’s how the algorithm runs: start from an arbitrary vertex and, at each step, add the cheapest edge that connects a vertex already in the tree to a vertex outside it, until all vertices are connected.
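That greedy loop can be sketched in a few lines of Python (my own illustrative implementation, not the one igraph uses), with a heap to pick the cheapest outgoing edge:

```python
import heapq

def prim_mst(n, edges):
    """Prim's algorithm on an undirected weighted graph.
    n: number of vertices (0..n-1); edges: list of (u, v, weight)."""
    # Adjacency list storing (weight, from, to) so the heap orders by weight.
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((w, u, v))
        adj[v].append((w, v, u))

    in_tree = [False] * n
    in_tree[0] = True                  # grow the tree from vertex 0
    heap = adj[0][:]                   # candidate edges leaving the tree
    heapq.heapify(heap)

    mst, total = [], 0.0
    while heap and len(mst) < n - 1:
        w, u, v = heapq.heappop(heap)  # cheapest edge out of the tree
        if in_tree[v]:
            continue                   # both endpoints already in the tree
        in_tree[v] = True
        mst.append((u, v, w))
        total += w
        for cand in adj[v]:            # new candidate edges from v
            if not in_tree[cand[2]]:
                heapq.heappush(heap, cand)
    return mst, total

edges = [(0, 1, 1), (0, 2, 4), (1, 2, 2), (1, 3, 6), (2, 3, 3)]
tree, cost = prim_mst(4, edges)
print(tree, cost)   # [(0, 1, 1), (1, 2, 2), (2, 3, 3)] 6.0
```

Note how the expensive edges (0, 2, 4) and (1, 3, 6) are skipped: cheaper routes into vertices 2 and 3 are found first.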

Spanning trees are also important in pathfinding: Dijkstra’s algorithm grows a shortest-path tree over the graph (a spanning tree of the reachable vertices, though generally not a minimum spanning tree), and the A* Algorithm explores a similar tree guided by a heuristic.
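To make that connection concrete, here is a small Python sketch of Dijkstra's algorithm (again my own, for illustration); the parent pointers it returns form a shortest-path tree spanning the reachable vertices:

```python
import heapq

def shortest_path_tree(adj, source):
    # adj: vertex -> list of (neighbour, weight); classic Dijkstra.
    dist = {source: 0}
    parent = {source: None}            # parent pointers = the tree edges
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                   # stale heap entry, skip it
        for v, w in adj[u]:
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent

# A small weighted graph, given as an adjacency dict.
adj = {0: [(1, 1), (2, 4)], 1: [(0, 1), (2, 2), (3, 6)],
       2: [(0, 4), (1, 2), (3, 3)], 3: [(1, 6), (2, 3)]}
dist, parent = shortest_path_tree(adj, 0)
print(dist)     # {0: 0, 1: 1, 2: 3, 3: 6}
```

The resulting tree minimises the distance from the source to each vertex, not the total edge cost, which is exactly why it usually differs from the minimum spanning tree.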

Spanning Tree in R and igraph

In R, the igraph package implements Prim’s algorithm for computing minimum spanning trees.

A simple example of the use of the minimum spanning tree in R follows:

library(igraph)

# Random graph: 20 nodes, each pair connected with probability 0.25
g <- erdos.renyi.game(20, 0.25)
E(g)$weight <- runif(ecount(g), 0, 1)  # random edge weights in [0, 1]
E(g)$width  <- 3.0 * E(g)$weight       # edge width proportional to weight

# Minimum spanning tree (uses the 'weight' edge attribute)
mst <- minimum.spanning.tree(g)
E(mst)$color <- 'red'

# Plot the MST on top of the original graph, reusing the same layout
layout <- layout.kamada.kawai(mst)
plot(g, layout = layout)
par(new = TRUE)   # don't erase the previous plot
E(mst)$width <- 3
plot(mst, layout = layout)
par(new = FALSE)

The image that illustrates this post on minimum spanning trees is one run of the short code above. I generated a random graph with 20 nodes and a 25% probability of connecting any two of them. Then I gave the edges random weights and computed the weighted Minimum Spanning Tree. The trick for plotting a minimum spanning tree superimposed on the original graph is to do it as a two-step plot. First you plot the graph with a pre-computed layout. Then you use the same pre-computed layout to plot the second layer that holds the minimum spanning tree.

You need to set the parameter new=TRUE so that the R plot doesn’t erase the previous one. You can compute the layout from the minimum spanning tree if you want edges not to overlap and obtain a clear planar graph. igraph offers many different graph layouts to experiment with. Kamada-Kawai usually works very well for minimum spanning trees.