Watermark in LaTeX / pdftex in 4 lines

To watermark a LaTeX document compiled with pdftex, add the following lines to your document preamble:

\usepackage{draftwatermark}
\SetWatermarkLightness{0.9}
\SetWatermarkText{\today}
\SetWatermarkScale{3}

First we invoke the LaTeX package draftwatermark. Then, since a watermark should appear faint behind the text, the code sets the lightness of the watermark. Next, the text of the watermark is defined; in the example above I'm using the date, but you can choose CONFIDENTIAL, SECRET, or any watermark text you might need for your projects. Finally, the scale of the watermark on the document is set; play with this value to adjust how much of the page the watermark covers.

This is a basic example of watermarking a document. If you need more control, the draftwatermark package has many options, and you should read its documentation for all the watermarking features.
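For instance, here is a sketch of a preamble using a few more of the package's setters (the exact commands available depend on your draftwatermark version, so check the documentation before relying on them):

```latex
\usepackage{draftwatermark}
\SetWatermarkText{CONFIDENTIAL}
\SetWatermarkLightness{0.9}
% rotate the mark and fix its size instead of scaling it
\SetWatermarkAngle{45}       % degrees, counter-clockwise
\SetWatermarkFontSize{2cm}
```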

Wikileaks Cablegates

Cablegate is spiralling beyond any reasonable logic; it is becoming a soap opera. Before throwing yourself into one camp, please read the following links and think…


Google Under European Fire

The European Union is something of a mess. On one hand, it plays the role of protector of monopolies, as with the approval of legislation that will unplug European citizens from the web in cases of piracy (there goes internet access as a fundamental right, down the drain). On the other hand, it wants to protect citizens from monopolies.

Curiously, this apparent contradiction isn't really a contradiction. The monopolies aren't equal. In the first case, Europe wants to protect industries that are whining about losing money and that, they say, are at risk of bankruptcy. In the other case, the monopolies belong to companies that don't complain, that innovate constantly, and that have enough money in their bank accounts to bail out Ireland (or Portugal) several times over in this economic crisis.

So, if you're doing well, making money, and you don't whine… the EU will investigate you, accuse you, and ask you for a bribe… (oops, fine you). If your company wants to keep doing business as it did 50 years ago, then the EU will ask its citizens to pay up whatever these old farts want.

Get your act together EU!

For each action there’s a reaction…

In Physics this is true, and it probably holds in many things in life, too…

Australia wanted to force a ban of infected computers from its networks, but in a sudden moment of lucidity the government saw that this could backfire. The problem with infected computers is that the persistence of the infections has little to do with the removal of single nodes and everything to do with the topology of the network as a whole. I fear that this type of measure is driven by other kinds of intentions. If a system that removes users' computers from the network is in place and users grow accustomed to it, won't copyright agencies be the next to ask for it? (Well, they already passed the three-strikes law.) And what happens when someone's computer publishes political views different from those of the government? Could, for example, a rugby team forbid the opposing team's computers from being online? You can see where this could be heading.

The internet grew in a self-organised way (self-organised not meaning random). Measures of this type are constraints that act on the network. For now, the negative feedback these constraints impose is still minor compared to the positive feedback loops that drive the network to expand and grow, but one day they may become too much and hurt the network in such a way that the giant component breaks into smaller parts. Then all that these governments will have is a bunch of sticks that no longer make a tree.
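The giant-component argument can be illustrated with a toy percolation experiment on a random graph. This is a sketch of the general phenomenon only, not a model of the actual internet; the graph size, mean degree, and removal fraction are arbitrary choices of mine:

```python
import random
from collections import deque

random.seed(1)
N, AVG_DEG = 1000, 4

# Build a sparse random (Erdos-Renyi) graph with mean degree AVG_DEG
adj = {u: set() for u in range(N)}
for u in range(N):
    for v in range(u + 1, N):
        if random.random() < AVG_DEG / N:
            adj[u].add(v)
            adj[v].add(u)

def largest_component(alive):
    """Size of the largest connected component among the 'alive' nodes (BFS)."""
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best

alive = set(range(N))
print(largest_component(alive) / N)   # close to 1: one giant component

# Banning 30% of the nodes at random shrinks the giant component,
# but well above the percolation threshold it does not break apart
removed = set(random.sample(sorted(alive), int(0.3 * N)))
print(largest_component(alive - removed) / N)
```

Removing individual nodes barely dents the giant component; only removing a very large fraction of nodes (or the best-connected hubs) fragments the network, which is why node-by-node bans do little against the infection itself.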

The ways of the world are in some ways incomprehensible to politicians (not all, but the majority). Trying to rule on matters that are out of their control will end in failure of the rules or catastrophe for the system. Let's hope they stick to what they are best at (whatever that is).

Boilerplate: Article extraction from webpages

The amount of clutter present on webpages makes the task of discovering what is important a pain. At the observatorium I've been using a simple tag-to-text ratio to try to extract the important sections of text from webpages. The results are good but not great; the method is fast, and it works if one accepts that noise exists and can't be totally eliminated.
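As an illustration, here is a minimal Python sketch of the tag-to-text idea; the regexes, threshold, and function names are my own and not the code running at the observatorium:

```python
import re

TAG = re.compile(r"<[^>]+>")

def text_to_tag_ratio(line):
    """Characters of visible text per HTML tag on the line (+1 avoids /0)."""
    tags = len(TAG.findall(line))
    text = len(TAG.sub("", line).strip())
    return text / (tags + 1)

def extract_content(html, threshold=10):
    """Keep the lines whose ratio suggests real prose rather than navigation."""
    keep = []
    for line in html.splitlines():
        if text_to_tag_ratio(line) >= threshold:
            keep.append(TAG.sub("", line).strip())
    return "\n".join(keep)

html = """<div><a href="/">Home</a> <a href="/about">About</a></div>
<p>This is the long article body that we actually want to keep for analysis.</p>"""
print(extract_content(html))
```

A navigation line is mostly tags with a few words of text, so its ratio is low and it gets dropped; a paragraph of prose wrapped in a couple of tags scores high and survives.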

The other day I found another technique that I think might become my de facto standard for text extraction from webpages, as its first results are better than I expected. The algorithm detects the meaningful sections of a page with high accuracy and has the added benefit of being truly fast.

It is derived from the paper “Boilerplate Detection using Shallow Text Features” by Christian Kohlschütter et al., presented at WSDM 2010, and there's a Google Code repository available with the Java source and binaries to download.
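The gist of the approach can be sketched with two of the paper's shallow features, word count and link density. The threshold values below are illustrative guesses of mine, not the trained classifier from the paper, and the helper names are hypothetical:

```python
import re

ANCHOR = re.compile(r"<a\b[^>]*>(.*?)</a>", re.I | re.S)
TAG = re.compile(r"<[^>]+>")

def block_features(block_html):
    """Two shallow features per text block: word count and link density
    (the fraction of words that sit inside <a> tags)."""
    link_words = sum(len(TAG.sub(" ", m).split()) for m in ANCHOR.findall(block_html))
    words = len(TAG.sub(" ", block_html).split())
    density = link_words / words if words else 1.0
    return words, density

def looks_like_content(block_html, min_words=15, max_density=0.33):
    """Long, link-poor blocks are treated as article text;
    short or link-heavy blocks as boilerplate."""
    words, density = block_features(block_html)
    return words >= min_words and density <= max_density
```

Navigation bars and footers are short and almost entirely made of links, so they fail both tests, while article paragraphs are long and contain few anchors; the paper shows that even such simple features, fed to a proper classifier, separate boilerplate from content remarkably well.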

ZFS and Novell or not Novell?

Today I’m following two Linux-related stories with some interest:

  • The first is that ZFS performance on Linux is not that great. Linux users (and Mac users, for what it’s worth) have been hearing about the super benefits of Sun’s ZFS file system for ages. Well… the tests don’t show much. Personally, I’m sticking with ext4.
  • The other story that hit the news today is that Novell is being sold for some gazillion dollars. Hm… we are witnessing a lot of cash moving into companies that play a big role in open source this year. What’s next? Canonical? Red Hat? The truth is that this purchase has a fishy side: part of Novell’s assets are being bought by a consortium put together by Microsoft. Taking into account that Novell and Microsoft have been best friends for some time now…

Is RSS really Dead?

Almost every modern website uses RSS feeds to deliver content to users. But in the age of social networks, is RSS really still the way to push information to readers?

RSS is a great technology, as it allows people to subscribe to content and then read it later in their readers, perhaps offline or on another device. The problem is that it is being replaced by faster media. Social networks are partly to blame: Twitter even made everything fit into 140 characters. Writing a blog post of more than 300 words has become the exception rather than the rule, and most sites try to deliver information fast, in a continuous stream of small consumable snippets.

It’s fast food time in the interwebs and no one seems to care.

In this context, RSS, which once let you get the information you wanted into a longer and probably better organised application, is now being declared dead. RSS is not natural for 140-character messages (although Twitter has RSS feeds for its users). It was invented for longer conversations and readings. It even managed to be used to send large media files to users, as in podcasts (are these also dead?).

But although RSS is great, it seems to be fading into the background. It’s being used more and more as a data-exchange mechanism: a way for your website to send information to Google or Bing rather than to your users. Google even suggests adding your RSS feed as the sitemap file for your website. For a while microformats were what everybody was talking about, but as very few standards were defined, they never really made it into mainstream websites, and for all practical purposes RSS is/was the workhorse of data sharing.

Although invisible to most users, RSS is still around (even though I removed the subscribe button from the top right, you can still access the feed).

The news that Bloglines (one of the first online RSS aggregators, where you could read news from your favourite sites) was closing made a stir around the web, as people suddenly realised how dependent they are on this technology. Luckily, Ask.com agreed with MerchantCircle to keep Bloglines active, a recognition of its importance even if its glory days are past (at least while Google Reader dominates the market and the next big thing doesn’t show up).

In the RSS world there was a time of feast, and it has passed: the time of multiple reader applications and strong development. When it turned out there was no business model for RSS and Google Reader became dominant, RSS faded into its secondary, passive role. It’s probably one of the most used pieces of technology around, and one that people rely on transparently. That will keep it alive.

I bet it will survive these strange times of junk food and fast driving.
