July 30, 2008

Removing Flash-based content from feeds with Yahoo Pipes

Yahoo Pipes

I have not written about Yahoo Pipes in detail on this blog yet, but I thought this was interesting, so I am posting about it. In short, Yahoo Pipes lets you produce feeds or widgets in several formats from web-accessible sources. I find it very convenient for aggregating and filtering feeds.


Recently, I wanted to remove some Flash-based content from a feed that I follow, since I cannot view it in my reader anyway. I tried using the Regex module to match the tags for the content and replace them with a note stating that the content had been removed. Surprisingly, many of the regular expressions I entered did not work, and the Flash-based content remained in the feed. After testing quite a few expressions, I finally found one that worked. Here it is:

[<][^<]*application/x-shockwave-flash[^>]*[>][<]/[a-z]*[>]
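To try the expression outside of Pipes, here is a minimal sketch using sed. The sample markup, the variable names and the replacement note are assumptions made up for illustration, and sed's extended regex engine may not behave exactly like the one behind the Pipes Regex module.

```shell
# Hypothetical feed item; the <embed> markup is an assumed example, not taken from a real feed.
item='<p>Intro</p><embed src="movie.swf" type="application/x-shockwave-flash" width="400"></embed><p>Outro</p>'

# The expression from this post. '#' is used as the sed delimiter because the pattern contains '/'.
pattern='[<][^<]*application/x-shockwave-flash[^>]*[>][<]/[a-z]*[>]'

# Replace every match with a short note (the note text is an assumption).
cleaned=$(printf '%s\n' "$item" | sed -E "s#${pattern}#[Flash content removed]#g")
printf '%s\n' "$cleaned"
```

Note that the expression expects the closing tag to follow the opening tag immediately, so an element with nested tags (for example an object element containing param elements) would not be matched as a whole.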

I still do not understand what was wrong with the other expressions I tried. Most of them were simpler and more specific than this one, and they should have matched. Anyway, this works for now.

UPDATE: Wired recently published an article introducing Pipes.

July 23, 2008

European Trip

Recently, I had the opportunity to travel around Europe. I took many photographs during the trip, and here is a small sample of them.









July 1, 2008

What is in Portage (Charts and Maps)

While considering new partitioning schemes and filesystem choices for future installs and reinstalls, I realized that I had almost no idea what kind of files were on my system. I needed to see histograms of file sizes in particular directories in order to make better decisions for partitioning, filesystem selection and filesystem tuning.

I started playing with the du command to try to get some data, and after a while I came up with the following:

find . -type f -execdir du --apparent-size --block-size=512 '{}' \; | grep -o '^[0-9]*' | sort -n | uniq -c

This series of commands produced the data I was looking for and I graphed it using OpenOffice Calc. Here is a summary of what I got for the directories of Portage. As expected, there were "lots of small files."

First, the Portage tree (no /distfiles here):

You can clearly see the proportion of files smaller than 2 KB compared to the rest. There are a lot of them, but they do not take up the majority of the space.

Then the Portage cache:

Lots of files between 512 bytes and 1 KB here, represented by the yellow region.

Finally, the Portage installed packages database:

Lots of files under 512 bytes, but they do not take up much of the space.
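The comparisons above depend on both how many files fall into each size bucket and how much space they occupy. A variation on the earlier pipeline that reports both in one pass might look like the sketch below. It assumes GNU find (for -printf) and awk; the function name size_histogram is made up for illustration.

```shell
# size_histogram DIR
# Prints one line per 512-byte bucket: "<blocks> <file count> <total bytes>".
# Sizes are rounded up to whole 512-byte blocks, as du --apparent-size --block-size=512 does.
size_histogram() {
    find "$1" -type f -printf '%s\n' |
        awk '{
                 b = int(($1 + 511) / 512)   # bucket = apparent size in 512-byte blocks
                 count[b]++                  # number of files in this bucket
                 bytes[b] += $1              # total apparent bytes in this bucket
             }
             END { for (b in count) print b, count[b], bytes[b] }' |
        sort -n
}
```

Calling size_histogram on a directory such as the Portage tree (its location depends on your setup) gives three columns that can be pasted directly into a spreadsheet. It also avoids starting one du process per file, which should make it noticeably faster on large trees.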

In retrospect, I should have used something better than OpenOffice to graph the data, and I will update this post if I get to it. Since the histograms were quite skewed, as you can see from the results above, here are some maps generated by FSview to give you a different view of the directories:

From left to right: the Portage tree, the Portage cache and the Portage installed packages database