All 7 entries tagged Bash

View all 13 entries tagged Bash on Warwick Blogs | View entries tagged Bash at Technorati | There are no images tagged Bash on this blog

December 05, 2019

Scripting discovery of random YouTube videos

(I just discovered this post as unpublished draft from about three years ago. No idea why I didn't publish it. The method described in it still works. All the code is bash script. There's a comment in the last bit of code "needs more refining" which I leave as an exercise for those so inclined.)

Do you now, or have you ever wanted to, script discovery of random YouTube videos? I did recently and couldn't find anything useful online. So I made up my own method.

If you're thinking YouTube videos are identified by 11 character strings so you can generate a random 11 character string and use that, you're not technically wrong, but it's not the way to go about it. As a test I generated 1000 and none of them were were valid. This isn't at all surprising given how many possible values those 11 characters provide. In my observation, each character in can be an lower or uppercase letter a number, or a -. That's 63 possible characters. A calculator tells me that 63^11 is 62050608388552830000. (If you want to say that out loud, say "sixty-two quintillion", then mumble a bit.)

function getVideoID  {
   local id="";
   while [ "${id}" = "" ];do
      id=$(curl -s$( < /dev/urandom tr -dc A-Za-z-0-9 | head -c4) | grep -o 'watch?v=[a-zA-Z0-9]\{11\}' | sort | uniq | sort -R | head -1);
   echo "${id/watch?v=/}";

That gets you a valid id, such as dQw4w9WgXcQ. If you discover videos entirely at random some of what you find will be NSFW. Really. It will be. The method I use to filter out NSFW content uses youtube-dl

function getVideoUrl  {
   local url="";
   url=$(./youtube-dl --age-limit 0 --get-url "${1}");
   echo "${url}";


videoUrl=$(getVideoUrl "${videoID}");

If ${videoUrl} is not zero length then, in my experience at least, the video is SFW and it's value is an url of the raw video which could be used as input value for ffmpeg or whatever. (To emphasis, it is *my experience* that this method filters out NSFW content.) If you just want to download the whole video, youtube-dl can do that for you. (youtube-dl will find the highest quality version of the video by default. You may want to change that depending on your available bandwidth or what you intend to do with the video.)

Some videos on YouTube have a video component that is just a static image. E.g. someone's ripped an album and then combined the audio with the album cover art to create something that can be uploaded to YouTube. Such videos are visually uninteresting and maybe you want to identify those videos and discard them rather than use them in whatever it is you're doing that involves random YouTube videos. I did, so I worked out a way of doing that too. The method I've used is to generate a bunch of images from the video, then compare them in a way which gets a value that represents how much the images differ by. If that value is less than a certain value, discard it. I've used GraphicsMagick for comparing the images. ImageMagick can be used to but is slower. (The less powerful your hardware, the bigger the speed difference is. ImageMagick output is slightly different to GraphicsMagick so you can't just remove the "gm", the awk and cut arguments would need changing.) To extract the images you obviously first have to download the video and in the below the downloaded video is theVideo.mp4

# generate an image at 2 second intervals
ffmpeg -loglevel fatal -i theVideo.mp4 -vf fps=1/2 -y foo__%02d.jpg

if [ $? -eq 0 ];then

  # get an integer value that represents how different the images all are to each other
  v=$(gm compare -metric MAE foo__*.jpg null:-  | grep Total | awk '{print $2}' | cut -d . -f 2);

  if [ ! -z "${v}" -a "${v:0:2}" != "00" -a "${v:0:2}" != "01" ];then **** needs more refining 018 019 OK 010 not OK maybe test 3rd char too
    # the video isn't a static image
    # do whatever it is you want to do with it


I arrived at discarding videos where the first two characters of v are 00 after calculating v for a bunch of videos.

October 05, 2014

Matlab Log File 'Art'

Sometimes I like to create an image from a dataset, just because. (Previously Also a few weeks ago I was looking at Matlab log files a lot ( And thus, this

26th July 2014. Solarized colours.

(Click to embiggen.)

It's generated from Matlab license check outs for a single day. The image consists of 60 concentric circles split in to 24 segments. Each circle represents one minute, the innermost circle being 0 and the outermost 59 minutes past the hour. Each segment represents one hour. Midnight is where 12 would be on a clock face, noon is where 6 would be. A coloured segment indicates that a license was checked out during that hour. The distance from the centre represents the minutes past the hour when the license was checked out. Each segment is drawn with an opacity of 25%. The brighter the segment the more licenses checked out, though the brightness tops out at four licenses. (Finer graduations would mean that segments representing a single license would be really faint.) For example, the image below shows from inner to outer:

  • 1 license checked out at N minutes past midnight then being checked in sometime between 01:00 and 02:00
  • 2 licenses being checked out at N+1 minutes past 01:00 with one checked in again during the same hour and no check in time being found for the other license.
  • 4 licenses being checked out N+2 minutes past 02:00 then checked back in again sometime in the same hour (not necessarily at the same time).

Matlab log file art example image.

There is an element of doubt around tracking when a given license is checked in again. The Matlab license server log does not allocate any sort of identifier to a license check out so it's impossible to definitively identify when it was checked in again. I have taken the check in time to be the first time that a check in by the user@host combination occurs after they checked out a license. A user checking out multiple licenses from a single host could make that assumption incorrect.

The colours are the accent colours of the solarized palette

Here's an image from the same data using colours used by Pirelli to denote the different compounds of their Formula 1 tyres.

26th July 2014. Pirelli Formula 1 tyre compounds colours.

This is with the colours of the RAF roundel. (Things which are round…)

26th July 2014. RAF roundel colours.

I was going to try doing one with University colours. Then I discovered the Corporate Identity part of the University website, which used to provide details of a colour palette for use in things University related, currently only provides details for a single shade of blue.

The images are generated using a bash script and ImageMagick. The script draws up to 5000 segments at a time. Initially it drew one at a time but it's a lot quicker drawing multiple segments at the same time. 5000 seemed like a nice number that didn't trip that error you get when bash command arguments are too long. It's nowhere near 5000 times quicker to draw segments 5000 at a time. This is due, to some degree I don't care enough to work out even roughly, to the temporary images being stored as mpc ( on tmpfs, thus minimising I/O overheard. (I (mis)use /dev/shm for this sort of thing since it's already there and usually has enough space.) Images are generated at 10000x10000 then shrunk. This is done to remove small unwanted artefacts which sometimes show up between adjacent segments in the same circle. Like this

Matlab log file art artefact example.

As that example shows, they don't appear consistently and I'm not sure why they do. I can't make them not occur without leaving gaps. If the images are generated at 1000x1000 the artefacts show up. If the images are generated at 10000x10000 the artefacts show up, but conveniently this detail is lost when the image is shrunk to 1000x1000.

Other examples which I find less aesthetically pleasing than the one linked here can be seen at

May 19, 2010

Getting a random word in a bash script

Earlier I got to wondering if there was an easy way to get a random word in a bash script. My first thought was

$ sort -R /usr/share/dict/words | head -1

That works but takes over ten seconds. The first Google hit I got was this page which suggets:

#Number of lines in $WORDFILE
tL=`awk 'NF!=0 {++c} END {print c}' $WORDFILE`
for i in `seq $NUMWORDS`
sed -n "$rnum p" $WORDFILE

Which works, but only returns words that start with a b or c. The problem is that bash's $RANDOM is a number between 0 and 32767 and there's many more words than that in /usr/share/dict/words.

I didn't find any better suggestions so after a bit of fiddling I came up with this.

# seed random from pid
# using cat means wc outputs only a number, not number followed by filename
lines=$(cat $WORDFILE | wc -l);
sed -n "$rnum p" $WORDFILE;

The maximum value of multiplying two values of $RANDOM is greater than the number of lines in /usr/share/dict/words thus the problem of only getting back words starting with a b or c is eliminated.

Some time later, whilst looking at something else I came accross reference to the shuf command. Turns out that does much the same as 'sort -R' only much faster (around 0.2 second). So if you want a simple and fast way to get a random word in a bash script all you need is:

$ shuf -n1  /usr/share/dict/words

Edit: Assuming you're using a *nix machine which has shuf installed. shuf is part of GNU core utilities so it should be included in any randomly chosen Linux install. It's not included in Mac OS X (at least not Leopard which is what I have to hand) though or Solaris 10. Mac users could get shuf by installing the coreutils package that's available via MacPorts.

March 01, 2010

Staring at the

Writing about web page

SOHO image wallpaper script grabs one or all of the latest SOHO images once an hour and sets it/them as your GNOME wallpaper. I was going to make it work for Mac OS X as well but by the time I'd found out how to set the relevant properties from a script I couldn't be bothered. (Mac OS X doesn't provide a way to set the background colour or the placement  (introduced in Snow Leopard) from a script. You have to alter the plist file and then restart the Dock as described here.)

Example screenshots:

$ ./soho_image_wallpaper

SOHO image wallpaper (all)

$ ./soho_image_wallpaper blue zoom

SOHO image wallpaper (blue zoom)

December 03, 2009

University Advent Calender as your (though maybe not your) wallpaper.

Writing about web page

For no good reason other than because the idea got stuck in my head, a script (advent wallpaper script) which grabs an image of the University advent calender, does some stuff to it, sets it as your wallpaper and then adds a launcher to the panel that when clicked shows what's 'behind' today's door with the icon for the corresponding door as it's icon. Assuming you use GNOME for your desktop environment that is. Which statistically speaking, you probably don't.

Example screengrab (to click is to make larger):

Uni Advent Calender as wallpaper

Requires ImageMagick, gnome-web-photo, curl and Epiphany (the web browser not the holiday). Call from a cron-job or login script of something to automatically update wallpaper should you feel the desire to do so.

November 16, 2009

Generating spec compliant thumbnails

A lot of Linux software that needs to generate thumbnails of an image uses the thumbnail specification. This is good as it means if one application has already made a thumbnail of an image then another application can make use of it rather than generating it's own. I recently found myself looking for a way to generate such thumbnails.

The impetus for this was gnome-appearance-properties. This is the application which allows you to do stuff like change your wallpaper. Sadly as the number of wallpapers available increases it's usability decreases to some extent. The reason for this is that the user interface is frozen until all the wallpaper thumbnails are displayed. This in itself isn't too bad, unless you have thousands of wallpapers, but if it's combined with a lack of pre-generated thumbnails the user interface is frozen until all the thumbnails have been generated. This is annoying because if there are few hundred wallpapers available thumbnail generation can take over thirty seconds even on a decent spec machine. Thirty seconds during which the user is left looking at an unresponsive interface. There is long standing bug report regarding this, that I can't currently locate, which makes the very sensible suggestion that the thumbnails should be loaded asynchronously. Hopefully at some point someone will implement that, but in the mean time I found myself wondering whether it was possible to script the generation of thumbnails in advance.

My first thought was ImageMagick and a bash script because I'm already familiar with those. As it turns out ImageMagick comes very, very close to being able to generate such thumbnails using the -thumbnail option. I say close, because whilst it inserts both the MTime and URI information required by the spec, it generates the URI incorrectly by inserting one too many slashes at the start. It creates

$ convert /usr/share/pixmaps/backgrounds/cosmos/earthrise.jpg -thumbnail 128x foo.png
$ identify -verbose foo.png | grep Thumb::URI
Thumb::URI: file:////usr/share/pixmaps/backgrounds/cosmos/earthrise.jpg

when it needs to be

 Thumb::URI: file:///usr/share/pixmaps/backgrounds/cosmos/earthrise.jpg

At time of writing this is actually fixed, but only in the svn version. If you have 6.5.7-8 or later then you should find it generates the URI properly. If you have an older version you can create the thumbnails like this:

# makethumb - script to generate thumbnails to spec
# *** Assumes GNU coreutils. ***
tagfile=/tmp/$(basename $0)_tags
mkdir -p $saveto
thumbname=$(echo -n file://$file | md5sum| cut -d " " -f 1);
mtime=$(date +%s -r "$file")
echo "Thumb::URI={file://${file}}" >$tagfile
echo "Thumb::MTime={${mtime}}" >>$tagfile
convert -resize 128x -strip +profile "*" $file MIFF:- | cat $tagfile - | convert MIFF:- "PNG:${saveto}/${thumbname}.png"
rm -f $tagfile
$ makethumb /usr/share/pixmaps/backgrounds/cosmos/earthrise.jpg

Generating thumbnails this way is quite slow though. Using

$ find  /usr/share/pixmaps/backgrounds/ -type f -exec ~/makethumb {} \;

332 thumbnails took around 1:15 in my tests, though obviously this will vary depending on the spec of the machine. I tried using a variant of the script which generated thumbnails for all the files in a given directory, so only invoking the script once instead of multiple times. There was no significant difference in speed between the two methods though.

So I started looking for some way to use GNOME's thumbnail generation capabilities. The only example I could find of doing this used Python GTK bindings and was incomplete. I've only ever cobbled together one python script before, (that was also to use GTK bindings), but I managed to put together this

# - script to generate thumbnails using GTK bindings

import gnome.ui
import gnomevfs
import time
import sys
import os


thumbFactory = gnome.ui.ThumbnailFactory(gnome.ui.THUMBNAIL_SIZE_NORMAL)
if thumbFactory.can_thumbnail(uri ,mime, 0):
thumbnail=thumbFactory.generate_thumbnail(uri, mime)
if thumbnail != None:
thumbFactory.save_thumbnail(thumbnail, uri, mtime)

Using that to generate thumbnails in the same manner shown above for the bash script was about ten seconds faster. However after some experimentation I put together this (updated 15/8/11 to include suggestions from comment 2):

# - generates thumbnails for all files in a directory

import gnome.ui
import gnomevfs
import time
import os


thumbFactory = gnome.ui.ThumbnailFactory(gnome.ui.THUMBNAIL_SIZE_NORMAL)

for subdir, dirs, files in os.walk(dir):
for file in files:
path = os.path.join(subdir, file)
uri = gnomevfs.get_uri_from_local_path(path)
mtime = int(os.path.getmtime(path))
print uri
print mtime
if thumbFactory.can_thumbnail(uri ,mime, 0):
thumbnail=thumbFactory.generate_thumbnail(uri, mime)
if thumbnail is not None:
thumbFactory.save_thumbnail(thumbnail, uri, mtime)

I found that generates 332 thumbnails in around 9 seconds. A massive difference to repeatedly invoking the script. I expect there are people who could provide a detailed explanation of why it's so much faster. I am not one of them.

It's also interesting to note that I've found this script generates thumbnails around three times faster than the gnome-appearance-properties creates them. Why that is I have no idea. The thumbs that result are not identical. The thumbnails generated by the Python script have the width and height of the original image embedded in them whilst the ones generated by gnome-appearance-properties do not. The ones generated by gnome-appearance-properties have a very lightly larger file size and the Channel Statistics embedded in the thumbnails are different too. However both sets of thumbnails say they were generated by GNOME::ThumbnailFactory.

Interesting as all this is, (to me anyway if it's not to you then why did you read this far?), it's all about generating thumbnails on a per-user basis. What if it was possible to have a system wide cache of per-generated thumbnails. E.g. you install a bunch of wallpapers and along with them you can install thumbnails that will be used rather than each user generating their own. The thumbnail spec does cover this. So I tried creating such thumbnails. gnome-appearance-properties ignored them. When I say ignored them, I don't mean it looked at them and didn't use them, the output of strace indicates that it doesn't even look to see they exist. Which is a shame.

November 06, 2008

The joy of shell

Writing about web page

There's a discussion going on Slashdot entitled "(Useful) Stupid Unix Tricks?".  (Clicking the 'Get more comments'  button at the bottom of the page a few times makes more of the discussion visible.) I love discussions like that because I nearly always pick up something useful.

A useful trick that occured to me this morning was to combine && with a kdialog yes/no prompt:

kdialog --yesno "are you sure?" && do_stuff

Of course stuff which is really useful on one Unix-like OS is sometimes useless on another. I work on Linux, but sometimes need to do something the multi user Solaris boxes such as mimosa or primrose etc and occcasionally get tripped up by differences. Though I've found changing my shell on Solaris to bash instead of the default tcsh and adding /usr/local/gnu/bin/ to the start of my $PATH helps!

Search this blog


RSS2.0 Atom
Not signed in
Sign in

Powered by BlogBuilder