Wednesday, June 01, 2011

Backup Grooveshark Playlist #2

I just realised that if you copy a URL from the browser address bar on Grooveshark, it contains a #. If you rip this full URL, the HTML does not contain the playlist, because everything after the # is a fragment that is never sent to the server.

To get around this I have written a second script that cleans the URL so that it is identical to a Grooveshark link URL (i.e. one that someone sends to you or posts online).
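As a quick illustration of the cleaning step on its own, here is a minimal sketch. The URL is a made-up example (the playlist name and ID are hypothetical), and the variable names are only for this demonstration.

```shell
# Hypothetical example URL, as copied from the browser address bar
url='http://listen.grooveshark.com/#/playlist/MyPlaylist/12345'

# Remove the first "#/" so the URL looks like a shared link
clean_url=$(printf '%s\n' "$url" | sed 's|#/||')

# Prints: http://listen.grooveshark.com/playlist/MyPlaylist/12345
printf '%s\n' "$clean_url"
```

Using | as the sed delimiter avoids having to escape the slash in the pattern.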

New script:

# Pull the name of the playlist from the URL

playlistname=$(echo "$1" | sed "s/\// /g" | awk '{print "Grooveshark_"$5}')

# Clean the URL and grab the HTML from the clean URL

cleanURL=$(echo "$1" | sed "s/#\///")

wget -O "$playlistname.html" "$cleanURL"

# Now parse the HTML and extract the songs, then strip the HTML tags

awk '/Songs on Playlist/, $NF ~ /noscript/' "$playlistname.html" | sed -e :a -e 's/<[^>]*>//g;/</N;//ba' > "$playlistname"

The script is executed the same as before.
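The tag-stripping step at the end of the script can also be tried by itself. The sketch below assumes the usual sed idiom for removing HTML tags, including tags that are split across lines; the function name strip_tags and the HTML fragment are hypothetical and only for illustration.

```shell
# strip_tags: remove HTML tags from stdin, including tags split across lines
strip_tags() {
    # :a         label for the loop
    # s/<[^>]*>//g  delete every complete tag in the pattern space
    # /</N       if an unclosed "<" remains, append the next input line
    # //ba       if "<" still matches, loop back and try again
    sed -e :a -e 's/<[^>]*>//g;/</N;//ba'
}

# Hypothetical HTML fragment; the second tag is split over two lines
result=$(printf '<li>Song One</li>\n<a\nhref="x">Song Two</a>\n' | strip_tags)

printf '%s\n' "$result"
```

This should leave only the text between the tags, one song per line.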
