I just realised that if you copy a URL from the browser address bar on Grooveshark, it contains a #, and if you rip this full URL the downloaded HTML does not contain the playlist.
To get around this I have written a second script that cleans the URL so that it is identical to a Grooveshark link URL (i.e. one that someone sends to you or posts online).
New script:
# Pull the name of the playlist from the URL
playlistname=$(echo "$1" | sed "s/\// /g" | awk '{print "Grooveshark_"$5}')
# Clean the URL and grab the HTML from the clean URL
cleanURL=$(echo "$1" | sed "s/#\///")
wget -O "$playlistname.html" "$cleanURL"
# Now parse the HTML and extract the songs, then strip the HTML tags
awk '/Songs on Playlist/, $NF ~ /noscript/' "$playlistname.html" | sed -e :a -e 's/<[^>]*>//g;/</N;//ba' > "$playlistname"
The script is executed the same way as before.
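To see what the two URL-handling steps do, here is a small sketch that runs them on a sample address-bar URL. The URL itself (playlist name `MyMix`, ID `12345678`) is made up for illustration, and its shape is an assumption about what the address bar shows; the `sed` and `awk` commands are the ones used in the script.

```shell
# Hypothetical address-bar URL; the playlist name and ID are invented.
url='http://grooveshark.com/#/playlist/MyMix/12345678'

# Remove the "#/" so the URL looks like a shared link.
clean_url=$(printf '%s\n' "$url" | sed 's/#\///')

# Turn every "/" into a space and take the fifth field as the playlist name.
playlist_name=$(printf '%s\n' "$url" | sed 's/\// /g' | awk '{print "Grooveshark_"$5}')

echo "$clean_url"
echo "$playlist_name"
```

With this sample URL the first line printed is `http://grooveshark.com/playlist/MyMix/12345678` and the second is `Grooveshark_MyMix`. Note that the name is taken from the original URL, so the fifth field is only the playlist name when the `#` is still present.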