I'd like to scrape a website to programmatically collect any external links within any Flash elements on the page. I'd also like to collect any other text, if possible.
Yanking "external links" out of a flash can be as simple as, for instance:
curl -s http://hostname/path/to/file.swf | strings | grep http
Of course, this'll fail if the author has made any attempt to hide the URL.
YMMV a lot. Good luck!
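
A slightly more filtered sketch of the same idea (the hostname and path are placeholders); note this only helps with uncompressed SWFs, since a zlib-compressed file (one starting with the "CWS" signature) won't yield readable strings until it's been inflated:

# Sketch only: pull URL-looking tokens out of an uncompressed SWF.
curl -s http://hostname/path/to/file.swf \
  | strings \
  | grep -Eo 'https?://[^"[:space:]]+' \
  | sort -u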
Decompiling the Flash source would let you see the ActionScript part of the Flash file, which I've found to often contain info like links.
A free decompiler is Flare. It's command line only, and works fine. It won't decode some of the info in newer Flash formats (>CS3 I think). It dumps all the AS into one file.
Sothink SWF Decompiler is a more sophisticated commercial program. It has worked fine with every Flash file I've tried, and the results are quite thorough and well organized. It's GUI-based, though, and I don't know if it is easily automated.
With Flare, since it's a command line tool, one could easily write a script to obtain the SWF, decompile it, grep for 'http://', and log the results.
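
A minimal sketch of such a script, assuming the flare binary is on the PATH and writes its decompiled ActionScript to a .flr file next to the input (its usual behavior); the SWF URL and log file name below are placeholders:

#!/bin/sh
# Sketch: fetch a SWF, decompile it with Flare, and log any http:// links.
url="http://hostname/path/to/file.swf"   # placeholder SWF URL
log="swf-links.log"                      # placeholder log file

tmpdir=$(mktemp -d)
curl -s -o "$tmpdir/page.swf" "$url"     # obtain the SWF
flare "$tmpdir/page.swf"                 # decompile; produces page.flr
grep -o 'http://[^"]*' "$tmpdir/page.flr" | sort -u >> "$log"
rm -r "$tmpdir"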
As a very crude first step you could use Google to get a text snippet out of the SWF, given that the SWF has been indexed by Google and that you know its URL, e.g.:
http://www.google.com/search?q=site%3Awww.michaelgraves.com%2Fmga.swf
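
If you wanted to automate that, a rough sketch of building the query URL for an arbitrary SWF address (only the colon and slashes are percent-encoded here, which is enough for this case):

# Sketch: build the Google site: query shown above for any SWF URL.
swf="www.michaelgraves.com/mga.swf"   # placeholder SWF address
q=$(printf 'site:%s' "$swf" | sed -e 's|:|%3A|g' -e 's|/|%2F|g')
echo "http://www.google.com/search?q=$q"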