[PageOneX] [dev] Scraping front pages from more sources than Kiosko.net

pablo rey pablo at basurama.org
Tue Jun 25 00:53:40 EDT 2013


Sometimes Kiosko.net is not enough :)

*Print newspapers*
For example, in this thread about the Turkish protests
http://pageonex.com/matrushka/gezi-parki-protests-in-turkish-newspapers some
images are missing. They are available at
http://www.gazeteoku.com/gazete-mansetleri/sozcu-gazetesi.html, and there
are also new sites that collect front pages, like Cover Times
<http://www.covertimes.com/>.

We could build different scrapers for all these new databases of
images. Besides, if an image is missing in one database, clicking the
"try to download" button could then look for it in the other
databases.
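
A minimal sketch of that fallback idea, in Python only for illustration
(it is not PageOneX code). It assumes each scraper can be wrapped as a
function that takes a newspaper id and a date and returns an image URL,
or None when it has nothing. The names find_front_page, kiosko_stub and
gazeteoku_stub are hypothetical, and the stubs return made-up
example.org URLs instead of scraping anything.

```python
from datetime import date
from typing import Callable, Iterable, Optional

# A source takes a newspaper id and a date and returns an image URL,
# or None when it has no front page for that day (assumed interface).
Source = Callable[[str, date], Optional[str]]


def find_front_page(sources: Iterable[Source], paper: str, day: date) -> Optional[str]:
    """Return the first image URL that any source provides, else None."""
    for source in sources:
        try:
            url = source(paper, day)
        except Exception:
            # A failing scraper should not stop the lookup in the others.
            continue
        if url:
            return url
    return None


# Hypothetical stand-ins for real scrapers; they do no network access.
def kiosko_stub(paper: str, day: date) -> Optional[str]:
    return None  # pretend the image is missing here


def gazeteoku_stub(paper: str, day: date) -> Optional[str]:
    return f"https://example.org/{paper}/{day.isoformat()}.jpg"
```

The "try to download" button would then call something like
find_front_page([kiosko_stub, gazeteoku_stub], "sozcu", date(2013, 6, 1)),
with the sources listed in order of preference.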


*Online newspapers*
PastPages has just released its API
<http://blog.pastpages.org/post/53734104165/say-hello-to-the-pastpages-api>.
They are storing screenshots of major news sites every hour! In a quick
sketch I could see what it would look like in PageOneX style: "Testing
online front pages in #pageonex: BBC coverage on #occupygezi"
<http://blog.pageonex.com/2013/06/19/testing-online-front-pages-bbc-on-occupygezi/>.
I took 2 images per day.
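
Going from hourly screenshots to 2 images per day could be sketched
like this, assuming the screenshot timestamps have already been fetched
from the API as Python datetime objects. The function name pick_per_day
and the target hours (08:00 and 20:00) are assumptions for
illustration, not the hours used in the test above.

```python
from collections import defaultdict
from datetime import datetime


def pick_per_day(timestamps, target_hours=(8, 20)):
    """For each day, pick the screenshot closest to each target hour."""
    by_day = defaultdict(list)
    for ts in timestamps:
        by_day[ts.date()].append(ts)

    picked = {}
    for day, shots in sorted(by_day.items()):
        picked[day] = [
            # Distance is measured in minutes from the target hour.
            min(shots, key=lambda ts, h=hour: abs(ts.hour * 60 + ts.minute - h * 60))
            for hour in target_hours
        ]
    return picked
```

Each day then maps to a list with one timestamp per target hour, which
could feed a thread the same way daily front pages do.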

Thoughts?