[PageOneX] [dev] Further work in scraper script for Kiosko web

Rafael Porres Molina rporres at gmail.com
Fri Mar 15 11:00:59 EDT 2013


Hi Pablo,

I've pushed a new version of the script and an updated kiosko.csv. The list
is now sorted alphabetically.

Regards,

Rafa
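
The quoted message below raises the question of what happens to media_id
values once the list grows and is reordered. A minimal sketch of one way to
remap ids, assuming (hypothetically) that an id follows the 1-based row
position in kiosko.csv and that a column called "name" identifies each
newspaper stably. Neither assumption is confirmed in this thread, and the
sample rows are made up for illustration:

```python
import csv
import io


def build_id_remap(old_csv_text, new_csv_text, key="name"):
    """Map old 1-based row positions to new ones, matching rows on `key`.

    Assumption: ids are derived from row order. `key` is a hypothetical
    column name; adjust it to whatever uniquely identifies a newspaper.
    """
    def positions(text):
        reader = csv.DictReader(io.StringIO(text))
        return {row[key]: i for i, row in enumerate(reader, start=1)}

    old_pos = positions(old_csv_text)
    new_pos = positions(new_csv_text)
    # Keep only newspapers present in both lists.
    return {old_id: new_pos[k] for k, old_id in old_pos.items() if k in new_pos}


# Made-up sample data: the new list has an extra row and is sorted by name.
old = "name,country\nel_pais,es\nabc,es\n"
new = (
    "name,country,url\n"
    "abc,es,http://example.org/abc\n"
    "el_mundo,es,http://example.org/el_mundo\n"
    "el_pais,es,http://example.org/el_pais\n"
)
remap = build_id_remap(old, new)
print(remap)
```

Matching on a stable key rather than on position means rows added or
reordered later do not silently repoint existing records.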


2013/3/15 pablo rey <pablo at basurama.org>

> Thanks Rafa, and welcome to the list.
>
> We now have an expanded list of 690 newspapers (previously there were
> 'just' 384). Almost doubled!
>
> It's also worth mentioning that we have a new column in the kiosko.csv
> file <https://github.com/numeroteca/pageonex/blob/master/public/kiosko.csv>
> with the URL of each newspaper's website. As a first step we want to use it
> to link to the newspaper's website while coding front pages. Besides
> linking to kiosko.net, it is good to cite the source of the images
> properly. This is what kiosko.net does, and it might be a 'solution' to
> avoid problems with ownership of the data.
>
> After updating and populating the database with this new list of
> newspapers, we'll have to take into account that there are more newspapers
> and that they are in a different order (different media_id?) when merging
> previous databases or threads (the ones on Heroku).
>
> best,
> p
>
>
>
>
> On Fri, Mar 15, 2013 at 10:43 AM, Rafael Porres Molina <rporres at gmail.com> wrote:
>
>> Hi devs,
>>
>> First, let me introduce myself: I'm a friend of Pablo's, a Perl hacker
>> and sysadmin. A while ago he told me that PageOneX needed a list of
>> all the newspapers in Kiosko (kiosko.csv), and I found a way of building
>> it. I don't know much Ruby, so I offered to write it in Perl. Since the
>> list is not meant to be dynamic, we concluded that the language was not a
>> problem.
>>
>> I've updated the script to get the newspaper URLs and to fetch more types
>> of newspapers. Before, it listed only the general newspapers. Now I've
>> included everything I found that Kiosko offers, taking care to avoid
>> duplicates.
>>
>> If you have any questions about how the script works, or you find a bug,
>> please let me know ;-)
>>
>> Regards,
>>
>> Rafa
>>
>> _______________________________________________
>> Pageonexdev mailing list
>> Pageonexdev at mit.edu
>> http://mailman.mit.edu/mailman/listinfo/pageonexdev
>>
>>
>