Monday 29 April 2013

Web scraping Amazon and Rotten Tomatoes

[Rajesh] put web scraping to good use to gather the information that matters to him, and he’s published two posts about it. One scrapes Amazon daily to see if the books he wants to read have reached a certain price threshold. The other scrapes Rotten Tomatoes to display the audience score next to the critics’ score for the top rental movies.

Web scraping uses scripts to gather information programmatically from HTML rather than using an API to access data. We recently featured a conceptual tutorial on the topic, and even came across a hack that scraped all of our own posts. [Rajesh's] technique is pretty much the same.
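To make the idea concrete, here is a minimal sketch of pulling data out of HTML with a script, using only Python's standard library. The sample markup, the `title` class name, and the `TitleScraper` and `scrape_titles` names are made up for illustration; they are not taken from [Rajesh's] posts or from any real site.

```python
from html.parser import HTMLParser

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <h2 class="title">First post</h2><p>Body text</p>
  <h2 class="title">Second post</h2><p>More text</p>
</body></html>
"""


class TitleScraper(HTMLParser):
    """Collect the text of every <h2 class="title"> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "h2" and ("class", "title") in attrs:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_title = False

    def handle_data(self, data):
        # Only keep text that appears inside a matching heading.
        if self.in_title:
            self.titles.append(data.strip())


def scrape_titles(html):
    """Return the heading texts found in an HTML string."""
    parser = TitleScraper()
    parser.feed(html)
    return parser.titles


print(scrape_titles(SAMPLE_HTML))
```

A real scraper would download the page first and would have to be adjusted whenever the site changes its markup, which is the main trade-off compared with using an API.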

He’s using Python scripts with the Beautiful Soup module to parse the DOM tree for the information he’s after. In the Amazon script, he sets a target price for a specific book and automatically gets an email when the price reaches it. With Rotten Tomatoes, he sometimes likes to see the audience score when considering a movie, but the website doesn’t show it on the list; you have to click through to each movie. His script keeps a database so that it doesn’t continually scrape the same information. The collected numbers are displayed alongside the critics’ scores as seen above.
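The two ideas described here, checking a scraped price against a target and caching scraped scores in a database, might look roughly like the sketch below. It is not [Rajesh's] code: the sample markup, the `price` class, the table layout, and every function name are assumptions made for illustration, and the email step is left out. It requires the third-party `beautifulsoup4` package.

```python
import sqlite3

from bs4 import BeautifulSoup

# Hypothetical product markup; a real product page looks different.
SAMPLE_PAGE = """
<div id="product">
  <span class="title">Example Book</span>
  <span class="price">$12.49</span>
</div>
"""


def parse_price(html):
    """Return the price found in the page as a float, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("span", class_="price")
    if tag is None:
        return None
    text = tag.get_text(strip=True)
    return float(text.lstrip("$").replace(",", ""))


def should_alert(price, target):
    """True when a price was found and it is at or below the target."""
    return price is not None and price <= target


def open_cache(path=":memory:"):
    """Open (or create) a small SQLite table of already-scraped scores."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scores "
        "(movie TEXT PRIMARY KEY, audience INTEGER)"
    )
    return conn


def audience_score(conn, movie, fetch):
    """Return the cached score, calling fetch(movie) only on a cache miss.

    fetch is a placeholder for whatever function scrapes the movie page.
    """
    row = conn.execute(
        "SELECT audience FROM scores WHERE movie = ?", (movie,)
    ).fetchone()
    if row is not None:
        return row[0]
    score = fetch(movie)
    conn.execute("INSERT INTO scores VALUES (?, ?)", (movie, score))
    conn.commit()
    return score


price = parse_price(SAMPLE_PAGE)
if should_alert(price, target=15.00):
    print(f"Price dropped to ${price:.2f}")  # an email would be sent here
```

Passing the scraping function in as `fetch` keeps the cache logic separate from the page parsing, so repeated runs only hit the site for movies that are not yet in the table.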

Source: http://hackaday.com/2013/01/23/web-scraping-amazon-and-rotten-tomatoes/

Note:

Roze Tailer is an experienced web scraping consultant and writes articles on LinkedIn email scraping, LinkedIn profile scraping, Amazon data scraping, Yellow Pages data scraping and product information scraping.
