Monday, 29 April 2013

Amazon Price Scraping

Running a software company means that you have to be dynamic, creative, and most of all innovative. I strive every day to create unique and interesting new ways to do business online. Many of my clients sell their products on Amazon, Google Merchant Center, Shopping.com, PriceGrabber, Nextag, and other shopping sites.

Amazon is by far the most powerful of these, so I focus much of my effort on creating software specifically for its marketplace. I’ve created very lightweight programs that move data from CSV, XML, and other formats to Amazon AWS using the Amazon Inventory API. I’ve also created programs that push data from Magento directly to Amazon automatically, updating every few hours like clockwork. Some of my customers sell hundreds of thousands of products on Amazon thanks to this technology.

Doctrine ORM and Magento

I’m a strong believer in the power of Doctrine ORM in combination with Zend Framework, and I was an early adopter of this technology in production environments. More recently, I’ve been using Doctrine to generate models for Magento, and then using these models to develop advanced information scraping systems that price match my clients’ products against Amazon’s merchants. I prefer Doctrine because the documentation is awesome, the object model makes sense, and it is far easier to use outside of the Magento core.

What is price matching?
Price matching is when you take the price of a product in your database and change it to slightly below the lowest price available on Amazon, depending upon certain rules. The challenge here is that most products from distributors don’t have an ASIN (Amazon Standard Identification Number, Amazon’s product ID) to check against. Here are the operations my script performs to collect data about Amazon products:

    Loops through all SKUs in catalog_product_entity
    For each SKU, gets a name, asin, group, new/used price, url, manufacturer from Amazon
    If name, manufacturer, and asin exist it stores the entry in an array
    It loops through all the entries for each SKU and checks for _any_ of the following:
        Does the full product name match?
        Does the manufacturer name match?
        Does the product group match?
        (break the product name into words) Do any words match?
        If any of the above are true, it will add the entry to the database
    If successful, it enters the data into attributes inside Magento:
        scrape_amazon_name
        scrape_amazon_asin
        scrape_amazon_group
        scrape_amazon_new_price
        scrape_amazon_used_price
        scrape_amazon_manufacturer
    If the data already exists, or partial data exists it updates the data
    If the data is null or corrupt, it ignores it
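
The matching rules above can be sketched roughly as follows. This is a minimal Python illustration, not the original script (which was built on Doctrine and Magento). The function names, the dictionary keys, and the case-insensitive comparison are all assumptions made for the example.

```python
# Hypothetical sketch of the "does any rule match?" step.
# A product and an Amazon entry are plain dicts here; the real
# system used Doctrine models generated from the Magento schema.

REQUIRED_FIELDS = ("name", "manufacturer", "asin")


def normalize(text):
    """Lower-case and collapse whitespace so comparisons are forgiving."""
    return " ".join((text or "").lower().split())


def is_usable(entry):
    """An entry is kept only if name, manufacturer, and asin all exist."""
    return all(entry.get(field) for field in REQUIRED_FIELDS)


def matches(product, entry):
    """Return True if ANY of the four rules from the list above holds."""
    p_name = normalize(product.get("name"))
    e_name = normalize(entry.get("name"))
    p_maker = normalize(product.get("manufacturer"))
    e_maker = normalize(entry.get("manufacturer"))
    p_group = normalize(product.get("group"))
    e_group = normalize(entry.get("group"))

    checks = (
        p_name != "" and p_name == e_name,            # full product name
        p_maker != "" and p_maker == e_maker,         # manufacturer name
        p_group != "" and p_group == e_group,         # product group
        bool(set(p_name.split()) & set(e_name.split())),  # any shared word
    )
    return any(checks)


def filter_entries(product, entries):
    """Keep the Amazon entries that are complete and match the product."""
    return [e for e in entries if is_usable(e) and matches(product, e)]
```

Because a single shared word is enough to pass, these rules are deliberately loose. They favor collecting candidates over precision, so a stricter check may be worth adding before a match is trusted for pricing.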

Data Harvesting
As you can see from the steps above, my system first imports all the data it can find. This process is called harvesting. After all the data is harvested, I use a feed exporter to create a CSV file in the Amazon format and push it via Amazon AWS encrypted upload.

Feed Export (Price Matching to Amazon’s Lowest Possible Price)
The feed generator then adjusts the pricing according to certain rules:

    Product price is calculated against a “lowest market” percentage. This gives the “Absolute Lowest Sale Price” (A.L.S.P.), the absolute lowest price the client is willing to offer
    The “Amazon Lowest Price” (A.L.P.) is then checked against the A.L.S.P.
    If the A.L.P. is higher than the A.L.S.P., it calculates a price 1 dollar lower than the A.L.P. and stores that as the price in the feed for use on Amazon
    The system updates the price in our database and freezes the product from future imports, then archives the original import price for reference
    If an ASIN exists, it pushes the data to Amazon using that; if not, it uses the MPN/SKU or UPC
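
The pricing rule above can be sketched like this. It is a minimal Python illustration under stated assumptions, not the original feed generator. The function names are hypothetical, and it assumes the “lowest market” percentage is applied by multiplying it with the product price. The rules do not say what happens when Amazon’s lowest price is at or below the A.L.S.P., so this sketch returns None to mean “leave the price alone”.

```python
from decimal import Decimal

CENT = Decimal("0.01")


def absolute_lowest_sale_price(product_price, lowest_market_pct):
    """A.L.S.P.: the floor price (assumed here to be price * pct / 100)."""
    return (product_price * lowest_market_pct / Decimal("100")).quantize(CENT)


def matched_price(product_price, amazon_lowest, lowest_market_pct,
                  undercut=Decimal("1.00")):
    """Return the feed price, or None if no price match applies.

    Follows the rule as written: if Amazon's lowest price is higher
    than the A.L.S.P., go one dollar below Amazon's lowest price.
    """
    floor = absolute_lowest_sale_price(product_price, lowest_market_pct)
    if amazon_lowest > floor:
        return amazon_lowest - undercut
    return None  # assumption: otherwise the price is left unchanged


# Example: a 100.00 product with an 80% floor (A.L.S.P. = 80.00).
print(matched_price(Decimal("100.00"), Decimal("90.00"), Decimal("80")))
print(matched_price(Decimal("100.00"), Decimal("75.00"), Decimal("80")))
```

Note that, taken literally, the rule can produce a price below the A.L.S.P. when Amazon’s lowest price is less than one dollar above it. Clamping the result to the floor would be a natural guard, though that is not part of the rules as listed.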

Conclusion
This type of system is wonderful because it accurately stores Amazon product data for later use, so we can see trends in price changes. It ensures that my client will always have the absolute lowest price for hundreds of thousands of products on Amazon (or Google/Shopping.com/PriceGrabber/Nextag/Bing). Whenever the system needs to update, it takes around 10 hours to harvest 100,000 products and 5 minutes to export the entire data set to Amazon using my feed software. This makes updating very easy, and it can be accomplished in one evening. This is something that we can progressively enhance to protect against competitors throughout the market cycles, and it’s a system that is easy to upgrade in the event Magento changes its data model.

Upgrades
Since we use Doctrine, it’s all outside of Magento, so we can go ahead and upgrade Magento to a newer version any time we want. Then we just regenerate the database models, and our system automatically picks up any changes Magento made. I’ll probably come back and do another article on just this topic, as it’s one I’m very interested in writing about.

Source: http://www.christopherhogan.com/2011/11/12/amazon-price-scraping/

Note:

Roze Tailer is an experienced web scraping consultant and writes articles on LinkedIn email scraping, LinkedIn profile scraping, Amazon data scraping, Yellow Pages data scraping, and product information scraping.
