Storage Wars - Wget to Scrape a Website

This week’s assignment is to use Wget to scrape a website.

Valentine’s Day is around the corner, and as usual, all the Valentine’s Day-related commercials have started to appear everywhere. I thought the best way to have an intimate moment with loved ones should be something more than an overpriced dinner at an overcrowded joint. Why not make something enjoyable yourself, and keep it personal and sincere?

Presentation is key, even if it is just a cake. So I started to look for pictures of chocolate cakes, and the first website that came to mind was foodnetwork.com.

I am fairly new to HTML and had not used Wget before. As I browsed through the website, I noticed the sheer number of pictures on the site, and I knew I had to narrow things down to something specific so the job wouldn’t burden my server too much. So I decided to look for, scrape, and save the pictures under the chocolate cheesecake recipes and comfort food sections.

The command I used is:

wget -r -A jpeg,jpg http://www.foodnetwork.com/topics/chocolate-cheesecake-recipes.html
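For reference, here is how the flags break down, along with a variation I could have used to keep the download polite and contained. The output directory name, wait time, and retry count below are my own assumptions for illustration, not part of the original run:

```shell
# -r           recurse into linked pages
# -A jpeg,jpg  keep only files with these suffixes (other pages are fetched
#              to follow links, then deleted)
# -nd          don't recreate the site's directory tree locally
# -P cakes     save everything into a local "cakes" directory (my choice)
# -w 1         wait 1 second between requests to go easy on the server
# -t 5         retry each file up to 5 times on a dropped connection
wget -r -A jpeg,jpg -nd -P cakes -w 1 -t 5 \
    http://www.foodnetwork.com/topics/chocolate-cheesecake-recipes.html
```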

 

A few things happened:

1) When I tried to document my work process by uploading the Wget log here, WordPress wouldn’t let me:

[Screenshot: Screen Shot 2015-02-03 at 7.36.30 AM]

Even after I changed the file name, it still resisted:

[Screenshot: Screen Shot 2015-02-03 at 7.41.24 AM]

2) It took long, I mean, really long for Wget to get the pictures… hours. It lost the connection to the website a few times in the process, so I had to start over again and again.

[Screenshot: Screen Shot 2015-02-03 at 6.34.58 AM]
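Had I known about them at the time, Wget’s resume options might have spared me the restarts. This is just a sketch of what that could look like; the retry and back-off values are assumptions, not what I actually ran:

```shell
# -c / --continue   resume partially-downloaded files instead of restarting
# --tries=0         keep retrying on dropped connections (0 means no limit)
# --waitretry=10    back off up to 10 seconds between retries of a file
wget -r -A jpeg,jpg -c --tries=0 --waitretry=10 \
    http://www.foodnetwork.com/topics/chocolate-cheesecake-recipes.html
```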

Given the amount of space and speed allowed, I was finally able to get some of the data indexed on my server and on Google Drive.

[Screenshot: Screen Shot 2015-02-03 at 8.41.08 AM]
