Download Image in PhantomJS

Strictly speaking, you don’t download an image directly in PhantomJS, because it doesn’t provide a function for that. So the answer to “How?” is “You don’t, really.”

You may have found other solutions that tell you to migrate to CasperJS (which has a handy function for downloading images and other files), but sometimes you can’t, or don’t want to, rewrite your code for it.

I have also tried the XMLHttpRequest approach from this discussion, but it doesn’t work for cross-domain requests, and the common case here is downloading an image file from another web page. It would only work if you first open the image URL with page.open() and then make the request to get the file, but when I tried that same-origin request, it didn’t work either! If you’ve run into all of these situations, this article is what you’re looking for.

So, how?

Before I continue: have you ever taken a screenshot of a web page in PhantomJS with the page.render() function? That function is what one of the steps below uses to download the image file.

Take a screenshot of the image

First, open the URL of the image you want to download as a web page.

page.open('http://url.com/of/image/you/want/to/download.jpg');

Then, inside the callback of that function, render a screenshot of the page.

var page = require('webpage').create();

page.open('http://url.com/of/image/you/want/to/download.jpg', function (status) {
    if (status === 'success') {
        page.render('/path/to/your/download/location/filename.jpg');
    }
    phantom.exit();
});

Your image is then saved in your desired location.

Note: In this example, I saved the screenshot as a JPG without passing any options after the output path. That results in some quality loss compared with the original (try it yourself and you’ll see a little noise in the image). For the best quality, pass {format: 'jpeg', quality: '100'} as the second argument of page.render().

(FYI: the default quality value for both image formats is 75.)
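If you save many files, a small helper can keep the format and quality options consistent. This is a minimal sketch: renderOptionsFor is a hypothetical helper name, and it assumes you only save .jpg/.jpeg or .png files. It returns the same {format, quality} object described in the note above.

```javascript
// Hypothetical helper: build the second argument of page.render() from the
// output path, always asking for quality '100'.
function renderOptionsFor(path) {
    var lower = path.toLowerCase();
    var format;

    if (/\.jpe?g$/.test(lower)) {
        format = 'jpeg';
    } else if (/\.png$/.test(lower)) {
        format = 'png';
    } else {
        throw new Error('Unsupported extension: ' + path);
    }

    return { format: format, quality: '100' };
}

// Usage inside the page.open() callback (PhantomJS only):
// page.render(target, renderOptionsFor(target));
```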

But wait…

“The image is smaller than the original!”
“The image has a black background along its right and bottom sides!”
And then you wonder: why did that happen?

Here’s the thing: PhantomJS has settings that store the browser viewport size and the page clipping size used for rasterizing/rendering.
The image usually gets downsized because the viewport is smaller than the original image, and it gets extended with a black background because the loaded image is smaller than the viewport.
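Both symptoms come from comparing the image size with the viewport size. The sketch below only models that explanation; it is not PhantomJS code. predictSymptom is a hypothetical name, and the sketch assumes the whole viewport is rendered with no clipRect set.

```javascript
// Simplified model of the two symptoms (not PhantomJS source).
// image and viewport are plain { width, height } objects in pixels.
function predictSymptom(image, viewport) {
    // Larger than the viewport in either direction: shrunk to fit.
    // (Checked first, so an image that is both wider and shorter
    // than the viewport is reported as downsized.)
    if (image.width > viewport.width || image.height > viewport.height) {
        return 'downsized';
    }
    // Smaller in either direction: the unused viewport area appears
    // as black background on the right and/or bottom.
    if (image.width < viewport.width || image.height < viewport.height) {
        return 'black background';
    }
    return 'exact';
}
```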

So how do we get it exactly right?

There will be 3 (three) steps to achieve this:

  • Set viewport size
  • Set page clipping size for rendering (+ get image offset and dimension)
  • Screenshot it!

What are we waiting for? Let’s go through the steps one by one.

1. Set browser viewport size

As you can see in Figure 1 below, if you don’t assign anything to the viewport size, it has a width of 400 and a height of 300, which is PhantomJS’ default value.

With that default in place, images whose dimensions are equal to or smaller than the viewport keep their size. But an image larger than the viewport is shrunk to fit it.

Fig 1. PhantomJS’ headless browser viewport at a.) its default size and b.) a modified size of 9000×9000. The default viewport downsizes the image, while the enlarged viewport keeps its original resolution (2245×2811 px).
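To estimate how much a large image shrinks, you can approximate the shrink-to-fit scale as the smaller of the two viewport-to-image ratios, capped at 1 because smaller images are not enlarged. This is only an approximation of the browser’s behaviour (its exact rounding may differ), and fittedSize is a hypothetical helper name.

```javascript
// Approximate size of an image after being shrunk to fit the viewport.
// The scale is capped at 1 because small images are not enlarged.
function fittedSize(image, viewport) {
    var scale = Math.min(
        1,
        viewport.width / image.width,
        viewport.height / image.height
    );
    return {
        width: Math.round(image.width * scale),
        height: Math.round(image.height * scale)
    };
}

// The 2245x2811 image from Fig 1 in a 1366x768 viewport (the size used
// in the next example) comes out at roughly 613x768; in a 9000x9000
// viewport it keeps its full size.
var shrunk = fittedSize({ width: 2245, height: 2811 }, { width: 1366, height: 768 });
var full = fittedSize({ width: 2245, height: 2811 }, { width: 9000, height: 9000 });
```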

To set the viewport size, assign to the viewportSize property of the web page object. For example:

page.viewportSize = {
    width: 1366,
    height: 768
};

The width property sets the viewport width and the height property sets the viewport height. Put this assignment right after you create the web page object.

If you don’t know the largest dimensions among the many images you want to scrape, set the viewport to some large values, e.g. a width and height of 9000.
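If you do know the dimensions in advance (for example, from an earlier crawl), you could compute the smallest viewport that fits all of them instead of guessing. This is a sketch: viewportFor is a hypothetical helper, and falling back to 9000×9000 for an empty list simply mirrors the suggestion above. Smaller images will still show the black background, which is what step 2 takes care of.

```javascript
// Smallest viewport that fits every image in the list without shrinking.
// Falls back to 9000x9000 when no dimensions are known.
function viewportFor(images) {
    if (images.length === 0) {
        return { width: 9000, height: 9000 };
    }
    var width = 0;
    var height = 0;
    for (var i = 0; i < images.length; i++) {
        width = Math.max(width, images[i].width);
        height = Math.max(height, images[i].height);
    }
    return { width: width, height: height };
}

// Usage (PhantomJS): page.viewportSize = viewportFor(knownSizes);
```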

2. Set page clipping size

After setting the viewport size, set the page clipping size by assigning to the clipRect property of the web page object.

page.clipRect = {
    top: 0,
    left: 0,
    width: 500,
    height: 800
};

This defines where the crop starts and how large it is: the top and left properties give the starting position of the crop, and the width and height properties give its size.

Of course, that code only shows how to set the clipping values manually, i.e. when you already know the image’s offset and dimensions. But PhantomJS is an automation tool, so how do we get them automatically?

Here’s how to do it with the DOM’s getBoundingClientRect() method.

page.clipRect = page.evaluate(function () {
    // Get the element you want and then get its offset + dimension values
    var element = document.getElementsByTagName('img')[0];
    var rect = element.getBoundingClientRect();
    
    // Prepare all needed properties
    return {
        top: rect.top,
        left: rect.left,
        width: rect.width,
        height: rect.height
    };
});
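getBoundingClientRect() can return fractional values (for example, when an element is centred). If you prefer whole-pixel clip rectangles, one option is to snap the rectangle outward before assigning it, so no partial edge pixel is cut off. This is an optional refinement rather than part of the steps here; snapRect is a hypothetical helper, and it must be called in the PhantomJS script itself, not inside page.evaluate(), because functions defined in the script are not visible in the page context.

```javascript
// Expand a { top, left, width, height } rectangle to whole pixels,
// rounding the top-left corner down and the bottom-right corner up.
function snapRect(rect) {
    var left = Math.floor(rect.left);
    var top = Math.floor(rect.top);
    var right = Math.ceil(rect.left + rect.width);
    var bottom = Math.ceil(rect.top + rect.height);
    return {
        top: top,
        left: left,
        width: right - left,
        height: bottom - top
    };
}

// Usage (PhantomJS): page.clipRect = snapRect(page.evaluate(function () { ... }));
```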

Run that code for every image you want to scrape.
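Because page.open() is asynchronous, one way to handle many image URLs is to process them one after another, starting the next only when the previous one has finished. This is a generic sketch: processSequentially and its worker callback are hypothetical names. In PhantomJS, the worker would open the URL, set page.clipRect, call page.render(), and then call next().

```javascript
// Run an asynchronous worker over each item, in order.
// worker(item, next) must call next() exactly once when it is done;
// done() is called after the last item.
function processSequentially(items, worker, done) {
    var index = 0;

    function step() {
        if (index >= items.length) {
            done();
            return;
        }
        var item = items[index];
        index += 1;
        worker(item, step);
    }

    step();
}

// Usage (PhantomJS):
// processSequentially(urls, function (url, next) {
//     page.open(url, function (status) {
//         // set page.clipRect and call page.render() as in step 3
//         next();
//     });
// }, function () { phantom.exit(); });
```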

3. The ‘download image’ step

Once you have followed the first and second steps, don’t forget to tell PhantomJS to render the page to an image file, after the offset and dimension values have been assigned to page.clipRect.

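var page = require('webpage').create();

// Step 1: a viewport larger than any image you expect to download
page.viewportSize = {
    width: 9000,
    height: 9000
};

page.open('http://url.com/of/image/you/want/to/download.jpg', function (status) {
    if (status === 'success') {
        // Step 2: clip the rendering to the image's offset and dimensions
        page.clipRect = page.evaluate(function () {
            var element = document.getElementsByTagName('img')[0];
            var rect = element.getBoundingClientRect();

            return {
                top: rect.top,
                left: rect.left,
                width: rect.width,
                height: rect.height
            };
        });

        // Step 3: save the clipped area as the downloaded image
        page.render('/path/to/your/download/location/filename.jpg');
    }
    phantom.exit();
});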

Now go ahead and run your full script with these steps in place. Whatever dimensions your scraped images have, they no longer affect the output: you get each image at its original size.


If you have any questions or run into trouble following these steps, write them in the comment section of this post. Thanks!