When we started building Thum.io, we set out to build the best website screenshot generator out there. We wanted to build on top of existing tools, but also wanted to make sure we would produce the highest quality thumbnails.
We use HTML5Test to test our browser configuration. HTML5Test gives a great overview of what features are supported by a given browser. Some things to look for when evaluating a render solution are:
PhantomJS is a fast headless browser based on WebKit. It is easy to install and easy to use. The only problem is that it lacks the feature support you would expect from a full-featured browser. We created a screenshot using this code:
var page = require('webpage').create();
page.viewportSize = { width: 1200, height: 1200 };
var address = 'http://www.html5test.com/';
var output = 'html5.png';

page.open(address, function (status) {
    if (status !== 'success') {
        console.log('Unable to load the address!');
        phantom.exit();
    } else {
        window.setTimeout(function () {
            page.render(output);
            phantom.exit();
        }, 5000); // Change timeout as required to allow sufficient time
    }
});
phantomjs html5.js
Selenium is a well-known web automation framework, and it has the ability to take screenshots. The nice thing about Selenium is that it uses an actual browser to render pages, so the website snapshot is going to look exactly as it would on a desktop computer.
One thing Selenium doesn't seem to be designed for is long-lived server environments. Selenium was originally designed for running test cases, where you spin up a Selenium server, run a bunch of tests, and then spin down the server. We are using Selenium instances that run for days or even weeks at a time, so we very quickly uncovered some issues with long-running servers. The good news is that our fixes are now a part of Selenium.
The built-in Selenium screenshot system relies on JavaScript. This causes problems because if the page has JavaScript running, a screenshot of the website can't be taken. It also means that you can't take any screenshots until onComplete fires. This wouldn't work for us since we wanted to animate the rendering, so we built our own screenshotting code.
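For context, the built-in screenshot call discussed above looks roughly like this. It is a minimal sketch that assumes the selenium-webdriver package for Node.js, Firefox, and geckodriver are installed. The helper names `decodeScreenshot` and `captureScreenshot` are illustrative, and this is the stock WebDriver API, not the custom screenshotting code mentioned above.

```javascript
const fs = require('fs');

// Decode the base64 string that WebDriver returns into raw PNG bytes.
function decodeScreenshot(base64Png) {
  return Buffer.from(base64Png, 'base64');
}

// Load a page in a real browser and save a screenshot of it.
// Assumption: selenium-webdriver, Firefox, and geckodriver are available.
async function captureScreenshot(url, outputPath) {
  // Required lazily so the helper above works without Selenium installed.
  const { Builder } = require('selenium-webdriver');
  const driver = await new Builder().forBrowser('firefox').build();
  try {
    await driver.get(url); // waits for the page load to finish
    const base64Png = await driver.takeScreenshot();
    fs.writeFileSync(outputPath, decodeScreenshot(base64Png));
  } finally {
    await driver.quit(); // always shut the browser down
  }
}

// Example usage (not run automatically):
// captureScreenshot('http://www.html5test.com/', 'html5.png').catch(console.error);
```

Note that the screenshot is only requested after `driver.get` has returned, which is the limitation described above: nothing can be captured while the page is still loading.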
When developing our API, we wanted the resulting URLs to be as readable as
possible. Placing the URL in the path rather than in a URL parameter means
that URLs don't have to be encoded. For example, compare these two:
/get/http://www.google.com/?q=test
/get/?url=http:%2F%2Fwww.google.com%2F%3Fq%3Dtest
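To make the difference concrete, here is a small sketch that builds both forms for the same target. The function names `pathStyle` and `queryStyle` are illustrative and not part of the Thum.io API. Note that `encodeURIComponent` is slightly stricter than the example above: it also escapes the colon as `%3A`.

```javascript
// Illustrative helpers; the names are hypothetical, not part of any Thum.io client.

// Path style: the target URL is appended verbatim after /get/.
function pathStyle(target) {
  return '/get/' + target;
}

// Query style: the target URL must be percent-encoded to be a parameter value.
function queryStyle(target) {
  return '/get/?url=' + encodeURIComponent(target);
}

const target = 'http://www.google.com/?q=test';
console.log(pathStyle(target));  // /get/http://www.google.com/?q=test
console.log(queryStyle(target)); // /get/?url=http%3A%2F%2Fwww.google.com%2F%3Fq%3Dtest
```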
Options are passed the same way, as path segments such as /width/1200 or /wait/5/.
If we combine these into one URL, we can create an image similar to our PhantomJS
test using this URL: http://image.thum.io/get/width/1200/wait/5/https://www.html5test.com/
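The way such a URL is put together can be sketched in a few lines. The base URL and the width and wait options are taken from the example above; the `buildImageUrl` helper itself is hypothetical and only illustrates the pattern of option segments followed by the unencoded target.

```javascript
// Hypothetical helper: assembles an image URL from path-style options.
// Only the base URL and the width/wait options come from the example above.
function buildImageUrl(target, options) {
  const base = 'http://image.thum.io/get';
  // Each option becomes a name/value pair of path segments, in insertion order.
  const segments = Object.entries(options).map(
    ([name, value]) => name + '/' + value
  );
  // The target URL goes last and is appended without encoding.
  return [base, ...segments, target].join('/');
}

console.log(buildImageUrl('https://www.html5test.com/', { width: 1200, wait: 5 }));
// http://image.thum.io/get/width/1200/wait/5/https://www.html5test.com/
```

Because the target is always the final part of the path, it can contain its own slashes and query string without confusing the option segments that come before it.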
We hope this overview helps people understand our technology, but technology
is never enough to build great products. That's why we take supporting our
users very seriously.
If you have any questions or comments,
email support@thum.io.