Wednesday, February 6, 2013

Headless Chrome/Firefox Selenium Java in Amazon EC2

Setting up Chrome/Firefox on a Linux-based x86_64 EC2 instance
In this post we'll see how to set up Chrome/Firefox on an EC2 instance.
We'll also cover how to set up chromedriver with Selenium [Java] on the same instance.
Why is that difficult?
Linux-based EC2 instances lack gtk+, which both Firefox and Chrome need in order to launch.

How to solve?
Compile gtk+ from source.
This gist by joekiller (gtk-firefox.sh, URL in the summary at the end of this post) has the complete dependency tree resolved for installing gtk+ on x86_64 machines.
Line 77 of the script installs Firefox from its tarball. Comment it out if you don't want Firefox.
If line 42 fails, wget the tarball directly instead of relying on the complex recursive wget.
For example:
wget http://releases.mozilla.org/pub/mozilla.org/firefox/releases/latest/linux-x86_64/en-US/firefox-18.0.2.tar.bz2
So, all done?
Nope. We still can't run either browser, because there is no X11 server to handle graphics operations and screen output.
Since our environment is headless, a real X11 server isn't an option either [there is no display hardware to output to].

How do we solve this?
We need to set up Xvfb on our instance.
$ yum install Xvfb
$ Xvfb :1 -screen 0 1280x768x24 & 
# starts Xvfb on display ":1"
This starts an X virtual framebuffer (Xvfb), which performs all graphics operations in memory.
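
If you want to confirm from Java that the display is actually there before launching a browser, here is a minimal sketch. It assumes the X server uses the conventional local socket location /tmp/.X11-unix/X<n> (true for a default Xvfb start, but still an assumption about your setup). The class and method names are made up for illustration. A stale socket file would also pass this check, so treat it as a quick sanity test only.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Checks whether an X server (such as Xvfb) appears to be serving a local display. */
class XDisplayCheck {

    /** Maps a display string such as ":1" or ":1.0" to its conventional socket path. */
    static Path socketPathFor(String display) {
        int colon = display.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not a display string: " + display);
        }
        String rest = display.substring(colon + 1);          // "1" or "1.0"
        int dot = rest.indexOf('.');
        String number = dot >= 0 ? rest.substring(0, dot) : rest;
        if (number.isEmpty() || !number.chars().allMatch(Character::isDigit)) {
            throw new IllegalArgumentException("Not a display string: " + display);
        }
        // Assumption: the server listens on the default Unix socket directory
        return Paths.get("/tmp/.X11-unix", "X" + number);
    }

    /** True if the socket file for the display exists. */
    static boolean isDisplayUp(String display) {
        return Files.exists(socketPathFor(display));
    }

    public static void main(String[] args) {
        String display = args.length > 0 ? args[0] : ":1";
        System.out.println(display + (isDisplayUp(display) ? " looks up" : " is not up"));
    }
}
```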

All set?
Not really :P We still haven't configured the instance to use display ":1" for graphics operations.
We can do that with:
$ export DISPLAY=:1
The exported variable lasts only for the current login session [like any standard Linux terminal session].
To make it permanent, add the line to ~/.bashrc. It takes effect in every new shell, so no instance restart is needed [run "source ~/.bashrc" to apply it to the current shell].
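
If the browser is launched from Java anyway, another option is to set DISPLAY only for the child process rather than for the whole login session. The sketch below uses the standard ProcessBuilder API. The class and helper names are hypothetical, and it assumes google-chrome is on the PATH and Xvfb is serving ":1".

```java
import java.util.Map;

/** Launches a child process with DISPLAY set only for that child. */
class DisplayEnv {

    /** Adds DISPLAY to the child's environment and returns the same builder. */
    static ProcessBuilder withDisplay(ProcessBuilder builder, String display) {
        Map<String, String> env = builder.environment();
        env.put("DISPLAY", display);
        return builder;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: "google-chrome" is on PATH and Xvfb is serving display ":1"
        ProcessBuilder builder = withDisplay(new ProcessBuilder("google-chrome"), ":1");
        builder.inheritIO();                  // show the browser's stdout/stderr in this console
        Process chrome = builder.start();
        System.out.println("exit code: " + chrome.waitFor());
    }
}
```

The parent JVM's own environment is left untouched, so other processes on the instance are not affected.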

Problem 2:
Setting up chromedriver [Java] isn't a tough task, in case you landed here first :D

Step:1
Install the Chrome binary from source, or add the CentOS source URLs to the amzn repo and do a yum install.
Note: Chrome will refuse to start as root, so make sure your Java process runs as a non-root user.
[root@xyz chrome]$ google-chrome 
Xlib:  extension "RANDR" missing on display ":1.0".
[9706:9706:0206/061403:ERROR:chrome_browser_main_extra_parts_gtk.cc(51)] Startup refusing to run as root.
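
To catch this before the browser is even launched, the Java process can check who it is running as. This is a minimal sketch with hypothetical names. It compares the user name against "root" via the standard user.name system property, which is a simplification (it does not check for uid 0).

```java
/** Fails fast when the JVM runs as root, since Chrome refuses to start in that case. */
class RootGuard {

    /** Name-based check; a simplification that does not look at the numeric uid. */
    static boolean isRoot(String userName) {
        return "root".equals(userName);
    }

    /** Throws if the current JVM user is root. */
    static void requireNonRoot() {
        String user = System.getProperty("user.name");
        if (isRoot(user)) {
            throw new IllegalStateException(
                    "Running as root; Chrome will refuse to start. Switch to a non-root user.");
        }
    }

    public static void main(String[] args) {
        requireNonRoot();
        System.out.println("OK, running as " + System.getProperty("user.name"));
    }
}
```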
Step:2
Chrome needs DISPLAY set in order to use Xvfb for graphics operations.
$ vim /usr/bin/google-chrome
# add export DISPLAY=:1 near the top of the script, right after the #! line
Step:3
Download the appropriate chromedriver from code.google.com.
Place it in a directory that can be accessed without root permissions.

Step:4
See the sample tutorial at Chrome Selenium Binding.
Below is the important config code from that link.
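import java.io.File;
import org.openqa.selenium.chrome.ChromeDriverService;

// Selenium 2.x API; newer releases name the first call usingDriverExecutable
ChromeDriverService service = new ChromeDriverService.Builder()
        .usingChromeDriverExecutable(new File("path/to/my/chromedriver")) // location of the driver you downloaded
        .usingAnyFreePort()
        .build();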
Step:5
Start the Java process [non-root]. You might see some C errors; ignore them as long as your process continues to run.
Monitor chromedriver.log [on our system it is located in the /opt directory].
Make sure there are no "Connection Refused" or "ShutDown" messages.
These errors eventually surface as a TimedOutException on the Java main thread.
The main thread then exits.
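
Instead of watching the log by eye, a small scan can flag these messages. This is a rough sketch with hypothetical names. It does a case-insensitive search for the two markers mentioned above, and the default path /opt/chromedriver.log is only an assumption based on our setup, so pass your own path as the first argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Scans a chromedriver log for the failure markers mentioned in this post. */
class ChromeDriverLogCheck {

    /** Returns every line containing "connection refused" or "shutdown", ignoring case. */
    static List<String> findFailures(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.contains("connection refused") || lower.contains("shutdown")) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass the real log path as the first argument
        String path = args.length > 0 ? args[0] : "/opt/chromedriver.log";
        List<String> failures = findFailures(Files.readAllLines(Paths.get(path)));
        failures.forEach(System.out::println);
        System.out.println(failures.isEmpty()
                ? "No failure markers found"
                : failures.size() + " suspicious line(s)");
    }
}
```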

As a whole
$ wget "https://gist.github.com/joekiller/4144838/raw/1560dbcf05cd90ba1052e8d999007f8803778c4a/gtk-firefox.sh"
# remember to comment out line 77 in case you don't need Firefox
$ chmod +x gtk-firefox.sh
$ sh gtk-firefox.sh
# the above command will take at least 30 minutes
$ yum install Xvfb
$ Xvfb :1 -screen 0 1280x768x24 &
# install google-chrome via yum [add source URLs to the repo] or from source
$ google-chrome
# make sure the command doesn't exit until you force it to [no gtk fatal errors or permission issues]
$ which google-chrome
/usr/bin/google-chrome
$ vim /usr/bin/google-chrome
# add export DISPLAY=:1
# run the Java code; it should work
If you face any issues, leave a comment and I'll see if I can help.
Hope this helps :)