Learning Selenium Testing Tools (Third Edition)

Understanding the WebDriver architecture

The WebDriver architecture does not follow the same approach as Selenium RC (for more details, refer to http://docs.seleniumhq.org/docs/05_selenium_rc.jsp). Selenium RC's browser automation was written purely in JavaScript: that JavaScript emulated user actions and automated the browser from within the browser. WebDriver, on the other hand, controls the browser from outside the browser. It uses the accessibility API to drive the browser. The accessibility API is common to web browsers and is used by a number of applications to access and control other applications on behalf of users with disabilities.

WebDriver uses the most appropriate way to access the accessibility API for each browser. For Firefox, it uses JavaScript to access the API; for Internet Explorer, it uses C++. This approach means that each browser is controlled in the way that suits it best.

Figure: Understanding the WebDriver architecture

The system is made up of four layers, as can be seen in the preceding image: the WebDriver API, the WebDriver SPI, the JSON Wire Protocol, and the Selenium server.

The WebDriver API

The WebDriver API (refer to http://selenium.googlecode.com/svn/trunk/docs/api/java/index.html?overview-summary.html) is the part of the system that you interact with all the time. Things have changed from the 140-line-long API that Selenium RC had: the WebDriver API is more manageable and can actually fit on a normal screen. It is built around the WebDriver and WebElement objects, for example:

WebElement element = driver.findElement(By.name("q"));

and

element.sendKeys("I love cheese");

These commands are then translated to the SPI, which is stateless: there is no record of previous interactions, and each request has to be handled based entirely on the information that comes with it. This can be seen in the next section.

The WebDriver SPI

When the code enters the SPI (Stateless Programming Interface), it is passed to a mechanism that works out which element is meant, using a unique ID, and then invokes the relevant command. All of the API calls happen in a top-down approach.

The example from the previous section would translate to SPI calls like the following:

findElement(using="name", value="q")
sendKeys(element="webdriverID", value="I love cheese")
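The translation above can be sketched in a few lines of plain Java. This is only an illustration of the idea of a stateless command: the class and method names (SpiSketch, Command) are hypothetical and are not taken from the Selenium codebase. Each command carries everything needed to execute it, including the element ID, so nothing has to be remembered between calls.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: SpiSketch and Command are hypothetical names,
// not classes from Selenium.
final class SpiSketch {

    /** A stateless command: a name plus every parameter needed to run it. */
    static final class Command {
        final String name;
        final Map<String, String> params;

        Command(String name, Map<String, String> params) {
            this.name = name;
            this.params = params;
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder(name).append('(');
            String sep = "";
            for (Map.Entry<String, String> e : params.entrySet()) {
                sb.append(sep).append(e.getKey())
                  .append("=\"").append(e.getValue()).append('"');
                sep = ", ";
            }
            return sb.append(')').toString();
        }
    }

    /** Mirrors driver.findElement(By.name("q")) as a self-describing command. */
    static Command findElement(String using, String value) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("using", using);
        params.put("value", value);
        return new Command("findElement", params);
    }

    /** Mirrors element.sendKeys(...); the element travels as an ID, not as state. */
    static Command sendKeys(String elementId, String text) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("element", elementId);
        params.put("value", text);
        return new Command("sendKeys", params);
    }

    public static void main(String[] args) {
        System.out.println(findElement("name", "q"));
        System.out.println(sendKeys("webdriverID", "I love cheese"));
    }
}
```

Running main prints the same two SPI lines shown above, which makes the point that the second command is usable without any knowledge of the first beyond the element ID it was given.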

From here, the call is passed on using the JSON Wire Protocol, a simple client-server transport architecture that the WebDriver developers created to communicate with browsers. HTTP is still the main transport mechanism.

The JSON Wire Protocol

The WebDriver developers created a transport mechanism called the JSON Wire Protocol. This protocol can transport all the necessary information to the code that controls the browser, and it uses a REST-like API to communicate.
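To make the REST-like shape concrete, here is a minimal sketch of how the two commands from the earlier example could look as JSON Wire Protocol requests. It only builds the method, path, and JSON body as strings and sends nothing. The endpoint paths follow the published JSON Wire Protocol as I understand it; the session ID "1234", the element ID "0", and the names WireRequestSketch and Request are assumptions made for illustration.

```java
// Illustrative sketch only: builds request descriptions, performs no network I/O.
// WireRequestSketch and Request are hypothetical names, not Selenium classes.
final class WireRequestSketch {

    /** One HTTP request: method, path, and JSON body. */
    static final class Request {
        final String method;
        final String path;
        final String body;

        Request(String method, String path, String body) {
            this.method = method;
            this.path = path;
            this.body = body;
        }

        @Override
        public String toString() {
            return method + " " + path + " " + body;
        }
    }

    /** Minimal JSON string quoting; handles only backslashes and double quotes. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** findElement: the locator strategy and value travel in the body. */
    static Request findElement(String sessionId, String using, String value) {
        String body = "{\"using\":" + quote(using) + ",\"value\":" + quote(value) + "}";
        return new Request("POST", "/session/" + sessionId + "/element", body);
    }

    /** sendKeys: the element ID is part of the URL, the text goes in the body. */
    static Request sendKeys(String sessionId, String elementId, String text) {
        String body = "{\"value\":[" + quote(text) + "]}";
        return new Request("POST",
                "/session/" + sessionId + "/element/" + elementId + "/value", body);
    }

    public static void main(String[] args) {
        // "1234" and "0" are placeholder IDs chosen for this example.
        System.out.println(findElement("1234", "name", "q"));
        System.out.println(sendKeys("1234", "0", "I love cheese"));
    }
}
```

Because the session and element IDs are carried in the URL and everything else in the body, each request is self-contained, which fits the stateless SPI described earlier.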

The Selenium server

The Selenium server, or the browser, depending on which one is processing the request, takes the JSON Wire Protocol commands, breaks down the JSON object, and then does what it needs to. This part of the code depends on which browser it is running on.

As mentioned earlier, this could be done in the browser via C++ if it is Internet Explorer; if no such native mechanism is available, Selenium is injected into the browser. Refer to the following screenshot for other language/browser combinations:
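As a rough illustration of the receiving side, the following sketch shows how a server might map an incoming method and path to a command before handing it to browser-specific code. It is not how the Selenium server is actually implemented; the class name RouteSketch, the route patterns, and the returned strings are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: RouteSketch is a hypothetical name, and real
// servers handle many more routes, methods, and error cases.
final class RouteSketch {

    private static final Pattern FIND_ELEMENT =
            Pattern.compile("^/session/([^/]+)/element$");
    private static final Pattern SEND_KEYS =
            Pattern.compile("^/session/([^/]+)/element/([^/]+)/value$");

    /** Maps a request line to a description of the command it represents. */
    static String route(String method, String path) {
        if (!"POST".equals(method)) {
            return "unsupported";
        }
        Matcher m = SEND_KEYS.matcher(path);
        if (m.matches()) {
            // Session and element IDs are recovered from the URL itself.
            return "sendKeys(session=" + m.group(1) + ", element=" + m.group(2) + ")";
        }
        m = FIND_ELEMENT.matcher(path);
        if (m.matches()) {
            return "findElement(session=" + m.group(1) + ")";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(route("POST", "/session/1234/element"));
        System.out.println(route("POST", "/session/1234/element/0/value"));
    }
}
```

Once the command is identified, the browser-dependent part of the code takes over and carries it out in whatever way suits that browser.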

Figure: Selenium WebDriver supported languages and browsers

The preceding screenshot shows the languages that Selenium WebDriver currently supports. In simple words, these are the languages in which we can build a framework, which in turn interacts with Selenium WebDriver and works on various browsers and other devices. Selenium has a common API with a common set of commands, and there are bindings for the different languages, such as Java, Python, and Ruby. There are other bindings as well, and new bindings can be added very easily. Selenium WebDriver contains a set of common libraries that allow commands to be sent to the respective drivers.

On the right-hand side of the screenshot, we have the drivers. There are various browser-specific drivers (such as the IE, Firefox, and Chrome drivers) and others such as HtmlUnit, which is an interesting one: it works in headless mode, which makes test execution faster. There are mobile-specific drivers as well. The basic idea is that each of these drivers knows how to drive the browser that it corresponds to. So, the Chrome driver knows how to handle the low-level details of the Chrome browser and drives it to do things such as clicking a button, navigating to pages, and getting data from the browser itself; the same goes for Firefox, IE, and so on.
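The idea of one common command set with a driver per browser can be sketched with a plain Java interface. The classes below are stand-ins invented for this example (FakeChromeDriver, FakeHtmlUnitDriver); they are not the real Selenium drivers and simply return a description of what a real driver would do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: every name here is hypothetical and no browser
// is launched. Real drivers translate commands into browser-specific calls.
final class DriverSketch {

    /** The common command set that every driver understands. */
    interface BrowserDriver {
        String click(String elementId);
    }

    /** Stand-in for a driver that controls a real, visible browser. */
    static final class FakeChromeDriver implements BrowserDriver {
        @Override
        public String click(String elementId) {
            return "chrome: native click on " + elementId;
        }
    }

    /** Stand-in for a headless driver that never renders a window. */
    static final class FakeHtmlUnitDriver implements BrowserDriver {
        @Override
        public String click(String elementId) {
            return "htmlunit: simulated click on " + elementId;
        }
    }

    /** The same command is sent to each driver; each handles it its own way. */
    static List<String> clickEverywhere(List<BrowserDriver> drivers, String elementId) {
        List<String> results = new ArrayList<>();
        for (BrowserDriver driver : drivers) {
            results.add(driver.click(elementId));
        }
        return results;
    }

    public static void main(String[] args) {
        List<BrowserDriver> drivers =
                Arrays.asList(new FakeChromeDriver(), new FakeHtmlUnitDriver());
        for (String line : clickEverywhere(drivers, "submit-button")) {
            System.out.println(line);
        }
    }
}
```

The calling code never needs to know which browser is behind the interface, which is why the same test can, in principle, run against several browsers by swapping the driver.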