Background

In mid-2008, developers Michael Carter and Ian Hickson were feeling the pain and limitations of Comet particularly keenly whenever they tried to build anything truly robust with it. Collaborating on IRC and the W3C mailing lists, they hatched a plan to introduce a new standard for modern, real-time, two-way communication on the web, and in the process coined the name "WebSocket".

The idea made its way into the W3C HTML draft standard, and shortly afterwards Michael Carter wrote an article introducing the Comet community to WebSockets. In 2010, Google Chrome 4 became the first browser to ship full support for WebSockets, and other browser vendors followed suit over the next few years. In 2011, RFC 6455, "The WebSocket Protocol", was published on the IETF website.

Today, all major browsers fully support WebSockets, including Internet Explorer 10 and 11. Browsers on iOS and Android have also supported WebSockets since 2013, so the overall landscape for WebSocket support is very healthy. Most "Internet of Things" (IoT) devices also run some version of Android, so as of 2018 WebSocket support on other types of devices is fairly widespread as well.

What are WebSockets?

WebSockets are a thin transport layer built on top of a device's TCP/IP stack. The goal is to give web application developers something as close to a raw TCP communication layer as possible, while adding a few abstractions to smooth over some of the differences. They also cater to the fact that the web has additional security considerations that must be taken into account to protect both consumers and service providers.

You may have heard WebSockets described as both a "transport" and a "protocol". The former is more accurate: although they are a protocol in the sense that a strict set of rules must be followed to establish communication and to frame the transmitted data, the standard takes no position on how the actual data payload is structured. In fact, part of the specification provides for the client and server to agree on a protocol by which the transmitted data will be formatted and interpreted. The standard calls these "sub-protocols" to avoid ambiguity in terminology. Examples of sub-protocols are JSON, XML, MQTT and WAMP. These can govern not only how the data is structured, but also how communication must begin, continue and eventually terminate. As long as both parties understand what the protocol entails, anything goes. WebSocket provides only the transport layer over which that messaging process can take place, which is why most common sub-protocols are not exclusive to WebSocket-based communication.

Authentication and authorization

Think of WebSockets as a thin layer built on top of TCP/IP. Anything beyond the basic handshake and message framing specification has to be handled on a per-application or per-library basis. To quote the RFC:

"This protocol doesn't prescribe any particular way that servers can authenticate clients during the WebSocket handshake. The WebSocket server can use any client authentication mechanism available to a generic HTTP server, such as cookies, HTTP authentication, or TLS authentication."

In short, you can still use HTTP-based authentication methods, or use sub-protocols such as MQTT or WAMP, both of which provide authentication and authorization methods.
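As a rough illustration of the cookie-based option, the sketch below checks a session cookie before a server would go on to accept the upgrade. The helper names, the cookie name `session` and the in-memory `sessions` store are all assumptions made for this example; a real application would plug in its own session handling.

```javascript
// Hypothetical helper: extract a single cookie value from a Cookie header.
function getCookie(cookieHeader, name) {
  for (const pair of (cookieHeader || '').split(';')) {
    const idx = pair.indexOf('=');
    if (idx === -1) continue; // ignore malformed pairs in this sketch
    if (pair.slice(0, idx).trim() === name) return pair.slice(idx + 1).trim();
  }
  return null;
}

// `headers` uses lower-cased names, as Node's http module provides them.
// `sessions` is an assumed Map from session id to user name.
function authenticateUpgrade(headers, sessions) {
  const sessionId = getCookie(headers['cookie'], 'session');
  if (sessionId === null || !sessions.has(sessionId)) return null;
  return sessions.get(sessionId);
}

const knownSessions = new Map([['abc123', 'alice']]);
const user = authenticateUpgrade({ cookie: 'theme=dark; session=abc123' }, knownSessions);
console.log(user === null ? 'reject the handshake' : `accept the handshake for ${user}`);
```

If the function returns null, the server would typically answer with an ordinary HTTP error status instead of completing the upgrade.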

Connecting over HTTP

One of the early considerations when defining the WebSocket standard was to make sure it works well with the web. That meant recognizing that the web is generally addressed using URLs rather than IP addresses and port numbers, and that a WebSocket connection should be able to start with the same kind of HTTP-based initial handshake used for any other web request.

Here is what happens in a simple HTTP GET request.

Suppose there is an HTML page hosted at http://www.example.com. Without going too deep into the HTTP protocol itself, it is enough to know that a request must start with what is called a Request-Line, followed by a series of key-value header lines. Each header tells the server something about what to expect in the request payload that follows the header data, or about what kinds of response the client is able to understand.

The first token in the request is the HTTP method, which tells the server what type of operation the client is attempting with respect to the referenced URL. The GET method is used when the client is simply asking the server for a copy of the resource that the URL refers to.

A minimal example of a request header, formatted according to the HTTP RFC, looks like this:

GET /index.html HTTP/1.1
Host: www.example.com
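To make that structure concrete, the following sketch splits a raw request head into its Request-Line and headers. The helper name `parseRequestHead` is hypothetical and the parsing is deliberately simplified; it is an illustration, not a complete HTTP parser.

```javascript
// Hypothetical helper: split a raw HTTP request head into its parts.
function parseRequestHead(raw) {
  // The head ends at the first blank line; lines are separated by CRLF.
  const [head] = raw.split('\r\n\r\n');
  const [requestLine, ...headerLines] = head.split('\r\n');
  const [method, target, version] = requestLine.split(' ');

  const headers = {};
  for (const line of headerLines) {
    const idx = line.indexOf(':');
    if (idx === -1) continue; // skip malformed lines in this sketch
    // Header names are case-insensitive, so normalise them to lower case.
    headers[line.slice(0, idx).trim().toLowerCase()] = line.slice(idx + 1).trim();
  }
  return { method, target, version, headers };
}

const parsedRequest = parseRequestHead(
  'GET /index.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n'
);
console.log(parsedRequest.method, parsedRequest.target, parsedRequest.headers.host);
```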

After receiving the request header, the server formats a response header starting with a Status-Line, followed by a set of key-value header pairs that give the client supplementary information about the server's response to the request. The Status-Line tells the client the HTTP status code (usually 200 if there were no problems) and provides a short "reason" text explaining the status code. The key-value header pairs come next, followed by the actual data that was requested (unless the status code indicates that the request could not be fulfilled for some reason).

HTTP/1.1 200 OK
Date: Wed, 1 Aug 2018 16:03:29 GMT
Content-Length: 291
Content-Type: text/html
(additional headers...)

(response payload continues here...)

You may ask, what does this have to do with WebSockets?

Dropping HTTP for something more suitable

When an HTTP request is made and a response received, the actual two-way network communication takes place over an active TCP/IP socket. The web URL requested in the browser is mapped to an IP address via the global DNS system, and the default port for HTTP requests is 80. So although a web URL was entered into the browser, the actual communication happens over TCP/IP, using an IP address and port combination that looks something like 123.11.85.9:80.
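The step from URL to host and port can be sketched with the standard `URL` class. The helper name `endpointFor` is hypothetical, and the sketch stops short of the DNS lookup that turns the host name into an IP address. The default ports for the ws and wss schemes (80 and 443) are included for completeness.

```javascript
// Hypothetical helper: work out which host and port a URL would connect to.
function endpointFor(urlString) {
  const url = new URL(urlString);
  // Default ports for the schemes relevant here; other schemes yield undefined.
  const defaultPorts = { 'http:': 80, 'https:': 443, 'ws:': 80, 'wss:': 443 };
  // `url.port` is an empty string when the URL relies on the scheme's default port.
  const port = url.port === '' ? defaultPorts[url.protocol] : Number(url.port);
  return { host: url.hostname, port };
}

console.log(endpointFor('http://www.example.com/index.html'));
console.log(endpointFor('wss://example.org:8443/chat'));
```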

We now know that WebSockets are also built on top of the TCP stack, which means that all we need is a way for the client and server to agree to keep the socket connection open and reuse it for continuous communication. If they do, they can send and receive binary data.

To begin repurposing the TCP socket for WebSocket communication, the client can include standard request headers that were invented for exactly this kind of use case:

GET /index.html HTTP/1.1
Host: www.example.com
Connection: Upgrade
Upgrade: websocket

The Connection header tells the server that the client wants to negotiate a change in how the socket is being used. Its value, Upgrade, indicates that the transport protocol currently in use over TCP should change. Now that the server knows the client wants to upgrade the protocol in use on the active TCP socket, it knows to look for the corresponding Upgrade header, which tells it which transport protocol the client wants to use for the remaining lifetime of the connection. As soon as the server sees websocket as the value of the Upgrade header, it knows that a WebSocket handshake has begun.
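On the server side, that check might look something like the sketch below. The helper name `isWebSocketUpgrade` is hypothetical, and `headers` is assumed to be an object with lower-cased header names, which is how Node's http module exposes them.

```javascript
// Hypothetical helper: does this request ask to switch the socket to WebSocket?
function isWebSocketUpgrade(method, headers) {
  // Connection may carry several comma-separated tokens, e.g. "keep-alive, Upgrade".
  const connectionTokens = (headers['connection'] || '')
    .split(',')
    .map(token => token.trim().toLowerCase());
  const upgrade = (headers['upgrade'] || '').trim().toLowerCase();

  return method === 'GET'
    && connectionTokens.includes('upgrade')
    && upgrade === 'websocket';
}

console.log(isWebSocketUpgrade('GET', { connection: 'Upgrade', upgrade: 'websocket' }));
console.log(isWebSocketUpgrade('GET', { host: 'www.example.com' }));
```

Header values are compared case-insensitively here, so `WebSocket` and `websocket` are treated alike.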

Note that RFC 6455 describes the handshake process (and everything else) in full, if you would like more detail than this article covers.

Avoiding funny business

Beyond what is described above, the first part of the WebSocket handshake involves proving that this really is a proper WebSocket upgrade handshake, and that the process is not being circumvented or emulated, either by the client or by a deceptive proxy server sitting in the middle.

When initiating an upgrade to a WebSocket connection, the client must include a Sec-WebSocket-Key header with a value unique to that client. Here is an example:

Sec-WebSocket-Key: BOq0IliaPZlnbMHEBYtdjmKIL38=

If you use the WebSocket class provided by modern browsers, the above is handled for you automatically. You only need to look for the header on the server side and generate a response.

When responding, the server must append the special GUID value 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 to the key, generate a SHA-1 hash of the resulting string, and then include the base64-encoded hash as the value of a Sec-WebSocket-Accept header in the response:

Sec-WebSocket-Accept: 5fXT1W3UfPusBQv/h6c4hnwTJzk=

In the Node.js WebSocket server, we can write a function to generate this value, as shown below:

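const crypto = require('crypto');

// Fixed GUID that RFC 6455 defines for computing the accept value
const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// `acceptKey` is the value of the client's Sec-WebSocket-Key header
function generateAcceptValue (acceptKey) {
  return crypto
    .createHash('sha1')
    .update(acceptKey + WEBSOCKET_GUID, 'binary')
    .digest('base64');
}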

We then only need to call this function, passing the value of the Sec-WebSocket-Key header as the argument, and use the return value as the value of the Sec-WebSocket-Accept header when sending the response.

To complete the handshake, write the appropriate HTTP response headers to the client socket. A simple response looks like this:

HTTP/1.1 101 Web Socket Protocol Handshake
Upgrade: WebSocket
Connection: Upgrade
Sec-WebSocket-Accept: m9raz0Lr21hfqAitCxWigVwhppA=
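Putting the pieces together, a minimal sketch of building that response might look like this. `buildHandshakeResponse` is a hypothetical helper, and the accept-value function is repeated so that the example is self-contained. The reason phrase used here, "Switching Protocols", is the standard text for status 101; the phrase is informational, so other wording is also accepted by clients. The sample key is the one used in the handshake example in RFC 6455.

```javascript
const crypto = require('crypto');

const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

function generateAcceptValue(key) {
  return crypto.createHash('sha1').update(key + WEBSOCKET_GUID).digest('base64');
}

// Hypothetical helper: build the raw response head for a given Sec-WebSocket-Key.
function buildHandshakeResponse(key) {
  const lines = [
    'HTTP/1.1 101 Switching Protocols',
    'Upgrade: websocket',
    'Connection: Upgrade',
    `Sec-WebSocket-Accept: ${generateAcceptValue(key)}`,
  ];
  // Every line ends with CRLF, and an empty line terminates the header block.
  return lines.join('\r\n') + '\r\n\r\n';
}

// Sample key from RFC 6455; its accept value is s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
console.log(buildHandshakeResponse('dGhlIHNhbXBsZSBub25jZQ=='));
```

In a Node.js server, the resulting string would be written to the client socket as-is, after which the socket carries WebSocket frames rather than HTTP.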

We are not quite done with the handshake yet, though; there are a few more things to consider.

Sub-protocols: a shared language

Clients and servers usually need to agree on a compatible strategy for formatting, interpreting and organizing data, both within a given message and over time from one message to the next. This is where the sub-protocols mentioned earlier come in. If the client knows it can handle one or more specific application-level protocols (such as WAMP or MQTT), it can include a list of the protocols it understands when making the initial HTTP request. If it does so, the server is expected to choose one of those protocols and include it in a response header; otherwise it must fail the handshake and terminate the connection.

Example of subprotocol request header:

Sec-WebSocket-Protocol: mqtt, wamp

Example of the corresponding header sent by the server in the response:

Sec-WebSocket-Protocol: wamp

Note that the server must select exactly one protocol from the list provided by the client. Selecting more than one would mean that the server could not reliably or consistently interpret the data in subsequent WebSocket messages. Consider, for example, a server that selected both json-ld and json-schema. Both are data formats built on the JSON standard, and there would be many edge cases in which one could be interpreted as the other, leading to unexpected errors when processing the data. Admittedly these are not messaging protocols per se, but the example still applies.

When both the client and the server are implemented to use a common messaging protocol from the outset, the Sec-WebSocket-Protocol header can be omitted from the initial request, in which case the server can skip this step. Sub-protocol negotiation is most useful when implementing general-purpose services, infrastructure and tools, where there is no advance guarantee that the client and server will understand each other once the WebSocket connection is established.

Standardized names for common protocols should be registered with the IANA registry of WebSocket sub-protocol names. At the time of writing, 36 names have been registered, including soap, xmpp, wamp and mqtt. Although the registry is the canonical source for mapping a sub-protocol name to its interpretation, the only strict requirement is that the client and server agree on what their mutually chosen sub-protocol actually means, whether or not it appears in the IANA registry.

Note that if the client requests the use of a sub-protocol but does not offer any that the server can support, the server must send a failure response and close the connection.
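The negotiation rule can be sketched as a small function. The name `selectSubprotocol` and the shape of its return value are assumptions made for this example; `supported` is the server's own list, in order of preference, and names are compared as exact strings.

```javascript
// Hypothetical helper: choose at most one sub-protocol from the client's offer.
function selectSubprotocol(headerValue, supported) {
  // The header carries a comma-separated list, e.g. "mqtt, wamp".
  const offered = (headerValue || '')
    .split(',')
    .map(name => name.trim())
    .filter(name => name !== '');
  // Pick the first protocol the server prefers that the client also offered.
  const selected = supported.find(name => offered.includes(name)) || null;
  return { offered, selected };
}

const negotiation = selectSubprotocol('mqtt, wamp', ['wamp', 'soap']);
if (negotiation.offered.length > 0 && negotiation.selected === null) {
  console.log('No common sub-protocol: fail the handshake');
} else {
  console.log('Sec-WebSocket-Protocol:', negotiation.selected);
}
```

When the client sends no Sec-WebSocket-Protocol header at all, `offered` is empty and the server can simply proceed without one.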

WebSocket extensions

There is also a header for defining extensions to how the data payload is encoded and framed, but at the time of writing only one standardized extension type exists, which provides a WebSocket equivalent of gzip compression for messages. Another example of where extensions might come into play is multiplexing: using a single socket to interleave multiple concurrent streams of communication.

WebSocket extensions are a somewhat advanced topic and beyond the scope of this article. For now, it is enough to know what they are and how they fit into the picture.

Client side: using WebSockets in the browser

The WebSocket API is defined in the WHATWG HTML Living Standard and is very simple to use. Constructing a WebSocket takes one line of code:

const ws = new WebSocket('ws://example.org');

Note the use of ws where you would normally see the http scheme. You can also use wss where you would normally use https. These schemes were introduced along with the WebSocket specification to denote an HTTP connection that includes a request to upgrade the connection to use WebSockets.

Creating the WebSocket object itself does not do much. The connection is established asynchronously, so you need to listen for the completion of the handshake before sending any messages, and also include a listener for messages received from the server:

ws.addEventListener('open', () => {
  // Send a message to the WebSocket server
  ws.send('Hello!');
});

ws.addEventListener('message', event => {
  // The `event` object is a typical DOM event object, and the message data sent
  // by the server is stored in the `data` property
  console.log('Received:', event.data);
});

There are also error and close events. WebSockets do not automatically recover when a connection is terminated; this is something you need to implement yourself, and it is one of the reasons there are so many client libraries. Although the WebSocket class is simple and easy to use, it really is just a basic building block. Support for additional features such as different sub-protocols or messaging channels must be implemented separately.
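A minimal sketch of such recovery logic, assuming a simple exponential backoff, is shown below. The function names, the delay values and the extra `WebSocketImpl` parameter are all choices made for this example; the parameter exists only so the sketch can be exercised outside a browser.

```javascript
// Hypothetical helper: the delay doubles with each failed attempt, up to a cap.
function backoffDelay(attempt, baseDelayMs = 500, maxDelayMs = 30000) {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
}

// Opens a connection and schedules a new one whenever it closes.
// `WebSocketImpl` defaults to the browser's WebSocket class.
function connectWithRetry(url, onMessage, WebSocketImpl = WebSocket, attempt = 0) {
  const ws = new WebSocketImpl(url);

  ws.addEventListener('open', () => {
    attempt = 0; // a successful connection resets the backoff
  });
  ws.addEventListener('message', onMessage);
  ws.addEventListener('close', () => {
    setTimeout(
      () => connectWithRetry(url, onMessage, WebSocketImpl, attempt + 1),
      backoffDelay(attempt)
    );
  });
  return ws;
}

// Usage in a browser:
// connectWithRetry('ws://example.org', event => console.log('Received:', event.data));
```

This sketch retries forever and does not distinguish a deliberate close from a dropped connection, both of which a real client would probably want to handle.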

Generate and parse WebSocket message frames

Once the handshake response is sent to the client, the client and server can start communicating using the sub-protocol of their choice (if any).

WebSocket messages are delivered in packets called "frames", each of which begins with a message header and ends with the "payload", the message data for that frame. Large messages may be split across several frames, in which case you need to keep track of what has been received so far and piece the data together once all of it has arrived.
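As a rough sketch of what reading one frame involves, the function below follows the frame layout given in RFC 6455: a FIN bit and opcode in the first byte, a mask bit and payload length in the second, an optional extended length, and an optional 4-byte masking key. The name `parseFrame` is hypothetical, and the sketch assumes the buffer holds exactly one complete frame. The sample bytes are the masked "Hello" text frame from the RFC's examples.

```javascript
// Hypothetical helper: parse one complete frame held in a Node.js Buffer.
function parseFrame(buffer) {
  const fin = (buffer[0] & 0x80) !== 0;    // is this the final fragment of the message?
  const opcode = buffer[0] & 0x0f;         // 0x1 = text, 0x2 = binary, 0x8 = close
  const masked = (buffer[1] & 0x80) !== 0; // frames sent by clients are masked
  let length = buffer[1] & 0x7f;
  let offset = 2;

  if (length === 126) {
    // The real length is in the next 2 bytes.
    length = buffer.readUInt16BE(offset);
    offset += 2;
  } else if (length === 127) {
    // The real length is in the next 8 bytes; assumed to fit a safe integer.
    length = Number(buffer.readBigUInt64BE(offset));
    offset += 8;
  }

  let payload;
  if (masked) {
    const key = buffer.subarray(offset, offset + 4);
    offset += 4;
    // Copy the payload, then XOR each byte with the repeating 4-byte key.
    payload = Buffer.from(buffer.subarray(offset, offset + length));
    for (let i = 0; i < payload.length; i++) payload[i] ^= key[i % 4];
  } else {
    payload = buffer.subarray(offset, offset + length);
  }
  return { fin, opcode, payload };
}

const sampleFrame = Buffer.from([0x81, 0x85, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x9f, 0x4d, 0x51, 0x58]);
console.log(parseFrame(sampleFrame).payload.toString('utf8')); // "Hello"
```

For a fragmented message, the payloads of successive frames would be collected until a frame with `fin` set arrives, and then concatenated.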
