sosplice, somove - splice two sockets for zero-copy data transfer
int sosplice(struct socket *so, int fd, off_t max, struct timeval *tv);
int somove(struct socket *so, int wait);
The function sosplice() is used to splice together a source and a drain socket. The source socket is
passed as the so argument; the file descriptor of the
drain is passed in fd. If fd is
negative, an existing splicing gets dissolved. If max
is positive, at most that many bytes will get transferred. If
tv is not NULL, a
timeout(9) is scheduled to dissolve splicing in the case when no data
can be transferred for the specified period of time. Socket splicing can be
invoked from user-land via the
setsockopt(2) system-call at the
SOL_SOCKET level with the socket option SO_SPLICE.
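A minimal user-land sketch of such a call follows. It assumes the option name SO_SPLICE and the struct splice layout (sp_fd, sp_max, sp_idle) as found in OpenBSD's <sys/socket.h>; the helper name splice_sockets() is hypothetical. On systems that do not define SO_SPLICE, the sketch simply reports failure.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: splice source_fd into drain_fd.
 * max == 0 means no byte limit; idle == NULL means no idle timeout.
 * Returns 0 on success, -1 with errno set on failure.
 */
int
splice_sockets(int source_fd, int drain_fd, off_t max,
    const struct timeval *idle)
{
#ifdef SO_SPLICE
	struct splice sp;	/* assumed layout: sp_fd, sp_max, sp_idle */

	memset(&sp, 0, sizeof(sp));
	sp.sp_fd = drain_fd;
	sp.sp_max = max;
	if (idle != NULL)
		sp.sp_idle = *idle;
	return setsockopt(source_fd, SOL_SOCKET, SO_SPLICE, &sp, sizeof(sp));
#else
	/* Assumption: without SO_SPLICE the request cannot be expressed. */
	(void)source_fd;
	(void)drain_fd;
	(void)max;
	(void)idle;
	errno = ENOPROTOOPT;
	return -1;
#endif
}
```

Passing a negative drain_fd would correspond to dissolving an existing splicing, as described for the fd argument above.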
Before connecting both sockets, several checks are executed. See the ERRORS section for possible failures. The connection between both sockets is implemented by setting these additional fields in struct socket:
- struct socket *so_splice links from the source to the drain socket.
- struct socket *so_spliceback links back from the drain to the source socket.
- off_t so_splicelen counts the number of bytes spliced so far from this socket.
- off_t so_splicemax specifies the maximum number of bytes to splice from this socket if non-zero.
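The linkage formed by these fields can be sketched in user-land with a stand-in structure. This is not the kernel code: struct mock_socket, mock_splice() and mock_unsplice() are hypothetical names, only the four fields listed above are modeled, and the error numbers returned for each check are an assumption made for illustration.

```c
#include <sys/types.h>
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for struct socket; only the splice fields exist. */
struct mock_socket {
	struct mock_socket *so_splice;		/* source -> drain */
	struct mock_socket *so_spliceback;	/* drain -> source */
	off_t so_splicelen;			/* bytes spliced so far */
	off_t so_splicemax;			/* byte limit, 0 = none */
};

/* Link src to drain. Returns 0 or an error number (assumed values). */
int
mock_splice(struct mock_socket *src, struct mock_socket *drain, off_t max)
{
	if (max < 0)
		return EINVAL;
	if (src->so_splice != NULL || drain->so_spliceback != NULL)
		return EBUSY;
	src->so_splice = drain;
	drain->so_spliceback = src;
	src->so_splicelen = 0;
	src->so_splicemax = max;
	return 0;
}

/* Dissolve an existing splicing; a no-op if src is not spliced. */
void
mock_unsplice(struct mock_socket *src)
{
	if (src->so_splice == NULL)
		return;
	/* The back link must point at the socket being unspliced. */
	assert(src->so_splice->so_spliceback == src);
	src->so_splice->so_spliceback = NULL;
	src->so_splice = NULL;
}
```

Keeping both a forward and a back pointer lets either end find its peer, which matters because transfers can be triggered by new data on the source or by new space on the drain.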
After connecting both sockets, sosplice() calls
somove() to transfer the mbufs already in the
source receive buffer to the drain send buffer. Finally the socket buffer
flag SB_SPLICE is set on both socket buffers, to
indicate that the protocol layer has to call
somove() whenever data or space is available.
The function somove() transfers data from the source's receive buffer to the drain's send buffer.
It must be called at
splsoftnet(9) and so must be a spliced
source socket. It may be necessary to split an mbuf to handle out-of-band
data inline or when the maximum splice length has been reached. If
wait is M_WAIT, splitting mbufs will always succeed. For M_DONTWAIT the
out-of-band property might get lost or a short splice might happen. In the
latter case, less than the given maximum number of bytes are transferred and
user-land has to cope with this. Note that a short splice cannot happen if
somove() was called by
sosplice(). So a second
setsockopt(2) after a short splice pointing to the same
maximum will always succeed.
Before transferring data, somove()
checks both sockets for errors and that the drain socket is connected. If
the drain cannot send anymore, an
EPIPE error is set
on the source socket. The data length to move is limited by the optional
maximum splice length and the space in the drain's send socket buffer. Up to
this amount of data is taken out of the source's receive socket buffer.
If the maximum splice length has been reached, an mbuf may get
split. Otherwise an mbuf is either moved completely to the send buffer or
left in the receive buffer for later processing. If SO_OOBINLINE is set,
out-of-band data will get moved as such although this might not be reliable.
The data is sent out to the drain socket via the protocol function. If that
fails and the drain socket cannot send anymore, an
EPIPE error is set on the source socket.
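The length limit described above can be sketched as a small calculation. The helper splice_move_len() and its parameter names are hypothetical, and the sketch counts plain bytes: it ignores mbuf boundaries, so it does not model the rule that an mbuf is moved whole unless the maximum has been reached.

```c
#include <sys/types.h>
#include <assert.h>

/*
 * Hypothetical helper: how many bytes one transfer step may move.
 *   rcv_avail  bytes queued in the source's receive buffer
 *   snd_space  free space in the drain's send buffer
 *   splicelen  bytes spliced so far
 *   splicemax  optional maximum, 0 meaning no limit
 */
off_t
splice_move_len(off_t rcv_avail, off_t snd_space, off_t splicelen,
    off_t splicemax)
{
	off_t len = rcv_avail;

	assert(rcv_avail >= 0 && snd_space >= 0 && splicelen >= 0);

	/* Never move more than the drain can accept. */
	if (len > snd_space)
		len = snd_space;

	/* Never exceed what is left of the optional maximum. */
	if (splicemax > 0) {
		off_t left = splicemax > splicelen ? splicemax - splicelen : 0;

		if (len > left)
			len = left;
	}
	return len;
}
```

For instance, with 1000 bytes queued, ample drain space, and 900 of a 1000 byte maximum already spliced, only the remaining 100 bytes are eligible, which is the case where an mbuf may have to be split.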
If the idle timeout was specified and no data was transferred for
that period of time, splicing gets dissolved and an
ETIMEDOUT error is set on the source socket.
Finally the socket splicing gets dissolved if the source socket cannot receive anymore and its receive buffer is empty; or if the drain socket cannot send anymore; or if the maximum has been reached; or if an error occurred.
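The dissolve conditions listed above can be written as a predicate. This is a sketch under stated assumptions: struct splice_state and both function names are hypothetical, and the socket state is reduced to a handful of fields. The second function mirrors the convention that somove() returns 0 when splicing has finished and 1 when it continues.

```c
#include <sys/types.h>
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the state relevant to dissolving. */
struct splice_state {
	bool	src_cantrcvmore;	/* source cannot receive anymore */
	off_t	src_rcv_bytes;		/* bytes left in source receive buffer */
	bool	drain_cantsendmore;	/* drain cannot send anymore */
	off_t	splicelen;		/* bytes spliced so far */
	off_t	splicemax;		/* maximum, 0 meaning no limit */
	int	error;			/* non-zero if an error occurred */
};

/* True if any of the dissolve conditions holds. */
bool
splice_should_dissolve(const struct splice_state *st)
{
	assert(st != NULL);
	if (st->error != 0)
		return true;
	if (st->src_cantrcvmore && st->src_rcv_bytes == 0)
		return true;
	if (st->drain_cantsendmore)
		return true;
	if (st->splicemax > 0 && st->splicelen >= st->splicemax)
		return true;
	return false;
}

/* 0 if splicing has finished, 1 if it continues. */
int
mock_somove_result(const struct splice_state *st)
{
	return splice_should_dissolve(st) ? 0 : 1;
}
```

Note that a source which cannot receive anymore does not end the splicing by itself; the data still queued in its receive buffer is drained first.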
If the socket buffer flag
SB_SPLICE is set, the functions sorwakeup() and sowwakeup() will call
somove() to trigger the transfer when new
data or buffer space is available. While socket splicing is active, any
read(2) from the source socket will block and the wakeup will not be
delivered to the file descriptor. A read event is signaled to user-land
after splicing has finished.
sosplice() returns 0 on success and
otherwise the error number.
somove() returns 0 if
socket splicing has been finished and 1 if it continues.
sosplice() will succeed unless:
- The given file descriptor fd is not an active descriptor.
- The source or the drain socket is already spliced.
- The given maximum value max is negative.
- The source or the drain socket is neither connected nor in the process of connecting to a peer.
- The given file descriptor fd is not a socket.
- The source or the drain socket is a listen socket.
- The source socket's protocol layer does not have the PR_SPLICE flag set. At the moment only TCP supports socket splicing.
- The drain socket's protocol does not have the same pr_usrreq function as the source.
- The source socket is non-blocking and the receive buffer is already locked.
setsockopt(2), options(4), timeout(9)
Socket splicing first appeared in OpenBSD 4.9.
The idea for socket splicing originally came from Markus Friedl ⟨email@example.com⟩, and Alexander Bluhm ⟨firstname.lastname@example.org⟩ implemented it.