sosplice, somove - splice two sockets for zero-copy data transfer
The function sosplice() is used to splice together a source and a drain
socket. The source socket is passed as the so argument; the file descriptor
of the drain is passed in fd. If fd is negative, an existing splicing gets
dissolved. If max is positive, at most that many bytes will get transferred.
If tv is not NULL, a timeout(9) is scheduled to dissolve splicing in the case
when no data can be transferred for the specified period of time. Socket
splicing can be invoked from userland via the setsockopt(2) system call at the
SOL_SOCKET level with the socket option SO_SPLICE.
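As an illustration of the userland entry point, the following is a minimal sketch, not taken from this page. It assumes the OpenBSD struct splice layout (sp_fd, sp_max, sp_idle) from <sys/socket.h>; the helper name splice_sockets() is hypothetical. On systems without SO_SPLICE the helper simply fails.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: splice source socket src to drain socket drain.
 * max_bytes == 0 requests no byte limit; idle == NULL requests no idle
 * timeout (assumption: an all-zero sp_idle disables the timeout).
 * Returns 0 on success, -1 with errno set on failure.
 */
int
splice_sockets(int src, int drain, off_t max_bytes, const struct timeval *idle)
{
#ifdef SO_SPLICE
	struct splice sp;

	memset(&sp, 0, sizeof(sp));
	sp.sp_fd = drain;		/* file descriptor of the drain */
	sp.sp_max = max_bytes;		/* maximum number of bytes to move */
	if (idle != NULL)
		sp.sp_idle = *idle;	/* idle timeout */
	return setsockopt(src, SOL_SOCKET, SO_SPLICE, &sp, sizeof(sp));
#else
	/* Socket splicing is not available on this platform. */
	(void)src;
	(void)drain;
	(void)max_bytes;
	(void)idle;
	errno = ENOPROTOOPT;
	return -1;
#endif
}
```

A relay would typically call the helper once per direction, for example splice_sockets(client, server, 0, NULL) followed by splice_sockets(server, client, 0, NULL).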
Before connecting both sockets, several checks are executed. See the ERRORS
section for possible failures. The connection between both sockets is
implemented by setting these additional fields in struct socket:
- struct socket *so_splice links from the source to the drain socket.
- struct socket *so_spliceback links back from the drain to the source
  socket.
- off_t so_splicelen counts the number of bytes spliced so far from this
  socket.
- off_t so_splicemax specifies the maximum number of bytes to splice from
  this socket if non-zero.
- struct timeval so_idletv specifies the maximum idle time.
- struct timeout so_idleto provides storage for the kernel timeout if idle
  time is used.
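The linkage described by these fields can be pictured with a small userland model. This is a sketch only: struct model_socket and both functions are hypothetical and are not the kernel's struct socket; the off_t types are an assumption, and so_idleto is left out because struct timeout exists only in the kernel.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/time.h>

/* Hypothetical userland model of the splice-related fields. */
struct model_socket {
	struct model_socket *so_splice;		/* source -> drain */
	struct model_socket *so_spliceback;	/* drain -> source */
	off_t so_splicelen;			/* bytes spliced so far */
	off_t so_splicemax;			/* byte limit, 0 = none */
	struct timeval so_idletv;		/* maximum idle time */
};

/*
 * Link src to drain. Returns 0 on success, -1 if max is negative or
 * either socket is already part of a splice in that role.
 */
int
model_splice(struct model_socket *src, struct model_socket *drain, off_t max)
{
	if (max < 0)
		return -1;
	if (src->so_splice != NULL || drain->so_spliceback != NULL)
		return -1;
	src->so_splice = drain;
	drain->so_spliceback = src;
	src->so_splicelen = 0;
	src->so_splicemax = max;
	return 0;
}

/* Dissolve the link again; a no-op if src is not spliced. */
void
model_unsplice(struct model_socket *src)
{
	if (src->so_splice == NULL)
		return;
	src->so_splice->so_spliceback = NULL;
	src->so_splice = NULL;
}
```

Keeping the forward pointer on the source and the back pointer on the drain lets either side find its peer when data arrives or buffer space frees up.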
After connecting both sockets, sosplice() calls somove() to transfer the
mbufs already in the source receive buffer to the drain send buffer. Finally
the socket buffer flag SB_SPLICE is set on both socket buffers, to indicate
that the protocol layer has to call somove() whenever data or space is
available.
The function somove() transfers data from the source's receive buffer to the
drain's send buffer. It must be called at splsoftnet(9) and so must be a
spliced source socket. It may be necessary to split an mbuf to handle
out-of-band data inline or when the maximum splice length has been reached. If
wait is M_WAIT, splitting mbufs will always succeed. For M_DONTWAIT the
out-of-band property might get lost or a short splice might happen. In the
latter case, less than the given maximum number of bytes are transferred and
userland has to cope with this. Note that a short splice cannot happen if
somove() was called by sosplice(). So a second setsockopt(2) after a short
splice pointing to the same maximum will always succeed.
Before transferring data, somove() checks both sockets for errors and that
the drain socket is connected. If the drain cannot send anymore, an EPIPE
error is set on the source socket. The data length to move is limited by the
optional maximum
splice length and the space in the drain's send socket buffer. Up to this
amount of data is taken out of the source's receive socket buffer. To avoid
splicing loops created by userland, the number of times an mbuf may be moved
between sockets is limited to 128.
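The two limits in this paragraph can be sketched as plain arithmetic. The functions below are a hypothetical userland model, not kernel code; the parameter names are made up for illustration, and the exact boundary of the 128-move limit is an assumption.

```c
#include <sys/types.h>

#define MODEL_MAX_MOVES 128	/* move limit stated in the text */

/*
 * Bytes that one pass may move: bounded by the data available in the
 * source's receive buffer, the remaining splice maximum (if any) and
 * the free space in the drain's send buffer.
 */
off_t
model_move_len(off_t rcv_avail, off_t snd_space, off_t splicemax,
    off_t splicelen)
{
	off_t len = rcv_avail;

	if (splicemax > 0) {
		off_t remaining = splicemax - splicelen;

		if (remaining < 0)
			remaining = 0;
		if (len > remaining)
			len = remaining;
	}
	if (len > snd_space)
		len = snd_space;
	if (len < 0)
		len = 0;
	return len;
}

/*
 * Returns 1 if an mbuf that has already been moved "moves" times may
 * be moved once more, 0 if the loop limit has been reached.
 */
int
model_may_move(unsigned int moves)
{
	return moves < MODEL_MAX_MOVES;
}
```

For example, with 1000 bytes queued, 5000 bytes of drain space, a maximum of 300 and 100 bytes already spliced, model_move_len() yields 200.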
For atomic protocols, either one complete packet is taken out, or nothing is
taken at all if: the packet is bigger than the drain's send buffer size, in
which case the splicing gets aborted with an EMSGSIZE error; the packet does
not fit into the drain's current send buffer space, in which case it is left
in the source's receive buffer for later processing; or the maximum splice
length is located within a packet, in which case splicing gets dissolved like
a short splice. All address or control mbufs associated with the taken packet
are dropped.
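The three all-or-nothing cases for atomic protocols can be written as a small decision function. This is an illustrative sketch with hypothetical names; the order in which the kernel evaluates the conditions is an assumption taken from the order of the text.

```c
#include <stddef.h>

/* Hypothetical outcomes for one packet of an atomic protocol. */
enum model_atomic_action {
	MODEL_TAKE,	/* move the complete packet */
	MODEL_ABORT,	/* packet can never fit: abort splicing with an error */
	MODEL_DEFER,	/* leave the packet in the receive buffer for later */
	MODEL_DISSOLVE	/* maximum lies inside the packet: dissolve splicing */
};

/*
 * pkt_len:       length of the next packet in the source's receive buffer
 * snd_bufsize:   total size of the drain's send buffer
 * snd_space:     space currently free in the drain's send buffer
 * has_max:       non-zero if a maximum splice length is set
 * max_remaining: bytes left until the maximum splice length is reached
 */
enum model_atomic_action
model_atomic_decide(size_t pkt_len, size_t snd_bufsize, size_t snd_space,
    int has_max, size_t max_remaining)
{
	if (pkt_len > snd_bufsize)
		return MODEL_ABORT;
	if (pkt_len > snd_space)
		return MODEL_DEFER;
	if (has_max && pkt_len > max_remaining)
		return MODEL_DISSOLVE;
	return MODEL_TAKE;
}
```

A packet is therefore never split: it is moved whole, kept whole, or it ends the splice.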
If the maximum splice length has been reached, an mbuf may get split for
non-atomic protocols. Otherwise an mbuf is either moved completely to the send
buffer or left in the receive buffer for later processing. If SO_OOBINLINE is
set, out-of-band data will get moved as such although this might not be
reliable. The data is sent out to the drain socket via the PRU_SEND protocol
function. If that fails and the drain socket cannot send anymore, an EPIPE
error is set on the source socket.
For packet oriented protocols somove() iterates over the next packet queue.
If a maximum splice length was specified and at least this amount of data has
been received from the drain socket, splicing gets dissolved. In this case, an
EFBIG error is set on the source socket if the maximum amount of data has been
transferred. Userland can process this error to distinguish the full splice
from a short splice or to react to the completed maximum splice immediately.
If an idle timeout was specified and no data has been transferred for that
period of time, the handler soidle() dissolves splicing and sets an ETIMEDOUT
error on the source socket.
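From userland, the reason a splice ended can be read from the source socket's pending error. The sketch below uses the standard getsockopt(2) SO_ERROR option; the helper names and the enum are hypothetical, and the mapping assumes that a completed maximum is reported as EFBIG and an idle timeout as ETIMEDOUT.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>

/* Hypothetical classification of how a splice ended. */
enum splice_end {
	SPLICE_END_NOERROR,	/* no error: short splice or end of data */
	SPLICE_END_FULL,	/* maximum splice length transferred */
	SPLICE_END_IDLE,	/* idle timeout fired */
	SPLICE_END_ERROR	/* any other socket error */
};

/* Map a pending socket error to the reason the splice ended. */
enum splice_end
classify_splice_end(int so_error)
{
	switch (so_error) {
	case 0:
		return SPLICE_END_NOERROR;
	case EFBIG:
		return SPLICE_END_FULL;
	case ETIMEDOUT:
		return SPLICE_END_IDLE;
	default:
		return SPLICE_END_ERROR;
	}
}

/*
 * Fetch and clear the pending error of socket fd into *so_error.
 * Returns 0 on success, -1 with errno set if getsockopt(2) fails.
 */
int
fetch_so_error(int fd, int *so_error)
{
	socklen_t len = sizeof(*so_error);

	return getsockopt(fd, SOL_SOCKET, SO_ERROR, so_error, &len);
}
```

A caller would run fetch_so_error() on the source socket once it becomes readable again and pass the result to classify_splice_end().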
The function sounsplice() is called to dissolve the
socket splicing if the source socket cannot receive anymore and its receive
buffer is empty; or if the drain socket cannot send anymore; or if the maximum
has been reached; or if an error occurred; or if the idle timeout has fired.
If the socket buffer flag SB_SPLICE is set, the functions sorwakeup() and
sowwakeup() will call somove() to trigger the transfer when new data or
buffer space is available. While socket splicing is active, any read(2)
from the source socket
will block and the wakeup will not be delivered to the file descriptor. A read
event or a socket error is signaled to userland after dissolving.
sosplice() returns 0 on success and otherwise the error number. somove()
returns 0 if socket splicing has been finished and 1 if it continues.
sosplice() will succeed unless:
- [EBADF] The given file descriptor fd is not an active descriptor.
- [EBUSY] The source or the drain socket is already spliced.
- [EINVAL] The given maximum value max is negative.
- [ENOTCONN] The source socket requires a connection and is neither
  connected nor in the process of connecting to a peer.
- [ENOTCONN] The drain socket is neither connected nor in the process of
  connecting to a peer.
- [ENOTSOCK] The given file descriptor fd is not a socket.
- [EOPNOTSUPP] The source or the drain socket is a listen socket.
- [EPROTONOSUPPORT] The source socket's protocol layer does not have the
  PR_SPLICE flag set. Only TCP and UDP socket splicing is supported.
- [EPROTONOSUPPORT] The drain socket's protocol does not have the same
  pr_usrreq function as the source.
- [EWOULDBLOCK] The source socket is non-blocking and the receive buffer is
  already locked.
Socket splicing for TCP first appeared in OpenBSD 4.9; support for UDP was
added in OpenBSD 5.3.
The idea for socket splicing originally came from Markus Friedl, and
Alexander Bluhm implemented it. Mike Belopuhov added the timeout feature.