|SOSPLICE(9)||Kernel Developer's Manual||SOSPLICE(9)|
sosplice, somove - splice two sockets for zero-copy data transfer
int sosplice(struct socket *so, int fd, off_t max, struct timeval *tv);
int somove(struct socket *so, int wait);
sosplice() is used to splice
together a source and a drain socket. The source socket is passed as the
so argument; the file descriptor of the drain is
passed in fd. If fd is negative,
an existing splicing gets dissolved. If max is
positive, at most that many bytes will get transferred. If
tv is not NULL, a
timeout(9) is scheduled to dissolve
splicing in the case when no data can be transferred for the specified
period of time. Socket splicing can be invoked from userland via the
setsockopt(2) system-call at the
SOL_SOCKET level with the socket option SO_SPLICE.
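From userland, a splice request might look like the following sketch. It assumes the OpenBSD struct splice layout (sp_fd, sp_max, sp_idle) from <sys/socket.h>. The helper name splice_sockets() is hypothetical, and on systems without SO_SPLICE the helper simply fails with ENOPROTOOPT.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: splice the source socket src into the drain
 * socket drain.  A negative drain dissolves an existing splicing,
 * max == 0 means no byte limit, idle == NULL means no idle timeout.
 * Returns 0 on success, -1 with errno set on failure.
 */
int
splice_sockets(int src, int drain, off_t max, const struct timeval *idle)
{
#ifdef SO_SPLICE
	struct splice sp;	/* assumed OpenBSD layout */

	memset(&sp, 0, sizeof(sp));
	sp.sp_fd = drain;
	sp.sp_max = max;
	if (idle != NULL)
		sp.sp_idle = *idle;
	return setsockopt(src, SOL_SOCKET, SO_SPLICE, &sp, sizeof(sp));
#else
	/* Platform without socket splicing support. */
	(void)src;
	(void)drain;
	(void)max;
	(void)idle;
	errno = ENOPROTOOPT;
	return -1;
#endif
}
```

A bidirectional relay would call the helper twice, once for each direction.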
Before connecting both sockets, several checks are executed. See the ERRORS section for possible failures. The connection between both sockets is implemented by setting these additional fields in the struct sosplice *so_sp field in struct socket:
After connecting both sockets, sosplice() calls
somove() to transfer the mbufs already in the
source receive buffer to the drain send buffer. Finally the socket buffer flag
SB_SPLICE is set on both socket buffers, to
indicate that the protocol layer has to call
somove() whenever data or space is available.
somove() transfers data from
the source's receive buffer to the drain's send buffer. It must be called at
so must be a spliced source socket. It may be
necessary to split an mbuf to handle out-of-band data inline or when the
maximum splice length has been reached. If wait is
M_WAIT, splitting mbufs will always succeed. For
M_DONTWAIT the out-of-band property might get lost
or a short splice might happen. In the latter case, less than the given
maximum number of bytes are transferred and userland has to cope with this.
Note that a short splice cannot happen if somove()
was called by
sosplice(). So a second
setsockopt(2) after a short splice
pointing to the same maximum will always succeed.
Before transferring data, somove() checks
both sockets for errors and that the drain socket is connected. If the drain
cannot send anymore, an
EPIPE error is set on the
source socket. The data length to move is limited by the optional maximum
splice length and the space in the drain's send socket buffer. Up to this
amount of data is taken out of the source's receive socket buffer. To avoid
splicing loops created by userland, the number of times an mbuf may be moved
between sockets is limited to 128.
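The two limits described above can be sketched as a small userland model. This is not the kernel code: struct toy_mbuf, splice_limit(), splice_hop() and SPLICE_HOP_LIMIT are hypothetical names, and only the value 128 is taken from the text.

```c
#include <stddef.h>

#define SPLICE_HOP_LIMIT 128	/* value from the text, name hypothetical */

/* Toy stand-in for an mbuf: payload length and a move counter. */
struct toy_mbuf {
	size_t len;
	unsigned int hops;
};

/*
 * Bytes that may be moved in one pass: bounded by the data available
 * in the source receive buffer, the free space in the drain send
 * buffer, and the bytes left until the maximum splice length
 * (maxleft == 0 means no maximum was given).
 */
size_t
splice_limit(size_t avail, size_t space, size_t maxleft)
{
	size_t len = avail;

	if (len > space)
		len = space;
	if (maxleft != 0 && len > maxleft)
		len = maxleft;
	return len;
}

/*
 * Account for one more move of m between sockets.  Returns 0 if the
 * move is allowed, -1 once the mbuf has already been moved
 * SPLICE_HOP_LIMIT times (a likely splicing loop).
 */
int
splice_hop(struct toy_mbuf *m)
{
	if (m->hops >= SPLICE_HOP_LIMIT)
		return -1;
	m->hops++;
	return 0;
}
```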
For atomic protocols, either one complete packet is taken out, or
nothing is taken at all if: the packet is bigger than the drain's send
buffer size, in which case the splicing gets aborted with an
EMSGSIZE error; the packet does not fit into the
drain's current send buffer space, in which case it is left in the source's
receive buffer for later processing; or the maximum splice length is located
within a packet, in which case splicing gets dissolved like a short splice.
All address or control mbufs associated with the taken packet are dropped.
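The all-or-nothing rule for atomic protocols can be written as a decision function. The sketch below is a model only: the enum, the function name and the order in which the three conditions are tested are assumptions, not taken from the kernel source.

```c
#include <stddef.h>

enum atomic_action {
	TAKE_PACKET,	/* move the complete packet to the drain */
	LEAVE_PACKET,	/* keep it in the receive buffer for later */
	ABORT_EMSGSIZE,	/* packet can never fit: abort with EMSGSIZE */
	DISSOLVE_SHORT	/* maximum lies within the packet: short splice */
};

/*
 * pktlen:       length of the next packet in the source receive buffer
 * sndbuf_size:  total size of the drain's send buffer
 * sndbuf_space: currently free space in the drain's send buffer
 * maxleft:      bytes left until the maximum splice length, 0 if none
 */
enum atomic_action
atomic_decide(size_t pktlen, size_t sndbuf_size, size_t sndbuf_space,
    size_t maxleft)
{
	if (pktlen > sndbuf_size)
		return ABORT_EMSGSIZE;
	if (pktlen > sndbuf_space)
		return LEAVE_PACKET;
	if (maxleft != 0 && pktlen > maxleft)
		return DISSOLVE_SHORT;
	return TAKE_PACKET;
}
```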
If the maximum splice length has been reached, an mbuf may get
split for non-atomic protocols. Otherwise an mbuf is either moved completely
to the send buffer or left in the receive buffer for later processing. If
SO_OOBINLINE is set, out-of-band data will get moved as such although this
might not be reliable. The data is sent out to the drain socket via the
protocol function. If that fails and the drain socket cannot send anymore,
an EPIPE error is set on the source socket.
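For non-atomic protocols, the rule "split only when the maximum is reached, otherwise move whole mbufs" can be illustrated with a toy model that tracks mbuf lengths only. The function move_stream() and its parameters are hypothetical, and the real kernel operates on mbuf chains rather than an array of lengths.

```c
#include <stddef.h>

/*
 * mlen[0..n-1]: lengths of the mbufs queued in the source receive buffer
 * space:        free bytes in the drain send buffer
 * maxleft:      bytes left until the maximum splice length, 0 if none
 * split:        set to 1 if the last mbuf had to be split, else 0
 *
 * Returns the number of bytes moved to the drain.
 */
size_t
move_stream(const size_t *mlen, size_t n, size_t space, size_t maxleft,
    int *split)
{
	size_t limit = space, moved = 0, i;
	int maxreached = 0;

	*split = 0;
	if (maxleft != 0 && maxleft <= space) {
		limit = maxleft;
		maxreached = 1;
	}
	for (i = 0; i < n; i++) {
		if (mlen[i] <= limit - moved) {
			/* The mbuf fits: move it completely. */
			moved += mlen[i];
			continue;
		}
		if (maxreached && limit > moved) {
			/* The maximum ends inside this mbuf: split it. */
			*split = 1;
			moved = limit;
		}
		/* Otherwise leave the mbuf for later processing. */
		break;
	}
	return moved;
}
```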
For packet oriented protocols somove()
iterates over the next packet queue.
If a maximum splice length was specified and at least this amount
of data has been received from the drain socket, splicing gets dissolved. In
this case, an
EFBIG error is set on the source
socket if the maximum amount of data has been transferred. Userland can
process this error to distinguish the full splice from a short splice or to
react to the completed maximum splice immediately. If an idle timeout was
specified and no data has been transferred for that period of time, the
handler soidle() dissolves splicing and sets an
ETIMEDOUT error on the source socket.
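A userland program could map the error found on the source socket to the reason the splicing ended, for example after fetching it with getsockopt(2) and SO_ERROR. The sketch below covers only the error codes named in this page. The enum and function names are hypothetical.

```c
#include <errno.h>

enum splice_end {
	SPLICE_FULL,		/* EFBIG: maximum splice length transferred */
	SPLICE_IDLE,		/* ETIMEDOUT: idle timeout fired */
	SPLICE_DRAIN_GONE,	/* EPIPE: drain cannot send anymore */
	SPLICE_PKT_TOO_BIG,	/* EMSGSIZE: packet exceeds drain send buffer */
	SPLICE_NO_ERROR,	/* 0: e.g. short splice or end of input */
	SPLICE_OTHER_ERROR	/* any other socket error */
};

/* Classify the pending error value read from the source socket. */
enum splice_end
classify_splice_error(int so_error)
{
	switch (so_error) {
	case 0:
		return SPLICE_NO_ERROR;
	case EFBIG:
		return SPLICE_FULL;
	case ETIMEDOUT:
		return SPLICE_IDLE;
	case EPIPE:
		return SPLICE_DRAIN_GONE;
	case EMSGSIZE:
		return SPLICE_PKT_TOO_BIG;
	default:
		return SPLICE_OTHER_ERROR;
	}
}
```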
sounsplice() is called to
dissolve the socket splicing if the source socket cannot receive anymore and
its receive buffer is empty; or if the drain socket cannot send anymore; or
if the maximum has been reached; or if an error occurred; or if the idle
timeout has fired.
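The conditions that lead to sounsplice() can be collected into one predicate. This is a simplified model: struct splice_state and its fields are hypothetical and do not mirror the kernel's socket structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the state relevant to dissolving a splice. */
struct splice_state {
	bool src_cantrcvmore;	/* source cannot receive anymore */
	size_t src_rcv_cc;	/* bytes left in the source receive buffer */
	bool drain_cantsendmore;/* drain cannot send anymore */
	bool max_reached;	/* maximum splice length has been reached */
	int error;		/* pending error, 0 if none */
	bool idle_fired;	/* idle timeout has fired */
};

/* Returns true if the splicing has to be dissolved. */
bool
must_unsplice(const struct splice_state *s)
{
	if (s->src_cantrcvmore && s->src_rcv_cc == 0)
		return true;
	if (s->drain_cantsendmore)
		return true;
	if (s->max_reached)
		return true;
	if (s->error != 0)
		return true;
	return s->idle_fired;
}
```

Note that a source which cannot receive anymore stays spliced in this model until its receive buffer has been drained.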
If the socket buffer flag SB_SPLICE is
set, the functions sorwakeup() and
sowwakeup() will call
somove() to trigger the transfer when new data or
buffer space is available. While socket splicing is active, any
read(2) from the source socket will block.
Neither read nor write wakeups will be delivered to the file descriptors.
After dissolving, a read event or a socket error is signaled to userland on
the source socket. If space is available, a write event will be signaled on
the drain socket.
sosplice() returns 0 on success and
otherwise the error number.
somove() returns 0 if
socket splicing has been finished and 1 if it continues.
sosplice() will succeed unless:
PR_SPLICE flag set. Only TCP and UDP socket splicing is supported.
Socket splicing for TCP first appeared in OpenBSD 4.9; support for UDP was added in OpenBSD 5.3.
The idea for socket splicing originally came from Markus Friedl <email@example.com>, and Alexander Bluhm <firstname.lastname@example.org> implemented it. Mike Belopuhov <email@example.com> added the timeout feature.
|July 4, 2019||OpenBSD-current|