There is a potential, corner case scenario, where a deadlock can occur
between TCP and socket layers, when both ends of the connection transmit
data.
The scenario is as follows:
* Both ends of the connection transmit data,
* Zephyr side send() call gets blocked due to filing the TX window
* The next incoming packet is data packet, not updating the RX window
on the peer side or acknowledging new data. The TCP layer will
attepmt to notify the new data to the socket layer, by calling the
registered callback. This will block the RX thread processing the TCP
layer, as the socket mutex is already acquired by the blocked send()
call.
* No further packets are processed until the socket mutex is freed,
which does not happen as the only way to unblock send() is process
a new ACK, either updating window size or a acknowledging data.
The connection stalls until send() times out.
The deadlock is not permament, as both threads get unlocked once send()
times out. It effectively breaks the active connection though.
Fix this, by unlocking the socket mutex for the time the send() call is
idle. Once the TCP layer notifies that the window is available again,
the mutex is acquired back.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Some errors can occur in the sending process that have to be handled
in a polling fasion instead of blocking using semaphores. In this case
apply an exponentially growing backoff time. This will allow for fast
reactions in most situations and prevents high system loads in case
resolving the situation takes a little longer.
Signed-off-by: Sjors Hettinga <s.a.hettinga@gmail.com>
When an operation on the socket is not supported by the implementation,
which is the case for some drivers, set errno to a value that reflects
this situation rather than signalling an error with the file descriptor.
Signed-off-by: Ulf Lilleengen <lulf@redhat.com>
Prevent local "addrlen_copy" variable from being used uninitialized in
accept() userspace verification function.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
To improve the performance with small chunks send, implement Nagle's
algorithm. Provide the option TCP_NODELAY to disable the algorithm.
Signed-off-by: Sjors Hettinga <s.a.hettinga@gmail.com>
Fragmented data passed to sendmsg() should be sent as a single datagram in
case of datagram sockets (i.e. DTLS connection). Right now that is not
happening now, as each fragment is sent separately, which works fine only
for stream sockets.
There is no mbedTLS API for 'gather' write at this moment. This means that
implementing sendmsg() would require allocating contiguous memory area at
Zephyr TLS socket level and copying all data fragments before passing to
mbedTLS library. While this might be a good option for future, let's just
check if data passed to sendmsg() API consists of a single memory region
and can be sent using single send request. Return EMSGSIZE error if there
are more then one data fragments.
Signed-off-by: Marcin Niestroj <m.niestroj@emb.dev>
Implement POLLOUT for stream sockets, based on newly introduced tx_sem
functionality of the TCP stack.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Utilize the TCP semaphore monitoring transmit status at the socket
layer. This allows to resume transfer as soon as possible instead of
waiting blindly.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Make use of the status field, reported by TCP, in the socket receive
callback. This allows to differentiate a graceful connection shutdown
from actual errors at TCP level (transmission timeout or RST received).
In case of error reported from TCP layer, set a new SOCK_ERROR flag on
the socket, and store the error code in the net_context user_data.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
In order to bring consistency in-tree, migrate all subsystems code to
the new prefix <zephyr/...>. Note that the conversion has been scripted,
refer to zephyrproject-rtos#45388 for more details.
Signed-off-by: Gerard Marull-Paretas <gerard.marull@nordicsemi.no>
Introduce set/get SO_SNDBUF option using the setsockopt
function. In addition, for TCP, check the sndbuf value
before queuing data.
Signed-off-by: Mohan Kumar Kumar <mohankm@fb.com>
'optval' in setsockopt(..., SO_BINDTODEVICE, ...) was casted explicitly
from 'const void *' to 'struct ifreq *'. Rely on C implicit casting from
'const void *' to 'const struct ifreq *' and simply update variable
type. This prevents unwanted modification of ifreq value in the future.
Signed-off-by: Marcin Niestroj <m.niestroj@emb.dev>
Introduce set/get SO_RCVBUF option using the setsockopt
function. In addition, use the rcvbuf value to set the
tcp recv window.
Signed-off-by: Mohan Kumar Kumar <mohankm@fb.com>
The verification function for accept() did not take into account that
addr and addrlen pointers provided could be NULL.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
TCP module can report EAGAIN in case TX window is full. This should not
be forwarded to the application, as blocking socket is not supposed to
return EAGAIN.
Fix this for sendmsg by implementing the same mechanism for handling TX
errors as for regular send/sendto operations.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Add basic shutdown() implementation for net_context sockets, which
handles only SHUT_RD as 'how' parameter and returns -ENOTSUP for SHUT_WR
and SHUT_RDWR. The main use case to cover is to allow race-free wakeup
of threads calling recv() on the same socket.
Signed-off-by: Marcin Niestroj <m.niestroj@grinn-global.com>
So far shutdown() implementation was a noop and just resulted in warning
logs. Add shutdown() method into socket vtable. Call it if provided and
fallback into returning -ENOTSUP if not.
Signed-off-by: Marcin Niestroj <m.niestroj@grinn-global.com>
Add implementation of net_tcp_update_recv_wnd() function.
Move the window deacreasing code to the tcp module - receive window
has to be decreased before sending ACK, which was not possible when
window was decreased in the receive callback function.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Fix net_pkt leak by increasing net_context the reference count earlier
in the zsock_accepted_cb() with instalment of the
zsock_received_cb() callback.
And consequently flushing recv_q and decrement net_context
reference count if zsock_accept_ctx() fails.
Signed-off-by: Daniel Nejezchleb <dnejezchleb@hwg.cz>
Report ZSOCK_POLLHUP event if peer closed the connection, and thus the
socket is in EOF state.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Simplify common `getsockname()` implementation by using VTABLE_CALL()
macro, in the same way as other socket calls do. This additionally
allows to cover the case, when `getsockname()` is not implemnented by
particular socket implementation, preventing the crash.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Replace unpacked in6_addr structures with raw buffers in net_ipv6_hdr
struct, to prevent compiler warnings about unaligned access.
Remove __packed parameter from `struct net_6lo_context` since the
structure isn't really serialized.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Replace unpacked in_addr structures with raw buffers in net_ipv4_hdr
struct, to prevent compiler warnings about unaligned access.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Implements mechanism similar to the one available in net/lib/sockets.c
(since the merge of #27054) in sockets_can to enable parallel rx/tx.
Fixes#38698
Signed-off-by: Mateusz Karlic <mkarlic@internships.antmicro.com>
When creating a socket, all of the registered socket implementation are
processed in a sequence, allowing to find appropriate socket
implementation for specified family/type/protocol. So far however,
the order of processing was not clearly defined, leaving ambiguity if
multiple implmentations supported the same set of parameters.
Fix this, by registering socket priority along with implementation. This
makes the processing order of particular socket implementations
explicit, giving more flexibility to the user, for example when it's
neeed to prioritze one implementation over another if they support the
same set of parameters.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
This migrates all the current iterable section usages to the external
API, dropping the "Z_" prefix:
Z_ITERABLE_SECTION_ROM
Z_ITERABLE_SECTION_ROM_GC_ALLOWED
Z_ITERABLE_SECTION_RAM
Z_ITERABLE_SECTION_RAM_GC_ALLOWED
Z_STRUCT_SECTION_ITERABLE
Z_STRUCT_SECTION_ITERABLE_ALTERNATE
Z_STRUCT_SECTION_FOREACH
Signed-off-by: Fabio Baltieri <fabiobaltieri@google.com>
k_timeout_t was converted to ticks using a nonsense function
causing poll timeout corruption for offloaded sockets; this
commit uses ticks directly from the struct instead.
Fixes#37472
Signed-off-by: Emil Lindqvist <emil@lindq.gr>
When zsock_close() is called, socket is freed before the mutex for the
socket is unlocked. If the freed socket is given to another thread
immediately, the mutex for the socket will be initialized by the new
socket owner, while the mutex is still locked by the thread calling
zosck_close().
Fixes#36568
Signed-off-by: Chih Hung Yu <chyu313@gmail.com>
Allow caller to specify microsecond accuracy and not convert
to milliseconds.
Fixes#36072
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
The k_fifo_ prefix is meant for kernel API functions, and
not to our socket helper. So remove the k_ prefix in order
to avoid confusion.
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
If we are waiting all the data i.e., the MSG_WAITALL flag is set,
then if we have not yet received all the data at the end of the
receive loop. We must use the condition variable to get the signal
when the data is ready to be received. Otherwise the receive loop
will not release the socket lock and receive_cb will not be able
to indicate that data is received.
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Fix a regression when application is waiting data but does
not notice that because socket layer is not woken up.
This could happen because application was waiting condition
variable but the signal to wake the condvar came before the
wait started. Normally if there is constant flow of incoming
data to the socket, the signal would be given later. But if
the peer is waiting that Zephyr replies, there might be a
timeout at peer.
The solution is to add locking in socket receive callback so
that we only signal the condition variable after we have made
sure that the condition variable is actually waiting the data.
Fixes#34964
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
This value is used to measure the RX/TX statistics. The previous
use of the timestamp field did not work in RX path as the timestamp
value could be overwritten by the driver if gPTP timestamping
is enabled. So to fix the RX statistics, use a separate field
for the create time.
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
This option was only able to collect statistics of transmitted
data. The same functionality is available if one sets the
CONFIG_NET_PKT_RXTIME_STATS and/or CONFIG_NET_PKT_TXTIME_STATS
options.
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
The RX statistics might not get updated properly because the used
ifdef was referring the TX options.
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Implement SO_BINDTODEVICE socket option which allows to bind an open
socket to a particular network interface. Once bound, the socket will
only send and receive packets through that interface.
For the TX path, simply avoid overwriting the interface pointer by
net_context_bind() in case it's already bound to an interface with an
option. For the RX path, drop the packet in case the connection handler
detects that the net_context associated with that connection is bound to
a different interface that the packet origin interface.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
When recvfrom() was called with src_addr != NULL, then source address
was fetched from beginning of net_pkt. This works with native IP stack
obviously. However with offloaded IP stack there is no IP header, so
trying to parse missing IP header results in undefined behavior.
Check if network interface has offloaded IP stack. If positive, then
figure out if there is assigned remote address to network context on
which packet was received. Return this remote address, which SHOULD be
the source address of received packet. Otherwise, return an error.
Signed-off-by: Marcin Niestroj <m.niestroj@emb.dev>
Implement MSG_WAITALL flag for stream sockets. Setting this flag on
`recv()` call will make it wait until the requested amount of data is
received.
In case both, MSG_WAITALL all is set and SO_RCVTIMEO option configured
on a socket, follow the Linux behavior, i. e. when the requested amount
of data is not received until the timeout expires, return the data
received so far w/o an error.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
This patch adds implementation of socket option used to get
protocol used for given socket (e.g. IPPROTO_TCP). This option
is not defined in POSIX, but it is Linux extension.
Signed-off-by: Hubert Miś <hubert.mis@nordicsemi.no>
This patch adds implementation of socket option used to get
type of given socket (e.g. SOCK_STREAM).
Signed-off-by: Hubert Miś <hubert.mis@nordicsemi.no>
Because unoconnected stream socket doesn't have any chance to receive
any data, so a blocking recv() would hang forever on it (and does
without this change).
Signed-off-by: Paul Sokolovsky <paul.sokolovsky@linaro.org>
If there is no space in the sending window, then return -EAGAIN
so that the caller may try later.
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>