Windows TCP stack has a peculiar behavior - when running iperf, it will
fill out the RX window almost entirely, but will not set PSH flag on
packets. In result, our stack would delay the ACK and thus window
update, affecting throughputs heavily.
In order to avoid that, keep track of the most recent window size
reported to the peer, and reduce it when receiving new data. In case the
RX window, as seen from the peer perspective, drops below certain
threshold, and the real RX window is currently empty, send an ACK
immediately when updating window, so that peer can continue
with sending data.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Delay an ACK in case no PSH flag is present in the data packet. as
described in RFC 813. This allows to reduce the number of ACK packets
we send and thus improve the TCP download throughput.
The results achieved on `nucleo_h723zg` board and the zperf sample
are as follows:
Before: 77.14 Mbps
After: 93.14 Mbps
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Partial conditional compilation within tcpv4/6_init_isn() caused
errors in coverity, as hash variable seemed to be used w/o
initialization.
As those functions are not really used at all if
CONFIG_NET_TCP_ISN_RFC6528 is disabled, we can just conditionally
compile entire functions to avoid confusion.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Utilize a code spell-checking tool to scan for and correct spelling errors
in all files within the `subsys/net/ip` directory.
Signed-off-by: Pisit Sawangvonganan <pisit@ndrsolution.com>
The Xilinx AXI Ethernet subsystem is capable of RX/Tx checksum
offloading. While it supports computing IP and UDP/TCP checksums, it
does not support computing ICMP checksums and only computes IP checksums
for ICMP messages. Thus, this patch adds an additional configuration for
ethernet drivers that indicates for which protocols checksum offloading
is (to be) supported. This flag is then considered by the IP subsystem
in determining when flags need to be computed in software.
Signed-off-by: Eric Ackermann <eric.ackermann@cispa.de>
Replaced tcp_out() with tcp_out_ext() when sending
SYN during tcp connect, so we can intercept error value,
beacuse in such case the packet would not be sent at all
and the stack would not trigger any mechanism to flush
this context so it ends up leaking. These scenarios can
arise when the underlaying interface is not properly
configured or is disconnected. The conn is marked to be
closed and is closed before exiting tcp_in().
Since we can now identify this case we can also exit
early in the net_tcp_connect() function and not wait
for any timeout.
Signed-off-by: Daniel Nejezchleb <dnejezchleb@hwg.cz>
When BUILD_WITH_TFM is enabled we can dispatch hash computation
to TFM. This allows to remove the built-in support of SHA256 from
the non-secure side (if it's not used for any other purpose, of course).
Signed-off-by: Valerio Setti <vsetti@baylibre.com>
We usually cannot use context->local for established TCP connections
because the local address is not updated for TCP if we are bound to
any address. So create helper that try to figure out the end point
addresses.
Signed-off-by: Jukka Rissanen <jukka.rissanen@nordicsemi.no>
Add reference counting to network interface address (for both
IPv4 and IPv6) so that the address is not removed if there are
sockets using it. If the interface address is removed while there
are sockets using it, the connectivity will fail for the said
socket.
Signed-off-by: Jukka Rissanen <jukka.rissanen@nordicsemi.no>
The size field in net_buf should not be used directly as then
the optional headroom will not be taken into account.
There is the net_buf_max_len() API that should be used instead.
Signed-off-by: Jukka Rissanen <jukka.rissanen@nordicsemi.no>
In case FIN packet included data bytes, Zephyr would ignore the
data. It wasn't passed to the application, and it wasn't considered
when bumping the ACK counter. This ended out in connection timing out,
instead of being properly terminated.
Fix this, by refactoring FIN processing in TCP_ESTABLISHED state.
Instead of handling FIN/FIN,ACK/FIN,ACK,PSH cases separately, have a
common handler when FIN flag is present, and when ACK flag is present
along with FIN. When FIN is present, take any potentially incoming data
into account.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Remove IPSP support from the tree.
It has no maintainers, and is regularly broken. The fact that it's
nontrivial to set-up in linux makes it hard to fix reported issues.
Signed-off-by: Jonathan Rico <jonathan.rico@nordicsemi.no>
If the packet cloning fails (can easily happen when working with
loopback interface and when having low net_buf count), then
print a warning to the user. Error could also be possible but
as the situation might correct itself in this case, the warning
should be enough.
Signed-off-by: Jukka Rissanen <jukka.rissanen@nordicsemi.no>
Deprecate CONFIG_NET_TCP_ACK_TIMEOUT as it is redundant with the
combination of CONFIG_NET_TCP_INIT_RETRANSMISSION_TIMEOUT and
CONFIG_NET_TCP_RETRY_COUNT. The total retransmission timeout (i. e.
waiting for ACK) should depend on the individual retransmission timeout
and retry count, having separate config is simply ambiguous and
confusing for users.
Moreover, the config was currently only used during TCP handshake, and
for that purpose we could use the very same timeout that is used for the
FIN timeout. Therefore, repurpose the fin_timeout_ms to be a generic,
maximum timeout at the TCP stack.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Instead of having a single config specifying the memory pool size for
variable-sized net buffers, have a separate one for TX and RX for better
configuration granularity when optimizing memory usage of the
application.
Deprecate the old configuration but use its value as a default (for now)
for the new configs. This will need to change when the config is
deleted.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
The final ACK check during passive close was wrong - we should not
compare its SEQ number with the ACK number we've sent last, but rather
compare the ACK number it acknowledges matches our current SEQ number on
the connection. This ensures, that the ACK received is really
acknowledging the FIN packet we've sent from our side, and is not just
some earlier retransmission. Currently the latter could be the case, and
we've closed the connection prematurely. In result, when the real "final
ACK" arrived, the TCP stack replied with RST.
Subsequently, we should increment the SEQ number on the connection after
sending FIN packet, so that we are able to identify final ACK correctly,
just as it's done in active close cases.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Debug logs in helper functions like tcp_unsent_len() or
tcp_window_full() are not very helpful and generate a heavy, unnecessary
log output. Therefore, tcp_unsent_len() will no longer generate log, and
tcp_window_full() will print out a log only when the window is actually
full, which could be an useful information.
Also, reduce the log load during TX, as currently redundant logs were
printed in tcp_out_ext(), tcp_send_process_no_lock() and finally in
tcp_send().
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
The mutex was removed in tcp_recv() where it doesn't seem
to be needed anymore as tcp_conn_search() got
tcp_mutex. In the other areas the tcp_mutex was
narrowed down to protect only the list.
Signed-off-by: Daniel Nejezchleb <dnejezchleb@hwg.cz>
This commit implements simple rate-limiting for Neighbor Reachability
Hints in TCP module to prevent the potentially costly process of
frequent neighbor searches in the table, enhancing system performance.
Signed-off-by: Łukasz Duda <lukasz.duda@nordicsemi.no>
This commit introduces a new IPv6 API for positive reachability
confirmation, as specified in RFC 4861, Section 7.3.1. This feature aims
to enhance the effectiveness of the Neighbor Discovery mechanism, by
enabling upper-layer protocols to signal that the connection makes a
"forward progress".
The implementation within TCP serves as a reference. Compliance with
RFC 4861, especially Appendix E.1, was ensured by focusing on reliable
handshake and acknowledgment of new data transmissions.
Though initially integrated with TCP, the API is designed for broader
applicability. For example, it might be used by some UDP-based protocols
that can indicate two-way communication progress.
Signed-off-by: Łukasz Duda <lukasz.duda@nordicsemi.no>
Add a function callback that is called when the TCP connection
is closed. This is only available if doing network tests.
Signed-off-by: Jukka Rissanen <jukka.rissanen@nordicsemi.no>
The FIN timer was not set when we entered the FIN_WAIT_1 state.
This could cause issues if we did not receive proper packets
from peer. With this fix, the connection is always terminated
even if peer does not respond.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Signed-off-by: Jukka Rissanen <jukka.rissanen@nordicsemi.no>
If we are in a passive close state, then it is possible that
the ack we are waiting is lost or we do not accept the one peer
sent to us because of some earlier out of memory issue.
So install a timer (using by default the FIN timer value) to
close the connection if the last ack is not received on time.
Signed-off-by: Jukka Rissanen <jukka.rissanen@nordicsemi.no>
Calling the registered receive callback when releasing TCP context
doesn't make sense, as at that point the application should've already
closed the associated socket (that's one of the conditions for the
context to be released). Therefore, remove the pointless receive
callback call, while keeping the loop to unref any leftover data packets
(although again, I don' think there should be any packets left at that
point, as they're all consumed in tcp_in()).
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
While improving thread safety of the TCP stack I've introduced a
possible deadlock scenario, when calling tcp_conn_close() in tcp_in().
This function shall not be called with connection mutex locked, as it
calls registered recv callback internally, which could lead to deadlock
between TCP/socket mutexes.
This commit moves the tcp_conn_close() back where it was originally
called. I've verified that the thread safety is still solid with the
test apps used originally.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
When a TCP connection is established, if there is no data exchange
between the two parties within the set time, the side that enables
TCP Keep-alive will send a TCP probe packet with the same sequence
number as the previous TCP packet. This TCP probe packet is an empty
ACK packet (the specification recommends that it should not contain
any data, but can also contain 1 nonsense byte, such as 0x00.). If
there is no response from the other side after several consecutive
probe packets are sent, it is determined that the tcp connection has
failed, and the connection is closed.
The keep-alive default parameters are aligned with Linux defaults.
Signed-off-by: Horse Ma <mawei@coltsmart.com>
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Eliminate race between TCP input thread and TCP work queue, when
dereferencing connection. This normally would not manifest itself during
standard TCP operation, but could be a potential opening for abuse, when
the already closed TCP connection is kept being spammed with packets.
The test scenario involved sending multiple TCP RST packets as a
response to establishing the connection, which could result in system
crash. The following changes in the TCP stack made it stable in such
scenario:
1. Use `tcp_lock` when searching for active connections, to avoid
potential data corruption when connection is being removed when
iterating.
2. Avoid memset() during connection dereference, not to destroy mutex
associated with the connection. The connection context is only
cleared during allocation now.
3. Lock the connection mutex while releasing connection.
4. In tcp_in(), after locking the mutex, verify the connection state,
and quit early if the connection has already been dereferenced.
5. When closing connection from the TCP stack as a result of RST or
malformed packet, verify connection state to make sure it's only done
once, even if multiple RST packets were received.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Fix the possible race between TCP work items already scheduled for
execution, and tcp_conn_unref(), by moving the actual TCP context
releasing to the workqueue itself. That way we can be certain, that when
the work items are cancelled, they won't execute. It could be the case,
that the work item was already being processed by the work queue, so
clearing the context could lead to a crash.
Remove the comments around the mutex lock in the work handlers regarding
the race, as it's not the case anymore. I've kept the locks however, as
they do make sense in those places.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Otherwise, if the application was for example blocked on poll() pending
POLLOUT, it won't be notified.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Rework how data is queued for the TCP connections:
* net_context no longer allocates net_pkt for TCP connections. This
was not only inefficient (net_context has no knowledge of the TX
window size), but also error-prone in certain configuration (for
example when IP fragmentation was enabled, net_context may attempt
to allocate enormous packet, instead of let the data be fragmented
for the TCP stream.
* Instead, implement already defined `net_tcp_queue()` API, which
takes raw buffer and length. This allows to take TX window into
account and also better manage the allocated net_buf's (like for
example avoid allocation if there's still room in the buffer). In
result, the TCP stack will not only no longer exceed the TX window,
but also prevent empty gaps in allocated net_buf's, which should
lead to less out-of-mem issues with the stack.
* As net_pkt-based `net_tcp_queue_data()` is no longer in use, it was
removed.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
In case of reassembled IP packets, we cannot rely on checksum
offloading as the drivers/HW has no means to verify L4 checksum before
the fragment is reassembled. Therefore, for such packets, verify L4
checksum in the software unconditionally.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Modify internal L4 protocols APIs, to allow to enforce checksum
calculation, regardless of the checksum HW offloading capability.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Fix tcp.c compilation if user unsets
CONFIG_NET_TCP_CONGESTION_AVOIDANCE config option.
Fixes#64824
Signed-off-by: Jukka Rissanen <jukka.rissanen@nordicsemi.no>
If we try to connect to a port which no socket is listening to,
we will get a packet with "ACK | RST" flags set. In this case
the errno should be ECONNREFUSED instead of ETIMEDOUT like we
used to return earlier.
Signed-off-by: Jukka Rissanen <jukka.rissanen@nordicsemi.no>
Increase reference count already when initial SYN is sent.
This way the tcp pointer in net_context is fully valid for
the duration of the connection.
Fixes#63952
Signed-off-by: Jukka Rissanen <jukka.rissanen@nordicsemi.no>
When a new incoming connection is accepted, do not overwrite
the listening remote address. The remote address for the
listening socket is the any address (like :: for IPv6)
and not the accepted socket remote address.
This only affects the "net conn" shell command which prints
the remote address incorrectly. There is no problem accepting
new connections in this case.
Signed-off-by: Jukka Rissanen <jukka.rissanen@nordicsemi.no>
The socket address passed in accept() call should point
to the new connection address and not the old one.
Fortunately this only affects things after the v4-mapping-to-v6
support so older code than this works fine.
Signed-off-by: Jukka Rissanen <jukka.rissanen@nordicsemi.no>
The address length of the accepted socket must reflect the
size of the address struct, so it should either be
sizeof(struct sockaddr_in) for IPv4 or sizeof(struct sockaddr_in6) for
IPv6 socket.
Signed-off-by: Jukka Rissanen <jukka.rissanen@nordicsemi.no>
This allows IPv4 and IPv6 share the same port space.
User can still control the behavior of the v4-mapping-to-v6
by using the IPV6_V6ONLY socket option at runtime.
Currently the IPv4 mapping to IPv6 is turned off by
default, and also the IPV6_V6ONLY is true by default which
means that IPv4 and IPv6 do not share the port space.
Only way to use v4-mapping-to-v6 is to enable the Kconfig
option and turn off the v6only socket option.
Signed-off-by: Jukka Rissanen <jukka.rissanen@nordicsemi.no>
Send RST as a reply for unexpected TCP packets in the following
scenarios:
1) Unexpected ACK value received during handshake (connection still open
on the peer side),
2) Unexpected data packet on a listening port (accepted connection
closed),
3) SYN received on a closed port.
This allows the other end to detect that the connection is no longer
valid (for example due to reboot) and release the resources.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Add a helper function which allows to send a RST packet in response to
an unexpected TCP packet, w/o associated connection or net context.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
rand32.h does not make much sense, since the random subsystem
provides more APIs than just getting a random 32 bits value.
Rename it to random.h and get consistently with other
subsystems.
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
Fixing kernel crash caused by memory release
while having a scheduled work item pending.
Signed-off-by: Jeroen van Dooren <jeroen.van.dooren@nobleo.nl>
This commit adds support for the SO_REUSEADDR option to be enabled for
a socket using setsockopt(). With this option, it is possible to bind
multiple sockets to the same local IP address / port combination, when
one of the IP address is unspecified (ANY_ADDR).
The implementation strictly follows the BSD implementation and tries to
follow the Linux implementation as close as possible. However, there is
one limitation: for client sockets, the Linux implementation of
SO_REUSEADDR behaves exactly like the one for SO_REUSEPORT and enables
multiple sockets to have exactly the same specific IP address / port
combination. This behavior is not possible with this implementation, as
there is no trivial way to identify a socket to be a client socket
during the bind() call. For this behavior, one has to use the
SO_REUSEPORT option in Zephyr.
There is also a new Kconfig to control this feature similar to other
socket options: CONFIG_NET_CONTEXT_REUSEADDR. This option is enabled by
default if TCP or UDP are enabled. However, it can still be disabled
explicitly.
Signed-off-by: Tobias Frauenschläger <t.frauenschlaeger@me.com>
In case RST packet is received or malformed packet is received, the TCP
should not proceed with the state machine execution (which may process
the invalid packet) but rather jump directly to exit, where the
connection will be closed.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>
Modify the signature of the k_mem_slab_free() function with a new one,
replacing the old void **mem with void *mem as a parameter.
The following function:
void k_mem_slab_free(struct k_mem_slab *slab, void **mem);
has the wrong signature. mem is only used as a regular pointer, so there
is no need to use a double-pointer. The correct signature should be:
void k_mem_slab_free(struct k_mem_slab *slab, void *mem);
The issue with the current signature, although functional, is that it is
extremely confusing. I myself, a veteran Zephyr developer, was confused
by this parameter when looking at it recently.
All in-tree uses of the function have been adapted.
Fixes#61888.
Signed-off-by: Carles Cufi <carles.cufi@nordicsemi.no>
There was a corner case which was not handled well in a scenario, when
listening socket was closed during an active handshake with a new
client.
When a listening socket is closed, the accept callback is cleared on the
TCP context. If this happened during a handshake with a new client, i.
e. before final ACK from the client was processed, this lead to a
context leak, as application did not take ownership of the connection
(i. e. had no means to close it).
Fix this, by proactively closing the connection at the TCP level when no
accept_cb is available. Instead of ignoring the fact that no accept_cb
is available, the TCP stack will now enter TCP_FIN_WAIT_1 state and
proceed with a graceful teardown of the connection.
Signed-off-by: Robert Lubos <robert.lubos@nordicsemi.no>