Is ndo_start_xmit() called (or could it be called) from atomic contexts? Please provide documentation references.
More specifically, is ndo_start_xmit() forbidden from directly doing any of the following:
- reading and writing a TCP socket?
- locking a mutex or semaphore?
- sleeping?
- calling kmalloc() without GFP_ATOMIC?
I did some simple tests, and at least the first 3 cases fail at least occasionally. Calls to ndo_start_xmit() driven by network operations involving less queueing seem to fail more often. If I do the above operations indirectly through a work queue, then the failures seem to go away. All of this hints that maybe ndo_start_xmit() is called from an atomic context... But there is no documentation to suggest that I need to use work queues. (And why would I throw away 250us of latency?)
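For reference, the work-queue workaround mentioned above looks roughly like the following. This is only a minimal sketch of the pattern, not my actual driver: the names my_priv, my_init_priv, my_tx_work and my_start_xmit are hypothetical, and it assumes the driver's private area is obtained with netdev_priv(). The transmit hook only queues the skb and schedules the work; anything that may sleep happens later in the worker, which runs in process context.

```c
#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/workqueue.h>

/* Hypothetical per-device private data (illustration only). */
struct my_priv {
	struct sk_buff_head txq;     /* packets waiting for the worker */
	struct work_struct tx_work;  /* deferred transmit work */
};

/* Runs in process context on a kernel worker thread, so it may sleep. */
static void my_tx_work(struct work_struct *work)
{
	struct my_priv *priv = container_of(work, struct my_priv, tx_work);
	struct sk_buff *skb;

	while ((skb = skb_dequeue(&priv->txq)) != NULL) {
		/*
		 * Socket I/O, mutex_lock(), msleep() or
		 * kmalloc(..., GFP_KERNEL) would go here.
		 */
		consume_skb(skb);
	}
}

/* Call once during device setup, before the device can transmit. */
static void my_init_priv(struct my_priv *priv)
{
	skb_queue_head_init(&priv->txq);
	INIT_WORK(&priv->tx_work, my_tx_work);
}

/* Does nothing that can sleep: only queues the skb and kicks the worker. */
static netdev_tx_t my_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct my_priv *priv = netdev_priv(dev);

	/*
	 * skb_queue_tail() takes the queue's own spinlock. The queue is
	 * unbounded in this sketch; a real driver would presumably cap it
	 * and call netif_stop_queue() when it fills up.
	 */
	skb_queue_tail(&priv->txq, skb);
	schedule_work(&priv->tx_work);

	return NETDEV_TX_OK;
}
```

The cost is the extra scheduling hop between my_start_xmit() and my_tx_work(), which is where the added latency comes from.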
Others have also noticed that ndo_start_xmit() behaves as if it is called in an atomic context. However, just Googling it turns up only these random threads, not any real documentation.
Furthermore, when searching the entire kernel source for comments, we hear only crickets:
% grep -r -A 10 -B 10 -e netdev_ops -e ndo_start_xmit . --include '*.[ch]' | grep -i atomic
./net/l2tp/l2tp_eth.c- stats->tx_bytes = (unsigned long) atomic_long_read(&priv->tx_bytes);
./net/l2tp/l2tp_eth.c- stats->tx_packets = (unsigned long) atomic_long_read(&priv->tx_packets);
./net/l2tp/l2tp_eth.c- stats->tx_dropped = (unsigned long) atomic_long_read(&priv->tx_dropped);
./net/l2tp/l2tp_eth.c- stats->rx_bytes = (unsigned long) atomic_long_read(&priv->rx_bytes);
./net/l2tp/l2tp_eth.c- stats->rx_packets = (unsigned long) atomic_long_read(&priv->rx_packets);
./net/l2tp/l2tp_eth.c- stats->rx_errors = (unsigned long) atomic_long_read(&priv->rx_errors);
./net/core/neighbour.c- atomic_dec(&neigh->tbl->entries);
./net/core/dev.c- smp_mb__after_atomic(); /* Commit netif_running(). */
./net/core/dev.c- storage->rx_dropped += (unsigned long)atomic_long_read(&dev->rx_dropped);
./net/core/rtnetlink.c- skb = nlmsg_new(bridge_nlmsg_size(), GFP_ATOMIC);
./net/core/rtnetlink.c- rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, GFP_ATOMIC);
./drivers/net/can/usb/ems_usb.c- atomic_set(&dev->active_tx_urbs, 0);
./drivers/net/can/usb/usb_8dev.c- atomic_set(&priv->active_tx_urbs, 0);
./drivers/net/can/usb/peak_usb/pcan_usb_core.c- atomic_set(&dev->active_tx_urbs, 0);
./drivers/net/usb/sierra_net.c- dev->net->dev_addr[ETH_ALEN-2] = atomic_inc_return(&iface_counter);
./drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c- if (!atomic_dec_and_test(&adapter->fcoe.refcnt))
./drivers/net/ethernet/toshiba/ps3_gelic_net.c- atomic_inc(&card->tx_timeout_task_counter);
./drivers/net/ethernet/toshiba/ps3_gelic_net.c- atomic_dec(&card->tx_timeout_task_counter);
./drivers/net/ethernet/toshiba/spider_net.c- atomic_dec(&card->tx_timeout_task_counter);
./drivers/net/thunderbolt.c- atomic_set(&net->command_id, 0);
./drivers/net/thunderbolt.c- atomic_set(&net->frame_id, 0);
./drivers/staging/octeon/ethernet.c- if (!atomic_read(&cvm_oct_poll_queue_stopping))
./drivers/staging/ks7010/ks_wlan_net.c- atomic_set(&update_phyinfo, 0);
./include/linux/netdevice.h- atomic_long_t rx_nohandler;
./include/linux/netdevice.h- atomic_t carrier_up_count;
./include/linux/netdevice.h- atomic_t carrier_down_count;
So one might reasonably conclude that the problem is not that ndo_start_xmit() is called from an atomic context, but that I am using broken APIs for sockets, mutexes, semaphores, and sleeping. But before I start unintentionally rewriting the OS one API at a time, can we maybe find some documentation reference explaining that ndo_start_xmit() is (or could be) called from an atomic context, and in what way?
If ndo_start_xmit() is called from atomic contexts, then is this considered good software design? When is it an example for others to follow?
Source: https://stackoverflow.com/questions/57061626/can-ndo-start-xmit-be-called-from-atomic-contexts