This is the mail archive of the ecos-discuss@sourceware.org mailing list for the eCos project.



Re: pbuf_alloc failures with LwIP


Hi Elad,

Hmm, I've had a quick look at the pbuf management in eCos 3.0. It's quite different from the CVS version, so I'm not that familiar with it.

Nonetheless, I'm surprised by the PBUF statistics:

  PBUF - "each pbuf is 1024 bytes"
          avail: 30
          used: 1
          max: 11
          err: 2
          alloc_locked: 0
          refresh_locked: 0

There's something wrong here. Considering that "alloc_locked = 0", the only way for "err" to be incremented is if you run out of pbufs. However, the sign that you have run out of pbufs is that "max" equals "avail". Yet, in your case, max = 11, while avail = 30. So you didn't run out of pbufs, you only used 11 out of 30.

Digging a bit more, it appears that "err" is incremented when pbuf_pool_alloc() returns NULL. This happens when the linked list of available pbufs is empty.

So, how come the linked-list of available pbufs is empty when max = 11? In my opinion, the linked-list of available pbufs is corrupt or truncated.
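To illustrate the reasoning, here is a minimal sketch of a fixed pool with a free list and the same four counters. It is not the eCos 3.0 pbuf code: the names (pool_init, pool_alloc, pool_truncate, g_stats) are made up for the example, and pool_truncate() only simulates a damaged list by cutting it short.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 30            /* same as "avail: 30" in the stats */

struct buf { struct buf *next; };
struct pool_stats { unsigned avail, used, max, err; };

struct buf g_storage[POOL_SIZE];
struct buf *g_free_list;        /* singly linked list of free buffers */
struct pool_stats g_stats;

/* Chain every buffer onto the free list and reset the counters. */
void pool_init(void)
{
    g_free_list = NULL;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        g_storage[i].next = g_free_list;
        g_free_list = &g_storage[i];
    }
    g_stats = (struct pool_stats){ .avail = POOL_SIZE };
}

/* Pop one buffer. "err" is bumped only when the free list is empty. */
struct buf *pool_alloc(void)
{
    struct buf *b = g_free_list;
    if (b == NULL) {
        g_stats.err++;
        return NULL;
    }
    g_free_list = b->next;
    b->next = NULL;
    g_stats.used++;
    if (g_stats.used > g_stats.max)
        g_stats.max = g_stats.used;     /* high-water mark */
    return b;
}

/* Push a buffer back onto the free list. */
void pool_free(struct buf *b)
{
    assert(b != NULL);
    b->next = g_free_list;
    g_free_list = b;
    g_stats.used--;
}

/* Simulated corruption (hypothetical): keep only the first `keep`
 * nodes of the free list; the rest become unreachable. */
void pool_truncate(size_t keep)
{
    struct buf *n = g_free_list;
    if (keep == 0 || n == NULL) {
        g_free_list = NULL;
        return;
    }
    for (size_t i = 1; i < keep && n->next != NULL; i++)
        n = n->next;
    n->next = NULL;
}
```

With an intact list, pool_alloc() can only fail once "max" has reached "avail". After pool_init() and pool_truncate(11), the twelfth pool_alloc() fails and increments "err" while "max" stays at 11, which is the same pattern as in the reported stats.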

Are you sure that you're respecting the thread-safe requirements of lwIP? Are you using multiple threads? If so, make sure that the SYS_ARCH_PROTECT macro (in lwip/sys.h) is defined to do something useful, rather than being an empty definition.
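As a rough sketch of what "something useful" could look like, the code below wraps a free-list push and pop in the SYS_ARCH_DECL_PROTECT / SYS_ARCH_PROTECT / SYS_ARCH_UNPROTECT pattern. The macro definitions here are stand-ins written for this example, not the ones from lwip/sys.h, and the pthread mutex is an assumption so that the sketch builds on a host; on an eCos target the protect call would more likely lock the scheduler or mask interrupts. Unlike a real port, this version does not support nested protection.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef int sys_prot_t;

/* Assumption: a plain mutex stands in for the real protection. */
static pthread_mutex_t prot_mutex = PTHREAD_MUTEX_INITIALIZER;

static sys_prot_t sys_arch_protect(void)
{
    pthread_mutex_lock(&prot_mutex);
    return 0;                       /* previous protection level, unused here */
}

static void sys_arch_unprotect(sys_prot_t lev)
{
    (void)lev;
    pthread_mutex_unlock(&prot_mutex);
}

/* Stand-in macro definitions for this sketch. If these expand to
 * nothing, the list updates below are unprotected. */
#define SYS_ARCH_DECL_PROTECT(lev) sys_prot_t lev
#define SYS_ARCH_PROTECT(lev)      lev = sys_arch_protect()
#define SYS_ARCH_UNPROTECT(lev)    sys_arch_unprotect(lev)

struct node { struct node *next; };
static struct node *free_list;

/* Return a node to the free list inside the protected region. */
void list_push(struct node *n)
{
    SYS_ARCH_DECL_PROTECT(lev);
    assert(n != NULL);
    SYS_ARCH_PROTECT(lev);
    n->next = free_list;
    free_list = n;
    SYS_ARCH_UNPROTECT(lev);
}

/* Take a node from the free list, or NULL if it is empty. */
struct node *list_pop(void)
{
    struct node *n;
    SYS_ARCH_DECL_PROTECT(lev);
    SYS_ARCH_PROTECT(lev);
    n = free_list;
    if (n != NULL)
        free_list = n->next;
    SYS_ARCH_UNPROTECT(lev);
    return n;
}
```

Without the protection, two threads (or a thread and the receive path) can both read the same list head before either writes it back, and nodes are then lost or handed out twice.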

Regards,

Michael.

On 14/06/2012 06:43, Elad Yosef wrote:
Hi Michael,
Thanks for the detailed reply.

I think I have exactly the same problem that you have - the networking
stops working.

I got the LwIP stats after the networking stopped working; see below:



LINK
         xmit: 0
         rexmit: 0
         recv: 0
         fw: 0
         drop: 0
         chkerr: 0
         lenerr: 0
         memerr: 0
         rterr: 0
         proterr: 0
         opterr: 0
         err: 0
         cachehit: 0

IP_FRAG
         xmit: 0
         rexmit: 0
         recv: 0
         fw: 0
         drop: 0
         chkerr: 0
         lenerr: 0
         memerr: 0
         rterr: 0
         proterr: 0
         opterr: 0
         err: 0
         cachehit: 0

IP
         xmit: 17643
         rexmit: 0
         recv: 63100
         fw: 0
         drop: 0
         chkerr: 0
         lenerr: 0
         memerr: 0
         rterr: 0
         proterr: 0
         opterr: 0
         err: 0
         cachehit: 0

ICMP
         xmit: 2775
         rexmit: 0
         recv: 2950
         fw: 0
         drop: 175
         chkerr: 0
         lenerr: 0
         memerr: 0
         rterr: 0
         proterr: 175
         opterr: 0
         err: 0
         cachehit: 0

UDP
         xmit: 4714
         rexmit: 0
         recv: 53209
         fw: 0
         drop: 0
         chkerr: 0
         lenerr: 0
         memerr: 0
         rterr: 0
         proterr: 0
         opterr: 0
         err: 0
         cachehit: 0

TCP
         xmit: 6715
         rexmit: 0
         recv: 6941
         fw: 0
         drop: 0
         chkerr: 0
         lenerr: 0
         memerr: 2705
         rterr: 0
         proterr: 0
         opterr: 0
         err: 0
         cachehit: 0

PBUF - "each pbuf is 1024 bytes"
         avail: 30
         used: 1
         max: 11
         err: 2
         alloc_locked: 0
         refresh_locked: 0

  MEM HEAP
         avail: 1024
         used: 0
         max: 720
         err: 0

  MEM PBUF
         avail: 8
         used: 0
         max: 2
         err: 0

  MEM RAW_PCB
         avail: 4
         used: 0
         max: 0
         err: 0

  MEM UDP_PCB
         avail: 3
         used: 3
         max: 3
         err: 0

  MEM TCP_PCB
         avail: 16
         used: 0
         max: 8
         err: 0

  MEM TCP_PCB_LISTEN
         avail: 1
         used: 1
         max: 1
         err: 0

  MEM TCP_SEG
         avail: 6
         used: 0
         max: 4
         err: 0

  MEM NETBUF
         avail: 10
         used: 0
         max: 6
         err: 0

  MEM NETCONN
         avail: 12
         used: 4
         max: 7
         err: 0

  MEM API_MSG
         avail: 6
         used: 0
         max: 2
         err: 0

  MEM TCP_MSG
         avail: 12
         used: 0
         max: 7
         err: 0

  MEM TIMEOUT
         avail: 4
         used: 2
         max: 3
         err: 0


I would appreciate it if you could take a look.


Elad


On Wed, Jun 13, 2012 at 6:47 PM, Michael O'Dowd <michael.odowd@kuantic.com> wrote:
Hi Elad,

I ran into a similar problem recently. I'm using a recent CVS checkout
rather than 3.0. Also, I'm probably not using the same ethernet HW, so I
don't know how well my reply corresponds to your case.

The eth_drv.c file is the glue between lwIP and the underlying ethernet
driver, so the issue that you are encountering may be specific to the
driver. In my case, when under stress, eth_drv.c generates the error
message: "cannot allocate pbuf to receive packet". Soon after that, the
ethernet driver stops receiving traffic permanently, but does not crash. In
your case, if I understand correctly, your system crashes.

The issue is that when eth_drv_recv() fails to allocate a pbuf, it returns
without calling the ethernet driver recv() function: (sc->funs->recv)(). In
my case, the driver requires that its recv() function be called, in order
to complete the processing of the packet reception and to free up the
receive buffer(s). Failing to call it apparently causes the receive path to
cease functioning (I'm still investigating the details). In your case, I
gather that it crashes the system.
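The idea can be sketched as follows. This is not the eth_drv.c code or a real driver: mock_driver, mock_driver_recv(), glue_recv() and alloc_always_fails() are hypothetical names, and the convention that a NULL destination means "discard the frame" is an assumption made for the example. The point is only that the driver-side recv step runs even when the allocation fails, so the receive slot is released.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a driver that holds one frame per RX slot. */
struct mock_driver {
    const unsigned char *frame;   /* frame waiting in the RX slot */
    size_t frame_len;
    int slots_busy;               /* RX slots not yet released */
    unsigned dropped;             /* frames discarded for lack of a buffer */
};

/* Driver side: copy the frame out if a destination is given, otherwise
 * discard it. Either way the RX slot is released. */
void mock_driver_recv(struct mock_driver *drv, unsigned char *dst)
{
    assert(drv->slots_busy > 0);
    if (dst != NULL)
        memcpy(dst, drv->frame, drv->frame_len);
    else
        drv->dropped++;
    drv->slots_busy--;
}

typedef unsigned char *(*alloc_fn)(size_t len);

/* Glue side: try to allocate, then always call the driver recv step,
 * passing NULL when the allocation failed. */
unsigned char *glue_recv(struct mock_driver *drv, alloc_fn alloc)
{
    unsigned char *buf = alloc(drv->frame_len);
    mock_driver_recv(drv, buf);
    return buf;                   /* NULL means the frame was dropped */
}

/* Allocator that models an exhausted buffer pool. */
unsigned char *alloc_always_fails(size_t len)
{
    (void)len;
    return NULL;
}
```

Calling glue_recv() with alloc_always_fails leaves slots_busy at zero and counts one dropped frame, rather than leaving the slot occupied.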

Note: I'm running on an NXP 1788 (Cortex-M3), using the
"devs/arm/lpc2xxx/current/src/if_lpc2xxx.c" ethernet driver.

There are two aspects to this problem:

1) In my opinion, there is a bug in eth_drv_recv(). If there are no pbufs
available, then it should at least cause the received packet to be
discarded. Otherwise, the system may fail whenever there is a minor burst of
traffic on the network. It doesn't take much: there are only 16 pbufs
available by default. Whether or not the system fails depends on how the
ethernet driver reacts to the failure to call its recv() function. I hope
to fix this on my platform in the near future.

2) You should also keep an eye on your pbuf usage, just to make sure that
you don't have a pbuf memory leak. You could also try to allocate more
pbufs, if you have the available memory.

If you are using the default lwip configuration, the pbuf memory allocation
is handled by memp.[hc]. It has a fixed number of pbufs available. The
default is 16 pbufs, and can be changed in the configtool under: [lwIP
networking stack/Memory options/Number of memp struct pbufs].

Alternatively, if you have lots of memory, you could enable the checkbox:
[lwIP networking stack/Memory options/Use malloc for pool allocations]. This
bypasses the memp pools and their static limitations. Though this will make
it harder to spot a pbuf memory leak. I haven't tried this personally.

Finally, (when using memp) the pbuf usage can be monitored with
lwip/stats.h. If you have access to a serial port, try calling
stats_display(). Here is a snippet of the pbuf related output:

  MEM PBUF_POOL
          avail: 16
          used: 0
          max: 3
          err: 0
The "err" counter increases when pbuf_alloc() fails.

Hope that helps,

Regards,

Michael O'Dowd
Kuantic SAS


On 12/06/2012 22:40, Elad Yosef wrote:
Hi all,
I'm using the LwIP stack on my target and experiencing crashes under stress.

The function eth_drv_recv() from ../io/eth/v3_0/ser/lwip/eth_drv.c
calls pbuf_alloc() and this allocation fails.

Is this the result of some bad configuration?

Thanks
Elad



