This is the mail archive of the ecos-bugs@sourceware.org mailing list for the eCos project.

[Bug 1001764] Enhancement of MMC/SD over SPI driver

From: bugzilla-daemon at bugs dot ecos dot sourceware dot org
To: ecos-bugs at ecos dot sourceware dot org
Date: Wed, 01 May 2013 15:10:35 +0000
Subject: [Bug 1001764] Enhancement of MMC/SD over SPI driver
Auto-submitted: auto-generated
References: <bug-1001764-13 at http dot bugs dot ecos dot sourceware dot org/>

Please do not reply to this email, use the link below.

http://bugs.ecos.sourceware.org/show_bug.cgi?id=1001764

--- Comment #19 from Mike Jones <mjones@linear.com> ---
I am seeing bad behavior in the SPI code when using the MMC patch with FXM, in
an application that was stable before I moved to FXM. Let me be upfront: I did
find some bad application code that was writing over memory when processing
some va_args, and I can't rule out other application errors. However, I have
constrained the application code and looked very carefully for potential memory
issues, even checking every sprintf for formatting problems. So error-free
application code is only an assumption.

(If anyone has suggestions on how to do type-safe string processing in an eCos
application, please offer advice. I assume it would have to be some C++
library, possibly with templates. I don't know what is practical within the
Cortex toolchain.)

The application uses I2C, and an SD card. It reads lots of data from I2C and
writes to the SD card, but never at the same time.

The problem is that a pointer to the SPI device struct ends up with the value
0xFFFFFFFF, and this causes an exception 5. This happens inside a single call
into the SPI code. During this call, all other threads are blocked.

I will describe where this happens in code and hope someone has ideas on how to
debug it, or perhaps the author might see something that indicates a possible
driver bug.

By instrumenting the code as follows:

void
cyg_spi_transaction_transfer(cyg_spi_device* device, cyg_bool polled,
                             cyg_uint32 count, const cyg_uint8* tx_data,
                             cyg_uint8* rx_data, cyg_bool drop_cs)
{
    cyg_spi_bus*    bus;
    CYG_CHECK_DATA_PTR(device, "valid SPI device pointer required");

    /* Instrumentation: bail out if any pointer argument is all Fs. */
    if (tx_data == 0xFFFFFFFF || rx_data == 0xFFFFFFFF || device == 0xFFFFFFFF)
        return;

    /* Instrumentation: bail out if the bus pointer is all Fs. */
    if (device->spi_bus == 0xFFFFFFFF)
        return;

    bus = device->spi_bus;

    /* Instrumentation: bail out if the transfer function pointer is bad. */
    if (bus->spi_transaction_transfer == NULL ||
        bus->spi_transaction_transfer == 0xFFFFFFFF)
        return;

    CYG_CHECK_DATA_PTR(bus, "SPI device does not point at a valid bus structure");
    CYG_ASSERT(bus->spi_current_device == device,
               "SPI transfer requested without claiming the bus");
    CYG_CHECK_FUNC_PTR(bus->spi_transaction_transfer,
                       "SPI device has not provided a transfer function");
    (*(bus->spi_transaction_transfer))(device, polled, count, tx_data, rx_data,
                                       drop_cs);
}

Eventually it will break on the first return and device will be 0xFFFFFFFF.

By looking at the call chain, the problem occurs:

mmc_spi_send_command_start(cyg_mmc_spi_disk_info_t* disk, cyg_uint32 command,
                           cyg_uint32 arg)
{
    cyg_spi_device* dev = disk->mmc_spi_dev;
    cyg_uint8       request[7];
    cyg_uint8       response[7];
    cyg_uint8       reply;
    int             i;

...

    reply       = response[6];
    for (i = 0; (i < MMC_SPI_COMMAND_RETRIES) && (0 != (reply & 0x0080)); i++) {
        cyg_spi_transaction_transfer(dev, cyg_mmc_spi_polled, 1,
                                     mmc_spi_ff_data, response, 0);
        reply = response[0];

The failure is in the retry loop. The elided code (marked ...) before the loop
also contains a cyg_spi_transaction_transfer call. The bad address NEVER occurs
on that first call to transfer, and ALWAYS occurs on the second call, but only
after running a long time.

If I examine variables in mmc_spi_send_command_start, disk->mmc_spi_dev holds a
good address, but the local copy dev does not. This implies that dev gets
overwritten somewhere between the assignment at the top and the retry loop.
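
A minimal sketch of how that window could be narrowed, assuming a small helper
is called after each transfer in mmc_spi_send_command_start to compare the
local copy against the field it was loaded from. The helper name
dev_copy_corrupted and its where label are hypothetical and are not part of the
driver; on the target, diag_printf would likely replace printf.

```c
#include <stdio.h>

/*
 * Hypothetical debugging helper (not part of the eCos driver).
 * Compares a local pointer copy against the field it was loaded from.
 * Returns 0 if they still match, 1 (after logging) if they differ.
 */
int dev_copy_corrupted(const void *local_dev, const void *disk_dev,
                       const char *where)
{
    if (local_dev == disk_dev)
        return 0;

    printf("dev mismatch %s: local=%p disk=%p\n",
           where, (void *)local_dev, (void *)disk_dev);
    return 1;
}
```

Called as dev_copy_corrupted(dev, disk->mmc_spi_dev, "after first transfer")
and again inside the retry loop, the first call that reports a mismatch bounds
the region in which the overwrite happens.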

The underlying code uses the request and response arrays, possibly with DMA. I
don't see any simple way to explain the corrupted address, as dev sits at a
lower address than the arrays. All Fs is also an unusual value; the only place
I know it occurs is the 512-byte buffer of FFs that is sent to the device to
clear it. But that is pretty much static data, so it would take a copy from it
to cause this problem.

Also, note that the I2C is reading on a 100ms interval from a counter/timer,
and then writing to SD. The application runs for several minutes before the
problem occurs. So the code is basically performing the same operation every
100ms and then after some time fails.

I created some test code that mallocs all 16 MB of the memory and tests it with
0x55 and 0xAA, a counter, and so on, and I don't get any memory failures.
Basically, I malloc in 100K chunks, test, free everything at the end, and then
do it again. So I believe the underlying memory system is OK.
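
For reference, a memory test of the kind described above could look roughly
like the sketch below. It is not the actual test code: the names CHUNK_SIZE,
check_fill, check_counter and memtest_pass are assumptions, as is the
max_chunks parameter (about 160 chunks of 100K would cover 16 MB).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CHUNK_SIZE (100u * 1024u)   /* 100K chunks, as in the description */

/* Fill n bytes with one pattern, then read them back. 0 = OK, -1 = mismatch. */
static int check_fill(volatile uint8_t *p, size_t n, uint8_t pattern)
{
    size_t i;
    for (i = 0; i < n; i++)
        p[i] = pattern;
    for (i = 0; i < n; i++)
        if (p[i] != pattern)
            return -1;
    return 0;
}

/* Write an incrementing byte counter, then read it back. */
static int check_counter(volatile uint8_t *p, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        p[i] = (uint8_t)(i & 0xFFu);
    for (i = 0; i < n; i++)
        if (p[i] != (uint8_t)(i & 0xFFu))
            return -1;
    return 0;
}

/*
 * One pass: allocate up to max_chunks chunks (stopping early if malloc
 * fails), test every chunk, then free them all at the end.
 * Returns the number of chunks tested, or -1 on a failure.
 */
int memtest_pass(size_t max_chunks)
{
    void **chunks;
    size_t held = 0, i;
    int result;

    if (max_chunks == 0)
        return 0;
    chunks = calloc(max_chunks, sizeof *chunks);
    if (chunks == NULL)
        return -1;

    while (held < max_chunks) {
        void *p = malloc(CHUNK_SIZE);
        if (p == NULL)
            break;                  /* heap exhausted: test what we hold */
        chunks[held++] = p;
    }

    result = (int)held;
    for (i = 0; i < held; i++) {
        volatile uint8_t *p = chunks[i];
        if (check_fill(p, CHUNK_SIZE, 0x55) != 0 ||
            check_fill(p, CHUNK_SIZE, 0xAA) != 0 ||
            check_counter(p, CHUNK_SIZE) != 0) {
            result = -1;
            break;
        }
    }

    for (i = 0; i < held; i++)
        free(chunks[i]);
    free(chunks);
    return result;
}
```

Calling memtest_pass repeatedly corresponds to the "do it again" step. A test
of this shape exercises the RAM itself; it would not detect a stray write made
by other code while the driver is running.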

Does anyone have ideas that might help narrow this down?
