This is the mail archive of the ecos-devel@sourceware.org mailing list for the eCos project.


Re: NAND technical review

From: Ross Younger <wry at ecoscentric dot com>
To: Jonathan Larmour <jifl at jifvik dot org>
Cc: Rutger Hofman <rutger at cs dot vu dot nl>, Jürgen Lambrecht <J dot Lambrecht at televic dot com>, eCos developers <ecos-devel at ecos dot sourceware dot org>, Deroo Stijn <S dot Deroo at televic dot com>
Date: Tue, 20 Oct 2009 11:17:43 +0100
Subject: Re: NAND technical review
References: <4ACB4B58.2040804@ecoscentric.com> <4ACC61F0.3020303@televic.com> <4AD3E92E.5020301@jifvik.org> <4AD47ADE.9010606@cs.vu.nl> <4AD6A7EC.8080703@jifvik.org> <4ADC452B.5040706@ecoscentric.com> <4ADD14E1.3050702@jifvik.org>

Jonathan Larmour wrote:
> To double check, you mean reading was slowest, programming was faster
> and erasing was fastest, even apparently faster than what may be the
> theoretical fastest time? (I use the term "fast" advisedly, mark).
> 
> Are you sure there isn't a problem with your driver to cause such
> figures? :-)

Those are the raw numbers. Yes, I agree that they don't appear to make
sense. As I said, profiling - which will include figuring out what's going
on here - is languishing on the todo list ...

> I wonder if Rutger has the ability to compare with his YAFFS throughput.
> OTOH, as you say, the controller plays a large part, and there's no
> common ground with R so it's entirely possible no comparison can be fair
> for either implementation.

The YAFFS benchmarking is done by our yaffs5 test, which IIRC goes only
through fileio so ought to be trivially portable. It doesn't appear in my
last drop on the bz ticket, but will when I get round to freshening it.

>> After I taught the library to use h/w
>> ECC I immediately saw a 46% speedup on reads and 38% on writes when
>> compared with software ECC [...]
> 
> Just to be sure, are the differences measured by these percentages
> purely in terms of overall data throughput per time?

These are from my raw NAND benchmarks (tests/rwbenchmark.c) which measure
the end-to-end time taken for a whole cyg_nand_page_read() / write /
block_erase call to return.

> I'm very interested in the fact that software changes you made, had such
> a relatively large change to the performance. 

> [hardware ECC]
> Hence my surprise at E not having support, even in principle, before!
> But clearly you're at the stage where stuff is nearly working. 

I was surprised too; but then I had been operating under the general mantra
of "first make it work, then make it work fast" and the speed work is still
in progress ...

To be clear: hwecc _is_ working well, on this customer port, and getting it
going on the STM3210E is on the cards so I have something I can usefully
share publicly.

> Just as an aside, you may find that improving eCos more generally to
> have e.g. assembler optimised implementation of memcpy/memmove/memset
> (and possibly others) may improve performance of these and other things
> across the board. GCC's intrinsics can only do so much. (FAOD actual
> implementations to use (at least to start with) can be found in newlib.)

The speedups in my NAND driver on this board came from a straightforward
Duff's device 8-way-unroll of what had been HAL_{READ,WRITE}_UINT8_VECTOR;
16-way and 32-way unrolls seemed to add a smidgen more performance but
increased code size perhaps disproportionately. (Using the existing VECTOR
macro but with -funroll-loops gave a similar speed-up but more noticeable
code bloat across the board.)

The word copies in newlib's memcpy et al look like they would boost
performance generally, but I have attempted to avoid copying data around as
far as possible in my layer. I don't see them as helping at all with NAND
device access: you have to make a sequence of 8-bit or 16-bit writes to the
MMIO register, and that's that. This is pretty much the same situation as
Tom Duff found himself in ...
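
For readers unfamiliar with the technique, here is a minimal sketch of an
8-way Duff's device read from a single data register. It is an illustration
only, not the driver's actual code: the function name and the model of the
register as a plain volatile byte pointer are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative only: read `count` bytes from one memory-mapped data
 * register into `buf`, eight reads per loop pass (Duff's device).
 * The register address does not advance; only the buffer pointer does.
 */
static void nand_read_bytes_duff8(volatile const uint8_t *reg,
                                  uint8_t *buf, size_t count)
{
    size_t passes;

    if (count == 0)
        return;                    /* classic Duff's device assumes count > 0 */

    passes = (count + 7) / 8;      /* trips through the do-while, rounded up */

    /* count % 8 picks the entry point, so the first pass handles the
     * remainder (or a full eight when the remainder is zero). */
    switch (count % 8) {
    case 0: do { *buf++ = *reg;
    case 7:      *buf++ = *reg;
    case 6:      *buf++ = *reg;
    case 5:      *buf++ = *reg;
    case 4:      *buf++ = *reg;
    case 3:      *buf++ = *reg;
    case 2:      *buf++ = *reg;
    case 1:      *buf++ = *reg;
            } while (--passes > 0);
    }
}
```

The write direction is symmetric (`*reg = *buf++`). Because the register is
accessed through a volatile pointer, the compiler must keep every access and
its order, so the saving comes only from fewer loop-counter tests and
branches per byte.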

To try and fit with the eCos philosophy, I've left the localised unroll as a
CDL option in this driver, defaulting to off. I expect similar unrolls would
be profitable in other NAND drivers, but a more generalised solution might
be preferable: something like HAL_READ_UINT8_VECTOR_UNROLL, with options to
configure whether and how far it was unrolled?
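
One possible shape for such a configurable unroll is sketched below. Nothing
here exists in eCos: `VECTOR_UNROLL_FACTOR` stands in for whatever symbol a
CDL option would define, and the function and helper macro names are
hypothetical. It uses a blocked loop plus a tail loop rather than Duff's
device, purely to keep the configuration logic easy to see.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical configuration knob: 1 (no unroll), 8 or 16. */
#ifndef VECTOR_UNROLL_FACTOR
#define VECTOR_UNROLL_FACTOR 8
#endif

/* Eight back-to-back reads from the same register, then advance the buffer. */
#define READ8_FROM_REG(b, r)                                  \
    do {                                                      \
        (b)[0] = *(r); (b)[1] = *(r); (b)[2] = *(r);          \
        (b)[3] = *(r); (b)[4] = *(r); (b)[5] = *(r);          \
        (b)[6] = *(r); (b)[7] = *(r);                         \
        (b) += 8;                                             \
    } while (0)

/* Illustrative only: read `count` bytes from one register into `buf`. */
static void read_uint8_vector_unroll(volatile const uint8_t *reg,
                                     uint8_t *buf, size_t count)
{
#if VECTOR_UNROLL_FACTOR == 16
    while (count >= 16) {
        READ8_FROM_REG(buf, reg);
        READ8_FROM_REG(buf, reg);
        count -= 16;
    }
#elif VECTOR_UNROLL_FACTOR == 8
    while (count >= 8) {
        READ8_FROM_REG(buf, reg);
        count -= 8;
    }
#endif
    /* Tail bytes, or the whole transfer when unrolling is disabled. */
    while (count-- > 0)
        *buf++ = *reg;
}
```

With the factor set to 1 this collapses to the plain byte loop, so a
default-off option would cost nothing in code size.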

Ross

-- 
Embedded Software Engineer, eCosCentric Limited.
Barnwell House, Barnwell Drive, Cambridge CB5 8UU, UK.
Registered in England no. 4422071.                  www.ecoscentric.com

Follow-Ups:
- Re: NAND technical review
  - From: Jonathan Larmour

References:
- Re: NAND technical review
  - From: Ross Younger
- Re: Re: NAND technical review
  - From: Jürgen Lambrecht
- Re: NAND technical review
  - From: Jonathan Larmour
- Re: NAND technical review
  - From: Rutger Hofman
- Re: NAND technical review
  - From: Jonathan Larmour
- Re: NAND technical review
  - From: Ross Younger
- Re: NAND technical review
  - From: Jonathan Larmour
