This is the mail archive of the mailing list for the eCos project.


Re: NAND technical review

Ross Younger wrote:
Jonathan Larmour wrote:

I wonder if Ross has any performance data for E he could contribute?

I have done a little benchmarking and so have _some_ numbers to hand, but the goalposts are moving and my figures are a bit old and must be treated with caution...

On the EA LPC2468 board (Samsung K9 NAND chip), with the state of my code on
July 8, compiling with -O2 and asserts off, my NAND benchmarker reported
average page read times[*] of 3578us per page, programming 2680us, and
erasing 1848us. These stack up against the fastest-possible raw chip times
(which I computed from the "typical" times on the datasheet) of 88.5, 363.5
and 2000us.

To double-check: you mean reading was slowest, programming faster, and erasing fastest, even apparently faster than what may be the theoretical fastest time? (I use the term "fast" advisedly, mark you.)

Are you sure there isn't a problem with your driver to cause such figures? :-)

This led to a YAFFS throughput data rate, on a recently-erased NAND array,
of up to 480kB/s in reading and 578kB/s in writing. (Actual rates vary
depending on the size of chunk you pass to read() and write().)

I wonder if Rutger has the ability to compare with his YAFFS throughput. OTOH, as you say, the controller plays a large part, and there's no common ground with R, so it's entirely possible that no comparison can be fair to either implementation.

The board is based on the Samsung S3C2410X ucontroller and carries the same
Samsung K9 NAND chip as on the EA LPC2468. Now, this CPU has a dedicated
NAND controller with hardware ECC... After I taught the library to use h/w
ECC I immediately saw a 46% speedup on reads and 38% on writes when compared
with software ECC. I've also added an option to do a partial loop unroll in
the read and write cycles, which gives a further 4% boost on reads and 15% on writes.

Just to be sure: are the differences measured by these percentages purely in terms of overall data throughput?

I'm very interested that software changes you made had such a relatively large effect on performance. If that's true, it argues against the possibility that waiting for the hardware (the NAND chip) dominates the overall time (which would mean the software components are lost in the noise). Instead, the software latency involved in setting up the next operation can be noticeable, which was my concern with R in my mail of 2009-10-15 that you're replying to.
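Incidentally, the partial loop unroll mentioned above is presumably something along these lines: fewer branches and index updates per byte than a one-byte-per-iteration loop. A sketch only, with a simulated data register standing in for the memory-mapped volatile register real driver code would read; all names here are invented:

```c
#include <stdint.h>
#include <string.h>

/* Simulated NAND data register.  A real driver would instead read a
   fixed memory-mapped volatile address on every access. */
static uint8_t  fake_chip[2048];
static unsigned fake_pos;
static uint8_t read_data_reg(void) { return fake_chip[fake_pos++]; }

/* 4-way partially unrolled page-read inner loop.  Assumes len is a
   multiple of 4, as NAND page and spare sizes are. */
static void read_buf_unrolled(uint8_t *buf, unsigned len)
{
    for (unsigned i = 0; i < len; i += 4) {
        buf[i]     = read_data_reg();
        buf[i + 1] = read_data_reg();
        buf[i + 2] = read_data_reg();
        buf[i + 3] = read_data_reg();
    }
}
```

On a slow core clocking bytes out one at a time, shaving the loop overhead like this plausibly accounts for a few percent, consistent with the figures quoted.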

The current (work-in-progress) numbers I have from the benchmarker
are 452us per page read, 623us per write and 1934us per erase; YAFFS
throughput is similarly impressive at 4690 kB/s in reads and 3432 kB/s in
writes. (Charles Manning has stated publicly several times that if you want
YAFFS to be fast, you should start by looking at the speed of your NAND driver.)

Hmm, as opposed to what though? YAFFS itself isn't able to change much.

Of course, we're not comparing apples with apples here; the S3C2410X is an
ARM9 whose CPU clock runs at 200MHz, whereas the EA LPC2468 is an ARM7TDMI
running at just 48MHz. Even so, the speed-up given by hardware ECC
demonstrates that option to be a no-brainer.

Hence my surprise that E didn't have support, even in principle, before! But clearly you're at the stage where stuff is nearly working. I look forward to a code drop, as the APIs would benefit from comparison with R's. It looks like R has considered a variety of interesting ECC hardware, so it would be interesting to see whether E's could cope.
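For reference, the software ECC that hardware is replacing here is typically the classic single-error-correcting Hamming scheme over 256-byte blocks (3 ECC bytes per block), as used by SmartMedia-style NAND. A sketch of the encode side, returning the 22 parity bits unpacked in one word rather than packed into the usual 3 bytes; this is the textbook algorithm, not necessarily what either library implements:

```c
#include <stdint.h>

/* Parity of a byte: 1 if an odd number of bits are set. */
static int parity8(uint8_t b)
{
    b ^= b >> 4; b ^= b >> 2; b ^= b >> 1;
    return b & 1;
}

/* Classic 256-byte Hamming ECC.  Returns the 22 parity bits in one
   word: line parities LP0..LP15 in bits 0..15, column parities
   CP0..CP5 in bits 16..21. */
static uint32_t ecc256(const uint8_t *data)
{
    uint8_t  col = 0;  /* per-bit-position (column) parity accumulator */
    uint32_t lp  = 0;  /* line (byte-address) parities                 */
    for (int i = 0; i < 256; i++) {
        col ^= data[i];
        if (parity8(data[i])) {
            /* An odd-parity byte toggles one line parity per address bit. */
            for (int k = 0; k < 8; k++)
                lp ^= 1u << (2 * k + ((i >> k) & 1));
        }
    }
    int c[8];
    for (int j = 0; j < 8; j++)
        c[j] = (col >> j) & 1;
    uint32_t cp = 0;
    cp |= (uint32_t)(c[0] ^ c[2] ^ c[4] ^ c[6]) << 0;  /* CP0: even columns */
    cp |= (uint32_t)(c[1] ^ c[3] ^ c[5] ^ c[7]) << 1;  /* CP1: odd columns  */
    cp |= (uint32_t)(c[0] ^ c[1] ^ c[4] ^ c[5]) << 2;  /* CP2 */
    cp |= (uint32_t)(c[2] ^ c[3] ^ c[6] ^ c[7]) << 3;  /* CP3 */
    cp |= (uint32_t)(c[0] ^ c[1] ^ c[2] ^ c[3]) << 4;  /* CP4: low nibble   */
    cp |= (uint32_t)(c[4] ^ c[5] ^ c[6] ^ c[7]) << 5;  /* CP5: high nibble  */
    return lp | (cp << 16);
}
```

A single-bit error flips exactly 11 of the 22 bits in the XOR of stored and recomputed ECC; the odd-numbered line parities spell out the failing byte address and CP1/CP3/CP5 the failing bit. The inner loop over all 256 bytes is what makes the software version cost real time per page, and why offloading it to a controller register read gives the kind of speedup Ross reports.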

BTW: Some profiling and souping up is on my todo list, and some more
benchmarking will probably happen at that time. When I implement hardware
ECC support on the STM3210E I intend to produce some before and after numbers.

Just as an aside, you may find that improving eCos more generally, e.g. with assembler-optimised implementations of memcpy/memmove/memset (and possibly others), may improve the performance of these and other things across the board. GCC's intrinsics can only do so much. (FAOD, actual implementations to use, at least to start with, can be found in newlib.)
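To illustrate why a tuned memcpy helps: a naive byte loop issues one load and one store per byte, whereas a word-wise copy moves four bytes per iteration on a 32-bit ARM, and newlib's assembler routines unroll further still. A minimal sketch of the aligned case only; a production memcpy also handles misaligned heads and tails properly:

```c
#include <stddef.h>
#include <stdint.h>

/* Word-at-a-time copy when source and destination are mutually
   4-byte aligned; byte loop otherwise and for the tail.  A sketch of
   the idea only, not a drop-in replacement. */
static void *memcpy_wordwise(void *dst, const void *src, size_t n)
{
    uint8_t       *d = dst;
    const uint8_t *s = src;
    if ((((uintptr_t)d | (uintptr_t)s) & 3) == 0) {
        while (n >= 4) {
            *(uint32_t *)d = *(const uint32_t *)s;  /* 4 bytes per load/store pair */
            d += 4; s += 4; n -= 4;
        }
    }
    while (n--) *d++ = *s++;  /* remaining bytes */
    return dst;
}
```

Since a NAND-backed filesystem shuffles page-sized buffers around constantly, a copy routine that is even modestly faster shows up directly in throughput numbers like those above.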

--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine
