This is the mail archive of the
mailing list for the eCos project.
Re: Re: NAND technical review
- From: Jürgen Lambrecht <J dot Lambrecht at televic dot com>
- To: eCos developers <ecos-devel at ecos dot sourceware dot org>
- Date: Thu, 8 Oct 2009 10:16:03 +0200
- Subject: Re: Re: NAND technical review
- References: <4ACB4B58.email@example.com>
Just some explanatory remarks below, hardware related.
Ross Younger wrote:
1. NAND 101
-------------------------------------------------------------
<snip>
(Those familiar with NAND chips can skip this section, but I appreciate
that not everybody on-list is in the business of writing NAND device
drivers :-) )
Such a "broken bit" occurs because the transistor that holds the bit is
physically broken and is stuck at 1 or at 0 (I don't know if it can be
both). So you can no longer erase it (flip it back to 1) or program it
(flip it to 0).
Now, I mentioned ECC data. NAND technology has a number of underlying
limitations, importantly that it has reliability issues. I don't have a full
picture - the manufacturers seem to be understandably coy - but my
understanding is that on each page, a driver ought to be able to cope with a
single bit having flipped either on programming or on reading.

I thought only programming or erasing could break it, not reading?
Is somebody sure about this?
NAND flash chips are very dense chips (many bits in a small area) and
there is a trade-off in manufacturing between reliability and density.
To make them dense (hence cheap), faults have to be tolerated.

The recommended way to achieve this is by storing an ECC in the spare area: the
algorithm published by Samsung is popular, requiring 22 bits of ECC per 256
bytes of data and able to correct a 1 bit error and detect a 2 bit error.
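
To illustrate where the figure of 22 bits comes from: 256 bytes are 2048 bits, so each bit has an 11-bit address, and a Hamming-style code can keep two parity bits per address bit (2 x 11 = 22). The following is a minimal sketch of that idea only. The names (ecc22_compute, ecc22_correct) and the bit layout are made up for illustration and do not match the layout used by the Samsung algorithm or by any particular driver.

```c
#include <stdint.h>
#include <stddef.h>

#define ECC_DATA_BYTES 256u
#define ECC_ADDR_MASK  0x7FFu   /* 11 address bits: 8 for the byte, 3 for the bit */

/* Compute a 22-bit code: the upper 11 bits hold the XOR of the addresses of
 * all set data bits, the lower 11 bits hold the XOR of the inverted addresses. */
uint32_t ecc22_compute(const uint8_t *data)
{
    uint32_t odd = 0, even = 0;

    for (uint32_t byte = 0; byte < ECC_DATA_BYTES; byte++) {
        for (uint32_t bit = 0; bit < 8; bit++) {
            if (data[byte] & (1u << bit)) {
                uint32_t addr = (byte << 3) | bit;
                odd  ^= addr;
                even ^= ~addr & ECC_ADDR_MASK;
            }
        }
    }
    return (odd << 11) | even;
}

/* Check data against the stored code and repair a single flipped bit.
 * Returns  0 if data and code agree,
 *          1 if one data bit was flipped and has been corrected,
 *          2 if one bit of the stored code itself was flipped (data is fine),
 *         -1 if the error cannot be corrected (e.g. two flipped bits). */
int ecc22_correct(uint8_t *data, uint32_t stored_ecc)
{
    uint32_t syndrome = (ecc22_compute(data) ^ stored_ecc) & 0x3FFFFFu;
    uint32_t odd = syndrome >> 11;
    uint32_t even = syndrome & ECC_ADDR_MASK;

    if (syndrome == 0)
        return 0;
    if ((odd ^ even) == ECC_ADDR_MASK) {
        /* Single data bit error: 'odd' is the address of the flipped bit. */
        data[odd >> 3] ^= (uint8_t)(1u << (odd & 7u));
        return 1;
    }
    if ((syndrome & (syndrome - 1u)) == 0)
        return 2;
    return -1;
}
```

A driver following this scheme would store the result of ecc22_compute() in the spare area when programming and call ecc22_correct() on every read.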
There is also the question of bad blocks. Again, full details are sketchy. A
chip may be shipped with a number of "factory-bad" blocks (e.g. up to 20 on
this Samsung chip); they are marked as such in their spare area. (What
constitutes a "bad" block is not published; one imagines that the factory
have access to more test information than users do and that there may be
statistical techniques involved in judging the likely reliability of the
block.)

The manufacturer just tries to program all bits a first time to check
for manufacturing errors. When a broken bit is discovered, the entire
block is marked bad.

Blocks may also fail during the life of the device, usually by the
chip reporting a failure during a program or erase operation. Because of
this, the manufacturers recommend that chip drivers scan the device for
factory-bad markers then create and maintain a Bad Block Table throughout
the life of the device. How this is done is not prescribed, but the
behaviour of the Linux MTD layer is something approximating a de facto standard.
<snip>
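
As a rough sketch of the scan-then-maintain approach described above: the code below builds an in-memory bad block table as a bitmap. Everything here is an assumption made for illustration: the hook type read_marker_fn, the function names, the convention that a marker byte other than 0xFF in either of the first two pages means "bad", and the simulated chip. Where the marker really lives is chip-specific, so check the datasheet.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical driver hook: fetch the bad-block marker byte from the spare
 * area of one page. Returns 0 on success, nonzero if the read failed. */
typedef int (*read_marker_fn)(uint32_t block, uint32_t page, uint8_t *marker);

/* Scan every block and fill 'bbt', a bitmap with one bit per block (1 = bad).
 * 'bbt' must hold at least (nblocks + 7) / 8 bytes.
 * Returns the number of bad blocks found. */
uint32_t bbt_scan(read_marker_fn read_marker, uint32_t nblocks, uint8_t *bbt)
{
    uint32_t bad = 0;

    for (uint32_t i = 0; i < (nblocks + 7u) / 8u; i++)
        bbt[i] = 0;

    for (uint32_t block = 0; block < nblocks; block++) {
        int is_bad = 0;

        /* Assumed convention: check the marker in the first two pages.
         * A block we cannot even read is treated as bad, to be safe. */
        for (uint32_t page = 0; page < 2 && !is_bad; page++) {
            uint8_t marker = 0;
            if (read_marker(block, page, &marker) != 0 || marker != 0xFF)
                is_bad = 1;
        }
        if (is_bad) {
            bbt[block >> 3] |= (uint8_t)(1u << (block & 7u));
            bad++;
        }
    }
    return bad;
}

/* Query the table. */
int bbt_is_bad(const uint8_t *bbt, uint32_t block)
{
    return (bbt[block >> 3] >> (block & 7u)) & 1;
}

/* Record a block that failed a program or erase at run time. */
void bbt_mark_bad(uint8_t *bbt, uint32_t block)
{
    bbt[block >> 3] |= (uint8_t)(1u << (block & 7u));
}

/* Simulated chip for demonstration only: blocks 3 and 17 are factory-bad. */
int demo_read_marker(uint32_t block, uint32_t page, uint8_t *marker)
{
    (void)page;
    *marker = (block == 3 || block == 17) ? 0x00 : 0xFF;
    return 0;
}
```

A real driver would also have to persist the table (for instance on the NAND itself), which this sketch leaves out.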
(iii) Electrical

(below a hardware designer note :-)
Most, if not all, NAND chips have the same broad electrical interface.
There is a master Chip Enable line; nothing happens if this is not active.

Be careful here: a standard chip enable is only active during the
actual read or write. But an access to a NAND flash is a complete cycle
during which the NAND flash's embedded control logic needs to keep its state!
Therefore, the Chip Enable (or Chip Select) of the NAND flash is (on my
ARM9 anyhow) connected to a GPIO (general-purpose input/output) pin,
and the SW has to assert this pin at the start of an access and
de-assert it at the end.
The read hardware Chip Select pin is not connected.
(In R's SW in the io/flash_nand/../controller: cyg_nand_ctl_chip_select,
that calls chip_select implemented in the board-specific driver in
Data flows into and out of the chip via its data bus, which is 8 or 16 bits
wide, mediated by Read Enable and Write Enable lines.
Commands and addresses are sent on the data bus, but routed to the
appropriate latches by asserting the Address Latch Enable or Command Latch
Enable lines at the same time.
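
The sequence described above (chip enable held for the whole access, commands and addresses latched via CLE and ALE) might look roughly like the sketch below. It runs against a small simulated bus that just logs cycles, so it can be tried without hardware. All names are made up for illustration, and the command bytes (0x00 then 0x30) and the five address cycles are assumptions typical of large-page chips; the real values come from the chip's datasheet.

```c
#include <stdint.h>
#include <stddef.h>

enum cycle_kind { CYC_CMD, CYC_ADDR };   /* CLE asserted vs. ALE asserted */

#define SIM_LOG_MAX 16u

/* Simulated NAND bus: records the latch cycles it sees while CE is active. */
struct nand_sim {
    int ce_active;                        /* state of the GPIO-driven CE pin */
    size_t ncycles;
    struct { enum cycle_kind kind; uint8_t value; } log[SIM_LOG_MAX];
    size_t dropped;                       /* cycles ignored by the chip */
};

/* On real hardware this would drive the GPIO pin wired to Chip Enable. */
void nand_chip_select(struct nand_sim *n, int active)
{
    n->ce_active = active;
}

/* One write cycle with either CLE or ALE asserted.
 * Nothing happens if Chip Enable is not active. */
void nand_write_cycle(struct nand_sim *n, enum cycle_kind kind, uint8_t value)
{
    if (!n->ce_active || n->ncycles >= SIM_LOG_MAX) {
        n->dropped++;
        return;
    }
    n->log[n->ncycles].kind = kind;
    n->log[n->ncycles].value = value;
    n->ncycles++;
}

/* Start a page read. CE is asserted here and deliberately left asserted:
 * the caller waits for the chip, reads the data, then deselects. */
void nand_start_page_read(struct nand_sim *n, uint16_t column, uint32_t row)
{
    nand_chip_select(n, 1);
    nand_write_cycle(n, CYC_CMD,  0x00);                        /* assumed opcode */
    nand_write_cycle(n, CYC_ADDR, (uint8_t)(column & 0xFF));
    nand_write_cycle(n, CYC_ADDR, (uint8_t)(column >> 8));
    nand_write_cycle(n, CYC_ADDR, (uint8_t)(row & 0xFF));
    nand_write_cycle(n, CYC_ADDR, (uint8_t)((row >> 8) & 0xFF));
    nand_write_cycle(n, CYC_ADDR, (uint8_t)((row >> 16) & 0xFF));
    nand_write_cycle(n, CYC_CMD,  0x30);                        /* assumed opcode */
}
```

The point of the sketch is the bracket around the access: chip select goes active before the first command cycle and is only released by the caller once the data has been read out.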
There is also a ready/busy line which the driver can use to tell when an
operation is in progress. Typical operation times from the Samsung spec
sheet I have to hand are 25us for a page read, 300us for a page program, and
2ms for a block erase.
(iv) Board hook-up
Sometimes the ready/busy line isn't wired in or requires a jumper to be set
to route it. This can be worked around: for a read operation, one can just
insert a delay loop for the prescribed maximum time, while for programs and
erases, most (all?) chips have a "Read Status" command which can be used to
query whether the operation has completed.

We started our driver this way.
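
For the program/erase case, the "Read Status" workaround could be sketched as a bounded polling loop like the one below. The hook read_status_fn, the function names and the simulated chip are hypothetical, and the bit meanings (bit 6 = ready, bit 0 = operation failed) are an assumption that matches many chips but should be checked against the datasheet.

```c
#include <stdint.h>
#include <stddef.h>

#define NAND_STATUS_FAIL  0x01u   /* assumed: last program/erase failed */
#define NAND_STATUS_READY 0x40u   /* assumed: chip is ready */

/* Hypothetical driver hook: issue "Read Status" and return the status byte. */
typedef uint8_t (*read_status_fn)(void *ctx);

/* Poll the status register until the chip reports ready.
 * Returns 0 on success, -1 if the chip reported a failed operation,
 * -2 if it was still busy after max_polls attempts. */
int nand_wait_status(read_status_fn read_status, void *ctx, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        uint8_t st = read_status(ctx);
        if (st & NAND_STATUS_READY)
            return (st & NAND_STATUS_FAIL) ? -1 : 0;
    }
    return -2;
}

/* Simulated chip for demonstration: busy for 'busy_polls' status reads,
 * then ready, optionally with the fail bit set. */
struct sim_chip {
    uint32_t busy_polls;
    int fail;
};

uint8_t sim_read_status(void *ctx)
{
    struct sim_chip *chip = ctx;

    if (chip->busy_polls > 0) {
        chip->busy_polls--;
        return 0x00;
    }
    return (uint8_t)(NAND_STATUS_READY | (chip->fail ? NAND_STATUS_FAIL : 0u));
}
```

The poll limit plays the same role as the fixed delay for reads: it keeps the driver from hanging forever if the chip never answers.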
It can be beneficial to be able to set up the ready/busy line as an
interrupt source, as opposed to having to poll it. Whilst there is an
overhead involved in context-switching, if other application threads have
much to do it may be advantageous overall for the thread waiting for the
NAND to sleep until woken by interrupt.

To speed up, now we poll the ready/busy. To use it as an interrupt is still [...]
Of course, it is possible to put multiple chips on a board. In that case
there needs to be a way to route between them; I would expect this to be
done with the Chip Select line, addressed either by different MMIO addresses
or a separate GPIO or CPLD step. Theoretically, multiple chips could be
hooked up in parallel to give something that looks like a 16 or 32-bit
"wide" chip, but I have never encountered this in the NAND world, and it
would impose a certain extra level of complexity on the driver.

Indeed, this would be difficult: a NAND is not a simple memory-mapped
device like a NOR flash or SRAM, which are easy to put in parallel.
Because of bad block management alone, putting them in parallel is
difficult: they cannot be put in parallel in hardware; they need to be
addressed separately. Then they must be made parallel virtually in software.
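
One possible reading of "parallel virtually in software" is to stripe logical blocks across separately addressed chips. The sketch below shows only that mapping. The names are made up, and it assumes identical chips; bad block handling would still have to be done per chip, which is the hard part noted above.

```c
#include <stdint.h>

/* Physical location of a logical block: which chip, and which block on it. */
struct nand_loc {
    uint32_t chip;
    uint32_t block;
};

/* Stripe consecutive logical blocks across 'nchips' identical chips.
 * 'nchips' must be nonzero. */
struct nand_loc nand_stripe_map(uint32_t logical_block, uint32_t nchips)
{
    struct nand_loc loc;

    loc.chip  = logical_block % nchips;   /* selects the Chip Select to assert */
    loc.block = logical_block / nchips;   /* block number within that chip */
    return loc;
}
```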