This is the mail archive of the ecos-devel@sourceware.org mailing list for the eCos project.


Re: NAND technical review

From: Jonathan Larmour <jifl at jifvik dot org>
To: Rutger Hofman <rutger at cs dot vu dot nl>
Cc: Ross Younger <wry at ecoscentric dot com>, eCos developers <ecos-devel at ecos dot sourceware dot org>
Date: Thu, 15 Oct 2009 04:49:18 +0100
Subject: Re: NAND technical review
References: <4AC6218C.20407@jifvik.org> <4ACB4B58.2040804@ecoscentric.com> <4ACC0722.9020601@jifvik.org> <4ACCC13F.40009@cs.vu.nl>

[ Sorry for getting back to this late - I wanted to continue with Ross before he went on holiday ]

Rutger Hofman wrote:

Jonathan Larmour wrote:

A device number does seem to be a bit limiting, and less deterministic. OTOH, a textual name arguably adds a little extra complexity.

This will be straightforward to change either way.

Noted, thanks.

I note Rutger's layer needs an explicit init call, whereas yours DTRT using a constructor, which is good.

I followed flash v2 in this. If the experts think a constructor is better, that's easy to change too.

Flash v2 doesn't use a constructor for legacy reasons, and only because some last-minute discussions before the v3 release couldn't reach a conclusion about constructor priority, given things like SPI flash. cyg_flash_init() is going to be eliminated properly in due course.

These issues don't really affect your layer, since you don't have any legacy burden, so moving straight to a constructor is better.
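To illustrate the difference between an explicit init call and constructor-style initialisation, here is a minimal sketch in plain C. It relies on the GCC/Clang `constructor` attribute as a stand-in; eCos packages would normally use a prioritised C++ static constructor instead. All names here (`nand_layer_autoinit`, `nand_layer_is_ready`) are hypothetical and belong to neither NAND layer.

```c
#include <stdbool.h>

/* Hypothetical layer state; not taken from either NAND implementation. */
static bool nand_layer_ready = false;

/*
 * GCC/Clang extension: this function runs before main(), so the
 * application never has to remember an explicit init call.
 */
__attribute__((constructor))
static void nand_layer_autoinit(void)
{
    /* A real layer would register controllers and chips here. */
    nand_layer_ready = true;
}

/* Hypothetical query used by callers to check the layer is usable. */
bool nand_layer_is_ready(void)
{
    return nand_layer_ready;
}
```

The constructor-priority question mentioned above still applies: anything the constructor depends on (an SPI bus, for instance) has to be initialised earlier.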

Does your implementation currently _require_ a BBT? For simpler NAND usage it may be overkill, e.g. in an application where the number of rewrites is very small, so the factory bad-block markers may be considered sufficient.

This is a bit hairy in my opinion, and one reason is that there is no Standard Layout for the spare areas. One case where a BBT is forced: my BlackFin NFC can be used to boot from NAND, but it enforces a spare layout that is incompatible with MTD or anybody. It is even incompatible with most chips' specification that the first byte of spare in the first page of the block is the Bad Block Marker. BlackFin's boot layout uses this first byte in a way that suits it, and it may be 0 -- which would otherwise mean Bad Block.

I infer that your layer can cope with that? I didn't see the handling for that in io_nand_chip_bad_block.c.

Is your BBT compatible with Linux MTD? Including your use of a mirror?

Also, what to do if a block grows bad during usage, and that block doesn't allow writing a marker in its spare area? BBT seems a solution.
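The two approaches being compared can be sketched roughly as follows. This assumes the common convention described above (first spare byte of a block's first page, where anything other than 0xFF means bad) and a purely in-RAM bitmap as the table. The names and the block count are hypothetical, and the bitmap is not the on-flash format of either layer or of Linux MTD.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NUM_BLOCKS 1024u   /* hypothetical device size in blocks */

/* One bit per block, kept in RAM: 1 = bad. */
static uint8_t bbt_bits[SKETCH_NUM_BLOCKS / 8];

/*
 * Factory-marker check: takes the spare area of the first page of a
 * block. Under the common convention, a first byte other than 0xFF
 * marks the block bad. A controller that reuses this byte (as in the
 * BlackFin boot layout described above) breaks this test.
 */
bool factory_marker_says_bad(const uint8_t *spare_of_first_page)
{
    return spare_of_first_page[0] != 0xFF;
}

/* Table-based tracking works even when the spare area cannot be written. */
void bbt_mark_bad(size_t block)
{
    bbt_bits[block / 8] |= (uint8_t)(1u << (block % 8));
}

bool bbt_is_bad(size_t block)
{
    return ((bbt_bits[block / 8] >> (block % 8)) & 1u) != 0;
}
```

A block that grows bad in use would be recorded with `bbt_mark_bad()`; a real table would also have to be persisted to flash.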

Well I was making the explicit assumption that it wasn't rewritten very often in the lifetime of the device. Think of things like in-field firmware upgrades.

(b) Dynamic memory allocation

R's layer mandates the provision of malloc and free, or compatible functions. These must be provided to the cyg_nand_init() call.

That's unfortunate - that limits its use in smaller boot loaders - a key application.

Well, it is certainly possible to calculate statically how much space R's NAND layer is going to use, to allocate that statically, and write a tiny function to hand it out piecemeal at the NAND layer's request.

If you know what it's going to be (at most), surely it could just be allocated statically and used directly? That's got the lowest overheads.
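For reference, the "tiny function to hand it out piecemeal" could look something like the sketch below: a bump allocator over a statically sized pool, with a no-op free. The pool size and the names `sketch_alloc`/`sketch_free` are assumptions for illustration, not part of R's layer.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_POOL_BYTES 4096u   /* hypothetical worst-case requirement */

/* Union used only to give the pool a conservative alignment. */
typedef union {
    long long   ll;
    long double ld;
    void       *p;
} sketch_align_t;

static union {
    sketch_align_t align;
    uint8_t        bytes[SKETCH_POOL_BYTES];
} pool;

static size_t pool_used;

/* malloc-compatible: returns NULL when the pool is exhausted. */
void *sketch_alloc(size_t n)
{
    const size_t a = sizeof(sketch_align_t);
    size_t rounded = (n + a - 1) / a * a;   /* round up to alignment */

    /* The first test catches arithmetic wrap-around for huge n. */
    if (rounded < n || rounded > SKETCH_POOL_BYTES - pool_used)
        return NULL;

    void *p = &pool.bytes[pool_used];
    pool_used += rounded;
    return p;
}

/* free-compatible: memory is never returned to the pool. */
void sketch_free(void *p)
{
    (void)p;
}
```

Because nothing is ever reclaimed, this only works for allocations made once at init time. A layer that allocates and frees repeatedly would exhaust the pool, which is where direct static buffers have the advantage.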

E's implementation had a good idea of a CDL variable for the maximum supported block size. Then individual HALs or driver packages can use a CDL 'requires' to ensure it's >= the block size of the chips really in use.

E's doesn't; instead it declares a small number of static buffers.

I assume everything is keyed off CYGNUM_NAND_PAGEBUFFER, and there are no other variables. Again I'm thinking of the scenario of single firmware - different board revs. Can you confirm?
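On the C side, the single-firmware, different-board-revs case might be handled along the lines of the sketch below: the buffer is sized at build time from the configuration macro, and the chip actually detected at run time is checked against it. The fallback value of 2048 and the function names are assumptions for the sketch. In a real build the macro would come from the CDL-generated configuration header.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Normally supplied by the generated configuration; assumed value here. */
#ifndef CYGNUM_NAND_PAGEBUFFER
#define CYGNUM_NAND_PAGEBUFFER 2048
#endif

/* Sized once at build time for the largest chip the image must support. */
static uint8_t page_buffer[CYGNUM_NAND_PAGEBUFFER];

/* Hypothetical accessor for driver code that needs the buffer. */
uint8_t *sketch_page_buffer(void)
{
    return page_buffer;
}

/*
 * Hypothetical run-time check: called once the chip has been
 * identified, so a board rev fitted with a larger-page part is
 * rejected cleanly instead of overrunning the buffer.
 */
bool sketch_chip_fits(size_t detected_page_size)
{
    return detected_page_size <= sizeof page_buffer;
}
```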

Andrew Lunn opined on 6/3/09 that R's requirement for malloc is not a major issue because the memory needs of that layer are well-bounded; I think I broadly agree, though the situation is not ideal in that it forces somebody who wants to use a lean, mean eCos configuration to work around it.

The overhead of including something like malloc/free in the image may compare badly with the amount of memory R's layer needs to allocate in the first place. I also note that if R's implementation has program verifies enabled, it allocates and frees a page _every_ time. If nothing else, this could lead to heap fragmentation.

Program verifies should be considered a very deep debugging trait.

I'm not sure about that. Experience with NOR Flash has shown that despite promises of error reporting in the datasheets, sometimes the only way to be sure of data integrity is an explicit verify step. It's up to the user, but I would consider it to have more use than just for debugging a driver.

Still, another possible implementation for this page buffer would be on the stack (not!), or in the controller struct. That would grow then by 8KB + spare.

Or a single one for all chips maybe (since chances of clashes seem pretty small, so just protected with a mutex). And only if the program verify option is enabled of course. As per above, the page buffer size could be derived from the configuration, with appropriate CDL.
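A rough sketch of a program-verify step using one shared static buffer rather than a per-call allocation is given below. To keep it runnable anywhere, a RAM array stands in for the NAND chip; that array, the sizes, and every function name are hypothetical. The mutex that would guard the shared buffer is left out, and erase-before-program is not modelled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 512u   /* hypothetical page size */
#define SKETCH_NUM_PAGES 8u     /* hypothetical device size */

/* RAM-backed stand-in for a NAND chip. */
static uint8_t fake_nand[SKETCH_NUM_PAGES][SKETCH_PAGE_SIZE];

/* Set to 1 to simulate a write that fails without reporting an error. */
static int fake_corrupt_next_program;

static int fake_program(size_t page, const uint8_t *data)
{
    memcpy(fake_nand[page], data, SKETCH_PAGE_SIZE);
    if (fake_corrupt_next_program) {
        fake_nand[page][0] ^= 0x01;    /* flip a bit, still report success */
        fake_corrupt_next_program = 0;
    }
    return 0;
}

static int fake_read(size_t page, uint8_t *data)
{
    memcpy(data, fake_nand[page], SKETCH_PAGE_SIZE);
    return 0;
}

/* Single buffer shared by all chips; a real layer would lock a mutex. */
static uint8_t verify_buf[SKETCH_PAGE_SIZE];

/* Returns 0 on success, -1 on a bad page or I/O error, -2 on mismatch. */
int sketch_program_verified(size_t page, const uint8_t *data)
{
    if (page >= SKETCH_NUM_PAGES)
        return -1;
    if (fake_program(page, data) != 0)
        return -1;
    if (fake_read(page, verify_buf) != 0)
        return -1;
    return memcmp(verify_buf, data, SKETCH_PAGE_SIZE) == 0 ? 0 : -2;
}
```

The point of the read-back is the case where the device reports success but the stored data differs, which the status check alone would miss.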

[snip]

- R's model shares the command sequence logic amongst all chips, differentiating only between small- and large-page devices. (I do not know whether this is correct for all current chips, though going forwards seems less likely to be an issue as fully-ONFI-compliant chips become the norm.)

Hmm. Nevertheless, this is a concern for me with R's. I'm concerned it may be too prescriptive to be robustly future-proof.

Well, there is no way I can see into the future, but I definitely think that the wire command model for NAND chips is going to stay -- it is in ONFI, after all. Besides, all except the 1 or 2 most pioneering museum NAND chips use it too.

I don't entirely disagree. But people do have a habit of inventing new things, particularly if it allows them to differentiate their products from their competitors.

There are chips that use a different interface, like SSD or MMC or OneNand, but then these chips come with on-chip bad block management, wear leveling of some kind, and are completely different in the way they must be handled. I'd say E's and R's implementations are concerned only with 'raw' NAND chips.

One could say that makes it a more realistic emulation. But yes I can see disadvantages with a somewhat rigid world view. Thinking out loud, I wonder if Rutger's layer could work with something like Samsung OneNAND.

See my comment above. The datasheet on e.g. KFM{2,4}G16Q2A says: "MuxOneNAND™ is a monolithic integrated circuit with a NAND Flash array using a NOR Flash interface."

OneNAND isn't like SSD or MMC which essentially provide a block interface and an advanced controller hiding the details of NAND. It isn't like NOR flash because you can't address the entire array - as shown by the fact it only has a 16-bit address bus. Instead with OneNAND you get an SRAM buffer as a "window" into the NAND array. There are commands to load data from NAND pages into the SRAM buffers, or write them back. It has onboard ECC logic, but it has a very different way of controlling the NAND. You do get access to both data and spare areas too.

You can consider this the sort of thing I mean when I say that manufacturers can come up with interesting things which break rigid assumptions of how you talk to NAND chips. So my concern is not (just) that your layer can't support OneNAND, but it couldn't support anything which also had a different interface.

Obviously you already support small versus large page, which require different protocols, but they are still relatively similar in how they're controlled. Would it even be possible to sensibly extend your generic layer to support something like OneNAND? Without having a large number of kludges?

I would certainly appreciate feedback from anyone who has used R's layer. What you say would seem to imply that both small page and ONFI are untested in R's layer.

That is correct. I would love some small-page testing. I have seen no ONFI chips on the market yet, so testing will be future work for both E and R.

Ross said that the Samsung K9 is pretty similar to ONFI, other than how you read the device ID etc. Is your layer equally close?

Thanks,

Jifl
--
--["No sense being pessimistic, it wouldn't work anyway"]-- Opinions==mine
