This is the mail archive of the ecos-devel@sources.redhat.com mailing list for the eCos project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: AW: contributing a failsafe update meachanism for FIS from within ecos applications

From: Andrew Lunn <andrew at lunn dot ch>
To: "Neundorf, Alexander" <Alexander dot Neundorf at jenoptik dot com>
Cc: Andrew Lunn <andrew at lunn dot ch>, Slawek <sgp at telsatgp dot com dot pl>,ecos-devel at sources dot redhat dot com
Date: Thu, 28 Oct 2004 15:58:02 +0200
Subject: Re: AW: contributing a failsafe update meachanism for FIS from within ecos applications
References: <5A8A17126B73AC4C83968F6C4505E3C5013D1932@JO-EX01.JENOPTIK.NET>

On Thu, Oct 28, 2004 at 02:43:16PM +0200, Neundorf, Alexander wrote:
> 
> > Von: Andrew Lunn [mailto:andrew@lunn.ch]
> > 
> > On Thu, Oct 28, 2004 at 01:32:46PM +0200, Neundorf, Alexander wrote:
> > > For this I'd like to add a function which simply returns a pointer
> > > to the current valid fis table. Then this can be iterated (and
> > > modified) in "userspace".
> > 
> > I don't like that. You are spreading around the knowledge of what the
> > FIS image looks like, how you operate on it etc. Thats bad engineering
> > practice. There currently exists an abstraction on top of this which
> > does most of what you need. Just extend the abstraction a little.
> 
> You mean hal_if.c flash_fis_op() ?

> Instead of returning struct fis_image_desc it returns the members of
> a fis_image_desc one by one. So the app still has to know which
> members a fis_image_desc has.

It however does not need to know where in the structure they are, what
other member of the structure are, etc. By passing the fis_image_desc
structure back and forth you are tying redboot and the application
together. They both need to have the same definition of the structure.
Maybe somebody adds a new member to the structure, recompiles redboot
and installs it. You application is now broken since it does not know
about this new member and it access the wrong things withing the
structure. If however you just use the VV get functions everything
works fine, because redboot knows how to get what you need from the
structure. The implementation changes, the abstract interface stays
the same. Classic object orianatated approach.

> This way one would not have to call 8 functions to collect all
> members for one entry.

In practice you actually only need two. The base and the length. The
other members are of no use to a filesystem.

> > I guess your createImage() call accepts the complete image. Many
> > systems don't have enought RAM to be able to hold the complete image
> > when doing an upgrade, but they do have enought RAM to hold one flash
> > block. What they want to do is download the new image from a server a
> > packet at a time and pass it to the filesystem to write to the
> > flash. The filesystem caches the writes until it has a complete flash
> > block and then writes it to flash.
> > 
> > How do you handle this case with your createImage() call?  
> 
> Currently not at all and for our solution we definitely won't put
> any filesystem cache/logic in between (I already mentioned earlier
> ;-) Anyway, if the createImage() function doesn't come from redboot
> (as you suggest if I understand correctly), there's nothing which
> hinders implementing writing block by block.

OK. Maybe a terminology problem here. What does your createImage() do?
Does it create a new entry in the fis directory? Does it write a new
image into flash at the location the fis entry says belongs to this
image?

For me these are two different things. If there was a filesystem
creat() call it would do the first. A number of write()s would do the
second. 

I also want to make sure that the design you propose is flexiable
enough to support other peoples needs. So it seems you have enough
memory to hold a complete image, but i want to ensure the same design
can do multiple writes in a clean way using the same API. I would also
like it to work without actually needed the redundant FIS block. Not
everybody is so paranoid about power failure, but would like to be
able to upgrade there application from within the application.

> So how about this:
> erase:
> app to redboot: I want to erase foo
> redboot: erases foo from the fis table and marks it as in progress
> app: erases foo contents from the flash
> app to redboot: I'm done with it
> redboot: mark it valid

I don't see why the application has to be involved in the erase. All
you need is that the removing of the FIS entry is atomic with respect
to a power cut. Redboot can do that.

> create:
> app to redboot: I want to create "foo" at this and that address and that's the crc (all parameters preferably transferred in one function call, be it a fis_image_desc or something else)
> redboot: creates foo in the fis table and marks it as in progress
> app: writes foo contents on the flash, all at once or block by block
> app to redboot: I'm done with it
> redboot: mark it valid

You are again breaking the abstract. You are doing the CRC creation in
the application where as it should be redboot doing it. 

Let me describe how i see it working. I will talk in terms of
filesystem operations. So when i say open() i mean exactly what
man 2 open says.

Assumption 1. All the needed FIS entries exist. 
Assumption 2. Your boot script is:
fis load app
go
fis load app.bak
go

When redboot loads app it wil check the crc. If the crc fails, the go
will refuse to start the application and instead fall through to the
next command. So it then loads app.bak and starts that. The current
redboot does all this already. No changes needed here.

open(/foo) does two VV call to get the start and length of the image
in flash and allocates the block cache.

write() would copy the data into the block cache. If this fills the
block cache it simply erases and then writes. As soon as the erase
starts, the CRC is wrong. So in terms of redboot, this image is now
corrupt. 

close() flushes the block cache. It then does VV calls to ask redboot
to recalculate the CRC and put it into the in memory copy of the FIS
directory. It then calls a VV function to commit the FIS directory.
Redboot does an atomic write, with respect to power failure, of the
FIS directory using the valid fields in the redundant FIS blocks etc.

So how do you do a safe upgrade of the application:

open("app");
write();         CRC is now wrong, so app.bak would be booted. 
write();         CRC is now wrong, so app.bak would be booted. 
write();         CRC is now wrong, so app.bak would be booted. 
close();         CRC is now valid, so the new image would be booted. 
open("app.bak");   
write();         CRC is now wrong, but it does not matter, app is valid
write();         CRC is now wrong, but it does not matter, app is valid
write();         CRC is now wrong, but it does not matter, app is valid
close();         CRC is now valid and we have two identical apps.

So you need 4 VV calls, or which 2 already exist. 

        Andrew

References:
- AW: AW: contributing a failsafe update meachanism for FIS from within ecos applications
  - From: Neundorf, Alexander

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]