This is the mail archive of the ecos-devel@sources.redhat.com mailing list for the eCos project.



Re: AW: contributing a failsafe update mechanism for FIS from within ecos applications


> createImage() does: create a new entry, writes the data, marks the new entry 
> as valid.
> 
> It consists of the following steps:
> startUpdate  (redboot) - modify the fis table contents in RAM and flash, mark
>                          them in progress
> writeData    (app) - either all at once, or in flash block sized chunks
> finishUpdate (redboot) - mark the new fis table as valid in flash

So the first step maps to open(),
the second step to write(),
and the third step to close().
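
A minimal sketch of that mapping, as an illustration only. The names
upd_open(), upd_write(), upd_close() and struct fis_update are made up
for this example and are not existing fisfs or RedBoot interfaces, and
the "flash" is just a RAM buffer supplied by the caller:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical states mirroring startUpdate / writeData / finishUpdate. */
enum upd_state { UPD_IDLE, UPD_IN_PROGRESS, UPD_VALID };

struct fis_update {
    enum upd_state state;
    unsigned char *flash;   /* simulated flash region (assumption) */
    size_t size;            /* size of the region in bytes */
    size_t pos;             /* next write offset */
};

/* open(): corresponds to startUpdate, marks the image "in progress". */
int upd_open(struct fis_update *u, unsigned char *flash, size_t size)
{
    if (u == NULL || flash == NULL)
        return -1;
    u->flash = flash;
    u->size = size;
    u->pos = 0;
    u->state = UPD_IN_PROGRESS;
    return 0;
}

/* write(): corresponds to writeData, may be called any number of times.
 * Returns the number of bytes accepted, which is short if the region
 * is full, or -1 if no update is in progress. */
long upd_write(struct fis_update *u, const void *buf, size_t len)
{
    if (u == NULL || u->state != UPD_IN_PROGRESS)
        return -1;
    size_t room = u->size - u->pos;
    if (len > room)
        len = room;
    memcpy(u->flash + u->pos, buf, len);
    u->pos += len;
    return (long)len;
}

/* close(): corresponds to finishUpdate, marks the image valid. */
int upd_close(struct fis_update *u)
{
    if (u == NULL || u->state != UPD_IN_PROGRESS)
        return -1;
    u->state = UPD_VALID;
    return 0;
}
```

Writing the whole image at once is then simply a single upd_write()
call between upd_open() and upd_close().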


> > I also want to make sure that the design you propose is flexible
> > enough to support other people's needs. So it seems you have enough
> > memory to hold a complete image, but I want to ensure the same design
> > can do multiple writes in a clean way using the same API. I would also
> > like it to work without actually needing the redundant FIS block. Not
> > everybody is so paranoid about power failure, but would like to be
> > able to upgrade their application from within the application.
> 
> Well, paranoid...
> If it fails the device doesn't work anymore...
> 
> Without redundant fis: 
> startUpdate doesn't change the flash contents, the new fis table contents are 
> written in finishUpdate, so it will work too (except that power failure.... 
> well you know).

The point is the user gets the choice. They can have a totally safe
system, or a system that works 99.9% of the time but needs one less
flash block.

> > You are again breaking the abstract. You are doing the CRC creation in
> > the application where as it should be redboot doing it.

> My main reason for this: I'd like to have the new fis table already
> completely correct on the flash except the valid_flag before the
> actual writing process starts, so that the final step really only
> has to set the valid_flag to valid.

I cannot think of a reason why this is actually needed, but maybe I'm
missing something.

> Apart from that, is it possible for redboot to calculate the crc if
> it doesn't have enough ram to hold the complete image while updating
> and if the application is responsible for the actual writing ?
> Which ram is actually available in a VV function ? (sorry for stupid
> questions)

On the redboot side of the VV: only the stack and any variables in
redboot's BSS. But it does not need any extra RAM for this. The image
is in flash, so it just runs the CRC over that.
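
To illustrate why no image-sized buffer is needed, here is a sketch of
a CRC computed in place over a memory-mapped region, keeping only a
32-bit running value. It uses the usual Ethernet/zlib CRC-32 parameters
(reflected polynomial 0xEDB88320, initial value and final inversion of
all ones). The function names are hypothetical, and the sketch does not
claim that redboot's own routine uses the same initial value and final
inversion:

```c
#include <stddef.h>
#include <stdint.h>

/* Feed len bytes into a running CRC-32. Start with crc = 0 and pass the
 * previous return value back in to continue over the next chunk. */
uint32_t crc32_update(uint32_t crc, const unsigned char *p, size_t len)
{
    crc = ~crc;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Run the CRC directly over a region (e.g. an image in flash), chunk
 * bytes at a time. The only state kept is the 32-bit running value. */
uint32_t crc32_region(const unsigned char *base, size_t len, size_t chunk)
{
    uint32_t crc = 0;
    if (chunk == 0)
        chunk = len;
    for (size_t off = 0; off < len; off += chunk) {
        size_t n = (len - off < chunk) ? (len - off) : chunk;
        crc = crc32_update(crc, base + off, n);
    }
    return crc;
}
```

The result does not depend on the chunk size, so the same value comes
out whether the region is processed in one pass or block by block.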

> [OT] why is crc32 used instead of the posix crc ?

Redboot came before posix crc. It also makes little difference. crc32
is OK. It's the same one used on Ethernet frames.
 
> ...
> > Assumption 1. All the needed FIS entries exist.
> > Assumption 2. Your boot script is:
> > fis load app
> > go
> > fis load app.bak
> > go
> 
> This second step is cool :-)
> 
> > open(/foo) makes two VV calls to get the start and length of the image
> > in flash and allocates the block cache.
> >
> > write() would copy the data into the block cache. If this fills the
> > block cache it simply erases and then writes. As soon as the erase
> > starts, the CRC is wrong. So in terms of redboot, this image is now
> > corrupt.
> >
> > close() flushes the block cache. 
> 
> Is this all done in the application ?

Well, this is the API between the application and the fisfs. I've
described the actions that fisfs performs for each API call. open()
needs to call a VV, but write() can do all the work without calling
redboot.
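
A rough sketch of the write() side, to show that it needs nothing from
redboot. Everything here is an assumption made for illustration: the
names (struct block_cache, cache_write(), cache_flush()), the block
size, and the RAM array standing in for flash. A real fisfs would call
the flash driver's erase and program routines instead of
memset()/memcpy():

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 16u   /* illustrative, real flash blocks are far larger */
#define NUM_BLOCKS 4u

static unsigned char sim_flash[NUM_BLOCKS * BLOCK_SIZE]; /* simulated flash */

struct block_cache {
    unsigned char buf[BLOCK_SIZE]; /* one flash block worth of data */
    size_t fill;                   /* bytes currently buffered */
    size_t next_block;             /* next flash block to program */
    unsigned flushes;              /* blocks written so far */
};

void cache_init(struct block_cache *c)
{
    memset(c, 0, sizeof *c);
}

/* Erase one block and program the buffered data into it. This is what
 * close() would call for a final, partly filled block. */
int cache_flush(struct block_cache *c)
{
    if (c->fill == 0)
        return 0;
    if (c->next_block >= NUM_BLOCKS)
        return -1;                       /* image region is full */
    unsigned char *dst = sim_flash + c->next_block * BLOCK_SIZE;
    memset(dst, 0xFF, BLOCK_SIZE);       /* erase: CRC is wrong from here */
    memcpy(dst, c->buf, c->fill);        /* program */
    c->next_block++;
    c->fill = 0;
    c->flushes++;
    return 0;
}

/* Copy data into the cache, flushing each time a block fills up.
 * Returns the number of bytes accepted, or -1 on error. */
long cache_write(struct block_cache *c, const void *data, size_t len)
{
    const unsigned char *p = data;
    size_t done = 0;
    while (done < len) {
        size_t room = BLOCK_SIZE - c->fill;
        size_t n = (len - done < room) ? (len - done) : room;
        memcpy(c->buf + c->fill, p + done, n);
        c->fill += n;
        done += n;
        if (c->fill == BLOCK_SIZE && cache_flush(c) != 0)
            return -1;
    }
    return (long)done;
}
```

With this shape the application can hand over data in any chunk size,
and only one block of RAM is needed regardless of the image size.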
 
> > It then does VV calls to ask redboot
> > to recalculate the CRC and put it into the in memory copy of the FIS
> > directory. It then calls a VV function to commit the FIS directory.
> > Redboot does an atomic write, with respect to power failure, of the
> > FIS directory using the valid fields in the redundant FIS blocks etc.
> >
> > So how do you do a safe upgrade of the application:
> >
> > open("app");
> > write();         CRC is now wrong, so app.bak would be booted.
> > write();         CRC is now wrong, so app.bak would be booted.
> > write();         CRC is now wrong, so app.bak would be booted.
> > close();         CRC is now valid, so the new image would be booted.
> > open("app.bak");
> > write();         CRC is now wrong, but it does not matter, app is valid
> > write();         CRC is now wrong, but it does not matter, app is valid
> > write();         CRC is now wrong, but it does not matter, app is valid
> > close();         CRC is now valid and we have two identical apps.
> 

> I would prefer an obviously different API for the updating process
> since it is "dangerous" for the whole system.  With my createImage()
> which writes a complete image at once it is also ensured that
> there can be at most one corrupt image at a time. When splitting
> open, write and close there can be more than one corrupt
> image. open() for writing should check that there is no other file
> open.

That is not a problem. The filesystem can easily enforce this and
return EAGAIN when open() is called while another file is already
open for writing.
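
A minimal sketch of such a check, assuming a single global "writer"
flag inside the filesystem. The function names are hypothetical, and a
real implementation would also need whatever locking the target
requires:

```c
#include <errno.h>
#include <stddef.h>

/* Non-zero while some file is open for writing (hypothetical fisfs state). */
static int writer_active = 0;

/* Called from open() when the file is opened for writing.
 * Returns 0 on success, or -EAGAIN if another file is already being
 * written, so at most one image can be corrupt at any time. */
int fisfs_open_for_write(const char *name)
{
    if (name == NULL)
        return -EINVAL;
    if (writer_active)
        return -EAGAIN;
    writer_active = 1;
    return 0;
}

/* Called from close() after the CRC has been updated and the FIS
 * directory committed. */
void fisfs_close_write(void)
{
    writer_active = 0;
}
```

With this in place, the open("app.bak") in the upgrade sequence only
succeeds after close() on "app" has completed.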

        Andrew

