This is the mail archive of the ecos-discuss@sourceware.org mailing list for the eCos project.



Deadlock scenario on close() in io/fileio


Good morning list,

I ran into a deadlock in io/fileio when calling close(fd), which is implemented in io/fileio/current/src/fd.cxx.

The problem arises because close() calls cyg_fd_free(), which grabs fdlock and then, via fd_close(), calls the file system's fo_close(). fo_close() flushes any pending data, and that flush goes through the write path (readwritev()), whose cyg_fp_get() tries to grab fdlock again. Taking the same lock twice in one thread means deadlock.

The stack trace at the point of deadlock shows this clearly:

close()
  cyg_fd_free()
    lock(fdlock)       <======================================
    fd_close()
      fp_ucount_dec()
        cyg_yaffs_fo_close()
          yaffs_close()
            yaffs_FlushFile()
              yaffs_UpdateObjectHeader()
                readwritev()
                  cyg_fp_get()
                    lock(fdlock)  <===========================

The code of close() even has a comment at the call to cyg_fd_free() noting that the file's fo_close() may be called.
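To make the chain above concrete, here is a minimal sketch that models it with POSIX threads rather than eCos kernel primitives. The names model_fd_free(), model_fo_close() and model_fp_get() are hypothetical stand-ins for cyg_fd_free(), the file system's fo_close() and cyg_fp_get(). The mutex is deliberately made error-checking, so the second lock attempt returns EDEADLK instead of hanging the way a plain mutex would.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for fdlock. An error-checking mutex reports
 * EDEADLK when its owner tries to lock it again, so the self-deadlock
 * becomes an observable error code instead of a hang. */
static pthread_mutex_t fdlock;

static void fdlock_init(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&fdlock, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Models cyg_fp_get(): the write path takes fdlock to look up the fd. */
static int model_fp_get(void)
{
    int rc = pthread_mutex_lock(&fdlock);
    if (rc == 0)
        pthread_mutex_unlock(&fdlock);
    return rc;
}

/* Models fo_close(): flushing pending data re-enters the fd layer. */
static int model_fo_close(void)
{
    return model_fp_get();
}

/* Models cyg_fd_free(): takes fdlock, then closes the file while holding it. */
static int model_fd_free(void)
{
    int rc = pthread_mutex_lock(&fdlock);
    assert(rc == 0);
    (void)rc;
    int inner = model_fo_close();   /* second lock attempt by the same thread */
    pthread_mutex_unlock(&fdlock);
    return inner;
}
```

After fdlock_init(), model_fd_free() returns EDEADLK, which corresponds to the two lock(fdlock) frames in the trace.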

This scenario can be avoided by having close(fd) call fsync(fd) first, which forces the flush to happen before fdlock is taken. When I inserted this fsync(fd) call, the deadlock disappeared.
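The effect of the fsync-first workaround can be sketched in the same hypothetical POSIX model. The `dirty` flag stands in for "the file has unflushed data", and model_close(), model_fsync() and model_fo_close() are illustrative names, not eCos functions. The sketch also shows why the workaround is fragile: it only helps because fo_close() has nothing left to flush. Any other path back into the fd layer from fo_close() would still hit the held lock.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t fdlock;  /* hypothetical stand-in for fdlock */
static bool dirty;              /* hypothetical: file has unflushed data */

static void model_init(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    /* Error-checking: a relock by the owner returns EDEADLK instead of hanging. */
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&fdlock, &attr);
    pthread_mutexattr_destroy(&attr);
    dirty = true;
}

/* Write path: briefly takes fdlock, like cyg_fp_get() does. */
static int model_write_back(void)
{
    int rc = pthread_mutex_lock(&fdlock);
    if (rc != 0)
        return rc;
    dirty = false;
    pthread_mutex_unlock(&fdlock);
    return 0;
}

static int model_fsync(void)    { return dirty ? model_write_back() : 0; }
static int model_fo_close(void) { return dirty ? model_write_back() : 0; }

/* close(): optionally flush first, then free the fd under fdlock. */
static int model_close(bool fsync_first)
{
    if (fsync_first) {
        int frc = model_fsync();    /* flush while fdlock is NOT held */
        if (frc != 0)
            return frc;
    }
    int rc = pthread_mutex_lock(&fdlock);
    assert(rc == 0);
    (void)rc;
    int inner = model_fo_close();   /* nothing left to flush if fsync ran */
    pthread_mutex_unlock(&fdlock);
    return inner;
}
```

Without the fsync, model_close(false) fails with EDEADLK and the data stays dirty; model_close(true) then flushes outside the lock and succeeds.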

But I think this is a patchwork solution: there is *no guarantee at all* about which code fo_close() will call. It might flush, open a metanode, and so on.

I think there are a few possible solutions:
1) a full-fledged continuation mechanism where all locks are released when a layer is left;
2) allow fdlock to be grabbed recursively, as in Java-style synchronized locking;
3) check whether this is the only occurrence of this deadlock scenario, and whether the lock can be released in fp_ucount_dec() without impairing atomicity.


1) is a lot of work
2) sounds good to me; the mutex.cxx type could be subclassed.
3) Specifically, in io/fileio/.../fd.cxx the call to fp->f_ops->fo_close(fp) in fp_ucount_dec(fp) could be made with fdlock released. From browsing the code, it seems possible to do *all* critical operations *before* any call to fp_ucount_dec(); if that is true, fdlock can be unlocked and relocked around the call to fo_close() without impairing atomicity. But I am not sure this is the only place where this deadlock scenario can occur.
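Options 2 and 3 can be compared in a minimal sketch, again using POSIX mutexes as an assumed analogue of the eCos kernel mutex (whose actual recursion behaviour is not shown here). All function names are hypothetical, and the "..." comments mark where the real fd-table bookkeeping would go. Option 2 keeps today's call structure and lets the owning thread relock. Option 3 keeps a plain lock but finishes the fd-table updates before fo_close() runs, drops the lock around it, and reacquires it afterwards.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t rec_lock;  /* option 2: recursive fdlock (hypothetical) */
static pthread_mutex_t chk_lock;  /* option 3: non-recursive fdlock (hypothetical) */

static void locks_init(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&rec_lock, &attr);
    /* Error-checking, so an accidental relock would show up as EDEADLK. */
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&chk_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Models fo_close() flushing through the fd layer (cyg_fp_get()). */
static int fo_close_via(pthread_mutex_t *lock)
{
    int rc = pthread_mutex_lock(lock);
    if (rc == 0)
        pthread_mutex_unlock(lock);
    return rc;
}

/* Option 2: same structure as today; the owner may take the lock again. */
static int fd_free_recursive(void)
{
    int rc = pthread_mutex_lock(&rec_lock);
    assert(rc == 0);
    rc = fo_close_via(&rec_lock);    /* lock depth 2, succeeds */
    pthread_mutex_unlock(&rec_lock);
    return rc;
}

/* Option 3: do all critical fd-table work first, release the lock
 * around fo_close(), then reacquire it for the remaining cleanup. */
static int fd_free_unlocked_close(void)
{
    int rc = pthread_mutex_lock(&chk_lock);
    assert(rc == 0);
    /* ... clear the descriptor slot, decrement the use count ... */
    pthread_mutex_unlock(&chk_lock);

    rc = fo_close_via(&chk_lock);    /* fdlock is not held here */

    int rc2 = pthread_mutex_lock(&chk_lock);
    assert(rc2 == 0);
    (void)rc2;
    /* ... release the file object ... */
    pthread_mutex_unlock(&chk_lock);
    return rc;
}
```

Both variants return 0 in this model. The trade-off is that option 2 lets fo_close() run while the outer caller's updates under fdlock may be only half done, whereas option 3 is only safe if nothing after the unlock depends on fd-table state another thread could change in the meantime, which is the atomicity question raised above.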


Questions:
- What about cyg_file_lock() in fp_ucount_dec()? Should that be handled the same way?
- What about the LOCK_FILE() calls throughout io.cxx?


Rutger Hofman
VU Amsterdam


