This is the mail archive of the ecos-discuss@sourceware.org mailing list for the eCos project.
Deadlock scenario on close() in io/fileio
- From: Rutger Hofman <rutger at cs dot vu dot nl>
- To: ecos-discuss at ecos dot sourceware dot org
- Date: Thu, 11 Dec 2008 11:37:42 +0100
- Subject: [ECOS] Deadlock scenario on close() in io/fileio
Good morning list,
I ran into a deadlock in io/fileio when invoking close(fd); the relevant
code is in io/fileio/current/src/fd.cxx.
The problem arises because close() calls cyg_fd_free(), which grabs
fdlock and then (via fd_close() and fp_ucount_dec()) calls the file
system's fo_close(). fo_close() flushes any pending data, and this flush
goes through the write path (readwritev() in the trace below), whose
cyg_fp_get() tries to grab fdlock again. The same thread takes the same
lock twice, so it deadlocks.
The stack trace on deadlock clearly shows this:
close()
  cyg_fd_free()
    lock(fdlock) <======================================
    fd_close()
      fp_ucount_dec()
        cyg_yaffs_fo_close()
          yaffs_close()
            yaffs_FlushFile()
              yaffs_UpdateObjectHeader()
                readwritev()
                  cyg_fp_get()
                    lock(fdlock) <===========================
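To make the failure easy to reproduce outside eCos, here is a minimal sketch that uses POSIX threads as a stand-in for the eCos mutex (an assumption: fileio does not use pthreads). The fake_* functions are hypothetical stand-ins for cyg_fd_free(), fo_close() and cyg_fp_get(). An error-checking mutex makes the second lock attempt by the same thread return EDEADLK instead of hanging.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>

/* Stand-in for fileio's fdlock. An error-checking mutex returns EDEADLK
 * when its owner tries to lock it again, instead of hanging. */
static pthread_mutex_t fdlock;

void fdlock_init(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&fdlock, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Hypothetical stand-in for cyg_fp_get(): briefly takes fdlock. */
static int fake_fp_get(void)
{
    int err = pthread_mutex_lock(&fdlock);
    if (err == 0)
        pthread_mutex_unlock(&fdlock);
    return err;
}

/* Hypothetical stand-in for the file system's fo_close(): flushing
 * pending data goes through the write path, i.e. fake_fp_get(). */
static int fake_fo_close(void)
{
    return fake_fp_get();
}

/* Hypothetical stand-in for cyg_fd_free(): holds fdlock across fo_close(). */
int fake_fd_free(void)
{
    pthread_mutex_lock(&fdlock);
    int err = fake_fo_close();   /* second lock attempt happens in here */
    pthread_mutex_unlock(&fdlock);
    return err;                  /* EDEADLK: the same thread relocked fdlock */
}
```

After fdlock_init(), calling fake_fd_free() should return EDEADLK; with a plain PTHREAD_MUTEX_NORMAL mutex the same call would block forever, which matches the hang seen on the target.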
The code of close() even has a comment at the call to cyg_fd_free()
pointing out that the file's fo_close() may be called.
This scenario can be circumvented by having close(fd) call fsync(fd)
first, which forces the flush to happen before the close. When I
inserted this fsync(fd) call, the deadlock disappeared.
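The same workaround, expressed as an application-level sketch with plain POSIX calls. close_with_fsync() and write_then_close() are hypothetical names; the fix described above was made inside close() itself, not in a wrapper. The assumption is that fsync() runs while the calling thread holds no fdlock, so by the time close() reaches fo_close() there is no pending data left to flush.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: force the flush before the close. fsync() runs
 * while this thread holds no fdlock, so fo_close() should find nothing
 * left to flush. */
int close_with_fsync(int fd)
{
    int fsync_err = fsync(fd);
    int close_err = close(fd);   /* close even if fsync failed, to avoid a leak */
    return (fsync_err != 0 || close_err != 0) ? -1 : 0;
}

/* Usage example: write a string to a file and close it via the workaround. */
int write_then_close(const char *path, const char *msg)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    size_t len = strlen(msg);
    if (write(fd, msg, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return close_with_fsync(fd);
}
```

This only removes the flush path from fo_close(); it says nothing about any other code the file system may run there.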
But I think this is a patchwork solution. There is *no guarantee at all*
about which code fo_close() will call: it might flush, open a metanode,
and so on, and any such path may re-enter fileio and try to grab
fdlock again.
I think there are a few possible solutions:
1) a full-fledged continuation mechanism where all locks are released
when a layer is left;
2) allow fdlock to be grabbed recursively, as in Java-style synchronized
locking;
3) check whether this is the only occurrence of this deadlock scenario,
and whether the lock can be released in fp_ucount_dec() without
impairing atomicity.
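Option 2 could look roughly like the following: a lock that records its owner and a nesting count, so the owning thread can re-enter it. This is a sketch on POSIX threads, not a subclass of the eCos mutex; rec_lock and its functions are hypothetical names. POSIX also offers PTHREAD_MUTEX_RECURSIVE, which gives the same behavior directly.

```c
#include <pthread.h>

/* Hypothetical recursive lock: an owner thread plus a nesting count,
 * built from a plain mutex and a condition variable. */
typedef struct {
    pthread_mutex_t guard;   /* protects owner and count */
    pthread_cond_t  freed;   /* signalled when count drops to 0 */
    pthread_t       owner;   /* valid only while count > 0 */
    unsigned        count;   /* 0 means unowned */
} rec_lock;

void rec_lock_init(rec_lock *l)
{
    pthread_mutex_init(&l->guard, NULL);
    pthread_cond_init(&l->freed, NULL);
    l->count = 0;
}

void rec_lock_acquire(rec_lock *l)
{
    pthread_t self = pthread_self();
    pthread_mutex_lock(&l->guard);
    if (l->count > 0 && pthread_equal(l->owner, self)) {
        l->count++;                       /* re-entry by the owner: no blocking */
    } else {
        while (l->count > 0)              /* another thread owns it: wait */
            pthread_cond_wait(&l->freed, &l->guard);
        l->owner = self;
        l->count = 1;
    }
    pthread_mutex_unlock(&l->guard);
}

/* Must only be called by the owning thread. */
void rec_lock_release(rec_lock *l)
{
    pthread_mutex_lock(&l->guard);
    if (--l->count == 0)
        pthread_cond_signal(&l->freed);   /* last release wakes one waiter */
    pthread_mutex_unlock(&l->guard);
}

unsigned rec_lock_depth(rec_lock *l)
{
    pthread_mutex_lock(&l->guard);
    unsigned depth = l->count;
    pthread_mutex_unlock(&l->guard);
    return depth;
}
```

One caveat with this design: a recursive lock hides re-entry rather than preventing it, so code running under the inner acquisition (here, the flush inside fo_close()) can observe data that the outer critical section has only partly updated.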
1) is a lot of work.
2) sounds good to me; the mutex type in mutex.cxx could be subclassed.
3) Specifically, in io/fileio/.../fd.cxx, the call to
fp->f_ops->fo_close(fp) in fp_ucount_dec(fp) can be made with fdlock
released. From browsing the code, it seems possible to do *all*
critical operations *before* any call to fp_ucount_dec(); if so,
fdlock can be unlocked and relocked around the call to fo_close()
without impairing atomicity. But I am not sure this is the only place
where this deadlock scenario can occur.
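Option 3 might be sketched like this: the caller holds fdlock, all bookkeeping is done under the lock, and only the fo_close() call runs with fdlock released. struct file, fp_ucount_dec_sketch(), fd_free_sketch() and reentrant_fo_close() are hypothetical simplifications, not the real fd.cxx code, and pthreads again stand in for the eCos mutex. The sketch relies on the assumption stated above: once the use count reaches zero, no other thread can reach fp, so dropping the lock does not break atomicity.

```c
#define _XOPEN_SOURCE 700
#include <pthread.h>

/* Stand-in for fdlock; error-checking, so an accidental relock by the
 * owning thread returns EDEADLK instead of hanging. */
static pthread_mutex_t fdlock;

/* Hypothetical, heavily simplified file object. */
struct file {
    int ucount;                        /* number of users, guarded by fdlock */
    int (*fo_close)(struct file *fp);  /* file system close hook */
};

void fdlock_init(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&fdlock, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Hypothetical fo_close() that re-enters fileio and takes fdlock, the way
 * the YAFFS flush reaches cyg_fp_get() in the stack trace. */
int reentrant_fo_close(struct file *fp)
{
    (void)fp;
    int err = pthread_mutex_lock(&fdlock);
    if (err == 0)
        pthread_mutex_unlock(&fdlock);
    return err;
}

/* Called with fdlock held. Bookkeeping stays under the lock; only
 * fo_close() runs with fdlock released. ASSUMPTION: once ucount is 0,
 * no other thread can reach fp, so releasing fdlock here is safe. */
int fp_ucount_dec_sketch(struct file *fp)
{
    int err = 0;
    if (--fp->ucount == 0) {
        pthread_mutex_unlock(&fdlock);
        err = fp->fo_close(fp);        /* may take fdlock itself */
        pthread_mutex_lock(&fdlock);   /* restore the caller's locking state */
    }
    return err;
}

/* Hypothetical caller playing the role of cyg_fd_free(). */
int fd_free_sketch(struct file *fp)
{
    pthread_mutex_lock(&fdlock);
    int err = fp_ucount_dec_sketch(fp);
    pthread_mutex_unlock(&fdlock);
    return err;
}
```

With a single user, fd_free_sketch() returns 0 rather than EDEADLK, because reentrant_fo_close() finds fdlock free.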
Questions:
- What about cyg_file_lock() in fp_ucount_dec()? Should that be handled
in the same way?
- What about the LOCK_FILE() calls throughout io.cxx?
Rutger Hofman
VU Amsterdam