Selective waste collection
Written on 2021-10-21 by Daniel 'jackdaniel' Kochmański
When an object in Common Lisp is not reachable it is garbage collected. Some implementations provide the functionality to set finalizers for these objects. A finalizer is a function that is run when the object is not reachable.
Whether the finalizer is run before the object is deallocated or after is a nuance differing between implementations.
On ABCL, CMU CL, LispWorks, Mezzano, SBCL and Scieneer CL the finalizer does not accept any arguments and it can't capture the finalized object (because otherwise the object would always be reachable); effectively the object may already be deallocated when the finalizer runs. As the least common denominator, this is the approach taken in the portability library trivial-garbage.
(let* ((file (open "my-file"))
       (object (make-instance 'pseudo-stream :file file)))
  ;; The finalizer closes over FILE, never over OBJECT.
  (flet ((finalize () (close file)))
    (trivial-garbage:finalize object #'finalize)))
In contrast, on ACL, CCL, clasp, clisp, corman and ECL the finalizer accepts one argument: the finalized object. This relieves the programmer from the concern of what should be captured but puts the burden on the programmer to ensure that there are no circular dependencies between finalized objects.
(let ((object (make-instance 'pseudo-stream :file (open "my-file"))))
  (flet ((finalize (stream) (close (slot-value stream 'file))))
    (another-garbage:set-finalizer object #'finalize)))
The first approach may, for instance, store weak pointers to objects with registered finalizers and call the finalizer when a weak pointer is broken.
The second approach requires more synchronization with the GC and, for some strategies, makes it possible to absolve objects from being collected, for example by stipulating that finalizers are executed in topological order, one per garbage collection cycle.
In this post I want to discuss a certain problem related to finalizers I've encountered in an existing codebase. Consider the following code:
(defclass pseudo-stream ()
  ((resource :initarg :resource :accessor resource)))

(defun open-pseudo-stream (uri)
  (make-instance 'pseudo-stream :resource (make-resource uri)))

(defun close-pseudo-stream (object)
  (destroy-resource (resource object)))

(defvar *pseudo-streams* (make-hash-table))

(defun reload-pseudo-streams ()
  (loop for uri in *uris*
        do (setf (gethash uri *pseudo-streams*)
                 (open-pseudo-stream uri))))
The function reload-pseudo-streams may be executed, for example, to invalidate caches. Its main problem is that it leaks resources by not closing the old pseudo stream before opening a new one. If the resource consumes a file descriptor then we'll eventually run out of them.
A naive solution is to close a stream after assigning a new one:
(defun reload-pseudo-streams/incorrect ()
  (loop for uri in *uris*
        for old = (gethash uri *pseudo-streams*)
        do (setf (gethash uri *pseudo-streams*)
                 (open-pseudo-stream uri))
           (close-pseudo-stream old)))
This solution is not good enough because it is prone to race conditions. In the example below we witness that the old stream (that is closed) may still be referenced after a new one is put in the hash table.
(defun nom-the-stream (uri)
  (loop
    (let ((stream (gethash uri *pseudo-streams*)))
      (some-long-computation-1 stream)
      ;; reload-pseudo-streams/incorrect called, the stream is closed
      (some-long-computation-2 stream)))) ;; <-- aaaa
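The interleaving can be replayed deterministically in a single thread. The sketch below is illustrative only: demo-stream, *demo-streams*, demo-reload and demo-replay are hypothetical stand-ins for the pseudo stream and the reload function, and the reload is simply called by hand at the point where another thread would have run it.

```lisp
;; Hypothetical stand-ins; a "stream" here is only an open/closed flag.
(defstruct demo-stream (open-p t))

(defvar *demo-streams* (make-hash-table :test #'equal))

(defun demo-reload (uri)
  "Replace the stream stored under URI and close the old one right away."
  (let ((old (gethash uri *demo-streams*)))
    (setf (gethash uri *demo-streams*) (make-demo-stream))
    (when old
      (setf (demo-stream-open-p old) nil))))

(defun demo-replay ()
  "Return the open state of the client's stream before and after a reload."
  (clrhash *demo-streams*)
  (demo-reload "a")                     ; initial population
  (let* ((stream (gethash "a" *demo-streams*))
         (before (demo-stream-open-p stream)))
    (demo-reload "a")                   ; lands between the two computations
    (list before (demo-stream-open-p stream))))
```

Evaluating (demo-replay) yields (T NIL): the client still holds the old stream, which was open before the reload and is closed after it.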
This is a moment when you should consider abandoning the function reload-pseudo-streams/incorrect and using a finalizer. The new version of the function open-pseudo-stream destroys the resource only when the stream is no longer reachable, so the function nom-the-stream can safely nom.
When the finalizer accepts the object as an argument then it is enough to register the function close-pseudo-stream. Otherwise, since we can't close over the stream, we close over the resource and open-code destroying it.
(defun open-pseudo-stream (uri)
  (let* ((resource (make-resource uri))
         (stream (make-instance 'pseudo-stream :resource resource)))
    #+trivial-garbage ;; closes over the resource (not the stream)
    (flet ((finalizer () (destroy-resource resource)))
      (set-finalizer stream #'finalizer))
    #+another-garbage ;; doesn't close over anything
    (set-finalizer stream #'close-pseudo-stream)
    stream))
Story closed, the problem is fixed. It is late Friday afternoon, so we eagerly push the commit to the production system and leave for home with a warm feeling of fulfilled duty. Two hours later all hell breaks loose and the system fails. The problem is the following function:
(defun run-client (stream)
  (assert (pseudo-stream-open-p stream))
  (loop for message = (read-message stream)
        do (process-message message)
        until (eql message :server-closed-connection)
        finally (close-pseudo-stream stream)))
The resource is released twice! The first time when the function run-client closes the stream and the second time when the stream is finalized. A fix for this issue depends on the finalization strategy:
#+trivial-garbage ;; just remove the reference
(defun close-pseudo-stream (stream)
  (setf (resource stream) nil))

#+another-garbage ;; remove the reference and destroy the resource
(defun close-pseudo-stream (stream)
  ;; when-let is provided by e.g. the alexandria library
  (when-let ((resource (resource stream)))
    (setf (resource stream) nil)
    (destroy-resource resource)))
With this change, closing the stream doesn't interfere with the finalization. Hurray! Hopefully nobody noticed, it was late Friday afternoon after all. This little incident taught us to never push code before testing it.
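The effect of the guard in the second variant can be checked without waiting for the garbage collector. In this sketch demo-client-stream, demo-client-close and *demo-destroyed* are made-up names, the resource is just a keyword, and the second call stands in for what a finalizer receiving the stream would do later.

```lisp
;; Hypothetical names throughout; the resource is a plain keyword.
(defvar *demo-destroyed* '()
  "Resources destroyed so far, most recent first.")

(defclass demo-client-stream ()
  ((resource :initarg :resource :accessor demo-client-resource)))

(defun demo-client-close (stream)
  "Destroy the stream's resource unless it has been released already."
  (let ((resource (demo-client-resource stream)))
    (when resource
      (setf (demo-client-resource stream) nil)
      (push resource *demo-destroyed*))))

(defun demo-close-twice ()
  "Close one stream twice and return the list of destroyed resources."
  (let ((*demo-destroyed* '())
        (stream (make-instance 'demo-client-stream :resource :fd-3)))
    (demo-client-close stream)  ; explicit close, as in run-client
    (demo-client-close stream)  ; later call, e.g. from a finalizer
    *demo-destroyed*))
```

(demo-close-twice) returns a list with a single entry, (:FD-3), so the resource is destroyed once no matter how many times the stream is closed.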
We build the application from scratch, test it a little and... it doesn't work. After some investigation we find the culprit: a function that creates a new stream with the same resource and closes it.
(defun invoke-like-a-good-citizen-with-pseudo-stream (original-stream fn)
  (let* ((resource (resource original-stream))
         (new-stream (make-instance 'pseudo-stream :resource resource)))
    (unwind-protect (funcall fn new-stream)
      (close-pseudo-stream new-stream))))
Thanks to our previous provisions, closing the stream doesn't collide with finalization. However, the resource is destroyed for each finalized stream because it is shared between distinct instances.
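This failure mode can be reproduced with plain objects. The sketch below assumes the close variant that destroys the resource immediately; shared-demo-stream, shared-demo-close and *shared-demo-log* are hypothetical names and the resource is a keyword.

```lisp
;; Hypothetical names; every destroy call is recorded in a log.
(defvar *shared-demo-log* '()
  "One entry per destroy call, most recent first.")

(defclass shared-demo-stream ()
  ((resource :initarg :resource :accessor shared-demo-resource)))

(defun shared-demo-close (stream)
  "Per-stream guard: destroys the resource at most once per STREAM."
  (let ((resource (shared-demo-resource stream)))
    (when resource
      (setf (shared-demo-resource stream) nil)
      (push resource *shared-demo-log*))))

(defun shared-demo-run ()
  "Let two streams share one resource and return the log of destroy calls."
  (let* ((*shared-demo-log* '())
         (original (make-instance 'shared-demo-stream :resource :fd-3))
         (copy (make-instance 'shared-demo-stream
                              :resource (shared-demo-resource original))))
    (shared-demo-close copy)      ; the good citizen cleans up after itself
    (shared-demo-close original)  ; explicit close or finalizer, later on
    *shared-demo-log*))
```

(shared-demo-run) returns (:FD-3 :FD-3): the guard lives in each stream, so it cannot notice that another instance has already destroyed the same resource.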
When the finalizer accepts the collected object as an argument then the solution is easy, because all we need is to finalize the resource instead of the pseudo stream (and honestly we should have done it from the start!):
#+another-garbage
(defun open-pseudo-stream (uri)
  (let* ((resource (make-resource uri))
         (stream (make-instance 'pseudo-stream :resource resource)))
    (set-finalizer resource #'destroy-resource)
    stream))

#+another-garbage
(defun close-pseudo-stream (stream)
  (setf (resource stream) nil))
When the finalizer doesn't accept the object, we need a trick: finalize a shared pointer instead of the verbatim resource. The downside is that we always need to unwrap it before use.
#+trivial-garbage
(defun open-pseudo-stream (uri)
  (let* ((resource (make-resource uri))
         (wrapped (list resource))
         (stream (make-instance 'pseudo-stream :resource wrapped)))
    (flet ((finalize () (destroy-resource resource)))
      (set-finalizer wrapped #'finalize))
    stream))

#+trivial-garbage
(defun close-pseudo-stream (stream)
  (setf (resource stream) nil))
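One way to keep the unwrapping in a single place is a small reader function. This is only a sketch: cell-stream, resource-cell and stream-resource are hypothetical names that are not part of the code above, and the cell is the one-element list used as the shared pointer.

```lisp
;; Hypothetical variant of the pseudo stream that stores the wrapping cell.
(defclass cell-stream ()
  ((resource-cell :initarg :resource-cell :accessor resource-cell)))

(defun stream-resource (stream)
  "Return the resource held in STREAM's cell, or NIL if the stream is closed."
  (let ((cell (resource-cell stream)))
    (and cell (first cell))))
```

Callers then ask for (stream-resource stream) and never touch the cell directly, so the representation of the shared pointer can change without affecting them.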
While writing this post I got too enthusiastic and dramatized the production story a little, but it is a fact that I proposed a fix similar to the first finalization attempt in this post, and when it got merged it broke the production system. That didn't last long though, because the older build was deployed almost immediately. Cheers!