ScoutFS is a new open source scalable clustered file system designed and implemented for archiving large data sets to low cost external storage resources such as tape, disk, object, and cloud. Versity uses ScoutFS to power its subscription archiving application, Scout Archive Manager (ScoutAM). In order for ScoutAM to perform work, it must be able to discover large numbers of changes within the file system quickly and efficiently. Legacy archive systems are unable to scale beyond relatively low numbers of files, partly due to a reliance upon file system scans implemented through standard POSIX interfaces. Eliminating file system scans was a critical design goal for ScoutFS.
From edge triggered to level triggered
To highlight the power of the new ScoutFS query interface, let’s first consider the current generation file system. The edge triggered Versity file system makes the archive workflow scalable by creating event notifications. File system changes are organized into events that are collected into a list of files that need to be examined, saving an entire file system scan for each pass of the archive application.
But occasionally events can be lost, resulting in the need for a full metadata scan. Here’s what happens: events are buffered in a fixed size ring buffer. If the archive application is not running, or is not sampling the file system fast enough, the ring buffer can overflow and lose events. The archiver gets a flag signaling the overflow, but the lost events cannot be recovered. When the application detects the overflow, it has to fall back to a full metadata scan.
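As a rough sketch of that failure mode (illustrative only, not the Versity implementation): a fixed size ring buffer silently sheds events once the consumer falls behind, leaving only an overflow flag.

```go
package main

import "fmt"

// A fixed size ring buffer of events, as an edge triggered notifier might
// keep. This only illustrates the failure mode described above.
type ring struct {
	buf      []uint64
	head, n  int
	overflow bool
}

func newRing(size int) *ring { return &ring{buf: make([]uint64, size)} }

// push records an event, or drops it and raises the overflow flag when the
// consumer has fallen behind. Dropped events cannot be recovered.
func (r *ring) push(ev uint64) {
	if r.n == len(r.buf) {
		r.overflow = true
		return
	}
	r.buf[(r.head+r.n)%len(r.buf)] = ev
	r.n++
}

// pop removes the oldest buffered event, if any.
func (r *ring) pop() (uint64, bool) {
	if r.n == 0 {
		return 0, false
	}
	ev := r.buf[r.head]
	r.head = (r.head + 1) % len(r.buf)
	r.n--
	return ev, true
}

func main() {
	r := newRing(4)
	for ev := uint64(1); ev <= 6; ev++ { // producer outruns the consumer
		r.push(ev)
	}
	fmt.Println(r.overflow) // true: time for a full metadata scan
}
```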
The new architecture is able to fully support a system with up to one trillion files. A level triggered design lets the archive application query into the time ordered transaction sequence of modified inodes starting at any arbitrary location within the sequence. The transaction sequence is a monotonically increasing count over time that will never be reused. Once the archive application completes work for a given inode in a given sequence, it knows that nothing can change earlier in the sequence. If a file is modified later on, it will show up again in the sequence with a new, higher transaction sequence number. In this architecture, the application only needs to record where it left off in the sequence, and only ever needs to start over at that location.
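A minimal sketch of the level triggered idea, with the index held in a slice purely for illustration (in ScoutFS the index lives in the file system itself):

```go
package main

import "fmt"

// An index entry: (transaction sequence number, inode), ordered by sequence.
type entry struct{ seq, ino uint64 }

// queryFrom returns the entries at or after the given sequence number,
// standing in for the ScoutFS query interface.
func queryFrom(index []entry, from uint64) []entry {
	var out []entry
	for _, e := range index {
		if e.seq >= from {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	// Inode 2 was modified again later, so it reappears with a higher seq.
	index := []entry{{seq: 3, ino: 2}, {seq: 4, ino: 7}, {seq: 5, ino: 2}}

	checkpoint := uint64(0)
	for _, e := range queryFrom(index, checkpoint) {
		_ = e.ino               // archive work for this inode happens here
		checkpoint = e.seq + 1 // record where we left off
	}
	// After a restart the application resumes at its checkpoint and finds
	// only work that arrived since.
	fmt.Println(len(queryFrom(index, checkpoint))) // 0
}
```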
ScoutFS stores the items for the query interface in a modified log structured merge tree. The structure and its interactions are described in the ScoutFS whitepaper. Its characteristics allow for fast search, fast ordered retrieval starting at any point, and a high likelihood of accessing cached items already in memory. When a file is modified, the associated inode is moved to a new position in the sequence with an updated sequence number. Likewise, when an inode no longer exists in the file system, it is no longer returned within the results from the query.
Using the query interface
The query interface for the metadata sequence can be thought of as a persistent and reliable inotify system across the entire file system. This allows the userspace application, ScoutAM, to retrieve the sequence of modifications to the filesystem. It can start anywhere in the sequence, so it can wait at the point where no more results are returned and just poll for future transactions. This allows the application to discover any work that it should accomplish as files are created or modified. In this example, we can keep a running count of the files created or modified.
The following section details how the Go library works for interacting with the ScoutFS query interface. If the inner workings aren’t your thing, skip to the next section on an example of the interface at work.
The query returns walk inodes index entries made up of three fields: major, minor, and ino. These are documented in /usr/include/scoutfs/ioctl.h in the scoutfs-devel package. For the metadata sequence, major is the sequence number, minor is always 0, and ino is the associated inode number.
We’ll be constructing our example application in Go, so we’ll use the corresponding struct to represent our entries.
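A sketch of that struct, with field widths assumed from the header’s description (a 64-bit sequence number, a 32-bit minor, and a 64-bit inode number):

```go
package main

import "fmt"

// InodesEntry mirrors the walk inodes entry from scoutfs's ioctl.h.
// For the metadata sequence, Major is the transaction sequence number,
// Minor is always 0, and Ino is the inode number.
type InodesEntry struct {
	Major uint64
	Minor uint32
	Ino   uint64
}

func main() {
	e := InodesEntry{Major: 5, Minor: 0, Ino: 42}
	fmt.Printf("seq=%d ino=%d\n", e.Major, e.Ino)
}
```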
The query interface allows for fast retrieval of multiple entries within a single ioctl call. The ioctl argument is a struct with the necessary information. The first field is filled out with the fields representing the first index entry that can be returned; this is how we tell the query where in the sequence it should start. No entries with index values less than this entry will be returned. The last field represents the last sequence entry that can be returned, usually the max uint64 value, indicating we want everything after the first. The caller must allocate a buffer large enough to hold the requested number of entries, and pass the pointer to the buffer along with the number of entries the buffer can hold. The index field is the type of sequence that we’re using in our query. For now, the possibilities are metadata sequence (0) and data sequence (1); we’ll be interested in the metadata sequence for this example.
Note that the C struct is packed, so its size includes no alignment padding. Go does not allow packed structs, so we need to pack our corresponding struct in code. The example code omits the error handling for clarity, but it is handled correctly in the actual code repo on GitHub.
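One way to do that packing (a sketch; the exact field layout is my assumption based on the description above, and little-endian x86 byte order is assumed) is to serialize each field with encoding/binary so no padding creeps in:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

type InodesEntry struct {
	Major uint64
	Minor uint32
	Ino   uint64
}

// walkInodesArgs mirrors the packed C ioctl argument struct: First and Last
// bound the query, EntriesPtr/NrEntries describe the caller's result buffer,
// and Index selects the sequence (0 = metadata, 1 = data).
type walkInodesArgs struct {
	First      InodesEntry
	Last       InodesEntry
	EntriesPtr uint64
	NrEntries  uint32
	Index      uint8
}

// pack serializes the args field by field with no alignment padding,
// matching what a __packed C struct would look like in memory.
func (a walkInodesArgs) pack() []byte {
	var b bytes.Buffer
	for _, v := range []interface{}{
		a.First.Major, a.First.Minor, a.First.Ino,
		a.Last.Major, a.Last.Minor, a.Last.Ino,
		a.EntriesPtr, a.NrEntries, a.Index,
	} {
		binary.Write(&b, binary.LittleEndian, v) // error handling omitted
	}
	return b.Bytes()
}

func main() {
	args := walkInodesArgs{
		Last: InodesEntry{Major: ^uint64(0), Minor: ^uint32(0), Ino: ^uint64(0)},
	}
	fmt.Println(len(args.pack())) // 2*20 + 8 + 4 + 1 = 53 bytes, no padding
}
```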
Now we can define the ioctl call that will return the number of entries populated in our buffer.
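A thin wrapper over the raw syscall might look like this sketch; the kernel reports the number of entries it filled in as the syscall’s return value (the actual command value comes from ioctl.h):

```go
package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

// walkInodes issues the walk-inodes ioctl against an open descriptor on the
// mounted file system. cmd is the command value generated from the
// definitions in /usr/include/scoutfs/ioctl.h, and args is the packed
// argument struct. The return value is the number of entries the kernel
// populated in the caller's buffer.
func walkInodes(fd, cmd uintptr, args []byte) (int, error) {
	n, _, errno := syscall.Syscall(syscall.SYS_IOCTL, fd, cmd,
		uintptr(unsafe.Pointer(&args[0])))
	if errno != 0 {
		return 0, errno
	}
	return int(n), nil
}

func main() {
	// Without a ScoutFS mount handy, an invalid descriptor just errors out.
	_, err := walkInodes(^uintptr(0), 0, make([]byte, 1))
	fmt.Println(err != nil) // true
}
```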
The C ioctl command and index type definitions are translated into constants in Go.
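For example (the index values 0 and 1 come from the header as described above; the magic byte and command number below are placeholders purely for illustration, so consult ioctl.h for the real SCOUTFS_IOC_WALK_INODES definition):

```go
package main

import "fmt"

// Index types for the walk inodes query, from scoutfs's ioctl.h.
const (
	MetaSeqIndex = 0 // metadata sequence
	DataSeqIndex = 1 // data sequence
)

// iowr reproduces the Linux _IOWR() command encoding:
// dir (2 bits) | size (14 bits) | type (8 bits) | nr (8 bits).
func iowr(typ, nr, size uintptr) uintptr {
	const (
		iocWrite = 1
		iocRead  = 2
	)
	return (iocRead|iocWrite)<<30 | size<<16 | typ<<8 | nr
}

func main() {
	// Placeholder magic and command number; not the real scoutfs values.
	fmt.Printf("0x%X\n", iowr('s', 1, 53))
}
```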
Our Go library interface will have a convenience function to initialize the query. The ByMSeq option lets us specify where in the sequence to start from and where to end.
The following is the library call that actually queries the file system, returning slices of entries in batches based on our buffer size. It must reset the new starting place appropriately for subsequent calls. (Error handling is omitted for clarity.)
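A sketch of that call: issue the ioctl for one batch, decode the packed entries, and advance the query’s first entry to just past the last result so the next call resumes there. The command value is a placeholder and the carry order in advance is my assumption; the real definitions live in ioctl.h and the scoutfs-go repo.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"os"
	"syscall"
	"unsafe"
)

type InodesEntry struct {
	Major uint64
	Minor uint32
	Ino   uint64
}

const entrySize = 20 // packed entry: 8 + 4 + 8 bytes

// Placeholder for the real command value generated from ioctl.h.
var scoutfsIOCWalkInodes uintptr

type Query struct {
	f           *os.File
	first, last InodesEntry
	batch       uint32
}

// advance returns the index entry just after e, carrying ino into minor
// and minor into major, so a subsequent call resumes past the last result.
func advance(e InodesEntry) InodesEntry {
	e.Ino++
	if e.Ino == 0 {
		e.Minor++
		if e.Minor == 0 {
			e.Major++
		}
	}
	return e
}

// Next issues one walk-inodes ioctl and returns the next batch of entries;
// an empty slice means the sequence is (currently) exhausted.
func (q *Query) Next() ([]InodesEntry, error) {
	buf := make([]byte, int(q.batch)*entrySize)
	var args bytes.Buffer
	for _, v := range []interface{}{
		q.first.Major, q.first.Minor, q.first.Ino,
		q.last.Major, q.last.Minor, q.last.Ino,
		uint64(uintptr(unsafe.Pointer(&buf[0]))), q.batch,
		uint8(0), // 0 selects the metadata sequence index
	} {
		binary.Write(&args, binary.LittleEndian, v) // error handling omitted
	}
	p := args.Bytes()
	n, _, errno := syscall.Syscall(syscall.SYS_IOCTL, q.f.Fd(),
		scoutfsIOCWalkInodes, uintptr(unsafe.Pointer(&p[0])))
	if errno != 0 {
		return nil, errno
	}
	entries := make([]InodesEntry, n)
	r := bytes.NewReader(buf)
	for i := range entries {
		binary.Read(r, binary.LittleEndian, &entries[i].Major)
		binary.Read(r, binary.LittleEndian, &entries[i].Minor)
		binary.Read(r, binary.LittleEndian, &entries[i].Ino)
	}
	if n > 0 {
		q.first = advance(entries[n-1]) // resume point for the next call
	}
	return entries, nil
}

func main() {
	fmt.Println(advance(InodesEntry{Major: 4, Ino: 9}).Ino) // 10
}
```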
And finally, a cleanup function closes the file handles.
An example application
The following demonstrates a simple use case for keeping track of how many files and directories are modified and displays the latest count on a webpage.
We’ll run a query for all inodes in the file system, in metadata modified sequence order, in a goroutine that will loop on the ScoutFS query. It keeps track of the running file count until a pass returns 0 entries (no more files are left in the query), then sends the updated count to a channel that the HTTP handler can do something with.
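A sketch of that loop; here next stands in for the Query.Next call so the shape is clear, and the passes argument (my addition, not in the real application) just bounds the loop for demonstration:

```go
package main

import (
	"fmt"
	"time"
)

type InodesEntry struct {
	Major uint64
	Minor uint32
	Ino   uint64
}

// countModified drains the sequence via next (a stand-in for Query.Next),
// keeping a running total of inodes seen. Whenever a pass returns no more
// entries it sends the total on counts, then sleeps and polls again.
func countModified(next func() ([]InodesEntry, error), counts chan<- uint64,
	poll time.Duration, passes int) {
	var total uint64
	for passes > 0 {
		ents, err := next()
		if err != nil {
			return
		}
		if len(ents) == 0 { // caught up: report, then poll for new activity
			counts <- total
			passes--
			time.Sleep(poll)
			continue
		}
		total += uint64(len(ents))
	}
}

func main() {
	batches := [][]InodesEntry{{{Major: 3, Ino: 2}, {Major: 4, Ino: 7}}, {}}
	i := 0
	next := func() ([]InodesEntry, error) { b := batches[i]; i++; return b, nil }
	counts := make(chan uint64, 1)
	countModified(next, counts, 0, 1)
	fmt.Println(<-counts) // 2
}
```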
We can create a server type to manage the channel we’ll receive file counts on and to keep track of the last count we had.
We can define the ServeHTTP method so that we can pass the server as an http.Handler to the http.ListenAndServe function. This example uses select to either update the count if there’s a new value on the channel, or if not, it will use the previous value. The function will write a simple webpage that asks the browser to refresh the page every second and print out the latest count.
Finally, our main function creates the channel for the server, starts the query in a goroutine for the file system given as the command line argument, and starts the HTTP server.
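The wiring might look like this sketch, with stubs standing in for the query loop and handler described above:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"os"
)

// Stubs standing in for the pieces sketched earlier in the post.
type server struct{ counts chan uint64 }

func (s *server) ServeHTTP(w http.ResponseWriter, r *http.Request) {}

func runQuery(path string, counts chan<- uint64) { /* query loop as above */ }

// run creates the channel, starts the query goroutine for the file system
// named on the command line, and then starts the HTTP server.
func run(args []string) error {
	if len(args) < 2 {
		return fmt.Errorf("usage: %s <scoutfs mount point>", args[0])
	}
	counts := make(chan uint64, 1)
	go runQuery(args[1], counts)
	return http.ListenAndServe(":8080", &server{counts: counts})
}

func main() {
	if err := run(os.Args); err != nil {
		log.Println(err)
	}
}
```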
Running the example
To run this application, we’ll need to configure a ScoutFS file system. There’s a quick start on how to get this up and running in the README.md in the git repo (https://github.com/versity/scoutfs-kmod-dev#quick-start). Mine is mounted at the mount point /mnt/scoutfs, so I can build the example and run it against that mount point.
If all went well, you should be able to point a browser at http://localhost:8080 and see our current count of 1 (assuming there’s no activity yet in the file system).
Files/Directories updated: 1
We can create some files and watch the count increase: say, a new “lots” directory with 500 files in it. The browser window will update the count.
Files/Directories updated: 504
If you’re wondering where that 504 came from when we started at 1 and created 500 files, remember that we created a directory before the files as well. Creating that directory also updated the contents of the top level directory. That leaves us with 500 files, the “lots” directory, and the top level directory, which accounts for a count of up to 503, but we ended up with 504. While this ran, I stored a log of the sequences to see what was happening in more detail. There are a total of 4 sequences in the log. The first sequence is the original top level directory: before anything else is created in this directory, it is inode 1 in sequence 0. The next sequence is 3, which contains the “lots” directory (inode 2) and the top level directory. The next sequence is 4 and contains 25 files along with the “lots” directory. This was likely a background sync that happened on my system while I was creating the files; the “lots” directory was included because it was updated with the 25 files. The final sequence is 5 and contains the “lots” directory again along with the remaining 475 files. So the duplication of the “lots” directory across multiple transactions caused our count to increment higher than we might have expected.
So what happens if we restart the application at this point? What will the new count be?
Files/Directories updated: 502
We get 502 because starting the application over again restarted the query back at 0. Part of the query’s behavior is that it will only return the latest version of any inode, so we no longer see inode 2, the “lots” directory, in transactions 3 and 4; it only shows up in its latest version at sequence 5. Similarly, if we were to remove all of the files, leaving only the “lots” directory and the top level directory, and restart the application, we’d end up with a count of just 2. This is because the query won’t return any inodes that no longer exist.
Files/Directories updated: 2
We hope this overview has given you insight into the power and behavior of the Accelerated Query Interface. The example maintained a count of created and modified files and directories to keep things simple, but any workload that needs to act on newly created or modified files can be applied here: anything from maintaining a checksum database, to running a policy engine for an archive workload, to post processing files and automating metadata tags. The interface was intentionally made generic enough to allow for workloads beyond the needs of Versity’s application.
A working version of the ScoutFS Go library and demo application can be found at: https://github.com/versity/scoutfs-go.