The code repository for this project is now on Google Code. I ran into problems checking in my code and waited some time for them to be resolved. Unfortunately, they were not, so I had to move the code base elsewhere. :(

This project's home has moved to the MnemonicFS site. Please visit it for the latest information.

Project Description

Project Name

Referred to as either MnemonicFS or The Mnemonic Filing System.

License

MnemonicFS is both open source and free software, licensed under the New BSD License. This essentially means that you are free to use it for any purpose you see fit, commercially or non-commercially, without paying anybody royalties, fat or “thin.” You are free to browse the source code if you are so inclined, and you are not obligated to make your own product open source if you add this library to it. Make sure you read and understand the implications of the license.

What is MnemonicFS

MnemonicFS, or the Mnemonic Filing System, is a community project that attempts to address the problem of information glut or data proliferation at the user level. It does this through an innovative method of filing information: aspects are used during information storage, and crosscut queries during information retrieval. Here, the word “information” can mean any kind of information: files containing any kind of data in any format, contact information, URLs, or even “sticky note” style information.

Languages & Platforms

MnemonicFS is being built in the C# programming language on the .NET platform. Hopefully, it should work on Mono too.

End-user

Please note that MnemonicFS will build as a software library, and not as an application. We do intend to have a running application for various platforms as a sort of “reference implementation” that demonstrates how the library is meant to be used by application developers; that, however, will be a separate project by itself.

What problem MnemonicFS attempts to solve

One of the biggest problems enterprises and individuals face today is information glut or data proliferation, i.e., excess data, redundant data, and the resulting difficulty of retrieving information; in short, information overload. As a security expert puts it, “Information is the pollution of the information age.” As a result, enterprises fail to find information when they need it, and even when they do find it, multiple versions of the same document make it hard to identify the authoritative one. The analyst firm IDC estimates that more than 150 billion gigabytes of information were produced in 2006, and that by 2011 this figure will reach 1,800 billion gigabytes. That is more than a tenfold increase.

This situation was inevitable, given the many tools available today for creating ever more content. There seems to be too much data and too few tools to manage it. This document describes a strategy (The Mnemonic Filing System) that attempts to address precisely these problems. It does not promise to be a silver bullet, but it aims to mitigate them to a very large extent at the individual user’s level.

Quick Description

At the simplest level, MnemonicFS can be described as an abstraction over the operating system’s file system.

The vision for MnemonicFS is for the user or information owner to adopt this abstraction and eventually “unlearn” the standard file system provided by the operating system. Note that MnemonicFS is not a replacement for the operating system’s native file system; rather, it complements it as an abstraction layered on top of it. This is why the term is The Mnemonic Filing System, and not The Mnemonic File System. The two terms have very different connotations. A file system is an abstraction provided by the operating system that gives applications convenient access to storage, sparing them from dealing with low-level mechanisms when storing and retrieving files. This in turn improves application developers’ productivity, since they do not have to deal with these low-level issues. Conceptually, this project is not concerned with the file system.

Rather, MnemonicFS is a filing system: a way of storing information on a conventional, operating system-provided file system so that documents and other data can be retrieved rapidly. The following section on MnemonicFS features shows how it achieves this.

Features of MnemonicFS

MnemonicFS has the following features:
  1. No notion of directories;
  2. Heavy reliance on aspects; and
  3. Usage of crosscut queries during retrieval.

We next explore each of these concepts in more detail.

No Directories

Directories are passé, purely from an information-organizing perspective. The reason for this statement is that the amount of information that the user needs to store has gone up tremendously, and using directories to sort information just doesn’t seem to work anymore. Of course, directories are necessary at the operating system level; however, MnemonicFS per se does away with the notion of directories completely at its presentation level to the user. (Internally though, MnemonicFS does use the conventional file system and directories as provided by the operating system.)

Conventional file systems that use directories essentially present information to the user in a two-dimensional format: the top-level root directory (for example, C:\ on Windows or / on Unix) has a whole list of “branches” in the form of directories, and each of these directories in turn has its own set of subdirectories. MnemonicFS, however, presents information to the user in three- and even four-dimensional formats. We’ll see how when we discuss aspects and crosscut queries next.

Aspects

An “aspect,” as the word implies, is a view into the MnemonicFS corpus of documents. It is a method for labeling files within MnemonicFS storage, and is one of the most important components of the system. When a file is stored in MnemonicFS, the user can apply one or more aspects to it, each of which identifies one facet of the file’s nature. An aspect can therefore also be thought of as a label.

To take a practical example, a user may have defined several disparate aspects within his MFS system, such as “Supplier Acme,” “Supplier Emca,” “Work,” “Home,” “University,” “Family,” “Friends,” “Invoices,” “Addresses,” “Customers,” “Proposals,” “Presentations,” “Product Routers,” “Product Servers,” and so on. Each of these aspects is quite different from the others; in fact, no two of them are related.

Let’s say that the user now wants to save a file, “Proposal.doc,” to his system. This file is from his supplier Acme and contains a commercial proposal for routers. During storage, the user should take care to apply aspects that will enable quick retrieval later on; for example, “Work,” “Supplier Acme,” “Proposals,” and “Product Routers.” Having applied these aspects, the user saves the file to the system.

Next, the user may want to save a document containing the address of a friend from the university that he attended. To this document, he will apply the following aspects: “Friends,” “Addresses,” and “University.”

Yet another document could be the contact information (address) of his partner company Emca that supplies servers. To this, he will apply the aspects: “Supplier Emca,” “Work,” “Addresses,” and “Product Servers.”

Another document could be his home’s utility bill. To this, he could apply the aspects: “Home” and “Invoices.”

Another document could be an invoice to his customer for servers. To this document, the user will apply the aspects: “Work,” “Invoices,” “Customers,” and “Product Servers.”

Thus, the same aspect can be applied to multiple documents that otherwise have nothing to do with one another.

In this way, the user applies aspects to every document he saves, making retrieval far easier than it would otherwise be. The user needs to invest only a little extra time when saving a document (typically on the order of a few seconds), and this small upfront effort pays off many times over when the document has to be found again. The user can also specify the timestamp at which a document is saved. Both options can greatly improve the file retrieval process.
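The storage step can be modeled with a small in-memory sketch. It is written in Python purely for illustration and is not the MnemonicFS C# API: the class `AspectStore`, its methods, and every file name other than “Proposal.doc” are hypothetical. It shows only the data relationship: each file maps to its set of aspects and an optional caller-supplied timestamp, and each aspect maps back to the files labeled with it.

```python
from collections import defaultdict
from datetime import datetime


class AspectStore:
    """Hypothetical in-memory model of aspect-based filing (not the MnemonicFS API)."""

    def __init__(self):
        self.files = {}                      # file name -> (aspects, saved_at)
        self.by_aspect = defaultdict(set)    # aspect -> names of files carrying it

    def save(self, name, aspects, saved_at=None):
        # Re-saving a file replaces its old labels, so drop them from the index first.
        if name in self.files:
            for aspect in self.files[name][0]:
                self.by_aspect[aspect].discard(name)
        # The caller may choose the timestamp; otherwise the current time is used.
        if saved_at is None:
            saved_at = datetime.now()
        self.files[name] = (frozenset(aspects), saved_at)
        for aspect in aspects:
            self.by_aspect[aspect].add(name)

    def aspects_of(self, name):
        return self.files[name][0]


# The documents from the example above (names other than Proposal.doc are made up).
store = AspectStore()
store.save("Proposal.doc", {"Work", "Supplier Acme", "Proposals", "Product Routers"},
           saved_at=datetime(2003, 5, 14, 10, 30))
store.save("FriendAddress.txt", {"Friends", "Addresses", "University"})
store.save("EmcaContact.txt", {"Supplier Emca", "Work", "Addresses", "Product Servers"})
store.save("UtilityBill.pdf", {"Home", "Invoices"})
store.save("ServerInvoice.pdf", {"Work", "Invoices", "Customers", "Product Servers"})

print(sorted(store.by_aspect["Work"]))
# ['EmcaContact.txt', 'Proposal.doc', 'ServerInvoice.pdf']
```

The two-way mapping is the design point: the file-to-aspects direction records what the user applied at storage time, while the aspect-to-files direction is what later makes retrieval by aspect cheap.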

Later on, when the user needs to retrieve any of these documents, he does so using crosscut queries (discussed next).

Crosscut Queries

A crosscut query is the method by which aspected files are later retrieved from MnemonicFS; the set of files it retrieves is called a crosscut. Continuing the example from the previous section, if the user later wants to retrieve the router proposal he received from his supplier Acme in, say, 2003, he can do so with a crosscut query. A crosscut query can be made as sharp and narrow, or as blunt and broad, as the user needs.

To begin with, assume that the user recalls only that the document was a proposal from his supplier Acme, had to do with routers, and was received in 2003.

The user then builds a crosscut query from whatever information he has at hand. He creates a crosscut with the aspects “Supplier Acme,” “Product Routers,” and “Proposals,” and also includes the year 2003 in the query. Note that every element of a crosscut query is optional.

He then runs this crosscut query against MnemonicFS, which retrieves all the documents that carry all of these aspects and were saved in the queried year. If he wants, the user can narrow the crosscut even further by including the month, the day, and even the hour in which the file was saved. The result of a crosscut query is a set with zero, one, or more (possibly hundreds of) member files.
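The query just described can be sketched as a filter over the corpus. This is an illustrative Python sketch, not the library's C# interface; the function name `crosscut`, its parameters, and the sample corpus (including the invented 2005 proposal) are hypothetical. It assumes, as the “How it works” section below describes, that a file must carry every aspect in the query, and that each time field the user supplies must match the file's saved timestamp.

```python
from datetime import datetime

# Hypothetical corpus: file name -> (aspects, timestamp the file was saved at).
CORPUS = {
    "Proposal.doc": ({"Work", "Supplier Acme", "Proposals", "Product Routers"},
                     datetime(2003, 5, 14, 10, 30)),
    "Proposal-2005.doc": ({"Work", "Supplier Acme", "Proposals", "Product Routers"},
                          datetime(2005, 2, 1, 9, 0)),
    "ServerInvoice.pdf": ({"Work", "Invoices", "Customers", "Product Servers"},
                          datetime(2003, 7, 2, 16, 45)),
}


def crosscut(corpus, aspects=(), year=None, month=None, day=None, hour=None):
    """Return the files that carry every requested aspect and match every
    time field that was supplied. Every argument after `corpus` is optional."""
    wanted = set(aspects)
    time_fields = {"year": year, "month": month, "day": day, "hour": hour}
    result = set()
    for name, (file_aspects, saved_at) in corpus.items():
        if not wanted <= file_aspects:          # must carry all query aspects
            continue
        if all(value is None or getattr(saved_at, field) == value
               for field, value in time_fields.items()):
            result.add(name)
    return result


print(crosscut(CORPUS, {"Supplier Acme", "Product Routers", "Proposals"}, year=2003))
# {'Proposal.doc'}
print(sorted(crosscut(CORPUS, year=2003)))
# ['Proposal.doc', 'ServerInvoice.pdf']
```

Because every parameter defaults to "no constraint", omitting the aspects or the year simply broadens the crosscut, which mirrors the statement that every element of the query is optional.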

How it works

To understand how crosscuts work in three-dimensional space, imagine that the entire corpus of the user’s documents is represented by a pie.

When an aspect is applied to a file, imagine that the file “moves” slightly within this pie to be included within the “slice” that represents this aspect. Now, if another aspect is applied to the same file, it “moves” yet again to become a part of the new “slice” (aspect), while still retaining its membership in the previous one. This way, as newer aspects are applied to a file, it keeps “moving” within the pie to become a member of the newer aspects, while still being a part of the previous ones. Think of it as a set of overlapping Venn diagrams in 3D.

During retrieval, the user defines a crosscut query by including aspects in it. (A crosscut query takes a set of aspects as input and returns a set of files, a crosscut, as output.) When the user adds the first aspect to the query, MFS includes all the files that belong to that aspect in the output set. When he adds another aspect, MFS takes the intersection of the two file sets, retaining only the files common to both. Each further aspect intersects the output set again in the same way. MnemonicFS thus returns the intersection of the file sets of all the aspects in the query, i.e., exactly those files that belong to every aspect in the input crosscut query.
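The step-by-step narrowing can be illustrated with plain set intersection. In this Python sketch, the aspect index `INDEX`, its file names, and the generator `crosscut_steps` are hypothetical; it yields the output set after each aspect is added, so the shrinking result is visible at every step.

```python
# Hypothetical aspect index: aspect -> set of files labeled with it.
INDEX = {
    "Work": {"Proposal.doc", "EmcaContact.txt", "ServerInvoice.pdf"},
    "Product Servers": {"EmcaContact.txt", "ServerInvoice.pdf"},
    "Invoices": {"UtilityBill.pdf", "ServerInvoice.pdf"},
    "Home": {"UtilityBill.pdf"},
}


def crosscut_steps(index, query_aspects):
    """Yield (aspect, output set) after each aspect is added to the query.

    The first aspect contributes all of its files; each further aspect
    intersects the running output set with its own file set."""
    result = None
    for aspect in query_aspects:
        files = index.get(aspect, set())     # an unknown aspect matches nothing
        result = set(files) if result is None else result & files
        yield aspect, result


for aspect, files in crosscut_steps(INDEX, ["Work", "Product Servers", "Invoices"]):
    print(aspect, sorted(files))
# Work ['EmcaContact.txt', 'Proposal.doc', 'ServerInvoice.pdf']
# Product Servers ['EmcaContact.txt', 'ServerInvoice.pdf']
# Invoices ['ServerInvoice.pdf']
```

Since intersection is commutative and associative, the order in which the user adds aspects changes only the intermediate sets, never the final crosscut.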

To visualize this in 3D, imagine the same pie again. When the user creates a crosscut query, a section of the pie is sliced off. When he adds another aspect to the query, another sub-slice is cut from this slice. Each sub-slice thus created represents the set of output files for the aspects added so far. The user keeps slicing the pie further by adding more aspects to the query, eventually arriving at the “bit” of the pie that he needs. This “bit” may consist of one file, hundreds of files, or even no files at all. It all depends on how judiciously the user applied aspects to his files when adding them to the system.

When the user includes a time dimension in a crosscut query, this creates a four-dimensional view of the document set within MnemonicFS, with time as the fourth dimension. Each snapshot of the entire document corpus represents a point in time, and adding a time dimension to the query restricts the output set to the documents within that particular snapshot.

Of course, there are no dimensions at the point where actual storage takes place, since physical storage memory is always linear. However, MnemonicFS is essentially an abstraction over the operating system’s file system, and only conceptually presents a multi-dimensional view to the user that aids in better visualization of the data to be stored.

What else other than files?

Other than files, MnemonicFS also allows the user to store notes, very similar to sticky notes, within the system.

Versioning

MnemonicFS also allows users to version their documents, with each version bearing a unique integer version number. The first version of a document stored in MnemonicFS has version number 0, the next has version number 1, the one after that has version number 2, and so on. The mechanism for achieving this will be described in the section Developer-specific Information below (to be added).
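Until that section is written, the numbering scheme, together with the per-version comment and timestamp and the conflict rule listed under Features, can be sketched as follows. This Python sketch is purely an assumption about the behavior, not the actual mechanism: `VersionedDocument`, `save_version`, `get`, and `VersionConflict` are hypothetical names, and the conflict check is a simple optimistic-concurrency test (every save after the first must be based on the latest version).

```python
from dataclasses import dataclass, field
from datetime import datetime


class VersionConflict(Exception):
    """Raised when a save is based on a version that is no longer the latest."""


@dataclass
class Version:
    number: int
    content: str
    comment: str
    saved_at: datetime


@dataclass
class VersionedDocument:
    """Hypothetical model of a document's version history (not the MnemonicFS API)."""
    name: str
    versions: list = field(default_factory=list)

    @property
    def latest(self):
        return len(self.versions) - 1        # -1 while no version exists yet

    def save_version(self, content, comment, based_on=None):
        # Every save after the first must start from the newest version;
        # otherwise the editor has to merge the newer changes first.
        if self.versions and based_on != self.latest:
            raise VersionConflict(
                f"based on version {based_on}, but the latest is {self.latest}")
        number = len(self.versions)          # numbering starts at 0
        self.versions.append(Version(number, content, comment, datetime.now()))
        return number

    def get(self, number):
        if not 0 <= number < len(self.versions):
            raise KeyError(f"no version {number} of {self.name}")
        return self.versions[number]


doc = VersionedDocument("Proposal.doc")
doc.save_version("draft", "Initial proposal")                     # version 0
doc.save_version("user A's edit", "New pricing", based_on=0)      # version 1
try:
    doc.save_version("user B's edit", "New terms", based_on=0)    # stale base
except VersionConflict as err:
    print(err)          # based on version 0, but the latest is 1
doc.save_version("A's and B's edits", "Merged terms", based_on=1)  # version 2
print(doc.get(2).comment)  # Merged terms
```

Deriving each new number from the length of the history keeps the numbers dense and zero-based, and the `based_on` check is what forces a second editor to merge before saving.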

Briefcases & Collections

MnemonicFS also supports briefcases and collections. As the names imply, a briefcase is a container for documents, while a collection is an abstract view into a set of documents. Consequently, a document can be contained in only one briefcase, whereas multiple collections may refer to the same document.
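The membership rules can be made concrete with a small sketch. In this Python illustration, the `Organizer` class, its methods, and the briefcase and collection names are hypothetical. Storing one briefcase per document enforces the single-briefcase rule; putting a document into a second briefcase moves it there, which is an assumption of this sketch (the library might reject the operation instead). Collections only refer to documents, so any number of them can share one.

```python
from collections import defaultdict


class Organizer:
    """Hypothetical sketch of briefcase and collection membership rules."""

    def __init__(self):
        self.briefcase_of = {}                 # document -> its single briefcase
        self.collections = defaultdict(set)    # collection -> documents it refers to

    def put_in_briefcase(self, doc, briefcase):
        # One dict entry per document: a second briefcase replaces the first.
        self.briefcase_of[doc] = briefcase

    def briefcase_contents(self, briefcase):
        return {d for d, b in self.briefcase_of.items() if b == briefcase}

    def add_to_collection(self, doc, collection):
        # Collections are views, so the same document may appear in many of them.
        self.collections[collection].add(doc)

    def collections_of(self, doc):
        return {c for c, docs in self.collections.items() if doc in docs}


org = Organizer()
org.put_in_briefcase("Proposal.doc", "Acme Deal")
org.put_in_briefcase("Proposal.doc", "Trip to Acme")       # moves the document
print(org.briefcase_contents("Acme Deal"))                 # set()
org.add_to_collection("Proposal.doc", "Routers")
org.add_to_collection("Proposal.doc", "Year 2003")
print(sorted(org.collections_of("Proposal.doc")))          # ['Routers', 'Year 2003']
```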

MnemonicFS Word Etymology

As the previous sections show, the Mnemonic Filing System gives users an advanced classification and filing system for organizing their information. This is where the word mnemonic applies. To take an everyday example, if a woman wants to remember to take her house keys before she leaves home, a common trick is to tie a string in a knot around one of her fingers. As she leaves the house, the knot reminds her that there is something important she must not miss. Similarly, MnemonicFS lets the user create multiple “knots” in the form of aspects, so that she does not forget where she has kept her files. An aspect serves as a mnemonic insofar as it describes one part of the nature of the document being stored.

Multiple aspects on a document point to multiple facets that are meaningful to that document, and together they form a signature of sorts: a fingerprint that is unique to that document. MnemonicFS exploits precisely this property to give the user a great deal of traction during document retrieval.

Features

Feature: Standalone Filing System
Benefit: MnemonicFS should be considered a standalone storage component, in much the same way as a file system. (Of course, the similarity with a standard file system ends there, since MFS is a filing system, not a file system.) For all practical purposes, MnemonicFS should be usable by application programs as is. Just configure the mfs.config file, and you're good to go.

Feature: Advanced Document Classification System
Benefit: Elimination of file redundancy. Since the problem of multiple copies of the same file in disparate places is eliminated, there is less confusion for the user and a corresponding saving in storage requirements. MnemonicFS has been created specifically with the aim of reducing data proliferation and eliminating redundant data.

Feature: Multi-dimension Data Storage
Benefit: The MnemonicFS storage can be perceived as a multi-dimensional storage and filing system, with up to four dimensions of perception. This greatly enhances the user's ability to retrieve files and information even several years down the line, with a minimal amount of input information required for retrieval.

Feature: Library, not an Application
Benefit: Wide applicability. Since MnemonicFS builds as a software library, it can be linked with any application that wants to use its capabilities. It is especially well suited to Document Management Systems, which need the kind of advanced filing and classification system that MnemonicFS provides.

Feature: Multi-user System
Benefit: MnemonicFS is a multi-user system that allows the user application to initialize an MFSOperations object using a unique user string. For example, the user string may be the user's email id. It is entirely up to the client application to decide what string to use for creating and using an account. This is especially useful for enterprise application developers.

Feature: User Isolation
Benefit: MnemonicFS never allows one user's subsystem to interfere with another's. User isolation is guaranteed. This prevents one user from "peeking" into another user's subsystem.

Feature: Support for file versioning
Benefit: MnemonicFS supports file versioning, so that as users keep making changes to the original file, the newer versions are each saved with a version-specific number and comment. This helps the user trace the entire history of a document's changes, including user comments for each change and the date-time when these changes were made. Another benefit of this is that at any point in time, a specific version of the document can be retrieved by the user.

Feature: Conflict and Merge Features
Benefit: While saving a file version, if one user has made changes and saved a newer version of the document, no other user will be able to save their own modifications as a newer version unless they incorporate the first user's changes. This feature has the benefit of preventing fragmentation of document versions.

Feature: Support for Briefcases
Benefit: Any document that has been saved to MnemonicFS can be put into a briefcase which presents another level of classification for the user. This can be especially useful when the user would like to "carry" a briefcase with him/herself to another location. Just an "Archive Briefcase" command would do the trick.

Feature: Support for Collections
Benefit: A collection lets a user group different documents together; unlike with a briefcase, a file may belong to multiple collections. This can be very useful for adding a richer set of dimensions to a set of documents.

Feature: Multiple Encryption Levels
Benefit: MnemonicFS is designed around multiple layers of encryption, which makes the user's meta-data more secure. The meta-data store itself is already encrypted; the files will be encrypted too in a future release.

Feature: Test-driven Development
Benefit: Since the entire code base of MnemonicFS has been developed using the agile TDD methodology, client application developers can be confident that the library's public interfaces work as promised (at least to the extent demonstrated by the tests). A very pleasant side effect is that application developers have a whole set of usage scenarios available to them, which greatly shortens the time between understanding the library and actually using it in an application.

Feature: Open Source
Benefit: Since the library is completely open source and licensed under the BSD license, anyone is free to view and modify the source code. Because the code is open to inspection by anyone, bugs can be found and fixed more readily than in closed-source, proprietary systems.

Feature: BSD license - Business-friendly
Benefit: The BSD license permits programmers to incorporate MnemonicFS into their own applications without having to make their entire product open source. This makes MnemonicFS an extremely attractive option for inclusion in their programs.

Ongoing project: more features to be added soon
MnemonicFS is an ongoing community project to enhance this library. Please see the Wish list section for more features that will be added soon.
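The Multi-user System and User Isolation entries above can be illustrated with a short sketch of per-user partitioning. This is Python, not the C# MFSOperations class; `MultiUserStore`, `Session`, and their methods are hypothetical, and the real constructor and methods of MFSOperations may differ. Isolation here is purely structural (a session holds a reference only to its own user's table); the sketch does not model how the library actually stores or protects each user's data.

```python
class Session:
    """Operations bound to a single user's file table (hypothetical)."""

    def __init__(self, files):
        self._files = files                  # this user's table, and nothing else

    def save(self, name, content):
        self._files[name] = content

    def load(self, name):
        return self._files[name]             # KeyError if this user never saved it

    def list_files(self):
        return sorted(self._files)


class MultiUserStore:
    """Hypothetical sketch: each user string gets its own private file table."""

    def __init__(self):
        self._tables = {}

    def open(self, user):
        # The user string is opaque to the store; an email id is one option.
        if not user:
            raise ValueError("user string must be non-empty")
        return Session(self._tables.setdefault(user, {}))


stores = MultiUserStore()
alice = stores.open("alice@example.com")
bob = stores.open("bob@example.com")
alice.save("Proposal.doc", b"router proposal")
print(bob.list_files())                                  # []
print(stores.open("alice@example.com").list_files())     # ['Proposal.doc']
```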

Developer-specific Information

Please note that the entire code base (thus far) has been developed using Test-driven Development (TDD), so you can check out the code and run the tests to verify that the public interfaces do what they promise. One advantage of TDD is that, besides helping to produce reliable code, the tests also demonstrate how client applications should use the APIs.

Another thing to bear in mind is that the code is currently in alpha.

Meta-data store

MnemonicFS uses SQLite as the meta-data store. Please read the file "Database Design Standards.doc" in the <Project Home>/Docs folder to understand the motivation behind the table naming conventions used.
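As an illustration of how a crosscut query might map onto a relational meta-data store, here is a sketch using Python's built-in `sqlite3` module. The schema, table names, and functions are hypothetical: they do not follow the conventions in "Database Design Standards.doc", and encryption is omitted. The key idea is that a file qualifies only if it matches every requested aspect, which SQL expresses with `GROUP BY` and a `HAVING COUNT(...)` equal to the number of aspects.

```python
import sqlite3

# Hypothetical schema; the real MnemonicFS tables, names, and encryption differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files   (id INTEGER PRIMARY KEY, name TEXT NOT NULL, saved_at TEXT NOT NULL);
CREATE TABLE aspects (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE file_aspects (
    file_id   INTEGER NOT NULL REFERENCES files(id),
    aspect_id INTEGER NOT NULL REFERENCES aspects(id),
    PRIMARY KEY (file_id, aspect_id)
);
""")


def save(name, aspects, saved_at):
    """Store a file record and link it to each of its aspects."""
    file_id = conn.execute("INSERT INTO files (name, saved_at) VALUES (?, ?)",
                           (name, saved_at)).lastrowid
    for aspect in set(aspects):
        conn.execute("INSERT OR IGNORE INTO aspects (name) VALUES (?)", (aspect,))
        conn.execute("INSERT INTO file_aspects (file_id, aspect_id) "
                     "SELECT ?, id FROM aspects WHERE name = ?", (file_id, aspect))


def crosscut(aspects, year=None):
    """Names of files carrying every requested aspect, optionally saved in `year`."""
    wanted = sorted(set(aspects))
    if not wanted:
        raise ValueError("this sketch needs at least one aspect")
    sql = (
        "SELECT f.name FROM files f "
        "JOIN file_aspects fa ON fa.file_id = f.id "
        "JOIN aspects a ON a.id = fa.aspect_id "
        f"WHERE a.name IN ({','.join('?' * len(wanted))})"
    )
    params = list(wanted)
    if year is not None:
        sql += " AND strftime('%Y', f.saved_at) = ?"
        params.append(str(year))
    # A file matches only if all requested aspects were found for it.
    sql += " GROUP BY f.id HAVING COUNT(DISTINCT a.id) = ? ORDER BY f.name"
    params.append(len(wanted))
    return [row[0] for row in conn.execute(sql, params)]


save("Proposal.doc", ["Work", "Supplier Acme", "Proposals", "Product Routers"],
     "2003-05-14 10:30:00")
save("ServerInvoice.pdf", ["Work", "Invoices", "Customers", "Product Servers"],
     "2003-07-02 16:45:00")
print(crosscut(["Supplier Acme", "Proposals"], year=2003))   # ['Proposal.doc']
print(crosscut(["Work"]))    # ['Proposal.doc', 'ServerInvoice.pdf']
```

A many-to-many link table keeps each aspect name stored once, and the `HAVING` clause turns the per-aspect matches into the intersection that a crosscut requires.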

Source code

Already checked in. Use any SVN client to check out the source. Please refer to the file Readme.txt within the trunk for compilation instructions.

Last edited Jun 2, 2011 at 7:06 AM by najeebshaikh, version 33