Cloudia
The following is a concept for a software / software architecture whose working title is “Cloudia”. Previous names include “humanID” and “virtualme” and might already give you an idea of what it’s all about. So far, there is no source code available but I hope this will change quickly as soon as I manage to merge all the many, many different thoughts I have into a sound concept.
In the following I’m going to briefly describe / advertise the core idea and provide specific examples for practical application of the software. Hopefully, I’ll still have your attention for the parts that follow. Those are rather long and go into detail about how things are supposed to work and look in the finished product. There might also be technical issues involved, so be warned. ;)
Finally, please keep in mind that I write about all ideas that have crossed my mind, although they may not yet fit together perfectly. Feel free to drop me a line if you have questions or suggestions.
Introduction
In essence, the idea of Cloudia is to share data – among your devices (your PC, laptop / netbook, eBook reader, cellphone et cetera) and among your contacts, the latter of which play an important role. “Data” is intentionally phrased broadly because Cloudia doesn’t limit you to sharing just contact information, pictures and videos like most social networks do. Rather, “data” is an arbitrary file plus semantic meta information (author, license, links to other data / files; music album title, people one can see in the picture, …) which you can easily publish just for your own purposes (that is, for using it on another device) or for the whole world. There’s no steep learning curve involved here – you’re just one click away from showing your friends last Saturday night’s pictures. Both sides – total privacy on the one hand and total transparency on the other – can be accomplished equally easily.
Just as the many clouds in the sky would suggest, Cloudia is built in a decentralized way to protect your privacy. You can freely choose a provider that suits you and that you feel secure with. This is a notable difference to “clouds” administered by a single company (Google, Amazon, Facebook, …), which require you to trust the national privacy law they are subject to. Instead, you can pick a company that is located in your home country. Nevertheless, you still have access to the whole network with all users on all providers. The providers help you make your data available, store it in the cloud (i.e. online) and retrieve it from there.
As already mentioned, contacts play an important role. You do not just make your data available to them; they can also decide to subscribe to you and be informed as soon as there’s something new. Furthermore, you can also choose to have them collaborate with you on certain files.
Practical examples
The first and most obvious application probably is the creation of a distributed, secure social network. You store your contacts in the cloud and, on the other hand, give your contact information to them. Exchange messages, share files with them and go look for new virtual friends.
Now, imagine a micro-blogging service on Cloudia: You just frequently publish your status as a new file of type “status” and as soon as your contacts have subscribed to this file type they will receive every new status. You may, however, again choose to make some statuses available only to some people or just to yourself (as a kind of personal diary).
If you like writing articles and have a personal blog to publish them on, you might be interested in this one: WordPress and similar blogging software become applications for Cloudia which allow you to publish articles (files of type “article”) in the cloud, while your website uses the Cloudia API to retrieve them from there and finally display them to the user. RSS wouldn’t be necessary anymore because people could just subscribe to your Cloudia account. Also, you could use Cloudia to publish your texts on more than one site, e.g. your personal blog and an online magazine you’re writing for.
If you’re rather into collaborating with others on a text, a piece of software or something similar, you could use Cloudia as a revision control system.
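To make the subscription idea a bit more tangible, here is a minimal sketch in Python. Nothing of this exists yet: the names `Account`, `PublishedFile`, `subscribe`, `publish` and the `audience` field are my own assumptions for illustration, not a Cloudia API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PublishedFile:
    author: str
    file_type: str                      # e.g. "status" or "article"
    content: str
    # Hypothetical field: None means "every subscriber may see it",
    # an empty set means "only myself" (the personal diary case).
    audience: Optional[set] = None


@dataclass
class Account:
    name: str
    # subscriber name -> file types that subscriber wants to receive
    subscriptions: dict = field(default_factory=dict)

    def subscribe(self, subscriber: str, file_type: str) -> None:
        self.subscriptions.setdefault(subscriber, set()).add(file_type)

    def publish(self, file: PublishedFile) -> list:
        """Return the subscribers who would be notified about this file."""
        return sorted(
            name
            for name, types in self.subscriptions.items()
            if file.file_type in types
            and (file.audience is None or name in file.audience)
        )


me = Account("me")
me.subscribe("anthony", "status")
me.subscribe("chad", "status")
me.subscribe("john", "article")

everyone = me.publish(PublishedFile("me", "status", "Hello cloud"))
only_chad = me.publish(PublishedFile("me", "status", "Psst", audience={"chad"}))
diary = me.publish(PublishedFile("me", "status", "Dear diary", audience=set()))
```

Here `everyone` contains anthony and chad, `only_chad` contains just chad, and `diary` is empty; john never hears about statuses because he only subscribed to articles.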
Finally, for the heavy users among you ;), Cloudia might also be considered as a peer-to-peer network for file sharing.
User perspective
Cloud architecture
The user signs up at a server / provider of his choice. The provider is the intermediary for all outgoing communication and transmission to other contacts who may be registered at the same or another server. Therefore, the whole cloud is structured as a decentralized social network.
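As a rough sketch of what “intermediary” means here, the following Python snippet traces the path of a message between two users. It assumes that users are addressed as `user@provider` (as in the girlfriend example further down); the function names `split_address` and `route` are hypothetical.

```python
def split_address(address: str) -> tuple:
    """Split 'user@provider' into (user, provider)."""
    user, separator, provider = address.rpartition("@")
    if not separator or not user or not provider:
        raise ValueError(f"not a user@provider address: {address!r}")
    return user, provider


def route(sender: str, recipient: str) -> list:
    """Return the hops a message takes: the sender's provider always comes
    first, and the recipient's provider is only added if it differs."""
    _, sender_provider = split_address(sender)
    _, recipient_provider = split_address(recipient)
    hops = [sender, sender_provider]
    if recipient_provider != sender_provider:
        hops.append(recipient_provider)
    hops.append(recipient)
    return hops


path = route("me@home.example", "my-girlfriend@her-provider.net")
```

The point of the design is that the two clients never talk to each other directly; each one only ever needs to know and trust its own provider.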
Furthermore, the user may choose to register his devices (his mobile, his PC, …) as clients at his provider to have data exchanged and synchronized between them.
Concerning the nature of files & data…
What’s annoying in today’s file systems is that every piece of information one can provide about a file (like author, description, ID3 tags, people in a picture) depends on the format being used. Why is that so? As if these things would change when a file is converted from .mp3 to .ogg! But for music players it does make a difference.
Also, why aren’t things like author / company, the application’s icon and the license actually part of the file’s content? Even better, why aren’t these things outsourced? Artist, album art and license are mostly the same for every track of an album, which raises the question of whether we would be better off storing them in a central location. What’s that central location? The Internet. Or rather: The cloud.
In Cloudia, all songs composed by the Red Hot Chili Peppers would reference the file 86g98gf65dfa566asd5123 (see filenames below) which, being stored somewhere in the cloud (and not necessarily on the user’s device), would contain information about the band itself. Some of those songs would also reference file 123ds46sdfj674fdh35sdu6443 which in turn would provide information about the album “Blood Sugar Sex Magik” including a further reference to the album art. (My own personal social profile, however, would contain a reference “girlfriend” to my-girlfriend@her-provider.net.)
Now, that does sound good, but there’s still the problem that we would have to agree on a consistent format for such artist or album information. Also, where do we store the file’s meta information (including those references to other files)? An external database not interfering with the files themselves wouldn’t do the trick: if a file was transferred, we’d always have to provide the respective database entry, too. What I have in mind is a .zip container containing the original file together with a meta.xml file. The resulting container with – let’s say – Red_Hot_Chili_Peppers_-_Can’t_Stop.ogg inside could then be named Red_Hot_Chili_Peppers_-_Can’t_Stop.ogg.mc (mc = meta container). As for Windows (just trying to help you get the essence; the same obviously applies to other operating systems), we’d still need some kind of Windows Explorer driver that hides the .mc extension from the user and lets him open, edit and save those files just like he’s used to, in order to stay compatible with the applications he uses. But that should be possible.
Finally, XML appears to be the best option for storing meta information as well as for files containing album (or similar global) information. My long-term goal is therefore standardized XML formats for file meta data.
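Here is a minimal sketch of such a meta container, using only Python’s standard `zipfile` and `xml.etree.ElementTree` modules. The XML layout (a `meta` root element, one child per property, `reference` elements with `rel` and `target` attributes) is purely an assumption of mine; no format has been standardized yet.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_meta_container(filename, payload, meta, references):
    """Pack the original file plus a meta.xml into an in-memory .zip.

    meta:       property name -> value (names must be valid XML tag names)
    references: relation name -> id of another file in the cloud
    """
    root = ET.Element("meta")
    for key, value in meta.items():
        ET.SubElement(root, key).text = value
    for relation, target in references.items():
        ET.SubElement(root, "reference", {"rel": relation, "target": target})

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(filename, payload)
        archive.writestr("meta.xml", ET.tostring(root, encoding="utf-8"))
    return buffer.getvalue()


def read_meta(container):
    """Return the parsed root element of meta.xml from a container."""
    with zipfile.ZipFile(io.BytesIO(container)) as archive:
        return ET.fromstring(archive.read("meta.xml"))


container = build_meta_container(
    "Cant_Stop.ogg",
    b"...audio bytes...",                      # placeholder payload
    {"title": "Can't Stop", "license": "All rights reserved"},
    {"artist": "86g98gf65dfa566asd5123"},      # id taken from the example above
)
meta = read_meta(container)
```

Writing the bytes in `container` to a file named Cant_Stop.ogg.mc would give the container described above, and any ordinary unzip tool could still get at the original file without a Cloudia client.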
Why do we need meta information, let alone file containers?
As I mentioned in the previous paragraph, the lack of meta information is constantly bugging me. However, having it wouldn’t just be “cool”. There are several other reasons:
Reason #1: Author and license information
Nowadays, there is an intense debate on whether and, if so, how to adjust copyright and intellectual property rights to the Internet. In my opinion, no three strikes law can actually force the user to respect and credit an author’s work. (If the ISP is scanning my traffic, why shouldn’t I encrypt it?) At the moment, users simply aren’t aware of what they are legally allowed to do with a digital copy of such a work.
I want to support license concepts like Creative Commons that enable fair use and ultimately lead to a more open-minded society that respects authors’ rights.
Reason #2: Offline availability
I’m constantly thinking about reorganizing my pictures folder but as there is no standardized and accessible format in which to store date, geo location, description, notes regarding persons and other spots in the picture, I don’t do it. Also, I don’t want to keep all those things up-to-date in multiple places (my PC, social networks, …) because Facebook doesn’t interact with, let’s say, Adobe Lightroom. Neither do I want to be the only one doing it (or rather: the only one even able to do it because I’m tech-savvy).
Ultimately, for the borders between online and offline to blur, meta information must be available everywhere.
Reason #3: Being disconnected from the cloud
Meta information must be available without even having a client for Cloudia installed. It must not get lost when a user decides to opt out of the network / cloud. Also, other programs are supposed to use it, too, and shouldn’t need to access a foreign (= Cloudia’s) database (no matter if it is local or online). Every file must be usable independently from
Besides, where else would previous file revisions be stored, if not in a container? (For more information on version control proceed to the next section.) If meta information is located in a database the user doesn’t have access to, it would eventually disappear when transmitting or backing up data the “normal” way, i.e. copying the files onto a system that doesn’t have a Cloudia client / database installed.
Reason #4: Searching
There is no way a search engine could identify a binary file if it doesn’t know its proper context, i.e. its meta data. (In contrast to that, Google is “just” searching the web for textual information, which is more or less self-explanatory.) Yet searching for data and subscribing to data channels are supposed to become key concepts of Cloudia.
Version Control & unique files vs. evolving files
Being able to share documents with your contacts only makes sense to a certain degree if you cannot also collaborate with them. As already mentioned, revision (or version) control is supposed to permit exactly that. For this reason, the meta container file introduced above also contains former versions of the file. To be able to address a single version of a file, a revision identifier is appended to the file’s id, like so: 123ds46sdfj674fdh35sdu6443:313.
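A small sketch of how such an identifier could be taken apart and resolved against the revisions stored in a container. I’m assuming here that the part after the colon is a plain integer revision number and that a reference without a revision means “the latest one”; `FileRef`, `parse_file_ref` and `resolve` are hypothetical names.

```python
from typing import NamedTuple, Optional


class FileRef(NamedTuple):
    file_id: str
    revision: Optional[int]  # None means "the latest revision"


def parse_file_ref(ref: str) -> FileRef:
    """Split 'id' or 'id:revision' into its parts."""
    file_id, separator, revision = ref.partition(":")
    if not file_id:
        raise ValueError(f"missing file id in {ref!r}")
    if not separator:
        return FileRef(file_id, None)
    return FileRef(file_id, int(revision))


def resolve(ref: str, revisions: dict) -> bytes:
    """Look up the requested revision, or the highest-numbered one.

    revisions maps revision number -> file content, as it might be
    read from the meta container.
    """
    parsed = parse_file_ref(ref)
    number = max(revisions) if parsed.revision is None else parsed.revision
    return revisions[number]


history = {312: b"older text", 313: b"newer text"}
pinned = parse_file_ref("123ds46sdfj674fdh35sdu6443:313")
latest = resolve("123ds46sdfj674fdh35sdu6443", history)
```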
However, the user can also decide to finalize a certain file. This goes along with the insertion of the file’s checksum (or part of it) into the id, ensuring that a file’s content being transmitted to the user always matches exactly the file the user asked for. This guarantees that no one modified the file after finalization and that references like those exemplified above always point to the same data.
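To illustrate finalization, here is a minimal sketch using Python’s `hashlib`. The choice of SHA-256, the 16-character prefix and the dot as separator between id and checksum are all assumptions of mine; the concept only says that the checksum (or part of it) becomes part of the id.

```python
import hashlib

CHECKSUM_LENGTH = 16  # assumed: how many hex characters of the digest to keep


def finalize(file_id: str, content: bytes) -> str:
    """Return the id of the finalized file with the checksum prefix appended."""
    digest = hashlib.sha256(content).hexdigest()[:CHECKSUM_LENGTH]
    return f"{file_id}.{digest}"


def verify(final_id: str, content: bytes) -> bool:
    """Check that received content matches the checksum embedded in the id."""
    _, separator, expected = final_id.rpartition(".")
    if not separator or not expected:
        return False  # not a finalized id
    actual = hashlib.sha256(content).hexdigest()[: len(expected)]
    return actual == expected


album_info = b"<album>Blood Sugar Sex Magik</album>"
final_id = finalize("123ds46sdfj674fdh35sdu6443", album_info)
untouched = verify(final_id, album_info)
tampered = verify(final_id, album_info + b" (edited)")
```

A client that receives the file can run `verify` on its own, so it doesn’t have to trust the server it got the file from. Keeping only part of the digest shortens the id but makes collisions more likely, so the prefix length is a trade-off.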
Syncing between own devices
The user may choose which files should stay on a single device (client), which of them should be completely synchronized between the clients and which should also be saved on the server to allow easier and faster access for himself (and his contacts if he so wishes). This is done via tagging.
Not having every file synchronized between all clients doesn’t limit search capabilities (see below). However, synchronized files are always directly available whereas there’s only a preview for foreign files that aren’t stored on the current client. This lets the user keep all files on his personal computer, while on his cellphone, where bandwidth is limited, he only searches and previews, downloading single files in case he (desperately) needs them.
Tagging & Sharing
Tags in Cloudia differ from the usual tags you know from software like WordPress. They are not used for describing the content – it already contains all information, so why should the user bother to update the tag list whenever he makes changes to the content (read: Why should the user have to tag every song from the Red Hot Chili Peppers as “Red Hot Chili Peppers” although this information is actually already provided by the meta data)?
Also, in my experience, people either tend to overuse the tag feature or they don’t use it at all which is equally bad. In either case, the user isn’t provided with an intuitive way to organize files. It’s the machine that profits from this categorization (when performing a search et cetera). But actually, the machine should adapt to the user, not vice versa. The feature of tagging content is thus to be replaced by a (powerful) search engine.
Tags, however, are only used to declare with whom to share the files:
- Share files decorated with this tag with the users Anthony, Michael, John and Chad.
- Make files decorated with this tag public for everyone in the cloud (and not just the contacts).
- Exclude files decorated with this tag from the cloud. / Make them invisible for other devices and people. (Thus, keep them local to the current device.)
… and to define further settings:
- Synchronize files decorated with this tag between my home PC and my laptop (but not my cellphone).
Concerning the sharing aspect, we have several options:
- Is every contact considered a tag that triggers sharing for every file decorated with it?
- Do both, the contact and the file to be shared with him, need to be decorated with the same tag which automatically enables sharing?
- Does only the file need to be decorated with a tag and its settings include a “Share with: …” option (like above)?
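To get a feeling for the last of these options, here is a minimal sketch in which only the file is decorated with tags and each tag carries its own settings. The field names, and the rule that a “keep local” tag wins over every other tag, are assumptions I made for the sake of the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    name: str
    share_with: frozenset = frozenset()    # contacts who may see the file
    public: bool = False                   # visible to everyone in the cloud
    local_only: bool = False               # never leaves the current device
    sync_devices: frozenset = frozenset()  # own devices to synchronize with


def can_access(user, owner, tags):
    """Decide whether `user` may see a file decorated with `tags`."""
    if user == owner:
        return True
    if any(tag.local_only for tag in tags):
        return False  # assumed precedence: "local" beats sharing tags
    return any(tag.public or user in tag.share_with for tag in tags)


def devices_for(tags, origin_device):
    """Return the owner's devices that should hold a copy of the file."""
    devices = {origin_device}
    if not any(tag.local_only for tag in tags):
        for tag in tags:
            devices |= tag.sync_devices
    return devices


band = Tag("band", share_with=frozenset({"Anthony", "Michael", "John", "Chad"}))
work = Tag("work", sync_devices=frozenset({"home-pc", "laptop"}))
private = Tag("private", local_only=True)

chad_sees_it = can_access("Chad", "me", [band, work])
copies = devices_for([band, work], "home-pc")
```

In this variant contacts don’t need tags of their own, which keeps the contact list simple but means that changing who belongs to “band” is done in the tag’s settings.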
Searching
As we previously concluded, tags aren’t the best option when it comes to comfortably organizing files. Thus, a search engine must take their place. However, the fact that Cloudia is a distributed system makes it hard to think of a way to efficiently search for data in the entire cloud.
The first idea that crossed my mind is that every server / node would be responsible for providing other servers with search results from their data upon request. Yet there could be millions of nodes that the server a request originates from doesn’t know (because none of its users has contacts on those nodes / servers).
However, if every server were to create a search index of the user data it stores, search it upon request from another server and also forward the request to every other server it knows, a single search request from a single user would cause a whole wave of further search requests going through the cloud. Every search request of every user in the cloud would finally also reach the smallest server which is only known by a few others. And with millions of requests coming in every second, a DDoS for some of them would be more than likely.
Is there an elegant and tamper-proof way of limiting the depth that a search request reaches (or rather the distance a wave travels)? Such as only asking servers up to the 10th level to look for results and not to forward the request any further? This way, users would only have access to a certain part of the cloud, though. Not to mention users running their own server: they’d never appear in the search results as they are only known by a few people (and, thus, a few servers). Ultimately, this would result in more and more users registering at the big providers instead of in an increased diversity (which is desirable for the security and load balance reasons already mentioned).
So, does it come down to third-party search and crawling engines to which servers give access to their public search index? This would also allow cross-referencing the records to provide better search results. Yet, the price is high: The independence of the cloud.
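The effect of such a depth limit can be sketched as a breadth-first search over the servers that know each other. The function name `flood_search` and the toy network below are made up for illustration. Note that this only simulates well-behaved servers: a plain hop counter is not tamper-proof, because a server could simply ignore or reset it.

```python
from collections import deque


def flood_search(start, neighbours, max_hops):
    """Simulate a search wave that stops after `max_hops` forwards.

    neighbours maps each server to the servers it knows.
    Returns every server reached and the hop count at which it was reached.
    """
    reached = {start: 0}
    queue = deque([start])
    while queue:
        server = queue.popleft()
        hops = reached[server]
        if hops == max_hops:
            continue  # limit reached: answer the request, but don't forward it
        for peer in neighbours.get(server, []):
            if peer not in reached:
                reached[peer] = hops + 1
                queue.append(peer)
    return reached


# Toy network: a chain of providers with one tiny private server at the end.
network = {
    "big-provider": ["mid-provider"],
    "mid-provider": ["big-provider", "small-provider"],
    "small-provider": ["mid-provider", "private-server"],
    "private-server": ["small-provider"],
}
within_two = flood_search("big-provider", network, max_hops=2)
within_three = flood_search("big-provider", network, max_hops=3)
```

With a limit of two hops the private server is never asked, which is exactly the drawback described above: whoever sits at the edge of the network stays invisible.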
The information provided so far should be consistent. What follows beyond this page, however, may not.