#matrix , #ipfs , #scuttlebutt and now #mastodon – We're living in awesome
times! centralization < decentralization/federation < distribution!
#lovefortech
(me, April 10, 2017, on mastodon)
The idea
With the rise of protocols like the matrix protocol, activitypub and others,
decentralized social community platforms like matrix, mastodon and others gained
traction and became reality.
I consider these platforms, especially mastodon and matrix, to be great steps
into the future and am using both enthusiastically.
But can we do better? Can we do more distribution? I think so!
So far we have a twitter-like tumblelog platform (mastodon), a chat platform
(matrix) and facebook-like platforms (diaspora and friendica) which
are federated (some form of decentralization). I think we can make a
completely distributed social network platform reality today.
Let me reiterate: I think we can make a facebook/googleplus/etc clone
which works without a central component, today. And I would even go one step
further and state: All we need for this is IPFS (and
related technology like IPLD and IPNS)!
This platform would feature personal profiles, publishing
articles/posts/images/videos/voice messages/etc, instant messaging, following
others, and all the things one would want in such a platform.
How would it work?
What do we need for this? Well, as stated before: not much!
From what I can think of, we would need IPFS, some sort of public/private key
functionality (which IPFS already has), a nice frontend-framework and that's
basically it.
Let me tell you how I think such a platform would work.
The moment a user starts the application, the application would boot an IPFS
node.
The username and all other information about the profile are added to IPFS as
structured data.
If the profile changes because the user edits it, it is added to IPFS again,
using IPLD to link to its previous version.
If a user adds a post to her profile, that post is added to IPFS as well and
linked from the profile via IPLD.
All other nodes are informed about the new content via pubsub and are free to
pin the new content (the new profile version) or only cache it for a while (or
to not care at all).
The post itself could add a link to the IPNS hash of the profile under which the
post is published. This way, a link from the post to the current version of the
profile would always exist.
Because the profile always links to its previous version as well as to the
post content, that would imply that the node the user of the profile runs would
always keep all data the user adds to the network.
As the data is only kept by links, the user is free to drop published
content at any point in time.
This means that basically each operation would “generate” a new profile, which
is of course published as an IPNS name.
Following others would be a matter of subscribing to their “pub” channel (as
in “pubsub”) or their IPNS name.
Chat
A chat application using IPFS is already implemented with
orbit, so that's a matter of integrating
one application into another.
Peer-to-Peer (or rather Profile-to-Profile) messaging is therefore no problem.
All the data would be saved in a structured format, for example JSON (though
order of serialization is important, because of cryptographic hashes), BSON
or any other data serialization format that is widely adopted.
Sidenote: As long as it is made clear that any client must support all
formats, the format itself doesn't matter that much.
For simplicity of this article, I stick to JSON (and also because it is most
widely known).
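To illustrate why serialization order matters, here is a minimal Python sketch. It assumes a simple convention (sorted keys, fixed separators) and uses a plain SHA-256 digest as a stand-in for an IPFS hash; the function names are hypothetical, and IPLD has its own codecs for this.

```python
import hashlib
import json


def canonical_bytes(obj):
    """Serialize to JSON with sorted keys and fixed separators.

    This convention is an assumption made for this sketch only.
    """
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def content_hash(obj):
    """Hex SHA-256 digest of the canonical bytes (stand-in for an IPFS hash)."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


a = {"type": "article", "tags": ["kittens"]}
b = {"tags": ["kittens"], "type": "article"}  # same data, different key order

# Naive serialization keeps insertion order, so the two strings differ.
naive_equal = json.dumps(a) == json.dumps(b)

# Canonical serialization yields identical bytes, hence identical hashes.
canonical_equal = content_hash(a) == content_hash(b)
```

Without an agreed canonical form, two clients could produce different hashes for the same logical data.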
A profile (version) would look roughly like this (consider 'ipfs hash' to
mean “some kind of IPLD link” in this context):
{
  "previous": [ "<ipfs hash>" ],
  "post": {
    "type": "<post type>",
    "nodes": ["<ipfs hash>"],
    "metadata": {
      "date": "2017-12-12T12:00:00+0200",
      "tags": [],
      "category": "kittens",
      "custom": {}
    }
  }
}
Let me explain:
- The previous key would point to the previous profile version(s).
  It would only contain IPFS hashes (for why this is plural, see
  “Multi-Device Support” below).
- The post key would contain information about the post published with this
  profile version.
- The type of the post could be “article”, “image”, “video”... normal stuff.
  But also “biography” for the biography shown on the profile, or other things.
  Even “username” would be possible, for adding a user name to the profile.
- The nodes key would point to the IPFS hashes containing the actual payload:
  either the text of the article (only one hash then) or the IPFS hashes of
  the pictures, the video(s) or other binary content.
  Of course, posts could be formatted using Markdown, reStructuredText or
  whatever format one likes to use. It would be the client's job to render it
  properly.
- The metadata field would contain plain meta information, like
  published date, tags, category and also custom meta information as
  key-value pairs.
Maybe a version attribute for the protocol version could be added as well.
Of course, this should be considered an incomplete example, as I almost
certainly forgot things here.
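As a rough sketch of how such profile versions could be created and chained, consider the following Python snippet. The in-memory `store` dictionary and the `put`, `get` and `new_profile_version` helpers are hypothetical stand-ins for a real IPFS node, and SHA-256 hex digests take the place of IPFS hashes.

```python
import hashlib
import json

store = {}  # hypothetical stand-in for IPFS: hash -> serialized bytes


def put(obj):
    """Store an object under the hash of its canonical JSON; return the hash."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = hashlib.sha256(data).hexdigest()
    store[digest] = data
    return digest


def get(digest):
    """Load the object stored under the given hash."""
    return json.loads(store[digest])


def new_profile_version(previous, post_type, nodes, date, tags=(), category=None):
    """Create a profile version in the format shown above; return its hash."""
    return put({
        "previous": list(previous),
        "post": {
            "type": post_type,
            "nodes": list(nodes),
            "metadata": {
                "date": date,
                "tags": list(tags),
                "category": category,
                "custom": {},
            },
        },
    })


# Payloads are stored separately; a profile version only links to them.
username = put({"text": "amy"})
v1 = new_profile_version([], "username", [username], "2017-12-12T12:00:00+0200")

article = put({"text": "Kittens are great."})
v2 = new_profile_version([v1], "article", [article], "2017-12-12T13:00:00+0200",
                         tags=["cats"], category="kittens")
```

Here `v2` is the current profile: it links to the article payload and, via `previous`, to `v1`.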
The idea of linking the previous version of a profile from each new version of
the profile is very much blockchain-like, of course, with the difference that
nobody needs to fetch the whole chain, but only the latest version, to get a
profile.
The more content a viewer of the profile wants to see, the more she needs to
traverse the graph of profile versions (automatically caching the content
for others along the way).
This would automatically result in older content being “forgotten” slowly
(but the content would not be forgotten until the publisher herself and all
other “pinners” drop it).
Because the actual payload is not stored in the fetched data, the actual
amount of data which is required to simply view a profile is rather small.
A client could be configured to fetch all textual content of a profile, but not
more than 10 versions, or one screen page, or something like that. The
possibilities are endless here.
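A limited traversal like that could look roughly like the following sketch. It assumes the same hypothetical in-memory store as a stand-in for IPFS; `latest_versions` and its `limit` parameter are made-up names for illustration.

```python
import hashlib
import json
from collections import deque

store = {}  # hypothetical stand-in for IPFS: hash -> serialized bytes


def put(obj):
    """Store an object under the hash of its canonical JSON; return the hash."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = hashlib.sha256(data).hexdigest()
    store[digest] = data
    return digest


def latest_versions(head, limit=10):
    """Walk `previous` links breadth-first from `head`.

    Returns at most `limit` profile versions, newest first. Versions that
    are not available in the store are skipped.
    """
    seen, result = set(), []
    queue = deque([head])
    while queue and len(result) < limit:
        digest = queue.popleft()
        if digest in seen or digest not in store:
            continue
        seen.add(digest)
        version = json.loads(store[digest])
        result.append(version)
        queue.extend(version["previous"])
    return result


# Build a chain of 25 profile versions, each linking to the one before it.
head = None
for i in range(25):
    head = put({
        "previous": [head] if head else [],
        "post": {"type": "article", "nodes": [], "metadata": {"custom": {"n": i}}},
    })

recent = latest_versions(head, limit=10)
```

Only the ten newest versions are loaded; the 15 older ones are never touched.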
Federated component
One might think “If I go offline with my node, my posts are not accessible if
nobody else is online having them”. And that's true.
That's why I would introduce a federated component, which would run a
stripped-down version of the application.
As soon as another instance connects and a new post is announced via pubsub,
the instance automatically pins or caches it.
Of course, this would mean that all of these federated instances would pin all
content, which is surely not nice.
One (rather simple and maybe even stupid) option would be to roll a die and
make the chance that a post is pinned a 50-50 thing, or something like that.
Also, posts which are pinned for a certain amount of time are most likely
distributed well enough so the federated component nodes can drop them...
maybe after 90 days, maybe after 10... Details!
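The dice-rolling idea could be sketched like this. The `FederatedNode` class and its method names are hypothetical, and the 50 % chance and 90-day limit are just the example values from above, not tuned numbers.

```python
import random
from datetime import datetime, timedelta


class FederatedNode:
    """Hypothetical always-on node that pins announced posts at random."""

    def __init__(self, pin_probability=0.5, max_age=timedelta(days=90), rng=None):
        self.pin_probability = pin_probability
        self.max_age = max_age
        self.rng = rng or random.Random()
        self.pinned = {}  # content hash -> time it was pinned

    def on_announce(self, content_hash, now):
        """Handle one pubsub announcement; pin with the configured chance."""
        if self.rng.random() < self.pin_probability:
            self.pinned[content_hash] = now
            return True
        return False

    def drop_expired(self, now):
        """Unpin everything pinned for longer than `max_age`; return the hashes."""
        expired = [h for h, t in self.pinned.items() if now - t > self.max_age]
        for h in expired:
            del self.pinned[h]
        return expired


node = FederatedNode(rng=random.Random(42))
start = datetime(2017, 12, 12)
for i in range(1000):
    node.on_announce(f"hash-{i}", start)
```

With several such nodes running independently, the chance that nobody pins a given post shrinks quickly: with three nodes at 50 % each it is 0.5³ = 12.5 %.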
Blockchain-Approaches
The fundamental problem with Blockchains is that every peer in the network
hosts the complete content. Nobody benefits from that, especially if you think
of a social network which should also work on mobile devices.
With users loading up images, videos and other large blobs of data, a
blockchain is the wrong approach.
That's why I think a social network on Ethereum, Bitcoin or any other
crypto-currency/blockchain is not an option at all.
IPLD
IPLD can be used not only to link posts and profiles, but
also to link from content to content. Namely to link from one post to another,
from a post to an image, a video, a voice message,...
but also to link from one post to a git commit, an Ethereum transaction or
any other IPLD-supported data structure.
One nice detail is that one does not have to traverse these links.
If a user sees a post which links to other posts, for example, she does not
have to fetch these links to see the post itself, only if she wants to see the
linked content.
Caching nodes, on the other hand, can automatically traverse the whole graph
and fetch all the content into their cache.
That makes an IPLD-based linking approach really beneficial.
Scuttlebutt
Scuttlebutt is a first step in the right direction.
One can say what one wants about electron and the whole technology stack which
is used in Scuttlebutt (and like or dislike the whole Javascript world), but
so far Scuttlebutt seems like the first social network that is completely
distributed.
I thought about whether it would be a great idea to port Scuttlebutt to use
IPFS in the backend.
From what I know right now, it would be a nice way of bringing IPFS and IPLD
into the mix and therefore enhancing and extending the capabilities of
Scuttlebutt itself.
I have no final conclusion on that thought, though.
Problems
There are several problems one has to think about when designing such a
system.
Comments on Posts (and comments)
Suppose you want to comment on a post. Of course you create new content,
which links to the post you just commented on.
But the person who wrote the original post does not automatically link to your
comment, so she is not able to find the comment (which could be solved via
pubsub), nor are others able to find it.
The approach to this problem is simple: Notification about comments can be
done via pubsub.
And, if a user gets a notification about a new comment, she can approve it and
automatically publish a new version of her post, with some added meta information:
- A link to the comment
- A link to the “old version of the content in IPFS”
Now, if a client fetches all posts of a profile, it resolves each entry to
its newest version (basically the one version that no other version links to
as its older version) and only shows that latest version.
Comments on comments (and so on) would be possible with the exact same approach.
That would, of course, cause a whole tree of comments to be rebuilt every time
a new comment is added.
Maybe not the best idea in that regard.
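A minimal sketch of that approve-and-republish step follows. The object layout (`content`, `comments`, `old_version`, `in_reply_to`) and the helper names are assumptions for illustration only, and a dictionary again plays the role of IPFS.

```python
import hashlib
import json

store = {}  # hypothetical stand-in for IPFS: hash -> serialized bytes


def put(obj):
    """Store an object under the hash of its canonical JSON; return the hash."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = hashlib.sha256(data).hexdigest()
    store[digest] = data
    return digest


def get(digest):
    """Load the object stored under the given hash."""
    return json.loads(store[digest])


def approve_comment(post_hash, comment_hash):
    """Publish a new version of the post that links the comment and its old self."""
    post = get(post_hash)
    return put({
        "content": post["content"],
        "comments": post["comments"] + [comment_hash],
        "old_version": post_hash,
    })


def latest_only(post_hashes):
    """Keep only versions that no other given version names as its old version."""
    superseded = {get(h)["old_version"] for h in post_hashes}
    return [h for h in post_hashes if h not in superseded]


post = put({"content": "Hello world", "comments": [], "old_version": None})
comment = put({"content": "Nice post!", "in_reply_to": post})
post_v2 = approve_comment(post, comment)
```

A client that has fetched both `post` and `post_v2` would show only `post_v2`, which now carries the link to the comment.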
Multi-Device Support
There are several problems regarding multi-device support.
Publishing content
Publishing from multiple devices with the same profile is possible – one just
needs to import the private key for the signatures and the profile information
to the other device.
This needs some sort of merging mechanism, though, if two posts are published
from two (or more) devices at the same time, or without the other devices being
online to get notified of the new point of truth.
Creating two posts from two separate devices would create two new versions of
the profile (because of IPLD linking), which means two points of truth suddenly
exist. A merging mechanism must therefore be implemented to merge multiple
points of truth for the profile.
This could yield a rather large network of profile versions, but ultimately
a DAG (Directed Acyclic Graph).
           Profile Init
                 ^
                 |
              Post A
                 ^
                 |
              Post B <-----+
                 ^         |
                 |         |
    +-------> Post C    Post C'
    |            ^         ^
    |            |         |
  Post D      Post D'   Post D''
    ^            ^         ^
    |            |         |
    |            +---------+
    |                 |
    |              Post E
    |                 ^
    |                 |
    +-----------------+
             |
             |
          Post F
A scenario like the one above (each Post also represents a new version of
the profile) would be easy to create with three devices:
- One starts using the network on a notebook
- Post A published from the notebook
- Post B published from the notebook
- Profile added on the workstation
- Post C published from the notebook while offline
- Post C' published on the workstation
- Profile added to the mobile phone (from the notebook)
- Post D published from the mobile phone while offline
- Post D' published from the notebook while offline
- Post D'' published on the workstation
- Notebook comes back online, Post E published, merging the state from
  Post D'' from the workstation and Post D' from the notebook itself.
- Phone comes online, one of the devices is used to publish Post F, merging
  the state from Post D and Post E.
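The merge steps in this scenario can be sketched as follows. This is only an illustration under simplifying assumptions: a dictionary stands in for IPFS, each version carries a hypothetical `label` instead of a full post, and a device merges by linking every "head" it knows about (a version that no other known version lists in `previous`).

```python
import hashlib
import json

store = {}  # hypothetical stand-in for IPFS: hash -> serialized bytes


def put(obj):
    """Store an object under the hash of its canonical JSON; return the hash."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = hashlib.sha256(data).hexdigest()
    store[digest] = data
    return digest


def get(digest):
    """Load the object stored under the given hash."""
    return json.loads(store[digest])


def version(label, previous):
    """Create a simplified profile version that links to `previous`."""
    return put({"previous": sorted(previous), "label": label})


def find_heads(known):
    """Return the known versions that no other known version links to."""
    referenced = {p for h in known for p in get(h)["previous"]}
    return sorted(h for h in known if h not in referenced)


init = version("Profile Init", [])
a = version("A", [init])
b = version("B", [a])
c, c2 = version("C", [b]), version("C'", [b])
d, d1, d2 = version("D", [c]), version("D'", [c]), version("D''", [c2])

# The notebook is back online; it knows everything except the phone's Post D.
notebook_view = [init, a, b, c, c2, d1, d2]
e = version("E", find_heads(notebook_view))

# The phone comes online, so Post D becomes known as well.
full_view = notebook_view + [d, e]
f = version("F", find_heads(full_view))
```

After Post F there is a single head again, and every earlier post is reachable from it.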
In this scenario, there would still be one problem, though: If the profile is
published as an IPNS name, branching off of versions would be problematic.
If C is published while C' is published, both devices would publish their
version as an IPNS name.
Now, first come, first served applies. And of course that is problematic,
because every device would always see one of the posts, but no device could see
the other.
Only at E (in the above example), when the branches are merged, would both C
and C' be visible (though D wouldn't be visible as long as it isn't
merged into the chain).
But how does a device discover that there are two “current” versions which
have to be linked to the new post?
So, discoverability is an issue in this approach. Maybe someone can come up
with a clean and easy solution that would work for netsplit and all those
scenarios.
One idea would be that there is a profile-key which is used to publish profile
versions under an IPNS name as well as a device-key, which is used to
announce profile versions as a separate IPNS name.
That IPNS name could be added to the profile, so each other device can find it
and fetch “current” versions from each device.
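A toy model of this idea, with a plain dictionary as a hypothetical stand-in for IPNS, could look like the sketch below. Nothing here uses a real IPFS API; the names and the overwrite behaviour of the dictionary are assumptions of the sketch. It only shows that a single shared name can hold one version at a time, while per-device names keep both.

```python
names = {}  # hypothetical stand-in for IPNS: name -> hash it resolves to


def publish(name, version_hash):
    """Point a name at a profile version (a name holds exactly one value)."""
    names[name] = version_hash


def resolve(name):
    """Return the version a name points to, or None if it was never published."""
    return names.get(name)


def device_heads(device_names):
    """Resolve every device name and collect the distinct current versions."""
    return {resolve(n) for n in device_names} - {None}


# Notebook and workstation publish concurrently, each also under its own name.
publish("profile", "hash-of-C")
publish("device-notebook", "hash-of-C")
publish("profile", "hash-of-C-prime")
publish("device-workstation", "hash-of-C-prime")

# Device names as they would be listed in the profile itself.
devices = ["device-notebook", "device-workstation", "device-phone"]
heads = device_heads(devices)
```

The shared `profile` name resolves to only one of the two versions, but `heads` contains both, so a merging device knows what to link.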
Only the initial setup of a new device would need to be made carefully then.
Or, maybe, the whole approach is wrong and another approach would fit better
for this kind of problem. I don't know.
Subscribing
Another issue with multi-device support would be subscribing. For example, if
a user (let's call her Amy) subscribes to another user (let's call him Sheldon)
on her notebook, this information needs to be stored somehow.
And because Amy's machines do not necessarily sync with each other, her
mobile phone may never know that following Sheldon is a thing now!
This problem could be solved by storing the “follow” information in her public
profile. Some users might not like everyone to know whom they follow, though.
Cryptographic things could be considered to fix visibility.
But then, users may want to “categorize” their friends, store them in groups
or whatever. This information would be stored in the public profile as well,
which would create even more noise on the network.
Also, because cryptography is hard and information would be stored forever,
this might not be an option as some day, the crypto might be broken and reveal
all the things that were stored privately before.
Deleting profile versions
At some point, a user may want to remove a biography entry or a user name she
once published.
Because all the information is chained in a long chain of versions, one may
think that deleting a node is not possible.
But it is!
Consider the following (simple) graph of profile versions:
A<---B<---C<---D<---E
If the user now wants to delete node C in this graph, she simply drops it.
Now, with E being the latest point of truth, one may think that finding B and
A is not possible anymore. That's true. But why not work around this by
creating a new profile version and linking the previous versions:
A<---B     <---D<---E<---F
      \                 /
       -----------------
Of course, D would now point to a node which does not exist. But that is not
a problem. Indeed, it's a fundamental concept of the idea that content may be
unavailable.
F does not need to contain new content. It even should not, because otherwise
dropping that content later would mean dropping F, which becomes harder once F
bridges the gap. Also, creating new versions of the profile is simple and cheap.
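The deletion trick can be sketched as follows, again with a dictionary as a hypothetical stand-in for IPFS and simplified versions that carry only a made-up `label`. The traversal simply skips nodes that are no longer available.

```python
import hashlib
import json

store = {}  # hypothetical stand-in for IPFS: hash -> serialized bytes


def put(obj):
    """Store an object under the hash of its canonical JSON; return the hash."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = hashlib.sha256(data).hexdigest()
    store[digest] = data
    return digest


def reachable(head):
    """Labels of all versions reachable from `head`, skipping missing nodes."""
    seen, stack, labels = set(), [head], []
    while stack:
        digest = stack.pop()
        if digest in seen or digest not in store:
            continue  # dropped or unavailable content is simply skipped
        seen.add(digest)
        node = json.loads(store[digest])
        labels.append(node["label"])
        stack.extend(node["previous"])
    return sorted(labels)


a = put({"previous": [], "label": "A"})
b = put({"previous": [a], "label": "B"})
c = put({"previous": [b], "label": "C"})
d = put({"previous": [c], "label": "D"})
e = put({"previous": [d], "label": "E"})

del store[c]                       # the user drops version C
after_delete = reachable(e)        # A and B are cut off now

f = put({"previous": [e, b], "label": "F"})  # F bridges the gap
after_bridge = reachable(f)
```

After the drop only D and E are reachable from E; from F, everything except C is reachable again.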
Problems are hard in distributed environments
I do not claim to know the final solution to any of these problems. It's just
that I think about them and would love to get an open conversation started on
the whole subject of distributed social networks and problems that come with them.
tags: #distributed #network #open-source #social #software