Discindo
Share files of any size using reconstructable manifests
This project is no longer under active development.
It may behave in unpredictable ways, and no guarantees are made about its usage.
About
Discindo (originally Chopper) is a CLI application built to provide a simple way to exchange files of any size over the internet.
It relies on generic free storage providers, each of which offers only a little space: Discindo distributes the parts of a file across those storages and builds a manifest that keeps track of all the pieces.
The resulting manifest file can then be exchanged and used to let Discindo reconstruct the original file.
How to use
Usage is simple. To chop a file (upload it and generate an exchangeable manifest file):
discindo filename.mp4
If you want to increase chop redundancy:
discindo -r 3 filename.mp4
If you want to rebuild a file starting from its manifest:
discindo build filename.chop
Installation
Discindo is written in Python and has been designed to have no external dependencies, so you need nothing but its source code to run it.
Package manager
Installation from repositories is only available to Solus users who have enabled the Theca repository:
eopkg it -y discindo
Run latest source code
git clone https://github.com/streambinder/discindo.git
make -C discindo install || python3 discindo/setup.py install
discindo -h
Engineering
Discindo operates in two modes: chopping and glueing. Glueing is straightforward and most of the engineering lies in chopping, so only chopping is described here.
Knifing
The first part of the process consists of chopping the file into several chunks, whose size depends on which storage provider is going to be used.
Discindo uses the Knife component to request one chunk after another; each chunk is a byte sequence. For binary files, the chunk is translated to plain text using the base85 algorithm. The only drawback is a bigger overall size: for every 1024 bytes of binary data requested, Knife provides 1280 bytes of base85-encoded text.
As a consequence, for binary files, every X bytes of file data requested for a chunk produce an effective payload of (X / 4) * 5 bytes, which is 25% bigger.
That's because base85 emits 5 characters for every 4 bytes it encodes, that is, one additional byte per 4-byte group.
This means that chopping a 10MB binary file will upload at least 12.5MB of data.
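The size arithmetic above can be checked with a small sketch. It is not Discindo's actual Knife code: the `knife` and `raw_size_for` names are hypothetical, and using `base64.b85encode` from the standard library as the base85 variant is an assumption.

```python
import base64
import io
from typing import BinaryIO, Iterator


def raw_size_for(max_payload: int) -> int:
    """Largest raw byte count whose base85 text fits in max_payload bytes."""
    return max_payload // 5 * 4


def knife(stream: BinaryIO, max_payload: int) -> Iterator[bytes]:
    """Yield base85-encoded chunks, each at most max_payload bytes long."""
    raw_size = raw_size_for(max_payload)
    if raw_size == 0:
        raise ValueError("max_payload must be at least 5 bytes")
    while True:
        raw = stream.read(raw_size)
        if not raw:
            break
        # Every 4 raw bytes become 5 characters of text.
        yield base64.b85encode(raw)


data = bytes(range(256)) * 10  # 2560 bytes of sample binary data
chunks = list(knife(io.BytesIO(data), max_payload=1280))
print([len(chunk) for chunk in chunks])
```

With a 1280-byte payload limit, each full chunk carries 1024 raw bytes, matching the 1024 to 1280 ratio described above.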
Uploading and downloading chunks
The chunk upload phase depends on the provider the chunk is being pushed to.
Every storage provider extends a generic (and abstract) Provider class, which requires it to define several methods: the most important ones, upload() and download(), and others that let the provider be handled properly by the whole process, such as the following:
enabled(): used to indicate whether the provider is usable or not
nice_name(): used to represent the provider in a human-readable way
is_supporting(): used to ask a provider if it's actually able to handle a URI to download content (chunks) from it
max_chunk_size(): used to indicate the maximum byte sequence size allowed by the provider
throttle(): used to throttle requests to the provider when a ThrottlingException gets caught
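As a rough illustration of the interface listed above, here is a minimal sketch of such an abstract class built on Python's `abc` module. The method names and `ThrottlingException` come from the description; the argument lists, return types and the `upload_with_retry` helper are assumptions, not Discindo's actual code.

```python
from abc import ABC, abstractmethod


class ThrottlingException(Exception):
    """Raised when a provider asks the client to slow down."""


class Provider(ABC):
    """Contract every storage provider is expected to fulfil."""

    @abstractmethod
    def enabled(self) -> bool: ...

    @abstractmethod
    def nice_name(self) -> str: ...

    @abstractmethod
    def is_supporting(self, uri: str) -> bool: ...

    @abstractmethod
    def max_chunk_size(self) -> int: ...

    @abstractmethod
    def throttle(self) -> None: ...

    @abstractmethod
    def upload(self, payload: bytes) -> str: ...

    @abstractmethod
    def download(self, uri: str) -> bytes: ...


def upload_with_retry(provider: Provider, payload: bytes, attempts: int = 3) -> str:
    """Hypothetical helper: call throttle() and retry when throttled."""
    for _ in range(attempts):
        try:
            return provider.upload(payload)
        except ThrottlingException:
            provider.throttle()
    raise ThrottlingException(
        f"{provider.nice_name()} kept throttling after {attempts} attempts"
    )
```

Because every method is marked abstract, a subclass that forgets one of them cannot be instantiated, which is one way to enforce the contract.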
Upload
The upload() method takes a payload as its argument, which is assumed to be no larger than the maximum size allowed by that provider, and packs it into a request performed in whatever way the provider class implements.
Every upload() call must return a URI, unless an exception is thrown, that can be passed to the download() method to get the content back.
Download
The download method accepts a URI string as argument and must always return a byte sequence.
Supporting new providers
The whole design is meant to be as extensible as possible: any provider can use its very own logic inside every method it must implement and override.
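To make the contract concrete, here is a sketch of a hypothetical in-memory provider. The `MemoryProvider` class and its `mem://` URI scheme are invented for illustration; in Discindo it would extend the abstract Provider class, which is left out here so that the example stays self-contained.

```python
class MemoryProvider:
    """Toy provider that keeps chunks in a dictionary (illustration only)."""

    SCHEME = "mem://"  # hypothetical URI scheme

    def __init__(self, limit: int = 1024) -> None:
        self._limit = limit
        self._store = {}

    def enabled(self) -> bool:
        return True

    def nice_name(self) -> str:
        return "In-memory storage"

    def is_supporting(self, uri: str) -> bool:
        # Only URIs produced by this provider can be downloaded from it.
        return uri.startswith(self.SCHEME)

    def max_chunk_size(self) -> int:
        return self._limit

    def throttle(self) -> None:
        pass  # a dictionary never throttles, so there is nothing to wait for

    def upload(self, payload: bytes) -> str:
        if len(payload) > self.max_chunk_size():
            raise ValueError("payload exceeds the provider's maximum chunk size")
        uri = f"{self.SCHEME}chunk{len(self._store)}"
        self._store[uri] = payload
        return uri  # the URI is all download() needs to get the content back

    def download(self, uri: str) -> bytes:
        if not self.is_supporting(uri):
            raise ValueError(f"unsupported URI: {uri}")
        return self._store[uri]


provider = MemoryProvider(limit=8)
uri = provider.upload(b"12345678")
print(uri, provider.download(uri))
```

A real provider would replace the dictionary with requests to a storage service, while keeping the same two guarantees: upload() returns a URI and download() returns a byte sequence.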
Manifest
A manifest is a base64-encoded JSON document structured this way:
{
  "chunks": [
    {
      "md5": "3b33399a12f208075a2413114220f46c",
      "origins": [
        "https://provider1/chunk1",
        "https://provider3/chunk1"
      ]
    },
    {
      "md5": "3b33123a12f208075a2413114220f46c",
      "origins": [
        "https://provider2/chunk2",
        "https://provider1/chunk2"
      ]
    }
  ],
  "filename": "file.md",
  "binary": true
}
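A manifest with this shape can be built and read back using only the standard library. The sketch below is an illustration under assumptions: the `build_manifest` and `read_manifest` names are hypothetical, and computing each `md5` over the uploaded payload is a guess about what the digest covers.

```python
import base64
import hashlib
import json


def build_manifest(filename: str, binary: bool, chunks) -> bytes:
    """Build a manifest from (payload, origins) pairs, one pair per chunk."""
    document = {
        "chunks": [
            {
                # Assumption: the digest is taken over the uploaded payload.
                "md5": hashlib.md5(payload).hexdigest(),
                "origins": list(origins),
            }
            for payload, origins in chunks
        ],
        "filename": filename,
        "binary": binary,
    }
    return base64.b64encode(json.dumps(document).encode("utf-8"))


def read_manifest(manifest: bytes) -> dict:
    """Decode a base64 manifest back into its JSON structure."""
    return json.loads(base64.b64decode(manifest))


manifest = build_manifest(
    "file.md",
    True,
    [(b"first chunk", ["https://provider1/chunk1", "https://provider3/chunk1"])],
)
decoded = read_manifest(manifest)
print(decoded["filename"], len(decoded["chunks"]))
```

When glueing, the stored digest could be compared against the digest of each downloaded chunk, falling back to the next origin if they differ.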
Redundancy
To make the data more redundant, Discindo can upload every chunk to several storage providers. This increases the amount of data being pushed, but also the probability that the file will be kept safe.
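One possible way to spread a chunk over several providers is sketched below. The round-robin selection, the `pick_providers` and `upload_chunk` helpers and the `FakeProvider` stub are all assumptions made for illustration; how Discindo actually picks providers is not described here.

```python
class FakeProvider:
    """Stub provider that only records what it receives (illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.stored = []

    def upload(self, payload: bytes) -> str:
        self.stored.append(payload)
        return f"https://{self.name}/chunk{len(self.stored)}"


def pick_providers(providers, redundancy: int, chunk_index: int):
    """Pick `redundancy` distinct providers, rotating the start per chunk."""
    if not 1 <= redundancy <= len(providers):
        raise ValueError("redundancy must be between 1 and the provider count")
    start = chunk_index % len(providers)
    return [providers[(start + i) % len(providers)] for i in range(redundancy)]


def upload_chunk(payload: bytes, providers, redundancy: int, chunk_index: int = 0):
    """Upload one chunk to several providers and return its origins."""
    chosen = pick_providers(providers, redundancy, chunk_index)
    return [provider.upload(payload) for provider in chosen]


providers = [FakeProvider(f"provider{n}") for n in (1, 2, 3)]
origins = upload_chunk(b"chunk data", providers, redundancy=2, chunk_index=0)
print(origins)
```

The returned list corresponds to the `origins` array of a manifest chunk: each extra entry is one more full copy of the chunk that has to be uploaded.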