blob: 982e9a0a0109173aa12611d9c1ee572ae77a2b7e (
plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
|
# Replace IPFS
IPFS has some nice features exposing hashed files through HTTP. The overall system, however is overengineered and puts load on the server. Also it is hard to deal with private data.
Our requirements:
* Expose public files over the web
=> https://files.genenetwork.org/
* Serve them under a hash, so we can guarantee/encourage reproducibility
* Deduplication on the disk drive
* Make versioned hashes easy to find (IPFS does not expose these)
## Create a hash of the directory
```
find current/ -type f -exec md5sum {} \;|awk '{ print $1 }'|md5sum
```
Then move the files into a dir of that name and expose through, say nginx.
## Deduplication
This can be handled later. One option is to symlink identical files. Another is to use a deduplicating file system or even borg with fuse mounts. git-mount and gitfs may do the job too, though git is not particularly great at handling large files.
## Info
Public files are hosted on tux02
=> https://files.genenetwork.org/
I have added a git repo that is now 11Gb. Looks like it can perform alright, as long as we expose textual files. A cgit frontend can deal with versions.
=> https://git.zx2c4.com/cgit/
See also
=> https://eris.codeberg.page/spec/ Block level storage with URN
|