GitLab Backups for Bitcoin Core repository

(This is a Meta post but for Bitcoin Core, so I wasn’t sure where to put it. Feel free to move it, but I think it might also be a good idea to have a “Dev Tools” category or something similar.)

Last year, I set out to investigate using a self-hosted GitLab instance to create backups of the Bitcoin Core GitHub repo. The goal was a backup that not only persists the information but can also be browsed and used as a basis to continue development quickly, whether the project chooses to switch at some point in the future or is forced to do so.

While the setup of a GitLab server and the initial trial runs made it look like this project might be a matter of days, getting a full backup with all data to succeed took a lot of research, conversations with GitLab support, and much trial and error. Earlier this week was the first time a full backup run succeeded. The result can be seen here with the latest data being from 2024-02-26: bitcoin / bitcoin · GitLab

I have documented all the relevant findings in this gist: Self-hosting Bitcoin Core on GitLab · GitHub. It should have all the necessary information for others to set up their own GitLab backup server. If anything is unclear or doesn’t work, please send me your feedback.

I will leave the backup untouched at least until the end of next week so that people can look at the data and give feedback. I will then run a script that performs the backup regularly and persists it as well. The downside is that the data cannot be viewed most of the time: viewing is not possible while the import to GitLab is running, and the import process apparently takes up to 36 hours due to GitHub API restrictions.

Last but not least, of course, GitLab can also be used to back up other repositories that are not Bitcoin Core. But in that case, you probably don’t even need all of the special configurations because these are mostly needed due to the scale, contributor base and level of activity on Bitcoin Core.


Thanks for looking into this and sharing what you’ve found. Sad to hear that a continuous import between GitLab and GitHub isn’t possible.

Would it be possible to run two instances where one is importing while the other is displaying and then switch them over once the other is done importing and so on? Could also do this on a 48h timer (you mentioned the import takes 36h).


I’ve also set up some GitHub backup and mirroring of Bitcoin related repositories last year. I think putting some links here makes sense in case someone is looking for backups or a mirror. Sorry for hijacking your post.

I have the backups and a mirror on https://mirror.b10c.me. For example, https://mirror.b10c.me/bitcoin-bitcoin shows a GitHub-like read-only mirror of the bitcoin/bitcoin repository. I’m also rewriting issue/PR links so that they don’t link back to GitHub. There’s also a Tor hidden service for the mirror if anyone happens to need it: e3y5vky4v7snefqyhbn6kcmyl5fo4cnk3a2irh2ttvwua46ww5ubl6qd.onion.

I also push the backups to GitHub (duh) and GitLab:

Code for the backups can be found in 0xB10C/github-metadata-backup on GitHub (downloads issues and pull requests from the GitHub API and stores them as JSON files; supports incremental updates), and for the mirroring in 0xB10C/github-metadata-mirror on GitHub (a gohugo.io site that generates a GitHub metadata (issues/PRs) mirror from https://github.com/0xB10C/github-metadata-backup data). I’ve also written a few words on it here: GitHub Metadata Backup and Mirror
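
For readers wondering what "incremental updates" could mean in practice, here is a rough sketch in Python. It is not the code of github-metadata-backup; the function names `issues_url` and `merge_incremental` and the in-memory `store` dict are made up for illustration. It assumes the GitHub REST "list repository issues" endpoint, which also returns pull requests and accepts a `since` timestamp, and it assumes all `updated_at` values are UTC strings in the same ISO 8601 format so that they can be compared as plain strings.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.github.com"


def issues_url(owner, repo, since=None, page=1):
    """Build a list-issues URL; `since` restricts results to items
    updated at or after the given ISO 8601 timestamp."""
    params = {
        "state": "all",       # open and closed items
        "sort": "updated",
        "direction": "asc",
        "per_page": 100,
        "page": page,
    }
    if since is not None:
        params["since"] = since
    return f"{API_ROOT}/repos/{owner}/{repo}/issues?{urlencode(params)}"


def merge_incremental(store, fetched):
    """Merge freshly fetched items into `store` (hypothetical dict keyed by
    issue/PR number), keeping whichever copy was updated last.
    Returns the newest `updated_at` in the store, usable as the next `since`."""
    for item in fetched:
        old = store.get(item["number"])
        if old is None or item["updated_at"] >= old["updated_at"]:
            store[item["number"]] = item
    return max((i["updated_at"] for i in store.values()), default=None)
```

The idea is that each run only asks for items changed since the previous run, and the returned timestamp is saved as the starting point for the next one. Fetching, pagination, rate-limit handling and writing the JSON files to disk are left out on purpose.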

Thanks for the comments!

Yes, that would be possible. We wouldn’t even need to switch: one instance could constantly clone from GitHub, and every time it is done, the second one could get a GitLab-to-GitLab clone, which is a lot faster because it just dumps the databases and doesn’t use any APIs. This should be more comfortable for viewers because they would only have one go-to URL. Honestly, though, I am not sure it’s worth the effort. I don’t expect a lot of people to want to look at the data before we actually need it. Did you have specific use cases in mind for making the data viewable? I guess we should somehow be checking that the backup still works and that no garbage data is coming in, but I think that should also be doable via a script.
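
To make the "doable via a script" part a bit more concrete, a minimal sketch of such a check in Python might look like the following. Everything here is an assumption for illustration: the function name `check_backup`, the item layout (dicts with `number` and `updated_at`, as in GitHub's issue JSON), and the three-day staleness threshold. It only catches an empty export, duplicate numbers, and a backup that has stopped receiving new data.

```python
from datetime import datetime, timedelta, timezone


def check_backup(items, now, max_age=timedelta(days=3)):
    """Return a list of problems found in the exported items;
    an empty list means these basic checks passed."""
    if not items:
        return ["backup contains no items"]

    problems = []

    # Every issue/PR number should appear exactly once.
    numbers = [item["number"] for item in items]
    if len(numbers) != len(set(numbers)):
        problems.append("duplicate issue/PR numbers")

    # On an active repository the newest item should be recent.
    newest = max(
        datetime.fromisoformat(item["updated_at"].replace("Z", "+00:00"))
        for item in items
    )
    if now - newest > max_age:
        problems.append(f"newest item is older than {max_age}")

    return problems
```

It could be called with `now=datetime.now(timezone.utc)` after each backup run and raise an alert whenever the returned list is non-empty.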


Noting here as well that I talked about this at the Optech podcast and there were some good questions from the audience that are interesting considerations for next steps, so definitely worth a listen: Bitcoin Optech Newsletter #292 Recap Podcast | Bitcoin Optech

I don’t expect a lot of people to want to look at the data before we actually need it.

Running the mirror and actually using it is part of my process to make sure the backups work. My bitcoin-bitcoin mirror is my go-to place on mobile when I want to stay up to date with new PRs and issues. A script should also work, yes.


It might make sense to add a section on existing backups in GitHub alternatives for Bitcoin Core · bitcoin-core/bitcoin-devwiki Wiki · GitHub. What do you think?

Yeah, definitely. I had forgotten about that page honestly :smiley: Let me know if you want to give it a go first and ping me for review or if I should make a first draft.

I’ve added a section on backups and tooling to the wiki: GitHub alternatives for Bitcoin Core · bitcoin-core/bitcoin-devwiki Wiki · GitHub. Feel free to add yours too.

I still have my mirror/backup of some of the bitcoin repositories up as well: http://nxshomzlgqmwfwhcnyvbznyrybh3gotlfgis7wkv7iur2yj2rarlhiad.onion

It doesn’t provide a nice user interface like GitLab, though. I’m hesitant to run anything that requires dynamic server functionality.

I should probably add the new metadata mirroring, as I’m still using the ancient bitcoin-gh-meta script.
