(This is a Meta post but for Bitcoin Core, so I wasn’t sure where to put it. Feel free to move it, but I think it might also be a good idea to have a “Dev Tools” category or something similar.)
Last year, I set out to investigate whether a self-hosted GitLab instance could serve as a backup of the Bitcoin Core GitHub repo: one that not only persists the information but can also be browsed and used as a basis to continue development quickly, whether the project chooses to switch at some point in the future or is forced to do so.
While the setup of a GitLab server and the initial trial runs made it look like this project might be a matter of days, getting a full backup with all data to succeed took a lot of research, conversations with GitLab support, and much trial and error. Earlier this week, a full backup run succeeded for the first time. The result can be seen here, with the latest data being from 2024-02-26: bitcoin / bitcoin · GitLab
I have documented all the relevant findings in this gist: Self-hosting Bitcoin Core on GitLab · GitHub. It should have all the necessary information for others to set up their own GitLab backup server. If anything is unclear or doesn’t work, please send me your feedback.
I will leave the backup untouched at least until the end of next week so that people can look at the data and give feedback. I will then run a script that performs the backup regularly and persists it as well. The downside is that the data will not be viewable most of the time: viewing is not possible while the import to GitLab is running, and the import process apparently takes up to 36 hours due to GitHub API restrictions.
Last but not least, GitLab can of course also be used to back up repositories other than Bitcoin Core. In that case, you probably don’t even need all of the special configuration, because it is mostly needed due to the scale, contributor base, and level of activity of Bitcoin Core.
Thanks for looking into this and sharing what you’ve found. Sad to hear that a continuous import from GitHub to GitLab isn’t possible.
Would it be possible to run two instances where one is importing while the other is displaying and then switch them over once the other is done importing and so on? Could also do this on a 48h timer (you mentioned the import takes 36h).
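To make the alternating idea concrete, here is a rough sketch of the schedule only, not of any actual GitLab automation. The instance names `gitlab-a` and `gitlab-b` and the function `roles_at` are made up for illustration; the 36h and 48h figures are the ones mentioned in this thread.

```python
from dataclasses import dataclass

IMPORT_HOURS = 36  # upper bound for one GitHub import, as mentioned in the thread
CYCLE_HOURS = 48   # timer suggested above


@dataclass(frozen=True)
class Roles:
    serving: str          # instance people can browse right now
    importing: str        # instance assigned to import in this cycle
    import_running: bool  # True while the import is assumed to still be in progress


def roles_at(hours_since_start: float,
             cycle_hours: float = CYCLE_HOURS,
             import_hours: float = IMPORT_HOURS,
             instances: tuple = ("gitlab-a", "gitlab-b")) -> Roles:
    """Return which (hypothetical) instance serves and which imports at a given time."""
    if hours_since_start < 0:
        raise ValueError("hours_since_start must not be negative")
    if import_hours > cycle_hours:
        # Otherwise the import would not be finished when the roles swap.
        raise ValueError("import must fit inside one cycle")
    cycle = int(hours_since_start // cycle_hours)
    importing = instances[cycle % 2]      # roles alternate every cycle
    serving = instances[(cycle + 1) % 2]
    import_running = (hours_since_start % cycle_hours) < import_hours
    return Roles(serving, importing, import_running)


if __name__ == "__main__":
    for hour in (0, 40, 48, 100):
        print(hour, roles_at(hour))
```

Note that during the very first cycle the serving instance would have nothing to show unless it is seeded from an existing backup first.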
I’ve also set up some GitHub backup and mirroring of Bitcoin related repositories last year. I think putting some links here makes sense in case someone is looking for backups or a mirror. Sorry for hijacking your post.
Yes, that would be possible. We wouldn’t even need to switch: we could have one instance that constantly imports from GitHub, and every time it is done, the second one gets a GitLab-to-GitLab clone, which is a lot faster because it’s just dumping the databases and not using any APIs. This should be more comfortable for viewers because they would only have one go-to URL. Though honestly, I am not sure it’s worth the effort. I don’t expect a lot of people to want to look at the data before we actually need it. Did you have specific use cases in mind for making the data viewable? I guess we should somehow be checking that the backup still works and that no garbage data is coming in, but I think that should also be doable via a script.
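As a rough idea of what such a check could look like, here is a minimal sketch that compares item counts from the source with those in the backup. Everything in it is an assumption for illustration: the function name `find_problems`, the count categories, the `max_lag` threshold, and the example numbers are made up, and fetching the real counts from GitHub and GitLab is left out entirely.

```python
def find_problems(source: dict, backup: dict, max_lag: int = 500) -> list:
    """Compare per-category item counts and return human-readable warnings.

    `source` and `backup` map a category name (e.g. "issues") to a count.
    A backup is expected to lag a bit, so being behind by up to `max_lag`
    items is tolerated; having more items than the source is flagged.
    """
    problems = []
    for kind, expected in source.items():
        actual = backup.get(kind)
        if actual is None:
            problems.append(f"{kind}: missing from backup")
        elif actual > expected:
            problems.append(f"{kind}: backup has {actual - expected} more items than source")
        elif expected - actual > max_lag:
            problems.append(f"{kind}: backup is {expected - actual} items behind source")
    return problems


if __name__ == "__main__":
    # Made-up numbers, purely to show the output format.
    source_counts = {"issues": 8000, "pull_requests": 21000}
    backup_counts = {"issues": 7950, "pull_requests": 19000}
    for line in find_problems(source_counts, backup_counts):
        print("WARNING:", line)
```

A count comparison like this would only catch coarse failures; spot-checking the content of a few random issues or PRs would be a sensible addition.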
Noting here as well that I talked about this on the Optech podcast, and there were some good questions from the audience that are interesting considerations for next steps, so it is definitely worth a listen: Bitcoin Optech Newsletter #292 Recap Podcast | Bitcoin Optech
I don’t expect a lot of people to want to look at the data before we actually need it.
Mirroring, and actually using that mirror, is part of my process to make sure the backups work. My bitcoin-bitcoin mirror is my go-to place on mobile when I want to stay up to date with new PRs and issues. A script should also work, yes.
Yeah, definitely. I had forgotten about that page, honestly. Let me know if you want to give it a go first and ping me for review, or if I should make a first draft.