Solved Git deep clone vs. shallow clone

a6h · Apr 22, 2021

Git deep clone vs. shallow clone: what's the difference between the two, and what are the limitations of choosing a shallow clone over a deep clone?
I've read https://docs.freebsd.org/en/articles/committers-guide/#git-primer and a few other FreeBSD-related Git blogs, but I'm still confused.

mickey · Apr 22, 2021

I was struggling with that one too. Basically it's about using the --depth <n> option to limit the history to the last n commits, so with --depth 1 you only get the most recent commit and no history. Using --depth also implies --single-branch (unless explicitly told otherwise), so git will in addition fetch only the specified branch and no others. The advantage is a saving in disk space and probably bandwidth too. Just compare a deep clone with a shallow clone using du(1).

If you are not actively working on the sources and just want to keep your /usr/src or /usr/ports up to date to build your system from them, a shallow clone will probably work just fine. I was however having a hard time switching my shallow clone of /usr/src from releng/12.2 to releng/13.0 and ended up just doing a fresh clone at some point, because looking for a solution had already taken more time than doing a fresh clone.
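
The effect of --depth can be tried out without downloading src. The following is a minimal sketch for a scratch directory, assuming only that git is installed; the repository names (upstream, deep, shallow) are made up for the demo and are not FreeBSD paths.

```shell
#!/bin/sh
# Scratch demo: compare a full ("deep") clone with a --depth 1 clone.
# All names below are hypothetical; nothing here touches /usr/src.
set -eu
work=$(mktemp -d)
cd "$work"

# Build a tiny upstream repository with three commits on main.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/main
for i in 1 2 3; do
    echo "change $i" > upstream/file.txt
    git -C upstream add file.txt
    git -C upstream -c user.name=demo -c user.email=demo@example.invalid \
        commit -q -m "commit $i"
done

# --depth is ignored for plain local paths, so clone through a file:// URL.
git clone -q "file://$work/upstream" deep
git clone -q --depth 1 "file://$work/upstream" shallow

deep_count=$(git -C deep rev-list --count HEAD)
shallow_count=$(git -C shallow rev-list --count HEAD)
echo "deep clone:    $deep_count commits reachable"
echo "shallow clone: $shallow_count commit(s) reachable"

# On a repository this small the sizes barely differ; the comparison only
# becomes interesting on a repository with a long history.
du -sh deep/.git shallow/.git
```

The shallow clone also gets a .git/shallow file that records where its history was cut off.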

zirias@ · Apr 22, 2021

Shallow clones aren't recommended for two reasons:

Accessing a different branch will be cumbersome (in a nutshell, you have to update your refs to include the branch you want and issue a fetch)
The commit count that's computed for inclusion in uname will not work correctly.

That said, you can use a shallow clone of course to save even more disk space. But a full clone with all branches will not consume more space than a svn working copy of a single branch did.

a6h · Apr 22, 2021

mickey said:
I was however having a hard time switching my shallow clone of releng/12.2 /usr/src to releng/13.0 and ended up just doing a fresh clone at some point,

Could you please post the "du" of your complete/final deep clone?

a6h · Apr 22, 2021

Zirias said:
The commit count that's computed for inclusion in uname will not work correctly

That was the part which was disturbing. I think I have to go with Deep Clone.

Zirias said:
But a full clone with all branches will not consume more space than a svn working copy of a single branch did.

In that case, I think the problem is solved. Thanks.

Jose · Apr 22, 2021

mickey said:
...I was however having a hard time switching my shallow clone of releng/12.2 /usr/src to releng/13.0 and ended up just doing a fresh clone at some point...

This makes sense if you think about it. The commits for releng/13.0 were not fetched by your initial shallow clone because they were not referenced in the branch you shallow-cloned.

mickey · Apr 23, 2021

Zirias said:
The commit count that's computed for inclusion in uname will not work correctly.

Where is this commit count in the uname output? Looking at my uname -a, I don't see any. There is just a hash and the usual kernel build number that increases when you rebuild a kernel. Also, reproducible builds seem to be off by default now on 13.0, i.e. I get the build host, time and other information.

Jose said:
This makes sense if you think about it. The commits for releng/13.0 were not fetched by your initial shallow clone because they were not referenced in the branch you shallow-cloned.

Yes in a way it makes sense, but there still has to be a git way of switching a shallow clone to another branch? I have tried git checkout as well as git switch and some git fetch command I found mentioned somewhere, but ultimately none of it worked. So I just did a fresh clone, which was done in a matter of seconds. git is fast, I'll give you that.

As for the disk space requirements... interestingly, a bare mirror of the FreeBSD src repository only takes up about 1.42G of disk space but doesn't really compress well:

Code:

% du -hs /home/git/repos/freebsd/src/
1,4G    /home/git/repos/freebsd/src/

% zfs get used,logicalused,compression,compressratio sys/home/git
NAME          PROPERTY       VALUE           SOURCE
sys/home/git  used           1.40G           -
sys/home/git  logicalused    1.42G           -
sys/home/git  compression    lz4             local
sys/home/git  compressratio  1.02x           -

I believe my previous SVN mirror, updated through svnsync, was well over 6G in size when I switched to git.

zirias@ · Apr 23, 2021

mickey said:
Where is this commit count in the uname output?

Here:
FreeBSD [...] 13.0-RELEASE #15 config-n244734-3873806c629: Fri Apr 9 [...]
config: branch the build was made from
n244734: commit count
3873806c629: commit hash
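
To illustrate why the count goes wrong in a shallow clone, here is a small sketch that builds an n<count>-<hash> label in a full repository and in a --depth 1 clone of it. It assumes the count comes from git rev-list --count; the --first-parent flag and the label() helper are assumptions made for this demo, not a copy of the FreeBSD build scripts.

```shell
#!/bin/sh
# Scratch demo: the same commit yields different "n-numbers" in a full
# repository and in a shallow clone. All names are hypothetical.
set -eu
work=$(mktemp -d)
cd "$work"

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/main
for i in 1 2 3; do
    echo "change $i" > upstream/file.txt
    git -C upstream add file.txt
    git -C upstream -c user.name=demo -c user.email=demo@example.invalid \
        commit -q -m "commit $i"
done

git clone -q --depth 1 "file://$work/upstream" shallow

# Hypothetical helper: print n<commit count>-<abbreviated hash> for repo $1.
label() {
    count=$(git -C "$1" rev-list --first-parent --count HEAD)
    hash=$(git -C "$1" rev-parse --short HEAD)
    echo "n${count}-${hash}"
}

full_label=$(label upstream)
shallow_label=$(label shallow)
echo "full:    $full_label"
echo "shallow: $shallow_label"
```

Both labels end in the same hash, but the shallow clone can only count the commits it actually has.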

mickey said:
Yes in a way it makes sense, but there still has to be a git way of switching a shallow clone to another branch?

It involves adding the missing refs, so you can fetch the other branch you want. See for example this answer on SO: https://stackoverflow.com/a/17937889
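
As a rough sketch of what "adding the missing refs" can look like, the following scratch demo widens a shallow single-branch clone so that a second branch can be fetched and checked out. It is not taken from the linked answer; the throwaway upstream repository and the commit() helper are made up for the demo, and only the branch names mirror the ones discussed here.

```shell
#!/bin/sh
# Scratch demo: add a second branch to a shallow, single-branch clone.
# "upstream" and "src" are hypothetical directories under a temp dir.
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical helper: record message $2 as a new commit in repo $1.
commit() {
    echo "$2" > "$1/file.txt"
    git -C "$1" add file.txt
    git -C "$1" -c user.name=demo -c user.email=demo@example.invalid \
        commit -q -m "$2"
}

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/releng/12.2
commit upstream "12.2 commit 1"
commit upstream "12.2 commit 2"
git -C upstream checkout -q -b releng/13.0
commit upstream "13.0 commit 1"
git -C upstream checkout -q releng/12.2

# Shallow clone of one branch; releng/13.0 is unknown to it at this point.
git clone -q --depth 1 -b releng/12.2 "file://$work/upstream" src

# Teach the remote about the extra branch, then fetch only its tip.
git -C src remote set-branches --add origin releng/13.0
git -C src fetch -q --depth 1 origin
git -C src checkout -q -b releng/13.0 origin/releng/13.0

current=$(git -C src symbolic-ref --short HEAD)
echo "now on: $current"
```

The fetch without a branch argument uses the configured branch list, which after set-branches --add contains exactly the two branches.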

mickey said:
As for the disk space requirements... interestingly, a bare mirror of the FreeBSD src repository only takes up about 1.42G of disk space

Solved - How much HDD space do I need to Git clone main (deep vs. shallow) (forums.FreeBSD.org)

Resorting to only cloning a single branch, or even doing a "shallow" clone, only makes sense if you really can't afford the disk space. A full git clone (including a working copy) is still smaller than a svn working copy of a single branch.

mickey · Apr 23, 2021

Zirias said:
Here:
FreeBSD [...] 13.0-RELEASE #15 config-n244734-3873806c629: Fri Apr 9 [...]
config: branch the build was made from
n244734: commit count
3873806c629: commit hash

Then I don't have that commit count. My output looks like:

Code:

FreeBSD [...]13.0-RELEASE FreeBSD 13.0-RELEASE #0 releng/13.0-ea31abc26 [...build info...]

Zirias said:
It involves adding the missing refs, so you can fetch the other branch you want. See for example this answer on SO: https://stackoverflow.com/a/17937889

See also my answer here:

Solved - How much HDD space do I need to Git clone main (deep vs. shallow) (forums.FreeBSD.org)

That looks cumbersome, to put it politely.

Zirias said:
Resorting to only cloning a single branch, or even doing a "shallow" clone, only makes sense if you really can't afford the disk space. A full git clone (including a working copy) is still smaller than a svn working copy of a single branch.

The required disk space is not exactly what I am concerned about. Required bandwidth on the other hand is; that's why I am keeping an on-premises mirror of the FreeBSD src git repo that other machines update their /usr/src from. I am not working on those sources; they are merely there to pull updates and then rebuild the system (kernel/world) from. There is hardly any point in ever going back; the only direction is forward (for example when 13.0-p1 becomes available). So I just don't see the need to have that kind of history in something that is merely supposed to be a working copy.

Newer versions of git seem to imply --single-branch when using --depth <n> unless you explicitly use --no-single-branch. But would it make any difference with regard to the ability to switch to another branch when using git clone --no-single-branch --depth 1 ...?
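
One way to explore that question is a scratch experiment. The sketch below clones a throwaway two-branch repository with --no-single-branch --depth 1 and then switches branches; the upstream repository and the commit() helper are made up for the demo, and on a repository with many branches this variant would fetch the tip of every one of them.

```shell
#!/bin/sh
# Scratch demo: --depth 1 combined with --no-single-branch.
# "upstream" and "src" are hypothetical directories under a temp dir.
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical helper: record message $2 as a new commit in repo $1.
commit() {
    echo "$2" > "$1/file.txt"
    git -C "$1" add file.txt
    git -C "$1" -c user.name=demo -c user.email=demo@example.invalid \
        commit -q -m "$2"
}

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/releng/12.2
commit upstream "12.2 commit 1"
git -C upstream checkout -q -b releng/13.0
commit upstream "13.0 commit 1"
git -C upstream checkout -q releng/12.2

# Depth 1, but with the tips of all branches instead of just one.
git clone -q -o freebsd --no-single-branch --depth 1 -b releng/12.2 \
    "file://$work/upstream" src

# Both remote-tracking branches exist, each one commit deep.
git -C src branch -r

# Switching now needs no extra fetch.
git -C src checkout -q -b releng/13.0 freebsd/releng/13.0
current=$(git -C src symbolic-ref --short HEAD)
echo "now on: $current"
```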

Jose · Apr 23, 2021

mickey said:
The required disk space is not exactly what I am concerned about. Required bandwidth on the other hand is...

Full clone makes sense for you then. Git is very efficient at sending you only new changes when you do a fetch.

mickey · Apr 23, 2021

Jose said:
Full clone makes sense for you then. Git is very efficient at sending you only new changes when you do a fetch.

Basically what I have is a full clone minus the checkout of any branches, a bare mirror of the FreeBSD git repo which I update through git remote update --prune and then make available to my local network via git-daemon(1). All my other machines clone/update their /usr/src from this mirror, so it doesn't consume any external bandwidth at all. I just don't think each /usr/src on every machine really needs to be a full clone when all I do is build from it.
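
The mirror-plus-leaf arrangement described here can be sketched on one machine. This is only an approximation under stated assumptions: file:// URLs stand in for the network, git-daemon is not started, and upstream, mirror.git, src and the commit() helper are names invented for the demo.

```shell
#!/bin/sh
# Scratch demo: bare mirror in the middle, shallow working clone at the leaf.
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical helper: record message $2 as a new commit in repo $1.
commit() {
    echo "$2" > "$1/file.txt"
    git -C "$1" add file.txt
    git -C "$1" -c user.name=demo -c user.email=demo@example.invalid \
        commit -q -m "$2"
}

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/releng/13.0
commit upstream "commit 1"
commit upstream "commit 2"

# The on-premises mirror: bare, carries every ref and the full history.
git clone -q --mirror "file://$work/upstream" mirror.git

# A leaf machine: shallow clone taken from the mirror, not from upstream.
git clone -q -o freebsd --depth 1 -b releng/13.0 \
    "file://$work/mirror.git" src

# Upstream moves on; the mirror is refreshed, then the leaf pulls from it.
commit upstream "commit 3"
git -C mirror.git remote update --prune
git -C src pull -q --ff-only

# On a real mirror host, "git daemon --base-path=<dir> --export-all" is one
# way to serve the repositories under <dir>; it is not run in this demo.
leaf_head=$(git -C src rev-parse HEAD)
upstream_head=$(git -C upstream rev-parse HEAD)
echo "leaf:     $leaf_head"
echo "upstream: $upstream_head"
```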

Jose · Apr 23, 2021

mickey said:
Basically what I have is a full clone minus the checkout of any branches, a bare mirror of the FreeBSD git repo which I update through git remote update --prune...

It's a clone done with the --mirror option?

mickey said:
...I just don't think each /usr/src on every machine really needs to be a full clone, when all I do is build from it.

So do a shallow clone on the leaf nodes.

mickey · Apr 23, 2021

Jose said:
It's a clone done with the --mirror option?

Yes. When the switch to git came, I wanted to have an on-premises mirror just like I had with SVN (updated through svnsync) and, before that, with CVS (using cvsup).

Jose said:
So do a shallow clone on the leaf nodes.

Can you elaborate on what exactly those leaf nodes are? So far what I've used to clone /usr/src on my machines was git clone -o freebsd --depth 1 -b releng/13.0 <url> <dir>, just that for the <url> part I don't use git.freebsd.org but my internal mirror instead.

Jose · Apr 23, 2021

mickey said:
Can you elaborate on what exactly those leaf nodes are? So far what I've used to clone /usr/src on my machines was git clone -o freebsd --depth 1 -b releng/13.0 <url> <dir>, just that for the <url> part I don't use git.freebsd.org but my internal mirror instead.

You're doing a shallow clone on your leaf nodes. What I'm proposing is to do a fresh shallow clone when you want to switch branches.

Code:

rm -Rf <dir>
git clone -o freebsd --depth 1 -b releng/13.1 <url> <dir>

If I understand your topology correctly, you have one machine on which you care about bandwidth, but not disk space. You have a full clone --mirror there. You have many machines on which you care about disk space, but not bandwidth. Do a fresh shallow clone on those when you need to switch branches.

mickey · Apr 23, 2021

Jose said:
You're doing a shallow clone on your leaf nodes. What I'm proposing is to do a fresh shallow clone when you want to switch branches.

Code:

rm -Rf <dir>
git clone -o freebsd --depth 1 -b releng/13.1 <url> <dir>

That's what I ended up doing when switching from releng/12.2 to releng/13.0. Seems this is the quick and painless way of doing it then.

Jose said:
If I understand your topology correctly, you have one machine on which you care about bandwidth, but not disk space. You have a full clone --mirror there. You have many machines on which you care about disk space, but not bandwidth. Do a fresh shallow clone on those when you need to switch branches.

As this is my home network I would not exactly call it many machines, but in principle yes. My internet connection with ~7Mbps downstream is rather slow by modern standards, and I'd hate to see bandwidth wasted, so it seems logical to do it that way.

grahamperrin · Jan 9, 2022

Security

I should not recommend shallow clones.

Git, shallow clone hashes, commit counts and system/security updates (2021-03-02)

advice from Kevin Oberman
tl;dr "… shallow clone hash will not include the commit count which will be used in future security updates …"

More recently, from the FreeBSD Handbook (24.4.3. The N-number), with added emphasis:

Usually this number is not all that important. However, when bug fixes are committed, this number makes it easy to quickly determine whether the fix is present in the currently running system. Developers will often refer to the hash of the commit (or provide a URL which has that hash), but not the n-number since the hash is the easily visible identifier for a change while the n-number is not. Security advisories and errata notices will also note an n-number, which can be directly compared against your system. When you need to use shallow Git clones, you cannot compare n-numbers reliably as the git rev-list command counts all the revisions in the repository which a shallow clone omits.
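
For anyone unsure whether an existing checkout is shallow, the sketch below shows one way to check and to convert it into a full clone afterwards. It runs against a throwaway repository; upstream and src are made-up names, and git rev-parse --is-shallow-repository needs a reasonably recent git.

```shell
#!/bin/sh
# Scratch demo: detect a shallow clone and deepen it to full history.
set -eu
work=$(mktemp -d)
cd "$work"

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/main
for i in 1 2 3; do
    echo "change $i" > upstream/file.txt
    git -C upstream add file.txt
    git -C upstream -c user.name=demo -c user.email=demo@example.invalid \
        commit -q -m "commit $i"
done

git clone -q --depth 1 "file://$work/upstream" src

before=$(git -C src rev-parse --is-shallow-repository)
n_before=$(git -C src rev-list --count HEAD)
echo "shallow: $before, commits counted: $n_before"

# Fetch the history that the shallow clone left out.
git -C src fetch -q --unshallow

after=$(git -C src rev-parse --is-shallow-repository)
n_after=$(git -C src rev-list --count HEAD)
echo "shallow: $after, commits counted: $n_after"
```

After the --unshallow fetch the commit count matches the upstream repository again, which is what a comparison of n-numbers relies on.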

Thanks to Warner Losh for his June 2021 update to the Handbook.