Other git: correct way to merge/pull commits/branches into master retaining history?

sko · Sep 4, 2024

I need a bit of guidance regarding the correct way to merge a bunch of branches/patches in a git repository and create a new release/tag in the master branch.
My usual usage of git is *very* rudimentary and I've never dealt with merging branches (or never cared how the log looks afterwards), so I'm a bit lost here.

I have forked a git repository of a tool for which I've been writing patches and additions for quite some time. I never touched the "master" branch but always created a distinct branch for each new patch or feature. From those I created pull requests to the original project (which has been abandoned for quite some time now) and diffs for patch files that I use for my own poudriere builds via a 'local patches' hook.
The 'master' branch is still at the same state from which all those branches were created.

I'm at a point where all those lingering branches and patch files are becoming tedious to work with, so I want to create a new 'release' (or a tag? what's the actual difference or better option?), for which I have to incorporate all those branches (mostly holding 1-2 commits) into the master branch.
The master branch should continue to reflect all commits with their original author, date and description, so it shows the full history and not just one huge commit of 'merged branch XY'. Given that the 'git log' of the master branch already looks like that (all commits with their original author, date, description), I suppose this is the 'standard way' of doing this - but I couldn't find out how.
As a bonus, I have at least one branch/commit from another remote repository (which also was a PR to the original project) that I want to include, again preserving the original commit with the original author, date and description.

When searching for this topic, one gets half a dozen different opinions with a dozen different 'type-those-commands' walk-throughs (without much explanation) and even more how-tos that heavily vary in complexity and often seem to address a very special edge-case.
I already tried various options to 'git merge', but I always only get a single merge-commit that doesn't reflect the actual commits.
Do I have to create 'local pull requests' within my repository and pull them into the master (or for now, my 'newversion') branch? Cherry-pick every single commit? Or can one cherry-pick branches to pull in all commits from that branch?

zirias@ · Sep 4, 2024

Git doesn't enforce a specific workflow, you're pretty free to do anything you want. From what you write, you probably want a "rebase and fast-forward merge" workflow (I personally think that's the sanest option anyways). Say you have some master branch you didn't touch so far, and your own branches feature_a and feature_b, then you'd do the following:

Bash:

git checkout feature_a
git fetch origin master:master # to make sure to have the latest master
git rebase master
git checkout master
git merge feature_a # when fully rebased, will just add the commits to master
git branch -d feature_a # fully merged branch can be deleted, it's redundant now
git checkout feature_b
git rebase master
git checkout master
git merge feature_b
git branch -d feature_b
# [...]

Regarding your question about releases: no such thing exists in git itself. A "release" is a feature of e.g. GitHub; it's based on a tag and lets you add extra info and distfiles. So, first you'd create a tag. I'd recommend a gpg-signed one, e.g. git tag -s v1.0.
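
To make the tag side of this concrete, here is a minimal sketch in a throwaway repository. It assumes git is installed; the identity, the tag name v1.0 and the tag message are placeholders. It uses an unsigned annotated tag (-a) instead of the signed one (-s) recommended above, only so that it runs without a GPG key.

```shell
set -e
# Throwaway repository; name and e-mail are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.org"
git config commit.gpgsign false
git config tag.gpgsign false

git commit -q --allow-empty -m "State to be released"

# Annotated tag: a small object of its own that names exactly one commit.
git tag -a v1.0 -m "Version 1.0"

tag_type=$(git cat-file -t v1.0)              # object type of the tag itself
tag_target=$(git rev-parse 'v1.0^{commit}')   # commit the tag points to
head_commit=$(git rev-parse HEAD)
echo "v1.0 is a '$tag_type' object pointing at $tag_target"

# Publishing it would be: git push origin v1.0
```

A hosting service can then attach its "release" to the pushed tag; git itself only knows about the tag.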

sko · Sep 4, 2024

zirias@ said:
Git doesn't enforce a specific workflow, you're pretty free to do anything you want.

That seems to be the 'problem': everyone does it wildly differently, and possibly not in a sane way.

Thanks for your explanation. I think I now understand where my error in thought was... I *occasionally* managed to get a merge with the exact outcome I wanted and without git asking for a commit message for the merge.
It seems not all of my branches (maybe due to re-cloning the repo from remote?) are still considered to be descendants of master any more:

Code:

% git checkout fix_rollback
Switched to branch 'fix_rollback'
Your branch is up to date with 'origin/fix_rollback'.

For other branches where a 'clean' merge is possible I get the expected:

Code:

% git checkout allow_mlock
Switched to branch 'allow_mlock'
Your branch is ahead of 'origin/develop' by 1 commit.

So I suspect I have to git rebase master every branch before merging it to get a 'fast-forward' merge? At least this seems to work and give me the clean git log I want.

The only case where this doesn't work is the remote branch where I want to pull one commit from. This branch was created from an earlier point of the master branch, and if I checkout that branch I get a warning about 'detached HEAD' state. So rebasing to 'master' doesn't work and creating a new branch will result in a new commit by me instead of the original author.

EDIT: Seems like cherry-picking is the way to go in that case. This retains the original commit with author/date/description and doesn't trigger a new commit.
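
A minimal sketch of that cherry-pick, assuming git is installed. The local branch "foreign" stands in for the branch fetched from the other remote, and all names, e-mail addresses and dates are made up. Strictly speaking the picked commit is a new commit with a new hash and me as committer, but author, author date and message are carried over.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.org"
git config commit.gpgsign false

git commit -q --allow-empty -m "base"

# Stand-in for the foreign fork: one commit by somebody else,
# branched off an older state of master.
git checkout -q -b foreign
echo fix > fix.txt
git add fix.txt
GIT_AUTHOR_NAME="Other Author" GIT_AUTHOR_EMAIL="other@example.org" \
GIT_AUTHOR_DATE="2020-01-02 03:04:05 +0000" \
  git commit -q -m "Fix from a foreign fork"
foreign_commit=$(git rev-parse foreign)
orig_date=$(git log -1 --format=%ad foreign)

# master has moved on in the meantime.
git checkout -q master
echo more > more.txt
git add more.txt
git commit -q -m "master moved on"

# Re-apply just that one commit on top of master.
git cherry-pick "$foreign_commit" >/dev/null

picked_hash=$(git rev-parse HEAD)
picked_author=$(git log -1 --format=%an)
picked_date=$(git log -1 --format=%ad)
picked_committer=$(git log -1 --format=%cn)
echo "author: $picked_author, committer: $picked_committer"
```

With a real second remote, the commit would first have to be fetched (git remote add, git fetch) so that its hash is known locally.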

zirias@ · Sep 4, 2024

Wait, no, there are a few wrong assumptions. I'll start with a very rudimentary "how does git work" because I think that helps: It's not your classic source code management and/or revision control system. It's instead a very elaborate patch management system.

Every commit in git is a patch (and identified by a hash). The patch is applied on top of the "parent commit" (another patch of course). Except for the special "merge commits", every commit has exactly one parent (or none when it starts an "orphaned" branch, which the first branch in a repo always is).

A branch in git isn't really different from a tag. It's a name given to one specific commit, like a pointer. When you add another commit "to a branch", it means to add another patch on top of whatever the branch name currently points to (so this becomes the parent of the new commit), and then move the branch name to this new commit. If another branch happened to point to exactly the same commit before, it is of course not moved.

Note that this design makes it impossible in general to ever know "on which branch" a specific commit was introduced. All you have is the "parent commit" relationship and the names (branches and tags) pointing to specific commits.
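
The "names are just pointers" idea can be checked in a throwaway repository. This is only a sketch, assuming git is installed; the branch name topic and the tag name marker are made up.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.org"
git config commit.gpgsign false

git commit -q --allow-empty -m "first"
first=$(git rev-parse master)

# Two more names for the very same commit: a branch and a lightweight tag.
git branch topic
git tag marker

# Committing "to master" creates a commit whose parent is the old tip
# and moves only the name 'master' to it.
git commit -q --allow-empty -m "second"

master_now=$(git rev-parse master)
topic_now=$(git rev-parse topic)
marker_now=$(git rev-parse marker)
parent_of_master=$(git rev-parse 'master^')

echo "master: $master_now"
echo "topic:  $topic_now (unchanged)"
echo "parent of master: $parent_of_master"
```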

----

With that intro:

sko said:
It seems not all of my branches (maybe due to re-cloning the repo from remote?) are still considered to be descendants of master any more:

I don't think so. As explained, there is no such thing as a "descendant of a branch", it's impossible to know. A fast-forward merge is only possible when a branch is fully rebased. That means its commits are all "re-played" (you know, series of patches) on top of the commit the "target branch" (here: master) points to. Then a merge is straightforward, all you have to do is "move where master points to", and you won't need any merge commit with more than one parent.
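
Whether a fast-forward is possible can be asked directly: it is exactly when the tip of master is an ancestor of the tip of the branch. A minimal sketch, assuming git is installed; the branch name feature and the file names are made up.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.org"
git config commit.gpgsign false

echo base > base.txt; git add base.txt; git commit -q -m "base"

git checkout -q -b feature
echo f > feature.txt; git add feature.txt; git commit -q -m "feature work"

git checkout -q master
echo m > master.txt; git add master.txt; git commit -q -m "master work"

# master and feature have diverged: master's tip is not an ancestor of feature.
if git merge-base --is-ancestor master feature; then before=yes; else before=no; fi

# Re-play the feature commit on top of master's tip.
git checkout -q feature
git rebase -q master
if git merge-base --is-ancestor master feature; then after=yes; else after=no; fi

# Now the merge only moves where master points to; --ff-only would
# abort instead of creating a merge commit.
git checkout -q master
git merge -q --ff-only feature

merge_commits=$(git rev-list --count --merges master)
echo "fast-forward possible before rebase: $before, after rebase: $after"
```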

As for the examples you show, this looks to me more like you pushed some branches to your remote repository while other branches only exist locally, another completely different thing...
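
Which local branches have an upstream configured can be listed with git for-each-ref. A minimal sketch, assuming git is installed; the bare repository stands in for the real remote, and the branch names pushed_branch and local_only are made up.

```shell
set -e
work=$(mktemp -d)

# A bare repository plays the role of the remote "origin".
git init -q --bare "$work/origin.git"
git init -q "$work/clone"
cd "$work/clone"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.org"
git config commit.gpgsign false
git remote add origin "$work/origin.git"

git commit -q --allow-empty -m "base"
git branch pushed_branch
git branch local_only

# -u records origin/pushed_branch as the upstream of pushed_branch.
git push -q -u origin pushed_branch >/dev/null 2>&1

# One line per local branch; the right-hand side is empty if there is no upstream.
report=$(git for-each-ref --format='%(refname:short) -> %(upstream:short)' refs/heads)
echo "$report"
```

Only branches with an upstream get a "Your branch is ..." line on checkout, which would explain the different messages.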

sko · Sep 4, 2024

Thank you again for that detailed explanation. I had some fragments of that already somewhere in the back of my head, but not that all branches (and hence their 'git log' timeline?) only boil down to "parent commit" relationships of each single commit.
If I got this right: A branch is merely a pointer in the line(s) of commits that are 'linked' together by their individual parent commit.
So a rebase simply changes the initial "parent" of the first commit in that branch to the commit where the initial branch currently points to. Merging then moves the pointer of the branch (e.g. master) forward to the commit the (diverged) branch points to, which automatically inherits all "parent commit" relationships, and because it is a direct descendant it works without any intermediate "merge commit".

And yes, I don't have all my local branches pushed to remote. This is true e.g. if they are purely experimental or at a very early stage but mainly if those branches contain just very small patches that I don't want to clutter the remote repository with.

The solution for me was still to just checkout each branch, rebase it onto master, checkout master, merge branch [repeat for each branch]. The single commit from a foreign fork could be inherited via cherry-pick.
This way I now ended up with all the 'original' commits and no squashed "merged XY" commits in master (or actually the "develop" branch for now, until I worked through all items I want to add/fix for the new tag).
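
The repeated steps can be sketched as a loop. This assumes git is installed and that the branches do not conflict; the branch names feature_a and feature_b and the file names are placeholders. With set -e the script stops at the first rebase that hits a conflict.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.org"
git config commit.gpgsign false

echo base > base.txt; git add base.txt; git commit -q -m "base"

# Two small branches, both created from the same untouched master.
for name in a b; do
  git checkout -q -b "feature_$name" master
  echo "$name" > "$name.txt"
  git add "$name.txt"
  git commit -q -m "feature $name"
done

# checkout branch, rebase onto master, checkout master, merge, repeat.
for branch in feature_a feature_b; do
  git checkout -q "$branch"
  git rebase -q master
  git checkout -q master
  git merge -q --ff-only "$branch"   # refuses anything but a fast-forward
  git branch -q -d "$branch"         # fully merged, so -d is enough
done

total=$(git rev-list --count master)
merges=$(git rev-list --count --merges master)
git log --format='%h %an %s' master
```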

zirias@ · Sep 4, 2024

sko said:
If I got this right: A branch is merely a pointer in the line(s) of commits that are 'linked' together by their individual parent commit.
So a rebase simply changes the initial "parent" of the first commit in that branch to the commit where the initial branch currently points to. Merging then moves the pointer of the branch (e.g. master) forward to the commit the (diverged) branch points to, which automatically inherits all "parent commit" relationships, and because it is a direct descendant it works without any intermediate "merge commit".

Sounds right to me, yes!

As far as I'm concerned, this "rebasing" workflow is most often the superior way to work with git. What I think makes it very nice is that you'll run into conflicts during rebase, which only affects the branch you work on, not the branch you want to "merge back" to. So, when you're done, there just can't be any conflicts (your first commit on your feature branch already has the tip of the master-branch as its parent), and the resulting "history" on this master-branch is perfectly clean, just a series of patches that fit onto each other.

The only drawback of the rebasing workflow is that it doesn't scale with team size. Rebasing is actually a "history rewrite" ... every single patch (commit) is re-applied, which possibly changes its contents, and even if it doesn't, the fact that it has a different parent now already changes the hash, so you'll have a completely new and distinct commit after rebasing. Therefore, if other people work on the same branch, you have to coordinate rebasing, so nobody loses their changes or ends up in a hell of conflicts. That's no issue at all when you work alone on a branch, it's probably still feasible with 2 or 3 people, it's certainly unmanageable when 10 or more people work on the same feature branch. Git does also have "merge commits" (which are commits with more than one parent), and if you really have to work on a branch with lots of other people, that's probably the way to go instead ... much closer to what classic revision control systems do.
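
The "history rewrite" is easy to observe. A minimal sketch, assuming git is installed, with made-up branch and file names: after the rebase the branch points to a commit with a different hash, while author, author date and subject are unchanged.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.org"
git config commit.gpgsign false

echo base > base.txt; git add base.txt; git commit -q -m "base"

git checkout -q -b feature
echo f > feature.txt; git add feature.txt; git commit -q -m "feature work"

git checkout -q master
echo m > master.txt; git add master.txt; git commit -q -m "master work"

hash_before=$(git rev-parse feature)
meta_before=$(git log -1 --format='%an|%ad|%s' feature)

git checkout -q feature
git rebase -q master

hash_after=$(git rev-parse feature)
meta_after=$(git log -1 --format='%an|%ad|%s' feature)

echo "before: $hash_before"
echo "after:  $hash_after"
```

Anyone who still has the old commit in their clone now holds a commit that is no longer part of the branch, which is where the need for coordination comes from.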

patmaddox · Sep 6, 2024

Yeah this is always an interesting thing... zirias pretty much covered it, but here's what I do:

- globally configure fast-forward-only merges (e.g. merge.ff = only) so I don't get merge commits accidentally (I can still merge with --no-ff if I want an explicit merge)
- main tracks upstream, I don't touch it
- a branch per feature: feature-a and feature-b
- pull main, rebase feature-a and feature-b on it in their respective branches
- a makefile for my personal branch which hard resets to main and then cherry-picks distinct commits from feature-a and feature-b

In the end what I have is "my main" - which is upstream main + my commits, as though I had developed them that way.
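
A rough sketch of that setup, written as a shell script rather than a makefile and assuming git is installed. The branch name mine, the file names and the commit messages are made up, and merge.ff is set per repository here instead of globally so the sketch does not touch the user's real configuration.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.org"
git config commit.gpgsign false
# The global variant would be: git config --global merge.ff only
git config merge.ff only

echo base > base.txt; git add base.txt; git commit -q -m "upstream state"

# One branch per feature, each rebased on (here: created from) main.
for name in a b; do
  git checkout -q -b "feature-$name" main
  echo "$name" > "$name.txt"
  git add "$name.txt"
  git commit -q -m "feature $name"
done

# Rebuild the personal branch: reset it to main, then pick the feature commits.
git checkout -q -b mine main
git reset -q --hard main     # no-op on the first run, discards old picks on later runs
git cherry-pick main..feature-a >/dev/null
git cherry-pick main..feature-b >/dev/null

count=$(git rev-list --count main..mine)

# With merge.ff=only, a merge that would need a merge commit is refused.
if git merge feature-b >/dev/null 2>&1; then ff_refused=no; else ff_refused=yes; fi

git log --format='%h %s' mine
```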

sko · Sep 6, 2024

zirias@ said:
What I think makes it very nice is that you'll run into conflicts during rebase, which only affects the branch you work on,

I ran into this with one branch because commits touched the same line of code, and yes, it's nicer to deal with that in the branch during rebase and not during the merge to master, which might mess up its history.

zirias@ said:
The only drawback of the rebasing workflow is that it doesn't scale with team size.

Currently my team consists of exactly one person (me), and that won't change in the foreseeable future. Any patches that might come from outside are via pull-requests, and they are usually against the latest end of the master (or develop) branch, so they merge cleanly.

patmaddox said:
- global configure --ff-only so I don't get merge commits accidentally (I can still merge with --no-ff if I want an explicit merge)

thanks for that hint, I just added 'merge.ff' to my git config

patmaddox said:
In the end what I have is "my main" - which is upstream main + my commits, as though I had developed them that way.

That's what my ports trees on my buildhosts look like: "origin/master" (or the current quarterly) + my local changes that don't work via a 'local-patches' hook in poudriere. The rebasing allows me to always do a "git pull" to get the tree updated, and my local commits stay on top of the history.