I'm surprised git caught on despite Mercurial being far superior (I hadn't heard of Fossil before). Git has the following major shortcomings (some shared by other VCSes too):
1) Terrible, terrible UI
2) Unnecessarily complex data model
3) Doesn't scale well to large repos (until Microsoft's VFS for Git, which is Windows-only)
I dread every time I have to use hg (even when I can use the git bindings for it).
And I was using hg before I was using git.
git has a terrible UI, granted, but I find the hg UI pretty terrible too.
I really hate their approach to branching (though that is remedied to some extent these days with e.g. bookmarks).
Speaking of data models, I find git's model a lot saner. I REALLY hate that hg spams my disk with tons and tons of files inside the .hg directory. Ever cloned e.g. the Mozilla hg repos? Ugh. Bonus points for their fancy name-escaping mechanism in those files, which has run me into "path too long" issues on Windows boxes a couple of times already.
Mercurial's bookmarks tend to get into a bad state for me. With Git it's trivial to fix things. With Mercurial I just can't figure anything out, and I think that's because Mercurial tries so hard to hide how things work and to tell me how I should do my work.
Hmm? It’s just a directed graph of SHA1s under the hood. Seems pretty simple to me once you understand that. My understanding was that Hg’s data model is actually way more complicated with more pointers.
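For anyone who hasn't poked at it, the graph is easy to inspect directly with plumbing commands. A throwaway sketch (repo path, file names, and messages are made up for illustration):

```shell
set -e
# Build a tiny repo with two commits
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.email demo@example.com
git config user.name demo
echo hello > a.txt
git add a.txt && git commit -qm "first"
echo world >> a.txt
git commit -qam "second"

# A commit object is just pointers plus metadata: a tree, its parent(s),
# author/committer, and the message
git cat-file -p HEAD
# The tree maps file names to blob (and subtree) SHA1s
git cat-file -p 'HEAD^{tree}'
```

Everything else (branches, tags, HEAD) is just a named pointer into that graph.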
As everyone else has chimed in, the reason git won was speed. I haven’t used mercurial in a few years, but back when I was looking to replace SVN, git did everything seconds faster than Hg, which made it the clear winner.
It wasn't just speed, it was also that the Linux kernel was developed under git, that Linus created git, and the existence of github, which a lot of people liked.
After a while, it also just developed a critical mass where the attitude of people/companies that weren't using it almost inevitably became one of "everybody else is using git, so we should too".
All these concepts are extremely useful in practice. I do not see them in any way as “needlessly” complex. Difficult to learn... maybe, but not just for the hell of it.
The Git index is one of its most powerful features. Try doing a `git add -e` sometime. You get to do a bunch of work in the workspace then split it up usefully at commit time.
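For the unfamiliar, here's a rough sketch of what the index buys you (file name and contents are made up); `git add -e` additionally lets you edit the staged diff by hand before committing:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.email demo@example.com
git config user.name demo
git commit -qm init --allow-empty

echo "line1" > notes.txt
git add notes.txt            # the index now holds a snapshot with line1 only
echo "line2" >> notes.txt    # keep working; the staged snapshot is unaffected

git diff --cached            # index vs HEAD: what the next commit will contain
git diff                     # worktree vs index: what is still unstaged
git commit -qm "first half"  # records only what was staged
git show HEAD:notes.txt      # the committed file contains line1 only
```

The worktree and the next commit are decoupled, which is exactly what lets you split a pile of work into clean commits after the fact.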
I think git and hg are both pretty bad in different ways.
Both have pretty terrible UI but so long as one uses magit, git comes out way on top.
The data models are different and suffer from different problems. A main issue with git is that it is stupid about file copies and renames. An issue with hg is that it doesn’t work well with long-running forked histories (i.e. like git branches), because it stores the set of revisions of a file as a list of blocks, each either a “complete file” or a “diff from the previous version in this list”.
Both have scaling problems with large repos, and algorithm/data-structure problems which make too many operations e.g. O(size of history) at least. I suppose this is better than Darcs’ model of “commit on Friday and hopefully it will be done by Monday.” If hg were naturally good at scaling then e.g. Facebook wouldn’t be putting so much effort into making it scale: using inotify instead of walking the whole tree for changes (which I don’t think should count, as any VCS gains from this), having a mergeless history, rewriting a ton of hg in Rust (git was always partly in C, and there is now also libgit2).
The thing that makes me most sad about hg is the lack of a really good (ie good and emacs-based) ui.
I’m all for different vc systems being developed and I think it would be good to see some real innovation potentially break up the current git-hg hegemony.
I think there are lots of good things about fossil (e.g. using an actual database, which is going to scale well and avoid data corruption, instead of a specialised data structure that is hard to change and likely not as corruption-resistant, battle-tested, or scalable, but maybe lets certain operations be “faster”).
Another interesting vc system being developed at the moment is pijul, which can be simply described as “like darcs but fast and more likely to be correct”. It feels a bit like it’s fitting in with the current trend of CRDTs, although its core data structure is not a CRDT, as that would imply that all merges have some deterministic resolution (i.e. merge conflicts do not happen), and that is not the case; instead, files are allowed to be merged into a first-class conflicted state which can then be resolved by later patches.
> A main issue with git is that it is stupid about file copies and renames.
Could you elaborate on this? As far as I know, file copies and renames will still use the same blob, but the tree referencing the blob can reference it using a different path in the case of a rename or reference it more than once in the case of a copy.
If you tell hg to rename a file, e.g. hg mv foo bar, it will generate a patch which essentially just says “foo was renamed to bar”, and when you look at the diff the only thing that has changed is the name.
If you merge this with a patch that changed foo then hg will do something sensible (ie either merge the changes into bar or give a merge conflict).
Git has no first-class concept of file name changes. Instead it tries to use heuristics to spot renames and sometimes they work and sometimes they won’t. Maybe if you merge a patch renaming foo to bar with one that changes foo the second patch will be applied to bar, but maybe it will behave as if you are merging changes to foo with deletion of foo.
Merging is already hard, dangerous, and non associative. The danger is less that you get lots of annoying merge conflicts than that you don’t get a merge conflict when you should (and therefore you risk accidentally changing the meaning of the merged files without knowing), e.g. if you merge “rename foo to bar” with “delete foo” and git didn’t spot the move then the merge might leave bar untouched when really there should be a conflict between keeping/deleting bar. Having wrong merges happen automatically can be a big risk when software is supposed to be very reliable.
"Git has no first-class concept of file name changes. Instead it tries to use heuristics to spot renames and sometimes they work and sometimes they won't."
Git has the "mv" command. If you "git mv" a file, why would git have to guess or use heuristics to figure out that the file was renamed?
> Git has a rename command git mv, but that is just for convenience. The effect is indistinguishable from removing the file and adding another with different name and the same content.
git diff, merge, and related tools have heuristics for detecting file moves (historically off by default, enabled with e.g. git diff -M; modern git turns basic rename detection on by default), but they tend to break if a file is both moved and modified in the same commit.
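The heuristic is easy to see in action. A minimal sketch (file contents are arbitrary; the point is that detection is a similarity comparison, not recorded metadata):

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.email demo@example.com
git config user.name demo
seq 1 100 > foo
git add foo && git commit -qm "add foo"

git mv foo bar                       # same effect as: mv foo bar; git add -A
{ echo changed; seq 2 100; } > bar   # also modify it in the same commit
git commit -qam "rename and tweak"

# With rename detection, the pair is matched up by content similarity:
git diff -M --stat HEAD~1 HEAD           # shows: foo => bar | small diff
# With detection disabled, the exact same history is a delete plus an add:
git diff --no-renames --stat HEAD~1 HEAD
```

Both invocations read the same commits; the "rename" exists only in the eye of the diff machinery, which is the point being made above.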
Yeah exactly why I always commit renames right away and atomically. I mean it's a _relatively_ rare thing and when I have to do it I just want to make and record the change and then move on. Renaming and then modifying a file in the same commit is slightly sloppy imo. Not to say that the tool couldn't be doing a better job.
Because git stores sets of files. If you move a file and make a new commit, it's just a new set of files which says "this old set is my parent". There is nothing in there about the renamed file.
I don't know enough about Git internals in this regard, but it's possible that in fact the index is just being compared to HEAD to infer that information. Index has a file called "bar" and none called "foo". HEAD is vice versa.
Yes, Git has "mv". Git also detects file renames; I'm not aware of the method, I'm just speaking from experience. I renamed a file normally, without Git, and when I checked `git status`, Git said it was renamed.
It has nothing to do with the Windows filesystem; Git simply cannot support a 5 GB working tree on any filesystem. You can call this "pathological" but this throws a lot of shade on monorepos without much critical examination of how or when they might be useful.
To be precise, git's scaling issues mostly relate to centralization and file count rather than raw size.
* Pushing changes to a central repo requires including upstream commits. With 1 commit/s to that central repo, all developers are stuck in a loop until their push succeeds. It is a human spinlock with high contention.
* Some algorithms scale linearly with the number of server branches, such as pull-without-specifying-a-branch, which becomes too slow with 100K branches (a consequence of central repos).
* Some algorithms are linear with the number of files, like git status.
* Binary files don't compress or deduplicate well, slowing pull and clone.
> It has nothing to do with the Windows filesystem; Git simply cannot support a 5 GB working tree on any filesystem.
Can you provide a reference? I searched a bit and the only things I found were bugs on Windows[1] for git lfs.
> You can call this "pathological" but this throws a lot of shade on monorepos without much critical examination of how or when they might be useful.
The Windows codebase has 3.5 million files and its repo is 300GB in size. That is not normal; it is a Google- or MS-scale problem, not the average git user's. Rather than change its workflow, MS decided to create GVFS[2].
> Can you provide a reference? I was searching a bit and only things I found was bugs in windows[1] for git lfs.
Apologies, I hastily mistyped, I meant 500 GB, not 5. (5 GB is about the size of my repository, which is not really so big at all and certainly something git can cope with on its own).
This series of articles should illustrate some of the issues that VFS for Git tries to address. ("GVFS" is now called "VFS for Git".)
And this is a series of articles from an engineer who's been working on improving perf in large repositories in general, not strictly related to the Windows repository:
> Windows codebase has 3.5 million files and its repo is 300GB in size. It is not normal. This is google or MS type of problem and not average git user. MS instead changing workflow decided to create GVFS[2]
I didn't say it was normal. Indeed it's uncommon. I said it wasn't pathological.
Plastic SCM claims that 5TB works and that 50GB is the average size in their cloud offering. It seems that the Free Software world does not care about such use cases.
I am not a beginner, and without trying to boast, I'm pretty smart and experienced.
The problem is that the edge cases that come up have solutions which need to be looked up, not derived from understanding. And when you're scared of data loss, it's a very frustrating situation.
Interesting ... One sane thing about git is that it is very difficult to lose data. You have to go out of your way to lose data, like deleting your local and remote histories. Even then, if someone else has branched in the meantime, it can be restored without any fuss.
Git makes it difficult to lose committed data. It's easy to lose uncommitted changes. Also, someone can know that changes are in the reflog but not how to recover them without making a bigger mess.
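For the record, the recovery dance is short once you know it. A sketch (the branch name "rescued" and file contents are made up):

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.email demo@example.com
git config user.name demo
echo v1 > file && git add file && git commit -qm "one"
echo v2 > file && git commit -qam "two"

git reset --hard HEAD~1        # commit "two" is now reachable only via the reflog
git reflog                     # lists HEAD movements; "two" is still there
git branch rescued 'HEAD@{1}'  # pin the lost commit to a branch
git show rescued:file          # the "lost" content is back: v2
```

Committed data survives until garbage collection; the hard part, as noted above, is knowing this incantation rather than deriving it.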
A versioning file system tracks all changes to each file. On an old VMS machine, you might have X;1, X;2, and X;3 where the number after the ';' is the version number. Normally directory listings only show the 'X', which refers to the most recent version, but you could have it display all the files.
This makes it easy to compare, say, the state of the file now with the state of the save from 3 hours previous.
https://en.wikipedia.org/wiki/Versioning_file_system points out "Subversion has a feature called "autoversioning" where a WebDAV source with a subversion backend can be mounted as a file system on systems that support this kind of mount (Linux, Windows and others do) and saves to that file system generate new revisions on the revision control system."
> the use case for this feature can be incredibly appealing to administrators working with non-technical users: imagine an office of ordinary users running Microsoft Windows or Mac OS. Each user “mounts” the Subversion repository, which appears to be an ordinary network folder. They use the shared folder as they always do: open files, edit them, save them. Meanwhile, the server is automatically versioning everything. Any administrator (or knowledgeable user) can still use a Subversion client to search history and retrieve older versions of data. ...
> however, understand what you're getting into. WebDAV clients tend to do many write requests, resulting in a huge number of automatically committed revisions. For example, when saving data, many clients will do a PUT of a 0-byte file (as a way of reserving a name) followed by another PUT with the real file data. The single file-write results in two separate commits. Also consider that many applications auto-save every few minutes, resulting in even more commits.
It adds that Clearcase supported a similar feature.
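The autoversioning setup described above boils down to a small mod_dav_svn stanza on the server (the location and repository path here are illustrative):

```apache
<Location /repos>
  DAV svn
  SVNPath /var/svn/office-docs
  SVNAutoversioning on
</Location>
```

Clients then mount `http://server/repos` as an ordinary WebDAV folder, and every save becomes a commit, with the caveats about noisy revision history noted in the quote.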
The git data model is certainly complex, but I'd be curious to hear why you think it's 'unnecessarily' so. I often try and drum up a new SCM in my head and the data model gets pretty complex every time.
Git has the advantage of being the right kind of terrible.
If you only push commits to a branch and use GUI tools like Github or Gitea for merging, chances are you're never going to be exposed to anything more complicated than 'add; commit; push'.
Even merging isn't that terrible and while still being painful, git makes it somewhat clear what you want.
The problem is: if you want people to use something else, you need to improve over git in the areas that matter to most people (ie, 'add; commit; push').
Git is terrible but just good enough that improvement over that terrible will be hard to accept for the mainstream.
I have been using git since it was released and have never needed to think about its underlying data model in order to accomplish dev tasks. Also I suspect that any complexity in the data model was quite necessary to implement its api and features in a performant way.
Git won over Mercurial simply due to Github. There were some other minor contributing factors - association with Linus, speed - but they are insignificant compared to how popular Github was (for good reason) and therefore how many people were exposed to git.
The Mercurial alternatives like bitbucket just didn't have the same spread, and we got stuck with year after year of teaching new people a difficult interface.
Git was more popular for C, Perl, and Ruby. Mercurial was more popular for Java and Python. It was far ahead on Windows. Google and Atlassian bet on Mercurial well after GitHub existed. The idea that only one could win would have been strange to a lot of people at the time.
I still push for SVN when we're doing work where centralization makes a great deal of sense. For example, I consider SVN+puppet an exceptional combination... and I really don't need that puppet repo to be distributed.
Then again, if I'm working with people in different areas, and want them to have a full reproducible copy of the repo, git it is.
I maintain a "use the right tool for the right job". Sometimes the Cathedral wins out, and other times the Bazaar wins out (NO! not the Bazaar source control!).
To give a personal, subjective point of view of why I switched from hg to git:
- hg was horribly slow compared to git;
- I love the branch model used in git for letting several people work on different parts of the same project, and I could never find a satisfactory equivalent using hg idioms.
It is true that hg has a far better UI in general, but Magit fixes this problem for me.
The UI thing is more than superficial though. Yes, the command line is horribly inconsistent (and that can be fixed with Magit), but the real issue is that if you want to do anything non-trivial you have to understand how git works - what the object model is, how refs work etc.
I used mercurial successfully, quite heavily, and I couldn't tell you much about how it's implemented.
To add more praises about Magit: I would really love to use other DVCS, especially Fossil because it has built-in issue tracker and Wiki and all, but to be unable to use Magit with Fossil is a bigger drawback for me to make a switch.
At this point, I can basically do most of Git commands in Magit with just muscle memory; say, stage everything in the tree, commit amend, reset author and dates, then force push (I know) to remote I would just press with evil-magit: <SPC>gsSc-Racp-fpy (<SPC>gs for invoking Magit, S to stage everything, c to enter commit mode, -R set the reset author flag, a amend commit, p enter push mode, -f set the force flag, p push to origin, y confirm force push)
It may sound complicated, but the Magit UI is discoverable, and once you're used to it you can do anything without even looking at the UI...
I'm not sure any VCS scales well to huge repos, and the MSFT work in this space is truly amazing. I love, for example, their use of a Bloom filter to make git blame fast!
I actually like the git UI. Like many git power users, I've come to terms with a subset of the UI that I know how to use very well. The thinness of Git's abstractions lets me think of complex VCS operations as I do when reading or writing code. If the cognitive load is too high, you can always just use a merge-heavy workflow and stop thinking about the mechanics, but I recommend instead to understand the mechanics.
The data model in Git is hardly more complex than Fossil's or Mercurial's, and it's copy-on-write all the way, which makes it very safe (think ZFS).
The simple data model is the best part of git so I have no idea what you are referring to. I do not see how one could make it any simpler and still use it to implement a DVCS. The main reasons I picked Git over Mercurial were the data model and the performance.
Re 2: HG doesn't have a published data model. I believe they have created 2 or 3 models so far, but it happens under the hood. They can do this because there is only one implementation of HG, so they need not worry about compatibility as much.
It is rare that adoption has anything to do with the strength of the technology involved. Usually the winners are those that are early to market, are adopted early by industry leaders, and/or have better marketing.
In this case the reason is most definitely the fact that it is used for Linux, easily one of the biggest open source projects ever. I would wager that if Mercurial had been created earlier, Git would probably never have been created, let alone adopted for Linux.
Mercurial and Git started around the same time. Linus was concerned that Mercurial was similar enough to BitKeeper that BitMover might threaten anyone who worked on it, or at least anyone who had used BitKeeper.
I would rather like a new option which was designed as a user-friendly, fast VCS from day one. Fossil is actually nearly there (I used it for a bit for some private projects).
1) Terrible, terrible UI
2) Unnecessarily complex data model
3) Doesn't scale well to large repos (until Microsoft's VFS for Git, which is Windows-only)
(and many others...)