Google Drive: Shortcuts replacing files and folders stored in multiple locations (support.google.com)
138 points by cube00 on April 12, 2022 | 71 comments


I worked on the original (long since superseded) implementation of the metadata store for Google Drive, i.e. the system which was responsible for tracking file / folder relationships. The requirement to allow an item to appear in multiple locations was a huge complication, in part because of the way it interacted with permissions being inherited from a folder to the items in that folder. I imagine this change may be motivated by a desire to move away from that complex model, and whatever team owns that system now may be very happy to see it going away.

(IIRC, the requirement stemmed from the need to support the various applications that were being folded into / integrated with Google Drive, such as Photos which of course allows a photo to appear in multiple albums.)


Googler, opinions are my own.

This was my understanding as well. The original Drive was built effectively as a directed graph (with cycles allowed). Any file or folder could be stored in multiple locations. And permissions were on a per-file basis, so two people viewing the same folder may see different sets of files.

And permissions were definitely a hard part of it, as if you applied new permissions to a folder and all children, it had to walk the entire graph to update the permissions.

This is the advantage of the Team Drive style structure that the Drive team put out. It follows the classic filesystem design of a tree, which allows for easier permissions modeling, among other things. It's also why all "hard links" are now becoming shortcuts / soft links.


Forgive my ignorance (and there's a lot of it), but navigating a graph and setting permissions doesn't seem like a terribly difficult problem, especially for Google - the king of graphs, in a word. I would think maybe the issue has to do with how permissions of a file/folder apply to different users, but then isn't that just exactly the raison d'etre of permission systems?

This isn't an area I'm qualified to have a technical opinion on but if you'd care to elaborate I'd be interested to learn more.


Googler opinions are my own. I've never worked on Drive.

It's not hard, but it is expensive.

If you have a file system that is a proper tree, where any given node has exactly one parent, and walking up the chain of parents always takes you to a single root node, you gain some useful properties. You can build a permission model where you just have to check the permissions of all of a file's ancestors to see if someone has access to it, however deep it sits in the system. This means that if you grant read access on any node within that line of parents, all children implicitly get access.

Now imagine a directed graph, where you set a permission on a node and want all connected nodes to get that permission too. You could definitely walk all the nodes and apply it, but now you have to write that permission to every node, rather than being able to set it on a parent and let all children inherit it. And even if you know all of a node's parents, those parents can themselves have multiple parents, and you can have cycles. Checking parents for permissions becomes way more complicated for users to understand, which is likely why Drive didn't do this.
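To make the cost difference concrete, here's a toy sketch (hypothetical names, nothing like Drive's actual code): in a tree, an access check is a walk up one parent chain, while in a graph with multiple parents and cycles, propagating a grant means touching every reachable node, with a visited set so cycles don't loop forever.

```python
# Toy model of the tree-vs-graph permission cost described above.

def has_access_tree(node, user, parent, grants):
    """Tree case: walk the unique parent chain; any ancestor grant suffices."""
    while node is not None:
        if user in grants.get(node, set()):
            return True
        node = parent.get(node)
    return False

def propagate_grant_graph(start, user, children, grants):
    """Graph case: write the grant onto every node reachable from `start`."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # diamond or cycle: don't revisit
        seen.add(node)
        grants.setdefault(node, set()).add(user)
        stack.extend(children.get(node, []))
    return len(seen)  # number of nodes that had to be written

# Tree: root -> docs -> report; one grant on root covers everything below it.
parent = {"report": "docs", "docs": "root", "root": None}
grants = {"root": {"alice"}}
print(has_access_tree("report", "alice", parent, grants))  # True, after 3 reads

# Graph: "d" has two parents (b and c), and d -> a closes a cycle.
children = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["a"]}
graph_grants = {}
print(propagate_grant_graph("a", "bob", children, graph_grants))  # 4 writes
```

The tree check is O(depth) reads per lookup; the graph grant is O(nodes) writes per permission change, which is the expense being described.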


Can you explain why team drives (now called shared drives) have a limit of 400,000 files, while files in "my drive" have no such limit and unlimited sharing?


I worked on another sync client's representation of filesystem structure, and came to the same conclusion. Hard links enable some cool behavior, but in retrospect added more complexity than anyone expected. Migrating to shortcuts / soft links seems very reasonable - I wish I had started there.


What kinds of cool behavior do hard links enable?


Every hard link is essentially equally the "master": they all point at the same inode, so the file remains as long as any of the hard links exist. With soft links, if you delete the master file all the soft links stop working.
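The classic filesystem version of this is easy to demo with `ln` (filenames here are arbitrary):

```shell
cd "$(mktemp -d)"
echo "hello" > original.txt
ln original.txt hardlink.txt      # hard link: a second name for the same inode
ln -s original.txt softlink.txt   # soft link: a pointer to the *name* original.txt
rm original.txt                   # delete the "master" name
cat hardlink.txt                  # data survives: the inode still has one link left
cat softlink.txt 2>/dev/null || echo "dangling"  # the symlink now points at nothing
```

`hardlink.txt` still prints "hello" after the original is removed; the symlink is left dangling.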


Apple's Time Machine uses hard-linked directories to replicate your home directory again and again, but with only the changed directories/files actually taking up space. The unchanged directories are just hard-linked to the previous versions, so any one version looks like your entire drive, but it takes much less space than full copies.


It used to; the current APFS implementation uses snapshots instead.


> various applications that were being folded into / integrated with Google Drive

The Photos/Drive integration was removed a long time ago. What other integrations were behind the original requirement? I'm curious to know if the extra complication was worth it in the long run and how long the integrations that needed this feature hung around for.


I don't specifically remember whether there were other motivations for that requirement. Possibly a desire to support "tag" style interfaces, where an item can have multiple tags applied. But it may have mostly been the Photos integration.

And no, definitely the complication was not worth it in the long run. :-) That project got way more complicated than anyone wanted (not that there's anything unusual about that).


I actually remember early Google Drive to have only tags for files, and no folders. Apparently, the idea was to distance from the file tree paradigm, but eventually the traditional approach proved to be more user friendly.


So is this basically a switch from hardlinks to softlinks?


It's a switch from inference to direct collection of intent from the user.


How so? Are hard links still allowed?


Yes.


Shouldn't de-duping be an implementation detail of the virtual file system, completely hidden from the user?

Isn't that how file sharing/syncing services have worked since the MegaUpload days?

If I upload the same file to two separate folders it's because I want two separate copies. If I change one of the copies, I don't want it to change the other copy.


> If I upload the same file to two separate folders it's because I want two separate copies. If I change one of the copies, I don't want it to change the other copy.

If that's what you were doing, you won't get affected by this. This is about files/folders that were "hardlinked", which was difficult to do by accident. I think you had to hold Ctrl while dragging the file into another folder, or something like that. (The key to notice is that they're talking about one file being in multiple directories, not multiple files with identical contents.)


>If I upload the same file to two separate folders it's because I want two separate copies. If I change one of the copies, I don't want it to change the other copy.

That's precisely the behavior you'll get. You were allowed in the previous implementation to upload a file to one location and then put it in 2 locations, such that you would have changes in either location reflected the same way. This wasn't 'copying' a file, it was multiparenting it.


This is a concern of presentation, not the backing implementation: How to present (approximately) hardlinked files to users, with the possible complications of varying support for hard and soft links or other mechanisms across the various client implementations of Google Drive.

There are advantages and disadvantages to either approach.


I see. I don't use Drive via the web/app interfaces much but essentially, hard links are a thing already and users must've inadvertently created them anyway when doing actions via web/app.

Any idea if these hard links can be created with Drive clients on Mac/Win?


They can't. The data model in Drive will no longer support hardlinking, period.


Precisely. I can understand if they want to change the wording on their interfaces going forward to encourage folks to use "shortcuts", but it sounds awful if they're going to do this to existing files/directories.


If someone has a Drive desktop client installed, has two source-code directories with some identical files in them, and modifies one of the identical files in just one of the directories, I can imagine they'd be very surprised when the other copy in the untouched directory also changes.

I'm on Linux where there's no official Drive client, so this won't happen to me. (I use Syncthing instead.)


What you are describing sounds like two distinct files with the same content. The change only affects the same file that has been "hard linked" into two separate folders. Copies of files are unaffected.


Was this hard-linking only possible when performing an action via the web/app interfaces? Because the wording makes it extremely confusing. Even I thought this was going to affect copies of the same file in multiple places. I don't notice any points in their support doc which tells me otherwise.


I'm not sure how the sync worked. But for your use case I suspect you won't have this problem.

I'm 99% sure that this is talking about "hard links" not "files with the same contents".


Thanks for clarifying and I do think your explanation(s) make sense. Kinda wish it was explicitly stated in the support docs as well.


I definitely see where your confusion is coming from even if I didn't think of that use case initially. It would definitely be a good idea for them to explicitly call out the difference.


There was a keyboard shortcut Shift+Z to add a file to another location (i.e. create a hard-link). Not sure if it still works. The sidebar also has a list of locations that a file is in. Also there are public APIs and documentation for everything.


I see. I think your "hard linked" term is the difference between the "Add shortcut to Drive" and "Make a copy" options when right-clicking a file in the web UI. If this announcement affects only files created with "Add shortcut to Drive," and uploaded files that happen to have the same content as another file aren't automatically turned into shortcuts, then I'm less alarmed by the change.


No, "hard link" here essentially means "folders are really just (tree of) labels", so a single file can be assigned to many folders. All references are shared, and a file is only deleted if you delete it, not merely remove it from a folder (label).
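In other words, the old model looked something like this (a toy sketch, not Drive's actual schema): a file carries a set of parent folders, removing it from one folder just drops a label, and the file only goes away when you delete it explicitly.

```python
# Toy "folders are labels" model of the old multi-parenting behaviour.

class DriveFile:
    def __init__(self, name, parents):
        self.name = name
        self.parents = set(parents)  # multi-parenting: any number of folders
        self.deleted = False

    def remove_from(self, folder):
        self.parents.discard(folder)  # just drop a label; the data is untouched

    def delete(self):
        self.deleted = True           # the only way the file actually goes away

photo = DriveFile("beach.jpg", ["Vacation 2021", "Best of"])
photo.remove_from("Best of")
print(photo.deleted, photo.parents)   # False {'Vacation 2021'}
```

Under the new model, `parents` would effectively be forced to a single folder, with shortcuts standing in for the extra labels.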


Note that "Add shortcut to Drive" is the new behaviour and new action. It was called something different before. Maybe "Add to folder" or something like that. I believe it was always distinct from the "copy" feature.


Unless it happens on the backend - which is what the article appears to say.

I'm on Linux with Syncdocs for syncing Google Drive so will wait and see how it handles things.


Google Drive changing causing issues like this is why I moved to Syncthing. In google drive every so often I would have to de-duplicate a bunch of files appended with (1).


I think this is a good move – the cloning UX was a nightmare. I've moved many shared files to Team Drives because the language is easier for most of us to understand.

I imagine this was a tough call for a PM, with a lot of cases to consider and account for given this is so embedded in the Drive product DNA.


"The process will replace all but one location of files and folders that are currently in multiple locations. The files and folders will be replaced with shortcuts."

Is there any way for the user to specify that they want a full copy of a file?

What happens if another user makes a copy of the file and alters it? Are both copies changed?

"The replacement decision will be based on original file and folder ownership, and will also consider access and activity on all other folders to ensure the least possible disruption for collaboration."

"You can’t opt-out of the replacement."

This might be a deal-breaker for some users. Why not just ask the user if they want a replacement versus a full copy?


> This might be a deal-breaker for some users. Why not just ask the user if they want a replacement versus a full copy?

Shortcut preserves semantics: working on the original file or working on a shortcut to the original file will both modify the same document. Fully copy (create a new document with same contents as original document at a point in time) would bring new semantics.


It would also have quota implications - if I had a 5GB file multiparented in 3 locations, I wouldn't want to suddenly be over quota because Google decided I really wanted it copied to 3 locations.


You can still use File: Copy for a second, unrelated copy.


So, hard links are being replaced by symbolic links.

Most people I know prefer symlinks for most uses, so this feels like better UX.


What happens if you put a file in both the House Projects folder and the Current Projects folder, and then want to delete one of them?


One folder contains the original file, and one folder contains a shortcut to the file. There's no concept of a file existing in multiple folders with this change, only 1 source file and shortcuts linking to it. If you delete the original file, I assume the shortcuts are deleted too.


Yes, and that's annoying.


I couldn't figure this out from reading the article but perhaps someone here knows. Say I want to create a new edited version of a document without changing the original, so I first duplicate that doc and then edit the new copy of it. Does this new Drive behavior mean that the original document I copied from will be changed as well?


No, you just need to make a copy of the file, not a shortcut to it.


Google have been working up to this change for quite a while now. Rclone supports shortcuts from version v1.54.0 released on 2021-02-02. I've been impressed with the communication from Google and the care they've taken to keep things working in what must be a difficult transition.

I hope that this change can finally unlock the API to be able to return all the children of a given node recursively. Multiple parents make this much harder.

The drive API doesn't have that at the moment and it makes traversing deep directory trees really painful.

An API search term to find the objects which have a given ID as an ancestor at any depth would be fantastic.
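Without such a recursive query, listing a whole subtree today means doing a BFS yourself, issuing one "direct children" request per folder. A rough sketch of that traversal, with `list_children` standing in for a `files.list` call using a `q="'<id>' in parents"` query (the stub drive below is made up):

```python
from collections import deque

def walk_subtree(root_id, list_children):
    """Yield every descendant of root_id. The `seen` set guards against an
    item appearing under multiple parents (the multi-parenting being removed)."""
    seen = {root_id}
    queue = deque([root_id])
    while queue:
        folder = queue.popleft()
        for child_id, is_folder in list_children(folder):
            if child_id in seen:
                continue  # already reached via another parent
            seen.add(child_id)
            yield child_id
            if is_folder:
                queue.append(child_id)

# Stub drive: root contains folder "a" and file "f"; "a" contains "f" again
# (multi-parented) plus "g".
fake_drive = {
    "root": [("a", True), ("f", False)],
    "a": [("f", False), ("g", False)],
}
print(list(walk_subtree("root", lambda fid: fake_drive.get(fid, []))))
# ['a', 'f', 'g']
```

With multi-parenting gone, every item is reached exactly once and the traversal count equals the subtree size; a server-side "any-depth ancestor" search term would collapse all of this into a single request.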


I use Google Drive as a trailing safety backup. Client installed on a machine that updates once a month with all the files from drive. Manual process. In case something happens and a folder gets wiped off of Google Drive I can walk over and transfer the files from the trailing safety backup since Google Drive actually copies the files.

Under the new process the files will no longer exist on the drive, but will now be links to files on Google Drive. Is that correct?


If you're actually copying the files then no; this removes the ability for a file ID to be able to exist in two places at once, but if you're copying drive->another medium->drive a new file ID is being created on the user's drive. This is regardless of whether or not the file is being de-duped behind the scenes.


Thank you for the clarification. I should have specified I turn on sync through Google Drive once a month and not a direct copy. Going forward if I stay with Google Drive I'll need to switch off sync and just do a direct manual copy.


Until today, I didn't know you could create hard links (ie "multi-parent" files/folders) in Google Drive.

How did one achieve this? I would like to know, because I'm wondering if I have unintentionally done so. I gather this is completely different from "Make a copy" in the web UI. So how did you do it?


There were many ways, all of them hidden very well, one of them was pressing Shift+Z in the web interface while selecting a file or directory.


So, if I intentionally have the same photo in two different directories, because I want to edit one while keeping the other as a fallback, GDrive has just killed off my failsafe. Do I understand this correctly?

If so, what more compelling reason could there be to go migrate right away to Dropbox (or a similar service)?


No you did not understand that correctly. You are confusing the same file stored under two locations and two different files stored under two locations with the same contents. This has been discussed multiple times in other comments on this thread.


What happens to a file if post-replacement the file is modified? Does it modify the link target, or is it copy-on-write?

I might want to have different copies as "snapshot" and "working", de-duping them makes any version-control-like system mutable, doesn't it?


If you were storing the same document in two folders before, you did not have two different (snapshot/working) copies. You had one document.

There is no de-duping mentioned anywhere in the Google support page.


ok, so previously Google Drive used hard link equivalents and now they're using soft-link equivalents?

thanks!


Slightly related, has anyone moved off Google Drive and into NextCloud or similar and been happy with it?

I'm losing access to the unlimited Google Drive storage that my uni provided and trying to figure out where I should move to.

A NAS would be great but at this moment I'm too nomadic to want to worry about that.

I'm fine with paying but would rather pay an organization that's very respecting of privacy and less likely to nuke your account without warning if you do something they don't like.

Only need a few hundred GB of space.


I switched from Dropbox to Nextcloud, and within a couple of months from Nextcloud to Google Drive. Really didn't like the software and it was so slow since I wasn't self-hosting it, but renting it from Hetzner.

Currently thinking of switching from Google Drive to Syncthing, since the new Google Drive clients suck and Google is going to be making my service worse with the new G Suite changes.


The new Google Drive client is horrible with lots of files. It is doing something very wrong in its sqlite database and completely killing performance of the drive where AppData lives, even if the Google drive folder is on a different disk.


Can confirm this exact behaviour. It's supposed to do intelligent caching/streaming but, as mentioned in the parent, my DriveFS %APPDATA% filled up with hundreds of GB of temp files. Had to uninstall, deltree, reinstall to fix. Bit of a pain.


Nextcloud is too slow and bloaty.

I'm using https://filebrowser.org/

You can run it on a VPS, NAS or homeserver.

If you want something managed, you can pay Hetzner for managed nextcloud.


Note rclone has been aware of shortcuts for some time and if you wish to sync the original file there is a flag.


How will this affect the average user? Will it no longer be possible to store duplicate files as original files under different directories?


There is File:Copy to create unrelated copies.


great so now we can explain what a symbolic link is to cynthia


So Dropbox always deduped data at its backend without any such user-facing change. How is this better than that?


This is unrelated to that. It isn't about de-duping data, it's about symbolic links ("shortcuts") vs. hard links ("multi-parenting"). You can still make copies of files as usual, and the content can be de-duped (or not) transparently to users.


This whole article sounds like "we're remaking the backend, and no longer support the same file being in multiple locations, so we're just going to break any users using that feature."


My understanding is that this is mostly an attempt to improve the UX around permissions. The inherited sharing and permissions for files in multiple folders were incredibly confusing.

They have disallowed this for Shared Drives from the start as shared drives have ownership and strictly hierarchical permissions. Now they want to bring this UX simplification to everywhere in drive.

I'm sure they are happy to simplify the backend but this definitely makes the product less confusing. It does however make some rare workflows very complicated.



