
Git LFS for engineering

Git LFS lets engineering teams version large binary files without bloating their repository. Learn how it works, common pitfalls, and how OpenVault configures it automatically for CAD, drawings, and simulation data.

By Jon Klinger · Blue Dog

Engineering files aren't like source code. A CAD assembly can be hundreds of megabytes. A high-resolution scan is gigabytes. When you try to version these files with plain Git, the repository balloons instantly. Clone times stretch from seconds to hours. Every engineer's disk fills up with data they don't need. The real problem arrives at merge: Git can't actually merge binary files, so branching becomes risky. Two engineers edit the same part, one person's work disappears, and nobody notices until you're three steps downstream.

This is why Git LFS exists. The solution is unglamorous, but essential for any engineering team using Git rather than a dedicated PDM system.

What Git LFS actually does

Git LFS stands for Large File Storage. The idea is simple but elegant. Instead of storing the full binary file in your Git repository, Git LFS stores a tiny text pointer file. That pointer contains the file's hash and size. The actual file lives in object storage somewhere else, pulled down only when you need it.

Here's the mechanical part. When you commit a 500 MB STEP file:

  1. Git LFS calculates a SHA-256 hash of the file.
  2. It writes that hash, along with the file size, to a small text file (the pointer) and commits the pointer instead.
  3. The actual STEP file is kept in a local LFS cache and uploaded to an LFS server, usually hosted alongside your Git remote, when you push.
  4. When another engineer clones or pulls, they get the pointers immediately, and LFS downloads only the file versions needed for the commit they check out, not every revision in history.

The repository stays lean. Clone is fast. Your disk doesn't fill up with every variation and iteration of every part that anyone ever built.
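
To make the pointer concrete, here is a minimal Python sketch of what ends up in the repository. The three-line layout (version, oid, size) follows the published Git LFS pointer format; the function name build_lfs_pointer and the sample payload are made up for illustration, and real Git LFS streams the file rather than reading it into memory.

```python
import hashlib

# Version line used by the Git LFS pointer format.
LFS_SPEC_URL = "https://git-lfs.github.com/spec/v1"


def build_lfs_pointer(content: bytes) -> str:
    """Return the pointer text that is committed in place of `content`."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        f"version {LFS_SPEC_URL}\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


if __name__ == "__main__":
    # Stand-in bytes; a real STEP file would be read from disk.
    fake_step_file = b"stand-in for a 500 MB STEP file"
    print(build_lfs_pointer(fake_step_file), end="")
```

Whatever the size of the original file, the pointer stays well under 200 bytes, which is why clones stay fast.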

Why this matters for engineering specifically

Software teams can often live without Git LFS. Source code compresses well, changes are small line-level edits, and even a 10 GB repository is manageable on a modern connection. Engineering is different.

A SolidWorks assembly with rendered previews can exceed 200 MB. An STL export of a scanned surface is often larger. PDF scans of hand-sketches and vendor drawings add up. A simulation result dataset might be 5 GB. A PCB layout file with embedded libraries and cached DRC results can be tens of megabytes. A typical project folder touches a mix of these formats, and they compound fast.

Without Git LFS, a team working on even a modest product ends up with a repository that takes thirty minutes to clone and an hour to fetch new changes. Branching becomes expensive because you have to shuttle all that data back and forth. Engineers stop branching and start working on main, giving up one of Git's most valuable features. The whole version control experience degrades into something barely better than a shared drive.
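
A rough back-of-the-envelope calculation shows how fast this compounds. The file names, sizes, and revision counts below are hypothetical, and the sketch assumes each binary revision adds roughly its full size to history, which is a simplification since Git does compress what it can.

```python
# Hypothetical project: file name -> (size in MB, number of committed revisions).
PROJECT = {
    "gearbox.sldasm": (200, 40),
    "housing_scan.stl": (350, 5),
    "thermal_results.dat": (5000, 3),
}


def plain_git_clone_mb(files):
    """Plain Git: every revision of every file travels with a full clone."""
    return sum(size * revisions for size, revisions in files.values())


def lfs_checkout_mb(files):
    """Git LFS: only the checked-out revision of each file is downloaded."""
    return sum(size for size, _ in files.values())


if __name__ == "__main__":
    print(f"plain Git clone: {plain_git_clone_mb(PROJECT)} MB")
    print(f"LFS checkout:    {lfs_checkout_mb(PROJECT)} MB")
```

With these made-up numbers, a plain Git clone moves roughly 25 GB while an LFS checkout moves about 5.5 GB, and the gap widens with every new revision.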

How merge conflicts actually work with binary files

This is the hard truth that separates hype from reality. Git LFS doesn't solve the real problem of merging divergent binary data. It solves the storage and network problem. It doesn't solve the geometry problem.

When two engineers edit the same part on different branches and try to merge, Git can't actually compare the results. It can't say "they both added a hole here, one is 12mm and one is 10mm, which one wins?" Native CAD formats such as SolidWorks parts are opaque binary. Even a STEP file, which is technically plain text, is not mergeable in practice: exporters typically renumber its entities, so changing a single dimension causes a cascade of changes throughout the file.

Any tool that claims to automatically merge binary CAD files is either lying or hasn't encountered a real conflict yet. Silent auto-merges of geometry produce corrupted files that load fine in the UI but have subtle geometric errors that only show up downstream during manufacturing.

The right approach is the one Git takes by default, with or without LFS: when two branches diverge on a binary file, flag it as a conflict and ask a human to resolve it. That human, the engineer who actually understands the design intent, can load both versions in the CAD tool, understand what each branch was trying to achieve, and decide the correct way forward. That's slower than automatic merge, but it's correct.
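
The decision for a binary file is all-or-nothing, and it can be sketched in a few lines. This is a simplified Python illustration of a whole-file three-way comparison, not Git's actual implementation; the names classify_binary_merge and MergeOutcome are invented for the example.

```python
import hashlib
from enum import Enum


class MergeOutcome(Enum):
    KEEP_OURS = "keep our version"
    TAKE_THEIRS = "take their version"
    CONFLICT = "conflict: an engineer must decide"


def _digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def classify_binary_merge(base: bytes, ours: bytes, theirs: bytes) -> MergeOutcome:
    """Compare whole files by hash; never try to combine their contents."""
    b, o, t = _digest(base), _digest(ours), _digest(theirs)
    if o == t:
        return MergeOutcome.KEEP_OURS    # both sides hold identical content
    if o == b:
        return MergeOutcome.TAKE_THEIRS  # only their branch changed the file
    if t == b:
        return MergeOutcome.KEEP_OURS    # only our branch changed the file
    return MergeOutcome.CONFLICT         # both changed it, in different ways


if __name__ == "__main__":
    base = b"bracket with no hole"
    print(classify_binary_merge(base, b"12mm hole", b"10mm hole").value)
```

Note that there is no branch in this logic that blends the two versions. The only options are one file, the other file, or a person.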

Common pitfalls with Git LFS

The most common mistake is forgetting to configure LFS before committing large binaries. If you add a 500 MB file to Git without LFS tracking it first, Git will commit it to the repository in full. Deleting it later doesn't help, because the file stays in history, so you've created a bloated repository that costs every engineer on every clone. The fix is painful: you need to rewrite history (git lfs migrate import exists for exactly this), force-push the result, and have everyone re-clone.
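
One way to catch this before it happens is a check that runs before each commit. The sketch below is hypothetical: the pattern list, the 50 MB threshold, and the function name are assumptions, and fnmatchcase only approximates how Git matches .gitattributes patterns.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Assumed LFS patterns and size threshold, for illustration only.
LFS_PATTERNS = ["*.step", "*.stp", "*.sldprt", "*.sldasm", "*.stl"]
SIZE_LIMIT_BYTES = 50 * 1024 * 1024


def untracked_large_files(staged, patterns=LFS_PATTERNS, limit=SIZE_LIMIT_BYTES):
    """Return staged paths that exceed `limit` but match no LFS pattern.

    `staged` maps a repository-relative path to its size in bytes.
    """
    offenders = []
    for path, size in staged.items():
        name = PurePosixPath(path).name
        covered = any(fnmatchcase(name, pattern) for pattern in patterns)
        if size > limit and not covered:
            offenders.append(path)
    return sorted(offenders)


if __name__ == "__main__":
    staged = {
        "parts/bracket.step": 500 * 1024 * 1024,  # large, covered by LFS
        "docs/vendor_scan.tiff": 120 * 1024 * 1024,  # large, not covered
        "README.md": 2048,  # small, fine in plain Git
    }
    for path in untracked_large_files(staged):
        print(f"refusing to commit {path}: large file not tracked by LFS")
```

A check like this costs seconds; cleaning up after the bad commit costs the whole team a re-clone.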

The second pitfall is assuming LFS handles merges automatically. Two branches will conflict on a binary file just as badly as they would in plain Git. The difference is that what Git stores is a small pointer rather than the file itself, so the repository stays lean even as conflicting versions pile up. You still have to resolve the conflict by hand.

The third is misunderstanding .gitattributes. LFS works by configuring patterns in .gitattributes to say "these file types are large, track them with LFS instead of Git." You need this file in your repository, it needs to be committed, and it needs to be correct. A common mistake is putting the patterns in .gitignore instead of .gitattributes. Git then ignores your big files entirely, so they are never committed and are simply missing when someone else clones.
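
For reference, each tracking rule is a single line in .gitattributes, in the form that git lfs track writes. The Python sketch below generates those lines; the extension list is an illustrative guess at a typical mechanical project, not an official or complete set, and the function name is made up.

```python
# Attributes that `git lfs track` writes after each pattern.
LFS_ATTRS = "filter=lfs diff=lfs merge=lfs -text"

# Illustrative extensions only; adjust to the formats your team uses.
ENGINEERING_EXTENSIONS = ["step", "stp", "iges", "igs", "sldprt", "sldasm", "slddrw", "stl"]


def render_gitattributes(extensions):
    """Return .gitattributes text with one LFS rule per unique extension."""
    cleaned = sorted({ext.lower().lstrip(".") for ext in extensions})
    return "".join(f"*.{ext} {LFS_ATTRS}\n" for ext in cleaned)


if __name__ == "__main__":
    print(render_gitattributes(ENGINEERING_EXTENSIONS), end="")
```

These patterns are lowercase only. If your CAD tool writes uppercase extensions such as .SLDPRT, add matching patterns for those as well.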

How OpenVault handles LFS automatically

This is where OpenVault adds real value. OpenVault is an open-source version control system built specifically for engineering data, and it runs on top of Git and Git LFS under the hood. The key detail: it pre-configures LFS for engineering file types automatically.

When you initialize an OpenVault repository, it already knows that STEP files, IGES files, SolidWorks parts and assemblies, STL exports, PDF documents, and simulation data should be tracked with LFS. You don't configure .gitattributes yourself. OpenVault does it. You add a 400 MB assembly, commit it, and Git LFS handles the heavy lifting without you thinking about it.

This matters because the configuration is the hardest part of Git LFS for most teams. The concept is straightforward, but getting the patterns right and making sure every engineer has the same configuration takes coordination. OpenVault bakes it in.

If you're working with SolidWorks, OpenVault recognizes the full set of SolidWorks file types: part files, assemblies, drawings, macros, and simulation results. The same applies to other formats. The tool learns what you're building and configures itself accordingly.

Start with what you actually have

You don't need a six-figure PDM deployment to have good version control for engineering files. Git LFS is free. OpenVault is open source. Both are built on tools that have proven themselves millions of times over in the software world.

The discipline of committing with a message, branching to explore variants, and keeping an audit trail of who changed what and when is the same in engineering as in software. The only difference is acknowledging that binary files need special handling. Git LFS does that. OpenVault does it automatically.

If you're using Git for engineering data, you're probably using Git LFS already or wishing you had. If you're not using Git at all, it's worth knowing that the gap between "shared drive full of revisions" and "version control that understands your data" is much smaller than it used to be.

Start today: run pip install openvault and initialize a repository. You'll have your first commit with full LFS support in minutes, and every engineer on the team will have the same configuration. The shared drive full of _FINAL_revised files can stay in the past where it belongs.
