Git and normalization of line-endings



This content originally appeared on DEV Community and was authored by Kevin Shu

A few months ago, I spent hours trying to decide on the best way to deal with line endings and how to switch a repo to using .gitattributes. I just found those comprehensive notes again, and thought they'd be easier to find here than buried in my files...

TL;DR

  • line-ending normalization is about converting LF <=> CR+LF, for cross-platform compatibility
  • the .gitattributes file is the safest git mechanism to manage line-ending normalization
  • updating normalization settings is tricky because git may report changes on unmodified files, and it is totally not obvious what is happening
  • there are some tricks to help understand what is happening and to fix things

Line endings and Operating systems

When you press <Enter> in your text editor, invisible characters marking the new line are inserted into the file. They are most commonly encoded in one of two ways:

  • the ASCII character LF (Line Feed, 0x0A)
  • the ASCII character pair CR+LF (Carriage Return, 0x0D, followed by Line Feed)

Historically, many systems required CR+LF, a legacy of teletype machines where returning the carriage and advancing the paper were two separate operations. Unix, from its early days, used LF alone, which simplified things and saved disk space.

In practice, Windows is the only mainstream modern operating system that still uses CR+LF line endings.
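
To see the difference concretely, you can dump the raw bytes of the same two lines in both conventions. This is a small sketch using the standard printf, od and wc utilities:

```shell
# The same two lines, with LF only (Unix style) and with CR+LF (Windows style).
lf_dump=$(printf 'a\nb\n' | od -c)
crlf_dump=$(printf 'a\r\nb\r\n' | od -c)

# od -c prints CR as \r and LF as \n, which makes the extra CR bytes visible.
echo "$lf_dump"
echo "$crlf_dump"

# Byte counts: CR+LF costs one extra byte per line.
lf_bytes=$(printf 'a\nb\n' | wc -c | tr -d ' ')
crlf_bytes=$(printf 'a\r\nb\r\n' | wc -c | tr -d ' ')
echo "LF: $lf_bytes bytes, CR+LF: $crlf_bytes bytes"
```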

When developing in a team, you will end up with people working on Windows and other operating systems, and you will need to manage this difference in your source control system.

Git's core.autocrlf

This setting is defined via git config, either globally or per repository. When enabled, it normalizes all files that Git detects as text.

  1. Checkout Windows-style, commit Unix-style (core.autocrlf=true)

    • Git will convert LF to CRLF when checking out text files.
    • When committing text files, CRLF will be converted to LF.
    • This is the default value set by the Git for Windows installer
  2. Checkout as-is, commit Unix-style (core.autocrlf=input)

    • Git will not perform any conversion when checking out text files.
    • When committing text files, CRLF will be converted to LF.
    • Some people recommend using this when developing on Unix systems
  3. Checkout as-is, commit as-is (core.autocrlf=false)

    • Git will not perform any conversions when checking out or committing text files.
    • This is the default value if the setting is not defined.
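
As an illustration of core.autocrlf=input, the sketch below creates a throwaway repository (via mktemp, so your real repositories and global configuration stay untouched), stages a file saved with CR+LF, and inspects it with git ls-files --eol, the command explained further down. The file name notes.txt is just an example:

```shell
# Throwaway repository: nothing here touches your real repos or global config.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config core.autocrlf input     # repo-level; add --global to apply it everywhere

# A text file saved with Windows-style line endings.
printf 'hello\r\nworld\r\n' > notes.txt

# The CRLF -> LF conversion happens while the file goes into the index
# (Git may print a warning about it on stderr).
git add notes.txt

# Expected: i/lf (index converted) and w/crlf (working tree left as-is).
eol_report=$(git ls-files --eol notes.txt)
echo "$eol_report"
git config core.autocrlf           # prints the repo-level value: input
```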

Some people consider that line-ending normalization is not Git's responsibility, and it can be tempting to choose "checkout as-is, commit as-is" to disable it. But core.autocrlf cannot be committed to a repository: it depends on each developer's workstation settings, which makes it fragile.

Regardless of what is defined in people's local core.autocrlf setting, individual repository maintainers can override the behavior via the .gitattributes file, which is the most robust way to go.

.gitattributes

.gitattributes assigns attributes to paths, matched by patterns such as *.sh. The text and eol attributes control the end-of-line normalization process:

  • -text : disable normalization for this type of file. Should be used for any type of binary file.
  • text : normalizes this type of file using core.eol, which defaults to OS native (core.eol should not be touched in normal situations)
  • eol=lf : forces LF line endings in the working directory on all systems, regardless of OS
  • eol=crlf : forces CRLF line endings in the working directory on all systems, regardless of OS

  • global wildcards

    • * text=auto
      • lets git detect file type and apply normalization accordingly
      • behaves like core.autocrlf=true on Windows (and like core.autocrlf=input elsewhere), except that it is committed with the repository
    • * -text
      • people have tried to use this to emulate "checkout as-is, commit as-is"
      • but with mixed success
    • you can always use these wildcards in addition to more specific overrides

Beware: line-ending normalization must never be enabled on binary files, because rewriting bytes that happen to be CR or LF would corrupt them.
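
Here is what such a file could look like, combining a global wildcard with a few overrides; the patterns and file names below are hypothetical examples. The sketch writes it into a throwaway repository and asks git check-attr which text/eol values Git resolves for each path:

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# Hypothetical .gitattributes. When several lines match a path,
# the later line wins for the attributes it sets.
cat > .gitattributes <<'EOF'
* text=auto
*.sh text eol=lf
*.bat text eol=crlf
*.xlsx -text
EOF

# check-attr only evaluates the patterns, so these paths need not exist.
attrs=$(git check-attr text eol -- build.sh run.bat report.xlsx README.md)
echo "$attrs"
```

For these paths, check-attr should report text: set and eol: lf for build.sh, text: unset for report.xlsx (no normalization, as required for binaries), and text: auto for README.md, which only matches the global wildcard.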

Updating the normalization settings

If you change the normalization settings (either core.autocrlf or .gitattributes), you will have some work to do on your local repository, on your remote repository, and on your colleagues' workstations.

You can also just leave things as they are, but you then expose yourself to weird Git behaviors, such as untouched files being reported as changed.

Your first reflex may be to look at the line endings in your code editor and to try the various git commands you find online, but this can very quickly become confusing.

View the difference between Index and Workspace

You will find plenty of "solutions" and tutorials on Stack Overflow and other websites telling you what to do, but they tend to miss some edge cases.

Git normalization does not happen in the workspace, but during the transition into or out of the index, so you need a way to view line-endings in both the index and the workspace before acting.

Here is the command that shows what is actually happening, which most tutorials don't mention:

git ls-files --eol

It produces output like this:

i/lf    w/crlf  attr/                   Applications/K8S/versions.tf
i/lf    w/lf    attr/text eol=lf        .gitignore
i/-text w/-text attr/                   Services/SMB/hosts-2022-10-20.xlsx
i/lf    w/lf    attr/                   .gitattributes
i/crlf  w/crlf  attr/                   Applications/K8S/ci/backend.tfvars
i/lf    w/crlf  attr/text eol=lf        Legacy/Modules/Keyvault/.gitignore

  • i/ tells you how the file is stored in the index
  • w/ tells you how the file is presented in the workspace
  • attr/ tells you which text/eol attributes the .gitattributes file(s) assign to this file

For a typical Windows developer using the core.autocrlf=true option (the default set by the Git for Windows installer), you should mostly see a mix of the first three types:

  • i/lf w/crlf attr/ : the file is normalized by git and uses Windows standard line-endings crlf
  • i/lf w/lf attr/text eol=lf : the file is normalized by git and enforced to use lf
  • i/-text w/-text attr/ : the file is autodetected as a binary and not normalized by git
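
To get familiar with the columns, you can reproduce two of these states in a scratch repository. In this sketch, versions.tf and data.bin are hypothetical files; data.bin contains NUL bytes, which is what makes Git's auto-detection classify it as binary:

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config core.autocrlf false

printf '*.tf text eol=lf\n' > .gitattributes
printf 'terraform {}\n' > versions.tf     # text file with LF endings
printf '\000\001\002' > data.bin          # NUL bytes: detected as binary
git add .

# Each line: i/<index> w/<working tree> attr/<attributes>, then a TAB and the path.
report=$(git ls-files --eol)
echo "$report"

# The path is the second TAB-separated field (TAB is cut's default delimiter).
paths=$(printf '%s\n' "$report" | cut -f2)
echo "$paths"
```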

If you see any other combination, it may be because you or someone else changed the .gitattributes file or the core.autocrlf option.

Repairing i/crlf w/crlf attr/

This file was most probably pushed by someone using core.autocrlf=false and working on Windows. Such files typically make Git report changes on files nobody touched.
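
The sketch below reproduces that situation in a throwaway repository; the file name and the commit identity are made up for the example:

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config core.autocrlf false                 # the committer's assumed setting
git config user.name demo
git config user.email demo@example.com         # hypothetical identity

# A file saved by a Windows editor, with CR+LF line endings.
printf 'region = "west"\r\n' > backend.tfvars
git add backend.tfvars
git commit -qm "Add backend.tfvars"

# No conversion happened, so CRLF is stored in the index (and in history).
report=$(git ls-files --eol backend.tfvars)
echo "$report"                                 # expected: i/crlf w/crlf attr/
```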

Fix strategies:

  • in any case,

    • make sure you have a clean repo before acting
    • communicate about this change, because people will run into real conflicts
  • option 1: make a commit that fixes all the files in your repo with git add --renormalize .

    • downside: it pollutes git blame, since every normalized line now appears to come from this commit
    • you can make blame more tolerant with its -w option (ignore whitespace)
    • in practice, git GUIs will happily work around this
    • if necessary, you can also use the blame options --ignore-rev and --ignore-revs-file
  • option 2: rewrite your history

    • downside: rewriting history is hard
    • you need to synchronize all committers
    • it is almost impossible in open-source projects

https://www.ofcodeandcolor.com/2013/08/29/normalizing-line-endings-in-git-repositories/
https://www.moxio.com/blog/43/ignoring-bulk-change-commits-with-git-blame
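
Option 1 could look like the sketch below, run on a throwaway repository that starts in the i/crlf state. The .git-blame-ignore-revs file name is a common convention rather than something Git requires, the blame.ignoreRevsFile setting needs Git 2.23 or later, and the file name and identity are hypothetical:

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config core.autocrlf false
git config user.name demo
git config user.email demo@example.com         # hypothetical identity

# Starting point: a CRLF file committed without normalization (i/crlf).
printf 'region = "west"\r\n' > backend.tfvars
git add backend.tfvars
git commit -qm "Add backend.tfvars"

# 1. Declare the policy in a committed .gitattributes file.
printf '* text=auto\n' > .gitattributes
git add .gitattributes

# 2. Re-run the clean (CRLF -> LF) conversion on every tracked file.
git add --renormalize .
git commit -qm "Normalize line endings"

# 3. Let git blame skip the bulk commit. Commit this file too, so that
#    colleagues can point their own blame.ignoreRevsFile at it.
git rev-parse HEAD > .git-blame-ignore-revs
git config blame.ignoreRevsFile .git-blame-ignore-revs

report=$(git ls-files --eol backend.tfvars)
echo "$report"                                 # expected: i/lf in the index column
```

For a one-off invocation, git blame --ignore-revs-file .git-blame-ignore-revs <file> has the same effect without changing the configuration.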

Repairing i/lf w/crlf attr/text eol=lf

In this case, the index is ok, but the workspace is "broken".

This file was probably checked out before the eol=lf attribute was specified.
Git will not bother you with this, but your code editor or tool may complain if it requires LF line endings for some file types.

Examples:

  • Visual Studio may complain or introduce inconsistent line endings if .csproj files have "wrong" line endings
  • Terraform will complain if *.lock.hcl files have wrong line endings

The fix: delete the local file and check it out again.

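Bulk fixing: pipe the output of the command below to xargs rm (this breaks on paths containing spaces), then restore the deleted files from the index with git checkout -- . (with all the precautions needed!!! it also discards any other unstaged change)

git ls-files --eol | grep -E '^i/lf +w/crlf +attr/text(=auto)? eol=lf' | cut -f2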
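
Here is a sketch of the whole sequence in a throwaway repository. To produce a stale CRLF checkout on any OS, it temporarily relies on core.autocrlf=true (which converts to CRLF on checkout) and only adds the eol=lf attribute afterwards. Instead of xargs rm, it reads the paths one per line so that names containing spaces survive; example.lock.hcl and the identity are hypothetical:

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config core.autocrlf true            # checkout converts LF -> CRLF, even on Linux/macOS
git config user.name demo
git config user.email demo@example.com   # hypothetical identity

# Commit an LF file, then check it out again: the working copy now has CRLF.
printf 'provider "x" {}\n' > example.lock.hcl
git add example.lock.hcl
git commit -qm "Add lock file"
rm example.lock.hcl
git checkout -- example.lock.hcl

# The eol=lf rule arrives after the checkout: state i/lf w/crlf attr/text eol=lf.
printf '*.lock.hcl text eol=lf\n' > .gitattributes
before=$(git ls-files --eol example.lock.hcl)
echo "$before"

# Bulk fix: collect the affected paths, delete them, and restore them
# from the index, this time honoring eol=lf.
stale=$(git ls-files --eol | grep -E '^i/lf +w/crlf +attr/text(=auto)? eol=lf' | cut -f2)
printf '%s\n' "$stale" | while IFS= read -r path; do
  [ -n "$path" ] || continue
  rm -- "$path" && git checkout -- "$path"
done

after=$(git ls-files --eol example.lock.hcl)
echo "$after"                            # expected: i/lf w/lf attr/text eol=lf
```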

Repairing i/lf w/lf attr/text eol=crlf

In this case, the index is ok, but the workspace is "broken".

This time, it may be a problem if your tooling or IDE requires CRLF line endings.

The fix: same as for the other "broken" workspace situation, adapting the grep pattern to w/lf and attr/text eol=crlf.
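
The same delete-and-checkout repair, sketched in a throwaway repository for this opposite case; build.bat and the commit identity are hypothetical:

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config core.autocrlf false
git config user.name demo
git config user.email demo@example.com     # hypothetical identity

# A batch file committed and checked out with LF, before any attribute existed.
printf '@echo off\n' > build.bat
git add build.bat
git commit -qm "Add build.bat"

# The eol=crlf rule arrives later: state i/lf w/lf attr/text eol=crlf.
printf '*.bat text eol=crlf\n' > .gitattributes
before=$(git ls-files --eol build.bat)
echo "$before"

# Delete the affected files and restore them from the index with CRLF.
stale=$(git ls-files --eol | grep -E '^i/lf +w/lf +attr/text(=auto)? eol=crlf' | cut -f2)
printf '%s\n' "$stale" | while IFS= read -r path; do
  [ -n "$path" ] || continue
  rm -- "$path" && git checkout -- "$path"
done

after=$(git ls-files --eol build.bat)
echo "$after"                              # expected: i/lf w/crlf attr/text eol=crlf
```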




APA

Kevin Shu | Sciencx (2023-04-28T20:12:58+00:00) Git and normalization of line-endings. Retrieved from https://www.scien.cx/2023/04/28/git-and-normalization-of-line-endings/

