One-liner: Remove first two characters of every line in thousands of files


This content originally appeared on DEV Community and was authored by Gábor Szabó

I have a project creating a Ladino dictionary, in which I have a few thousand YAML files. They used to include lists of values, but a while ago I split them up into individual entries. I did this because the people who are editing them are not used to YAML files, and it makes it a lot easier to explain to them what to do.

However, the previous change left me with 1-item lists in each file. I wanted to clean that up.

Example files

Here are a few example files that were also reduced in size for this demo.

- ladino: kaza
- ladino: komer
  inglez: to eat
- ladino: biervo
  inglez: word
# some comment

As you can see each one has an entry for a Ladino expression. Some of the files have translations to English. Other files in the real data-set had further translations to Hebrew, Turkish, French, Portuguese, and Spanish.

Some files had comments.

The dash on the first row and the indentation on the following rows are left over from the time when each file held more than one entry.

So I wanted to get rid of the first two characters of every line, except when the line starts with a hash-mark (#).

Here is the Perl one-liner to do so.

perl -p -i -e 's/^[^#].//' *.yaml
  • The '*.yaml' at the end is a shell expression that will list all the YAML files in the current directory as the parameters of this command.
  • The -p tells perl to read the content of each file line-by-line and print it.
  • The -i tells perl to replace the original files with the content that was printed.
  • The -e tells perl that the following string is a perl program and not the name of the file where the perl program is.
  • The perl program 's/^[^#].//' will be executed on every line read from the files.
  • The 's///' is regex substitution. It works on the current line and changes the current line. So the lines that are saved back to the files are the modified lines.
  • Between the 1st and 2nd slash is the regex.
  • The first ^ means the match must start at the beginning of the line.
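  • The [^#] means that there must be a character that is not #. This will match any character in the first position of the line except #.
  • The . means match any character except a newline, so a line needs at least two characters before its newline for the regex to match.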
  • The string that is between the 2nd and 3rd slash is the replacement. It is an empty string so if there is a match it will be replaced by the empty string.

That's the whole thing.

Improvement

Now that I am explaining it, it occurred to me that this would be a safer solution:

perl -p -i -e 's/^[- ] //' *.yaml

Here the substitution is s/^[- ] // which means the first character must be either a dash or a space and the second character must be a space, and those two are replaced by the empty string.
So if the first two characters are anything else, the line will not be changed. This is safer because it is more specific about what we would like to match for replacement.

Results

For this article I saved the resulting files in a separate place:

ladino: kaza
ladino: komer
inglez: to eat
ladino: biervo
inglez: word
# some comment

