Specifying a filename prefix for the split files

All the previous split filenames start with x. If we are splitting more than one file, we'll want to name the pieces so that it's obvious which pieces belong to which file. We can supply our own filename prefix as the last argument.

Let's run the previous command with the split_file prefix:

$ split -b 10k data.file -d -a 4 split_file
$ ls
data.file       split_file0002  split_file0005  split_file0008  strtok.c
split_file0000  split_file0003  split_file0006  split_file0009
split_file0001  split_file0004  split_file0007
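
As a rough illustration of why a per-file prefix helps, the following sketch splits two files in a loop and derives each prefix from the source filename. The file names (other.file, the .part prefix) and sizes are made up for the example, and the -d option assumes GNU split.

```shell
#!/bin/sh
# Sketch only: file names and sizes below are hypothetical.
workdir=$(mktemp -d)

# Create two sample files filled with zero bytes.
head -c 25600 /dev/zero > "$workdir/data.file"
head -c 12000 /dev/zero > "$workdir/other.file"

# Use "<source name>.part" as the prefix so each piece names its origin.
# -d gives numeric suffixes (GNU split), -a 4 makes them four digits long.
for f in "$workdir/data.file" "$workdir/other.file"; do
    split -b 10k -d -a 4 "$f" "$f.part"
done

# data.file yields three pieces and other.file yields two.
ls "$workdir"

# Concatenating the pieces in order should reproduce the original.
cat "$workdir"/data.file.part* | cmp - "$workdir/data.file" && echo "data.file: pieces match"
```

Because the suffixes are zero-padded, the shell glob expands the pieces in the right order for reassembly with cat.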

To split a file by the number of lines in each piece rather than by chunk size, use the -l no_of_lines option:

 # Split into files of 10 lines each.
 $ split -l 10 data.file
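
To see what line-based splitting produces, here is a minimal sketch that builds a 25-line sample file and checks the line count of each piece. The lines_ prefix and the sample data are assumptions chosen for the example.

```shell
#!/bin/sh
# Sketch only: sample data and the "lines_" prefix are hypothetical.
workdir=$(mktemp -d)

# A 25-line file containing the numbers 1 to 25, one per line.
seq 1 25 > "$workdir/data.file"

# Split into pieces of 10 lines each; the last piece holds the remainder.
split -l 10 "$workdir/data.file" "$workdir/lines_"

# Show the line count of every piece (two of 10 lines, one of 5).
wc -l "$workdir"/lines_*
```

With the default alphabetic suffixes, the pieces are named lines_aa, lines_ab, and lines_ac.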

The csplit utility splits files based on context instead of size. It can split at a given line number or at lines that match a regular expression. It's particularly useful for splitting log files.
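
Before turning to patterns, a short sketch of the line-number form may help. It assumes a made-up 15-line file and a hypothetical part prefix; each numeric argument tells csplit to start a new piece at that line.

```shell
#!/bin/sh
# Sketch only: the sample file and the "part" prefix are hypothetical.
workdir=$(mktemp -d)

# A 15-line file containing the numbers 1 to 15, one per line.
seq 1 15 > "$workdir/numbers.txt"

# Start new pieces at line 6 and at line 11:
#   part00 holds lines 1-5, part01 holds lines 6-10, part02 holds lines 11-15.
csplit -s -f "$workdir/part" "$workdir/numbers.txt" 6 11

wc -l "$workdir"/part*
```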

Look at the following example log:

$ cat server.log
SERVER-1 
[connection] 192.168.0.1 success 
[connection] 192.168.0.2 failed 
[disconnect] 192.168.0.3 pending 
[connection] 192.168.0.4 success 
SERVER-2 
[connection] 192.168.0.1 failed 
[connection] 192.168.0.2 failed 
[disconnect] 192.168.0.3 success 
[connection] 192.168.0.4 failed 
SERVER-3 
[connection] 192.168.0.1 pending 
[connection] 192.168.0.2 pending 
[disconnect] 192.168.0.3 pending 
[connection] 192.168.0.4 failed

We may need to split this file into server01.log, server02.log, and server03.log, each holding the contents of one SERVER section. This can be done as follows:

$ csplit server.log /SERVER/ -n 2 -s {*}  -f server -b "%02d.log"
$ rm server00.log
$ ls
server01.log  server02.log  server03.log  server.log

The details of the command are as follows:

  • /SERVER/: This is the pattern that identifies the lines at which the file is split.
  • /[REGEX]/: This is the general format. csplit copies from the current line (initially the first line) up to, but not including, the next line that matches the pattern (here, a line that contains SERVER).
  • {*}: This specifies repeating a split based on the match up to the end of the file. We can specify the number of times it is to be continued by placing a number between the curly braces.
  • -s: This flag makes the command silent; without it, csplit prints the size in bytes of each output file.
  • -n: This specifies the number of digits to be used in the suffix (01, 02, 03, and so on).
  • -f: This specifies the filename prefix for split files (server is the prefix in the previous example).
  • -b: This specifies the suffix format. "%02d.log" is similar to the printf argument format in C. Here, the filename = prefix + suffix, that is, "server" + "%02d.log".

We remove server00.log since the first split file is empty (the pattern matches the first line of the file, so nothing precedes it).
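
The whole procedure can be tried end to end with a sketch like the one below. It uses an abbreviated, made-up copy of the log (two entries per server) in a temporary directory, and it quotes '{*}' so that the shell never treats it as a glob pattern. The -f prefix is given with a directory path so that the pieces land next to the sample log.

```shell
#!/bin/sh
# Sketch only: an abbreviated, hypothetical version of server.log.
workdir=$(mktemp -d)

cat > "$workdir/server.log" <<'EOF'
SERVER-1
[connection] 192.168.0.1 success
[connection] 192.168.0.2 failed
SERVER-2
[connection] 192.168.0.1 failed
[disconnect] 192.168.0.3 success
SERVER-3
[connection] 192.168.0.1 pending
[connection] 192.168.0.4 failed
EOF

# Split before every line that contains SERVER, repeating to the end of the file.
csplit -s -n 2 -f "$workdir/server" -b "%02d.log" \
    "$workdir/server.log" /SERVER/ '{*}'

# The first piece is empty because the pattern matches line 1; remove it.
rm -f "$workdir/server00.log"

# Print the first line of each remaining piece to confirm the grouping.
for f in "$workdir"/server0?.log; do
    printf '%s: %s\n' "$(basename "$f")" "$(head -n 1 "$f")"
done
```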