- Linux Shell Scripting Cookbook(Third Edition)
- Clif Flynt Sarath Lakshman Shantanu Tushar
- 2021-07-09 19:46:22
Specifying a filename prefix for the split files
All the previous split filenames start with x. If we are splitting more than one file, we'll want to name the pieces, so it's obvious which goes with which. We can use our own filename prefix by providing a prefix as the last argument.
Let's run the previous command with the split_file prefix:
    $ split -b 10k data.file -d -a 4 split_file
    $ ls
    data.file       split_file0002  split_file0005  split_file0008  strtok.c
    split_file0000  split_file0003  split_file0006  split_file0009
    split_file0001  split_file0004  split_file0007
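When several files are split in the same directory, one option is to derive each prefix from the name of the file being split. The following is a minimal sketch, assuming GNU split; access.log and error.log are hypothetical sample files created only for this illustration.

```shell
#!/bin/bash
# Sketch: split several files, giving each set of pieces its own prefix.
# access.log and error.log are hypothetical files generated below.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Create two sample files of 25 KiB and 12 KiB.
head -c 25600 /dev/zero > access.log
head -c 12288 /dev/zero > error.log

for f in access.log error.log; do
    # Pieces are named <file>.part0000, <file>.part0001, and so on.
    split -b 10k -d -a 4 "$f" "$f.part"
done

ls
```

Because each prefix contains the original filename, the 25 KiB file yields three pieces and the 12 KiB file yields two, and it stays obvious which piece belongs to which file.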
To split files based on the number of lines in each piece rather than the chunk size, use the -l no_of_lines option:
    # Split into files of 10 lines each.
    $ split -l 10 data.file
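To see what a line-based split produces, the following sketch splits a generated 25-line file and checks that the pieces add up to the original. The lines_ prefix and the generated data.file are assumptions made for this illustration.

```shell
#!/bin/bash
# Sketch: split a 25-line file into 10-line pieces and check the result.
set -e
workdir=$(mktemp -d)
cd "$workdir"

seq 1 25 > data.file            # 25 lines containing 1, 2, ..., 25

# Numeric two-digit suffixes with the illustrative prefix lines_.
split -l 10 -d -a 2 data.file lines_

wc -l lines_*                   # line count of every piece

# Concatenating the pieces in name order should reproduce the original.
cat lines_* | cmp - data.file && echo "pieces match data.file"
```

The last piece holds only the five remaining lines, since 25 is not a multiple of 10.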
The csplit utility splits files based on context instead of size. It can split at a given line number or at lines that match a regular expression pattern. It's particularly useful for splitting log files.
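Before turning to patterns, here is a minimal sketch of the line-number form. Each numeric argument names the line at which a new piece begins. The part_ prefix and the generated data.file are assumptions made for this illustration.

```shell
#!/bin/bash
# Sketch: split a file at fixed line numbers with csplit.
set -e
workdir=$(mktemp -d)
cd "$workdir"

seq 1 25 > data.file            # 25 lines containing 1, 2, ..., 25

# New pieces start at line 11 and at line 21.
# -s suppresses the size report, -f sets the filename prefix.
csplit -s -f part_ data.file 11 21

wc -l part_*
```

This should leave lines 1 to 10 in part_00, lines 11 to 20 in part_01, and the remainder in part_02.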
Look at the following example log:
    $ cat server.log
    SERVER-1
    [connection] 192.168.0.1 success
    [connection] 192.168.0.2 failed
    [disconnect] 192.168.0.3 pending
    [connection] 192.168.0.4 success
    SERVER-2
    [connection] 192.168.0.1 failed
    [connection] 192.168.0.2 failed
    [disconnect] 192.168.0.3 success
    [connection] 192.168.0.4 failed
    SERVER-3
    [connection] 192.168.0.1 pending
    [connection] 192.168.0.2 pending
    [disconnect] 192.168.0.3 pending
    [connection] 192.168.0.4 failed
We may need to split this file into server01.log, server02.log, and server03.log, with each file holding the entries for one SERVER section. This can be done as follows:
    $ csplit server.log /SERVER/ -n 2 -s {*} -f server -b "%02d.log"
    $ rm server00.log
    $ ls
    server01.log server02.log server03.log server.log
The details of the command are as follows:
- /SERVER/: This is the pattern that identifies the lines at which the file is split.
- /[REGEX]/: This is the general form of the pattern. csplit copies from the current line (initially the first line) up to, but not including, the next line that matches the regular expression, here a line that contains SERVER.
- {*}: This repeats the split at every match until the end of the file. To limit the number of repetitions, place a number between the curly braces instead of *.
- -s: This is the flag that makes the command silent. Without it, csplit prints the size in bytes of each file it creates.
- -n: This specifies the number of digits to be used in the suffix, for example, 01, 02, 03, and so on.
- -f: This specifies the filename prefix for split files (server is the prefix in the previous example).
- -b: This specifies the suffix format. "%02d.log" is similar to the printf argument format in C. Here, the filename = prefix + suffix, that is, "server" + "%02d.log".
We remove server00.log since the first split file is an empty file (the match word is the first line of the file).
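The whole recipe can be reproduced with the following sketch, which assumes GNU csplit (the {*} repeat count is a GNU extension) and uses an abbreviated copy of the log with two entries per server. It checks that server00.log really is empty before deleting it, and quotes the pattern and the repeat count so the shell passes them to csplit unchanged.

```shell
#!/bin/bash
# Sketch: split a log at every SERVER line and drop the empty first piece.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Abbreviated copy of the example log: two entries per server.
cat > server.log <<'EOF'
SERVER-1
[connection] 192.168.0.1 success
[connection] 192.168.0.2 failed
SERVER-2
[connection] 192.168.0.1 failed
[disconnect] 192.168.0.3 success
SERVER-3
[connection] 192.168.0.1 pending
[connection] 192.168.0.4 failed
EOF

csplit -s -n 2 -f server -b "%02d.log" server.log '/SERVER/' '{*}'

# The piece before the first match is empty because the match is on line 1.
if [ ! -s server00.log ]; then
    rm server00.log
fi

# Show the first line of each remaining piece.
for f in server0?.log; do
    echo "$f: $(head -n 1 "$f")"
done
```

Testing with [ -s ] before the rm guards against deleting real data when the log does not begin with a SERVER line.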