Subdirectory text search

    In an administrator's work, questions come up whose solution keeps getting postponed because they seem insignificant, and then sometimes an answer turns up unexpectedly. I want to share one such question with a simple answer (the files come from Windows, the solution is on Linux, so the article leans toward Linux).

    The question was: walk through all the text files in subdirectories and print the lines that match a regular expression. (Clearly, neither Explorer nor a Windows Commander-style file manager will help here.)

    Circumstances:
    There are a lot of logs in text files. The logs mostly contain dumps of registry branches for Firefox, Flash Player, Office, etc. in JSON format. The scripts that produce them were written in JavaScript + WMI and placed in Active Directory to run at computer startup and user logon. Here are some of the registry keys that were of primary interest:

    HKLM\Software\Macromedia\FlashPlayer
    HKLM\Software\Macromedia\FlashPlayerActiveX
    HKLM\Software\Macromedia\FlashPlayerPlugin
    HKLM\Software\Microsoft\Windows\CurrentVersion\Uninstall
    HKLM\Software\Mozilla.org
    HKLM\Software\Mozilla
    HKLM\Software\MozillaPlugins


    Logs were written to text files named according to the scheme \\serverlogs\logs$\[date]\[computer name]\[path to the registry hive without forbidden special characters].txt. An example of such a file name is "\\serverlogs\logs$\regToFile.ANSI\2011-09-13\regToFile-[12-143057][2011-09-03]\[HKCU][SOFTWARE][Macromedia][FlashPlayer].txt". An example of its contents:

    [
     {"path": "HKLM\\SOFTWARE\\Macromedia\\FlashPlayer", "type": "folder"},
     {"path": "HKLM\\SOFTWARE\\Macromedia\\FlashPlayer", "type": "REG_SZ", "name": "CurrentVersion", "value": "9,0,45,0"},
     {"path": "HKLM\\SOFTWARE\\Macromedia\\FlashPlayer\\SafeVersions", "type": "folder"},
     {"path": "HKLM\\SOFTWARE\\Macromedia\\FlashPlayer\\SafeVersions", "type": "REG_DWORD", "name": "6.0", "value": 88},
     {"path": "HKLM\\SOFTWARE\\Macromedia\\FlashPlayer\\SafeVersions", "type": "REG_DWORD", "name": "7.0", "value": 65},
     {"path": "HKLM\\SOFTWARE\\Macromedia\\FlashPlayer\\SafeVersions", "type": "REG_DWORD", "name": "8.0", "value": 33},
     {"path": "HKLM\\SOFTWARE\\Macromedia\\FlashPlayer\\SafeVersions", "type": "REG_DWORD", "name": "9.0", "value": 45}
    ]
    


    There are more than a hundred machines in the domain, and the number of files grew rapidly. With such a set of logs, I sometimes want to get an on-the-fly picture of the contents of the files, in approximately the following form:



    But it turned out that when the log files are scattered across subdirectories, the Windows find command cannot be run over them: it does not descend into subdirectories. So I mounted the network directory with the logs in Ubuntu ( sudo mount -t cifs -o user=,password=,iocharset=utf8 //serverlogs/logs$/ /media/serverlogs/ ).

    The first attempt on Linux failed as well: the find command there does not solve the problem on its own either, since it looks for files rather than for text inside them. But Linux is good because its console is admin-oriented, even if the interface is not friendly at all. The man page says that find has an -exec option, and it is simply a super option. It would seem that all that remains is to put grep into -exec and we get the cherished result... But a small disappointment awaits here! The log files were written in UNICODE, which on Windows means UTF-16 (maybe my architectural mistake?), and grep flatly refuses to understand UNICODE (though it does understand UTF-8).

    Developing the thought further: there is the iconv command, which can convert encodings on the fly, and that is exactly what is needed here. Add a pipeline and we get a command like this:

    time find /media/serverlogs/regToFile.ANSI/ -name "*.txt" -exec iconv -f UNICODE -t UTF-8 {} \; | grep 'Macromedia\\\\FlashPlayer.*CurrentVersion'

    A little explanation:
    [ time ] - prints the time the command took to run.
    [ find /media/serverlogs/regToFile.ANSI/ -name "*.txt" ] - lists all *.txt files found under /media/serverlogs/regToFile.ANSI/, including its subdirectories.
    [ -exec iconv -f UNICODE -t UTF-8 {} \; ] - converts the contents of each found file (one at a time) from UNICODE encoding to UTF-8.
    [ | grep 'Macromedia\\\\FlashPlayer.*CurrentVersion' ] - selects from the converted text the lines that match the regular expression Macromedia\\\\FlashPlayer.*CurrentVersion (the four backslashes match the two literal backslashes that the JSON escaping puts in the file).
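
    One limitation of the pipeline above is that grep sees a single merged stream, so the output does not say which file a line came from. A minimal sketch of one way around this is to run iconv and grep per file inside sh -c and label each match with the file name. The function name search_utf16_logs is hypothetical, GNU grep is assumed for --label, and the encoding is named UTF-16 here on the assumption that this is what the article's UNICODE files contain:

```shell
# search_utf16_logs DIR PATTERN
# Hypothetical helper: print "file:line" for every line matching PATTERN
# in the UTF-16 encoded *.txt files under DIR.
search_utf16_logs() {
    local dir=$1 pattern=$2
    # Inside sh -c: $1 is the pattern, $2 is the file handed over by find.
    # grep reads stdin, so --label supplies the name that -H prints.
    find "$dir" -type f -name '*.txt' -exec sh -c '
        iconv -f UTF-16 -t UTF-8 "$2" | grep -H --label="$2" -e "$1"
    ' sh "$pattern" {} \;
}
```

    It could be called as search_utf16_logs /media/serverlogs/regToFile.ANSI/ 'Macromedia\\\\FlashPlayer.*CurrentVersion'. Starting one shell per file is slower than the single pipeline, which is the price for keeping the file names.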

    The desired result has been achieved and looks like the picture above. I doubt I am the only one who has run into this problem. If this is useful to someone, I will be glad.

    PS
    After going through the comments, man grep (the -r option) and the help for the OpenAsTextStream() method used with "Scripting.FileSystemObject", I came to the conclusion that the problem was originally "hidden" in that OpenAsTextStream() call. It has a format parameter: -1 opens the file in UNICODE mode, and 0 opens it in ASCII mode (single-byte output, which for the plain ASCII content of these logs is also valid UTF-8). I had -1, and that was the root of the problem. After I set it to 0, grep -r (on Linux) and findstr (on Windows) started working. It is strange, of course, that they do not understand UNICODE. And if I want to do something with each found line before printing it, I will still use find -exec.
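
    To tell which kind of file you are dealing with before choosing between plain grep -r and the iconv route, one option is to look at the first two bytes. The sketch below assumes that files written in UNICODE mode start with a UTF-16 byte order mark (FF FE or FE FF); the function name has_utf16_bom is hypothetical:

```shell
# has_utf16_bom FILE
# Hypothetical helper: succeed (exit 0) if FILE starts with a UTF-16
# byte order mark, fail otherwise.
has_utf16_bom() {
    local bom
    # Dump the first two bytes as hex and strip od's spacing.
    bom=$(head -c 2 -- "$1" | od -An -tx1 | tr -d ' \n')
    [ "$bom" = "fffe" ] || [ "$bom" = "feff" ]
}
```

    A file without a byte order mark is not necessarily ASCII or UTF-8, so a negative answer only means that this particular check found nothing.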

    To display the found lines:

    JavaScript -> "Scripting.FileSystemObject": file.OpenAsTextStream(ForAppending, TristateFalse); (TristateFalse, so that the file is not written as UNICODE!)
    Windows:
    cd
    findstr /s "text" *.txt

    Linux:
    grep -r "text"
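
    The findstr line above is limited to *.txt files, while a bare grep -r looks at every file. A minimal sketch of a closer Linux equivalent, assuming GNU grep (for --include) and a hypothetical function name search_txt_tree:

```shell
# search_txt_tree DIR PATTERN
# Hypothetical helper: recursive search like "findstr /s PATTERN *.txt",
# limited to *.txt files under DIR.
search_txt_tree() {
    local dir=$1 pattern=$2
    # --include filters by file name; -e protects patterns starting with "-".
    grep -r --include='*.txt' -e "$pattern" "$dir"
}
```

    This only works on files that grep can read directly, that is, after the logs are no longer written in UNICODE.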

    Continuing the search topic, I converted the log files from UNICODE to UTF-8 (in the Linux bash console):

    time find /media/serverlogs/ -name "*.txt" -exec iconv -f UNICODE -t UTF-8 {} -o {}.utf8 \; -exec echo {} \;

    Note that -exec has to be used twice in order to print the name of each converted file to the console. Combining commands with && inside a single -exec will not work: -exec accepts only one command and runs it directly, without a shell.
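
    If the two steps really have to be chained with &&, one possible workaround is to make that single command a shell and pass the chain to it as a script. This is only a sketch: the function name convert_logs_to_utf8 is hypothetical, the encoding is named UTF-16 on the assumption that this is what the UNICODE files contain, and output redirection is used in place of iconv's -o option:

```shell
# convert_logs_to_utf8 DIR
# Hypothetical helper: write FILE.utf8 next to every UTF-16 *.txt file
# under DIR and print the name of each file that converted cleanly.
convert_logs_to_utf8() {
    # Inside sh -c: $1 is the file handed over by find. The && runs in the
    # inner shell, so the name is printed only when iconv succeeded.
    find "$1" -type f -name '*.txt' -exec sh -c '
        iconv -f UTF-16 -t UTF-8 "$1" > "$1.utf8" && printf "%s\n" "$1"
    ' sh {} \;
}
```

    Unlike the two -exec version, a file that fails to convert is not echoed here, although an empty or partial .utf8 file may still be left behind for it.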
