Sunday 21 April 2013

Binary search in text files by timestamp or id


Sometimes I have to deal with huge log files (>5GB). Log files are usually gzipped, so grepping them with zgrep takes time. That is probably OK when I know the exact pattern to search for. But quite often I just want to check the log around a particular time. zgrep does not help much here, and zless is extremely slow for such a task as well.

When the file was plain text, I estimated the offset of the lines I was looking for and used dd to jump there quickly. Something like this:

$ dd bs=1024 skip=3000000 if=huge_log.txt | less

I also tried the same approach for a gzipped file:

$ zcat huge_log.txt.gz | dd bs=1024 skip=3000000 | less

But it is still inconvenient. The estimate is usually wrong, so I had to rerun the command several times.

The obvious thing that general-purpose utilities do not take into account is that the log file is sorted by timestamp. A log file usually looks like the following:

2013/04/21 20:59:22.234: ConfigParser: reading parameter file_name
2013/04/21 20:59:22.235: ConfigParser: loading extra values

I searched Google and found a lot of questions like "grep between date ranges in a log".
But all the answers were about using general-purpose utilities or writing a small, specific Perl script.

So I decided to write my own.
Here is how to use it for the given log file structure:

$ bsearch -p '^$[YYYY/MM/DD hh:mm:ss.nnn]' -t '2013/04/21 20:59:22.235' -t '2013/04/21 21:20:00.000' huge_log.txt.gz

In the -p argument I wrote a regular expression with a small enhancement: the characters Y, M, D, h, m, s, n inside $[ ] brackets (with the dollar sign) specify the year, month and so on.
The -t argument is the search string. When -t is used twice, the two values are treated as the begin and end lines.
bsearch will quickly find and print all lines between 20:59 and 21:20.
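
The idea behind such a search can be sketched in a few lines of Python. This is not the bsearch source, only an illustration. It assumes an uncompressed file in which every line starts with a fixed-width timestamp that sorts correctly as plain text. The function names are made up for this example, and how bsearch handles .gz files is not shown here.

```python
import io

KEY_LEN = len("2013/04/21 20:59:22.234")  # assumed fixed-width timestamp prefix


def next_line_start(f, pos):
    """Return the offset of the first line that starts at or after pos."""
    if pos == 0:
        return 0
    f.seek(pos - 1)
    f.readline()          # consume up to and including the next newline
    return f.tell()


def first_line_at_or_after(f, size, target):
    """Binary-search byte offsets for the first line whose key is >= target."""
    lo, hi = 0, size
    while lo < hi:
        mid = (lo + hi) // 2
        f.seek(next_line_start(f, mid))
        line = f.readline()
        if not line or line[:KEY_LEN] >= target:
            hi = mid      # the answer is at mid or earlier
        else:
            lo = mid + 1  # this line is too early, look further right
    return next_line_start(f, lo)


def lines_between(f, size, begin, end):
    """Yield every line whose key lies in the range [begin, end]."""
    f.seek(first_line_at_or_after(f, size, begin))
    for line in iter(f.readline, b""):
        if line[:KEY_LEN] > end:
            break
        yield line


if __name__ == "__main__":
    # Tiny in-memory "log file"; a real one would be opened with open(path, "rb").
    log = (b"2013/04/21 20:59:22.234: ConfigParser: reading parameter file_name\n"
           b"2013/04/21 20:59:22.235: ConfigParser: loading extra values\n"
           b"2013/04/21 21:30:00.000: Main: done\n")
    f = io.BytesIO(log)
    for line in lines_between(f, len(log),
                              b"2013/04/21 20:59:22.235",
                              b"2013/04/21 21:20:00.000"):
        print(line.decode().rstrip())
```

Each step reads only one line near the middle of the remaining range, so the number of reads grows with the logarithm of the file size. Lines without a timestamp (for example, continuation lines of a stack trace) would confuse this simple version.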

Moreover, I decided that it could be used for more than timestamp searches.
If you have a log file whose lines contain some identifier that you know is growing (like a sequence number), you may search for it too. The command will look like:

$ bsearch -p '^.*? sometext id=$[n+]' -t '43567800001' log.txt.gz

If you are familiar with regular expressions, you should understand the meaning. The only addition to a normal regular expression is the $[n+] part. bsearch replaces this part with the (\d+) expression when it runs the regex search.
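
A rough Python sketch of that substitution is shown below. Only the $[n+] to (\d+) mapping comes from the description above. Treating each of the letters Y, M, D, h, m, s, n as exactly one digit, and the function name expand_pattern, are assumptions made for this illustration.

```python
import re

FIELD_CHARS = set("YMDhmsn")  # placeholder letters allowed inside $[ ]


def expand_pattern(pattern):
    """Replace each $[...] block with a capturing group of digit matchers."""
    def expand(match):
        parts = []
        for ch in match.group(1):
            if ch in FIELD_CHARS:
                parts.append(r"\d")           # assumed: one letter = one digit
            elif ch == "+":
                parts.append("+")             # keep the quantifier as is
            else:
                parts.append(re.escape(ch))   # separators match literally
        return "(" + "".join(parts) + ")"

    return re.sub(r"\$\[([^\]]*)\]", expand, pattern)


if __name__ == "__main__":
    regex = expand_pattern("^.*? sometext id=$[n+]")
    print(regex)
    print(re.match(regex, "12:00 sometext id=43567800001 rest").group(1))
```

The captured text is what gets compared against the -t value. For an identifier like a sequence number, the comparison would have to be numeric, since "9" sorts after "10" as plain text.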

I find it very useful myself. So if somebody else finds it useful too, I am pleased to share it: http://code.google.com/p/bsearch/

Saturday 29 September 2012

Using C++ Eclipse IDE for editing code locally and compiling it remotely on linux


In several companies where I worked, the target platform for the application was Linux/Solaris but the desktop PC was running Windows. So I was not able to compile the code in an IDE running on the desktop.

Some people run the IDE on a Linux server and use the Windows PC as the X server in this case. But I don’t like this approach as it is quite slow for me. Other people use terminal-based editors like vim or emacs, but I could not get used to them. Here I describe the way I configure my environment.

In brief, I use Eclipse + Cygwin + Unison.

How does it work?

I have an svn working copy of the project on my Windows desktop. Eclipse runs on Windows, and I use it to browse and edit code in this local copy. I configured Eclipse to launch an external command on each file save. The external command is unison, which synchronizes the local copy with a copy of the same project on the remote Linux server. It takes about 1-2 seconds for unison to figure out that only a few files have changed since the last run and to copy them over ssh to the server. Eclipse runs this command in the background, so even when I save files every 5 minutes, I don’t notice it. I use Cygwin to wrap the long command arguments in a bash script, so in Eclipse the command looks like [bash sync.sh]

I also use a custom Eclipse make command, which looks like [ssh server "cd svn/location; make"].
Eclipse feeds the output of this command to the console, parses errors and warnings from it, and highlights the relevant lines in the code.

So I compile the code remotely but use Eclipse locally, without compromising on IDE speed.

How to install?

1. Install Eclipse IDE for C++ from http://www.eclipse.org/downloads/

2. Install Cygwin from http://cygwin.com/install.html. That is, download setup.exe, keep all the defaults (unless you want some changes) and additionally select the openssh, bash, svn and libiconv2 packages.


4. Check out the svn working copy using either the Cygwin command line:

$ svn co svn://server/path/trunk projectX

or the Eclipse svn plugin Subclipse inside the IDE.

5. I usually configure include paths for the C++ project so that I can browse the header files of system or boost libraries. So I copy the include files from Linux to Windows and set up the paths in Eclipse.


6. Prepare the synchronization and make scripts

sync.sh:
#!/bin/bash
/path/to/unison -auto -batch H:/svn/projectX ssh://server/path/to/projectX

make.sh:
#!/bin/bash
name=$1
shift
# the sed expression is in double quotes so that $USER is expanded
ssh server "cd ~/path/to/projectX/$name/build; make -j $*" 2>&1 | sed "s#/home/$USER/path/to##g" | iconv -c -t ascii

I had to use iconv to convert the output to ASCII because Eclipse does not display some of the symbols in gcc output well.
The sed in make.sh lets Eclipse map the files named in the gcc output to project files.
My project consists of several apps, each residing in its own projectX subfolder, so I configured several make commands and I use make.sh in the following way:

make.sh dir target
e.g. make.sh encoder tests
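
To see what the two filters at the end of make.sh do to a compiler message, here is a small Python illustration of the same transformations. The user name in the path and the sample gcc line are made up for this example, and the function name is hypothetical.

```python
def clean_compiler_line(line, remote_prefix):
    """Mimic the sed and iconv steps of make.sh on one line of gcc output."""
    # Same effect as: sed "s#<remote_prefix>##g"
    line = line.replace(remote_prefix, "")
    # Same effect as: iconv -c -t ascii (drop characters with no ASCII form)
    return line.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    sample = ("/home/vadim/path/to/projectX/encoder/main.cpp:12:5: "
              "error: \u2018foo\u2019 was not declared in this scope")
    print(clean_compiler_line(sample, "/home/vadim/path/to"))
```

After the remote prefix is removed, the path left in the message starts at the project folder, and the typographic quotes that gcc prints around identifiers are dropped.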

You must launch the unison command manually the first time, without the -auto flag. Unison will ask which files you want to copy to the remote server, and if you answer to permanently ignore some files it will put them in
%USER_WIN_HOME%/.unison/default.prf

My default.prf file looks like the following:
ignore = Regex .*/CMakeFiles
ignore = Regex .*/build/.*

You also need to copy unison to the target Linux server and make sure it is in your Unix $PATH.

7. Configure Eclipse

7.1 Properties for projectX: C/C++ Build : uncheck “Use default build command”

7.2 Properties for projectX: Builders -> Add New builder -> Program
 
Check items for
- Launch in background
- During auto builds
Other items if desired

7.3 Uncheck all builders except CDT and sync files. For the CDT builder, uncheck all operations except “During manual builds”.
The CDT builder is required to launch make targets, which we configure in the next step.


7.4 Setup automatic builds on file save

Now when you save a file, the sync files builder will launch unison in the background.

7.5 Window -> Show view -> Make Target:  New target



Now you can launch the compilation from eclipse.

Enjoy!

______________________________
H: in all examples is a virtual drive configured with the Windows subst command:
subst H: C:\Users\vadim\work
which I put in the Windows Startup folder.

Friday 9 October 2009

Simple TCP proxy for emulating the disconnection failure

Assume you have an application that connects to some resource via TCP/IP, and you need to test the case when this resource suddenly becomes unavailable, to make sure your application handles it correctly.
The simplest way to do this is to connect to the resource via a proxy:
App -> TCP proxy -> resource
and then shut down the proxy at a given time.

The simplest TCP proxy for this is an ssh client configured for port forwarding.
Example:
$ ssh -L localport:resource_ip:resource_port localhost

This tells ssh to listen for incoming connections on localport and forward all traffic to resource_ip:resource_port.
So you connect your app to localhost:localport, then kill ssh, and you have emulated the disconnection failure.

The only requirement is that you need to have an ssh server running on _your_ host.
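
If no ssh server is available there, a small stand-alone forwarder can play the same role. The following Python sketch is an alternative to the ssh approach, not a description of it. The class name TcpProxy and its methods are invented for this example, and it ignores details such as half-closed connections that a real proxy would handle.

```python
import socket
import threading


class TcpProxy:
    """Listen on a local port and forward each connection to one target."""

    def __init__(self, target_host, target_port, listen_port=0):
        self.target = (target_host, target_port)
        self.listener = socket.create_server(("127.0.0.1", listen_port))
        self.listener.settimeout(0.2)      # lets the accept loop notice stop()
        self.port = self.listener.getsockname()[1]
        self.stopping = threading.Event()
        self.sockets = []                  # all client and upstream sockets
        self.lock = threading.Lock()
        self.acceptor = threading.Thread(target=self._accept_loop, daemon=True)
        self.acceptor.start()

    def _accept_loop(self):
        while not self.stopping.is_set():
            try:
                client, _ = self.listener.accept()
            except socket.timeout:
                continue                   # no new client yet, check the flag
            except OSError:
                return
            try:
                upstream = socket.create_connection(self.target)
            except OSError:
                client.close()             # target unreachable, drop the client
                continue
            with self.lock:
                self.sockets += [client, upstream]
            for src, dst in ((client, upstream), (upstream, client)):
                threading.Thread(target=self._pump, args=(src, dst),
                                 daemon=True).start()

    @staticmethod
    def _pump(src, dst):
        """Copy bytes from src to dst until either side goes away."""
        try:
            while True:
                data = src.recv(4096)
                if not data:
                    break
                dst.sendall(data)
        except OSError:
            pass
        for sock in (src, dst):
            try:
                sock.shutdown(socket.SHUT_RDWR)
            except OSError:
                pass

    def stop(self):
        """Drop the listener and every open connection at once."""
        self.stopping.set()
        self.acceptor.join()
        self.listener.close()
        with self.lock:
            for sock in self.sockets:
                try:
                    sock.shutdown(socket.SHUT_RDWR)
                except OSError:
                    pass
                sock.close()
```

A test would create TcpProxy(resource_ip, resource_port), point the app at 127.0.0.1 and the port stored in the port attribute, and call stop() at the moment the failure should happen.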

Thursday 8 October 2009