Sunday, May 27, 2012

Analyzing logfiles with PowerShell

The other day I was tasked with summarizing statistics about a web application running on IIS version 6.5. This particular application was to be migrated to a different platform, so questions came up about which users were using it and how frequently. There were about 100 logfiles, a couple of hundred megabytes in total, and they all had the same columns, so I started off by concatenating all the files, dropping the extra headers along the way. From there I tried to aggregate by date, user, and number of accesses.

This did not turn out well. The logic was alright, but the PowerShell execution environment was not up to this kind of task: PowerShell consumed all of my RAM, about 3 GB, but returned no results. I can see why, since it has to process all of the data before it can yield any trustworthy results (compare this task to DISTINCT, GROUP BY, or ORDER BY in T-SQL), but I still think the performance shown is really poor. Isn't PowerShell supposed to be able to handle this amount of data?

What did work in the end was to aggregate at the same level but per file, then concatenate the partial results and aggregate once more. I should have just imported the lot into SQL Server.
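The per-file approach can be sketched roughly like this; the log directory, output file name, and column positions are my assumptions, since the exact layout of a W3C-format IIS log depends on which fields are configured:

```powershell
# Sketch of the per-file aggregation (assumed paths and column positions).
# W3C-format IIS logs are space-separated; here column 0 is assumed to be
# the date and column 7 cs-username -- check the '#Fields:' header of your logs.
$partials = Get-ChildItem .\logs\*.log | ForEach-Object {
    Get-Content $_.FullName |
        Where-Object { $_ -notmatch '^#' } |               # drop header/comment lines
        ForEach-Object { ($_ -split ' ')[0, 7] -join ' ' } | # keep date + user only
        Group-Object |                                     # aggregate within one file
        ForEach-Object {
            [pscustomobject]@{ Key = $_.Name; Count = $_.Count }
        }
}

# Second pass: combine the per-file counts into the final totals.
$partials |
    Group-Object Key |
    ForEach-Object {
        '{0} {1}' -f $_.Name, ($_.Group | Measure-Object Count -Sum).Sum
    } |
    Set-Content .\summary.txt
```

The point of the two passes is that only one file's worth of grouped rows needs to be held at a time, which is why this succeeded where grouping the whole concatenated dataset at once ran out of memory.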
