Re: [SLUG] perl [pig] duplicate removal with a twist

From: aaron steimle (asteimle@washpat.com)
Date: Tue Jul 25 2006 - 10:26:27 EDT


Kwan Lowe wrote:
>> "criticalpart" per line is usually in the neighborhood of 10-25
>> characters long,
>> however it can be as little as 5 characters or more than 200.
>>
>
> This is almost screaming out to use a hash...
>
>
Yes, use a hash! And the concern that

"The input files I'm processing is currently on the order of 6megs
(~100,000 lines), so
I'm thinking an array is out"

is not an issue at that size. I use a hash on a 142MB file with over
3.8 million lines in it.
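
For illustration, here is a minimal sketch of the hash approach. The thread does not show how "criticalpart" is pulled out of each line, so critical_part() below is a hypothetical helper that simply assumes it is the first whitespace-delimited field; swap in whatever extraction your data actually needs. The dedupe() name and the sample lines are likewise made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: ASSUMES the critical part is the first
# whitespace-delimited field of the line. Adjust for the real format.
sub critical_part {
    my ($line) = @_;
    my ($key) = split ' ', $line;
    return defined $key ? $key : '';
}

# Keep only the first line seen for each critical part,
# preserving the original input order.
sub dedupe {
    my @lines = @_;
    my %seen;    # critical part => number of times seen so far
    return grep { !$seen{ critical_part($_) }++ } @lines;
}

# Made-up sample data: the third line repeats the key "abc".
my @sample = ("abc 1\n", "def 2\n", "abc 3\n");
print dedupe(@sample);
```

The hash only stores one entry per distinct critical part, so memory grows with the number of unique keys rather than with the total line count. For a large file the same %seen test can be applied line by line inside a while (<>) loop instead of loading everything into an array first.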