Re: [SLUG] perl [pig] duplicate removal with a twist

From: baris nema (baris_nema@cigflorida.com)
Date: Tue Jul 25 2006 - 12:53:23 EDT


aaron steimle wrote:
> Kwan Lowe wrote:
>>> "criticalpart" per line is usually in the neighborhood of 10-25
>>> characters long; however, it can be as short as 5 characters or
>>> longer than 200.
>>>
>>
>> This is almost screaming out to use a hash...
>>
>>
> Yes, hash! and
>
> "The input files I'm processing are currently on the order of 6 MB
> (~100,000 lines), so
> I'm thinking an array is out"
> is not an issue. I use a hash with a 142MB file with over 3.8 million
> lines in it.
> -----------------------------------------------------------------------
> This list is provided as an unmoderated internet service by Networked
> Knowledge Systems (NKS). Views and opinions expressed in messages
> posted are those of the author and do not necessarily reflect the
> official policy or position of NKS or any of its employees.
Assuming size isn't an issue, I've figured out how to put each
"criticalpart" into a list
(I don't see how a hash will work, as it's a single list of items,
"criticalpart").
How do I search the list to see if a "criticalpart" is already in it?
Would something like this work:
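# =~ can't match against a whole array (the array is evaluated
# in scalar context as its length), so test each element with grep
if (grep { $_ eq $criticalpart } @criticalpartlist)
   {
   # already in the list: don't save
   }
else
   {
   push @criticalpartlist, $criticalpart;   # first time seen: save
   }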
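For comparison, here is a minimal sketch of the hash approach suggested above. It assumes the "criticalpart" values have already been pulled out of each input line; the sample values in @criticalparts are made up. A hash works even for "a single list of items": each criticalpart becomes a key, and the value is just a counter that records whether the key has been seen.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data: in the real script these would be the
# "criticalpart" values extracted from each input line.
my @criticalparts = qw(abc12 defgh abc12 xyz99 defgh);

my %seen;    # keys: criticalparts already saved; values: occurrence counts
my @saved;   # first occurrence of each criticalpart, in input order

for my $criticalpart (@criticalparts) {
    # Postfix ++ returns the old value: undef (false) the first time,
    # 1 or more (true) on every repeat, so repeats are skipped.
    next if $seen{$criticalpart}++;
    push @saved, $criticalpart;    # first occurrence: save it
}

print "$_\n" for @saved;
```

Each hash lookup takes roughly constant time, while grep walks the entire list for every new line, which adds up quickly at ~100,000 lines.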




This archive was generated by hypermail 2.1.3 : Fri Aug 01 2014 - 15:03:24 EDT