Re: [SLUG] perl [pig] duplicate removal with a twist

From: Kwan Lowe (kwan@digitalhermit.com)
Date: Tue Jul 25 2006 - 13:34:55 EDT


> assuming size isn't an issue, I've figured out how to put the each
> "criticalpart" into a list,
> (don't see how a hash will work, as it's a single list of items,
> "criticalpart").
> how do I search the list to see if "criticalpart" is already in the list?
> would it be possible to do:
> if (@criticalpartlist =~ m/criticalpart/)
> {
> don't save;
> }
> else
> {
> do save;
> }

Real brief intro to how hashes would be useful here...

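Each line in the input file (or just its "criticalpart") is used as a key in a
hash. Perl runs the key through a hash function to get a fixed-size value, and that
value picks the slot in the hash table where the entry is stored. Nothing gets
sorted or scanned, so storing and finding an entry is very efficient.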

Checking whether a new entry already exists is a simple matter of:
1) hash the new line
2) look up the new hash in the hash table built from the old file

Because there are no regex searches or other manipulation, it's quick and efficient.
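
A rough sketch of what that looks like in Perl, in case it helps. The hash
%seen and the helpers critical_part() and dedupe() are names I made up for
illustration, and I'm assuming "criticalpart" is the first whitespace-separated
field of each line; swap in whatever extraction you already have. Perl does
the hashing and the lookup for you when you use the string as a hash key.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull the "criticalpart" out of a line.
# Assumption: it is the first whitespace-separated field.
sub critical_part {
    my ($line) = @_;
    my ($part) = split ' ', $line;
    return defined $part ? $part : '';
}

# Return only the lines whose criticalpart has not been seen before.
sub dedupe {
    my @lines = @_;
    my %seen;    # keys are the criticalparts already saved
    my @keep;
    for my $line (@lines) {
        my $key = critical_part($line);
        next if $seen{$key}++;    # already in the hash: don't save
        push @keep, $line;        # first time we see it: do save
    }
    return @keep;
}

# Small made-up sample to show the idea.
my @sample = ("abc one", "def two", "abc three");
print "$_\n" for dedupe(@sample);
```

The "if ... don't save, else do save" from your question collapses into the
single test on $seen{$key}, so there's no need to search the list at all.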

-- 
* The Digital Hermit   http://www.digitalhermit.com
* Unix and Linux Solutions   kwan@digitalhermit.com


