Re: [SLUG] perl [pig] duplicate removal with a twist

From: baris nema (baris_nema@cigflorida.com)
Date: Tue Jul 25 2006 - 09:25:35 EDT


Dylan William Hardison wrote:
> Spake baris nema on Monday, July 24, 2006 at 05:44PM -0400:
>
>> The input files I'm processing are currently on the order of 6 MB
>> (~100,000 lines), so I'm thinking an array is out; the output file is
>> usually smaller.
>>
>> Would it be possible to search the output file without having to close
>> it (maybe using a different file handle)?
>>
>> I'm currently doing the processing in bash, but due to the number of
>> loops I want to port it over to Perl. I'm quite new to Perl, so any
>> code examples would be helpful.
>>
>> currently it's (perlified pseudo code):
>>
>> read line in from input file;
>> $criticalpart = result of several operations on line;
>> $existstat = `grep -c "$criticalpart" "$outputfile"`; # trying to replace this line with perl code
>> if ( $existstat == 0 )
>> {
>> $processedcriticalpart = [some more operations on $criticalpart];
>> print $outputfile "$processedcriticalpart";
>> }
>>
>
> What does the "criticalpart" look like?
>
> And yes, you can 'rewind' a file and start reading/writing from the
> beginning.
>
> See perldoc -f seek for rewinding. Note that you'd also need to open
> the file for both reading *and* writing.
>
> Anyway, if the critical part is always the same size (or a decidable
> size), you won't have to do this ugly rewinding thing... Give us some
> more details. :-)
>
>
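
For reference, the rewinding approach described above might look roughly
like the sketch below. The file name output.txt, the sample values, and
the helper already_written() are made up for illustration. It also does
a literal substring match with index(), whereas grep -c treats the
pattern as a regular expression, so treat it as an approximation of the
bash version rather than an exact port.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:seek);    # SEEK_SET and SEEK_END constants

# Hypothetical helper: true if $needle appears in any line already
# written to the read/write handle $fh.
sub already_written {
    my ($fh, $needle) = @_;
    seek($fh, 0, SEEK_SET) or die "seek failed: $!";    # rewind to the start
    my $found = 0;
    while (my $line = <$fh>) {
        if (index($line, $needle) >= 0) {               # literal substring match
            $found = 1;
            last;
        }
    }
    seek($fh, 0, SEEK_END) or die "seek failed: $!";    # back to the end for writing
    return $found;
}

my $outputfile = 'output.txt';    # assumed name
# '+>' opens for reading *and* writing, truncating any existing file.
open(my $out, '+>', $outputfile) or die "Cannot open $outputfile: $!";

# Stand-in values for $criticalpart; real ones would come from the input file.
for my $criticalpart (qw(alpha beta alpha gamma beta)) {
    next if already_written($out, $criticalpart);
    print {$out} "$criticalpart\n";
}

close($out) or die "Cannot close $outputfile: $!";
```

Every lookup rescans the whole output file, so the cost grows with the
number of lines written so far, much like the grep call does.
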
"criticalpart" per line is usually in the neighborhood of 10-25
characters long;
however, it can be as little as 5 characters or more than 200.
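
With keys of that size, one alternative to rescanning the output file is
to remember each criticalpart in a hash; roughly 100,000 short strings
should fit in memory on most machines, though that is an estimate and
not something measured here. Below is a minimal sketch of the idea. The
subs extract_critical_part() and process_critical_part() are
placeholders for the real operations, which are not shown in this
thread, and the hash does an exact match on the whole key, unlike the
substring/regex match that grep -c performs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Placeholder for "several operations on line": here, the lowercased
# first whitespace-separated field. The real logic is an assumption.
sub extract_critical_part {
    my ($line) = @_;
    my ($first) = split ' ', $line;
    return defined $first ? lc $first : '';
}

# Placeholder for "some more operations on $criticalpart".
sub process_critical_part {
    my ($part) = @_;
    return "[$part]";
}

# Returns the processed critical parts, keeping only the first
# occurrence of each one.
sub unique_parts {
    my @lines = @_;
    my %seen;    # critical parts encountered so far
    my @out;
    for my $line (@lines) {
        chomp $line;
        my $criticalpart = extract_critical_part($line);
        next if $seen{$criticalpart}++;    # skip anything seen before
        push @out, process_critical_part($criticalpart);
    }
    return @out;
}

# Sample lines stand in for the input file.
print "$_\n" for unique_parts("Foo 1\n", "bar 2\n", "foo 3\n");
```

For a real file, the sample list could be replaced with a while loop
over the input file handle, printing to the output handle as it goes, so
the output file never has to be reread.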



