Re: [SLUG] Powerpoint image extraction

From: Chuck Hast (wchast@gmail.com)
Date: Fri Jul 15 2005 - 17:52:22 EDT


On 7/15/05, Eben King <eben1@tampabay.rr.com> wrote:
> On Fri, 15 Jul 2005, Chuck Hast wrote:
>
> > On 7/15/05, Eben King <eben1@tampabay.rr.com> wrote:
> > > So a correspondent sent me some images, encapsulated in a .PPT file.
> > > (Naked JPEGs would have worked fine, but nooooo...) Is there any way, using
> > > od, grep, dd, or maybe some tool I don't have yet, to get them out? I can
> > > view it in Windows (I suppose Windows-in-VMware too) using MS*spit*'s free
> > > "Powerpoint Viewer", but I can't do squat with it.
> > >
> > Did you try Open Office? I use it to open company PPT files all of the time,
> > it should allow you to pull those images out.
>
> I found a way that does not involve OpenOffice. I wrote a scriptlet:
>
> for skipcount in `seq 2 1032700`; do
> echo -n "$skipcount "
> dd if=body_paint.ppt skip=$skipcount bs=1 2>/dev/null | file -
> done | grep -v ': *data$' > /tmp/file2
>
> (1032700 is a nice round number a little less than the file size in bytes)
>
> In /tmp/file2, there is lots of stuff like
>
> 71008 standard input: Sendmail frozen configuration - version þ?5ol÷þŸžqXÿ
> 71042 standard input: DBase 3 data file with memo(s) (16772103 records)
> 71178 standard input: shell archive or script for antique kernel text
>
> and most of it is total BS. But among the dreck, I found
>
> 537 standard input: JPEG image data, JFIF standard 1.01, resolution (DPI), 96 x 96
> and
> 49958 standard input: JPEG image data, JFIF standard 1.01, resolution (DPI), 96 x 96
>
> (so far) If I do
>
> dd if=filename.ppt skip=537 bs=1 | djpeg | xv -
>
> there is one of the images from the file! I can do the same with the other
> offset. Now, I don't know if they were JPEGs originally, or that's just
> what Powerpoint uses. And it's slow. Very slow. And (as you see) subject
> to misidentification by "file" -- I don't know but what the file's
> fragmented internally (a la "fast save" in Word), so I'd never retrieve the
> images this way. But it's a possibility, for users who don't have OO, and
> _do_ have lots of time, and who won't get too upset about missing images.
>
> --

Yes, that is an interesting workaround, and it would get you there. You would
have to follow things a bit more closely, though.
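
As a rough sketch of a faster search (not part of the scriptlet quoted above):
instead of running "file" at every byte offset, one could dump the file once
with od and look only for the JPEG start-of-image bytes FF D8 FF. The function
names jpeg_offsets and extract_jpegs and the candidate_N.jpg output names are
made up for illustration, and this assumes each image sits in the PPT as one
unfragmented JPEG stream, which may not always be true.

```shell
# Print the 0-based byte offset of every FF D8 FF sequence in the file $1.
# od emits one hex byte per field; tr puts each byte on its own line;
# awk remembers the two previous bytes and reports where a match started.
jpeg_offsets() {
    od -An -v -tx1 "$1" | tr -s ' ' '\n' |
        awk 'NF { if (p2 == "ff" && p1 == "d8" && $1 == "ff") print n - 2
                  p2 = p1; p1 = $1; n++ }'
}

# Write everything from each hit to the end of the file into
# candidate_1.jpg, candidate_2.jpg, ... (hypothetical names) in the
# current directory.
extract_jpegs() {
    n=0
    for off in $(jpeg_offsets "$1"); do
        n=$((n + 1))
        # tail -c +K starts at byte K counting from 1, hence the +1
        tail -c "+$((off + 1))" "$1" > "candidate_$n.jpg"
    done
}
```

Running "extract_jpegs body_paint.ppt" would then write one candidate per hit.
Each candidate runs on to the end of the PPT, the same as the dd pipeline
quoted above, and some hits may be embedded thumbnails or chance byte matches,
so expect a few duds.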

I will look and see if there are other ways to get them out using OO. If not,
it seems like someone needs to work on that one.
