Re: [SLUG] Powerpoint image extraction

From: Eben King (eben1@tampabay.rr.com)
Date: Fri Jul 15 2005 - 17:28:24 EDT


> On Fri, 15 Jul 2005, Chuck Hast wrote:
>
> > On 7/15/05, Eben King <eben1@tampabay.rr.com> wrote:
> > > So a correspondent sent me some images, encapsulated in a .PPT file.
> > > (Naked JPEGs would have worked fine, but nooooo...) Is there any way, using
> > > od, grep, dd, or maybe some tool I don't have yet, to get them out? I can
> > > view it in Windows (I suppose Windows-in-VMware too) using MS*spit*'s free
> > > "Powerpoint Viewer", but I can't do squat with it.
> > >
> > Did you try Open Office? I use it to open company PPT files all of the time,
> > it should allow you to pull those images out.

I found a way that does not involve OpenOffice. I wrote a scriptlet:

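# Run "file" on the data starting at each byte offset, and keep
# every line that is not reported as plain "data".
for skipcount in $(seq 2 1032700); do
  printf '%s ' "$skipcount"
  dd if=body_paint.ppt skip="$skipcount" bs=1 2>/dev/null | file -
done | grep -v ': *data$' > /tmp/file2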

(1032700 is a nice round number a little less than the file size in bytes)
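
A possibly much faster variant, as a sketch only: instead of running "file" at every offset, search the file once for the three bytes (FF D8 FF) that start a JPEG stream. This assumes GNU grep (for -a, -b and -o), and jpeg_offsets is a made-up helper name. A hit is only a candidate, since those bytes can also turn up by chance.

```shell
# Print the 0-based byte offset of every FF D8 FF sequence in a file.
# Assumes GNU grep: -a treats binary as text, -b prints byte offsets,
# -o reports each match separately, -F takes the pattern literally.
jpeg_offsets() {
  marker=$(printf '\377\330\377')
  LC_ALL=C grep -aboF -- "$marker" "$1" | LC_ALL=C cut -d: -f1
}
```

The offsets printed are 0-based, so each one can be passed straight to dd as skip= with bs=1.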

In /tmp/file2, there is lots of stuff like

71008 standard input: Sendmail frozen configuration - version þ?5ol÷þŸžqXÿ
71042 standard input: DBase 3 data file with memo(s) (16772103 records)
71178 standard input: shell archive or script for antique kernel text

and most of it is total BS. But among the dreck, I found

537 standard input: JPEG image data, JFIF standard 1.01, resolution (DPI), 96 x 96
and
49958 standard input: JPEG image data, JFIF standard 1.01, resolution (DPI), 96 x 96

(so far). If I do

dd if=body_paint.ppt skip=537 bs=1 | djpeg | xv -

there is one of the images from the file! I can do the same with the other
offset. Now, I don't know whether they were JPEGs originally, or whether
that's just what PowerPoint uses internally. And it's slow. Very slow. And
(as you see) it's subject to misidentification by "file". For all I know,
the file may also be fragmented internally (a la "fast save" in Word), in
which case I'd never retrieve some images this way. But it's a possibility,
for users who don't have OO, and _do_ have lots of time, and who won't get
too upset about missing images.
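
On the speed of the extraction step itself: dd with bs=1 copies one byte per read, which is part of why this crawls. Here is a small sketch that saves the data from a known offset to a file using tail instead. carve_at is a hypothetical helper name, and the output still runs to the end of the .ppt, on the assumption (which djpeg seemed happy with above) that the decoder stops at the end of the image.

```shell
# Copy everything from a 0-based byte offset to the end of the file.
# Arguments: input file, offset, output file.
# tail -c +N counts bytes from 1, hence the +1.
carve_at() {
  tail -c +"$(( $2 + 1 ))" "$1" > "$3"
}
```

With the offsets found earlier, that would be something like "carve_at body_paint.ppt 537 image1.jpg", and likewise for 49958.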

-- 
-eben    ebQenW1@EtaRmpTabYayU.rIr.OcoPm    home.tampabay.rr.com/hactar
LEO:  Now is not a good time to photocopy your butt and staple
it to your boss' face, oh no.  Eat a bucket of tuna-flavored pudding
and wash it down with a gallon of strawberry Quik.  -- Weird Al
