Stupid DATA Tricks
Image credit: Rob Nguyen on Flickr
I’ve previously written about Stupid Open Tricks, so now it’s time for some stupid DATA tricks. You probably already know that you can “embed” a file inside a Perl program and then read it from the DATA filehandle. David Farrell wrote about this in Perl tokens you should know, and he’s the one who reminded me about the curiosity that I’ll demonstrate here.

Anything after the __DATA__ line is not part of the program but is available to the program through the special DATA filehandle:

    #!/usr/bin/perl
    print "---Outputting DATA\n", <DATA>, "---Done\n";
    __DATA__
    Dog
    Cat
    Bird
The output shows each line after __DATA__:

    ---Outputting DATA
    Dog
    Cat
    Bird
    ---Done
I typically go the other way by starting with a data file and adding a program to the top of it:

    #!/usr/bin/perl
    use v5.26;
    use Text::CSV_XS;
    my $csv = Text::CSV_XS->new;
    while( my $row = $csv->getline(*DATA) ) {
        say join ':', $row->@[3,7];
    }
    __DATA__
    ...many CSV lines...
This is the end, my friend, the END
You probably also know that you can use __END__ instead. I’m used to using that because it’s a holdover from Perl 4 and that’s where I first learned this:

    #!/usr/bin/perl
    print "---Outputting DATA\n", <DATA>, "---Done\n";
    __END__
    Dog
    Cat
    Bird
You get the same output:

    ---Outputting DATA
    Dog
    Cat
    Bird
    ---Done
But now let’s get a little tricky. Define a package at the end of the program. This still uses __END__:

    #!/usr/bin/perl
    print "---Outputting DATA\n", <DATA>, "---Done\n";
    package not::main;
    __END__
    Dog
    Cat
    Bird
Again, this outputs the same thing as before. Nothing surprising here, but the suspense must be building:

    ---Outputting DATA
    Dog
    Cat
    Bird
    ---Done
Change that __END__ to __DATA__ and try again:

    #!/usr/bin/perl
    print "---Outputting DATA\n", <DATA>, "---Done\n";
    package not::main;
    __DATA__
    Dog
    Cat
    Bird
Now you don’t see those lines:

    ---Outputting DATA
    ---Done
If you’ve read the documentation and cared about this sort of thing (or like me, forgotten it), you may have noticed that the DATA handle lives in the package that’s in scope at the end of the program:

Text after __DATA__ may be read via the filehandle “PACKNAME::DATA”, where “PACKNAME” is the package that was current when the __DATA__ token was encountered.
I can use the package specification to get the lines back:

    #!/usr/bin/perl
    print "---Outputting DATA\n", <not::main::DATA>, "---Done\n";
    package not::main;
    __DATA__
    Dog
    Cat
    Bird
Now those lines are back:

    ---Outputting DATA
    Dog
    Cat
    Bird
    ---Done
But what about the __END__? Well, that was a Perl 4 thing, before there were packages. Perl 5 added packages, then Perl 5.6 added the __DATA__ token. The __END__ kept doing what it was doing in the way it was doing it (package-less), and __DATA__ did something related but new:

For compatibility with older scripts written before __DATA__ was introduced, __END__ behaves like __DATA__ in the top level script (but not in files loaded with “require” or “do”) and leaves the remaining contents of the file accessible via “main::DATA”.
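The difference the documentation describes for loaded files can be checked without putting either token in the main script. This is a minimal sketch, assuming a writable temporary directory; the helper names write_script and lines_from and the Tricks::* package names are made up for illustration. It writes two small files, loads each with do, and then looks for a DATA handle in the package each file declares. Going by the documentation quoted above, the __DATA__ file should yield its three lines and the __END__ file none.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write Perl source to a temporary file, return its path.
sub write_script {
    my ($source) = @_;
    my ( $fh, $path ) = tempfile( SUFFIX => '.pl', UNLINK => 1 );
    print {$fh} $source;
    close $fh or die "Could not close $path: $!";
    return $path;
}

# Hypothetical helper: load a file that declares $package and ends with
# $token, then return the lines readable from that package's DATA handle
# (an empty list if the handle was never opened).
sub lines_from {
    my ( $package, $token ) = @_;
    my $path = write_script("package $package;\n1;\n$token\nDog\nCat\nBird\n");
    defined( do $path ) or die "Could not load $path: ", $@ || $!;

    no strict 'refs';
    my $handle = \*{"${package}::DATA"};
    return defined fileno($handle) ? [ readline $handle ] : [];
}

my $with_data = lines_from( 'Tricks::WithData', '__DATA__' );
my $with_end  = lines_from( 'Tricks::WithEnd',  '__END__' );

printf "__DATA__ in a loaded file: %d lines\n", scalar @$with_data;
printf "__END__ in a loaded file: %d lines\n",  scalar @$with_end;
```

The tokens appear here only inside string literals, so the main script itself has no DATA handle and nothing after the last statement is swallowed as data.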
Some other DATA tricks
There are a few other interesting things you can do.
Program size
You can get the entire file size with the -s file test operator. The __DATA__ (or __END__) has to be there, but you don’t need any data after those tokens.

    use v5.10;
    my $size = -s DATA;
    say "File size is $size";
    __DATA__
The file size this reports includes everything in the file, not just the part before the end of processing.

    use v5.10;
    my $size = -s DATA;
    say "File size is $size";
    my $data = tell DATA;
    say "Data starts at $data";
    say "Data size is ", $size - $data;
    __END__
    Dog
    Cat
    Bird
The program size includes the __END__ token and the newline after it. The rest belongs to the data:

    File size is 164
    Data starts at 151
    Data size is 13
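One way to convince yourself that tell DATA marks the boundary between program and data is to run a small script and compare its report with the source text. The sketch below assumes the script can be written to a temporary file and run with the same perl binary ($^X) on a system where list-form pipe opens work; the variable names are illustrative only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# The embedded program reports: file size, data offset, data size.
# The __END__ below is inside a heredoc, so it is just text here.
my $program = <<'PROGRAM';
my $size = -s DATA;
my $data = tell DATA;
print join( ' ', $size, $data, $size - $data ), "\n";
__END__
Dog
Cat
Bird
PROGRAM

my ( $fh, $path ) = tempfile( SUFFIX => '.pl', UNLINK => 1 );
binmode $fh;
print {$fh} $program;
close $fh or die "Could not close $path: $!";

# Run the script with the perl that is running this one.
open my $pipe, '-|', $^X, $path or die "Could not run $path: $!";
my $report = <$pipe>;
close $pipe or die "Script failed with status $?";

my ( $size, $data_start, $data_size ) = split ' ', $report;

# Everything from the reported offset onward should be the data lines.
my $data_text = substr $program, $data_start;
print "Reported: size=$size start=$data_start data=$data_size\n";
print "Data portion:\n$data_text";
```

If the offsets behave as described, the reported size matches the length of the source text and the data portion is exactly the three lines after the token.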
You can use DATA in other file things, including stat.
Read it twice
If you want to read the data twice, you can reset the file cursor. First, remember where DATA starts by calling tell before you read any lines. When you are ready to read it again, seek to that same position:

    #!/usr/bin/perl
    my $data_start = tell DATA;
    print "---Outputting DATA\n", <DATA>, "---Done\n";
    seek DATA, $data_start, 0;
    print "---Outputting DATA\n", <DATA>, "---Done\n";
    __END__
    Dog
    Cat
    Bird
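The tell-then-seek pattern is not special to DATA; it works for any seekable handle. The following sketch wraps it in a closure. The name make_rereader is hypothetical, and an in-memory filehandle stands in for DATA here so that the example has no end token of its own; in a real program you would pass \*DATA instead.

```perl
use strict;
use warnings;

# Hypothetical helper: remember where a seekable handle is positioned now
# and return a closure that rewinds to that spot and reads to the end.
sub make_rereader {
    my ($fh) = @_;
    my $start = tell $fh;
    die "Handle does not support tell\n" if $start < 0;

    return sub {
        seek $fh, $start, 0 or die "Could not seek: $!";
        my @lines = <$fh>;
        return @lines;
    };
}

# Stand-in for DATA: the "program line" plays the part of the code
# that precedes the data, so the starting offset is not zero.
my $text = "program line\nDog\nCat\nBird\n";
open my $fh, '<', \$text or die "Could not open in-memory handle: $!";
my $skipped = <$fh>;

my $reread = make_rereader($fh);
my @first  = $reread->();
my @second = $reread->();

print "---Pass 1\n", @first,  "---Done\n";
print "---Pass 2\n", @second, "---Done\n";
```

Capturing the offset inside the closure keeps the bookkeeping in one place, so the caller cannot forget to call tell before the first read.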
Using line numbers

    #!/usr/bin/perl
    my $data_start = tell DATA;
    print "---Outputting DATA\n";
    while( <DATA> ) {
        print "$. $_"
    }
    print "---Done\n";
    __END__
    Dog
    Cat
    Bird
Now you see some line numbers, but those start counting from the first line under __END__:

    ---Outputting DATA
    1 Dog
    2 Cat
    3 Bird
    ---Done
To get the real line numbers, you can figure out where the __END__ token is. This assumes that it’s not in the middle of documentation or in a string:

    #!/usr/bin/perl
    my $data_start = tell DATA;
    my $end_line;
    UNITCHECK {
        open my $fh, '<', $0 or die "Could not open $0: $!";
        while( <$fh> ) {
            last if /\A__END__$/
        }
        $end_line = $.
    }

    print "---Outputting DATA\n";
    while( <DATA> ) {
        my $n = $end_line + $.;
        print "$n $_"
    }
    print "---Done\n";
    __END__
    Dog
    Cat
    Bird
Now you see the offsets in the whole file and not the count after the __END__:

    ---Outputting DATA
    19 Dog
    20 Cat
    21 Bird
    ---Done
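The same scan can be pulled into a small function that accepts either token. This is a sketch under the same naive assumption (no token-like line inside documentation or a string); token_line is a made-up name, and the sample file is written to a temporary path only to keep the example self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the line number of the first line that looks
# like an __END__ or __DATA__ token, or undef if no such line exists.
sub token_line {
    my ($path) = @_;
    open my $fh, '<', $path or die "Could not open $path: $!";
    my $found;
    while ( my $line = <$fh> ) {
        if ( $line =~ /\A__(?:END|DATA)__\s*\z/ ) {
            $found = $.;    # line number within $path
            last;
        }
    }
    close $fh;
    return $found;
}

# A sample file with the token on its fourth line.
my ( $out, $path ) = tempfile( UNLINK => 1 );
print {$out} "#!/usr/bin/perl\n", "print <DATA>;\n", "\n", "__DATA__\n",
    "Dog\n", "Cat\n";
close $out or die "Could not close $path: $!";

my $token_at = token_line($path);
print "Token is on line $token_at\n";
print "First data line is file line ", $token_at + 1, "\n";
```

Adding the returned number to $. while reading DATA gives the position of each data line within the whole file.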
There are some more vigorous methods in “Can a Perl program know the line number where DATA begins?”.
Multiple embedded files
This isn’t a DATA thing, but you can make several embedded files with Inline::Files:

    #!/usr/bin/perl
    use Inline::Files;
    print "---Outputting dogs\n", <DOGS>, "---Done\n";
    print "---Outputting cats\n", <CATS>, "---Done\n";
    print "---Outputting birds\n", <BIRDS>, "---Done\n";
    __DOGS__
    Rin Tin Tin
    Lassie
    Ol' Yellar
    __CATS__
    Grumpy Cat
    Garfield
    Maru
    Mr Bigglesworth
    __BIRDS__
    Woody Woodpecker
    Roadrunner
    Zazu
    Sam the Eagle
Each of those gets its own filehandle:

    ---Outputting dogs
    Rin Tin Tin
    Lassie
    Ol' Yellar
    ---Done
    ---Outputting cats
    Grumpy Cat
    Garfield
    Maru
    Mr Bigglesworth
    ---Done
    ---Outputting birds
    Woody Woodpecker
    Roadrunner
    Zazu
    Sam the Eagle
    ---Done
Inline::Files has a problem because it overrides open. You have to use CORE::open to get to the real one:

    use Inline::Files;
    print "---Outputting dogs\n", <DOGS>, "---Done\n";
    print "---Outputting cats\n", <CATS>, "---Done\n";
    print "---Outputting birds\n", <BIRDS>, "---Done\n";
    CORE::open my $fh, '<:utf8', '/etc/hosts' or die $!;
    print "---Outputting hosts\n", <$fh>, "---Done\n";
https://perldotcom.perl.org/article/stupid-data-tricks/