Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix FrozenErrors #168

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion lib/combine_pdf/api.rb
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ module CombinePDF
def load(file_name = '', options = {})
raise TypeError, "couldn't parse data, expecting type String" unless file_name.is_a?(String) || file_name.is_a?(Pathname)
return PDF.new if file_name == ''
PDF.new(PDFParser.new(IO.read(file_name, mode: 'rb').force_encoding(Encoding::ASCII_8BIT), options))
PDF.new(PDFParser.new(IO.read(file_name, mode: 'rb').dup.force_encoding(Encoding::ASCII_8BIT), options))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This dup is redundant and will use up a bunch of memory. The dynamic String returned fromIO.read (should probably be IO.binread) is mutable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

end

# creats a new PDF object.
Expand Down
6 changes: 3 additions & 3 deletions lib/combine_pdf/decrypt.rb
Original file line number Diff line number Diff line change
Expand Up @@ -76,7 +76,7 @@ def decrypt

def set_general_key(password = '')
# 1) make sure the initial key is 32 byte long (if no password, uses padding).
key = (password.bytes[0..32].to_a + @padding_key)[0..31].to_a.pack('C*').force_encoding(Encoding::ASCII_8BIT)
key = (password.bytes[0..32].to_a + @padding_key)[0..31].to_a.pack('C*').dup.force_encoding(Encoding::ASCII_8BIT)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This dup is redundant and will use up a bunch of memory. The dynamic String returned from Array#pack is mutable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

# 2) add the value of the encryption dictionary’s O entry
key << actual_object(@encryption_dictionary[:O]).to_s
# 3) Convert the integer value of the P entry to a 32-bit unsigned binary number
Expand All @@ -89,7 +89,7 @@ def set_general_key(password = '')
# # if document metadata is not being encrypted, add 4 bytes with the value 0xFFFFFFFF.
if actual_object(@encryption_dictionary[:R]) >= 4
if actual_object(@encryption_dictionary)[:EncryptMetadata] == false
key << "\xFF\xFF\xFF\xFF".force_encoding(Encoding::ASCII_8BIT)
key << "\xFF\xFF\xFF\xFF".dup.force_encoding(Encoding::ASCII_8BIT)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the interpreter requires dup it's a little odd... Maybe there's a way to indicate that a String literal is ASCII without creating an additional object (see: https://stackoverflow.com/a/15843685/4025095 )?

After all, the object's lifetime os very short (only created to be inserted into the existing String).

end
end
# 5) pass everything as a MD5 hash
Expand Down Expand Up @@ -137,7 +137,7 @@ def decrypt_AES(encrypted, encrypted_id, encrypted_generation, _encrypted_filter
object_key = @key.dup
object_key << [encrypted_id].pack('i')[0..2]
object_key << [encrypted_generation].pack('i')[0..1]
object_key << 'sAlT'.force_encoding(Encoding::ASCII_8BIT)
object_key << 'sAlT'.dup.force_encoding(Encoding::ASCII_8BIT)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the interpreter requires dup it's a little odd... Maybe there's a way to indicate that a String literal is ASCII without creating an additional object (see: https://stackoverflow.com/a/15843685/4025095 )?

After all, the object's lifetime os very short (only created to be inserted into the existing String).

key_length = object_key.length < 16 ? object_key.length : 16

begin
Expand Down
22 changes: 11 additions & 11 deletions lib/combine_pdf/parser.rb
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ class PDFParser
# string:: the data to be parsed, as a String object.
def initialize(string, options = {})
raise TypeError, "couldn't parse data, expecting type String" unless string.is_a? String
@string_to_parse = string.force_encoding(Encoding::ASCII_8BIT)
@string_to_parse = string.dup.force_encoding(Encoding::ASCII_8BIT)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... why? Are we expecting frozen Strings?

In general I understand that this might be a correct approach that will minimize unexpected side-effects. However, on the other hand, this will use a lot of memory when reading larger PDF files.

@literal_strings = [].dup
@hex_strings = [].dup
@streams = [].dup
Expand Down Expand Up @@ -243,22 +243,22 @@ def _parse_
##########################################
elsif str = @scanner.scan(/\<[0-9a-fA-F]*\>/)
# warn "Found a hex string"
str = str.slice(1..-2).force_encoding(Encoding::ASCII_8BIT)
str = str.slice(1..-2).dup.force_encoding(Encoding::ASCII_8BIT)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This dup is redundant and will use up a bunch of memory. The dynamic String returned from String#slice should be mutable.

# str = "0#{str}" if str.length.odd?
out << unify_string([str].pack('H*').force_encoding(Encoding::ASCII_8BIT))
out << unify_string([str].pack('H*').dup.force_encoding(Encoding::ASCII_8BIT))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(again) This dup is redundant and will use up a bunch of memory. The dynamic String returned from Array#pack is mutable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

##########################################
## parse a space delimited Hex String
##########################################
elsif str = @scanner.scan(/\<[0-9a-fA-F\s]*\>/)
# warn "Found a space seperated hex string"
str = str.force_encoding(Encoding::ASCII_8BIT).split(/\s/).map! {|b| b.length.odd? ? "0#{b}" : b}
out << unify_string(str.pack('H*' * str.length).force_encoding(Encoding::ASCII_8BIT))
str = str.dup.force_encoding(Encoding::ASCII_8BIT).split(/\s/).map! {|b| b.length.odd? ? "0#{b}" : b}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why dup?

out << unify_string(str.pack('H*' * str.length).dup.force_encoding(Encoding::ASCII_8BIT))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This dup is redundant and will use up a bunch of memory. The dynamic String returned from Array#pack is mutable.

##########################################
## parse a Literal String
##########################################
elsif @scanner.scan(/\(/)
# warn "Found a literal string"
str = ''.force_encoding(Encoding::ASCII_8BIT)
str = ''.dup.force_encoding(Encoding::ASCII_8BIT)
count = 1
while count > 0 && @scanner.rest?
scn = @scanner.scan_until(/[\(\)]/)
Expand All @@ -285,7 +285,7 @@ def _parse_
end
# The PDF formatted string is: str[0..-2]
# now starting to convert to regular string
str_bytes = str.force_encoding(Encoding::ASCII_8BIT)[0..-2].bytes.to_a
str_bytes = str.dup.force_encoding(Encoding::ASCII_8BIT)[0..-2].bytes.to_a
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why dup?

str = []
until str_bytes.empty?
case str_bytes[0]
Expand Down Expand Up @@ -335,7 +335,7 @@ def _parse_
str << str_bytes.shift
end
end
out << unify_string(str.pack('C*').force_encoding(Encoding::ASCII_8BIT))
out << unify_string(str.pack('C*').dup.force_encoding(Encoding::ASCII_8BIT))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(again) This dup is redundant and will use up a bunch of memory. The dynamic String returned from Array#pack is mutable.

##########################################
## parse a Dictionary
##########################################
Expand Down Expand Up @@ -367,10 +367,10 @@ def _parse_
# need to remove end of stream
if out.last.is_a? Hash
# out.last[:raw_stream_content] = str[0...-10] #cuts only one EON char (\n or \r)
out.last[:raw_stream_content] = unify_string str.sub(/(\r\n|\n|\r)?endstream\z/, '').force_encoding(Encoding::ASCII_8BIT)
out.last[:raw_stream_content] = unify_string str.sub(/(\r\n|\n|\r)?endstream\z/, '').dup.force_encoding(Encoding::ASCII_8BIT)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(again) This dup is redundant and will use up a bunch of memory. The dynamic String used here should be guaranteed to be mutable.

else
warn 'Stream not attached to dictionary!'
out << str.sub(/(\r\n|\n|\r)?endstream\z/, '').force_encoding(Encoding::ASCII_8BIT)
out << str.sub(/(\r\n|\n|\r)?endstream\z/, '').dup.force_encoding(Encoding::ASCII_8BIT)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(again) This dup is redundant and will use up a bunch of memory. The dynamic String used here should be guaranteed to be mutable.

end
##########################################
## parse an Object after finished
Expand Down Expand Up @@ -686,7 +686,7 @@ def serialize_objects_and_references

# All Strings are one String
def unify_string(str)
str.force_encoding(Encoding::ASCII_8BIT)
str.dup.force_encoding(Encoding::ASCII_8BIT)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why dup?

@strings_dictionary[str] ||= str
end

Expand Down
4 changes: 2 additions & 2 deletions lib/combine_pdf/pdf_public.rb
Original file line number Diff line number Diff line change
Expand Up @@ -187,7 +187,7 @@ def to_pdf(options = {})
xref = []
indirect_object_count = 1 # the first object is the null object
# write head (version and binanry-code)
out << "%PDF-#{@version}\n%\xFF\xFF\xFF\xFF\xFF\x00\x00\x00\x00".force_encoding(Encoding::ASCII_8BIT)
out << "%PDF-#{@version}\n%\xFF\xFF\xFF\xFF\xFF\x00\x00\x00\x00".dup.force_encoding(Encoding::ASCII_8BIT)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the interpreter requires dup it's a little odd... Maybe there's a way to indicate that a String literal is ASCII without creating an additional object (see: https://stackoverflow.com/a/15843685/4025095 )?

After all, the object's lifetime os very short (only created to be inserted into the existing String).


# collect objects and set xref table locations
loc = 0
Expand All @@ -211,7 +211,7 @@ def to_pdf(options = {})
# when finished, remove the numbering system and keep only pointers
remove_old_ids
# output the pdf stream
out.join("\n".force_encoding(Encoding::ASCII_8BIT)).force_encoding(Encoding::ASCII_8BIT)
out.join("\n".dup.force_encoding(Encoding::ASCII_8BIT)).dup.force_encoding(Encoding::ASCII_8BIT)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again... maybe see his SO answer:

https://stackoverflow.com/a/15843685/4025095

end

# this method returns all the pages cataloged in the catalog.
Expand Down
24 changes: 12 additions & 12 deletions lib/combine_pdf/renderer.rb
Original file line number Diff line number Diff line change
Expand Up @@ -47,14 +47,14 @@ def object_to_pdf(object)

def format_string_to_pdf(object)
obj_bytes = object.bytes.to_a
# object.force_encoding(Encoding::ASCII_8BIT)
# object.dup.force_encoding(Encoding::ASCII_8BIT)
if object.length == 0 || obj_bytes.min <= 31 || obj_bytes.max >= 127 # || (obj_bytes[0] != 68 object.match(/[^D\:\d\+\-Z\']/))
# A hexadecimal string shall be written as a sequence of hexadecimal digits (0-9 and either A-F or a-f)
# encoded as ASCII characters and enclosed within angle brackets (using LESS-THAN SIGN (3Ch) and GREATER- THAN SIGN (3Eh)).
"<#{object.unpack('H*')[0]}>".force_encoding(Encoding::ASCII_8BIT)
"<#{object.unpack('H*')[0]}>".dup.force_encoding(Encoding::ASCII_8BIT)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this dup? in this recursive approach, the inner elements should be consumed even if they are frozen (I think)....

else
# a good fit for a Literal String or the string is a date (MUST be literal)
('(' + ([].tap { |out| obj_bytes.each { |byte| out.concat(STRING_REPLACEMENT_ARRAY[byte]) } } ).pack('C*') + ')').force_encoding(Encoding::ASCII_8BIT)
('(' + ([].tap { |out| obj_bytes.each { |byte| out.concat(STRING_REPLACEMENT_ARRAY[byte]) } } ).pack('C*') + ')').dup.force_encoding(Encoding::ASCII_8BIT)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(again) This dup is redundant and will use up a bunch of memory. The dynamic String used here should be guaranteed to be mutable.

end
end

Expand Down Expand Up @@ -84,7 +84,7 @@ def format_name_to_pdf(object)
def format_array_to_pdf(object)
# An array shall be written as a sequence of objects enclosed in SQUARE BRACKETS (using LEFT SQUARE BRACKET (5Bh) and RIGHT SQUARE BRACKET (5Dh)).
# EXAMPLE [549 3.14 false (Ralph) /SomeName]
('[' + (object.collect { |item| object_to_pdf(item) }).join(' ') + ']').force_encoding(Encoding::ASCII_8BIT)
('[' + (object.collect { |item| object_to_pdf(item) }).join(' ') + ']').dup.force_encoding(Encoding::ASCII_8BIT)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(again) This dup is redundant and will use up a bunch of memory. The dynamic String used here should be guaranteed to be mutable.

end

EMPTY_PAGE_CONTENT_STREAM = {is_reference_only: true, referenced_object: { indirect_reference_id: 0, raw_stream_content: '' }}
Expand All @@ -100,19 +100,19 @@ def format_hash_to_pdf(object)
end
object[:indirect_reference_id] ||= 0
object[:indirect_generation_number] ||= 0
return "#{object[:indirect_reference_id]} #{object[:indirect_generation_number]} R".force_encoding(Encoding::ASCII_8BIT)
return "#{object[:indirect_reference_id]} #{object[:indirect_generation_number]} R".dup.force_encoding(Encoding::ASCII_8BIT)
end

# if the object is indirect...
out = []
if object[:indirect_reference_id]
object[:indirect_reference_id] ||= 0
object[:indirect_generation_number] ||= 0
out << "#{object[:indirect_reference_id]} #{object[:indirect_generation_number]} obj\n".force_encoding(Encoding::ASCII_8BIT)
out << "#{object[:indirect_reference_id]} #{object[:indirect_generation_number]} obj\n".dup.force_encoding(Encoding::ASCII_8BIT)
if object[:indirect_without_dictionary]
out << object_to_pdf(object[:indirect_without_dictionary])
out << "\nendobj\n"
return out.join.force_encoding(Encoding::ASCII_8BIT)
return out.join.dup.force_encoding(Encoding::ASCII_8BIT)
end
end
# remove extra page references.
Expand All @@ -123,15 +123,15 @@ def format_hash_to_pdf(object)
# if the object is not a simple object, it is a dictionary
# A dictionary shall be written as a sequence of key-value pairs enclosed in double angle brackets (<<...>>)
# (using LESS-THAN SIGNs (3Ch) and GREATER-THAN SIGNs (3Eh)).
out << "<<\n".force_encoding(Encoding::ASCII_8BIT)
out << "<<\n".dup.force_encoding(Encoding::ASCII_8BIT)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, what about a binary literal? SO: https://stackoverflow.com/a/15843685/4025095

object.each do |key, value|
out << "#{object_to_pdf key} #{object_to_pdf value}\n".force_encoding(Encoding::ASCII_8BIT) unless PDF::PRIVATE_HASH_KEYS.include? key
out << "#{object_to_pdf key} #{object_to_pdf value}\n".dup.force_encoding(Encoding::ASCII_8BIT) unless PDF::PRIVATE_HASH_KEYS.include? key
end
object.delete :Length
out << '>>'.force_encoding(Encoding::ASCII_8BIT)
out << "\nstream\n#{object[:raw_stream_content]}\nendstream".force_encoding(Encoding::ASCII_8BIT) if object[:raw_stream_content]
out << '>>'.dup.force_encoding(Encoding::ASCII_8BIT)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, what about a binary literal? SO: https://stackoverflow.com/a/15843685/4025095

out << "\nstream\n#{object[:raw_stream_content]}\nendstream".dup.force_encoding(Encoding::ASCII_8BIT) if object[:raw_stream_content]
out << "\nendobj\n" if object[:indirect_reference_id]
out.join.force_encoding(Encoding::ASCII_8BIT)
out.join.dup.force_encoding(Encoding::ASCII_8BIT)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why dup the result?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I searched for .force_encoding and replaced it with .dup.force_encoding. Perhaps a too aggressive/naive approach.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps so... but perhaps not a bad idea.

I would consider automatically replacing all 'literal'.force_encoding(Encoding::ASCII_8BIT) with 'literal'.b ... an operation that could be automated by searching for '.force_encoding(Encoding::ASCII_8BIT) and replacing it with '.b

end

def actual_object(obj)
Expand Down
2 changes: 1 addition & 1 deletion test/combine_pdf/renderer_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ def test_object(object)

def test_numeric_array_to_pdf
input = [1.234567, 0.000054, 5, -0.000099]
expected = "[1.234567 0.000054 5 -0.000099]".force_encoding('BINARY')
expected = "[1.234567 0.000054 5 -0.000099]".dup.force_encoding('BINARY')
actual = TestRenderer.new.test_object(input)

assert_equal(expected, actual)
Expand Down