<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title><![CDATA[Binspector]]></title>
<link href="http://binspector.github.io/atom.xml" rel="self"/>
<link href="http://binspector.github.io/"/>
<updated>2014-10-27T10:14:30-07:00</updated>
<id>http://binspector.github.io/</id>
<author>
<name><![CDATA[Foster Brereton]]></name>
</author>
<generator uri="http://octopress.org/">Octopress</generator>
<entry>
<title type="html"><![CDATA[Atoms and Structures]]></title>
<link href="http://binspector.github.io/blog/2014/10/24/atoms-and-structures/"/>
<updated>2014-10-24T16:48:59-07:00</updated>
<id>http://binspector.github.io/blog/2014/10/24/atoms-and-structures</id>
<content type="html"><![CDATA[<p>Binspector’s format grammar centers around two fundamental data types: the <em>structure</em> and the <em>atom</em>. While the introductory post provided an overview of these two types, in this post I will explain each in finer detail, along with a couple of examples.</p>
<!-- more -->
<h1>Structures</h1>
<p>Structures are the containers used to describe zero or more <em>fields</em>, which themselves are either structures or atoms. Each grammar is required to have at least one structure, and the default structure Binspector will start analysis with is <code>main</code> <sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup>. Structures can be defined in any order in the format grammar, but no two may have the same name.</p>
<p>Here is an example of a complete, well-formed and utterly boring format grammar:</p>
<pre><code>struct main
{
}
</code></pre>
<p>In the above example we have a single structure, <code>main</code>, which is empty. Structures can refer to other structures that have already been declared, and it is through this method that relationships can be defined within a format:</p>
<pre><code>struct c_string
{
// empty
}
struct main
{
c_string first_name;
c_string last_name;
}
</code></pre>
<p>Which will result in the following structure hierarchy, where <code>first_name</code> and <code>last_name</code> are both children of <code>main</code> and siblings of one another. Binspector processes fields in the order they are defined:</p>
<p><img class="center" src="http://binspector.github.io/images/structures.png"></p>
<p>Although we are using multiple structures to define a more complex grammar, Binspector will still process no binary data. What is missing are atoms, the second essential building block.</p>
<h1>Atoms</h1>
<p>Atoms are indivisible units of binary data, hence the name. Atoms provide a means of describing the type for a collection of bits: information on how the bits are to be interpreted. An atom is described with several attributes: the <em>type</em>, <em>identifier</em>, <em>size</em> and <em>offset</em>, each of which is detailed below.</p>
<h2>Type</h2>
<p>The type for an atom is made up of three distinct parts: sign, width, and endianness. Each atom declaration requires all three.</p>
<h3>Sign</h3>
<p>The sign of the atom can be one of three values: <code>signed</code>, <code>unsigned</code> or <code>float</code>, which have meanings similar to those in languages like C and C++. The use of <code>float</code> is restricted to atoms of 32 and 64 bits in width; using it with any other width will result in an error.</p>
<h3>Width</h3>
<p>The width of the atom is defined in the number of raw bits that make it up. Typically the values are byte-aligned (8, 16, etc.) but do not need to be. The maximum width of a single atom is 64 bits.</p>
<h3>Endianness</h3>
<p>The endianness of an atom can be one of two values: <code>big</code> or <code>little</code>. It is also possible to use an expression to represent the endianness of an atom. This is useful for some binary formats (e.g., tiff) where the reader of the binary file must discern the endianness of the data. In those situations the expression must return a boolean value. When the value is <code>true</code> the atom is <code>big</code> endian, otherwise it is <code>little</code> endian.</p>
<p>Though it is unnecessary, atoms of width 8 or less must still specify an endian interpretation. This is a limitation of the language grammar.</p>
<h4>Example</h4>
<pre><code>struct main
{
float 32 big scale;
}
</code></pre>
<p>Here we have a single atom in <code>main</code> named <code>scale</code> to be interpreted as a 32-bit floating-point value. We have just given substance to a template file, as we have instructed Binspector to consume and interpret the first 32 bits of some binary format. Atoms inserted into a structure will be analyzed based on the current read position, which advances atom by atom.</p>
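<p>To make the three parts of the type concrete, here is a minimal Python sketch of what interpreting a <code>float 32 big</code> atom amounts to. It is an illustration only, not Binspector's implementation; the function name <code>read_float32_big</code> and the sample bytes are assumptions made for this example.</p>

```python
import struct


def read_float32_big(data: bytes, position: int = 0):
    """Interpret 4 bytes at `position` as a big-endian 32-bit float.

    Returns the decoded value and the advanced read position,
    mirroring how the read position moves forward atom by atom.
    """
    (value,) = struct.unpack_from(">f", data, position)
    return value, position + 4


# Hypothetical input: the IEEE 754 big-endian encoding of 1.0.
sample = bytes([0x3F, 0x80, 0x00, 0x00])
scale, next_position = read_float32_big(sample)
print(scale, next_position)  # 1.0 4
```

<p>Swapping <code>">f"</code> for <code>"&lt;f"</code> in the format string would correspond to declaring the atom <code>little</code> instead of <code>big</code>.</p>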
<h2>Identifier</h2>
<p>The identifier for the atom is its name. Identifiers must be unique within a structure. Additionally, no field may be named <code>main</code> or <code>this</code>.</p>
<h2>Size</h2>
<p>In addition to being singletons, atoms can also describe static or dynamic arrays. A field’s size is an optional parameter that instructs Binspector to instantiate multiple atoms under a single field. There are several field size declarations, and they all result in contiguous arrays. The declaration types are the <em>integer</em>, <em>while</em>, <em>terminator</em>, and <em>delimiter</em>. Integer arrays will be detailed below, while the others are described in a <a href="http://binspector.github.io/blog/2014/10/15/predicated-arrays/">previous post</a>. Size descriptions are appended to the field identifier with brackets <code>[ ]</code>. If an expression is used to describe the size of a field, it is evaluated at the time the field is to be read.</p>
<h3>Integer Size</h3>
<p>The integer field size is the most basic and is used to define a discrete array of entries. For example:</p>
<pre><code>struct main
{
unsigned 16 big magic_word[2];
}
</code></pre>
<p>Above we have described a single field <code>magic_word</code> that interprets two 16-bit values. An array of zero elements is valid. That being said it is possible to specify the format grammar to avoid them, making output easier to read.</p>
<p>It is also possible to derive the length of a field based on values previously found in the binary format. An example of this is <code>pascal_t</code>, covered in the <a href="http://binspector.github.io/blog/2014/10/13/binspector-a-binary-format-analysis-tool/">introductory post</a>.</p>
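<p>As a rough Python analogy (not Binspector code), a fixed integer size and a size derived from an earlier field could be read like this. The helper names and sample bytes are hypothetical and exist only for this sketch.</p>

```python
import struct


def read_u16_big_array(data: bytes, position: int, count: int):
    """Read `count` contiguous unsigned 16-bit big-endian values."""
    values = list(struct.unpack_from(">%dH" % count, data, position))
    return values, position + 2 * count


def read_pascal(data: bytes, position: int):
    """Read a length byte, then that many bytes of string data.

    The array size is only known once the preceding field is read.
    """
    length = data[position]
    start = position + 1
    return data[start:start + length], start + length


# Fixed size, comparable to magic_word[2].
magic_word, after_magic = read_u16_big_array(b"\x4d\x4d\x00\x2a", 0, 2)
# Derived size, comparable to string[length].
name, after_name = read_pascal(b"\x03abc", 0)
print(magic_word, name)
```

<p>A count of zero yields an empty list and leaves the read position unchanged, which matches the note above that zero-element arrays are valid.</p>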
<h2>Offset</h2>
<p>Binary files are not always formatted in a contiguous way. Sometimes they specify offsets relative to locations in the data where other data can be found. An example of this is the tiff IFD metadata format, where values beyond a certain size are appended to the end of the metadata block and offsets are specified within the metadata itself as to where the extended information resides. Binspector allows for offsets as a means for fetching remote data and analyzing it as part of a structure.</p>
<p>Offsets are always specified with absolute values. For binary formats that use relative offsets, the language has primitives to assist in converting between relative and absolute offsets.</p>
<h3>Example</h3>
<p>In the following example let us extend our <code>pascal_t</code> to include an extra field that specifies the remote location for the string data as an absolute offset:</p>
<pre><code>struct pascal_remote_absolute_t
{
unsigned 8 big length;
unsigned 32 big absolute_offset;
unsigned 8 big string[length] @ absolute_offset;
}
</code></pre>
<p>In the example our first atom is the length of the string and the second is the remote offset. Since the value is absolute we can use it without modification in the third atom definition. Binspector will seek to that absolute offset within the file and proceed to read <code>length</code> 8-bit values to constitute <code>string</code>.</p>
<p>When we are dealing with relative offsets, we need to convert them to absolute offsets before Binspector can find the data correctly. In such a case we need to know which field is the basis for the relative offset, and construct an expression that uses its offset to compute the final, absolute offset. In a relative-offset-based remote <code>pascal_t</code> structure, given that the remote offset is relative to the length byte of the string, we might have:</p>
<pre><code>struct pascal_remote_relative_t
{
unsigned 8 big length;
unsigned 32 big relative_offset;
unsigned 8 big string[length] @ startof(@length) + relative_offset;
}
</code></pre>
<p>In this example we use the <code>startof()</code> routine to fetch the absolute offset of the length atom in the file and use it in conjunction with the relative offset of the string to compute the absolute offset of the string data.</p>
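<p>The arithmetic behind that expression can be sketched in Python as follows. This is an illustration under assumptions, not how Binspector is implemented: the function name, the byte layout of the sample, and the choice to return the read position after the local fields are all hypothetical.</p>

```python
def read_pascal_remote_relative(data: bytes, position: int):
    """Read a length byte and a 32-bit big-endian relative offset,
    then fetch the string from the computed absolute offset."""
    start_of_length = position  # plays the role of startof(@length)
    length = data[position]
    relative_offset = int.from_bytes(data[position + 1:position + 5], "big")
    absolute_offset = start_of_length + relative_offset
    string = data[absolute_offset:absolute_offset + length]
    # The remote read does not advance the local read position.
    return string, position + 5


# Hypothetical layout: 2 bytes of padding, the structure at offset 2,
# 2 more bytes of padding, then the string data at absolute offset 9.
sample = b"\xaa\xbb" + bytes([3]) + (7).to_bytes(4, "big") + b"\x00\x00" + b"abc"
string, next_position = read_pascal_remote_relative(sample, 2)
print(string, next_position)  # b'abc' 7
```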
<p>Within routine calls it is necessary to refer to fields with a prefixed <code>@</code> symbol, otherwise the value of the field will be used instead of the field itself. This is loosely similar to pass-by-reference vs. pass-by-value. However, when the field is part of an expression, the <code>@</code> should be omitted (e.g., <code>startof(main.foo.bar)</code>).</p>
<p>For readability’s sake it would be fine to use a <code>const</code> or <code>invis</code> field to construct the absolute offset before using it:</p>
<pre><code>struct pascal_remote_relative_t
{
unsigned 8 big length;
unsigned 32 big relative_offset;
const absolute_offset = startof(@length) + relative_offset;
unsigned 8 big string[length] @ absolute_offset;
}
</code></pre>
<p>Reading remote data does not affect the read position once the reading is complete. For example given the following fields:</p>
<pre><code> unsigned 8 big offset;
unsigned 8 big remote_data @ offset;
unsigned 8 big some_value;
</code></pre>
<p>The bytes for <code>offset</code> and <code>some_value</code> are adjacent to one another in the binary file. However in the analysis <code>remote_data</code> will be between them as the remote data is brought in to its interpreted location in the file at that time. In addition an analyzed structure’s starting and ending offsets (<code>startof()</code> and <code>endof()</code> values) are only affected by its nonremote (local) data, though this may change in a later version.</p>
<div class="footnotes">
<hr/>
<ol>
<li id="fn:1">
<p>It is possible to specify a different starting structure from the command line.<a href="#fnref:1" rev="footnote">↩</a></p></li>
</ol>
</div>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Predicated Arrays]]></title>
<link href="http://binspector.github.io/blog/2014/10/15/predicated-arrays/"/>
<updated>2014-10-15T10:20:35-07:00</updated>
<id>http://binspector.github.io/blog/2014/10/15/predicated-arrays</id>
<content type="html"><![CDATA[<p>It is normal for formats to specify an array of data that is not length-prefixed. Instead, the format usually mandates some kind of marker to signify the end of the array. The canonical example of this is a null-terminated C string. We call these arrays <em>predicated</em> because there is a check (a predicate) that, when met, denotes the end of the array. In this post we’ll take a look at some of the tools available within Binspector to handle these kinds of arrays.</p>
<!-- more -->
<h2>Terminators</h2>
<p>If an atom specifies a terminator for its array size, Binspector will continue to grow the array until the terminator is found. The terminator is then included in the array. Using the above example, a null-terminated string would take this form:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>unsigned 8 big string[terminator: 0];</span></code></pre></td></tr></table></div></figure>
<p>One might visualize Binspector’s process like so:</p>
<p><img class="center" src="http://binspector.github.io/images/terminator.png"></p>
<p>There are a handful of restrictions to the use of terminators. First, terminators are always integer values, and cannot be used for structure arrays. Also, because the terminator is appended at the end of the resulting array, it must be the same type as the atom it terminates.</p>
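<p>In Python terms, the terminator loop for 8-bit atoms might look roughly like the sketch below. The function name <code>read_terminated</code> is an assumption for illustration and says nothing about Binspector's internals.</p>

```python
def read_terminated(data: bytes, position: int, terminator: int = 0):
    """Grow an array of 8-bit values until `terminator` is read.

    The terminator is appended to the array, as described above.
    Returns the array and the read position just past the terminator.
    """
    values = []
    while True:
        if position >= len(data):
            raise EOFError("terminator not found before end of data")
        value = data[position]
        position += 1
        values.append(value)
        if value == terminator:
            return values, position


string, next_position = read_terminated(b"hi\x00rest", 0)
print(string, next_position)  # [104, 105, 0] 3
```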
<h2>Delimiters</h2>
<p>An atom with a size delimiter is very similar to one with a terminator; however, there are two distinctions between them. The first is that the delimiter value is <em>not</em> included in the atom’s resulting array. As a consequence of this, the second difference is that the delimiter’s type need not be the same as the array it delimits.</p>
<p>The most common use case for a delimited atom declaration is when Binspector should skip over some uninteresting portion of a binary file until a sought-after piece is found. For example in a JPEG grammar one might want to skip over the image data stream, requiring a delimiter field until the end of image marker is found:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>struct main
</span><span class='line'>{
</span><span class='line'> //... prior JPEG template file declaration
</span><span class='line'> unsigned 8 big image_stream[delimiter: 0xFFD9];
</span><span class='line'>
</span><span class='line'> unsigned 16 big eoi_marker; // will be 0xFFD9
</span><span class='line'>}</span></code></pre></td></tr></table></div></figure>
<p>Here we have an 8-bit array filled with image stream data that the format grammar is uninterested in. The delimiter is a 16-bit value, which is fine because the value will not be included in <code>image_stream</code>. As noted in the example the following 16-bit value will be <code>0xFFD9</code>.</p>
<p>The bit length of the delimiter is deduced from its value rounded to a byte. Like terminators, delimiters cannot be applied to structures (<em>note: why not?</em>). Finally, delimiters are far more efficient than <code>peek</code> when trying to do look-ahead processing.</p>
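<p>For contrast with the terminator case, here is a hedged Python sketch of delimiter behavior on byte data. The name <code>read_delimited</code> and the sample stream are hypothetical; the point is only that the delimiter is left unread for the next field.</p>

```python
def read_delimited(data: bytes, position: int, delimiter: bytes):
    """Collect bytes up to, but not including, `delimiter`.

    The read position stops at the start of the delimiter, so the
    next field in the grammar can consume it.
    """
    end = data.find(delimiter, position)
    if end == -1:
        raise EOFError("delimiter not found before end of data")
    return data[position:end], end


# Hypothetical stream: three bytes of image data, then the marker.
stream = b"\x01\x02\x03\xff\xd9"
image_stream, next_position = read_delimited(stream, 0, b"\xff\xd9")
eoi_marker = int.from_bytes(stream[next_position:next_position + 2], "big")
print(image_stream, hex(eoi_marker))  # b'\x01\x02\x03' 0xffd9
```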
<h2>While</h2>
<p>The final predicate that can be applied to arrays is the <code>while</code> statement, which includes an expression that is evaluated after every element of the array is read. <code>while</code> statements <em>can</em> be used for the size of a structure array, so there can be many bytes read between subsequent evaluations of the <code>while</code> predicate. The most common use case for <code>while</code> predicates is via slots and signals:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>struct main
</span><span class='line'>{
</span><span class='line'> slot done = false;
</span><span class='line'>
</span><span class='line'> other_structure_t my_array[while: !done];
</span><span class='line'>}</span></code></pre></td></tr></table></div></figure>
<p>Binspector is depending on something inside <code>other_structure_t</code> to modify the <code>done</code> slot via a signal. If the signal never happens, the array will expand until the end of the binary is found, and should be considered a bug in the grammar.</p>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Sentries]]></title>
<link href="http://binspector.github.io/blog/2014/10/14/sentries/"/>
<updated>2014-10-14T20:43:51-07:00</updated>
<id>http://binspector.github.io/blog/2014/10/14/sentries</id>
<content type="html"><![CDATA[<p>Binspector can analyze a binary file and report to the user if the file is <em>well-formed</em> or not, that is, if the file passes analysis. While <code>true</code> is a straightforward answer, <code>false</code> comes with a host of complications. Specifically, what was it about the file that caused the analysis to fail? Was there some invariant violated, a read that went off into the weeds… what? Validation works best when it fails <em>as fast as it can</em>, because the closer one halts to the actual point of failure, the more information can be gathered about it.</p>
<p>Sentries are one way to facilitate failing as fast as possible during file validation. So how do they work?</p>
<!-- more -->
<p>File formats such as PNG and TIFF contain data wrapped in length-prefixed blocks. Sometimes the format is completely block-based; sometimes it’s just substructures that are. For our purposes let’s modify our original sample format grammar to be length-prefixed:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>struct pascal_t
</span><span class='line'>{
</span><span class='line'> unsigned 8 big length;
</span><span class='line'> unsigned 8 big string[length];
</span><span class='line'>
</span><span class='line'> summary str(@string);
</span><span class='line'>}
</span><span class='line'>
</span><span class='line'>struct user_name_t
</span><span class='line'>{
</span><span class='line'> unsigned 16 big length;
</span><span class='line'> pascal_t first;
</span><span class='line'> pascal_t last;
</span><span class='line'>
</span><span class='line'> summary summaryof(first), " ", summaryof(last);
</span><span class='line'>}</span></code></pre></td></tr></table></div></figure>
<p>To keep our binary file up to speed with the grammar, we prefix <code>file.bin</code> with two bytes that indicate the length of the block:</p>
<p><img class="left" src="http://binspector.github.io/images/binfile_l.png"></p>
<p>If, in the course of analyzing one of the <code>pascal_t</code>s, a <code>length</code> is larger or smaller than it should be, we won’t find out about it until the parse is completed. Given a malformed binary file:</p>
<p><img class="left" src="http://binspector.github.io/images/binfile_lbad.png"></p>
<p>The analysis result doesn’t give us much to go on:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>$ binspector -t format.bfft -i file.bin -s user_name_t
</span><span class='line'>error: EOF reached. Consider using the eof slot.
</span><span class='line'>in file: format.bfft:3
</span><span class='line'>$main$</span></code></pre></td></tr></table></div></figure>
<p>The key piece of information we need to leverage is <code>main.length</code>. If we know the scope to which that length applies, we could inform Binspector of a boundary that must be met exactly by the time that scope ends. The boundary is specified with the <code>sentry</code> declaration:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>struct pascal_t
</span><span class='line'>{
</span><span class='line'> unsigned 8 big length;
</span><span class='line'> unsigned 8 big string[length];
</span><span class='line'>
</span><span class='line'> summary str(@string);
</span><span class='line'>}
</span><span class='line'>
</span><span class='line'>struct user_name_t
</span><span class='line'>{
</span><span class='line'> unsigned 16 big length;
</span><span class='line'>
</span><span class='line'> sentry (length)
</span><span class='line'> {
</span><span class='line'> pascal_t first;
</span><span class='line'> pascal_t last;
</span><span class='line'> }
</span><span class='line'>
</span><span class='line'> summary summaryof(first), " ", summaryof(last);
</span><span class='line'>}</span></code></pre></td></tr></table></div></figure>
<p>And the Binspector output is more informative:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>$ binspector -t format.bfft -i file.bin -s user_name_t
</span><span class='line'>main sentry barrier breach
</span><span class='line'>main sentry barrier breach
</span><span class='line'>error: EOF reached. Consider using the eof slot.
</span><span class='line'>while analyzing: main.length
</span><span class='line'>in file: format.bfft:3
</span><span class='line'>$main$</span></code></pre></td></tr></table></div></figure>
<p>I’ll be the first to admit the sentry error reporting needs to be cleaned up, but let me break down what Binspector is trying to say. The two key bits of information are <code>main sentry barrier breach</code> and the point the grammar failed, namely <code>format.bfft:3</code>. Binspector was in the process of executing the line found at <code>format.bfft:3</code>, namely, the <code>length</code> of a <code>pascal_t</code>, when the sentry established by <code>main.length</code> was overrun.</p>
<p>If the <code>length</code> value is malformed and specifies a larger block than actual data:</p>
<p><img class="left" src="http://binspector.github.io/images/binfile_lbad2.png"></p>
<p>We get notified of that in turn:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>$ binspector -t format.bfft -i file.bin -s user_name_t
</span><span class='line'>WARNING: After sentry, read position should be 34 but instead is 18.
</span><span class='line'>$main$</span></code></pre></td></tr></table></div></figure>
<p>Notice in <em>both</em> cases, Binspector still drops you into a command-line interface. This gives the user the ability to navigate the analysis up to the point of failure in an attempt to discern where things went wrong.</p>
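<p>The idea behind a sentry can be approximated in Python as a boundary that every read is checked against, plus an exact-position check when the scope ends. This is a sketch under assumptions, not Binspector's implementation: the <code>Reader</code> class, the <code>SentryBreach</code> exception, and the sample bytes are all invented for this example.</p>

```python
class SentryBreach(Exception):
    """Raised when a read would cross an active sentry boundary."""


class Reader:
    def __init__(self, data: bytes):
        self.data = data
        self.position = 0
        self.limits = []  # stack of active sentry boundaries

    def read(self, count: int) -> bytes:
        end = self.position + count
        for limit in self.limits:
            if end > limit:
                raise SentryBreach(
                    "read to %d crosses sentry at %d" % (end, limit))
        if end > len(self.data):
            raise EOFError("EOF reached")
        chunk = self.data[self.position:end]
        self.position = end
        return chunk


def read_pascal(reader: Reader) -> bytes:
    length = reader.read(1)[0]
    return reader.read(length)


def read_user_name(reader: Reader):
    block_length = int.from_bytes(reader.read(2), "big")
    boundary = reader.position + block_length
    reader.limits.append(boundary)  # enter the sentry scope
    try:
        first = read_pascal(reader)
        last = read_pascal(reader)
    finally:
        reader.limits.pop()         # leave the sentry scope
    if reader.position != boundary:
        print("WARNING: After sentry, read position should be %d "
              "but instead is %d." % (boundary, reader.position))
    return first, last


good = (16).to_bytes(2, "big") + b"\x06Foster" + b"\x08Brereton"
print(read_user_name(Reader(good)))
```

<p>Corrupting the first <code>pascal_t</code> length in <code>good</code> makes the very next read fail at the boundary, rather than somewhere later in the file.</p>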
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[A Hairbrained Approach to Security Testing]]></title>
<link href="http://binspector.github.io/blog/2014/10/13/a-hairbrained-approach-to-security-testing/"/>
<updated>2014-10-13T23:56:06-07:00</updated>
<id>http://binspector.github.io/blog/2014/10/13/a-hairbrained-approach-to-security-testing</id>
<content type="html"><![CDATA[<p>Let’s go back to a structure defined in the introductory post:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>struct pascal_t
</span><span class='line'>{
</span><span class='line'> unsigned 8 big length;
</span><span class='line'> unsigned 8 big string[length];
</span><span class='line'>}</span></code></pre></td></tr></table></div></figure>
<p>If you were testing an application that read a <code>pascal_t</code>, what kind of data would you feed it in an attempt to break it? One strategy might be to throw random data at the program: this is called <em>fuzzing</em>. Because reading <code>string</code> is dependent upon the value of <code>length</code>, it would follow that fuzzing <code>length</code> is more likely to surface vulnerabilities than fuzzing <code>string</code>. This is the classic setup for a buffer overflow exploit.</p>
<p>Binspector knows a lot about a file being analyzed. As it turns out, the knowledge it collects makes this kind of intelligent fuzzing heuristic pretty straightforward to implement.</p>
<!-- more -->
<p>Consider the following change to the sample binary:</p>
<p><img class="left" src="http://binspector.github.io/images/binfile_f.png"></p>
<p>According to the format grammar, the binary file is no longer <em>well-formed</em> because the <code>length</code> does not match the amount of ensuing data. At this point, what happens during reading depends entirely on the application. For example, Binspector will produce the following:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>$ binspector -i file.bin -t format.bfft -s user_name_t
</span><span class='line'>error: EOF reached. Consider using the eof slot.
</span><span class='line'>in file: format.bfft:3
</span><span class='line'>$main$</span></code></pre></td></tr></table></div></figure>
<p>Binspector (and any other application reading a <code>pascal_t</code>) needs <code>length</code> to derive the contents of <code>string</code>. Intelligent fuzzing is based on the observation that the more interesting values are the ones used to drive further reading of the file. With a well-formed binary and an associated format grammar, Binspector can produce a series of derivative files that have been strategically altered. (The fuzzing engine used to be a separate tool I called Hairbrain. Despite my love of the name, it was easier to maintain the tools as one codebase than keep them apart.)</p>
<h2>Integral Attacks</h2>
<p>The first attack type starts with Binspector keeping track of the atoms in the binary that were evaluated to continue analysis. Since Binspector knows the types of these atoms (that is, how they will be interpreted) it can tweak them to try and throw off file reading code. Let’s take a look at what Binspector produces with our sample file and grammar:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>~/Desktop/binsample$ binspector -i file.bin -t format.bfft -s user_name_t -m fuzz
</span><span class='line'>Scanned 21 nodes
</span><span class='line'>Fuzzing 2 weak points . done.
</span><span class='line'>Generated 4 files</span></code></pre></td></tr></table></div></figure>
<p>Binspector has generated four files, each one a corruption of <code>file.bin</code> in a known way. It also produces an attack summary file which details what it has done:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>main.first.length
</span><span class='line'> ? attack_type : usage
</span><span class='line'> ? use_count : 1
</span><span class='line'> ? offset : 0
</span><span class='line'> ? bits : 8
</span><span class='line'> ? type : unsigned
</span><span class='line'> ? big_endian : true
</span><span class='line'> > file_0_z.bin
</span><span class='line'> > file_0_o.bin
</span><span class='line'>main.last.length
</span><span class='line'> ? attack_type : usage
</span><span class='line'> ? use_count : 1
</span><span class='line'> ? offset : 7
</span><span class='line'> ? bits : 8
</span><span class='line'> ? type : unsigned
</span><span class='line'> ? big_endian : true
</span><span class='line'> > file_7_z.bin
</span><span class='line'> > file_7_o.bin</span></code></pre></td></tr></table></div></figure>
<p>The file is delineated per-attack. You will notice there are two atoms Binspector decided to attack: the <code>length</code>s of the two <code>pascal_t</code> structures. The lines prefixed by <code>?</code> reveal what Binspector knows about each atom that is relevant to fuzzing it. The lines prefixed by <code>></code> name the files that were derived from the known-good file and then attacked. For integer atoms, Binspector changes the value to all-zeroes (<code>_z</code>) and all-ones (<code>_o</code>).</p>
<p>You now have four files against which you can harden the input code of your <code>pascal_t</code>-reading application, each of which may throw it completely off the rails.</p>
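<p>The all-zeroes and all-ones substitution can be sketched in a few lines of Python. This is only an illustration of the idea, not Binspector’s implementation: the function name <code>integral_attacks</code> is hypothetical, the sample bytes are reconstructed from the offsets and strings shown in this post, and the sketch assumes byte-aligned atoms.</p>

```python
def integral_attacks(data: bytes, offset: int, bits: int = 8) -> dict[str, bytes]:
    """Return all-zeroes and all-ones variants of one integer atom.

    Hypothetical helper: the output names only mimic the attack summary above.
    """
    if bits % 8 != 0:
        raise ValueError("this sketch only handles byte-aligned atoms")
    width = bits // 8
    # The atom must lie entirely inside the file.
    if offset not in range(len(data) - width + 1):
        raise ValueError("atom lies outside the file")

    variants = {}
    for suffix, fill in (("z", 0x00), ("o", 0xFF)):
        mutated = bytearray(data)
        # Overwrite only the atom; every other byte stays as it was.
        mutated[offset:offset + width] = bytes([fill]) * width
        variants[f"file_{offset}_{suffix}.bin"] = bytes(mutated)
    return variants


# Assumed contents of the sample file: two length-prefixed strings.
sample = b"\x06Foster\x08Brereton"

fuzzed = {}
for weak_point in (0, 7):  # offsets of the two length atoms
    fuzzed.update(integral_attacks(sample, weak_point))
```

<p>Each entry in <code>fuzzed</code> differs from the known-good data by exactly one atom, which keeps a crash easy to trace back to the value that caused it.</p>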
<h2>Shuffle Attacks</h2>
<p>Many file formats or substructures within them are block-based. For example, a PNG contains a series of chunks that flesh out its contents. A PNG always starts with an IHDR chunk, and always finishes with an IEND chunk, and in the middle can be a varying number of others. For example:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>IHDR | gAMA | sBIT | bKGD | oFFs | pCAL | pHYs | tIME | tEXt | IDAT | zTXt | IEND</span></code></pre></td></tr></table></div></figure>
<p>A shuffle attack is based on the observation that contiguous chunks of data may affect input code differently if they are rearranged. We know that these chunks together occupy <em>N</em> bytes in the file, and this is true regardless of the order they are in. Therefore we are free to shuffle them in place, and we know the rest of the file should still hold up. For example, what if we tried to open the above PNG after it had been altered like so:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>IHDR | bKGD | gAMA | IDAT | oFFs | pCAL | pHYs | sBIT | tEXt | tIME | zTXt | IEND</span></code></pre></td></tr></table></div></figure>
<p>Whether or not the above reordering still constitutes a valid PNG is irrelevant. What matters is <em>how input code will handle it</em>. It may look enough like a PNG to begin the file input process, only to be thrown into the weeds when it is faced with an unexpected chunk.</p>
<p>Since we may not want to shuffle <em>every</em> array found in a binary, Binspector has a special keyword to enable this kind of attack. Let’s modify the format grammar slightly and see what kind of fuzzing result comes out:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>struct pascal_t
</span><span class='line'>{
</span><span class='line'> unsigned 8 big length;
</span><span class='line'> unsigned 8 big string[length];
</span><span class='line'>
</span><span class='line'> summary str(@string);
</span><span class='line'>}
</span><span class='line'>
</span><span class='line'>struct user_name_t
</span><span class='line'>{
</span><span class='line'> pascal_t name[2] shuffle;
</span><span class='line'>
</span><span class='line'> summary summaryof(name[0]), " ", summaryof(name[1]);
</span><span class='line'>}</span></code></pre></td></tr></table></div></figure>
<p>In this format grammar we have consolidated <code>first</code> and <code>last</code> into a two-element array, and have suffixed the statement with the <code>shuffle</code> keyword. This lets Binspector know that we are interested in producing a shuffle attack on what it finds. The resulting fuzz produces the following attack summary:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>main.name
</span><span class='line'> ? attack_type : shuffle
</span><span class='line'> ? array_size : 2
</span><span class='line'> > file_0_s_1.bin
</span><span class='line'>main.name[0].length
</span><span class='line'> ? attack_type : usage
</span><span class='line'> ? use_count : 1
</span><span class='line'> ? offset : 0
</span><span class='line'> ? bits : 8
</span><span class='line'> ? type : unsigned
</span><span class='line'> ? big_endian : true
</span><span class='line'> > file_0_0_z.bin
</span><span class='line'> > file_0_0_o.bin
</span><span class='line'>main.name[1].length
</span><span class='line'> ? attack_type : usage
</span><span class='line'> ? use_count : 1
</span><span class='line'> ? offset : 7
</span><span class='line'> ? bits : 8
</span><span class='line'> ? type : unsigned
</span><span class='line'> ? big_endian : true
</span><span class='line'> > file_0_7_z.bin
</span><span class='line'> > file_0_7_o.bin</span></code></pre></td></tr></table></div></figure>
<p>In a hex editor, it is easy to see the shuffle file’s contents have been reordered:</p>
<p><img class="left" src="http://binspector.github.io/images/binfile_fs.png"></p>
<p>In this particular case the shuffle attack is unlikely to reveal any problems with an application’s read code. However (as is the case with PNG) it does not take much to turn an innocuous file on its head, and perhaps cause input code to nosedive in turn.</p>
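<p>For readers who want to experiment, here is a minimal Python sketch of a shuffle over an array of length-prefixed records. It is an assumption-laden illustration: the helper names are hypothetical, the sample bytes are reconstructed from this post, and it enumerates every reordering, which is not necessarily how Binspector chooses the orderings it writes out.</p>

```python
import itertools


def parse_pascal_array(data: bytes, count: int) -> list[bytes]:
    """Split data into `count` length-prefixed records, prefix byte included."""
    records, pos = [], 0
    for _ in range(count):
        end = pos + 1 + data[pos]  # one length byte plus `length` payload bytes
        records.append(data[pos:end])
        pos = end
    return records


def shuffle_attacks(data: bytes, count: int) -> list[bytes]:
    """Return every reordering of the records except the original order."""
    records = parse_pascal_array(data, count)
    consumed = sum(len(record) for record in records)
    tail = data[consumed:]  # bytes after the array are left untouched
    original = tuple(records)
    return [
        b"".join(order) + tail
        for order in itertools.permutations(records)
        if order != original
    ]


# Assumed contents of the sample file: two length-prefixed strings.
sample = b"\x06Foster\x08Brereton"
shuffled = shuffle_attacks(sample, 2)
```

<p>Because the records are moved as whole units, the array occupies the same number of bytes before and after the shuffle. The number of orderings grows factorially with the array size, so a practical tool would sample a few rather than enumerate them all.</p>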
<p>Fuzzing as a means of security testing is as much art as it is science. Many enhancements can go into Binspector’s current engine to make it more useful than it is (for example, producing files that have been attacked in several ways, not just one). Also, intelligent fuzzing itself should be augmented with more broad-spectrum tests, including more traditional fuzzing. Nevertheless, Binspector does provide a valuable subset of attacks, and makes them easily available to users.</p>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Big Trouble in Little Endian]]></title>
<link href="http://binspector.github.io/blog/2014/10/13/big-trouble-in-little-endian/"/>
<updated>2014-10-13T21:00:07-07:00</updated>
<id>http://binspector.github.io/blog/2014/10/13/big-trouble-in-little-endian</id>
<content type="html"><![CDATA[<p>Exif is a body of specifications related to the embedding of metadata inside images. (Interestingly enough, when embedded inside a JPEG it uses a structure based on TIFF, making for a well-formed TIFF within a JPEG.) One of the features, for better or for worse, is that Exif can be stored either big- or little-endian. It is the responsibility of the input code to detect which mode the incoming Exif is in, and to interpret ensuing values correctly.</p>
<!-- more -->
<p>The start of the Exif data block will always be a 16-bit value. How it is interpreted will determine if the ensuing data is big- or little-endian:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>struct tiff_t
</span><span class='line'>{
</span><span class='line'> unsigned 16 big header; // 0x4949 (little) or 0x4d4d (big)
</span><span class='line'> ...</span></code></pre></td></tr></table></div></figure>
<p>While Binspector handles endianness in atoms, the last thing I wanted to do was double-specify the format, once for each endian option. What I wanted was to be able to predicate the endianness of an atom on an <em>expression</em>, not just a keyword (<code>big</code> v. <code>little</code>), and to have that expression take file data into consideration in the process (<code>header == 0x4d4d</code>). So let it be done:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>struct tiff_t
</span><span class='line'>{
</span><span class='line'> unsigned 16 big header; // 0x4949 (little) or 0x4d4d (big)
</span><span class='line'>
</span><span class='line'> enumerate (header)
</span><span class='line'> {
</span><span class='line'> 0x4D4D : const BE_k = true;
</span><span class='line'> 0x4949 : const BE_k = false;
</span><span class='line'> }
</span><span class='line'>
</span><span class='line'> unsigned 16 BE_k tag_mark;
</span><span class='line'>
</span><span class='line'> invariant ok_tag_mark = tag_mark == 42;
</span><span class='line'>}</span></code></pre></td></tr></table></div></figure>
<p>The advantage of the <code>enumerate</code> construct above is that if <code>header</code> is neither of the two expected values, the analysis will fail. With the <code>typedef</code> operation, we can add a little syntactic sugar to make subsequent atoms even more readable:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>struct tiff_t
</span><span class='line'>{
</span><span class='line'> unsigned 16 big header; // 0x4949 (little) or 0x4d4d (big)
</span><span class='line'>
</span><span class='line'> enumerate (header)
</span><span class='line'> {
</span><span class='line'> 0x4D4D : const BE_k = true;
</span><span class='line'> 0x4949 : const BE_k = false;
</span><span class='line'> }
</span><span class='line'>
</span><span class='line'> typedef unsigned 32 BE_k long_t;
</span><span class='line'> typedef unsigned 16 BE_k word_t;
</span><span class='line'>
</span><span class='line'> word_t tag_mark;
</span><span class='line'>
</span><span class='line'> invariant ok_tag_mark = tag_mark == 42;
</span><span class='line'>
</span><span class='line'> long_t ifd_offset; // usually 8 for IFD 0
</span><span class='line'>
</span><span class='line'> // ...and on to teasing apart the IFD structure
</span><span class='line'>}</span></code></pre></td></tr></table></div></figure>
<p>A nested structure such as <code>ifd_t</code> (not shown here) will inherit typedefs declared in its ancestry, which means it (and its substructures) can take advantage of <code>word_t</code>, <code>long_t</code>, etc. The endian handling is limited to the top-level <code>tiff_t</code> structure, and everything it contains is as readable as if endianness were never an issue in the first place.</p>
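<p>The same decide-once, reuse-everywhere pattern is what hand-written input code typically needs. The following Python sketch mirrors the grammar above under stated assumptions: <code>read_tiff_header</code> and its nested helpers are hypothetical names, and only the three header fields from the grammar are read.</p>

```python
def read_tiff_header(data: bytes) -> dict:
    """Read the TIFF header, choosing the byte order from the first two bytes."""
    # 0x4D4D is ASCII "MM" (big-endian); 0x4949 is ASCII "II" (little-endian).
    byte_orders = {b"MM": "big", b"II": "little"}
    byte_order = byte_orders.get(data[0:2])
    if byte_order is None:
        # Like the `enumerate` construct: an unexpected header fails the read.
        raise ValueError("header is neither 0x4D4D nor 0x4949")

    # Counterparts of the `word_t` and `long_t` typedefs: the byte order is
    # decided once, and every later read picks it up.
    def word(offset: int) -> int:
        return int.from_bytes(data[offset:offset + 2], byte_order)

    def long(offset: int) -> int:
        return int.from_bytes(data[offset:offset + 4], byte_order)

    tag_mark = word(2)
    if tag_mark != 42:
        # Counterpart of `invariant ok_tag_mark`.
        raise ValueError("tag_mark is not 42")

    return {"byte_order": byte_order, "tag_mark": tag_mark, "ifd_offset": long(4)}
```

<p>Feeding it <code>b"MM\x00\x2a\x00\x00\x00\x08"</code> or <code>b"II\x2a\x00\x08\x00\x00\x00"</code> yields the same <code>tag_mark</code> and <code>ifd_offset</code>, with only <code>byte_order</code> differing.</p>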
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Binspector: A Binary Format Analysis Tool]]></title>
<link href="http://binspector.github.io/blog/2014/10/13/binspector-a-binary-format-analysis-tool/"/>
<updated>2014-10-13T11:17:31-07:00</updated>
<id>http://binspector.github.io/blog/2014/10/13/binspector-a-binary-format-analysis-tool</id>
<content type="html"><![CDATA[<p>Binary formats and files are inescapable. Although optimal for computers to read, sussing them out manually requires exacting patience. Every developer has a moment in their career with a hex editor open, staring blankly at screenfuls of <code>0xDEADBEEF</code> or UTF-8 encoded multibyte Unicode. Binspector<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup> was born when I found myself scouring JPEGs to make sure their Exif and IPTC/IIM metadata blocks were telling consistent stories. The tool has evolved into something genuinely useful, and I am excited to share it in the hopes others will benefit from it as well.</p>
<!-- more -->
<p>The goal of Binspector is to help bridge the gap between binary formats and the developers who wrestle with them. It does this in many ways, but this post seeks to cover the two most significant. First, Binspector leverages a domain-specific language to formally describe a binary format. Second, the tool provides interactive analysis into the contents of a binary file.</p>
<h2>Formal Description</h2>
<p>Binspector seeks to formalize both the <em>interpretation</em> and the <em>context</em> of binary data. The interpretation of data is simply the value of its bits: its size, endianness, and sign. In Binspector these declarations are called <strong><em>atoms</em></strong>. Context is the larger scope in which the data is found: now that we know what the value <em>is</em>, what does the value <em>mean</em>? The basic contextual building block in Binspector is called a <strong><em>structure</em></strong>. Combined, these two attributes form the type of binary data we are dealing with.</p>
<h3>Example</h3>
<p>The following is a small grammar describing two structures and the relationship between them:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>struct pascal_t
</span><span class='line'>{
</span><span class='line'> unsigned 8 big length;
</span><span class='line'> unsigned 8 big string[length];
</span><span class='line'>}
</span><span class='line'>
</span><span class='line'>struct user_name_t
</span><span class='line'>{
</span><span class='line'> pascal_t first;
</span><span class='line'> pascal_t last;
</span><span class='line'>}</span></code></pre></td></tr></table></div></figure>
<p>It is easy to see that a <code>pascal_t</code> is a length-prefixed 8-bit string of some kind. The interpretation of the bits is described in <code>pascal_t</code>: everything here is a byte<sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup>. Also, a relationship is defined because the size of <code>string</code> is derived from the value of <code>length</code>. <code>user_name_t</code> is composed of two <code>pascal_t</code>s, one for the first name and one for the last. This format grammar produces an abstract syntax tree:</p>
<p><img class="center" src="http://binspector.github.io/images/user_name_t.png"></p>
<p>It is this structured formalization (both the interpretation of binary data and the relationships between the data) that makes possible the rest of what Binspector can do. Once a format grammar has been applied to an actual binary file, this tree becomes concrete and can be analyzed.</p>
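<p>To make the dependency between <code>length</code> and <code>string</code> concrete, here is a rough Python equivalent of reading a <code>user_name_t</code> by hand. It is a sketch only: the class and function names are hypothetical, and the sample bytes are an assumption chosen to match the example used later in this post.</p>

```python
from dataclasses import dataclass


@dataclass
class PascalString:
    offset: int    # file offset of the length byte
    length: int    # value of the `length` atom
    string: bytes  # the `length` bytes that follow it


def read_pascal(data: bytes, offset: int) -> PascalString:
    """Read one length-prefixed string starting at `offset`."""
    length = data[offset]  # `length` must be read before `string` can be
    start = offset + 1
    string = data[start:start + length]
    if len(string) != length:
        raise ValueError("string runs past the end of the data")
    return PascalString(offset, length, string)


def read_user_name(data: bytes) -> dict[str, PascalString]:
    """Read `first`, then `last`, whose position depends on first.length."""
    first = read_pascal(data, 0)
    last = read_pascal(data, first.offset + 1 + first.length)
    return {"first": first, "last": last}


# Assumed sample data: two length-prefixed strings back to back.
name = read_user_name(b"\x06Foster\x08Brereton")
```

<p>Note that the position of <code>last</code> cannot be known without first evaluating <code>first.length</code>. A format grammar states that relationship declaratively instead of burying it in read code.</p>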
<h2>Analysis</h2>
<p>Binspector attempts to interpret a binary file against a format grammar. During this process the tool builds out a tree that reports to the user what it has found. This tree can then be analyzed by a user in one of several ways, the most common of which is a command-line interface.</p>
<h3>Example</h3>
<p>Let’s take the above description and apply it to the following binary data, seen here in the excellent <a href="http://ridiculousfish.com/hexfiend/">Hex Fiend</a>:</p>
<p><img class="left" src="http://binspector.github.io/images/binfile.png"></p>
<p>We would invoke Binspector with something like:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>~$ binspector -i file.bin -t format.bfft -s user_name_t</span></code></pre></td></tr></table></div></figure>
<p>The tool will generate a concrete syntax tree based on the abstract tree defined in the format grammar:</p>
<p><img class="center" src="http://binspector.github.io/images/name_processed.png"></p>
<p>Binspector will drop into a command-line interface so we can navigate the parse tree. To facilitate navigation, think of structures as folders and atoms as files:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
<span class='line-number'>16</span>
<span class='line-number'>17</span>
<span class='line-number'>18</span>
<span class='line-number'>19</span>
<span class='line-number'>20</span>
<span class='line-number'>21</span>
<span class='line-number'>22</span>
<span class='line-number'>23</span>
<span class='line-number'>24</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>$main$ ls
</span><span class='line'>(user_name_t) main
</span><span class='line'>{
</span><span class='line'> (pascal_t) first
</span><span class='line'> (pascal_t) last
</span><span class='line'>}
</span><span class='line'>$main$ cd first
</span><span class='line'>$main.first$ ls
</span><span class='line'>(pascal_t) first
</span><span class='line'>{
</span><span class='line'> (u8) length: 6
</span><span class='line'> (u8) string[6]
</span><span class='line'>}
</span><span class='line'>$main.first$ cd string
</span><span class='line'>$main.first.string$ ls
</span><span class='line'>(u8) string[6]
</span><span class='line'>{
</span><span class='line'> (u8) [0]: 70
</span><span class='line'> (u8) [1]: 111
</span><span class='line'> (u8) [2]: 115
</span><span class='line'> (u8) [3]: 116
</span><span class='line'> (u8) [4]: 101
</span><span class='line'> (u8) [5]: 114
</span><span class='line'>}</span></code></pre></td></tr></table></div></figure>
<p>The <code>$</code>s bookend the current working branch in the analysis tree, and you can navigate into and out of structures with the <code>cd</code> command. Likewise, the <code>ls</code> command lists the contents of the current structure. As we navigate, we can get specific information about otherwise raw data. For example, to see a breakdown of a specific field we use the <code>detail_field</code> command (or just <code>df</code>). These details include the location in the file the field represents, the raw bits used for interpretation, and its value:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>$main.first.string$ detail_field this[2]
</span><span class='line'> path: main.first.string[2]
</span><span class='line'> format: 8-bit unsigned
</span><span class='line'> offset: 3
</span><span class='line'> raw: 0x73
</span><span class='line'> value: 115 (0x73)
</span><span class='line'>$main.first.string$</span></code></pre></td></tr></table></div></figure>
<p>These details are confirmed by looking at the same byte in a hex editor:</p>
<p><img class="left" src="http://binspector.github.io/images/binfile_s.png"></p>
<p>Over a dozen analytic operations are available to users from the command-line interface.</p>
<p>(It is important to note here that the Binspector CLI is in no way POSIX compliant. It just borrows a handful of terms from that interface to make navigating with a command line more approachable.)</p>
<p>All this is well and good, but it still requires the user to go pretty far into the tree to see valuable data. What’s needed is more meaning in a structure, the bubbling up of information to better understand what is going on. Binspector can handle that with a couple of additions to the format grammar:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
<span class='line-number'>7</span>
<span class='line-number'>8</span>
<span class='line-number'>9</span>
<span class='line-number'>10</span>
<span class='line-number'>11</span>
<span class='line-number'>12</span>
<span class='line-number'>13</span>
<span class='line-number'>14</span>
<span class='line-number'>15</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>struct pascal_t
</span><span class='line'>{
</span><span class='line'> unsigned 8 big length;
</span><span class='line'> unsigned 8 big string[length];
</span><span class='line'>
</span><span class='line'> summary str(@string);
</span><span class='line'>}
</span><span class='line'>
</span><span class='line'>struct user_name_t
</span><span class='line'>{
</span><span class='line'> pascal_t first;
</span><span class='line'> pascal_t last;
</span><span class='line'>
</span><span class='line'> summary summaryof(first), " ", summaryof(last);
</span><span class='line'>}</span></code></pre></td></tr></table></div></figure>
<p>Lines 6 and 14 now contain <code>summary</code> statements which are expressions intended to summarize the contents of a structure. <code>user_name_t</code> leverages the summaries of the structures it contains with the <code>summaryof</code> command. The resulting output in the command-line interface is now far more helpful:</p>
<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>$main$ ls
</span><span class='line'>(user_name_t) main (Foster Brereton)
</span><span class='line'>{
</span><span class='line'> (pascal_t) first (Foster)
</span><span class='line'> (pascal_t) last (Brereton)
</span><span class='line'>}</span></code></pre></td></tr></table></div></figure>
<p>Though the use case may be fictional, it is easy to see the general usefulness of these capabilities for real-world formats.</p>
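<p>The bubbling-up that <code>summary</code> and <code>summaryof</code> provide can be approximated in ordinary code as a summary function per structure, with the parent calling into its children. The Python below is a sketch under assumptions: the function names are hypothetical, the sample bytes are reconstructed from the output above, and decoding the bytes as Latin-1 is a guess at what <code>str(@string)</code> does.</p>

```python
def read_pascal(data: bytes, offset: int) -> tuple[bytes, int]:
    """Return the string bytes and the offset just past the record."""
    length = data[offset]
    end = offset + 1 + length
    return data[offset + 1:end], end


def pascal_summary(string: bytes) -> str:
    # Counterpart of `summary str(@string);` (the Latin-1 decoding is assumed).
    return string.decode("latin-1")


def user_name_summary(data: bytes) -> str:
    first, next_offset = read_pascal(data, 0)
    last, _ = read_pascal(data, next_offset)
    # Counterpart of `summary summaryof(first), " ", summaryof(last);`
    return pascal_summary(first) + " " + pascal_summary(last)


print(user_name_summary(b"\x06Foster\x08Brereton"))  # prints "Foster Brereton"
```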
<h2>Open Source</h2>
<p>I am excited to make Binspector available as open source software. The repository is on GitHub and includes build scripts, sources, documentation, and some format grammars (currently known as <code>bfft</code>s). I have been compiling a list of features and changes that I would like to see happen in the tool and hope the community gets involved. More importantly I would love to see a community-built corpus of format grammars developed and shared.</p>
<p>So please, grab a copy of the sources, kick the tires on the tool, and let me know what you think!</p>
<div class="footnotes">
<hr/>
<ol>
<li id="fn:1">
<p>Though a portmanteau of “binary inspector”, the <em>i</em> in Binspector is short (rhymes with “tin”).<a href="#fnref:1" rev="footnote">↩</a></p></li>
<li id="fn:2">
<p>The endianness specifier <code>big</code> is required even for 8-bit values; this is a limitation of the language.<a href="#fnref:2" rev="footnote">↩</a></p></li>
</ol>
</div>
]]></content>
</entry>
</feed>