Diffstat (limited to 'python/pyasn1/doc/codecs.html')
-rw-r--r--python/pyasn1/doc/codecs.html503
1 files changed, 503 insertions, 0 deletions
diff --git a/python/pyasn1/doc/codecs.html b/python/pyasn1/doc/codecs.html
new file mode 100644
index 000000000..9c2c36ed6
--- /dev/null
+++ b/python/pyasn1/doc/codecs.html
@@ -0,0 +1,503 @@
+<html>
+<head>
+<title>
+PyASN1 codecs
+</title>
+</head>
+<body>
+<center>
+<table width=60%>
+<tr>
+<td>
+<h3>
+2. PyASN1 Codecs
+</h3>
+
+<p>
+In the ASN.1 context, a
+<a href=http://en.wikipedia.org/wiki/Codec>codec</a>
+is a program that transforms between concrete data structures and a stream
+of octets suitable for transmission over the wire. This serialized form of
+data is sometimes called <i>substrate</i> or <i>essence</i>.
+</p>
+
+<p>
+In the pyasn1 implementation, substrate takes the form of Python 3 bytes or
+Python 2 string objects.
+</p>
+
+<p>
+One of the desirable properties of a codec is its ability to cope with
+incomplete data and/or substrate, which requires the codec to be stateful.
+In other words, when a decoder runs out of substrate while the data item
+being recovered is still incomplete, a stateful decoder would suspend and
+complete the recovery of that data item whenever the rest of the substrate
+becomes available. Similarly, a stateful encoder would encode data items in
+multiple steps, waiting for source data to arrive. Codec restartability is
+especially important when an application deals with large volumes of data
+and/or runs with little RAM. For an interesting discussion of codec options
+and design choices, refer to the
+<a href=http://directory.apache.org/subprojects/asn1/>Apache ASN.1 project</a>.
+</p>
+
+<p>
+As of this writing, the codecs implemented in pyasn1 are all stateless,
+mostly to keep the code simple.
+</p>
+
+<p>
+The pyasn1 package currently supports the
+<a href=http://en.wikipedia.org/wiki/Basic_encoding_rules>BER</a> codec and
+its variants,
+<a href=http://en.wikipedia.org/wiki/Canonical_encoding_rules>CER</a> and
+<a href=http://en.wikipedia.org/wiki/Distinguished_encoding_rules>DER</a>.
+More ASN.1 codecs are planned for implementation in the future.
+</p>
+
+<a name="2.1"></a>
+<h4>
+2.1 Encoders
+</h4>
+
+<p>
+The encoder transforms pyasn1 value objects into substrate. Only pyasn1 value
+objects can be serialized; attempts to process pyasn1 type objects will cause
+the encoder to fail.
+</p>
+
+<p>
+The following code creates a pyasn1 Integer object and serializes it with the
+BER encoder:
+</p>
+
+<table bgcolor="lightgray" border=0 width=100%><TR><TD>
+<pre>
+>>> from pyasn1.type import univ
+>>> from pyasn1.codec.ber import encoder
+>>> encoder.encode(univ.Integer(123456))
+b'\x02\x03\x01\xe2@'
+>>>
+</pre>
+</td></tr></table>
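+
+<p>
+To make the bytes above less opaque, here is a minimal, stdlib-only Python
+sketch of BER INTEGER encoding: a tag octet, a length octet and the minimal
+two's-complement content octets. It was written for this page as an
+illustration and is not pyasn1's own code. The function name is hypothetical,
+and the sketch assumes the content is shorter than 128 octets so that the
+short length form applies.
+</p>
+
+<table bgcolor="lightgray" border=0 width=100%><TR><TD>
+<pre>
```python
def ber_encode_integer(value: int) -> bytes:
    """Return the tag, length and content octets of an ASN.1 INTEGER (sketch)."""
    # Bits needed for a two's-complement representation, including the sign bit.
    bits = (value if value >= 0 else ~value).bit_length() + 1
    content = value.to_bytes((bits + 7) // 8, "big", signed=True)
    if len(content) >= 0x80:
        raise ValueError("long-form lengths are not handled in this sketch")
    # 0x02 is the universal INTEGER tag; one octet carries the short-form length.
    return bytes([0x02, len(content)]) + content


print(ber_encode_integer(123456))  # b'\x02\x03\x01\xe2@'
```
+</pre>
+</td></tr></table>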
+
+<p>
+The BER standard also defines a so-called <i>indefinite length</i> encoding
+form, which makes processing of large data items more memory efficient. It is
+mostly useful when the encoder does not have the whole value at once and the
+length of the value cannot be determined at the beginning of encoding.
+</p>
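+
+<p>
+The difference between the two forms shows up in the length octets. The
+stdlib-only Python sketch below was written for this page and is not taken
+from pyasn1. Its names are hypothetical, and it encodes a definite length in
+the X.690 short or long form. In the indefinite form, an encoder instead emits
+the single octet 0x80 up front and closes the value with two zero octets, so
+it never needs to know the total length in advance.
+</p>
+
+<table bgcolor="lightgray" border=0 width=100%><TR><TD>
+<pre>
```python
INDEFINITE_LENGTH = b"\x80"    # length octet that opens an indefinite-length value
END_OF_CONTENTS = b"\x00\x00"  # octets that close an indefinite-length value


def encode_definite_length(n: int) -> bytes:
    """Encode a definite length in the short or long form (sketch)."""
    if n < 0x80:
        return bytes([n])  # short form: the single octet is the length itself
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    # Long form: high bit set, low bits count the length octets that follow.
    return bytes([0x80 | len(body)]) + body


print(encode_definite_length(55))   # b'7'
print(encode_definite_length(300))  # b'\x82\x01,'
```
+</pre>
+</td></tr></table>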
+
+<p>
+<i>Constructed encoding</i> is another feature of BER closely related to the
+indefinite length form. In essence, a large scalar value (such as an ASN.1
+OCTET STRING, BIT STRING or character string value) can be chopped into
+smaller chunks by the encoder and transmitted incrementally to limit memory
+consumption. Unlike the indefinite length case, the length of the whole value
+must be known in advance when using the constructed, definite length encoding
+form.
+</p>
+
+<p>
+Since pyasn1 codecs are not restartable, the pyasn1 encoder can only encode a
+data item all at once. However, even in this case, generating indefinite
+length encoding may help a low-memory receiver, running a restartable decoder,
+to process a large data item.
+</p>
+
+<table bgcolor="lightgray" border=0 width=100%><TR><TD>
+<pre>
+>>> from pyasn1.type import univ
+>>> from pyasn1.codec.ber import encoder
+>>> encoder.encode(
+... univ.OctetString('The quick brown fox jumps over the lazy dog'),
+... defMode=False,
+... maxChunkSize=8
+... )
+b'$\x80\x04\x08The quic\x04\x08k brown \x04\x08fox jump\x04\x08s over \
+t\x04\x08he lazy \x04\x03dog\x00\x00'
+>>>
+>>> encoder.encode(
+... univ.OctetString('The quick brown fox jumps over the lazy dog'),
+... maxChunkSize=8
+... )
+b'$7\x04\x08The quic\x04\x08k brown \x04\x08fox jump\x04\x08s over \
+t\x04\x08he lazy \x04\x03dog'
+</pre>
+</td></tr></table>
+
+<p>
+Setting the <b>defMode</b> encoder parameter to False disables definite
+length encoding mode, while the optional <b>maxChunkSize</b> parameter
+specifies the desired substrate chunk size, which influences memory
+requirements at the decoder's end.
+</p>
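+
+<p>
+The byte layout in the listing above can be reproduced with a short,
+stdlib-only Python sketch. It is not pyasn1's encoder. The function name is
+hypothetical, its <b>definite</b> flag plays the role of <b>defMode</b> and
+its <b>chunk_size</b> plays the role of <b>maxChunkSize</b>. Every length is
+assumed to fit the short form (under 128 octets).
+</p>
+
+<table bgcolor="lightgray" border=0 width=100%><TR><TD>
+<pre>
```python
def encode_chunked_octet_string(data: bytes, chunk_size: int,
                                definite: bool = True) -> bytes:
    """Encode data as a constructed OCTET STRING of primitive chunks (sketch)."""
    if not 0 < chunk_size < 0x80:
        raise ValueError("chunk_size must fit a short-form length")
    # Every chunk is a primitive OCTET STRING: tag 0x04, length, content.
    chunks = b"".join(
        bytes([0x04, len(data[i:i + chunk_size])]) + data[i:i + chunk_size]
        for i in range(0, len(data), chunk_size)
    )
    if not definite:
        # 0x24 = constructed OCTET STRING, 0x80 = indefinite length;
        # two zero octets mark the end of contents.
        return b"\x24\x80" + chunks + b"\x00\x00"
    if len(chunks) >= 0x80:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([0x24, len(chunks)]) + chunks


text = b"The quick brown fox jumps over the lazy dog"
print(encode_chunked_octet_string(text, 8, definite=False))
print(encode_chunked_octet_string(text, 8))
```
+</pre>
+</td></tr></table>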
+
+<p>
+To use the CER or DER encoders, one needs to explicitly import and call them;
+the APIs are all compatible.
+</p>
+
+<table bgcolor="lightgray" border=0 width=100%><TR><TD>
+<pre>
+>>> from pyasn1.type import univ
+>>> from pyasn1.codec.ber import encoder as ber_encoder
+>>> from pyasn1.codec.cer import encoder as cer_encoder
+>>> from pyasn1.codec.der import encoder as der_encoder
+>>> ber_encoder.encode(univ.Boolean(True))
+b'\x01\x01\x01'
+>>> cer_encoder.encode(univ.Boolean(True))
+b'\x01\x01\xff'
+>>> der_encoder.encode(univ.Boolean(True))
+b'\x01\x01\xff'
+>>>
+</pre>
+</td></tr></table>
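+
+<p>
+The three outputs differ only in the content octet. X.690 lets BER encode TRUE
+as any non-zero octet, while CER and DER require 0xFF, which is why the
+canonical encoders disagree with the BER one. Below is a stdlib-only Python
+sketch of that rule. The function names are hypothetical, and this is not
+pyasn1's code.
+</p>
+
+<table bgcolor="lightgray" border=0 width=100%><TR><TD>
+<pre>
```python
def encode_boolean(value: bool, canonical: bool) -> bytes:
    """Encode an ASN.1 BOOLEAN; canonical=True follows the CER/DER rule (sketch)."""
    if not value:
        content = 0x00  # FALSE is always a single zero octet
    elif canonical:
        content = 0xFF  # CER and DER require all bits set for TRUE
    else:
        content = 0x01  # BER accepts any non-zero octet; 0x01 matches the output above
    # 0x01 is the universal BOOLEAN tag, followed by a one-octet length.
    return bytes([0x01, 0x01, content])


def decode_boolean_content(octet: int) -> bool:
    """Lenient BER reading: any non-zero content octet means TRUE."""
    return octet != 0
```
+</pre>
+</td></tr></table>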
+
+<a name="2.2"></a>
+<h4>
+2.2 Decoders
+</h4>
+
+<p>
+In the process of decoding, pyasn1 value objects are created and linked to
+each other based on the information contained in the substrate. Thus,
+the original pyasn1 value object(s) are recovered.
+</p>
+
+<table bgcolor="lightgray" border=0 width=100%><TR><TD>
+<pre>
+>>> from pyasn1.type import univ
+>>> from pyasn1.codec.ber import encoder, decoder
+>>> substrate = encoder.encode(univ.Boolean(True))
+>>> decoder.decode(substrate)
+(Boolean('True(1)'), b'')
+>>>
+</pre>
+</td></tr></table>
+
+<p>
+As the snippet above shows, the pyasn1 decoder accepts substrate as an
+argument and returns a tuple of a pyasn1 value object (the top-level one in
+case of a constructed object) and the unprocessed part of the input
+substrate.
+</p>
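+
+<p>
+The remainder in the returned tuple lets an application walk through several
+encodings concatenated in one byte string. The stdlib-only Python sketch below
+mimics that (value, remainder) convention at the TLV level. The
+<b>split_tlv</b> name is hypothetical, it is not part of pyasn1, and it
+handles single-octet tags and definite lengths only.
+</p>
+
+<table bgcolor="lightgray" border=0 width=100%><TR><TD>
+<pre>
```python
from typing import Tuple


def split_tlv(substrate: bytes) -> Tuple[bytes, bytes]:
    """Split one definite-length TLV off the front of substrate (sketch)."""
    if len(substrate) < 2:
        raise ValueError("substrate is truncated")
    first = substrate[1]
    if first < 0x80:  # short-form length
        header, length = 2, first
    elif first == 0x80:
        raise ValueError("indefinite length is not handled in this sketch")
    else:  # long-form length: low bits count the length octets that follow
        header = 2 + (first & 0x7F)
        length = int.from_bytes(substrate[2:header], "big")
    end = header + length
    if end > len(substrate):
        raise ValueError("substrate is truncated")
    return substrate[:end], substrate[end:]


# Two concatenated encodings: BOOLEAN TRUE followed by INTEGER 12.
stream = b"\x01\x01\xff\x02\x01\x0c"
while stream:
    tlv, stream = split_tlv(stream)
    print(tlv)
```
+</pre>
+</td></tr></table>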
+
+<p>
+All pyasn1 decoders handle both definite and indefinite length encoding
+modes automatically; explicitly switching from one mode to another is not
+required.
+</p>
+
+<table bgcolor="lightgray" border=0 width=100%><TR><TD>
+<pre>
+>>> from pyasn1.type import univ
+>>> from pyasn1.codec.ber import encoder, decoder
+>>> substrate = encoder.encode(
+... univ.OctetString('The quick brown fox jumps over the lazy dog'),
+... defMode=False,
+... maxChunkSize=8
+... )
+>>> decoder.decode(substrate)
+(OctetString(b'The quick brown fox jumps over the lazy dog'), b'')
+>>>
+</pre>
+</td></tr></table>
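+
+<p>
+Automatic handling of both modes is possible because the first length octet
+tells the decoder which form follows: 0x80 means indefinite, and anything else
+is a definite length. Below is a stdlib-only Python sketch of reassembling a
+chunked OCTET STRING in either form. It is not pyasn1's decoder, the function
+name is hypothetical, and it assumes single-octet tags and short-form lengths.
+</p>
+
+<table bgcolor="lightgray" border=0 width=100%><TR><TD>
+<pre>
```python
from typing import List, Tuple


def decode_octet_string(substrate: bytes) -> Tuple[bytes, bytes]:
    """Return (value, remainder) for a BER OCTET STRING at the front of substrate."""
    tag, length = substrate[0], substrate[1]
    if length > 0x80:
        raise ValueError("long-form lengths are not handled in this sketch")
    if tag == 0x04:  # primitive: the content octets are the value
        if length == 0x80:
            raise ValueError("a primitive encoding cannot use indefinite length")
        return substrate[2:2 + length], substrate[2 + length:]
    if tag != 0x24:  # 0x24 is the constructed OCTET STRING identifier
        raise ValueError("not an OCTET STRING")
    parts: List[bytes] = []
    if length == 0x80:
        # Indefinite form: decode chunks until the end-of-contents octets.
        rest = substrate[2:]
        while not rest.startswith(b"\x00\x00"):
            if not rest:
                raise ValueError("missing end-of-contents octets")
            chunk, rest = decode_octet_string(rest)
            parts.append(chunk)
        return b"".join(parts), rest[2:]
    # Definite form: the chunks fill exactly `length` octets.
    body = substrate[2:2 + length]
    while body:
        chunk, body = decode_octet_string(body)
        parts.append(chunk)
    return b"".join(parts), substrate[2 + length:]


indefinite = (b"$\x80\x04\x08The quic\x04\x08k brown \x04\x08fox jump"
              b"\x04\x08s over t\x04\x08he lazy \x04\x03dog\x00\x00")
print(decode_octet_string(indefinite))
# (b'The quick brown fox jumps over the lazy dog', b'')
```
+</pre>
+</td></tr></table>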
+
+<p>
+With BER/CER/DER encoding, in many situations the substrate may not contain
+all the information needed for complete and accurate recovery of ASN.1
+values. The most obvious cases include implicitly tagged ASN.1 types and
+constrained types.
+</p>
+
+<p>
+As discussed earlier in this handbook, when an ASN.1 type is implicitly
+tagged, its previous outermost tag is lost and never appears in the
+substrate. If it is the base tag that gets lost, the decoder is unable to pick
+a type-specific value decoder from its table of built-in types, and therefore
+cannot recover the value part based only on the information contained in the
+substrate. The approach taken by the pyasn1 decoder is to use a prototype
+pyasn1 type object (or a set of them) to <i>guide</i> the decoding process by
+matching [possibly incomplete] tags recovered from the substrate with those
+found in the prototype pyasn1 type objects (further in this paper also called
+pyasn1 specification objects).
+</p>
+
+<table bgcolor="lightgray" border=0 width=100%><TR><TD>
+<pre>
+>>> from pyasn1.type import univ
+>>> from pyasn1.codec.ber import decoder
+>>> decoder.decode(b'\x02\x01\x0c', asn1Spec=univ.Integer())
+(Integer(12), b'')
+>>>
+</pre>
+</td></tr></table>
+
+<p>
+The decoder neither modifies the pyasn1 specification object nor uses its
+current value (if it is a pyasn1 value object); rather, it uses it as a hint
+for choosing the proper decoder and as a pattern for creating new objects:
+</p>
+
+<table bgcolor="lightgray" border=0 width=100%><TR><TD>
+<pre>
+>>> from pyasn1.type import univ, tag
+>>> from pyasn1.codec.ber import encoder, decoder
+>>> i = univ.Integer(12345).subtype(
+... implicitTag=tag.Tag(tag.tagClassContext, tag.tagFormatSimple, 40)
+... )
+>>> substrate = encoder.encode(i)
+>>> substrate
+b'\x9f(\x0209'
+>>> decoder.decode(substrate)
+Traceback (most recent call last):
+...
+pyasn1.error.PyAsn1Error:
+ TagSet(Tag(tagClass=128, tagFormat=0, tagId=40)) not in asn1Spec
+>>> decoder.decode(substrate, asn1Spec=i)
+(Integer(12345), b'')
+>>>
+</pre>
+</td></tr></table>
+
+<p>
+Notice in the example above that an attempt to run the decoder without passing
+a pyasn1 specification object fails, because the recovered tag does not belong
+to any of the built-in types.
+</p>
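+
+<p>
+The failure can be traced to the identifier octets alone. The stdlib-only
+Python sketch below splits them into class, format and tag number. It uses a
+hypothetical function name and is not pyasn1's code, but it follows the same
+numeric values that appear in the TagSet of the error message. It shows that
+b'\x9f(' carries [CONTEXT 40], which matches nothing in a (deliberately
+small, illustrative) table of universal tags.
+</p>
+
+<table bgcolor="lightgray" border=0 width=100%><TR><TD>
+<pre>
```python
from typing import Tuple


def decode_identifier(substrate: bytes) -> Tuple[int, int, int, int]:
    """Return (tag class, tag format, tag number, identifier octets consumed)."""
    first = substrate[0]
    tag_class = first & 0xC0   # 0x00 universal, 0x40 application, 0x80 context, 0xC0 private
    tag_format = first & 0x20  # 0x00 simple (primitive), 0x20 constructed
    tag_id = first & 0x1F
    pos = 1
    if tag_id == 0x1F:
        # High-tag-number form: base-128 digits, high bit set on all but the last.
        tag_id = 0
        while True:
            octet = substrate[pos]
            pos += 1
            tag_id = (tag_id << 7) | (octet & 0x7F)
            if not octet & 0x80:
                break
    return tag_class, tag_format, tag_id, pos


# A few universal (built-in) tags, keyed by (class, format, number).
BUILT_IN = {(0x00, 0x00, 1): "BOOLEAN", (0x00, 0x00, 2): "INTEGER",
            (0x00, 0x00, 4): "OCTET STRING"}

cls, fmt, num, _ = decode_identifier(b"\x9f(\x0209")
print(cls, fmt, num)                # 128 0 40
print(BUILT_IN.get((cls, fmt, num)))  # None
```
+</pre>
+</td></tr></table>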
+
+<p>
+Another important feature of guided decoder operation is the use of value
+constraints possibly present in the pyasn1 specification object. To explain
+this, we will decode an arbitrary integer value into a generic Integer and
+into a constrained one.
+</p>
+
+<table bgcolor="lightgray" border=0 width=100%><TR><TD>
+<pre>
+>>> from pyasn1.type import univ, constraint
+>>> from pyasn1.codec.ber import encoder, decoder
+>>> class DialDigit(univ.Integer):
+... subtypeSpec = constraint.ValueRangeConstraint(0,9)
+>>> substrate = encoder.encode(univ.Integer(13))
+>>> decoder.decode(substrate)
+(Integer(13), b'')
+>>> decoder.decode(substrate, asn1Spec=DialDigit())
+Traceback (most recent call last):
+...
+pyasn1.type.error.ValueConstraintError:
+ ValueRangeConstraint(0, 9) failed at: 13
+>>>
+</pre>
+</td></tr></table>
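+
+<p>
+The point is that the substrate carries only the number 13; the constraint
+lives in the specification. The stdlib-only Python sketch below follows that
+ordering: it decodes the value first and then checks it. The ValueRange class
+and the decode_integer function are hypothetical stand-ins written for
+illustration, not pyasn1's ValueRangeConstraint or decoder, and only
+short-form lengths are handled.
+</p>
+
+<table bgcolor="lightgray" border=0 width=100%><TR><TD>
+<pre>
```python
from typing import Optional


class ValueRange:
    """Hypothetical stand-in for a value range constraint."""

    def __init__(self, low: int, high: int) -> None:
        self.low, self.high = low, high

    def check(self, value: int) -> int:
        if not self.low <= value <= self.high:
            raise ValueError(f"ValueRange({self.low}, {self.high}) failed at: {value}")
        return value


def decode_integer(substrate: bytes, constraint: Optional[ValueRange] = None) -> int:
    """Decode a short-form INTEGER TLV, then apply the optional constraint."""
    if substrate[0] != 0x02:
        raise ValueError("not an INTEGER")
    length = substrate[1]
    if length & 0x80:
        raise ValueError("only short-form lengths are handled in this sketch")
    value = int.from_bytes(substrate[2:2 + length], "big", signed=True)
    # The substrate itself carries no constraint; only the spec can reject 13.
    return value if constraint is None else constraint.check(value)


print(decode_integer(b"\x02\x01\x0d"))  # 13
```
+</pre>
+</td></tr></table>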
+
+<p>
+As with encoders, to use the CER or DER decoders an application has to
+explicitly import and call them; all APIs are compatible.
+</p>
+
+<table bgcolor="lightgray" border=0 width=100%><TR><TD>
+<pre>
+>>> from pyasn1.type import univ
+>>> from pyasn1.codec.ber import encoder as ber_encoder
+>>> substrate = ber_encoder.encode(univ.OctetString('http://pyasn1.sf.net'))
+>>>
+>>> from pyasn1.codec.ber import decoder as ber_decoder
+>>> from pyasn1.codec.cer import decoder as cer_decoder
+>>> from pyasn1.codec.der import decoder as der_decoder
+>>>
+>>> ber_decoder.decode(substrate)
+(OctetString(b'http://pyasn1.sf.net'), b'')
+>>> cer_decoder.decode(substrate)
+(OctetString(b'http://pyasn1.sf.net'), b'')
+>>> der_decoder.decode(substrate)
+(OctetString(b'http://pyasn1.sf.net'), b'')
+>>>
+</pre>
+</td></tr></table>
+
+<a name="2.2.1"></a>
+<h4>
+2.2.1 Decoding untagged types
+</h4>
+
+<p>
+As mentioned earlier, ASN.1 has two "special case" types: CHOICE and ANY.
+They differ from other types in how they are tagged: unless these two are
+additionally tagged, neither of them has a tag of its own. Therefore these
+types are invisible in the substrate and cannot be recovered without passing a
+pyasn1 specification object to the decoder.
+</p>
+
+<p>
+To illustrate the issue, we will first prepare a Choice object:
+</p>
+
+<table bgcolor="lightgray" border=0 width=100%><TR><TD>
+<pre>
+>>> from pyasn1.type import univ, namedtype
+>>> class CodeOrMessage(univ.Choice):
+... componentType = namedtype.NamedTypes(
+... namedtype.NamedType('code', univ.Integer()),
+... namedtype.NamedType('message', univ.OctetString())
+... )
+>>>
+>>> codeOrMessage = CodeOrMessage()
+>>> codeOrMessage.setComponentByName('message', 'my string value')
+>>> print(codeOrMessage.prettyPrint())
+CodeOrMessage:
+ message=b'my string value'
+>>>
+</pre>
+</td></tr></table>
+
+<p>
+Let's now encode this Choice object and then decode its substrate
+with and without pyasn1 specification object:
+</p>
+
+<table bgcolor="lightgray" border=0 width=100%><TR><TD>
+<pre>
+>>> from pyasn1.codec.ber import encoder, decoder
+>>> substrate = encoder.encode(codeOrMessage)
+>>> substrate
+b'\x04\x0fmy string value'
+>>> encoder.encode(univ.OctetString('my string value'))
+b'\x04\x0fmy string value'
+>>>
+>>> decoder.decode(substrate)
+(OctetString(b'my string value'), b'')
+>>> codeOrMessage, substrate = decoder.decode(substrate, asn1Spec=CodeOrMessage())
+>>> print(codeOrMessage.prettyPrint())
+CodeOrMessage:
+ message=b'my string value'
+>>>
+</pre>
+</td></tr></table>
+
+<p>
+The first thing to notice in the listing above is that the substrate produced
+for our Choice value object is identical to the substrate for an OctetString
+object initialized to the same value. In other words, no information about
+the Choice component is present in the encoding.
+</p>
+
+<p>
+Sure enough, that kind of substrate will decode into an OctetString object,
+unless the original Choice type object is passed to the decoder to guide the
+decoding process.
+</p>
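+
+<p>
+Since the CHOICE adds no tag of its own, a guided decoder can only pick an
+alternative by comparing the component's tag with the tags of the
+alternatives in the specification. The stdlib-only Python sketch below shows
+that matching step. A plain dictionary stands in for the CodeOrMessage
+specification (a hypothetical simplification, not pyasn1's Choice
+machinery), and only single-octet tags and short-form lengths are handled.
+</p>
+
+<table bgcolor="lightgray" border=0 width=100%><TR><TD>
+<pre>
```python
from typing import Dict, Tuple


def decode_choice(substrate: bytes, alternatives: Dict[str, int]) -> Tuple[str, bytes]:
    """Return (name of the matching alternative, its raw content octets)."""
    tag, length = substrate[0], substrate[1]
    if length & 0x80:
        raise ValueError("only short-form lengths are handled in this sketch")
    for name, alt_tag in alternatives.items():
        if alt_tag == tag:  # the CHOICE adds no tag; the component's tag decides
            return name, substrate[2:2 + length]
    raise ValueError(f"tag 0x{tag:02x} matches no CHOICE alternative")


# Hypothetical spec mirroring CodeOrMessage: INTEGER (0x02) or OCTET STRING (0x04).
code_or_message = {"code": 0x02, "message": 0x04}
print(decode_choice(b"\x04\x0fmy string value", code_or_message))
# ('message', b'my string value')
```
+</pre>
+</td></tr></table>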
+
+<p>
+Similarly, an untagged ANY type behaves differently at the decoding phase:
+when the decoder encounters an Any object in the pyasn1 specification, it
+stops decoding and puts the corresponding substrate into a new Any value
+object in the form of an octet string. The application could then re-run the
+decoder with an additional, more exact pyasn1 specification object to recover
+the contents of the Any object.
+</p>
+
+<p>
+As mentioned elsewhere in this paper, the Any type allows incomplete or
+changing ASN.1 specifications to be handled gracefully by the decoder and
+applications.
+</p>
+
+<p>
+To illustrate how the Any type works, we will first set the stage by encoding
+a pyasn1 object and then putting its substrate into an Any object.
+</p>
+
+<table bgcolor="lightgray" border=0 width=100%><TR><TD>
+<pre>
+>>> from pyasn1.type import univ
+>>> from pyasn1.codec.ber import encoder, decoder
+>>> innerSubstrate = encoder.encode(univ.Integer(1234))
+>>> innerSubstrate
+b'\x02\x02\x04\xd2'
+>>> any = univ.Any(innerSubstrate)
+>>> any
+Any(b'\x02\x02\x04\xd2')
+>>> substrate = encoder.encode(any)
+>>> substrate
+b'\x02\x02\x04\xd2'
+>>>
+</pre>
+</td></tr></table>
+
+<p>
+As with Choice type encoding, there are no traces of the Any type in the
+substrate. Obviously, the substrate we are dealing with will decode into the
+inner [Integer] component, unless a pyasn1 specification is given to guide the
+decoder. Continuing the previous code:
+</p>
+
+<table bgcolor="lightgray" border=0 width=100%><TR><TD>
+<pre>
+>>> from pyasn1.type import univ
+>>> from pyasn1.codec.ber import encoder, decoder
+
+>>> decoder.decode(substrate)
+(Integer(1234), b'')
+>>> any, substrate = decoder.decode(substrate, asn1Spec=univ.Any())
+>>> any
+Any(b'\x02\x02\x04\xd2')
+>>> decoder.decode(any.asOctets())
+(Integer(1234), b'')
+>>>
+</pre>
+</td></tr></table>
+
+<p>
+Both CHOICE and ANY types are widely used in practice. The reader is welcome
+to take a look at the
+<a href=http://www.cs.auckland.ac.nz/~pgut001/pubs/x509guide.txt>
+ASN.1 specifications of X.509 applications</a> for more information.
+</p>
+
+<a name="2.2.2"></a>
+<h4>
+2.2.2 Ignoring unknown types
+</h4>
+
+<p>
+When dealing with a loosely specified ASN.1 structure, the receiving
+end may not be aware of some types present in the substrate. It may then be
+convenient to switch the decoder into a recovery mode, in which it does not
+bail out when it hits an unknown tag but rather treats it as an Any type.
+</p>
+
+<table bgcolor="lightgray" border=0 width=100%><TR><TD>
+<pre>
+>>> from pyasn1.type import univ, tag
+>>> from pyasn1.codec.ber import encoder, decoder
+>>> taggedInt = univ.Integer(12345).subtype(
+... implicitTag=tag.Tag(tag.tagClassContext, tag.tagFormatSimple, 40)
+... )
+>>> substrate = encoder.encode(taggedInt)
+>>> decoder.decode(substrate)
+Traceback (most recent call last):
+...
+pyasn1.error.PyAsn1Error: TagSet(Tag(tagClass=128, tagFormat=0, tagId=40)) not in asn1Spec
+>>>
+>>> decoder.decode.defaultErrorState = decoder.stDumpRawValue
+>>> decoder.decode(substrate)
+(Any(b'\x9f(\x0209'), b'')
+>>>
+</pre>
+</td></tr></table>
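+
+<p>
+The effect of the recovery mode can be pictured with a stdlib-only Python
+sketch. Known identifiers are dispatched to type-specific decoders. When
+recovery is enabled, an unknown identifier is kept as a raw TLV instead of
+raising, which is loosely analogous to setting <b>defaultErrorState</b> to
+<b>stDumpRawValue</b> above. The dispatch table, the recovering_decode
+function and its recover flag are all hypothetical, and only short-form
+definite lengths are handled.
+</p>
+
+<table bgcolor="lightgray" border=0 width=100%><TR><TD>
+<pre>
```python
from typing import Callable, Dict, Tuple


def decode_int(content: bytes) -> int:
    return int.from_bytes(content, "big", signed=True)


# Hypothetical table of known identifier octets and their content decoders.
DECODERS: Dict[bytes, Callable[[bytes], object]] = {
    b"\x02": decode_int,  # universal INTEGER
    b"\x04": bytes,       # universal OCTET STRING
}


def recovering_decode(substrate: bytes, recover: bool = False) -> Tuple[object, bytes]:
    """Decode one TLV; with recover=True, unknown tags come back as raw TLVs."""
    pos = 1
    if (substrate[0] & 0x1F) == 0x1F:
        # High-tag-number form: skip identifier octets until one has bit 8 clear.
        while substrate[pos] & 0x80:
            pos += 1
        pos += 1
    identifier, length = substrate[:pos], substrate[pos]
    if length & 0x80:
        raise ValueError("only short-form definite lengths are handled here")
    start, end = pos + 1, pos + 1 + length
    handler = DECODERS.get(identifier)
    if handler is not None:
        return handler(substrate[start:end]), substrate[end:]
    if recover:
        # Keep the whole unknown TLV verbatim, like a raw Any value.
        return ("raw", substrate[:end]), substrate[end:]
    raise ValueError(f"unknown tag {identifier.hex()} and recovery is off")


print(recovering_decode(b"\x02\x01\x0c"))               # (12, b'')
print(recovering_decode(b"\x9f(\x0209", recover=True))  # (('raw', b'\x9f(\x0209'), b'')
```
+</pre>
+</td></tr></table>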
+
+<p>
+It is also possible to configure a custom decoder to handle unknown tags
+found in the substrate. This can be done by means of the
+<b>defaultRawDecoder</b> attribute, which holds a reference to a type decoder
+object. Refer to the source for API details.
+</p>
+
+<hr>
+
+</td>
+</tr>
+</table>
+</center>
+</body>
+</html>