Authors:
(1) Mohammadreza Hazhirpasand, University of Bern, Bern, Switzerland;
(2) Oscar Nierstrasz, University of Bern, Bern, Switzerland;
(3) Mohammad Ghafari, University of Auckland, Auckland, New Zealand.
III. RESULTS AND DISCUSSION
Table II lists the themes, the number of posts in each theme, and a brief summary of each theme. Encryption/decryption of a file accounts for the highest number of posts, while cryptographic attacks account for the fewest. Table III breaks down the number of posts assigned to each theme across the 20 crypto libraries; for each theme, the highlighted cell marks the library with the most posts. For instance, of the 25 analyzed questions in pyOpenSSL, 17 were assigned to certificate-related issues. In the following, we discuss each of the 10 themes of developer challenges in the 20 crypto libraries.
1) Encryption/Decryption: In this theme, developers struggled to encrypt or decrypt files, although the specific sub-problems varied widely. The first group of challenges came from developers who could encrypt data but could not decrypt it. For instance, a developer encrypted a string with Spongy Castle, but the decryption code failed because it did not use AndroidKeyStore to retrieve the private key. Another observed issue was misuse of the init, update, and doFinal methods of the Cipher API. One developer working with the CryptoJS library omitted all the essential elements of the decryption process, i.e., the key, IV, encoding, and padding.
Another type of discussion concerned encryption modes. For instance, a developer asked how to check the authenticity of an encrypted text, and respondents suggested AES in Galois/Counter Mode (GCM). In another discussion, a developer was unsure about the internals of cipher-block chaining (CBC), namely why an incorrect initialization vector (IV) corrupts only the first decrypted block while the subsequent blocks decrypt as expected. In other discussions, either the IV was omitted from the decryption process or different IVs were used for encryption and decryption. One developer was confused about the difference between Electronic Codebook (ECB) and CBC modes, and about which of them requires an IV. Across the different modes, a common uncertainty was the correct length of the IV.
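To illustrate the CBC behavior raised above, here is a minimal Python sketch. Because Python's standard library has no AES, it assumes a stand-in block cipher: a hypothetical 4-round Feistel network built from SHA-256 (toy_encrypt_block / toy_decrypt_block), which is for illustration only and must not protect real data. The CBC chaining itself follows the standard definition, P_1 = D_K(C_1) XOR IV and P_i = D_K(C_i) XOR C_{i-1}, which is why a wrong IV can only affect the first plaintext block.

```python
import hashlib
import secrets

BLOCK = 16   # block size in bytes, the same as AES
ROUNDS = 4   # four Feistel rounds (Luby-Rackoff construction)


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round_fn(key: bytes, i: int, half: bytes) -> bytes:
    # Hypothetical round function: truncated SHA-256 of key, round index, half-block
    return hashlib.sha256(key + bytes([i]) + half).digest()[: BLOCK // 2]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    # Toy stand-in for AES; NOT secure, illustration only
    left, right = block[: BLOCK // 2], block[BLOCK // 2 :]
    for i in range(ROUNDS):
        left, right = right, _xor(left, _round_fn(key, i, right))
    return left + right


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    # Undo the Feistel rounds in reverse order
    left, right = block[: BLOCK // 2], block[BLOCK // 2 :]
    for i in reversed(range(ROUNDS)):
        left, right = _xor(right, _round_fn(key, i, left)), left
    return left + right


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    if len(plaintext) % BLOCK:
        raise ValueError("plaintext must be block-aligned (padding omitted here)")
    prev, out = iv, b""
    for i in range(0, len(plaintext), BLOCK):
        # C_i = E_K(P_i XOR C_{i-1}), with C_0 = IV
        prev = toy_encrypt_block(key, _xor(plaintext[i : i + BLOCK], prev))
        out += prev
    return out


def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    prev, out = iv, b""
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i : i + BLOCK]
        # P_i = D_K(C_i) XOR C_{i-1}: the IV only enters the first block
        out += _xor(toy_decrypt_block(key, block), prev)
        prev = block
    return out


key, iv = secrets.token_bytes(16), secrets.token_bytes(16)
plaintext = b"A" * 16 + b"B" * 16 + b"C" * 16
ciphertext = cbc_encrypt(key, iv, plaintext)

recovered = cbc_decrypt(key, secrets.token_bytes(16), ciphertext)  # wrong IV
print(recovered[:16] == plaintext[:16])  # first block is garbled (False)
print(recovered[16:] == plaintext[16:])  # True: later blocks decrypt correctly
```

Swapping in a real AES implementation would not change this outcome, since the effect comes from the chaining rule rather than from the block cipher.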
One of the prevalent sub-problems concerned the correct way to encode and decode the ciphertext. For instance, a developer forgot to use UTF-8 when converting the plaintext to a byte array. Other discussions involved the same problem when converting the ciphertext to hexadecimal or Base64.
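The encoding pitfall can be sketched with Python's standard library; the ciphertext bytes below are a hypothetical placeholder rather than real cipher output. Plaintext must be converted to bytes with an explicit encoding such as UTF-8, while raw ciphertext bytes should be carried as hex or Base64 rather than decoded as text.

```python
import base64

# Hypothetical ciphertext bytes (placeholder, not real cipher output)
ciphertext = bytes([0x8F, 0x00, 0xFF, 0x41])

# Text -> bytes: name the encoding explicitly instead of relying on a platform default
plaintext_bytes = "héllo".encode("utf-8")

# Bytes -> printable text for storage or transport: hex or Base64
as_hex = ciphertext.hex()
as_b64 = base64.b64encode(ciphertext).decode("ascii")

# Both encodings round-trip losslessly
assert bytes.fromhex(as_hex) == ciphertext
assert base64.b64decode(as_b64) == ciphertext

# Pitfall: raw ciphertext is generally not valid UTF-8 text
try:
    ciphertext.decode("utf-8")
except UnicodeDecodeError:
    print("raw ciphertext cannot be decoded as UTF-8; use hex or Base64")
```

Both sides of a protocol must agree on the same text encoding for the plaintext and the same binary-to-text encoding for the ciphertext.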
Some challenges could not be grouped with others, so we classified them as miscellaneous. For instance, a developer did not know how to encrypt or decrypt a binary file; in the Botan library, this requires passing use_binary=true to the DataSink_Stream API. In another example, developers discussed how to encrypt large files with the WebCrypto API. Developers also struggled to use the functions libraries provide for generating cryptographically secure random numbers.
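For the random-number issue, a minimal Python sketch, assuming Python 3.6+ where the secrets module is available: keys and IVs come from the operating system's cryptographically secure generator via secrets, whereas the general-purpose random module is a seeded, reproducible generator.

```python
import random
import secrets

# Keys, IVs and tokens: draw from the OS CSPRNG through the secrets module
key = secrets.token_bytes(32)       # 256-bit symmetric key
iv = secrets.token_bytes(16)        # 128-bit IV
token = secrets.token_urlsafe(16)   # 16 random bytes as URL-safe Base64 text

# Contrast: random is a seeded Mersenne Twister generator, reproducible by design,
# so its output is predictable and must not be used for keys or IVs
random.seed(2024)
first = random.getrandbits(64)
random.seed(2024)
assert random.getrandbits(64) == first  # same seed, same "random" value
```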
Other discussions centered on password-based encryption (PBE). Developers asked how to configure PBE APIs in libraries such as Jasypt. A PBE API commonly requires a password, an iteration count, and a salt generator, and developers struggled to assign correct values to them.
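Jasypt is a Java library; as a language-neutral sketch of the same three ingredients (password, salt, iteration count), the snippet below derives a key with PBKDF2-HMAC-SHA256 from Python's hashlib. The iteration count of 600,000 is an assumption taken from current OWASP guidance for this algorithm, not a value from the analyzed posts, and derive_key is a hypothetical helper name.

```python
import hashlib
import secrets

ITERATIONS = 600_000  # assumption: OWASP guidance for PBKDF2-HMAC-SHA256; tune for your hardware


def derive_key(password: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    # PBKDF2-HMAC-SHA256 producing a 32-byte key (e.g., for AES-256)
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


# A fresh random salt per password; store it (it is not secret) next to the ciphertext
salt = secrets.token_bytes(16)
key = derive_key("correct horse battery staple", salt)

# Decryption must reuse the SAME password, salt and iteration count
assert derive_key("correct horse battery staple", salt) == key
```

Changing any of the three inputs yields an unrelated key, which is why mismatched PBE settings surface as decryption failures rather than as explicit configuration errors.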
Lastly, different padding schemes caused technical problems for developers. Discussions covered the relative security of PKCS#1 v1.5 and OAEP padding, the use of zero padding in OpenSSL, PKCS#7 padding with AES, and how to disable padding in a crypto library.
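PKCS#7 padding, mentioned above for AES, can be sketched in a few lines of Python (the helper names are hypothetical). Each pad byte holds the pad length, and a full block is added when the input is already block-aligned, which makes unpadding unambiguous; zero padding lacks this property when the plaintext itself ends in zero bytes.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    # Append n bytes of value n, where 1 <= n <= block_size
    n = block_size - len(data) % block_size
    return data + bytes([n]) * n


def pkcs7_unpad(padded: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    if not padded or len(padded) % block_size:
        raise ValueError("padded data must be a non-empty multiple of the block size")
    n = padded[-1]
    if not 1 <= n <= block_size or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid PKCS#7 padding")
    return padded[:-n]


assert pkcs7_pad(b"YELLOW SUBMARINE", 20) == b"YELLOW SUBMARINE\x04\x04\x04\x04"
assert len(pkcs7_pad(b"A" * 16)) == 32  # aligned input gains a full block
assert pkcs7_unpad(pkcs7_pad(b"data\x00\x00")) == b"data\x00\x00"  # trailing zeros survive
```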
2) Library installation: This theme covers problems with installing, compiling, and using a library, and with setting up the prerequisites it needs to work. It has the second-highest number of posts, as developers used a variety of platforms and integrated development environments (IDEs) with a given crypto library. For instance, developers discussed declaring Spongy Castle's dependencies in the Gradle file, setting up Android Studio with Spongy Castle, adding JCE to JRE 8 on macOS Sierra, and building the Botan library with the nmake command. Each crypto library typically has its own way of installing or compiling the library or its modules. For instance, developers using the PyCrypto and M2Crypto libraries commonly needed to resolve their issues through the pip command, and similarly, npm was the key reason for many discussions related to the node-crypto library.
3) Certificate-related issues: We found two sub-problems in this theme. The first challenge developers encountered was working with various file formats, e.g., p7b, and various encodings, e.g., Privacy Enhanced Mail (PEM) or Distinguished Encoding Rules (DER). Developers asked how to read or save PEM files using crypto libraries, how to store and read private keys in a PKCS#8 (Public Key Cryptography Standards) file, how the DER and PEM formats differ, and how to store and read public and private keys in a PKCS#12 file. The second sub-problem was extracting elements from a certificate, e.g., the expiration date, the list of Subject Alternative Names (SANs), the Certificate Authority (CA), and the cipher list. Developers also had trouble validating a certificate, generating a self-signed certificate, using different versions of TLS and SSL, and resolving TLS handshake issues.
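The DER/PEM distinction is simpler than it often appears: PEM is the Base64 encoding of the DER bytes, wrapped at 64 characters and enclosed in BEGIN/END lines. The Python sketch below uses a placeholder byte string in place of a real DER certificate and checks the hypothetical helpers der_to_pem and pem_to_der against the standard library's ssl.DER_cert_to_PEM_cert and ssl.PEM_cert_to_DER_cert.

```python
import base64
import ssl


def der_to_pem(der: bytes, label: str = "CERTIFICATE") -> str:
    # PEM = Base64(DER), split into 64-character lines, between BEGIN/END markers
    b64 = base64.b64encode(der).decode("ascii")
    lines = [b64[i : i + 64] for i in range(0, len(b64), 64)]
    return "\n".join([f"-----BEGIN {label}-----", *lines, f"-----END {label}-----"]) + "\n"


def pem_to_der(pem: str) -> bytes:
    # Strip the BEGIN/END lines and Base64-decode everything in between
    lines = [ln.strip() for ln in pem.strip().splitlines()]
    if not (lines[0].startswith("-----BEGIN ") and lines[-1].startswith("-----END ")):
        raise ValueError("not a PEM block")
    return base64.b64decode("".join(lines[1:-1]))


# Placeholder bytes standing in for a real DER-encoded certificate
der = bytes(range(256)) * 2

pem = der_to_pem(der)
assert pem_to_der(pem) == der
# Same result as the standard library's certificate helpers
assert pem == ssl.DER_cert_to_PEM_cert(der)
assert ssl.PEM_cert_to_DER_cert(pem) == der
```

The label (CERTIFICATE, PRIVATE KEY, and so on) only names the payload; the DER structure inside, such as a PKCS#8 private key, is what a library actually parses.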
4) Library interoperability: It is common for developers to work with more than one crypto library in a large project, but the libraries may not behave consistently. A common issue was that developers encrypted data with OpenSSL, i.e., via the command line, and then could not decrypt the ciphertext with another library. This is usually because the libraries' default values do not match. For instance, a developer encrypted a text with OpenSSL but could not decrypt it with the Botan library because OpenSSL uses PKCS#1 v1.5 padding by default. On closer inspection, the root causes were mainly inappropriate encoding of the ciphertext, incorrect IVs, differences in how cryptographic keys were generated, and mismatched key formats and padding options.
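As an example of keys being generated differently, a salted "openssl enc" file starts with the 8-byte marker Salted__ and an 8-byte salt, and, unless -pbkdf2 is given, its key and IV are derived with OpenSSL's EVP_BytesToKey using a single iteration. A library that does not replicate this derivation cannot decrypt the file. The Python sketch below is a reimplementation under these assumptions (helper names hypothetical); the digest must match the one OpenSSL used, which is SHA-256 by default since OpenSSL 1.1.0 and MD5 in earlier versions, or whatever -md specified.

```python
from __future__ import annotations

import hashlib

MAGIC = b"Salted__"


def split_openssl_salted(blob: bytes) -> tuple[bytes, bytes]:
    # Salted `openssl enc` output: b"Salted__" + 8-byte salt + ciphertext
    if not blob.startswith(MAGIC):
        raise ValueError("missing Salted__ header")
    return blob[8:16], blob[16:]


def evp_bytes_to_key(password: bytes, salt: bytes, key_len: int, iv_len: int,
                     digest: str = "sha256") -> tuple[bytes, bytes]:
    # EVP_BytesToKey with one iteration:
    # D_1 = H(password || salt), D_i = H(D_{i-1} || password || salt)
    derived, block = b"", b""
    while len(derived) < key_len + iv_len:
        block = hashlib.new(digest, block + password + salt).digest()
        derived += block
    return derived[:key_len], derived[key_len:key_len + iv_len]


# Hypothetical file contents: header, salt, then (placeholder) ciphertext
blob = MAGIC + bytes(range(8)) + b"<ciphertext bytes>"
salt, ciphertext = split_openssl_salted(blob)
# AES-256-CBC needs a 32-byte key and a 16-byte IV
key, iv = evp_bytes_to_key(b"my password", salt, 32, 16)
```

Decrypting the remaining bytes would additionally require AES-256-CBC with PKCS#7 unpadding in the receiving library; if any one of digest, salt handling, mode, or padding differs, decryption fails.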
5) Generate/store crypto keys: For every cryptographic scenario, developers need to generate and store their keys. In the analyzed discussions, the challenges concerned storing keys, e.g., in AndroidKeyStore; generating a valid ECDSA or RSA key pair; generating a symmetric key; the differences between a trust store and a keystore; generating keys with a Key-Based Key Derivation Function (KBKDF); the correct key lengths for various algorithms; and the meaning of the modulus (n) and the public exponent (e) in RSA keys.
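The roles of n and e can be seen in a toy RSA key. This Python sketch (Python 3.8+ for pow(e, -1, phi)) assumes two Mersenne primes that are far too small for real use, and textbook RSA without padding, so it is illustrative only: n = p * q is the modulus shared by both keys, e is the public exponent, and the private exponent d is the inverse of e modulo (p - 1)(q - 1).

```python
import math

# Toy key material (hypothetical, insecure): two Mersenne primes, ~150-bit modulus.
# Real RSA keys use random primes and a modulus of at least 2048 bits.
p, q = 2**61 - 1, 2**89 - 1
e = 65537                       # public exponent, a common default

n = p * q                       # modulus, part of BOTH the public and private key
phi = (p - 1) * (q - 1)
assert math.gcd(e, phi) == 1    # e must be invertible modulo phi
d = pow(e, -1, phi)             # private exponent (Python 3.8+)

public_key, private_key = (n, e), (n, d)

m = 42                          # message encoded as an integer 0 <= m < n
c = pow(m, e, n)                # textbook RSA "encryption" with (n, e), no padding
assert pow(c, d, n) == m        # decryption with (n, d) recovers m
```

The "key length" of an RSA key is the bit length of n; e is small and public, and all of the secrecy lives in d (equivalently, in the factors p and q).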
6) Hashing: Developers still ask whether a hashed string can be reversed. However, most of the discussions were about generating a hash string, using salt correctly, calculating checksums for large files, issues in using the Hash-based Message Authentication Code (HMAC), and the usage of password hashing functions, namely Password-Based Key Derivation Function 2 (PBKDF2), bcrypt, and scrypt.
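Two of the hashing issues above, checksums of large files and HMAC verification, can be sketched with Python's hashlib and hmac modules (the helper names file_sha256 and verify_hmac are hypothetical). Streaming the file in chunks keeps memory use constant, and hmac.compare_digest compares tags without an early-exit timing leak. None of this reverses a hash: SHA-256 is one-way by design.

```python
import hashlib
import hmac
import os
import tempfile


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    # Stream the file in 1 MiB chunks so memory stays constant for large files
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_hmac(key: bytes, message: bytes, tag: bytes) -> bool:
    # Recompute HMAC-SHA256 and compare in constant time
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


# Usage: checksum of a ~3 MiB temporary file
data = os.urandom(3 * (1 << 20) + 123)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
try:
    assert file_sha256(tmp.name) == hashlib.sha256(data).hexdigest()
finally:
    os.remove(tmp.name)

key = os.urandom(32)
tag = hmac.new(key, b"payload", hashlib.sha256).digest()
assert verify_hmac(key, b"payload", tag)
assert not verify_hmac(key, b"tampered", tag)
```

On Python 3.11 and later, hashlib.file_digest provides a built-in alternative to the manual chunking loop.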
7) Digital signature: Developers faced issues when signing data and verifying signatures. One developer misunderstood the respective roles of the Cipher API and the Signature API for signing a piece of data in the Java Cryptography Architecture (JCA). Another developer was worried about performance bottlenecks when signing a massive dataset. In other discussions, developers failed to verify a signature because of the wrong encoding of RSA keys in browsers (URL encoding), signing a hash of the data instead of the data itself, using the wrong key for signing or verification, mismatched signature padding, and problems verifying a certificate in the chain of trust.
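The "hash instead of data" pitfall can be reproduced with a toy RSA signature in Python. Signature APIs such as JCA's SHA256withRSA hash their input internally, so passing a precomputed digest signs the hash of the hash, and verification against the original data fails. The key below is hypothetical and far too small, and hashing then reducing modulo N stands in for a real signature padding scheme such as PKCS#1 v1.5 or PSS; treat it as an illustration only.

```python
import hashlib

# Toy RSA key (hypothetical, insecure): Mersenne primes, textbook RSA without padding
P, Q, E = 2**61 - 1, 2**89 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)


def _digest_int(data: bytes) -> int:
    # SHA-256 of the data, reduced mod N (stand-in for a real padding scheme)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    # Like SHA256withRSA, the API hashes the data itself before using the private key
    return pow(_digest_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    return pow(signature, E, N) == _digest_int(data)


data = b"transfer 100 to Bob"
assert verify(data, sign(data))

# Pitfall: pre-hashing and passing the digest as the data signs H(H(data))
wrong_signature = sign(hashlib.sha256(data).digest())
assert not verify(data, wrong_signature)
```

The fix is to pass the original data to the signing call, or, if only a digest is available, to use an API variant that explicitly accepts a precomputed hash.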
8) Sample implementation: Developers mainly asked for two types of sample implementations. In the first type, developers had sample code in one language or crypto library and were looking for equivalent code in another language or library. For instance, a developer had a piece of encryption code in Objective-C but could not reproduce it in Swift. In the second type, developers had a goal but did not know how to accomplish it. For example, a developer requested a sample implementation of AES-256 in CBC mode with the M2Crypto library.
9) Cryptographic attacks: Only 0.4% of the analyzed posts concerned attacks. The first discussion was about conducting a man-in-the-middle attack when a self-signed certificate is used; the asker received comprehensive answers explaining why self-signed certificates are not recommended. In the second discussion, a developer could not understand how the length extension attack works.
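The length extension attack can be demonstrated end to end on a toy Merkle-Damgård hash. SHA-256 and MD5 share this structure, but Python's hashlib cannot resume hashing from a published digest, so the sketch below assumes a hypothetical compression function (SHA-256 over the chaining value and one block) combined with standard length padding. It shows why a naive MAC of the form H(secret || message) is forgeable when the attacker knows only the message, the tag, and the secret's length; HMAC is the standard construction that avoids this.

```python
import hashlib
import struct

BLOCK = 64          # block size in bytes, as in SHA-256
IV = bytes(32)      # toy initial chaining value


def md_pad(total_len: int) -> bytes:
    # Merkle-Damgard strengthening: 0x80, zero bytes, then the 64-bit big-endian bit length
    zeros = (BLOCK - (total_len + 9) % BLOCK) % BLOCK
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", total_len * 8)


def toy_md_hash(data: bytes, state: bytes = IV, prefix_len: int = 0) -> bytes:
    # Hypothetical compression function: SHA-256(chaining value || block).
    # state/prefix_len let a caller resume from an earlier digest.
    padded = data + md_pad(prefix_len + len(data))
    for i in range(0, len(padded), BLOCK):
        state = hashlib.sha256(state + padded[i:i + BLOCK]).digest()
    return state


# Victim: naive MAC = H(secret || message)
secret = b"server-side key"            # unknown to the attacker
message = b"amount=10"
tag = toy_md_hash(secret + message)

# Attacker knows message, tag and len(secret), but not the secret itself
glue = md_pad(len(secret) + len(message))
extension = b"&amount=1000000"
forged_message = message + glue + extension
forged_tag = toy_md_hash(extension, state=tag,
                         prefix_len=len(secret) + len(message) + len(glue))

# The server, recomputing H(secret || forged_message), accepts the forgery
assert toy_md_hash(secret + forged_message) == forged_tag
```

The attack works because the published digest is exactly the internal chaining state after the padded secret and message, so hashing can simply continue from it.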
We have attempted to shed light on developers' common technical issues with various crypto libraries. We observed that developer uncertainty with a particular crypto library is not confined to one or two areas but is frequently linked to more than five themes. Some libraries, such as OpenSSL and WolfSSL, are intended for a specific purpose, namely secure communication over computer networks, which increases the likelihood of certificate-related questions for such libraries. Moreover, a popular crypto library such as Bouncy Castle offers a wide range of crypto APIs and can be used in two popular programming languages, C# and Java, which may explain why its questions are linked to seven themes. Some of the extracted themes are interrelated, e.g., certificate-related issues, digital signatures, and generating/storing keys; for instance, a developer may need to generate an RSA key pair to work with certificates. Nevertheless, we carefully tried to identify the core issue of each posted challenge.
The detailed issues in working with various crypto libraries could help professionals identify probable pitfalls in the design phase of software development. Identifying crypto pitfalls at earlier stages can substantially improve both the security of software and the speed of its development. Such forethought can therefore facilitate the use of cryptography in the implementation phase and prevent inexperienced developers from making fatal security mistakes that may have pernicious effects after release. Further research is needed to shed light on how similar APIs in popular crypto libraries are misunderstood and on whether API complexity leads to more problems.
This paper is available on arxiv under CC BY 4.0 DEED license.