There are some standard formats for storing/exchanging RSA keys, such as those in RFC 3447 (PKCS #1). For better or worse, many of them use ASN.1 encoding, which adds more complexity than most people like, all by itself. A few use Base64 encoding, which is a lot easier to implement.
As far as what constitutes a key goes: in its most basic form, you're correct; the public key includes the modulus (usually called n) and an exponent (usually called e).
To compute a key pair, you start from two large prime numbers, usually called p and q. You compute the modulus n as p * q. You also compute a number (often called r) that's (p-1) * (q-1).
e is then a more or less randomly chosen number that's relatively prime to r (that is, it shares no common factor with r). Warning: you don't want e to be really small though -- log(e) >= log(n)/4 as a bare minimum.
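As a rough sketch, the "relatively prime" requirement can be checked with a gcd. The function name is_valid_exponent is just something I made up for illustration, and it assumes a C++17 compiler for std::gcd:

```cpp
#include <cstdint>
#include <numeric>  // std::gcd (C++17)

// Hypothetical helper: true when e can serve as a public exponent for r,
// i.e. 1 < e < r and e shares no common factor with r.
bool is_valid_exponent(std::uint64_t e, std::uint64_t r) {
    return e > 1 && e < r && std::gcd(e, r) == 1;
}
```

It only tests coprimality; whether e is large enough is a separate question.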
You then compute d (the private decryption key) as a number satisfying the relation:
    d * e = 1 (mod r)
You typically compute this using the extended version of Euclid's algorithm, though there are other options (see below). Again, you don't want d to be really small either, so if it works out to a really small number, you probably want to try another value for e, and compute a new d to match.
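For reference, here is a minimal sketch of the extended Euclid approach. The name mod_inverse is my own, and it assumes r fits comfortably in a signed 64-bit integer, which is only true for toy-sized keys:

```cpp
#include <cstdint>

// Returns d with (d * e) % r == 1, or 0 if e and r are not coprime.
// Assumes r is small enough that signed 64-bit arithmetic does not overflow.
std::int64_t mod_inverse(std::int64_t e, std::int64_t r) {
    std::int64_t old_rem = e, rem = r;  // running remainders
    std::int64_t old_s = 1, s = 0;      // running coefficients of e
    while (rem != 0) {
        const std::int64_t quot = old_rem / rem;
        std::int64_t t = old_rem - quot * rem;
        old_rem = rem;
        rem = t;
        t = old_s - quot * s;
        old_s = s;
        s = t;
    }
    if (old_rem != 1) return 0;            // gcd(e, r) != 1, so no inverse
    return old_s < 0 ? old_s + r : old_s;  // normalise into [0, r)
}
```

With the usual small teaching values e = 17 and r = 3120, this gives d = 2753.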
There is another way to compute your e and d. You can start by finding some number K that's congruent to 1 mod r, then factor it. Put the prime factors together to get two factors of roughly equal size, and use them as e and d.
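To make that concrete, here is a toy sketch. It takes a shortcut: instead of collecting prime factors and regrouping them, it looks directly for the divisor of K closest to sqrt(K), which gives the same kind of split. The function name and the multiplier parameter k are assumptions of mine, and trial division like this is only usable for tiny values:

```cpp
#include <cstdint>
#include <utility>

// Splits K = k * r + 1 into two factors (e, d) that are as close to each
// other as possible. Returns {1, K} when K is prime, which is useless as a
// key pair and means a different k should be tried.
std::pair<std::uint64_t, std::uint64_t>
split_congruent_to_one(std::uint64_t r, std::uint64_t k) {
    const std::uint64_t K = k * r + 1;
    std::uint64_t e = 1;
    for (std::uint64_t f = 2; f * f <= K; ++f)
        if (K % f == 0) e = f;  // remember the largest divisor <= sqrt(K)
    return {e, K / e};
}
```

For example, with p = 5 and q = 11 we have r = 40; choosing k = 4 gives K = 161 = 7 * 23, so e = 7 and d = 23, and 7 * 23 leaves remainder 1 when divided by 40.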
As far as an attacker computing your d goes: you need r to compute this, and knowing r depends on knowing p and q. That's exactly why/where/how factoring comes into breaking RSA. If you factor n, then you know p and q. From them, you can find r, and from r you can compute the d that matches a known e.
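Here is a sketch of that attack against a toy key. All three function names are mine, the factoring is plain trial division, and the whole thing assumes n is the product of two odd primes small enough for 64-bit arithmetic:

```cpp
#include <cstdint>

// Smallest odd factor of n found by trial division, or 0 if there is none.
// Only practical because a demonstration modulus is tiny.
std::uint64_t smallest_odd_factor(std::uint64_t n) {
    for (std::uint64_t f = 3; f * f <= n; f += 2)
        if (n % f == 0) return f;
    return 0;
}

// Extended Euclid: returns x with (a * x) % m == 1, assuming gcd(a, m) == 1.
std::int64_t inverse_mod(std::int64_t a, std::int64_t m) {
    std::int64_t old_rem = a, rem = m, old_s = 1, s = 0;
    while (rem != 0) {
        const std::int64_t quot = old_rem / rem;
        std::int64_t t = old_rem - quot * rem;
        old_rem = rem;
        rem = t;
        t = old_s - quot * s;
        old_s = s;
        s = t;
    }
    return old_s < 0 ? old_s + m : old_s;
}

// Recovers d from the public key (n, e) alone; returns 0 if n was not factored.
std::uint64_t recover_private_exponent(std::uint64_t n, std::uint64_t e) {
    const std::uint64_t p = smallest_odd_factor(n);
    if (p == 0) return 0;
    const std::uint64_t q = n / p;
    const std::uint64_t r = (p - 1) * (q - 1);
    return static_cast<std::uint64_t>(
        inverse_mod(static_cast<std::int64_t>(e), static_cast<std::int64_t>(r)));
}
```

With a realistically sized modulus, the factoring step is the part that becomes infeasible; everything after it stays cheap.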
So, let's work through the math to create a key pair. We're going to use primes that are much too small to be effective, but should be sufficient to demonstrate the ideas involved.
So let's start by picking a p and q (of course, both need to be primes):
p = 9999991
q = 11999989
From those we compute n and r:
n = 119999782000099
r = 119999760000120
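If you want to double-check that arithmetic, a compile-time check along these lines should do it (the function names modulus and totient are just my labels for the two formulas):

```cpp
#include <cstdint>

// n = p * q
constexpr std::uint64_t modulus(std::uint64_t p, std::uint64_t q) {
    return p * q;
}

// r = (p - 1) * (q - 1)
constexpr std::uint64_t totient(std::uint64_t p, std::uint64_t q) {
    return (p - 1) * (q - 1);
}

// Both products fit easily in 64 bits for primes of this size.
static_assert(modulus(9999991, 11999989) == 119999782000099ULL,
              "n matches the value in the text");
static_assert(totient(9999991, 11999989) == 119999760000120ULL,
              "r matches the value in the text");
```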
Next we need to either pick e or else compute K, then factor it to get e and d. For the moment, we'll go with your suggestion of e=65537 (since 65537 is prime, the only possibility for it and r not being relatively prime would be if r were an exact multiple of 65537, which we can verify is not the case quite easily).
From that, we need to compute our d. We can do that fairly easily (though not necessarily very quickly) using the "Extended" version of Euclid's algorithm, (as you mentioned) Euler's Totient, Gauss' method, or any of a number of others.
For the moment, I'll compute it using Gauss' method:
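    // Greatest common divisor by Euclid's algorithm.
    template <class num>
    num gcd(num a, num b) {
        num r;
        while (b > 0) {
            r = a % b;
            a = b;
            b = r;
        }
        return a;
    }

    // Inverse of a modulo p by Gauss' method; returns 0 if none exists.
    // Invariant: a * inverse == z (mod p), so once a reaches 1, z is the answer.
    template <class num>
    num find_inverse(num a, num p) {
        num g, z;
        if (gcd(a, p) > 1) return 0;    // no inverse unless a and p are coprime
        z = 1;
        while (a > 1) {
            z += p;                     // adding p leaves z unchanged mod p
            if ((g = gcd(a, z)) > 1) {  // cancel any common factor
                a /= g;
                z /= g;
            }
        }
        return z;
    }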
The result we get is:
d = 38110914516113
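It's worth confirming that this d really satisfies d * e = 1 (mod r). A small check might look like the following; both function names are mine, and the shift-and-add multiply assumes the modulus is below 2^63:

```cpp
#include <cstdint>

// Overflow-safe (a * b) % m via shift-and-add; assumes 0 < m < 2^63.
std::uint64_t mul_mod_checked(std::uint64_t a, std::uint64_t b,
                              std::uint64_t m) {
    std::uint64_t result = 0;
    a %= m;
    b %= m;
    while (b > 0) {
        if (b & 1) result = (result + a) % m;
        a = (a << 1) % m;
        b >>= 1;
    }
    return result;
}

// True when d is the inverse of e modulo r (r > 1 assumed).
bool is_inverse_pair(std::uint64_t e, std::uint64_t d, std::uint64_t r) {
    return mul_mod_checked(e, d, r) == 1;
}
```

Calling is_inverse_pair(65537, 38110914516113, 119999760000120) returns true, since 65537 * 38110914516113 works out to exactly 20814 * r + 1.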
Then we can plug these into an implementation of RSA, and use them to encrypt and decrypt a message.
So, let's encrypt "Very Secret Message!". Using the e and n given above, that encrypts to:
74603288122996
49544151279887
83011912841578
96347106356362
20256165166509
66272049143842
49544151279887
22863535059597
83011912841578
49544151279887
96446347654908
20256165166509
87232607087245
49544151279887
68304272579690
68304272579690
87665372487589
26633960965444
49544151279887
15733234551614
And, using the d given above, that decrypts back to the original. Code to do the encryption/decryption (using hard-coded keys and modulus) looks like this:
    #include <algorithm>
    #include <functional>
    #include <iostream>
    #include <iterator>
    #include <string>
    #include <vector>

    typedef unsigned long long num;

    const num e_key = 65537;
    const num d_key = 38110914516113;
    const num n = 119999782000099;

    // Computes (a * b) % m by shift-and-add, so intermediate values never
    // overflow as long as b < m and m is below 2^63.
    template <class T>
    T mul_mod(T a, T b, T m) {
        if (m == 0) return a * b;
        T r = T();
        while (a > 0) {
            if (a & 1)
                if ((r += b) >= m) r %= m;
            a >>= 1;
            if ((b <<= 1) >= m) b %= m;
        }
        return r;
    }

    // Computes (a ^ n) % m by repeated squaring.
    template <class T>
    T pow_mod(T a, T n, T m) {
        T r = 1;
        while (n > 0) {
            if (n & 1)
                r = mul_mod(r, a, m);
            a = mul_mod(a, a, m);
            n >>= 1;
        }
        return r;
    }

    int main() {
        std::string msg = "Very Secret Message!";
        std::vector<num> encrypted;

        std::cout << "Original message: " << msg << '\n';

        // Encrypt one byte at a time (textbook RSA, for demonstration only).
        std::transform(msg.begin(), msg.end(),
            std::back_inserter(encrypted),
            [](unsigned char ch) { return pow_mod<num>(ch, e_key, n); });

        std::cout << "Encrypted message:\n";
        std::copy(encrypted.begin(), encrypted.end(),
            std::ostream_iterator<num>(std::cout, "\n"));
        std::cout << "\n";

        std::cout << "Decrypted message: ";
        std::transform(encrypted.begin(), encrypted.end(),
            std::ostream_iterator<char>(std::cout, ""),
            [](num val) { return static_cast<char>(pow_mod(val, d_key, n)); });
        std::cout << "\n";
    }
To have even a hope of security, you need to use a much larger modulus though: current guidance generally calls for at least 2048 bits, and moduli of a few hundred bits have been factored in practice. You could do that with a normal arbitrary precision integer library, or routines written specifically for the task at hand. RSA is inherently fairly slow, so at one time most implementations used code with lots of hairy optimization to do the job. Nowadays, hardware is fast enough that you can probably get away with a fairly average large-integer library (especially since in real use, you only want to use RSA to encrypt/decrypt a key for a symmetric algorithm, not to encrypt the raw data).
Even with a modulus of suitable size (and the code modified to support the large numbers needed), this is still what's sometimes referred to as "textbook RSA", and it's not really suitable for much in the way of real encryption. For example, right now, it's encrypting one byte of the input at a time. This leaves noticeable patterns in the encrypted data. It's trivial to look at the encrypted data above and see that the second and seventh words are identical, because both are the encrypted form of e (which also occurs in a few other places in the message).
As it stands right now, this can be attacked as a simple substitution code. e is the most common letter in English, so we can (correctly) guess that the most common word in the encrypted data represents e (and relative frequencies of letters in various languages are well known). Worse, we can also look at things like pairs and triplets of letters to improve the attack. For example, if we see the same word twice in succession in the encrypted data, we know we're seeing a double letter, which can only be a few letters in normal English text. Bottom line: even though RSA itself can be quite strong, the way of using it shown above definitely is not.
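The first step of that attack takes only a few lines. This is a sketch with a function name of my own choosing, and it assumes a C++17 compiler for the structured binding:

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Returns the ciphertext value that occurs most often
// (on a tie, the numerically smallest such value; 0 for empty input).
std::uint64_t most_common_block(const std::vector<std::uint64_t>& blocks) {
    std::map<std::uint64_t, int> counts;
    for (std::uint64_t b : blocks) ++counts[b];

    std::uint64_t best = 0;
    int best_count = 0;
    for (const auto& [value, count] : counts) {
        if (count > best_count) {
            best = value;
            best_count = count;
        }
    }
    return best;
}
```

Fed the twenty values listed above, it returns 49544151279887, which is the encrypted form of e.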
To prevent that problem, with a (say) 512-bit key, we'd also process the input in chunks of nearly 512 bits (each chunk, read as a number, has to be smaller than the modulus). That means we only have a repetition if there are two places in the original input where a whole chunk's worth of bits is entirely identical. Even if that happens, it's relatively difficult to guess what that chunk would be, so although it's undesirable, it's not nearly as vulnerable as the byte-by-byte version shown above. In addition, you always want to pad the input to a multiple of the size being encrypted.
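To illustrate the chunking idea at the scale of the toy key used here, the sketch below packs several bytes into each number before encryption. The function name pack_blocks is mine, bytes_per_block is assumed to be between 1 and 8, and the zero padding of the last block is purely for illustration, not a real padding scheme:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Packs msg into integers of bytes_per_block bytes each (big-endian),
// zero-padding the final block. The caller must pick bytes_per_block
// (1 to 8) so that every packed value stays below the modulus.
std::vector<std::uint64_t> pack_blocks(const std::string& msg,
                                       std::size_t bytes_per_block) {
    std::vector<std::uint64_t> blocks;
    for (std::size_t i = 0; i < msg.size(); i += bytes_per_block) {
        std::uint64_t block = 0;
        for (std::size_t j = 0; j < bytes_per_block; ++j) {
            const unsigned char byte =
                (i + j < msg.size()) ? static_cast<unsigned char>(msg[i + j])
                                     : 0;
            block = (block << 8) | byte;
        }
        blocks.push_back(block);
    }
    return blocks;
}
```

With the 47-bit modulus above, five bytes (40 bits) per block is the most that fits, so the twenty-character message becomes four blocks instead of twenty.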
Reference
https://crypto.stackexchange.com/questions/1448/definition-of-textbook-rsa