This is a known pattern, called a SeqLock. https://en.wikipedia.org/wiki/Seqlock. (With the simplification that there's only one writer so no extra support for excluding simultaneous writers is needed.)
You don't need or want the increment of the counter variable itself to use atomic RMW operations. You can just load both halves with atomic 32-bit loads, increment it, and atomically store the result. (With cheap `relaxed` or `release` memory order, and using a `release` store for the 2nd counter update.)
Similarly, the sequence counter doesn't need to be an atomic RMW either. The writer only needs pure loads and pure stores with only release ordering, which are (much) cheaper than atomic RMWs or `seq_cst` stores:
- load the counter and the value in any order
- store a new counter (old+1)
- store the new value (or just update the low half if you want to branch on no carry)
- store the final counter.
The ordering of the stores in those last 3 bullet points is the only thing that matters. A write fence after the first store could be good, because we don't really want the cost of making the stores of both halves of the value `release`, on CPUs where that's more expensive than `relaxed`.
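To illustrate those bullet points concretely, here's a minimal untested sketch for a 64-bit value split into two 32-bit halves. (The `seq` / `lo` / `hi` names and exact fence placement are my own for illustration, not from any standard API.)

```cpp
#include <atomic>
#include <cstdint>

std::atomic<uint32_t> seq{0};        // sequence counter: even = stable, odd = write in progress
std::atomic<uint32_t> lo{0}, hi{0};  // the two halves of the 64-bit value

void writer_inc() {                  // SINGLE WRITER only
    uint32_t s = seq.load(std::memory_order_relaxed);
    uint64_t v = (uint64_t(hi.load(std::memory_order_relaxed)) << 32)
               | lo.load(std::memory_order_relaxed);      // nobody else writes, so relaxed is fine
    seq.store(s + 1, std::memory_order_relaxed);          // store a new counter (old+1)
    std::atomic_thread_fence(std::memory_order_release);  // order the odd counter store before the data stores
    ++v;
    lo.store(uint32_t(v), std::memory_order_relaxed);     // store the new value
    hi.store(uint32_t(v >> 32), std::memory_order_relaxed);
    seq.store(s + 2, std::memory_order_release);          // final counter store publishes the data
}
```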
Unfortunately, to satisfy C++ rules, the value has to be `atomic<T>`, which makes it inconvenient to get the compiler to generate the most efficient code possible for loading both halves. e.g. ARM `ldp` / `stp` load-pair might not be atomic, but that doesn't matter. (And compilers often won't optimize two separate atomic 32-bit loads into one wider load.)
Values other threads read while the sequence counter is odd are irrelevant, but we'd like to avoid undefined behaviour. Maybe we could use a union of a `volatile uint64_t` and an `atomic<uint64_t>`.
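Continuing that hypothetical two-halves sketch, the reader side retries until it sees the same even count before and after loading the halves, something like:

```cpp
#include <atomic>
#include <cstdint>

extern std::atomic<uint32_t> seq, lo, hi;  // as declared in the writer sketch above

uint64_t reader_get() {
    while (true) {
        uint32_t s0 = seq.load(std::memory_order_acquire);   // could be LDAR on AArch64
        uint64_t v = (uint64_t(hi.load(std::memory_order_relaxed)) << 32)
                   | lo.load(std::memory_order_relaxed);     // may be torn; checked below
        std::atomic_thread_fence(std::memory_order_acquire); // LoadLoad: data loads before recheck
        uint32_t s1 = seq.load(std::memory_order_relaxed);
        if (s0 == s1 && !(s0 & 1))
            return v;  // even and unchanged: the two halves are a consistent snapshot
    }
}
```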
I wrote this C++ `SeqLocked<class T>` template for another question I didn't finish writing an answer for (figuring out which versions of ARM have 64-bit atomic load and store). This tries to check if the target already supports lock-free atomic operations on `atomic<T>`, to stop you from using this when it's pointless. (Disable that for testing purposes by defining `IGNORE_SIZECHECK`.) TODO: transparently fall back to doing that, maybe with a template specialization, instead of using a `static_assert`.
I provided an `inc()` function for any `T` that supports a `++` operator. A TODO would be an `apply()` that accepts a lambda to do something to a `T` and stores the result between sequence-counter updates; a sketch of that is below, after the code.
```cpp
// **UNTESTED**
#include <atomic>
#ifdef UNIPROCESSOR
// all readers and writers run on the same core
// ordering instructions at compile time is all that's necessary
#define ATOMIC_FENCE std::atomic_signal_fence
#else
// A reader can be running on another core while writing
// memory barriers or ARMv8 acquire / release loads / stores are needed
#define ATOMIC_FENCE std::atomic_thread_fence
#endif
// using fences instead of .store(std::memory_order_release) will stop the compiler
// from taking advantage of a release-store instruction like AArch64 stlr.
// (On x86, plain stores already have release semantics, so the fences cost nothing extra.)
// SINGLE WRITER only.
// uses volatile + barriers for the data itself, like pre-C++11
template <class T>
class SeqLocked
{
#ifndef IGNORE_SIZECHECK
// sizeof(T) > sizeof(unsigned)
static_assert(!std::atomic<T>::is_always_lock_free, "A Seq Lock with a type small enough to be atomic on its own is totally pointless, and we don't have a specialization that replaces it with a straight wrapper for atomic<T>");
#endif
// C++17 doesn't have a good way to express a load that doesn't care about tearing
// without explicitly writing it as multiple small parts and thus gimping the compiler if it can use larger loads
volatile T data; // volatile should be fine on any implementation where pre-C++11 lockless code was possible with volatile,
// even though Data Race UB does apply to volatile variables in ISO C++11 and later.
std::atomic<unsigned> seqcount{0}; // Even means valid, odd means modification in progress.
// unsigned wraps around at a power of 2 on overflow
public:
T get() const {
unsigned c0, c1;
T tmp;
do {
c0 = seqcount.load(std::memory_order_relaxed); // or this can be a std::memory_order_acquire for multicore so AArch64 can use LDAR
ATOMIC_FENCE(std::memory_order_acquire);
tmp = (T)data; // load
ATOMIC_FENCE(std::memory_order_acquire); // LoadLoad barrier
c1 = seqcount.load(std::memory_order_relaxed);
} while(c0&1 || c0 != c1); // retry if the counter changed or is odd
return tmp;
}
// TODO: a version of this that takes a lambda for the operation on tmp
T inc() {
unsigned orig_count = seqcount.load(std::memory_order_relaxed);
seqcount.store(orig_count+1, std::memory_order_relaxed);
ATOMIC_FENCE(std::memory_order_release);
// make sure the data stores appear after the first counter update.
T tmp = data; // load
++tmp;
data = tmp; // store
ATOMIC_FENCE(std::memory_order_release);
seqcount.store(orig_count+2, std::memory_order_relaxed); // Or use mo_release here, better on AArch64
return tmp;
}
void set(T newval) {
unsigned orig_count = seqcount.load(std::memory_order_relaxed);
seqcount.store(orig_count+1, std::memory_order_relaxed);
ATOMIC_FENCE(std::memory_order_release);
// make sure the data stores appear after the first counter update.
data = newval; // store
ATOMIC_FENCE(std::memory_order_release);
seqcount.store(orig_count+2, std::memory_order_relaxed); // Or use mo_release here, better on AArch64
}
};
/***** test callers *******/
#include <stdint.h>
struct sixteenbyte {
//unsigned arr[4];
unsigned long a,b,c,d;
sixteenbyte() = default;
sixteenbyte(const volatile sixteenbyte &old)
: a(old.a), b(old.b), c(old.c), d(old.d) {}
//arr(old.arr) {}
};
void test_inc(SeqLocked<uint64_t> &obj) { obj.inc(); }
sixteenbyte test_get(SeqLocked<sixteenbyte> &obj) { return obj.get(); }
//void test_set(SeqLocked<sixteenbyte> &obj, sixteenbyte val) { obj.set(val); }
uint64_t test_get(SeqLocked<uint64_t> &obj) {
return obj.get();
}
// void atomic_inc_u64_seq_cst(std::atomic<uint64_t> &a) { ++a; }
uint64_t u64_inc_relaxed(std::atomic<uint64_t> &a) {
// same but without dmb barriers
return 1 + a.fetch_add(1, std::memory_order_relaxed);
}
uint64_t u64_load_relaxed(std::atomic<uint64_t> &a) {
// gcc uses LDREXD, not just LDRD?
return a.load(std::memory_order_relaxed);
}
void u64_store_relaxed(std::atomic<uint64_t> &a, uint64_t val) {
// gcc uses a LL/SC retry loop even for a pure store?
a.store(val, std::memory_order_relaxed);
}
```
It compiles to the asm we want: https://godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAM1QDsCBlZAQwBtMQBGAOgCYAOAJyCALAFYADPwDswucNIArLqVbNaoZAFJeAIR27SAZ1QBXYsg4ByAPQ2A1ACpHAVQByAFWwMvAEWdaEgCCgSG8AMx4tMisplj2WuEAwsxEALZ42uHYobkReFRYVPbuAJIACgBKAPJJ3gzVlaF29mys9sSYzFjERq206PYA7sSEmL0dprT2dPYECJj2Rsxpi2idzQ4kPVHA9lFGBMSmyAR4dH2p9mhpAA547HN4q/uXrO3zqWBWfbSYlkZlsQAJ55cJFKKLIIeaoAWVKSQA%2BgAxbBuOpLAjoEAgVKoDLIRFGPDAWhsRFUTDRTBgzCsIw04ItIIdLo9a7qewAI0Wx1otF2M2m6lQ83G1xIiyGCAeUtGZw0m3sqzSJGB3OYxFG4z6JHsQUqsIAbvxWsgAI6mPCdewOTrsZgM%2BysVDdPoOQ6S1o2v6YLDoMEQv76mHwpGo9HYTHY3HpTKI%2BadboUqmWWkDApK0zEjT2SnUvoHAhsmbFbiezoQQ4xlVqxHbcaI%2B1dBkASmGD3antucwWErusuISqoxHxc2YAGtBd0jeoCMxgItUMVmKyHQyALQVxZF46nc60UjOvATpfTIJBCwIABsIhmxHsAA9%2BNfcsElQxSm4AOIAGSjADqlSlF4lRCqwwLcFmDJ9EaqBqGcjwGBqWp4DqeZ6mK9joKkq6EAyrBUEerAnostydBuSQGAYnCcKExZ3AhiyJEkMSOn0HiJDkwRsYC9gMJg5q/qgyCngG77SPo775FQAyYMUpTfm4jTYIin4AFrYEkAASWkANKhPYRm2h6eAAF6YMuEAeO2XFLOZllUBAUzEqSfqtoZxmHKk8bseMBAQGAYDVjieIEixnHZDieBGIibBDMwwKxS6okUp0mBHjovAsgJ5r2MJokdvMrRzMCtyLEYaRtPYVJmMACBzKg3KLGFmRCvsBC6kM0wxY187vOqtyoFEBDsICR7qIMQyLOgdDfAQ9gIMwRotUs5XIHgbDmT5syfAtnS3GoAIdUVDWrocxDMCSCALSMzC3OVD40A%2BrVZEkkU5LwvAeeEUlhOEVL4FQb4ssZxktFReg0dI2GoJgRi0PNi3LatwCoBgwyJY1NWPuR8OXM6rqDHtsPw4jVgLSwNrMFyZgLcWmq7J5YMmUZQyEAgdM44dmSEBBwzyoKhCtH0aSmKwZyHRVVXvPYtyap1/TEwg2b2MAzz3LmWE3Pc7APgUJ0sNM2aLGoxCLg%2BLpusz9hwQhsr2B42G4YkugsyzLR2z5jxGJz4uDDyeaQu16jqhr7CrPQO3TNK4xkRRkP6HotGE6JY19Gg8QJX0Q2AngXKPOzxVe4hGU2%2B7FeVx7Dg1St0zzHVDW%2BLh9iVMwlglG7s3w60D380Qtvwd7iyzqMNNjfs0ylA09iJzRnBK86qTjFB77BGDIWxvimQsS5JJ/AG2RLIJaBTAQWiSRIF%2B%2BK7RktNgdfKl0tCwVt6BHhggyrOoosYAUmTR0nnLUcwBOiAlXqDKuUDoGVxaPYPebkpqXVuJcUcUxBhXFXENaaD5lz2F4O1VAK1iBUBdEMEGtxTAF0yCAG2TtFwBXbGgF%2BC0L5/Urggg%2B1wJBHmQHRX65djJOwIHcV2INK6zQSJJQR7tkASASOEXwx9zSn3oNwK26AqxYhxLWEE9ZiA9CbHSZgj53K31ZphGUGdOSB3OtokAujgT6MMe3S01pFjPWVOLM46wKpNQvFeW8HJjZOl/L4A0MiWbQjhAiFEaI6haJrJgVUeiGzEDihaK0nQfp/UiWDERPZEhKOsq2HC85zHV0Jt0cR0DolhjiZGRJOjkl1jSRktx2TzEtGEt0HpAdNTaiHOvaBfCFFKIZCoswaiNFNIcS01JBjGz2hMWYgRwz3bX2GDKdgEA5E6GvAvC%2BVFpBJG4fYIKRTricByZUzoRww7FG1lM4sD5kBLQ0H6e8rwZjoHEiEdZLM7nmHrqItZkCpE3zXuCloMJfDVBACVYhxJZh4PmL1Em85TwEzUGkLkOEMIPiwqgR6gDdqiIBU7KIyAIC2WkQCsGnDPkkBJIiVRrDFHKLZeoomszHHOKWcY0x6Ack1PdhMrl24IDMuAKy55NEjybz5W05ZQqRX0uMnU2JEYEmKvmU45VdIWyYDVeCsGLQqqniWOYRYWEyl2MlJcB6XQXpUBeb2Dx1pDgSjPuKUwtwymYAgZE4RdwxnO3Kb9O%2BDgNGROonoApYj1VGTtWGhNkbwYeiIBsKFldNXhnidgXler%2BXpObI6Y1ibTVeRPs88sWbMBStGDKtlBheAKvsUqxZpbBWrLdqzaoD4TbKlQEY9ciwFidCPDyAgbrZgBLebeUVgLMAEGBXMUF7DjLXyXXBPAgwGQBSdn8IYs5WC0s3e7RlgxpWyrPmG8VtaZm6pSfqrto6VnCsrZEh9Z862SkbSyltyd21JJfSW99qrK25tDFqgtRawMGrHSa2BDgLUVWte68N9qwG93KpqVorrxRYSoJ6ymzzfX%2BuXkGpNWGw3HtPRU1m24l1gzzQ0nVHbi2IaNchiuP61GSpvUB3QbbozNIQ2%2BlVvao32AHfAp0qpR1GsWnHKdK7Z3nkvAukQNtt1Qu3WCkGNhnAmbmPDSmbR0ImeszYMEVLYjxBYtWEa3AEBcRBhdE4C1iSPmLFSLkwJixSIvSZK93ohxiF0DpsQN8wUMpfvvT5LpcytFIFyUgyBSCHxCz5vztAAvFhpWGoozBvFQa8ngXzmB/OBYbcwr1JcHa5eq/l2rCReDXngp%2Bmj9gEXMClawdA3BmCtinQNobXJRvXHG9wZAU3NFde4MK4LenoU2AGTNgZ57VsGdycEXdxNzOIipRAXKBUxK7xGreBMdl9moC5Ioc9Rl7uKG4Cdm5q3ms1aC8WQ4iIGGncEudv0TnKt5YKzSI%2Bd2HtPaBcQaYL3uAA4%2B9ISFQQ7AHbM39g9gOhIiQu8kL7rXiy3Y6y9hVYOWsQ9tmwJ7iOcenpR2j0IVp6DXfpkdgHZ38cg%2BSKzgg7PSedZhytgFcOEcPaRyumlX7UdGYcJj16x3oiIlMNdiZrLDizNepdtnIgbtQ46yN4L7Wk66GYLfVb/P2fwOu1SiDfptdxjetb/X59DfXmN2wm2LRlgvC5KYW6HMuboFxShQZRgbbi/sIcvQrRuCUgIG8uKvyICcBA%2BJ1pkme2fsM3LqFruEy2/1xoh3mjN46751dt3Quvd0rW2rZAyB5M9zCZUbAAANXwR5aCinsIobMC02%2BxeRFHlda7mDcu6PBrPhipO59yfn/5QRMdq/19uMvTvt4u%2BrwbqM%2BzmBHkLwtRnouG/ACby3gmv5fw2AYKcu5IJCbEtrlSAlJVKE2mY%2BEUfALJ%2BStPQzzmQkznxzxN
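For the `apply()` TODO mentioned above, here's a hedged, untested sketch of what an extra member of `SeqLocked` might look like (same single-writer caveat; `f` is any callable taking and returning a `T`):

```cpp
// UNTESTED sketch: apply an arbitrary read-modify-write function to data,
// with the same counter/fence dance as inc() and set().
template <class F>
T apply(F f) {
    unsigned orig_count = seqcount.load(std::memory_order_relaxed);
    seqcount.store(orig_count+1, std::memory_order_relaxed);
    ATOMIC_FENCE(std::memory_order_release);
    // make sure the data stores appear after the first counter update.
    T tmp = data;   // load
    tmp = f(tmp);   // arbitrary update on the private copy
    data = tmp;     // store
    ATOMIC_FENCE(std::memory_order_release);
    seqcount.store(orig_count+2, std::memory_order_relaxed); // or mo_release, better on AArch64
    return tmp;
}
```

Usage would be something like `obj.apply([](uint64_t v){ return v + 10; });`.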
VWysFG1YBACsDECsFIFoBgIkHgNQBgLnjjxMHMA7h0HCE4HgIICQMgNGwnBAHCDED4H4DEH4BEAkF4DECoIkE4GkFUBgJEHgMQKsGQNIFQKsHgKMBAB4QIM4MgNIDgFgBgEQBQHxF1nGHIEoB1kHFxGIDSGAH4F4B4RIwlh1EoC5EILSyiE1GBBgLwIy3xEjgIGqFoAgj0KwCqg0HYBsPcX3BWn4OENIEwFMWQED2sBMJGjpD0JIi5EuhBCSAwB8PwNGDSGMJEJoHoCYDYA4B4AEGEHECkFkHkAUDUA0BQDjUMECP4MgFG2JTOAuBgI3F8EUmUnbzUlKE0h0n0nsA3AAmqiaI8KOFXA3GqHCEaOrCKVJFMFyKYMaLSGQEoSKXWGLEfA3GYE4DgOMDMAsESKgJgLmI4K4J4INFhEb2b34D4HsAgFwEID1BwPT1nmkMHHa1wPbHQN0HwMINbGINIIkG4GkF4DkAkGkGkAkAkBEF4E4EEHCBYKsDYNICiNeO4GvAQL0J4L4IENICEOQNGzEMkLpkoQIDkN2XOL1hUD9COOIBUDuluGiOWNgPYOhJgJONOn1ENG2PsF2N4DuOEIeNIAWG6HGEoBJJBKiN4GkG4BoO%2BIFMFIFLbTWJQJgNhMEPu