A language for expressing parallel processing based on an AssociativeMemory data structure that provides a repository (mathematically a bag; that is, a set that can contain multiple copies of the same item) for tuples (see TupleDefinition). The structure has six primitive operations: put, copy, take, try_copy, try_take, and eval. Put places a tuple into the bag. Copy finds and reads a tuple. Take is like copy but also removes the tuple after reading it. Copy and take block if they cannot find a tuple matching the query; try_copy and try_take are the non-blocking versions. Eval forks a new process.
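The six operations described above can be sketched as a toy, single-machine tuple space. This is only an illustration under stated assumptions, not how Linda or any real system is implemented: the class name, the use of threads for eval, and the template convention (a field that is a Python type matches any value of that type, any other field matches by equality) are all choices made for this sketch.

```python
import threading


class TupleSpace:
    """Toy in-process tuple space: the bag is a list guarded by a condition."""

    def __init__(self):
        self._bag = []                      # a bag, so duplicate tuples are allowed
        self._cond = threading.Condition()

    @staticmethod
    def _matches(template, tup):
        # Assumed convention: a type in the template matches any value of
        # that type; anything else must be equal to the tuple's field.
        if len(template) != len(tup):
            return False
        for want, have in zip(template, tup):
            if isinstance(want, type):
                if not isinstance(have, want):
                    return False
            elif want != have:
                return False
        return True

    def _find(self, template):
        for tup in self._bag:
            if self._matches(template, tup):
                return tup
        return None

    def put(self, tup):
        with self._cond:
            self._bag.append(tuple(tup))
            self._cond.notify_all()         # wake any blocked copy/take

    def try_copy(self, template):
        with self._cond:
            return self._find(template)     # None if nothing matches

    def try_take(self, template):
        with self._cond:
            tup = self._find(template)
            if tup is not None:
                self._bag.remove(tup)
            return tup

    def copy(self, template):
        with self._cond:
            while (tup := self._find(template)) is None:
                self._cond.wait()           # block until a put arrives
            return tup

    def take(self, template):
        with self._cond:
            while (tup := self._find(template)) is None:
                self._cond.wait()
            self._bag.remove(tup)
            return tup

    def eval(self, func, *args):
        # "Fork" a worker (a thread here) whose result tuple lands in the bag.
        worker = threading.Thread(target=lambda: self.put(func(*args)))
        worker.start()
        return worker


ts = TupleSpace()
ts.put(("point", 1, 2))
ts.put(("point", 1, 2))                     # a second copy of the same tuple
ts.eval(lambda n: ("square", n, n * n), 7).join()
first = ts.copy(("point", int, int))        # reads, leaves the tuple in the bag
square = ts.take(("square", 7, int))        # reads and removes
missing = ts.try_take(("missing", str))     # non-blocking, nothing matches
```

Blocking here is done with a condition variable rather than polling; a distributed implementation would need something quite different.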
A TupleSpace is an AssociativeMemory, which means that tuples are accessed not by their address but rather by their content and type. The repository is "generative" (see: GenerativeCommunication) because a tuple generated by a process has an independent existence. Any process may remove the tuple, and the tuple is bound to no process in particular. Any process can reference any tuple regardless of where that tuple is stored. A TupleSpace is a logical concept and does not require an underlying physical shared memory.
One strength of the model is its ability to describe parallel algorithms without reference to any specific computer architecture.
The concept of tuple space was pioneered by DavidGelernter at the YaleLindaGroup with the LindaTupleSpaces system and was initially offered in FortranLanguage languages. However, various other TupleSpace systems exist today, such as LiPS or Actor Spaces, for many languages (including the SmalltalkLanguage). For some reason, the JavaLanguage is experiencing an explosion of generative systems including JavaSpaces, TSpace, Page, and so on.
Systems based on the BlackboardMetaphor can be created simply in a TupleSpace. Tuple spaces are more general than the BlackboardMetaphor (which has specialists grabbing information scoped to their specialty); for example, they can also be used for a ComputeServer.
For criticisms of the TupleSpace model relative to ConcurrentLogicProgramming, see https://www.cypherpunks.to/erights/history/clp/linda-letters.pdf (starting on page 4 of the PDF).
Putting mutable reference objects into a tuple can cause problems that undermine some of the benefits of using tuple spaces.
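One such problem, in an in-process implementation at least, is aliasing: if the tuple holds a reference, the producer can change the stored data without any take or put. The sketch below uses a plain list as a stand-in for the tuple space; `put_snapshot` and the deep-copy mitigation are assumptions of this example, not something the page prescribes.

```python
import copy

bag = []                                   # stand-in for a tuple space

config = {"retries": 3}
bag.append(("settings", config))           # the tuple holds a reference to config

config["retries"] = 0                      # producer mutates after the put
leaked = bag[0][1]["retries"]              # the stored tuple changed with it


def put_snapshot(space, tup):
    # Hypothetical helper: store a deep copy so later mutation cannot leak in.
    space.append(copy.deepcopy(tup))


safe_bag = []
settings = {"retries": 3}
put_snapshot(safe_bag, ("settings", settings))
settings["retries"] = 0
kept = safe_bag[0][1]["retries"]           # the stored copy is unaffected
```

A distributed tuple space that serializes tuples gets the copy for free, but then a reader may wrongly assume that changing its local copy updates the tuple in the space.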
What is a TupleSpace? (by comparing it to other similar things)
vs message queue
Messages go from one person to another (or to many others). A TupleSpace is a bag with tuples in it. Each process can find a tuple that matches its search criteria and remove it from the bag. If you choose to have a destination process as a field in your tuple, and if selection is done on that field, then tuples can more or less simulate message passing. By copying rather than removing tuples you can also simulate messages to groups, and broadcast messages.
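The simulation described above can be sketched with a list as the bag and two non-blocking helpers. The field layout (tag, destination, body), the process names, and the use of None as a wildcard field are all assumptions made for brevity in this example.

```python
def matches(template, tup):
    # None in the template is a wildcard; other fields must be equal.
    return len(template) == len(tup) and all(
        want is None or want == have for want, have in zip(template, tup)
    )


def try_copy(bag, template):
    # Read the first matching tuple without removing it.
    return next((t for t in bag if matches(template, t)), None)


def try_take(bag, template):
    # Read the first matching tuple and remove it from the bag.
    tup = try_copy(bag, template)
    if tup is not None:
        bag.remove(tup)
    return tup


bag = []

# Point-to-point: select on the destination field and take the tuple.
bag.append(("msg", "worker-2", "resize image 17"))
bag.append(("msg", "worker-1", "resize image 18"))
mine = try_take(bag, ("msg", "worker-1", None))

# Broadcast: every reader copies, so the tuple stays for the next reader.
bag.append(("notice", "all", "shutdown at noon"))
seen_by_1 = try_copy(bag, ("notice", "all", None))
seen_by_2 = try_copy(bag, ("notice", "all", None))
```

Note that nothing stops another process from taking a tuple addressed to worker-1; the addressing is a convention among the processes, not something the bag enforces.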
However, one of the key differences between a TupleSpace and a message queue involves the temporal aspect. Generally with a message queue, once the message has reached the front of the queue and been processed, it is gone. With a TupleSpace that does not have to happen. A Tuple can be placed into the TupleSpace and it may never be taken. Or it may be taken in a month. Or it might already have been taken by the time you read this sentence.
In addition, a MessageQueue maintains the ordering of its messages, which a TupleSpace does not.
vs SQL DB
A TupleSpace is arguably a more primitive construct than a SQL DBMS. A SQL DBMS could conceivably be built using a TupleSpace implementation as its storage layer. It would, however, be an awkward AbstractionInversion to create a TupleSpace using an SQL DBMS, especially in cases where TupleSpace functionality is needed without requiring other aspects of SQL or DBMSes.
So it's almost like a database assembler language?
[I don't know. What's a database assembler language? There are various technologies that can underpin a relational database management system. TupleSpace is one possibility; key/value stores (like the Berkeley DB) are another.]
Implementations of TupleSpaces
A question. On TupleSpaceScalability, I just made the assertion that a distributed, persistent, transactional, fault tolerant AssociativeMemory cannot scale linearly; that is, TupleSpace won't work on the Internet, precisely because of its underlying "distributed shared memory" idea. Some assumptions have to be relaxed slightly; I've indicated one possible direction there, but there must be others. What do you think? -- VladimirSlepnev
I am a HUGE fan of tuple spaces. They are the most powerful execution environment there is, since the guillotine has gone out of favor. No contest. I believe it can be optimally configured to model any execution environment. Static binding, no problem, dynamic binding, no problem, distribution, no problem, transactions, no problem, persistence, no problem, and so on. These Yale guys are sitting on a gold mine. All we need is a fast O/S mapping and this would take over. No doubt. Word :). -- RichardHenderson
A gold mine? Does YaleUniversity (or anyone else for that matter) hold patents on TupleSpace?
I'm thinking it wouldn't be such a hideous AbstractionInversion to use an RDBMS to implement a TupleSpace. Instead of having different kinds of tuples, you simply have records in different tables. Since most modern RDBMSs provide in-memory tables, you have the possibility of creating/consuming large numbers of them without hitting the disk (so long as you can keep your tables small enough to fit into memory). The transactional nature of such things would seem to be a good fit for Linda's atomic operations. Additionally, if something happened which caused a transaction to fail, the ability to roll back to a known state and try again would be very helpful. In this respect, I see quite a bit of the development surrounding Linda as re-inventing the RDBMS wheel. Naturally, you'd want to do it in as lightweight a fashion as possible. Eliminating the SQL engine and providing an API to access data/perform operations would be a good start.
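A rough sketch of that idea, using Python's sqlite3 module with an in-memory database purely for illustration: one table per tuple type, with take implemented as a read plus a delete inside one transaction. The `point` table, the function names, and the surrogate `id` column (which lets the table hold duplicate tuples, as a bag must) are assumptions of this example. A blocking take is not shown.

```python
import sqlite3

# isolation_level=None: transactions are managed explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE point (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER)"
)


def put_point(x, y):
    # A single INSERT is committed on its own in this mode.
    conn.execute("INSERT INTO point (x, y) VALUES (?, ?)", (x, y))


def try_take_point(x=None, y=None):
    """Non-blocking take: None for a field means 'match anything'."""
    conn.execute("BEGIN IMMEDIATE")        # acquire the write lock before reading
    try:
        row = conn.execute(
            "SELECT id, x, y FROM point "
            "WHERE (? IS NULL OR x = ?) AND (? IS NULL OR y = ?) LIMIT 1",
            (x, x, y, y),
        ).fetchone()
        if row is not None:
            conn.execute("DELETE FROM point WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")           # leave the bag unchanged on failure
        raise
    return None if row is None else (row[1], row[2])


put_point(1, 2)
put_point(1, 2)                            # duplicates are fine: ids differ
put_point(3, 4)
taken = try_take_point(x=3)                # removes the (3, 4) row
again = try_take_point(x=3)                # nothing left that matches
```

The read and the delete share one transaction so that two consumers cannot both take the same row; how well that holds up under load depends on the database's locking behavior.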
Many of the Linda tasks you create are based on "consume a tuple of this type, do X calculations with it, emit a tuple of this other type, repeat as necessary." This would require something repeatedly polling the database, looking for an input tuple to consume. If you could use triggers, which spawn or execute that process every time a record is written to the appropriate table, you get an interrupt-driven process instead of a polling-driven process; from there, it becomes a question of the DB server's ability to manage large numbers of spawned tasks. Add in the fact that, for many modern RDBMSs, you can have a cluster of machines, sharding the data from a specific set of tables across multiple physical machines, and you have the ability to scale across multiple physical machines for increased performance. Indeed, while you may want some of the records persisted for actual consumption outside of the process, if the vast majority of "intermediate" results, working toward those records, can be done entirely in memory, it may be worth your while to have several physical machines which only shard the in-memory tables and handle the bulk of the triggered processes.
While such a system would be a bit of overkill, I suspect there are more people who are well-versed in the care and feeding of RDBMSs than there are developing with Linda. This could be a good "jumping off point" for introducing more people, inside and outside of various enterprises, to the concepts in Linda.
Are there papers out there about people already trying this? Or have tried this? I have to think I'm not the only person thinking in this direction, but I'm coming up short on hits on Google. It's possible I'm just not searching on the right terms, yet. -- Meower68
It's not clear to me what RAM has to do with it. That's more or less an implementation detail, not a database "user" interface issue. Often it's best to focus on the UI/language/interface aspects of a "technology" first, and then consider implementation issues. Of course it's not always seamless because machine/performance-issue trade-offs often dictate what's available or feasible.
To clarify, one of the performance advantages which a TupleSpace frequently has over an RDBMS is that the TupleSpace lives only in RAM. I'm merely suggesting that a modern RDBMS, with memory-backed tables, would bring the performance closer to parity.
[Any DBMS or TupleSpace developed in the last few decades makes extensive use of caching, to the point that disk speed generally isn't an issue except (possibly) at startup whilst the cache is being populated, or if the pattern of retrievals results in a significant quantity of cache misses. The most notable conceptual distinction between a TupleSpace and an RDBMS is that in an RDBMS, tuple storage must be organised into a finite quantity of predefined tables. TupleSpaces do not have this structural constraint. Depending on the application, this may be a help or a hindrance. If there are a wide variety of tuple types, the overhead of creating one table per tuple type (or of using some schema that supports arbitrary tuple types, such as EAV) may prohibit using an RDBMS.]
(See related discussion at MultiParadigmDatabaseDiscussion)
Wow. I'm almost sorry I brought this up.
When designing an application which will run in a TupleSpace, you typically have some idea, up front, as to what kind of tuples you will need to create. In that case, it's pretty easy to just create tables and read/write records to them. To avoid getting hung up from the beginning on the whole MultiParadigmDatabase argument, and what is and is not implemented, I'm asking if anyone has done something like this, using SQL and whatever stored procedure language the server implements. I recognize this is not what RDBMSs were created for, but it would appear that one could play around with this, possibly resulting in something useful. Does anyone know if this has been tried?
If the answer is no, no one has tried this, fine; that is a valid answer. It's possible that what I'm asking is simply stretching/abusing such systems too much. If someone knows of someone who has tried this, please provide some references where I can learn more about it. -- Meower68
I'm not aware of anyone who has tried this. I'm not sure what you expect the outcome to be, other than (perhaps) a somewhat slower than usual TupleSpace implementation.
Getting most Fortune 500 companies to provision systems, install a TupleSpace, and experiment with it is, in my experience, a bridge too far. All of them, however, have RDBMSs in place. Those are reasonably well-understood technologies. Many of them have clustered, sharded databases in place, both for production work and on test/development levels. Which is more likely to get a positive response from the typical F500 Pointy Haired Boss?
- We need a higher-performance way to do X. I have some ideas for how to do it, utilizing existing infrastructure, albeit in an unconventional fashion. Can I get some time to experiment with it?
- We need a higher-performance way to do X. Can I get some servers provisioned, on the development layer, and install a bunch of new software, as well as some time to experiment with it?
A dedicated TupleSpace would perform better; on that we agree. Additional infrastructure, hardware and software, especially software with which no one currently has experience, is a harder sell to management. If, however, you can get the RDBMS solution working and show significant performance improvement, you may be able to lead them toward a dedicated TupleSpace. -- Meower68
If your organisation does not have a culture of experimentation, it's no more likely to let you experiment on a DBMS than on provisioned servers or anything else. From the company's point of view, the difficult (and usually impossible) sell is "can I get some time to experiment with it" -- unless you were explicitly hired to conduct such experiments. And that's rare in organisations without an experiment-oriented culture. Usually, if "innovation" can't be bought from Oracle or Microsoft along with a full support contract, or implemented entirely within an Excel spreadsheet, it's not going to happen. On the other hand, if your organisation does support experimentation, it's as likely to support provisioning a server or two as it is to allow you your own experimental database. Indeed, with modern data centre infrastructure, provisioning servers may require no more expense and effort than launching a couple more virtual machine instances on the corporate private cloud.
Time for a Tuple-Ware party? :-)
Contributors include LukeGorrie