This feature allows the value of a database column to be set to a SQL expression instead of a literal value. It’s especially useful for atomic updates, calling stored procedures, etc. All you do is assign an expression to an attribute:
class SomeClass(object):
    pass

mapper(SomeClass, some_table)

someobject = session.query(SomeClass).get(5)

# set 'value' attribute to a SQL expression adding one
someobject.value = some_table.c.value + 1

# issues "UPDATE some_table SET value=value+1"
session.commit()
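The same kind of expression can also drive a bulk UPDATE; as a minimal sketch (the filter criterion here is hypothetical), Query.update() applies the atomic increment to many rows in a single statement:

# atomically increment 'value' for all matching rows in one UPDATE;
# synchronize_session=False skips reconciling any already-loaded objects
session.query(SomeClass).filter(SomeClass.value < 10).update(
    {"value": SomeClass.value + 1}, synchronize_session=False)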
This technique works both for INSERT and UPDATE statements. After the flush/commit operation, the value attribute on someobject above is expired, so that when next accessed the newly generated value will be loaded from the database.
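A brief sketch of the expiration behavior, continuing the example above:

# 'value' was expired by the flush; the next attribute access emits a
# SELECT and loads the database-computed value
print(someobject.value)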
SQL expressions and strings can be executed via the Session within its transactional context. This is most easily accomplished using the execute() method, which returns a ResultProxy in the same manner as an Engine or Connection:
Session = sessionmaker(bind=engine)
session = Session()
# execute a string statement
result = session.execute("select * from table where id=:id", {'id':7})
# execute a SQL expression construct
result = session.execute(select([mytable]).where(mytable.c.id==7))
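The returned ResultProxy is consumed in the usual way; a short sketch:

# rows behave the same as those returned by Engine.execute()
row = result.fetchone()
print(row['id'])
result.close()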
The current Connection held by the Session is accessible using the connection() method:
connection = session.connection()
The examples above deal with a Session that's bound to a single Engine or Connection. To execute statements using a Session which is bound either to multiple engines, or none at all (i.e. relies upon bound metadata), both execute() and connection() accept a mapper keyword argument, which is passed a mapped class or Mapper instance and is used to locate the proper context for the desired engine:
Session = sessionmaker()
session = Session()
# need to specify mapper or class when executing
result = session.execute("select * from table where id=:id", {'id':7}, mapper=MyMappedClass)
result = session.execute(select([mytable], mytable.c.id==7), mapper=MyMappedClass)
connection = session.connection(MyMappedClass)
The ORM considers any attribute that was never set on an object as a “default” case; the attribute will be omitted from the INSERT statement:
class MyObject(Base):
    __tablename__ = 'my_table'
    id = Column(Integer, primary_key=True)
    data = Column(String(50), nullable=True)

obj = MyObject(id=1)
session.add(obj)
session.commit()  # INSERT with the 'data' column omitted; the database
                  # itself will persist this as the NULL value
Omitting a column from the INSERT means that the column will have the NULL value set, unless the column has a default set up, in which case the default value will be persisted. This holds true both from a pure SQL perspective with server-side defaults, and from the perspective of SQLAlchemy's insert behavior with both client-side and server-side defaults:
class MyObject(Base):
    __tablename__ = 'my_table'
    id = Column(Integer, primary_key=True)
    data = Column(String(50), nullable=True, server_default="default")

obj = MyObject(id=1)
session.add(obj)
session.commit()  # INSERT with the 'data' column omitted; the database
                  # itself will persist this as the value 'default'
However, in the ORM, even if one assigns the Python value None explicitly to the object, this is treated the same as though the value were never assigned:
class MyObject(Base):
    __tablename__ = 'my_table'
    id = Column(Integer, primary_key=True)
    data = Column(String(50), nullable=True, server_default="default")

obj = MyObject(id=1, data=None)
session.add(obj)
session.commit()  # INSERT with the 'data' column explicitly set to None;
                  # the ORM still omits it from the statement and the
                  # database will still persist this as the value 'default'
The above operation will persist into the data column the server default value of "default" and not SQL NULL, even though None was passed; this is a long-standing behavior of the ORM that many applications hold as an assumption.
So what if we want to actually put NULL into this column, even though the column has a default value? There are two approaches. One is to assign the attribute, on a per-instance level, using the null SQL construct:
from sqlalchemy import null

obj = MyObject(id=1, data=null())
session.add(obj)
session.commit()  # INSERT with the 'data' column explicitly set as null();
                  # the ORM uses this directly, bypassing all client-
                  # and server-side defaults, and the database will
                  # persist this as the NULL value
The null SQL construct always translates into the SQL NULL value being directly present in the target INSERT statement.
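This rendering can be observed directly at the Core level; a quick sketch, assuming a plain Table construct for the same table:

from sqlalchemy import Table, Column, Integer, String, MetaData, null

metadata = MetaData()
my_table = Table('my_table', metadata,
    Column('id', Integer, primary_key=True),
    Column('data', String(50)))

# null() is rendered inline as NULL, while the plain integer becomes
# a bound parameter
print(my_table.insert().values(id=1, data=null()))
# INSERT INTO my_table (id, data) VALUES (:id, NULL)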
If we'd like to be able to use the Python value None and have this also be persisted as NULL despite the presence of column defaults, we can configure this for the ORM using a Core-level modifier TypeEngine.evaluates_none(), which indicates a type where the ORM should treat the value None the same as any other value and pass it through, rather than omitting it as a "missing" value:
class MyObject(Base):
    __tablename__ = 'my_table'
    id = Column(Integer, primary_key=True)
    data = Column(
        String(50).evaluates_none(),  # indicate that None should always be passed
        nullable=True, server_default="default")

obj = MyObject(id=1, data=None)
session.add(obj)
session.commit()  # INSERT with the 'data' column explicitly set to None;
                  # the ORM uses this directly, bypassing all client-
                  # and server-side defaults, and the database will
                  # persist this as the NULL value
Evaluating None

The TypeEngine.evaluates_none() modifier is primarily intended to signal a type where the Python value "None" is significant, the primary example being a JSON type which may want to persist the JSON null value rather than SQL NULL. We are slightly repurposing it here in order to signal to the ORM that we'd like None to be passed into the type whenever present, even though no special type-level behaviors are assigned to it.
New in version 1.1: added the TypeEngine.evaluates_none() method in order to indicate that a "None" value should be treated as significant.
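For the JSON use case mentioned in the note above, a sketch of how this looks with the JSON type (added in SQLAlchemy 1.1), whose none_as_null flag governs the behavior directly; the mapped class here is illustrative:

from sqlalchemy.types import JSON

class MyJSONObject(Base):
    __tablename__ = 'my_json_table'
    id = Column(Integer, primary_key=True)
    # with none_as_null=False (the default), the Python value None is
    # persisted as the JSON 'null' value rather than being omitted;
    # none_as_null=True persists SQL NULL instead
    json_data = Column(JSON(none_as_null=False))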
As introduced in the sections Server-invoked DDL-Explicit Default Expressions and Marking Implicitly Generated Values, timestamps, and Triggered Columns, the Core supports the notion of database columns for which the database itself generates a value upon INSERT and, in less common cases, upon UPDATE statements. The ORM supports fetching these newly generated values upon flush. This behavior is required in the case of primary key columns that are generated by the server, since the ORM has to know the primary key of an object once it is persisted.
In the vast majority of cases, primary key columns that have their value generated automatically by the database are simple integer columns, which are implemented by the database either as a so-called "autoincrement" column or via a sequence associated with the column. Every database dialect within SQLAlchemy Core supports a method of retrieving these primary key values which is often native to the Python DBAPI, and in general this process is automatic, with the exception of a database like Oracle that requires us to specify a Sequence explicitly. There is more documentation regarding this at Column.autoincrement.
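A minimal sketch of what this looks like in practice, assuming a mapped class MyMappedClass with a simple autoincrementing integer primary key named id:

obj = MyMappedClass()
session.add(obj)
session.flush()  # emits the INSERT; the new primary key value is retrieved
                 # automatically, e.g. via cursor.lastrowid, RETURNING, or
                 # sequence execution, depending on the dialect
print(obj.id)    # now populated with the server-generated value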
For server-generating columns that are not primary key columns or that are not simple autoincrementing integer columns, the ORM requires that these columns are marked with an appropriate server_default directive that allows the ORM to retrieve this value. Not all methods are supported on all backends, however, so care must be taken to use the appropriate method. The two questions to be answered are: 1. is this column part of the primary key or not, and 2. does the database support RETURNING or an equivalent, such as "OUTPUT inserted"; these are SQL phrases which return a server-generated value at the same time as the INSERT or UPDATE statement is invoked. Databases that support RETURNING or an equivalent include PostgreSQL, Oracle, and SQL Server; databases that do not support it include SQLite and MySQL.
In this case, columns should be marked as FetchedValue or with an explicit Column.server_default. The orm.mapper.eager_defaults flag may be used to indicate that these columns should be fetched immediately upon INSERT and sometimes UPDATE:
class MyModel(Base):
    __tablename__ = 'my_table'

    id = Column(Integer, primary_key=True)
    timestamp = Column(DateTime(), server_default=func.now())

    # assume a database trigger populates a value into this column
    # during INSERT
    special_identifier = Column(String(50), server_default=FetchedValue())

    __mapper_args__ = {"eager_defaults": True}
Above, an INSERT statement that does not specify explicit values for “timestamp” or “special_identifier” from the client side will include the “timestamp” and “special_identifier” columns within the RETURNING clause so they are available immediately. On the PostgreSQL database, an INSERT for the above table will look like:
INSERT INTO my_table DEFAULT VALUES RETURNING my_table.id, my_table.timestamp, my_table.special_identifier
This case is the same as the previous one, except we don't specify orm.mapper.eager_defaults:
class MyModel(Base):
    __tablename__ = 'my_table'

    id = Column(Integer, primary_key=True)
    timestamp = Column(DateTime(), server_default=func.now())

    # assume a database trigger populates a value into this column
    # during INSERT
    special_identifier = Column(String(50), server_default=FetchedValue())
After a record with the above mapping is INSERTed, the "timestamp" and "special_identifier" columns will remain empty, and will be fetched via a second SELECT statement when they are first accessed after the flush, i.e. they are marked as "expired".
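A short sketch of that sequence, assuming the MyModel mapping above:

obj = MyModel()
session.add(obj)
session.commit()      # INSERT only; the server-generated values are not fetched
print(obj.timestamp)  # expired attribute; this access emits the SELECT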
If orm.mapper.eager_defaults is still used, and the backend database does not support RETURNING or an equivalent, the ORM will emit this SELECT statement immediately following the INSERT statement. This is often undesirable as it adds additional SELECT statements to the flush process that may not be needed. Using the above mapping with the orm.mapper.eager_defaults flag set to True against MySQL results in SQL like this upon flush (minus the comment, which is for clarification only):
INSERT INTO my_table () VALUES ()
-- when eager_defaults **is** used, but RETURNING is not supported
SELECT my_table.timestamp AS my_table_timestamp, my_table.special_identifier AS my_table_special_identifier
FROM my_table WHERE my_table.id = %s
A primary key column with a server-generated value must be fetched immediately upon INSERT; the ORM can only access rows for which it has a primary key value, so if the primary key is generated by the server, the ORM needs a way for the database to give us that new value immediately upon INSERT.
As mentioned above, for integer “autoincrement” columns as well as PostgreSQL SERIAL, these types are handled automatically by the Core; databases include functions for fetching the “last inserted id” where RETURNING is not supported, and where RETURNING is supported SQLAlchemy will use that.
However, for non-integer values, as well as for integer values that must be explicitly linked to a sequence or other triggered routine, the server default generation must be marked in the table metadata.
For an explicit sequence as we use with Oracle, this just means we are using the Sequence construct:
class MyOracleModel(Base):
    __tablename__ = 'my_table'

    id = Column(Integer, Sequence("my_sequence"), primary_key=True)
    data = Column(String(50))
The INSERT for a model as above on Oracle looks like:
INSERT INTO my_table (id, data) VALUES (my_sequence.nextval, :data) RETURNING my_table.id INTO :ret_0
Where above, SQLAlchemy renders my_sequence.nextval for the primary key column and also uses RETURNING to get the new value back immediately.
For datatypes that generate values automatically, or columns that are populated by a trigger, we use FetchedValue. Below is a model that uses a SQL Server TIMESTAMP column as the primary key, which generates values automatically:
class MyModel(Base):
    __tablename__ = 'my_table'

    timestamp = Column(TIMESTAMP(), server_default=FetchedValue(), primary_key=True)
An INSERT for the above table on SQL Server looks like:
INSERT INTO my_table OUTPUT inserted.timestamp DEFAULT VALUES
Here we are generating rows for a database such as SQLite, or more typically MySQL, where some means of generating a default occurs on the server but falls outside of the database's usual autoincrement routine. In this case, we have to make sure SQLAlchemy can "pre-execute" the default, which means it has to be an explicit SQL expression. Again using the example of TIMESTAMP for MySQL, we unfortunately need to use our own explicit default:
class MyModel(Base):
    __tablename__ = 'my_table'

    timestamp = Column(TIMESTAMP(), default=func.cast(func.now(), Binary), primary_key=True)
Where above, we select the "NOW()" function and also cast it to binary for use with MySQL's TIMESTAMP column, which is in fact a binary datatype. The SQL generated by the above is:
SELECT CAST(now() AS BINARY) AS anon_1
INSERT INTO my_table (timestamp) VALUES (%s)
(b'2018-08-09 13:08:46',)
The Core currently does not support a means of fetching the timestamp value after the fact without using RETURNING, so on MySQL a SELECT must be run ahead of time to pre-select the value.
Vertical partitioning places different kinds of objects, or different tables, across multiple databases:
engine1 = create_engine('postgresql://db1')
engine2 = create_engine('postgresql://db2')
Session = sessionmaker(twophase=True)
# bind User operations to engine 1, Account operations to engine 2
Session.configure(binds={User:engine1, Account:engine2})
session = Session()
Above, operations against either class will make use of the Engine linked to that class. Upon a flush operation, similar rules take place to ensure each class is written to the right database.
The transactions among the multiple databases can optionally be coordinated via two phase commit, if the underlying backend supports it. See Enabling Two-Phase Commit for an example.
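A brief usage sketch (the attribute names on User and Account are illustrative):

session.add_all([User(name="u1"), Account(number="a1")])
session.commit()  # User rows are flushed to engine1, Account rows to
                  # engine2, with the two transactions coordinated via
                  # two-phase commit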
More comprehensive rule-based class-level partitioning can be built by overriding the Session.get_bind() method. Below we illustrate a custom Session which delivers the following rules:

- Flush operations are delivered to the engine named master.
- Operations on objects of type MyOtherClass all occur on the other engine.
- Read operations for all other classes occur on a random choice of the slave1 or slave2 database.

engines = {
    'master': create_engine("sqlite:///master.db"),
    'other': create_engine("sqlite:///other.db"),
    'slave1': create_engine("sqlite:///slave1.db"),
    'slave2': create_engine("sqlite:///slave2.db"),
}
from sqlalchemy.orm import Session, sessionmaker
import random

class RoutingSession(Session):
    def get_bind(self, mapper=None, clause=None):
        if mapper and issubclass(mapper.class_, MyOtherClass):
            return engines['other']
        elif self._flushing:
            return engines['master']
        else:
            return engines[
                random.choice(['slave1', 'slave2'])
            ]
The above Session class is plugged in using the class_ argument to sessionmaker:

Session = sessionmaker(class_=RoutingSession)
This approach can be combined with multiple MetaData objects, using an approach such as the declarative __abstract__ keyword, described at __abstract__.
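A condensed sketch of that approach (the class and table names here are illustrative):

from sqlalchemy import MetaData, Column, Integer
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class OtherBase(Base):
    # an __abstract__ base is not mapped itself; giving it its own
    # MetaData keeps its tables separate from Base.metadata
    __abstract__ = True
    metadata = MetaData()

class MyOtherClass(OtherBase):
    __tablename__ = 'my_other_table'
    id = Column(Integer, primary_key=True)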
Horizontal partitioning partitions the rows of a single table (or a set of tables) across multiple databases.
See the “sharding” example: Horizontal Sharding.
Note

Bulk Operations mode is a new series of operations made available on the Session object for the purpose of invoking INSERT and UPDATE statements with greatly reduced Python overhead, at the expense of much less functionality, automation, and error checking. As of SQLAlchemy 1.0, these features should be considered as "beta", and additionally are intended for advanced users.

New in version 1.0.0.
Bulk operations on the Session include Session.bulk_save_objects(), Session.bulk_insert_mappings(), and Session.bulk_update_mappings().
The purpose of these methods is to directly expose internal elements of the unit of work system, such that facilities for emitting INSERT and UPDATE statements given dictionaries or object states can be utilized alone, bypassing the normal unit of work mechanics of state, relationship and attribute management. The advantage of this approach is strictly one of reduced Python overhead:

- The flush() process, including the survey of all objects, their state, the status of all objects associated with them via relationship(), and the topological sort of all operations to be performed, is completely bypassed. This reduces a great amount of Python overhead.
- The objects as given have no defined relationship to the target Session, even when the operation is complete, meaning there's no overhead in attaching them or managing their state in terms of the identity map or session.
- The Session.bulk_insert_mappings() and Session.bulk_update_mappings() methods accept lists of plain Python dictionaries, not objects; this further reduces a large amount of overhead associated with instantiating mapped objects and assigning state to them, which normally is also subject to expensive tracking of history on a per-attribute basis.
- In the case of Session.bulk_save_objects(), when objects of different types are passed, the INSERT and UPDATE statements are necessarily broken up into per-type groups. In order to reduce the number of batch INSERT or UPDATE statements passed to the DBAPI, ensure that the incoming list of objects is grouped by type.
- The process of fetching primary keys after an INSERT is disabled by default, which allows INSERT statements to be batched by the DBAPI into executemany() blocks, which perform vastly better than individual statement invocations.
- UPDATE statements can similarly be tailored such that all attributes are subject to the SET clause unconditionally, again making it much more likely that executemany() blocks can be used.

The performance behavior of the bulk routines should be studied using the Performance example suite. This is a series of example scripts which illustrate Python call-counts across a variety of scenarios, including bulk insert and update scenarios.
See also
Performance - includes detailed examples of bulk operations contrasted against traditional Core and ORM methods, including performance metrics.
The methods each work in the context of the Session object's transaction, like any other:
s = Session()
objects = [
    User(name="u1"),
    User(name="u2"),
    User(name="u3")
]
s.bulk_save_objects(objects)
For Session.bulk_insert_mappings() and Session.bulk_update_mappings(), dictionaries are passed:

s.bulk_insert_mappings(User,
    [dict(name="u1"), dict(name="u2"), dict(name="u3")]
)
The bulk methods offer performance that under particular circumstances can be close to that of using the core Insert and Update constructs in an "executemany" context (for a description of "executemany", see Executing Multiple Statements in the Core tutorial). In order to achieve this, the Session.bulk_insert_mappings.return_defaults flag should be disabled so that rows can be batched together. The example suite in Performance should be carefully studied in order to gain familiarity with how fast bulk performance can be achieved.
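For example, a minimal sketch (return_defaults already defaults to False; it is shown explicitly here for emphasis):

# leaving return_defaults disabled allows the rows to be batched into
# a single executemany() call by the DBAPI
s.bulk_insert_mappings(
    User,
    [dict(name="u%d" % i) for i in range(10000)],
    return_defaults=False,
)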
The bulk insert / update methods lose a significant amount of functionality versus traditional ORM use. The following is a listing of features that are not available when using these methods:

- persistence along relationship() linkages
- ORM events such as MapperEvents.before_insert(), etc. The bulk session methods have no event support.

Features that are available include:

- INSERTs and UPDATEs of mapped objects
- Version identifier support
- Multi-table mappings, such as joined-inheritance; however, an object to be inserted across multiple tables either needs to have primary key identifiers fully populated ahead of time, else the Session.bulk_save_objects.return_defaults flag must be used, which will greatly reduce the performance benefits (a sketch follows below)
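As a sketch of that last point, for joined-inheritance objects whose primary keys are not populated ahead of time:

# return_defaults=True fetches each newly generated primary key so that
# rows in the dependent tables can refer to it, at the cost of
# executemany() batching
s.bulk_save_objects(objects, return_defaults=True)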