Thursday, July 29, 2010

Database work within C#

An application will almost certainly hold data within a data store of some kind. This data store typically is a database of some sort.

The database is designed to efficiently store on disk, data that is stored within the data store.

The database will provide a driver that is responsible for communications between applications and the data store.

The database type defines how that data is segmented. In our example the data store will be a database with a type of Relational (ie. RDBMS).

The RDBMS defines what objects exist within the database. In most cases, the primary objects stored within a database are tables. Other objects are store procedures, user defined functions, user defined views, system objects, etc.

The database will contain settings at a global level which usually define methods of access to its data. Primarily a data store needs to authenticate users and as such, a connection string must be passed to a data store with valid credentials. The data store is then responsible for authenticating the credentials and validating whether the user defined in the credentials has access to the underlying catalogue.

In an RDBMS (such as SQL Server or MySQL), the database engine defines a catalogue as a storage mechanism that holds data and metadata.

The data typically is the table row data that we need to access. The metadata typically is the name of the table, max number of rows, column schema, etc.

Within each catalogue, an 'engine' can be defined. A table engine is a mechanism that controls 'how' the data is actually persisted, and subsequently, how the data is actually retrieved. The engine is only responsible for encoding/decoding data directly off of the filesystem.

When accessing the data, a method to efficiently query, structure and modify that data is needed. As such, multiple languages have been established to perform such functions on the data retrieved by the table engine. In our case, SQL (more accurately, ANSI-SQL) provides a mechanism for translating human written commands into memory operations that produce an output that the developer requires.

SQL contains three portions, the Data Definition Language (DDL), the Data Control Language (DCL) and the Query Language itself. The DDL is responsible for managing tables and index structures. The DCL is responsible for managing authentication. The Query Language is responsible for retrieval of data or its manipulation.

DDL commands typically range in the CREATE TABLE, DROP TABLE arena. DCL commands typically range in the CREATE USER, GRANT ALL ON range. Query Language typically range in the SELECT, INSERT range.

All three command types can be executed within C# through the use of a SqlCommand object.

Ok, let's look at the .Net class libraries responsible for data access.

System.Data
System.Data.Sql
System.Data.SqlClient

These three libraries are the base libraries for using a database within a .Net application. System.Data provides top level classes resposible for accessing data across all database manufacturers (ie. Microsoft, MySQL, Oracle, etc.). System.Data.Sql provides Microsoft SQL Server specific objects that tie in to MSSQL (this is a very small class). System.Data.SqlClient have the Microsoft specific versions of the System.Data classes.

Since each manufacturer must implement most of the System.Data interface, the lessons learned here SHOULD be transferable to ALL manufacturers (ie. you use a SqlConnection object to make a connection to a MSSQL database, you would then use a MySqlConnection object to connect to a MySQL database).

The typical process for a database application is as follows:
  1. Load configuraion from somewhere (typically app.config/web.config/xml config file)
  2. Create a new SqlConnection object
  3. Set the connectionString property (or pass it in with the constructor on step 2) of the SqlConnection object
  4. Call the Open() method on the SqlConnection object. This will set the SqlConnection objects State property to a ConnectionState object representing whatever state the database reports based on the connectionString provided
  5. Test the State for ConnectionState.Open. If the State is Open then proceed otherwise handle the error
  6. Create a SqlCommand object
  7. Set the CommandText of the SqlCommand object to the SQL you wish to execute (or you could pass it in to the constructor in step 6)
  8. If you are running a SELECT (or anything that returns a result) then you will typically call the ExecuteReader() method. This will execute the command and return whatever results you are expecting into a SqlDataReader object ready for examination, otherwise if you are running a command that does not return a result (like INSERT, UPDATE or DELETE whereby the only results returned are the amount of rows affected), then you will typically call the ExecuteNonQuery() method
  9. The ExecuteReader() method will return a SqlDataReader which typically is iterated via the Read() method. The Read() method will return true whilst there is data to read and false when you have read all the data. A SqlDataReader is typically a 'forward only' object in that, you can only read the data once then it is removed from memory. These are very fast and efficient objects that are commonly used to read data rather sequentially rather than randomly access the data. A DataSet object is appropriate if you wish to retain the data after its has been read
  10. The SqlDataReader has an index property called this[string] (or this[int]). This indexer can be used to pull data from the reader if you know the column name otherwise you will retrieve the data via the GetString(int), GetInt32(int), etc. methods
  11. You will then be finished with the reader and will then call Close() on the reader object (otherwise the resource will be empty but still remain in memory)
  12. When you are finished executing your commands you will then need to close the Database object by calling the Close() method on the SqlConnection object. This will release the resource and allow another user to connect to the database (remember, the database server only has a limited number of connections it can have opened at any one time, be respectful and close your connection when you don't need them. You can alway re-open it if you must)
Wow...what a lot of crap going on and this was just the 'brief' version! Let's examine what's going on in more detail.
 
Load configuraion from somewhere (typically app.config/web.config/xml config file)

Depending on your application, it usually is smarter to allow your configuration settings to be accessed, outside of your application, otherwise you would have to recompile your application every time a database server is renamed, moved, credentials are changed, and so on.

Typically an app.config or web.config file is used to store the connectionString details. Whilst this is an awesome thing to do, you serverly compromise your database server as you are storing credentials IN PLAIN TEXT!

For a greater detailed review of encrypting your connection strings, see http://ondotnet.com/pub/a/dotnet/2005/02/15/encryptingconnstring.html (yes yes I totally get that this is FAR from a perfect solution, but obfuscating a password is almost always better than a clear text version, even if it's piss easy to crack!)

Create a new SqlConnection object

What is a SqlConnection object? Well it's a container for all the underlying methods that the database driver exposes. In fact, it wraps up all the crap that each database driver requires from a communications point of view that you don't need/have to worry about. So thankfully, the SqlConnection object is the primary method of communicating to and from a database server.

For every database server you need to connect to (and trust me, sometimes you need to connect to more than one to get all the data necessary to perform your action), you will need a seperate SqlConnection object.

This object holds the connectionString details required to authenticate you against that server. By calling the Open() method you are in fact calling a whole bunch of DCL commands necessary to authenticate you against Microsoft SQL Server. Sweet!

This object then needs to have its State property checked. This holds the state the database returns after the credentials have been checked. Typically the state will be Open or Closed. There are a few others like Broken, Connecting, Fetching, etc. but you typically will never use them.

If the State is Open then you have a green light to proceed. This simply means though, you have permission to connect to the server. This DOES NOT necessarily mean that you have permission to CRUD on the tables you need. This is a seperate level of permissions, one that needs to be granted on your users behalf for each bloody table.

Typically the DCL GRANT ALL is executed for a user which means that for every table, the user will have XYZ access and you don't need to do it for every single table, it's done in one hit, though after it's been done you can selectively remove access to certain tables or remove INSERT/UPDATE/DELETE access to tables leaving SELECT access only (effectively giving READ-ONLY access) but that is a DCL discussion for another time :)
 
Set the connectionString property (or pass it in with the constructor on step 2) of the SqlConnection object

Typically the connection string is passed in with the constructor of the SqlConnection object but you can choose to create the new object and then set the ConnectionString property. This is personal preference. :)

If you forget what the layout of a connection string should be, visit http://www.connectionstrings.com/
 
Call the Open() method on the SqlConnection object
I spoke about this just a few paragraphs ago. By calling the Open() method you are asking the SqlConnection object to issue the required Microsoft SQL Server Authentication DCL commands on your behalf. The server will return a response and from that response, the .Net framework will set the SqlConnection State property accordingly.

By checking the State property, you can know for certain whether you have a successful connection to the database or not.

Please note however, each connection has an overhead of time, memory and network resources associated to it. If the database is on another machine then you have network latency as well. Opening the connection means that the resources are locked to you until you call the Close() method on the SqlConnection object. Until then, no other person is able to use those resources on the server.

Since server resources are usually limited (approx. 150 simultaneous connections), keeping connections opened longer than necessary is considered bad practice and will typically get DBA's pissed off with you :)

Test the State for ConnectionState.Open. If the State is Open then proceed otherwise handle the error
Again, already covered this previously. The primary reason for checking the State property is to know before executing SQL commands, whether you have a server listening to you or not.

You COULD try to open the database connection and execute the SQL within a TRY/CATCH statement and listen for a whole bunch of Exceptions but this is poor practice and defeats the purpose of the State property altogether.

Debugging the reasons why the connection Open() call fails is a WHOLE other can of worms and requires a bit more knowledge on the SqlException Error Codes that Microsoft SQL Server return and therefore is beyond the scope of this tutorial.

Create a SqlCommand object
Ok, we've got our credentials, made our connection, connected successfully and now what?

Typically all DDL commands are handled through the SqlCommand object. The purpose of this object is to translate your SQL commands into the appropriate DDL commands necessary to execute your query (and handle the method of preparing that data for use later on).

The first thing you'll notice is that a SqlCommand object ALWAYS requires a SqlConnection object. Without it, the command has no where to execute against and therefore is a dud object. So typically a SqlCommand object constructor will take a SqlConnection object, though can can set a single command object to handle ALL your SQL command requests and set the Connection property.

'What do you mean by set a single command object?'

Well there is no real need to create multiple SqlCommand objects in order to execute multiple SQL commands. Each SqlCommand object uses resources and memory so reducing the number of SqlCommand objects will optimise your application. Though it is not necessary to use just one, it's typically a good idea. I'll leave that up to your personal judgement.

'How would you use just one command object to execute multiple commands tho?'

Good question. Let's take a look at some code (FINALLY!).
// Create SqlConnection object and pass connection string into constructor
SqlConnection con = new SqlConnection("Data Source=myServerAddress;Initial Catalog=myDataBase;User Id=myUsername;Password=myPassword");

// Attempt a connection
con.Open();

// Test whether the connection was successful or not
if (con.State == ConnectionState.Open) {
    // Create the single SqlCommand object that we will re-use
    SqlCommand com = new SqlCommand();
   
    // Set the connection property to the newly opened SqlConnection object
    com.Connection = con;
   
    // Set the command text of the SqlCommand object to the first SQL command to execute
    com.CommandText = "SELECT id, user, password FROM users;";
   
    // Execute the command and return the data into a SqlDataReader object
    SqlDataReader reader = com.ExecuteReader();
   
    // Do something more useful with the data than I have done in this example
    while (reader.Read()) {
        int id;
        string username, password;
       
        if (!reader.IsDBNull(0)) {
            id = reader.GetInt32(0);
        } else {
            id = -1;
        }
       
        if (!reader.IsDBNull(1)) {
            username = reader.GetString(1);
        } else {
            username = "Unknown";
        }
       
        if (!reader.IsDBNull(2)) {
            password = reader.GetString(2);
        } else {
            password = "";
        }
    }
   
    // Release the reader object but don't destroy it. This frees the object for future re-use
    reader.Close();
   
    // Set the command text of the SqlCommand object to the next SQL command to execute
    com.CommandText = "SELECT id, user, password FROM users WHERE id > 4;";
   
    // Populate the existing reader with the data from the next SQL command
    reader = com.ExecuteReader();

    // Again, do something more useful with the data than I have done in this example
    while (reader.Read()) {
        int id;
        string username, password;

        if (!reader.IsDBNull(0)) {
            id = reader.GetInt32(0);
        } else {
            id = -1;
        }

        if (!reader.IsDBNull(1)) {
            username = reader.GetString(1);
        } else {
            username = "Unknown";
        }

        if (!reader.IsDBNull(2)) {
            password = reader.GetString(2);
        } else {
            password = "";
        }
    }
   
    // Release the reader resources
    reader.Close();
   
    // Release the resources (being respectful :P)
    con.Close();
} else {
    // Handle when connection to database fails for whatever reason
}

All the other items have now been addressed within the example code so I don't need to explain them further.

Let's look at the code in a bit more detail and explain what's going on.

// Create SqlConnection object and pass connection string into constructor
SqlConnection con = new SqlConnection("Data Source=myServerAddress;Initial Catalog=myDataBase;User Id=myUsername;Password=myPassword");

// Attempt a connection
con.Open();

// Test whether the connection was successful or not
if (con.State == ConnectionState.Open) {

You should recognise what's going on here. Creation of a new SqlConnection object, population of the connectionString, attempting to connect and verifying whether that connection was successful.

Assume it is we then...

    // Create the single SqlCommand object that we will re-use
    SqlCommand com = new SqlCommand();
   
    // Set the connection property to the newly opened SqlConnection object
    com.Connection = con;

Here we are creating the SqlCommand object that we are going to re-use and associating the SqlConnection object with it. This is required because the SqlCommand object creates the DDL commands directly and does not know what connection to send them down without being told specifically which SqlConnection object it must use.

The conncetion object remains the same unless you specify a different SqlConnection object.

    // Set the command text of the SqlCommand object to the first SQL command to execute
    com.CommandText = "SELECT id, user, password FROM users;";

Here we are specifying the first command we wish to execute by setting the SQL to the CommandText property. When calling Stored Procedures, we have to adjust our approach slightly but we'll go into that later in the tutorial (sorry guys, didn't get around to it this time!)

    // Execute the command and return the data into a SqlDataReader object
    SqlDataReader reader = com.ExecuteReader();

At this point we still haven't done anything other than set up the SqlCommand object. This line actually forms the DDL's, and sends them through the associated SqlConnection object. The SqlConnection object then prepares the commands by breaking them down into network packets and physically sends the data across the network.

When received the server will execute the commands and based on whether you screwed the SQL up or not, will return an answer (whether it's 'you're shit out of luck' or, 'here's the data you requested' is another story).

You will be given a SqlDataReader object. Whether it's filled with any data or not is beside the point, you will be given the SqlDataReader object period.

    // Do something more useful with the data than I have done in this example
    while (reader.Read()) {

We then call the Read() method on the reader. This will tell us whether there are any more 'unread' rows of data queued in the reader for us to have or not. So we use a while() statement to handle the looping as we get the free bail out when there's nothing more to read and the continuance if there is stuff to read all in one command.

        int id;
        string username, password;
       
        if (!reader.IsDBNull(0)) {
            id = reader.GetInt32(0);
        } else {
            id = -1;
        }
       
        if (!reader.IsDBNull(1)) {
            username = reader.GetString(1);
        } else {
            username = "Unknown";
        }
       
        if (!reader.IsDBNull(2)) {
            password = reader.GetString(2);
        } else {
            password = "";
        }

Ok, pretty useless stuff here but it does explain how to get access to the data within the reader on a 'row-by-row' basis. Because we are accessing the data directly, an intimate knowledge of the data schema is required. Otherwise we would have to examine the data type and row count of each item and decide how to handle the data which requires a lot more code. So for this example, let's assume that we know we are operating on a database table called users and the schema looks like this:

Users
-----
id int not null auto_increment primary key,
user varchar(100) not null,
password varchar(100) not null

We are also going to assume that there are 10 user records of various data in the table.

Since C# is a strongly typed language we must be specific with our data mappings. We can use the Sql Types themselves instead of the int and string types but in this example we're going to stick with C# data types.

So we define our expected data types:

        int id;
        string username, password;

In this example, these are just placeholders for values that are returned from the database whereas IRL (in real life) you would typically create an object of type User and store the values in the properties of the object, but that's another tutorial.

Since some fields can be null, it's typically safest to verify that the column you are going to retrieve is not null before trying to access it because the database is happy to give you a null value, but C# is NOT happy to accept a null value when it's expected a string, int or other data type. So for type safety we need to check if the column in the current reader row is null or not.

        if (!reader.IsDBNull(0)) {
            id = reader.GetInt32(0);
        } else {
            id = -1;
        }

Here we ask the reader to tell us whether column zero IsDBNull or not. This returns a true or false result. If it's true, you know you have to handle the field when you are presented with a null value. Since the code is written, 'if reader.IsDBNull(column zero) IS NOT true then id = reader.GetInt32(column zero) otherwise id = -1'.

My logic simply expects to handle the TRUE state first (hence the ! at the beginning of the reader.IsDBNull(0)) then the FALSE state second.

This is repeated for each of the columns in the row except the data type that is being accessed is different. This allows us to handle the data type mapping from the database to C#. This is the trade off between having strongly typed objects accessing data that can have a varied number of types. From an optimisation POV, this process CAN be improved but for the sake of brevity, we'll demand that the dev know the structure of the table intimately (in all honesty, a dev that doesn't know the table structure is going to have more problems that this!).

    // Release the reader object but don't destroy it. This frees the object for future re-use
    reader.Close();

Ok so we've looped through the reader and we've processed every row that was received by the reader, now what? Well in order to re-use the reader we must Close() it first. This releases the resource and allows new data to be added to its internal data structures. If we don't close the reader first we'll receive an Exception which will kill your app :)

    // Set the command text of the SqlCommand object to the next SQL command to execute
    com.CommandText = "SELECT id, user, password FROM users WHERE id > 4;";
   
    // Populate the existing reader with the data from the next SQL command
    reader = com.ExecuteReader();

    // Again, do something more useful with the data than I have done in this example
    while (reader.Read()) {
        int id;
        string username, password;

        if (!reader.IsDBNull(0)) {
            id = reader.GetInt32(0);
        } else {
            id = -1;
        }

        if (!reader.IsDBNull(1)) {
            username = reader.GetString(1);
        } else {
            username = "Unknown";
        }

        if (!reader.IsDBNull(2)) {
            password = reader.GetString(2);
        } else {
            password = "";
        }
    }
   
    // Release the reader resources
    reader.Close();

This code basically does the same thing as the first SQL request except it issues a different command. By re-using the SQLCommand and SQLDataReader objects we can reduce the overhead consumed by the application whilst improving performance by not needing to do all the memory allocation and object initialisation that is required just to get these objects ready to use.

    // Release the resources (being respectful :P)
    con.Close();

Once all work has been done on the SqlConnection object we need to Close() it. This will release the network connection to the server and subsequently put the SqlConnection object (and all child objects such as the SqlCommand) in a state ready for the Garbage Collector to clean up after you.

Conclusion

At this point we've gone through the entire process of connecting, quering and doing something with the data from a data connection.

Examine the source code associated with this tutorial and step through the TutorialCode entry point. When you have finished, comment the TutorialCode entry point and uncomment the BetterCode entry point and step through this one. From a logical, pragmatic and maintainable POV the BetterCode entry point represents a better way of handling the repetitivness of code, and allows to easier maintainability in the future. Something you will appreciate when you go back to your own source code a week from now and think, WTF was I thinking when I wrote this?!?!

No comments:

Post a Comment