11. Persistence
The use of files, streams and serialization for storing object model data
Storing Application Data
• Without some way of storing data off-line, computers would be virtually unusable. Imagine a word processor that forced you to complete a document and print it in a single session – who’d use it?
• Most programs involve several types of data
  • Status information – e.g. index of current item in a list
  • Convenience information – e.g. location and size of main window (just another type of status information)
  • Object Model data – the state of every live object in a running program
• Some or all of this can be saved, either in a single place or spread among any of the available storage mechanisms
Storage Mechanisms
• Windows Registry
  • Part of the operating system (files owned by the O/S)
  • Good for storing small amounts of data
• Files
  • Standard way of persisting information
  • Can be highly structured or very simple, depending on data being stored
• XML
  • Files again, but based on standards that make it possible for different systems to share the data
• Databases
  • Very structured and with a lot of program overhead, but very efficient for saving large amounts of data (we will cover this in detail next chapter)
The Registry
• The Windows Registry is a large database, maintained by the operating system, that stores data in a hierarchical structure
• Application data is stored in a tree structure, typically:
  • The application name is the top level – e.g. MyProgram
  • A section name groups related items of data together – e.g. RecentFiles
  • A key name specifies a name and a single item of data – e.g. File1="C:\MyData\Datafile1.dat"
• The registry is an operating-system-wide resource, and so must be treated with care
  • Do NOT store large amounts of data in it, because potentially every program in Windows will use the registry
  • Use only the standard Visual Basic functions, GetSetting() for reading and SaveSetting() for writing
e.g. Saving a form’s size and position in the registry

    Private Sub frmRegistry_Load(ByVal sender As System.Object, _
                                 ByVal e As System.EventArgs) _
                                 Handles MyBase.Load
        ' Restore the saved position and size; if nothing has been
        ' saved yet, the form's current value is used as the default.
        Me.Left = CInt(GetSetting("RegDemo", "Position", "Left", _
                                  CStr(Me.Left)))
        Me.Top = CInt(GetSetting("RegDemo", "Position", "Top", _
                                 CStr(Me.Top)))
        Me.Width = CInt(GetSetting("RegDemo", "Size", "Width", _
                                   CStr(Me.Width)))
        Me.Height = CInt(GetSetting("RegDemo", "Size", "Height", _
                                    CStr(Me.Height)))
    End Sub

    Private Sub frmRegistry_Closing(ByVal sender As Object, _
                ByVal e As System.ComponentModel.CancelEventArgs) _
                Handles MyBase.Closing
        ' Save the position and size as the form closes.
        SaveSetting("RegDemo", "Position", "Left", CStr(Me.Left))
        SaveSetting("RegDemo", "Position", "Top", CStr(Me.Top))
        SaveSetting("RegDemo", "Size", "Width", CStr(Me.Width))
        SaveSetting("RegDemo", "Size", "Height", CStr(Me.Height))
    End Sub
File Storage
• All computer data (including registry data, database data) is stored in files if it needs to be persisted
• Various device types (floppy disks, hard disks, CD-R/Ws, mag-tape, flash cards etc.) have data stored in them by the OS / file system, so that all appear the same to a program – simple file devices
• There are only 4 basic operations to worry about when using files
  • Opening a file – prepares it for Read and Write operations
  • Reading from a file – extracts an item of data and moves on to prepare to read the next item
  • Writing to a file – inserts new data at the end of the file
  • Closing a file – files that are open are vulnerable to corruption. Closing a file puts it into a safe state
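The four operations can be sketched with the .NET stream classes as follows. This is a minimal illustration, not code from the course: the file name notes.txt and the module name FileBasics are made up for the example.

```vb
Imports System.IO

Module FileBasics
    Sub Main()
        ' Open for writing. The second argument (True) means "append",
        ' so new data goes at the end of any existing file.
        ' notes.txt is a hypothetical file name.
        Dim writer As New StreamWriter("notes.txt", True)

        ' Write two items, one per line.
        writer.WriteLine("First item")
        writer.WriteLine("Second item")

        ' Close, so buffered data is flushed and the file is left in a safe state.
        writer.Close()

        ' Open for reading, then read one item at a time until the end.
        Dim reader As New StreamReader("notes.txt")
        Dim line As String = reader.ReadLine()
        While line IsNot Nothing
            Console.WriteLine(line)
            line = reader.ReadLine()
        End While
        reader.Close()
    End Sub
End Module
```

Because the file is opened in append mode, running the program twice leaves four lines in the file, and they are read back in the order they were written.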
Files and Streams
• Because of the way a file works, we can think of it as having a flow of data
  • Data is read from a file in exactly the same order it was written to it
  • The name used to indicate this is a stream (although streams can also be to a network, memory, a modem or other devices)
• In .NET, most files are treated as streams
  • StreamReader class defines objects that know how to read from a stream
  • StreamWriter class defines objects that know how to write to a stream
• Data sent to a stream can be ambiguous, because there is no automatic way to separate one item from the next
  • e.g. save 10, 20, 30 and 40, and it will be written as 10203040 – all crunched together
• To deal with this, we use delimiters to mark the end of each item of data
  • CSV – Comma-Separated Values, so the 4 numbers are saved as "10,20,30,40"
  • Other delimiters (e.g. Tab, Space) can be used instead, but comma is normal
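As a rough sketch of the delimiter idea, the following writes the four numbers as one comma-separated line and splits them apart again on reading. The file name numbers.csv and the module name are assumptions made for the example.

```vb
Imports System.IO

Module CsvDemo
    Sub Main()
        Dim values() As Integer = {10, 20, 30, 40}

        ' Convert each number to text so the items can be joined with commas.
        Dim parts(values.Length - 1) As String
        For i As Integer = 0 To values.Length - 1
            parts(i) = values(i).ToString()
        Next

        ' Write a single delimited line: 10,20,30,40
        Dim writer As New StreamWriter("numbers.csv")   ' hypothetical file name
        writer.WriteLine(String.Join(",", parts))
        writer.Close()

        ' Read the line back and split it on the delimiter.
        Dim reader As New StreamReader("numbers.csv")
        Dim line As String = reader.ReadLine()
        reader.Close()

        Dim total As Integer = 0
        For Each item As String In line.Split(","c)
            total += Integer.Parse(item)
        Next
        Console.WriteLine(total)   ' prints 100
    End Sub
End Module
```

Without the commas the reader would see the single number 10203040 and could not tell where one item ends and the next begins.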
Structured Data and Streams
• Saving objects to a stream brings new problems
  • How to separate the individual object member fields
  • How to separate the different objects
• Best approach is to precede each object with a header, indicating its class – do this for EVERY class, including individual member variables
• When reading objects from a stream, start by reading the header, then create the object and read the data into it (up to the next header)
• This process is called Serialization

Class BankAccount
    Name : String
    Address : String
    AccountNo : Long
    Balance : Decimal

:BankAccount Object
    Joe Bloggs
    1 High St., Sometown
    12345678
    £550.00

Stream
    BANKACCOUNT~STRING Joe Bloggs~STRING 1 High St., Sometown~LONGINT12345678~DECIMAL550.00

Note ~ is used as a header prefix in this example
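One way to sketch the header idea in code is shown below. It is a simplified variant of the scheme on the slide, not the same format: it writes a single ~BANKACCOUNT header per object, puts each field on its own line, and leaves out the per-field type headers. The method names Save and Load and the file name accounts.txt are assumptions.

```vb
Imports System.Globalization
Imports System.IO

Public Class BankAccount
    Public Name As String
    Public Address As String
    Public AccountNo As Long
    Public Balance As Decimal

    ' Write a class header, then each member field on its own line.
    Public Sub Save(ByVal writer As TextWriter)
        writer.WriteLine("~BANKACCOUNT")
        writer.WriteLine(Name)
        writer.WriteLine(Address)
        writer.WriteLine(AccountNo.ToString(CultureInfo.InvariantCulture))
        writer.WriteLine(Balance.ToString(CultureInfo.InvariantCulture))
    End Sub

    ' Read the header first, then create the object and fill it in.
    Public Shared Function Load(ByVal reader As TextReader) As BankAccount
        If reader.ReadLine() <> "~BANKACCOUNT" Then
            Throw New InvalidDataException("Expected a ~BANKACCOUNT header")
        End If
        Dim account As New BankAccount()
        account.Name = reader.ReadLine()
        account.Address = reader.ReadLine()
        account.AccountNo = Long.Parse(reader.ReadLine(), CultureInfo.InvariantCulture)
        account.Balance = Decimal.Parse(reader.ReadLine(), CultureInfo.InvariantCulture)
        Return account
    End Function
End Class

Module HeaderDemo
    Sub Main()
        Dim account As New BankAccount()
        account.Name = "Joe Bloggs"
        account.Address = "1 High St., Sometown"
        account.AccountNo = 12345678
        account.Balance = 550D

        Dim writer As New StreamWriter("accounts.txt")   ' hypothetical file name
        account.Save(writer)
        writer.Close()

        Dim reader As New StreamReader("accounts.txt")
        Dim restored As BankAccount = BankAccount.Load(reader)
        reader.Close()

        Console.WriteLine(restored.Name & ": " & restored.AccountNo.ToString())
    End Sub
End Module
```

Putting each field on its own line avoids a clash with the comma inside the address, which a comma delimiter would split in two. The sketch would still break if a field contained a line break.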
Serialization
• There are two ways to do serialization in .NET
  • Write Load() and Save() methods for each class, including code to handle structure (collections etc.)
  • Use the .NET <Serializable()> attribute and the BinaryFormatter or SoapFormatter class to store the data (the XmlSerializer class is a further option for XML output and does not need the attribute)
• The first of these is likely to produce output that is easier for a human to read, but involves a lot of work
• The second requires less work, but produces binary or XML output. Binary can be difficult to fix if the data gets corrupted; XML contains a lot of redundant information.
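The built-in route can be sketched as below. Note that this example uses XmlSerializer rather than a formatter class: BinaryFormatter is deprecated in current .NET releases, and XmlSerializer shows the same "let the framework do the work" idea with XML output. The class layout follows the BankAccount example, and the file name account.xml is an assumption.

```vb
Imports System.IO
Imports System.Xml.Serialization

' XmlSerializer needs a public class with a parameterless constructor;
' it stores the public fields and properties.
Public Class BankAccount
    Public Name As String
    Public Address As String
    Public AccountNo As Long
    Public Balance As Decimal
End Class

Module SerializerDemo
    Sub Main()
        Dim account As New BankAccount()
        account.Name = "Joe Bloggs"
        account.Address = "1 High St., Sometown"
        account.AccountNo = 12345678
        account.Balance = 550D

        Dim serializer As New XmlSerializer(GetType(BankAccount))

        ' Save: the serializer walks the object and writes every public member.
        Using writer As New StreamWriter("account.xml")   ' hypothetical file name
            serializer.Serialize(writer, account)
        End Using

        ' Load: the serializer creates the object and fills it in.
        Dim restored As BankAccount
        Using reader As New StreamReader("account.xml")
            restored = CType(serializer.Deserialize(reader), BankAccount)
        End Using

        Console.WriteLine(restored.Name & ": " & restored.Balance.ToString())
    End Sub
End Module
```

Compared with hand-written Load() and Save() methods, the class itself contains no persistence code at all. The trade-off is that the file format is decided by the serializer and not by the programmer.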
XML
• Serialization in general is not based on any specific standards
  • All programs/programmers/environments have their own variations, based on ease of programming, efficiency (in storage) and other preferences
• This makes it difficult to exchange data between programs
  • Two programs written by the same programmers can share data without too much difficulty, but…
  • What about programs written by different programmers, in different languages, or for different environments (e.g. .NET and Linux)?
• XML was created as a standard way of serializing data into files
  • XML uses plain text, so there are no problems with binary compatibility
  • XML documents are ‘self-describing’, so the content of a document is easy to interpret
  • XML is not a rigid language, but a format that allows new types of document to be designed easily so that their content is described adequately for any given domain (e.g. finance, CAD) within the rules of XML
XML Format and Rules
• An XML document has a tree structure with a single root node (e.g. customer)
• Each element of data is enclosed in an opening and a closing tag: <tag>Data</tag>
• Null data can be represented by an empty pair of tags <tag></tag> or an empty tag <tag/>
• Elements can be nested, but this must be done correctly, e.g. <x><y>data</y></x>, not <x><y>data</x></y>
• Tag names are case-sensitive, e.g. <Tag> is not the same as <TAG> or <tag>
• Elements can have attributes, which appear within the opening tag as a name and value – the value must be in quotes

    <customer ID="12345">
      <name>Fred Bloggs</name>
      <address>
        <street>25 Glen Road</street>
        <town>Ayr</town>
        <postcode>KA11 1BG</postcode>
      </address>
      <lastorderdate>17/12/2002</lastorderdate>
      <email/>
    </customer>

Note empty email tag
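These rules can be checked by handing fragments to an XML parser, which rejects anything that is not well-formed. The sketch below uses XmlDocument.LoadXml for this; the helper name IsWellFormed is made up for the example.

```vb
Imports System.Xml

Module WellFormedDemo
    ' Hypothetical helper: returns True if the parser accepts the text.
    Function IsWellFormed(ByVal xmlText As String) As Boolean
        Try
            Dim doc As New XmlDocument()
            doc.LoadXml(xmlText)
            Return True
        Catch ex As XmlException
            ' LoadXml throws XmlException when a rule is broken.
            Return False
        End Try
    End Function

    Sub Main()
        Console.WriteLine(IsWellFormed("<x><y>data</y></x>"))   ' True: correctly nested
        Console.WriteLine(IsWellFormed("<x><y>data</x></y>"))   ' False: tags overlap
        Console.WriteLine(IsWellFormed("<Tag>data</tag>"))      ' False: case mismatch
        Console.WriteLine(IsWellFormed("<email/>"))             ' True: empty tag
    End Sub
End Module
```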
System.Xml
• The System.Xml namespace in .NET provides a number of classes for reading, writing and formatting XML
• Use the XmlTextWriter class to create an XML document
• The XmlDocument class is used to read data from an XML file, and provides methods for extracting elements and attributes
• The XmlNode class is used to accept single nodes extracted from an XmlDocument or to create new nodes
• Since an XML element can be a complex item containing collections and hierarchy, an XmlNode can house anything from an entire XML document to a single element containing one item of data
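A short sketch of these classes working together is given below: XmlTextWriter creates a cut-down version of the customer document, then XmlDocument and XmlNode read it back. The file name customer.xml is an assumption, and only some of the customer's elements are written to keep the example short.

```vb
Imports System.Text
Imports System.Xml

Module XmlDemo
    Sub Main()
        ' Create the document with XmlTextWriter.
        Dim writer As New XmlTextWriter("customer.xml", Encoding.UTF8)   ' hypothetical file name
        writer.Formatting = Formatting.Indented
        writer.WriteStartDocument()
        writer.WriteStartElement("customer")          ' root element
        writer.WriteAttributeString("ID", "12345")    ' attribute on the root
        writer.WriteElementString("name", "Fred Bloggs")
        writer.WriteStartElement("address")           ' nested element
        writer.WriteElementString("town", "Ayr")
        writer.WriteEndElement()                      ' closes address
        writer.WriteEndElement()                      ' closes customer
        writer.WriteEndDocument()
        writer.Close()

        ' Read it back with XmlDocument and pull out individual nodes.
        Dim doc As New XmlDocument()
        doc.Load("customer.xml")
        Dim root As XmlNode = doc.DocumentElement
        Console.WriteLine(root.Attributes("ID").Value)
        Console.WriteLine(root.SelectSingleNode("name").InnerText)
        Console.WriteLine(root.SelectSingleNode("address/town").InnerText)
    End Sub
End Module
```

Here root holds the whole customer element, while each SelectSingleNode call returns an XmlNode for a single item of data.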
XML and Object Models
• Best approach is to provide each class in an application with methods for dealing with XML data
  • A WriteXML() method can be used to pack the class member data into an XML element and return it as a string
  • An overloaded New() method can be created to accept an XmlNode as a parameter, and construct an object from it
• Using this approach, even complex hierarchies can be dealt with easily in an application, since each class that needs to be persisted to and retrieved from XML can fend for itself
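A minimal sketch of this pattern is shown below, assuming a small Customer class with just two fields. The class, its field names and the element names are illustrative assumptions, not part of the course code.

```vb
Imports System.IO
Imports System.Xml

Public Class Customer
    Public Name As String
    Public Town As String

    ' Ordinary constructor.
    Public Sub New(ByVal customerName As String, ByVal customerTown As String)
        Name = customerName
        Town = customerTown
    End Sub

    ' Overloaded constructor: build the object from an XML node.
    Public Sub New(ByVal node As XmlNode)
        Name = node.SelectSingleNode("name").InnerText
        Town = node.SelectSingleNode("town").InnerText
    End Sub

    ' Pack the member data into an XML element and return it as a string.
    Public Function WriteXML() As String
        Dim buffer As New StringWriter()
        Dim writer As New XmlTextWriter(buffer)
        writer.WriteStartElement("customer")
        writer.WriteElementString("name", Name)
        writer.WriteElementString("town", Town)
        writer.WriteEndElement()
        writer.Flush()
        Return buffer.ToString()
    End Function
End Class

Module ObjectModelDemo
    Sub Main()
        Dim original As New Customer("Fred Bloggs", "Ayr")

        ' Round trip: object to XML text, XML text to node, node to object.
        Dim doc As New XmlDocument()
        doc.LoadXml(original.WriteXML())
        Dim restored As New Customer(doc.DocumentElement)

        Console.WriteLine(restored.Name & ", " & restored.Town)
    End Sub
End Module
```

A class that owns other objects could call their WriteXML() methods from its own and pass child nodes to their constructors, which is how the approach extends to a hierarchy.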