Wednesday, October 24, 2007

Embedded Databases

An embedded database is a software component that generally runs as part of the application itself. The ones that I have looked at:

  • Have a small footprint, about 2 megabytes for the base engine and embedded JDBC driver.
  • Are based on the Java, JDBC, and SQL standards.
  • Require zero or limited human administration.



HSQLDB

Support for HSQLDB is available from HyperXtremeSQL, and HSQLDB itself is available under the BSD license. It is a pure Java database with a small footprint, and it comes bundled with third-party products such as OpenOffice, JBoss, and RSA crypto libraries.

Observations

  • It has an 8 GB database limit, it does not support BLOBs larger than 4 KB, and it does not support server-side cursors.
  • Database Directory
    • Each HSQLDB database consists of between two and five files, plus a lock file while the database is open. All of them are named the same but with different extensions, and they are located in the same directory. For example, the database named "test" can consist of test.properties, test.script, test.log, test.data, test.backup, and test.lck.
      • The properties file contains general settings about the database.
      • The script file contains the definition of tables and other database objects, plus the data for non-cached tables.
      • The log file contains recent changes to the database.
      • The data file contains the data for cached tables.
      • The backup file is a zipped backup of the last known consistent state of the data file.
      • The lck file records the fact that the database is open.
      • The log and lck files are deleted at normal SHUTDOWN.
  • Deployment Options
    • HSQLDB can be run in a number of different ways. In general these are divided into Server Modes and In-Process Mode (also called Standalone Mode).
    • In-Process Mode
      • Faster, because the data is not converted and sent over the network.
      • More secure, because by default it is not possible to connect to the database from outside your application. As a result, you cannot check the contents of the database with external tools such as the Database Manager while your application is running.
      • In 1.8.0, you can run a server instance in a thread within the same virtual machine as your application, to provide external access to your in-process database.
  • In-Memory Only Database
    • It is possible to run HSQLDB so that the database is not persistent and exists entirely in random access memory.
  • Shutdown
    • In 1.8.0, a connection property, shutdown=true, can be specified on the first connection to the database (the connection that opens the database) to force a shutdown when the last connection closes.
    • When SHUTDOWN is issued, all active transactions are rolled back.
    • A special form of closing the database is the SHUTDOWN COMPACT command. This command rewrites the .data file that contains the information stored in CACHED tables and compacts it.
    • SHUTDOWN should also be executed when you finish accessing the database with the HSQL Database Manager tool.
    • If the database is not shut down properly (by issuing SHUTDOWN via JDBC), HSQLDB occasionally loses all the data changes (updates, inserts, deletes) made during the session. This happens even when:
      • The database is not configured as the “In-Memory” type
      • The database is configured to flush data to disk frequently
      • JDBC transactions were executed in “Auto-Commit” mode
    • If the database is not shut down properly, the physical lock files also have to be deleted manually. Otherwise, upon JVM restart, HSQLDB complains that the database files are locked by another process.
  • Boot Database
    • When a server instance is started, or when a connection is made to an in-process database, a new, empty database is created if no database exists at the given path. This feature has a side effect that can confuse new users. If a mistake is made in specifying the path for connecting to an existing database, a connection is nevertheless established to a new database.
  • SQL Standard
    • The SQL dialect used in HSQLDB is as close to the SQL92 and SQL200n standards as it has been possible to achieve so far in a small-footprint database engine. Not all the features of the Standard are supported, and there are some proprietary extensions. In 1.8.0, the behavior of the engine is far more compliant with the Standards than in older versions.
  • Types of Tables
    • MEMORY tables are the default type when the CREATE TABLE command is used. Their data is held entirely in memory, but any change to their structure or contents is written to the .script file. The script file is read the next time the database is opened, and the MEMORY tables are recreated with all their contents.
    • CACHED tables are created with the CREATE CACHED TABLE command. Only part of their data or indexes is held in memory, which allows for large tables that would otherwise take up several hundred megabytes of memory. Another advantage of cached tables is that the database engine takes less time to start up when a cached table is used for large amounts of data. The disadvantage of cached tables is a reduction in speed.
    • TEXT tables let you use CSV files directly as database tables, which is very handy for testing.
  • Write Delay
    • The WRITE_DELAY setting controls the amount of data that can be lost in case of a total system crash.
    • This property can only be set permanently by executing the SET WRITE_DELAY SQL command, for example from the Database Manager tool.
    • The default is TRUE, which means that changes to the database that have been logged are synced to the file system once every 20 seconds. FALSE means there is no delay, and a file sync operation is performed at each commit. This slows the engine down to the speed at which the disk subsystem can perform file sync operations.
    • Values down to 10 milliseconds can be specified by adding MILLIS to the command, but in practice a delay of 100 milliseconds provides 99.99999% reliability with an average of one system crash every 6 days.
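
The connection options and commands described above can be collected into a small helper. This is a minimal sketch, assuming the HSQLDB 1.8 URL forms (jdbc:hsqldb:file:, jdbc:hsqldb:mem:, jdbc:hsqldb:hsql://) and its ;shutdown=true and ;ifexists=true connection properties; ifexists=true is one way to avoid the accidental new database described under Boot Database. The class name, method names, table names, and the fixtures.csv file are hypothetical. It compiles with only java.sql and java.io, but shutdown() needs the HSQLDB jar on the classpath to do anything useful.

```java
import java.io.File;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** Sketch of the HSQLDB 1.8 settings discussed above (all names are hypothetical). */
public class HsqldbNotes {

    // In-process (standalone) database kept in files such as test.script and test.log.
    static String fileUrl(String path) {
        return "jdbc:hsqldb:file:" + path;
    }

    // Non-persistent database that exists entirely in RAM.
    static String memUrl(String name) {
        return "jdbc:hsqldb:mem:" + name;
    }

    // Server mode: connect to a database published by an HSQLDB Server instance.
    static String serverUrl(String host, String alias) {
        return "jdbc:hsqldb:hsql://" + host + "/" + alias;
    }

    // shutdown=true (on the first connection): shut down when the last connection closes.
    // ifexists=true: fail instead of silently creating a new, empty database at a mistyped path.
    static String withProperties(String url, boolean shutdownOnLastClose, boolean ifExists) {
        StringBuilder sb = new StringBuilder(url);
        if (shutdownOnLastClose) sb.append(";shutdown=true");
        if (ifExists) sb.append(";ifexists=true");
        return sb.toString();
    }

    // SET WRITE_DELAY: FALSE syncs on every commit; otherwise sync every `millis` ms (down to 10).
    static String writeDelay(int millis) {
        return millis <= 0 ? "SET WRITE_DELAY FALSE" : "SET WRITE_DELAY " + millis + " MILLIS";
    }

    // Hypothetical tables: CACHED keeps only part of its data in memory, TEXT is backed by a CSV file.
    static final String[] TABLE_DDL = {
        "CREATE CACHED TABLE orders (id INTEGER PRIMARY KEY, total DECIMAL(10,2))",
        "CREATE TEXT TABLE fixtures (id INTEGER, name VARCHAR(50))",
        "SET TABLE fixtures SOURCE \"fixtures.csv\""
    };

    // Explicit SHUTDOWN (or SHUTDOWN COMPACT) over JDBC; needs the HSQLDB driver at run time.
    static void shutdown(Connection connection, boolean compact) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            statement.execute(compact ? "SHUTDOWN COMPACT" : "SHUTDOWN");
        }
    }

    // Run after the application has exited: .lck and .log are deleted at a normal SHUTDOWN,
    // so finding them suggests the last session ended without one.
    static List<String> leftoverFiles(File dir, String dbName) {
        List<String> found = new ArrayList<>();
        for (String ext : new String[] {".lck", ".log"}) {
            if (new File(dir, dbName + ext).exists()) {
                found.add(dbName + ext);
            }
        }
        return found;
    }
}
```

For example, DriverManager.getConnection(HsqldbNotes.withProperties(HsqldbNotes.fileUrl("data/test"), true, true), "sa", "") would open an existing in-process database that shuts itself down when the last connection closes, and would refuse to create a new one if the path is wrong.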




Derby

  • Derby was first released as Cloudscape in 1997.
  • Derby and Java DB are offshoots of Cloudscape. Support for Apache Derby is available from IBM, as Cloudscape, and from Sun, as Java DB.
  • Derby is distributed under the Apache License.
  • Derby has a small footprint -- about 2 megabytes for the base engine and embedded JDBC driver.
  • Derby is based on the Java, JDBC, and SQL standards.
  • Derby provides an embedded JDBC driver that lets you embed Derby in any Java-based solution. In the default configuration there is no separate database server to be installed or maintained by the end user.
  • Apache Derby has a big community of developers that includes people from Sun and IBM, which means it is backed by two major companies that are already major open source contributors.
  • The user and developer mailing lists are very active, and you can get good assistance there too.
  • Derby supports server-side cursors.
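
Embedding Derby mostly comes down to the connection URL handed to its JDBC driver. This is a minimal sketch, assuming the standard Derby URL forms: jdbc:derby:<db> for the embedded driver and jdbc:derby://<host>:<port>/<db> for the network client, with 1527 as the Network Server's default port. The DerbyUrls class and its method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Hypothetical helper for building Derby connection URLs. */
public class DerbyUrls {

    // Default port of the Derby Network Server.
    static final int DEFAULT_PORT = 1527;

    // Embedded: the engine runs inside this JVM (driver class org.apache.derby.jdbc.EmbeddedDriver).
    static String embedded(String dbName, boolean create) {
        return "jdbc:derby:" + dbName + (create ? ";create=true" : "");
    }

    // Server-based: talk to a Network Server in another JVM (driver class org.apache.derby.jdbc.ClientDriver).
    static String client(String host, int port, String dbName, boolean create) {
        return "jdbc:derby://" + host + ":" + port + "/" + dbName + (create ? ";create=true" : "");
    }

    // Needs derby.jar (embedded) or derbyclient.jar (client) on the classpath;
    // JDBC 4 drivers register themselves, so no Class.forName call is required.
    static Connection open(String url) throws SQLException {
        return DriverManager.getConnection(url);
    }
}
```

Switching an application from embedded to server-based deployment is then mostly a matter of changing the URL and the jar on the classpath; the SQL stays the same.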

Observations

  • Derby has reserved words, such as USER, that cause errors if you use them as table names. Hypersonic (HSQLDB) doesn't have this problem.
  • Deployment Options: the Derby software distribution provides two basic deployment options:
    • Embedded: Refers to Derby being started by a simple single-user Java application. With this option Derby runs in the same Java virtual machine (JVM) as the application. Derby can be almost invisible to the end user because it is started and stopped by the application and often requires no administration.
    • Server (or Server-based): Refers to Derby being started by an application that provides multi-user connectivity to Derby databases across a network. With this option Derby runs in the Java virtual machine (JVM) that hosts the Server. Applications connect to the Server from different JVMs to access the database. The Derby Network Server is part of the Derby software distribution and provides this type of framework for Derby. Derby also works well with other, independently developed Server applications.
  • Database Shutdown
    • If an application starts the Derby engine, the application should shut down all databases before exiting. The attribute “;shutdown=true” in the Derby connection URL performs the shutdown. When the Derby engine is shut down, all booted databases are automatically shut down. The shutdown process cleans up records in the transaction log to ensure a faster startup the next time the database is booted. You can shut down individual databases without shutting down the engine by including the database name in the connection URL. NOTE: A successful shutdown always results in an SQLException, to indicate that Derby has shut down and that there is no other exception.
  • Boot Database
    • An application boots a database by passing a Derby connection URL to DriverManager.getConnection. If the URL includes the attribute “;create=true”, the database is created if it does not already exist.
  • Database Directory: A Derby database is stored in files that live in a directory of the same name as the database. A database directory contains:
    • log directory: Contains files that make up the database transaction log, used internally for data recovery (not the same thing as the error log)
    • seg0 directory: Contains one file for each user table, system table, and index (known as conglomerates).
    • service.properties file: A text file with internal configuration information.
    • tmp directory: (might not exist.) A temporary directory used by Derby for large sorts and deferred updates and deletes. Sorts are used by a variety of SQL statements. For databases on read-only media, you might need to set a property to change the location of this directory. See "Creating Derby Databases for Read-Only Use".
    • jar directory: (might not exist.) A directory in which jar files are stored when you use database class loading.
  • Management: Included with the product are some standalone Java tools and utilities that make it easier to use and develop applications for Derby. There are no GUI management tools.
    • ij: ij is Derby's interactive JDBC scripting tool. It is a simple utility for running scripts against a Derby database. You can also use it interactively to run ad hoc queries. ij provides several commands for ease in accessing a variety of JDBC features. ij can be used in an embedded or a client/server environment.
    • The import and export utilities: These server-side utilities allow you to import data directly from files into tables and to export data from tables into files. Server-side utilities can be used in a client/server environment, but they require that all referenced files be on the Server machine.
    • Database class loading utilities: These utilities allow you to store application logic in a database.
    • Sysinfo: sysinfo provides information about your version of Derby and your environment.
    • dblook: dblook is Derby's Data Definition Language (DDL) Generation Utility, also called a schema dump tool. It is a simple utility for dumping the DDL of a user-specified database to either a console or a file. The generated DDL can then be used for such things as recreating all or parts of a database, viewing a subset of a database's objects (for example, those which pertain to specific tables and schemas), or documenting a database's schema.
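
The shutdown rule, the USER naming quirk, and the export utility above are all easy to get wrong in code. This sketch assumes Derby's documented SQLStates for a successful shutdown (XJ015 for the whole engine, 08006 for a single database) and the SYSCS_UTIL.SYSCS_EXPORT_TABLE system procedure behind the export utility; the DerbyNotes class and its method names are hypothetical.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Types;

/** Sketch of the Derby observations above (class and method names are hypothetical). */
public class DerbyNotes {

    // SQLStates that Derby uses to report a *successful* shutdown.
    static final String ENGINE_SHUTDOWN = "XJ015";   // jdbc:derby:;shutdown=true
    static final String DATABASE_SHUTDOWN = "08006"; // jdbc:derby:<db>;shutdown=true

    static boolean isCleanShutdown(SQLException e, boolean wholeEngine) {
        return (wholeEngine ? ENGINE_SHUTDOWN : DATABASE_SHUTDOWN).equals(e.getSQLState());
    }

    // Shut down one database, or the whole embedded engine when dbName is null.
    static void shutdown(String dbName) throws SQLException {
        boolean wholeEngine = (dbName == null);
        String url = wholeEngine ? "jdbc:derby:;shutdown=true" : "jdbc:derby:" + dbName + ";shutdown=true";
        try {
            DriverManager.getConnection(url).close();
        } catch (SQLException e) {
            if (isCleanShutdown(e, wholeEngine)) {
                return; // the expected exception: shutdown succeeded
            }
            throw e; // anything else is a real failure
        }
        throw new SQLException("Derby did not report a shutdown for " + url);
    }

    // Delimited (double-quoted) identifier, so a reserved word such as USER can be a table name.
    static String quoteIdentifier(String name) {
        return "\"" + name.replace("\"", "\"\"") + "\"";
    }

    // Export utility: runs on the server side, so `file` is a path on the database server's machine.
    static void exportTable(Connection c, String schema, String table, String file) throws SQLException {
        try (CallableStatement cs = c.prepareCall("CALL SYSCS_UTIL.SYSCS_EXPORT_TABLE(?, ?, ?, ?, ?, ?)")) {
            cs.setString(1, schema);
            cs.setString(2, table);
            cs.setString(3, file);
            cs.setNull(4, Types.CHAR);    // column delimiter: NULL means the default
            cs.setNull(5, Types.CHAR);    // character delimiter: NULL means the default
            cs.setNull(6, Types.VARCHAR); // codeset: NULL means the default
            cs.execute();
        }
    }
}
```

Note that delimited identifiers are case-sensitive: a table created as "USER" must be referred to as "USER" everywhere, including in the schema and table names passed to the export procedure, which expects names as they are stored in the catalog.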




Conclusion

The two products are fairly comparable in terms of their feature sets. The tooling support is also quite similar, though HSQLDB provides a GUI-based management tool. Both products have their own quirks, which require some getting used to. In my opinion, either of these products will work well for a typical single-user, Swing-based desktop application.

Apache Derby has a big community of developers that includes people from Sun and IBM, which means it is backed by two major companies that are already major open source contributors. Sun has adopted Apache Derby as its 100% Java DB and included it in the Java SE 6 (Mustang) JDK. Sun is also using Java DB for some of its own products and for other open source projects that it is heavily involved in. Customers can now purchase Sun Software Service Plans for Java DB, with two levels of support: Premium (around-the-clock) and Standard (extended business hours).

So, if support, paid or community-based, is a big concern then Apache Derby definitely stands out.



Sunday, October 7, 2007

Integration Technologies in Practice

The following is based on what I have seen and experienced.

Point-to-Point using Sockets
  • Generally seen with legacy systems that have been in existence for a while.
  • Typically these communicate over an ASCII text-based protocol.
  • Over time you end up with a plethora of such brittle point-to-point interfaces, which are tricky to manage and to modify (Integration Spaghetti).
  • There also exist variations of interfaces that are quite similar in nature and intent.
  • Each interface builds its own failure recovery mechanism.
  • Some of these interfaces are synchronous in design but are functionally asynchronous.
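
To make the "ASCII text based protocol" concrete, here is a minimal sketch of such a point-to-point interface: one request line and one reply line over a plain TCP socket. The LineProtocol class, the "ACK" reply convention, and the pipe-delimited message format are hypothetical. Note how the timeout and error handling have to be hand-rolled; this is exactly why each interface tends to end up with its own recovery mechanism.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical line-oriented ASCII protocol: one request line, one "ACK <request>" reply line. */
public class LineProtocol {

    // Provider side: accept one connection, read one request line, reply with "ACK <request>".
    static Thread serveOnce(ServerSocket server) {
        Thread t = new Thread(() -> {
            try (Socket s = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII));
                 PrintWriter out = new PrintWriter(
                         new OutputStreamWriter(s.getOutputStream(), StandardCharsets.US_ASCII), true)) {
                String request = in.readLine();
                out.println("ACK " + request);
            } catch (IOException e) {
                // A real interface would log here and run its own recovery logic.
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Consumer side: send one line, then wait (with a timeout) for one reply line.
    static String send(String host, int port, String message, int timeoutMillis) throws IOException {
        try (Socket s = new Socket(host, port)) {
            s.setSoTimeout(timeoutMillis); // without this, a hung peer blocks the caller forever
            PrintWriter out = new PrintWriter(
                    new OutputStreamWriter(s.getOutputStream(), StandardCharsets.US_ASCII), true);
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII));
            out.println(message);
            String reply = in.readLine();
            if (reply == null) {
                throw new IOException("connection closed before a reply arrived");
            }
            return reply;
        }
    }
}
```

Everything a MOM or an ESB would provide (retries, ordering, message formats, monitoring) is absent here and has to be re-invented per interface.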

RMI or RMI/IIOP
  • Suitable when both the Service Provider and the Consumers are implemented in Java
  • Because RMI is a binary protocol, it is easier to transport large amounts of data.
  • I wouldn't use it for large-scale deployments, as it gets tricky to manage resources like RMI service threads.
  • It requires a registry to look up a service reference. There is another way of accessing an RMI service without a registry, by using reference objects. In my view, going through the registry is the easier route.
  • There are firewall issues around which ports need to be opened. RMI over HTTP attempts to work around that issue.
  • The RMI activation framework helps increase the availability of RMI services and helps better manage resources on the server.
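
The registry-based route looks roughly like this with java.rmi. This is a minimal sketch; the EchoService interface, its implementation, the RmiSketch class, and the binding name "EchoService" are hypothetical. Every remote method has to declare RemoteException, and both provider and consumer must share the interface class, which is why RMI suits all-Java deployments.

```java
import java.rmi.NotBoundException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

public class RmiSketch {

    // Hypothetical service contract shared by provider and consumer.
    public interface EchoService extends Remote {
        String echo(String message) throws RemoteException;
    }

    // Hypothetical implementation living in the provider's JVM.
    public static class EchoServiceImpl implements EchoService {
        @Override
        public String echo(String message) {
            return "echo: " + message;
        }
    }

    // Provider: export the object (anonymous port) and bind its stub in a registry.
    static Registry publish(EchoService impl, int registryPort) throws RemoteException {
        EchoService stub = (EchoService) UnicastRemoteObject.exportObject(impl, 0);
        Registry registry = LocateRegistry.createRegistry(registryPort);
        registry.rebind("EchoService", stub);
        return registry;
    }

    // Consumer (usually another JVM): look the reference up, then call it like a local object.
    static String lookupAndCall(String host, int registryPort, String message)
            throws RemoteException, NotBoundException {
        Registry registry = LocateRegistry.getRegistry(host, registryPort);
        EchoService service = (EchoService) registry.lookup("EchoService");
        return service.echo(message);
    }
}
```

Both the registry port and the anonymous port used by exportObject have to be reachable through any firewall in between, which is the port issue mentioned above.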

Object Serialization over HTTP
  • Service providers and consumers interact by serializing binary objects over HTTP.
  • Leverages the scalability and availability of standard HTTP servers like Apache.
  • Leverages the ubiquity and firewall-friendly nature of the HTTP protocol
  • Suitable when both the Service Provider and the Consumers are implemented in Java
  • Because the objects travel in a binary format, it is easier to transport large amounts of data.
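
At its core this style is plain Java serialization written into an HTTP request body. This is a minimal sketch; the SerializedOverHttp class, the OrderRequest type, and the endpoint URL are hypothetical, and the "application/x-java-serialized-object" content type is a common convention rather than a requirement. Both sides need the same class (and serialVersionUID) on their classpath, and only data from trusted peers should ever be deserialized.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.net.HttpURLConnection;
import java.net.URL;

public class SerializedOverHttp {

    // Hypothetical request type shared by provider and consumer.
    public static class OrderRequest implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String orderId;
        public final int quantity;

        public OrderRequest(String orderId, int quantity) {
            this.orderId = orderId;
            this.quantity = quantity;
        }
    }

    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Only deserialize data from trusted peers: readObject can instantiate arbitrary classes.
    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // Consumer side: POST the serialized request, then deserialize the serialized reply.
    static Object post(URL endpoint, Serializable request) throws IOException, ClassNotFoundException {
        HttpURLConnection conn = (HttpURLConnection) endpoint.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-java-serialized-object");
        try (OutputStream body = conn.getOutputStream()) {
            body.write(toBytes(request));
        }
        try (InputStream in = conn.getInputStream()) {
            return fromBytes(in.readAllBytes());
        } finally {
            conn.disconnect();
        }
    }
}
```

On the provider side, a servlet would do the mirror image: read the request body with fromBytes and write its reply with toBytes, while the HTTP server in front handles scaling and availability.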

Integration using MOM
  • Publish/Subscribe
    • Pub/Sub is a highly advertised feature of MOMs.
    • In my view, this model fits in a world where the subscribers are highly dynamic.
    • If you have a high number of subscribers, pay careful attention to how messages are published over the network. Multicast reduces publishing latency and has a lower network impact.
  • Point-to-Point
    • Things to watch out for are: Queue Depth, Sequence of Messages, Message Correlation, Duration in Queue and Message Expiration.
    • Proactively monitor and manage the dead-letter queue.
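
The point-to-point concerns listed above (queue depth, sequence, correlation, duration in queue, expiration, dead letters) can be illustrated with a tiny in-memory queue. This is a hypothetical sketch of the concepts only, not JMS or any MOM product's API; the QueueSketch and Message names are made up, and time is passed in explicitly so the behavior is easy to follow.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

/** Hypothetical in-memory point-to-point queue illustrating expiration and dead letters. */
public class QueueSketch {

    public static final class Message {
        public final String correlationId; // ties a reply back to its request
        public final String body;
        public final long enqueuedAt;
        public final long expiresAt;       // bounds the message's duration in the queue

        public Message(String correlationId, String body, long enqueuedAt, long timeToLiveMillis) {
            this.correlationId = correlationId;
            this.body = body;
            this.enqueuedAt = enqueuedAt;
            this.expiresAt = enqueuedAt + timeToLiveMillis;
        }

        public long ageMillis(long now) {
            return now - enqueuedAt;
        }
    }

    private final Deque<Message> queue = new ArrayDeque<>();     // FIFO preserves sequence
    private final List<Message> deadLetters = new ArrayList<>(); // what operators should watch

    public void send(Message m) {
        queue.addLast(m);
    }

    public int depth() {
        return queue.size();
    }

    // Deliver the oldest unexpired message; expired ones are moved to the dead-letter list.
    public Optional<Message> receive(long now) {
        while (!queue.isEmpty()) {
            Message m = queue.pollFirst();
            if (now >= m.expiresAt) {
                deadLetters.add(m);
            } else {
                return Optional.of(m);
            }
        }
        return Optional.empty();
    }

    public List<Message> deadLetters() {
        return deadLetters;
    }
}
```

A message that sits too long silently moves to the dead-letter list instead of being processed, which is why that list needs active monitoring rather than occasional inspection.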

Web Services
  • These are typically accessed over HTTP, which is a synchronous protocol. This introduces Availability Coupling between the Service Provider and the Consumers.
  • The better way to communicate is by exchanging documents.
  • Stick to industry standard XML schemas
  • Authentication and Authorization are tricky problems to solve.
  • There is some value in attempting to solve the Service Versioning issue, by following the same approach as that for Component Versioning
  • The plethora of WS standards reminds me of what happened with CORBA. In the end, only a handful of standards remained standing, and the vendors ended up implementing only those standards.
  • Then you have the whole set of arguments around REST vs. Web Services.

Enterprise Service Bus (ESB)
  • It is an attempt to emulate, conceptually, within the middleware world what the computer hardware vendors ended up doing (the hardware bus) to solve their integration issues.
  • The typical value-adds of such products are: Protocol Transformation, Process Orchestration/Choreography, Monitoring & Management, Service Versioning, SLA management, etc.

Sunday, June 3, 2007

Calculating the Value Added by a Good Development Framework

In one of my previous blogs, I had expressed my opinion on frameworks and data mapping tools. I had highlighted some of the benefits of using them, which are:
  • Allow for cleaner separation of layers
  • Automatically apply certain well established design patterns
  • Externally configurable
  • Structured to support plug-and-play of components
For some time now, I have been thinking about ways to calculate the value added by good development frameworks like Spring, Struts, etc. The way I look at it, every framework comes with its associated costs and benefits. Seeking out value is just a matter of being able to quantify these costs and benefits. Now that can be a challenge in itself.

From what I have seen, frameworks come with the following types of costs and benefits.

Costs
  1. Costs associated with the learning curve. These can be recurring as more resources are ramped onto a project or as resources are periodically cycled through maintaining a product
  2. Costs associated with conflicting life-cycles of the product being developed and the development framework. These involve either upgrading the framework to its newer version or replacing it with another framework
Benefits
  1. Savings on account of increased productivity
  2. Savings on account of decreased number of bugs. There is a direct correlation between the number of lines of code written and the number of associated bugs
  3. Savings on account of increased product life, mainly attributed to slower degradation of the core product architecture over time
  4. Savings on account of quicker time to market
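
As a rough sketch of what "quantifying these costs and benefits" could look like, the categories above can be combined into a single net-value expression over the life of the product. The symbols are illustrative shorthand introduced here, not an established model, and every term still has to be estimated for a given project:

```latex
V_{\text{net}} = \left( B_{\text{prod}} + B_{\text{bugs}} + B_{\text{life}} + B_{\text{ttm}} \right)
               - \left( n \cdot C_{\text{ramp}} + C_{\text{lifecycle}} \right)
```

Here B_prod, B_bugs, B_life, and B_ttm are the four savings listed under Benefits; C_ramp is the learning-curve cost of ramping up one resource, and n is the number of resources ramped onto or cycled through the product, which captures the recurring nature of that cost; C_lifecycle is the cost of upgrading or replacing the framework. A framework adds value when V_net > 0.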

Monday, May 28, 2007

The Long-Tail Effect


You can always tell how cohesive the architectural vision of a product is by analyzing its architecture. From what I have seen, a lot of products suffer from what I describe as the Long-Tail Effect. And the longer the product has been out in the market, the longer this tail.

The long-tail is a graphical depiction of how deeply steeped a product is in its legacy past. In my view, this can be primarily attributed to a lack of a cohesive vision over time.

It is important to understand this aspect of product evaluation, especially if you are considering hanging on to your acquisition for a long period of time. You have to find the right balance between product maturity and its architectural cohesiveness. In my view, the long-tail ends up negatively skewing the Total Cost of Ownership (TCO). The more steeped a product is in its past, i.e., the longer the tail, the more effort is required to maintain the product over time.

Sunday, May 27, 2007

Aligning Architecture with Business Strategy

Over time, I have come to realize that architecture is not purely an Information Technology (IT) function, in the traditional sense of viewing IT. It is strategic in intent; hence it needs to draw directly from business strategy. The other thing that I have realized is that architecture ends up lasting longer than business strategy. The fact that decades-old legacy systems like Mainframe/COBOL are still deeply entrenched within organizations supports that claim.

Business strategy, on the other hand, continually evolves over time in response to the changing competitive landscape. If architectural efforts are not properly aligned with the business strategy, architecture ends up constraining, instead of enabling, the achievement of that strategy. Once again, taking the example of Mainframe/COBOL, such systems are geared towards batch processing of information. Going forward, if the organization wants to start monitoring its activities in real time or near-real time, it will be constrained by the capabilities of its existing Mainframe/COBOL systems.

My understanding is that the following should occur, to ensure the proper alignment of architectural efforts with business strategy.
  1. Architects should participate in the business strategy process
  2. Architects should have their interpretation of the business strategy validated by business strategists
  3. Architects should focus on tackling only those challenges that directly affect the implementation of the business strategy
  4. Architects should continually assess the relevance of their initiatives against the business strategy

Saturday, May 19, 2007

Frank Gehry - Pritzker Prize Winning Architect

Software Architecture is a relatively new field when compared to Building Architecture. Naturally, software architects frequently look to the field of Building Architecture to seek inspiration and to draw analogies. One of the notable building architects who has impressed me is Frank Gehry. The way I first became aware of him is funny. I was watching Arthur, a show on public television, with my son. In one episode, Frank Gehry helps Arthur and his friends design a new tree house. Their tree house gets destroyed, and Arthur and his friends start debating over what to do. They run into Frank Gehry at the local ice cream shop, and he asks each one of them to come up with design ideas for their new tree house. The kids go about doing that and then present their ideas to Frank. Some of the ideas are quite out there, and the group ridicules them. This is where Frank makes a statement that really stuck with me. He says, "Who says that a building has to look like a box?"

If you look at some of his work, like the Pritzker Pavilion, the Dancing House in Prague, the Guggenheim Museum in Bilbao, and the Chiat/Day Building, you will really understand what he means by that statement. His work makes you drop all preconceived notions of what a building should look like. I think that this is equally relevant in the area of software architecture. On top of that, Frank has a reputation for completing his projects on time and within budget. This can be attributed to his approach, which is to:
  1. Prevent political and business interests from interfering with design, to arrive at an outcome that is as close as possible to the original design
  2. Get a detailed and realistic cost estimate before proceeding
  3. Maintain a close relationship with the implementers

What is Enterprise Architecture?

For the last few years, I have been predominantly working on large-scale projects. These projects have a high impact either on account of their scope (enterprise-wide) or on account of their cost. Previously, the area of Enterprise Architecture (EA) had been somewhat of a mystery to me. Even though I was dealing with some aspects related to EA, I wasn't clear on what it really meant. On top of that, there was the ongoing debate on SOA vs. EA, which still continues. So, I started looking at the various frameworks like Zachman and TOGAF, which definitely helped put things into perspective.

In that same spirit of inquisitiveness, last week (May 15-18, 2007), I attended the Enterprise Architecture Workshop by Bredemeyer Consulting. My biggest takeaways from that experience were:
  1. Enterprise Architecture is nothing else but Architecture done at the enterprise level.
  2. There will always be challenges around you. Focus on tackling only those challenges that directly affect the implementation of the business strategy.
  3. There is no such thing as too low a level. If delving into the lowest level of detail is important for the success of your initiative, do it.

Saturday, May 5, 2007

Role of Architecture in these times of Outsourcing

Outsourcing in the IT world is a reality. Given that the IT department has always been viewed as a cost center, more and more companies are looking at outsourcing, to contain and to better manage IT costs.

In my view, Architecture is a strategic capability. There is tremendous risk in outsourcing this capability. Outsourcing a strategic capability will deplete the associated knowledge within an organization. It will impact an organization's ability to leverage its strategic capabilities to quickly react to new opportunities. If the Architecture capability does not exist within an organization or if it is not mature, work with the vendor(s) to develop this capability.

Also, outsourcing contracts are never open-ended. They are for a specific duration. If SLAs are not met, organizations have the option to not renew the contracts. Given that, it only makes sense to retain strategic capabilities and to leverage them in contract negotiations.

The following paragraph is from an article by IBM that talks about the benefits and the risks associated with outsourcing.

"The IT experts brought technology standardization and disciplined project methodologies to their clients. Indeed, all the firms valued learning from their vendors about standard technology components and project methodology. But they also noted that, while vendors were learning about their business, vendors could never know their business as well as their own people. Thus, the firms needed to retain—or develop—a competency in applying technology to meet strategic goals."

Sunday, March 11, 2007

Can Service-Oriented Architecture fit together with Enterprise Architecture?

In one of my previous blogs, I had put together my thoughts on the inter-relatedness between Enterprise Architecture (EA) and Service-Oriented Architecture (SOA). I posted a similar question on LinkedIn, and I was surprised by the number of responses that I received. One of the responses was from James McGovern. Anyway, the question was, "Can service-oriented architecture fit together with enterprise architecture?"

From the answers that I received, there seemed to be a general sense of optimism on this topic of inter-relatedness between SOA and EA. Generally everybody seemed to agree that SOA and EA could fit together though they had different approaches to actually go about doing it. What was noteworthy was the variety of definitions of SOA. SOA was defined as:

  • An architectural style which can be used within an Enterprise Architecture to organize business or technology components.
  • An instance of an enterprise architecture
  • SOA brings a paradigm shift in looking at the Enterprise Architecture.
  • SOA being used as a band-aid to bring together older systems...Some folks seem to be so focused on building for SOA that they surpass solutions that may have been better/easier/less costly.
  • SOA is a concept / pattern that can be used for the integration of dissimilar systems within an enterprise.
Even though it wasn't explicitly stated in the answers, there seemed to be a common understanding of the definition of EA. That does make sense, as EA has been around for quite some time now. The best diagram that I have seen so far on this topic of inter-relatedness is in one of the presentations by Ken Orr. It is the slide titled "The Merging of Concerns" (page 8).

Sunday, January 7, 2007

Typical Architectural Deliverables

  1. Key architectural decisions document that captures the key decisions that were made and the other alternatives that were considered. A template is available at http://bredemeyer.com/papers.htm; look for "Key Architecture Decisions Template".
  2. Architecture document that captures the following:
      • Architecture strategy, as defined by the key principles, concepts, style, etc.
      • Conceptual architecture, modeling the areas that are architecturally significant.
      • Architecture risk analysis.
      • Logical architecture of the architecturally significant areas, focusing on interface definitions and relationships among the entities. This also establishes the various layers and tiers.
      • Definitions of the various domain objects entering, flowing through, and leaving the system.
      • Analysis of the non-functional requirements, to derive the key characteristics of the run-time environment.
      • Run-time view of the system.

What it takes to manifest an Architecture Plan

  1. Buy-in from top management.
  2. A good understanding of the history, the current context, and the future direction of the solution space.
  3. A good understanding of the existing business processes, and the willingness to change these processes.
  4. A good understanding of the existing technical architecture.
  5. A clearly defined architecture strategy.
  6. A collaborative environment, where the team members are empowered to contribute. This not only helps in maturing the architecture but also helps in getting the implementers on-board with the architecture strategy.
  7. A clearly defined architecture change management process.

Saturday, January 6, 2007

Inputs to consider when designing an architecture

  1. The history, the current context, and the future direction of the solution space.
  2. A good understanding of the existing business capabilities of the organization in terms of people, processes, and technology. Along with that, a good understanding of the organization’s willingness, and its ability to accept change.
  3. Results of the various proofs of concepts.
  4. Inputs from different subject matter experts.
  5. A good enough understanding of the business requirements and a thorough understanding of the various non-functional requirements.
  6. Industry trends and standards.

Aspects to consider when performing a technology evaluation

  1. The goodness of the fit of the product within the existing business capabilities (people, processes, and technology) of the organization
  2. Evaluate the product against the expected or agreed upon non-functional requirements like availability, scalability, extensibility, robustness, etc.
  3. The support structure provided by the vendor's organization.
  4. The degree to which the product is compliant with open industry standards.
  5. Popularity of the development platform
  6. Hardware and software platform flexibility
  7. Complexity on account of dependency of the product on other 3rd party products, and the associated licensing costs.
  8. The long-tail effect

The Architecture Process

Here are the steps that I take to define the architecture for any solution. Previously, I had attended a Software Architecture Workshop by Bredemeyer Consulting, which helped me validate my ideas and better structure the process.


  1. Understand the history, the current context, and the future direction of the solution space. This helps me better understand the motivation behind why a solution is required.
  2. Understand the available set of requirements and validate them against the ideas gathered in step 1.
  3. Once I have a good understanding of steps 1 and 2, I establish the architectural strategy by defining the key principles, concepts, style, etc. that will guide the architecture. These aspects of the strategy are then validated through brainstorming sessions and prototypes.
  4. Identify the functional requirements that are architecturally significant. This helps me focus my attention on the key areas of the solution. The other thing that I have realized from experience is that it helps the long-term manageability of the solution if the structural approach is consistent across both architecturally significant and non-significant requirements.
  5. Identify the non-functional requirements. I usually add this information to the corresponding use cases, and then trace them all the way down to the design documents.
  6. Develop a conceptual architecture, to identify the various systems and sub-systems, their responsibilities, and their collaborations. This is again validated either through prototypes or through brainstorming sessions.
  7. Define the logical architecture, to come up with the various interface specifications, and validate them.
  8. Have the team build a reference implementation of the architecture.
  9. Analyze the non-functional requirements, to derive the key characteristics of the run-time environment. This would relate to high availability, failover, load-balancing, number of processes, etc.
  10. Incorporate any feedback from the development team into the architecture. This set of activities falls under architecture change management.

Friday, January 5, 2007

How to manage expectations in “politically charged” situations?

In my view, the best way to work in a politically charged environment is to first develop a good understanding of the different perspectives, identify what would influence those perspectives, and build relationships to influence change, all the while maintaining an objective standpoint by sticking to hard facts.