Latest Entries

Sequences and triggers in Derby DB

Context

For our unit testing (and data access classes in particular) we use an in-memory Derby database and auto-map JPA to database objects (DDL). With EclipseLink this is achieved by setting the persistence unit property eclipselink.ddl-generation to drop-and-create-tables. This allows you to test JPA mappings without having to struggle with setting up, maintaining and cleaning up (in our case) an Oracle database. While the DDL generation has for the most part proven sufficient to test our application logic, not all DDL can be inferred from JPA mappings.
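For illustration, the relevant persistence unit configuration looks something like this (the unit and database names are of course free to choose):

    <persistence-unit name="testPU" transaction-type="RESOURCE_LOCAL">
        <properties>
            <property name="javax.persistence.jdbc.driver" value="org.apache.derby.jdbc.EmbeddedDriver"/>
            <property name="javax.persistence.jdbc.url" value="jdbc:derby:memory:testdb;create=true"/>
            <property name="eclipselink.ddl-generation" value="drop-and-create-tables"/>
        </properties>
    </persistence-unit>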

Problem

JPA does not support sequences or generated values on anything other than primary keys. In our case we wanted to use a sequence on a non-primary key: a modification counter that increments each time one of our entities is updated, which we use to calculate the delta in a synchronization job.

Solution

To work around the lack of JPA support for sequences on non-primary keys, we created our own sequence and a BEFORE INSERT OR UPDATE trigger in Oracle that injects the next sequence value into the modification counter field on each insert or update. This caused Derby DB to misrepresent actual application behavior in our unit tests, because application state was now being set outside of JPA. As a result we had to mimic the sequence and trigger behavior in Derby DB as well. Doing so turned out to be somewhat difficult because the Derby documentation section on triggers is limited. The example given in the documentation is as follows:
CREATE TRIGGER trig1 AFTER UPDATE ON flights
REFERENCING OLD AS UPDATEDROW
FOR EACH ROW MODE DB2SQL
INSERT INTO flights_history VALUES (UPDATEDROW.FLIGHT_ID, UPDATEDROW.SEGMENT_NUMBER,
    UPDATEDROW.ORIG_AIRPORT, UPDATEDROW.DEPART_TIME, UPDATEDROW.DEST_AIRPORT,
    UPDATEDROW.ARRIVE_TIME, UPDATEDROW.MEAL, UPDATEDROW.FLYING_TIME, UPDATEDROW.MILES,
    UPDATEDROW.AIRCRAFT, 'INSERTED FROM trig1');
After struggling with syntax errors for a long time while trying to create a BEFORE INSERT trigger in Derby DB (and failing to find any documentation on how to do this), I began to suspect Derby DB doesn't support this use case.

As a workaround create an AFTER INSERT trigger that updates the modification counter in Derby DB like this:
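For example (table, column and sequence names here are made up; adapt them to your own schema):

    CREATE SEQUENCE modification_seq START WITH 1;

    CREATE TRIGGER trg_modification_counter
    AFTER INSERT ON my_entity
    REFERENCING NEW AS NEWROW
    FOR EACH ROW MODE DB2SQL
    UPDATE my_entity
       SET modification_counter = NEXT VALUE FOR modification_seq
     WHERE id = NEWROW.id;

To also cover updates you need a similar AFTER UPDATE trigger; restrict it to the other columns (AFTER UPDATE OF ...) so it doesn't fire itself recursively.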
Because entity state is now determined by database logic, the modification counter value is not available within the application. Normally, when using a generated value, the ORM automatically fetches it for you and updates your managed entity. Now you have to refresh the entity yourself after persisting it:
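A sketch with a plain EntityManager (entity and accessor names are made up):

    MyEntity entity = new MyEntity();
    em.persist(entity);
    em.flush();          // force the INSERT so the trigger fires and assigns the counter
    em.refresh(entity);  // re-read the row; entity.getModificationCounter() is now populated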

Multitenant or not?

Having worked on a couple of multi-tenant projects I thought I'd share my experiences with them.

For those unfamiliar with multitenant systems: the idea is to have a single production platform servicing multiple clients (tenants) at the same time, rather than setting up a new production platform for each of them. Multi-tenancy promises better economies of scale and simpler deployment at the expense of additional development complexity. When the application runs on (expensive) application servers and databases it makes sense to utilize them fully before scaling out, and not having to deploy your application for every customer saves money.

The trade-off is development complexity. Here is a list of challenges you might face when implementing multi-tenancy:

  • Database design: the most characteristic aspect of a multitenant system is its database design. There are roughly four different approaches, each with its pros and cons:
    • Separate databases: use physically separated databases for different tenants. Connections are made to the different databases based on the tenant identifier. This configuration maximizes data security and is fairly easy to set up, but it is also the most expensive solution for obvious reasons.
    • Separate schemas: the database has one schema per tenant on a single database instance. A separate connection is made per tenant, usually with different credentials. Data is secured separately from other tenants and only a single DB instance has to be maintained. Some notable cons: database schema changes become a hassle (you'll need to upgrade multiple schemas) and a new schema must be provisioned for each tenant (some database vendors advise against scaling out by creating more schemas because this tends to introduce a lot of overhead).
    • Separate tables within a single schema: this option simplifies your database connection management (only one set of credentials required) and separates data for different tenants within that schema. It provides little in the way of data security, and database management becomes more of a burden when many tenants are active at the same time. Partitioning or sharding the data will be important, and a clean table/collection naming convention is required to keep data management overhead in check. If the system is not able to provision its own tables or collections for each new tenant, some additional configuration is required within the database per tenant.
    • Single table: stores all tenant data within a single table and separates it with an additional tenant discriminator field. This setup is the least secure of all four (a programming error could easily expose data of other tenants!) but also requires the least amount of configuration for each new tenant. Some databases support data hiding on a row level (for example, PostgreSQL), allowing you to hide data based on the tenant discriminator: the database automatically appends "WHERE <discriminator_column> = '<tenant_discriminator_value>'" to each query on the target table, hiding other tenants' data from your current session. When you're using an ORM, check if it supports "Row Level Security (RLS)" or "Virtual Private Database (VPD)"; it can make your life a lot easier. A sketch of this approach follows after this list.
  • Security: security is tricky at the best of times, but multi-tenancy takes it to the next level. A mature multi-tenant system ensures no state of any tenant is accessible to any other tenant. This involves not only your database but also memory, caches, sessions, queues and any other stateful resource or communication mechanism your application may have. Consider something as simple as triggering an event in an eventing system: you only want to notify subscribers of the same tenant, not all subscribers of all tenants. To deal with these problems, try to create a tenant context (or "tenant sandbox") in the codebase that is difficult to break out of, meaning all resources accessible to the developer are guaranteed to be filtered to the current tenant context, as opposed to requiring the developer to filter each resource manually. You can achieve this by wrapping the resource in your own framework and forbidding/preventing direct access.
  • Deployment: upgrading a mission-critical production platform becomes a whole lot more difficult when it involves a multi-tenant system. True, the deployment team only has to do the upgrade once for all tenants. However, the difficulty lies not in the technical details but with those affected by the upgrade: your customers. The upgrade will affect all tenants, whether they wanted it or not. This can be a difficult sell, particularly if backwards incompatibilities are introduced. In addition, a multi-tenant deployment is likely to receive a whole lot more traffic than a single-tenant system; coordinating downtime between tenants (assuming you can't do a live upgrade) is painful and, depending on the nature of the system, might even be impossible. You'll need a good deployment strategy to cover this.
  • Versioning: apart from the security and database complexity, you should assume you can't make backwards incompatible changes and will have to support old versions for a very long time, growing the codebase over time. If you don't organize this growth properly then, regardless of the nice initial architecture design, it will end up as a big ball of mud. Version your code in namespaces or directories to keep different versions separated.
  • Logging: easily overlooked, but consider making your logging tenant-aware, either by writing log output to a different file per tenant or by adding tenant identifiers to your log entries. If you write logs to a database, add a tenant identifier column there as well. This makes troubleshooting customer-specific issues on a production platform a whole lot easier.
  • Testing: testing a multitenant system is significantly more complex. The security complexities require covering a lot more scenarios than usual. Try to create automated tests that can be configured to run for different tenants, preferably at the same time and in parallel, to reproduce a real-world scenario. Also keep in mind that because resources (particularly caches) cannot always be shared between tenants for security reasons, they may fill up a lot sooner than you'd expect.
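To illustrate the row-level security option from the database design list, a minimal PostgreSQL sketch (table, column and setting names are made up; requires PostgreSQL 9.5 or later):

    -- all tenants share one table; a discriminator column separates their rows
    CREATE TABLE orders (
        id        serial PRIMARY KEY,
        tenant_id text NOT NULL,
        total     numeric
    );

    ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

    -- every query on orders is now implicitly filtered to the current tenant
    CREATE POLICY tenant_isolation ON orders
        USING (tenant_id = current_setting('app.tenant_id'));

    -- set once per session/connection, e.g. when borrowing it from the pool
    SET app.tenant_id = 'acme';

Note that the table owner and superusers bypass these policies by default, so connect with a dedicated application role.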

Before starting a multitenant system, do consider the development complexities involved. Multitenant systems make sense when you're buying expensive licenses for your application servers and databases, but they make much less sense when you're able to provision the system in the cloud or on cheap open source platforms.

The IKEA Programmer

Recently I stumbled on an excellent marketing paper titled "The IKEA Effect: When Labor Leads to Love" by Michael I. Norton, Daniel Mochon, and Dan Ariely.

"When Labor Leads to Love".

Norton et al. posit the theory that people are inclined to overvalue their work if:

  • ... the labor was successful.
  • ... there is an effort/result imbalance. Even if the labor was gruesome and the result terrible, we humans tend to psychologically justify the gap between the effort and the (minor) result by simply taking pride in our work, no matter the outcome.

The authors explain this "IKEA effect" as coming to value your hard work despite the (objectively) dreadful result. The longer it takes you to assemble the parts of your new closet, the more product pride you experience after you've finished it.

The Swedes have already given their name to the Stockholm syndrome; it seems only fitting that one of their best-known brands is associated with being a psychological hostage as well.

I've experienced the IKEA effect quite a few times. It took me years to realize the PHP framework I had ported from PHP 4 (a more or less procedural language) to PHP 5 (an object-oriented one) was fundamentally flawed. The port took months, most of my free evenings, and caused daily headaches. And it worked!

I ditched it two years later.

The most damaging aspect of the IKEA effect is that it renders you blind to better alternatives.
It reminded me of the following tweet:

"Component strategy?: use what's out there; realise it sucks; write your own; wait until others suck less; dump your own; use other." - Fabian Potencier

REST and JSON best practices

For a long time I've been struggling with REST API conventions. REST was -and is- a difficult concept and has long been poorly understood. Nowadays there are lots of excellent style guides. Below I've gathered some guidelines and best practices from various sources.

In all guidelines I've assumed the REST service returns JSON because it is the most common format these days. Most guidelines should work for other formats (XML in particular) as well.

HTTP methods

  • Collection names must be plural: /users/bob... instead of /user/bob...
  • If you need to do something that may yield different results when called multiple times in succession (i.e. the operation is not idempotent), use POST.
  • GET and DELETE do what you think they do.
  • POST and PUT might not do what you think they do. POST and PUT can both be used to create and update depending on the circumstances.
  • Use POST when creating a new resource and the server determines the resource identifier.
  • Use POST if you run queries with very large inputs (only as a last resort because of limited URI length).
  • Use PUT when creating a new resource and the client determines the resource identifier/URI. For example, when the business key is a UUID set by the client, use PUT to create the resource.
  • Use PUT if you want to overwrite an existing resource (e.g. update all properties of /users/bob).
  • Use PATCH if you want to partially update an existing resource (e.g. only update the first and last name of a user entity). While PATCH is fairly well supported, it is defined in a separate RFC (5789) rather than the core HTTP specification, and should be avoided if service consumers are not able to make PATCH requests (see the sketch after this list).
  • Use POST if you want to partially update an existing resource and PATCH is not an option (e.g. change the password of /users/bob but leave the rest intact).
  • If the client is limited to making POST and GET requests, add an additional query parameter to the url, for example /users/bob?method=DELETE. Be warned that this breaks with every principle of HTTP! A seemingly safe and idempotent GET request would no longer be either, and HTTP-aware software such as caches may no longer work properly. Best to avoid this practice at all costs.
  • Avoid custom HTTP headers if they are important for your request; include them in the request body instead to keep a clean interface. If you do decide to include custom HTTP headers, no longer prefix them with "X-" (e.g. "X-Powered-By"); the "X-" convention is deprecated.
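To make the PUT/PATCH distinction concrete, a sketch against a hypothetical /users/bob resource:

    PUT /users/bob HTTP/1.1           (replace the whole resource: send every property)
    { "firstName": "Bob", "lastName": "Marley", "email": "bob@example.com" }

    PATCH /users/bob HTTP/1.1         (partial update: send only what changes)
    { "email": "marley@example.com" }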

HTTP result codes

The following list contains the more relevant REST-service HTTP status codes.
A common mistake is to return a 200 OK when something went wrong. REST services ought to utilize what HTTP has on offer. 
  • 200 OK: when a request was processed successfully, for example when a partial update succeeded.
  • 201 CREATED: when creating a new resource instance using POST succeeded. Send the new resource representation in the body.
  • 202 ACCEPTED: when a POST or DELETE initiated an asynchronous request and it was accepted. The response body should contain the current status of the asynchronous job and provide a link to track its progress. Make a clear distinction between the asynchronous job and the resource being created as a result of that job: polling the job should return a 200 OK with the job status (for example running, deleted or failed), and once the job has successfully produced the new resource, return a 303 SEE OTHER with a Location header containing the new resource url (see the sketch after this list).
  • 204 NO CONTENT:  when the request was processed successfully but no response body is available.
  • 400 BAD REQUEST: when a syntactical error occurred, for example when a validation constraint failed.
  • 401 UNAUTHORIZED: when the user is not authenticated. When the user is authenticated but not authorized, issue a 403 FORBIDDEN instead.
  • 403 FORBIDDEN: when the user is authenticated but has insufficient privileges to access the resource.
  • 404 NOT FOUND: when a resource was not found.
  • 405 METHOD NOT ALLOWED: when the HTTP method is not allowed, for example when a GET request is received but only POST is supported. The standard says an Allow header MUST be present when issuing a 405, e.g.: Allow: GET, HEAD, PUT.
  • 406 NOT ACCEPTABLE: when the client's content negotiation headers (Accept, Accept-Encoding) ask for a representation or encoding that is not available for the requested resource.
  • 409 CONFLICT: when a resource already exists (i.e. the business key exists). The response body should describe how to resolve the conflict.
  • 410 GONE: when the resource no longer exists. Use 404 NOT FOUND when you don't keep track of removed resources.
  • 412 PRECONDITION FAILED: when the If-Unmodified-Since or If-Match headers of a PUT request do not match the ETag/last modification date on the server. These headers are used to implement optimistic concurrency control for PUT requests.
  • 413 REQUEST ENTITY TOO LARGE: when the body of a POST or PUT request is too large.
  • 415 UNSUPPORTED MEDIA TYPE: when a client sent a message body in a format the service does not understand.
  • 500 INTERNAL SERVER ERROR: when an exception occurred on the server. This is a good "catch-all" or "fallback" error code when an uncaught exception occurred.
  • 503 SERVICE UNAVAILABLE: when the server cannot fulfill a request for some time. Add a Retry-After HTTP header if possible.
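To illustrate the asynchronous flow described under 202 ACCEPTED, a sketch with hypothetical urls:

    POST /v1/reports         -> 202 ACCEPTED   { "href": "/v1/jobs/42", "status": "running" }
    GET  /v1/jobs/42         -> 200 OK         { "status": "running" }
    GET  /v1/jobs/42 (later) -> 303 SEE OTHER  Location: /v1/reports/1337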

URL format

  • Keep the base API url short, for example http://api.example.com.
  • Keep urls lowercase, avoid camelCase (even for slugs!) and use hyphens in your slugs rather than underscores. Treating /users/bob and /users/Bob as two different instances is confusing (and most likely incorrect as well).
  • Use nouns in urls, not verbs.
  • Pagination: /users?limit=20&offset=60
  • Search:  /search?q=my+search 
  • Ordering:  /posts?sortByAsc=date and /posts?sortByDesc=date
  • Search within a specific collection: /users/search?q=bob.
  • Versioning: use /v1/users (an alternative is to change the media type, but I advise against this: it is too obscure, even though it would allow you to use the same url for two or more different representations).
  • Request partial result: /users/bob?fields=firstName,lastName
  • Avoid communicating minor version numbers (like 1.1): minor versions must be backwards compatible and can be published without worrying about consumer compatibility. Only backwards incompatible changes warrant a new version in the url of your service (also, limiting the number of supported versions reduces the maintenance burden).
  • Don't expose surrogate keys like generated sequence numbers in the interface, use a slug or some other generated string.

Response body

  • Write JSON in camelCase, never snake_case. JSON is most likely consumed by front-end JavaScript and JSON originated from JavaScript; stick to the JavaScript convention.
  • Send as much data as possible about a resource instance, even if the consumer is unlikely to consume it all. Only hide data properties that are security sensitive or used internally only (for example, a surrogate key).
  • Avoid unnecessary data envelopes. The client knows it is requesting a user when the url is /users/bob; wrapping the response like { "user": { "firstName": "Bob" } } adds nothing.
  • Forget about UNIX timestamps; use ISO 8601 timestamps without exception in your service protocol. UNIX timestamps support neither timezones nor milliseconds.
  • Every resource instance in a service response should contain a href attribute with its location. Developers consuming your service should get accustomed to these href attributes so they feel comfortable using them to link to other resources.
  • Prefer linking resources over embedding them, unless the linked resource is almost always required when fetching the main resource. This is context-dependent. For example, a GET /users/bob might return the following:
        {
            "firstName": "Bob",
            "lastName": "Marley",
            "address": {
                "href": "http://example.com/users/bob/address"
            }
        }
    When accessing the address is fairly uncommon this response works well for service consumers; the client has sufficient details to retrieve the address if it needs to. However, if the consumer also requests the user's address in, say, 80% of the requests, you can lower the number of network requests by embedding the address within the GET /users/bob response body.
  • When a resource is available under two different urls (a scenario common when embedding resources) consider using partial result queries instead (e.g. GET /users/bob?fields=firstName,lastName). Having two different urls for the same resource may improperly invalidate an HTTP cache, causing two different representations of the same resource to exist at the same time.
  • Be descriptive in error messages and include a code that can be interpreted by a machine; this helps automated error handling and internationalization in particular. For example:
        {
            "status": 401,
            "message": "Insufficient priviliges",
            "code": 1205,
            "more_info": "http://www.someresult.com/docs/errors/1205"
        }
    

JSON/REST API documentation *

  • Mention title, url, method, url parameters (if any), data parameters (if any), success response, error responses and a request sample.
  • For each url and data parameter state whether the argument is optional or required.
  • For each url and data parameter state which other constraints apply (formatting, relationships, etc.).
  • For each success and error response message state which HTTP status accompanies it.
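An entry following these guidelines could look like this (endpoint, parameters and constraints are made up):

    Title:       Retrieve a user
    URL:         /v1/users/:slug
    Method:      GET
    URL params:  slug - required; lowercase, hyphenated
    Data params: none
    Success:     200 OK with the user representation in the response body
    Errors:      404 NOT FOUND when no user exists for :slug
    Sample:      curl http://api.example.com/v1/users/bob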

* Most JSON/REST API style guides stress that you should put in the effort to make understandable error response messages. I tend to disagree: a custom error code has sufficed in many applications for decades, why not for JSON/REST services? This avoids the issue of internationalizing system error messages, and you need proper JSON/REST API documentation anyway because JSON/REST lacks a schema definition. Without documentation, any service constraint becomes a matter of development trial & error (which makes for an unhappy programmer).


@PostConstruct swallows exceptions; catch and log them instead

I was having difficulty analyzing a non-working web service that I had deployed on WebLogic 10.3.6. No error showed up in the logs, and since the web service was implemented using an in-only message exchange pattern (fire & forget) there was no response message to inspect. After a more careful look at the logs I noticed that logging stopped completely after the web service invoked an EJB. Unable to debug any further, I changed the in-only MEP to an in-out MEP (request & response) and noticed the following error in the response message:

"ExceptionEJB Exception: com.oracle.pitchfork.interfaces.LifecycleCallbackException: Failure to invoke private void ... on bean class  ... with args: []"

At some point I figured out it had to do with the @PostConstruct of one of my EJBs not working properly. As it happens, an exception thrown while executing a @PostConstruct method is swallowed by WebLogic. A quick search showed I am not the only one having this problem, and similar behavior is observed in JBoss and GlassFish as well.

As a solution I've started wrapping my initialization code inside @PostConstruct methods in a try-catch block, logging any exception that occurs and rethrowing it. Because I don't know what type of throwable I'm catching, and the original exception is lost anyway, I rethrow it wrapped in a RuntimeException; checked exceptions are not allowed in methods annotated with @PostConstruct.

Example:

import javax.annotation.PostConstruct;
import javax.ejb.Stateless;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@Stateless
public class MyEJB implements MyEJBLocal {

    private static final Logger LOGGER = LoggerFactory.getLogger(MyEJB.class);

    @PostConstruct
    private void init() {
        try {
            // -- do stuff
        } catch (Throwable e) {
            // log first: the container may swallow anything thrown from here
            LOGGER.error("Error occurred during initialization of MyEJB", e);
            throw new RuntimeException(e);
        }
    }
}

Oracle DB session analysis

During development I experienced some trouble with blocking sessions in Oracle DB. Not having much experience with the v$session and gv$session views in Oracle, it took me quite some time to figure out which session was causing the problem.

I made a query showing the more interesting session details and metrics:
  • Session details like machine, user, sid, ...
  • Blocking status
  • Number of hours running
  • Average hourly IO count (our DBA always points at limiting our IO, but I've never actually measured it)
  • Average hourly processing time in milliseconds
  • Program memory usage (PGA only)
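A simplified version of that query looks like this (the processing time metric comes from v$sess_time_model and is left out of this sketch to keep it readable):

    SELECT s.sid,
           s.serial#,
           s.username,
           s.machine,
           s.status,
           s.blocking_session,
           ROUND((SYSDATE - s.logon_time) * 24, 1) AS hours_running,
           ROUND((io.block_gets + io.consistent_gets + io.physical_reads)
                 / GREATEST((SYSDATE - s.logon_time) * 24, 1)) AS avg_hourly_io,
           ROUND(p.pga_used_mem / 1024 / 1024, 1) AS pga_used_mb
    FROM   v$session s
           JOIN v$sess_io io ON io.sid = s.sid
           JOIN v$process p  ON p.addr = s.paddr
    WHERE  s.type = 'USER'
    ORDER  BY s.blocking_session NULLS LAST, avg_hourly_io DESC;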

(tested on Oracle Database 11g Enterprise Edition Release 11.2.0.3.0) 

Installing deployable libraries (e.g. JSF/JSTL) on WebLogic 10.3

WebLogic comes with a set of deployable libraries in {WL_HOME}/wlserver_10.3/common/deployable-libraries that can easily be deployed as shared libraries. The deployable libraries available on my WLS 10.3.5 installation are:

  • active-cache-1.0.jar          
  • jsf-1.2.war            
  • jstl-1.2.war
  • jackson-core-asl-1.1.1.war
  • jsf-2.0.war
  • rome-1.0.war
  • jackson-jaxrs-1.1.1.war      
  • jsf-myfaces-1.1.7.war  
  • toplink-grid-1.0.jar
  • jackson-mapper-asl-1.1.1.war  
  • jsf-ri-1.1.1.war       
  • weblogic-sca-1.1.war
  • jersey-bundle-1.1.5.1.war    
  • jsr311-api-1.1.1.war
  • jettison-1.1.war              
  • jstl-1.1.2.war

Because I prefer my EAR, WAR and EJB archives to be self-contained, I've experimented little with these shared libraries thus far. However, I ran into a problem where an @EJB wouldn't be injected into a managed bean. The cause was the following startup error:

<Jan 3, 2013 11:27:42 PM CET> <Error> <javax.enterprise.resource.webcontainer.jsf.application> <BEA-000000> <JSF1030: The specified InjectionProvider implementation 'com.bea.faces.WeblogicInjectionProvider' cannot be loaded.> 

The solution was to install the shared JSF 2.0 WebLogic deployable and update my weblogic.xml like this:
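Something along these lines (verify the exact library name and versions registered in your WebLogic console):

    <weblogic-web-app xmlns="http://xmlns.oracle.com/weblogic/weblogic-web-app">
        <library-ref>
            <library-name>jsf</library-name>
            <specification-version>2.0</specification-version>
            <exact-match>false</exact-match>
        </library-ref>
    </weblogic-web-app>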


This made my application use the shared library instead of its own bundled JSF library. I'm not sure why this made any difference at all (possibly a versioning problem) and I haven't found out since, but it has been running without issues.

To install and configure the JSF 2.0 Weblogic deployable archive have a look here:
http://docs.oracle.com/cd/E21764_01/web.1111/e13712/configurejsfandjtsl.htm

The idea is to use weblogic.Deployer, located in {WL_HOME}/wlserver_10.3/server/lib/weblogic.jar. Because I had some problems getting weblogic.jar into my classpath I included it directly in the java command, like this:

java -cp /u01/app/oracle/Middleware/wlserver_10.3/server/lib/weblogic.jar weblogic.Deployer -adminurl t3://localhost:7001 -user weblogic -password Welcome1 -deploy -library /u01/app/oracle/Middleware/wlserver_10.3/common/deployable-libraries/jsf-2.0.war
 
