Create your own Telegram bot with Django on Heroku – Part 9 – Creating a model for your messages

This entry is part 9 of 9 in the series Create your own Telegram bot with Django on Heroku


In the previous part of this series, I explained what a database is good for in general and for Django in particular. I also covered which relational database management systems (RDBMS) Django supports, what migrations and models are, and how to create and apply them. Further, I introduced the Django admin backend and showed how to use it to create, alter or delete data in the tables that result from applying a model’s migrations to an SQL database.

Today, we will create another database model to hold the message data that the Telegram bot will forward to our webhook in the future. I will try my best to make this a play-along part which invites everyone to follow step by step in another console. Hopefully, it gives you an idea of the thoughts and considerations involved in writing a model for a real-world problem and of how to make use of Django’s documentation resources.

Er… what did we want to do again?

In case you lost track: An incredible eight articles ago, in Part 1 of this series, I described the scope of this article series. If you have forgotten it in the meantime or just want a refresher, please pause here and read that part once more.

In Part 3, we investigated the JSON data structure of the messages the Telegram bot sends to webhooks. We will work with this data structure in this article, so there will be some natural repetition, and we will also explain the details of the fields inside that data structure. But we won’t repeat its general description. If you want to know more about it, please revisit that article as well.

Preparational thoughts

First, let’s remember what we want to achieve:
We want every registered user to be able to send text messages to our bot. We do not need other message types, like photos, audio or whatever. Messages of a different type or from unknown users should simply be dropped on receipt, without being processed or stored any further.
Valid messages should be stored in our database. These messages should then be processed and checked for some pre-defined pattern (like: “the first non-whitespace part must be a positive or negative float, followed by any string”).
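
To make the idea of such a pattern check more tangible, here is a minimal sketch using Python’s re module. The regular expression and the name parse_message are assumptions made for illustration only; the pattern the bot ends up using may well look different.

```python
import re

# Hypothetical pattern: optional sign, digits with an optional decimal part,
# then whitespace and a free-text remainder.
MESSAGE_PATTERN = re.compile(r"^\s*([+-]?\d+(?:\.\d+)?)\s+(.+)$", re.DOTALL)


def parse_message(text):
    """Return (amount, description) if the text matches the pattern, else None."""
    match = MESSAGE_PATTERN.match(text)
    if match is None:
        return None
    return float(match.group(1)), match.group(2).strip()


print(parse_message("-12.5 groceries"))
print(parse_message("hello there"))
```

The first call yields a tuple of the float and the remaining text, while the second one returns None because the text does not start with a number.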

Let’s look at a typical Telegram message again:

  • Each message data structure contains two keys: update_id, which contains an integer, and message, which contains everything else.
  • update_id is an incrementing number which identifies an “update” uniquely. That means: It can be used to clearly identify any message and to tell whether it was already received or not.
    Attention: It might happen that messages reach your bot multiple times (for example, if Telegram could not clearly confirm that the message was delivered). It might also happen that messages arrive at your hook in any order. The latter is the reason why dropping messages with a lower update_id than the highest one already in your database might lead to lost messages. Better not implement logic like that. ⚠
  • While update_id is just meta information, which helps us to determine whether we still need to process that update or not, message contains multiple data elements we want to receive, store and process. It consists of 5 keys:
    • text is somewhat obvious: This contains the text of the message.
    • message_id contains an integer value, which is unique for the scope of this chat.
    • date is also an integer. It marks the date this message was received; more on this in a minute.
    • The two remaining structures from and chat mostly contain the same info. Unless you plan to send replies in multiple languages, you most certainly do not need the value of message['from']['language_code']. The same is true for message['from']['is_bot'], since we will only process registered IDs anyway. Also, our bot does not work in any multi-user chat, which is why we do not need message['chat']['type'] either.
      Since the rest of the fields are identical, it does not matter which one we use for our code.

      • first_name and last_name are obvious, I hope.
      • id is also an integer, holding the unique user-id of the sender of the message. This can be used to decide whether the message was sent by one of our registered users or not (I anonymized mine here by replacing it with “REMOVED” as a spam prevention measure).
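
The warning about duplicate and out-of-order deliveries can be illustrated with a small sketch. It keeps a set of update IDs that were already seen instead of comparing against the highest known ID. The function name should_process and the sample IDs are made up for this example; in the real bot, the database would take over the role of the set.

```python
def should_process(update_id, seen_update_ids):
    """Return True only the first time a given update_id shows up."""
    if update_id in seen_update_ids:
        return False
    seen_update_ids.add(update_id)
    return True


seen = set()
accepted = []

# Updates arriving out of order, including a second delivery of 1002.
for update_id in [1002, 1000, 1002, 1001]:
    if should_process(update_id, seen):
        accepted.append(update_id)

print(accepted)
```

A check like “drop everything lower than the highest ID seen so far” would have thrown away 1000 and 1001 here, although neither of them had been processed before.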

That’s pretty much it! Having this figured out, we can make up our mind about which of these values we need for which purpose in our bot.

The good must be put in the dish,
The bad you may eat if you wish.

In our use case, we do not need every single value of each message. Even though you could, you do not even need to store every single message. In the end, deciding what you want to keep and what not is up to you. For several reasons, it generally makes sense to limit the amount of data you store:

  1. Waste just increases the required amount of storage you need for your database.
  2. The bigger your database, the bigger are your backups, too.
  3. A bigger database also tends to become slower than a slim one.
  4. In case of a data leak, the fewer details you stored, the less you have to confess about exposed private details of your customers.
  5. And many more …

If the previously described pattern is not found in a message, one could argue that it is preferable not to store that message, for the reasons just mentioned. In the early phase of your project, you should save these messages anyway, so that you do not risk losing something if it turns out that you forgot to implement a common pattern or that your pattern expression doesn’t work as intended. You can have them dropped later, once your code has proven to be mature and stable.

So, let’s have a look at the example data structure and compare that to our initial goal definition:

  • update_id is needed since we need to know if we processed the message we received already or not.
  • message['date'] is needed as well, to tell when a message was received by the Telegram infrastructure, even if that message hits our bot’s webhook at a different time.
  • For obvious reasons, we also need the content of a message from message['text'].
  • Finally, we need one of message['chat']['id'] or message['from']['id'] to tell who we received that message from. It doesn’t matter which one we choose, since in a private chat with the bot these two are redundant and always the same.

The rest of that data structure is not needed for our purposes.
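
As a rough sketch of this selection, the following snippet picks exactly these four values out of an update that has already been parsed into a Python dict. The function name extract_fields, the key name sender_id and all sample values except the date are assumptions made for illustration.

```python
def extract_fields(update):
    """Pick the four values we want to store from a parsed Telegram update."""
    message = update["message"]
    return {
        "update_id": update["update_id"],
        "date": message["date"],
        "text": message["text"],
        "sender_id": message["from"]["id"],
    }


# Illustrative update with made-up IDs and names.
example_update = {
    "update_id": 100000001,
    "message": {
        "message_id": 1,
        "date": 1533248578,
        "text": "Test",
        "from": {"id": 123456789, "is_bot": False, "first_name": "Jane",
                 "last_name": "Doe", "language_code": "en"},
        "chat": {"id": 123456789, "first_name": "Jane",
                 "last_name": "Doe", "type": "private"},
    },
}

fields = extract_fields(example_update)
print(fields)
```

A message without a text key (a photo, for example) would raise a KeyError in this sketch, so real code would need to check for that before extracting anything.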

So, let’s open up our editor and begin creating a model for this. Let’s start with the easy stuff: defining fields with a matching data type for the four values from a Telegram JSON structure that we want to store in our database. To do that, have a look at the characteristics of each value and read your way through the list of Django’s field types to find the best match for our model. The following changes need to be applied to the bot/models.py file.

update_id

The update_id consists of an integer. The best match in Django’s repertoire of field types is IntegerField. Since no two updates coming from Telegram’s infrastructure share the same update_id, these integers are unique, so we reflect that in our model definition as well by marking the field as unique.

message['text']

Since we are only interested in text messages, we need a data type, which can store text. Telegram’s messages have a limit of 4096 UTF-8 characters. So, we need something that can store this amount of text.

Crawling through the list of Django’s field types, we will find that there are at least two possible field types for that: CharField and TextField. So – which one to choose?

Both field types store strings, but they differ in two ways. On the database side, CharField becomes a length-limited VARCHAR column, while TextField becomes a TEXT column without a length limit. In forms, they are rendered differently by Django: CharField is rendered as a TextInput by default, TextField is rendered as a Textarea. Both field types know the max_length attribute, but only CharField enforces it at the model and database level. TextField just doesn’t accept larger inputs in the generated form fields (e.g. in the admin backend), but from your code, you can add longer texts to the database, since there’s no restriction on the SQL level.

Let’s use TextField for message['text'] and define the max_length=4096 attribute for it since, for all we know, we do not need to expect larger texts coming from Telegram.

message['date']

To store this value, we need to understand what format it is in. 1533248578 looks more like an integer than a date, doesn’t it? You could make this an IntegerField again, but then you would need to convert it again and again to be able to work with it in your code. The worst issue with doing so, however, is that Django offers methods and features for the different data types which you simply cannot use if your model definition does not choose the data type that reflects the nature of the data best. You can compare this with what is built into Python’s str and int objects.

Technically, you can turn 12345 into a string and store that object in a variable instead of an integer.

That is perfectly valid code, and there’s no such thing as a Python police stopping you from doing so. But does it make sense? In the majority of cases, the answer is “no”, I think. You can’t use it for math, and you can’t tell whether it is greater or smaller than 5, since the comparison operators don’t work between a string and an integer, and so on. But: You can do great things like calling .upper() on the string "12345" now. Impressive, yes? … not really 😋
It’s the same with Django’s field types: the better your data type matches the “real world” meaning of your data, the more useful it will be for your code.
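
A few lines of plain Python are enough to see the difference; the variable names are chosen for this example only:

```python
as_text = "12345"
as_number = 12345

print(as_number + 1)     # arithmetic works on the integer
print(as_number > 5)     # and so do comparisons with other numbers
print(as_text.upper())   # valid for the string, but pointless for digits

# Comparing the string with an integer is not supported in Python 3.
try:
    as_text > 5
    comparison_failed = False
except TypeError as error:
    comparison_failed = True
    print(f"Comparison failed: {error}")
```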

Well – the integer-like value of 1533248578 somehow represents a date. How’s that? In fact, it does not only represent a date but a date and time. It’s known as Unix time, POSIX time, or Unix Epoch time (seconds since the epoch) and defines a date and time by the number of seconds passed since 00:00:00 UTC on January 1st, 1970; it’s quite popular in the Unix world, actually.
So: A timestamp of 1533248578 refers to Thu Aug 2 22:22:58 2018 (UTC).
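
You can verify this conversion yourself with Python’s standard library. The sketch below converts the timestamp to a timezone-aware datetime in UTC; the variable names are just for this example.

```python
from datetime import datetime, timezone

timestamp = 1533248578

# Interpret the seconds since the epoch as a timezone-aware UTC datetime.
received_at = datetime.fromtimestamp(timestamp, tz=timezone.utc)

print(received_at.isoformat())  # 2018-08-02T22:22:58+00:00
```

Passing tz explicitly avoids surprises: without it, fromtimestamp returns the time in the local timezone of the machine running the code.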

Back to our database field type, this means that we need to find a type in Django’s model field type reference which supports dates including time. TimeField? No, no date support. DateField? No support for a time. DateTimeField looks like that’s it! Let’s add that to our model. Since we want to define a default value this time, we also need to add another import to the file: timezone from django.utils.

message['from']['id']

This will be a bit more tricky, since we will define this as a so-called “foreign key”. In SQL, this is a field which uniquely identifies a row in either the same table or another one. That’s why each table needs a “primary key”: something that allows a row to be identified uniquely, since no other row can possibly contain that same value. In a customer table, this may be the customer number, for example. It is also possible to declare multiple columns together as a primary key when they are only unique in combination. For example, a song may be performed by different artists, so only the combination of “artist” + “song name” is unique.

Back to our application: In the previous part of this article series (Part 8), we already created a User model to hold the Telegram user-id and the names of our users. Since we most certainly want to associate the messages we receive with the names of our registered users, let’s link each message’s message['from']['id'] value to the matching record of our User model, so that we can look up user details like first and last name from there. Maybe we even add additional details to that table in the future, like an e-mail or postal address, which can then be associated with each message of the same user-id. Adding a foreign key to our Message model is done using the ForeignKey field type.

Also, we need to define what happens to the messages if the user they are associated with is deleted from the table of our User model. We have a few choices here, predefined by how SQL works and explained in the “Arguments” section of the ForeignKey field type documentation. Let’s define that all messages associated with a deleted record of the User model are deleted as well, by setting on_delete=models.CASCADE in the field definition.
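
To see the concepts of unique columns, foreign keys and cascading deletes at work on the SQL level, here is a small sketch using Python’s built-in sqlite3 module and an in-memory database. The table and column names are made up for this example and are not the ones Django generates. Also note that Django’s documentation describes on_delete as being emulated by the ORM rather than created as an SQL constraint, so this only illustrates the idea.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys if this is switched on per connection.
connection.execute("PRAGMA foreign_keys = ON")

connection.execute("CREATE TABLE bot_user (id INTEGER PRIMARY KEY, first_name TEXT)")
connection.execute(
    """
    CREATE TABLE bot_message (
        id INTEGER PRIMARY KEY,
        update_id INTEGER UNIQUE,
        text TEXT,
        sender_id INTEGER REFERENCES bot_user (id) ON DELETE CASCADE
    )
    """
)

connection.execute("INSERT INTO bot_user (id, first_name) VALUES (1, 'Jane')")
connection.execute(
    "INSERT INTO bot_message (update_id, text, sender_id) VALUES (1000, 'Hello', 1)"
)

# A second row with the same update_id violates the UNIQUE constraint.
try:
    connection.execute(
        "INSERT INTO bot_message (update_id, text, sender_id) VALUES (1000, 'Again', 1)"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Deleting the user removes the messages that reference that user.
connection.execute("DELETE FROM bot_user WHERE id = 1")
remaining = connection.execute("SELECT COUNT(*) FROM bot_message").fetchone()[0]

print(duplicate_rejected, remaining)
```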

Polishing and admin backend registration

Like we did with the User model already, let’s change the way the model’s elements are represented in the admin backend and register the model with the admin backend.

First, change the representation of our model to display the message text by overriding the definition of __str__.

With that, our model definition has reached its final form so far, satisfying all our current needs.
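
Putting the pieces from this article together, a complete bot/models.py might look roughly like the following sketch. The field names of Message (update_id, text, date, sender), the default of timezone.now and the shape of the User model are assumptions; adapt them to the model you created in Part 8. By default, a ForeignKey points at the primary key of the referenced model. Being part of a Django project, this file is not meant to be run on its own.

```python
from django.db import models
from django.utils import timezone


class User(models.Model):
    # Stand-in for the User model from Part 8; these field names are assumptions.
    user_id = models.IntegerField(unique=True)
    first_name = models.CharField(max_length=64)
    last_name = models.CharField(max_length=64)

    def __str__(self):
        return f"{self.first_name} {self.last_name}"


class Message(models.Model):
    # Telegram's update_id identifies each update exactly once.
    update_id = models.IntegerField(unique=True)
    # max_length limits the generated form field only, not the database column.
    text = models.TextField(max_length=4096)
    # Falls back to the current time if no value is supplied (assumed default).
    date = models.DateTimeField(default=timezone.now)
    # Deleting a user also deletes all messages linked to that user.
    sender = models.ForeignKey(User, on_delete=models.CASCADE)

    def __str__(self):
        # Show the message text in the admin backend's list views.
        return self.text
```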

Next, open the file bot/admin.py and make the following changes:

  1. Add Message to the list of imports from .models like this: from .models import User, Message
  2. Add the line admin.site.register(Message) to the file.

Now, save everything. Let’s apply the migrations and double-check the results before we deploy to production as the final step of this part.

Creating and applying the migrations

Open up a shell, navigate to your project, optionally activate its virtualenv and execute python manage.py makemigrations.

Checking with Git what this has caused shows that one migration file was created as desired, in addition to the files we changed manually before.

Let’s fire up python manage.py migrate to have the migrations applied to our database.

Our table fits into the SQL architecture like this:

Validating the results in the admin backend, you should now see that “Messages” shows up in the “BOT” section, and that adding a record using the admin backend presents nicely rendered fields to manipulate our data in a convenient way.

The date and time can be picked using convenient selectors, the text of up to 4096 characters is rendered as a Textarea instead of a single-line input field, and so on. What’s also pretty interesting: We can select one of the users we already registered manually to associate with the message. That might give you a nice idea that what we just tried to achieve does work like a charm! 👍

Deploy the changes to production

Using Heroku for our hosting, we have quite a few convenient tools at hand. One of them is the ability to connect to our remote production database without needing to enter any credentials or remember any names. While in your project folder, simply connect to your production database using the Heroku command-line client, for example with heroku pg:psql.

As we can see, there are no tables in our production database yet; fair, since we never applied any migrations to it. Let’s do this now for the first time.
Let’s first deploy our latest files by committing the changes to Git and pushing the commits to our Heroku remote, which triggers a new deployment.

That’s it! Our files are deployed, but still, that does not apply any migration to our database. Let’s do that as a final step for this part of the series.

Applying migrations to our production environment

Basically, you can execute any command you wish using the way I’m about to show you now; this time, we will apply our migrations to the production database. We will do that using a so-called “One-Off Dyno” with our Heroku account.
Unlike the regular dynos, which are part of the dyno formation defined by the Procfile to operate our app’s regular business, One-Off Dynos only have a limited life span to execute specific administrative or maintenance tasks. They are deleted again upon logout, leaving behind nothing but the permanent changes to the database or similar resources.

You create one and log in to it in one easy step, for example with heroku run bash.

Awesome, isn’t it? We created and logged into a remote server-like environment which only lives to fulfill a small task and will be deleted again afterwards with nothing left of it, without the need to care about login security or to do anything manually. If we logged out of it now (Ctrl+d or typing “exit”, Enter), nothing would have changed; we only issued read operations. If your app were operating in production already, this additional system would not affect it in any way: It would neither slow down your app by consuming resources like I/O or CPU, nor make it unavailable for a few seconds. Only if you manipulated or destroyed the database would that affect the production site, for obvious reasons.
For now, imagine you just had the same database configured on a second instance of your code, just like you had these credentials entered into the local copy on your workstation.

Let’s initialize the database for Django, apply all outstanding migrations and create a superuser for it by running python manage.py migrate followed by python manage.py createsuperuser.

Let’s see if that did anything to our database. Log out of that One-Off Dyno and reconnect to the remote PostgreSQL database 🐘.

BAAM; there they are: Our precious tables! 👊
With this out of the way, nothing should prevent us from logging in to the admin backend using the credentials of the superuser we just created. As a shortcut: Execute heroku apps:open. That should spawn a browser and navigate it directly to your application’s site without the need to remember its address.

Outlook for the next part of the series

You just learned how to craft models tailored for your use-case and how to deploy and apply these to your Heroku remote production environment. Also, you saw a few more handy heroku command-line client commands and learned how to make use of them for your development.

In the next article of this series, we will create a view which receives, filters and stores these messages in the database, and a URLconf to expose that view, serving as the interface to provide to your Telegram bot as its webhook.
That means that by the end of that part, our Telegram bot will be ready to be launched in production and to receive any amount of messages you and trustworthy users send to it. 🤯

If you liked or disliked this article, I’d love to read that in the comments!

🐍 Enjoy coding! ❤


Born in 1982, Marc Richter has been an IT enthusiast since 1994. He became addicted when he first got his hands on the family’s PC and has never stopped investigating and exploring new things since then.
He is married to Jennifer Richter and proud father of two wonderful children, Lotta and Linus.
His current professional focus is DevOps and Python development.
