Monday, September 14, 2015

Setting up nginx + HHVM (hacklang) on OpenShift

I have recently moved my personal website from Google App Engine to OpenShift. I was looking at the list backend technologies supported by OpenShift and HHVM piqued my interest. I remembered how Facebook introduced a new language a few years ago and I never tried it out. So I decided to give it a try now. Fortunately, there is an OpenShift quickstart available for HHVM and we don't need to do any complex configuration to set it up.

First of all, you need to sign up for OpenShift (it's free!). Now, that you have an OpenShift account, go ahead and create a new application. You will be shown a list of popular cartridges and quickstarts available. Select the HHVM quickstart. The next page will ask more details about your application like the sub-domain name, scaling options etc. Fill in the details as you wish and create the application.

Now that your application is ready, you need a way to make changes to it. For that, you need to install rhc (OpenShift client tools) and manage your source code with git. Please refer to OpenShift's help section for a detailed explanation.

At this point, I faced a problem. I added some Hack code and pushed to the server. To my surprise, it didn't work as expected. In fact, the page displayed nothing! On the contrary, PHP code worked fine. So, I logged into the application gear and examined the HHVM log file. The following error was being thrown:
Hack typechecking failed: typechecker command not found
For some reason, the typechecker command hh_client is unavailable in the current path. I wasn't able to figure out why this was so but I figured out a workaround for this problem. We can fix it by disabling automatic typechecking. Add the following line to config/hhvm.d/config.ini.erb in your application code repository:
hhvm.hack.lang.auto_typecheck = false
This will disable automatic typechecking which means that before pushing your code to the server, it is advisable to run typechecking on your local machine.

Sunday, February 22, 2015

Introduction to tries through Facebook Hacker Cup 2015 Round 1 - Autocomplete

In the last week's post I discussed the solution to the problem "Homework" from Facebook Hacker Cup 2015 Round 1. This post is about the problem "Autocomplete" from the same round.

You can  read the problem statement available on the Facebook website. If you don't have a Facebook account, you can download problem statements from Codeforces Gym.

I will re-state the problem in fewer lines. You have a new phone which has an auto-complete feature. But, it will complete your word only if there is there is no ambiguity on how the word should be completed. Formally, it will complete a word iff there is exactly one way to complete the word. It so happens that the phone's dictionary is empty by default. So, for the auto-complete feature to work, before typing a word you need to add that word to the dictionary. You need to send a message containing \(N\) words. What is the minimum number of characters must you type to send all \(N\) words? (Note that the typing involved in adding a word to the dictionary is not counted.) You need to solve \(T\) instances of this problem.

Here are the constraints:
  • \(1 \le T \le 100\)
  • \(1 \le N \le 100,000\)

The solution idea is pretty straightforward. For any word \(w\), the minimum characters that have to be typed is equal to length of the shortest prefix \(p\) of \(w\) such that there is no word in the dictionary that has a prefix \(p\). It turns out that there is a very elegant solution using tries.

What is a trie?

A trie is a data structure used to store string keys for fast look-up. There are many resources online that have explained tries better than I can. So, I am just gonna give an brief overview here and point you to useful online resources. Later, I will how tries can be implemented easily in a programming contest environment.

Typically, a trie looks like this.

Notice that every node contains 26 pointers corresponding to each letter in the English alphabet. The topmost node in the above figure is called the root node. We see that only the pointers corresponding to M, P and T are depicted. We can assume the other pointers to be NULL i.e. there are no words starting with characters other than M, P and T. When we follow the M pointer we reach another node which has pointers only for A and E. This means that there are words in the trie with prefixes MA and ME. If we continue following the E branch we reach a node with a Δ symbol. The Δ symbol signifies that a word ends there. This means that the word \(\text{MENDEL}\) is present in the trie. This node also has a pointer corresponding to E. This is not the beginning of a new word. It is just a continuation of \(\text{MENDEL}\). Following the pointers leads us to the word \(\text{MENDELEEV}\). Similarly, the words \(\text{MAXWELL}\), \(\text{PASTEUR}\), \(\text{PAVLOV}\), \(\text{PEANO}\), \(\text{POINCARE}\), \(\text{POISSON}\) and \(\text{TURING}\) are present in the trie.

To know more about tries, follow these links:

If you have visited above links you would have noticed that tries have complicated and messy implementation involving dynamic memory allocation. This is unsuitable for programming contests. Here is a more contest-friendly trie implementation.
int a[N][26];
char s[N];
int n = 1;

scanf("%s", s);
int t = 1;
for (int i = 0; s[i]; ++i) {
  int p = s[i] – 'a'
  if (a[t][p] == 0) {
    n++;
    a[t][p] = n;
  }
  t = a[t][p];
}
The above shows how to implement the insert operation in a simple way. All we are doing here is simulating dynamic memory allocation in place of actually using it. In the above code a[][] denotes the trie and n denotes the index of the next available node. If you have read the GeeksforGeeks article it should not be very difficult to understand how the above code works. Can you figure out how to write code for the look-up operation? Here is my implementation:
bool isPresent(char *s) {
  int t = 1;
  for (int i = 0; s[i]; ++i) {
    int p = s[i] - 'a';
    if (a[t][p] == 0) return false;
    else t = a[t][p];
  }
  return true;
}
Know that we know how tries work and how to implement them, let's solve the original problem. As I pointed out earlier, we need to find, for every word, how much of the word has to be typed such that the rest of the word can be uniquely identified. This is the same as finding the shortest prefix of the word that is not present in the trie. Let's follow this algorithm:
for every word w do:
  add it to the trie in the following way:
    during adding count the length of the longest prefix that is already present
    call this length L
  add L + 1 to the answer
repeat
This algorithm leads to a simple implementation. You don't even need to write a separate function for look-up. Here is my implementation:
const int N = 1000050;

int a[N][26];
char s[N];

int main() {
  int t;
  scanf("%d", &t);
  long long ans;
  for (int tt = 1; tt <= t; ++tt) {
    printf("Case #%d: ", tt);
    int nn;
    scanf("%d", &nn);
    ans = 0;
    int n = 1;
    for (int i = 0; i < nn; ++i) {
      scanf("%s", s);
      int t = 1, ct = 0;
      bool end = false;
      for (int j = 0; s[j]; ++j) {
        int p = s[j] - 'a';
        if (!end) ct++;
        if (a[t][p] == 0) {
          n++;
          a[t][p] = n;
          end = true;
        }
        t = a[t][p];
      }
      ans += ct;
    }
    printf("%lld\n", ans);
    memset(a, 0, sizeof(a));
  }
  return 0;
}
You can test your solution by submitting it in the Codeforces Gym or by downloading the official input and output files. Check the Hacker Cup Facebook page for links.

Thanks for reading!

Saturday, February 14, 2015

Problem discussion: Facebook Hacker Cup 2015 Round 1 - Homework

It's been a long time since I last wrote on this blog and a lot has happened since then. The most important of all is that my personal website is up at anirudhrb.com and consequently this blog can now be accessed at blog.anirudhrb.com. Also, you can now reach through email at mail@anirudhrb.com.

Let's get down to business now. The online part of Facebook Hacker Cup 2015 concluded recently and the top 25 contestants have been selected to compete in the onsite finals. The problem I am writing about in this post appeared in the Round 1 (there were 3 rounds).

The title of the problem is "Homework" and here is the abridged statement:
Given three integers \(A\), \(B\) and \(K\), find the number of integers in range \(\big[A, B\big] \) that have a "primacity" of \(K\). Primacity of a number \(N\) is defined as the number of distinct prime factors of \(N\). You have to process \(T\) such queries.
The constraints are as follows:
  • \(2 \le A \le B \le 10^{7}\)
  • \(1 \le K \le 10^{9}\)
  • \(1 \le T \le 100\)
The first thing to notice here is that the number of prime factors of \(10^{7}\) is not very large. In fact, it is at most \(8\). So, the maximum primacity of a number within the given constraints is \(8\). To know more about this read about Primorial on Wikipedia.

We can modify the sieve algorithm to calculate the primacity of each number in the range \( \big[1, 10^{7}\big] \) in the following way: whenever you "cross-out" a number designating it as non-prime, increment the value at that position. This works because for any number \(x\), the primacity is equal to the number of primes less that \(x\) that divide it which is the same the number of times it is crossed out in the sieve algorithm.

Now that we know the primacity every possible number within the constraints, we need to think about how to answer the queries efficiently. Notice that the answer can be expressed as follows
answer = answer for \(\big[1, B\big]\) \(-\) answer for \(\big[1, A - 1\big]\)
We can answer queries in \(\mathcal{O}(1)\) time with some pre-computation. The pre-computation can be done this way
Let c[n][p] denote the number of integers in [1, n] with primacity p then,
c[n][p] = c[n - 1][p] + 1   if primacity of n is p
c[n][p] = c[n - 1][p]       if primacity of n is not p
We do this pre-processing for all possible values of primacity and answer each query in \(\mathcal{O}(1)\) time.

You can test the correctness of your solutions by submitting it in the Codeforces Gym (login required) or you can download the official input and output files and test your solution against them. Here's my C++ code (shortened). You can find the full code on GitHub.

const int N = 10000001;

bool mark[N];
int p[N], ct[N][8];

int main() {
  mark[1] = false;
  for (int i = 2; i < N; ++i) {
    if (mark[i]) continue;
    for (int j = 1; i * j < N; ++j) {
      mark[i * j] = true;
      p[i * j]++;
    }
  }
  for (int i = 2; i < N; ++i) {
    for (int j = 0; j < 8; ++j)
      ct[i][j] = ct[i - 1][j];
    ct[i][p[i] - 1]++;
  }
  int t, a, b, k;
  scanf("%d", &t);
  for (int tt = 1; tt <= t; ++tt) {
    scanf("%d %d %d", &a, &b, &k);
    printf("Case #%d: ", tt);
    if (k > 8) printf("0\n");
    else {
      int ans = ct[b][k - 1] - ct[a - 1][k - 1];
      printf("%d\n", ans);
    }
  }
  return 0;
}

I will be back next week with the next problem "Autocomplete" from the same round.

Thanks for reading!

Saturday, December 13, 2014

Problem discussion: ShufflingCardsDiv2 - Topcoder SRM 641 900pt

Topcoder SRM 641 happened a few days ago and featured an unusually easy problem set for division 2. Before continuing reading the post, please read the problem statement (login required).

Now that you have read the problem statement, let me state the most important fact in the problem statement

The cards are shuffled exactly twice.

This is very important because it makes the problem much easier. Since there are only two shuffles, there are a limited number of steps to be followed.

Let’s analyze the shuffling procedure closely. In the beginning we have \(2N\) cards numbered from \(1\) to \(2N\). These cards are divided into two piles: one containing the first half cards \(1 \cdots N\) and the other containing the remaining cards. Let’s denote the cards which have number less than or equal to \(N\) as \(A\) and the others as \(B\). Now, the two piles will look like this

\(\text{A A A A A …}\)
\(\text{B B B B B …}\)

After performing the interleaving, we will get a deck that looks like

\(\text{A B A B A B A B A B . . .}\)

Now, we need to shuffle this deck again. Let’s assume that \(N\) is even. (We will cover the odd case later.) Divide the deck into two piles.

\(\text{A B A B A B …}\)
\(\text{A B A B A B …}\)

The result after interleaving the above piles is given to us as input. Note that the two piles shown above are not exact i.e for any of the above piles we can take any permutation of the cards for interleaving. But, one condition still holds:

In any pile, the number of \(A\)s is equal to the number of \(B\)s.

After interleaving, all cards in the top pile will appear at even indices (assuming 0-based indexing). So, we need to check all even indices of the given array/vector and count the number of \(A\)s and \(B\)s. If they are equal, then return "Possible" otherwise return "Impossible".

As I have said before, this solution is valid only if \(N\) is even. When \(N\) is odd the number of \(A\)s and \(B\)s need not be same. In fact, the number of \(A\)s will be 1 greater than the number of \(B\)s.

Here’s the final code in C++:

string shuffle (vector<int> permutation) {
int countA, countB;
int n = permutation.size();
for (int i = 0; i < n; i += 2) {
if (permuation[i] <= (n / 2))
++countA;
else
++countB;
}

if ((n / 2) % 2) {
if (countA - 1 == countB)
return "Possible";
else
return "Impossible";
} else {
if (countA == countB)
return "Possible";
else
return "Impossible";
}
}

Wednesday, October 8, 2014

Problem discussion: Timus OJ 1225 Flags

This was one of the interesting problems I solved last week. I was just checking out Timus OJ when I stumbled across it. Here is the abridged problem statement (read the original statement at Timus OJ):
Given \(n\), the number of stripes, find the number of distinct ways the stripes can be colored if the only colors that can be used are white, blue and red. These constraints should be satisfied:
  • No two consecutive stripes should be of the same color
  • A blue stripe must always be between a red and a white one or a white and a red one.
If you want to try to come up with a solution yourself first, now is a good time. I am going to explain my solution in the next paragraph. If you want to submit your solution you need to register at http://acm.timus.ru.

The first thing to notice is that the last stripe cannot be blue since a blue stripe can appear only between a red and a white stripe. So, the solution to the problem is \[\text{number of ways in which red is last + number of ways in which white is last}\] But how do we find those quantities?

If the last stripe is red, the stripe before it has to be either white or blue. Suppose we have a number of arrangements in which the second stripe from the last is white. All these arrangements contribute to the number of arrangements with last stripe red because we can color the last stripe red in all such arrangements. But, this is not the only way we can get arrangements with the last stripe red because we haven’t yet considered the case in which the second to last stripe is blue.

Now, suppose that the second to last stripe is blue. The stripe that comes before this can be either red or white. If that stripe is red then the last stripe cannot be red so we ignore that case. So, all arrangements in which the third stripe from the last is white can have the last stripe red if we add a blue and a red stripe (in that order). Observe that there are no more ways of getting the last stripe as red other than the two discussed (convince yourself of this).

Similar arguments can be made for finding the number of ways in which the last stripe is white. It is clear that we have managed to come up with a recursive solution to the problem. Let’s state the recurrence relation in a bit more formally.
Let \(f(i, c)\) be the number of ways of coloring the stripes \(1,\, 2,\, ...\, ,\,i\) such that the \(i^{th}\) stripe has the color \(c\).
\[ \begin{equation} f(i, c) = \begin{cases} f(i - 1, red) + f(i - 1, white) & \text{if } c = blue, \\ f(i - 1, white) + f(i - 2, white) & \text{if } c = red, \\ f(i - 1, red) + f(i - 2, red) & \text{if } c = white. \end{cases} \end{equation} \] So, the final answer will be \[ \begin{equation} answer = f(n, red) + f(n, white) \end{equation} \] where \(n\) is the total number of stripes.
The next step is to observe that this recursive solution has overlapping sub-problems. If it is not obvious, you can convince yourself of this by drawing a recursion tree. So, we need to use dynamic programming to speed up our solution. We have \(3N\) dp states (why?) and hence the time complexity is \(\mathcal{O}(3N) = \mathcal{O}(N)\).

As always I will leave the implementation to you. But, if you are stuck you can have a look at my solution.

Have a nice day!

Saturday, September 27, 2014

Introduction to Disjoint-set data structure through UVa 10583 – Ubiquitous Religions

Its been quite a while since I last wrote and I feel terrible about it. So, I have decided that, starting from today, I will try to publish at least one post every week. Through these weekly posts, I will discuss the most interesting algorithmic problem I solved during the week. Some of them, like this one, will be accompanied with an introduction to a new data structure or algorithm I had to learn to solve the problem. Now, lets start disjoint-set data structure without further ado.

Lets first analyze the problem and understand why we need disjoint-set data structure. Here is an abridged version of the problem statement (link to original problem statement):
There are n \((1 \leq n \leq 50000 )\) students in a university and for some given m \(( 0 \leq m \leq \frac{n(n - 1)}{2} )\) pairs of students you know that the students in each pair follow the same religion. Find the maximum number of religions that the students in the university believe in assuming that one student can follow exactly one religion.
After reading the statement, we can make the following observations:
  • If a and b follow the same religion and b and c follow the same religion, then a and c follow the same religion
  • All the n students of the university can be divided into different sets such that all students in the same set follow the same religion. Initially (i.e. before scanning the input), each student is in a different set.
  • For a pair \( (a, b) \) in the input we need to find sets \(A\) and \(B\) such that \(a \in A\) and \(b \in B\) and combine them i.e. we have to find \( C = A \cup B\) and replace both \(A\) and \(B\) with \(C\). (Note that since a student cannot follow more than one religion, the sets \(A\) and \(B\) are either equal or disjoint.)
So, we need a need a data structure that can, given \(a\) and \(b\), find sets \(A, B\) such that \(a \in A\) and \(b \in B\) and then find \(C = A \cup B\) efficiently. This is where disjoint-set data structure comes in.
A disjoint-set data structure, also called a union-find set or merge-find set, is a data structure that keeps track of a set of elements partitioned into a number of disjoint sets.
The most efficient implementation is using rooted trees and a couple of heuristic and that’s the only implementation I am going to discuss here although I urge you to explore the linked list implementation and understand why it is slow. Each set in the disjoint-set forest is represented by a tree in which each node points to its parent and the root of each tree is called the representative of the set. A representative of a set is an element of the set that is used as an identifier for the set i.e every set can be identified by its representative. This data structure should be able to perform these operations efficiently:
  • make-set(x) : creates a new set with \(x\) as its only member
  • find(a) : finds the set that contains \(a\) i.e returns the representative of the set that contains \(a\)
  • union(a, b) : merges (union) the sets containing a and b i.e it computes \(A \cup B\) where \(a \in B\) and \(b \in B\). This is achieved by finding the representative of \(a\) and \(b\) and then making one of them the parent of the other.
Assuming that you explored the linked list implementation, you will notice that this is not faster than that in the worst case. To make it faster we need to use two heuristics: union by rank and path compression.

Union by rank In the union() operation above there are no specific instructions on which set should be the parent. It is shown that optimal performance is achieved when the set with the higher rank is made the parent. Rank can be defined as an upper bound on the number of edges between in the longest path between the root and a leaf.

Path compression During the find() operation, we travel from a given node to the root of the tree. During this traversal we can set the parent for each node in the path to be the representative element thereby reducing the height of the tree and making the subsequent find() operations faster. Note that the application of this heuristic does not change the rank because the rank is just an upper bound (which is useful for proving time complexity).

It’s now time to implement this data structure. Here’s the pseudo code:
 Let parent[1 ... n] and rank[1 ... n] be arrays such that
 parent[x] stores the parent of node x and,
 rank[x] stores the rank of node x.

 PROCEDURE make-set(x):
 1.    parent[x] := x
 2.    rank[x] := 0

 PROCEDURE find(x):
 1.    if (parent[x] != x) // path compression
 2.       parent[x] := find(parent[x])
 3.    return parent[x]

 PROCEDURE union(x, y):
 1.    xRoot := find(x) // find the set containing x
 2.    yRoot := find(y) // find the set containing y
 3.    if (xRoot == yRoot)
 4.        return
 5.
 6.    // use union by rank heuristic to merge
 7.    if (rank[xRoot] < rank[yRoot])
 8.        parent[xRoot] := yRoot
 9.    else if rank[xRoot] > rank[yRoot]
10.        parent[yRoot] := xRoot
11.    else
12.        parent[yRoot] := xRoot
13.        rank[xRoot] := rank[xRoot] + 1

It can be proved that the time complexity for union and find operations is \(\mathcal{O}(\alpha(n))\) where \(\alpha(n)\) is the very slowly growing inverse Ackermann function. The value of \(\alpha(n)\) is less than 5 for all practical values of \(n\).

Now that we know how the data structure works and how to implement it, solving the problem is straightforward. But, there’s a problem. The problem requires us to calculate the number of disjoint sets that have formed. To do this we just need to use a variable to keep track of the number of sets. Whenever make-set() is called, the variable has to be incremented by 1 and whenever union() is called the variable is decremented by 1 (Why?). Take it as a challenge to add this feature to your implementation. Here’s the algorithm (without the set counter) to solve the problem:
 PROCEDURE Ubiquitous-Religions():
 1.    // add each student to an individual set
 2.    for i = 1 to n:
 3.        make-set(i) // add set {i} to the forest
 4.
 5.    // process each of the m pairs
 6.    for each pair (a, b):
 7.        union(a, b) // a & b follow the same religion
 8.
 9.    // the answer is the number of disjoint sets
10.    answer := getNumberOfSets()
11.    
12.    return answer

As always, I will leave out the implementation specific details and encourage you to implement the data structure and the above algorithm yourself instead of copying someone’s code off the internet. If you don’t know where or how to start you can have a look at my solution to the problem (in Java). Also, if you want to submit your solution at UVa Online Judge be mindful of the input and output specifications.

You can find more problems related to disjoint sets at UVa Online Judge.

Have a nice day!

Wednesday, April 2, 2014

DotA 2 offline with bots (without Steam)

It is very strange and annoying that to play DotA 2 you have to log in to Steam and be connected to the internet. But this might not always be possible. In my case, I use my college internet and the ports required for Steam are blocked. So, I can’t use Steam (except for when I am at home). And since I can’t use Steam, I can’t download and play DotA 2 the usual way. So, in this post I am going to describe what I did to make DotA 2 start without Steam.

First of all, I asked my friend to give me a copy of his SteamApps folder. After I got it, I copied the folder into my computer and navigated to the DotA 2 directory and started dota.exe. After a few seconds the intro video (that bald guy) showed up followed by the loading screen. Then, suddenly, the loading stopped and a message appeared on the screen: “Steam missing or out of date”.

Now, to play offline with bots, the first thing you need to do is to cause the console to open whenever the game starts up. Once you get the console, you can use some commands to start an offline game. To cause the console to show at game startup you can execute dota.exe with the command line argument -console. You can do this through a command prompt or by creating a shortcut. In this article, I will use the command-line approach. Type this command into the Windows command prompt.

dota.exe -console -novid

The argument -novid is just stop the intro video from appearing. The loading screen will directly appear. Now, you should see the loading screen. After a few seconds, you will get a black screen with nothing but the console displayed. Now that you have the console, type the following commands one after the other.

sv_lan 1
sv_cheats 1
dota_start_ai_game 1
dota_bot_set_difficulty 1
map dota


After the last command the game will load and you will be taken to the all pick screen. Just pick your favorite hero and start playing!

NOTE: This works even if you start Steam in offline mode. But, if you don’t have Steam, please make sure you are not connected to the internet before doing this. For some reason it doesn’t work if you are connected to the internet. Weird things start happening like: your hero not moving, all bots spawning on radiant side etc.