Yifu's Blog

Notes on a Kafka Producer Losing Messages

Posted on 2020-03-31 Edited on 2020-04-03 In Programming

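Background

Reading about 1M rows from Hive with Spark SQL and writing them to Kafka in one batch lost roughly 20% of the data.

The only producer setting was acks=all, and records were sent with the asynchronous send().

The Kafka cluster has 4 brokers.

Troubleshooting

First, I noticed that the producer did not call flush() when the program finished. Adding flush() did not help.

From Kafka: The Definitive Guide I learned that setting the retries parameter makes the producer automatically retry retriable errors. Setting retries=100 did not help either.

Since retries could not solve the problem, the next step was to print the exceptions: pass send() a Callback that prints the exception on failure, and count the failures along the way.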
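The idea of counting failures in a callback can be sketched roughly as follows. This is only an illustration of the pattern, not the code from the post: FakeProducer is a hypothetical stand-in that I made up so the sketch runs on its own, and it is not the Kafka client API. The rule that records containing "bad" fail is likewise an assumption for the demo.

```typescript
// Hypothetical stand-in for a producer; NOT the real Kafka client API.
type SendCallback = (error: Error | null) => void;

class FakeProducer {
  private pending: Array<() => void> = [];

  // Queue a record; the outcome is reported later through the callback,
  // similar in spirit to an asynchronous send().
  send(record: string, callback: SendCallback): void {
    this.pending.push(() => {
      // Demo assumption: any record containing "bad" fails.
      const failed = record.indexOf("bad") !== -1;
      callback(failed ? new Error("failed to send " + record) : null);
    });
  }

  // Deliver everything queued so far, similar in spirit to flush().
  flush(): void {
    const tasks = this.pending;
    this.pending = [];
    tasks.forEach((task) => task());
  }
}

let failureCount = 0;
const failureMessages: string[] = [];
const producer = new FakeProducer();

for (const record of ["a", "bad-1", "b", "bad-2"]) {
  producer.send(record, (error) => {
    if (error !== null) {
      // Record the failure instead of silently dropping it.
      failureCount += 1;
      failureMessages.push(error.message);
    }
  });
}
producer.flush();
```

Without the callback, a failed asynchronous send would go unnoticed here, which is exactly the kind of silent loss the troubleshooting is trying to surface.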

Several Ways to Remove DOM Nodes on the Client Side

Posted on 2020-01-31 In Programming

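Some sections of the Bilibili home page are a real eyesore, so I needed a way to deal with them.

Preparation

My everyday browser is Chrome, so I use Tampermonkey to load the script.

The principle is simple: once the page has finished loading, Tampermonkey runs a piece of custom JavaScript that manipulates the DOM tree.

Using Chrome's Inspect Element, I found that the ids of the elements to remove are bili_report_live and reportFirst2.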
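One straightforward way to do the removal might look like the sketch below. The two ids come from the text above; the helper name removeByIds and the small NodeLookup and RemovableNode interfaces are my own, introduced only so the sketch does not depend on browser typings.

```typescript
// Minimal structural types, so the sketch compiles without DOM typings.
// In a browser, the global `document` has this shape.
interface RemovableNode {
  remove(): void;
}
interface NodeLookup {
  getElementById(id: string): RemovableNode | null;
}

// Ids of the unwanted sections, as found with Inspect Element.
const unwantedIds = ["bili_report_live", "reportFirst2"];

// Remove every element that exists; return the ids that were actually removed.
function removeByIds(doc: NodeLookup, ids: string[]): string[] {
  const removed: string[] = [];
  for (const id of ids) {
    const node = doc.getElementById(id);
    if (node !== null) {
      node.remove();
      removed.push(id);
    }
  }
  return removed;
}
```

In a userscript this would presumably be called as removeByIds(document, unwantedIds) once the page has loaded. Ids that are missing from the page are simply skipped.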

An Introduction to ForkJoinPool

Posted on 2019-12-30 Edited on 2019-12-31 In Programming

TL;DR

ForkJoinPool is a thread pool that implements work stealing, and all of its threads are daemon threads.

A Few Examples 🌰

// Java 8+ (CompletableFuture was added in Java 8)
CompletableFuture.supplyAsync(() -> 10)
  .thenCombineAsync(CompletableFuture.supplyAsync(() -> 32),
                    Integer::sum)
  .thenAcceptAsync(x -> System.out.println("The answer is " + x + "!"));
// The answer is 42!

// Java 8+
LongStream.rangeClosed(1, 1000000)
  .parallel()
  .sum()
// 500000500000

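// Scala (on 2.13+, .par needs the scala-parallel-collections module and
// import scala.collection.parallel.CollectionConverters._)
// Long is used because the sum overflows Int
(1L to 1000000L).toVector.par.fold(0L)(_ + _)
// 500000500000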

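The three examples are:

Composing asynchronous computations with CompletableFuture: every method ending in Async submits its task to the thread pool again, so it will most likely run on a different thread;
Processing a Stream in parallel with .parallel(): .sum() is optimized for parallel Streams, which improves efficiency;
Turning a Vector into a parallel collection with .par and folding it: a monoid is associative, so the fold can run in parallel.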
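To see why associativity is what makes a parallel fold safe, here is a small sketch. The chunks are folded one after another rather than on real threads, and the names foldSeq and foldChunked are mine; the point is only that splitting the input and combining partial results gives the same answer for an associative operation and a different one otherwise.

```typescript
type BinaryOp = (a: number, b: number) => number;

// Plain left-to-right fold.
function foldSeq(xs: number[], zero: number, op: BinaryOp): number {
  return xs.reduce((acc, x) => op(acc, x), zero);
}

// Split the input into chunks, fold each chunk independently, then fold
// the partial results. Each chunk could be handled by a different worker.
function foldChunked(xs: number[], zero: number, op: BinaryOp, chunkSize: number): number {
  const partials: number[] = [];
  for (let i = 0; i < xs.length; i += chunkSize) {
    partials.push(foldSeq(xs.slice(i, i + chunkSize), zero, op));
  }
  return foldSeq(partials, zero, op);
}

// The numbers 1..1000.
const numbers: number[] = [];
for (let i = 1; i <= 1000; i++) {
  numbers.push(i);
}

const plus: BinaryOp = (a, b) => a + b;   // associative, 0 is the identity
const minus: BinaryOp = (a, b) => a - b;  // not associative

const seqSum = foldSeq(numbers, 0, plus);               // 500500
const chunkedSum = foldChunked(numbers, 0, plus, 100);  // 500500
const seqDiff = foldSeq(numbers, 0, minus);             // -500500
const chunkedDiff = foldChunked(numbers, 0, minus, 100); // 500500, differs from seqDiff
```

With subtraction the grouping changes the result, so a parallel fold over it would not be well defined.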

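By default, the thread pool that carries these asynchronous and parallel computations is a ForkJoinPool that the JVM creates for us; the instance is available through ForkJoinPool.commonPool().

In addition, scala.concurrent.Future in Scala usually runs on scala.concurrent.ExecutionContext.Implicits.global, which is also backed by a ForkJoinPool (as far as I can tell, in recent Scala versions it is a dedicated pool rather than the JVM common pool itself).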

Monad in FP, Part I

Posted on 2019-10-31 In Programming

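Monad: “the m-word”, “a monoid in the category of endofunctors”.
See “A monad is just a monoid in the category of endofunctors, what’s the problem?” for fun.

On the road to FP, you will inevitably run into the monad as a roadblock. There is no way around it, so let's learn it.

“I've read dozens of articles about monads and I still don't get it.” – an anonymous FP enthusiast

This post is no silver bullet either. I only hope that after reading it you will have a rough, fuzzy picture of monads, and can keep examining the concept from different angles to deepen your understanding.

What is a monad

When we talked about functors, we used a not-quite-accurate definition: anything that has map is a functor.

When we define it this way, we are really generalizing map: we abstract out what the different maps have in common, and that abstraction is the functor.

Likewise, flatMap shows up in many places (List, Option, Future, Either, and so on), and naturally we want to abstract out what they share. That abstraction is the monad. So we can similarly say: anything that has flatMap is a monad.

To be a bit more formal, here is the definition from Scala with Cats: a monad is a mechanism for sequencing computations.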
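"Sequencing computations" is easier to see with a tiny example. The sketch below hand-rolls an Option type with flatMap; the type and the helpers (some, none, parseNumber, reciprocal, parseThenInvert) are all made up for illustration and are not taken from any library.

```typescript
// A hand-rolled Option: either a value or nothing.
type Option<A> = { kind: "some"; value: A } | { kind: "none" };

function some<A>(value: A): Option<A> {
  return { kind: "some", value: value };
}
function none<A>(): Option<A> {
  return { kind: "none" };
}

// flatMap runs the next step only if the previous step produced a value.
function flatMap<A, B>(oa: Option<A>, f: (a: A) => Option<B>): Option<B> {
  return oa.kind === "some" ? f(oa.value) : none<B>();
}

// Step 1: may fail when the input is not a number.
function parseNumber(s: string): Option<number> {
  const n = Number(s);
  return s.trim() !== "" && !isNaN(n) ? some(n) : none<number>();
}

// Step 2: may fail when dividing by zero.
function reciprocal(n: number): Option<number> {
  return n === 0 ? none<number>() : some(1 / n);
}

// The two steps are chained in order; a failure in either short-circuits.
function parseThenInvert(s: string): Option<number> {
  return flatMap(parseNumber(s), reciprocal);
}
```

Here parseThenInvert("4") yields some(0.25), while "0" and "abc" both yield none, each failing at a different step. The second step depends on the result of the first, which is what plain map cannot express without nesting.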

Functor in Functional Programming

Posted on 2019-09-30 Edited on 2019-10-31 In Programming

What is a Functor?

Let's look at some code first:

// js
Array(1,2,3)
  .map(i => i ** 2)
  .map(i => i ** i)
// [1, 256, 387420489]

// scala
Right(3).map(n => n + " cats!")
// res2: Either[Nothing, String] = Right("3 cats!")

// java
Optional.empty().map(n -> "hi, " + n);
// Optional.empty

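All of these types have map, and map seems to do the same thing in each of them.

In fact, it really is the same thing.

Loosely speaking, anything that has map can be treated as a functor.

To study functors, we need to step away from the imperative way of thinking.

Picture a functor as a container holding some elements.

map is not a traversal of the container; it is a transformation of the elements inside it.

If several maps are chained together, the transformations are applied in order. (The order matters here; more on this below.)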
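A quick sketch of what "applied in order" means for arrays. The functions square and addOne are arbitrary examples of mine:

```typescript
const square = (n: number): number => n * n;
const addOne = (n: number): number => n + 1;

const xs = [1, 2, 3];

// Two chained maps: square first, then addOne.
const chained = xs.map(square).map(addOne);          // [2, 5, 10]

// One map with the two functions composed in the same order.
const fused = xs.map((n) => addOne(square(n)));      // [2, 5, 10]

// Swapping the order gives a different result.
const swapped = xs.map(addOne).map(square);          // [4, 9, 16]
```

Chaining two maps is the same as mapping once with the composed function, and swapping the two functions changes the outcome.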

A Brief Look at Covariance and Contravariance

Posted on 2019-08-30 Edited on 2019-09-01 In Programming

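Some Definitions

Consider types A and C, and a generic type constructor F<T>:

If a C can always be used wherever an A is required, we say that C is a subtype of A ^[1], written C <: A.
If F is covariant and C <: A, then F<C> <: F<A>.
If F is contravariant and C <: A, then F<A> <: F<C>.
If F is invariant, then F<C> and F<A> are unrelated, whatever the relationship between A and C.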
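Function types are a handy place to see both directions at once. In the sketch below, Cat plays the role of C and Animal the role of A; the names Handler and Producer are my own. I am assuming TypeScript's strictFunctionTypes option is on, which is what makes function parameters contravariant.

```typescript
interface Animal { name: string; }
interface Cat extends Animal { meow(): string; }   // Cat <: Animal

type Handler<T> = (value: T) => string;   // T in parameter position: contravariant
type Producer<T> = () => T;               // T in return position: covariant

const describeAnimal: Handler<Animal> = (a) => "animal " + a.name;
const makeCat: Producer<Cat> = () => ({ name: "Tom", meow: () => "meow" });

// Contravariance: Handler<Animal> <: Handler<Cat>.
// Something that can handle any Animal can certainly handle a Cat.
const describeCat: Handler<Cat> = describeAnimal;

// Covariance: Producer<Cat> <: Producer<Animal>.
// Something that produces a Cat can be used where an Animal is expected.
const makeAnimal: Producer<Animal> = makeCat;

const description = describeCat(makeCat());   // "animal Tom"
```

The opposite assignments (a Handler<Cat> used as a Handler<Animal>, or a Producer<Animal> used as a Producer<Cat>) should be rejected by the compiler under the assumed setting.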

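The Collection Perspective

Let's start with an example to look at the variance rules of collections.

Duck is a subclass of Bird, and Bird is a subclass of Animal, written Duck <: Bird <: Animal.

Array in TypeScript is covariant: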

interface Animal { name: string }
interface Bird extends Animal {
    // not all animals can fly
    fly: () => string
}
interface Duck extends Bird {
    // not all birds can swim
    swim: () => string
}
const aDog: Animal = {
    name: ""
}
const aBird: Bird = {
    name: "Owl",
    // method syntax, so that `this` refers to the object itself
    fly() { return this.name + " is flying!" }
}
const aDuck: Duck = {
    name: "Donald",
    fly() { return this.name + " is flying!" },
    swim() { return this.name + " is swimming!" }
}

// Legal; Array is covariant, expecting an Array<Animal>, given an Array<Bird>
const animals: Array<Animal> = [aDuck, aBird];
// Illegal (compile error); expecting an Array<Bird>, given an Array<Animal>
const birds: Array<Bird> = [aDuck, aDog];

Monoid in Functional Programming

Posted on 2019-07-28 Edited on 2020-01-04 In Programming

What is a Monoid?

Let's look at two sets of examples first:

// string concatenation
concat("foo","bar") == "foobar"
concat("", "latte") == concat("latte", "") == "latte"
concat("a", concat("b", "c")) == concat(concat("a", "b"), "c") == "abc"

// integer addition
3 + 5 == 8
0 + 42 == 42 + 0 == 42
1 + (2 + 3) == (1 + 2) + 3 == 6

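We can see that the two sets of operations actually follow the same pattern:

There is a "zero value", written zero; in the examples it is the empty string and 0, respectively
There is a binary operator, written op; in the examples it is concat and +, respectively
op is associative, i.e. op(x, op(y, z)) == op(op(x, y), z)
The zero value is the identity element, i.e. op(zero, x) == op(x, zero) == x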
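The pattern above can be written down as an interface. This is a minimal sketch: the names Monoid, stringConcat, intAddition and combineAll are my own choices, and the two laws (associativity and identity) are not enforced by the type system, so each instance has to honor them on its own.

```typescript
// A monoid is a zero value plus an associative binary operation.
interface Monoid<A> {
  zero: A;
  op: (x: A, y: A) => A;
}

// The two instances from the examples above.
const stringConcat: Monoid<string> = { zero: "", op: (x, y) => x + y };
const intAddition: Monoid<number> = { zero: 0, op: (x, y) => x + y };

// Combine a whole list using only what the monoid provides.
function combineAll<A>(m: Monoid<A>, xs: A[]): A {
  let acc = m.zero;
  for (const x of xs) {
    acc = m.op(acc, x);
  }
  return acc;
}

const word = combineAll(stringConcat, ["foo", "bar"]);   // "foobar"
const total = combineAll(intAddition, [3, 5]);           // 8
const empty = combineAll(intAddition, []);               // 0, the zero value
```

combineAll never needs to know whether it is joining strings or adding numbers, which is the payoff of abstracting the shared pattern.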

Yet Another Introduction to Y Combinator in Scheme

Posted on 2017-05-02 Edited on 2019-07-28 In Programming

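Preface

In lambda calculus, functions have no names (they are all anonymous). Without a name, a function cannot explicitly call itself inside its own body, so a recursive function cannot be defined directly. The Y combinator exists to solve this problem.
This post sets the mathematical concepts aside and uses a programming language (Scheme) to show how we derive the Y combinator.

Environment:
IDE: DrRacket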
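For a taste of where the derivation ends up, here is a rough sketch in TypeScript rather than Scheme. Since TypeScript evaluates arguments eagerly, as Scheme does, this is the eta-expanded variant (often called the Z combinator). The names fix, Fn and SelfApply are mine, and the types are only there to satisfy the compiler.

```typescript
type Fn<A, B> = (a: A) => B;

// A function that takes itself as its argument.
interface SelfApply<A, B> {
  (self: SelfApply<A, B>): Fn<A, B>;
}

// Fixed-point combinator for a strict language.
// f receives "the recursive function" as a parameter instead of by name.
function fix<A, B>(f: (recur: Fn<A, B>) => Fn<A, B>): Fn<A, B> {
  // The inner lambda delays self(self) until an argument arrives;
  // without that delay, eager evaluation would loop forever.
  const g: SelfApply<A, B> = (self) => f((a) => self(self)(a));
  return g(g);
}

// Factorial written without referring to itself by name.
const factorial = fix<number, number>(
  (recur) => (n) => (n <= 1 ? 1 : n * recur(n - 1))
);

const fiveFactorial = factorial(5);   // 120
```

The body passed to fix never mentions factorial; recursion comes entirely from the self-application inside fix.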

Programming in Haskell Chapter10 Exercises Solutions

Posted on 2016-09-13 Edited on 2016-09-14 In Haskell

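Programming in Haskell is a good book for getting started with Haskell; the introduction page and the accompanying slides, videos, and code are all here.

I have skipped the Chapter 9 exercises for now and am posting Chapter 10 first.
The semester has started and things are getting busy, and I still have to look for a job, so let's push on and finish this book soon! :P

Actually, from Chapter 8 onward the book says too little about monads. A second edition is coming out in a few days, and I hope it improves on this...
I have ordered a copy of Learn You a Haskell for Great Good! (《Haskell趣学指南》) and plan to read the two together, then fill in the exercises I skipped.

Define the function mult :: Nat -> Nat -> Nat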

data Nat = Zero | Succ Nat
nat2int :: Nat -> Int
nat2int Zero = 0
nat2int (Succ n) = 1 + nat2int n

int2nat :: Int -> Nat
int2nat 0 = Zero
int2nat n = Succ (int2nat (n - 1))

add :: Nat -> Nat -> Nat
add Zero n = n
add (Succ m) n = Succ (add m n)

mult :: Nat -> Nat -> Nat
mult Zero _ = Zero
mult (Succ m) n = add (mult m n) n

-- *Main> nat2int(mult (int2nat 2) (int2nat 3))
-- 6
-- *Main> nat2int(mult (int2nat 0) (int2nat 3))
-- 0
-- *Main> nat2int(mult (int2nat 1) (int2nat 3))
-- 3
-- *Main> nat2int(mult (int2nat 10) (int2nat 13))
-- 130

Redefine `occurs :: Int -> Tree -> Bool`

This uses the standard library's data Ordering = LT | EQ | GT, together with
compare :: Ord a => a -> a -> Ordering.

data Tree = Leaf Int | Node Tree Int Tree
tr :: Tree
tr = Node (Node (Leaf 1) 3 (Leaf 4)) 5 (Node (Leaf 6) 7 (Leaf 9))
occurs :: Int -> Tree -> Bool
occurs m (Leaf n) = m == n
occurs m (Node l n r) = case compare m n of
                            LT -> occurs m l
                            EQ -> True
                            GT -> occurs m r

Programming in Haskell Chapter8 Exercises Solutions

Posted on 2016-09-08 Edited on 2016-09-14 In Haskell

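Programming in Haskell is a good book for getting started with Haskell; the introduction page and the accompanying slides, videos, and code are all here.

This chapter is an information explosion: the book covers a great deal while explaining very little. I had to dig through a pile of material for quite a while before it made sense.

The references I used alongside it:
Chapter 8 lecture video
Monadic Parsing in Haskell
“Programming In Haskell” error in sat function

With the Parser definition given in the book, do notation cannot be used, so all of the exercises below are done with >>=.

The complete code is here.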

`int :: Parser Int`

int :: Parser Int
int = natural +++
      (symbol "-" >>= \_ ->
       natural >>= \x ->
       return (-x))

-- *Main> parse int "123da"
-- [(123,"da")]
-- *Main> parse int "sdada"
-- []
-- *Main> parse int "-213sdada"
-- [(-213,"sdada")]

`comment :: Parser ()`

comment :: Parser ()
comment = symbol "--" >>= \_ ->
          many (sat (/= '\n')) >>= \_ ->
          many (char '\n') >>= \_ ->
          return ()  -- why return ()?
-- *Main> parse comment "foo"
-- []
-- *Main> parse comment "--foo"
-- [((),"")]
-- *Main> parse comment "--foo\nbar"
-- [((),"bar")]
